name these chunks #1

Merged 1 commit on Sep 6, 2018
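The whole PR is one mechanical rename: every anonymous `{r}` chunk header gains a label built from the file's basename plus a counter (e.g. `{r datascience101-1}`), so knitr output, caching, and error messages can identify each chunk. The same transformation can be sketched as a throwaway shell script; the file name `demo.Rmd` and the awk approach are illustrative, not part of the PR.

```shell
# Hypothetical sketch (not from the PR): auto-name bare knitr chunks in an
# .Rmd file after its basename, mirroring the pattern the PR applies by hand.
# Assumes a POSIX shell and awk; demo.Rmd is an illustrative throwaway file.
f='```'                                    # the chunk fence, kept in a variable
file=demo.Rmd
printf '%s{r}\nplot(1)\n%s\n\n%s{r}\nsummary(1)\n%s\n' "$f" "$f" "$f" "$f" > "$file"
stem=$(basename "$file" .Rmd)
awk -v stem="$stem" -v f="$f" '
  $0 == f "{r}" { n++; print f "{r " stem "-" n "}"; next }  # rename bare chunk headers
  { print }                                                  # copy every other line
' "$file" > "$file.named"
cat "$file.named"
```

Running this rewrites the two bare headers to `{r demo-1}` and `{r demo-2}`; chunks that already carry a name or options are passed through untouched, which matches how the PR leaves lines like `{r, echo=TRUE}` to be renamed by hand.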
2 changes: 1 addition & 1 deletion pres/datascience101.Rmd
@@ -90,7 +90,7 @@ Find the lever you can push on to change behaviours that helps with business goa


## Visualisation
-```{r}
+```{r datascience101-1}
plotly::ggplotly({
ggplot(datasaurus_dozen,aes(x,y))+
geom_point()+
2 changes: 1 addition & 1 deletion pres/glmworkshop.Rmd
@@ -191,4 +191,4 @@ What sort of things can we do to prevent changes creeping into our analysis that
- Feature analysis in-depth
- Candidate model evaluations
- Code
-- Reproducibility info
+- Reproducibility info
8 changes: 4 additions & 4 deletions pres/logisticregressions.Rmd
@@ -207,7 +207,7 @@ summary(x_s)
## Scaling variables
Use `caret` to scale multiple variables simultaneously and get a reusable scaling model for applying to test data, and eventually production data.

-```{r, echo=TRUE}
+```{r logisticregressions-1, echo=TRUE}
transformations<-preProcess(training_x)
scaledVars<-predict(transformations,training_x)
knitr::kable(t(summary(scaledVars)))
@@ -265,7 +265,7 @@ summary(steppedmodel)
## `broom`
Use `broom` to make tidy versions of model outputs.

-```{r}
+```{r logisticregressions-2}
library(broom)

# Coefficients
@@ -275,15 +275,15 @@ knitr::kable(tidy(steppedmodel))
## `broom`
Use `broom` to make tidy versions of model outputs.

-```{r}
+```{r logisticregressions-3}
# Fitted data
knitr::kable(head(augment(steppedmodel)))
```

## `broom`
Use `broom` to make tidy versions of model outputs.

-```{r}
+```{r logisticregressions-4}
# Key statistics
knitr::kable(glance(steppedmodel))
```
6 changes: 3 additions & 3 deletions pres/moderndatascience.Rmd
@@ -43,7 +43,7 @@ library(datasauRus)


## Visualisation
-```{r}
+```{r moderndatascience-1}
ggplot(datasaurus_dozen,aes(x,y))+
geom_point()+
facet_wrap(~dataset) +
@@ -86,7 +86,7 @@ samples %>%

# Feature engineering and reduction
## recipes
-```{r}
+```{r moderndatascience-2}
library(recipes)

train %>%
@@ -143,7 +143,7 @@ initial_lm %>%
```

## yardstick
-```{r}
+```{r moderndatascience-3}
test %>%
bake(fe_cleaner, .) %>%
add_predictions(initial_lm) %>%
38 changes: 19 additions & 19 deletions pres/stats101.Rmd
@@ -48,7 +48,7 @@ myFile<-"Predicting Age.xlsx"

## Our data
We have the results from our survey.
-```{r}
+```{r stats101-1}
basedata<-setDT(read_excel(myFile))
results<-basedata[Age<100&Age>Experience,]
```
@@ -64,14 +64,14 @@ DT::datatable(results,style="bootstrap",options=list(dom = "lftir" ))
```

## Age Distribution
-```{r}
+```{r stats101-2}
ggplot(results)+ggopts+
aes(x = Age, y=..count..)+
geom_histogram(bins = 10, fill="white", colour="white")
```

## Experience Distribution
-```{r}
+```{r stats101-3}
vegalite %>%
add_data(results) %>%
encode_x("Experience","quantitative") %>%
@@ -80,7 +80,7 @@ vegalite %>%
```

## Field Distribution
-```{r}
+```{r stats101-4}
vegalite %>%
add_data(results) %>%
encode_x("Field","ordinal") %>%
@@ -90,7 +90,7 @@ vegalite %>%


## All data
-```{r}
+```{r stats101-5}
vegalite %>%
add_data(results) %>%
encode_x("Experience","quantitative") %>%
@@ -102,15 +102,15 @@ vegalite %>%

# Sampling

-```{r}
+```{r stats101-6}
testPer<-.75
inTest<-sample(1:nrow(results),nrow(results)*testPer)
training<-results[inTest,]
holdout<-results[!inTest,]
```

## Training
-```{r}
+```{r stats101-7}
vegalite %>%
add_data(training) %>%
encode_x("Experience","quantitative") %>%
@@ -120,7 +120,7 @@ vegalite %>%
```

## Holdout
-```{r}
+```{r stats101-8}
vegalite %>%
add_data(holdout) %>%
encode_x("Experience","quantitative") %>%
@@ -134,7 +134,7 @@ vegalite %>%
## One size fits all
We could take some measure of central tendency to predict the age of attendees.

-```{r}
+```{r stats101-9}
averages<-training[,.(Mean=floor(mean(Age))
,Median=floor(median(Age))
,Mode=Mode(Age)
@@ -143,7 +143,7 @@ knitr::kable(averages)
```

## Results
-```{r}
+```{r stats101-10}
holdout[,colnames(averages):=averages]
holdout.m<-melt(holdout, measure.vars = c("Age",colnames(averages)))
vegalite %>%
@@ -155,7 +155,7 @@ vegalite %>%
```

## Assessing results
-```{r}
+```{r stats101-11}
holdout.lse<-melt(holdout, measure.vars = colnames(averages))
holdout.lse[,Error:=(Age-value)^2]
knitr::kable(holdout.lse[,.(LSE=sum(Error)), variable][order(LSE)])
@@ -164,13 +164,13 @@ knitr::kable(holdout.lse[,.(LSE=sum(Error)), variable][order(LSE)])
# Line of best fit
$y=mx+c$

-```{r}
+```{r stats101-12}
expLM<-lm(Age~Experience, training)
summary(expLM)
```

## Model
-```{r}
+```{r stats101-13}
training[,expLMres:=expLM$fitted]
ggplot(training, aes(x=Experience, y=Age))+
geom_point()+
@@ -179,7 +179,7 @@ ggplot(training, aes(x=Experience, y=Age))+
```

## Results
-```{r}
+```{r stats101-14}
holdout[,expLMres:=predict(expLM,holdout)]
holdout.m<-melt(holdout, measure.vars = c("Age","expLMres"))
vegalite %>%
@@ -191,7 +191,7 @@ vegalite %>%
```

## Assessing results
-```{r}
+```{r stats101-15}
holdout.lse<-melt(holdout, measure.vars = c("expLMres",colnames(averages)))
holdout.lse[,Error:=(Age-value)^2]
knitr::kable(holdout.lse[,.(LSE=sum(Error)), variable][order(LSE)])
@@ -202,13 +202,13 @@ knitr::kable(holdout.lse[,.(LSE=sum(Error)), variable][order(LSE)])
- $y=m_1 x_1 +...+ m_n x_n +c$
- Contrasts

-```{r}
+```{r stats101-16}
fieldLM<-lm(Age~Experience + Field, training)
summary(fieldLM)
```

## Model
-```{r}
+```{r stats101-17}
training[,fieldLMres:=fieldLM$fitted]
ggplot(training, aes(x=Experience, y=Age, group=Field, colour=Field))+
geom_point()+
@@ -218,7 +218,7 @@ ggplot(training, aes(x=Experience, y=Age, group=Field, colour=Field))+
```

## Results
-```{r}
+```{r stats101-18}
holdout[,fieldLMres:=predict(fieldLM,holdout)]
holdout.m<-melt(holdout, measure.vars = c("Age","fieldLMres"))
vegalite %>%
@@ -230,7 +230,7 @@ vegalite %>%
```

## Assessing results
-```{r}
+```{r stats101-19}
holdout.lse<-melt(holdout, measure.vars = c("fieldLMres","expLMres",colnames(averages)))
holdout.lse[,Error:=(Age-value)^2]
knitr::kable(holdout.lse[,.(LSE=sum(Error)), variable][order(LSE)])