Update
gregridgeway committed Sep 4, 2018
1 parent 1f15218 commit e41f917
Showing 3 changed files with 151 additions and 147 deletions.
129 changes: 66 additions & 63 deletions 01 Intro to R.Rmd
@@ -31,6 +31,9 @@ output:






<!-- A function for automating the numbering and wording of the exercise questions -->
```{r echo=FALSE}
.counterExercise <- 0
@@ -157,9 +160,9 @@ paste0(my.states)
What does the `nchar()` function do? The `paste()` function? Does it make a difference to use `sep=""` or `collapse=","`? What about `paste0()`?
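
The questions above can be explored with a small sketch. The vector `ex.states` is a hypothetical stand-in introduced here for illustration; it is not the `my.states` object from the original file.

```r
# Hypothetical example vector (an assumption, not defined in the original file)
ex.states <- c("PA", "NJ", "NY")

nchar(ex.states)                  # number of characters in each element
paste(ex.states, "USA")           # default sep=" " joins the pieces element by element
paste(ex.states, "USA", sep="")   # sep="" removes the space between the pieces
paste(ex.states, collapse=",")    # collapse="," merges all elements into one string
paste0(ex.states, "USA")          # paste0() behaves like paste() with sep=""
```

In short, `sep=` controls what goes between the pieces pasted together within each element, while `collapse=` controls whether the resulting elements are merged into a single string.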

## Exercises
`r exNum("Print all even numbers less than 100")`
`r exNum("What is the mean of even numbers less than 100")`
`r exNum('Have R put in alphabetical order \x60c("WA","DC","CA","PA","MD","VA","OH")\x60')`
`r .exNum("Print all even numbers less than 100")`
`r .exNum("What is the mean of even numbers less than 100")`
`r .exNum('Have R put in alphabetical order \x60c("WA","DC","CA","PA","MD","VA","OH")\x60')`

# Assignment of values to variables

@@ -215,11 +218,11 @@ state.names[i[1:3]] # show me those three states
Note that in the last example we used square brackets within square brackets. First, we asked R to give us the indices of the first three states in alphabetical order and that was `r i[1:3]`. Then R took those three values and plugged them into the second set of square brackets to show you the state names in those positions in the collection.
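
The nested square brackets can be seen step by step in a minimal sketch. It assumes the indices come from `order()`, which this excerpt does not show, and it uses a hypothetical vector `ex.names` rather than the full `state.names`.

```r
# Hypothetical short vector (an assumption; the original uses state.names)
ex.names <- c("WA", "DC", "CA", "PA", "MD")

# order() returns the positions of the elements in alphabetical order
ex.order <- order(ex.names)

ex.order[1:3]             # inner brackets: positions of the first three alphabetically
ex.names[ex.order[1:3]]   # outer brackets: the names stored at those positions
```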

## Exercises
`r exNum("What's the last state in the \x60state.names\x60?")`
`r exNum('Pick out states that begin with "M" using their indices')`
`r exNum("Pick out states where you have lived")`
`r exNum("What's the last state in alphabetical order?")`
`r exNum("What are the last three states in alphabetical order?")`
`r .exNum("What's the last state in the \x60state.names\x60?")`
`r .exNum('Pick out states that begin with "M" using their indices')`
`r .exNum("Pick out states where you have lived")`
`r .exNum("What's the last state in alphabetical order?")`
`r .exNum("What are the last three states in alphabetical order?")`


# Logical values and operations
@@ -305,9 +308,9 @@ sum(my.states %in% c("CA","OR","WA","AK","HI"))
Note that in the last line we used `sum()` to count how many elements of `my.states` made `%in%` evaluate to `TRUE`.
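
Counting with `sum()` works because R treats `TRUE` as 1 and `FALSE` as 0 in arithmetic. Here is a small sketch; `ex.lived` and `ex.west` are hypothetical names introduced for illustration.

```r
# Hypothetical vectors (assumptions, not objects from the original file)
ex.lived <- c("PA", "CA", "NY")
ex.west  <- c("CA", "OR", "WA", "AK", "HI")

ex.lived %in% ex.west         # one logical value per element of ex.lived
sum(ex.lived %in% ex.west)    # number of TRUE values
mean(ex.lived %in% ex.west)   # proportion of TRUE values
```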

## Exercises
`r exNum("Report \x60TRUE\x60 or \x60FALSE\x60 for each state depending on if you have lived there")`
`r exNum("With \x60a <- 1:100\x60, pick out odd numbers between 50 and 75")`
`r exNum("Use greater than less than signs to get all state names that begin with M")`
`r .exNum("Report \x60TRUE\x60 or \x60FALSE\x60 for each state depending on if you have lived there")`
`r .exNum("With \x60a <- 1:100\x60, pick out odd numbers between 50 and 75")`
`r .exNum("Use greater than less than signs to get all state names that begin with M")`

# Sampling
The function `sample()` randomly shuffles a collection of values.
@@ -328,10 +331,10 @@ table(a)
max(table(a)) # count for the value that appears most frequently
```
## Exercises
`r exNum("Use \x60sample()\x60 to estimate the probability of rolling a 6")`
`r exNum("Use \x60sample()\x60 to estimate the probability that the sum of two dice equals 7")`
`r exNum("Use \x60sample()\x60 to select randomly five states without replacement")`
`r exNum("Use \x60sample()\x60 to select randomly 1000 states with replacement")`
`r .exNum("Use \x60sample()\x60 to estimate the probability of rolling a 6")`
`r .exNum("Use \x60sample()\x60 to estimate the probability that the sum of two dice equals 7")`
`r .exNum("Use \x60sample()\x60 to select randomly five states without replacement")`
`r .exNum("Use \x60sample()\x60 to select randomly 1000 states with replacement")`
+ Tabulate how often each state was selected
+ Which state was selected the least? Make R do this for you

@@ -439,12 +442,12 @@ ls()
Assuming you are using RStudio, you can also see the objects stored in memory by clicking on the Environment tab.

## Exercises
`r exNum('Fix \x60state.list\x60 so that "DC" is in "other" rather than "east"')`. Here are a few hints
`r .exNum('Fix \x60state.list\x60 so that "DC" is in "other" rather than "east"')`. Here are a few hints
+ access "other" using `$`
+ combine things using `c()`
+ assign values using `<-`
+ remove values using `[]` with a negative index or using a logical statement
`r exNum("Print out east and central states together sorted")`
`r .exNum("Print out east and central states together sorted")`


# Functions
@@ -480,9 +483,9 @@ IQR
You can see that it computes the 0.25 quantile and the 0.75 quantile and uses `diff()` to compute their difference.
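
The same computation can be reproduced by hand as a quick check. This is a sketch using a hypothetical numeric vector `ex.x`, not data from the original file.

```r
# Hypothetical data (an assumption for illustration)
ex.x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# The 0.25 and 0.75 quantiles, without the "25%"/"75%" labels
ex.q <- quantile(ex.x, c(0.25, 0.75), names=FALSE)

diff(ex.q)   # difference between the two quantiles
IQR(ex.x)    # the built-in function gives the same value
```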

## Exercises
`r exNum('Make a function \x60is.island(x)\x60 that returns \x60TRUE\x60 if \x60x\x60 is an island')`. Islands are "HI", "FM", "MH", "PW", "AS", "GU", "MP", "PR", "VI", "UM". Borrow the template I used for `give.first.and.last()`. Then try using the `%in%` operator
`r exNum("Count how many islands are within each region. Use an \x60sapply()\x60 (or two) and your new \x60is.island()\x60 function")`
`r exNum("Which components of \x60b\x60 have missing values? Use \x60is.na()\x60")`. `b` was defined earlier
`r .exNum('Make a function \x60is.island(x)\x60 that returns \x60TRUE\x60 if \x60x\x60 is an island')`. Islands are "HI", "FM", "MH", "PW", "AS", "GU", "MP", "PR", "VI", "UM". Borrow the template I used for `give.first.and.last()`. Then try using the `%in%` operator
`r .exNum("Count how many islands are within each region. Use an \x60sapply()\x60 (or two) and your new \x60is.island()\x60 function")`
`r .exNum("Which components of \x60b\x60 have missing values? Use \x60is.na()\x60")`. `b` was defined earlier

# Matrices and apply()

@@ -600,9 +603,9 @@ with(chicagoCrime, sort(table(Primary.Type[District==10])))
Much easier to read and understand!

## Exercises
`r exNum("Display three randomly selected rows")`
`r exNum("Count \x60NA\x60s in each column")`
`r exNum("Look up \x60Location.Description\x60, \x60Block\x60, \x60Beat\x60, and \x60Ward\x60 for those missing \x60Latitude\x60")`
`r .exNum("Display three randomly selected rows")`
`r .exNum("Count \x60NA\x60s in each column")`
`r .exNum("Look up \x60Location.Description\x60, \x60Block\x60, \x60Beat\x60, and \x60Ward\x60 for those missing \x60Latitude\x60")`

# For loops
Sometimes we need to have R repeat certain tasks multiple times, such as marching through each row of a dataset and modifying values. For loops accomplish this. Later in this course we will be using Google Maps to extract information about addresses. So we might need to iterate through every row in the dataset, check whether the latitude and longitude are missing, and if missing try to retrieve the latitude and longitude from Google Maps. The last crime in the dataset missing coordinates is in row 9954.
@@ -697,11 +700,11 @@ chicagoCrime$google.maps.url <- paste("https://www.google.com/maps/place/",
This took `r timeWithoutForLoop[3]` seconds. That's `r round(time4ForLoop[3]/timeWithoutForLoop[3],1)` times faster than the for loop.

## Exercises
`r exNum('Use a for loop to create a variable \x60Coordinates\x60 that looks like "(X.Coordinate,Y.Coordinate)"')`
`r .exNum('Use a for loop to create a variable \x60Coordinates\x60 that looks like "(X.Coordinate,Y.Coordinate)"')`
+ Use `paste()` with the `X.Coordinate` and `Y.Coordinate` variables
+ Remember the `sep=` option in `paste()`
+ You might find the `with()` function useful for simplifying your code and avoiding a lot of `chicagoCrime$`s
`r exNum("Redo the previous exercise without using a for loop and compare computation time")`
`r .exNum("Redo the previous exercise without using a for loop and compare computation time")`

# More tabulating, aggregating, and breaking statistics down by group
The variable `Arrest` indicates whether someone was arrested for the crime. Here are the first 10 values.
@@ -750,8 +753,8 @@ barplot(a$`(Arrest == "true")`,
```

## Exercises
`r exNum('How many assaults occurred in the street? (\x60Location.Description=="STREET"\x60)')`. Try using `subset()` even though there are other ways
`r exNum("What percentage of assaults occurred in the street by Ward?")`
`r .exNum('How many assaults occurred in the street? (\x60Location.Description=="STREET"\x60)')`. Try using `subset()` even though there are other ways
`r .exNum("What percentage of assaults occurred in the street by Ward?")`

# Plotting Data

@@ -809,14 +812,14 @@ text(ifelse(tab<80, 180, tab-5), # x-coord of text,
adj=1) # right justify text
```

# Exercises
`r exNum("Make a barplot indicating how many states are in each region. Use \x60state.list\x60")`
`r exNum("Identify the beat with the most crimes")`
`r exNum("Identify the beat with the most domestic violence incidents")`
`r exNum("Part 1 crimes are homicide, robbery, assault, arson, burglary, theft, sex offense, motor vehicle theft. Calculate the number of Part 1 crimes in Chicago")`
## Exercises
`r .exNum("Make a barplot indicating how many states are in each region. Use \x60state.list\x60")`
`r .exNum("Identify the beat with the most crimes")`
`r .exNum("Identify the beat with the most domestic violence incidents")`
`r .exNum("Part 1 crimes are homicide, robbery, assault, arson, burglary, theft, sex offense, motor vehicle theft. Calculate the number of Part 1 crimes in Chicago")`

# Solutions to the exercises
1. `r exerciseQuestions[1]`
1. `r .exerciseQuestions[1]`
```{r comment=""}
(1:49)*2
```
@@ -825,22 +828,22 @@ or
seq(2,98,by=2)
```

2. `r exerciseQuestions[2]`
2. `r .exerciseQuestions[2]`
```{r comment=""}
mean((1:49)*2)
```

3. `r exerciseQuestions[3]`
3. `r .exerciseQuestions[3]`
```{r comment=""}
sort(c("WA","DC","CA","PA","MD","VA","OH"))
```

4. `r exerciseQuestions[4]`
4. `r .exerciseQuestions[4]`
```{r comment=""}
state.names[51]
```

5. `r exerciseQuestions[5]`
5. `r .exerciseQuestions[5]`
```{r comment=""}
state.names[c(7,8,21,24,28,32,35,46)]
```
@@ -853,13 +856,13 @@ Here's another possible answer that uses `substring` (which we haven't covered y
state.names[substring(state.names, 1, 1)=="M"]
```

6. `r exerciseQuestions[6]`
6. `r .exerciseQuestions[6]`
Of course, these may vary depending on where you have lived.
```{r comment=""}
state.names[c(1, 4, 10, 26)]
```

7. `r exerciseQuestions[7]`
7. `r .exerciseQuestions[7]`
```{r comment=""}
sort(state.names)[51]
```
@@ -868,29 +871,29 @@ or
rev(sort(state.names))[1]
```

8. `r exerciseQuestions[8]`
8. `r .exerciseQuestions[8]`
```{r comment=""}
rev(sort(state.names))[1:3]
```

9. `r exerciseQuestions[9]`
9. `r .exerciseQuestions[9]`
```{r comment=""}
my.states <- c("PA", "NJ", "NY", "MD", "DE", "MA", "RI", "CT", "ME", "LA", "IN")
state.names %in% my.states
```

10. `r exerciseQuestions[10]`
10. `r .exerciseQuestions[10]`
```{r comment=""}
a <- 1:100
a[a %% 2==1 & a>50 & a<75]
```

11. `r exerciseQuestions[11]`
11. `r .exerciseQuestions[11]`
```{r comment=""}
state.names[state.names>"LZ" & state.names<"N"]
```

12. `r exerciseQuestions[12]`
12. `r .exerciseQuestions[12]`
```{r comment=""}
a <- sample(1:6, size=100000, replace=TRUE)
table(a)[6]/length(a)
@@ -904,20 +907,20 @@ Or
mean(a==6)
```

13. `r exerciseQuestions[13]`
13. `r .exerciseQuestions[13]`
```{r comment=""}
dice1 <- sample(1:6, size=1000, replace=TRUE)
dice2 <- sample(1:6, size=1000, replace=TRUE)
doubleroll <- dice1 + dice2
mean(doubleroll==7) # should be close to 1/6 or 0.1666...
```

14. `r exerciseQuestions[14]` (Answers will vary)
14. `r .exerciseQuestions[14]` (Answers will vary)
```{r comment=""}
sample(state.names, size=5, replace=FALSE)
```

15. `r exerciseQuestions[15]`
15. `r .exerciseQuestions[15]`
+ Tabulate how often each state was selected (Answers will vary)
```{r comment=""}
a <- sample(state.names, size=1000, replace=TRUE)
@@ -929,7 +932,7 @@ table(a)
sort(table(a))[1]
```

16. `r exerciseQuestions[16]`
16. `r .exerciseQuestions[16]`
```{r comment=""}
state.list$east <- state.list$east[state.list$east!="DC"]
state.list$other <- c(state.list$other, "DC")
@@ -942,7 +945,7 @@ state.list$other <- c(state.list$other, "DC")
state.list
```

17. `r exerciseQuestions[17]`
17. `r .exerciseQuestions[17]`
```{r comment=""}
sort(c(state.list$east, state.list$central))
```
@@ -951,15 +954,15 @@ Or
with(state.list, sort(c(east, central)))
```

18. `r exerciseQuestions[18]`
18. `r .exerciseQuestions[18]`
```{r comment=""}
is.island <- function(x)
{
return(x %in% c("HI", "FM", "MH", "PW", "AS", "GU", "MP", "PR", "VI", "UM"))
}
```

19. `r exerciseQuestions[19]`
19. `r .exerciseQuestions[19]`

First, this `lapply()` asks each state if they are an island.
```{r comment=""}
@@ -970,7 +973,7 @@ Now we want to count up how many `TRUE`s there are in each component, so wrap th
sapply(lapply(state.list, is.island), sum)
```

20. `r exerciseQuestions[20]`
20. `r .exerciseQuestions[20]`
```{r comment=""}
sapply(lapply(b, is.na), any)
```
@@ -980,12 +983,12 @@ b <- list(0:9, c("A","B","C"), c(TRUE,FALSE,NA))
sapply(b, function(x) any(is.na(x)))
```

21. `r exerciseQuestions[21]`
21. `r .exerciseQuestions[21]`
```{r comment=""}
chicagoCrime[sample(1:nrow(chicagoCrime), size=3),]
```

22. `r exerciseQuestions[22]`
22. `r .exerciseQuestions[22]`
```{r comment=""}
sapply(lapply(chicagoCrime, is.na), sum)
```
@@ -994,7 +997,7 @@ Or
sapply(chicagoCrime, function(x) sum(is.na(x)))
```

23. `r exerciseQuestions[23]`
23. `r .exerciseQuestions[23]`
```{r comment=""}
i <- is.na(chicagoCrime$Latitude)
# Let's just show the first 5 rows
@@ -1007,7 +1010,7 @@ subset(chicagoCrime, is.na(chicagoCrime$Latitude),
select=c("Location.Description","Block","Beat","Ward"))[1:5,]
```

24. `r exerciseQuestions[24]`
24. `r .exerciseQuestions[24]`
```{r comment=""}
system.time(
for (i in 1:nrow(chicagoCrime))
@@ -1028,33 +1031,33 @@ for (i in 1:nrow(chicagoCrime))
}
)
```
25. `r exerciseQuestions[25]`
25. `r .exerciseQuestions[25]`
```{r comment=""}
system.time(
chicagoCrime$coords3 <- with(chicagoCrime,
paste0("(", X.Coordinate, ",",Y.Coordinate,")"))
)
```

26. `r exerciseQuestions[26]`
26. `r .exerciseQuestions[26]`
```{r comment=""}
# Refer to the column of the assault subset, not the full chicagoCrime data
with(subset(chicagoCrime, Primary.Type=="ASSAULT"),
sum(Location.Description=="STREET"))
```

27. `r exerciseQuestions[27]`
27. `r .exerciseQuestions[27]`
```{r comment=""}
aggregate((Location.Description=="STREET")~Ward,
data=subset(chicagoCrime, Primary.Type=="ASSAULT"),
mean)
```

28. `r exerciseQuestions[28]`
28. `r .exerciseQuestions[28]`
```{r comment=""}
barplot(sapply(state.list, length))
```

29. `r exerciseQuestions[29]`
29. `r .exerciseQuestions[29]`
```{r comment=""}
names(rev(sort(table(chicagoCrime$Beat)))[1])
```
@@ -1063,13 +1066,13 @@ Or
names(which.max(table(chicagoCrime$Beat)))
```

30. `r exerciseQuestions[30]`
30. `r .exerciseQuestions[30]`
```{r comment=""}
with(subset(chicagoCrime, Description=="DOMESTIC BATTERY SIMPLE"),
names(which.max(table(Beat))))
```

31. `r exerciseQuestions[31]`
31. `r .exerciseQuestions[31]`
```{r comment=""}
sum(chicagoCrime$Primary.Type %in% c("HOMICIDE", "ROBBERY", "ASSAULT", "ARSON",
"BURGLARY", "THEFT", "SEX OFFENSE",