# lgatto/SC-ICS-Proposal


These exercises were prepared by the participants in the R Consortium-funded Software Carpentry instructor training. For background and more details on the methods illustrated here, see the Instructor Training curriculum, in particular the sections Novices and Formative Assessment (for formative assessment) and Cognitive Load (for faded examples).

# Multiple choice questions

Multiple choice questions are a form of formative assessment that takes place during teaching and learning and informs both the instructor and the students about what to focus on. The wrong answers are not picked at random; they are designed to highlight specific misunderstandings that will need to be re-explained if students choose them.

### Creating R functions

Assume the following function definition:

```display <- function(a = 1, b = 2, c = 3) {
result <- c(a, b, c)
names(result) <- c("a", "b", "c")  # This names each element of the vector
return(result)
}```

What would be the result of the following function call? (Please note that it is considered bad form to combine named and positional arguments this way.)

`display(c = 77, 5)`

a) (correct)

``` a  b  c
5  2 77 ```

b) (values are assigned in order disregarding the names)

``` a  b  c
77  5 3 ```

c) (named parameter is used correctly, then wrongly assumed that the second argument will be passed to the second parameter)

``` a  b  c
1  5 77 ```

d) (thinking that you cannot add positional arguments after named arguments)

``` a  b  c
1  2 77 ```
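A minimal sketch of how R matches the arguments in the call above: named arguments are matched first, and unnamed arguments then fill the remaining parameters in order. The variable names `mixed` and `named` are illustrative and not part of the original exercise.

```r
# Same definition as in the question above.
display <- function(a = 1, b = 2, c = 3) {
  result <- c(a, b, c)
  names(result) <- c("a", "b", "c")
  return(result)
}

# `c = 77` is matched by name first; the unnamed 5 then fills the first
# parameter that is still unmatched, which is `a`. `b` keeps its default.
mixed <- display(c = 77, 5)
print(mixed)

# Naming every argument gives the same result and is easier to read.
named <- display(a = 5, c = 77)
print(identical(mixed, named))
```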

### What is the assignment operator in R?

1. `=` - a right answer
2. `<-` - a right answer
3. `<` - a wrong but almost plausible answer
4. 1 and 2, but 1 is preferred - Learner has understood that there is more than one assignment operator, but has not appreciated the style guide
5. 1 and 2, but 2 is preferred - Learner has understood that there is more than one assignment operator, and has appreciated the style guide
6. `==` - Learner is familiar with some commands in R, but has confused logical test with assignment operator

### What would be the expected output of this hypothetical function call?

`average(c(1,2,4))`
1. `2` (median: possibly correct but not specified)
2. `7/3` (average is a generic term often assumed to be the mean)
3. insufficient information to answer (correct)

### What will the following return (in R)?

Make a prediction without running it:

`NA == NA`
1. `TRUE` - The terms are equal because they are both NA (incorrect)
2. `FALSE` - The terms are unequal because R considers each NA to be unique (incorrect)
3. `NA` - R cannot tell whether the missing values represented by the NAs are equal or unequal (correct)
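A short sketch of the behaviour behind this question, and of `is.na()`, which is the usual way to test for missing values. The vector `x` and the variable `missing` are made up for illustration.

```r
x <- c(1, NA, 3)

# Comparing with `==` propagates the missing value instead of answering.
print(NA == NA)
print(x == NA)

# is.na() is the function intended for testing whether values are missing.
missing <- is.na(x)
print(missing)
```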

### Data types

With the data frame `cats` below, we run the command:

`rbind(cats, c('tabby', '4.0', TRUE))`

What happens to the weight column?

```    coat weight likes_string
1 calico    2.1         TRUE
2  black    5.0        FALSE
3  tabby    3.2         TRUE```

1. Returns an error
2. No error, converts the string `4.0` to a numeric type
3. No error, converts the column weight to a character type
4. No error, the current values in the weight column stay the same, the last row remains a character
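The following sketch rebuilds a `cats` data frame like the one shown above (assuming `coat` is stored as character rather than as a factor) and compares adding a row as a vector with adding it as a list. The names `new_row`, `from_vector` and `from_list` are illustrative.

```r
cats <- data.frame(
  coat = c("calico", "black", "tabby"),
  weight = c(2.1, 5.0, 3.2),
  likes_string = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# c() can hold only one type, so this new row is a character vector.
new_row <- c("tabby", "4.0", TRUE)
from_vector <- rbind(cats, new_row)
print(class(from_vector$weight))

# A list keeps each element's own type, so the column types are preserved.
from_list <- rbind(cats, list("tabby", 4.0, TRUE))
print(class(from_list$weight))
```

Inspecting the result with `str()` after each `rbind()` is a quick way to notice this kind of silent type change.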

### Unix shell

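If you are currently in the folder `/home/project/experiment_1/run_1` in the Unix shell, how would you navigate to the folder `/home/project`?

1. `cd` (goes to the home directory, which is not necessarily `/home/project`)
2. `cd ../../` (correct)
3. `cd ../` (moves up only one level, to `/home/project/experiment_1`)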

### Git

You have been working on your code, in particular a file called `analysis.r`. After a while, you look at your repository's state using the `git status` command. In the section "Changes not staged for commit", it says `modified: analysis.r`. What is the next step you'll want to do?

1. Run `git commit` to save changes (missed move to staging area)
2. Make changes to another file, everything is saved (mistaken notice for last commit)
3. Run `git add` to make git aware of your changes (correct)

### Subsetting

Consider a `data.frame` object in R, named `x`. How would you access the values from the second column, in the 3rd and 5th rows:

1. `x[c(3, 5), 2]`
2. `x[2, c(3, 5)]` -- switching rows and columns
3. `x[c(3, 5), ]` -- forget to specify column
4. `x[(3, 5), 2]` -- forget to specify rows as a vector using `c` function

### Scoping

What will running the following R code return:

```x <- 3
get_x <- function(y ){
return(x)
}

get_x(4)```
1. It will produce an error (a reasonable guess because x isn't defined in the function)
2. `3` (correct - it will find `x` defined in the global environment)
3. `4` (guessing from the value y passed to the function)
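The sketch below repeats the example and adds a contrasting function, `get_local_x`, which is hypothetical and not part of the original question. It shows that a variable assigned inside a function shadows the global one without changing it.

```r
x <- 3

# `x` is not defined inside the function, so R looks it up in the
# enclosing environment, here the one where `x <- 3` was run.
get_x <- function(y) {
  return(x)
}
result_global <- get_x(4)
print(result_global)

# Assigning `x` inside the function creates a local variable instead.
get_local_x <- function(y) {
  x <- y * 10
  return(x)
}
result_local <- get_local_x(4)
print(result_local)

# The outer `x` is untouched by the local assignment.
print(x)
```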

### Data types

What class of object does the following R command create?

`x <- sum(c(NA, 3.6, 5, TRUE))`
1. integer - they missed the non-integer number, or don't know the precedence
2. numeric - (CORRECT)
3. logical - the TRUE is converted to a number
4. list - not the constructor for a list, R tries to convert different object classes to a single type in a vector
5. character - not understanding that TRUE or NA are special words when not enclosed in quotes.
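A brief sketch of the coercion involved: `c()` converts its arguments to a single common type before `sum()` sees them. The variable names are illustrative.

```r
values <- c(NA, 3.6, 5, TRUE)

# TRUE becomes 1 and NA becomes a numeric missing value.
print(class(values))

# The sum is missing because one element is missing, but it is still numeric.
total <- sum(values)
print(class(total))

# Dropping the missing value gives 3.6 + 5 + 1.
total_no_na <- sum(values, na.rm = TRUE)
print(total_no_na)
```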

### Distributions

Students are presented with a figure showing a Poisson distribution.

Q: What does this figure tell us about these data?

1. The data are non-parametric - Students misunderstand key terms
2. The data follow a skewed normal distribution - Students recognise that the data are not normally distributed but don't recognise the Poisson distribution
3. The data follow a Poisson distribution - This is correct
4. The data follow a normal distribution - Students do not recognise a normal distribution

### R-squared

What does the r-squared coefficient represent?

1. the percentage of the variation in the response variable explained by the linear model
2. the strength of the linear association between two variables
3. the value of the slope in a linear regression
4. the coefficient of correlation

### Exploring data frames

For a dataframe `my.df` with seven rows and four columns, which of the following statements is true?

1. `length(rbind(my.df, my.df)) == 8` - switching rows and columns
2. `nrow(my.df) == length(my.df)` - df as list of column vectors
3. `ncol(my.df) == length(my.df)` - correct
4. `dim(rbind(my.df, my.df)) == dim(cbind(my.df, my.df))` - very close
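To check the statements above, here is a sketch with a made-up data frame of seven rows and four columns. The column names and the helpers `stacked` and `side_by_side` are assumptions for illustration.

```r
my.df <- data.frame(a = 1:7, b = 8:14, c = 15:21, d = 22:28)

# A data frame is a list of columns, so length() counts columns.
print(length(my.df))
print(ncol(my.df))
print(nrow(my.df))

# rbind() stacks rows, cbind() places columns side by side.
stacked <- rbind(my.df, my.df)
side_by_side <- cbind(my.df, my.df)
print(dim(stacked))
print(dim(side_by_side))
```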

Given the data frame `cats`:

```    coat weight likes_to_eat
1 calico    2.1         FISH
2  black    5.0         FISH
3   pink    3.2          COW
4  green    6.6          COW
5   pink    9.0         FISH
6  tabby    6.8         MILK```

What is the command to select a subset of the dataframe on what pink cats like to eat?

1. `cats[c(3,5),]` OK but not applicable for larger dataframes (you have to know all the row numbers for your selector)
2. `cats[which(cats$coat=="pink"),]` OK but selecting all the columns (indicating potential problem with selecting rows/columns?)
3. `cats["pink",]` incorrect selector for rows (indicating potential problem with selectors)
4. `cats[cats$coat=="pink","likes_to_eat"]` best answer, most specific and succinct
5. more than one are correct
6. `[4]`

### Recycling vectors in R

Q: What is the correct answer for `x + y` when:

```x <- c(1, 2, 3, 4)
y <- c(1, 2)```
1. `2 4 4 6` (correct)
2. `2 3 4 5` (added 1 to positions 3 and 4 in x)
3. `2 4 3 4` (forgot to recycle)
4. `2 4` (did not recycle the shorter vector to the longer)
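A sketch that makes the recycling explicit: the shorter vector is repeated until it matches the length of the longer one. The names `recycled` and `explicit` are illustrative.

```r
x <- c(1, 2, 3, 4)
y <- c(1, 2)

# R silently repeats y as 1 2 1 2 before adding.
recycled <- x + y
print(recycled)

# Writing the repetition out by hand gives the same result.
explicit <- x + rep(y, length.out = length(x))
print(explicit)
```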

### Starting with Data

Using the `download.file` function example:

`download.file("https://ndownloader.figshare.com/files/2292169", "data/portal_data_joined.csv")`

In which directory might we find the file we have saved?

1. `data` (Correct)
2. `R`
3. `R/Data`
4. the working directory
5. `~user`

### Create dataframe

Say you want to exclude unhappy entries from the dataframe below.

```age <- c(1, 4, 10)
color <- c("red", "blue", "red")
weight_kg <- c(5, 9, 8)
happy <- c(TRUE, FALSE, TRUE)

df <- data.frame(age, color, weight_kg, happy)```

Which of the following commands achieve this?

1. `df[df$happy == TRUE]` ## Misunderstanding of row/col selection
2. `df[df$happy == TRUE, ]` ## Correct, but confused with logical
3. `df[df$happy, ]` ## Correct
4. `df[df$happy == FALSE, ]` ## Mixed up logicals
5. `df[!df$happy, ]` ## Mixed up logicals
6. `df[df$happy != FALSE, ]` ## Correct answer, but confusing
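The sketch below runs the two forms marked as correct on the data from the question and confirms that they select the same rows. The names `happy_rows` and `same_rows` are illustrative.

```r
age <- c(1, 4, 10)
color <- c("red", "blue", "red")
weight_kg <- c(5, 9, 8)
happy <- c(TRUE, FALSE, TRUE)
df <- data.frame(age, color, weight_kg, happy)

# A logical vector in the row position keeps the rows where it is TRUE.
happy_rows <- df[df$happy, ]
print(happy_rows)

# Comparing with TRUE is redundant but selects the same rows.
same_rows <- df[df$happy == TRUE, ]
print(identical(happy_rows, same_rows))
```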

### Dimensions of a dataframe

How do you find the dimensions of a data frame called `x`?

1. `length(x)` - Problem with understanding that a data frame has multiple dimensions
2. `str(x)` - Not the correct command to address the question, even though the answer can be found with that command
3. `dim(x)` - Correct
4. `colnames(x)` - The learner did not understand the question / the learner has no idea how to address the question

### `dplyr` and data frames

Which of the following dplyr statements will return the columns called `name` and `phone` from the `students` data frame?

1. `students %>% select(c(name, phone))`
2. `students %>% select(name, phone)`
3. `students %>% select("name", "phone")`
4. `students %>% select(c("name", "phone"))`

Misconceptions identified above:

1. Columns must be presented in vector of unquoted column names
2. Correct
3. Column names must be quoted
4. Columns must be presented in a character vector

# Faded examples

From the instructor training curriculum:

According to cognitive load theory, searching for a solution strategy is an extra burden on top of applying that strategy. We can therefore accelerate learning by giving learners worked examples that show them a problem and a detailed step-by-step solution, followed by a series of faded examples. The first of these presents a nearly-complete use of the same problem-solving strategy just demonstrated, with a small number of blanks for the learner to fill in.

## Temperature conversion

#### Calculate Fahrenheit to Kelvin

```fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}```

#### Calculate Celsius to Kelvin

```celsius_to_kelvin <- function(temp) {
kelvin <- ____
return(kelvin)
}```

#### Calculate Kelvin to Celsius

```kelvin_to_celsius <- function(temp) {
____
}```

#### Bonus questions

Chain functions to go from Fahrenheit to Celsius

```fahr_to_celsius <- function(temp) {
____
}```

## More temperature conversion

```fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}```
```kelvin_to_celsius <- function(temp) {
celsius <- ____ - 273.15
return(______)
}```
```fahr_to_celsius <- function(____) {
___ <- fahr_to_kelvin(____)
result <- ___________(temp_k)
return(result)
}```

Write a function that calculates both Celsius and Kelvin from a temperature in Fahrenheit and returns both results.

## Adding a row to a `data.frame`

Add a new `cat` to the cats `data.frame`, a 9 year-old 3.3 kg tortoiseshell cat, which hates string. Be careful, because one of the variables is a factor, and there are no tortoiseshell cats in the `data.frame` yet.

```> cats
    coat weight likes_string age
1 calico    2.1         TRUE   4
2  black    5.0        FALSE   5
3  tabby    3.2         TRUE   8```
```str(cats)
levels(cats$_____)
levels(cats$_____) <- c(levels(cats$_____), ___________)
cats <- rbind(cats, list(_____,_____,____,_____))```

#### Diagnostic question

`length(levels(cats$coat)) == 4`
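One possible completion of the faded example, as a sketch. It assumes that `coat` is the factor column and rebuilds `cats` from the printed table, so the construction of the data frame is an assumption rather than the original data.

```r
cats <- data.frame(
  coat = factor(c("calico", "black", "tabby")),
  weight = c(2.1, 5.0, 3.2),
  likes_string = c(TRUE, FALSE, TRUE),
  age = c(4, 5, 8)
)

# Register the new level first; otherwise the new coat would become NA.
levels(cats$coat) <- c(levels(cats$coat), "tortoiseshell")

# A list keeps the type of each value when the row is appended.
cats <- rbind(cats, list("tortoiseshell", 3.3, FALSE, 9))
print(levels(cats$coat))
```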

## Vector manipulation and functions

Here we have a numerical vector

`x <- c(1, 4, 5, 6)`

Find the sum of all elements

`sum(x)`

Extract the second and third element

`x[c(2, 3)]`

Now extract the first and fourth

`x[___]`

Now extract and sum any elements

```f <- function(a, ind) {
y <- ____   # extract elements
return(___(y))
}```

#### Solution

```f <- function(a, ind) {
y <- a[c(ind)]
return(sum(y))
}```

## Plotting with `ggplot2`

Full example

```library(ggplot2)
ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point() ```

```ggplot(data = diamonds, aes(x = carat, y = price,____)) +
geom_point() ```

Solution

```ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
geom_point() ```

The color alone is difficult to see, so now use ggplot's faceting features to separate by cut

```ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point() + facet_wrap(______)```

Solution

```ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point() + facet_wrap( ~ cut)```

```ggplot(data = diamonds, aes(____)) +
______ + facet_wrap( ~ cut)```

Solution

```ggplot(data = diamonds, aes(x = carat, y = price, color = clarity)) +
geom_point() + facet_wrap( ~ cut)```

Now plot price by depth, with the color mapping to diamond color

`ggplot(____) +_____`

Solution

`ggplot(data = diamonds, aes(x = depth, y = price, color = color)) + geom_point() `

## Unit conversions

Complete example: Fahrenheit to kelvin

```fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}
fahr_to_kelvin(212)```

Missing parts

```# Convert meters to yards (1 yard is 0.9144 meters)
meters2yards <- function(argument goes here) {
yards <- _code goes here!_
return(yards)
}

meters2yards()```

Problem statement only

Write a function to convert atmospheres to pounds per square inch.

## More unit conversion

Full function `fahr_to_kelvin(26) => 269.82`

```fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}```

Partial code for `fahr_to_celsius(26) => -3.3`

```fahr_to_celsius <-function(temp) {
celsius <- (BLANK - 32)/1.8
return(celsius)
}```

How does the output change if you input different temperatures? Why?

Function `celsius_to_kelvin(-3.33) => 269.82`

```BLANK <- function(temp) {
kelvin <- BLANK
BLANK # What happens if you leave this line blank? Does the function still work?
}```

Write your own `kelvin_to_fahr(269.82) => 26`

`kelvin_to_fahr <- BLANK`

## Faded examples for a lookup table pattern

1. Begin with a vector of character values

2. Create ordered vector of unique values (a)

3. Create ordered vector of new values to replace unique values (b)

4. Assign (a) as names to (b)

5. Subset the result by the vector to re-label

The exercises below apply this pattern with progressively more blanks.

1. Change `input <- c("2", "2", "3")` to `c("two", "two", "three")`

```numbers <- c("1", "2", "3")
words <- c("one", "two", "three")
names(words) <- numbers
words[input]```

2. Translate `input <- c("three", "two", "three")` to `c("tres", "dos", "tres")`

```english <- c("one", "two", "three")
spanish <- c("uno",____________)
names(spanish) <- ________
______[input]```

3. Score `input <- c("jack", "king")` to `c(11, 13)`

```cards <- ____________________
scores <- ____________________
____(scores) <- _______________
______ [ ________ ]```

Diagnostic question: (determine if student understands how character subsetting works in R) What does this return?

`c(A = 1, B = 2, C = 3) ["B"]`
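A sketch of the character subsetting that the lookup pattern relies on: indexing a named vector with names returns the matching elements, in the order requested and with repeats allowed. The vector name `scores` and the variables `one` and `many` are illustrative.

```r
scores <- c(A = 1, B = 2, C = 3)

# A single name returns a named vector of length one.
one <- scores["B"]
print(one)

# Several names, including repeats, are looked up one by one.
many <- scores[c("C", "A", "C")]
print(many)

# unname() drops the labels when only the values are needed.
print(unname(many))
```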

## `ggplot` faded example

`ggplot(surveys_complete) + geom_point(mapping = aes(x = weight, y = hindfoot_length))`

Create a plot of GDP per capita vs. life expectancy using the Gapminder dataset

`ggplot(______) + geom_point(mapping=aes(x=______, y=______))`

`ggplot(______) + geom_point(mapping=aes(x=______, y=______, ________))`

Reduce overplotting of all groups by setting alpha parameter equal to 0.5

`ggplot(______) + geom_point(_____________________)`

Add a layer with a linear model (smooth function) using the same aesthetics

`ggplot(______) + geom_point(_____________________) + ___________`

Instead of plotting a smooth line per continent, how would you plot a single smooth line for the entire dataset?

`ggplot(______) + ______________ + ______________`

## Subsetting and conditionals

Assumes learner knows about `&`, `|`, `==`, `!=`.

#### Vectors

`x <- c("d", "a", "b", "c", "c", "d")`

Get values that are equal to `"d"`

`x[x == "d"]`

Get values that are equal to either `"d"` or `"a"`

`x[x == "d" | _____ ]`

Return a vector with all elements except `"c"`

`x[ ______ ]`

#### data.frame

`y <-  read.csv("https://ndownloader.figshare.com/files/2292169")`

Which days were recorded in December?

```dec_days <- y[y$month  == 12, "day"]
barplot(table(dec_days))```

Which species were recorded in December and are female?

`y[ ______ , ]`

```spp_dec <- y[y$month == 12 & y$sex == "F", "species"]
barplot(table(spp_dec))```

Bonus question: Why does the barplot include all of the species?
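A hint for the bonus question, as a sketch on made-up data. It assumes the `species` column is a factor, which was the default for `read.csv` before R 4.0.0; with character columns the empty categories would not appear. The species codes below are invented.

```r
species <- factor(c("DM", "PF", "DM", "OT"))

# Subsetting a factor keeps all of its levels, even unused ones.
kept <- species[species != "OT"]
counts <- table(kept)
print(counts)

# droplevels() removes the levels that no longer occur.
dropped <- droplevels(kept)
print(table(dropped))
```

Passing `table(droplevels(spp_dec))` to `barplot()` would then plot only the species that were actually recorded.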

## Data conversion

Linear conversion

```miles_to_kilometers <- function(miles) {
return(miles * 1.609344)
}```

Example 1: Complete the functions

```feet_to_miles <- function(feet) {
return(____ / 5280)
}```
```kilometers_to_parsecs <- function(kilometers) {
return(____)
}```

Example 2: Nested function

```feet_to_kilometers <- function(feet) {
miles <- feet_to_miles(feet)
return(miles_to_kilometers(miles))
}```
```feet_to_parsecs <- function(feet) {
miles <- feet_to_miles(feet)
____
return(____(kilometers))
}```

## Intersection, joining, union of data frames

Given these data

```library(dplyr)
set.seed(1000)

df1 <- data_frame(
x = LETTERS[1:10],
y = 1:10
) %>% sample()

df2 <- data_frame(
x = LETTERS[11:16],
y = 11:16
)
df2 <- bind_rows(df2, df1[1:4, ]) %>% sample()```

Function returning rows in common (intersection)

```df_in_common <- function(df1, df2) {
same_x <- df1$x %in% df2$x
same_y <- df1$y %in% df2$y
same_both <- same_x + same_y == 2
df1[same_both, ]
}```

Function returning rows not in common (anti-join)

```df_not_in_common <- function(df1, df2) {
same_x <- ___ %in% ___
same_y <- ___ %in% ___
rows_not_the_same <- ___ + ___ != ___
df1[___, ]
}```

Function returning all unique rows from both (union)

```df_union <- function(df1, df2) {
all_rows <- ___
dup <- which(duplicated(___))
all_rows[___, ___]
}```