- Class: meta
Course: R Programming
Lesson: Missing Values
Author: Nick Carchedi
Type: Standard
Organization: JHU Biostat
Version: 2.2.0
- Class: text
Output: Missing values play an important role in statistics and data analysis. Often, missing values must not be ignored, but rather they should be carefully studied to see if there's an underlying pattern or cause for their missingness.
- Class: text
Output: In R, NA is used to represent any value that is 'not available' or 'missing' (in the statistical sense). In this lesson, we'll explore missing values further.
- Class: cmd_question
Output: Any operation involving NA generally yields NA as the result. To illustrate, let's create a vector c(44, NA, 5, NA) and assign it to a variable x.
CorrectAnswer: x <- c(44, NA, 5, NA)
AnswerTests: omnitest(correctExpr='x <- c(44, NA, 5, NA)')
Hint: Assign the vector c(44, NA, 5, NA) to a variable x. The NA must be uppercase.
- Class: cmd_question
Output: Now, let's multiply x by 3.
CorrectAnswer: x * 3
AnswerTests: any_of_exprs('x * 3', '3 * x')
Hint: Try x * 3.
- Class: text
Output: Notice that the elements of the resulting vector that correspond to the NA values in x are also NA.
- Class: cmd_question
Output: To make things a little more interesting, let's create a vector containing 1000 draws from a standard normal distribution with y <- rnorm(1000).
CorrectAnswer: y <- rnorm(1000)
AnswerTests: omnitest(correctExpr='y <- rnorm(1000)')
Hint: The function rnorm() generates random numbers from a normal distribution. Type y <- rnorm(1000).
- Class: cmd_question
Output: Next, let's create a vector containing 1000 NAs with z <- rep(NA, 1000).
CorrectAnswer: z <- rep(NA, 1000)
AnswerTests: omnitest(correctExpr='z <- rep(NA, 1000)')
Hint: Type z <- rep(NA, 1000) to generate a vector of 1000 NAs.
- Class: cmd_question
Output: Finally, let's select 100 elements at random from these 2000 values (combining y and z) such that we don't know how many NAs we'll wind up with or what positions they'll occupy in our final vector -- my_data <- sample(c(y, z), 100).
CorrectAnswer: my_data <- sample(c(y, z), 100)
AnswerTests: omnitest(correctExpr='my_data <- sample(c(y, z), 100)')
Hint: The sample() function draws a random sample from the data provided as its first argument (in this case c(y, z)) of the size specified by the second argument (100). The command my_data <- sample(c(y, z), 100) will give us what we want.
- Class: cmd_question
Output: Let's first ask the question of where our NAs are located in our data. The is.na() function tells us whether each element of a vector is NA. Call is.na() on my_data and assign the result to my_na.
CorrectAnswer: my_na <- is.na(my_data)
AnswerTests: omnitest(correctExpr='my_na <- is.na(my_data)')
Hint: Assign the result of is.na(my_data) to the variable my_na.
- Class: cmd_question
Output: Now, print my_na to see what you came up with.
CorrectAnswer: my_na
AnswerTests: omnitest(correctExpr='my_na')
Hint: Type my_na to view its contents.
- Class: text
Output: Everywhere you see a TRUE, you know the corresponding element of my_data is NA. Likewise, everywhere you see a FALSE, you know the corresponding element of my_data is one of our random draws from the standard normal distribution.
- Class: cmd_question
Output: In our previous discussion of logical operators, we introduced the `==` operator as a method of testing for equality between two objects. So, you might think the expression my_data == NA yields the same results as is.na(). Give it a try.
CorrectAnswer: my_data == NA
AnswerTests: omnitest(correctExpr='my_data == NA')
Hint: Try my_data == NA to see what happens.
- Class: text
Output: The reason you got a vector of all NAs is that NA is not really a value, but just a placeholder for a quantity that is not available. Therefore, the logical expression is incomplete, and R has no choice but to return a vector of the same length as my_data that contains all NAs.
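# Maintainer note, written as a YAML comment so the lesson content is unchanged.
# A minimal R sketch of the behaviour described above, using illustrative
# values that are not part of the lesson. Comparing anything with NA gives NA,
# whereas is.na() returns a definite TRUE or FALSE.
#   NA == NA           # NA
#   c(1, NA) == NA     # NA NA
#   is.na(c(1, NA))    # FALSE TRUE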
- Class: text
Output: Don't worry if that's a little confusing. The key takeaway is to be cautious when using logical expressions anytime NAs might creep in, since a single NA value can derail the entire thing.
- Class: text
Output: So, back to the task at hand. Now that we have a vector, my_na, that has a TRUE for every NA and FALSE for every numeric value, we can compute the total number of NAs in our data.
- Class: text
Output: The trick is to recognize that underneath the surface, R represents TRUE as the number 1 and FALSE as the number 0. Therefore, if we take the sum of a bunch of TRUEs and FALSEs, we get the total number of TRUEs.
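# Maintainer note, written as a YAML comment so the lesson content is unchanged.
# A minimal R sketch of the logical-to-numeric coercion described above, using
# illustrative values that are not part of the lesson.
#   sum(c(TRUE, FALSE, TRUE))       # 2
#   sum(is.na(c(44, NA, 5, NA)))    # 2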
- Class: cmd_question
Output: Let's give that a try here. Call the sum() function on my_na to count the total number of TRUEs in my_na, and thus the total number of NAs in my_data. Don't assign the result to a new variable.
CorrectAnswer: sum(my_na)
AnswerTests: omnitest(correctExpr='sum(my_na)')
Hint: Use sum(my_na) to count the number of NAs in the data.
- Class: cmd_question
Output: Pretty cool, huh? Finally, let's take a look at the data to convince ourselves that everything 'adds up'. Print my_data to the console.
CorrectAnswer: my_data
AnswerTests: omnitest(correctExpr='my_data')
Hint: Print my_data to the console.
- Class: cmd_question
Output: Now that we've got NAs down pat, let's look at a second type of missing value -- NaN, which stands for 'not a number'. To generate NaN, try dividing (using a forward slash) 0 by 0 now.
CorrectAnswer: 0/0
AnswerTests: omnitest(correctExpr='0/0')
Hint: Try 0/0.
- Class: cmd_question
Output: Let's do one more, just for fun. In R, Inf stands for infinity. What happens if you subtract Inf from Inf?
CorrectAnswer: Inf - Inf
AnswerTests: omnitest(correctExpr='Inf - Inf')
Hint: Type Inf - Inf. Can you guess the result?
- Class: mult_question
Output: "Would you like to receive credit for completing this course on Coursera.org?"
CorrectAnswer: NULL
AnswerChoices: Yes;No
AnswerTests: coursera_on_demand()
Hint: ""