# Subsetting

#### Quiz
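1. What is the result of subsetting a vector with positive integers, negative integers, a logical vector, or a character vector?
1. What is the difference between `[`, `[[`, and `$` when applied to a list?
1. When should you use `drop = FALSE`?
1. If `x` is a matrix, what is the difference between `x[] <- 0` and `x <- 0`?
1. How can you use a named vector to relabel a categorical vector?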

## Data types

### Atomic vectors

In [1]:
x <- c(2.1, 4.2, 3.3, 5.4)

* Positive integers

In [2]:
x[c(3,1)]

In [3]:
x[order(x)]

In [4]:
x[c(1,1)]

In [5]:
x[c(2.1, 2.9)]
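
The cells above are shown without their output, so the following is a minimal sketch that restates them with the values they should produce. The variable names `picked`, `sorted`, `repeated`, and `truncated` are introduced here only for illustration.

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# Positive integers return the elements at those positions, in the order given
picked <- x[c(3, 1)]          # 3.3 2.1

# order(x) gives the permutation that sorts x, so this returns x sorted
sorted <- x[order(x)]         # 2.1 3.3 4.2 5.4

# Repeating an index repeats the element
repeated <- x[c(1, 1)]        # 2.1 2.1

# Real-valued indices are silently truncated toward zero (2.1 and 2.9 become 2)
truncated <- x[c(2.1, 2.9)]   # 4.2 4.2
```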

* Negative integers

In [6]:
x[-c(3,1)]

In [7]:
x[c(-1, 2)]

ERROR: Error in x[c(-1, 2)]: only 0's may be mixed with negative subscripts
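
As a small sketch of the rule behind this error: negative integers omit positions, they cannot be combined with positive ones, and only zero may be mixed in. The names `dropped`, `mixed`, and `with_zero` are illustrative only.

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# Negative integers omit the elements at those positions
dropped <- x[-c(3, 1)]        # 4.2 5.4

# Mixing positive and negative indices is an error; tryCatch() captures it
mixed <- tryCatch(x[c(-1, 2)], error = function(e) "error")

# Zero is the only value that may accompany negative indices; it is ignored
with_zero <- x[c(-1, 0)]      # 4.2 3.3 5.4
```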


* Logical vectors

In [8]:
x[c(T, T, F, F)]

In [9]:
x[x > 3]

In [10]:
x[c(T, F)]

In [11]:
x[c(T, F, T, F)]

In [12]:
x[c(T, T, NA, F)]
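
The logical cases above can be summarized in a short sketch: `TRUE` positions are kept, a shorter index is recycled, and `NA` in the index yields `NA` in the result. `TRUE`/`FALSE` are spelled out here because `T` and `F` are ordinary variables that can be reassigned; the result names are illustrative.

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# Elements at TRUE positions are returned
kept <- x[c(TRUE, TRUE, FALSE, FALSE)]      # 2.1 4.2

# The usual form: a logical vector produced by a comparison
large <- x[x > 3]                           # 4.2 3.3 5.4

# A shorter logical index is recycled to the length of x,
# so c(TRUE, FALSE) behaves like c(TRUE, FALSE, TRUE, FALSE)
recycled <- x[c(TRUE, FALSE)]               # 2.1 3.3

# A missing value in the index produces a missing value in the output
with_na <- x[c(TRUE, TRUE, NA, FALSE)]      # 2.1 4.2 NA
```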

* Nothing (an empty index)

In [13]:
x[]

* Zero

In [16]:
print(x[0])

numeric(0)
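
A brief sketch of the two edge cases above, with illustrative variable names: an empty index returns the original vector, and a zero index returns a zero-length vector of the same type.

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# Subsetting with nothing returns the original vector unchanged
everything <- x[]

# Subsetting with zero returns a zero-length vector of the same type
nothing <- x[0]
```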


* Character vectors

In [18]:
(y <- setNames(x, letters[1:4]))

In [19]:
y[c("d", "c", "a")]

In [20]:
y[c("a", "a", "a")]

In [21]:
z <- c(abc = 1, def = 2)
z[c("a", "d")]
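
The following sketch restates the character-vector cells with the results they should give. The key point of the last cell is that `[` matches names exactly, so the partial names `"a"` and `"d"` match nothing. Result names are illustrative.

```r
x <- c(2.1, 4.2, 3.3, 5.4)
y <- setNames(x, letters[1:4])

# Names select elements, in the order requested
by_name <- y[c("d", "c", "a")]      # d = 5.4, c = 3.3, a = 2.1

# As with integer indices, a name can be repeated
repeated <- y[c("a", "a", "a")]     # a = 2.1, three times

# `[` requires exact name matches: "a" does not match "abc",
# so both results are NA and their names are NA as well
z <- c(abc = 1, def = 2)
no_match <- z[c("a", "d")]
```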

### List

* Works basically the same way as subsetting an atomic vector
* `[` always returns a list
* `[[` and `$` return a component of the list

### Matrices and arrays

* Multiple vectors (one index vector per dimension)

In [22]:
a <- matrix(1:9, nrow = 3)
colnames(a) <- c("A", "B", "C")
a[1:2, ]

A,B,C
1,4,7
2,5,8


In [23]:
a[c(T,F,T), c("B", "A")]

B,A
4,1
6,3


In [24]:
a[0, -2]

A,C
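
One detail the cells above do not show is that `[` simplifies the result to the lowest possible dimensionality. The sketch below, with illustrative names, contrasts that default with `drop = FALSE` and checks the shape of the empty result from `a[0, -2]`.

```r
a <- matrix(1:9, nrow = 3)
colnames(a) <- c("A", "B", "C")

# Selecting a single row drops the dimensions and returns a named vector
row1 <- a[1, ]

# drop = FALSE keeps the result as a 1 x 3 matrix
row1_mat <- a[1, , drop = FALSE]

# Zero rows, all columns except the second: a 0 x 2 matrix with columns A and C
empty <- a[0, -2]
```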


* A single vector (the matrix is treated as a vector stored in column-major order)

In [26]:
(vals <- outer(1:5, 1:5, FUN = "paste", sep = ","))

0,1,2,3,4
"1,1","1,2","1,3","1,4","1,5"
"2,1","2,2","2,3","2,4","2,5"
"3,1","3,2","3,3","3,4","3,5"
"4,1","4,2","4,3","4,4","4,5"
"5,1","5,2","5,3","5,4","5,5"


In [27]:
vals[c(4, 15)]

* A matrix (each row of the index matrix gives the location of one value)

In [29]:
vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")
select <- matrix(ncol = 2, byrow = TRUE, c(
    1, 1,
    3, 1,
    2, 4
))
vals[select]
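
Neither of the two cells above shows its output, so here is a short sketch of what they should return. `rbind()` is used to build the index matrix only because it reads naturally row by row; the result names are illustrative.

```r
vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")

# Single-vector index: position 4 is row 4 of column 1,
# position 15 is row 5 of column 3 (columns are stored one after another)
by_position <- vals[c(4, 15)]       # "4,1" "5,3"

# Matrix index: each row is one (row, column) pair
select <- rbind(c(1, 1),
                c(3, 1),
                c(2, 4))
by_matrix <- vals[select]           # "1,1" "3,1" "2,4"
```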

### Data frames

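* Subsetting with a single vector: behaves like a list (selects columns)
* Subsetting with two vectors: behaves like a matrix (selects rows and columns)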

In [31]:
df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])

df[df$x == 2,]

,x,y,z
2,2,2,b


In [32]:
df[c(1,3),]

,x,y,z
1,1,3,a
3,3,1,c


In [33]:
df[c("x", "z")]

,x,z
1,1,a
2,2,b
3,3,c


In [34]:
df[, c("x", "z")]

,x,z
1,1,a
2,2,b
3,3,c


In [35]:
str(df["x"])

'data.frame':	3 obs. of  1 variable:
 $ x: int  1 2 3


In [36]:
str(df[, "x"])

 int [1:3] 1 2 3
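
The last two cells show the one important difference between list-style and matrix-style subsetting of a single column. A minimal sketch, with illustrative names, that also shows how `drop = FALSE` restores the data frame result:

```r
df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])

# List-style subsetting always returns a data frame
as_list <- df["x"]

# Matrix-style subsetting simplifies a single column to a vector by default
as_vector <- df[, "x"]

# drop = FALSE prevents the simplification
as_df <- df[, "x", drop = FALSE]
```

Using `drop = FALSE` is a reasonable default inside functions, where the number of selected columns may not be known in advance.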


### S3 objects

* S3 objects are built from atomic vectors, arrays, and lists, so the subsetting rules above apply

### S4 objects

* Two additional subsetting operators: `@` (equivalent to `$`) and `slot()` (equivalent to `[[`)

### Exercises
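
As a minimal sketch of the two S4 accessors, the example below defines a hypothetical class `Person` (not part of the original notes) and reads its slots both ways.

```r
library(methods)

# Hypothetical S4 class, defined only for this illustration
setClass("Person", representation(name = "character", age = "numeric"))
p <- new("Person", name = "Ann", age = 30)

# `@` accesses a slot by name, much like `$` on a list
person_name <- p@name

# slot() takes the slot name as a string, much like `[[`
person_age <- slot(p, "age")
```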

`1`. Spot and fix the error in each of the following subsetting calls

In [8]:
head(mtcars)
str(mtcars)

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225,105,2.76,3.46,20.22,1,0,3,1


'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...


In [1]:
mtcars[mtcars$cyl = 4, ]

ERROR: Error in parse(text = x, srcfile = src): <text>:1:19: unexpected '='
1: mtcars[mtcars$cyl =
                      ^


In [5]:
mtcars[mtcars$cyl == 4, ]

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Fiat 128,32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
Honda Civic,30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2
Toyota Corolla,33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
Toyota Corona,21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1
Fiat X1-9,27.3,4,79.0,66,4.08,1.935,18.9,1,1,4,1
Porsche 914-2,26.0,4,120.3,91,4.43,2.14,16.7,0,1,5,2
Lotus Europa,30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2


In [2]:
mtcars[-1:4, ]

ERROR: Error in xj[i]: only 0's may be mixed with negative subscripts


In [6]:
mtcars[-(1:4), ]

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
Merc 280C,17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
Merc 450SE,16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3
Merc 450SL,17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3
Merc 450SLC,15.2,8,275.8,180,3.07,3.78,18.0,0,0,3,3


In [3]:
mtcars[mtcars$cyl <= 5]

ERROR: Error in `[.data.frame`(mtcars, mtcars$cyl <= 5): undefined columns selected


In [9]:
mtcars[mtcars$cyl <= 5, ]

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Fiat 128,32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
Honda Civic,30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2
Toyota Corolla,33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
Toyota Corona,21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1
Fiat X1-9,27.3,4,79.0,66,4.08,1.935,18.9,1,1,4,1
Porsche 914-2,26.0,4,120.3,91,4.43,2.14,16.7,0,1,5,2
Lotus Europa,30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2


In [4]:
mtcars[mtcars$cyl == 4 | 6, ]

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4


In [10]:
mtcars[mtcars$cyl == 4 | mtcars$cyl == 6, ]

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
Merc 280C,17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
Fiat 128,32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
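
Two of the mistakes above fail silently or with a confusing message, so a short sketch of why may help. `%in%` is offered here as one possible alternative to the repeated comparison, not as the only fix; the variable names are illustrative.

```r
# Operator precedence: unary minus binds tighter than `:`,
# so -1:4 is (-1):4 and mixes negative and positive indices
neg_seq <- -1:4                       # -1 0 1 2 3 4

# In `cyl == 4 | 6`, the 6 is coerced to TRUE, so the condition is TRUE
# for every row and no filtering happens
always_true <- mtcars$cyl == 4 | 6

# Either repeat the comparison or use %in%
four_or_six <- mtcars[mtcars$cyl %in% c(4, 6), ]
```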


`2`. Why does `x <- 1:5; x[NA]` yield five `NA`s?

In [11]:
x <- 1:5
x[NA]

In [12]:
x[NA_real_]

[1] NA
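
The difference between the two cells comes down to the type of the index. A minimal sketch, with illustrative names: plain `NA` is a logical value, so it is recycled to the length of `x`, whereas `NA_real_` is a numeric index of length one.

```r
x <- 1:5

# Plain NA is logical, so it is recycled like any logical index,
# and each NA in the index yields an NA in the output
na_type <- typeof(NA)         # "logical"
five_nas <- x[NA]             # NA NA NA NA NA

# NA_real_ is a numeric index of length one, so only one NA comes back
one_na <- x[NA_real_]         # NA
```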

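`3`. Can the behavior of `upper.tri()` be explained by the subsetting rules covered so far?

* `upper.tri(x)` returns a logical matrix, so `x[upper.tri(x)]` returns the elements at the `TRUE` positions (ordinary logical subsetting)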

In [13]:
x <- outer(1:5, 1:5, FUN = "*")
x

0,1,2,3,4
1,2,3,4,5
2,4,6,8,10
3,6,9,12,15
4,8,12,16,20
5,10,15,20,25


In [14]:
x[upper.tri(x)]

In [15]:
upper.tri(x)

0,1,2,3,4
FALSE,TRUE,TRUE,TRUE,TRUE
FALSE,FALSE,TRUE,TRUE,TRUE
FALSE,FALSE,FALSE,TRUE,TRUE
FALSE,FALSE,FALSE,FALSE,TRUE
FALSE,FALSE,FALSE,FALSE,FALSE
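
Since the `x[upper.tri(x)]` cell is shown without output, here is a sketch of what it should return. The logical matrix is treated as a logical vector in column-major order, so the selected values come out column by column. The names `mask` and `upper` are illustrative.

```r
x <- outer(1:5, 1:5, FUN = "*")

# upper.tri() returns a logical matrix with the same shape as x
mask <- upper.tri(x)

# Logical subsetting keeps the TRUE positions, walking down each column in turn
upper <- x[mask]      # 2 3 6 4 8 12 5 10 15 20
```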


## Subsetting operators

`4`. Why does `mtcars[1:20]` return an error?

In [16]:
mtcars[1:20]

ERROR: Error in `[.data.frame`(mtcars, 1:20): undefined columns selected


In [17]:
mtcars[1:20,]

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
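
The explanation follows from the single-vector rule for data frames: one index selects columns, and `mtcars` has only 11 of them. A short sketch with illustrative names:

```r
# mtcars has 32 rows but only 11 columns
n_cols <- ncol(mtcars)

# A single index selects columns, so asking for columns 1 to 20 fails
err <- tryCatch(mtcars[1:20], error = function(e) e)

# Staying within the number of columns works
first_cols <- mtcars[1:5]

# With a comma, the first index selects rows instead
first_rows <- mtcars[1:20, ]
```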


`5`. Implement your own version of `diag()`

In [2]:
x <- outer(1:5, 1:5, FUN = "*")
diag(x)

In [9]:
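my.diag <- function(m) {
    # Number of diagonal elements; min() also handles non-square matrices
    n <- min(dim(m))
    # Preallocate the result instead of growing it inside the loop
    res <- vector(typeof(m), n)
    # seq_len(n) is safe when n is 0, unlike 1:n
    for (i in seq_len(n)) {
        res[i] <- m[i, i]
    }
    res
}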
my.diag(x)
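
The diagonal can also be extracted without a loop by using matrix subsetting with a two-column index matrix. This is a sketch of an alternative, not a description of how base R implements `diag()`; `idx` and `diag_by_index` are illustrative names.

```r
x <- outer(1:5, 1:5, FUN = "*")

# Positions 1..n, where n is the length of the diagonal
idx <- seq_len(min(dim(x)))

# Each row of cbind(idx, idx) is a (row, column) pair on the diagonal
diag_by_index <- x[cbind(idx, idx)]   # 1 4 9 16 25
```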

`6`. What does `df[is.na(df)] <- 0` do, and how does it work?

In [17]:
df <- data.frame(a = c(1,2,3,NA,5), b = c(2,3,NA,5,6), c = c(3,NA,5,6,7))
df

,a,b,c
1,1.0,2.0,3.0
2,2.0,3.0,
3,3.0,,5.0
4,,5.0,6.0
5,5.0,6.0,7.0


In [18]:
is.na(df)

a,b,c
FALSE,FALSE,FALSE
FALSE,FALSE,TRUE
FALSE,TRUE,FALSE
TRUE,FALSE,FALSE
FALSE,FALSE,FALSE


In [16]:
df[is.na(df)] <- 0
df

,a,b,c
1,1,2,3
2,2,3,0
3,3,0,5
4,0,5,6
5,5,6,7
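
Putting the three cells together: `is.na(df)` returns a logical matrix with the same shape as `df`, and subassignment with that matrix replaces exactly the cells where it is `TRUE`. A minimal sketch, where `mask` is an illustrative name:

```r
df <- data.frame(a = c(1, 2, 3, NA, 5),
                 b = c(2, 3, NA, 5, 6),
                 c = c(3, NA, 5, 6, 7))

# A logical matrix marking the missing cells
mask <- is.na(df)

# Logical-matrix subassignment: only the TRUE cells are overwritten
df[mask] <- 0
```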


## Subsetting and assignment
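
This section is empty in the notes, but the quiz question about `x[] <- 0` versus `x <- 0` belongs here. A minimal sketch, using hypothetical matrices `m` and `m2`: subassignment with an empty index replaces the contents and keeps the structure, while plain assignment replaces the whole object.

```r
# Hypothetical example matrices
m <- matrix(1:6, nrow = 2)
m2 <- matrix(1:6, nrow = 2)

# Empty-index subassignment: every element becomes 0, the 2 x 3 shape is kept
m[] <- 0

# Plain assignment: m2 is now just the number 0
m2 <- 0
```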

## Applications
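
This section is also empty, but the quiz asks how a named vector can relabel a categorical vector. One way to sketch it is character subsetting with a lookup table; the data and the names `codes`, `lookup`, and `labels` are hypothetical.

```r
# Hypothetical coded values
codes <- c("m", "f", "u", "f", "f", "m", "m")

# Named vector acting as a lookup table: names are codes, values are labels
lookup <- c(m = "Male", f = "Female", u = NA)

# Character subsetting looks up each code; unname() drops the code names
labels <- unname(lookup[codes])
```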