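## GNU R
- R is a well-known language for statistical computing and is widely used
- Most statistical algorithms have an R implementation
  - Well suited to quick experiments
- `dplyr`: on Debian/Ubuntu, install with `apt install r-cran-dplyr`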

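## Getting familiar with relational algebra operations in R
- Basic set operations: intersection, union, difference, and Cartesian product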

In [1]:
A <- data.frame(ID=c(1,2), name=c("Wang", "Li"))
A

ID,name
1,Wang
2,Li


In [2]:
B <- data.frame(ID=c(2,3), name=c("Li", "Zhang"))
B

ID,name
2,Li
3,Zhang


In [3]:
# Bag union: rbind() stacks the rows of A and B and keeps duplicates
AB <- rbind(A,B)
AB

ID,name
1,Wang
2,Li
2,Li
3,Zhang


In [4]:
# Set union: unique() drops the duplicated row (original row names are kept)
unique(AB)

,ID,name
1,1,Wang
2,2,Li
4,3,Zhang


In [5]:
# Intersection: A and B share all columns, so the natural join keeps only common rows
merge(A, B)

ID,name
2,Li


In [6]:
# Difference: dplyr's setdiff() compares data frames row by row
require(dplyr)
setdiff(A, B)

Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

“Column `name` joining factors with different levels, coercing to character vector”

ID,name
1,Wang


In [7]:
# Cartesian product: by=NULL joins on no columns, pairing every row of A with every row of B
AxB <- merge(A, B, by=NULL)
AxB

ID.x,name.x,ID.y,name.y
1,Wang,2,Li
2,Li,2,Li
1,Wang,3,Zhang
2,Li,3,Zhang
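
The set operations above mix base R (`rbind`, `unique`, `merge`) with one `dplyr` call. As a minimal sketch, assuming `dplyr` is installed, the union, intersection, and difference can also be written with `dplyr`'s data-frame versions of `union`, `intersect`, and `setdiff`. The tables are rebuilt here with `stringsAsFactors = FALSE` (an addition, not in the cells above) so that `name` stays a character column and the factor-level warning does not appear on R versions before 4.0.

```r
library(dplyr)

# Same tables as above, with name kept as character instead of factor.
A <- data.frame(ID = c(1, 2), name = c("Wang", "Li"), stringsAsFactors = FALSE)
B <- data.frame(ID = c(2, 3), name = c("Li", "Zhang"), stringsAsFactors = FALSE)

AuB <- union(A, B)      # union: distinct rows found in A or B
AnB <- intersect(A, B)  # intersection: rows found in both A and B
AmB <- setdiff(A, B)    # difference: rows of A that are not in B

AuB
AnB
AmB
```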


## Relational algebra operators
- Projection, selection, join, and division

In [8]:
# Projection onto name; `$` returns the column as a vector, not a data frame
A$name

In [9]:
# Selection: keep the rows where name is "Li"
subset(A, name=="Li")

,ID,name
2,2,Li
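
Projection with `$` yields a plain vector, so the result is no longer a relation. A small sketch, assuming `dplyr` is installed, of projection and selection that both return data frames; `A` is rebuilt with the same two columns so the block runs on its own.

```r
library(dplyr)

A <- data.frame(ID = c(1, 2), name = c("Wang", "Li"), stringsAsFactors = FALSE)

p_base  <- A["name"]                 # projection in base R; result is a data frame
p_dplyr <- select(A, name)           # projection with dplyr
s_dplyr <- filter(A, name == "Li")   # selection with dplyr, same rows as subset()

# Selection followed by projection, chained with the pipe.
names_over_1 <- A %>% filter(ID > 1) %>% select(name)

p_base
p_dplyr
s_dplyr
names_over_1
```

Strict relational projection also removes duplicate rows; wrapping the result in `distinct()` or `unique()` gives that behaviour when it matters.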


In [10]:
A$GPA <- c(3,4)
A

ID,name,GPA
1,Wang,3
2,Li,4


In [11]:
B$age <- c(19, 20)
B

ID,name,age
2,Li,19
3,Zhang,20


In [12]:
# Inner join: only rows that match on the shared columns (ID, name)
merge(A, B)

ID,name,GPA,age
2,Li,4,19


In [13]:
# Left join: keep every row of A; missing B values become NA
merge(A, B, all.x=TRUE)

ID,name,GPA,age
1,Wang,3,
2,Li,4,19.0


In [14]:
# Right join: keep every row of B; missing A values become NA
merge(A, B, all.y=TRUE)

ID,name,GPA,age
2,Li,4.0,19
3,Zhang,,20


In [15]:
# Full outer join: keep every row of both A and B
merge(A, B, all=TRUE)

ID,name,GPA,age
1,Wang,3.0,
2,Li,4.0,19.0
3,Zhang,,20.0
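
Division is listed among the operators but not demonstrated in the cells above, and neither base R nor `dplyr` appears to offer a dedicated function for it. One common way to express it is to count matches per group. The sketch below assumes `dplyr` is installed; the tables `takes` and `required` are hypothetical examples, not data from this document. It computes `takes ÷ required`, that is, the students who are paired with every course in `required`.

```r
library(dplyr)

# Hypothetical example tables (not taken from the cells above).
takes <- data.frame(
  student = c("Wang", "Wang", "Li", "Zhang", "Zhang"),
  course  = c("DB", "OS", "DB", "DB", "OS"),
  stringsAsFactors = FALSE
)
required <- data.frame(course = c("DB", "OS"), stringsAsFactors = FALSE)

quotient <- takes %>%
  semi_join(required, by = "course") %>%  # keep only rows for required courses
  distinct(student, course) %>%           # guard against duplicate rows
  group_by(student) %>%
  summarise(n = n()) %>%                  # number of required courses per student
  filter(n == nrow(required)) %>%         # keep students who cover all of them
  select(student)

quotient
```

This counting approach assumes `required` is non-empty and has no duplicate rows; division by an empty relation would need separate handling.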


## Common operations beyond relational algebra
- GroupBy, SortBy

In [19]:
AxB

ID.x,name.x,ID.y,name.y
1,Wang,2,Li
2,Li,2,Li
1,Wang,3,Zhang
2,Li,3,Zhang


In [16]:
# Group by name.x and count the rows in each group
AxB %>% group_by(name.x) %>% summarise(n=n())

name.x,n
Li,2
Wang,2


In [17]:
# order() returns the row indices that sort name.x in ascending order
order(AxB$name.x)

In [18]:
# Sort by name.x: reorder the rows with the indices from order()
AxB[order(AxB$name.x),]

,ID.x,name.x,ID.y,name.y
2,2,Li,2,Li
4,2,Li,3,Zhang
1,1,Wang,2,Li
3,1,Wang,3,Zhang
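
Sorting with `order()` can also be written with `dplyr`'s `arrange()`, which fits the same pipe style as `group_by()`. A minimal sketch, assuming `dplyr` is installed; `AxB` is rebuilt from inline two-column tables so the block runs on its own, and the `max_ID_y` column is only an illustrative aggregate.

```r
library(dplyr)

# Rebuild the Cartesian product from the original two-column tables.
AxB <- merge(
  data.frame(ID = c(1, 2), name = c("Wang", "Li"), stringsAsFactors = FALSE),
  data.frame(ID = c(2, 3), name = c("Li", "Zhang"), stringsAsFactors = FALSE),
  by = NULL
)

# SortBy: ascending by name.x.
sorted <- AxB %>% arrange(name.x)

# SortBy on two keys: name.x descending, then ID.y ascending.
sorted2 <- AxB %>% arrange(desc(name.x), ID.y)

# GroupBy with more than one aggregate per group.
stats <- AxB %>%
  group_by(name.x) %>%
  summarise(n = n(), max_ID_y = max(ID.y))

sorted
sorted2
stats
```

Unlike indexing with `order()`, `arrange()` does not carry the original row names over, so the result is numbered from 1.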
