<h2>Table_of_Contents</h2>

- [R data structure with I/O](#R_data_structure_with_I/O)
    - [Data structure](#Data_structure)
        - [Scalar](#Scalar)
        - [Vector](#Vector)
        - [Matrix](#Matrix)
        - [Array](#Array)
        - [Data frame](#Data_frame)
        - [List](#List)
    - [Manipulate the data](#Manipulate_the_data)
- [References](#References)

<h3>R_data_structure_with_I/O</h3>

<h4>Data_structure</h4>

<center><img src="img/r_Data_structure.png" ></center> 

<li>Hold a single data type only: scalar, vector, matrix, array</li>
<li>Can hold mixed data types: data frame, list</li>

<h5>Scalar</h5>

In [4]:
# Scalar
c(1)
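
R has no separate scalar type: a single value is just a vector of length one. The short sketch below illustrates this; the names `x` and `y` are made up for the example.

```r
# A "scalar" in R is simply a vector with one element
x <- 1
y <- c(1)

identical(x, y)  # both are numeric vectors of length 1
is.vector(x)     # a single number is still a vector
length(x)        # its length is 1
```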

[Back to the top](#Table_of_Contents)

<h5>Vector</h5>

In [9]:
# Vector (c) - Numeric data type
num <- c(1,2,3,4) # c(1:4)
num
# View(num) # only works in RStudio

In [10]:
num.T <- t(num) # Transpose column vector to row vector
num.T
# View(num.T) # only works in RStudio

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4


In [12]:
num %*% num.T # Vector multiplication (4*1 X 1*4) = 4*4

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    2    4    6    8
[3,]    3    6    9   12
[4,]    4    8   12   16


In [13]:
num.T %*% num # Vector multiplication (1*4 X 4*1) = 1*1

     [,1]
[1,]   30
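
The two products above are the outer and inner products of `num`. As a rough sketch, the same results can be obtained with base R helpers; `outer_prod` and `inner_prod` are names chosen only for this illustration.

```r
num <- c(1, 2, 3, 4)

# Outer product: same values as num %*% t(num), a 4x4 matrix
outer_prod <- outer(num, num)

# Inner product: %*% returns a 1x1 matrix; drop() reduces it to a plain number
inner_prod <- drop(t(num) %*% num)

# Element-wise alternative for the inner product
sum(num * num)
```

Using `drop()` or `sum(num * num)` is handy when a plain number is wanted instead of a 1x1 matrix.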


In [14]:
# character, logical data type
c("M","F","F","M")
c(TRUE, FALSE, FALSE, TRUE)
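
Because a vector holds a single data type, mixing types inside `c()` makes R coerce everything to one common type. A small sketch (the variable names are illustrative only):

```r
# Mixing numeric, character and logical values: everything becomes character
mixed <- c(1, "a", TRUE)
class(mixed)

# Mixing numeric and logical values: TRUE is converted to 1
num_lgl <- c(1, TRUE)
class(num_lgl)
```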

[Back to the top](#Table_of_Contents)

<h5>Matrix</h5>

In [18]:
m <- 1:12
print(m)

# Matrix (Matrix 4x3)
mtx <- matrix(m, nrow=4)
print(mtx)

 [1]  1  2  3  4  5  6  7  8  9 10 11 12
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12


In [19]:
mtx[3,2]
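
`matrix()` fills column by column by default, which is why `mtx[3,2]` picks the 7th value of `m`. The sketch below contrasts this with `byrow = TRUE`; `by_col` and `by_row` are names invented for the example.

```r
m <- 1:12

by_col <- matrix(m, nrow = 4)                # default: fill column by column
by_row <- matrix(m, nrow = 4, byrow = TRUE)  # fill row by row instead

by_col[3, 2]  # row 3, column 2 of the column-wise matrix
by_row[3, 2]  # row 3, column 2 of the row-wise matrix
```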

In [1]:
mat.ex = matrix(c(1,2,3,4,5,6), nrow = 2)
dim(mat.ex)
mat.ex

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6


It is easy to construct diagonal matrices and extract diagonals from matrices in R. The same command, diag(), is used for both.

In [6]:
a = matrix(1:9, nrow = 3)
diag(a)

diag(c(1,2,3))

# you can even change the diagonal of the matrix
# Here I'll change it to 1s
a
diag(a) = 1
a

     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3


     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    1    8
[3,]    3    6    1


Basic math with matrices

Matrix math takes a little extra typing: matrix multiplication uses the three-character operator %*%, while a plain * multiplies element by element.

In [1]:
a = matrix(c(1, 2, 3, 2, 1, 2, 2, 2, 1), nrow = 3)
a
# transpose
t(a)
# multiplication
a%*%a
# inverse 
solve(a)

     [,1] [,2] [,3]
[1,]    1    2    2
[2,]    2    1    2
[3,]    3    2    1


     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    1    2
[3,]    2    2    1


     [,1] [,2] [,3]
[1,]   11    8    8
[2,]   10    9    8
[3,]   10   10   11


           [,1]       [,2]       [,3]
[1,] -0.4285714  0.2857143  0.2857143
[2,]  0.5714286 -0.7142857  0.2857143
[3,]  0.1428571  0.5714286 -0.4285714
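
As a quick sanity check on the operations above, the sketch below contrasts element-wise `*` with matrix `%*%` and confirms that `a` times its inverse is the identity matrix up to floating-point rounding. The names `elementwise`, `matprod` and `check` are illustrative only.

```r
a <- matrix(c(1, 2, 3, 2, 1, 2, 2, 2, 1), nrow = 3)

elementwise <- a * a   # squares each entry individually
matprod <- a %*% a     # true matrix product

# a times its inverse should equal the 3x3 identity, up to rounding error
check <- a %*% solve(a)
all.equal(check, diag(3))
```

`all.equal()` is used rather than `==` because the inverse is computed in floating point and tiny rounding differences are expected.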


Extracting parts of a matrix works much as in other languages, using [row, column] indexing; leave an index blank to select the whole row or column.

In [2]:
a = matrix(c(1:9), nrow = 3)
a
# First row
a[1,]
# First column
a[,1]
# Second and third entry of the third row
a[3, 2:3]

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
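
Two further indexing details are worth a short sketch: extracting a single row or column returns a plain vector unless `drop = FALSE` is given, and a negative index excludes a row or column. The variable names are made up for this example.

```r
a <- matrix(1:9, nrow = 3)

row_vec <- a[1, ]                # plain vector, dimensions are dropped
row_mat <- a[1, , drop = FALSE]  # stays a 1x3 matrix
no_col2 <- a[, -2]               # everything except the second column

dim(row_mat)
dim(no_col2)
```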


[Back to the top](#Table_of_Contents)

<h5>Array</h5>

In [17]:
# Array (Array 2x3x2)
arr <- array(m, c(2,3,2))
print(arr)

, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12



In [2]:
array.ex = array(c(1:24), c(4, 3, 2))
dim(array.ex)
array.ex
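
Arrays are indexed like matrices, with one index per dimension. A minimal sketch using the same `array.ex` (the names `one_value` and `slice2` are illustrative):

```r
array.ex <- array(1:24, c(4, 3, 2))

one_value <- array.ex[2, 3, 1]  # row 2, column 3 of the first slice
slice2 <- array.ex[, , 2]       # the whole second slice, a 4x3 matrix

one_value
dim(slice2)
```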

[Back to the top](#Table_of_Contents)

<h5>Data_frame</h5>

In [20]:
# Data Frame
var1 <- c(1,2,3,4)
var2 <- factor(c("M","F","F","M"))
df = data.frame(id = var1, sex = var2)
str(df)

'data.frame':	4 obs. of  2 variables:
 $ id : num  1 2 3 4
 $ sex: Factor w/ 2 levels "F","M": 2 1 1 2


In [4]:
col1 = c(1:10)
col2 = c(21:30)
col3 = rep(c("a1", "b1"), each = 5)
data.frame.ex = data.frame(col1, col2, col3)
data.frame.ex
dim(data.frame.ex)
names(data.frame.ex)
# you can pull columns of a data frame out using the column name
data.frame.ex$col1
# you can also add a new column to a pre-existing data frame
data.frame.ex$col4 = rep(c("m", "f"), 5)
data.frame.ex

col1,col2,col3
1,21,a1
2,22,a1
3,23,a1
4,24,a1
5,25,a1
6,26,b1
7,27,b1
8,28,b1
9,29,b1
10,30,b1


col1,col2,col3,col4
1,21,a1,m
2,22,a1,f
3,23,a1,m
4,24,a1,f
5,25,a1,m
6,26,b1,f
7,27,b1,m
8,28,b1,f
9,29,b1,m
10,30,b1,f


In [5]:
# you can easily convert a matrix to a data.frame and add names
df.ex2 = matrix(1:18, ncol = 3)
df.ex2 = data.frame(df.ex2)
names(df.ex2) = c("col1", "col2", "col3")
df.ex2

col1,col2,col3
1,7,13
2,8,14
3,9,15
4,10,16
5,11,17
6,12,18
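
Rows and columns of a data frame can also be selected with `[rows, columns]`, including logical conditions on a column. The sketch below rebuilds `df.ex2`; `subset_rows` and `two_cols` are names chosen only for illustration.

```r
df.ex2 <- data.frame(matrix(1:18, ncol = 3))
names(df.ex2) <- c("col1", "col2", "col3")

subset_rows <- df.ex2[df.ex2$col1 > 3, ]  # keep rows where col1 is greater than 3
two_cols <- df.ex2[, c("col1", "col3")]   # select columns by name

nrow(subset_rows)
names(two_cols)
```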


[Back to the top](#Table_of_Contents)

<h5>List</h5>

In [21]:
# list
v1 <- c(1,2,3,4)
v2 <- matrix(1:12, nrow=4)
v3 <- array(1:12, c(2,3,2))
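# Note: in R < 4.0 character columns become factors by default (as in the output below);
# in R >= 4.0 pass stringsAsFactors = TRUE to get the same result
v4 <- data.frame(id = c(1,2,3,4), sex = c("M","F","F","M"))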
lt <- list (v1, v2, v3, v4)
str(lt)

List of 4
 $ : num [1:4] 1 2 3 4
 $ : int [1:4, 1:3] 1 2 3 4 5 6 7 8 9 10 ...
 $ : int [1:2, 1:3, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
 $ :'data.frame':	4 obs. of  2 variables:
  ..$ id : num [1:4] 1 2 3 4
  ..$ sex: Factor w/ 2 levels "F","M": 2 1 1 2


In [23]:
print(lt)

[[1]]
[1] 1 2 3 4

[[2]]
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

[[3]]
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12


[[4]]
  id sex
1  1   M
2  2   F
3  3   F
4  4   M



In [3]:
list.ex = list()
list.ex$a = c(1:10)
list.ex$b = c(45:70)
list.ex$c = factor(c("a1", "b1", "c1"))
list.ex$d = as.character(c("bat", "cat", "bird", "dog"))
length(list.ex)
list.ex
# you can pull out one element of the list using $
list.ex$d
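
Besides `$`, list elements can be accessed with `[[ ]]` and `[ ]`, which behave differently: `[[ ]]` returns the element itself, while `[ ]` returns a shorter list. A small sketch with a cut-down version of `list.ex`:

```r
list.ex <- list(a = 1:10, d = c("bat", "cat", "bird", "dog"))

by_dollar <- list.ex$d       # the element itself (a character vector)
by_double <- list.ex[["d"]]  # same result as $
by_single <- list.ex["d"]    # a list of length 1 containing the element

class(by_double)
class(by_single)
```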

[Back to the top](#Table_of_Contents)

<h3>Manipulate_the_data</h3>

Two libraries, tidyr and dplyr, provide really useful functions for manipulating data frames. A couple of quick examples follow. Typically I use tidyr to convert between wide and long data formats.

In [9]:
# install.packages("dplyr")
# install.packages("tidyr")
# install.packages("Lahman")
library(dplyr)
library(tidyr)
library(Lahman)  #I'm using a data set from this library


In [10]:
# Use this Batting data (this is in the Lahman library)
dim(Batting)
names(Batting)

players = group_by(Batting, playerID)
#players looks the same, but has more info, as the grouping has been defined
dim(players)
dim(Batting)
head(players)
head(Batting)
# Now I can create easy summaries, over players...
games = summarise(players, total = sum(G))

# dplyr provides the %>% operator (re-exported from magrittr), which serves as a "pipe", piping the output from one
#  command into the input of the next.  Really cleans up code.  This does what the above code did
games.using.dplyr = Batting %>%
  group_by(playerID) %>%
  summarise(total = sum(G))

playerID,yearID,stint,teamID,lgID,G,AB,R,H,X2B,...,RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP
abercda01,1871,1,TRO,,1,4,0,0,0,...,0,0,0,0,0,,,,,0
addybo01,1871,1,RC1,,25,118,30,32,6,...,13,8,1,4,0,,,,,0
allisar01,1871,1,CL1,,29,137,28,40,4,...,19,3,1,2,5,,,,,1
allisdo01,1871,1,WS3,,27,133,28,44,10,...,27,1,1,0,2,,,,,0
ansonca01,1871,1,RC1,,25,120,29,39,11,...,16,6,2,2,1,,,,,0
armstbo01,1871,1,FW1,,12,49,9,11,2,...,5,0,1,0,1,,,,,0


playerID,yearID,stint,teamID,lgID,G,AB,R,H,X2B,...,RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP
abercda01,1871,1,TRO,,1,4,0,0,0,...,0,0,0,0,0,,,,,0
addybo01,1871,1,RC1,,25,118,30,32,6,...,13,8,1,4,0,,,,,0
allisar01,1871,1,CL1,,29,137,28,40,4,...,19,3,1,2,5,,,,,1
allisdo01,1871,1,WS3,,27,133,28,44,10,...,27,1,1,0,2,,,,,0
ansonca01,1871,1,RC1,,25,120,29,39,11,...,16,6,2,2,1,,,,,0
armstbo01,1871,1,FW1,,12,49,9,11,2,...,5,0,1,0,1,,,,,0
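
The wide/long conversion mentioned at the start of this section is not shown above, so here is a minimal sketch. It assumes tidyr 1.0.0 or later (for `pivot_longer()` and `pivot_wider()`), and the `wide` data frame and its column names are hypothetical, made up for this illustration rather than taken from the Lahman data.

```r
library(tidyr)

# Hypothetical wide data: one row per player, one column per year
wide <- data.frame(
  player = c("p1", "p2"),
  y2019 = c(10, 20),
  y2020 = c(15, 25)
)

# Wide -> long: one row per player/year combination
long <- pivot_longer(wide, cols = c(y2019, y2020),
                     names_to = "year", values_to = "games")

# Long -> wide: back to one column per year
wide_again <- pivot_wider(long, names_from = year, values_from = games)

long
wide_again
```

Older code often uses `gather()` and `spread()` for the same purpose; they still exist in tidyr, but the pivot functions are the currently recommended interface.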


[Back to the top](#Table_of_Contents)

<h2>References</h2>

For an introduction to data structures in R, see the links below: <br>
[Ch02_01. R Data Processing (Data Structures) 01](https://youtu.be/DJZGU6DieNs) <br>
[Ch02_02. R Data Processing (Vectors) 02](https://youtu.be/PsizlmG1ZQ0) <br>
[Ch02_03. R Data Processing (Matrices and Vectors) 03](https://youtu.be/OuT9jIr2Or4) <br>
[Ch02_04. R Data Processing (Data Frames and Lists) 04](https://youtu.be/bvvKJpTlP-s) <br>
[R Data Structures (Data Structure in R): scalar, vector, factor, matrix, array, data frame, list](https://rfriend.tistory.com/14?category=601862) <br>
[Matrices, lists, arrays and data frames](https://jeanettemumford.org/R-tutorial/01-getting-started/)