
# R language: data types
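R provides several basic data structures: vectors, lists, dataframes, and matrices. A vector holds elements of a single type, whereas a list can hold elements of different types. A dataframe is a two-dimensional structure whose columns may have different types. A matrix is also two-dimensional, but all of its elements share one type (typically numeric).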

R version:

In [1]:
print(R.version.string)

[1] "R version 4.3.1 (2023-06-16)"


## Types of vectors
R has the following types of vectors.

- string (called `character` in R)
- double
- integer
- logical
- factor

### Strings

In [2]:
# Python: strings = ['A', 'B', 'C']
strings <- c("A", "B", "C")

In [3]:
strings

In [4]:
# Python: type(strings)
print(class(strings))
print(typeof(strings))

[1] "character"
[1] "character"


### Doubles

In [5]:
doubles <- c(1.0, 1.1, 1.2)

In [6]:
doubles

In [7]:
print(class(doubles))
print(typeof(doubles))

[1] "numeric"
[1] "double"


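Vectors can contain `NA`, which marks a missing value. Note that `NA` differs from `NULL`, which is an empty placeholder with no value and no length. When `NULL` is passed to `c`, it is dropped, so it does not appear in the printed result.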

In [8]:
doubles_with_NAs <- c(1.0, NA, 1.2, NA)
print(doubles_with_NAs)
print(typeof(doubles_with_NAs))
print(class(doubles_with_NAs))

[1] 1.0  NA 1.2  NA
[1] "double"
[1] "numeric"


In [9]:
print(c(NULL, 1.0))

[1] 1
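The difference between `NA` and `NULL` can be made concrete with a minimal sketch. The variable names `with_na` and `with_null` are illustrative only and do not appear elsewhere in this notebook.

```r
# NA is a missing value that still occupies a position in the vector.
with_na <- c(1.0, NA, 1.2)
print(length(with_na))   # 3
print(is.na(with_na))    # FALSE TRUE FALSE

# NULL has length zero, so c() drops it when combining values.
with_null <- c(1.0, NULL, 1.2)
print(length(with_null)) # 2
print(is.null(NULL))     # TRUE

# Many summary functions return NA unless missing values are removed.
print(mean(with_na))               # NA
print(mean(with_na, na.rm = TRUE)) # 1.1
```

In practice, `is.na` is the usual way to locate missing values before deciding whether to drop or replace them.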


### Integers
Note that the `L` suffix is required to create integer values; without it, numbers are stored as doubles.

In [10]:
integers <- c(1L, 2L, 3L)

In [11]:
integers

In [12]:
print(class(integers))
print(typeof(integers))

[1] "integer"
[1] "integer"


In [13]:
typeof(c(1, 2, 3))

We can use `type.convert` to convert types.

In [14]:
# as.is expects a logical value; TRUE keeps strings as character instead of factors
typeof(type.convert(c(1, 2, 3), as.is=TRUE))
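Explicit conversion is also possible with the base `as.*` functions. The following is a small sketch of that alternative, not a replacement for `type.convert`; the variable name `as_int` is illustrative.

```r
doubles <- c(1, 2, 3)

# as.integer() converts a double vector to an integer vector.
as_int <- as.integer(doubles)
print(typeof(as_int))            # "integer"

# Fractional parts are truncated toward zero, not rounded.
print(as.integer(c(1.9, -1.9)))  # 1 -1

# Other as.* functions work the same way.
print(as.character(doubles))     # "1" "2" "3"
print(as.numeric("1.5") + 1)     # 2.5
```

Because `as.integer` truncates silently, it may be worth applying `round` first when the input can contain non-integer values.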

### Logical values

In [15]:
logicals <- c(TRUE, FALSE, TRUE)

In [16]:
logicals

In [17]:
print(class(logicals))
print(typeof(logicals))

[1] "logical"
[1] "logical"


### Factors
A factor vector can be created from a character vector with the `factor` function.

In [18]:
severities <- factor(c("severe", "mild"), levels=c("mild", "moderate", "severe"), ordered=TRUE)

In [19]:
print(severities)

[1] severe mild  
Levels: mild < moderate < severe


In [20]:
print(class(severities))
print(typeof(severities))

[1] "ordered" "factor" 
[1] "integer"


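Ordered factors can be compared by level order. Note that the two cells below compare plain character vectors, so R compares the strings alphabetically rather than by severity level. To compare by level, the vector must first be converted with `factor(..., ordered=TRUE)`; in that case, values that are not among the defined levels yield `NA`.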

In [21]:
c("mild", "severe", "severe", "moderate", "undefined") > "moderate"

In [22]:
c("mild", "severe", "severe", "moderate", "undefined") > "undefined"
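To compare by severity level instead of alphabetically, the character vector can be wrapped in an ordered factor first. This is a minimal sketch; the names `severity_levels` and `observed` are illustrative and not part of the original cells.

```r
severity_levels <- c("mild", "moderate", "severe")

# Values outside the declared levels ("undefined") become NA in the factor.
observed <- factor(
  c("mild", "severe", "severe", "moderate", "undefined"),
  levels = severity_levels,
  ordered = TRUE
)
print(observed)

# The comparison now follows mild < moderate < severe.
is_worse <- observed > "moderate"
print(is_worse)  # FALSE TRUE TRUE FALSE NA
```

The last element is `NA` rather than `FALSE`, so code that filters on the result may need `which()` or an explicit `!is.na()` check.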

## Element numbers

Element numbers start from 1, not 0.

In [23]:
vector1 <- c(1, 2, 3, 4)
print(vector1)
print(vector1[1])

[1] 1 2 3 4
[1] 1


In [24]:
print(vector1)
print(vector1[2:3])

[1] 1 2 3 4
[1] 2 3


Negative indices exclude the elements at those positions. This differs from Python, where negative indices count from the end.

In [25]:
print(vector1)
print(vector1[-3])

[1] 1 2 3 4
[1] 1 2 4


A logical vector can also be used as an index; elements at `TRUE` positions are selected.

In [26]:
print(vector1)
print(vector1[c(TRUE, FALSE, FALSE, TRUE)])

[1] 1 2 3 4
[1] 1 4


If the logical vector is shorter than the indexed vector, it is recycled, so the trailing `TRUE` can be omitted here.

In [27]:
print(vector1)
print(vector1[c(TRUE, FALSE, FALSE)])

[1] 1 2 3 4
[1] 1 4
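The indexing rules above combine naturally with conditions. The sketch below reuses the same `vector1` values and only relies on base R.

```r
vector1 <- c(1, 2, 3, 4)

# A shorter logical index is recycled: c(TRUE, FALSE) acts as TRUE FALSE TRUE FALSE.
print(vector1[c(TRUE, FALSE)])  # 1 3

# A comparison returns a logical vector, which can be used directly as an index.
print(vector1 > 2)              # FALSE FALSE TRUE TRUE
print(vector1[vector1 > 2])     # 3 4

# which() gives the positions of the TRUE values.
print(which(vector1 > 2))       # 3 4

# Several positions can be excluded at once with a negative vector.
print(vector1[-c(1, 2)])        # 3 4
```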


## Lists

In [28]:
list1 <- list(strings="A", doubles=1.2, integers=1L, logicals=TRUE, nas=NA, nulls=NULL)
list1

In [29]:
print(class(list1))
print(typeof(list1))

[1] "list"
[1] "list"


In [30]:
list1[2]

In [31]:
list1$doubles

In [32]:
list1[c("doubles", "logicals")]
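Single brackets on a list return a sub-list, while double brackets (or `$`) return the element itself. The following sketch illustrates the difference with a shortened version of `list1`; the names `sub_list` and `value` are illustrative.

```r
list1 <- list(strings = "A", doubles = 1.2, integers = 1L)

# [ ] keeps the list wrapper: the result is a list of length 1.
sub_list <- list1[2]
print(class(sub_list))   # "list"
print(names(sub_list))   # "doubles"

# [[ ]] extracts the element itself.
value <- list1[[2]]
print(class(value))      # "numeric"

# [[ ]] with a name is equivalent to $.
print(identical(list1[["doubles"]], list1$doubles))  # TRUE
```

Use `[[ ]]` or `$` when the value is needed for arithmetic, and `[ ]` when a smaller list is wanted.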

## Dataframes

### Create a dataframe
We can create a dataframe from vectors with the `data.frame` function.

In [33]:
names <- c("John Doe", "Jane Doe", "Steve Graves")
temperatures <- c(98.1, 98.6, 101.4)
flu_statuses <- c(FALSE, FALSE, TRUE)
genders <- factor(c("MALE", "FEMALE", "MALE"))
blood_types <- factor(c("O", "AB", "A"), levels = c("A", "B", "AB", "O"))
symptoms <- factor(c("SEVERE", "MILD", "MODERATE"), levels = c("MILD", "MODERATE", "SEVERE"), ordered = TRUE)

In [34]:
pt_df <- data.frame(names, temperatures, flu_statuses, genders, blood_types, symptoms, stringsAsFactors=FALSE)
pt_df

names,temperatures,flu_statuses,genders,blood_types,symptoms
<chr>,<dbl>,<lgl>,<fct>,<fct>,<ord>
John Doe,98.1,False,MALE,O,SEVERE
Jane Doe,98.6,False,FEMALE,AB,MILD
Steve Graves,101.4,True,MALE,A,MODERATE


In [35]:
print(class(pt_df))
print(typeof(pt_df))

[1] "data.frame"
[1] "list"


### Element selection

In [36]:
pt_df$names

In [37]:
pt_df[c("genders", "symptoms")]

genders,symptoms
<fct>,<ord>
MALE,SEVERE
FEMALE,MILD
MALE,MODERATE


In [38]:
pt_df[c(1, 2), c("genders", "symptoms")]

,genders,symptoms
,<fct>,<ord>
1,MALE,SEVERE
2,FEMALE,MILD


In [39]:
pt_df[1, 2]

In [40]:
pt_df[c(1, 3), c(2, 4)]

,temperatures,genders
,<dbl>,<fct>
1,98.1,MALE
3,101.4,MALE


In [41]:
pt_df[1,]

,names,temperatures,flu_statuses,genders,blood_types,symptoms
,<chr>,<dbl>,<lgl>,<fct>,<fct>,<ord>
1,John Doe,98.1,False,MALE,O,SEVERE


In [42]:
pt_df[, 2]

In [43]:
pt_df

names,temperatures,flu_statuses,genders,blood_types,symptoms
<chr>,<dbl>,<lgl>,<fct>,<fct>,<ord>
John Doe,98.1,False,MALE,O,SEVERE
Jane Doe,98.6,False,FEMALE,AB,MILD
Steve Graves,101.4,True,MALE,A,MODERATE


In [44]:
pt_df[,]

names,temperatures,flu_statuses,genders,blood_types,symptoms
<chr>,<dbl>,<lgl>,<fct>,<fct>,<ord>
John Doe,98.1,False,MALE,O,SEVERE
Jane Doe,98.6,False,FEMALE,AB,MILD
Steve Graves,101.4,True,MALE,A,MODERATE


In [45]:
pt_df[-1,]

,names,temperatures,flu_statuses,genders,blood_types,symptoms
,<chr>,<dbl>,<lgl>,<fct>,<fct>,<ord>
2,Jane Doe,98.6,False,FEMALE,AB,MILD
3,Steve Graves,101.4,True,MALE,A,MODERATE
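Rows can also be selected with a logical condition in the row position, in the same way as logical indexing of vectors. This sketch rebuilds a reduced dataframe named `pt` (a hypothetical name) so that it runs on its own.

```r
pt <- data.frame(
  names = c("John Doe", "Jane Doe", "Steve Graves"),
  temperatures = c(98.1, 98.6, 101.4),
  flu_statuses = c(FALSE, FALSE, TRUE)
)

# Keep rows where the condition is TRUE; the empty column position keeps all columns.
feverish <- pt[pt$temperatures > 100, ]
print(feverish$names)                       # "Steve Graves"

# A logical column can serve as the row index directly.
print(pt[pt$flu_statuses, "temperatures"])  # 101.4

# subset() expresses the same idea with column names evaluated inside the dataframe.
normal <- subset(pt, temperatures < 99, select = c(names, temperatures))
print(nrow(normal))                         # 2
```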


### Add new columns

In [46]:
pt_df$celcius <- (pt_df$temperatures - 32) * 5 / 9
pt_df[c("names", "temperatures", "celcius")]

names,temperatures,celcius
<chr>,<dbl>,<dbl>
John Doe,98.1,36.72222
Jane Doe,98.6,37.0
Steve Graves,101.4,38.55556


### Using CSV files

In [47]:
pt_df

names,temperatures,flu_statuses,genders,blood_types,symptoms,celcius
<chr>,<dbl>,<lgl>,<fct>,<fct>,<ord>,<dbl>
John Doe,98.1,False,MALE,O,SEVERE,36.72222
Jane Doe,98.6,False,FEMALE,AB,MILD,37.0
Steve Graves,101.4,True,MALE,A,MODERATE,38.55556


In [48]:
write.csv(pt_df, file="pt.csv", row.names=FALSE)

In [49]:
pt_new <- read.csv("pt.csv", stringsAsFactors=FALSE, header=TRUE)
pt_new$symptoms <- factor(pt_new$symptoms, levels=c("MILD", "MODERATE", "SEVERE"), ordered=TRUE)
pt_new

names,temperatures,flu_statuses,genders,blood_types,symptoms,celcius
<chr>,<dbl>,<lgl>,<chr>,<chr>,<ord>,<dbl>
John Doe,98.1,False,MALE,O,SEVERE,36.72222
Jane Doe,98.6,False,FEMALE,AB,MILD,37.0
Steve Graves,101.4,True,MALE,A,MODERATE,38.55556
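As the `<chr>` columns in the output show, factor information is not stored in a CSV file, so levels have to be restored after reading. The sketch below demonstrates this round trip with a temporary file; `csv_path`, `original`, and `restored` are illustrative names.

```r
original <- data.frame(
  blood_types = factor(c("O", "AB", "A"), levels = c("A", "B", "AB", "O")),
  temperatures = c(98.1, 98.6, 101.4)
)

# Write to a temporary file so that the working directory is left untouched.
csv_path <- tempfile(fileext = ".csv")
write.csv(original, file = csv_path, row.names = FALSE)

# After reading, the factor column comes back as plain character.
restored <- read.csv(csv_path, stringsAsFactors = FALSE)
print(class(restored$blood_types))   # "character"

# Re-declare the levels explicitly, including the unused level "B".
restored$blood_types <- factor(restored$blood_types, levels = c("A", "B", "AB", "O"))
print(levels(restored$blood_types))  # "A" "B" "AB" "O"

unlink(csv_path)
```

Setting `stringsAsFactors=TRUE` would also produce factors, but only with the levels present in the file, so unused levels such as `"B"` would be lost.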


### Use sample data
Sample datasets are listed on [List of R sample datasets (in Japanese)](https://www.trifields.jp/r-sample-data-491).


In [50]:
titanic_raw <- data.frame(Titanic)
titanic_raw[1:5,]

,Class,Sex,Age,Survived,Freq
,<fct>,<fct>,<fct>,<fct>,<dbl>
1,1st,Male,Child,No,0
2,2nd,Male,Child,No,0
3,3rd,Male,Child,No,35
4,Crew,Male,Child,No,0
5,1st,Female,Child,No,0


In [51]:
str(titanic_raw)

'data.frame':	32 obs. of  5 variables:
 $ Class   : Factor w/ 4 levels "1st","2nd","3rd",..: 1 2 3 4 1 2 3 4 1 2 ...
 $ Sex     : Factor w/ 2 levels "Male","Female": 1 1 1 1 2 2 2 2 1 1 ...
 $ Age     : Factor w/ 2 levels "Child","Adult": 1 1 1 1 1 1 1 1 2 2 ...
 $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Freq    : num  0 0 35 0 0 0 17 0 118 154 ...


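Expand the frequency table into one row per person by repeating each row as many times as its `Freq` value and then dropping the `Freq` column.

Ref.  
[Qiita: Playing with R sample data (1) (in Japanese)](https://qiita.com/0_u0/items/450e985e88469b4bee4c)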

In [52]:
# Repeat every value Freq times (column 5), then drop the Freq column itself
titanic_df <- data.frame(lapply(titanic_raw, function(i){rep(i, titanic_raw[, 5])}))[-5]
titanic_df[1:4,]

,Class,Sex,Age,Survived
,<fct>,<fct>,<fct>,<fct>
1,3rd,Male,Child,No
2,3rd,Male,Child,No
3,3rd,Male,Child,No
4,3rd,Male,Child,No


In [53]:
str(titanic_df)

'data.frame':	2201 obs. of  4 variables:
 $ Class   : Factor w/ 4 levels "1st","2nd","3rd",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Sex     : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
 $ Age     : Factor w/ 2 levels "Child","Adult": 1 1 1 1 1 1 1 1 1 1 ...
 $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
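The same expansion can be written by repeating row indices instead of column values, which some readers may find easier to follow. This is an alternative sketch, not the method from the referenced article; `row_index` and `titanic_long` are illustrative names.

```r
titanic_raw <- data.frame(Titanic)

# Repeat each row number as many times as that row's Freq value.
# Rows with Freq == 0 are repeated zero times and therefore disappear.
row_index <- rep(seq_len(nrow(titanic_raw)), times = titanic_raw$Freq)

# Select the repeated rows and keep every column except Freq.
titanic_long <- titanic_raw[row_index, c("Class", "Sex", "Age", "Survived")]
rownames(titanic_long) <- NULL

# The number of rows equals the total frequency.
print(nrow(titanic_long))     # 2201
print(sum(titanic_raw$Freq))  # 2201
```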


In [54]:
summary(titanic_df)

  Class         Sex          Age       Survived  
 1st :325   Male  :1731   Child: 109   No :1490  
 2nd :285   Female: 470   Adult:2092   Yes: 711  
 3rd :706                                        
 Crew:885                                        

In [55]:
air_raw <- data.frame(AirPassengers)
str(air_raw)

'data.frame':	144 obs. of  1 variable:
 $ AirPassengers: Time-Series  from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...


In [56]:
summary(air_raw)

 AirPassengers  
 Min.   :104.0  
 1st Qu.:180.0  
 Median :265.5  
 Mean   :280.3  
 3rd Qu.:360.5  
 Max.   :622.0  

## Matrix

### Create a matrix

Specifying the number of rows:

In [57]:
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, byrow=FALSE)
m1

1,3,5
2,4,6


Specifying the number of columns:

In [58]:
m2 <- matrix(c(1, 2, 3, 4, 5, 6), ncol=2, byrow=FALSE)
m2

1,4
2,5
3,6


Filling the matrix by row:

In [59]:
m3 <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, byrow=TRUE)
m3

1,2,3
4,5,6


In [60]:
print(class(m3))
print(typeof(m3))

[1] "matrix" "array" 
[1] "double"


### Element selection

In [61]:
m3[,]

1,2,3
4,5,6


In [62]:
m3[1, 1]

In [63]:
m3[1,]

In [64]:
m3[,2]
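Selecting a single row or column returns a plain vector by default. The sketch below shows how to check the dimensions and how `drop = FALSE` keeps the matrix structure; the names `m`, `row_vec`, and `row_mat` are illustrative.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)
print(dim(m))              # 2 3

# A single row is simplified to a vector, so it has no dimensions.
row_vec <- m[1, ]
print(is.matrix(row_vec))  # FALSE
print(row_vec)             # 1 2 3

# drop = FALSE keeps the result as a 1 x 3 matrix.
row_mat <- m[1, , drop = FALSE]
print(dim(row_mat))        # 1 3

# t() transposes the matrix, swapping rows and columns.
print(dim(t(m)))           # 3 2
```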

## Saving objects to an RData file

In [65]:
save(pt_df, m3, file="data_types.RData")

Loading:

In [66]:
load(file="data_types.RData")

### Saving all objects

In [67]:
ls()

In [68]:
rm(m1, m3)

In [69]:
save.image(file="all.RData")

In [70]:
rm(list=ls())

In [71]:
ls()

In [72]:
load(file="all.RData")

In [73]:
ls()
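Because `load` restores objects under their original names, it can overwrite variables that already exist. One way to inspect a file safely, sketched below, is to load it into a separate environment; `rdata_path`, `scratch`, and `loaded_names` are illustrative names, and a temporary file is used so that the files created above are not touched.

```r
x <- c(1, 2, 3)
rdata_path <- tempfile(fileext = ".RData")
save(x, file = rdata_path)
rm(x)

# Load into a fresh environment instead of the global workspace.
scratch <- new.env()

# load() invisibly returns the names of the restored objects.
loaded_names <- load(rdata_path, envir = scratch)
print(loaded_names)                                    # "x"
print(exists("x", envir = scratch, inherits = FALSE))  # TRUE
print(get("x", envir = scratch))                       # 1 2 3

unlink(rdata_path)
```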