# Data type

## Integer

In [1]:
class(1L)

In [2]:
is.integer(1L)

## Numeric

In [3]:
class(1)

In [4]:
is.numeric(10)
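`class()` reports a value's high-level class, while `typeof()` shows how R stores it internally. The sketch below compares the two for an integer and a plain number. The names `int_value` and `num_value` are only illustrative.

```r
# class() gives the high-level class; typeof() gives the internal storage type
int_value <- 1L     # the L suffix makes an integer literal
num_value <- 1      # a plain number literal is stored as a double ("numeric")

class(int_value)    # "integer"
typeof(int_value)   # "integer"
class(num_value)    # "numeric"
typeof(num_value)   # "double"

is.integer(num_value)   # FALSE: 1 is stored as a double, not an integer
is.numeric(int_value)   # TRUE: integers also count as numeric

.Machine$integer.max    # largest representable integer: 2147483647
```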

## Character

**Note**: In R, single quotes ('') and double quotes ("") create the same data type: character.


In [5]:
class("1")

In [6]:
class('1')

In [7]:
is.character("falsdkj")

## Logical

In [8]:
class(TRUE)

In [9]:
1L == "1"

In [10]:
1.0 == "1.0"

In [11]:
is.logical(FALSE)
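When `==` compares values of different types, R first converts both sides to a common type, here character. Because `as.character(1.0)` drops the trailing `.0`, this explains why `1L == "1"` is `TRUE` while `1.0 == "1.0"` is `FALSE`. A minimal check:

```r
# Comparing a number with a string converts the number to character first
as.character(1L)     # "1"
as.character(1.0)    # "1"  (the ".0" is dropped)

1L == "1"            # TRUE:  compares "1" with "1"
1.0 == "1.0"         # FALSE: compares "1" with "1.0"

# Convert explicitly to compare as numbers instead
1.0 == as.numeric("1.0")   # TRUE
```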

## Type casting

In [12]:
a <- as.numeric(12L) # convert an integer to numeric (1.2L is not a valid integer literal)

In [13]:
class(a)

In [14]:
as.logical(0)
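A few more casting cases, sketched with base R only: conversions that succeed, and conversions that cannot be interpreted and therefore return `NA`.

```r
# Casting that succeeds
as.integer(3.9)        # 3: as.integer() truncates toward zero
as.integer(-3.9)       # -3
as.numeric("2.5")      # 2.5
as.character(12)       # "12"
as.logical("T")        # TRUE
as.logical(2)          # TRUE: any non-zero number is TRUE

# Casting that fails returns NA
as.logical("yes")                     # NA
suppressWarnings(as.numeric("abc"))   # NA; without suppressWarnings() R also warns
```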

## Assign variable

Note: variables in R can be reassigned at any time, and both `=` and `<-` can be used for assignment.

In [15]:
a = 12; a

In [16]:
b <- 12; b

In [17]:
a = 123; a

In [20]:
rm(a) # remove the variable a from the workspace

“object 'a' not found”


## Environment

In [21]:
ls() # list the variables in the workspace

In [22]:
getwd()
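The workspace functions above can be combined to check whether a variable exists before and after removing it. A small sketch; `temp_value` is a hypothetical name used only for the demonstration.

```r
# Create a temporary variable, inspect the workspace, then remove it
temp_value <- 42
"temp_value" %in% ls()    # TRUE: it is listed in the workspace

rm(temp_value)
exists("temp_value")      # FALSE: it no longer exists

# getwd() returns the working directory as a string;
# relative paths such as "./text.txt" are resolved against it
is.character(getwd())     # TRUE
```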

## Summary

- You can use the R programming language to perform statistical computation, data visualization, and predictive analysis. 

- The four most common data types in R are integer, numeric, character, and logical. 

- You can use the class() function and the is.integer(), is.numeric(), is.character(), and is.logical() functions to determine the data type. 

- You can convert some data types to other data types using the as.integer(), as.numeric(), as.character(), and as.logical() functions. 

- R provides math operators that you can use to perform calculations on your data. 

- Using variables in your calculations and providing them with descriptive names can help shorten your code and make it easier to read. 

- You can control the order of operations using parentheses. 

- The R development environment includes the R console, R script files, and workspaces.  

- Two important tools for working with R code are RStudio and Jupyter. 

- RStudio features, like syntax highlighting and auto code completion, make writing code easier. 

- The main components of RStudio include the File Editor, Console, Workspace, and File, Plots, and Packages Explorer. 

- A Jupyter Notebook is made up of cells that can contain code, Markdown, or raw text.

- A Jupyter Notebook is an all-in-one document that can combine narration, code, data, plots, images, and videos.
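The summary mentions math operators, order of operations, and descriptive variable names. A short sketch of R's arithmetic operators; `price`, `quantity`, and `total_cost` are illustrative names.

```r
2 + 3 * 4      # 14: * is evaluated before +
(2 + 3) * 4    # 20: parentheses are evaluated first
2^3            # 8: exponentiation
7 / 2          # 3.5: division always returns a double
7 %/% 2        # 3: integer division
7 %% 2         # 1: remainder (modulo)

# Descriptive variable names make a calculation easier to read
price <- 20
quantity <- 3
total_cost <- price * quantity   # 60
```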

## Vector

In [23]:
c(1, 2, 3) / 100

In [24]:
vec = c(1, 2, 3); vec;

In [25]:
c(-1:-99)[1:3]

In [26]:
c(0:1000)

In [27]:
# To retrieve a vector without an item, you can use negative 
# indexing. For example, the following returns a vector slice
# **without the first item**.
vec[-1]

In [28]:
c(1, "fas")

In [29]:
vec[0] # R uses 1-based indexing, so index 0 returns an empty vector: numeric(0)

In [30]:
vec[100] # returns NA if the index is out of range
class(vec[100]) # the NA keeps the vector's type: "numeric"

### Conditional indexing

In [31]:
vec[vec > 1] # logical indexing, similar to a boolean mask in pandas

In [32]:
class(c(1, 2, NA, "flksadj"))

In [33]:
vector = c(1, 2, 3, TRUE)

In [34]:
vector[vector < 3]

In [35]:
NA * 132290
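As `NA * 132290` shows, missing values propagate through arithmetic. A sketch of the usual ways to handle them with base R; `values` is an illustrative vector.

```r
values <- c(4, NA, 6)

sum(values)                 # NA: any arithmetic with NA gives NA
sum(values, na.rm = TRUE)   # 10: na.rm = TRUE drops missing values first
mean(values, na.rm = TRUE)  # 5
is.na(values)               # FALSE TRUE FALSE
values[!is.na(values)]      # 4 6: keep only the non-missing elements
```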

## Describe vector

In [36]:
vector_number <- c(1, 2, 2, 3, 3, 65); vector_number

In [37]:
summary(vector_number)

In [38]:
head(vector_number, n = 3); tail(vector_number, n = 2)

In [39]:
vector_character <- c("flksdajf", "flkdssja", 'hello', "hello", "fj1"); vector_character

In [40]:
summary(vector_character)

   Length     Class      Mode 
        5 character character 

In [41]:
vector_hybrid <- c("flkkasj", 1, 1.0, 1L, 12, 231); vector_hybrid

In [42]:
summary(vector_hybrid)

   Length     Class      Mode 
        6 character character 

What `summary()` returns for a vector:
- numeric: descriptive statistics (min, quartiles, median, mean, max), because the data are quantitative.
- character: length, class and mode, because the data are qualitative.
- mixed: `c()` automatically converts every element to character, so the result is the same as for a character vector.
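For a numeric vector, the numbers reported by `summary()` can be reproduced with the underlying functions. A sketch for `vector_number`, assuming R's default quantile method:

```r
vector_number <- c(1, 2, 2, 3, 3, 65)

s <- summary(vector_number)
s["Median"]                    # 2.5
s["Mean"]                      # 12.67 when printed (76 / 6)

# The same statistics computed individually
min(vector_number)             # 1
quantile(vector_number, 0.25)  # 2
median(vector_number)          # 2.5
mean(vector_number)            # 12.666...
max(vector_number)             # 65

# A character vector only reports length, class and mode
summary(c("a", "b"))
```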

In [43]:
c("flkkasj", 1, 1.0, 1L, 12, 231) == c("flkkasj", "1", "1.0", "1L", "12", "231")

In [44]:
class(c("flkkasj", 1, 1.0, 1L, 12, 231)) == class(c("flkkasj", "1", "1.0", "1L", "12", "231"))

In [45]:
factor(vector_number)

In [46]:
vector_factor = factor(vector_character); vector_factor

In [47]:
class(vector_factor)

In [48]:
summary(vector_factor) # summary() of a factor shows how often each level occurs
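A few more factor operations, including a common pitfall when converting a factor back to numbers. `scores` and `size` are illustrative names.

```r
scores <- factor(c("10", "20", "10", "30"))

levels(scores)          # "10" "20" "30": the unique values, sorted
table(scores)           # frequency of each level: 2, 1, 1

# Pitfall: as.numeric() on a factor returns the internal level codes
as.numeric(scores)                 # 1 2 1 3
# Convert to character first to recover the original values
as.numeric(as.character(scores))   # 10 20 10 30

# Ordered factors support comparisons between levels
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size < "large"                     # TRUE FALSE TRUE
```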

In [49]:
as.numeric("2.0")

In [50]:
2 == 2.0

In [51]:
"2.0" == 2.0

In [52]:
1L == "1"

## List

In [53]:
lst = list("hello", 1.2, 1, c(1, 2, 4, 4.1)); lst

In [54]:
class(lst)

Note: In R, a __list__ can hold elements of __multiple data types__, whereas a __vector__ holds __only one data type__: `c()` converts all values to a common type, while `list()` keeps each element's original type.
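The difference described above can be checked directly by building the same values with `list()` and with `c()`. The variable names are illustrative.

```r
mixed_list <- list("hello", 1.2, 1L, TRUE)
sapply(mixed_list, class)   # "character" "numeric" "integer" "logical"

mixed_vector <- c("hello", 1.2, 1L, TRUE)
class(mixed_vector)         # "character": c() converted every element
mixed_vector                # "hello" "1.2" "1" "TRUE"

# Without a character element, c() picks the most general type present:
# logical < integer < double < character
class(c(1L, TRUE))          # "integer"
class(c(1L, 2.5))           # "numeric"
```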

## Named list

In [55]:
student <- list(name = "Alice",
              age = 20,
              score = c(10, 9, 6))

In [56]:
# Access values with [] or $ (one key at a time), or with [] and c() (several keys at once)
student["name"]; student$name; student[c("name", "age")]

In [57]:
student$age <- 21

In [58]:
student

In [59]:
# Remove key-value
student$name <- NULL; student

In [60]:
# [] and [[]] both index a list, but [] returns a sub-list
# while [[]] returns the element itself
student[1]; student["age"]; student[["age"]]
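To make the `[]` versus `[[]]` distinction concrete, the sketch below recreates the original `student` list (so it still has all three keys) and checks the class of each result. `field` is an illustrative name.

```r
student <- list(name = "Alice", age = 20, score = c(10, 9, 6))

class(student["score"])     # "list": [] keeps the list wrapper
class(student[["score"]])   # "numeric": [[]] extracts the element
student[["score"]][2]       # 9: index into the extracted vector

# $ is shorthand for [[ ]] with a literal name
identical(student$score, student[["score"]])   # TRUE

# [[ ]] also accepts a name stored in a variable; $ does not
field <- "age"
student[[field]]            # 20
```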

## Array

In [61]:
vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

In [62]:
# if a * b != length(vector), array() silently recycles the elements from the start,
# e.g. dim = c(4, 3) needs 12 values but vector has only 9, so the data become
# (1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3)
arr <- array(vector, dim=c(4, 3)); arr

|  | [,1] | [,2] | [,3] |
|---|---|---|---|
| [1,] | 1 | 5 | 9 |
| [2,] | 2 | 6 | 1 |
| [3,] | 3 | 7 | 2 |
| [4,] | 4 | 8 | 3 |


In [63]:
# accessing row and column
arr[1, ]; arr[, 1]
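Arrays are not limited to two dimensions. A sketch of a three-dimensional array; like the example above, it is filled column by column. `arr3` is an illustrative name.

```r
# A 2 x 3 x 4 array: 2 rows, 3 columns, 4 layers
arr3 <- array(1:24, dim = c(2, 3, 4))

dim(arr3)        # 2 3 4
arr3[1, 1, 2]    # 7: the first cell of the second layer
arr3[2, 3, 4]    # 24: the last cell
arr3[, , 1]      # the first layer, a 2 x 3 matrix holding 1..6
```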

## Matrices

In [64]:
a <- c(1, 2, 3, 4, 5, 6)

In [65]:
# matrix() is stricter than array(): it warns when the data length does not fit the
# dimensions (e.g. 9 values for a 4 x 3 matrix), while array() recycles silently.
# byrow=TRUE fills the matrix row by row instead of column by column.
A <- matrix(a, nrow = 3, ncol= 2, byrow=TRUE); A

|  | [,1] | [,2] |
|---|---|---|
| [1,] | 1 | 2 |
| [2,] | 3 | 4 |
| [3,] | 5 | 6 |


In [66]:
A[1:2, 2]
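A sketch contrasting the default column-wise fill with `byrow = TRUE`, plus transposition and matrix multiplication. `by_col` and `by_row` are illustrative names.

```r
a <- c(1, 2, 3, 4, 5, 6)

by_col <- matrix(a, nrow = 3, ncol = 2)                # default: fill column by column
by_row <- matrix(a, nrow = 3, ncol = 2, byrow = TRUE)  # fill row by row

by_col[1, ]     # 1 4
by_row[1, ]     # 1 2

dim(by_row)     # 3 2
t(by_row)       # transpose: a 2 x 3 matrix

# %*% is matrix multiplication; * would multiply element-wise
by_row %*% c(1, 1)   # row sums as a 3 x 1 matrix: 3, 7, 11
```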

## DataFrame

In [67]:
df <- data.frame(index = c(1:9),
                name=c("Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Theta", "Sigma", "Omega", "Mu"),
                score=c(7.8, 10, 3.1, 2.9, 9, 7.6, 8.2, 8, 5));df

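| index `<int>` | name `<chr>` | score `<dbl>` |
|---|---|---|
| 1 | Alpha | 7.8 |
| 2 | Beta | 10.0 |
| 3 | Gamma | 3.1 |
| 4 | Delta | 2.9 |
| 5 | Epsilon | 9.0 |
| 6 | Theta | 7.6 |
| 7 | Sigma | 8.2 |
| 8 | Omega | 8.0 |
| 9 | Mu | 5.0 |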


In [68]:
head(df, 5); tail(df, 7)

|  | index `<int>` | name `<chr>` | score `<dbl>` |
|---|---|---|---|
| 1 | 1 | Alpha | 7.8 |
| 2 | 2 | Beta | 10.0 |
| 3 | 3 | Gamma | 3.1 |
| 4 | 4 | Delta | 2.9 |
| 5 | 5 | Epsilon | 9.0 |

|  | index `<int>` | name `<chr>` | score `<dbl>` |
|---|---|---|---|
| 3 | 3 | Gamma | 3.1 |
| 4 | 4 | Delta | 2.9 |
| 5 | 5 | Epsilon | 9.0 |
| 6 | 6 | Theta | 7.6 |
| 7 | 7 | Sigma | 8.2 |
| 8 | 8 | Omega | 8.0 |
| 9 | 9 | Mu | 5.0 |


In [69]:
names(df); df[]

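| index `<int>` | name `<chr>` | score `<dbl>` |
|---|---|---|
| 1 | Alpha | 7.8 |
| 2 | Beta | 10.0 |
| 3 | Gamma | 3.1 |
| 4 | Delta | 2.9 |
| 5 | Epsilon | 9.0 |
| 6 | Theta | 7.6 |
| 7 | Sigma | 8.2 |
| 8 | Omega | 8.0 |
| 9 | Mu | 5.0 |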


In [70]:
# Delete rows 7-9 by keeping only rows 1-6
df <- df[1:6, ]; df

|  | index `<int>` | name `<chr>` | score `<dbl>` |
|---|---|---|---|
| 1 | 1 | Alpha | 7.8 |
| 2 | 2 | Beta | 10.0 |
| 3 | 3 | Gamma | 3.1 |
| 4 | 4 | Delta | 2.9 |
| 5 | 5 | Epsilon | 9.0 |
| 6 | 6 | Theta | 7.6 |


In [71]:
# Delete column
df[["score"]] <- NULL; df

|  | index `<int>` | name `<chr>` |
|---|---|---|
| 1 | 1 | Alpha |
| 2 | 2 | Beta |
| 3 | 3 | Gamma |
| 4 | 4 | Delta |
| 5 | 5 | Epsilon |
| 6 | 6 | Theta |


In [72]:
# Distinguish df["name"] and df[["name"]] in R
class(df["name"]) # returns a one-column data frame
class(df[["name"]]) # returns the column itself as a character vector
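Beyond deleting rows and columns, the most common data frame operations are adding a computed column, filtering rows by a condition, and sorting. A sketch on a small, freshly built copy of the earlier scores (the notebook's `df` no longer has a score column at this point); `scores_df` and `passed` are illustrative names.

```r
scores_df <- data.frame(name = c("Alpha", "Beta", "Gamma", "Delta"),
                        score = c(7.8, 10, 3.1, 2.9),
                        stringsAsFactors = FALSE)

nrow(scores_df); ncol(scores_df)          # 4 2

# Add a column computed from an existing one
scores_df$passed <- scores_df$score > 5

# Filter rows with a logical condition, keeping only the name column
scores_df[scores_df$score > 5, "name"]    # "Alpha" "Beta"

# Sort rows by score, highest first
scores_df[order(scores_df$score, decreasing = TRUE), ]
```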

## Summary

- A vector is a sequence of elements of a single type, such as numbers, characters, or logical values.

- Factors (also known as categorical variables) are variables that take on a limited number of different values that can be nominal or ordinal. 

- You can use R to perform operations on a vector, such as sorting the items, finding the smallest or largest number, or performing arithmetic on its values. 

- Lists can store different types of data, unlike vectors, which can only store data of a single type. 

- An array is a one- or multi-dimensional structure whose elements all share the same type (for example numbers, characters, or logical values).

- A matrix is a two-dimensional array; it is filled column by column by default, or row by row with byrow = TRUE.

- The main difference between a data frame and a general list is that every column of a data frame is a vector of the same length, and each column holds elements of a single type.

## Conditional statements

In [73]:
age <- 18
name = "alice"
# `else` must be on the same line as the closing brace of the if block
if (age > 18 && name == "adam") {
    print("you are adult")
} else {
    print("you are child")
}

[1] "you are child"
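Conditions with more than two branches chain `else if`. Note that `if` needs a single `TRUE`/`FALSE`, so a whole vector is handled with `ifelse()` instead. A sketch; `age_group` and `ages` are illustrative names.

```r
age_group <- function(age) {
    if (age < 13) {
        "child"
    } else if (age < 18) {
        "teenager"
    } else {
        "adult"
    }
}

age_group(10)   # "child"
age_group(18)   # "adult"

# ifelse() applies the condition to every element of a vector
ages <- c(10, 15, 30)
ifelse(ages >= 18, "adult", "minor")   # "minor" "minor" "adult"
```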


In [74]:
# %in% tests membership, like the set symbol ∈ in math
if (5 %in% a) {
    print("there is a number 5 in list")
}

[1] "there is a number 5 in list"


In [75]:
for (num in a) {
    print(num)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6


In [76]:
count <- 1
while (count <= 5) {
    count <- count + 1
    print(count)
}

[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
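Loops can be controlled with `next` (skip to the next iteration) and `break` (leave the loop), and `seq_along()` is a safe way to loop over positions. A sketch; `evens` and `fruits` are illustrative names.

```r
evens <- c()
for (n in 1:10) {
    if (n %% 2 != 0) next   # skip odd numbers
    if (n > 6) break        # stop once n exceeds 6
    evens <- c(evens, n)
}
evens   # 2 4 6

# seq_along() gives the positions 1..length(x) and is safe for empty vectors
fruits <- c("apple", "banana")
for (i in seq_along(fruits)) {
    print(paste(i, fruits[i]))   # "1 apple", then "2 banana"
}
```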


## Function

In [77]:
# Functions in R are first-class objects, so they can be assigned to variables
plus2 <- function(x) {
    x + 2
}

In [78]:
plus2(29)

In [79]:
remove_number_2 <- function(x) {
    x[x != 2]
}

In [80]:
remove_number_2(c(1:30))

In [81]:
good_rating <- function(score=8) {
    ifelse(score >= 8, "good", "not good")
}

In [82]:
good_rating(); good_rating(7)
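A function returns only one value, but that value can be a named list bundling several results. Arguments can also be passed by name in any order. A sketch; `describe_scores` is a hypothetical helper.

```r
describe_scores <- function(scores, threshold = 8) {
    list(
        mean = mean(scores),
        n_good = sum(scores >= threshold)   # TRUE counts as 1
    )
}

result <- describe_scores(c(7.8, 10, 9, 5))
result$mean      # 7.95
result$n_good    # 2

# Named arguments, in any order
describe_scores(threshold = 10, scores = c(7.8, 10, 9, 5))$n_good   # 1
```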

In [83]:
# Using a function inside another function
pass_or_not <- function(df, student_name, threshold=5) {
    # return early if the student is not in the data frame
    if (!(student_name %in% df[["name"]])) {
        return("Student name not in df")
    }
    score <- df[df[["name"]] == student_name, "score"]
    ifelse(score <= threshold, "not passed", "passed")
}

In [84]:
# df lost its "score" column in In [71], so pass a data frame that still has one
scores <- data.frame(name = c("Alpha", "Beta", "Gamma"), score = c(7.8, 10, 3.1))
pass_or_not(scores, "Alpha"); pass_or_not(scores, "Gamma"); pass_or_not(scores, "Zeta")


## Global and local variables

In [85]:
func <- function() {
    global_number <<- "global"
    local_number <- "local"
}

In [86]:
func()

In [87]:
global_number; local_number # local_number only exists inside func(), so R cannot find it here

ERROR: Error in eval(expr, envir, enclos): object 'local_number' not found
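`<<-` does not always write to the global workspace: it assigns to the nearest enclosing environment where the name already exists, and only falls back to the global environment when none does. A common use is a closure that keeps private state. A sketch; `make_counter` and `n_calls` are hypothetical names.

```r
make_counter <- function() {
    n_calls <- 0                  # lives in make_counter()'s environment
    function() {
        n_calls <<- n_calls + 1   # updates that n_calls, not a global variable
        n_calls
    }
}

counter <- make_counter()
counter()                  # 1
counter()                  # 2
exists("n_calls")          # FALSE: no global variable was created
```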


## String operations

In [88]:
text <- readLines("./text.txt");

“incomplete final line found on './text.txt'”


In [89]:
nchar(text[0]) # text[0] is empty (R is 1-indexed), so this returns integer(0)
text

In [90]:
# R functions do not modify their arguments: nchar() returns a new value
# and leaves `text` unchanged
nchar(text[1])

In [91]:
char_list <- strsplit(text[1], " "); char_list; class(char_list); 
word_list <- unlist(char_list); word_list; class(word_list)

In [92]:
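hyphenated_text = chartr(" ", "-", text[1]); hyphenated_text # replace each space with "-"
# unlike Julia, where functions ending in ! modify their argument in place,
# chartr() returns a new string and text[1] stays unchanged
class(text[1])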

In [93]:
sorted_list <- sort(word_list); sorted_list

In [94]:
paste(sorted_list, collapse = " ")

In [95]:
sub_string <- substr(text[1], start=4, stop=100); sub_string
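Since `./text.txt` is not included here, the sketch below uses an inline string `s` (an illustrative value) to show a few more base R string functions, including the regular-expression functions `grepl()`, `sub()`, and `gsub()`.

```r
s <- "the quick brown fox"

toupper(s)                 # "THE QUICK BROWN FOX"
nchar(s)                   # 19
strsplit(s, " ")[[1]]      # "the" "quick" "brown" "fox"
sprintf("%s has %d characters", "fox", nchar("fox"))   # "fox has 3 characters"

# Regular expressions
grepl("qu", s)             # TRUE: the pattern occurs in s
sub("o", "0", s)           # first match only: "the quick br0wn fox"
gsub("o", "0", s)          # every match:      "the quick br0wn f0x"
gsub("[aeiou]", "", s)     # remove vowels:    "th qck brwn fx"
```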

In [96]:
# stringr is not part of base R; install it once if it is missing
if (!requireNamespace("stringr", quietly = TRUE)) install.packages("stringr")
library(stringr)

In [97]:
# negative positions count from the end: this returns (up to) the last 100 characters
str_sub(text[1], -100, -1)


## Debugging

In [98]:
## Handling error
tryCatch(
    for(i in 1:3) {
        print(i + "a")
    },
    error = function(e) print("Can't concatenate number to character.")
)

[1] "Can't concatenate number to character."


In [99]:
## Handling warning
tryCatch(as.integer("A"),
        warning = function(e)
             print("warning.")
)
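`tryCatch()` can also read the condition's message with `conditionMessage()` and run clean-up code in `finally`. The second part wraps a conversion so it returns a fallback value instead of warning. A sketch; `safe_integer` is a hypothetical helper.

```r
result <- tryCatch(
    {
        stop("something went wrong")   # raise an error on purpose
    },
    error = function(e) {
        paste("Caught:", conditionMessage(e))
    },
    finally = {
        message("finished")            # runs whether or not an error occurred
    }
)
result   # "Caught: something went wrong"

# Return NA instead of emitting a warning
safe_integer <- function(x) {
    tryCatch(as.integer(x), warning = function(w) NA_integer_)
}
safe_integer("12")   # 12
safe_integer("A")    # NA
```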



## Summary

- If statements use comparison and logical operators to test conditions in code. 

- For loops perform an operation for each item in a list, vector, or data frame column. 

- While loops perform an operation until a condition is no longer true. 

- Functions can be pre-defined or user-defined. 

- In user-defined functions, you can control the return value of a function, add logic using if statements, and call other functions. 

- You can define global variables using the <<- variable assignment operator. 

- You can use R functions to manipulate the characters in a string, split a string into a vector, and retrieve specific substrings from within a string. 

- Regular expressions are used to match patterns in strings and text. 

- You can convert dates from one format to another using the as.POSIXct() and as.Date() functions. 

- You can work with dates using functions like Sys.Date(), Sys.time(), date(), and as.Date(). 

- You can intercept errors in R code and provide custom error and warning handling using tryCatch() statements.
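The date functions in the summary are not demonstrated above. A short sketch with `as.Date()`, `format()`, and `as.POSIXct()`; the dates and the names `d` and `start_time` are illustrative.

```r
d <- as.Date("2024-03-15")          # parse an ISO "YYYY-MM-DD" string

format(d, "%d/%m/%Y")                   # "15/03/2024"
d + 30                                  # "2024-04-14": adding a number adds days
as.numeric(d - as.Date("2024-01-01"))   # 74 days (2024 is a leap year)

# Parse a non-ISO format by giving the pattern explicitly
as.Date("15/03/2024", format = "%d/%m/%Y") == d   # TRUE

# Date-times carry a clock time and a time zone
start_time <- as.POSIXct("2024-03-15 14:30:00", tz = "UTC")
format(start_time, "%H:%M")             # "14:30"

class(Sys.Date())                       # "Date"
```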

# Accessing the built-in datasets in R

In [107]:
data()

| Package `<chr>` | Item `<chr>` | Title `<chr>` |
|---|---|---|
| datasets | AirPassengers | Monthly Airline Passenger Numbers 1949-1960 |
| datasets | BJsales | Sales Data with Leading Indicator |
| datasets | BJsales.lead (BJsales) | Sales Data with Leading Indicator |
| datasets | BOD | Biochemical Oxygen Demand |
| datasets | CO2 | Carbon Dioxide Uptake in Grass Plants |
| datasets | ChickWeight | Weight versus age of chicks on different diets |
| datasets | DNase | Elisa assay of DNase |
| datasets | EuStockMarkets | Daily Closing Prices of Major European Stock Indices, 1991-1998 |
| datasets | Formaldehyde | Determination of Formaldehyde |
| datasets | HairEyeColor | Hair and Eye Color of Statistics Students |


In [111]:
df = CO2 # CO2 is a dataset in the built-in `datasets` package, which is loaded by default, so it can be used directly

In [123]:
# Access row and column
df[1, c("Plant", "Type")]

|  | Plant `<ord>` | Type `<fct>` |
|---|---|---|
| 1 | Qn1 | Quebec |


In [135]:
df[, 1:3]

|  | Plant `<ord>` | Type `<fct>` | Treatment `<fct>` |
|---|---|---|---|
| 1 | Qn1 | Quebec | nonchilled |
| 2 | Qn1 | Quebec | nonchilled |
| 3 | Qn1 | Quebec | nonchilled |
| 4 | Qn1 | Quebec | nonchilled |
| 5 | Qn1 | Quebec | nonchilled |
| 6 | Qn1 | Quebec | nonchilled |
| 7 | Qn1 | Quebec | nonchilled |
| 8 | Qn2 | Quebec | nonchilled |
| 9 | Qn2 | Quebec | nonchilled |
| 10 | Qn2 | Quebec | nonchilled |
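The listing above shows only the first rows. A few base R functions for exploring a built-in dataset such as CO2 further; `chilled` is an illustrative name, and the counts assume the copy of CO2 shipped with R's `datasets` package.

```r
df <- CO2

dim(df)                 # 84 5: 84 rows, 5 columns
names(df)               # "Plant" "Type" "Treatment" "conc" "uptake"
str(df)                 # column types, including the ordered factor Plant

# Filter rows and select columns in one step
chilled <- df[df$Treatment == "chilled", c("Plant", "uptake")]
nrow(chilled)           # 42

# Mean CO2 uptake for each Type
tapply(df$uptake, df$Type, mean)
```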
