# R Nuts & Bolts

Variables are assigned with the <- operator.

In [1]:
# assigning variables
x <- 1
print(x)  # explicit printing

[1] 1


After x <- 5, x is a vector of length 1; the [1] printed before the value shows that 5 is its first element.

The : operator creates an integer sequence. Here it initializes a vector with the integers from 10 to 30.

In [2]:
x <- 10:30
x
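
As a small illustration (these checks are not part of the original notes), the sequence created by : can be inspected with length() and class():

```r
# Sequence of integers from 10 to 30 (both ends included)
x <- 10:30

length(x)   # number of elements: 21
class(x)    # "integer"
x[1]        # first element: 10
```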

Basic or atomic classes of objects:
- character
- numeric (real numbers)
- integer
- complex
- logical (TRUE/FALSE)

The most basic object in R is a vector. Empty vectors can be created with the vector() function.
A vector can only contain objects of the same class (lists, covered below, are the exception).

Numbers typed directly in R are treated as double-precision numeric values by default, so 2 is stored as 2.00.
To specify an integer explicitly, add the L suffix, e.g. 2L.


In [3]:
i <- 2L
i

# Inf represents infinity
q <- Inf
q 
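
The difference between a plain number and one written with the L suffix can be checked with class(). The following is a minimal sketch; the variable names num and int are chosen only for illustration:

```r
num <- 2      # plain numeric literal, stored as a double
int <- 2L     # the L suffix requests an integer

class(num)           # "numeric"
class(int)           # "integer"
num == int           # TRUE: same value
identical(num, int)  # FALSE: the storage types differ

1 / 0         # Inf
1 / Inf       # 0
```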

R attributes
Attributes are metadata describing an object's content, e.g. the column names of a data frame describe the data it contains.
Attributes can be accessed using attributes().
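
A short sketch of how attributes() behaves, assuming a named numeric vector (temps and its values are made-up example data):

```r
# A named numeric vector: the names are stored as an attribute
temps <- c(mon = 18.5, tue = 21.0, wed = 19.2)

attributes(temps)        # a list with one element, $names
attr(temps, "names")     # "mon" "tue" "wed"

# A plain vector without names has no attributes
attributes(c(1, 2, 3))   # NULL
```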

In [4]:
# c() function is used to create vectors
c(1, 2, 3, 4)
c(T, F)

The vector() function can also be used to create a vector of a given class and length.

In [6]:
x <- vector("numeric", 10L)
x
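
The initial values that vector() fills in depend on the requested class. A small sketch (the variable names are illustrative only):

```r
num_vec <- vector("numeric", 3)     # 0 0 0
chr_vec <- vector("character", 2)   # "" ""
lgl_vec <- vector("logical", 2)     # FALSE FALSE

num_vec
chr_vec
lgl_vec
```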

In some cases different classes of R objects can be mixed, leading to coercion.

In [7]:
y <- c(1.7, "a") ## character 
y <- c(TRUE, 2) ## numeric
y <- c("a", TRUE) ## character
y

Coercion can be implicit (see above) or explicit. In the implicit case above, e.g. c(1.7, "a"), the numeric value was converted to character because numbers can easily be represented as strings.
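
The examples above follow a general pattern: when classes are mixed in c(), R converts everything to the most flexible class present. For the classes used here the order is logical, then integer, then numeric, then character. A sketch that checks each step:

```r
class(c(TRUE, 2L))    # "integer":   logical becomes integer
class(c(2L, 1.5))     # "numeric":   integer becomes numeric
class(c(1.5, "a"))    # "character": numeric becomes character

c(TRUE, FALSE, 2)     # 1 0 2
c(1.7, "a")           # "1.7" "a"
```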

#### Explicit coercion
Objects can be explicitly coerced from one class to another using the as.* functions, if available.

#### If coercion isn't possible, NAs will be returned

In [8]:
x <- 0:6 
class(x)
# coercing to numeric
as.numeric(x)
# coercing to logical
as.logical(x)
# coercing to character
as.character(x)

In [9]:
# if coercion isn't possible, NAs will be returned

x <- c("a", "b", "c") 
as.numeric(x)

“NAs introduced by coercion”
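
When only some elements can be converted, the rest become NA. The following sketch assumes a made-up input vector called mixed; the warning is silenced with suppressWarnings() purely to keep the demonstration quiet:

```r
mixed <- c("1", "2.5", "abc")                 # hypothetical input
nums <- suppressWarnings(as.numeric(mixed))   # "abc" cannot be converted

nums           # 1.0 2.5 NA
is.na(nums)    # FALSE FALSE TRUE

# as.logical() only understands a few spellings of TRUE/FALSE
as.logical(c("TRUE", "false", "yes"))   # TRUE FALSE NA
```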

### Matrices 
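Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (number of rows, number of columns).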

In [10]:
m <- matrix(nrow = 2, ncol = 3) 
m

|      | [,1] | [,2] | [,3] |
|------|------|------|------|
| [1,] | NA   | NA   | NA   |
| [2,] | NA   | NA   | NA   |


In [11]:
dim(m)

In [12]:
attributes(m)

In [13]:
# matrices are constructed column-wise: entries fill each column top-to-bottom, starting at the upper left

m <- matrix(1:6, nrow = 2, ncol = 3)
m

|      | [,1] | [,2] | [,3] |
|------|------|------|------|
| [1,] | 1    | 3    | 5    |
| [2,] | 2    | 4    | 6    |
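
By default matrix() fills its result column by column, as the output shows. The byrow argument of matrix() switches to row-wise filling. A small sketch comparing the two (by_col and by_row are illustrative names):

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: column-wise
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # fill row by row

by_col[1, ]   # first row: 1 3 5
by_row[1, ]   # first row: 1 2 3
```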


In [14]:
# a matrix can also be created from a vector by adding a dimension attribute

m <- 1:10
m

In [15]:
dim(m) <- c(2,5)
m

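|      | [,1] | [,2] | [,3] | [,4] | [,5] |
|------|------|------|------|------|------|
| [1,] | 1    | 3    | 5    | 7    | 9    |
| [2,] | 2    | 4    | 6    | 8    | 10   |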
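
To see that the dimension attribute is what makes the vector a matrix, one can test it before and after the assignment. A sketch using a fresh vector v (a name chosen for illustration):

```r
v <- 1:10
is.matrix(v)        # FALSE: a plain integer vector

dim(v) <- c(2, 5)   # attach a dimension attribute: 2 rows, 5 columns
is.matrix(v)        # TRUE
attributes(v)       # $dim: 2 5

v[2, 3]             # row 2, column 3: 6
```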


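Matrices can be created by column-binding or row-binding with the cbind() and rbind() functions. When the vectors differ in length, the values of the shorter one are recycled, as happens with x in the output below.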

In [16]:
x <- 1:5
y <- 1:10
cbind(x, y)

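|       | x | y  |
|-------|---|----|
| [1,]  | 1 | 1  |
| [2,]  | 2 | 2  |
| [3,]  | 3 | 3  |
| [4,]  | 4 | 4  |
| [5,]  | 5 | 5  |
| [6,]  | 1 | 6  |
| [7,]  | 2 | 7  |
| [8,]  | 3 | 8  |
| [9,]  | 4 | 9  |
| [10,] | 5 | 10 |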


In [17]:
rbind(x, y)

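|   | [,1] | [,2] | [,3] | [,4] | [,5] | [,6] | [,7] | [,8] | [,9] | [,10] |
|---|------|------|------|------|------|------|------|------|------|-------|
| x | 1    | 2    | 3    | 4    | 5    | 1    | 2    | 3    | 4    | 5     |
| y | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10    |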
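
For comparison, here is a sketch with two vectors of equal length, so no recycling takes place. The vectors a and b are made up for this example:

```r
a <- 1:3
b <- 10:12

cols <- cbind(a, b)   # 3 rows, 2 columns named "a" and "b"
rows <- rbind(a, b)   # 2 rows named "a" and "b", 3 columns

dim(cols)        # 3 2
dim(rows)        # 2 3
colnames(cols)   # "a" "b"
rownames(rows)   # "a" "b"
```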


### Lists
Lists are a very important data type in R. Unlike the atomic vectors above, a list can contain elements of different classes. Lists can be explicitly created using the list() function, which takes an arbitrary number of arguments.


In [19]:
x <- list(1, "a", TRUE, 1 + 4i) 
x

In [21]:
# empty list with specified size
x <- vector("list", length = 5) 
x
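
A sketch showing that each list element keeps its own class, whereas c() would coerce the same values to a single class (items is an illustrative name):

```r
items <- list(1, "a", TRUE, 1 + 4i)

sapply(items, class)   # "numeric" "character" "logical" "complex"
length(items)          # 4

# c() on mixed values coerces everything to one class instead
class(c(1, "a", TRUE)) # "character"
```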

### Factors
Factors represent categorical data and can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label.

Factor objects can be created with the factor() function.

In [22]:
x <- factor(c("yes", "yes", "no", "yes", "no"))
x

In [23]:
## See the underlying representation of the factor
unclass(x)

In [26]:
attr(x, "levels")

The order of the levels of a factor can be set using the levels argument to factor(). This can be important in linear modelling because the first level is used as the baseline level.

In [27]:
x <- factor(c("yes", "yes", "no", "yes", "no"))
x ## Levels are put in alphabetical order

In [29]:
# setting the order of the levels explicitly with the levels argument
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
x
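
The effect of the levels argument is easiest to see in the underlying integer codes. The sketch below compares the default and the explicit ordering; answers, f_default and f_custom are names introduced for illustration:

```r
answers <- c("yes", "yes", "no", "yes", "no")

f_default <- factor(answers)                           # levels sorted: "no", "yes"
f_custom  <- factor(answers, levels = c("yes", "no"))  # explicit order

levels(f_default)       # "no" "yes"
levels(f_custom)        # "yes" "no"
as.integer(f_default)   # 2 2 1 2 1
as.integer(f_custom)    # 1 1 2 1 2
table(f_custom)         # counts per level: yes = 3, no = 2
```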

### Missing Values
Missing values are denoted by NA, or by NaN for undefined mathematical operations.
- is.na() is used to test objects if they are NA
- is.nan() is used to test for NaN
- NA values have a class also, so there are integer NA, character NA, etc.
- A NaN value is also NA but the converse is not true


In [1]:
## Create a vector with NAs in it 
x <- c(1, 2, NA, 10, 3)
## Return a logical vector indicating which elements are NA
is.na(x)

In [2]:
# Return a logical vector indicating which elements are NaN 
is.nan(x) 

In [4]:
## Now create a vector with both NA and NaN values 
x <- c(1, 2, NaN, NA, 4) 
is.na(x)

In [5]:
is.nan(x)
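
The points about NA classes and the one-way relationship between NaN and NA can be checked directly. A minimal sketch:

```r
class(NA)              # "logical"
class(NA_integer_)     # "integer"
class(NA_character_)   # "character"

0 / 0                  # NaN: an undefined mathematical operation
is.na(0 / 0)           # TRUE:  NaN counts as NA
is.nan(NA)             # FALSE: NA is not NaN
```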

### Data Frames 
- Data frames are used to store tabular data.
- The dplyr package has an optimized set of functions designed to work efficiently with data frames.
- A data frame is represented as a special type of list where every element has to have the same length.
- Each element of the list can be thought of as a column, and the length of each element is the number of rows.
- Unlike matrices, whose elements must all be of the same class (e.g. all integers or all numeric), data frames can store a different class of object in each column.
- Data frames have a special attribute called row.names, which holds information about each row.
- Data frames are usually created by reading in a dataset with read.table() or read.csv().
- They can also be created explicitly with data.frame().
- They can also be coerced from other types of objects, such as lists.
- Data frames can be converted to a matrix by calling data.matrix().

In [6]:
# data.frame(column1 = values, column2 = values, ...)
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
x

| foo | bar   |
|-----|-------|
| 1   | TRUE  |
| 2   | TRUE  |
| 3   | FALSE |
| 4   | FALSE |


In [7]:
nrow(x)

In [8]:
ncol(x)
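
Several of the points listed above can be tried out on the same small data frame. This is a sketch under the assumption of made-up data; df, from_list and m are illustrative names:

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

sapply(df, class)   # foo: "integer", bar: "logical" (one class per column)
dim(df)             # 4 2

# Coercing a list with equal-length elements to a data frame
from_list <- as.data.frame(list(id = 1:2, label = c("a", "b")))
names(from_list)    # "id" "label"

# data.matrix() converts to a numeric matrix; logical values become 1/0
m <- data.matrix(df)
m[, "bar"]          # 1 1 0 0
```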

### Names
R objects can have names, which is very useful for writing readable code and self-describing objects. 

In [9]:
# example of assigning names to an integer vector
x <- 1:3
names(x)

NULL

In [11]:
names(x) <- c("New York", "Seattle", "Los Angeles")
x
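
One way names help readability, sketched here as an illustration rather than as part of the original notes, is that elements can be selected by name instead of by position:

```r
x <- 1:3
names(x) <- c("New York", "Seattle", "Los Angeles")

x["Seattle"]           # the element named "Seattle"
unname(x["Seattle"])   # its value without the name: 2
```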

Lists can also have names, which is often very useful

In [12]:
x <- list("Los Angeles" = 1, Boston = 2, London = 3)
x 

In [13]:
names(x)

Matrices can have both column and row names.

In [16]:
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d")) 
m

|   | c | d |
|---|---|---|
| a | 1 | 3 |
| b | 2 | 4 |


Column names and row names can be set separately using the colnames() and rownames() functions.

In [17]:
colnames(m) <- c("h", "f") 
rownames(m) <- c("x", "z") 
m

|   | h | f |
|---|---|---|
| x | 1 | 3 |
| z | 2 | 4 |


- For data frames, there is a separate function for setting the row names: row.names().
- Data frames do not have column names as such; they just have names (like lists). To set the column names of a data frame, use names().
    
| Object     | Set column names | Set row names  |
|------------|------------------|----------------|
| data frame | names()          | row.names()    |
| matrix     | colnames()       | rownames()     |
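
The table can be tried out as follows. This is a sketch with made-up data; df, m and all the names assigned to them are hypothetical:

```r
# Data frame: names() for columns, row.names() for rows
df <- data.frame(foo = 1:2, bar = c("a", "b"))
names(df) <- c("id", "label")
row.names(df) <- c("r1", "r2")

# Matrix: colnames() for columns, rownames() for rows
m <- matrix(1:4, nrow = 2)
colnames(m) <- c("c1", "c2")
rownames(m) <- c("r1", "r2")

names(df)       # "id" "label"
row.names(df)   # "r1" "r2"
dimnames(m)     # list of row names, then column names
```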

# Summary
There are a variety of built-in data types in R:
- atomic classes: numeric, logical, character, integer, complex
- vectors, lists
- factors
- missing values
- data frames and matrices

All R objects can have attributes that help to describe what is in the object. 

Perhaps the most useful attribute is names, such as column and row names in a data frame, or simply names in a vector or list. 

Attributes like dimensions are also important, as they can modify the behavior of objects, for example by turning a vector into a matrix.