# Introduction to R

Puneet Sharma [puneet.988@gmail.com](puneet.988@gmail.com)
 
All the R lectures, written as [Jupyter notebooks](https://www.jupyter.org), are available at [https://github.com/puneet988/R-tutorials](https://github.com/puneet988/R-tutorials)

Please execute the cell containing the code using Shift+Enter to see the result.


## R data types

Like Python and MATLAB, R is a dynamically typed language. It was developed largely for statistical computing.

There is no need to declare the ```type``` of a variable. The data type is assigned automatically from the value.

In [None]:
a <- 4.2; b <- 'Hello!'

In [None]:
print(a); print(b)

To print the type of a variable, use ```typeof()```.

In [None]:
print(typeof(a)); print(typeof(b))

Basic data types are

* String/Character
* Number
  - Integer
  - Double
  - Complex
* Boolean/Logical

A number typed without a suffix, whether it looks like an integer or a float, is stored as a double by default.

In [None]:
a <- 20; typeof(a)

To create an integer explicitly, add the suffix L to the number.

In [None]:
b <- 20L; typeof(b)
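
The difference between the numeric types can be checked directly. The following is a small sketch; the variable names are chosen only for illustration.

```r
# Numeric literals are doubles unless marked with the L suffix.
x_double  <- 20
x_integer <- 20L
x_complex <- 2 + 3i

print(typeof(x_double))     # "double"
print(typeof(x_integer))    # "integer"
print(typeof(x_complex))    # "complex"

# as.integer() converts an existing double; the fractional part is dropped.
x_converted <- as.integer(4.9)
print(x_converted)          # 4

# Mixing an integer with a double gives a double.
x_mixed <- x_integer + 1.5
print(typeof(x_mixed))      # "double"
```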

## Handling undefined values

Undefined or missing values are represented by
* NULL
* NA
* NaN

All three behave differently.

NULL is the null object; it is used when no value is present at all. If a vector or matrix has a slot whose value is missing or unusable (a fill value), we use NA or NaN.

NA ("not available") is the missing-value indicator, while NaN ("not a number") marks an undefined numeric result.

In [None]:
print(class(NULL)); print(class(NA)); print(class(NaN))

NA appears when a result is neither TRUE nor FALSE, i.e. logical indeterminacy (for example ```NA > 1```). It also stands in for a missing value.

NaN is the result of an undefined numeric operation such as 0/0.
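
A short sketch of how the three values behave in practice. The vector `scores` is hypothetical data used only for illustration.

```r
# NULL is an empty object: it has length zero and vanishes inside c().
len_null     <- length(NULL)
len_combined <- length(c(1, NULL, 3))
print(len_null)              # 0
print(len_combined)          # 2

# NA occupies a slot in the vector and propagates through calculations.
scores <- c(10, NA, 30)
print(length(scores))        # 3
print(mean(scores))          # NA
mean_clean <- mean(scores, na.rm = TRUE)
print(mean_clean)            # 20

# NaN comes from an undefined numeric operation.
undefined <- 0 / 0
print(is.nan(undefined))     # TRUE
print(is.na(undefined))      # TRUE  (is.na() is also TRUE for NaN)
print(is.nan(NA))            # FALSE

# Test for missing values with is.na(), not ==, because NA == NA is NA.
print(NA == NA)              # NA
print(is.na(scores))         # FALSE  TRUE FALSE
```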

Arithmetic operators are mostly the same as in Python; the exponent, modulus and integer division operators are written differently.

* \* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Multiplication
* / &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Division
* \+ &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Addition
* \- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Subtraction
* ^ &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Exponent
* %% &nbsp; &nbsp; &nbsp; Modulus
* %/% &nbsp; &nbsp; Integer division
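
The operators listed above can be tried on a few numbers. This is a minimal sketch with arbitrary values.

```r
quotient  <- 7 / 2      # ordinary division
int_div   <- 7 %/% 2    # integer division
remainder <- 7 %% 2     # modulus
power     <- 2 ^ 10     # exponent

print(quotient)     # 3.5
print(int_div)      # 3
print(remainder)    # 1
print(power)        # 1024

# Operators are vectorised: they act element by element.
products <- c(1, 2, 3) * c(10, 20, 30)
print(products)     # 10 40 90

# With a negative operand, %/% rounds down and %% takes the sign of the divisor.
print((-7) %/% 2)   # -4
print((-7) %% 2)    # 1
```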

Relational operators (```==```, ```!=```, ```<```, ```>```, ```<=```, ```>=```) are the same as in Python.

Logical operators are as follows

* !     &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; NOT
* &     &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Element wise AND
* &&    &nbsp; &nbsp; &nbsp; &nbsp; AND for single values (short-circuit)
* |     &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Element wise OR
* ||    &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; OR for single values (short-circuit)
* %in% &nbsp; &nbsp; Membership (is the value in the set?)
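
The difference between the element-wise and the single-value operators is easiest to see side by side. The vectors `p` and `q` and the number `n` are illustrative names only.

```r
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

both   <- p & q     # element-wise AND
either <- p | q     # element-wise OR
print(both)         # TRUE FALSE FALSE
print(either)       # TRUE  TRUE  TRUE
print(!p)           # FALSE  TRUE FALSE

# && and || expect single values and are typically used inside if().
n <- 5
if (n > 0 && n %% 2 == 1) print("positive and odd")

# Short-circuit: the right-hand side is not evaluated once the left decides.
guarded <- FALSE && stop("never reached")
print(guarded)      # FALSE

# %in% tests membership for each element on the left.
found <- c(2, 7) %in% c(1, 2, 3)
print(found)        # TRUE FALSE
```

In recent versions of R, passing vectors longer than one to ```&&``` or ```||``` is an error, so use ```&``` and ```|``` whenever whole vectors are compared.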

## Data structures

The six most commonly used data structures in R are

* Vectors
* Lists
* Matrices
* Arrays
* Factors
* Data Frames

### Vectors

To create a vector, we use the ```c()``` function. It combines its arguments into a single vector, much like writing out a list in Python.

In [None]:
x <- c(1, 2, 3, 4.3, 'hello', TRUE, FALSE); print(x)

We can pass values of any data type to ```c()```, be it number, character or boolean, but the resulting vector holds only one type. Here all the elements are coerced to character type because the vector contains the string ```"hello"```. This is the effect of implicit coercion.
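
Implicit coercion always moves towards the more general type: logical, then integer, then double, then character. A minimal sketch:

```r
cls_int  <- class(c(TRUE, 2L))    # logical + integer
cls_num  <- class(c(2L, 3.5))     # integer + double
cls_char <- class(c(3.5, "a"))    # double  + character

print(cls_int)     # "integer"
print(cls_num)     # "numeric"
print(cls_char)    # "character"

# The same rule makes TRUE and FALSE behave as 1 and 0 in arithmetic.
n_true <- sum(c(TRUE, FALSE, TRUE))
print(n_true)      # 2
```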

To create a strictly numeric vector of a given length, use the ```vector()``` function.

In [None]:
x <- vector("numeric", length=20); print(x)

Such a vector can be used for preallocation: create it at its final length and then fill it inside a for loop. This is faster than appending to an empty vector, because every time a value is appended R makes a copy of the vector, which slows the whole process.
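
The preallocation pattern can be sketched as follows. The length `n` and the squared values are arbitrary choices for illustration.

```r
n <- 10
squares <- vector("numeric", length = n)   # preallocated, filled with zeros

for (i in seq_len(n)) {
  squares[i] <- i^2                        # assign into an existing slot
}
print(squares)

# For comparison, the slower pattern that grows the vector on every pass.
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)                   # builds a new, longer vector each time
}
print(identical(squares, grown))           # TRUE
```

Both loops give the same result; the difference in speed only becomes noticeable for large `n`.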

**Coercion** - Objects like vectors, data frames etc. can be coerced to a different class using the matching ```as.*()``` function, for example ```as.character()``` or ```as.logical()```.

In [None]:
x <- c(1,2,3,4); class(x)

In [None]:
y <- as.character(x); class(y)

In [None]:
y <- as.logical(x); class(y)
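
Explicit coercion does not always succeed. The sketch below shows what happens to values that cannot be converted; the input vectors are made up for illustration.

```r
# For numbers, zero becomes FALSE and everything else becomes TRUE.
flags <- as.logical(c(0, 1, 2))
print(flags)                 # FALSE  TRUE  TRUE

# Strings that look like numbers convert cleanly.
pi_ish <- as.numeric("3.14")
print(pi_ish)                # 3.14

# Strings that do not become NA (R also issues a warning, silenced here).
parsed <- suppressWarnings(as.numeric(c("1", "two", "3")))
print(parsed)                # 1 NA  3
```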

### Matrices

A matrix is the same as a vector except that it has an additional dimension attribute. It is a two-dimensional data structure.

In [None]:
a <- matrix(c(6,2,6,8,3,2,6,8,0), nrow=3, ncol=3); print(a); attributes(a)

Matrices are filled column-wise by default, whereas in Python a NumPy array is filled row-wise. To fill row-wise in R, pass ```byrow = TRUE```.
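
The fill order is easy to check with a small matrix. ```matrix()``` fills column by column unless ```byrow = TRUE``` is given; the values 1 to 6 are arbitrary.

```r
values <- 1:6

by_col <- matrix(values, nrow = 2, ncol = 3)                # default: column by column
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)  # row by row

print(by_col)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

print(by_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```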

In R, we can pass the names of rows and columns through the ```dimnames``` argument.

In [None]:
a <- matrix(c(6,2,6,8,3,2,6,8,0), nrow=3, ncol=3, 
            dimnames = list(c('a','b','c'), c('x','y','z'))); print(a)

In [None]:
print(colnames(a)); print(rownames(a))
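
Once a matrix has names, they can be used for indexing in place of positions. A small sketch, using a matrix `m` built the same way as the one above:

```r
m <- matrix(c(6, 2, 6, 8, 3, 2, 6, 8, 0), nrow = 3, ncol = 3,
            dimnames = list(c('a', 'b', 'c'), c('x', 'y', 'z')))

by_name <- m["b", "y"]     # row "b", column "y"
print(by_name)             # 3

col_z <- m[, "z"]          # whole column "z", returned as a named vector
print(col_z)

# Names can also be replaced after the matrix has been created.
colnames(m) <- c("p", "q", "r")
print(colnames(m))         # "p" "q" "r"
```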

To access the elements of a matrix, use square brackets.

In [None]:
print(a)

In [None]:
print(a[2,2])             # select the element in row 2, column 2
print(a[c(2,3),c(1,2)])   # select rows 2 and 3, columns 1 and 2

In [None]:
print(a[2,])   # select row 2
print(a[,2])   # select column 2

But ```a[2,]``` or ```a[,2]``` gives a vector, because R drops the dimension of length one. To keep the result as a matrix, use ```drop = FALSE```.

In [None]:
print(a[2,]); dim(a[2,])

In [None]:
print(a[2,,drop = FALSE]); dim(a[2,,drop = FALSE])

A matrix can also be indexed with a single vector of positions. The positions count the elements column by column.

In [None]:
a[c(1,2,4,6)]
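
A sketch of how single-position indexing maps onto rows and columns. The matrix `m` holds the same values as the one used above.

```r
m <- matrix(c(6, 2, 6, 8, 3, 2, 6, 8, 0), nrow = 3, ncol = 3)

# Position 4 is the first element of the second column.
print(m[4])                  # 8
print(m[1, 2])               # 8

picked <- m[c(1, 2, 4, 6)]
print(picked)                # 6 2 8 2

# A logical condition on the whole matrix selects the matching elements,
# again in column order.
large <- m[m > 5]
print(large)                 # 6 6 8 6 8
```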

You can also do indexing using logical vectors.

In [None]:
print(a[c(TRUE,FALSE,TRUE),c(TRUE,TRUE,FALSE)])

To transpose a matrix use ```t(a)```

To combine vectors or matrices, use ```rbind()``` to stack them as rows or ```cbind()``` to join them as columns.
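
Transposing and combining can be sketched on a small matrix. The matrix `m` and the added values are arbitrary.

```r
m <- matrix(1:6, nrow = 2)         # 2 rows, 3 columns

transposed <- t(m)
print(dim(transposed))             # 3 2

stacked <- rbind(m, c(7, 8, 9))    # add a row: lengths must match the column count
print(dim(stacked))                # 3 3

widened <- cbind(m, c(10, 11))     # add a column: lengths must match the row count
print(dim(widened))                # 2 4
print(widened)
```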

The dimensions of a matrix can also be changed (reshaped) by assigning to ```dim()```. The total number of elements must stay the same.

In [None]:
dim(a) <- c(1,9); print(a)
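
Reshaping keeps the elements in column order and only changes how they are cut into rows and columns. A minimal sketch with an arbitrary vector of 12 values:

```r
v <- 1:12

dim(v) <- c(3, 4)          # 3 rows, 4 columns
print(v)
first_shape <- v[1, 2]     # first element of the second column
print(first_shape)         # 4

dim(v) <- c(2, 6)          # same 12 values, now 2 rows and 6 columns
second_shape <- v[1, 2]
print(second_shape)        # 3

# The new dimensions must multiply to the number of elements.
outcome <- tryCatch({
  dim(v) <- c(5, 2)        # 10 slots for 12 values: not allowed
  "reshaped"
}, error = function(e) "error")
print(outcome)             # "error"
```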

### Lists

A list in R can hold elements of different types without any coercion. A list can contain numbers, characters, booleans, matrices, vectors, arrays, other lists etc.

To create a list, use the ```list()``` function.

In [None]:
list_data <- list('green','yellow',1,2,3,c(4,5,6)); print(list_data)

To give a name to each element in the list, use the ```names()``` function.

In [None]:
names(list_data) <- c("A","B","C","D","E","F"); print(list_data)

In [None]:
print(list_data$A); print(list_data$B)  ### Access first and second value of list

In [None]:
print(list_data[1])     ### Single brackets: a one-element list that keeps the label A
print(list_data[[1]])   ### Double brackets: the first value itself
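
The difference between single and double brackets matters as soon as the element is used in a calculation. The list `settings` below is hypothetical and exists only for illustration.

```r
settings <- list(colour = "green", sizes = c(4, 5, 6))

sub_list <- settings["sizes"]      # still a list, with one element
element  <- settings[["sizes"]]    # the numeric vector itself

print(class(sub_list))             # "list"
print(class(element))              # "numeric"

# Only the extracted element can be used directly in arithmetic.
avg_size <- mean(settings[["sizes"]])
print(avg_size)                    # 5
print(settings$sizes[2])           # 5

# Elements can be added by name and removed by assigning NULL.
settings$shape  <- "circle"
settings$colour <- NULL
print(names(settings))             # "sizes" "shape"
```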

To merge two or more lists, use ```c()```

In [None]:
a <- list(1,2,3,4); b <- list(5,6,7,8); merged <- c(a,b); print(merged)   # avoid naming a variable c, which hides the c() function name
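
Merging with ```c()``` is different from wrapping two lists in ```list()```, which nests them instead. A short sketch with made-up lists:

```r
first  <- list(1, 2)
second <- list(3, 4)

merged <- c(first, second)       # one flat list with four elements
nested <- list(first, second)    # a list holding two lists

print(length(merged))            # 4
print(length(nested))            # 2

# unlist() flattens a list of simple values into a vector.
flat <- unlist(merged)
print(flat)                      # 1 2 3 4
```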

Some predefined character vectors (built-in constants) in R

In [None]:
print(letters); print(LETTERS); print(month.abb); print(month.name)

Arrays, Factors and Data Frames will be covered in the next notebook.