# <center> Introduction to R </center>

___

## What is R?

* R is derived from the **S** language (originally developed by John Chambers and colleagues at AT&T Bell Labs in the late 1970s and early 1980s) and was developed by **R**oss Ihaka and **R**obert Gentleman in 1992.
* It is an open-source software environment developed for statistical computing, data analytics and scientific research.

## Why R?

* It is free and has a large collection of easily installable packages covering everything from data access and cleaning to analysis and reporting.
* It also has one of the best integrated development environments (IDEs), RStudio.

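## What is RStudio?

* `RStudio` is an IDE for R with several advantages over the default R interfaces, such as a script editor with syntax highlighting, panes for viewing variables, plots and help pages, and built-in package management.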



___

# <center> R Objects </center>

Every programming language provides data structures to hold data. In R, data stored in memory is accessed through a variable (object) name.

- Object names must start with a letter (or with a `.` that is not followed by a number), and can only contain letters, numbers, `_` and `.`.
- Object names are case sensitive: `x` and `X` are two different objects.
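Here is a quick sketch of these naming rules. The object names are made up for illustration. `make.names()` is a base R function that converts any string into a syntactically valid name.

```r
# Valid names: letters, numbers, `_` and `.`
sales_2023 <- 100
sales.total <- 250

# Names are case sensitive: `price` and `Price` are two different objects
price <- 10
Price <- 20
price == Price     # FALSE

# make.names() shows how R would repair invalid names
make.names(c('2nd_try', 'my var', 'ok_name'))  # "X2nd_try" "my.var" "ok_name"
```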

### Assignment Operator 

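Many languages use `=` as the assignment operator. R accepts `=` too, but R users conventionally prefer `<-`.

The preference is partly historical: it keeps code compatible with early versions of S and S-PLUS, which did not accept `=` for assignment. `<-` also avoids ambiguity in some special scenarios, such as inside function calls, where `=` is used to name arguments.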
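One such scenario is a function call. In the sketch below, the variable names are illustrative and `median()` is just a convenient base R function. `=` inside a call only matches an argument, while `<-` actually assigns.

```r
# At the top level, `<-` and `=` both assign
a <- 5
b = 5
identical(a, b)                 # TRUE

# Inside a call, `=` matches median()'s argument `x`; nothing is assigned
median(x = c(4, 1, 9))          # 4

# `<-` inside a call creates `scores` in the current environment
# and then passes its value as the first argument
median(scores <- c(4, 1, 9))    # 4
scores                          # 4 1 9
```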

```r
x <- 5
x
#> [1] 5
```

Every object is an instance of a class, which defines the basic properties of the object. We can check the class of an object with the `class()` function.

```r
class(x)
#> [1] "numeric"
```

It is important to note that there is `no concept of a scalar` in R. In the code above, `x` is a numeric `vector` of length 1.
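A quick way to confirm this is to inspect the single number with a few base R functions:

```r
x <- 5
length(x)      # 1: a single number is a vector of length 1
is.vector(x)   # TRUE
x[1]           # indexing works just as it does for longer vectors
```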

## Vectors

A vector can only contain objects of the **same class**. <i>When objects of different classes are concatenated, they are coerced to a common class (more on this later).</i>

Vectors can be created in two ways:

i. `vector()` constructor

This creates a vector of the given mode and length, filled with the default value for that mode (`0` for integer, `FALSE` for logical).

```r
vector(mode = 'integer', length = 6)
#> [1] 0 0 0 0 0 0
vector(mode = 'logical', length = 6)
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE
```
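The same constructor works for other modes. It is also a common way to preallocate a vector before filling it in a loop. A sketch, where the name `squares` is just for illustration:

```r
vector(mode = 'character', length = 3)  # "" "" ""
vector(mode = 'numeric', length = 3)    # 0 0 0
vector(mode = 'list', length = 2)       # a list of two NULL elements

# Preallocate a numeric vector, then fill it element by element
squares <- vector(mode = 'numeric', length = 5)
for (i in 1:5) {
  squares[i] <- i^2
}
squares                                 # 1 4 9 16 25
```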

ii. `c()` concatenate function

```r
c(1, 2, 36)
#> [1]  1  2 36
```
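`c()` also joins existing vectors. It flattens all of its inputs into a single vector, as this sketch with illustrative names shows:

```r
first <- c(1, 2)
second <- c(3, 4, 5)
c(first, second)       # 1 2 3 4 5: the inputs are flattened into one vector
c(first, 10, second)   # single values and vectors can be mixed
```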

We now know that the vector is the most basic object in R and that it holds values of a single class. But what are the different classes?

### Classes

The most basic classes are:

1. Character or String
2. Numeric (real numbers)
3. Integer
4. Logical
5. Complex _(you will rarely use this)_

Let's look at each one and check its class with the built-in `class()` function.

i. Character

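```r
ch <- c('alpha', 'beta', 'gamma')  # named `ch` rather than `c` to avoid confusion with the c() function
ch
#> [1] "alpha" "beta"  "gamma"
class(ch)
#> [1] "character"
```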

ii. Numeric (real numbers)

```r
n <- c(0.5, 0.6)
n
#> [1] 0.5 0.6
class(n)
#> [1] "numeric"
```

Integer vectors holding a sequence can also be created with the `:` operator, which counts from the first number to the second in steps of 1.

```r
10:50
```
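A few properties of `:` are worth knowing. Here is a quick sketch, where the name `s` is arbitrary:

```r
s <- 10:50
class(s)    # "integer"
length(s)   # 41: both end points are included
5:1         # counts down when the first number is larger
```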

#### `seq()` function

The `:` operator always steps by 1. The `seq()` function creates sequences with a custom step. For example, `seq(10, 50, 5)` goes from 10 to 50 in steps of 5.

```r
seq(10, 50, 5)
#> [1] 10 15 20 25 30 35 40 45 50
```
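`seq()` can also be called with named arguments, which makes the intent clearer. A short sketch using the `by` and `length.out` arguments of base R's `seq()`:

```r
seq(10, 50, by = 5)          # same as seq(10, 50, 5): a fixed step
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00: a fixed number of values
seq(50, 10, by = -10)        # 50 40 30 20 10: a negative step counts down
```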

![](img/num_vectors.png)

iii. Integer

To create an integer vector, add the `L` suffix to each number. Otherwise, R stores it as numeric.


```r
i <- c(1L, 2L, 3L)
i
#> [1] 1 2 3
class(i)
#> [1] "integer"
```
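The difference is easy to check. This sketch compares a plain number with its `L` counterpart. `typeof()` shows the underlying storage type, which for numeric vectors is `double`:

```r
class(5)        # "numeric": a number without L is stored as a double
class(5L)       # "integer"
is.integer(5)   # FALSE
typeof(5)       # "double": the storage type behind the numeric class
```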

iv. Logical (`TRUE`/`FALSE`)

```r
l <- c(TRUE, FALSE)
# or, using the shorthands T and F
l <- c(T, F)
l
#> [1]  TRUE FALSE
class(l)
#> [1] "logical"
```
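In arithmetic, logical values behave like 1 (`TRUE`) and 0 (`FALSE`), which previews the coercion rules below. `T` and `F` are ordinary variables that can be overwritten, while `TRUE` and `FALSE` are reserved words, so the full spellings are safer. A quick sketch:

```r
l <- c(TRUE, FALSE, TRUE)
sum(l)    # 2: each TRUE counts as 1
mean(l)   # 0.6666667: the proportion of TRUE values

T <- 0    # T can be reassigned (TRUE cannot)...
isTRUE(T) # FALSE
rm(T)     # ...removing our copy restores the built-in T
isTRUE(T) # TRUE
```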

### Type Casting

Now let's address the question: what happens if we create a vector from values of different types?

Let's see a few examples.

```r
print(c(1, T))
#> [1] 1 1
print(c(1, F))
#> [1] 1 0

x <- c(1, F)
class(x)
#> [1] "numeric"
```


Here, the logical value was converted to numeric: `TRUE` became 1 and `FALSE` became 0.

```r
x <- c(1L, F)
class(x)
#> [1] "integer"
```

Now, the logical value was converted to integer.

##### What do you think will happen if we concatenate numeric and integer?


```r
x <- c(1L, 1.1)
class(x)
#> [1] "numeric"
```


This time, the integer was converted to numeric.

##### What if we concatenate numeric with character?

```r
x <- c('x', 1.1)
class(x)
#> [1] "character"
```

Oops, character it is. There seems to be a hierarchy. 

##### The coercion rule is `character <- complex <- numeric <- integer <- logical`.

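If you think about it, this rule makes complete sense. When classes are mixed, every value is converted to the class furthest to the left, the one that can represent all of them. For example, any number can be written as a string, but not every string is a number.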
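You can check the rule one step at a time. This sketch mixes neighbouring classes in the hierarchy and prints the resulting class:

```r
class(c(TRUE, 2L))   # "integer":   logical -> integer
class(c(2L, 3.5))    # "numeric":   integer -> numeric
class(c(3.5, 1i))    # "complex":   numeric -> complex
class(c(1i, "a"))    # "character": complex -> character
```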

**What we saw here is called `implicit coercion`: R picks a sensible class for the vector and converts the values for you.**

**Sometimes you might want to coerce a vector to a specific class yourself. This is called `explicit coercion`, and R provides the `as.*` functions for it, where `*` is replaced by the class you want to coerce to (for example `as.integer()` or `as.character()`).**

![](img/r_types.png)

Let's do an example


```r
x <- as.integer(c(1, 2, 3, 4, 5))  # numeric -> integer
x
#> [1] 1 2 3 4 5
class(x)
#> [1] "integer"
```
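Here are a few more `as.*` conversions, using base R functions. A value that cannot be converted becomes `NA`, and R warns you about it:

```r
as.numeric("3.14")          # 3.14
as.character(c(1, 2))       # "1" "2"
as.logical(c(0, 1, 2))      # FALSE TRUE TRUE: any non-zero number becomes TRUE
as.numeric(c("1", "two"))   # 1 NA, with the warning "NAs introduced by coercion"
```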

With explicit coercion, you take over from R the responsibility for choosing an appropriate type conversion.

Let's see an example.


```r
x <- as.integer(c(1L, 1.1))
x
#> [1] 1 1
class(x)
#> [1] "integer"
```


Left to itself, R would have coerced the integer `1L` to numeric. Instead, we asked R to coerce 1.1 to integer. R did just that, and the decimal part of the value was silently lost.

Sometimes this is exactly what you want, so go ahead, but be careful. To limit the risk, keep the original object and store the coerced result in a new variable.
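For instance, `as.integer()` truncates toward zero rather than rounding. This sketch keeps the original vector intact and compares truncation with `round()`. The variable names are illustrative.

```r
x <- c(1.1, 1.9, -1.9)
x_int <- as.integer(x)   # the coerced copy; x itself is unchanged
x_int                    # 1 1 -1: the decimal part is dropped (truncation toward zero)
as.integer(round(x))     # 1 2 -2: round first if you want the nearest integer
x                        # 1.1 1.9 -1.9: the original is still available
```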


The most basic object type, or data structure, in R is the vector. There are also various `special cases of vectors`:

1. Factors
2. Lists
3. Matrices
4. Dataframes

## 1. Factors

Factors are another type of vector, which hold `only categorical data` (e.g. gender, qualification).

![](img/r_factors.png)

```r
factor(c('Male', 'Female'))
#> [1] Male   Female
#> Levels: Female Male
```

This might seem ordinary, but factors are a very important object type and will be used extensively in the data modelling process.

Let us see an example with more values in the factor variable.


```r
f <- factor(c('Male', 'Female', 'Female', 'Female', 'Female', 'Male'))
f
#> [1] Male   Female Female Female Female Male
#> Levels: Female Male
```

**Levels** lists all unique values, sorted alphabetically by default. `table()` counts how often each level occurs:

```r
table(f)
#> f
#> Female   Male
#>      4      2
```

Internally, R stores a factor as a vector of integer codes, together with a `levels` attribute that maps each code to its label. This compact representation is very useful when dealing with datasets. We can view the underlying integer codes with the `unclass()` function.

```r
unclass(f)
#> [1] 2 1 1 1 1 2
#> attr(,"levels")
#> [1] "Female" "Male"
```

We can view the levels attribute on its own with the `levels()` function.

```r
levels(f)
#> [1] "Female" "Male"
```

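We can also alter the levels by assigning a character vector to `levels(f)`. Be careful: this **renames** the levels by position instead of reordering them. Below, level 1 (`Female`) is renamed `Male` and level 2 (`Male`) is renamed `Female`, so every value in `f` swaps its label.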

```r
levels(f) <- c('Male', 'Female')
f
#> [1] Female Male   Male   Male   Male   Female
#> Levels: Male Female
```
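To change only the order of the levels and keep every value as it was, pass the desired order to the `levels` argument of `factor()` instead. In this sketch, `f` is rebuilt first and the new object names are illustrative:

```r
f <- factor(c('Male', 'Female', 'Female', 'Female', 'Female', 'Male'))

# Reorder: same values, different level order
f_reordered <- factor(f, levels = c('Male', 'Female'))
levels(f_reordered)               # "Male" "Female"
as.character(f_reordered)         # same values as as.character(f)

# Rename: levels(x) <- ... relabels by position, here to short codes
f_renamed <- f
levels(f_renamed) <- c('F', 'M')  # level 1 "Female" -> "F", level 2 "Male" -> "M"
as.character(f_renamed)           # "M" "F" "F" "F" "F" "M"
```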

## 2. Lists

Lists are a special kind of vector that can contain objects of `different classes and different lengths`.

We can create lists with the `list()` constructor. Unlike atomic vectors, a single list element is extracted with double brackets `[[ ]]`. Single brackets `[ ]` return a smaller list.

![](img/r_lists.png)

```r
l <- list(a = 1:3, b = T, c = 'a')
l
#> $a
#> [1] 1 2 3
#>
#> $b
#> [1] TRUE
#>
#> $c
#> [1] "a"
#>
```

```r
class(l[[1]])  # [[ ]] returns the element itself
#> [1] "integer"
```

```r
class(l[1])    # [ ] returns a list containing that element
#> [1] "list"
```
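Named elements can also be extracted with `$`, and a list can contain other lists. This sketch rebuilds `l` and adds a nested element `d` for illustration:

```r
l <- list(a = 1:3, b = TRUE, c = 'a')
l$a                        # 1 2 3: `$` extracts an element by name
identical(l$a, l[['a']])   # TRUE: same as [[ ]] with the name
length(l)                  # 3: the number of elements

l$d <- list(x = 1)         # nested list added as a new element `d`
l$d$x                      # 1: chain `$` to reach inside nested lists
str(l)                     # compact overview of every element
```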

## 3. Matrices

Matrices are another type of vector, in which every element has the `same class` and every column has the `same length`.

A matrix also has a dimension attribute of length 2, giving (# of rows, # of columns). We can create matrices with the `matrix()` constructor.

![](img/r_matrices.png)


```r
m <- matrix(data = 1:6, nrow = 2, ncol = 3)
m
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
```


```r
dim(m)
#> [1] 2 3
```
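A matrix is a vector with a dimension attribute, so setting `dim()` on a plain vector turns it into a matrix. A short sketch:

```r
v <- 1:6
dim(v) <- c(2, 3)                               # add a dimension attribute
identical(v, matrix(1:6, nrow = 2, ncol = 3))   # TRUE
length(v)                                       # 6: still six elements underneath
dim(t(v))                                       # 3 2: t() swaps rows and columns
```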

By default, matrices are filled column-wise (top to bottom, then left to right). Setting the argument `byrow = TRUE` fills them row by row instead.

```r
m <- matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE)
m
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6
```


Vectors of the same class and length can be joined into a matrix with `rbind()`, which stacks them as rows, or `cbind()`, which places them side by side as columns.

```r
x <- 1:3
y <- 7:9

rbind(x, y)
#>   [,1] [,2] [,3]
#> x    1    2    3
#> y    7    8    9

cbind(x, y)
#>      x y
#> [1,] 1 7
#> [2,] 2 8
#> [3,] 3 9
```
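If the vectors have different classes, `cbind()` and `rbind()` still work. Because a matrix holds a single class, the usual coercion rule applies. A sketch with made-up vectors:

```r
x <- 1:3
ids <- c('a', 'b', 'c')
mixed <- cbind(x, ids)
typeof(mixed)   # "character": the integers were coerced to strings
mixed           # a 3 x 2 character matrix with columns "x" and "ids"
```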


## 4. Data Frames

Data frames are also a special case of vector: their columns have the `same length but can have different classes`. They are used to store tabular data.

**A data frame can be seen as a list whose elements (the columns) are vectors of the same length.**

![](img/r_dataframes.png)


```r
df <- data.frame(Id = 5:8, Purchase = c(T, T, F, F))  # Id: integer, Purchase: logical
df
#>   Id Purchase
#> 1  5     TRUE
#> 2  6     TRUE
#> 3  7    FALSE
#> 4  8    FALSE
```
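The list nature of a data frame is easy to confirm with a few base R functions:

```r
df <- data.frame(Id = 5:8, Purchase = c(TRUE, TRUE, FALSE, FALSE))
is.list(df)         # TRUE: a data frame is a list of columns
length(df)          # 2: the list has one element per column
sapply(df, class)   # the class of each column: Id is "integer", Purchase is "logical"
df[[2]]             # list-style [[ ]] extracts a single column as a vector
```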


As with matrices, we can use the `dim()` function on data frames. `nrow()` returns the number of rows and `ncol()` the number of columns.

```r
dim(df)
#> [1] 4 2
```

Apart from the dimensions, we can also get the row and column names with the `rownames()` and `colnames()` functions.

```r
rownames(df)
#> [1] "1" "2" "3" "4"
colnames(df)
#> [1] "Id"       "Purchase"
```
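Here is a sketch of `nrow()` and `ncol()` alongside `str()`, which prints one line per column with its class and first values:

```r
df <- data.frame(Id = 5:8, Purchase = c(TRUE, TRUE, FALSE, FALSE))
nrow(df)    # 4
ncol(df)    # 2
names(df)   # for a data frame, the same as colnames(df)
str(df)     # compact summary of every column
```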


___


## Slice / Subset

Once we have data in objects, it is also important to learn how to extract specific information from them. This is called subsetting or slicing, since we extract a slice of information from the entire data.

![](img/r_slice.png)

```r
v <- 1:5
l <- list(a = 1:3, b = T, c = 'x')
m <- matrix(1:8, nrow = 4, dimnames = list(c('x', 'y', 'z', 'u'), c('a', 'b')))
df <- as.data.frame(m)

v
#> [1] 1 2 3 4 5

l
#> $a
#> [1] 1 2 3
#>
#> $b
#> [1] TRUE
#>
#> $c
#> [1] "x"
#>

m
#>   a b
#> x 1 5
#> y 2 6
#> z 3 7
#> u 4 8

df
#>   a b
#> x 1 5
#> y 2 6
#> z 3 7
#> u 4 8
```


There are 3 ways of subsetting or slicing objects.

#### i. By index: a single `index` position or a `start:end` range.

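```r
v[4]
#> [1] 4
v[4:5]
#> [1] 4 5

l[1]    # Indexing with [ ] returns a list (here, a list of one element)
#> $a
#> [1] 1 2 3
#>
l[[1]]  # Indexing with [[ ]] returns the element itself (here, an integer vector)
#> [1] 1 2 3
l[1:2]
#> $a
#> [1] 1 2 3
#>
#> $b
#> [1] TRUE
#>
```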

```r
m[2, ]
#> a b
#> 2 6
df[2, ]
#>   a b
#> y 2 6

m[2, 1:2]
#> a b
#> 2 6
df[2, 1:2]
#>   a b
#> y 2 6

m[2, 2]
#> [1] 6
df[2, 2]
#> [1] 6
```

```r
df[, 2]
#> [1] 5 6 7 8
```
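`df[, 2]` returns a plain vector, not a data frame. When a single column is selected this way, R drops the data frame structure. This sketch rebuilds the same `df` and shows two ways to keep the structure:

```r
m <- matrix(1:8, nrow = 4, dimnames = list(c('x', 'y', 'z', 'u'), c('a', 'b')))
df <- as.data.frame(m)

df[, 2]                # a plain integer vector: the data frame structure is dropped
df[, 2, drop = FALSE]  # a one-column data frame, row names kept
df[2]                  # list-style indexing with one index also returns a data frame
```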

#### ii. By name: `row_name` or `column_name` (matrices and data frames).

```r
m
#>   a b
#> x 1 5
#> y 2 6
#> z 3 7
#> u 4 8

m['z', 'b']
#> [1] 7

df['a']        # single brackets with a name: still a data frame
#>   a
#> x 1
#> y 2
#> z 3
#> u 4

df$a           # these three forms return the column as a vector
#> [1] 1 2 3 4
df[['a']]
#> [1] 1 2 3 4
df[, 'a']
#> [1] 1 2 3 4

df[1:2, 'a']
#> [1] 1 2
```

#### iii. By logical condition: returns all values that satisfy the condition.

When we apply a comparison operator to a vector, R checks each element and returns a logical vector of the same length.


```r
v
#> [1] 1 2 3 4 5
v > 3
#> [1] FALSE FALSE FALSE  TRUE  TRUE
```

This logical vector can be used as an index, so only the elements at `TRUE` positions are returned.

```r
v[v > 3]
#> [1] 4 5
```
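You can combine conditions with `&` (and), `|` (or) and `!` (not). `which()` returns the positions where a condition holds. This sketch uses a different, made-up vector so positions and values are easy to tell apart:

```r
w <- c(10, 20, 30, 40, 50)
w[w > 10 & w < 50]    # 20 30 40: both conditions must hold
w[w == 10 | w == 50]  # 10 50: either condition may hold
w[!(w > 30)]          # 10 20 30: ! negates the condition
which(w > 30)         # 4 5: the positions where the condition is TRUE
```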

```r
m[m[, 'a'] > 2, ]
#>   a b
#> z 3 7
#> u 4 8

df[df$a > 2, ]
#>   a b
#> z 3 7
#> u 4 8
```


### INTERESTING TIP

Usually, your data frame will have a lot of columns. Picking 2 or 3 columns by name is easy, but what if you want all columns except 2 or 3? You can combine the not (`!`) operator with `%in%`.

`%in%` is R's value matching operator. `x %in% y` returns a logical vector, the same length as `x`, telling whether each value of `x` (the LHS) is present in `y` (the RHS). Negating it with `!` selects the columns whose names are *not* in the list.


```r
df[, !colnames(df) %in% 'a']  # only column b is left, so it is returned as a plain vector
#> [1] 5 6 7 8
```
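With more columns, the same pattern excludes several at once. This sketch uses a hypothetical `sales` data frame and also shows `setdiff()`, a base R alternative that returns the names of the columns to keep:

```r
# Hypothetical data frame with a few more columns
sales <- data.frame(id = 1:3, region = c('N', 'S', 'E'),
                    q1 = c(10, 20, 30), q2 = c(5, 15, 25))
drop_cols <- c('id', 'region')

sales[, !colnames(sales) %in% drop_cols]    # keeps q1 and q2
sales[setdiff(colnames(sales), drop_cols)]  # same columns, selected by name
```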

By the way, can you tell the difference between these two statements?

```r
1:3 %in% c(2, 4)

1:5 %in% 3:4
```


If you know the position of the column(s) you want to exclude, use a negative index with the `-` sign.

```r
df[-1]
#>   b
#> x 5
#> y 6
#> z 7
#> u 8
```


With this, we have covered a very important milestone in learning R.

___
