# Introduction to R Programming

R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques.

In addition to its extensive analytical packages, R's plotting capabilities are widely regarded as among the strongest available and compare favorably with most Python plotting libraries. Charts made with ggplot2 (a popular R graphics package) appear in many of the publications and papers you may read.

R programming is very different from Python because it is an analytical language, not one designed for software engineering. Thus, many of the constructs were built with analysis in mind.

In this lecture, we will go over basic syntax for R, the parallels between Python and R, and some of the key differences between the two languages.

### Basics

#### Data Types and Variable Assignment

R does several things differently from Python to reach the same result. Let's go through the basics. First, variables are assigned via "arrows" (`<-`) instead of equal signs. The equal sign *can* work, but there are situations where it will not, so defaulting to arrows should be your standard practice.
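
One situation where the two operators behave differently is inside a function call, where `=` names an argument rather than assigning a variable. The following is a minimal sketch of that behavior using the built-in `mean()`; it assumes no variable named `x` exists before the cell runs.

```r
# Inside a call, `=` binds the argument named x; no variable is created.
avg_eq <- mean(x = c(1, 2, 3))
x_exists_after_eq <- exists("x", inherits = FALSE)

# With an arrow, the vector is first assigned to x in the calling
# environment and then passed to mean() as its first argument.
avg_arrow <- mean(x <- c(1, 2, 3))
x_exists_after_arrow <- exists("x", inherits = FALSE)

print(x_exists_after_eq)
print(x_exists_after_arrow)
```

Both calls compute the same mean; only the second leaves a variable behind.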

##### Numerics
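By default, numeric literals in R are stored the same way (as double-precision values of class "numeric"), whether or not they have a decimal point. R does have a separate integer type, but unlike in Python you have to ask for it explicitly, for example with the `L` suffix (`1L`).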

In [None]:
#Assign numerics
a <- 1
b <- 1.1

In [None]:
print(class(a))
print(class(b))
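
For comparison, here is a short sketch of how R reports types once an integer is requested explicitly. The variable names are made up for illustration.

```r
# A literal without a suffix is stored as a double, reported as "numeric".
whole_double <- 1
# The L suffix requests integer storage explicitly.
whole_integer <- 1L

print(class(whole_double))   # "numeric"
print(class(whole_integer))  # "integer"
print(typeof(whole_double))  # "double"

# Mixing the two in arithmetic promotes the result to double.
mixed_sum <- whole_integer + 1.5
print(class(mixed_sum))      # "numeric"
```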

##### Characters
Character values in R are like strings in Python.

In [None]:
#Assign characters
c <- 'My name is Jason'
d <- "Quote types don't matter"

In [None]:
print(c)
print(d)

In [None]:
print(class(c))
print(class(d))

#### Vectors
Vectors in R are the closest analogue to lists in Python.

In [None]:
e <- c(1,2,4,7,9,1)
length(e)
print(e)
class(e)

Accessing an element of a vector works much like it does with Python lists. R, however, differs in two ways. First, indexing starts at 1 instead of 0. Second, the end of a range is *inclusive*.

In [None]:
#First element
e[1]

In [None]:
#Fifth element
e[5]

In [None]:
#Trying to access beyond what is available (returns NA instead of raising an error)
e[7]

In [None]:
#Accessing first through third
e[1:3]

In [None]:
#Accessing third through end
e[3:] #Nope! R has no open-ended ranges, so this is a syntax error

In [None]:
e[3:length(e)]
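
As a hedged alternative to spelling out `length(e)`, the sketch below shows two other base R ways to get "third through end". The result names are illustrative only.

```r
e <- c(1, 2, 4, 7, 9, 1)

# A negative index drops elements, so this removes positions 1 and 2.
# (In Python, a negative index counts from the end instead.)
from_third_drop <- e[-(1:2)]

# tail() with a negative n also returns everything except the first 2.
from_third_tail <- tail(e, -2)

print(from_third_drop)
print(from_third_tail)
```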

Note that the class reported for a vector is not "vector" but the type of the data contained IN the vector. Let's see what happens when we try other combinations of values.

In [None]:
f <- c('a', 'b', 'hello')
length(f)
print(f)
class(f)

Can we mix types like we can in Python?

In [None]:
g <- c('a', 1, FALSE)
length(g)
print(g)
class(g)

Uh oh! Much like pandas and NumPy will sometimes do to get all of the data into the same data type, R converts every element to the most general type present (here, character) without telling you. You should be very careful when creating vectors to ensure that you are not mixing data types.
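
The conversion follows a fixed order, from least to most general. The sketch below walks through that order with a few small vectors and then shows an explicit conversion back; the variable name is made up for illustration.

```r
# Coercion order: logical < integer < numeric (double) < character
print(class(c(TRUE, 1L)))        # "integer"
print(class(c(1L, 2.5)))         # "numeric"
print(class(c(1, FALSE)))        # "numeric"
print(class(c('a', 1, FALSE)))   # "character"

# Converting back is explicit; values that cannot be parsed become NA
# (R also emits a warning when that happens, suppressed here).
back_to_numeric <- suppressWarnings(as.numeric(c('1', '2', 'hello')))
print(back_to_numeric)
```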

#### Lists
Despite the name, lists in R are a hybrid of lists and dictionaries in Python. Unlike vectors, they are not restricted to holding a single data type. Elements are indexed by position by default, but those positions can also be given names.

In [None]:
h <- list('a', 1, c(1,2,3), FALSE)
length(h)
print(h)
class(h)

In [None]:
#Access the first position of the list (lists are ordered; single brackets return a one-element list)
h[1]

In [None]:
#Access first and third elements of the list
print(h[c(1,3)])

Now let's give these list indices some names.

In [None]:
names(h) <- c('first', 'second', 'third', 'fourth')

In [None]:
print(h)

In [None]:
h['first']

In [None]:
h$first

In [None]:
h[1]

In [None]:
#Error: h[2] is a one-element list, not a number
h[2] + 2

In [None]:
h$second + 2
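
The difference between the last two cells comes down to single versus double brackets. The sketch below uses `h_demo`, a hypothetical stand-in for the list above, to show what each form returns.

```r
# h_demo is a hypothetical stand-in for the named list built above.
h_demo <- list(first = 'a', second = 1, third = c(1, 2, 3), fourth = FALSE)

sub_list <- h_demo[2]     # single brackets: a list of length one
element  <- h_demo[[2]]   # double brackets: the value itself

print(class(sub_list))    # "list"
print(class(element))     # "numeric"

# Arithmetic works on the extracted value, by position or by name.
print(element + 2)
print(h_demo[['second']] + 2)
```

`h_demo$second` is shorthand for `h_demo[['second']]`, which is why the `$` form supports arithmetic directly.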

##### Variable Naming
Unlike in Python, variable names in R often contain periods as opposed to underscores. For example:

In [None]:
my_number <- 7

In [None]:
my_number.plus7 <- my_number + 7
my_number.plus7

In [None]:
my_number.plus7.another7 <- my_number.plus7 + 7
my_number.plus7.another7

While this would never work in Python, it is common practice for people coding in R.

### Operators

Many of the operators in R and Python are the same while others are different.

In [None]:
#addition
9 + 4

In [None]:
#multiplication
9 * 4

In [None]:
#division
9 / 4

In [None]:
#exponentiation
9 ^ 4
9 ** 4

In [None]:
#integer division
9 %/% 4

In [None]:
#modulo
9 %% 4

### Conditionals
Comparison operators are nearly identical to Python's. Note that boolean values in R are written in upper case (`TRUE`, `FALSE`) rather than title case (`True`, `False`).

In [None]:
3 == 3.0

In [None]:
3 < 3

In [None]:
3 <= 4

In [None]:
3 != 4

In [None]:
(4+3) >= (14/2)

In [None]:
1 %in% c(1,4,5)

In [None]:
is.element(1, c(2,3,4))

In [None]:
match(1, c(1,4,5)) #returns index of first match; else NA

In [None]:
match(1, c(2,3,4))
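
Because `match()` signals "no match" with `NA`, the result needs to be checked with `is.na()` before it is used as an index. A minimal sketch, with made-up variable names:

```r
candidates <- c(1, 4, 5)

# %in% is vectorized: one TRUE/FALSE per element on the left-hand side.
found <- c(1, 2, 5) %in% candidates
print(found)

# match() returns NA when nothing matches, so test with is.na() rather
# than comparing against NA with ==.
position <- match(2, candidates)
if (is.na(position)) {
    print('not found')
} else {
    print(position)
}
```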

### Control Flow in R

Conditional statements in R are quite similar to Python. However, the big difference is that indentation is no longer the determining factor in identifying code blocks. R uses curly braces to segment off sections of code. Let's try it below.

In [None]:
i <- 7

#print() is used here because return() is only valid inside a function
if (i %% 2 == 1) {
    print('odd')
} else {
    print('even')
}

In [None]:
i <- 7.1

if (i %% 2 == 1) {
    print('odd')
} else if (i %% 2 == 0) {
    print('even')
} else {
    print('not a whole number')
}

### For-Loops
The structure of for-loops in R is very similar to that of if statements: the loop header goes in parentheses and the body in curly braces.

In [None]:
for (i in 1:10) {
    print(i)
}

In [None]:
for (i in 1:10) { #NOTE: 1:10 is equivalent to range(1,11) in Python
    if (i %% 2 == 0) {
        print(c(i, 'even'))
    } else {
        print(c(i, 'odd'))
    }
}
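
Since most R operators work on whole vectors, the same labels can be produced without an explicit loop. The following is a sketch of that alternative using `ifelse()` and `paste()`, not a replacement for the loop above; the variable names are illustrative.

```r
numbers <- 1:10

# ifelse() evaluates the condition element by element and picks the
# second or third argument accordingly, so no explicit loop is needed.
parity <- ifelse(numbers %% 2 == 0, 'even', 'odd')

# paste() builds one string per element, e.g. "1 odd".
labelled <- paste(numbers, parity)
print(labelled)
```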

### Functions
Functions act much as they do in Python. They allow you to store procedures that are accessible by passing arguments (if any). Let's write a simple one here.

In [None]:
determine_if_even <- function(x) {
    if(x %% 2 == 0) {
        return(TRUE)
    } else {
        return(FALSE)
    }
}

In [None]:
j <- 7
determine_if_even(j)

In [None]:
for (i in 1:10) {
    #c() coerces TRUE/FALSE to 1/0 when combined with a number
    print(c(i, determine_if_even(i)))
}
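
Two further parallels with Python are default argument values and passing arguments by name. The sketch below illustrates both with `is_divisible`, a hypothetical helper that is not part of base R.

```r
# is_divisible is a hypothetical helper, not part of base R.
# divisor defaults to 2, so calling it with one argument tests for evenness.
is_divisible <- function(x, divisor = 2) {
    # The value of the last expression is returned automatically,
    # so an explicit return() is optional.
    x %% divisor == 0
}

print(is_divisible(8))               # uses the default divisor
print(is_divisible(9, divisor = 3))  # arguments can be passed by name
print(is_divisible(1:6))             # works element-wise on a vector
```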

### DataFrames

pandas DataFrames were modeled on R data frames. Some of their functionality is similar, but the syntax is very different. Let's go over a few of the basic commands.

In [None]:
df <- read.csv('students.csv')

In [None]:
head(df, 5)

In [None]:
tail(df, 10)

In [None]:
df[1,]

In [None]:
df[c(1,3,4),c('student_id', 'first', 'last')]

In [None]:
df[c(1,3,4), c(1,2,3)]

In [None]:
student.genders <- df$gender

In [None]:
male_df <- subset(df, gender=='Male')
head(male_df)

In [None]:
female_econ_df <- subset(df, gender=='Female' & major=='Economics')
head(female_econ_df)

In [None]:
male_engineering_honors <- subset(df, gender=='Male' & major=='Engineering' & gpa >= 3.7)
head(male_engineering_honors)
nrow(male_engineering_honors) #number of rows; length() on a data frame counts columns

In [None]:
mean(female_econ_df$gpa)
sd(female_econ_df$gpa)

In [None]:
female_econ_df$mean_gpa <- mean(female_econ_df$gpa)
head(female_econ_df)
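
The cells above depend on `students.csv`. For a version that runs without the file, the sketch below builds a tiny data frame by hand. The rows are made up for illustration and are not taken from `students.csv`; only the column names follow the ones used above.

```r
# Made-up rows for illustration only; they are not taken from students.csv.
toy_df <- data.frame(
    student_id = 1:4,
    gender = c('Male', 'Female', 'Female', 'Male'),
    major = c('Engineering', 'Economics', 'Economics', 'History'),
    gpa = c(3.8, 3.5, 3.9, 3.1)
)

# subset() and logical row indexing select the same rows.
via_subset <- subset(toy_df, gender == 'Female' & major == 'Economics')
via_index  <- toy_df[toy_df$gender == 'Female' & toy_df$major == 'Economics', ]

print(nrow(via_subset))    # number of rows
print(length(via_subset))  # number of columns, not rows
print(mean(via_subset$gpa))
```

The bracket form is closer to boolean masking in pandas, while `subset()` saves repeating the data frame name in every condition.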

# In-Class Exercise

In [None]:
data(ChickWeight)

In [None]:
head(ChickWeight)

In [None]:
subset(ChickWeight, Chick==1)

In [None]:
aggregate(x=ChickWeight$weight, by=list(ChickWeight$Time), FUN=mean)
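
`aggregate()` also accepts a formula, which some readers find easier to scan than the `x`/`by` form. A minimal sketch of the same grouped mean, with `mean_by_time` as an illustrative name:

```r
data(ChickWeight)

# "weight ~ Time" reads as: summarize weight, grouped by Time.
mean_by_time <- aggregate(weight ~ Time, data = ChickWeight, FUN = mean)
head(mean_by_time)

# The result keeps the original column names instead of Group.1 and x.
print(names(mean_by_time))
```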