# Introduction to R

In this introductory tutorial, we will explore some R basics. Topics include basic math, Booleans and logical operations, vectors, matrices, factors, data frames, lists, and stats examples. Getting help is simple: use the help command. For example, to get help on the sum function:

In [None]:
help(sum)

Examples for a command can be run by using the example command:

In [None]:
example(max)

## Basic math

Floating point math operations work the way you would expect, following C/C++ conventions and operator precedence. As a convenience, two equivalent exponentiation operators (\*\* and ^) are provided.

In [None]:
2.0^3 + 2.0^4 + (3.0*7.0 +1.0)/2.0

Most integer operations work as expected (addition, subtraction, multiplication, exponentiation).

In [None]:
1 + 2*3 + 3^2 - 7

Just beware that the / operator always performs floating point division, even when both operands are integers, unlike C/C++ and other languages where integer division truncates.

In [None]:
7/2

R provides a modulo operator, but it is written with two percent signs (%%) instead of the single percent sign used in most other languages.

In [None]:
8%%3
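
When a truncating division is what you want, R has a separate integer division operator, %/%, which pairs with %%. The following is a minimal sketch; the variable names q and r are chosen here for illustration only.

```R
# Integer (floor) division and remainder
q <- 7 %/% 2   # quotient
r <- 7 %% 2    # remainder
print(q)
print(r)

# The two operators are consistent: q * 2 + r recovers 7
print(q * 2 + r == 7)

# Integer literals take an L suffix; / still returns a double
print(typeof(7L / 2L))
```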

## Booleans and logical operations

R's Boolean values are TRUE and FALSE (all caps). These can also be accessed as T and F, but the important difference is that TRUE/FALSE are reserved logical constants while T/F are global variables that can be overwritten. Conjunction, disjunction and negation are done using &&, || and !.

In [None]:
TRUE && FALSE

In [None]:
TRUE || FALSE

In [None]:
!TRUE

In [None]:
3 > 5

In [None]:
4 < 7
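
The sketch below illustrates why TRUE/FALSE are safer than T/F: an ordinary assignment can mask T. It also shows the element-by-element operators & and |, which are not covered above but are the vector counterparts of && and ||. The names a and b are placeholders.

```R
# T is an ordinary variable, so it can be masked by an assignment
T <- FALSE
print(isTRUE(T))      # the new T hides the built-in one
rm(T)                 # remove the mask; T refers to the built-in value again
print(isTRUE(T))

# && and || compare single values; & and | work element by element
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)
print(a & b)
print(a | b)
```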

## Assignments

It takes a little getting used to, but R provides three different assignment operators: **=**, **->** and **<-**.

**<-** is the most general and can be used anywhere, while **=** has some restrictions. You'll generally see **<-** most often, with **=** used to assign values for optional function arguments.

R also has an assign function that allows you to do an assignment within an environment, but this is a more advanced feature.

In [None]:
x <- 5
x

In [None]:
6 -> x
x

In [None]:
x = 7
x

In [None]:
assign('x', 8)
x
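
One place where **=** and **<-** behave differently is inside a function call. The following sketch uses seq_len and its length.out argument; the variable names s1, s2 and n_items are made up for the example.

```R
# Inside a call, = binds a value to a named argument; no variable is created
s1 <- seq_len(length.out = 3)
print(exists("length.out"))

# <- inside a call assigns in the calling environment, then passes the value on
s2 <- seq_len(n_items <- 3)
print(n_items)
print(identical(s1, s2))
```

Assigning inside a call works, but it is easy to misread, so most code keeps **<-** on its own line and reserves **=** for arguments.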

## Comments

Comments extend from # to the end of the line. Unfortunately, R does not have a standard way to do multiline comments, although RStudio provides some functionality for this. A workaround in R is to use the following construct (we'll get to if statements later). Note that the enclosed code is still parsed, so it must be syntactically valid R.

```R
if (FALSE) {
    ...
}
```

In [None]:
# Comment on its own line
7+2 # Comment at end of line

## Vectors and Vector arithmetic

Vectors are the simplest R data structures. Examples of vector assignment are shown below using the c() function (think c = combine or concatenate).

In [None]:
x1 <- c(1,5,4)

In [None]:
x1

In [None]:
assign("x2",c(2,1,-1,4,5))

In [None]:
x2

In [None]:
c(4,4,1,2) -> x3

In [None]:
x3

### Vector elements have a consistent type

All elements of a vector have the same type. If a mix of types is provided, the elements will be converted to the most general type.

+ characters are more general than floating point numbers
+ floating point numbers are more general than integers

In [None]:
x4 <- c('a', 1, 1.2)
typeof(x4)

In [None]:
x4 <- c(1, 2, 3.4)
typeof(x4)
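
The conversion order can be checked directly with typeof. This sketch also includes logical values, which the list above does not mention but which sit below integers in the same ordering.

```R
# Each call mixes two types; typeof reports the type all elements were converted to
print(typeof(c(TRUE, 2L)))     # logical and integer
print(typeof(c(2L, 3.5)))      # integer and double
print(typeof(c(3.5, "a")))     # double and character

# Mixing all four: every element becomes a string
print(c(TRUE, 2L, 3.5, "a"))
```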

### Indexing starts at 1

Indexing for R objects (vectors, lists, etc.) starts with 1, just like Fortran and Matlab. Keep this in mind if you work back and forth between R and Python.

In [None]:
x4 <- c('a', 'b', 'c')
x4[1]

### Generating sequences with seq

The R seq function is used to generate sequences of values. With a single integer argument *n*, it returns the integers 1 through *n*; you can also specify an arbitrary start, end and stride.

In [None]:
seq(10)

In [None]:
seq(-1, 5, by=.5) -> seq1

In [None]:
seq1

### Operations on vectors are done element by element

In [None]:
y1 = 4*x1 + x1^2
print(x1)
print(y1)

What happens if we do operations with vectors of varying length?

In [None]:
y2=x1+2*x3
print(x1)
print(x3)
print(y2)

As you can see, the shorter vector is recycled (as often as needed).

R issues a warning when the longer vector's length is not a multiple of the shorter one's, as is the case here. In a notebook the warning is displayed with the output, but when running a script you might easily miss it. When developing applications, you may want to test vector lengths explicitly.
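
One way to test lengths is to wrap the operation in a small function that refuses mismatched inputs. The helper add_same_length below is hypothetical, written only for this illustration.

```R
# Hypothetical helper: add two vectors only if their lengths match
add_same_length <- function(a, b) {
    stopifnot(length(a) == length(b))
    a + b
}

print(add_same_length(c(1, 5, 4), c(4, 4, 1)))

# Mismatched lengths now raise an error instead of silently recycling
result <- tryCatch(
    add_same_length(c(1, 5, 4), c(4, 4, 1, 2)),
    error = function(e) "length mismatch"
)
print(result)
```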

### Other built-in R functions

Other common operations include log, exp, sin, cos, tan, sqrt. Basic statistical functions include mean(x) and var(x). The last cell below computes the sample variance by hand for comparison with var.

In [None]:
max(y2)

In [None]:
min(y2)

In [None]:
mean(y2)

In [None]:
var(y2)

In [None]:
sum((y2-mean(y2))^2)/(length(y2)-1)

Character vectors are also an option. Elements are enclosed in single or (preferably) double quotes and are comma delimited. For example:

In [None]:
cvector <- c('name1','name2','name3')

In [None]:
cvector

Index vectors can be used to select and modify subsets of a dataset. There are four types:

(a) Logical vectors

In [None]:
seq(-1, 5, by=.5) -> seq2

In [None]:
seq2

In [None]:
y <- seq2[seq2 > 0]

In [None]:
y

(b) Positive integers, which select a subset by position

In [None]:
seq2[2:5]

(c) Negative integers, which exclude a subset by position

In [None]:
seq2[-(2:5)]

(d) If an object has a names attribute, character vectors of names can be used

In [None]:
week <- c(1,2,3,4,5,6,7)

In [None]:
names(week) <- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")

In [None]:
weekend <- week[c("Saturday","Sunday")]

In [None]:
weekend
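
The examples above only select elements. The same index vectors can appear on the left-hand side of an assignment to modify a subset, as in this sketch (the names v and days are placeholders).

```R
# Start from the same sequence used above
v <- seq(-1, 5, by = 0.5)

# Logical index: set every negative element to zero
v[v < 0] <- 0
print(v[1:3])

# Name index: update a single named element
days <- c(Monday = 1, Tuesday = 2, Wednesday = 3)
days["Tuesday"] <- 20
print(days)
```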

## Listing objects

The R objects function lists all the objects that we have defined.

In [None]:
objects()

We can remove an object using the rm function.

In [None]:
a <- 1.2
objects()

In [None]:
rm(a)
objects()

## Arrays and Matrices

Arrays can have multiple subscripts, specified by the dimension vector (of positive integers). Matrices are a special case, i.e. a 2-dimensional array. The ordering is column major (like in Fortran).

In [None]:
h <- seq(1,12)

In [None]:
Matrix1 <- array(h, dim=c(4,3))

In [None]:
Matrix1

Just like index vectors, we have array indexing for extracting subsets of arrays and index matrices for extraction and assignment operations on a subset of the data. Each row of an index matrix gives the (row, column) position of one element.

In [None]:
indexm <- array(c(2,4,1,2,1,3),dim=c(3,2))

In [None]:
indexm

In [None]:
Matrix1[indexm]

In [None]:
Matrix1[indexm] <- -1

In [None]:
Matrix1

Several standard matrix operations are part of R. These include transpose, multiplication, inversion (linear equation solution), eigenvalues and eigenvectors, determinants, singular value decomposition, least squares fit, and QR decomposition.

In [None]:
svd(Matrix1)

In [None]:
AMAT <- array(1:12,dim=c(4,3))

In [None]:
AMAT

In [None]:
BMAT <- array(7:-4,dim=c(3,4))

In [None]:
BMAT

Note that the matrix multiplication operator is "%\*%". Using the "\*" operator will multiply the arrays element by element.

In [None]:
CMAT=AMAT %*% BMAT

In [None]:
CMAT

In [None]:
determinant(CMAT)
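
A few of the other operations listed above can be tried on a small invertible matrix. This is only a sketch: the matrix A and vector b are made up for the example, and det is used here because it returns the determinant as a plain number.

```R
# A small invertible matrix, filled column by column
A <- array(c(2, 1, 1, 3), dim = c(2, 2))
b <- c(3, 5)

print(t(A))            # transpose
print(det(A))          # determinant as a plain number
x <- solve(A, b)       # solve A %*% x = b without forming the inverse
print(x)
print(A %*% x)         # should reproduce b
print(A * A)           # element-by-element product, not a matrix product
```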

## Factors

Factors are data objects used to categorize data and store it as levels. Factors in R are stored as a vector of integer values with a corresponding set of character values (which are used when the factor is displayed). R has both ordered and unordered factors.

In [None]:
months = c(4,11,2,3,3,4,5,1,2,6,9,8,8,6,7,10,12)

In [None]:
fmons <- factor(months)

In [None]:
fmons

In [None]:
levels(fmons)

In [None]:
levels(fmons)=c('January','February','March','April','May','June','July','August','September','October','November','December')

In [None]:
fmons

In [None]:
tickets=c(4,5,1,0,7,4,27,2,55,2,11,3,8,10,9,22,3)

In [None]:
ticketav=tapply(tickets,fmons,mean)

In [None]:
ticketav
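
The months example uses an unordered factor. For comparison, here is a minimal sketch of an ordered factor, using made-up size labels; it also shows the integer codes stored underneath.

```R
# Hypothetical responses with a natural ordering
sizes <- c("small", "large", "medium", "small", "large")
fsizes <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)

print(fsizes)
print(as.integer(fsizes))      # the underlying integer codes
print(fsizes < "large")        # comparisons respect the level order
print(table(fsizes))           # counts per level
```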

## Branching

Branching is done using "if, else if, else" constructs.

In [None]:
# A simple if statement
n = 3
if (n < 5) {
    print("hello world")
}

In [None]:
# If, else
n = 3
if (n > 5) {
    print("expression is TRUE")
} else {
    print("expression is FALSE")
}

In [None]:
# If, else if, else
n = 3
if (n > 5) {
    print("n is greater than 5")
} else if (n == 5) {
    print("n is equal to 5")
} else {
    print("n is less than 5")
}

## Loops

Basic looping in R is done using a for loop. The syntax is

```R
for (x in iterable) {  
   loop body  
}
```

In [None]:
for (i in 1:5) {
    print(i)
}

Exiting a loop early and jumping to the next iteration is done using the break and next reserved words, respectively.

In [None]:
for (i in 1:5) {
    if (i == 3) break
    print(i)
}

In [None]:
for (i in 1:7) {
    if (i %% 3 == 0) next
    print(i)
}
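
The iterable in a for loop does not have to be a range of integers. As a sketch, the loop below walks over the elements of a vector directly, and a second loop uses seq_along to get the positions; kid_ages and total are names chosen for the example.

```R
kid_ages <- c(5, 6, 11, 18)

# Iterate over the elements directly
total <- 0
for (age in kid_ages) {
    total <- total + age
}
print(total)

# Iterate over positions; seq_along also behaves well for an empty vector
for (i in seq_along(kid_ages)) {
    print(paste("element", i, "is", kid_ages[i]))
}
```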

## Lists and Data Frames

An R list is an object consisting of an ordered collection of objects (components). The components don't have to be of the same kind and are numbered. Components can also be named, in which case a component can be referred to either by name or by number.

In [None]:
FamilyList <- list(name="John", wife="Jane", numberofkids=4, kids.ages=c(5,6,11,18), numcars=3, carmodels=c('Volvo 230','Ford Ranger','Ford Fiesta'))

In [None]:
FamilyList["carmodels"]

In [None]:
length(FamilyList)

In [None]:
AveKidAge=(FamilyList[[4]][1]+FamilyList[[4]][2]+FamilyList[[4]][3]+FamilyList[[4]][4])/4

In [None]:
AveKidAge
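
Single and double brackets behave differently on lists, which is easy to trip over. The sketch below uses a cut-down list (the name family is a placeholder) to show the difference, along with the $ shorthand.

```R
family <- list(name = "John", kids.ages = c(5, 6, 11, 18))

sub <- family["kids.ages"]      # single brackets return a list
ages <- family[["kids.ages"]]   # double brackets return the component itself
print(class(sub))
print(class(ages))
print(identical(ages, family$kids.ages))   # $ is shorthand for [[ ]] by name

# With the vector in hand, the average is a single call
print(mean(ages))
```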

A data frame is a list of vectors of equal length. The restrictions are: column names must be non-empty, row names should be unique, and each column must have the same number of data items. Components can be vectors (numeric, character, or logical), factors, numeric matrices, lists, or other data frames. For practical purposes, a data frame can be considered a matrix whose columns may have differing modes and attributes. Here is an example of a built-in data frame:

In [None]:
mtcars

There are a lot of options for slicing and dicing the information in a data frame. Some examples:

In [None]:
head(mtcars)

In [None]:
mtcars[['disp']]

In [None]:
mtcars[c(2,15),]

In [None]:
mtcars['Hornet Sportabout',]
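
Rows can also be selected with a logical condition, combining the logical index vectors from earlier with the row, column notation. A minimal sketch on mtcars follows; the name efficient is a placeholder.

```R
# Rows where cyl is 4 and mpg exceeds 30; keep only two columns
efficient <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, c("mpg", "hp")]
print(efficient)

# The $ operator extracts one column as a vector
print(mean(mtcars$mpg))
print(nrow(mtcars))
```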

Much more information and hands-on material on data frames is coming in tomorrow's talk on R. By the way, we used a built-in dataset above. To get the full list, you can run:

In [None]:
data()

## Statistical Models

R has a wide array of options to make fitting statistical models easy. This includes functions for extracting model information, analysis of variance and model comparison, generalized linear models, nonlinear least squares, and maximum likelihood models. The class of generalized linear models includes gaussian, binomial, poisson, inverse gaussian, and gamma response distributions. Quasi-likelihood models are an option where the response distribution is not specified. A simple glm example is provided below; more details are coming in other talks.

In [None]:
mlfit <- glm( mpg ~ cyl + disp + hp, data=mtcars, family=quasi)

In [None]:
summary(mlfit)
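
Beyond summary, the fitted object can be queried with the extractor functions mentioned above. This sketch refits the same model under the name fit so that it runs on its own; coef, residuals and anova are standard extractors, and the choice of which ones to show is ours.

```R
# Refit the model from above so this block is self-contained
fit <- glm(mpg ~ cyl + disp + hp, data = mtcars, family = quasi)

print(coef(fit))                 # estimated coefficients
print(head(residuals(fit), 3))   # first few residuals
print(anova(fit))                # sequential analysis of deviance
```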

In [None]:
help(rm)