# Exam Reference Page

## Basic Functions
- `sqrt( NAME )` : to take the square root of the value stored in NAME
- `abs( NAME )` : to take the absolute value of the value stored in NAME
- `NUM1 %% NUM2` : the remainder when we divide NUM1 by NUM2
- `exp( NUM )` : $e^{\texttt{NUM}}$, the number $e$ raised to the power NUM
- `log( NUM )` : the natural logarithm (log base $e$) of NUM
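
As a quick illustration of the functions above, here is a minimal sketch. The variable name `x` and the numbers are made up for this example.

```r
# x is a hypothetical variable; any number would work here
x <- -16

abs(x)         # absolute value of -16, which is 16
sqrt(abs(x))   # square root of 16, which is 4
17 %% 5        # remainder when 17 is divided by 5, which is 2
exp(1)         # e raised to the power 1, approximately 2.718282
log(exp(3))    # log base e undoes exp, so this is 3 (up to rounding)
```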

## Functions on Vectors
- `c( VALUE1, VALUE2, ... )` : to put values into one vector
- `NUM1:NUM2` : to make a vector of whole numbers counting from NUM1 to NUM2
- `length( NAME )` : to find the number of entries in the vector NAME
- `max( NAME )` : to find the largest value in the vector NAME
- `min( NAME )` : to find the smallest value in the vector NAME
- `sum( NAME )` : to find the sum of values in the vector NAME
- `mean( NAME )` : to find the average of values in the vector NAME
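
The vector functions above can be tried on a small example. The names `scores` and `count` and the values are assumptions chosen only for illustration.

```r
# scores is a hypothetical vector of six values
scores <- c(4, 8, 15, 16, 23, 42)

length(scores)   # number of entries: 6
max(scores)      # largest value: 42
min(scores)      # smallest value: 4
sum(scores)      # total: 108
mean(scores)     # average: 18

# the colon operator builds a vector of consecutive whole numbers
count <- 3:7     # 3 4 5 6 7
length(count)    # 5
```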

## Loading Data Sets
- `read.csv( 'FILENAME.csv')` : to "read" a csv file and import it as a data frame in R
- `read.csv( 'http://webaddress.etc' )` : to "read" a csv file from a URL (web address) and import it as a data frame in R
- `data.frame( COLUMNNAME1 = VECTOR1, COLUMNNAME2 = VECTOR2, ...)` : to create a new data frame with
columns COLUMNNAME1, COLUMNNAME2 (and possibly more or fewer columns), where the entries in
COLUMNNAME1 are specified in VECTOR1, the entries in COLUMNNAME2 are specified in VECTOR2, etc.
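
A minimal sketch of building a data frame and reading one back from a csv file. The data frame `pets` and its columns are made up. The functions `tempfile()` and `write.csv()` are not part of this reference page; they are used here only so that there is a file for `read.csv()` to read.

```r
# pets is a hypothetical data frame with two columns
pets <- data.frame(
  name = c("Rex", "Tom", "Polly"),
  legs = c(4, 4, 2)
)

# Assumption for the example: save the data frame to a temporary csv file
path <- tempfile(fileext = ".csv")
write.csv(pets, path, row.names = FALSE)

# read.csv() imports the file as a data frame
pets_again <- read.csv(path)
pets_again
```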

## Functions on Data Frames
- `dim( DATAFRAMENAME )` : to find the number of rows and columns of the data frame DATAFRAMENAME
- `names( DATAFRAMENAME )` : to find the column names of the data frame DATAFRAMENAME
- `str( DATAFRAMENAME)` : for an overview of the structure of the data frame
- `head( DATAFRAMENAME )` : to preview the first six rows of the data frame DATAFRAMENAME
- `head( DATAFRAMENAME, NUMBER)` : to preview the first NUMBER rows of the data frame DATAFRAMENAME

## `dplyr` Functions
- `mutate( DATAFRAMENAME, NEWCOLNAME = FORMULA )` : to create a new column called NEWCOLNAME based on the given instructions/formula
- `arrange( DATAFRAMENAME, COLUMNNAME )` : to sort rows by values in the column called COLUMNNAME in ascending order
- `arrange( DATAFRAMENAME , desc( COLUMNNAME ) )` : to sort rows by values in the column called COLUMNNAME in descending order
- `filter( DATAFRAMENAME, CRITERIA)` : to produce a new data frame that contains only the rows of the data
frame DATAFRAMENAME that satisfy the criteria specified in CRITERIA.
- `group_by( DATAFRAMENAME, COLUMNNAME )` : to group rows of DATAFRAMENAME by their values in the
column COLUMNNAME
- `summarize( GROUPEDDATAFRAMENAME, NEWCOLUMN = FORMULA )` : to compute a summary quantity from
grouped data GROUPEDDATAFRAMENAME (usually the output of `group_by()`), where the summary quantity is
stored in a new column called NEWCOLUMN (this name is up to you).
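
The `dplyr` functions above are often used one after another. The following is a sketch on a made-up data frame; `sales` and all column names are assumptions for this example, and the `dplyr` package must be installed.

```r
library(dplyr)

# sales is a hypothetical data frame
sales <- data.frame(
  store = c("A", "A", "B", "B", "B"),
  units = c(10, 20, 5, 15, 40),
  price = c(2, 2, 3, 3, 3)
)

# add a new column computed from existing columns
with_revenue <- mutate(sales, revenue = units * price)

# sort rows from largest to smallest revenue
sorted <- arrange(with_revenue, desc(revenue))

# keep only the rows with revenue above 30
big <- filter(with_revenue, revenue > 30)

# group by store, then compute one total per store
by_store <- group_by(with_revenue, store)
totals <- summarize(by_store, total_revenue = sum(revenue))
totals
```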


## `ggplot2` Functions
- `ggplot( DATAFRAMENAME, aes( x = COLUMNNAME1, y = COLUMNNAME2 ) ) + geom_point()` : to
create a scatterplot with data in DATAFRAMENAME, with COLUMNNAME1 on the x-axis and COLUMNNAME2 on
the y-axis
- `ggplot( DATAFRAMENAME, aes( x = COLUMNNAME ) ) + geom_bar()` : to create a bar chart with data in
DATAFRAMENAME, with COLUMNNAME on the x-axis and the number of observations on the y-axis
- `ggplot( DATAFRAMENAME, aes( x = COLUMNNAME) ) + geom_histogram()` : to create a histogram of
the data in column COLUMNNAME in the data frame DATAFRAMENAME; the default number of bins is 30
- `ggplot( DATAFRAMENAME, aes( x = COLUMNNAME) ) + geom_histogram( bins = NUMBER )` : to
create a histogram of the data in column COLUMNNAME in the data frame DATAFRAMENAME, with the number
of bins equal to NUMBER
- `ggplot( DATAFRAMENAME, aes( x = COLUMNNAME) ) + geom_histogram( breaks = LIST )` : to
create a histogram of the data in column COLUMNNAME in the data frame DATAFRAMENAME, where the bin
edges are given by the vector LIST
- The aesthetic mapping for a particular geometry can be set by including `mapping = aes( ... )` inside the geom function.
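
The plot types above can be sketched on a small made-up data frame. The name `people` and its columns are assumptions for this example, and the `ggplot2` package must be installed. Each plot is stored in a variable; typing the variable name displays the plot.

```r
library(ggplot2)

# people is a hypothetical data frame
people <- data.frame(
  group  = c("a", "a", "a", "b", "b", "b"),
  height = c(152, 161, 168, 174, 181, 188),
  weight = c(50, 58, 63, 70, 77, 85)
)

# scatterplot of weight against height
scatter <- ggplot(people, aes(x = height, y = weight)) + geom_point()

# bar chart counting the observations in each group
bars <- ggplot(people, aes(x = group)) + geom_bar()

# histogram with a chosen number of bins
hist_bins <- ggplot(people, aes(x = height)) + geom_histogram(bins = 5)

# histogram with bin edges given explicitly as a vector
hist_breaks <- ggplot(people, aes(x = height)) +
  geom_histogram(breaks = c(150, 160, 170, 180, 190))

# the same scatterplot, with the mapping set inside the geom function
scatter_again <- ggplot(people) +
  geom_point(mapping = aes(x = height, y = weight))

scatter   # display one of the plots
```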

### Color in plots
- `color = COLUMNNAME` inside `aes()` to make the color of points or border of rectangles depend on a
variable.
- `color = 'COLORNAME'` inside of `geom_PLOTTYPE()` to set the color of points or border of rectangles to
the named color.
- Substitute `fill` for `color` to change the color of the inside of rectangles.
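
The difference between the color options can be seen in a short sketch. The data frame `people`, its columns, and the color names chosen are assumptions for this example.

```r
library(ggplot2)

# people is a hypothetical data frame
people <- data.frame(
  group  = c("a", "a", "a", "b", "b", "b"),
  height = c(152, 161, 168, 174, 181, 188),
  weight = c(50, 58, 63, 70, 77, 85)
)

# color depends on a variable: it goes inside aes() and a legend is added
by_group <- ggplot(people, aes(x = height, y = weight, color = group)) +
  geom_point()

# one fixed color for every point: it goes inside the geom function, in quotes
all_red <- ggplot(people, aes(x = height, y = weight)) +
  geom_point(color = 'red')

# for rectangles, color sets the border and fill sets the inside
filled <- ggplot(people, aes(x = height)) +
  geom_histogram(bins = 5, color = 'black', fill = 'steelblue')

by_group   # display one of the plots
```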

## Boolean Expressions
- `BOOLEAN1 & BOOLEAN2` "and" evaluates to TRUE when both BOOLEAN1 and BOOLEAN2 are TRUE.
- `BOOLEAN1 | BOOLEAN2` "or" evaluates to FALSE when both BOOLEAN1 and BOOLEAN2 are FALSE.
- `! BOOLEAN` "not" evaluates to TRUE when BOOLEAN is FALSE, and to FALSE when BOOLEAN is TRUE.
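
A few worked cases, using a made-up value `x` (an assumption for this example):

```r
x <- 7   # hypothetical value

(x > 5) & (x < 10)   # TRUE:  both sides are TRUE
(x > 5) & (x > 10)   # FALSE: the right side is FALSE
(x < 5) | (x > 10)   # FALSE: both sides are FALSE
(x < 5) | (x < 10)   # TRUE:  the right side is TRUE
!(x > 5)             # FALSE: "not" flips TRUE to FALSE
```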

## Conditionals
```
if( CONDITION ){
 ( ... task to be completed if CONDITION is TRUE ... )
} else {
 ( ... task to be completed if CONDITION is FALSE ... )
}
```
- CONDITION is a boolean expression; it evaluates to either TRUE or FALSE
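
A small filled-in version of the template above. The variable names `temperature` and `advice` and the cutoff of 15 are assumptions for this example.

```r
temperature <- 12   # hypothetical value

if (temperature < 15) {
  # runs because 12 < 15 is TRUE
  advice <- "wear a coat"
} else {
  # would run only if the condition were FALSE
  advice <- "no coat needed"
}

advice   # "wear a coat"
```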

## Creating Functions
```
NEWFUNCTIONNAME <- function( INPUT1, INPUT2, ... ){
    CODE
}
```
- NEWFUNCTIONNAME is the name of the new function
- INPUT1, INPUT2, etc. are the inputs of the function
- CODE is the R code that is executed each time the function is called

- `FUNCTIONNAME <- Vectorize(FUNCTIONNAME)` makes the function FUNCTIONNAME evaluate on each value in a vector individually. 
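
The following sketch shows why `Vectorize()` is useful. The function name `parity` is made up for this example. A function built around `if` expects a single value, so it needs `Vectorize()` before it can be applied to a whole vector.

```r
# parity is a hypothetical function that labels one whole number
parity <- function(n) {
  if (n %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

parity(4)   # "even"

# if() needs a single TRUE or FALSE, so parity(1:4) would not work as intended
# (in recent versions of R it is an error). Vectorize() fixes this:
parity <- Vectorize(parity)

parity(1:4)   # "odd" "even" "odd" "even"
```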

## Models
- `knn( TRAININGDATAFRAME, TESTDATAFRAME, CLASSOFTRAININGDATA, k = NUMBER )` : to predict the class of each row of TESTDATAFRAME using the k-Nearest Neighbor classifier with k = NUMBER (`knn()` comes from the `class` package).
- `ggplot( DATAFRAMENAME, aes( x = COLUMNNAME1, y = COLUMNNAME2) ) + geom_smooth( method = 'lm', formula = 'y~x' )` plots the best fit linear model for values in COLUMNNAME2 as a function of those in COLUMNNAME1
- `lm( FORMULA , DATAFRAMENAME)` outputs the best fit model for the given formula and data
- Use the formula `y ~ x` for a linear model, `y ~ x + I(x^2)` for a quadratic model, and `log(y) ~ x` for an exponential model, replacing `x` and `y` with column names of the data frame
- `MODELNAME$coefficients[[ N ]]` to extract the Nth coefficient of the model MODELNAME
- `MODELNAME$fitted.values` the outputs predicted by the model MODELNAME on the original set of inputs
- `MODELNAME$residuals` the residuals for the model MODELNAME
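
A minimal sketch of fitting models with `lm()` and reading off the pieces listed above. The data frame `study`, its columns, and the model names are assumptions made up for this example.

```r
# study is a hypothetical data frame
study <- data.frame(
  hours = c(1, 2, 3, 4, 5),
  score = c(52, 61, 68, 79, 90)
)

# linear model: score as a function of hours
linear_model <- lm(score ~ hours, study)

linear_model$coefficients[[1]]   # first coefficient: the intercept
linear_model$coefficients[[2]]   # second coefficient: the slope
linear_model$fitted.values       # predicted score for each row of study
linear_model$residuals           # actual score minus predicted score

# the other two formula shapes, fitted to the same data
quadratic_model   <- lm(score ~ hours + I(hours^2), study)
exponential_model <- lm(log(score) ~ hours, study)
```

Note that the exponential model is fitted to `log(score)`, so its fitted values are on the log scale; applying `exp()` to them converts back to the original scale.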