# Preface

[GNU project's FAQ on the GPL](http://www.gnu.org/licenses/gplfaq) - Handy when it comes to the licensing of extensions written for R (when you write code that is only interpreted by R, the license restrictions do not apply to it)

# Part I. R Basics

- installing
- packages (installing, usage)
- quick tutorial
- overview of R features

## Chapter 1. Getting and Installing R

## Chapter 2. The R User Interface

### Command line editing

Lets the user type commands into the R interactive shell and get the computed results immediately

### Batch mode 

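Provides a way to run a large set of commands in sequence, using either `R CMD BATCH` (which saves the output to a file, `generate_graphs.Rout` for the script below) or `Rscript` (which prints the output to the terminal)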

```
R CMD BATCH generate_graphs.R
```
or
```
Rscript generate_graphs.R
```

#### Executable batch file

Contents (mind the shebang)
```
#!/usr/bin/env Rscript

print("Hello world!");
```
Commands
```
# making executable
chmod +x hello_world.R

# executing
./hello_world.R
```

#### Executing batch file from inside R

Use the `source` function, passing it the path to the script file
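
A minimal sketch of how `source` behaves. The script is written to a temporary file only so that the example runs anywhere; the file content and the variable name `greeting` are made up for this illustration.

```r
# Write a tiny script to a temporary file (hypothetical content, for illustration only)
script_path <- tempfile(fileext = ".R")
writeLines('greeting <- "Hello world!"', script_path)

# Run the script inside the current R session;
# objects it creates are available afterwards
source(script_path)
print(greeting)
```

Unlike batch mode, no new R process is started: the commands run in the session that called `source`.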

### R and other technologies (some libs are obsolete)

- MS Excel (many packages)
- rApache (R analysis in web app)
- Rserve (binary R server for multiple users)
- ESS - Emacs Speaks Statistics (package for using R inside Emacs)

## Chapter 3. A Short R Tutorial

In R `"Hello world."` is a character vector of length 1.

`c("Hello","World")` is a character vector of length 2.

To test it, type:

```
length(c('hello','hello'))
```

R therefore has no separate string type: a string is just an element of a character vector, and one character vector can hold more than one string.
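
A short sketch of the difference between the number of elements in a character vector (`length`) and the number of characters in each element (`nchar`). The variable names `s` and `v` are chosen just for this example.

```r
s <- "Hello world."
length(s)   # number of elements in the vector
nchar(s)    # number of characters in its single element

v <- c("Hello", "World")
length(v)   # two elements
nchar(v)    # one character count per element
```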

### Functions

Functions take arguments and process them. They can take the form of `f(...)` or of an operator (e.g. addition `+`, exponentiation `^` or equality `==`)

Functions are assigned like variables
```
f <- function(x,y) {c(x+1, y+1)}
```
(to see the code of an existing function, type its name without `()`)
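
As a small sketch of the points above, the function `f` from the notes can be called like any other function, and an operator can be called in the same prefix form when its name is wrapped in backticks. The names `result` and `sum_prefix` are only illustrative.

```r
f <- function(x, y) {
  c(x + 1, y + 1)
}

# call the function with two arguments
result <- f(1, 2)
print(result)

# operators are functions too: backticks allow the prefix form
sum_prefix <- `+`(1, 2)
print(sum_prefix)

# typing the name without () prints the function definition
f
```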

### Data Structures

Array - a vector of a single data type with an associated dimension attribute

  - can have more than 2 dimensions
  
```
a <- array(c(1:12), dim=c(3,4))
```
Matrix - a two-dimensional array
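
A brief sketch of how arrays and matrices relate, assuming nothing beyond base R. The three-dimensional array `b` and the matrix `m` are extra objects introduced only for this illustration.

```r
# the array from the notes: 12 values arranged as 3 rows by 4 columns
a <- array(1:12, dim = c(3, 4))
dim(a)
is.matrix(a)   # a two-dimensional array is a matrix
a[2, 3]        # row 2, column 3 (values are filled column by column)

# an array with three dimensions
b <- array(1:24, dim = c(2, 3, 4))
dim(b)
b[1, 1, 2]     # first row and column of the second "slice"

# the same values built with matrix()
m <- matrix(1:12, nrow = 3)
all(a == m)
```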

List  

  - vector containing multiple objects of possibly various data types
  - each component in a list can be named
  - objects can be referred to by location or by name
  - can contain other lists

```
# a list containing two strings
e <- list(thing='hat', size='8.25')
e
```

```
e$thing
```

```
e[1]
```

```
e[[1]]
```

```
g <- list(element_1 = 'element one',other_list = e)
g
```
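
The difference between `e[1]` and `e[[1]]` is easy to miss, so here is a small sketch using the same lists as above: single brackets return a shorter list, while double brackets (or `$`) return the component itself.

```r
e <- list(thing = "hat", size = "8.25")
g <- list(element_1 = "element one", other_list = e)

class(e[1])     # single brackets keep the list wrapper
class(e[[1]])   # double brackets return the component itself
names(e[1])     # the name travels with the sub-list

# nested lists can be reached by chaining names or positions
g$other_list$thing
g[["other_list"]][["size"]]
```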

Data frame

  - is itself a list
  - contains multiple vectors (possibly of different types)
  - every vector has to be the same length

```
# creating df
teams <- c('PHI','NYM','FLA','ATL','WSN')
w <- c(92,89,94,72,59)
l <- c(70,73,77,90,102)
nleast <- data.frame(teams,w,l)
nleast
```

| teams | w  | l   |
|-------|----|-----|
| PHI   | 92 | 70  |
| NYM   | 89 | 73  |
| FLA   | 94 | 77  |
| ATL   | 72 | 90  |
| WSN   | 59 | 102 |


```
# get number of losses by Florida Marlins
nleast$l[nleast$teams=='FLA']
```
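
A few more ways of working with the same data frame, as a sketch. The column name `win_pct` and the variables `winners` and `fla_losses` are invented for this example.

```r
teams <- c("PHI", "NYM", "FLA", "ATL", "WSN")
w <- c(92, 89, 94, 72, 59)
l <- c(70, 73, 77, 90, 102)
nleast <- data.frame(teams, w, l)

# add a derived column computed from two existing columns
nleast$win_pct <- nleast$w / (nleast$w + nleast$l)

# keep only the rows where a condition holds (note the trailing comma)
winners <- nleast[nleast$w > nleast$l, ]
winners

# a logical condition on one column selects values from another
fla_losses <- nleast$l[nleast$teams == "FLA"]
fla_losses
```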

### Objects and Classes

Functions available only for specific classes are called methods (duh), but the class system is less formal than in e.g. Java.

Methods for different classes that share the same name are called *generic* functions.

E.g. `+` is a generic function for adding objects, and `print()` is one for printing objects to the console (it is invoked every time a result is displayed in the R interactive shell).
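
A minimal sketch of a generic function with class-specific methods, using R's informal S3 system. The generic `describe` and the class `team` are hypothetical names made up for this illustration; they are not part of R.

```r
# the generic: dispatches on the class of its first argument
describe <- function(x, ...) UseMethod("describe")

# fallback used when no class-specific method exists
describe.default <- function(x, ...) "some object"

# method for objects of the (hypothetical) class "team"
describe.team <- function(x, ...) paste("team", x$name)

# an S3 object is just a value with a class attribute
phi <- structure(list(name = "PHI"), class = "team")

describe(phi)   # uses describe.team
describe(42)    # falls back to describe.default
```

The same mechanism is what lets `print()` behave differently for a data frame, a list and a fitted model.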

### Models and Formulas

*To statisticians, a model is a concise way to describe a set of data, usually with a mathematical formula. Sometimes, the goal is to build a predictive model with training data to predict values based on other data. Other times, the goal is to build a descriptive model that helps you understand the data better.*

*R has a special notation for describing relationships between variables. Suppose that you are assuming a linear model for a variable y, predicted from the variables x1, x2, ..., xn. (Statisticians usually refer to y as the dependent variable, and x1, x2, ..., xn as the independent variables.) In equation form, this implies a relationship like:*

$$y = c_0 + c_1 x_1 + c_2 x_2 + \dots + c_n x_n + \varepsilon$$

*In R, you would write the relationship as* `y ~ x1 + x2 + ... + xn`*, which is a formula object.*

#### Example:

The `lm` function estimates the parameters of a linear model. It returns an object of class `lm`, which is assigned here to the variable `cars.lm`

```
cars.lm <- lm(formula=dist~speed, data=cars)
cars.lm
```

```

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept)        speed  
    -17.579        3.932  

```

For more detail, call `summary(cars.lm)`

```
summary(cars.lm)
```

```

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-29.069  -9.525  -2.272   9.215  43.201 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -17.5791     6.7584  -2.601   0.0123 *  
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared:  0.6511,	Adjusted R-squared:  0.6438 
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
```


The fit and the summary can be done in one step if preferred (this prints the same summary as `summary(cars.lm)`)

```
summary(lm(dist~speed, cars))
```
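
Beyond printing, the fitted model object can be queried directly. The sketch below uses the standard `coef` and `predict` functions on the same model; the data frame `new_speeds` and its values are made up for this example.

```r
cars.lm <- lm(dist ~ speed, data = cars)

# the object carries its class, which is what print() and summary() dispatch on
class(cars.lm)

# estimated intercept and slope as a named numeric vector
coef(cars.lm)

# predicted stopping distances for speeds not necessarily in the data
new_speeds <- data.frame(speed = c(10, 20))
predicted <- predict(cars.lm, newdata = new_speeds)
predicted
```

The column name in `new_speeds` has to match the variable name used on the right-hand side of the formula.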


### Charts and Graphics

Most useful packages: `graphics`, `lattice`
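
A minimal sketch of the same scatter plot drawn with each of the two packages, assuming `lattice` is installed (it normally ships with R as a recommended package). The built-in `cars` data set is used only as convenient example data.

```r
library(lattice)

# graphics: plot() draws straight onto the active device,
# and further calls add to the existing picture
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
abline(lm(dist ~ speed, data = cars))

# lattice: xyplot() takes a formula and returns a "trellis" object,
# which is drawn when it is printed
p <- xyplot(dist ~ speed, data = cars)
print(p)
```

When run non-interactively (e.g. with `Rscript`), the plots go to a file device instead of a window.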