# A quick R tutorial

## Installation

### Installing R into a conda environment
The following `.yml` file creates a new conda environment with R and several Python packages:

```
name: R
channels:
  - defaults
  - conda-forge
  - r
dependencies:
  - python=3.7.*
  - pip
  - numpy
  - scipy
  - pandas
  - scikit-learn
  - matplotlib
  - seaborn
  - openpyxl
  - jupyter
  - jupyterlab
    
  
  # R
  - r-base
  - r-irkernel
```

### Installing R packages
Packages can be installed directly from within R (the console, Jupyter, or an IDE).   
First, install several basic packages plus IRkernel (only needed if you want to use Jupyter).

In [None]:
install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest'))
devtools::install_github('IRkernel/IRkernel')
IRkernel::installspec() # install IRkernel for the current user

The following commands show how to install packages in general:
- One at a time
- Multiple at a time

In [27]:
install.packages("ggplot2")
install.packages(c("reshape2", "dplyr"))

Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done

also installing the dependencies ‘plyr’, ‘tidyselect’, ‘plogr’


Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done



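Updating works as follows. The first positional argument of `update.packages()` is `lib.loc` (a library path), so the package names have to be passed through `oldPkgs`:

In [28]:
update.packages(oldPkgs = c("ggplot2", "reshape2", "dplyr"))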
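
To avoid reinstalling packages that are already present, one option is to check what is missing first. The helper below is a minimal sketch: `missing_packages` is a hypothetical name (not part of base R), and it compares the requested names against the row names of `installed.packages()`.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

wanted <- c("ggplot2", "reshape2", "dplyr")
to_install <- missing_packages(wanted)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Base packages such as `stats` and `utils` are always reported as installed, so `missing_packages(c("stats", "utils"))` returns an empty character vector.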

## Check up

In [22]:
myString <- "Hello, World!"
print(myString)

[1] "Hello, World!"


In [24]:
# get version and path of the R interpreter
print(R.version.string) 
print(R.home())

[1] "R version 3.6.3 (2020-02-29)"
[1] "/usr/local/anaconda3/envs/R/lib/R"


## Packages

### Importing

In [26]:
library(ggplot2)

### List Installed packages
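These are the user-installed packages (neither base nor recommended, i.e. without a `Priority`). PyCharm shows this list.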

In [14]:
ip = as.data.frame(installed.packages()[,c(1,3:4)])
ip = ip[is.na(ip$Priority),1:2,drop=FALSE]
ip

           Package    Version
askpass    askpass    1.1
assertthat assertthat 0.2.1
backports  backports  1.1.6
base64enc  base64enc  0.1-3
BH         BH         1.72.0-3
brew       brew       1.0-6
callr      callr      3.4.3
cli        cli        2.0.2
clipr      clipr      0.7.0
commonmark commonmark 1.7


### List all packages (pre-installed + installed)
All packages you can import. RStudio shows this list.  

In [15]:
library()

R packages available

Packages in library ‘/usr/local/anaconda3/envs/R/lib/R/library’:

askpass                 Safe Password Entry for R, Git, and SSH
assertthat              Easy Pre and Post Assertions
backports               Reimplementations of Functions Introduced Since
                        R-3.0.0
base                    The R Base Package
base64enc               Tools for base64 encoding
BH                      Boost C++ Header Files
boot                    Bootstrap Functions (Originally by Angelo Canty
                        for S)
brew                    Templating Framework for Report Generation
callr                   Call R from R
class                   Functions for Classification
cli                     Helpers for Developing Command Line Interfaces
clipr                   Read and Write from the System Clipboard
cluster                 "Finding Groups in Data": Cluster Analysis
                        Extended Rousseeuw et al.
codetools               Code Analysis

Note that the `datasets` package is preinstalled, so you can play with some small datasets, for example `iris`, without installing any new packages.

## Data structures

### Vectors
- the most important data structure in R
- scalars are vectors of length 1
- one-dimensional, ordered data structures: `c()` flattens any vectors passed to it
- a vector is a container, not a geometric vector
- vector recycling: in operations on vectors of different lengths, the shorter one is repeated
- indexing starts at 1

**Contain elements of the same type**  
 

In [48]:
b <- c(1,2,3)
b

If the input has different types, all elements are coerced to a common type (here, character strings).

In [42]:
a <- c(1,2,3, "c", c(1,2,3))
a
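
The coercion can be made visible with `class()`. This is a small illustrative sketch, not part of the original notebook; the variable name `x` is arbitrary.

```r
# Mixing types in c() coerces everything to the most general type.
class(c(TRUE, 2))      # "numeric": the logical is promoted to a number
class(c(1, 2, "c"))    # "character": the numbers become strings

x <- c(1, 2, "c")
print(x)               # "1" "2" "c"

# Converting back yields NA where the text is not a number
# (R also emits a warning, suppressed here).
back <- suppressWarnings(as.numeric(x))
print(back)            # 1 2 NA
```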

In [44]:
a[1:2]

In R, x[-n] returns a copy of x with the nth element removed.

In [43]:
a[-2]
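
The vector properties listed at the start of this section (flattening, recycling, 1-based indexing) can be sketched as follows. The variable names are illustrative only.

```r
v <- c(1, c(2, 3), 4)        # nested vectors are flattened
length(v)                    # 4
v[1]                         # 1: indexing starts at 1

recycled <- v + c(10, 20)    # the shorter vector is recycled
print(recycled)              # 11 22 13 24

odd_positions <- v[c(TRUE, FALSE)]   # a logical index is recycled as well
print(odd_positions)         # 1 3

length(5)                    # 1: a scalar is a vector of length 1
```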

### Lists
Can contain different types of objects including other lists, vectors, functions etc.

In [135]:
list1 = list(c(2,5,3),21.3,sin)
list1

In [138]:
# a smaller list
list1[1]

In [139]:
# the element itself
list1[[1]]
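
List elements can also be named, which allows access with `$`. A short sketch with a made-up list (`person` and its fields are illustrative, not from the original notebook):

```r
person <- list(name = "Ada", scores = c(2, 5, 3), f = sin)

person$name                 # "Ada"
person[["scores"]][2]       # 5
class(person["name"])       # "list": single brackets keep the list wrapper
class(person[["name"]])     # "character": double brackets extract the element
person$f(0)                 # 0: the stored function can be called directly
```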

### Sequences

In [45]:
seq(1,10,3)

The notation `a:b` is an abbreviation for `seq(a, b, 1)` when `a <= b`; when `a > b`, it counts down in steps of 1.
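
A few more ways to build sequences, as a brief sketch. It also shows a common pitfall: `1:0` is not empty, so `seq_len()` is the safer choice when the length may be zero.

```r
s1 <- seq(1, 10, by = 3)          # 1 4 7 10
s2 <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
s3 <- 3:1                         # 3 2 1: the colon counts down if needed
s4 <- seq_len(0)                  # integer(0): an empty sequence
s5 <- 1:0                         # 1 0: NOT empty

print(s1); print(s2); print(s3)
length(s4)                        # 0
length(s5)                        # 2
```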

### Matrices and arrays
- Both use the `array` data type; a matrix is just a 2-D array.
- Matrices are filled in column-major order. This can be changed with `byrow = TRUE` in the `matrix` function.
- Elements are accessed as usual with `[row, col]`; leaving one index empty selects the whole row or column.

In [60]:
m <- array( c(1,2,3,4,5,6), dim=c(2,3) )
m

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6


In [77]:
m[1,]

In [78]:
m[,1]

In [61]:
n <- matrix(c(1,2,3,4,5,6), nrow=2, ncol=3, byrow = TRUE)
n

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6


In [76]:
m[2,]
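
The effect of the fill order is easiest to see by reading the same position from both matrices. A minimal sketch that rebuilds `m` and `n` from the cells above, using `1:6` instead of `c(1,2,3,4,5,6)`:

```r
m <- array(1:6, dim = c(2, 3))             # filled column by column
n <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row

dim(m)       # 2 3
m[1, 2]      # 3: second column starts with the third value
n[1, 2]      # 2: first row holds the first three values
m[2, 3]      # 6
n[2, 3]      # 6: the last element is the same in both
```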

### DataFrames
similar to pandas DataFrames

In [80]:
BMI <- data.frame(
   gender = c("Male", "Male","Female"), 
   height = c(152, 171.5, 165), 
   weight = c(81,93, 78),
   Age = c(42,38,26)
)
print(BMI)

  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26


In [126]:
BMI$weight

In [129]:
BMI$weight[1:2]

In [128]:
BMI[1:2, "weight"]

In [130]:
BMI[1:2, c(FALSE, TRUE, FALSE, TRUE)]

  height Age
1  152.0  42
2  171.5  38
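
Two other common data frame operations are adding a derived column and filtering rows by a condition. The sketch below rebuilds the `BMI` data frame from above; it assumes that `height` is in centimetres and `weight` in kilograms, which the original does not state. The names `bmi` and `older` are illustrative.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age = c(42, 38, 26),
  stringsAsFactors = FALSE
)

# Derived column (assumed units): weight [kg] / (height [m])^2
BMI$bmi <- BMI$weight / (BMI$height / 100)^2

# Keep only rows with Age > 30, and only two of the columns
older <- BMI[BMI$Age > 30, c("gender", "Age")]
print(older)

# Compact overview of column types
str(BMI)
```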


### Functions

In [71]:
f <- function(a, b)
{
    return (a+b)
}

In [72]:
f(2,3)
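
Functions also support default values and named arguments, and the last evaluated expression is returned even without `return()`. A small sketch (the function `g` is made up for illustration):

```r
g <- function(a, b = 10) {
  a + b    # the last evaluated expression is the return value
}

g(2)             # 12: b falls back to its default
g(2, 3)          # 5
g(b = 1, a = 2)  # 3: named arguments can be given in any order
```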

### Lambdas

In [2]:
l1 <- (function(x,y) (x + 1) * y^3 > 0.3 )(1,3)
print(l1)
condition <- function(x,y) (x + 1) * y^3 > 0.3
l2 <- condition(1,-3)
print(l2)

[1] TRUE
[1] FALSE
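
Anonymous functions are most often passed straight to higher-order functions. A brief sketch using base R's `sapply`, `Filter` and `Reduce`; it uses the `function(x)` form, which also works on R 3.6.

```r
squares <- sapply(1:5, function(x) x^2)
print(squares)                                    # 1 4 9 16 25

evens <- Filter(function(x) x %% 2 == 0, 1:10)
print(evens)                                      # 2 4 6 8 10

total <- Reduce(function(acc, x) acc + x, 1:4)
print(total)                                      # 10
```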


### Factors
Factors are used to work with categorical variables such as "male"/"female" or "north"/"east"/"south"/"west".

In [119]:
# Create a vector as input.
data <- c("East","West","East","North","North","East","West","West","West","East","North")

print(data)
print(is.factor(data))

# Apply the factor function.
factor_data <- factor(data)

print(factor_data)
print(is.factor(factor_data))


 [1] "East"  "West"  "East"  "North" "North" "East"  "West"  "West"  "West" 
[10] "East"  "North"
[1] FALSE
 [1] East  West  East  North North East  West  West  West  East  North
Levels: East North West
[1] TRUE


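**Note:** In R versions before 4.0.0 (such as the 3.6.3 used here), `data.frame()` converts all text columns to factors by default (`stringsAsFactors = TRUE`). Since R 4.0.0 the default is `FALSE`.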
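
Internally, a factor stores integer codes that point into its levels. The sketch below shows the codes, a custom level order, and how to keep text as text in a data frame regardless of the R version. Variable names are illustrative.

```r
data <- c("East", "West", "East", "North")

f <- factor(data)
levels(f)          # "East" "North" "West": sorted alphabetically by default
as.integer(f)      # 1 3 1 2: codes pointing into the levels
table(f)           # counts per level

# Levels can be given explicitly to control their order
f2 <- factor(data, levels = c("North", "East", "West"))
as.integer(f2)     # 2 3 2 1

# Being explicit about stringsAsFactors avoids version-dependent behaviour
df <- data.frame(region = data, stringsAsFactors = FALSE)
class(df$region)   # "character"
```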

# Misc
- `args(fctn)` displays a function's arguments.
- `?fctn` and `help(fctn)` display help on any function `fctn`, as in Python.
- To invoke complex arithmetic, add `0i` to a number. For example, `sqrt(-1)` returns `NaN` (with a warning), but `sqrt(-1 + 0i)` returns `0+1i`.
- `sessionInfo()` prints the R version, OS, packages loaded, etc.
- `ls()` shows which objects are defined.
- `rm(list=ls())` clears all defined objects.
- `dev.new()` opens a new plotting window without overwriting the previous one.
- `sort()` returns a sorted copy and does not change its argument.
- **Distribution function prefixes `d`, `p`, `q`, `r` stand for density (PDF), probability (CDF), quantile (inverse CDF), and random sample. For example, `dnorm` is the density function of a normal random variable and `rnorm` generates a sample from a normal random variable. The corresponding functions for a uniform random variable are `dunif` and `runif`.**
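
A few of the points above, tried out on the standard normal distribution. This is a minimal sketch; the seed value is arbitrary and only makes the random draws repeatable.

```r
dnorm(0)                 # density at 0, i.e. 1 / sqrt(2 * pi)
p <- pnorm(1.96)         # CDF: roughly 0.975
q <- qnorm(0.975)        # quantile: roughly 1.96, the inverse of the line above

set.seed(1)              # arbitrary seed for reproducibility
draws <- rnorm(5)        # five random samples
length(draws)            # 5

# sort() leaves its argument untouched
v <- c(3, 1, 2)
s <- sort(v)
print(v)                 # 3 1 2
print(s)                 # 1 2 3

# complex arithmetic
z <- sqrt(-1 + 0i)
print(z)                 # 0+1i
```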

## Naming
- names can contain underscores, but cannot start with one
- names can contain dots and can start with a dot, as long as the dot is not followed by a digit  
Examples:
- `var_name2.`
- `cat.1`
- `cat.2`
- `cat.3`
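
One way to check whether a string is a valid name is to compare it with the output of base R's `make.names()`, which leaves valid names unchanged and rewrites invalid ones. The helper `is_valid_name` is a hypothetical name for this sketch.

```r
# Hypothetical helper: TRUE where make.names() would not change the name.
is_valid_name <- function(x) x == make.names(x)

candidates <- c("var_name2.", "cat.1", ".hidden", "_start", ".2way", "2cats")
valid <- is_valid_name(candidates)
print(setNames(valid, candidates))
# var_name2., cat.1 and .hidden are valid; _start, .2way and 2cats are not
```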

# More

In [85]:
# printing several statements
cat(1,"c", "\n") 
cat(2)

1 c 
2

In [86]:
class(c(2,3))

In [124]:
typeof(c(2,3))

In [87]:
class(2)

In [91]:
class(list(1,2))

In [125]:
typeof(list(1,2))

In [92]:
class(TRUE)
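
`class()` reports how an object behaves, `typeof()` how it is stored. The sketch below collects both for a few objects in one table; the list `objs` and its labels are made up for illustration.

```r
objs <- list(
  double  = c(2, 3),
  integer = 2L,
  list    = list(1, 2),
  logical = TRUE,
  fn      = sin
)

kinds <- data.frame(
  class  = sapply(objs, class),
  typeof = sapply(objs, typeof),
  stringsAsFactors = FALSE
)
print(kinds)
# c(2, 3) has class "numeric" but type "double";
# sin has class "function" but type "builtin"
```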

In [94]:
var.1 = 1
var.2 = 'b'
var.3 = list(1,2)

# list all variables
print(ls())

# list variables with `var` in their name
print(ls(pattern = "var"))  
rm(var.2)
print(ls(pattern = "var"))  

 [1] "a"        "b"        "BMI"      "f"        "ip"       "m"       
 [7] "myString" "n"        "var.1"    "var.2"    "var.3"   
[1] "var.1" "var.2" "var.3"
[1] "var.1" "var.3"


## Some useful operations
- `%in%` checks whether an element is contained in a vector, array, or list
- `%*%` does matrix multiplication
- `t()` transposes a matrix

In [95]:
# %in% operator
v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t)
print(v2 %in% t)

[1] TRUE
[1] FALSE


In [98]:
M = matrix( c(2,6,5,1,10,4), nrow = 2,ncol = 3,byrow = TRUE)
MM = M %*% t(M)
print(M)
print(t(M))
print(MM)

     [,1] [,2] [,3]
[1,]    2    6    5
[2,]    1   10    4
     [,1] [,2]
[1,]    2    1
[2,]    6   10
[3,]    5    4
     [,1] [,2]
[1,]   65   82
[2,]   82  117


In [101]:
2 %in% list(2,3, 'c')
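
`%in%` is vectorized over its left-hand side, and the related `match()` returns positions instead of `TRUE`/`FALSE`. A short sketch with made-up values:

```r
x <- c(1, 5, 12)

inside <- x %in% 1:10
print(inside)             # TRUE TRUE FALSE

which(inside)             # 1 2: positions in x that were found

pos <- match(x, 1:10)
print(pos)                # 1 5 NA: position in the table, NA if absent
```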

In [107]:
print(LETTERS)
for (i in LETTERS[2:5]) {
    print(i)
}

 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
[1] "B"
[1] "C"
[1] "D"
[1] "E"


In [108]:
for (i in -3:1) {
    print(i)
}

[1] -3
[1] -2
[1] -1
[1] 0
[1] 1
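
When looping over the positions of a vector, `seq_along()` is safer than `1:length(x)`, because the latter yields `1 0` for an empty vector. A minimal sketch with illustrative names:

```r
x <- c("a", "b", "c")
for (i in seq_along(x)) {
  cat(i, x[i], "\n")
}

empty <- c()
count <- 0
for (i in seq_along(empty)) {
  count <- count + 1      # never runs: seq_along(empty) has length 0
}
print(count)              # 0

bad_range <- 1:length(empty)
print(bad_range)          # 1 0: a loop over this would run twice
```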


## How Matrices work

In [110]:
# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)

# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)

# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")

P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)

     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
[4,]   12   13   14
     [,1] [,2] [,3]
[1,]    3    7   11
[2,]    4    8   12
[3,]    5    9   13
[4,]    6   10   14
     col1 col2 col3
row1    3    4    5
row2    6    7    8
row3    9   10   11
row4   12   13   14


In [113]:
N[2:3,]

     [,1] [,2] [,3]
[1,]    4    8   12
[2,]    5    9   13


In [115]:
P[2:3,-3]

     col1 col2
row2    6    7
row3    9   10
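
Named matrices can also be indexed by name, and selecting a single row or column drops the result to a plain vector unless `drop = FALSE` is given. The sketch rebuilds `P` from above, using `paste0()` to generate the same row and column names.

```r
P <- matrix(3:14, nrow = 4, byrow = TRUE,
            dimnames = list(paste0("row", 1:4), paste0("col", 1:3)))

P["row2", "col3"]              # 8
P[, "col1"]                    # named vector: 3 6 9 12

row_vec <- P[2, ]              # plain vector, dimensions are dropped
row_mat <- P[2, , drop = FALSE]  # still a 1 x 3 matrix

is.null(dim(row_vec))          # TRUE
dim(row_mat)                   # 1 3
```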


## Most operations on vectors, matrices, arrays etc. are elementwise
- `*`, `/`, `+`, `-`
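
The difference between the elementwise `*` and the matrix product `%*%` is easy to miss. A small sketch with made-up values:

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)
prod_ab <- a * b
print(prod_ab)            # 4 10 18

A <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)

elementwise <- A * A
print(elementwise)
#      [,1] [,2]
# [1,]    1    9
# [2,]    4   16

matprod <- A %*% A
print(matprod)
#      [,1] [,2]
# [1,]    7   15
# [2,]   10   22
```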