<center><h1><b>R LANGUAGE</b></h1></center>


## 00 - INTRODUCTION TO R

#### WHAT IS R
R is a powerful programming language and environment primarily used for statistical computing, data analysis, and visualization. It was developed in the early 1990s by Ross Ihaka and Robert Gentleman at the University of Auckland. Today, R is widely used in academia, research, and industry, particularly in fields like data science, finance, and bioinformatics.

One of R's key strengths is its extensive ecosystem of packages that provide tools for machine learning, time series analysis, and graphical representation of data. R also excels in data visualization, thanks to libraries like ggplot2, which enable users to create complex and aesthetically pleasing charts.

Although designed primarily for statistical tasks, R is free, open-source software and a full-featured programming language that supports both object-oriented and functional paradigms. It integrates well with other languages such as Python and C++, making it a versatile choice for data-driven projects.

For more info see [here](https://www.r-project.org/).

#### HOW CAN I USE IT
R can be used in multiple environments:
* RStudio: The most popular integrated development environment (IDE) for R, providing a user-friendly interface, debugging tools, and built-in support for visualization and package management.
* Bash (Command Line): R can be run directly from the terminal or command prompt with the `R` command (interactive session) or the `Rscript` command (runs a script file), which allows automation and scripting. After installation, typing `R` in the shell is enough to start a session.
* Jupyter Notebook: By installing IRkernel, R can be used in Jupyter Notebooks, alongside Python and other languages.

#### MAIN DIFFERENCES WITH PYTHON
* R uses curly braces `{}` to define code blocks; indentation is optional
* R only supports `#` for single-line comments; there is no built-in multi-line comment syntax (Python code often uses triple-quoted strings for that purpose)
* R uses `<-` for variable assignment (`=` and the rightward form `->` also work in some cases)
* R installs packages with `install.packages()` from within the session (instead of `pip install` or `conda install` from the shell)
* two or more expressions can be placed on the same line if they are separated by `;`
* you can print an object while defining it by wrapping the assignment in parentheses, as in `(x <- 5)`
* vector indexing starts at 1!
* integer literals are written with the suffix `L` (= long), as in `x <- 2L`; without it, `2` is stored as a double
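
A minimal sketch of several of the points above. The variable names (`v`, `first`, `n`, `total`) are illustrative only.

```r
# Assignment with `<-`; wrapping it in parentheses also prints the value
(v <- c(10, 20, 30))

# Indexing starts at 1, so v[1] is the first element
first <- v[1]

# The `L` suffix creates an integer; a plain literal is a double
n <- 2L
typeof(n)      # "integer"
typeof(2)      # "double"

# Two expressions on one line, separated by `;`
total <- sum(v); total

# Curly braces delimit the block; the missing indentation is deliberate
if (total > 50) {
print("large")
}
```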

#### OTHER PECULIARITIES TO REMEMBER
* variable names are case sensitive : y different from Y
* variable names must not begin with numbers (4t) or symbols (%8)
* variable names must not contain blank spaces (use m.value instead of m value)

Both languages are powerful, but Python is more versatile, while R is specialized for statistical computing and data visualization.

---

## 01 - FUNDAMENTALS

#### GETTING HELP
You can ask R for help with:
* the symbol `?` before a command name, as in `?mean`
* `help.search("keyword")`: searches the documentation of the installed packages for a specific term. It is useful when you do not know the exact name of the function.
* `find("function_name")`: reports which attached packages define a given function.
* `apropos("keyword")`: lists the objects in the workspace and in the attached packages whose names match the keyword.
* `library(help=package_name)`: shows details about a package
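
As a small illustration, the two search helpers return plain character vectors, so their results can be stored and inspected. The variable names `hits` and `where` are made up for this sketch, and the exact list returned by `apropos()` depends on which packages are attached.

```r
# `?mean` or help("mean") would open the documentation page for mean()

# apropos() returns the names of visible objects matching a pattern
hits <- apropos("^mean")
print(hits)

# find() reports which attached package(s) define an object
where <- find("mean")
print(where)
```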

#### PACKAGES
To install a package use `install.packages("package_name")` (the name must be quoted).\
To update all packages use `update.packages(ask=FALSE)`.\
With `installed.packages()` we can see a list with all installed packages.\
To load an already installed package we use `library(package_name)`.

#### OBJECTS
To list all the objects created in the current session, use the `ls()` or `objects()` functions.\
To list all the packages and data frames currently attached to the running R session, use `search()`.\
To show the structure of any object (data, functions, ...) in a compact way, use the `str()` function.

---

## 02 - NUMBERS
The number $\pi$ is known as `pi`. 

Calculations can produce results that are infinite (`Inf`, `-Inf`) or indeterminate (`NaN`, "not a number"); both are still valid values of numeric type. We can test for them with `is.infinite(x)`, `is.finite(x)` and `is.nan(x)`, each of which returns a logical value.

There can also be missing values, represented by `NA` (= not available), which we check with `is.na(x)`.

In [65]:
3/0
Inf - Inf
typeof(Inf)
typeof(NaN)
is.finite(3/0)
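
The difference between the four tests is easiest to see on a single vector. This is a sketch; the `na.rm` argument of `mean()` is not discussed above and is shown only as a common way to skip missing values.

```r
x <- c(1, NA, 0/0, 1/0)   # a regular number, NA, NaN, Inf

is.na(x)        # FALSE  TRUE  TRUE FALSE  (TRUE for both NA and NaN)
is.nan(x)       # FALSE FALSE  TRUE FALSE  (TRUE only for NaN)
is.infinite(x)  # FALSE FALSE FALSE  TRUE
is.finite(x)    #  TRUE FALSE FALSE FALSE

# Summaries propagate NA unless told to drop it
mean(c(1, NA, 3))                # NA
mean(c(1, NA, 3), na.rm = TRUE)  # 2
```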

#### MAIN OPERATIONS
* `+`, `-`, `*`, `/`: addition, subtraction, multiplication, division
* `%/%`, `%%`, `^`: integer quotient, modulo, power
* `>`, `>=`, `<`, `<=`, `==`, `!=`: relational operators
* `!`, `&`, `|`: logical not, and, or
* `~`: model formulae (‘is modelled as a function of’)
* `<-`, `->`: assignment (gets)
* `$`: list indexing (the ‘element name’ operator)
* `:`: sequence creation operator

Note: several of these operators have a different meaning inside model formulae:
* `*` indicates the main effects plus their interaction (rather than multiplication),
* `:` indicates the interaction between two variables (rather than generating a sequence), and
* `^` indicates interactions up to the given order (rather than raising to a power)
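
One way to see how a formula is interpreted is to expand it with `terms()` and read its `"term.labels"` attribute. This is only a sketch: `y`, `a`, `b` and `c` are placeholder variable names, and no data is needed for the expansion.

```r
# terms() parses a formula; "term.labels" lists the terms it expands to
labels_star  <- attr(terms(y ~ a * b), "term.labels")
labels_colon <- attr(terms(y ~ a:b), "term.labels")
labels_power <- attr(terms(y ~ (a + b + c)^2), "term.labels")

labels_star    # "a" "b" "a:b"  (main effects plus interaction)
labels_colon   # "a:b"          (interaction only)
labels_power   # "a" "b" "c" "a:b" "a:c" "b:c"  (up to two-way interactions)
```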

In [4]:
9 %/% 2      # integer part of the division
9 %% 2       # remainder (modulo) of the division
15421 %% 7 == 0

#### MATHEMATICAL FUNCTIONS
* `log(x)`: natural log of x
* `exp(x)`: exponential of x
* `log(x, n)`: log in base n of x
* `log10(x)`: log in base 10 of x
* `sqrt(x)`: square root of x
* `factorial(x)`: $x! = x(x-1)(x-2) \cdots 3 \cdot 2 \cdot 1$
* `choose(n, x)`: binomial coefficient, $n!/(x!\,(n-x)!)$
* `gamma(x)`: $\Gamma(x)$ for real x, $(x-1)!$ for positive integer x
* `lgamma(x)`: natural log of $\Gamma(x)$
* `abs(x)`: absolute value of x
* `floor(x)`: greatest integer less than or equal to x
* `ceiling(x)`: smallest integer greater than or equal to x
* `trunc(x)`: x with its fractional part dropped, i.e. the closest integer to x between 0 and x; it behaves like `floor()` for x > 0 and like `ceiling()` for x < 0
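
A short sketch of the rounding functions on a negative number, where they differ most visibly, plus a few of the combinatorial functions. The values are chosen only for illustration.

```r
x <- -2.7
floor(x)     # -3  (towards minus infinity)
ceiling(x)   # -2  (towards plus infinity)
trunc(x)     # -2  (towards zero, so it matches ceiling() for x < 0)

choose(5, 2)   # 10
factorial(4)   # 24
gamma(5)       # 24, i.e. (5 - 1)!

log(8, 2)      # logarithm of 8 in base 2
log10(1000)    # logarithm of 1000 in base 10
```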

#### COMPLEX NUMBERS
Here is a list of built-in functions:
* `Re(z)`: extract the real part
* `Im(z)`: extract the imaginary part
* `Mod(z)`: calculate the modulus
* `Arg(z)`: calculate the argument, Arg(x+yi) = atan2(y, x), which equals atan(y/x) when x > 0
* `Conj(z)`: work out the complex conjugate
* `is.complex(z)`: test for complex number membership
* `as.complex(z)`: force the input as a complex number

In [6]:
Im(3.5 + 2i)
Mod(3.5 + 2i)
is.complex( 3.5 + 2i)
as.complex( 3.5 )
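
The remaining functions from the list can be tried in the same way. This is a sketch with arbitrarily chosen values; the last line illustrates one reason to coerce with `as.complex()`.

```r
z <- 3 + 4i
Re(z)     # 3
Mod(z)    # 5
Conj(z)   # 3-4i

Arg(1 + 1i)    # pi/4
Arg(-1 + 0i)   # pi (plain atan(y/x) would give 0 here)

# sqrt(-1) returns NaN with a warning; the complex version has a root
sqrt(as.complex(-1))   # 0+1i
```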

---

## 03 - DATA TYPES

#### INSPECT DATA
* `class(x)`: tells us what kind of data we have in x (numeric, etc.)
* `typeof(x)` / `storage.mode(x)`: `typeof()` returns the internal type of an R object (double, integer, etc.); `storage.mode()` gets or sets its storage mode
* `length(x)`: returns the number of elements in the object x
* `str(x)`: compactly display the internal structure of the object x

In [21]:
(x <- c(3, 7, 9))
class(x)
typeof(x)
length(x)
str(x)

 num [1:3] 3 7 9


In [13]:
x <- 4.7; length(x)
y <- c(1, 2, 5, 8); str(y)

 num [1:4] 1 2 5 8


We can always test whether objects are a particular type and also coerce them to a different type. In this list the first command will test, the second will coerce:
* Array               → `is.array()` / `as.array()`
* Character           → `is.character()` / `as.character()`
* Complex             → `is.complex()` / `as.complex()`
* Dataframe           → `is.data.frame()` / `as.data.frame()`
* Double              → `is.double()` / `as.double()`
* Factor              → `is.factor()` / `as.factor()`
* List                → `is.list()` / `as.list()`
* Logical             → `is.logical()` / `as.logical()`
* Matrix              → `is.matrix()` / `as.matrix()`
* Numeric             → `is.numeric()` / `as.numeric()`
* Raw                 → `is.raw()` / `as.raw()`
* Time series         → `is.ts()` / `as.ts()`
* Vector              → `is.vector()` / `as.vector()`
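
A brief sketch of the test/coerce pattern with a few of the pairs above. The variable names `s` and `num` are illustrative.

```r
s <- "3.14"
is.numeric(s)      # FALSE: s is a character string
is.character(s)    # TRUE

num <- as.numeric(s)
is.numeric(num)    # TRUE

as.integer(3.9)    # 3 (the fractional part is dropped)
as.character(42)   # "42"
as.logical(0:2)    # FALSE TRUE TRUE (zero is FALSE, non-zero is TRUE)
```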


#### VECTOR
The basic data structure is the **vector**: a sequence of values stored in contiguous memory. Vectors are atomic: all elements must be of the same type. R is a dynamically typed language, so a value of a different data type can be assigned to the same variable at any time. Scalar types do not exist; single values are one-element vectors. Longer vectors are usually created with the concatenate function `c()`.

With the `rep(value, each=m, times=n)` function you can create a vector by repetition: each element of `value` is written m times in a row, and the resulting pattern is repeated n times.
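
A quick sketch of how `each` and `times` combine (values chosen only for illustration):

```r
r1 <- rep(7, times = 3)                   # 7 7 7
r2 <- rep(c(1, 2), each = 2)              # 1 1 2 2
r3 <- rep(c(1, 2), each = 2, times = 3)   # 1 1 2 2 1 1 2 2 1 1 2 2
r3
```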

INDEXING:  
Indexing is done with `[]`. The brackets accept a single index or a vector of indices. Negative indices exclude elements: `x[-3]` returns everything except the third element. Masking works as in Python (NumPy): `x[x<5]` selects only the elements lower than 5, because the expression `x<5` produces a logical mask, i.e. a vector of TRUE and FALSE values.
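
The indexing forms described above, sketched on a small example vector. `which()` is not mentioned in the text and is included only to show how a mask maps back to positions.

```r
x <- c(4, 7, 1, 9, 3)

x[2]          # 7      single index
x[c(1, 4)]    # 4 9    vector of indices
x[-3]         # 4 7 9 3   everything except the third element

mask <- x < 5 # TRUE FALSE TRUE FALSE TRUE
x[mask]       # 4 1 3  elements where the mask is TRUE
which(mask)   # 1 3 5  positions where the mask is TRUE
```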

Functions for vectors:
* `max(x)` - the maximum value in x  
* `min(x)` - the minimum value in x  
* `sum(x)` - the sum of all values in x  
* `mean(x)` - arithmetic average of the values in x  
* `median(x)` - median value in x  
* `range(x)` - a vector with inside only two values: min(x) and max(x)  
* `var(x)` - sample variance of x  
* `cor(x, y)` - correlation between x and y vectors  
* `sort(x)` - a sorted version of x  
* `rank(x)` - a vector with the ranks of the x values, i.e. the position each value would take if x were sorted (like places in a league table)  
* `order(x)` - the permutation that sorts x in ascending order, i.e. the indexes of x arranged so that `x[order(x)]` is sorted  
* `quantile(x)` - a vector with: minimum, lower quartile, median, upper quartile and maximum of x  
* `cumsum(x)` - a running sum of the vector elements  
* `cumprod(x)` - a running product of the vector elements  
* `cummax(x)` - a vector of non-decreasing numbers with the cumulative maxima  
* `cummin(x)` - a vector of non-increasing numbers with the cumulative minima  
* `pmax(x, y, z)` - vector containing the maximum of x, y or z for each position  
* `pmin(x, y, z)` - vector containing the minimum of x, y or z for each position  
* `colMeans(x)` - column means of a dataframe or matrix  
* `colSums(x)` - column sums of a dataframe or matrix  
* `rowMeans(x)` - row means of a dataframe or matrix  
* `rowSums(x)` - row sums of a dataframe or matrix  
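
A sketch of the cumulative, parallel and row/column functions from the list, using small made-up inputs:

```r
x <- c(3, 1, 4, 1, 5)
cumsum(x)    # 3 4 8 9 14
cumprod(x)   # 3 3 12 12 60
cummax(x)    # 3 3 4 4 5
cummin(x)    # 3 1 1 1 1

# pmax/pmin compare position by position
pmax(c(1, 5, 2), c(3, 2, 2))   # 3 5 2
pmin(c(1, 5, 2), c(3, 2, 2))   # 1 2 2

# Row/column summaries need a matrix or data frame
m <- matrix(1:6, nrow = 2)     # rows: (1, 3, 5) and (2, 4, 6)
colSums(m)    # 3 7 11
rowMeans(m)   # 3 4
```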


In [66]:
( x <- 3 : 1 )

f1 <- 5
x*f1            # f1 is recycled across all elements of x

f2 <- c(10, 100)
x*f2           # the shorter vector, f2, is recycled to cover the length of x; R also emits the warning below

Warning: “longer object length is not a multiple of shorter object length”
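
Recycling is silent when the longer length is an exact multiple of the shorter one. A minimal sketch, with an explicit `rep()` call to show what R does implicitly (the names `f` and `recycled` are illustrative):

```r
x <- 1:4
f <- c(10, 100)

x * f    # 10 200 30 400  (f is reused as 10, 100, 10, 100)

# The same result with the recycling written out by hand
recycled <- rep(f, length.out = length(x))
x * recycled
```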


In [47]:
vec <- c(4, 7, 6, 5, 6, 7)
mean(vec)
max(vec)
vec[1]
vec[-c(1,2,3)]
rep(1,5)
rep (1:4, 1:4)    # replicate each sequence number a different number of times

In [67]:
x <- c(10, 5, 8, 20, 1)
range(x)
rank(x)
order(x)
x[order(x)]

We can also assign names to the elements with:

In [70]:
x <- c(a=1, b=2, c=3)
x
names(x)

#### SEQUENCES
R provides an easy way to generate a sequence of numbers. `a:b` creates a sequence from a to b (descending if a>b), but only in steps of 1. If you want something else, use `seq(start, stop, step)` to choose the step, or `seq(start, stop, length=n)` to choose how many numbers to generate.

In [41]:
0:5
5:-2
seq(-1, 1, 0.5)
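
The other forms of `seq()` can be sketched as follows, using the full argument names `by` and `length.out`:

```r
s1 <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
s2 <- seq(10, 1, by = -3)         # 10 7 4 1 (a descending sequence needs a negative step)
s1
s2
```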

#### MATRICES
By assigning to the `dim(x)` attribute of a vector we change its dimensions, for example turning it into a matrix: the elements are 'reshaped' into the requested dimensions `c(dim1, dim2...)`, filling column by column.

In [74]:
vec <- c(1:20); vec
dim(vec) <- c(4,5); vec

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20


We can directly define a matrix with `matrix(elements, nrow, ncol, byrow)`. The `byrow` option decides whether the matrix is filled by rows (`byrow=TRUE`) or by columns (the default). We can later reshape it using `dim`. An element is accessed with `mat[row,col]`.

We multiply matrices with `%*%`.

We can define diagonal matrices with `diag(elem, nrow, ncol)`.

We can compute the determinant with `det(mat)`.

With `solve(mat, b)` we can solve the linear system Ax=b. When `b` is omitted it defaults to the identity matrix, so `solve(mat)` returns the inverse of `mat`.

In [79]:
m1 <- matrix(1:12, nrow=3); m1
m1[,3]      # all third column

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
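
The matrix operations described in this section can be sketched together on a small 2x2 system. The matrix `A` and vector `b` are made-up values.

```r
# Fill by rows so the code reads like the matrix itself
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(3, 5)

A[1, 2]            # element in row 1, column 2: 1
det(A)             # 2*3 - 1*1 = 5

x <- solve(A, b)   # solves A x = b, giving 0.8 and 1.4
A %*% x            # reproduces b (up to rounding)

A_inv <- solve(A)  # with b omitted, solve() returns the inverse
A %*% A_inv        # identity matrix (up to rounding)
diag(1, 2, 2)      # the 2x2 identity built directly
```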


#### ARRAY
An array is a multi-dimensional object in which all entries have the same type. The dimensions of an array are specified by its `dim` argument.

In [80]:
ar <- array(1:24, dim = c(2, 4, 3) ); ar
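
Elements and slices of an array are accessed with one index per dimension. A short sketch using the same array as above:

```r
ar <- array(1:24, dim = c(2, 4, 3))   # 3 layers, each a 2x4 matrix

dim(ar)        # 2 4 3
ar[2, 3, 1]    # 6: row 2, column 3 of the first layer
ar[1, 1, 2]    # 9: the second layer starts at 9
layer3 <- ar[, , 3]   # a 2x4 matrix holding 17..24
layer3
```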