# Statistical package R

R is used for statistical computing and comes with a large number of statistical functions built in. It tries to present the results of evaluations in a clear, readable form, and it can also produce attractive plots and diagrams.

We will use an "R cheat sheet" as a reference: [https://www.rstudio.com/wp-content/uploads/2016/10/r-cheat-sheet-3.pdf](https://www.rstudio.com/wp-content/uploads/2016/10/r-cheat-sheet-3.pdf)

In [1]:
x <- 10 # an assignment
x = 10 # = also works, but avoid it: = is also used to name function arguments, which can cause confusion
42 -> y # rightward assignment: assigns 42 to y (rarely used)
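A minimal sketch of why `<-` is preferred: inside a function call, `=` only names an argument, while `<-` performs a real assignment. The use of `median()` and the variable `y` here is just for illustration.

```r
x <- 10
median(x = c(1, 5, 9))    # 5; here = only names median's argument x
x                         # still 10: no assignment happened
median(y <- c(1, 5, 9))   # 5; <- assigns y in the workspace, then passes its value
y                         # 1 5 9
```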

The most basic data type in R is the vector: a linear array of elements. Every vector has a "mode": numeric, character, or logical (boolean). You create a vector with the `c()` function:

In [2]:
c(1, 2, 3) # numeric
c(T, F, T) # boolean
c("abc", "xyz") # character
# in Octave: c(...) corresponds to [...]
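To check which mode a vector has, base R provides the `mode()` function. Note that R's own name for the boolean mode is "logical":

```r
mode(c(1, 2, 3))        # "numeric"
mode(c(T, F, T))        # "logical"
mode(c("abc", "xyz"))   # "character"
```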

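A vector cannot mix modes: it holds either numbers or characters, but not both. If you pass mixed values to `c()`, R does not raise an error; it silently converts (coerces) all of them to one common mode.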
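A short illustration of this coercion, with the base function `mode()` reporting the result: mixing a number, a string and a boolean gives a character vector, while mixing booleans and numbers gives a numeric vector (`TRUE` becomes 1, `FALSE` becomes 0).

```r
mixed <- c(1, "a", TRUE)
mixed         # "1" "a" "TRUE": everything became a string
mode(mixed)   # "character"

nums <- c(TRUE, FALSE, 5)
nums          # 1 0 5: the booleans became numbers
mode(nums)    # "numeric"
```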

In [3]:
2:6 # forward
10:1 # backwards
seq(1, 10, by=2) # from 1 to 10 with step 2 (In Octave: 1:2:10)
seq(1, 10, length.out=4) # from 1 to 10, but create 4 elements (In Octave: linspace(1, 10, 4))
seq(1, 10, 4) # an unnamed third argument is matched to by=, so this is seq(1, 10, by=4)

Here we see that R functions often take _named arguments_: you write the argument name, an equals sign, and the value, as in `by=2`. Arguments given without a name are matched by position.
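A small sketch of how arguments are matched for `seq()`: named arguments may come in any order, and unnamed ones are matched by position (`from`, `to`, `by`). The variable names `a`, `b` and `d` are arbitrary.

```r
a <- seq(from = 1, to = 10, by = 2)   # all arguments named
b <- seq(by = 2, to = 10, from = 1)   # same arguments, different order
d <- seq(1, 10, 2)                    # matched by position: from, to, by
a                                     # 1 3 5 7 9
identical(a, b)                       # TRUE
identical(a, d)                       # TRUE
seq(1, 10, length.out = 4)            # 1 4 7 10; length.out is usually given by name
```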

## Indexing

In [4]:
x <- seq(10, 1000, by=10)
x

In [5]:
x[4] # square brackets for indexing
x[c(1, 5, 7)] # in Octave: x([1, 5, 7])
x[-4]    # all except the fourth element
x[c(-4, -5)] # all except the 4th and 5th elements

x[x > 600] # all elements that are greater than 600 (In Octave: x(x > 600) )

x > 600 # a logical vector; x[x > 600] is logical indexing: only elements with TRUE in the index are kept
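The same indexing rules on a short, made-up vector `v`, so the results are easy to check by eye. Indices start at 1. `which()`, a base R function not used above, returns the positions where a logical vector is `TRUE`.

```r
v <- c(5, 10, 15, 20, 25)
v[1]            # 5: the first element has index 1, not 0
v[c(2, 4)]      # 10 20
v[-1]           # 10 15 20 25: all except the first
v[v > 12]       # 15 20 25: logical indexing
which(v > 12)   # 3 4 5: positions where the condition is TRUE
```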

In [6]:
1 == 2
30 == 30
c(10, 20, 30, 40) == 20
c(10, 20, 30, 40) > 20
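Because comparisons work element by element, their logical results can be combined with `&` (element-wise AND) and `|` (element-wise OR) and then used for indexing. Summing a logical vector counts the `TRUE` values. The `ages` data below is made up for illustration.

```r
ages <- c(12, 25, 37, 48, 70)
adult <- ages >= 18 & ages < 65   # both conditions must hold
adult                             # FALSE TRUE TRUE TRUE FALSE
ages[adult]                       # 25 37 48
ages[ages < 18 | ages >= 65]      # 12 70: at least one condition holds
sum(ages > 30)                    # 3: TRUE counts as 1, FALSE as 0
```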

## Factors

Imagine we want to store the sex of several people: Male or Female. We could use a vector:

In [7]:
s <- c(T, F, T, F, F, F) # not obvious who is Male and who is Female
s <- c("Male", "Female", "Male", "Female", "Female", "Female") # 1) typos are possible 2) needs more memory
s <- c(1, 2, 1, 2, 2, 2) # you have to remember separately what 1 and 2 mean

That is why we need factors. They store variables that take one of a finite number of possible values (Male/Female, or Bad/Medium/Good). Factors are created with the `factor()` function:

In [8]:
sexes <- factor(c("Male", "Female", "Male", "Male")) # we create a factor from a character vector
sexes

# we can specify levels explicitly
states <- factor(c("Good", "Medium", "Good"), levels=c("Good", "Medium", "Bad"))
states
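A sketch of how explicit levels protect against the typo problem mentioned above: a value that is not among the levels becomes `NA` instead of silently creating a new category. The misspelled "Femal" is deliberate.

```r
f <- factor(c("Male", "Femal", "Female"), levels = c("Male", "Female"))
f            # Male <NA> Female
is.na(f)     # FALSE TRUE FALSE
levels(f)    # "Male" "Female": the order given in levels=
nlevels(f)   # 2
```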

How are factors stored? A factor is just an integer vector with additional metainformation attached: the levels, that is, what each number means. For `sexes` above, the levels are sorted alphabetically (Female, Male), so the stored numbers are 2, 1, 2, 2.

Factor = integer vector + metainformation. When we print a factor, we do not see the numbers; each number is replaced by its level.
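The stored numbers and the levels can be inspected directly. This sketch recreates the `sexes` factor from above:

```r
sexes <- factor(c("Male", "Female", "Male", "Male"))
levels(sexes)                      # "Female" "Male": sorted alphabetically by default
as.integer(sexes)                  # 2 1 2 2: the numbers actually stored
typeof(sexes)                      # "integer"
class(sexes)                       # "factor"
levels(sexes)[as.integer(sexes)]   # "Male" "Female" "Male" "Male": numbers mapped back to levels
```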

Factors can be ordered. For the state of a patient we know that Good is better than Medium, and Medium is better than Bad, so this factor is ordered. Later we will be able to sort by this factor:

In [9]:
states <- factor(c("Good", "Medium", "Good"), levels=c("Bad", "Medium", "Good"), ordered=T) # levels are listed from lowest to highest
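A sketch of what ordering enables, assuming the levels are listed from lowest to highest (`Bad < Medium < Good`): comparisons with `<` and `>`, sorting, and `max()` all follow the level order. The fourth value "Bad" is added here for illustration.

```r
states <- factor(c("Good", "Medium", "Good", "Bad"),
                 levels = c("Bad", "Medium", "Good"), ordered = TRUE)
states              # printing shows the order: Levels: Bad < Medium < Good
states > "Medium"   # TRUE FALSE TRUE FALSE
sort(states)        # Bad Medium Good Good
max(states)         # Good
```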

## Simple statistics

In [10]:
x <- c(1, 2, 5, 3, 5, 7, 8, 4, 3, 1, 4, 4, 5, 6, 7, 8)
summary(x)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   3.000   4.500   4.562   6.250   8.000 

`summary` shows information about a vector (or any other data given to it). For a numeric vector it reports the minimum, the 1st quartile, the median, the mean, the 3rd quartile, and the maximum.

(The median is the 2nd quartile, i.e. the 50% quantile.)
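Each number in the `summary` output can also be computed with its own function. `quantile()` is called here with its default method (type 7), which is also what `summary` uses for numeric vectors, so the values match the output above:

```r
x <- c(1, 2, 5, 3, 5, 7, 8, 4, 3, 1, 4, 4, 5, 6, 7, 8)
min(x)                            # 1
quantile(x, c(0.25, 0.5, 0.75))   # 25%: 3, 50%: 4.5, 75%: 6.25
median(x)                         # 4.5, the same as the 50% quantile
mean(x)                           # 4.5625 (summary prints it rounded as 4.562)
max(x)                            # 8
```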

In [11]:
x <- c("a", "b", "a", "a", "c", "b")
summary(x)

   Length     Class      Mode 
        6 character character 

For a character vector, `summary` only tells us that it is a vector of characters (strings) of length 6.
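Converting the character vector to a factor changes what `summary` does: it now counts how often each value occurs.

```r
x <- c("a", "b", "a", "a", "c", "b")
summary(x)           # only Length, Class and Mode
summary(factor(x))   # counts per level: a 3, b 2, c 1
```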

In [12]:
summary(sexes)

`summary` is a generic function: it works differently for different types of arguments. For a factor, it counts how many times each level occurs. Another function that adapts to its arguments is `table`:

In [13]:
sexes <- factor(c("Male", "Female", "Male", "Male", "Female"))
states <- factor(c("Good", "Good", "Bad", "Medium", "Good"), levels=c("Bad", "Medium", "Good"), ordered=T)
table(sexes, states)

        states
sexes    Bad Medium Good
  Female   0      0    2
  Male     1      1    1

`table` expects vectors of the same length. It counts how many times each combination of values occurs and arranges the counts in a table. It also works with three or more vectors (you may try it).
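A sketch of `table` with one and with three vectors. The third vector, `smoker`, is hypothetical data made up for this example:

```r
sexes  <- factor(c("Male", "Female", "Male", "Male", "Female"))
states <- factor(c("Good", "Good", "Bad", "Medium", "Good"),
                 levels = c("Bad", "Medium", "Good"), ordered = TRUE)
smoker <- c("yes", "no", "no", "yes", "no")   # hypothetical third variable

table(sexes)                         # one vector: Female 2, Male 3
t3 <- table(sexes, states, smoker)   # three vectors: a 3-dimensional table
dim(t3)                              # 2 3 2: one dimension per vector
t3["Female", "Good", "no"]           # 2
```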

## A little more about metainformation

In [14]:
x <- c(10, 20, 30)
x

`x` holds the data 10, 20, 30. We can also attach names to its elements:

In [15]:
names(x) <- c("first", "second", "third")
x

Metainformation (here, the names) controls how the data is printed and manipulated:

In [18]:
x == 10 # the result keeps the names: metainformation is not discarded
x[1]
x["first"]
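The metainformation of any object can be listed with `attributes()`. For a named vector it holds the names; for a factor it holds the levels and the class, which is exactly the metainformation described in the section on factors:

```r
x <- c(first = 10, second = 20, third = 30)   # names can also be given inside c()
attributes(x)            # $names: "first" "second" "third"
x[c("third", "first")]   # select several elements by name: 30 and 10
unname(x)                # the same data without names: 10 20 30

f <- factor(c("Male", "Female", "Male"))
attributes(f)            # $levels: "Female" "Male", $class: "factor"
```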