# Variables, Types, Operators

---

## Basics of programming for Data Science and Machine Learning


Applied Mathematical Modeling in Banking

---

## Table of contents

1. Vectors
2. Matrices


N. Apply functions family

---

# 1. Vectors

### Declaring vectors

A vector is a basic data type in `R` that stores a collection of elements of the same type. Vectors are usually created with the `c()` function. A sequence of values can also be created without `c()`, for example with the `:` operator.

_Note. In essence, the function `c()` combines several vectors into one._

Consider, for example, an ordinary variable `x`:

```r
x <- 10
```

In essence, `x` in this case is a vector consisting of a single value, `10`. We can also store several elements in the variable `x`:

```r
x <- c(1, 2, 2.5, 3)
x
```

A vector can hold values of any basic type (`numeric`, `character`, `logical`, etc.), but all elements of a single vector have the same type:

```r
v1 <- c(1, 3, 4, 6, 7)
v2 <- c(T, F, F, T, F)
v3 <- c("Hello", "my", "friend", "!")
```

Vectors can also be created as sequences with the functions `rep()` and `seq()` and with the operator `:`:

```r
vtr <- 2:7
vtr
vtr <- 7:2
vtr
```
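The example above only uses `:`. Below is a minimal sketch of `seq()` and `rep()` as well. The variable names `s1`, `s2`, `r1`, `r2` and `r3` are chosen only for illustration.

```r
# seq(): a regular sequence with a given step or a given length
s1 <- seq(1, 10, by = 2)           # 1 3 5 7 9
s2 <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# rep(): repeat values a number of times
r1 <- rep(5, times = 3)            # 5 5 5
r2 <- rep(c(1, 2), times = 3)      # whole vector repeated: 1 2 1 2 1 2
r3 <- rep(c(1, 2), each = 2)       # each element repeated: 1 1 2 2

print(s1)
print(s2)
print(r1)
print(r2)
print(r3)
```

The two `rep()` arguments differ: `times` repeats the whole vector, while `each` repeats every element in place.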

If you need to combine several vectors, use the `c()` function:

```r
x <- 2:3
y <- c(4, 6, 9)
z <- c(x, y, 10:12, 100)
z
```

You can view brief descriptive statistics of a vector with the **`summary()`** function:

```r
summary(z)
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00    4.00    9.00   17.44   11.00  100.00 
```

---

### Operations on vectors

The advantage of vectors over storing each value in a separate variable is that a single operation can be applied to all elements of a vector, or to several vectors at once. Arithmetic operations such as addition or multiplication are examples.

```r
v1 <- c(1, 3)
v2 <- c(2, 5)
v1 + v2
```

The example above shows that addition is an element-wise sum of the vectors. The 1st element of `v1` is added to the 1st element of `v2` (`1 + 2`), and so on. The resulting vector therefore has the same length as `v1` and `v2`.

However, one of the vectors may be shorter than the other, or may even consist of a single element:

```r
v1 <- c(1, 3)
v2 <- 2
v1 + v2
```

In this case, the number `2` is added to each element of `v1`. In effect, `v2` is treated as `c(2, 2)`: its values are repeated (recycled) up to the length of `v1`, and then the elements are added. The resulting vector therefore has the length of the longest vector.

Consider a more complex case, where the vectors have different lengths and none of them has length 1:

```r
v1 <- c(2, 3)
v2 <- c(4, 5, 6, 7)
v3 <- c(1, 8, 9)
v1 + v2 + v3
```

```
longer object length is not a multiple of shorter object length
```


First, note that `R` issues a warning because the length of the longest vector (4) is not a multiple of the length of a shorter one (3). If every shorter length divided the longest one evenly (for example, vectors of lengths 2, 4 and 8), there would be no warning.

`R` extends each vector to the length of the longest one by repeating its elements cyclically. This is called recycling, and it produces the following set (*recycled elements are marked in the comments*):

```r
v1 <- c(2, 3, 2, 3)   # the last 2, 3 are recycled
v2 <- c(4, 5, 6, 7)
v3 <- c(1, 8, 9, 1)   # the last 1 is recycled
v1 + v2 + v3          # 7 16 17 11
```
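Recycling can also be reproduced explicitly. This is a small sketch, assuming base `R`'s `rep_len()`, which repeats a vector cyclically up to a given length. It checks that explicit recycling gives the same result as the automatic one. The names `n`, `manual` and `auto` are only illustrative.

```r
v1 <- c(2, 3)
v2 <- c(4, 5, 6, 7)
v3 <- c(1, 8, 9)

n <- max(length(v1), length(v2), length(v3))   # length of the longest vector: 4

# Recycle every vector explicitly to length n, then add
manual <- rep_len(v1, n) + rep_len(v2, n) + rep_len(v3, n)

# Let R recycle automatically (the length warning is suppressed here)
auto <- suppressWarnings(v1 + v2 + v3)

print(manual)                    # 7 16 17 11
print(identical(manual, auto))   # TRUE
```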

Subtraction (`-`), division (`/`) and multiplication (`*`) work the same way.
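Here is a short sketch of the other arithmetic operators on two equal-length vectors. The values are made up for illustration. The last line shows a single number being recycled, as in the `v1 + 2` case above.

```r
v1 <- c(2, 4, 6)
v2 <- c(1, 2, 3)

v1 - v2   # 1 2 3
v1 * v2   # 2 8 18
v1 / v2   # 2 2 2
v1 * 10   # the single value 10 is recycled: 20 40 60
```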

Relational and logical operators also work element by element on vectors. Their result is a vector of type `logical` with the values `TRUE`/`FALSE`.

Consider an example of finding the elements of the vector `v1` that are greater than the elements of the vector `v2` at the same positions:

```r
v1 <- c(2, 4, 7, 9, 12)
v2 <- c(6, 4, 6, 7, 1)
v1 > v2
```

In essence, each pair of elements at the same position is compared: `2 > 6`, `4 > 4`, `7 > 6`, `9 > 7`, `12 > 1`.

Therefore, the operators studied earlier (arithmetic, logical and relational) can also be used with vectors.
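The comparison returns a logical vector. That vector can be used directly as an index to extract the matching elements themselves. Below is a short sketch on the same vectors; the name `greater` is only illustrative.

```r
v1 <- c(2, 4, 7, 9, 12)
v2 <- c(6, 4, 6, 7, 1)

greater <- v1 > v2   # FALSE FALSE TRUE TRUE TRUE
v1[greater]          # elements of v1 where the condition holds: 7 9 12
which(greater)       # their positions: 3 4 5
sum(greater)         # TRUE counts as 1, so this counts the matches: 3

# Logical operators also work element by element
v1 > 5 & v2 > 5      # FALSE FALSE TRUE TRUE FALSE
```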

### Naming vector elements

To make it clear what the values in a vector mean, analysts often label its elements with names.

Let's record the number of daily visits during one week as follows:

```r
# Count of unique bank branch visits from Monday to Sunday
data <- c(1245, 2112, 1321, 1231, 2342, 1718, 1980)
```

Next, assign the days of the week as names of the elements using the `names()` function:

```r
names(data) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
print(data)
```

```
   Monday   Tuesday Wednesday  Thursday    Friday  Saturday    Sunday 
     1245      2112      1321      1231      2342      1718      1980 
```


Alternatively, the same code can be written as follows:

```r
data <- c(1245, 2112, 1321, 1231, 2342, 1718, 1980)
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
names(data) <- days
data
```

To get, for example, the name of the 4th element of the vector, use:

```r
names(data)[4]
```

The `names()` function can therefore be used both to set the names of vector elements and to read them.
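Names can also be given directly inside `c()`, using `name = value` pairs. Below is a minimal sketch comparing both approaches; `data1` and `data2` are illustrative names.

```r
# Names assigned afterwards with names()
data1 <- c(1245, 2112, 1321, 1231, 2342, 1718, 1980)
names(data1) <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday")

# The same vector, with names given directly inside c()
data2 <- c(Monday = 1245, Tuesday = 2112, Wednesday = 1321, Thursday = 1231,
           Friday = 2342, Saturday = 1718, Sunday = 1980)

identical(data1, data2)   # TRUE
names(data2)[4]           # "Thursday"
```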

---

### Access to vector elements

The elements of a vector are indexed from `1` to `n`, where `n` is the number of elements in the vector.

<div class = "alert alert-info alert-sm"> &nbsp; Note. In `R`, indexing of vectors, matrices, arrays and all other collection types starts at <b>1</b>, not at <b class ="text-danger" style ="text-decoration: line-through">0</b>.</div>

Consider the previous example:

```r
data <- c(1245, 2112, 1321, 1231, 2342, 1718, 1980)
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
names(data) <- days
```

To extract only the visits on `Wednesday`, use the `[]` operator with the index of the element:

```r
data_wednesday <- data[3]
data_wednesday
```

To get several elements of the vector that are not adjacent, pass a vector of indices:

```r
some_days <- data[c(1, 2, 5)]
some_days
```

As the example above shows, the index is itself a vector, `c(1, 2, 5)`, so it can be stored in a separate variable:

```r
indexes <- c(1, 2, 5)
some_days <- data[indexes]
some_days
```

To get several consecutive elements, the `:` operator is convenient. This is especially true when the vector is long, for example 1000+ elements:

```r
working_days <- data[1:5]
working_days
```

Thus, all working days of the week are selected for the `working_days` vector.
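Besides positive positions, `R` accepts other kinds of index in `[]`: element names, negative positions (to drop elements) and logical conditions. Below is a short sketch on the same `data` vector. The threshold `2000` is an arbitrary value chosen for illustration.

```r
data <- c(1245, 2112, 1321, 1231, 2342, 1718, 1980)
names(data) <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

data["Wednesday"]    # by name: the same element as data[3]
data[-(6:7)]         # negative indices drop Saturday and Sunday
data[data > 2000]    # logical condition: Tuesday and Friday
data[length(data)]   # the last element, whatever the vector length
```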

---

### Useful functions

Let's look at some useful functions that simplify working with vectors. In the following examples we will use two vectors, `A` and `B`:

```r
A <- c(3, 5, 8, 2, 5, 4, 2)
B <- c(3, NA, 1, NA, 6, 4, 5)
A
B
```

<i class = "fa fa-sticky-note-o"> </i> **The `sum()`** function returns the sum of the elements of a vector:

```r
sum(A)
sum(B)
```

Note that when the data contain missing values (`NA`), the sum cannot be computed, and `sum()` returns `NA`. In this case, many functions accept the additional argument `na.rm = T`, which removes missing values before the calculation (`T` is shorthand for `TRUE`).

_Note. Check the function's documentation for such an argument. If it is not available, clean the data in some other way before working with it._

```r
sum(B, na.rm = T)
```

<i class = "fa fa-sticky-note-o"> </i> **The `mean()`** function returns the arithmetic mean of numbers:

```r
mean(A)
mean(B, na.rm = T)
```

<i class = "fa fa-sticky-note-o"> </i> The **`min()` and `max()`** functions return the minimum and maximum values, respectively:

```r
min(A)
max(A)
```

`R` also provides a large number of built-in functions for statistical, econometric and other research in economics and beyond. Try the `sd()`, `cov()` and `cor()` functions.
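Below is a brief sketch of the suggested functions, using `A` and `B` from above. The second complete vector `C` is hypothetical: it is a rescaled copy of `A`, which makes the results easy to check. For `cov()` and `cor()`, missing values are handled with the `use = "complete.obs"` argument, which keeps only pairs where both values are present; it plays a role similar to `na.rm`.

```r
A <- c(3, 5, 8, 2, 5, 4, 2)
B <- c(3, NA, 1, NA, 6, 4, 5)
C <- 2 * A                        # hypothetical vector: a rescaled copy of A

sd(A)                             # standard deviation
var(A)                            # variance, equal to sd(A)^2
cov(A, C)                         # covariance; here 2 * var(A)
cor(A, C)                         # correlation; C is a linear function of A, so 1

sd(B, na.rm = TRUE)               # sd() also accepts na.rm
cor(A, B, use = "complete.obs")   # cor() uses `use` instead of na.rm
```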

<i class = "fa fa-sticky-note-o"> </i> **The `length()`** function returns the "length" of a vector, i.e. the number of its elements:

```r
length(A)
length(B)
```
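`length()` counts `NA` elements as well. To count only the values that are actually present, one option is to combine `is.na()` with `sum()`. Below is a small sketch on the vector `B`; `na.omit()` is shown as an alternative way to drop missing values first.

```r
B <- c(3, NA, 1, NA, 6, 4, 5)

length(B)            # 7: NA elements are counted too
sum(is.na(B))        # 2 missing values
sum(!is.na(B))       # 5 values that are present
length(na.omit(B))   # 5: the same count after dropping NA
```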

<i class = "fa fa-sticky-note-o"> </i> **The `unique()`** function returns the distinct elements of a vector:

```r
A
unique(A)

print("---")

B
unique(B)
```

```
[1] "---"
```


<i class = "fa fa-sticky-note-o"> </i> **The `intersect()`** function finds the common elements of two vectors. For vectors `A` and `B`, the common values are `3`, `4` and `5`:

```r
A
B
intersect(A, B)
```

Conversely, <i class = "fa fa-sticky-note-o"> </i> **the `union()`** function combines the elements of both sets (vectors), keeping each value only once:

```r
A
B
union(A, B)
```

Try to work out what the functions `setdiff()`, `setequal()` and `is.element()` do.

_I recommend reading the short reference here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/sets.html_.
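Below is a short sketch of these three functions on the vectors `A` and `B`. The extra vectors `c(1, 2, 2)` and `c(2, 1)` are made up to show a case where `setequal()` returns `TRUE`. The `%in%` operator gives the same result as `is.element()`.

```r
A <- c(3, 5, 8, 2, 5, 4, 2)
B <- c(3, NA, 1, NA, 6, 4, 5)

setdiff(A, B)                   # values of A that are not in B: 8 2
setequal(A, B)                  # FALSE: the two sets differ
setequal(c(1, 2, 2), c(2, 1))   # TRUE: order and duplicates are ignored
is.element(8, B)                # FALSE
is.element(c(3, 8), B)          # TRUE FALSE
c(3, 8) %in% B                  # the same result with the %in% operator
```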

---

### Cleaning data (NA, NaN, Inf)

Working with data raises problems with reading it correctly, converting it and performing operations on it. For example:

- a malformed entry such as `"10 USD"` in a numeric field turns into `NA` when converted with `as.numeric()`;
- the expression `0/0` gives `NaN`;
- dividing a nonzero number by `0` gives `Inf`.

Before numerical and other data are used, they are usually cleaned, and some values are replaced depending on the programming or research task. In `R`, missing and special values are represented as follows:

- `NA` - Not Available (a missing value).
- `NaN` - Not a Number.
- `Inf` - Infinity (can be positive, `Inf`, or negative, `-Inf`).

Let's start with a vector:

```r
vtr <- c(1, -2, NA, NaN, Inf, 1223, -Inf, NA, 21)
vtr
```

You can test each element for missing or special values with the functions `is.na()`, `is.nan()`, `is.infinite()` and `is.finite()`. Each returns a logical vector of the same length:

```r
is.na(vtr)
is.nan(vtr)
is.infinite(vtr)
is.finite(vtr)   # FALSE for Inf, -Inf, NA and NaN; TRUE for ordinary numbers
```

The values can then be replaced as follows (all `NaN` with `500` and all `NA` with `1000`):

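```r
vtr[is.nan(vtr)] <- 500
vtr

vtr[is.na(vtr)] <- 1000
vtr
```

Note that `is.na()` also returns `TRUE` for `NaN`. This is why `NaN` values are replaced first: otherwise they would also become `1000`.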
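Below is a small sketch, with a hypothetical vector `x`, of what happens if the `NA` replacement runs first.

```r
x <- c(1, NA, NaN)
is.na(x)    # FALSE  TRUE  TRUE
is.nan(x)   # FALSE FALSE  TRUE

# Wrong order: replacing NA first also overwrites the NaN
y <- x
y[is.na(y)] <- 1000
y[is.nan(y)] <- 500   # nothing is left to replace
print(y)              # 1 1000 1000

# Order used above: NaN first, then the remaining NA
z <- x
z[is.nan(z)] <- 500
z[is.na(z)] <- 1000
print(z)              # 1 1000 500
```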

Next, replace `Inf` with the maximum finite value in the vector and `-Inf` with the minimum one:

```r
vtr <- c(1, -2, NA, NaN, Inf, 1223, -Inf, NA, 21)
vtr
vtr[vtr == Inf] <- max(vtr[is.finite(vtr)], na.rm = T)
vtr[vtr == -Inf] <- min(vtr[is.finite(vtr)], na.rm = T)
vtr
```

To replace infinite values regardless of their sign, use `is.infinite()`.
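Below is a minimal sketch of that approach. The replacement value `0` is an arbitrary illustrative choice, not a recommendation. `is.infinite()` returns `FALSE` for `NA` and `NaN`, so those elements are left unchanged.

```r
vtr <- c(1, -2, NA, NaN, Inf, 1223, -Inf, NA, 21)

# Replace both Inf and -Inf at once (0 is an arbitrary illustrative value)
vtr[is.infinite(vtr)] <- 0
vtr   # 1 -2 NA NaN 0 1223 0 NA 21
```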

---

# 2. Matrices

### Creating matrices

**Matrix** - a collection of elements of the same type (`numeric`, `character`, `logical`) arranged in a fixed number of rows and columns, i.e. a two-dimensional data array.

The matrix is created using the `matrix()` function:

```r
matrix(1:10, byrow = TRUE, nrow = 2)
```

```
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
```


where:

- `1:10` is the set of matrix elements. It can also be a pre-built vector (typed in, computed, read from a file, etc.).
- `byrow = TRUE` means that the elements are filled in row by row, so the first row contains `1:5` and the second `6:10`. To fill the matrix column by column, use `byrow = FALSE`, which is the default.
- `nrow` is the number of rows of the matrix.
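Below is a short sketch comparing the two fill orders on the same data. `m_rows` and `m_cols` are illustrative names.

```r
# byrow = TRUE: values fill the matrix row by row
m_rows <- matrix(1:10, byrow = TRUE, nrow = 2)

# byrow = FALSE (the default): values fill the matrix column by column
m_cols <- matrix(1:10, nrow = 2)

m_rows[1, ]   # 1 2 3 4 5
m_cols[1, ]   # 1 3 5 7 9
```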

```r
sales1 <- c(12, 14, 15)
sales2 <- c(22, 15, 21)
sales <- c(sales1, sales2)
m <- matrix(sales, byrow = TRUE, nrow = 2)
m
```

```
     [,1] [,2] [,3]
[1,]   12   14   15
[2,]   22   15   21
```


---

### Naming matrices

To specify the names of rows and columns of the matrix, use the functions `rownames()` and `colnames()`:

```r
m <- matrix(1:9, nrow = 3)
rownames(m) <- c("row1", "row2", "row3")
colnames(m) <- c("c1", "c2", "c3")
m
```

```
     c1 c2 c3
row1  1  4  7
row2  2  5  8
row3  3  6  9
```
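The same names can also be set when the matrix is created, through the `dimnames` argument of `matrix()`. It takes a list with the row names first and the column names second. Below is a minimal sketch; once names exist, elements can also be selected by name.

```r
m <- matrix(1:9, nrow = 3,
            dimnames = list(c("row1", "row2", "row3"),   # row names
                            c("c1", "c2", "c3")))        # column names
m
m["row2", "c3"]   # select an element by its row and column names: 8
```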


---

### Add rows and columns

The functions `cbind()` and `rbind()` add columns and rows to matrices and can quickly combine several matrices into one.

<i class = "fa fa-sticky-note-o"></i> **The `cbind()`** function places one or more matrices and/or vectors side by side as new columns. Rows are matched by position, not by a key field, so the objects being combined should have the same number of rows. Consider an example:

```r
m1 <- matrix(c(1:3, 101:103), nrow = 3)
colnames(m1) <- c("A", "B")

m2 <- matrix(c(201:203, 1001:1003), nrow = 3)
colnames(m2) <- c("C", "D")

m_bind <- cbind(m1, m2)

m1
m2
m_bind
```

```
     A   B
[1,] 1 101
[2,] 2 102
[3,] 3 103
```

```
       C    D
[1,] 201 1001
[2,] 202 1002
[3,] 203 1003
```

```
     A   B   C    D
[1,] 1 101 201 1001
[2,] 2 102 202 1002
[3,] 3 103 203 1003
```
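`rbind()` works the same way as `cbind()`, but it appends rows. Below is a minimal sketch that reuses `m1`. The added row `c(4, 104)` and the column named `E` are hypothetical values for illustration. When a vector is passed to `cbind()` as a named argument, that name becomes the new column's name.

```r
m1 <- matrix(c(1:3, 101:103), nrow = 3)
colnames(m1) <- c("A", "B")

# rbind() appends rows; the new row needs one value per column of m1
m_rows <- rbind(m1, c(4, 104))
m_rows
dim(m_rows)        # 4 2

# cbind() with a named vector argument: the name becomes the column name
m_cols <- cbind(m1, E = c(7, 8, 9))
colnames(m_cols)   # "A" "B" "E"
```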


---

### Access to matrix elements

The elements of the matrix are accessed by the index of rows and columns. You can select ranges in a similar way to vectors.

Let's look at an example:

```r
m <- matrix(11:25, nrow = 3)
m
```

```
     [,1] [,2] [,3] [,4] [,5]
[1,]   11   14   17   20   23
[2,]   12   15   18   21   24
[3,]   13   16   19   22   25
```


Elements of a matrix can also be accessed with a single index. To get the 10th element, use either of the following _(note that elements are counted column by column, starting from the top-left corner)_:

```r
m[10]
m[[10]]
```

To display the same element using row and column indexes, write as follows:

```r
# Row #1
# Column #4
m[1, 4]
```
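Because elements are counted column by column, the element in row `i` and column `j` of a matrix with `nrow(m)` rows has the single index `(j - 1) * nrow(m) + i`. For row 1, column 4 of this 3-row matrix, that is `(4 - 1) * 3 + 1 = 10`. Below is a small check of that formula. It also uses `which(..., arr.ind = TRUE)`, which works the other way round and returns row and column positions.

```r
m <- matrix(11:25, nrow = 3)

i <- 1
j <- 4
k <- (j - 1) * nrow(m) + i          # single index of element [i, j]: 10
m[k] == m[i, j]                     # TRUE

w <- which(m == 20, arr.ind = TRUE) # row/column position of the value 20
w                                   # row 1, col 4
```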

**Question**: Which expression should you use to get **18**?

**Answer**: `m[2,3]`

```r
m[2, 3]
```

To select an entire row or an entire column, leave the index of the other dimension empty:

```r
m[1, ]         # first row only
m[c(1, 3), ]   # first and third rows only
```

```
     [,1] [,2] [,3] [,4] [,5]
[1,]   11   14   17   20   23
[2,]   13   16   19   22   25
```


```r
m[, 1]         # first column only
m[, c(1, 3)]   # first and third columns only
```

```
     [,1] [,2]
[1,]   11   17
[2,]   12   18
[3,]   13   19
```
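When a single row or column is selected, `R` simplifies the result to a plain vector (as with `m[1, ]` and `m[, 1]` above). Below is a short sketch of the `drop = FALSE` argument of `[`, which keeps the result as a matrix instead.

```r
m <- matrix(11:25, nrow = 3)

r <- m[1, ]                     # a single row is simplified to a plain vector
r_mat <- m[1, , drop = FALSE]   # drop = FALSE keeps a 1 x 5 matrix

is.matrix(r)       # FALSE
is.matrix(r_mat)   # TRUE
dim(r_mat)         # 1 5
```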


You can also select a set of rows and a set of columns at the same time:

```r
m[c(1, 3), 2:4]
```

```
     [,1] [,2] [,3]
[1,]   14   17   20
[2,]   16   19   22
```


You can exclude individual columns or rows by using indexes with minus signs (`-`):

```r
m[-1, c(-2:-3)]   # drop row 1 and columns 2 and 3
```

```
     [,1] [,2] [,3]
[1,]   12   21   24
[2,]   13   22   25
```


---

### Useful functions

#### Matrix dimensions

To get the dimensions of a matrix, use the functions `nrow()`, `ncol()` and `dim()`:

```r
# Declare a matrix
m <- matrix(1:15, ncol = 3)

m

print(paste0("Rows: ", nrow(m)))
print(paste0("Cols: ", ncol(m)))

print(paste0("Dim: ", paste0(dim(m), collapse = " x ")))
```

```
     [,1] [,2] [,3]
[1,]    1    6   11
[2,]    2    7   12
[3,]    3    8   13
[4,]    4    9   14
[5,]    5   10   15
```

```
[1] "Rows: 5"
[1] "Cols: 3"
[1] "Dim: 5 x 3"
```


Using `nrow()` and `ncol()`, you can access the last row and the last column of a matrix:

```r
m[nrow(m), ]   # last row
m[, ncol(m)]   # last column
```
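The same functions can be combined, or used in simple arithmetic, to reach other positions relative to the end of the matrix. Below is a brief sketch on the same 5 x 3 matrix.

```r
m <- matrix(1:15, ncol = 3)

m[nrow(m), ncol(m)]   # bottom-right element: 15
m[nrow(m) - 1, ]      # second-to-last row: 4 9 14
```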
