# R syntax in a nutshell – Part II


## 1.6	Arrays, matrices, data frames
Several kinds of table-like objects exist in R. Data frames are data objects intended for statistical processing, with “observations” (variables, e.g. elements/oxides in geochemistry) as columns and “cases” (samples) as rows. Their columns can be of any mode, and the modes may differ between columns; thus data frames are not meant for matrix operations.

For that purpose, matrices should be used. All elements of a matrix must be of a single mode (most commonly numeric). Arrays are generalized matrices: they must have a single mode but can have any number of dimensions. Although superficially similar, these three types of objects must not be confused.

Depending on the exact content of the file, many file-reading functions (such as `read.table`) generate a data frame; it is the user’s responsibility to convert it to a matrix if it is to be used for matrix calculations.
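
The conversion can be sketched as follows. The data frame `df`, its sample names and its values are made up for illustration and do not come from a real file:

```R
# Hypothetical data frame with two numeric columns (illustrative values only)
df <- data.frame(SiO2 = c(47.3, 52.1, 60.4),
                 MgO  = c(8.5, 6.2, 2.9),
                 row.names = c("s1", "s2", "s3"))

is.data.frame(df)   # TRUE
is.matrix(df)       # FALSE

m <- as.matrix(df)  # all columns are numeric, so the result is a numeric matrix
is.matrix(m)        # TRUE
dim(m)              # 3 rows, 2 columns
```

Row and column names are carried over, so `m["s2", "MgO"]` still works after the conversion.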

#### matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE)
This command defines a matrix of `nrow` rows and `ncol` columns, filled with `data` (if `data` has several elements, they are filled in column by column, unless the extra parameter `byrow=TRUE` is provided). For instance:


In [37]:
x <- matrix(1:12,3,4)
x

         [,1] [,2] [,3] [,4]
    [1,]    1    4    7   10
    [2,]    2    5    8   11
    [3,]    3    6    9   12


In [38]:
x <- matrix(1:12,3,4,byrow=TRUE)
x

         [,1] [,2] [,3] [,4]
    [1,]    1    2    3    4
    [2,]    5    6    7    8
    [3,]    9   10   11   12


Note that, by default, both filling a matrix with data and dividing a matrix by a vector proceed along columns, not rows!
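
The column-wise behaviour of division can be checked on a small example. The names `x`, `v` and `by_col` are chosen here for illustration; `sweep()` is one possible way to divide each column by its own divisor, not the only one:

```R
x <- matrix(1:6, 2, 3)   # columns: (1,2), (3,4), (5,6)
v <- c(1, 2, 3)

# Plain division recycles v down the columns of x:
# the elements 1,2,3,4,5,6 are divided by 1,2,3,1,2,3
x / v

# To divide the j-th column by v[j] instead, sweep() can be used
by_col <- sweep(x, 2, v, "/")
by_col
```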

#### array(data = NA, dim = length(data))
Defines a new data array and fills it with `data`. The argument `dim` is a vector of length one or more, giving the extent (maximal index) of each dimension.
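
A minimal sketch of a three-dimensional array follows; the name `a` and the dimensions are arbitrary choices for illustration:

```R
# A 2 x 3 x 2 array filled with the numbers 1 to 12 (first index varies fastest)
a <- array(1:12, dim = c(2, 3, 2))

dim(a)       # 2 3 2
a[2, 3, 1]   # element in row 2, column 3 of the first layer
a[, , 2]     # the second layer, returned as a 2 x 3 matrix
```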


### 1.6.1	Matrix/data frame operations
Matrices can be subject to element-wise arithmetic using the common operators (`+`, `-`, `*`, `/`, `^`). As with vectors, the shorter operand is recycled as appropriate. Useful functions for matrix/data frame manipulations are summarized in the table below:

<table>
    <tr>
    <th>Function</th>
    <th>Meaning</th>
    </tr>
    <tr><td>`nrow(x)`</td><td>number of rows</td></tr>
    <tr><td>`ncol(x)`</td><td>number of columns</td></tr>
    <tr><td>`rownames(x)` </td><td>row names</td></tr>
    <tr><td>`colnames(x)`</td><td>column names</td></tr>
    <tr><td>`rbind(x,y)`</td>
        <td>binds two objects (matrices or data frames) of the same `ncol`
        (or vectors of the same length) as rows
        </td></tr>
    <tr><td>`cbind(x,y)`</td>
        <td>binds two objects (matrices or data frames) of the same nrow 
        (or vectors of the same length) as columns
        </td></tr>
    <tr><td>`t(x)` </td><td>transposition</td></tr>
    <tr><td>`apply(X,MARGIN,FUN)` </td>
    <td>applies function `FUN` (for vector manipulations) along the rows
    (`MARGIN` = 1) or columns (`MARGIN` = 2) of a data matrix X</td></tr>
    <tr><td>`x%*%y` </td>
        <td>matrix multiplication (does not work on data frames!)</td></tr>
    <tr><td>`solve(A)`</td><td>matrix inversion</td></tr>
    <tr><td>`diag(x)`</td><td>diagonal elements of a matrix</td></tr>
</table>
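
Several of the functions in the table can be tried on a small matrix. The objects `A`, `B`, `C` and `Ainv` below are made up for illustration:

```R
A <- matrix(c(2, 1, 1, 3), 2, 2)   # a small invertible matrix
B <- rbind(A, c(10, 20))           # add a row: 3 x 2
C <- cbind(A, c(10, 20))           # add a column: 2 x 3

dim(B)
dim(C)
t(C)                # transposition: 3 x 2
diag(A)             # diagonal elements: 2 and 3
Ainv <- solve(A)    # matrix inverse
A %*% Ainv          # (numerically) the identity matrix
```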

It is worth noting that matrix multiplication is performed using the `%*%` operator, whereas `*` multiplies element by element. Of the functions presented in the table, some explanation is required for `apply`:
#### apply(X, MARGIN, FUN,…)
If `X` is a matrix, it is split into vectors along rows (if `MARGIN` = 1) or columns (if `MARGIN` = 2). The function `FUN` is then applied to each of these vectors, with any optional parameters `…` passed on to it.
For instance, we can calculate row sums of a matrix:

In [39]:
x <- matrix(1:12,3,4,byrow=TRUE)
apply(x,1,sum)
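
The same call can be extended to columns and to functions with extra arguments. The copy `y` with one missing value is introduced here purely for illustration:

```R
x <- matrix(1:12, 3, 4, byrow = TRUE)

apply(x, 1, sum)    # row sums: 10 26 42
apply(x, 2, sum)    # column sums: 15 18 21 24

# Extra arguments are handed on to FUN, here na.rm = TRUE for mean()
y <- x
y[1, 1] <- NA
apply(y, 2, mean, na.rm = TRUE)   # column means ignoring the missing value
```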

## 1.7	Indexing/subsetting of vectors, arrays and data frames
In real life, one often needs to select those elements of a vector or a matrix that fulfil certain criteria. Such a selection can be made using logical conditions or logical variables placed in square brackets after the object name. Subsets can also be used on the left-hand side of an assignment when selected elements are to be replaced by certain values.
### 1.7.1	Vectors
Subsets of a vector may be selected by appending to the name of the vector an index vector in square brackets. For example, first create a named vector:

In [40]:
x <- c(1,12,15,NA,16,13,0,NA,NA)
names(x) <- c("Pl","Bt","Mu","Q","Kfs","Ky","Ol","Px","C") 
x

Index vectors can be of several types: logical, numeric (with positive or negative values), and character:

#### 1.	Logical vector

In [41]:
x[x>10] # all elements of x higher than 10 (or NA)
x[!is.na(x)] # all elements of x that are available

#### 2.	Numeric vector with positive values

In [42]:
x[1:5] # the first five elements
x[c(1,5,7)] # 1st, 5th and 7th elements

#### 3.	Numeric vector with negative values (specifies elements to be excluded)

In [43]:
x[-(1:5)] # all elements except for the first five

#### 4.	Character vector (referring to the element names)

In [44]:
x[c("Q","Bt","Mu")]
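
Index vectors can also appear on the left-hand side of an assignment. The sketch below works on a copy `y` of the named vector; the replacement values are arbitrary and serve only as an illustration:

```R
x <- c(1, 12, 15, NA, 16, 13, 0, NA, NA)
names(x) <- c("Pl", "Bt", "Mu", "Q", "Kfs", "Ky", "Ol", "Px", "C")

y <- x              # work on a copy
y[is.na(y)] <- 0    # logical index: replace all missing values by zero
y["Ol"] <- 5        # character index: replace a single named element
y[y > 14] <- 14     # logical index: cap all values above 14
y
```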

### 1.7.2	Matrices/data frames
Elements of a matrix are addressed in the order [row,column]. If the row or column index is left empty, all rows or columns are selected. For instance:
```R
x[1,] 	# (all columns) of the first row
x[,c(1,3)] 	# (all rows) of the first and third columns
x[1:3,-2] 	# all columns (apart from the 2nd) of rows 1–3
```
If the result is a single row or column, it is automatically converted to a vector. To prevent this behaviour, supply the optional parameter `drop=FALSE`, e.g.:
```R
x[1,,drop=FALSE] 	# (all columns) of the 1st row, keep as matrix
```
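
The effect of `drop` can be verified on a small matrix; the names `r1` and `r2` are chosen here for illustration:

```R
x <- matrix(1:12, 3, 4)

r1 <- x[1, ]                 # plain vector of length 4
r2 <- x[1, , drop = FALSE]   # still a matrix, 1 x 4

is.matrix(r1)     # FALSE
dim(r2)           # 1 4
dim(x[1:3, -2])   # 3 3: all rows, second column left out
```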

Moreover, matrices can be manipulated using index arrays. This concept is best explained by an example. Let’s assume a matrix defined as:
```R
x <- matrix(1:20,4,5)
```
If the elements `[1,3]`, `[2,2]` and `[3,1]` in `x` are to be replaced by zeroes, create an index array `i` containing the element coordinates, one element per row:

In [45]:
x <- matrix(1:20,4,5)
i <- matrix(c(1,2,3,3,2,1),3,2)
i

         [,1] [,2]
    [1,]    1    3
    [2,]    2    2
    [3,]    3    1


In [46]:
x[i]

In [47]:
x[i] <- 0
x

         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    5    0   13   17
    [2,]    2    0   10   14   18
    [3,]    0    7   11   15   19
    [4,]    4    8   12   16   20


Multidimensional arrays are indexed analogously, with one index (or one column of the index array) per dimension.
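
For a three-dimensional array, an index array simply has three columns. The array `a` and the coordinates in `i` below are arbitrary illustrative choices:

```R
a <- array(1:24, dim = c(2, 3, 4))

# Each row of i holds the [row, column, layer] coordinates of one element
i <- rbind(c(1, 1, 1),
           c(2, 3, 4))

picked <- a[i]   # the elements a[1,1,1] and a[2,3,4]
picked
a[i] <- 0        # replace just these two elements by zero
```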

## 1.8	Lists
Lists are ordered collections of other objects, known as components, which do not have to be of the same mode or type. Thus lists can be viewed as very loose groupings of R objects, involving various types of vectors, data frames, arrays, functions and even other lists. Components are numbered and hence can be addressed using their sequence number given in double square brackets, `x[[3]]`. 
Moreover, components may be named and referenced using an expression of the form list_name$component_name. Subsetting is similar to that of other objects, described above.
#### list.name <- list (component_name_1=, component_name_2=…) 
Builds a list with the given components.

Here is a simple real-life example of a list definition:

In [49]:
x1 <- c("Luckovice","9 km E of Blatna","disused quarry")
x2 <- "melamonzonite"
x3 <- c(47.31,1.05,14.94,7.01,8.46,10.33)
names(x3) <- c("SiO2","TiO2","Al2O3","FeO","MgO","CaO")
luckovice <- list(ID="Gbl-4",Locality=x1,Rock=x2,major=x3)
luckovice

Some examples of subsetting follow:

In [50]:
luckovice[[1]]

In [51]:
luckovice$Rock # or luckovice[[3]]

In [52]:
luckovice[[2]][3]

In [55]:
luckovice$major[c("SiO2","Al2O3")]
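
The difference between single and double square brackets, and the addition of a new component, can be sketched on a shortened list. The object `s` is a cut-down, hypothetical copy introduced only for this illustration:

```R
s <- list(ID = "Gbl-4", Rock = "melamonzonite")

names(s)                  # component names
s$Locality <- "Luckovice" # assigning to a new name adds a component
length(s)                 # now 3 components

class(s["Rock"])          # single brackets return a (shorter) list
class(s[["Rock"]])        # double brackets return the component itself
```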

## 1.9	Coercion of individual object types
R is generally reasonably good at dealing seamlessly with data types, converting them on the fly when needed and allowing the same operators to be used on different data types. When necessary, there is a series of functions for testing the mode or type of an object:
`is.numeric(x)`, `is.character(x)`, `is.logical(x)`, `is.matrix(x)`, `is.data.frame(x)`

At times there is a need to explicitly convert between data types/modes using functions such as:
`as.numeric(x)`, `as.character(x)`, `as.expression(x)`.

Less straightforward are:
`as.matrix(x)`, `as.data.frame(x)` 
which attempt to convert an object `x` to a matrix or data frame, respectively. A more user-friendly way of converting data frames to matrices is provided by the function `data.matrix`, which converts all the variables in a data frame `x` to numeric mode and then binds them together as the columns of a matrix.
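
The difference between `as.matrix` and `data.matrix` shows up when a data frame has mixed columns. The data frame `df` and its values below are made up for illustration:

```R
as.numeric("3.14")                  # character to numeric
as.character(1:3)                   # numeric to character
suppressWarnings(as.numeric("abc")) # not a number: NA (normally with a warning)

# Hypothetical data frame with one text and one numeric column
df <- data.frame(rock = c("granite", "basalt"), SiO2 = c(72.1, 49.5))

m1 <- as.matrix(df)     # mixed columns: everything becomes character
m2 <- data.matrix(df)   # everything becomes numeric; text is replaced by integer codes
mode(m1)
mode(m2)
```

The integer codes in `m2` carry no chemical meaning, so such columns are usually better dropped before the conversion.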

## 1.10	 Factors
Factors are vector objects used for discrete classification (grouping) of components in other vectors of the same length, matrices or data frames. In statistical applications, these often serve as categorical variables.
### 1.10.1	Basic usage of factors
#### factor(x)
Unordered factors are created by the function `factor`, where `x` is a vector of data, usually containing a small number of discrete values (known as levels). In this case the levels are stored in alphabetical order. For instance:

In [61]:
x <- c("Pl","Bt","Pl","Pl","Kfs","Pl","Bt","Pl",NA)
x.un <- factor(x)
x.un

#### ordered(x, levels)
This function defines a special type of factor in which the order of levels is specified explicitly using the parameter `levels`. Following the previous example:

In [62]:
x.or <- ordered(x,c("Pl","Kfs","Bt"))
x.or

#### levels(x)
Returns all possible values (levels) of the factor `x`. 

In [63]:
levels(x.un)
levels(x.or)
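
A few further operations show what the level order is used for. This is a sketch built on the same vector `x`; the use of `table` and `as.integer` is an addition for illustration:

```R
x <- c("Pl", "Bt", "Pl", "Pl", "Kfs", "Pl", "Bt", "Pl", NA)
x.un <- factor(x)
x.or <- ordered(x, c("Pl", "Kfs", "Bt"))

table(x.un)        # counts per level (NA is not counted): Bt 2, Kfs 1, Pl 5
as.integer(x.or)   # internal codes follow the level order: Pl = 1, Kfs = 2, Bt = 3
x.or < "Bt"        # comparisons are meaningful only for ordered factors
```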