
---
title: R commands
weight: 10
draft: false
---

You do not need advanced knowledge of the R programming language to perform text analysis with quanteda, because the package provides a wide range of functions. However, you still have to understand a number of basic R commands.

## Basic R objects and commands

This section covers three basic types of R objects: vectors, data frames and matrices. Since many quanteda objects behave similarly to these objects, it is essential to understand how to interact with them.

### Vectors

R is a language for statistical analysis, and its most basic objects are vectors. A vector contains a set of values of the same type. In the examples below, `num_vec` is a numeric vector, while `char_vec` is a character vector. We use `c()` to combine elements into a vector and `<-` to assign a vector to a variable.

```r
num_vec <- c(1, 5, 6, 3)
print(num_vec)
```

```
## [1] 1 5 6 3
```

```r
char_vec <- c('apple', 'banana', 'mandarin', 'melon')
print(char_vec)
```

```
## [1] "apple"    "banana"   "mandarin" "melon"
```
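
To check what kind of vector you have created, base R offers `class()` and `length()`. The short sketch below also shows what happens when types are mixed; `mixed_vec` is a hypothetical name introduced only for this illustration.

```r
num_vec <- c(1, 5, 6, 3)
char_vec <- c('apple', 'banana', 'mandarin', 'melon')

# class() reports the type of a vector, length() its number of elements
num_class <- class(num_vec)      # "numeric"
char_class <- class(char_vec)    # "character"
n_elem <- length(char_vec)       # 4

# A vector holds a single type, so mixing a number and a string
# converts every element to character
mixed_vec <- c(1, 'apple')
mixed_class <- class(mixed_vec)  # "character"

print(c(num_class, char_class, mixed_class))
print(n_elem)
```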

Once a vector is created, you can extract its elements with the `[]` operator and the index numbers of the desired elements. Index numbers start at 1.

```r
print(num_vec[1])
```

```
## [1] 1
```

```r
print(num_vec[1:2])
```

```
## [1] 1 5
```

```r
print(char_vec[c(1, 3)])
```

```
## [1] "apple"    "mandarin"
```
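
The `[]` operator accepts more than positive index numbers. As a minimal sketch using the same vectors, negative index numbers drop elements, and a logical vector of the same length keeps the elements marked `TRUE`. The variable names `dropped_first`, `dropped_two` and `picked` are hypothetical.

```r
num_vec <- c(1, 5, 6, 3)
char_vec <- c('apple', 'banana', 'mandarin', 'melon')

# A negative index removes the element at that position
dropped_first <- num_vec[-1]                          # 5 6 3
dropped_two <- num_vec[-c(1, 2)]                      # 6 3

# A logical index keeps the elements where the value is TRUE
picked <- char_vec[c(TRUE, FALSE, TRUE, FALSE)]       # "apple" "mandarin"

print(dropped_first)
print(dropped_two)
print(picked)
```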

You can apply arithmetical operations such as addition, subtraction, multiplication or division on numeric vectors. If only a single value is given for multiplication, for example, each element of the vector will be multiplied by the same value.

```r
num_vec2 <- num_vec * 2
print(num_vec2)
```

```
## [1]  2 10 12  6
```
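
When both operands are vectors of the same length, the operation is applied element by element. The sketch below assumes a hypothetical `price_vec` holding one made-up price per element of `num_vec`.

```r
num_vec <- c(1, 5, 6, 3)
price_vec <- c(2, 1, 3, 4)  # hypothetical values, for illustration only

# Elements at the same positions are multiplied with each other
total_vec <- num_vec * price_vec   # 2 5 18 12
print(total_vec)

# sum() adds up all elements of a numeric vector
grand_total <- sum(total_vec)      # 37
print(grand_total)
```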

You can also compare elements of a vector with relational operators such as `==`, `!=`, `>=`, `>`, `<=` and `<`. The result of these operations is a logical vector that contains `TRUE` or `FALSE` for each element.

```r
logi_ge5_vec <- num_vec >= 5
print(logi_ge5_vec)
```

```
## [1] FALSE  TRUE  TRUE FALSE
```
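
A logical vector is most useful in combination with `[]`. As a small sketch, the comparison result can select the matching elements, `sum()` counts the `TRUE` values, and `which()` returns their positions. The names `logi_vec`, `large_vec`, `n_large` and `pos_large` are hypothetical.

```r
num_vec <- c(1, 5, 6, 3)
logi_vec <- num_vec >= 5

# Keep only the elements for which the comparison is TRUE
large_vec <- num_vec[logi_vec]   # 5 6
print(large_vec)

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
n_large <- sum(logi_vec)         # 2
print(n_large)

# which() returns the index numbers of the TRUE elements
pos_large <- which(logi_vec)     # 2 3
print(pos_large)
```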

You cannot apply arithmetical operations to character vectors, but you can apply the equality operator.

```r
logi_apple_vec <- char_vec == 'apple'
print(logi_apple_vec)
```

```
## [1]  TRUE FALSE FALSE FALSE
```
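
The sketch below illustrates both halves of that statement. Arithmetic on a character vector raises an error, which is caught here with `tryCatch()` so that the script keeps running, while `!=` and `%in%` (which tests membership in a set of values) work as expected. The variable names are hypothetical.

```r
char_vec <- c('apple', 'banana', 'mandarin', 'melon')

# Arithmetic on characters fails; tryCatch() captures the error message
arith_result <- tryCatch(char_vec + 1, error = function(e) conditionMessage(e))
print(arith_result)

# != is the negation of ==
not_apple <- char_vec != 'apple'                  # FALSE TRUE TRUE TRUE
print(not_apple)

# %in% checks each element against several values at once
is_round <- char_vec %in% c('apple', 'melon')     # TRUE FALSE FALSE TRUE
print(is_round)
```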

You can also concatenate elements of character vectors using `paste()`. Since the two vectors in the example have the same length, elements at the same positions of the vectors are concatenated.

```r
char_vec2 <- paste(c('red', 'yellow', 'orange', 'green'), char_vec)
print(char_vec2)
```

```
## [1] "red apple"       "yellow banana"   "orange mandarin" "green melon"
```
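
`paste()` has a few options that are worth knowing. In the sketch below, a single value is repeated for every element, `sep` changes the separator between the pasted values, and `collapse` joins all elements into one string. The variable names are hypothetical.

```r
char_vec <- c('apple', 'banana', 'mandarin', 'melon')

# A single value is reused for each element of the longer vector
fresh_vec <- paste('fresh', char_vec)
print(fresh_vec)

# sep sets the separator placed between the pasted values
id_vec <- paste(char_vec, 1:4, sep = '_')
print(id_vec)

# collapse joins all elements into a single string
all_fruit <- paste(char_vec, collapse = ', ')
print(all_fruit)
```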

Finally, you can assign names to the elements of a vector using `names()`.

```r
names(num_vec) <- char_vec
print(num_vec)
```

```
##    apple   banana mandarin    melon
##        1        5        6        3
```
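
Once a vector has names, its elements can be extracted by name as well as by index number. A minimal sketch, with hypothetical variable names for the results:

```r
num_vec <- c(1, 5, 6, 3)
names(num_vec) <- c('apple', 'banana', 'mandarin', 'melon')

# Extract one or more elements by name
banana_count <- num_vec['banana']
print(banana_count)
two_counts <- num_vec[c('apple', 'melon')]
print(two_counts)

# names() without assignment retrieves the names,
# and unname() returns the values without them
fruit_names <- names(num_vec)
plain_vec <- unname(num_vec)
print(fruit_names)
print(plain_vec)
```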

### Data frames

A data frame combines multiple vectors to construct a dataset. You can combine vectors into a data frame only if they have the same length. However, they can be of different types. `nrow()` and `ncol()` show the number of rows (observations) and columns (variables) in a data frame. Because `num_vec` was given names above, those names also appear as the row names of the data frame.

```r
fruit_df <- data.frame(name = char_vec, count = num_vec)
print(fruit_df)
```

```
##              name count
## apple       apple     1
## banana     banana     5
## mandarin mandarin     6
## melon       melon     3
```

```r
print(nrow(fruit_df))
```

```
## [1] 4
```

```r
print(ncol(fruit_df))
```

```
## [1] 2
```
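
Individual variables of a data frame can be accessed with the `$` operator, which returns them as vectors. The sketch below rebuilds the data frame without row names and adds a hypothetical `double_count` column; `stringsAsFactors = FALSE` is set explicitly so that the result does not depend on the R version.

```r
fruit_df <- data.frame(name = c('apple', 'banana', 'mandarin', 'melon'),
                       count = c(1, 5, 6, 3),
                       stringsAsFactors = FALSE)

# $ extracts a column as a vector
count_vec <- fruit_df$count          # 1 5 6 3
print(count_vec)
mean_count <- mean(fruit_df$count)   # 3.75
print(mean_count)

# Assigning to a new name with $ adds a column
fruit_df$double_count <- fruit_df$count * 2
print(colnames(fruit_df))
print(ncol(fruit_df))
```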

You can use `subset()` to select records in the data frame.

```r
fruit_df2 <- subset(fruit_df, count >= 5)
print(fruit_df2)
```

```
##              name count
## banana     banana     5
## mandarin mandarin     6
```

```r
print(nrow(fruit_df2))
```

```
## [1] 2
```

```r
print(ncol(fruit_df2))
```

```
## [1] 2
```
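
`subset()` can also restrict the columns through its `select` argument, and the same rows can be selected with `[]` and a logical vector. The following sketch uses hypothetical names (`name_df`, `bracket_df`) and rebuilds the data frame so that it runs on its own.

```r
fruit_df <- data.frame(name = c('apple', 'banana', 'mandarin', 'melon'),
                       count = c(1, 5, 6, 3),
                       stringsAsFactors = FALSE)

# select keeps only the listed columns for the matching rows
name_df <- subset(fruit_df, count >= 5, select = name)
print(name_df)

# The same row selection written with [] and a logical vector;
# the empty position after the comma means "all columns"
bracket_df <- fruit_df[fruit_df$count >= 5, ]
print(bracket_df)
```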

{{% notice tip %}} We use `print()` to show the values and structure of objects in the examples, but you do not need to call `print()` in the console, because R automatically prints the value of an expression that is evaluated at the top level and not assigned to a variable. {{% /notice %}}

### Matrices

Similar to a data frame, a matrix contains two-dimensional data. In contrast to a data frame, its values must all be of the same type.

```r
mat <- matrix(c(1, 3, 6, 8, 3, 5, 2, 7), nrow = 2)
print(mat)
```

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    6    3    2
## [2,]    3    8    5    7
```
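
As the output shows, `matrix()` fills the values column by column by default. The sketch below contrasts this with `byrow = TRUE`, and shows that mixing types in a matrix converts all values to one type. The names `mat_bycol`, `mat_byrow` and `mat_mixed` are hypothetical.

```r
values <- c(1, 3, 6, 8, 3, 5, 2, 7)

# Default: the first column is filled first, then the second, and so on
mat_bycol <- matrix(values, nrow = 2)
print(mat_bycol[1, ])   # 1 6 3 2

# byrow = TRUE fills the first row first
mat_byrow <- matrix(values, nrow = 2, byrow = TRUE)
print(mat_byrow[1, ])   # 1 3 6 8

# A matrix holds a single type: the number becomes a character value
mat_mixed <- matrix(c(1, 'a'), nrow = 1)
print(is.character(mat_mixed))
```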

You can use `colnames()` or `rownames()` to set or retrieve the names of the columns or rows of a matrix.

```r
colnames(mat) <- char_vec
print(mat)
```

```
##      apple banana mandarin melon
## [1,]     1      6        3     2
## [2,]     3      8        5     7
```

```r
rownames(mat) <- c('bag1', 'bag2')
print(mat)
```

```
##      apple banana mandarin melon
## bag1     1      6        3     2
## bag2     3      8        5     7
```

You can obtain the size of a matrix with `dim()`, which returns a two-element vector containing the number of rows and the number of columns.

```r
print(dim(mat))
```

```
## [1] 2 4
```
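
`nrow()` and `ncol()` work on matrices just as they do on data frames, and correspond to the two elements returned by `dim()`. A short sketch, which also uses `t()` to transpose the matrix so that the two dimensions swap; `mat_t` is a hypothetical name.

```r
mat <- matrix(c(1, 3, 6, 8, 3, 5, 2, 7), nrow = 2)

n_row <- nrow(mat)   # 2, the first element of dim(mat)
n_col <- ncol(mat)   # 4, the second element of dim(mat)
print(c(n_row, n_col))

# t() transposes the matrix: rows become columns
mat_t <- t(mat)
print(dim(mat_t))    # 4 2
```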

If a matrix has column and row names, you can extract rows or columns by their names.

```r
print(mat['bag1', ])
```

```
##    apple   banana mandarin    melon
##        1        6        3        2
```

```r
print(mat[, 'banana'])
```

```
## bag1 bag2
##    6    8
```
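
Row and column names can be combined in a single `[]` call. The sketch below builds the same named matrix through the `dimnames` argument of `matrix()` so that it runs on its own, then extracts a single cell, several columns, and a row that stays a matrix thanks to `drop = FALSE`. The result names are hypothetical.

```r
mat <- matrix(c(1, 3, 6, 8, 3, 5, 2, 7), nrow = 2,
              dimnames = list(c('bag1', 'bag2'),
                              c('apple', 'banana', 'mandarin', 'melon')))

# A row name and a column name together select a single value
cell <- mat['bag2', 'melon']            # 7
print(cell)

# A character vector selects several columns at once
sub_mat <- mat[, c('apple', 'melon')]
print(sub_mat)

# drop = FALSE keeps the result a matrix instead of a vector
row_mat <- mat['bag1', , drop = FALSE]
print(dim(row_mat))                     # 1 4
```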

Finally, you can obtain the marginal sums of a matrix with `colSums()` or `rowSums()`.

```r
print(rowSums(mat))
```

```
## bag1 bag2
##   12   23
```

```r
print(colSums(mat))
```

```
##    apple   banana mandarin    melon
##        4       14        8        9
```
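
Base R also provides `rowMeans()` and `colMeans()` for marginal averages. As a quick sanity check, the sketch below confirms that the row sums and the column sums both add up to the total of all values in the matrix. The result names are hypothetical.

```r
mat <- matrix(c(1, 3, 6, 8, 3, 5, 2, 7), nrow = 2,
              dimnames = list(c('bag1', 'bag2'),
                              c('apple', 'banana', 'mandarin', 'melon')))

# Marginal averages per row and per column
row_means <- rowMeans(mat)   # bag1: 3, bag2: 5.75
col_means <- colMeans(mat)   # 2, 7, 4, 4.5
print(row_means)
print(col_means)

# Both sets of marginal sums add up to the grand total
total <- sum(mat)            # 35
print(total)
print(sum(rowSums(mat)) == total)
print(sum(colSums(mat)) == total)
```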

{{% notice tip %}} If you want to know the details of R commands, prepend `?` to the command and execute. For example, `?subset()` will show you how to use the subset function with different types of objects. {{% /notice %}}