# NB: Data Types

# R Data Types and Operators

R has several basic (atomic) data types; the five most commonly used are:

-   Numeric
-   Integer
-   Complex
-   Logical
-   Character
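As a quick illustration, here is a sketch that asks `class()` for the type name of one value of each kind. The values are arbitrary examples; `2+3i` is R's literal syntax for a complex number.

```r
# One arbitrary example value for each basic type
vals <- list(10.5, 3L, 2+3i, TRUE, "hello")

# class() reports the type name R uses for each value
types <- sapply(vals, class)
types  # "numeric" "integer" "complex" "logical" "character"
```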

## Numeric

Floating point numbers are called "numerics" in R.

It is the **default** type for numbers in R.

If we assign a decimal value to a variable `x`, `x` will be of numeric type:

```r
x <- 10.5      # assign a decimal value
x              # print the value of x
class(x)       # print the class name of x
```

Even if we assign a whole number to a variable `k`, it will still be stored as a numeric value.

```r
k <- 1
k              # print the value of k
class(k)       # print the class name of k
```

We can confirm that `k` is not an integer with `is.integer()`:

```r
is.integer(k)  # is k an integer?
```
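A related distinction, sketched below: `class()` reports "numeric", but the underlying storage type reported by `typeof()` is "double". Note also that `is.numeric()` returns `TRUE` for integers, so it does not tell the two apart.

```r
x <- 10.5
class(x)        # "numeric": the class name R reports
typeof(x)       # "double": the underlying storage type
typeof(1)       # "double": a whole number without L is still stored as a double
typeof(1L)      # "integer"
is.numeric(1L)  # TRUE: is.numeric() accepts integers too
```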

## Integers

One way to create an integer variable in R is with `as.integer()`.

```r
y <- as.integer(3)
y              # print the value of y
class(y)       # print the class name of y
is.integer(y)  # is y an integer?
```

We can also declare an integer by appending an `L` suffix.

```r
y <- 3L
is.integer(y)  # is y an integer?
```

We can coerce, or cast, a numeric value into an integer with `as.integer()`. The fractional part is discarded, not rounded.

```r
as.integer(3.14)    # coerce a numeric value; returns 3
```

We can also parse a string that contains a decimal number; again, the fractional part is discarded.

```r
as.integer("5.27")  # coerce a decimal string; returns 5
```

However, a string that does not represent a number cannot be parsed.

```r
as.integer("Joe")   # coerce a non-numeric string
```

The result is `NA`, along with the warning "NAs introduced by coercion".


Logical values can be converted to numbers this way, too: `TRUE` becomes 1 and `FALSE` becomes 0.

```r
as.integer(TRUE)    # the numeric value of TRUE
as.integer(FALSE)   # the numeric value of FALSE
```
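A short sketch of how `as.integer()` treats fractions: it truncates toward zero rather than rounding, which matters for negative numbers. If rounding is what you want, call `round()` first.

```r
as.integer(3.99)          # 3: the fractional part is dropped
as.integer(-3.99)         # -3: truncation is toward zero, not downward
as.integer(round(3.99))   # 4: round first if rounding is the goal
floor(-3.99)              # -4: floor() rounds down but returns a double
```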

## Math Operators

Numerics and integers are subject to the standard array of arithmetic
operations.

| **Operator**  | **Description**     | **Example**      |
|---------------|---------------------|------------------|
| `+`           | addition            | `5 + 2` is 7     |
| `-`           | subtraction         | `5 - 2` is 3     |
| `*`           | multiplication      | `5 * 2` is 10    |
| `/`           | division            | `5 / 2` is 2.5   |
| `^` or `**`   | exponentiation      | `5 ^ 2` is 25    |
| `x %% y`      | modulus (x mod y)   | `5 %% 2` is 1    |
| `x %/% y`     | integer division    | `5 %/% 2` is 2   |
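A sketch exercising each operator in the table, plus two details that often surprise: `/` always returns a double, and `%/%` rounds down, so with a negative dividend the remainder from `%%` takes the sign of the divisor.

```r
# Each operator from the table, applied to sample values
5 + 2        # 7
5 - 2        # 3
5 * 2        # 10
5 / 2        # 2.5
2^3          # 8
2**3         # 8: ** is accepted as a synonym for ^
5 %% 2       # 1
5 %/% 2      # 2

# With a negative dividend, %/% rounds down and %% takes the divisor's sign
(-5) %/% 2   # -3
(-5) %% 2    # 1

# / always yields a double; %/% on two integers stays integer
class(5L / 2L)    # "numeric"
class(5L %/% 2L)  # "integer"
```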

## Logical (Boolean)

A logical value is often produced from the **comparison** between
values.

```r
x <- 1
y <- 2      # sample values
z <- x > y  # is x larger than y?
z           # print the logical value
class(z)    # print the class name of z
```
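Comparisons are vectorized: applied to a vector, they return one logical value per element. A minimal sketch with hypothetical sample scores, also showing that `TRUE` counts as 1 in arithmetic and that a logical vector can select elements:

```r
scores <- c(3, 8, 5, 10)   # hypothetical sample values
passed <- scores >= 5      # comparison is applied element by element
passed                     # FALSE TRUE TRUE TRUE
sum(passed)                # 3: TRUE counts as 1, FALSE as 0
scores[passed]             # 8 5 10: a logical vector can select elements
```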

## Logical Operators

The standard logical operations are `&` (and), `|` (or), and `!`
(negation).

```r
u <- TRUE
v <- FALSE
u & v          # u AND v
u | v          # u OR v
!u             # negation of u
```
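`&` and `|` work element by element on vectors. R also has `&&` and `||`, which operate on single values and short-circuit: the right-hand side is evaluated only if it is needed. In this sketch the `stop()` calls are there only to show that they never run.

```r
c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)   # TRUE FALSE FALSE: element-wise
c(TRUE, FALSE) | c(FALSE, FALSE)               # TRUE FALSE
xor(TRUE, TRUE)                                # FALSE: exclusive or

# && and || take single values and short-circuit
FALSE && stop("never evaluated")   # FALSE
TRUE || stop("never evaluated")    # TRUE
```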

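You can use `T` and `F` as shorthand for `TRUE` and `FALSE`. Be aware, though, that `TRUE` and `FALSE` are reserved words, while `T` and `F` are ordinary variables that can be reassigned.

```r
a <- T
b <- F
a & b
```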
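Because `T` is just a variable, it can be shadowed. A sketch with a hypothetical helper `masked()` that defines a local `T`:

```r
# T and F are ordinary variables that default to TRUE and FALSE
masked <- function() {
  T <- 0       # shadows T inside this function only
  isTRUE(T)    # FALSE: here T is the number 0
}
masked()       # FALSE
isTRUE(T)      # TRUE, as long as T has not been reassigned globally
# TRUE <- 0    # would fail: TRUE is reserved and cannot be assigned to
```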

## Characters

A character object is used to represent string values in R.

This may be confusing if you are coming from a language where 'character'
means a single character, such as `A`.
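To make this concrete: one character value holds a whole string, so `length()` counts values while `nchar()` counts the letters inside a string. A sketch:

```r
s <- "Mary"
class(s)   # "character"
length(s)  # 1: a single character value (one string)
nchar(s)   # 4: the number of letters in that string
```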

We may convert non-character objects into characters with the
`as.character()` function:

```r
x <- as.character(3.14)
x
class(x)       # print the class name of x
```

## `paste()`

Character values can be concatenated with the `paste()` function, which separates them with a single space by default.

R does *not* overload the `+` operator for strings: `"Joe" + "Smith"` is an error.

```r
fname <- "Joe"
lname <- "Smith"
paste(fname, lname)
```

The separator can be changed with the `sep` argument:

```r
paste("A", "B", "C", sep="--")
```
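Two related options, sketched below: `paste0()` concatenates with no separator, and the `collapse` argument joins the elements of a vector into one string. `paste()` is also vectorized over its arguments.

```r
paste0("A", "B", "C")                    # "ABC": like paste() with sep = ""
paste(c("x", "y"), 1:2)                  # "x 1" "y 2": element by element
paste(c("A", "B", "C"), collapse = "+")  # "A+B+C": one string from a vector
```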

## `sprintf()`

It is often convenient to build a readable string with the `sprintf()` function, which uses C-style format strings such as `%s` (string) and `%d` (integer).

```r
sprintf("%s has %d dollars", "Sam", 100)
```
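A few more format specifiers, sketched with arbitrary values: precision for `%f`, zero padding and width for `%d`, left justification with `-`, and vectorization over the arguments.

```r
sprintf("%.2f", pi)      # "3.14": two decimal places
sprintf("%05d", 42L)     # "00042": zero-padded to width 5
sprintf("%-6s|", "ab")   # "ab    |": left-justified in 6 columns
sprintf("%s scored %d", c("Ann", "Bob"), c(90L, 85L))  # one string per element
```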

## `substr()`

To extract a substring, we apply the `substr()` function.

Here is an example that extracts the substring from the third through the twelfth character of a string (both positions are inclusive).

```r
substr("Mary has a little lamb.", start=3, stop=12)
```
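A sketch of two common follow-ups: combining `substr()` with `nchar()` to take characters from the end of a string, and the replacement form `substr<-`, which overwrites characters in place. Here the replacement text has the same length as the text it replaces.

```r
s <- "Mary has a little lamb."
substr(s, 1, 4)              # "Mary"
n <- nchar(s)                # 23 characters
substr(s, n - 4, n)          # "lamb.": the last five characters
substr(s, 1, 4) <- "Jane"    # substr<- overwrites characters in place
s                            # "Jane has a little lamb."
```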

## `sub()`

To replace the first occurrence of the word "little" with another word, "big", we apply the `sub()` function.

The pattern argument is interpreted as a regular expression.

```r
sub("little", "big", "Mary has a little lamb.")
```
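To replace *every* match rather than only the first, use `gsub()`. Because the pattern is a regular expression, characters such as `.` have special meaning; `fixed = TRUE` matches the pattern literally. A sketch:

```r
sub("a", "A", "banana")                    # "bAnana": only the first match
gsub("a", "A", "banana")                   # "bAnAnA": every match
gsub("\\s+", " ", "too    many   spaces")  # "too many spaces": a regex pattern
sub(".", "!", "a.b")                       # "!.b": "." matches any character
sub(".", "!", "a.b", fixed = TRUE)         # "a!b": "." matched literally
```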

# Factors

- Factors are, in effect, a kind of **data structure**: each element is stored as an integer code that points into a set of labels.

  - They can be created from character, numeric, or logical vectors.

- They categorize data into **levels**.

  - Levels are **distinct**, like sets in Python or lists of values (LOVs) in SQL.

  - Levels are **stored** alongside the vector.

  - Levels **constrain** what can be added to the factor vector.

  - Levels are always **character strings**, even when the data are numeric or logical.

- Useful in data frame columns that have a list of values which make sense as a group, such as:

  - Male, Female

  - True, False

  - 1, 2, 3, 4, 5

- Created with `factor()` taking a vector as input.

- Analogous to **Categoricals in pandas**.
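A minimal sketch of the internal representation described above, using a hypothetical `sex` vector:

```r
sex <- factor(c("Male", "Female", "Male"))
levels(sex)      # "Female" "Male": the distinct values, sorted
as.integer(sex)  # 2 1 2: each element is an index into the levels
typeof(sex)      # "integer": the codes are stored as integers
class(sex)       # "factor"
```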

**Example**

Take a vector of integers.

```r
v1 <- c(1,5,6,9,4,3,5,8,7,6,3,0,0,0,1,2,3,6,4,5,7,9)
```

Note that it has no levels associated with it:

```r
levels(v1)
```

```
NULL
```

Convert the vector into a factor.

```r
f1 <- factor(v1)
```

Now it has levels: the distinct values, sorted and converted to strings.

```r
levels(f1)
```
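Because the levels are strings and the elements are integer codes, converting a numeric factor back to numbers needs care: `as.integer()` returns the codes, not the original values. A sketch with a hypothetical factor `f`:

```r
f <- factor(c(10, 20, 10, 30))
as.integer(f)                 # 1 2 1 3: the internal codes, not the data
as.numeric(as.character(f))   # 10 20 10 30: the original values
as.numeric(levels(f))[f]      # 10 20 10 30: same result, converting each level once
```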

Printing the factor shows that the object contains two structures, the values and the levels:

```r
print(f1)
```

```
 [1] 1 5 6 9 4 3 5 8 7 6 3 0 0 0 1 2 3 6 4 5 7 9
Levels: 0 1 2 3 4 5 6 7 8 9
```


The factor's properties can be queried with functions such as `nlevels()`, which returns the number of levels:

```r
nlevels(f1)
```
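A sketch of a few more queries, rebuilding `f1` from the same data so it runs on its own: `length()` gives the number of elements and `table()` counts how often each level occurs.

```r
v1 <- c(1,5,6,9,4,3,5,8,7,6,3,0,0,0,1,2,3,6,4,5,7,9)
f1 <- factor(v1)
nlevels(f1)  # 10
length(f1)   # 22
table(f1)    # occurrences of each level, from "0" through "9"
```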

Levels act as a constraint on the factor vector.

If we try to assign a value that is not one of the existing levels, R does not add a new level. Instead, it issues a warning and stores `NA`.

```r
f1[5]          # the fifth element, currently 4
f1[5] <- 20    # 20 is not one of the levels
```

R warns: "invalid factor level, NA generated".

Note that the original value is lost as well: the fifth element is now `NA`!

```r
f1
```
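If a new value really is needed, one approach (a sketch, using a small hypothetical factor `f2`) is to extend the levels first and then assign:

```r
f2 <- factor(c(1, 5, 6, 9, 4))     # hypothetical small factor
levels(f2)                         # "1" "4" "5" "6" "9"
levels(f2) <- c(levels(f2), "20")  # first extend the set of allowed levels
f2[5] <- "20"                      # now a valid level, so no NA is generated
f2
```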

Be careful out there.