# Data Types

One of the features of computers that can be frustrating to new programmers is their inflexibility, and one place this inflexibility is most evident is in the fact that programs like R store data in discrete *types*. In this section, we'll go over the three data types you'll encounter most -- `numeric`, `character`, and `logical` -- as well as what it means for R to have these discrete data types.

## Types and Their Uses

The three types of data you'll work with most in R are `numeric`, `character`, and `logical`, and each has an important role to play for the social science researcher.

`numeric` data, as the name implies, stores numbers. This may be people's ages or incomes, countries' GDPs and infant mortality rates, global temperatures, or survey responses on a scale from 1 to 7. Of the types we'll talk about here, `numeric` is the one designed for mathematical operations like multiplying, dividing, adding, and subtracting.

`character` data is text data. It could be something short, like a survey respondent's name, or something longer, like the content of a tweet or a politician's speech.  

Finally, `logical` data takes on only two values: `TRUE` and `FALSE` (note those values have to be written in all capitals to be recognized by R!). `logical` data can store information about the world (you could have a variable called `female` that only has values of `TRUE` and `FALSE`), but more often we use it for testing. For example, suppose we wanted to test whether a survey respondent is at least 18 to evaluate whether they are eligible to vote. We'd ask whether `age >= 18`, which would evaluate to `TRUE` or `FALSE`.
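
As a minimal sketch of that voting-age test (the variable names and the two ages below are made up purely for illustration):

```r
# Hypothetical respondent ages, chosen only for illustration
age_respondent_1 <- 21
age_respondent_2 <- 16

# Each comparison evaluates to a logical value
eligible_1 <- age_respondent_1 >= 18
eligible_2 <- age_respondent_2 >= 18

eligible_1
eligible_2
```

The first comparison evaluates to `TRUE` and the second to `FALSE`, and those results can be stored in variables just like any other data.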

## Working with Data Types

Unlike in some programming languages, when you assign a value to a variable in R, you don't have to declare the type of data in advance (something that makes R much easier to use than many other languages!). Instead, R will infer the type from what you've passed. Namely:

- If you type out a number, R will assume it's `numeric`. 
  - (If you've worked in other programming languages, you've probably heard these referred to as `floats`.)
- If you type out something and put it in double-quotation marks (`"like this"`), R will treat it as a `character`.
  - (If you've worked in other programming languages, you've probably heard these referred to as `strings`.)
- If R sees `TRUE` or `FALSE`, or an expression that evaluates to `TRUE` or `FALSE` (like `7 > 3`, which is obviously `TRUE`), it will treat it as `logical`.
  - (If you've worked in other programming languages, you've probably heard these referred to as `bools` or `booleans`.)

To illustrate, let's play around a little. Note that we can always check the type of data by passing our data, or the variable to which the data has been assigned, to the `class()` function, which then returns the type of the input. For example:

```r
pi <- 3.1416
class(pi)
```

```r
mystery_novel <- "'Twas a dark and stormy night"
class(mystery_novel)
```

```r
my_logical <- 7 < 3
class(my_logical)
```

It's worth emphasizing here that putting a variable into a function is exactly the same as putting the value assigned to that variable into the function. This is, indeed, one of those core ideas about how most programming languages work: a variable is just a stand-in for the value that has been assigned to it, and R will treat them interchangeably. For example:

```r
# Evaluating the variable pi
pi <- 3.1416
class(pi)
```

```r
# Has the same effect as just putting
# in the value directly!
class(3.1416)
```

## Operations and Data Types

Data types aren't just about helping R remember what kind of data has been assigned to a variable -- they also affect how some operators (like `+`) are interpreted.

For example, if I put `+` between two numeric variables, R will do the obvious thing and add them up:

```r
a <- 10
b <- 2
a + b
```

But if I try to put a `+` between two character variables, R will stop and say "WAIT A MINUTE! I don't know how to add two characters!"

```r 
a <- "Lyra"
b <- "Belacqua"
a + b

#> Error in a + b : non-numeric argument to binary operator
```

One place where this behavior can be confusing is when R has stored numbers as characters -- something that happens a lot when you are importing a file. That's because, as I said before, computers are *really* inflexible. For example, suppose you had the following code:

```r
a <- "5"
b <- "4"
a + b
```

Now, if I asked you what that should print out, I'm sure you'd say "9", because you're smart and can recognize what I *want*. But R can't do that -- it sees you trying to add two character variables and says "nope, sorry! Can't do that."

```r
a <- "5"
b <- "4"
a + b

#> Error in a + b : non-numeric argument to binary operator
```
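
When you hit this error, one quick way to see what went wrong is to check the type with `class()`. The sketch below reuses the `a` and `b` from above. It also uses `is.numeric()`, a base R function not otherwise covered in this section, which asks the same question as a `TRUE`/`FALSE` test:

```r
a <- "5"
b <- "4"

# The quotation marks make these character data,
# even though they look like numbers to us
class(a)
class(b)

# is.numeric() answers "is this numeric?" with a logical value
is.numeric(a)
```

Both calls to `class()` report `"character"`, and `is.numeric(a)` is `FALSE`, which is why `+` refuses to work here.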




Which brings us to...

## Converting Data Types

From time to time, you'll want to move between data types, and for that we have a few special functions, all with the same naming structure: `as.numeric()`, `as.character()`, and `as.logical()`. Each of these will take a variable and *try* to convert it to the type named. If it can't, it returns `NA` (R's marker for a missing value), usually with a warning, rather than stopping with an error.

So let's do our example above again using `as.numeric()`:


```r
a <- "5"
b <- "4"
a <- as.numeric(a)
b <- as.numeric(b)
a + b
```

Ta-da! 

(See how I assigned the return value of `as.numeric(a)` to the variable `a`, overwriting the old value? Remember you have to assign those return values if you want them remembered!)
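
The other conversion functions work the same way. The sketch below uses made-up values and variable names to show `as.character()` and `as.logical()`, along with what happens when a conversion isn't possible: R hands back `NA` and, in the case of `as.numeric()`, prints a warning.

```r
# Number to text (27708 is just an illustrative value)
zip_code <- as.character(27708)
class(zip_code)

# Text to logical
voted <- as.logical("TRUE")
class(voted)

# Text that isn't a number can't be converted:
# the result is NA, and R prints a warning about coercion
not_a_number <- as.numeric("five")
not_a_number
```

That last case is worth watching for when importing files, since a single stray piece of text in a column of numbers will quietly turn into `NA` when converted.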

## More on Logical

OK, before we move on, let's talk a little more about logical data, as its usefulness is perhaps the least evident at the moment. Logicals are important because of how often in data analysis or data wrangling we want to evaluate logical statements, and the results of those need to take the form of `TRUE` and `FALSE`.

To illustrate, here are a number of examples where logicals come into play:

```r
# Simple math tests, like inequalities
7 > 5
```

```r
-1 >= 10
```

The other place logical data types come up a lot is when we want to test whether two things are the same (equal). Because we can use `=` to assign values to variables (see note on assignment operators [here](introduction.ipynb)), we can't test whether two things are equal by typing something like `a = 5` -- because R wouldn't know if you're assigning 5 to `a`, or asking whether the value already assigned to `a` is equal to 5.

So to evaluate whether two things are equal, we use a double-equal sign (`==`). For example:

```r
a <- 5
b <- 5
a == b
```

```r
c <- 7
a == c
```
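
To make the difference between the two operators concrete, here is a short sketch (the variable `x` is just for illustration): `==` only asks a question and leaves the variable alone, while a single `=` changes what is stored.

```r
x <- 5

# Double equals: a test. x is unchanged,
# and the result is a logical value.
x == 7
x

# Single equals: an assignment. x now holds 7.
x = 7
x
```

The test `x == 7` evaluates to `FALSE` and `x` is still 5 afterwards. After `x = 7`, the old value is gone.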

## Exercises

stuff here.