# Base R Part 1
This is a Jupyter Notebook. Essentially, this is a file where you can both write text (like this) and code (as you will see below). The goal of this lecture is to get you comfortable with basic commands in the R programming language, which we will call R for short.


# Comments and Basic Data Types
In this first section, we will discuss writing comments and learn about basic data types (numeric, character, and logical, which is also called Boolean).


In [6]:
# This is a comment, you can use '#' to write notes to yourself in your code
# - Comments can make or break good coders
# - Good comments also create coders who can collaborate with others
# - If you ever think you're writing "too" many comments, you are not
# - The things you think are obvious in your code won't be to others
# - (nor yourself in a year when you get back to a project)

# Typing ? before a function name opens its help page
?read.csv

(Output: the R Documentation page `read.table {utils}`, which also covers `read.csv`. It describes the arguments `file`, `header`, `sep`, `quote`, `dec`, `numerals`, `row.names`, `col.names`, `as.is`, and `tryLogical`, among others.)


In [0]:
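## Example adapted from the ?read.table help page:
## using count.fields to handle an unknown maximum number of fields
## when fill = TRUE
test1 <- c(1:5, "6,7", "8,9,10")
tf <- tempfile()          # path to a temporary file
writeLines(test1, tf)     # write one element of test1 per line
tf

# Count the comma-separated fields on each line and keep the largest count
max_fields <- max(count.fields(tf, sep = ","))
max_fields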

In [0]:
# Comments are written in the same spot as code
# But they are ignored by the computer

# Let's run some code
print("Hello, World!")


Congrats! You just ran your first line of code and joined the thousands of people whose first line of code was "Hello, World!" as well. Welcome to the tradition. 

Let's now run our first operation. 

In [0]:
2 + 3

Below the code block, we can see that it "returned" `5`. This means that the computer "executed" the line of code `2 + 3` and printed the result.

Now, let's create a variable and store a value in it. Creating variables is a core piece of programming. 

In [0]:
# in R, we assign a value to a variable with the following arrow symbol: <-
a <- 2
b <- 3

a + b


In [0]:
# The nice thing about a variable is that we can change the value assigned to it

# Let's change the value of our variable a
a <- 6

a + b

Let's now discuss some basic data types. 

In [0]:
# Numeric -- integer: whole numbers, no decimal point
# The L suffix stores the value as an integer; a plain 1 is stored as a double
myInt <- 1L
myInt

# Numeric -- double/float: decimal points
myNum <- 2.4
myNum

In [0]:
# character (string)
myChar_a <- "a"
myChar_a

myChar_b <- 'b'
myChar_b

In [8]:
# You can have a variable that stores a character value,
# but the character value is a number

trick_q <- "1"
trick_q

Notice that the `"1"` has quotation marks around it, as the `"a"` and `"b"` did. This tells us that it is a character data type. The output of `myInt` is also `1`, but without quotation marks, because `myInt` is a numeric variable while `trick_q` is a character (or string) variable.
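
One way to confirm the difference is to ask R for the type directly. The sketch below uses base R's `class()` and `as.numeric()`; the variable names `num_one`, `chr_one`, and `converted` are made up for this example.

```r
# A number and a character string that look alike when printed
num_one <- 1
chr_one <- "1"

class(num_one)   # "numeric"
class(chr_one)   # "character"

# Arithmetic does not work on a character value, so convert it first
converted <- as.numeric(chr_one)
converted + 1    # 2
```

If a conversion is not possible (for example `as.numeric("a")`), R returns `NA` with a warning rather than a number.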


In [9]:
# logical (Boolean/indicator variable): a TRUE/FALSE value
# Comparisons such as < and > evaluate to TRUE or FALSE; the parentheses are optional but aid readability
myBool_1 <- (3 < 4)
myBool_1

myBool_2 <- (3 > 4)
myBool_2
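
As a small sketch of how comparisons behave, the example below drops the parentheses and adds the equality operators `==` and `!=`. The variable names are illustrative only.

```r
# Comparisons return logical values with or without parentheses
is_less  <- 3 < 4     # TRUE
is_equal <- 3 == 4    # FALSE; == compares, while <- assigns
is_diff  <- 3 != 4    # TRUE

class(is_less)        # "logical"
c(is_less, is_equal, is_diff)
```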

## Ways to store data


In [10]:
# Vector: holds values of only one data type (numeric, logical, or string)

# numeric vector
myVec_n <- c(1, 2, 3, 4, 5)
myVec_n

# string vector
myVec_s <- c(myChar_a, "b", "c")
myVec_s

# Tricky! Mixing numbers and strings produces a string vector
myVec_string <- c(1, "b", "c")
myVec_string # notice the 1 has become the character "1" because the other elements are strings

Vectors are a useful way to store multiple observations. However, they hold only a single column of information. In EDS, we are interested in relating different variables to one another (*e.g.,* How does temperature affect a lion's hunting success rate?).
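
To see the one-type rule in action, here is a brief sketch that checks the type of a mixed vector and pulls out a single element with square brackets. The vector names `nums` and `mixed` are hypothetical.

```r
nums  <- c(10, 20, 30)
mixed <- c(1, "b", TRUE)   # every element is converted to character

class(nums)    # "numeric"
class(mixed)   # "character"
mixed          # "1" "b" "TRUE"

nums[2]        # second element: 20 (R starts counting at 1)
length(nums)   # 3
```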

To relate multiple variables to one another, we need to make collections of data. The most basic way to do this is with a **matrix**.

In [11]:
# We already have a vector of numeric data
myVec_n

# Matrix: holds only one data type
# matrix() fills column by column by default
myMat_n <- matrix(c(myVec_n, # first elements will come from our pre-existing vector
                    6, 7, 8, 9, 10),
                  nrow = 2,
                  ncol = 5)

myMat_n

| [,1] | [,2] | [,3] | [,4] | [,5] |
|---|---|---|---|---|
| 1 | 3 | 5 | 7 | 9 |
| 2 | 4 | 6 | 8 | 10 |
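
In the matrix above, the values run down each column because `matrix()` fills column by column by default. As a sketch of the alternative, the `byrow = TRUE` argument fills across rows instead. The object names `vals`, `by_col`, and `by_row` are illustrative.

```r
vals <- 1:10

by_col <- matrix(vals, nrow = 2, ncol = 5)                # default: fill column by column
by_row <- matrix(vals, nrow = 2, ncol = 5, byrow = TRUE)  # fill row by row

by_col[1, ]   # first row: 1 3 5 7 9
by_row[1, ]   # first row: 1 2 3 4 5
dim(by_row)   # 2 rows, 5 columns
```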


We can also collect objects in a "list". Lists are very powerful, but more advanced. For now, just know that they also exist. 

In [12]:
# Lists: Very powerful, but somewhat confusing. For now, just know they exist
myList <- list(2, "c", myMat_n)
myList[[1]] # returns numeric 
myList[[2]] # returns string
myList[[3]] # returns matrix

| [,1] | [,2] | [,3] | [,4] | [,5] |
|---|---|---|---|---|
| 1 | 3 | 5 | 7 | 9 |
| 2 | 4 | 6 | 8 | 10 |


## Data Frames

- Like matrices, they organize data into rows and columns
- Unlike matrices, each column can hold a different data type
- Reference a specific column using the `$` operator, followed by the name of the column
- For the most part, you'll load new data by reading a CSV file
- You might have to create one by hand at some point
- Looking at how they're created gives a better sense of what goes into them
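
Since most data will arrive as a CSV, here is a minimal sketch of that workflow. It writes a small made-up table to a temporary file so that the example runs on its own; the column names and values are invented, and in practice you would pass the path of your own file to `read.csv()`.

```r
# Create a small CSV in a temporary file (a stand-in for a real data file)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,temp,num_hunts",
             "A,31.5,2",
             "B,28.0,4"),
           csv_path)

# read.csv() returns a data frame; the first line supplies the column names
hunts <- read.csv(csv_path)
hunts

class(hunts)       # "data.frame"
hunts$num_hunts    # the num_hunts column: 2 4

unlink(csv_path)   # delete the temporary file
```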

Let's convert our matrix into a data frame. 


In [13]:
# Data frame: can have multiple data types
myDF <- as.data.frame(myMat_n)
colnames(myDF) # print the column names

These default column names (`V1` through `V5`) don't mean anything to me. Let's assign more descriptive ones. Each column is a variable on which you may have collected data.

In [14]:
colnames(myDF) <- c("num_hunts", "temp", "num_adults", "num_cubs", "distance_from_road")
colnames(myDF)


We can now look at a single column using the '$' operator. 

In [15]:
# Investigate one column
myDF$num_adults

Let's create a new column. 

In [16]:
# Create a new column
myDF$total_lions <- myDF$num_adults + myDF$num_cubs
myDF

| num_hunts | temp | num_adults | num_cubs | distance_from_road | total_lions |
|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 3 | 5 | 7 | 9 | 12 |
| 2 | 4 | 6 | 8 | 10 | 14 |


Let's build a data frame with multiple data types. (Our lion data frame has only numeric columns.)

In [17]:
# Create the data frame
myPpl <- data.frame(
   name = c("Andie", "Sam", "Bill"),
   gender = c("Female", "non-binary", "Male"),
   male = c(FALSE, FALSE, TRUE),
   income_cat = c("middle", "poor", "rich"),
   park_dist_mi = c(1, 0.5, 0.1)
)
myPpl


| name | gender | male | income_cat | park_dist_mi |
|---|---|---|---|---|
| `<chr>` | `<chr>` | `<lgl>` | `<chr>` | `<dbl>` |
| Andie | Female | FALSE | middle | 1.0 |
| Sam | non-binary | FALSE | poor | 0.5 |
| Bill | Male | TRUE | rich | 0.1 |
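
To check the type of every column at once, one option is base R's `str()`, or `sapply()` combined with `class`. The sketch below rebuilds a smaller version of the data frame, named `ppl` here so that it does not overwrite `myPpl` and runs on its own.

```r
ppl <- data.frame(
  name = c("Andie", "Sam", "Bill"),
  male = c(FALSE, FALSE, TRUE),
  park_dist_mi = c(1, 0.5, 0.1)
)

str(ppl)                          # compact summary: dimensions, column types, first values

col_types <- sapply(ppl, class)   # named character vector with the class of each column
col_types
```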


Now that we have built our data frame, we can reference its columns.

There are multiple ways to reference a column. The first is by column name.

In [18]:
# Try referencing one column
myPpl$name # version 1

The second is by the column's number in the data frame, using square bracket notation `[row, column]`. Leaving the row position blank selects every row.

In [19]:
myPpl[, 1] # version 2


We can also reference a single row, instead of a column. 

In [20]:
# Try referencing one row
myPpl[1, ]

| | name | gender | male | income_cat | park_dist_mi |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<lgl>` | `<chr>` | `<dbl>` |
| 1 | Andie | Female | FALSE | middle | 1 |


Finally, let's try to reference a single cell. Like referencing columns, there are two ways to do this. 

In [21]:
# Try referencing one cell
myPpl$name[2] # version 1: combination of $ and []

In [22]:
myPpl[2, 1] # version 2
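
Square brackets also accept column names in place of numbers, which tends to be easier to read and keeps working if the column order changes. A short sketch, using a hypothetical stand-alone data frame `ppl` so that it runs on its own:

```r
ppl <- data.frame(
  name = c("Andie", "Sam", "Bill"),
  park_dist_mi = c(1, 0.5, 0.1)
)

ppl[, "name"]                          # whole column, selected by name
ppl[2, "name"]                         # one cell: row 2 of the name column
ppl[2, c("name", "park_dist_mi")]      # one row, two named columns
```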

# A word of caution 

- Make sure you don't overwrite your variables by accident. 


In [23]:
# Assigning new value to same variable (something to do carefully)
a <- 5
a <- a + 1 # If you run this line more than once, you will NOT get six
a

# Assigning new value to new variable
a <- 5
a_new <- a + 1 # If you run this line more than once, you WILL get six
a_new
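
To make the difference concrete, the sketch below imitates running each line three times by using a `for` loop. The loop and the names `run`, `overwritten`, and `kept_separate` are only for illustration; in a notebook you would simply re-run the cell.

```r
# Overwriting: each run builds on the previous result
a <- 5
for (run in 1:3) {
  a <- a + 1
}
overwritten <- a        # 8, not 6

# New variable: the result is the same no matter how often the line runs
a <- 5
for (run in 1:3) {
  a_new <- a + 1
}
kept_separate <- a_new  # 6

c(overwritten, kept_separate)
```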
