# Part 1: Introduction to Jupyter

Hello! This is a Jupyter notebook, the platform we will use in this course to access and complete data-intensive assignments. These assignments are designed to familiarize you with using the programming language `R` to make sense of large amounts of data and to verify and interpret statistics you encounter.

## Jupyter Functionalities

### Running a cell
Try running the cell below by pressing (`Command` + `Enter`) on Mac or (`Control` + `Enter`) on Windows. 

In [None]:
# RUN THIS CELL
print("hello world")

### Changing the format of a cell

Cells in Jupyter can be code chunks or markdown chunks. Markdown (like this cell) is used to present formatted text. Learning markdown is outside the scope of this class, but in case you accidentally change the format of a cell in an assignment, practice changing the cell below from code to markdown, then from markdown back to code.

In [None]:
### CHANGE ME TO MARKDOWN
### THEN BACK TO CODE

<!-- BEGIN QUESTION -->

### Text Answers
Manually graded questions will look like the cell below. You don't need to do any fancy formatting. If you're interested though, there is a cheatsheet in the [section discussion folder](https://bcourses.berkeley.edu/courses/1536438/files/folder/Section%20Notes/Coding%20Cheatsheets?preview=89473315) on how to format Markdown cells. 

**Double click this cell to see some basics:**

List
* Bullet point 1
* Bullet point 2

Numbering
1. item 1
2. item 2

Text formatting
* *italics*
* **bold**
* underline is complicated

First line\
Use the "\\" to create a newline

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### Inserting Cells Above and Below
If you want to insert cells above or below, press `A` for above and `B` for below. These shortcuts work in command mode, so press `Esc` first if your cursor is inside a cell. Try inserting a cell below this cell.

Use `X` to delete a cell. 

### Saving your work
Press `Command` + `S` on Mac or `Control` + `S` on Windows to save your work. Try saving this file (the save status is shown in the bottom bar).

---

# Part 2: Introduction to R
We might not get through this entire section. I will upload a video to Media Gallery for those who are completely new to this.

## **R Foundations**
This is an incredibly brief overview of R tailored to the needs of this class. See bCourses for additional R resources.
We will learn:
1. What are the data types in R?
2. What are vectors in R?
3. What are packages and functions in R?
4. How do you look up functions?
5. Operators review

--- 

### Data Types
There are different data types in R. The main ones we will be working with are:

1. Characters, denoted by "quotation marks"
2. Integers, whole numbers written with an `L` suffix, such as `1L`
3. Doubles, numbers that can have decimals (a plain `1` is also stored as a double)
4. Logicals, `TRUE` or `FALSE`, `T` or `F`
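
To check which type R has assigned to a value, you can ask it directly. This is a minimal sketch using base R only; `typeof()` is not covered elsewhere in this notebook, so treat it as an optional extra.

```r
# typeof() reports how R stores a value
typeof("hey hey")  # "character"
typeof(1L)         # "integer" (the L suffix marks a whole number)
typeof(1)          # "double"  (plain numbers are doubles by default)
typeof(1.5)        # "double"
typeof(TRUE)       # "logical"
```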

In [None]:
# character
"hey hey"
'hey hey'

# integer (the L suffix makes it an integer; a plain 1 is a double)
1L

# double
1.5

# logical
TRUE
FALSE
T
F

**Look at the code block below. Why is it throwing an error?**

In [None]:
"1" + "2"

In [None]:
# TO DO: Fix the error in the code block above

"1" + "2" # FIX THIS CODE

In [None]:
# Why is this throwing an error? Can you fix it so it doesn't?
hello

**In R, as in many programming languages, TRUE is treated as 1 and FALSE as 0 in arithmetic. So we can actually do the following.**

In [None]:
TRUE + TRUE + FALSE
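
Because `TRUE` counts as 1, summing a logical vector counts how many values are `TRUE`, and taking its mean gives the proportion that are `TRUE`. The sketch below uses made-up data (the vector `voted` is hypothetical) and relies on `c()`, `sum()`, and `mean()`, which are introduced later in this notebook.

```r
# hypothetical data: did each of five people vote?
voted <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# number of TRUE values
n_voted <- sum(voted)        # 3

# share of TRUE values
share_voted <- mean(voted)   # 0.6

n_voted
share_voted
```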

---
### Variables
**We often want to assign values to a variable. We can do so by using the assignment operator `<-`. For good practice, we want to use variable names that are meaningful.**

In [None]:
# EXAMPLE: assign "maya" to name
name <- "maya"

# EXAMPLE: reassign name to "lu"
name <- "lu"

**What is the value of name now? Run the chunk below to find out.**

In [None]:
name

In [None]:
# TO DO: reassign name to your name
name <- NULL # YOUR ANSWER HERE
name

**A variable can hold a value of any data type.**

In [None]:
# EXAMPLE
number <- 1
logical <- TRUE

# TO DO: Assign your favorite number to fav_num, replacing the NULL
fav_num <- NULL # YOUR ANSWER HERE
fav_num

**We can use variables in expressions and functions.**

In [None]:
# using math operators
number + fav_num

---
### Vectors
A variable can also store a sequence of items, which R calls a vector. You can create a vector by using `c(element_a, element_b, element_c)`.

In [None]:
# EXAMPLE
vote_margin <- c(0.1, 0.2, 0.3, 0.05, 0.01)
regions <- c("east asia and the pacific", "europe", "middle east and north africa", 
             "north america", "south america", "south asia", "sub-saharan africa")

# YOUR TURN, create a vector of your hobbies
my_hobbies <- NULL # YOUR ANSWER HERE

In [None]:
# display the vectors vote_margin, regions
vote_margin
regions
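
One thing worth knowing about vectors: every element has the same data type. If you mix types, R quietly converts them to a common one. Here is a short sketch (the vector names `margins` and `mixed` are made up for illustration).

```r
# all numbers: stays numeric
margins <- c(0.1, 0.2, 0.3)
typeof(margins)   # "double"

# numbers mixed with text: everything becomes character
mixed <- c(1, "two", 3)
mixed             # "1" "two" "3"
typeof(mixed)     # "character"

# length() counts the elements in a vector
length(margins)   # 3
```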

In [None]:
# type "my_hobbies" to display your vector


---
### What are Packages? Why do I need them?

A package is a collection of functions in R. Packages are stored in libraries, and the terms package and library are sometimes used interchangeably. You load a package using `library(package_name)`.

In each assignment, you will run a code chunk that loads packages. These are packages that others have developed to make our lives easier: rather than writing code from scratch for complex problems, we can use functions that others have already written.

In [None]:
# RUN THIS CELL
library(testthat) 
library(tidyverse) 
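
Some cells in this notebook call functions as `package::function()`, for example `ottr::check(...)`. The sketch below shows the two ways of reaching a function in a package. It uses `stats`, which ships with R, so nothing needs to be installed; the vector `values` is made up.

```r
values <- c(2, 4, 9)

# Option 1: load the package, then call the function by name
library(stats)
median(values)          # 4

# Option 2: name the package explicitly with ::, no library() call needed
stats::median(values)   # 4
```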



### What are Functions?
A function takes an input (also called a parameter or argument), does something with it, then gives you an output. You use functions to tell R what to do!
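
As a quick illustration of input and output, the sketch below calls two base R functions, `nchar()` and `round()`, that are not otherwise used in this notebook.

```r
# nchar() takes a character string and returns its number of characters
nchar("hello")                # 5

# round() takes a number plus a named argument, digits
round(3.14159, digits = 2)    # 3.14

# the output of a function can be stored in a variable
rounded <- round(3.14159, digits = 2)
rounded
```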


Let's take a look at some basic functions below. What does each one do?

### print() function

**Q1) Print "hello world"**

In [None]:
# EXAMPLE
print("oh, hey there!")

In [None]:
# YOUR ANSWER HERE
# print hello world


### Descriptive Statistics functions

**Q2) Use the `sum()` function. Take the sum of $2, 5, 10$, assigning it to `my_sum`**

In [None]:
# EXAMPLE
data <- c(1,2,3)
example_sum <- sum(data)

# display
example_sum

In [None]:
# YOUR ANSWER HERE: Take the sum of 2, 5, and 10
data <- NULL # YOUR CODE HERE
my_sum <- NULL # YOUR CODE HERE

# display
my_sum

In [None]:
. = ottr::check("tests/q2.R")

**Q3) Using the `mean()` function, take the mean of 5,2,3,1,19, assigning the value to `my_mean`**

In [None]:
# Example
data <- c(1,2,3,4,5)
example_mean <- mean(data)

# display
example_mean

In [None]:
# Why does this give the wrong answer?
mean(1, 2, 3, 4, 5)

In [None]:
# YOUR ANSWER HERE: take the mean of 5,2,3,1,19
data <- NULL # YOUR CODE HERE
my_mean <- NULL # YOUR CODE HERE

# display
my_mean

In [None]:
. = ottr::check("tests/q3.R")

**Q4) Take the mean of `female` and store it in `prop_female`. Hint: You need to remove the NA values first by using `mean(x, na.rm = TRUE)`**
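
To see what `na.rm` does before applying it to the real data, here is a sketch on a tiny made-up vector (`toy` is hypothetical, not part of the assignment data).

```r
# hypothetical vector with one missing value
toy <- c(1, 0, NA, 1)

# any NA makes the mean NA
mean(toy)                 # NA

# na.rm = TRUE drops the NA first: (1 + 0 + 1) / 3
mean(toy, na.rm = TRUE)   # 0.6666667
```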

In [None]:
# RUN THIS CELL
# I'm loading in data for you
ctdc <- read.csv("ctdc_vp_2021.csv")

# recode gender: 1 = Female, 0 = any other recorded value, NA = blank
female <- ctdc %>%
  select(gender) %>%
  mutate(gender = case_when(gender == "" ~ NA_real_,
                            gender == "Female" ~ 1,
                            TRUE ~ 0)) %>%
  pull(gender)

In [None]:
# number of elements in female
length(female)

In [None]:
# Note that there are missing values in female!!
table(female, useNA = "ifany")

In [None]:
# YOUR ANSWER HERE: Find the mean of female. What happens when you don't include na.rm?
prop_female <- NULL # YOUR CODE HERE

# display your answer
prop_female

In [None]:
. = ottr::check("tests/q4.R")

**Q6) Take the max and min of 5,2,3,1,19**

In [None]:
# EXAMPLE
max(c(1,2,3,4,5))
min(c(1,2,3,4,5))

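Why do some functions need their values wrapped in `c()`, while others do not? It depends on how each function defines its arguments: `sum()`, `max()`, and `min()` accept any number of values, while `mean()` expects all the data in its first argument. To be safe, always use `c()`.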
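
The difference is easiest to see side by side. This is a minimal sketch of what happens with and without `c()`.

```r
# sum() accepts any number of values, so both forms agree
sum(1, 2, 3)            # 6
sum(c(1, 2, 3))         # 6

# mean() expects all the data in its first argument
mean(c(1, 2, 3, 4, 5))  # 3
mean(1, 2, 3, 4, 5)     # 1, only the first value is treated as data
```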

In [None]:
# Your answer here: take the max and min of 5,2,3,1,19
data <- NULL # YOUR CODE HERE
my_max <- NULL # YOUR CODE HERE
my_min <- NULL # YOUR CODE HERE

# display
my_max
my_min

In [None]:
. = ottr::check("tests/q6.R")

### Looking up Functions
If you do not know what a function does, or what parameters a function takes, you can look it up by using `?function_name`. For example, `?mean` opens the documentation for `mean()`.

You can also google! There is a lot of documentation for R online. 
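
Base R also has a couple of related helpers. The sketch below shows three ways to look something up from inside a notebook, using `mean` and `round` as example functions.

```r
# open the help page for mean()
?mean

# same thing, written as a function call
help("mean")

# show just the argument names of a function
args(round)
```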

In [None]:
## TODO: Lookup any function


---
## Operators


R groups operators into the following categories:
* Arithmetic operators 
    * `+` : Add
    * `-` : Subtract
* Assignment operators
    * `<-`: Assignment operator (as reviewed above)
* Comparison operators
    * `==`: Equals
    * `!=`: Does not equal
    * `>=`: Greater than or equal to
    * `<=`: Less than or equal to
    * `>`: Greater than
    * `<`: Less than
* Logical operators
    * `|` : OR - returns true if at least one is TRUE
    * `&` : AND - returns true if both are TRUE
* Miscellaneous operators
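
These operators also work on whole vectors, comparing one element at a time. Here is a sketch using made-up vote margins (the variable name `margins` is hypothetical).

```r
margins <- c(0.10, 0.20, 0.30, 0.05, 0.01)

# which margins are at least 0.10?
margins >= 0.10                    # TRUE TRUE TRUE FALSE FALSE

# combine two comparisons with & (AND)
margins >= 0.05 & margins < 0.30   # TRUE TRUE FALSE TRUE FALSE

# count how many margins are below 0.10
sum(margins < 0.10)                # 2
```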

In [None]:
# EX: TRUE OR FALSE
TRUE | FALSE

# TO DO: FALSE OR FALSE


In [None]:
# EX: TRUE AND FALSE
TRUE & FALSE

# TO DO: FALSE AND FALSE


In [None]:
# EX: "A" EQUALS "A"
"A" == "A"

# TO DO: "A" EQUALS "B"


In [None]:
# EX: "A" DOES NOT EQUAL "A"
"A" != "A"

# TO DO: "A" DOES NOT EQUAL "B"


In [None]:
# EX: 2 GREATER THAN OR EQUAL TO 1
2 >= 1

# TO DO: 2 LESS THAN 1


## Next Time
Next time, we will keep building on these fundamentals and start working with datasets in R. The next discussion session will be designed to help you get through your first problem set.