### Update to newest tidyverse
- You will need to use the *conda-forge* version of the ```R``` implementation for Jupyter Notebook
 * More up to date (has tidyverse 1.3)
 * Not the official implementation from ```R```
- You will have to use the command line to perform this update

1. Open command line interface (Terminal for Mac, Command Prompt for Windows)
2. Activate your R environment, for example:

```conda activate "base_R"```

3. Run the following two lines to enable conda to download packages from conda-forge.

```conda config --add channels conda-forge```

```conda config --set channel_priority strict```

4. Now run the following line to update tidyverse. Answer ```y``` when it asks yes or no to proceed.

```conda update r-tidyverse```

5. Launch Jupyter notebook
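
To confirm that the update took effect, one option is to check the installed version from inside the notebook. This is a minimal sketch using base R's `requireNamespace()` and `packageVersion()`; the `1.3.0` threshold is taken from the version mentioned above, and the variable names are only illustrative.

```r
# Check for tidyverse without attaching it; requireNamespace() returns
# FALSE (rather than raising an error) when the package is missing.
has_tidyverse <- requireNamespace("tidyverse", quietly = TRUE)

if (has_tidyverse) {
  installed <- packageVersion("tidyverse")
  print(installed)
  # Version objects can be compared directly against version strings
  print(installed >= "1.3.0")
} else {
  message("tidyverse is not installed in this R environment")
}
```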

In [1]:
# colors!
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.3     ✔ purrr   0.3.4
✔ tibble  3.0.6     ✔ dplyr   1.0.4
✔ tidyr   1.1.2     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



In [2]:
# More informative tibble display
diamonds

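| carat | cut | color | clarity | depth | table | price | x | y | z |
| ----- | --- | ----- | ------- | ----- | ----- | ----- | --- | --- | --- |
| `<dbl>` | `<ord>` | `<ord>` | `<ord>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
| 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
| 0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
| 0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
| 0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
| 0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |
| 0.24 | Very Good | I | VVS1 | 62.3 | 57 | 336 | 3.95 | 3.98 | 2.47 |
| 0.26 | Very Good | H | SI1 | 61.9 | 55 | 337 | 4.07 | 4.11 | 2.53 |
| 0.22 | Fair | E | VS2 | 65.1 | 61 | 337 | 3.87 | 3.78 | 2.49 |
| 0.23 | Very Good | H | VS1 | 59.4 | 61 | 338 | 4.00 | 4.05 | 2.39 |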


In [None]:
# gather() function replaced by pivot_longer() for data tidying (we will discuss this soon)
?pivot_longer
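
As a small preview, here is a sketch of what `pivot_longer()` does to a wide table. The data frame `wide` and its column names are made up for illustration; only the `cols`, `names_to`, and `values_to` arguments of `tidyr::pivot_longer()` are used.

```r
library(tidyr)

# Hypothetical wide data: one row per student, one column per quiz
wide <- data.frame(
  student = c("a", "b"),
  quiz1 = c(90, 70),
  quiz2 = c(85, 75)
)

# Stack the quiz columns into name/value pairs
long <- pivot_longer(
  wide,
  cols = c(quiz1, quiz2),
  names_to = "quiz",
  values_to = "score"
)

print(long)
dim(long)   # 4 rows, 3 columns
```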

### Jupyter Notebook Widgets
(**optional**)
- You can add fancy add-ons to Jupyter Notebook using ```nbextensions```.

1. Install ```ipywidgets```. Run and hit ```y``` when prompted:

```conda install ipywidgets```

2. Install ```nbextensions```. Run and hit ```y``` when prompted:

```conda install jupyter_contrib_nbextensions```

3. Launch Jupyter notebook.

4. ![](https://github.com/mgruddy/Intro_Data_ScienceR_Spring2021/blob/main/Slides/Screenshots/Nbextensions.png?raw=true)

- Be careful if you enable certain extensions and then share your .ipynb file. If the recipient doesn't have the same nbextensions enabled, it can cause issues.
- For example, frozen cells may remain frozen, and the recipient of your .ipynb file may not be able to unfreeze them.

### Reminders
- There is an online textbook
- There is a GitHub page with suggested reading in this textbook (I'll try to include it in this notebook file as well)

# Data Structures in R
(Chapters 20 and 10 in the Textbook)

|             |Homogeneous      | Heterogeneous   |
| ----------- | --------------- |---------------  |
| 1D          | (Atomic) Vector | List            |
| 2D          | Matrix          | Dataframe/Tibble|
| 3D          | Arrays          |                 |

- Homogeneous: All elements are of the same *type*
- Heterogeneous: Elements can be different *types*
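
The table can be made concrete with one small object per cell. This is only a sketch in base R; the object names and values are arbitrary.

```r
# 1D homogeneous: atomic vector (every element is a double)
v <- c(1.5, 2, 3)

# 1D heterogeneous: list (elements keep their own types)
l <- list(1L, "two", TRUE)

# 2D homogeneous: matrix (filled column by column)
m <- matrix(1:6, nrow = 2)

# 2D heterogeneous: data frame (each column is a vector; columns may differ in type)
df <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)

# 3D homogeneous: array
a <- array(1:24, dim = c(2, 3, 4))

typeof(v)            # "double"
sapply(l, typeof)    # "integer" "character" "logical"
dim(m)               # 2 3
sapply(df, typeof)   # id: "integer", name: "character"
dim(a)               # 2 3 4
```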


**Four Common Types**

- Integer: integer values
- Double: real numbers
- Logical: Boolean values (TRUE or FALSE)
- Character: text-strings ("price", "hello", "12")
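
A quick way to see the four types is to call `typeof()` on a literal of each kind. The values below are arbitrary examples.

```r
typeof(3L)       # "integer": the L suffix marks an integer literal
typeof(3)        # "double": numbers without L are doubles by default
typeof(TRUE)     # "logical"
typeof("12")     # "character": quotes make it text, even if it looks numeric

# A character "12" is not a number until it is converted
as.integer("12") + 1L   # 13
```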

In [3]:
print(mpg)

# A tibble: 234 x 11
   manufacturer model    displ  year   cyl trans   drv     cty   hwy fl    class
   <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>
 1 audi         a4         1.8  1999     4 auto(l… f        18    29 p     comp…
 2 audi         a4         1.8  1999     4 manual… f        21    29 p     comp…
 3 audi         a4         2    2008     4 manual… f        20    31 p     comp…
 4 audi         a4         2    2008     4 auto(a… f        21    30 p     comp…
 5 audi         a4         2.8  1999     6 auto(l… f        16    26 p     comp…
 6 audi         a4         2.8  1999     6 manual… f        18    26 p     comp

In [4]:
mpg %>%
  mutate(old = (year < 2000)) %>%
  print(width = Inf)

# A tibble: 234 x 12
   manufacturer model      displ  year   cyl trans      drv     cty   hwy fl
   <chr>        <chr>      <dbl> <int> <int> <chr>      <chr> <int> <int> <chr>
 1 audi         a4           1.8  1999     4 auto(l5)   f        18    29 p
 2 audi         a4           1.8  1999     4 manual(m5) f        21    29 p
 3 audi         a4           2    2008     4 manual(m6) f        20    31 p
 4 audi         a4           2    2008     4 auto(av)   f        21    30 p
 5 audi         a4           2.8  1999     6 auto(l5)   f        16    26 p
 6 audi         a4           2.8  1999     6 manual(m5) f        18    26 p
 7 audi

In [6]:
# Create a real-valued vector
(x <- c(1.1, 2, 3.3))

In [7]:
# Check the type
typeof(x)

In [8]:
# Check if it is an atomic vector
is_atomic(x)

# Check if it is a real-valued atomic vector
is_double(x)

In [9]:
# How many elements are in the vector?
length(x)

In [10]:
# Create an integer vector
(y <- c(1L, 2L, 3L, 10L))
typeof(y)
length(y)

In [13]:
# Create a logical-valued vector
(z <- c(TRUE, T, F, FALSE))
typeof(z)

# This will raise an error: R is case-sensitive, so False is not recognized
#(z <- c(TRUE, T, F, False))

In [15]:
# Create a character vector
(w <- c("hello", "world"))
typeof(w)
length(w)

In [16]:
# fancier creation methods
c(1:10)  # vector of integers 1 to 10, equivalent to seq(1, 10)
seq(2, 10, 2)   # vector of even numbers 2 to 10 (stored as doubles)

In [17]:
?seq

In [19]:
(x <- c(1:4))
(y <- c(1, 2, 3, 4))

typeof(x)
typeof(y)

Note: The difference between integer and double values matters when comparing values.

In [20]:
# floating point arithmetic is not always exact
(x <- sqrt(2)^2)
x - 2
x == 2

# use dplyr::near() to compare within a small tolerance
near(x, 2)
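
Conceptually, a tolerance-based comparison checks whether two numbers differ by less than some small amount. A minimal base-R sketch of that idea follows; the helper name `roughly_equal` and its default tolerance are illustrative assumptions, not the actual implementation of `dplyr::near()`.

```r
# Hypothetical helper: TRUE where x and y differ by less than tol
roughly_equal <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) < tol
}

x <- sqrt(2)^2
x == 2                    # FALSE: x is off by a tiny rounding error
roughly_equal(x, 2)       # TRUE: the difference is far below the tolerance

# Base R's all.equal() also compares with a tolerance
isTRUE(all.equal(x, 2))   # TRUE
```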

In [22]:
x <- c(1)
is_atomic(x)

In [23]:
# there are no scalars in R, just one-element vectors
is_atomic(1)
is_double(2.2)
is_character("hello world")

In [24]:
length(200)
length("hello world")

### Vector Coercion

- Vectors can be coerced *upstream*, that is, toward a more flexible type
- If you create a vector with mixed types, everything will be coerced to the most flexible type present
- When coerced to numbers, TRUE and FALSE become 1 and 0 respectively

Most flexible to least flexible:
1. Character
2. Double
3. Integer
4. Logical
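
The ordering can be checked by adding one more flexible type at a time. The sketch below also shows a common use of the logical-to-number rule: counting `TRUE` values. The vector `passed` is a made-up example.

```r
# typeof() of the combined vector is the most flexible type present
typeof(c(TRUE, 1L))              # "integer"
typeof(c(TRUE, 1L, 2.5))         # "double"
typeof(c(TRUE, 1L, 2.5, "a"))    # "character"

# Logical values become 1 and 0 in arithmetic, which makes counting easy
passed <- c(TRUE, FALSE, TRUE, TRUE)
sum(passed)    # 3: number of TRUE values
mean(passed)   # 0.75: proportion of TRUE values
```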

In [25]:
# everything is coerced to character values
(x <- c(1L, 2.2, TRUE, "hello"))
typeof(x)

# everything is coerced to real values
(y <- c(1L, FALSE, 1.1))
typeof(y)

In [26]:
# We can explicitly coerce vectors using the as.*() functions
(x <- as.integer(c(TRUE, FALSE)))
(y <- as.character(c(1L, 2L, 10L)))
typeof(x)
typeof(y)

In [28]:
x <- c(1,2)
typeof(x)
typeof(as.integer(x))
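
Coercing toward a less flexible type can lose information. The following sketch shows a few cases worth knowing about; the input values are arbitrary, and `suppressWarnings()` is used only to keep the output quiet.

```r
# Doubles are truncated toward zero, not rounded
truncated <- as.integer(c(2.9, -2.9))
truncated      # 2 -2

# Text that does not look like a number becomes NA
# (R normally warns about this; the warning is suppressed here)
parsed <- suppressWarnings(as.integer(c("10", "ten")))
parsed         # 10 NA

# Zero becomes FALSE and any other number becomes TRUE
flags <- as.logical(c(0, 1, 5))
flags          # FALSE TRUE TRUE
```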

In [29]:
# Explicit coercion works on tibble columns as well

print(mpg)

mpg %>%
  mutate(cty = as.double(cty)) %>%
  print()

# A tibble: 234 x 11
   manufacturer model    displ  year   cyl trans   drv     cty   hwy fl    class
   <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>
 1 audi         a4         1.8  1999     4 auto(l… f        18    29 p     comp…
 2 audi         a4         1.8  1999     4 manual… f        21    29 p     comp…
 3 audi         a4         2    2008     4 manual… f        20    31 p     comp…
 4 audi         a4         2    2008     4 auto(a… f        21    30 p     comp…
 5 audi         a4         2.8  1999     6 auto(l… f        16    26 p     comp…
 6 audi         a4         2.8  1999     6 manual… f        18    26 p     comp

1. Create the following vectors:
- A vector of your name and age
- A vector of your seven favorite numbers
- A vector of logical values

Try to guess the type and length of each, before checking it explicitly using ```typeof()``` and ```length()```.

2. Run both ```?seq``` and ```?rep``` to see some more fancy ways of creating vectors.

In [33]:
x <- c("Mike", 29)
typeof(x)
length(x)

In [35]:
y <- c(7, 2.71828, 64, 1, 2, 3, 4)
typeof(y)
length(y)

In [36]:
z <- c(TRUE, T, F, FALSE)
typeof(z)
length(z)
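
For the second part of the exercise, here are a few of the arguments that `?seq` and `?rep` describe. This is just a sample of what those help pages cover, with arbitrary inputs.

```r
seq(0, 1, by = 0.25)          # 0.00 0.25 0.50 0.75 1.00
seq(1, 10, length.out = 4)    # 1 4 7 10

rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"
rep(c("a", "b"), each = 2)    # "a" "a" "b" "b"
```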

### Subsetting Vectors

- You can use ```[```...```]``` to extract elements of vectors
- Indexing in R starts at 1

In [43]:
x <- c('one', 'two', 'three', 'four', 'five')
x[1]     # same as x[0] in Python
x[4]
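
Because indexing starts at 1, it helps to know what happens at the edges. The sketch below reuses the same five-element vector and shows that R does not raise an error for index 0 or for an index past the end.

```r
x <- c("one", "two", "three", "four", "five")

x[0]           # character(0): index 0 selects nothing
x[6]           # NA: there is no sixth element
x[length(x)]   # "five": the last element
```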

In [44]:
# This returns a one-element vector (not a scalar!)
is_atomic(x[1])

In [45]:
# extract elements 2 through 4 (including 2 and 4!)
x[2:4]

In [48]:
# You can subset using other vectors
x[c(1,3)]
x[c(1,1,1,1,2,1)]

In [49]:
# negative values drop elements
x[c(-1,-2)]

# order is not important
x[c(-2,-1)]
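
One caveat with negative indices is that they cannot be mixed with positive ones in the same call. A sketch of the failure and a workaround follows; the fallback string returned by `tryCatch()` is a made-up message, not R's own error text.

```r
x <- c("one", "two", "three", "four", "five")

# Drop the first element, then take the first two of what remains
x[-1][1:2]     # "two" "three"

# Mixing positive and negative indices in one call is an error;
# tryCatch() turns it into a value so the cell keeps running
result <- tryCatch(
  x[c(-1, 2)],
  error = function(e) "error: cannot mix positive and negative indices"
)
result
```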

In [50]:
# comparison statements involving vectors return logical vectors
y <- c(1.1, 1.2, 2, 5.5)
y > 1.5

In [54]:
# Logical vectors can be used to subset vectors
x[c(F,T,T,F,F)]

In [55]:
# We can combine this to get statements like this:
# all elements of y, greater than 1.5
y[y > 1.5]

# this is equivalent to
y[c(F,F,T,T)]
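
Logical subsetting extends naturally to combined conditions. The sketch below uses the same `y` with base R's `which()`, `&`, and `|`; the thresholds are arbitrary.

```r
y <- c(1.1, 1.2, 2, 5.5)

which(y > 1.5)         # 3 4: positions where the condition is TRUE
y[y > 1.5 & y < 5]     # 2: both conditions must hold
y[y < 1.15 | y > 5]    # 1.1 5.5: either condition may hold
sum(y > 1.5)           # 2: how many elements pass the test
```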

In [56]:
c(1, 2, 3)

In [58]:
# you can add names to the elements of a vector
(x <- c(a = 1, b = 2, c = 3))

# this can be used to subset the vector
x[c('c', 'a')]

# get a vector of the names
names(x)
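
A few more operations on named vectors, sketched in base R. The replacement names below are arbitrary.

```r
x <- c(a = 1, b = 2, c = 3)

x["b"]                # one-element vector that keeps its name
b_value <- x[["b"]]   # just the value, with the name dropped
b_value               # 2

# Names can be replaced after creation
names(x) <- c("first", "second", "third")
x["third"]

# unname() strips the names entirely
unname(x)             # 1 2 3
```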