# Introduction to R

And to Jupyter Notebooks?

Consider going through https://learnxinyminutes.com/docs/r/.

## Step 1: Install R

Download the latest version of R from https://cloud.r-project.org/index.html, if you don't already have it. Optionally, also install RStudio from https://www.rstudio.com/products/rstudio/download/#download; it is an editor for R and still needs R itself to be installed.

You don't need Jupyter Notebook to follow along, but it is a great tool for making documents like this one, so consider installing it. To do so, go to https://developers.refinitiv.com/en/article-catalog/article/setup-jupyter-notebook-r and follow steps 1 to 4 only (the later steps cover the "Refinitiv" tools, which don't interest us).

## Basic arithmetic

In [None]:
1 + 3.14159

In [None]:
pi / 2

$e^1$

In [None]:
exp(1)

In [None]:
exp(1i)

In [None]:
exp(1) < pi

## Types

In [None]:
class(3)

In [None]:
class("three")

In [None]:
class(TRUE)

## Variables

In [None]:
aa = 3

In [None]:
aa

In [None]:
aa <- 3

In [None]:
aa

## Vectors

In [None]:
c(1, 2, 3, 4)

In [None]:
xx = c(2, 3, 5, 7)

In [None]:
xx + 1

In [None]:
2*xx

In [None]:
xx^2

In [None]:
yy = c(1, 2, 3, 4)

In [None]:
xx * yy

In [None]:
mean(xx)

In [None]:
median(xx)

In [None]:
sqrt(sum((xx - mean(xx))^2) / (length(xx) - 1))
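
The expression above is the sample standard deviation written out by hand. As a small check, this sketch compares it with R's built-in `sd()`; the name `manual` is only used for this example.

```r
xx <- c(2, 3, 5, 7)

# Sample standard deviation, written out step by step.
manual <- sqrt(sum((xx - mean(xx))^2) / (length(xx) - 1))

# all.equal compares numbers up to a small rounding tolerance.
all.equal(manual, sd(xx))
```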

In [None]:
xx < 5

In [None]:
class(xx)

In [None]:
xx

In [None]:
xx[1]

In [None]:
xx[c(2, 3)]

In [None]:
xx[2:3]

In [None]:
xx[c(TRUE, TRUE, FALSE, FALSE)]

In [None]:
xx[xx < 5]

## Plotting (1)

In [None]:
?plot

In [None]:
plot(xx)

In [None]:
plot(c(2, 4, 6, 8), xx)

In [None]:
plot(xx, xx)

In [None]:
plot(xx, xlab="My x-axis", ylab="My y-axis")

In [None]:
plot(xx, type='l')

In [None]:
?points

In [None]:
plot(xx, pch=10, cex=50)

## Control structures

In [None]:
aa

In [None]:
if (aa > 2) {
    print("OK!")
} else {
    print("No way.")
}

In [None]:
yy = rep(0, length(xx))  # one slot per element of xx
for (ii in seq_along(xx)) {
    yy[ii] = xx[ii]^2
}

In [None]:
yy

In [None]:
for (x in xx) {
    if (x > 2)
        print(x)
}

## An equation

**How would you calculate isoelastic utility, if you know consumption?**

Version 1: $$u(c) = \frac{c^{1 - \eta}}{1 - \eta}$$

Version 2: $$u(c) = \begin{cases} \frac{c^{1 - \eta}}{1 - \eta} & \text{if $\eta \ne 1$} \\ \ln(c) & \text{if $\eta = 1$} \end{cases} $$

Let $$\eta = 1.45, c = 10000$$

In [None]:
uu = ???

In [None]:
cc = seq(1000, 100000, by=1000)

Inverse of isoelastic utility?

$$c(u) = \left((1 - \eta) u\right)^{\frac{1}{1 - \eta}}$$
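
One possible way to code the formulas above is sketched here. The function names `isoelastic` and `isoelastic_inverse` are made up for this example. The $\eta = 1$ branch follows Version 2, and its inverse, $c = e^u$, is filled in as an assumption since the inverse formula above only covers $\eta \ne 1$.

```r
# Isoelastic utility (Version 2); `isoelastic` is a name chosen for this sketch.
isoelastic <- function(cc, eta) {
  if (eta == 1) {
    log(cc)
  } else {
    cc^(1 - eta) / (1 - eta)
  }
}

# Inverse: the consumption level that gives utility uu.
isoelastic_inverse <- function(uu, eta) {
  if (eta == 1) {
    exp(uu)  # assumed inverse of the log case
  } else {
    ((1 - eta) * uu)^(1 / (1 - eta))
  }
}

uu <- isoelastic(10000, 1.45)
isoelastic_inverse(uu, 1.45)  # recovers consumption, up to rounding
```

Because `cc` enters only through vectorized arithmetic, the same function should also accept a whole vector of consumption levels.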

## Data frames

In [None]:
df = data.frame(temp=c(3, 5, 7), income=c(5, 6, 7))

In [None]:
write.csv(df, "myfile.csv", row.names=FALSE)  # skip row names so read.csv returns the same columns

In [None]:
read.csv("myfile.csv")

In [None]:
df$temp

In [None]:
df$temp == c(3, 5, 7)

In [None]:
df$temp[3]

In [None]:
df$income[df$temp < 7]

## Some more equations

**Calculate the radiative forcing from CO2**

$$F = R \frac{1}{\log{2}} \log{\left(\frac{C_1}{C_0}\right)}$$

 - $R$ is the (additional) forcing from CO$_2$ at twice pre-industrial levels, at equilibrium (3.8 W m$^{-2}$).
 - $C_0$ is the concentration of CO$_2$ at pre-industrial levels (280 ppm).
 - $C_1$ is the new concentration of CO$_2$.
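
As a quick check of the formula, a doubling of CO2 (560 ppm) should give back a forcing equal to R. The sketch below uses the constants listed above; the function name `forcing` and its argument names are assumptions for illustration.

```r
# Radiative forcing from a CO2 concentration, using the constants listed above.
forcing <- function(C1, R = 3.8, C0 = 280) {
  R * log(C1 / C0) / log(2)
}

forcing(280)  # no change from pre-industrial levels, so zero forcing
forcing(560)  # doubled CO2, so the forcing equals R
```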

In [None]:
df = read.csv("monthly_flask_co2_mlo.csv")

In [None]:
head(df)

In [None]:
plot(df$Continuous.Date, df$CO2, type='l')

In [None]:
plot(df$Continuous.Date, df$CO2.filled, type='l')

In [None]:
yearly = df[df$Mn == 1,]

In [None]:
head(yearly)

In [None]:
ff = ???

In [None]:
plot(yearly$Yr, ff)

**"Rate equation" for temperature change.**

$$T_{t+1} = T_t + c \left(F - \frac{R}{ECS} T_t\right)$$

Let $$c = 0.098, ECS = 3, T_{1960} = 0.2$$
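
To see what the rate equation does, here is a sketch that iterates it under a constant forcing. The forcing value `FF = 2` and the 500 steps are arbitrary choices for illustration, not taken from the data, and `cc_rate` is a made-up name for the constant $c$. With constant forcing, the temperature settles where the bracket is zero, at $T = F \cdot ECS / R$.

```r
# Iterate T[t+1] = T[t] + c * (F - (R / ECS) * T[t]) with a constant forcing.
cc_rate <- 0.098  # the constant c from the text
ECS <- 3
R <- 3.8
FF <- 2           # assumed constant forcing, for illustration only

TT <- 0.2
for (ii in 2:500) {
  TT[ii] <- TT[ii - 1] + cc_rate * (FF - (R / ECS) * TT[ii - 1])
}

TT[500]        # close to the equilibrium value below
FF * ECS / R   # equilibrium temperature for this forcing
```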

In [None]:
TT = 0.2
for (ii in 2:nrow(yearly)) {
    TT[ii] = ???
}

In [None]:
TT

In [None]:
plot(yearly$Yr, TT)

**What if we now stop emissions?**

In [None]:
for (ii in nrow(yearly) + (1:100)) {
    TT[ii] = ???
}

In [None]:
plot(c(yearly$Yr, yearly$Yr[nrow(yearly)] + (1:100)), TT)

## Plotting (2)

In [None]:
install.packages("tidyverse")

In [None]:
library(tidyverse)

We're going to use a lot of `ggplot2`.

https://nyu-cdsc.github.io/learningr/assets/data-visualization-2.1.pdf

In [None]:
ggplot(df, aes(x=Continuous.Date, y=CO2.filled)) + geom_point() + geom_line() +
  xlab("My great x-axis") + theme_bw()

## Constructing a list of all the primes up to 100

In [None]:
1:100

In [None]:
nn = 53
prime <- TRUE
for (ii in 2:(nn-1)) {
  if (nn %% ii == 0)
    prime <- FALSE
}

In [None]:
prime

In [None]:
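is.prime <- function(nn) {
  # Numbers below 2 are not prime; 2 is handled separately because
  # 2:(nn-1) would count down (2, 1) when nn is 2.
  if (nn < 2) return(FALSE)
  if (nn == 2) return(TRUE)
  prime <- TRUE
  for (ii in 2:(nn-1)) {
    if (nn %% ii == 0)
      prime <- FALSE
  }
  prime
}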

In [None]:
is.prime(53)

In [None]:
is.prime(51)

In [None]:
primes <- c()
for (xx in 2:100) {
    if (is.prime(xx)) {
        primes <- c(primes, xx)
    }
}

In [None]:
plot(primes)
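
For comparison, here is a sketch of a different technique, the Sieve of Eratosthenes, which crosses out multiples instead of testing each number separately. The names `limit`, `is_candidate`, and `sieve_primes` are chosen for this example only.

```r
# Sieve of Eratosthenes: a different technique from the trial-division loop.
limit <- 100
is_candidate <- rep(TRUE, limit)
is_candidate[1] <- FALSE  # 1 is not prime

for (ii in 2:floor(sqrt(limit))) {
  if (is_candidate[ii]) {
    # Cross out the multiples of ii, starting at ii^2.
    is_candidate[seq(ii^2, limit, by = ii)] <- FALSE
  }
}

sieve_primes <- which(is_candidate)
length(sieve_primes)  # 25 primes up to 100
```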

## Discoveries data

In [None]:
data(discoveries)

In [None]:
?discoveries

In [None]:
discoveries

In [None]:
plot(1860:1959, discoveries)

In [None]:
df = data.frame(year=1860:1959, count=as.numeric(discoveries))  # as.numeric drops the time-series class

In [None]:
df

In [None]:
ggplot(df, aes(year, count)) + geom_point()

In [None]:
?geom_smooth

In [None]:
ggplot(df, aes(year, count)) + geom_point() + geom_smooth()

In [None]:
sumdf = data.frame(count=mean(df$count))

In [None]:
ggplot(df, aes(year, count)) + geom_point() + geom_hline(data=sumdf, aes(yintercept=count))

But the number of discoveries actually seems to change over time. Let's compute decadal averages!

In [None]:
seq(1860, 1959, by=10)

In [None]:
df$year >= 1860 & df$year < 1870

In [None]:
df$count[df$year >= 1860 & df$year < 1870]

In [None]:
sumdf = data.frame()
for (decade in seq(1860, 1959, by=10)) {
    mu = mean(df$count[df$year >= decade & df$year < decade + 10])
    sumdf = rbind(sumdf, data.frame(decade=decade, count=mu))
}

In [None]:
ggplot(df, aes(year, count)) + geom_point() + geom_col(data=sumdf, aes(x=decade + 5, count), alpha=.5)
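
The same decadal means can also be computed without an explicit loop. This sketch, offered as an alternative rather than a replacement, labels each year with its decade via integer division and averages with `tapply`. It rebuilds the data under the hypothetical name `disc` so that it runs on its own.

```r
# Decadal means via tapply; `disc` is a stand-alone copy of the data for this sketch.
disc <- data.frame(year = 1860:1959, count = as.numeric(discoveries))

# Integer division maps each year to the start of its decade: 1860, 1870, ..., 1950.
disc$decade <- 10 * (disc$year %/% 10)

decade_means <- tapply(disc$count, disc$decade, mean)
decade_means
```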

Rolling 10-year windows

In [None]:
sumdf = data.frame()
for (year1 in 1860:1950) {
    mu = mean(df$count[df$year >= year1 & df$year < year1 + 10])
    sumdf = rbind(sumdf, data.frame(year1=year1, count=mu))
}

In [None]:
head(sumdf)

In [None]:
ggplot(df, aes(year, count)) + geom_point() + geom_line(data=sumdf, aes(x=year1 + 5, count))
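
A rolling mean can also be written as a moving-average filter. The sketch below is one way to do that with `stats::filter`; the names `counts`, `trailing`, and `rolling` are chosen for this example. The `stats::` prefix is spelled out because `dplyr`, loaded with the tidyverse, also defines a function called `filter`.

```r
# Rolling 10-year means with a moving-average filter.
counts <- as.numeric(discoveries)

# With sides = 1, each value averages the current year and the 9 before it.
trailing <- stats::filter(counts, rep(1 / 10, 10), sides = 1)

# The first 9 entries are NA (not enough history), so drop them.
rolling <- as.numeric(trailing)[10:100]
length(rolling)  # 91 windows, starting in 1860 through 1950
```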

In [None]:
sumdf = data.frame()
for (year1 in 1860:1940) {
    mu = mean(df$count[df$year >= year1 & df$year < year1 + 20])
    sumdf = rbind(sumdf, data.frame(year1=year1, count=mu))
}

In [None]:
sumdf2 = data.frame()
for (year1 in 1860:1940) {
    mu = median(df$count[df$year >= year1 & df$year < year1 + 20])
    sumdf2 = rbind(sumdf2, data.frame(year1=year1, count=mu))
}

In [None]:
ggplot(df, aes(year, count)) + geom_point() + geom_line(data=sumdf, aes(x=year1 + 10, count)) +
  geom_line(data=sumdf2, aes(x=year1 + 10, count))  # both windows are 20 years, so center both at year1 + 10

## Tabulated data

Passengers on the Titanic.

In [None]:
data(Titanic)

In [None]:
as_tibble(Titanic)

In [None]:
as_tibble(Titanic) %>% group_by(Class, Survived) %>% summarize(n=sum(n)) %>%
  group_by(Class) %>% summarize(pr=n[Survived == 'Yes'] / sum(n))

In [None]:
as_tibble(Titanic) %>% group_by(Sex, Survived) %>% summarize(n=sum(n)) %>%
  group_by(Sex) %>% summarize(pr=n[Survived == 'Yes'] / sum(n))

In [None]:
as_tibble(Titanic) %>% group_by(Age, Survived) %>% summarize(n=sum(n)) %>%
  group_by(Age) %>% summarize(pr=n[Survived == 'Yes'] / sum(n))
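
For comparison, the survival rate by class can be computed in base R directly from the contingency table. This is an alternative sketch, not the tidyverse approach used above; the names `by_class` and `pr_class` are chosen for this example.

```r
# Sum the 4-way table over Sex and Age, keeping Class (dimension 1)
# and Survived (dimension 4).
by_class <- apply(Titanic, c(1, 4), sum)

# Convert counts to row-wise proportions, so each class sums to 1.
pr_class <- prop.table(by_class, 1)
pr_class[, "Yes"]
```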