# Intro to Econometrics with R

Hans Martinez
Western University

## R Basics

### What is R?
R is a programming language and free software environment for statistical computing and graphics. It's widely used among statisticians and data miners for developing statistical software and data analysis.

```{r}
# This is an R code cell
print("Hello, welcome to R!")
```






### Types of variables

#### Strings
```{r}
# String variable
name <- "John Doe"
print(name)
```





#### Integers
```{r}
# Integer variable
age <- 25L  # The 'L' suffix denotes an integer
print(age)
```




#### Float (Numeric)

```{r}
# Float variable
height <- 5.9
print(height)
```





#### Logic (Boolean)
```{r}
# Logical variable
is_student <- TRUE
print(is_student)
```
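
To confirm which type R has assigned to a variable, `class()` can be applied to it. The following is a minimal sketch that re-creates the four example values from this section so that it runs on its own:

```{r}
# Re-create the example variables
name <- "John Doe"
age <- 25L
height <- 5.9
is_student <- TRUE

# class() reports the type of each variable
print(class(name))        # "character"
print(class(age))         # "integer"
print(class(height))      # "numeric"
print(class(is_student))  # "logical"

# Without the 'L' suffix, a whole number is stored as numeric
print(class(25))          # "numeric"
print(is.integer(25))     # FALSE
```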




### Collections of variables

#### Vectors
```{r}
# Vector
numbers <- c(1, 2, 3, 4, 5)
print(numbers)
```
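
Vectors support indexing and element-by-element operations. The sketch below reuses the `numbers` vector; the names `first_three`, `doubled`, and `large` are only illustrative.

```{r}
numbers <- c(1, 2, 3, 4, 5)

# Indexing starts at 1 in R
first <- numbers[1]
first_three <- numbers[1:3]

# Arithmetic is applied element by element
doubled <- numbers * 2

# A logical condition selects the elements for which it is TRUE
large <- numbers[numbers > 3]

print(first)            # 1
print(first_three)      # 1 2 3
print(doubled)          # 2 4 6 8 10
print(large)            # 4 5
print(length(numbers))  # 5
```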




#### Lists
```{r}
# List
person <- list(name="John", age=25, height=5.9)
print(person)
```
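
Unlike a vector, a list can hold elements of different types, and each element can be retrieved by name. A short sketch, reusing the `person` list (the added `is_student` element is only an illustration):

```{r}
person <- list(name = "John", age = 25, height = 5.9)

# Access a single element by name with $ or [[ ]]
print(person$name)
print(person[["age"]])

# Single brackets return a shorter list, not the element itself
print(class(person["age"]))    # "list"
print(class(person[["age"]]))  # "numeric"

# New elements can be added by name
person$is_student <- TRUE
print(names(person))
```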




#### Data Frames
```{r}
# Data Frame
df <- data.frame(
  name = c("John", "Jane", "Mike"),
  age = c(25, 30, 35),
  height = c(5.9, 5.6, 6.1)
)
print(df)
```
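
Each column of a data frame is a vector, so the tools for vectors carry over. The sketch below re-creates `df` and shows a few common operations; the names `older` and `age_in_months` are made up for the example.

```{r}
df <- data.frame(
  name = c("John", "Jane", "Mike"),
  age = c(25, 30, 35),
  height = c(5.9, 5.6, 6.1)
)

# Number of rows and columns
print(nrow(df))
print(ncol(df))

# A column is accessed with $ and behaves like a vector
print(df$age)
print(mean(df$age))  # 30

# Keep only the rows that satisfy a condition
older <- df[df$age > 26, ]
print(older)

# Add a new column computed from an existing one
df$age_in_months <- df$age * 12
print(df)
```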




### Arithmetic functions
```{r}
a <- 10
b <- 5

print(paste("Addition:", a + b))
print(paste("Subtraction:", a - b))
print(paste("Multiplication:", a * b))
print(paste("Division:", a / b))
print(paste("Exponentiation:", a ^ 2))
```
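
R also provides operators for integer division and the remainder. In the sketch below, `b` is set to 3 (an assumption made only so that the division leaves a remainder).

```{r}
a <- 10
b <- 3

# Integer division drops the fractional part
print(paste("Integer division:", a %/% b))    # 3

# The modulo operator returns the remainder
print(paste("Remainder (modulo):", a %% b))   # 1

# Multiplication is evaluated before addition unless parentheses are used
print(2 + 3 * 4)    # 14
print((2 + 3) * 4)  # 20
```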




### Logical functions
```{r}
x <- 10
y <- 5

print(paste("x > y:", x > y))
print(paste("x < y:", x < y))
print(paste("x == y:", x == y))
print(paste("x != y:", x != y))
```
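
Comparisons can be combined with `&` (and), `|` (or), and `!` (not), and they also work element by element on vectors. A small sketch; the `ages` vector is made-up data.

```{r}
x <- 10
y <- 5

print(x > 5 & y > 5)  # FALSE: both conditions must hold
print(x > 5 | y > 5)  # TRUE: at least one condition holds
print(!(x > y))       # FALSE: negation of TRUE

# Comparisons on a vector return one logical value per element
ages <- c(15, 22, 37, 64)
is_adult <- ages >= 18
print(is_adult)        # FALSE TRUE TRUE TRUE

# TRUE counts as 1 and FALSE as 0
print(sum(is_adult))   # 3, the number of adults
print(mean(is_adult))  # 0.75, the share of adults
```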




### Functions

R comes with built-in functions that one can apply to data.

```{r}
# Some R built-in functions
a <- 1.5
log_a <- log(a)
exp_a <- exp(a)
sqrt_a <- sqrt(a)

print(paste("The log of",a,"is",log_a))
print(paste("The exp of",a,"is",exp_a))
print(paste("The square root of",a,"is",sqrt_a))

```




We can also define our own functions.

```{r}
# Define a function
square <- function(x) {
  return(x^2)
}

# Use the function
result <- square(4)
print(paste("The square of 4 is:", result))
```
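
A function can take more than one argument, and arguments can have default values. The function `power` below is a hypothetical example written for this illustration, not a built-in.

```{r}
# 'exponent' defaults to 2 when it is not supplied
power <- function(x, exponent = 2) {
  return(x^exponent)
}

print(power(4))           # 16, uses the default exponent
print(power(4, 3))        # 64
print(power(c(1, 2, 3)))  # 1 4 9, works element by element on vectors
```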





## Data Visualization

### Reading Data from File

```{r}
# Reading data from file
heroes <- read.csv("Super_Heroes.csv")

names(heroes) |> print()

```
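
`read.csv()` expects the file to be in the working directory (or a full path). To try the same steps without the course files, the sketch below writes a tiny made-up table to a temporary file and reads it back; `toy` and its values are hypothetical.

```{r}
# Hypothetical toy data, written to a temporary file so the example runs anywhere
toy <- data.frame(
  Name = c("Hero A", "Hero B", "Hero C"),
  Alignment = c("good", "bad", "good")
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Read the file back into a data frame
toy_read <- read.csv(path)

print(dim(toy_read))   # number of rows and columns
print(head(toy_read))  # first rows of the data
str(toy_read)          # column names and types
```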



### Tables
```{r}
# Create a simple table
table(heroes$Alignment) |> as.data.frame() |> print()

```




### Cross tables
```{r}
# Create a cross table

with(
    heroes,
    table(
        Publisher,
        Gender
    )
)
```




### Bar plot
```{r}
# Barplot
barplot(table(heroes$Alignment), main="Heroes by Alignment",
xlab="Alignment")
```



### Pie Chart

```{r}
# Pie Chart
pie(table(heroes$Gender), main="Gender Distribution")
```



## Continuous Variables

```{r}
# Read data 
housing <- read.csv("california_housing_train.csv")
names(housing) |> print()
```



### Histogram
```{r}
# Histogram

hist(housing$median_house_value, main="House Value Distribution")
```


In [None]:
# Histogram
with(
    housing,
    hist(
        median_house_value,
        main="House Value Distribution"
    )
)


### Scatter plot
```{r}
with(
    housing,
    plot(
        median_house_value, 
        total_rooms,
        main="House Value vs. Rooms"
    )
)
```


In [None]:
with(
    housing,
    plot(
        median_house_value, 
        median_income,
        main="House Value vs. Income"
    )
)


### Time series plot
```{r}
# Read time series data
df_ts <- read.csv("US_Birthrates_over_Years.csv")

names(df_ts) |> print()

# Plot time series
with(
    df_ts,
    plot(
        Year,
        Births1,
        type="l",
        main="Births over Time"
    )
)
```


## Data Summaries

### Mean, Variance, Median

Mean

$$
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

Variance

$$
\sigma^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\mu)^2
$$

#### Step-by-step

```{r}
# Step-by-step
data <- housing$households

# Mean
data_sum <- sum(data)
n_obs <- length(data)
data_mean <- data_sum/n_obs

print(paste("Mean:", data_mean))

# Variance
diffs <- data-data_mean
diffs_squared <- diffs^2
data_var <- sum(diffs_squared)/(n_obs-1)

print(paste("Variance:", data_var))

```





#### Built-in Functions

```{r}
# Using built-in functions

print(paste("Mean:", mean(data)))
print(paste("Variance:", var(data)))
print(paste("Median:", median(data)))
```




### Interquartile range
```{r}
print(paste("Interquartile Range:", IQR(data)))
```
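
The interquartile range is the distance between the third and the first quartile. The sketch below computes it step by step with `quantile()` on made-up numbers (`toy_values` is hypothetical) and compares the result with `IQR()`.

```{r}
# Hypothetical toy data
toy_values <- c(2, 4, 4, 5, 7, 9, 10, 12)

# First and third quartiles
q1 <- quantile(toy_values, 0.25)
q3 <- quantile(toy_values, 0.75)

# Interquartile range as the difference between the quartiles
iqr_manual <- unname(q3 - q1)

print(paste("Q1:", q1))
print(paste("Q3:", q3))
print(paste("IQR (step by step):", iqr_manual))
print(paste("IQR (built-in):", IQR(toy_values)))
```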




### Box-and-whisker plots
```{r}
boxplot(data, main="Box-and-Whisker Plot", ylab="Values")
```


In [None]:
# Boxplot
summary(data) |> print()
# outline = FALSE hides the outlier points in the plot; the data are unchanged
boxplot(data, outline = FALSE, main="Box-and-Whisker Plot", ylab="Values")


### Weighted Mean and measures of grouped data
```{r}
values <- housing$median_income
weights <- housing$population

average_income <- weighted.mean(values, weights)
print(paste("Weighted Mean:", average_income))
```
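
The weighted mean multiplies each value by its weight, adds up the products, and divides by the sum of the weights. A small sketch with made-up numbers (`toy_values` and `toy_weights` are hypothetical) that checks the step-by-step result against `weighted.mean()`:

```{r}
# Hypothetical toy data: the last value gets twice the weight of the others
toy_values <- c(2, 4, 6)
toy_weights <- c(1, 1, 2)

# Step by step: (2*1 + 4*1 + 6*2) / (1 + 1 + 2) = 18 / 4
wmean_manual <- sum(toy_values * toy_weights) / sum(toy_weights)

print(paste("Weighted mean (step by step):", wmean_manual))                   # 4.5
print(paste("Weighted mean (built-in):", weighted.mean(toy_values, toy_weights)))
print(paste("Unweighted mean:", mean(toy_values)))                            # 4
```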




### Covariance and Correlation
```{r}
# Correlation of two continuous random variables
cor(housing$housing_median_age, housing$median_income)

# Correlation matrix of all the variables in the dataset
cor(housing)
```
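
The covariance is computed with `cov()`, and the correlation is the covariance divided by the product of the two standard deviations. The sketch below shows both step by step on made-up vectors (`u` and `v` are hypothetical data).

```{r}
# Hypothetical toy data
u <- c(1, 2, 3, 4, 5)
v <- c(2, 4, 5, 4, 5)
n_toy <- length(u)

# Sample covariance: average product of deviations, dividing by n - 1
cov_manual <- sum((u - mean(u)) * (v - mean(v))) / (n_toy - 1)

# Correlation: covariance scaled by both standard deviations
cor_manual <- cov_manual / (sd(u) * sd(v))

print(paste("Covariance (step by step):", cov_manual))  # 1.5
print(paste("Covariance (built-in):", cov(u, v)))
print(paste("Correlation (step by step):", cor_manual))
print(paste("Correlation (built-in):", cor(u, v)))
```

Unlike the covariance, the correlation does not depend on the units of the variables and always lies between -1 and 1.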



In [None]:
# Scatter Plots for all the variables in the dataset
pairs(housing)

### Data Summaries

```{r}
# Data summaries built-in

summary(housing)
```

In [65]:
# Data summaries built-in

summary(housing)

   longitude         latitude     housing_median_age  total_rooms   
 Min.   :-124.3   Min.   :32.54   Min.   : 1.00      Min.   :    2  
 1st Qu.:-121.8   1st Qu.:33.93   1st Qu.:18.00      1st Qu.: 1462  
 Median :-118.5   Median :34.25   Median :29.00      Median : 2127  
 Mean   :-119.6   Mean   :35.63   Mean   :28.59      Mean   : 2644  
 3rd Qu.:-118.0   3rd Qu.:37.72   3rd Qu.:37.00      3rd Qu.: 3151  
 Max.   :-114.3   Max.   :41.95   Max.   :52.00      Max.   :37937  
 total_bedrooms     population      households     median_income    
 Min.   :   1.0   Min.   :    3   Min.   :   1.0   Min.   : 0.4999  
 1st Qu.: 297.0   1st Qu.:  790   1st Qu.: 282.0   1st Qu.: 2.5664  
 Median : 434.0   Median : 1167   Median : 409.0   Median : 3.5446  
 Mean   : 539.4   Mean   : 1430   Mean   : 501.2   Mean   : 3.8836  
 3rd Qu.: 648.2   3rd Qu.: 1721   3rd Qu.: 605.2   3rd Qu.: 4.7670  
 Max.   :6445.0   Max.   :35682   Max.   :6082.0   Max.   :15.0001  
 median_house_value
 Min.   : 1499

## Probability

### Marginal Probabilities of Discrete Random Variables

```{r}
# Marginal Probabilities 

with(
    heroes,
    table(
        Gender
    )
) |> proportions()

with(
    heroes,
    table(
        Alignment
    )
) |> proportions()
```


### Joint Probabilities of Discrete Random Variables

```{r}
# Joint Probabilities 

with(
    heroes,
    table(
        Gender,
        Alignment
    )
) |> proportions()
```
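
The marginal probabilities can be recovered from the joint table by summing over the other variable. Since `Super_Heroes.csv` may not be at hand, the sketch below uses a small made-up data frame (`toy_heroes`, hypothetical values) with the same two column names.

```{r}
# Hypothetical toy data with the same column names as the heroes data
toy_heroes <- data.frame(
  Gender = c("F", "M", "M", "F", "M", "M"),
  Alignment = c("good", "good", "bad", "good", "bad", "good")
)

# Joint probabilities: each cell is a share of all observations
joint <- with(toy_heroes, table(Gender, Alignment)) |> proportions()
print(joint)

# Summing across each row gives the marginal probabilities of Gender
marg_gender <- rowSums(joint)
print(marg_gender)

# Summing down each column gives the marginal probabilities of Alignment
marg_alignment <- colSums(joint)
print(marg_alignment)

# The joint probabilities add up to one
print(sum(joint))
```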



## Probability Distributions

### Discrete Random Variables

#### Poisson Distribution

- Distribution Plot

```{r}
# Distribution plot: relative frequencies of simulated Poisson draws
lambda <- 2
n <- 100
poisson_points <- rpois(n,lambda)
plot(
    proportions(table(poisson_points)), 
    main="Poisson Distribution",
    ylab="Relative frequency",
    xlab="X"
)
```



- Probabilities

$$
P(X=x|\lambda)=\frac{e^{-\lambda} \lambda^x}{x!}
$$

In [None]:
# Individual Poisson Probabilities
dpois(3,4.4)

# Cumulative Poisson Probabilities
ppois(3,4.4)
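
Here `dpois(3, 4.4)` is $P(X=3)$ and `ppois(3, 4.4)` is $P(X\le 3)$ for $\lambda=4.4$. As a check, the sketch below evaluates the probability formula directly and compares it with the built-in functions; the names `rate` and `k` are chosen for this example only.

```{r}
rate <- 4.4  # lambda
k <- 3       # value of X

# Individual probability from the formula exp(-lambda) * lambda^x / x!
p_manual <- exp(-rate) * rate^k / factorial(k)
print(p_manual)
print(dpois(k, rate))

# The cumulative probability is the sum of the individual ones for 0, 1, ..., k
p_cum_manual <- sum(dpois(0:k, rate))
print(p_cum_manual)
print(ppois(k, rate))

# Probability of more than k events, P(X > k)
print(1 - ppois(k, rate))
print(ppois(k, rate, lower.tail = FALSE))
```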


### Continuous Random Variables

#### Normal Distribution

- Probability Density Function

$$
f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad \text{for } -\infty < x < \infty 
$$


```{r}
mu <- 0    # mean
sigma <- 1 # standard deviation
n <- 1000
x <- rnorm(n,mu,sigma)

# freq = FALSE puts densities, not counts, on the vertical axis
hist(x, freq = FALSE, main="Normal Distribution", xlab="X", ylab="Density")

```




- Probability

$$
 P(X\le x_0)=\int_{-\infty}^{x_0} f(x) dx
$$



In [None]:
pnorm(1,mu,sigma)
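
`pnorm(1, mu, sigma)` returns $P(X\le 1)$, the area under the density to the left of 1. The sketch below shows how other probabilities follow from it and checks the result against a numerical integral of the density `dnorm()`; the variable names are illustrative.

```{r}
mu <- 0    # mean
sigma <- 1 # standard deviation

# P(X <= 1)
p_below <- pnorm(1, mu, sigma)
print(p_below)

# P(X > 1) is the complement
p_above <- 1 - p_below
print(p_above)

# P(-1 < X <= 1) is the difference of two cumulative probabilities
p_between <- pnorm(1, mu, sigma) - pnorm(-1, mu, sigma)
print(p_between)

# Numerical check: integrate the density from -Inf to 1
area <- integrate(dnorm, lower = -Inf, upper = 1, mean = mu, sd = sigma)$value
print(area)

# qnorm() is the inverse of pnorm(): it returns x0 for a given probability
print(qnorm(p_below, mu, sigma))
```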
