### Outline
2. Analyzing Data
    - Summary Tables
    - Figures
    - Regression Tables

In [21]:
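library(dplyr)     # data manipulation
library(stargazer) # summary and regression tables
library(ggplot2)   # figures
library(lfe)       # regressions with fixed effects (felm)
# If lfe is not installed, run this once:
# install.packages("lfe", repos='http://cran.us.r-project.org')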

In [22]:
options(repr.matrix.max.rows=200, repr.matrix.max.cols=100) # set the maximum number of rows and columns to display
options(repr.plot.width=4, repr.plot.height=4) # set figure size

In [23]:
setwd('...') # set the working directory

In [24]:
# read.table already returns a data frame, so no conversion is needed
data <- read.table('data/data_use_R.csv', sep=",", header=TRUE, stringsAsFactors=FALSE)

### Summary Tables
- Almost every empirical paper includes a summary table.
- Summary tables matter because readers use them to understand the distributions of the variables.
- To make a summary table, use `stargazer` from the `stargazer` package.
- You can produce the output in several formats (LaTeX, HTML, or plain text). I suggest LaTeX.
    - The reason: you will import the file later when you write a paper.
- For the moment, we produce files with the `txt` file extension.
    - If you already have TeX on your computer, change the file extension from `txt` to `tex`.

In [None]:
# Variables to summarize
sum_vars <- c('gelec_dem', 'gelec_rep', 'gelec_oth', 'gelec_total', 'rep_share', 'dem_share',
              'elec_year', 'temp_mean', 'temp_max_max', 'temp_max_mean')
# type defaults to 'latex', so the output file contains LaTeX code
stargazer(data[, sum_vars],
          out="sum_stat_R.txt", digits=2, header=FALSE, float=FALSE, summary.stat = c('mean','sd','min','max','n'))
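
To see what each column of the summary table means, the statistics can be recomputed by hand. The following is a minimal sketch in base R on a made-up data frame (`toy` and its values are hypothetical, not the election data). As far as I know, `stargazer` also computes each statistic on the non-missing values of a variable, which is why `n` can differ across rows.

```r
# Made-up data with one missing value, only for illustration
toy <- data.frame(
    dem_share = c(0.52, 0.47, NA, 0.61),
    rep_share = c(0.45, 0.51, 0.58, 0.36)
)

# For each variable, drop missing values and compute the five statistics
summ <- t(sapply(toy, function(v) {
    v <- v[!is.na(v)]
    c(mean = mean(v), sd = sd(v), min = min(v), max = max(v), n = length(v))
}))

print(round(summ, 2))
```

Comparing such a hand-computed table with the `stargazer` output is a quick check that the right variables and the right sample went into the table.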

### Figures
- Good papers have figures that summarize the results well.
    - You should not write a paper with tables only.
- A very useful package for plotting figures is `ggplot2`.
    - A [cheat sheet](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf) is available.  
- There are several types of figures: histograms, density plots, scatter plots, bar plots, line plots, and so on.
    - Each type serves a different purpose.
- Regarding the file format, I suggest either `png` or `jpg`.

#### - Histograms
- Histograms are often used to show the distributions of your data.

In [None]:
# Overlay the Democratic (blue) and Republican (red) vote share distributions
ggplot(data) + 
    geom_histogram(aes(x=dem_share), bins=20, alpha=0.3, fill='blue', na.rm=TRUE)  +
    geom_histogram(aes(x=rep_share), bins=20, alpha=0.3, fill='red', na.rm=TRUE)  +
    labs(title="Vote share, 2008-2014", x="") +
    theme(
        panel.background = element_rect(fill=NA),
        panel.border = element_rect(fill=NA, color='grey75'),
        axis.ticks = element_line(color='grey85'),
        #panel.grid.major = element_line(color = "grey95", size = 0.25),
        #panel.grid.minor = element_line(color = "grey95", size = 0.25),
        legend.position = 'none',
        plot.title = element_text(hjust=0.5, size=9),
        axis.title = element_text(size=9),
        axis.text = element_text(size=9)
    )
ggsave('hist_rep_share_R.png', width=4, height=4)
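
The same `theme(...)` settings recur in every figure in this section, so one option is to wrap them in a small function. The sketch below assumes a helper named `theme_paper` (a hypothetical name, not part of `ggplot2`) and uses made-up data. It also passes the plot to `ggsave` explicitly through the `plot` argument rather than relying on the last plot displayed.

```r
library(ggplot2)

# Hypothetical helper: bundles the theme settings repeated across the figures
theme_paper <- function(base_size = 9) {
    theme(
        panel.background = element_rect(fill = NA),
        panel.border = element_rect(fill = NA, color = "grey75"),
        axis.ticks = element_line(color = "grey85"),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5, size = base_size),
        axis.title = element_text(size = base_size),
        axis.text = element_text(size = base_size)
    )
}

# Made-up data, only to demonstrate the helper
set.seed(1)
toy <- data.frame(share = runif(200))

p <- ggplot(toy, aes(x = share)) +
    geom_histogram(bins = 20, alpha = 0.3, fill = "blue") +
    labs(title = "Toy histogram", x = "") +
    theme_paper()

# Save to a temporary folder; replace out_file with your own path
out_file <- file.path(tempdir(), "toy_hist.png")
ggsave(out_file, plot = p, width = 4, height = 4)
```

With such a helper, a change to the look of all figures (e.g., the font size) is made in one place.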

#### - Density Plots
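- Density plots are a smoothed variation of histograms.
    - A density plot applies a smoothing method (e.g., kernel density estimation) to the distribution.
- Unlike a histogram, a density plot does not depend on how you choose bins, although it does depend on the smoothing bandwidth.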

In [None]:
data1 <- data.frame(value=rnorm(1000,0,1))
ggplot(data1, aes(x=value)) +
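    # after_stat(density) replaces the deprecated ..density.. notation (requires ggplot2 >= 3.3.0)
    geom_histogram(bins=100, alpha=0.3, aes(y=after_stat(density))) +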
    stat_density(geom="line", color='red') +
    labs(title="Title", x="", y="") +
    theme(
        panel.background = element_rect(fill=NA),
        panel.border = element_rect(fill=NA, color='grey75'),
        axis.ticks = element_line(color='grey85'),
        legend.position = 'none',
        plot.title = element_text(hjust=0.5, size=9),
        axis.title = element_text(size=9),
        axis.text = element_text(size=9)
    )
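
The trade-off between bins and smoothing can be checked numerically. The sketch below uses only base R (`hist` and `density`) on simulated data: the histogram changes with the requested number of breaks, while the kernel density estimate changes with the bandwidth, scaled here through `adjust`. In `ggplot2`, `stat_density` accepts an `adjust` argument with the same role.

```r
set.seed(1)
x <- rnorm(1000)

# Histogram counts under two bin choices (plot = FALSE returns the numbers only)
h_coarse <- hist(x, breaks = 10, plot = FALSE)
h_fine <- hist(x, breaks = 50, plot = FALSE)

# Kernel density estimates under two bandwidths;
# adjust multiplies the default bandwidth
d_default <- density(x)
d_smooth <- density(x, adjust = 2)

cat("number of bins:", length(h_coarse$counts), "vs", length(h_fine$counts), "\n")
cat("bandwidth:", d_default$bw, "vs", d_smooth$bw, "\n")
```

A larger bandwidth gives a smoother curve but can hide features such as multiple peaks, so it is worth trying a few values.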

#### - Bar Plots
- Bar plots are often used to compare a statistic (e.g., the mean) across different groups.

In [None]:
data1 <- data.frame(
    name = c('tom', 'jerry', 'spike', 'tyke'),
    height = c(1.75, 1.82, 1.65, 1.4),
    treatment = c(1, 1, 0, 0))

ggplot(data1, aes(x=treatment, y=height, fill=factor(treatment))) + 
    stat_summary(fun=mean, geom='bar', alpha=0.3) + # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
    scale_fill_manual(values=c('red', 'blue')) +
    scale_x_continuous(breaks=c(0,1)) +
    labs(title="", x="treatment", y="average height") +
    theme(
        panel.background = element_rect(fill=NA),
        panel.border = element_rect(fill=NA, color='grey75'),
        axis.ticks = element_line(color='grey85'),
        legend.position = 'none',
        plot.title = element_text(hjust=0.5, size=9),
        axis.title = element_text(size=9),
        axis.text = element_text(size=9)
    )
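
The bar heights in this figure are just group means, so they can be verified directly. A minimal sketch with base R's `aggregate`, using the same toy data as the plot:

```r
data1 <- data.frame(
    name = c("tom", "jerry", "spike", "tyke"),
    height = c(1.75, 1.82, 1.65, 1.4),
    treatment = c(1, 1, 0, 0)
)

# Mean height within each treatment group
group_means <- aggregate(height ~ treatment, data = data1, FUN = mean)
print(group_means)
```

Printing the numbers behind a bar plot is a cheap way to catch a wrong grouping variable before the figure goes into a paper.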

#### - Scatter Plots
- Scatter plots are often used to show the relationship between two variables.

In [None]:
ggplot(data, aes(x=temp_max_max, y=rep_share)) +
    geom_point(color='blue', na.rm=TRUE) +
    labs(title="", x="Maximum temperature", y="Republican vote share") +
    theme(
        panel.background = element_rect(fill = NA),
        panel.border = element_rect(fill = NA, color = "grey75"),
        axis.ticks = element_line(color = "grey85"),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5, size=9),
        axis.title = element_text(size=9),
        axis.text = element_text(size=9)
    )

#### - Line Plots
- Line plots are often used to show time trends.

In [None]:
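set.seed(123456789) # set the random seed for reproducibility
n <- 1000
# Three independent random walks of length n, stacked in long format
data1 <- data.frame(
    sample = rep(1:n, times=3),
    group=rep(c("one", "two", "three"), each=n),
    value=c(cumsum(rnorm(n,0,1)), cumsum(rnorm(n,0,1)), cumsum(rnorm(n,0,1))))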

ggplot(data1, aes(x=sample, y=value)) +
    geom_line(aes(colour=group)) +
    labs(title="Title", x="", y="") +
    scale_color_manual(name="", values=c("blue","red","yellow")) +
    theme(
        panel.background = element_rect(fill=NA),
        panel.border = element_rect(fill=NA, color="grey75"),
        axis.ticks = element_line(color="grey85"),
        legend.position = "bottom",
        legend.key = element_blank(),
        legend.text = element_text(size=8),
        plot.title = element_text(hjust=0.5, size=9),
        axis.title = element_text(size=9),
        axis.text = element_text(size=9)
    )

### Regression Tables
- Empirical papers almost always have regression tables.
- To run a regression, use, e.g., `lm` (base R) or `felm` from the `lfe` package.
- To produce a table of the regression results, use `stargazer` from the `stargazer` package.
    - If you want the output in TeX format, use `type='latex'` and replace `.txt` with `.tex`.
- How can you interpret the regression results?

In [None]:
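# (1) Pooled OLS
reg1 <- lm(rep_share ~ ln_temp_max_max, data=data)
# felm formula: outcome ~ regressors | fixed effects | instruments | cluster variable
# (2) State and election-year fixed effects
reg2 <- felm(rep_share ~ ln_temp_max_max | state_short + elec_year, data=data)
# (3) Same fixed effects, no instruments (0), standard errors clustered by state
reg3 <- felm(rep_share ~ ln_temp_max_max | state_short + elec_year | 0 | state_short, data=data)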

stargazer(reg1, reg2, reg3, 
          title="Table 1: Correlation between Election Day temperature and Senate Republican vote share", 
          column.labels = c("OLS", "FE", "FE+cluster"), 
          keep='ln_temp_max_max',
          model.names=FALSE, 
          type='text', header=FALSE, out="estimates_R.txt")
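
As a starting point for the interpretation question, here is a minimal sketch on simulated data. It assumes that `ln_temp_max_max` is the natural log of `temp_max_max` and that the vote share is measured as a fraction between 0 and 1. Both are assumptions about the data, and every name and number in the block is made up. The fixed effects enter as dummy variables in `lm`, which gives the same point estimate as projecting them out.

```r
set.seed(42)
n_state <- 50
years <- c(2008, 2010, 2012, 2014)

# Simulated state-by-year panel (all names and numbers are made up)
toy <- expand.grid(state = paste0("s", 1:n_state), year = years,
                   stringsAsFactors = FALSE)
toy$ln_temp <- log(runif(nrow(toy), 40, 90))   # log of a made-up temperature
state_effect <- rnorm(n_state, 0, 0.05)        # time-invariant state differences
names(state_effect) <- paste0("s", 1:n_state)
true_beta <- 0.10
toy$share <- 0.2 + true_beta * toy$ln_temp +
    unname(state_effect[toy$state]) + rnorm(nrow(toy), 0, 0.005)

# State and year fixed effects entered as dummy variables
fit <- lm(share ~ ln_temp + factor(state) + factor(year), data = toy)
beta_hat <- unname(coef(fit)["ln_temp"])

# Level-log interpretation: a 1% rise in temperature changes the share by
# about beta * log(1.01), i.e. roughly beta / 100
effect_1pct <- beta_hat * log(1.01)
print(c(beta_hat = beta_hat, effect_1pct = effect_1pct))
```

In words: with a logged regressor and an outcome in levels, dividing the coefficient by 100 gives the approximate change in the outcome associated with a 1% increase in the regressor, holding the fixed effects constant. Whether that association is causal is a separate question that the table alone does not answer.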