# Headline Analysis Example (assignment one)
[J. Nathan Matias](https://natematias.com), [COMM 4940](https://natematias.com/courses/comm4940/) (Jan 2020)

This code includes an analysis example for the [Week 2 Headline Experiment](https://github.com/natematias/design-governance-experiments/tree/master/assignments/1-headline-experiment).

### DATASET DESCRIPTION
Each row in *headline-experiment-impressions.csv* represents a single "impression": a browser that viewed a given headline. The columns are:
* **hed**: which headline number was used (the actual headline text is in *headline-experiment-heds.csv*)
* **click**: whether that participant clicked on the article to read further

In [None]:
## LOAD LIBRARIES
library(ggplot2) ## FOR PLOTS
library(gmodels) ## FOR CrossTable

## SET GGPLOT TO USE WIDE BUT NOT TOO TALL PLOTS
options(repr.plot.width=6, repr.plot.height=4)

In [None]:
# load participants file
participants     <- read.csv("headline-experiment-impressions.csv")
# convert headline to a factor, to simplify analysis later on
participants$hed <- factor(participants$hed)
# relevel the factor so that the 4th headline is the reference level for the regression model
# (relevel() only needs the factor and the reference level; it has no data argument)
participants$hed <- relevel(participants$hed, ref="4")


headlines        <- read.csv("headline-experiment-heds.csv")
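
To see what `relevel()` does before relying on it in the regression, here is a minimal sketch on a hypothetical four-level factor (`hed` and `hed.releveled` are illustration names, not the experiment data). It only moves the chosen level to the front of the level ordering; the values themselves stay the same.

```r
## HYPOTHETICAL FACTOR, for illustration only
hed <- factor(c(1, 2, 3, 4))
levels(hed)                 ## "1" "2" "3" "4"

## make "4" the first (reference) level
hed.releveled <- relevel(hed, ref="4")
levels(hed.releveled)       ## "4" "1" "2" "3"

## the underlying values are unchanged
as.character(hed.releveled) ## "1" "2" "3" "4"
```

Since `lm()` treats the first level of a factor as the baseline, every other headline's coefficient is then reported relative to headline 4.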

### Show Headlines

In [None]:
headlines

### Show a CrossTable of the Data

In [None]:
CrossTable(participants$hed, participants$click,
           prop.r=FALSE, prop.c=TRUE, prop.t=FALSE, prop.chisq=FALSE)
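
As a cross-check on the table above, the click rate for each headline can also be computed with base R. This is a minimal sketch on a small made-up data frame (`toy` is invented for illustration and is not the experiment data) with the same two columns. `prop.table()` with `margin=1` gives row proportions, so each headline's row sums to 1 and the `1` column is that headline's click rate.

```r
## TOY DATA ONLY: invented values with the same columns as participants
toy <- data.frame(
    hed   = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
    click = c(0, 0, 0, 1, 0, 1, 1, 0)
)

## counts of click values (columns) for each headline (rows)
counts <- table(toy$hed, toy$click)

## row proportions: column "1" is the click rate for each headline
rates <- prop.table(counts, margin=1)
rates

## the same click rates, computed as the mean of click within each headline
click.rates <- tapply(toy$click, toy$hed, mean)
click.rates
```

To try this on the real data, substitute `participants` for `toy`.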

### Estimate Results With Confidence Intervals

In [None]:
summary(result.lm <- lm(click ~ hed, data=participants))
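
With a 0/1 outcome and a single factor predictor, `lm()` fits what is often called a linear probability model: the intercept is the click rate for the reference headline, and each remaining coefficient is the difference between that headline's click rate and the reference. The sketch below illustrates this on made-up data (`toy` and `toy.lm` are hypothetical names, and the values are not the experiment results).

```r
## TOY DATA ONLY: invented values, not the experiment results
toy <- data.frame(
    hed   = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 4, 4, 4, 4)),
    click = c(0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0)
)
toy$hed <- relevel(toy$hed, ref="4")

toy.lm <- lm(click ~ hed, data=toy)
coefs  <- coef(toy.lm)

## click rate per headline, for comparison with the coefficients
means  <- tapply(toy$click, toy$hed, mean)

coefs[["(Intercept)"]]  ## equals the click rate for reference headline 4
coefs[["hed1"]]         ## equals click rate for headline 1 minus headline 4
coefs[["hed2"]]         ## equals click rate for headline 2 minus headline 4
means
```

Reading the `summary()` output this way, a positive coefficient means that headline was clicked more often than the reference headline, and a negative one means it was clicked less often.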

In [None]:
# generate a dataframe with a row for each headline
# so we can generate and store the estimates for each one
estimate.df <- data.frame(hed=factor(c(1,2,3,4)))

# use the predict() method with the result object
# to create estimates and confidence intervals for each headline
preds.df <- data.frame(predict(result.lm, estimate.df, se.fit=TRUE, interval="confidence")$fit)

estimate.df$estimate <- preds.df$fit
estimate.df$estimate.lwr <- preds.df$lwr
estimate.df$estimate.upr <- preds.df$upr
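
By default, `predict()` with `interval="confidence"` returns 95% intervals (`level=0.95`). Each interval is the fitted value plus or minus a t quantile times the standard error of the fit, using the model's residual degrees of freedom. The sketch below reproduces the `lwr` and `upr` columns by hand on made-up data (`toy`, `toy.lm`, and `new.df` are hypothetical names, and the values are not the experiment results).

```r
## TOY DATA ONLY: invented values, not the experiment results
toy <- data.frame(
    hed   = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
    click = c(0, 0, 0, 1, 0, 1, 1, 0)
)
toy.lm <- lm(click ~ hed, data=toy)
new.df <- data.frame(hed=factor(c(1, 2)))

## se.fit=TRUE returns a list with fit, se.fit, df and residual.scale
pred <- predict(toy.lm, new.df, se.fit=TRUE, interval="confidence")

## 95% interval: fit +/- t quantile * standard error of the fit
t.crit     <- qt(0.975, df=pred$df)
manual.lwr <- pred$fit[, "fit"] - t.crit * pred$se.fit
manual.upr <- pred$fit[, "fit"] + t.crit * pred$se.fit

## the manual columns should match lwr and upr from predict()
cbind(pred$fit, manual.lwr, manual.upr)
```

These intervals rest on the usual linear model assumptions, so with a binary outcome such as `click` they are best read as an approximation.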

### Plot Results using ggplot

In [None]:
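# print the highest upper confidence bound, which is useful for choosing the y-axis limits below
print(paste("max upper confidence bound:", max(estimate.df$estimate.upr)))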

In [None]:
ggplot(estimate.df, aes(hed, estimate)) + ## refer to columns by name inside aes()
    geom_point() +
    geom_errorbar(aes(ymin=estimate.lwr, ymax=estimate.upr), width=0.15) +
    theme_bw() +
    scale_y_continuous(limits=c(0, 0.03), breaks=seq(0, 0.03, 0.0025), labels=scales::percent) +
    coord_flip() + ## This line converts the vertical plot to a horizontal plot
    xlab("") +
    ylab("ASSIGNMENT TODO: Label AXIS")