# Introduction to Visualizations in R (ggplot2)

While Python is my preferred language for most things, R brings several very impressive capabilities to the table. For statistical, linear, and time-series modeling, R far surpasses what Python libraries can offer. By most standards, R's plotting is also superior to Python's, and many of the graphics you see in academic research are made in R. A common workflow is therefore to do your data wrangling in Python, then import the resulting CSV files into R for analysis and plotting.
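One snag in that Python-to-R hand-off: if the Python side used pandas `DataFrame.to_csv()` with its default `index=True`, the file starts with an unnamed index column, and `read.csv()` turns that into a column called `X`. The sketch below uses a small hypothetical CSV string (not real student data) to show the problem and one way around it, `row.names = 1`. Writing the file with `index=False` on the Python side avoids it entirely.

```r
# Hypothetical CSV text in the shape pandas writes by default:
# the unnamed index becomes an empty first header field.
csv_text <- ",gpa,major
0,3.2,Math
1,3.8,Biology
2,2.9,Math"

# Reading it naively keeps the pandas index as an extra column named "X"
raw <- read.csv(text = csv_text)
names(raw)

# Using the first column as row names drops it from the data columns
clean <- read.csv(text = csv_text, row.names = 1)
names(clean)
```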

Today we will learn the basics of ggplot2 and the idea behind the "Grammar of Graphics": a plot is built from data, aesthetic mappings (which variables go on which axes, colors, and so on), and geometric layers stacked with `+`. Matplotlib is widely viewed as cumbersome, unintuitive, and difficult to use. By contrast, most people find ggplot2 more intuitive and easier to learn and remember. That said, ggplot2 is still a massive library whose capabilities can take a long time to master. Our focus will be on the basic structure of a ggplot2 plot and a handful of plotting features you can expect to use regularly.
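To make that structure concrete, here is a small sketch that builds one plot step by step. It uses R's built-in `mtcars` data rather than the students file so it runs anywhere; the particular columns and labels are just for illustration. Each `+` adds a layer or a non-data component to a plot object, and nothing is drawn until the object is printed.

```r
library(ggplot2)

# 1. Data + aesthetic mappings: which columns map to which visual channels
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# 2. A geometric layer: how the mapped data is drawn (points, coloured by cylinders)
p <- p + geom_point(aes(colour = factor(cyl)))

# 3. More layers and non-data components, added with the same `+` operator
p <- p +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# The plot is an ordinary R object until it is printed
print(p)
```

Because the plot is just an object, you can store a base plot once and try different layers on top of it without retyping the data and mappings.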

In [None]:
library(ggplot2)

In [None]:
# Let's open our trusty students data set
students <- read.csv('students.csv')

In [None]:
head(students)

One of the most commonly used plots for visualizing data is the histogram, which shows the distribution of a single variable.

In [None]:
# binwidth sets the width of each bar in GPA units (here 0.2 GPA points)
ggplot(students, aes(x=gpa)) + geom_histogram(binwidth=.2)
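`geom_histogram()` lets you control the bins either by width (`binwidth`) or by count (`bins`). The sketch below, using the built-in `mtcars` data as a stand-in for the students file, draws both versions and uses `layer_data()` to look at the bins ggplot2 actually computed, one row per bar.

```r
library(ggplot2)

# The same variable binned two ways
by_width <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 2)
by_count <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)

# layer_data() returns the computed bins: xmin, xmax, count, ...
bins_w <- layer_data(by_width)
bins_n <- layer_data(by_count)

nrow(bins_n)        # number of bars requested via `bins`
sum(bins_w$count)   # each observation falls in exactly one bin
```

It is worth trying a few bin sizes: too wide hides structure, too narrow turns the histogram into noise.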

In [None]:
# One point per student, with majors along the x axis
ggplot(students, aes(x=major, y=gpa)) + geom_point()

In [None]:
# Mean GPA per major; the result has columns "Major" and "x"
agg <- aggregate(students$gpa, by = list(Major = students$major), FUN = mean)

In [None]:
agg

In [None]:
# Raw GPAs in black, with each major's mean GPA (from agg) overlaid in red
ggplot(students) + 
    geom_point(aes(x=major, y=gpa)) + 
    geom_point(data=agg, aes(x=Major, y=x), colour = 'red', size=3)
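Instead of computing the means with `aggregate()` and passing a second data frame, ggplot2 can compute them itself with `stat_summary()`. This sketch uses `mtcars` (mpg by number of cylinders) as a stand-in for GPA by major, and assumes ggplot2 3.3.0 or later, where the argument is called `fun` (older versions used `fun.y`).

```r
library(ggplot2)

# Raw points plus group means computed inside the plot
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point() +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)

# The summary layer has one row per group; its y values are the group means
means <- layer_data(p, 2)
means$y
tapply(mtcars$mpg, mtcars$cyl, mean)
```

The `aggregate()` approach is still handy when you want the summary table itself; `stat_summary()` keeps everything in one plot call.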

In [None]:
# One box per major: median, quartiles, whiskers, and outliers
ggplot(students) + 
    geom_boxplot(aes(x=major, y=gpa))
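If you want to check what each part of a box represents, `layer_data()` exposes the numbers behind it. In this sketch (again with `mtcars` standing in for the students data), `middle` is the median, `lower`/`upper` are the first and third quartiles, and `ymin`/`ymax` are the whisker ends, which by default reach the most extreme points within 1.5 times the interquartile range rather than the overall minimum and maximum.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# One row per box with the statistics ggplot2 drew
stats <- layer_data(p)
stats[, c("x", "ymin", "lower", "middle", "upper", "ymax")]

# The middle line matches the ordinary group median
tapply(mtcars$mpg, mtcars$cyl, median)
```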

In [None]:
data(mtcars)

head(mtcars)

In [None]:
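# Horsepower vs. quarter-mile time, with a fitted linear regression line
# (the shaded band is a 95% confidence interval by default)
ggplot(data=mtcars, aes(hp, qsec)) + 
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x)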
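The line drawn by `geom_smooth(method = "lm")` is an ordinary linear model, so you can fit the same model with `lm()` to get its coefficients. The sketch below fits `qsec ~ hp` and checks that the smooth layer's y values match the model's predictions along the x grid ggplot2 used.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(hp, qsec)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Fit the same model directly to see the intercept and slope
fit <- lm(qsec ~ hp, data = mtcars)
coef(fit)

# The smooth layer's y values are the model's predictions at its x grid
smooth <- layer_data(p, 2)
head(cbind(ggplot = smooth$y, lm = predict(fit, newdata = data.frame(hp = smooth$x))))
```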

In [None]:
# pairs() is base R graphics, not ggplot2: one scatter plot for every pair of columns
pairs(mtcars)
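With all eleven columns the `pairs()` grid gets crowded. A sketch of one way to keep it readable: pick a handful of columns (the four below are an arbitrary choice) and print their correlation matrix next to the plot, so each panel has a number attached.

```r
# A subset of columns keeps the pairs() grid legible
vars <- c("mpg", "hp", "wt", "qsec")
pairs(mtcars[, vars])

# Pearson correlations for the same pairs of panels
r <- round(cor(mtcars[, vars]), 2)
r
```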

In [None]:
data(iris)

In [None]:
head(iris)

In [None]:
avgs <- aggregate(x=iris[,1:4], by=list(Species=iris$Species), FUN=mean)

In [None]:
avgs
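`aggregate()` also has a formula interface, where `.` means "every other column" and the right-hand side gives the grouping. This sketch computes the same per-species means both ways so you can compare them; note the formula version drops rows with missing values by default (`na.action = na.omit`), which makes no difference for `iris`.

```r
# Default interface, as in the cell above
avgs <- aggregate(x = iris[, 1:4], by = list(Species = iris$Species), FUN = mean)

# Formula interface: all remaining columns, grouped by Species
avgs_formula <- aggregate(. ~ Species, data = iris, FUN = mean)
avgs_formula

# Same columns and same means
all.equal(avgs[, -1], avgs_formula[, -1])
```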

In [None]:
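# stat='identity' uses the y values as bar heights instead of counting rows

# Labels, option 1: set title and axis labels at once with labs()
s.len <- ggplot(avgs, aes(x=Species, y=Sepal.Length)) + 
    geom_bar(stat='identity', fill='#d60000') +
    labs(title='Mean Sepal Length', x='Species', y='Sepal length (cm)')

# Labels, option 2: ggtitle(), xlab(), and ylab()
p.length <- ggplot(avgs, aes(x=Species, y=Petal.Length)) + 
    geom_bar(stat='identity', fill='#797c77') + 
    ggtitle('Mean Petal Length') + xlab('Species') + ylab('Petal length (cm)')

# Labels, option 3: as option 2, with the title centred via theme()
p.width <- ggplot(avgs, aes(x=Species, y=Petal.Width)) + 
    geom_bar(stat='identity', fill='#438e88') + 
    ggtitle('Mean Petal Width') + theme(plot.title = element_text(hjust = 0.5)) + 
    xlab('Species') + 
    ylab('Petal width (cm)')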
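The three bar charts above differ only in the column and the fill colour, so a small function can build them. The sketch below is one way to do it; `bar_by_species` is a hypothetical helper name, not part of ggplot2. It relies on the `.data[[col]]` pronoun (ggplot2 3.0.0 or later) to pick a column by its name as a string, and on `geom_col()`, which is ggplot2's shorthand for `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Per-species means, computed as in the cells above
avgs <- aggregate(x = iris[, 1:4], by = list(Species = iris$Species), FUN = mean)

# Hypothetical helper: bar chart of one measurement column, chosen by name
bar_by_species <- function(df, col, fill) {
  ggplot(df, aes(x = Species, y = .data[[col]])) +
    geom_col(fill = fill) +
    labs(title = paste("Mean", col), x = "Species", y = paste(col, "(cm)")) +
    theme(plot.title = element_text(hjust = 0.5))
}

# One plot per column, returned as a named list
cols  <- c("Sepal.Length", "Petal.Length", "Petal.Width")
fills <- c("#d60000", "#797c77", "#438e88")
bars  <- Map(function(col, fill) bar_by_species(avgs, col, fill), cols, fills)

bars[["Petal.Width"]]
```

Writing the labels once inside the function keeps the three charts consistent, at the cost of less control over each individual plot.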



In [None]:
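# Install gridExtra only if it is missing, then load it
if (!requireNamespace("gridExtra", quietly = TRUE)) install.packages("gridExtra")
library(gridExtra)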

In [None]:
# 2 x 2 grid filled row by row: equal column widths, top row twice as tall as the bottom row
grid.arrange(s.len, p.length, p.width, ncol=2, nrow=2, widths=c(4,4), heights=c(2,1))
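With three plots in a 2 x 2 grid, one cell stays empty. `grid.arrange()` also accepts a `layout_matrix` that says which plot goes in which cell, so a plot can span several cells. The sketch below assumes gridExtra is installed and uses three small stand-in plots built from `iris`: the first two share the top row and the third spans the whole bottom row. `arrangeGrob()` builds the same layout without drawing it, which is useful if you want to save it later.

```r
library(ggplot2)
library(gridExtra)

# Three stand-in plots
a <- ggplot(iris, aes(Sepal.Length)) + geom_histogram(bins = 20)
b <- ggplot(iris, aes(Petal.Length)) + geom_histogram(bins = 20)
c_plot <- ggplot(iris, aes(Sepal.Length, Petal.Length)) + geom_point()

# Each number is a plot's position in the argument list:
# plots 1 and 2 on top, plot 3 across the full bottom row
layout <- rbind(c(1, 2),
                c(3, 3))

# Build the combined layout as an object, then draw it
combined <- arrangeGrob(a, b, c_plot, layout_matrix = layout)
grid.arrange(a, b, c_plot, layout_matrix = layout)
```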