# Principles of Data Visualization
If you want to type along with me, use [this notebook](https://humboldt.cloudbank.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fbethanyj0%2Fdata271_sp24&branch=main&urlpath=tree%2Fdata271_sp24%2Fdemos%2Fdata271_demo19_live.ipynb) instead. 
If you don't want to type and want to follow along just by executing the cells, stay in this notebook. 

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from plotnine import *
from plotnine.data import *
import warnings 
warnings.filterwarnings('ignore') 

**NOTE** If you get errors when you run the cell above, open a terminal and run the following:
```bash
pip install plotnine
pip install matplotlib==3.8.3
```

Then come back to this notebook and try again. (You might have to restart your kernel.)

In [None]:
df = midwest
df.head()

In [None]:
# adding variables with ggplot aesthetic mapping
(ggplot(df, aes('percollege','percprof',color = 'state'))
+geom_point()).draw()

## Facetting
In the plot above, many points fall on top of each other. Let's split the visualization into a separate panel for each state.

In [None]:
# with a plotnine ggplot
(ggplot(df,aes('percollege','percprof',color = 'state'))
+geom_point()
+...).draw()

In [None]:
# adjust the number of rows in your facets
(ggplot(df,aes('percollege','percprof',color = 'state'))
+geom_point()
+facet_wrap('state',...)
+theme(figure_size=(10,3))).draw()

In [None]:
# facetting by more than one variable
(ggplot(df,aes('percollege','percprof',color = 'state'))
+geom_point()
+...
+theme(figure_size=(8,16))).draw()

## Statistical transformations (stat)

In [None]:
# add statistical transformations
(ggplot(df,aes('percollege','percprof',color = 'state'))
+geom_point()
+facet_wrap('state')
+...).draw()

## Layer-specific mappings

In [None]:
# use different aesthetics for different parts of the graphic
(ggplot(df,aes('percollege','percprof'))
+geom_point(...)
+facet_wrap('state')
+stat_smooth()).draw()

## Activity

The `plotnine` package includes a dataset called `diamonds`, which contains the prices and other attributes of almost 54,000 diamonds.

In [None]:
diamonds.head()

1. According to the principles of data visualization, what is wrong with the graph below? Adjust the ggplot so that it aligns with the principles of data visualization.  

In [None]:
(ggplot(diamonds, aes(x='x',y='y'))
       +geom_line()).draw()

2. According to the principles of data visualization, what is wrong with the graph below? Adjust the matplotlib graph, or create a ggplot so that it aligns with the principles of data visualization.  

In [None]:
ideal = diamonds[diamonds.cut == 'Ideal']
prem = diamonds[diamonds.cut == 'Premium']
good = diamonds[diamonds.cut == 'Good']
vgood = diamonds[diamonds.cut == 'Very Good']
fair = diamonds[diamonds.cut == 'Fair']

plt.plot('carat','price','r.',data = ideal)
plt.plot('carat','price','m.',data = prem)
plt.plot('carat','price','y.',data = good)
plt.plot('carat','price','w.',data = vgood)
plt.plot('carat','price','k.',data = fair)
plt.show()

3. According to the principles of data visualization, what is wrong with the graph below? Adjust the ggplot so that it aligns with the principles of data visualization.  

In [None]:
avg_price = diamonds.groupby('clarity').price.mean().reset_index()
(ggplot(avg_price,aes(x='clarity',y='price',fill = 'clarity')) 
 + geom_bar(stat='identity',color='r')
 + geom_text(label=avg_price.clarity)
 + theme_classic()).draw()