# Univariate Visualizations: Plotting One Variable at a Time

To run this notebook, first download [this cereal data](https://docs.google.com/spreadsheets/d/1YzP0CF_stFjav6fkXeQ0J0gpk1PQoY2l/edit?usp=sharing&ouid=100081525283421980531&rtpof=true&sd=true) and upload it to this temporary Colab environment or your own drive folder.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
path = "https://docs.google.com/spreadsheets/d/e/2PACX-1vR4AOPKKW6zLpiLygapfcRy1vJjV130vQGRCy2v4Tx8vJARHy9K28TWVFboFOgxiPAGEu5R6nWZw3Qw/pub?output=xlsx"
cereal = pd.read_excel(path, header=1)
cereal.head()


# Pandas and Matplotlib.pyplot

Pandas uses Matplotlib (imported here as `plt`) behind the scenes to plot data. You can therefore use `plt` functions, such as `plt.title()` and `plt.xlabel()`, to adjust plots you make with the `df.plot()` method.
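As a minimal sketch of that interplay, the cell below plots a small hypothetical Series (not the cereal data) and checks that the Axes returned by `.plot()` is the same object `plt` is currently drawing on. That shared object is why `plt.title()` and similar calls affect a plot made by pandas.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts, used only for illustration (not the cereal data)
counts = pd.Series({"A": 3, "B": 5, "C": 2})

# .plot() draws with Matplotlib and returns the Matplotlib Axes it drew on
ax = counts.plot(kind="bar")

# plt.title() acts on the "current" Axes, which is the one pandas just used
plt.title("Hypothetical counts")

same_axes = ax is plt.gca()   # True when plt and pandas share the Axes
title_text = ax.get_title()   # the title set through plt is visible on ax

plt.close("all")              # release the figure once we are done with it
```

Either style works: call `plt` functions right after `.plot()`, or keep the returned `ax` and call its methods (for example `ax.set_title(...)`).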

## Bar Charts:  Plotting Categorical Frequencies

In [None]:
#Count cereals from each manufacturer
manufacturer_counts = cereal['Manufacturer'].value_counts()
manufacturer_counts

In [None]:
type(manufacturer_counts)
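To see what `value_counts()` hands back, here is a small sketch on a hypothetical column of manufacturer codes (the values are made up for illustration). The result is a pandas Series whose index holds the categories and whose values hold the counts, sorted from most to least frequent. Passing `normalize=True` returns proportions instead.

```python
import pandas as pd

# Hypothetical manufacturer codes, not taken from the cereal data
makers = pd.Series(["K", "G", "K", "P", "K", "G"])

counts = makers.value_counts()                 # counts per category, descending
shares = makers.value_counts(normalize=True)   # same, as fractions of the total

print(counts)
print(shares)
```

Because the result is a Series, it has its own `.plot()` method, which is what the bar chart cells rely on.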

In [None]:
#Plot the number of cereals from each manufacturer

manufacturer_counts.plot(kind = 'barh')
plt.title('Number of Cereals by Manufacturer', fontsize = 20)
plt.ylabel('Manufacturer', fontsize = 15)
plt.xlabel('Count', fontsize = 15)

#Show your figure
plt.show()

#Display the table of counts below the plot
manufacturer_counts

In [None]:
#Distribution of Cereal types
cereal_types = cereal['type'].value_counts()

#Let's give it some color
cereal_types.plot(kind = 'bar', color = ['blue','red'])
plt.title('Cereal Type: Hot or Cold')
plt.show()

cereal_types

## Histograms: Plotting Continuous Distributions



In [None]:
#Create a histogram of a continuous variable: calories per serving
# with default number of bins (10)
cereal['calories per serving'].hist()

#Set the label of the x axis
plt.xlabel('Calories')

#Set the label of the y axis
plt.ylabel('Count')

#Give it a title
plt.title('Calorie Distribution')
plt.show()
print('Calories per serving tend to fall around 110')

### Let's try automatically adjusting the # of bins `bins='auto'`

In [None]:
#Histogram of calories per serving: bins='auto'
cereal['calories per serving'].hist(bins='auto')
plt.xlabel('Calories')
plt.ylabel('Count')
plt.title('Calorie Distribution')
plt.show()
print('With more bins we see small clusters of cereals \nat the low and high ends of the calorie range')
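For string values of `bins`, Matplotlib defers to NumPy's binning strategies. As far as the NumPy documentation describes it, `'auto'` takes whichever of the Sturges and Freedman-Diaconis estimators gives the narrower bins. The sketch below compares the bin edges for the default of 10 bins with those chosen by `'auto'`, using randomly generated stand-in values (an assumption for illustration, not the cereal data).

```python
import numpy as np

# Hypothetical stand-in for a calorie column: 77 random values around 110
rng = np.random.default_rng(0)
values = rng.normal(loc=110, scale=20, size=77)

# Edges for a fixed number of bins (10 bins means 11 edges)
default_edges = np.histogram_bin_edges(values, bins=10)

# Edges chosen by NumPy's 'auto' strategy
auto_edges = np.histogram_bin_edges(values, bins="auto")

print(len(default_edges) - 1, "bins by default")
print(len(auto_edges) - 1, "bins chosen by 'auto'")
```

The number of bins picked by `'auto'` depends on the sample size and spread, so it can be larger or smaller than 10 for a given column.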

# Box and Whisker Plots: Quartiles and Outliers

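Box and whisker plots summarize a distribution by its quartiles: the box spans the first to the third quartile, the line inside the box marks the median, and points beyond the whiskers are drawn individually as outliers.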


![Box and Whisker Plot](https://miro.medium.com/max/1400/1*2c21SkzJMf3frPXPAR_gZA.png)

They can answer: 

1. How spread out is our data?
2. Is the data skewed high or low?
3. Are there outlier values, values that are very far from most of the others?

In [None]:
#Check the summary statistics of potassium content
cereal['milligrams of potassium'].describe()

# Finding 'Minimum' and 'Maximum' and Outliers

## (Different from the actual min and max values)

According to the results of `describe()` above:

1. First Quartile (Q1) = 40
2. Third Quartile (Q3) = 120
3. Interquartile Range (IQR) = Q3 - Q1 = 80
4. 1.5 * IQR = 120
5. 'Minimum' (before outliers) is Q1 - 120 = -80
6. 'Maximum' (before outliers) is Q3 + 120 = 240

Notice below that the lower whisker is at 0, which is the actual minimum value (whiskers only extend to values that exist in the data, so the plot never reaches -80), and the upper whisker ends at the largest observed value at or below 240.

We can see a few outliers above 240.
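The arithmetic above can be reproduced in code. The sketch below uses a short hypothetical Series chosen so that Q1 = 40 and Q3 = 120, matching the numbers in the list (these are not the real potassium values). It assumes the default box plot rule of 1.5 × IQR for the fences.

```python
import pandas as pd

# Hypothetical potassium values (mg), NOT the real cereal data
potassium = pd.Series([0, 40, 40, 90, 120, 120, 330])

q1 = potassium.quantile(0.25)      # first quartile
q3 = potassium.quantile(0.75)      # third quartile
iqr = q3 - q1                      # interquartile range

lower_fence = q1 - 1.5 * iqr       # 'minimum' before outliers
upper_fence = q3 + 1.5 * iqr       # 'maximum' before outliers

# Whiskers stop at the most extreme observations inside the fences
inside = potassium[(potassium >= lower_fence) & (potassium <= upper_fence)]
lower_whisker = inside.min()
upper_whisker = inside.max()

# Anything outside the fences is drawn as an individual outlier point
outliers = potassium[(potassium < lower_fence) | (potassium > upper_fence)]

print(q1, q3, iqr, lower_fence, upper_fence)
print(lower_whisker, upper_whisker, outliers.tolist())
```

In this hypothetical sample the fences are -80 and 240, but the whiskers sit at 0 and 120 because those are the most extreme observations inside the fences. The single value of 330 is the only outlier.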

In [None]:
#Boxplot of potassium content of cereals
cereal['milligrams of potassium'].plot(kind='box')
plt.title('Distribution of Potassium Content in Cereals')
plt.ylabel('Milligrams')
plt.show()

# Okay, Let's Get Fancy!

[Here is a link to the pandas.DataFrame.boxplot method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.boxplot.html)

There you can see all the arguments and descriptions of what they do.


In [None]:
#df.boxplot can plot multiple columns at a time and group by a categorical column

cereal.boxplot(['milligrams of potassium', 'milligrams of sodium'],  # column or columns to plot
               by = 'Manufacturer',  # column to group by
               figsize = (15,5), # size of the figure
               fontsize = 12, # size of the text
               rot = 90, # degree of rotation of the labels
               grid=False, # turn off the grid
               ) # How else can you enhance this plot?

plt.show()

# What does the above boxplot tell you?