# Module 6 Class 4: Data Visualization and Matplotlib

In this activity, you will work directly with `matplotlib`, a Python visualization library. You will start with NBA player and salary data. Try to uncover trends in the data that aren't obvious at first glance!

**Pro Tip:** The Opportunity Through Data online [textbook](https://otd.gitbook.io/book/module-6/data-visualization/matplotlib) and the `matplotlib` [documentation](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) may be incredibly helpful.

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')

Today you will be working with NBA salary data from the 2017-2018 season. Feel free to explore the data in any way that helps you become more comfortable with it! I suggest using Pandas' `shape` attribute and `info` method. Simply run the following cell to load the DataFrame.

In [4]:
nba = pd.read_csv("nba_salaries.csv", index_col=0)
nba

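| | Player | Draft Pick | Salary | Pos | Age | Team | Games Played |
|---|---|---|---|---|---|---|---|
| 0 | Zhou Qi | 43 | 815615 | C | 22 | HOU | 16 |
| 1 | Zaza Pachulia | 42 | 3477600 | C | 33 | GSW | 66 |
| 2 | Zach Randolph | 19 | 12307692 | PF | 36 | SAC | 59 |
| 3 | Zach LaVine | 13 | 3202217 | SG | 22 | CHI | 24 |
| 4 | Zach Collins | 10 | 3057240 | C | 20 | POR | 62 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 644 | Al Jefferson | 15 | 9769821 | C | 33 | IND | 34 |
| 645 | Al Horford | 3 | 27734405 | C | 31 | BOS | 70 |
| 647 | Abdel Nader | 58 | 1167333 | SF | 24 | BOS | 44 |
| 649 | Aaron Gordon | 4 | 5504420 | PF | 22 | ORL | 55 |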
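As a quick illustration of the suggested exploration, here is a minimal sketch of `shape` and `info`. The `toy` DataFrame and its rows are made up for this example and only mirror a few column names of `nba`; run the same two calls on `nba` itself.

```python
import pandas as pd

# Made-up rows that only mirror some column names of `nba`; not real data.
toy = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "Draft Pick": [1, 15, 42],
    "Salary": [20_000_000, 5_000_000, 1_000_000],
    "Pos": ["PG", "C", "SF"],
})

# `shape` is an attribute (no parentheses): (number of rows, number of columns).
print(toy.shape)

# `info()` is a method: it prints each column's dtype and non-null count.
toy.info()
```

Note that `shape` is read without parentheses, while `info()` must be called.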


## Scatter Plots

**Question One:** Create a scatter plot using `matplotlib` that plots the `Salary` of all the players in the DataFrame with their respective `Draft Pick`. Is there a clear pattern?

1. Plot the figure below according to the above specifications.
2. Label the axes appropriately using methods learned in class from `matplotlib`.
3. Give the plot a meaningful title.
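If you are unsure where to start, the three steps above might look roughly like this on a small, made-up DataFrame. The `toy` values are invented for illustration and say nothing about the real pattern; swap in the columns of `nba` to answer the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration only; use the `nba` DataFrame in the activity.
toy = pd.DataFrame({
    "Draft Pick": [1, 5, 12, 30, 45, 58],
    "Salary": [25_000_000, 12_000_000, 15_000_000, 3_000_000, 6_000_000, 1_200_000],
})

plt.figure(figsize=(10, 5))
# x values come first, then y values.
points = plt.scatter(toy["Draft Pick"], toy["Salary"])
plt.xlabel("Draft Pick")
plt.ylabel("Salary (USD)")  # assumes salaries are recorded in dollars
plt.title("Salary vs. Draft Pick (toy data)")
plt.show()
```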

In [5]:
plt.figure(figsize=(10,5))
...
...
...
...
plt.show()

<Figure size 720x360 with 0 Axes>

## Bar Charts

**Question Two:** Create a bar chart that displays the average salary for each position `Pos` in the DataFrame. Use functions we have learned earlier this module! Which position has the highest average salary?

1. Create the bar chart according to the specifications.
2. Add meaningful labels for the axes and a relevant title.

For an added challenge, try to create a bar chart where the bars are in **decreasing order** with respect to their heights. Use the additional Pandas method `.sort_values(by="Column Name")` to achieve this result. It sorts in ascending order by default, so pass `ascending=False` for decreasing order.
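One possible approach, sketched on made-up data, is to group by position, average the salaries, and sort before plotting. Using `groupby` here is an assumption about which earlier functions are meant, and the `toy` values are invented.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration only; use the `nba` DataFrame in the activity.
toy = pd.DataFrame({
    "Pos": ["C", "C", "PG", "PG", "SF", "SF"],
    "Salary": [4_000_000, 6_000_000, 9_000_000, 11_000_000, 2_000_000, 4_000_000],
})

# Average salary per position, largest first.
# Selecting one column gives a Series, so sort_values() needs no `by=` here.
avg_salary = toy.groupby("Pos")["Salary"].mean().sort_values(ascending=False)

plt.figure(figsize=(10, 5))
bars = plt.bar(avg_salary.index, avg_salary.values)
plt.xlabel("Position")
plt.ylabel("Average Salary (USD)")
plt.title("Average Salary by Position (toy data)")
plt.show()
```

If you keep the grouped result as a DataFrame instead of a Series, `.sort_values(by="Salary", ascending=False)` does the same job.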

In [6]:
plt.figure(figsize=(10,5))
...
...
...
...
...
plt.show()

<Figure size 720x360 with 0 Axes>

**Question Three:** Create a *horizontal* bar chart that shows the salaries of the **ten highest-paid NBA players** using `matplotlib`'s `barh` method. The bars should appear in *increasing order* from top to bottom, with the highest-paid player on the bottom. As always, add the appropriate labels!

1. Create a new DataFrame with the top ten highest paid NBA players called `top_10`.
2. Plot the horizontal bar chart as described above.
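A minimal sketch of both steps on made-up data follows. The player names, salaries, and the variable `n` are hypothetical; with `nba` you would use `n = 10` and name the result `top_10`. Because `barh` draws the first row at the bottom of the chart, sorting from highest to lowest puts the highest-paid player on the bottom.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration only; use the `nba` DataFrame in the activity.
toy = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "Salary": [3_000_000, 30_000_000, 12_000_000, 8_000_000, 21_000_000],
})

n = 3  # hypothetical; the activity asks for the top 10
top_n = toy.sort_values(by="Salary", ascending=False).head(n)

plt.figure(figsize=(10, 5))
# barh places the first row at the bottom and works upward.
bars = plt.barh(top_n["Player"], top_n["Salary"])
plt.xlabel("Salary (USD)")
plt.ylabel("Player")
plt.title(f"Top {n} Salaries (toy data)")
plt.show()
```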

In [7]:
plt.figure(figsize=(10,5))
...
...
...
...
...
plt.show()

<Figure size 720x360 with 0 Axes>

## Histograms

**Question Four:** Create a histogram that shows the distribution of `Games Played` for the entire data set with appropriate labels. What can you observe from the trends?

**NOTE:** The y-label of the histogram can be difficult to come up with -- it is not as clear as it has been for other visualizations within this activity. Think hard about what both the **height** and **area** of the plot represent. 
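To reason about what the bar heights mean, it can help to look at what `plt.hist` returns. The sketch below uses made-up `Games Played` values and an arbitrary choice of 4 bins; with the default settings, each bar's height is the number of rows that fall into that bin.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration only; use the `nba` DataFrame in the activity.
toy = pd.DataFrame({"Games Played": [5, 12, 30, 44, 55, 62, 66, 70, 72, 80]})

plt.figure(figsize=(10, 5))
# hist returns the per-bin counts, the bin edges, and the drawn bars.
counts, bin_edges, patches = plt.hist(toy["Games Played"], bins=4)
plt.xlabel("Games Played")
plt.ylabel("Number of Players")
plt.title("Distribution of Games Played (toy data)")
plt.show()

# With default settings the heights are counts, so they add up to the number of rows.
print(counts, counts.sum(), len(toy))
```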

In [8]:
plt.figure(figsize=(10,5))
...
...
...
...
plt.show()

<Figure size 720x360 with 0 Axes>

**Question Five:** Create two new Pandas DataFrames called `centers` and `point_guards` that contain only the players at their respective position. Recall that all the **centers** within the original data set will have `Pos` as "C" and all the **point guards** will have `Pos` as "PG".

We will use these two new DataFrames in order to compare the distribution of salaries between each position.
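Filtering rows by a column value is usually done with a boolean mask. Here is a small sketch on made-up data; the `toy_` names and all values are hypothetical, and the same pattern applies to `nba`.

```python
import pandas as pd

# Made-up values for illustration only; use the `nba` DataFrame in the activity.
toy = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "Pos": ["C", "PG", "SF", "C", "PG"],
    "Salary": [5_000_000, 20_000_000, 3_000_000, 9_000_000, 1_500_000],
})

# toy["Pos"] == "C" is a Series of True/False values, one per row.
# Indexing with it keeps only the rows where the value is True.
toy_centers = toy[toy["Pos"] == "C"]
toy_point_guards = toy[toy["Pos"] == "PG"]

print(toy_centers)
print(toy_point_guards)
```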

In [9]:
centers = ...
centers

Ellipsis

In [10]:
point_guards = ...
point_guards

Ellipsis

Now, using your newly created DataFrames `centers` and `point_guards`, plot the two distributions of their respective salaries as histograms on the same figure. Look at the `matplotlib` documentation to find out how to change the **alpha** (transparency) of a histogram, and set `alpha=0.5`.
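Calling `plt.hist` twice before `plt.show()` draws both histograms on the same axes, and `alpha=0.5` keeps the overlap visible. The sketch below uses two made-up salary Series; the `label` arguments and `plt.legend()` are an optional addition, not something the activity requires.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up salaries for illustration only; use `centers` and `point_guards` in the activity.
toy_center_salaries = pd.Series([2_000_000, 4_000_000, 6_000_000, 9_000_000, 15_000_000])
toy_pg_salaries = pd.Series([1_000_000, 3_000_000, 8_000_000, 20_000_000, 30_000_000])

plt.figure(figsize=(10, 5))
# alpha=0.5 makes each histogram half transparent so overlapping bars stay visible.
_, _, center_bars = plt.hist(toy_center_salaries, alpha=0.5, label="Centers")
_, _, pg_bars = plt.hist(toy_pg_salaries, alpha=0.5, label="Point Guards")
plt.xlabel("Salary (USD)")
plt.ylabel("Number of Players")
plt.title("Salary Distribution by Position (toy data)")
plt.legend()
plt.show()
```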

In [11]:
plt.figure(figsize=(10,5))
...
...
...
...
...
plt.show()

<Figure size 720x360 with 0 Axes>

If you created the histogram above correctly, you will notice that the `point_guards` distribution extends further to the right! Our visualization allowed us to see this more easily than we could have by simply looking at the raw data or at a DataFrame.

**Question Six:** Using this information and the `point_guards` DataFrame, return a list of all the point guards that have a salary greater than or equal to 25 million dollars. Do you recognize any of them?
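One way to get a plain Python list, sketched on made-up data, is to combine a boolean mask with `.loc` and `.tolist()`. The `toy_` names and values are hypothetical, and returning only the `Player` column is an assumption about what the list should contain.

```python
import pandas as pd

# Made-up values for illustration only; use `point_guards` in the activity.
toy_point_guards = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "Salary": [25_000_000, 24_999_999, 31_000_000, 2_000_000],
})

# >= keeps salaries of exactly 25 million as well as anything above it.
mask = toy_point_guards["Salary"] >= 25_000_000

# .loc[rows, column] selects the matching rows of one column; .tolist() makes a list.
toy_over_25_million = toy_point_guards.loc[mask, "Player"].tolist()
print(toy_over_25_million)
```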

In [12]:
over_25_million = ...
over_25_million

Ellipsis

**Interestingly enough,** we see that there are five point guards that make at least 25 million dollars, while there is only one center that earns a salary in that range, despite centers having the highest average salary.

You have now completed **Module 6!** One more left to go!