# Data Visualization

This assignment covers the concepts behind data visualization and shows how to create visualizations with Matplotlib and Seaborn.

First, we import some libraries and load some datasets.  

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

sns.set_theme()  # apply Seaborn's default theme (sns.set() is an alias for this)

In [None]:
car_crashes = sns.load_dataset("car_crashes")
titanic = sns.load_dataset("titanic")
diamonds = sns.load_dataset("diamonds")
iris = sns.load_dataset("iris")
fmri = sns.load_dataset("fmri")
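Before choosing a chart, it can help to check which columns are categorical and which are quantitative. The sketch below is one rough way to do that. It uses a small made-up DataFrame (`toy`, not one of the Seaborn datasets) and a hypothetical helper named `split_columns`; the same call should work on `titanic` or `diamonds`.

```python
import pandas as pd


def split_columns(df):
    """Return (quantitative, categorical) column-name lists based on dtype."""
    quantitative = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return quantitative, categorical


# Made-up data for illustration only
toy = pd.DataFrame({
    "species": ["a", "b", "a", "c"],
    "weight_kg": [1.2, 3.4, 2.2, 0.9],
    "count": [3, 1, 4, 1],
})

quantitative, categorical = split_columns(toy)
print("quantitative:", quantitative)
print("categorical:", categorical)
```

The dtype is only a heuristic: a category stored as integers (for example `pclass` in `titanic`) is reported as quantitative, so check what each column actually means.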

### Using the Appropriate Visualization

This assignment will cover these five basic visualizations: 
1. Bar Charts
2. Multi-level Bar Charts
3. Histograms
4. Scatter Plots
5. Line Charts

If you need a refresher on these visualizations, be sure to refer to the [slides]() or the [notes](). 

Of the five visualizations presented, which ones are appropriate for categorical variables? How many categorical variables does each visualization allow for? Justify why each visualization is appropriate. 

**STUDENT SOLUTION HERE**

Of the five visualizations presented, which ones are appropriate for quantitative variables? How many quantitative variables does each visualization allow for? Justify why each visualization is appropriate. 

**STUDENT SOLUTION HERE**

### Good Visualization Practices

What makes a good visualization? Intuitively, a good visualization is easy to read and conveys clear, accurate information quickly. Each of the visualizations below has a problem; identify the issue and explain how to fix it.

In [None]:
sns.scatterplot(
    data=car_crashes,
    x="speeding",
    y="alcohol",
)
plt.show()

What's wrong with the above visualization? 

**STUDENT SOLUTION HERE**

In [None]:
sns.histplot(
    data=titanic,
    x="fare",
)
plt.title("Distribution of Passenger Fares")
plt.xlabel("Fare ($)")
plt.show()

What's wrong with the above visualization? 

**STUDENT SOLUTION HERE**

In [None]:
sns.scatterplot(
    data=diamonds,
    x="price",
    y="depth",
)
plt.title("Price vs. Depth")
plt.show()

What's wrong with the above visualization? 

**STUDENT SOLUTION HERE**

In [None]:
plot = sns.countplot(
    data=iris,
    x="petal_width",
)
for item in plot.get_xticklabels():
    item.set_rotation(45)
plt.xlabel("Petal width (cm)")
plt.title("Distribution of petal widths")
plt.show()

What's wrong with the above visualization? 

**STUDENT SOLUTION HERE**

### Creating Visualizations with Seaborn

Now that you know when to use each kind of visualization and which common pitfalls to avoid, you can create your own! Use Seaborn and Matplotlib to generate the graphs described in the prompts. Include appropriate axis labels and a chart title for each one.
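As a reminder of the mechanics, here is a minimal sketch of setting a title and axis labels on a Seaborn plot. The DataFrame `toy` and its column names are made up for illustration and are unrelated to the datasets loaded above.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up measurements for illustration only
toy = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 58, 65, 70, 78, 85],
})

# Axes-level Seaborn functions return the Matplotlib Axes they drew on
ax = sns.scatterplot(data=toy, x="hours_studied", y="exam_score")
ax.set_title("Exam Score vs. Hours Studied")
ax.set_xlabel("Hours studied (h)")
ax.set_ylabel("Exam score (points)")
plt.show()
```

Calling `plt.title`, `plt.xlabel`, and `plt.ylabel` after the Seaborn call, as the earlier cells do, should give the same result for a single plot.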

Create a multi-level bar chart of counts for the `titanic` dataset, where the first level is the passenger's class and the second level (specified by color) is the town where the passenger embarked.

*Hint: You may find the "hue" argument useful in seaborn functions.*
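To show what `hue` does without using the assignment data, the sketch below colors a scatter plot by a grouping column. The DataFrame `toy` and its columns are invented for this example; the argument is used the same way in other Seaborn plotting functions.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data for illustration only
toy = pd.DataFrame({
    "height_cm": [150, 160, 170, 155, 165, 175],
    "weight_kg": [50, 58, 68, 54, 63, 74],
    "group": ["A", "A", "A", "B", "B", "B"],
})

# hue splits the data by the values of "group" and assigns each a color
ax = sns.scatterplot(data=toy, x="height_cm", y="weight_kg", hue="group")
ax.set_title("Weight vs. Height by Group")
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
plt.show()
```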

In [None]:
# BEGIN STUDENT SOLUTION

# END STUDENT SOLUTION 

plt.show()

Create a line plot for the `fmri` dataset with signal plotted against timepoint. Draw one line for each distinct region. Timepoints are measured in seconds. You may omit the units for signal, since its interpretation is complicated and requires reading a comprehensive paper to understand.

In [None]:
# BEGIN STUDENT SOLUTION

# END STUDENT SOLUTION 

plt.show()

Fix the histogram of the `titanic` passenger fares shown in the previous section. Set the range of the x axis for the fares to be from 0 to 150. 

*Hint: You can specify bins in the Seaborn function, or manually change the x axis using Matplotlib.*

In [None]:
# BEGIN STUDENT SOLUTION

# END STUDENT SOLUTION 

plt.show()

Fix the scatter plot of the `diamonds` dataset shown in the previous section. Use a hex plot of the data instead of a scatter plot.

*Hint: You may find `sns.jointplot` useful.*

In [None]:
# BEGIN STUDENT SOLUTION

# END STUDENT SOLUTION 

plt.show()