# [SOC-88] Visualization Basics


In [None]:
import numpy as np
import pandas as pd
from datascience import *

## Part 1: Components of Graphs

Every graph has components that help us understand the data being represented. Adjusting the following components will improve the quality of your graphs and make them easier for your audience to understand. We will go over how to modify each of these components to suit what you want.

**We will review a few functions to customize your graphs and plots. You can follow this [link](https://matplotlib.org/3.1.1/api/pyplot_summary.html) and see more of the functions that can be used in `matplotlib.pyplot`.**

**Note:** Everything below refers to the imported `matplotlib.pyplot` package as `plt`, as set up in the cell below. If you import it under a different name, make sure to replace every `plt` below with the name you chose.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

### a) Displaying Graphs

In this notebook, the plots are displayed directly after the code because of the `%matplotlib inline` magic command in the code cell above. Without this line, the plots might not appear inline, depending on your Jupyter version and setup.

There is another way to display plots without using `inline`: you can call `plt.show()`. However, it requires that you create some sort of plot beforehand. If you only set a title and then call `plt.show()`, as in the cell below, you get a bare-bones, empty plot.

In [None]:
plt.title("Name")
plt.show();

On the last line of your plotting code, it is good practice to add a semicolon, `;`, which suppresses the text output (the printed return value of the last call) that would otherwise appear directly above your plot. The semicolon is not a strict requirement, since it does not change what the code does, but it removes the clutter and leaves only the important part: the graph.

**Note:** The semicolon makes no visible difference in the plot above, but it will be useful with later graphs in this notebook.

### b) Complex, Multiple Plots

When initializing plots, you can set up a figure using `plt.figure()`, which creates space for a plot. There are two key arguments you may want to adjust:

- The first, `figsize=(x,y)`, sets the width (`x`) and height (`y`) of your figure, in inches.
- The second, `facecolor`, controls the background color of the figure, which is the area surrounding your plot.

In [None]:
plt.figure(facecolor='white', figsize=(8,8));
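To see what these two arguments actually do, you can ask the figure for its size and background color after creating it. The cell below is a small sketch; the `'lightgray'` color and the 8x4 size are arbitrary choices for illustration, not values used elsewhere in this notebook.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Create a figure that is 8 inches wide and 4 inches tall,
# with a light gray background around the plotting area.
fig = plt.figure(facecolor='lightgray', figsize=(8, 4))

# get_size_inches() reports (width, height) in inches.
width, height = fig.get_size_inches()
print("width:", width, "height:", height)

# get_facecolor() reports the background as an RGBA tuple.
background = fig.get_facecolor()
print("background RGBA:", background)
```

Notice that the first number in `figsize` is the width and the second is the height.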

We will see what the above code looks like in an actual plot later on. Traditionally, the figure is assigned to a name, as in `fig = plt.figure()`, so that we can reference it afterwards.

In [None]:
fig = plt.figure();

On its own, `fig = plt.figure()` does not create an image. It is usually accompanied by a description of the number of plots you want to graph.

We describe how many plots will be in this space by adding subplots. Calling `fig.add_subplot(...)` adds a subplot to the `fig` figure (like the one we created in the cell above). It is best practice to keep the `fig` and its subplots together in the same cell to avoid confusion. By tradition, each subplot is named `ax`. If you have multiple subplots, give them numbered names, such as `ax1`, `ax2`, etc., so that you can reference each subplot individually.

The digits passed to `add_subplot()` give the number of rows, the number of columns, and the index of the subplot you are creating, in that order. For example, `111` creates the 1st subplot within a 1x1 grid.

In [None]:
fig = plt.figure(facecolor='white', figsize=(8,8));
ax = fig.add_subplot(111)

We can also create a 2x1 grid of subplots, in which each subplot has a visually longer x-axis.

In this example, we have 2 rows and 1 column. Remember to give each subplot its own name by changing the number at the end of your `ax` label.

In [None]:
fig = plt.figure(facecolor='white', figsize=(8,8));

# Plot 1 (subplot number ends with 1)
ax1 = fig.add_subplot(211)

# Plot 2 (subplot number ends with 2)
ax2 = fig.add_subplot(212)

We can also create a 1x2 grid of subplots (1 row and 2 columns), in which each subplot has a visually longer y-axis.

In [None]:
fig = plt.figure(facecolor='white', figsize=(8,8));
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

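You can remember which axis is going to be extended by remembering the order of the first two digits: **rows by columns**. More rows make each subplot shorter and wider; more columns make each subplot taller and narrower.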
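If you want to check the rows-by-columns rule for yourself, the sketch below builds both layouts and compares the width and height of one subplot from each. It uses the three-number form `add_subplot(rows, columns, index)`, which is equivalent to the three-digit form used above. The variable names are only for illustration.

```python
import matplotlib.pyplot as plt

# 2 rows x 1 column: the subplots are stacked on top of each other.
fig_rows = plt.figure(figsize=(8, 8))
top = fig_rows.add_subplot(2, 1, 1)      # same as add_subplot(211)
bottom = fig_rows.add_subplot(2, 1, 2)   # same as add_subplot(212)

# 1 row x 2 columns: the subplots sit side by side.
fig_cols = plt.figure(figsize=(8, 8))
left = fig_cols.add_subplot(1, 2, 1)     # same as add_subplot(121)
right = fig_cols.add_subplot(1, 2, 2)    # same as add_subplot(122)

# get_position() gives each subplot's box as fractions of the figure.
top_box = top.get_position()
left_box = left.get_position()
print("2x1 subplot: width", top_box.width, "height", top_box.height)
print("1x2 subplot: width", left_box.width, "height", left_box.height)
```

In the 2x1 layout each subplot is wider than it is tall, and in the 1x2 layout the opposite holds.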

***

### Question 1
Let's try it out by making a figure with 8x8 dimensions that has 2 subplots next to each other in columns. Plot the line y = x on the first subplot and y = -x on the second. If you get stuck on how to add plots to the different subplots, look at the documentation [here](https://matplotlib.org/3.1.0/gallery/subplots_axes_and_figures/subplots_demo.html).

In [None]:
fig = plt.figure(...)

ax1 = ...
ax1.plot(np.arange(0,10), np.arange(0,10));

ax2 = ...
ax2.plot(np.arange(0,10), -np.arange(0,10));

***

### c) Labeling your Graphs (x-axis, y-axis, Title)

A benefit of working with figures and axes as `fig` and `ax` is that they allow you to customize even more aspects of your plots. On the `ax` you are working with, you can add a label to the x-axis, a label to the y-axis, and a title, and you can adjust the font style and size of each. Their respective function names are:
* `set_xlabel(...)`
* `set_ylabel(...)`
* `set_title(...)`

Here are some possible arguments you can adjust along with their values that you can assign them to:
- **First argument:** String of the label name
- **`fontname`:** Set equal to a string containing the font style you want.
- **`fontsize`:** Set equal to a number that represents how large/small you want your text to be.
- **`weight`:** Change the thickness of the text (for example, `'bold'` or `'light'`).
- **`color`:** Set equal to a string with the name of the color, the abbreviation, or a grayscale number.

**Note:** `set_title(...)` has an extra argument called `loc`, short for *location*, which you can set equal to `'left'`, `'center'`, or `'right'` to change the position of the text. The axes' labels have something similar with the argument called `position`.

Find more information on color [here](https://matplotlib.org/2.0.2/api/colors_api.html). You can also see the large variety of fonts available in matplotlib [here](http://jonathansoma.com/lede/data-studio/matplotlib/list-all-fonts-available-in-matplotlib-plus-samples/). Some of the fonts no longer work, but most are still available.
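Since some fonts may be missing on your machine, it can help to check which font names matplotlib has found before using one. The sketch below lists them with `matplotlib.font_manager`. The helper `pick_font` is a hypothetical convenience function written for this example, not part of matplotlib.

```python
from matplotlib import font_manager

# Collect the names of every font matplotlib has registered.
available = sorted({font.name for font in font_manager.fontManager.ttflist})
print(len(available), "fonts found, for example:", available[:5])

def pick_font(preferred, fallback="DejaVu Sans"):
    """Hypothetical helper: return `preferred` if it is installed,
    otherwise fall back to a font that ships with matplotlib."""
    return preferred if preferred in available else fallback

# The result can be passed as fontname=... to set_xlabel, set_title, etc.
chosen = pick_font("Comic Sans MS")
print("Using font:", chosen)
```

If the font you asked for is not installed, matplotlib usually prints a warning and falls back to its default font, so checking first avoids surprises.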

In [None]:
fig = plt.figure(facecolor='white', figsize=(8,8))
ax = fig.add_subplot(111)
plt.plot(np.arange(0,10), -5*np.arange(0,10));

# Changing x labels
ax.set_xlabel('x axis label', fontname="Comic Sans MS", fontsize=50, weight='bold', color='red');

# Changing y labels
ax.set_ylabel('y axis label', fontname="serif", fontsize=20, weight='light', color='c');

# Add title
ax.set_title('New title', fontname = 'Impact', fontsize=30, loc='right', color='0.75');

***

### Question 2
Try plotting the same two lines from Question 1 (y = x and y = -x) in the following cell, but in rows. Add appropriate labels and title your plot as "Positive & Negative Slope". Use one unique font (anything different from the default) for all the labels. Use an appropriate color for your labels.

In [None]:
fig = plt.figure(figsize=(8,8))
ax1 = ...
ax1.plot(...);

ax2 = ...
ax2.plot(...);

ax2.set_xlabel(...)
ax2.set_ylabel(...)
ax1.set_title(...);

***

### d) Axes Limits and Ticks

Another feature you can edit on your graph is the bounds of your x-axis and y-axis. Note that the last number in your bound is included, unlike `np.arange(...)`, which excludes its stop value.

In [None]:
fig = plt.figure(facecolor='white', figsize=(8,8))
ax = fig.add_subplot(111)
plt.plot(np.arange(0,10), np.arange(0,10));

# Changing x-axis limit 
ax.set_xlim(0,9)

# Changing y-axis limit
ax.set_ylim(0, 9);
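The difference between the two conventions can be checked directly. This short sketch compares the last value produced by `np.arange(0, 10)` with the limits reported by the axes after `set_xlim(0, 9)`.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10)   # 0, 1, ..., 9: the stop value 10 is excluded

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111)
ax.plot(x, x)

# Both bounds passed to set_xlim are included in the visible range.
ax.set_xlim(0, 9)

print("last value from np.arange:", x[-1])
print("x-axis limits:", ax.get_xlim())
```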

You can also manually set the ticks of your plot using `set_xticks(...)` and `set_xticklabels(...)` for the x-axis, as well as `set_yticks(...)` and `set_yticklabels(...)` for the y-axis.

- The first pair of functions, `set_xticks(...)` and `set_yticks(...)`, sets the positions at which ticks are displayed on your graph, using the array of values you provide.
- The second pair of functions, `set_xticklabels(...)` and `set_yticklabels(...)`, sets the text shown at each tick, along with its style.

It is possible to add labels that inaccurately represent the plot if you use `set_xticklabels(...)` or `set_yticklabels(...)`, so be careful with these functions. However, they are useful if you wish to change the font style and size of the tick labels on your axes.
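Here is a small sketch of how the two pairs of functions divide the work: one call decides where the ticks go, and the other decides what text appears there. The tick positions and the word labels are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 4))
ax = fig.add_subplot(111)
ax.plot(np.arange(0, 10), np.arange(0, 10))

# set_xticks chooses WHERE the ticks sit on the axis.
ax.set_xticks([0, 3, 6, 9])

# set_xticklabels chooses WHAT text is drawn at those positions.
ax.set_xticklabels(['zero', 'three', 'six', 'nine'], fontsize=12)

# Draw the figure once so the label text is up to date, then inspect it.
fig.canvas.draw()
tick_positions = [float(p) for p in ax.get_xticks()]
tick_text = [label.get_text() for label in ax.get_xticklabels()]
print(tick_positions)
print(tick_text)
```

Because the text is independent of the positions, nothing stops you from writing `'ten'` at position 3, which is exactly how a misleading axis can be created by accident.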

***

### Question 3
In the cell below, there is a mistake in the plot's axes. Change the plot so that it demonstrates the line y=x.

In [None]:
fig = plt.figure(facecolor='white', figsize=(8,8))
ax = fig.add_subplot(111)
plt.plot(np.arange(0,10), np.arange(0,10));

ax.set_xticks(np.arange(20))
ax.set_xticklabels(2*np.arange(0,20), fontsize=15);

ax.set_yticks(np.arange(20));

***

### e) Single Plot Graphs

By making figures and adding axes, we are able to easily customize different subplots. However, if you are not interested in customizing those features and only want to make one simple plot, there are parallel functions you can use without making a figure. Here is an example.

In [None]:
# LABELS
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("Title", loc='left');

# AXES LIMITS
plt.xlim(0,1000)
plt.ylim(0, 8);
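These `plt` functions act on whichever axes is currently active, so they end up setting the same properties as the `ax.set_...` functions from earlier. The sketch below checks this by fetching the current axes with `plt.gca()` and reading the values back.

```python
import matplotlib.pyplot as plt

plt.figure()
plt.xlabel("x-axis")
plt.title("Title", loc='left')
plt.xlim(0, 1000)

# plt.gca() returns the axes that the plt.* calls above were applied to.
ax = plt.gca()
print(ax.get_xlabel())
print(ax.get_title(loc='left'))
print(ax.get_xlim())
```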

#### `xticks` and `yticks`

There are times when you may want a different set of values shown on your x-axis or y-axis. Using `plt.xticks()` or `plt.yticks()`, you can customize where the ticks are placed and what each one displays.

Both accept 2 arguments:
- The first is an array of the positions where you want ticks on the plot.
- The second is another array of the numbers or values you want displayed at each tick.


Here is an example. We are making 4 ticks on the x-axis: the first at position 0, the second at position 1, and so on. The values displayed at those positions are defined by the second array. The number of ticks does not have to be the same on the x-axis and the y-axis.


In [None]:
# Changing x
plt.xticks(np.arange(4), np.array([10, 20, 30, 40]))

# Changing y
plt.yticks(np.arange(8), np.array([5, 10, 15, 20, 25, 30, 35, 40]));

**Word of Caution:** *The two arrays you pass in should be the same length.* In older versions of matplotlib, a longer first array leaves some ticks without labels, and a longer second array leaves the extra labels undisplayed. Newer versions of matplotlib raise an error when the lengths do not match.

#### `legend`

If your plot has multiple lines, it is important to add a legend to your plot to distinguish the lines. Here is an example containing multiple lines with a legend added to the plot using `plt.legend(list_of_labels)`.

In [None]:
plt.plot([10, 4])
plt.plot([5, 5])
plt.plot([3, 9])

plt.legend(['a', 'b', 'c']);
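An alternative way to build the same legend is to attach a `label` to each line as you plot it and then call `plt.legend()` with no arguments. This sketch shows that version; it keeps each name next to the line it describes, which makes it harder to mix up the order.

```python
import matplotlib.pyplot as plt

plt.figure()

# Each line carries its own label.
plt.plot([10, 4], label='a')
plt.plot([5, 5], label='b')
plt.plot([3, 9], label='c')

# With no arguments, legend() collects the labels set above.
legend = plt.legend()
legend_text = [entry.get_text() for entry in legend.get_texts()]
print(legend_text)
```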

#### `grid`

Another key feature of plots is the option to add a grid in the background using `plt.grid()`. The grid makes it easier to read off where a point on the plot falls along the x-axis and y-axis.

The following cell shows an empty plot with grids in the background, marked by the x and y ticks.

In [None]:
plt.grid();

Here is an example of a graph with a legend AND a grid in the background! Note that with the grid, it is now easier to distinguish where each line is in comparison to the others.

In [None]:
plt.plot([10, 4])
plt.plot([5, 5])
plt.plot([3, 9])

plt.legend(['a', 'b', 'c']);
plt.grid();

### ***A Useful Tool: Saving Graphs***

An important aspect of making graphs is having a way to save them from the Jupyter Notebook, so that they can be used to support other work. To save our figures, we can use the matplotlib function `plt.savefig("file name")`, which will save your image into your DataHub so that you can download it from there. Learn more [here](https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.savefig.html).

If you update your graph and run the cell again, it will overwrite the image in your DataHub. **If you want to keep both versions, then you must give your image a different name before you run the cell again, or else you will lose the original graph.** It is recommended to make a new graph in a new cell rather than changing the code in a plot you want to keep.

In [None]:
fig = plt.figure(facecolor='white', figsize=(8,8))
ax = fig.add_subplot(111)
plt.plot(np.arange(0,10), 0*np.arange(0,10));

ax.set_xlabel('x axis label', fontname="serif", fontsize=20);
ax.set_ylabel('y axis label', fontname="serif", fontsize=20);
ax.set_title('New title',fontname="serif", fontsize=30, loc='center', weight='bold')

plt.savefig('Example_plot')
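To keep two versions of a graph, save each one under its own file name. The sketch below does this in a temporary folder so that it does not clutter your workspace; the folder and the file names `Example_plot_v1.png` and `Example_plot_v2.png` are made up for this example.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(facecolor='white', figsize=(4, 4))
ax = fig.add_subplot(111)
ax.plot(np.arange(0, 10), np.arange(0, 10))

# Hypothetical output location and file names, used only for this sketch.
out_dir = tempfile.mkdtemp()
first_version = os.path.join(out_dir, "Example_plot_v1.png")
second_version = os.path.join(out_dir, "Example_plot_v2.png")

# Save the original graph.
fig.savefig(first_version)

# Change the graph, then save under a NEW name so the original survives.
ax.set_title("Updated title")
fig.savefig(second_version)

print(sorted(os.listdir(out_dir)))
```

Calling `savefig` on the `fig` object saves that specific figure, which is handy when you have several figures open.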

### \* * * * * * * * * *

## Part 2: Different Types of Graphs

These are the types of visualizations most commonly used in Data 8: 
* Bar Charts
* Histograms
* Line Plots
* Scatter Plots

We will be going more in-depth on customizing each type of plot and adding proper components to make it easier for the audience to understand what the graph is explaining.

### a) Bar Charts

**Bar charts are a way to display the relationship between a categorical variable and a numerical variable.** There is a bar for each category you want to represent, and the height of the bar represents the numerical variable paired with each category. The width of the bars should be the same so that you only need to compare the height of the bars.

In Data 8, here is how we have traditionally made a bar chart to represent our data.

Let's first make ourselves a small table, `favorite_1`, with the columns `"Color"` and `"Count"`, where the count is the number of people who like the respective color.

In [None]:
favorite_1 = Table().with_columns("Color", ['Blue', 'Green', 'Red', 'Purple'],'Count', [10, 4, 5, 11])
favorite_1

Then, we simply call the `barh()` function on our recently created table, assigning `'Color'` as our categorical variable.

In [None]:
# This creates a horizontal bar chart with the same labels as those found in the table
favorite_1.barh('Color'); 

#### `color`

To make a more customized bar chart, we can use the matplotlib version, `plt.barh(categories_array, numerical_array, color=...)`. This is minor, but note that the bars in our new plot are arranged in the reverse order from the bars in the Data 8 version of the plot.

In [None]:
plt.barh(favorite_1.column(0), favorite_1.column(1), color=['blue','green','red','purple'])
plt.xlabel("Number of People")
plt.ylabel("Favorite Color")
plt.title("People's Favorite Colors");

Let's compare this plot with one where the categories are sorted by count. To do this, we can create a sorted copy of the original table (note that `sort` orders the rows from least to greatest by default), adjust the order of the colors to match, and plot it.

In [None]:
sorted_favorite = favorite_1.sort('Count')
sorted_favorite

In [None]:
sorted_favorite.barh('Color'); 

In [None]:
plt.barh(sorted_favorite.column(0), sorted_favorite.column(1), color=['green','red','blue','purple'])
plt.xlabel("Number of People")
plt.ylabel("Favorite Color")
plt.title("People's Favorite Colors");
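In the cell above, the list of colors was reordered by hand to match the sorted table. One way to avoid that manual step is to sort the categories and counts together and build the color list from the sorted categories. This sketch uses plain NumPy arrays with the same values as `favorite_1`, and it relies on the fact that each category name here is also a valid color name.

```python
import numpy as np
import matplotlib.pyplot as plt

colors = np.array(['Blue', 'Green', 'Red', 'Purple'])
counts = np.array([10, 4, 5, 11])

# argsort gives the row order that sorts the counts from least to greatest.
order = np.argsort(counts)
sorted_colors = colors[order]
sorted_counts = counts[order]

# Build the bar colors from the sorted category names, so they stay aligned.
bar_colors = [name.lower() for name in sorted_colors]

plt.figure()
plt.barh(sorted_colors, sorted_counts, color=bar_colors)
plt.xlabel("Number of People")
plt.ylabel("Favorite Color")
plt.title("People's Favorite Colors");
```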

***

### Question 4

Using the `fig` and `ax` format from above, remake the bar chart from the cell above using a serif font at size 25. Save the figure as `"Fav_Colors"`. You can leave the `xticks` and `yticks` as they are or change them as you see fit.

**Note:** If you're having trouble starting, remember that the `fig` and `ax` formatting from Part 1 exists!

In [None]:
...

***

#### *VERTICAL BAR CHARTS*

We're not bound to plotting only horizontal bar charts. We can also represent data as a vertical bar chart.

In Data 8, we use `tbl.bar(...)`. Notice that we provide the xticks explicitly so that the x-axis is not labeled twice.

In [None]:
sorted_favorite.bar('Color')
plt.ylabel("Number of People")

# xticks works with categorical types as well.
plt.xticks(np.arange(4), ['Green', 'Red','Blue','Purple'])
plt.xlabel("Favorite Color")
plt.title("People's Favorite Colors");

We can also use `plt.bar()` from the matplotlib library to further customize our plot. 

The formatting is very similar to what we saw in the `plt.barh()` example above. Feel free to experiment and try out other formatting techniques!

In [None]:
plt.bar(sorted_favorite.column(0), sorted_favorite.column(1), color=['green','red','blue','purple'])
plt.ylabel("Number of People")
plt.xlabel("Favorite Color")
plt.title("People's Favorite Colors");

### b) Histograms

**Histograms are a way to contextualize a single set of numerical values.** Histograms give you a way to understand how your data set is spread out. The spread is determined by the bins that represent the data. Bins are ranges that group close values of the data together.

For example, if we are looking at data with age as the numerical variable, one of your bins can include all ages from 0 to 5, your next bin can hold ages in the range of 5 to 10, and so on.

*Remember that bins do not include the last number in their range, much like `np.arange()`. In matplotlib, the final bin is the one exception: it includes its right endpoint.*

Histograms are special because they will calculate the percent per unit relative to your variable. This means that you can calculate the percentage of a certain range of your data. This percentage would, in turn, be represented by the area of your histogram.

**Note:** Because histogram bars are scaled by their area, unlike bar graphs, the widths of the bins do not have to be equal. If they are different, the respective heights must be readjusted accordingly.

An area of a histogram is equal to the height of a bar multiplied by the width of the bar:

$$
\text{Area} = \text{Width} \times \text{Height}
$$

**Logic of Units:** The area is equivalent to the percentage of the graph that a bin (or multiple bins) represents. In a formula structure, area is equal to the unit of width multiplied by the percent per unit on the height axis. The units cancel out to leave only the percentage:

$$
\text{Percent} = \text{Unit} \times \frac{\text{Percent}}{\text{Unit}}
$$

Here is a sample of weights from some unknown population.

In [None]:
weights = Table().with_columns("T", np.array([100, 101, 200, 150, 139, 180, 90, 125, 114, 160, 280, 145]))
weights

We can plot it using the Data 8 table function `tbl.hist(...)`, which requires a column of numerical values. Note the y-axis is automatically labeled as "Percent per unit" in the plot.

In [None]:
weights.hist('T')
plt.xlabel("Weight (Pounds)")
plt.title("Distribution of Weights");

We can also use matplotlib directly with `plt.hist(numerical_array)`. Note the default y-axis is different from the Data 8 version.

**This axis represents the number of data points in each bin.** The shapes of the two distributions are the same, but they represent different information. This is a key distinction between Data 8's histogram and matplotlib's histogram; make sure to be careful about this!

In [None]:
plt.hist(weights.column('T'))
plt.xlabel("Weight (Pounds)")
plt.title("Distribution of Weights");

#### `density`

As seen above, matplotlib's histogram shows the count in each bin by default. If you would like to keep using matplotlib's histogram but show proportion per unit, as Data 8 does, add the `density = True` argument to `plt.hist(...)`. Remember that the heights are then proportions, so multiply the height of a bar by 100 to get its percent per unit.

Below is an example of a density-based matplotlib histogram.

In [None]:
plt.hist(weights.column('T'), density=True)
plt.xlabel("Weight (Pounds)")
plt.ylabel('Proportion per unit')
plt.title("Distribution of Weights");
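The relationship between area, width, and height can be checked numerically. The sketch below uses `np.histogram`, which returns the bar heights and bin edges as arrays without drawing anything, on the same weights as the table above.

```python
import numpy as np

weights = np.array([100, 101, 200, 150, 139, 180, 90, 125, 114, 160, 280, 145])

# density=True returns heights in proportion per unit (here, per pound).
heights, edges = np.histogram(weights, bins=10, density=True)

widths = np.diff(edges)            # width of each bin, in pounds
areas = widths * heights           # Area = Width x Height, a proportion
percent_per_unit = 100 * heights   # the scale used on the Data 8 y-axis

print("bin widths:", widths)
print("proportion of data in each bin:", areas)
print("total area:", areas.sum())
```

The areas add up to 1 (that is, 100 percent), because every data point falls in exactly one bin.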

If you instead prefer to use Data 8's histogram but would like to get the number of data points in each bin, you can add `density = False` to the `tbl.hist(...)` function. Note that the y-axis label automatically changes to 'Count'.

In [None]:
weights.hist('T', density = False)
plt.xlabel("Weight (Pounds)")
plt.title("Distribution of Weights");

#### `bins`

Bins are the "boxes" that hold your data. They can be of different sizes as explained before (i.e., different widths), but it is good practice to make all bins the same size so that it is simpler to compare the percent of your data captured within one bin to another.

There are two different ways to specify bins. For the first way, you can just provide the number of bins you want in your plot and let matplotlib decide the actual intervals. It will then automatically split the range of the data into bins of equal width for the histogram.

For example, with the 10 bins from the plot above, and with our `weights` data ranging from 90 to 280, each bin has a width of $(280 - 90) \div 10 = 19$, giving 90-109 for the first bin, 109-128 for the second, 128-147 for the third, and so on.

To use this way, add a `bins` argument inside your `plt.hist(...)` or `tbl.hist(...)` function, like so:

In [None]:
plt.hist(weights.column('T'), bins=10)
plt.xlabel("Weight (Pounds)")
plt.title("Distribution of Weights");
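To see the intervals chosen for a given number of bins, you can compute the bin edges directly. This sketch uses `np.histogram_bin_edges`, which follows the same equal-width rule as `plt.hist` when `bins` is a number.

```python
import numpy as np

weights = np.array([100, 101, 200, 150, 139, 180, 90, 125, 114, 160, 280, 145])

# Ask for the edges of 10 equal-width bins spanning the data.
edges = np.histogram_bin_edges(weights, bins=10)
bin_width = (weights.max() - weights.min()) / 10

print("bin width:", bin_width)
print("bin edges:", edges)
```

There are 11 edges for 10 bins, because each bin needs both a start and an end.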

***

### Question 5
Plot two histograms with different numbers of bins to compare them, one with 5 bins and the other with 15. You can either make a figure with 2 subplots or make 2 separate single plots. Make sure your plots are properly labeled.

In [None]:
...

***

The second way is to specify the bin edges yourself. You provide an array running from the lower bound to the upper bound of your histogram, where each number in between marks the end of one bin and the start of the next. This allows us to make uneven bins, which can represent data in an unusual and often misleading way.

Below is a clear but odd example of utilizing this method.

In [None]:
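# ADDED BINS
uneven_bins = np.array([80, 100, 130, 140, 300])

# MATPLOTLIB
plt.hist(weights.column('T'), bins=uneven_bins, density = True)
plt.xlabel("Weight (Pounds)")
plt.title("Distribution of Weights")

# DATA 8 (labeled separately so both histograms carry a title and an x label)
weights.hist('T', bins=uneven_bins)
plt.xlabel("Weight (Pounds)")
plt.title("Distribution of Weights");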

To avoid making a poorly graphed histogram, it is helpful to use `np.arange(start, end, step)` which can make an array of values that are evenly spaced from the start to the end.

*Remember: `np.arange()` excludes the `end` value itself!*
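As a sketch of this approach, the cell below builds evenly spaced bin edges from 80 to 300 in steps of 20. Because `np.arange` excludes its end value, the end is set slightly past 300 so that 300 itself is included as the final edge. The range and step are arbitrary choices that happen to cover the `weights` data.

```python
import numpy as np
import matplotlib.pyplot as plt

weights = np.array([100, 101, 200, 150, 139, 180, 90, 125, 114, 160, 280, 145])

# Use 301 as the end so that 300 is the last edge produced.
even_bins = np.arange(80, 301, 20)
print(even_bins)

plt.figure()
counts, edges, _ = plt.hist(weights, bins=even_bins)
plt.xlabel("Weight (Pounds)")
plt.ylabel("Count")
plt.title("Distribution of Weights");
```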

#### `color`

For histograms, you can use only one color per dataset. If you have multiple histograms displayed over each other, each dataset will have its own single color.

**Note:** To plot two histograms over each other, write the code for both histograms in the same cell.

In [None]:
# ADDED COLOR
plt.hist(weights.column('T'), bins= 5, density = True, color='pink')
plt.hist(weights.column('T'), bins= 15, density = True, color='yellow')

plt.xlabel("Weight (Pounds)")
plt.ylabel('Proportion per unit')
plt.title("Distribution of Weights");
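When two histograms overlap, the one drawn last can hide the one underneath. One common remedy is the `alpha` argument, which makes the bars partly transparent, together with a `label` for each histogram so that a legend can tell them apart. The sketch below applies both; the transparency value of 0.5 and the color `'gold'` are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

weights = np.array([100, 101, 200, 150, 139, 180, 90, 125, 114, 160, 280, 145])

plt.figure()

# alpha=0.5 makes each histogram half transparent so both stay visible.
n5, edges5, _ = plt.hist(weights, bins=5, density=True,
                         color='pink', alpha=0.5, label='5 bins')
n15, edges15, _ = plt.hist(weights, bins=15, density=True,
                           color='gold', alpha=0.5, label='15 bins')

plt.xlabel("Weight (Pounds)")
plt.ylabel("Proportion per unit")
plt.title("Distribution of Weights")
plt.legend();
```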

### c) Line Plots
**Line plots are used to contextualize a numerical data set with regard to the passage of time.** These plots take a single numerical data set and plot it across a time axis. The x-axis usually represents time, which allows us to draw conclusions based on the patterns we see.

Let's take a closer look with another example. Here, we're creating another small table with fictional data on the number of college students (in millions) from each year.

In [None]:
over_time = Table().with_columns("Students", np.array([4.5, 6.7, 8, 10, 15, 17 , 20]),
                                 "Years", np.arange(1950, 2020, 10))
over_time

#### `color`

In Data 8, we use the `tbl.plot(...)` function to create our line plots.

In [None]:
over_time.plot("Years", 'Students')
plt.xlabel("Year")
plt.ylabel("Number of College Students (millions)")
plt.grid()

Here is the matplotlib format: `plt.plot(...)`. Add a color argument to change the line's color.

In [None]:
plt.plot(over_time.column('Years'), over_time.column('Students'), color='green')
plt.xlabel("Year")
plt.ylabel("Number of College Students (millions)")
plt.grid()

**Note:** If you are plotting multiple lines, it is helpful to use a different color for each line. Remember to add a legend as well to distinguish between the lines!

#### `marker` and `markersize`

Markers indicate the exact points from your data. You can make those points stand out so that you can distinguish between the estimated lines and the actual data points with the `marker` argument. There are many different [markers](https://matplotlib.org/3.1.1/api/markers_api.html#module-matplotlib.markers), so please look over this list if you are interested in learning more about the different types. 

We can also change the size of the marker with the `markersize` argument so that it stands out more, or even less!

In [None]:
plt.plot(over_time.column('Years'), over_time.column('Students'), marker='o', markersize=5)
plt.xlabel("Year")
plt.ylabel("Number of College Students (millions)")
plt.grid()

#### `linestyle` and `linewidth`

In addition to modifying markers, you can also adjust the style and width of the line itself. By default, the line you plot will be a solid line with a small width. However, you can change it to a dotted line, a dashed line, or one of a few other types by adding a `linestyle` argument. Check out the different linestyles that you can graph [here](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D.set_linestyle). You can also change the width of the line with the `linewidth` argument.

Here are two examples of different combinations.

In [None]:
# Example 1, with dashed lines of width 5
plt.plot(over_time.column('Years'), over_time.column('Students'), linestyle= '--', linewidth=5)
plt.xlabel("Year")
plt.ylabel("Number of College Students (millions)")
plt.grid()

In [None]:
# Example 2, with dotted lines of width 4
plt.plot(over_time.column('Years'), over_time.column('Students'), linestyle= ':', linewidth=4)
plt.xlabel("Year")
plt.ylabel("Number of College Students (millions)")
plt.grid()
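The marker and line arguments can all be combined in a single `plt.plot(...)` call. The sketch below does so with the same data as `over_time`, written out as plain arrays, and then reads the settings back from the line object that `plt.plot` returns. The particular marker, sizes, and styles are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(1950, 2020, 10)
students = np.array([4.5, 6.7, 8, 10, 15, 17, 20])

plt.figure()

# plt.plot returns a list of line objects; unpack the single line.
line, = plt.plot(years, students, color='green',
                 marker='s', markersize=8,
                 linestyle='--', linewidth=3)

plt.xlabel("Year")
plt.ylabel("Number of College Students (millions)")
plt.grid()

print(line.get_marker(), line.get_markersize())
print(line.get_linestyle(), line.get_linewidth())
```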

***

### Question 6
Make a single subplot of a line graph that is properly labeled on a figure with 9x9 dimensions. Use a marker other than `'o'` and a markersize above 5. For the labels on the plot, use a consistent font and a readable fontsize. Add a different linestyle and width as you see fit!

In [None]:
...

***

### d) Scatter Plots
**Scatter plots are graphs that show the relationship between two different numerical data sets.** The relationship between the two data sets is also called the *association between the two variables*.

The association can be either positive or negative. If the points follow a positive trend (or generally have a positive slope), this indicates a positive association, meaning the two variables increase together or decrease together. If the points follow a negative trend (or a negative slope), this indicates a negative association, meaning one variable increases while the other decreases, or vice versa.

To help you remember, you can refer to the following.

<div align="center">For positive associations, you can have either:</div>

$$
+/+, \space -/-
$$

<div align="center">For negative associations, you can have either:</div>

$$
+/-, \space -/+
$$

Let's walk through an example with the following table.

In [None]:
debt = Table().with_columns("Number of College Students (millions)", [4.5,6.7, 8,10,15,17,20],
                            "Student Debt (thousands)", [1, 3 ,5 ,8, 20, 25, 36])
debt

We can make a scatterplot to analyze the relationship between the two numerical variables (in our case, the number of college students vs the student debt) by using `tbl.scatter(...)`, like in Data 8. Notice that our axes are automatically labeled based on the title of the columns.

In [None]:
debt.scatter("Number of College Students (millions)", "Student Debt (thousands)")
plt.grid()
plt.show()

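We can also use `plt.scatter(...)`, which accepts `marker` and `color` arguments similar to the ones you explored in line plots previously. The first cell below draws a reference line with `plt.plot(...)`, and the second draws the scatter plot.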

In [None]:
# Plot 1
new_x = np.arange(4, 22)
plt.plot(new_x, 2*new_x - 10, linestyle='-', color='purple');

In [None]:
# Plot 2 with added color
plt.scatter(debt.column("Number of College Students (millions)"), debt.column("Student Debt (thousands)"),
            color = 'blue', marker = 'v')

plt.xlabel("Number of College Students (millions)")
plt.ylabel("Student Debt (thousands)")
plt.grid()
plt.show()
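The two cells above draw the line and the scatter plot separately. If you would like to compare them directly, one option is to draw both on the same axes, as in this sketch. It reuses the values from the `debt` table as plain arrays; the legend labels are made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

students = np.array([4.5, 6.7, 8, 10, 15, 17, 20])
debt_thousands = np.array([1, 3, 5, 8, 20, 25, 36])

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111)

# Scatter plot of the actual data points.
ax.scatter(students, debt_thousands, color='blue', marker='v', label='Data')

# Reference line from Plot 1, drawn on the same axes.
new_x = np.arange(4, 22)
ax.plot(new_x, 2*new_x - 10, linestyle='-', color='purple', label='y = 2x - 10')

ax.set_xlabel("Number of College Students (millions)")
ax.set_ylabel("Student Debt (thousands)")
ax.grid()
ax.legend();
```

Both variables rise together from left to right, which is what a positive association looks like.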

### Congratulations! You made it to the end!

These are only some tips for creating effective visualizations, and some will matter more than others depending on the data you are dealing with.

Overall, both the functions from Data 8 and the matplotlib library can create clear and insightful plots. Depending on which features you want to change, however, one may suit your needs better than the other.