**In this notebook, we'll explore how we can communicate the nuanced narrative of the gender gap in STEM (science, technology, engineering, and mathematics) fields in the United States using effective data visualization.**

Let's first generate a standard matplotlib plot.

* Generate a line chart that visualizes the historical percentage of Biology degrees awarded to women:
    * Set the x-axis to the `Year` column from `women_degrees`.
    * Set the y-axis to the `Biology` column from `women_degrees`.
* Display the plot.

In [5]:
import pandas as pd
import matplotlib.pyplot as plt


In [6]:
# Write your code here



From the plot, we can tell that the percentage of Biology degrees awarded to women increased steadily from 1970 and peaked in the early 2000s. We can also tell that the percentage has stayed above 50% since around 1987. While it's helpful to visualize the trend of Biology degrees awarded to women, it only tells half the story. If we want the gender gap to be apparent and emphasized in the plot, we need a visual analogy for the difference in the percentages between the genders.

If we visualize the trend of Biology degrees awarded to men on the same plot, a viewer can observe the space between the lines for each gender. We can calculate the percentages of Biology degrees awarded to men by subtracting each value in the `Biology` column from `100`. Once we have the male percentages, we can generate two line charts as part of the same diagram.
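As a minimal sketch of that subtraction and the two-line diagram, the cell below uses a tiny made-up DataFrame (`toy_degrees`, a hypothetical name with invented values) in place of `women_degrees`, so the numbers are illustrative only:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented placeholder values, NOT the real dataset.
toy_degrees = pd.DataFrame({
    "Year": [1970, 1980, 1990, 2000, 2010],
    "Biology": [30.0, 42.0, 51.0, 59.0, 59.0],
})

# The two percentages sum to 100, so the men's share is the complement.
men_biology = 100 - toy_degrees["Biology"]

# Draw both series on the same Axes so the gap between them is visible.
fig, ax = plt.subplots()
ax.plot(toy_degrees["Year"], toy_degrees["Biology"], label="Women")
ax.plot(toy_degrees["Year"], men_biology, label="Men")
ax.legend()
```

In a notebook the figure renders when the cell finishes; in a script, call `plt.show()` afterwards.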

Let's now create a diagram containing both the line charts we just described.

In [10]:
# Write your code here.

# starter_code
# men_degrees_b = 100 - women_degrees['Biology'] (Select ctrl + / (PC) or ⌘ + / (Mac) to uncomment this line)

The chart with both lines tells a more complete story than the one showing only the women's percentages. It reveals two distinct periods. In the first, from 1970 to around 1987, women were a minority among Biology majors; in the second, from around 1987 to around 2012, they were a majority. The point where the lines intersect marks where women overtook men. A viewer could have reached the same conclusions from the chart of women's percentages alone, but it would have required more effort and mental processing on their part.

Although our plot is better, it still contains some extra visual elements that aren't necessary to understand the data. We're interested in helping people understand the gender gap in different fields across time. These excess elements, sometimes known as **chartjunk**, increase as we add more plots for visualizing the other degrees, making it harder for anyone trying to interpret our charts. In general, we want to maximize the **data-ink ratio**, which is the fractional amount of the plotting area dedicated to displaying the data.

The following is an animated GIF by [**Darkhorse Analytics**](https://www.darkhorseanalytics.com/blog/data-looks-better-naked) that shows a series of tweaks for boosting the data-ink ratio:

To customize the appearance of the ticks, we use the [**Axes.tick_params()**](http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.tick_params) method. Using this method, we can modify which tick marks and tick labels are displayed. Older matplotlib releases display tick marks on all four sides of the plot by default, while releases from 2.0 onward draw them only on the left and bottom. Here are the four sides for a standard line chart:

* The left side is the y-axis.
* The bottom side is the x-axis.
* The top side is across from the x-axis.
* The right side is across from the y-axis.

The parameters for enabling or disabling tick marks are conveniently named after the sides. To hide all of them, we pass the Boolean `False` for each parameter when we call `Axes.tick_params()` (the string `"off"` worked in older matplotlib releases but is no longer treated as a Boolean in recent ones):

* `bottom`: `False`
* `top`: `False`
* `left`: `False`
* `right`: `False`
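The call described above can be sketched as follows. The plotted values are placeholders chosen only so there is something to draw, and the sketch passes Boolean `False` values, which is what recent matplotlib releases expect:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1970, 1990, 2010], [30, 50, 60])  # placeholder data

# Hide the tick marks on all four sides; tick labels are left untouched.
ax.tick_params(bottom=False, top=False, left=False, right=False)
```

Only the small tick marks disappear. The numeric labels along the axes stay, which is what lets a viewer still read values off the chart.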

**Instructions**
* Generate 2 line charts in the same plotting area:
    * One that visualizes the percentages of Biology degrees awarded to women over time. Set the line color to "blue" and the label to "Women". For answer checking purposes, make sure you start with this chart.
    * One that visualizes the percentages of Biology degrees awarded to men over time. Set the line color to "green" and the label to "Men".
* Remove all of the tick marks.
* Set the title of the plot to "Percentage of Biology Degrees Awarded By Gender".
* Generate a legend and place it in the "upper right" location.
* Display the chart.

In [None]:
# Write your code here

With the axis tick marks gone, the data-ink ratio improves and the chart looks much cleaner. The spines are now unnecessary as well. When we're exploring data, the spines and ticks complement each other to help us refer back to specific data points or ranges. When a viewer is trying to understand the insight we're presenting, they can get in the way. As we mentioned earlier, chartjunk becomes much more noticeable when you have multiple plots in the same chart. By keeping the axis tick labels but not the spines or tick marks, we strike an appropriate balance between hiding chartjunk and making the data visible.

In matplotlib, the spines are represented using the [matplotlib.spines.Spine class](https://matplotlib.org/api/spines_api.html). When we create an Axes instance, four Spine objects are created for us. If you run `print(ax.spines)`, you'll get back a dictionary of the Spine objects:

`{'right': <matplotlib.spines.Spine object at 0x111089c18>, 'bottom': <matplotlib.spines.Spine object at 0x111060898>, 'top': <matplotlib.spines.Spine object at 0x1110606a0>, 'left': <matplotlib.spines.Spine object at 0x11107cd30>}`

To hide all of the spines, we need to:

* access each Spine object in the dictionary
* call the Spine.set_visible() method
* pass in the Boolean value False

The following line of code hides the right spine:
`ax.spines["right"].set_visible(False)`
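To apply the same call to every spine, one option is to loop over the values of `ax.spines`. This is a minimal sketch with placeholder data, not the only way to do it:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1970, 1990, 2010], [30, 50, 60])  # placeholder data

# ax.spines behaves like a dictionary keyed by side name,
# so .values() yields the four Spine objects.
for spine in ax.spines.values():
    spine.set_visible(False)
```

Looping over the values avoids repeating the call once for each of `"left"`, `"right"`, `"top"`, and `"bottom"`.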

**Instructions**

Modify the code from the last screen. Select all of the lines in the code editor and press ctrl + / (PC) or ⌘ + / (Mac) to uncomment them so you can modify them.

* Generate 2 line charts on the same plotting area:
    * One that visualizes the percentages of Biology degrees awarded to women over time. Set the line color to "blue" and the label to "Women".
    * One that visualizes the percentages of Biology degrees awarded to men over time. Set the line color to "green" and the label to "Men".
* Remove all of the axis tick marks.
* Hide all of the spines.
* Set the title of the plot to "Percentage of Biology Degrees Awarded By Gender".
* Generate a legend and place it in the "upper right" location.
* Display the chart.

In [9]:
# fig, ax = plt.subplots()
# ax.plot(women_degrees['Year'], women_degrees['Biology'], c='blue', label='Women')
# ax.plot(women_degrees['Year'], 100-women_degrees['Biology'], c='green', label='Men')
# ax.tick_params(bottom=False, top=False, left=False, right=False)

# # Add your code here


So far, matplotlib has set the limits automatically for each axis and this hasn't had any negative effect on communicating our story with data. If we want to generate charts to compare multiple degree categories, the axis ranges need to be consistent. Inconsistent data ranges can distort the story our charts are telling and fool the viewer.
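A minimal sketch of fixing the axis ranges across subplots, using `Axes.set_xlim()` and `Axes.set_ylim()`. The series names and values are invented placeholders, and the limits are the ones requested in the instructions further down:

```python
import matplotlib.pyplot as plt

years = [1970, 1990, 2010]
# Invented placeholder series with deliberately different ranges.
series = {"Field A": [30, 50, 60], "Field B": [10, 30, 20]}

fig = plt.figure(figsize=(8, 3))
axes = []
for i, (name, values) in enumerate(series.items()):
    ax = fig.add_subplot(1, 2, i + 1)
    ax.plot(years, values)
    # Same limits on every subplot, so equal heights mean equal percentages.
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0, 100)
    ax.set_title(name)
    axes.append(ax)
```

Without the explicit limits, matplotlib would scale each subplot to its own data, and the flatter second series would appear to fill as much vertical space as the first.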

Edward Tufte often preaches that a good chart encourages comparison over just description. A good chart uses a consistent style for the elements that aren't directly conveying the data points. These elements are part of the non-data ink in the chart. By keeping the non-data ink as consistent as possible across multiple plots, we make the differences in the data stand out to the viewer. This is because our visual processing systems are excellent at discerning differences quickly and bringing them to the front of our thought process. The similarities naturally fade to the back.

Let's generate line charts for four STEM degree categories on a grid to encourage comparison. The instructions for generating this chart are lengthy, so here's what the final chart looks like for you to refer to as you write your code:

![](images/four_major_categories_plots.png)

We have provided starter code as comments in the code editor. Select all of these lines and press ctrl + / (PC) or ⌘ + / (Mac) to uncomment them so you can use them.

* Generate a line chart using the women and men percentages for `Biology` in the top left subplot.
* Generate a line chart using the women and men percentages for `Computer Science` in the top right subplot.
* Generate a line chart using the women and men percentages for `Engineering` in the bottom left subplot.
* Generate a line chart using the women and men percentages for `Math and Statistics` in the bottom right subplot.
* For all subplots:
    * For the line chart visualizing female percentages, set the line color to `"blue"` and the label to `"Women"`.
    * For the line chart visualizing male percentages, set the line color to `"green"` and the label to `"Men"`.
    * Set the x-axis limit to range from `1968` to `2011`.
    * Set the y-axis limit to range from `0` to `100`.
    * Hide all of the spines and tick marks.
    * Set the title of each subplot to the name of the major category (e.g. `"Biology"`, `"Computer Science"`).
* Place a legend in the `upper right` corner of the bottom right subplot.
* Display the plot.

In [11]:
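# # The loop below needs these two names defined first.
# major_cats = ['Biology', 'Computer Science', 'Engineering', 'Math and Statistics']
# fig = plt.figure()
# for sp in range(0,4):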
#     ax = fig.add_subplot(2,2,sp+1)
#     ax.plot(women_degrees['Year'], women_degrees[major_cats[sp]], c='blue', label='Women')
#     ax.plot(women_degrees['Year'], 100-women_degrees[major_cats[sp]], c='green', label='Men')
#     # Add your code here.