# Custom Dumbbell Plot with Matplotlib
This notebook is purely an exercise in plot aesthetics, so not every state is shown. Instead, I plot the five most Democratic and the five most Republican states, ranked by their margins in the 2020 presidential election [according to Politico](https://www.politico.com/2020-election/results/president/).

NOTE: I repeat the entire cell at every step so that anyone who wants to copy the finished product can grab it in one piece. Forgive the long notebook. Of course, you could also work step by step and keep editing the same Axes object in different cells. Feel free to ask questions on [Twitter](https://twitter.com/MitchsWorkshop) or in real time on [Twitch](https://twitch.tv/MitchsWorkshop)!

### Import libraries and data

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.style as style

style.use("fivethirtyeight")

In [None]:
edu = pd.read_csv("../input/us-education-datasets-unification-project/states_all.csv")
edu.head()

### I want the last 10 years of data. What's the most recent year?

In [None]:
edu["YEAR"].max()

### 2009-2019 it is. We only need those two years for our plot.

In [None]:
edu_10yr = edu[edu["YEAR"].isin([2009,2019])]
edu_10yr

### Looks like a lot of nulls in 2019. Let's check.

In [None]:
edu_10yr[edu_10yr["YEAR"]==2019].isnull().sum()
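
The same check can be broken out per year, which makes it easier to see whether the column we plan to plot is complete in both 2009 and 2019. This is a minimal sketch on a tiny made-up frame: `SOME_COUNT` is a placeholder column name and every value is invented, so none of it comes from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for edu_10yr; column SOME_COUNT and all values are invented
demo = pd.DataFrame({
    "STATE": ["IDAHO", "IDAHO", "VERMONT", "VERMONT"],
    "YEAR": [2009, 2019, 2009, 2019],
    "SOME_COUNT": [1000.0, np.nan, 500.0, np.nan],
    "AVG_MATH_8_SCORE": [287.0, 286.0, 293.0, 287.0],
})

# Flag nulls cell by cell, then count them within each year
nulls_by_year = demo.set_index("YEAR").isnull().groupby(level="YEAR").sum()
print(nulls_by_year)
```

If the score column shows zero nulls for both years, the dumbbell plot has everything it needs.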

### Looks like the personnel counts are missing, but the test scores are there. We can work with that. Let's subset by our most Republican and Democratic states.

In [None]:
r_states = [
    "OKLAHOMA",
    "IDAHO",
    "WYOMING",
    "NORTH_DAKOTA",
    "ARKANSAS"
]

d_states = [
    "DISTRICT_OF_COLUMBIA",
    "MASSACHUSETTS",
    "VERMONT",
    "CALIFORNIA",
    "RHODE_ISLAND"    
]

is_red = edu_10yr["STATE"].isin(r_states)
is_blue = edu_10yr["STATE"].isin(d_states)

subset = edu_10yr[is_red | is_blue] # subset of both red and blue for later use
red_states = edu_10yr[is_red]
blue_states = edu_10yr[is_blue]

print(red_states["STATE"].unique())
print(blue_states["STATE"].unique())
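
Eyeballing the printed arrays works for ten states, but the check can also be made explicit. The sketch below is one possible way to do it, assuming a frame with `STATE` and `YEAR` columns like the one above; `missing_states` is a hypothetical helper name and the demo rows are invented.

```python
import pandas as pd


def missing_states(frame, expected, years=(2009, 2019)):
    """Return {year: sorted list of expected states that have no row for that year}."""
    return {
        year: sorted(set(expected) - set(frame.loc[frame["YEAR"] == year, "STATE"]))
        for year in years
    }


# Hypothetical demo: WYOMING has no 2019 row in this made-up frame
demo = pd.DataFrame({
    "STATE": ["IDAHO", "IDAHO", "WYOMING"],
    "YEAR": [2009, 2019, 2009],
})
gaps = missing_states(demo, ["IDAHO", "WYOMING"])
print(gaps)
```

An empty list for every year means each state has the two rows the plot relies on.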

### Subsets complete! All the states are there. Let's start the plot.

# Default plot with FiveThirtyEight style
The plot looks fine by default, but we are going to make some nice changes later.

In [None]:
colors = {
    "republican": "#d1352a",
    "democrat": "#253db8"
}

fig, ax = plt.subplots(figsize = (15,7))

# 2009 red states
red_plot_2009 = plt.scatter(
    x = red_states.loc[red_states["YEAR"]==2009, "STATE"],
    y = red_states.loc[red_states["YEAR"]==2009, "AVG_MATH_8_SCORE"],
    c = colors["republican"],
    s = 1000,
    alpha = 0.5,
    linewidth = 2
)

# 2019 red states
red_plot_2019 = plt.scatter(
    x = red_states.loc[red_states["YEAR"]==2019, "STATE"],
    y = red_states.loc[red_states["YEAR"]==2019, "AVG_MATH_8_SCORE"],
    c = colors["republican"],
    s = 1000,
    alpha = 1,
    linewidth = 0
)

# 2009 blue states
blue_plot_2009 = plt.scatter(
    x = blue_states.loc[blue_states["YEAR"]==2009, "STATE"],
    y = blue_states.loc[blue_states["YEAR"]==2009, "AVG_MATH_8_SCORE"],
    c = colors["democrat"],
    s = 1000,
    alpha = 0.5,
    linewidth = 2
)

# 2019 blue states
blue_plot_2019 = plt.scatter(
    x = blue_states.loc[blue_states["YEAR"]==2019, "STATE"],
    y = blue_states.loc[blue_states["YEAR"]==2019, "AVG_MATH_8_SCORE"],
    c = colors["democrat"],
    s = 1000,
    alpha = 1,
    linewidth = 0
)

plt.savefig("before.png") # for twitter @MitchsWorkshop
plt.show()
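
The four scatter calls differ only in the party color and the per-year opacity and outline, so they can also be written as a loop. This is a sketch of that idea, not a replacement for the cell above: `plot_party` and `year_styles` are hypothetical names, and the demo frames use invented scores.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

colors = {"republican": "#d1352a", "democrat": "#253db8"}

# 2009 is drawn faded with an outline, 2019 is drawn solid
year_styles = {
    2009: {"alpha": 0.5, "linewidth": 2},
    2019: {"alpha": 1, "linewidth": 0},
}


def plot_party(ax, frame, color, score_col="AVG_MATH_8_SCORE"):
    """Draw one scatter layer per year for `frame`; return the layers keyed by year."""
    layers = {}
    for year, style_kwargs in year_styles.items():
        rows = frame[frame["YEAR"] == year]
        layers[year] = ax.scatter(
            rows["STATE"], rows[score_col], c=color, s=1000, **style_kwargs
        )
    return layers


# Hypothetical stand-in data; the scores are invented for illustration
demo_red = pd.DataFrame({
    "STATE": ["IDAHO", "WYOMING", "IDAHO", "WYOMING"],
    "YEAR": [2009, 2009, 2019, 2019],
    "AVG_MATH_8_SCORE": [287.0, 286.0, 285.0, 288.0],
})
demo_blue = pd.DataFrame({
    "STATE": ["VERMONT", "CALIFORNIA", "VERMONT", "CALIFORNIA"],
    "YEAR": [2009, 2009, 2019, 2019],
    "AVG_MATH_8_SCORE": [293.0, 270.0, 287.0, 276.0],
})

fig, ax = plt.subplots(figsize=(15, 7))
red_layers = plot_party(ax, demo_red, colors["republican"])
blue_layers = plot_party(ax, demo_blue, colors["democrat"])
```

Keeping the per-year styling in one dictionary means a change to the opacity or outline only has to be made once.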

# Making our changes:
### We are going to make the following style changes, in order:

- widen the y limit
- bold the x axis
- add a vertical line to split the plot by party affiliation
- label both sides of the plot with text
- change the x-tick labels
- annotate the meaning of different opacities with text
- draw the bars between circles with a `for` loop
- remove the vertical grid lines and customize the horizontal ones
- add a title
- add a subtitle
- add a signature bar
- change background colors
- remove spines
- save the figure for future Tweeting (who doesn't love a good chart after all?)

Each step is commented, with special focus on the loop that draws the bars. Feel free to ask questions!
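
Several of the steps (the dividing line, the text labels, the bars) place things at numeric x positions even though the x axis holds state names. That works because matplotlib gives each new category the next integer position, in order of first appearance. A small sketch to confirm the mapping, using three invented points:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Invented scores; only the order of the state names matters here
ax.scatter(["ARKANSAS", "IDAHO", "CALIFORNIA"], [276, 287, 270])

fig.canvas.draw()  # make sure the tick labels are populated before reading them
positions = [float(p) for p in ax.get_xticks()]
labels = [tick.get_text() for tick in ax.get_xticklabels()]
mapping = dict(zip(labels, positions))
print(mapping)
```

Note that the order is first appearance, not alphabetical, so the order in which the scatter layers are drawn decides where each state lands.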

In [None]:
colors = {
    "republican": "#d1352a",
    "democrat": "#253db8"
}

fig, ax = plt.subplots(figsize = (15,7))

# 2009 red states
plt.scatter(
    x = red_states.loc[red_states["YEAR"]==2009, "STATE"],
    y = red_states.loc[red_states["YEAR"]==2009, "AVG_MATH_8_SCORE"],
    c = colors["republican"],
    s = 1000,
    alpha = 0.5,
    linewidth = 2
)

# 2019 red states
plt.scatter(
    x = red_states.loc[red_states["YEAR"]==2019, "STATE"],
    y = red_states.loc[red_states["YEAR"]==2019, "AVG_MATH_8_SCORE"],
    c = colors["republican"],
    s = 1000,
    alpha = 1,
    linewidth = 0
)

# 2009 blue states
plt.scatter(
    x = blue_states.loc[blue_states["YEAR"]==2009, "STATE"],
    y = blue_states.loc[blue_states["YEAR"]==2009, "AVG_MATH_8_SCORE"],
    c = colors["democrat"],
    s = 1000,
    alpha = 0.5,
    linewidth = 2
)

# 2019 blue states
plt.scatter(
    x = blue_states.loc[blue_states["YEAR"]==2019, "STATE"],
    y = blue_states.loc[blue_states["YEAR"]==2019, "AVG_MATH_8_SCORE"],
    c = colors["democrat"],
    s = 1000,
    alpha = 1,
    linewidth = 0
)


### CHANGES TO DEFAULT PLOT ###

# set new y limit
plt.ylim((247, 305))

# bold x axis
plt.axhline(
    250, 
    linewidth = 3,
    color = "black",
    alpha = 0.5
)

# split plot by party
plt.axvline(
    4.5,
    linewidth = 2,
    color = "black",
    alpha = 0.5
)

# label sides of vline
# republican
plt.text(
    x = 3.4,
    y = 302,
    s = "Republican",
    color = colors["republican"],
    fontweight = "bold",
    fontsize = 14,
    alpha = 0.9
)

# democrat
plt.text(
    x = 4.7,
    y = 302,
    s = "Democrat",
    color = colors["democrat"],
    fontweight = "bold",
    fontsize = 14,
    alpha = 0.9
)

# change xtick labels
custom_labels = [
    "Arkansas", 
    "Idaho", 
    "N. Dakota", 
    "Oklahoma", 
    "Wyoming", 
    "California", 
    "D.C.", 
    "Massachusetts", 
    "R. Island", 
    "Vermont"
]
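# pin the tick positions first so each label maps to exactly one tick
ax.set_xticks(range(len(custom_labels)))
ax.set_xticklabels(custom_labels)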

# label different opacities on both sides of plot
# republican 2009
plt.text(
    x = 0.22,
    y = 277.5,
    s = "2009",
    color = colors["republican"],
    alpha = 0.5,
    fontweight = "bold",
    fontsize = 16
)

# republican 2019
plt.text(
    x = 0.22,
    y = 271.5,
    s = "2019",
    color = colors["republican"],
    alpha = 1,
    fontweight = "bold",
    fontsize = 16
)

# democrat 2009
plt.text(
    x = 5.22,
    y = 271,
    s = "2009",
    color = colors["democrat"],
    alpha = 0.5,
    fontweight = "bold",
    fontsize = 16
)

# democrat 2019
plt.text(
    x = 5.22,
    y = 277,
    s = "2019",
    color = colors["democrat"],
    alpha = 1,
    fontweight = "bold",
    fontsize = 16
)

### DRAW BARS BETWEEN CIRCLES ###
"""
This loop checks to see if there is a wide enough gap to draw a line between, then draws it.
Setting the values was a matter of trial and error. If you change the dot size, these won't work.
And that's fine! Just adjust accordingly.

I originally hand-drew each of them and it took hours. I highly recommend automating this process.
This was the best way I could find to do so, but I am open to suggestions!

Note: With qualitative data in matplotlib, the different values (in this case, states) are numbered
0,1,2,... behind the scenes. So if we draw a point at (0,250), it will appear above Arkansas.
"""

# ordered list of states as they appear on the x axis
states = [
    "ARKANSAS",
    "IDAHO",
    "NORTH_DAKOTA",
    "OKLAHOMA",
    "WYOMING",
    "CALIFORNIA",
    "DISTRICT_OF_COLUMBIA",
    "MASSACHUSETTS",
    "RHODE_ISLAND",
    "VERMONT"
]

# i will be the numeric x-value on the plot
for i,state in enumerate(states):
    # subset by state
    st = subset[subset["STATE"]==state]
    
    # get scores for both years
    score_09 = st.loc[st["YEAR"]==2009, "AVG_MATH_8_SCORE"].values[0]
    score_19 = st.loc[st["YEAR"]==2019, "AVG_MATH_8_SCORE"].values[0]
    
    # find max and min score out of the two given
    big_score = max([score_09, score_19])
    small_score = min([score_09, score_19])
    
    # if there is a large enough gap, draw a line
    if big_score - small_score > 4.5: # 4.5 is the gap needed to have visible space between points
        plt.plot(
            [i,i], # x1, x2
            [small_score+2, big_score-2], # y1, y2 allowing padding to accommodate the size of the data point
            c = "#636363",
            linewidth = 3,
            alpha = 1
        )

# remove vertical grid (removing the grid first fixed some weird behavior)
plt.grid(False)
plt.grid(linewidth = 1.5, axis = "y", color = "black", alpha = 0.2)

# add title text
plt.text(
    x = -0.5,
    y = 312,
    s = "United States 8th Grade Standardized Math Scores 2009-2019",
    fontsize = 22,
    fontweight = "bold"
)

# add subtitle text
plt.text(
    x = -0.5,
    y = 308,
    s = "States shown had the widest party margin in the 2020 Presidential Election",
    fontsize = 17,
    fontweight = "regular"
)

# signature bar (white space necessary for style)
plt.text(
    x = -0.8,
    y = 237.8,
    s = "   twitch.tv/MitchsWorkshop                            Source: U.S. Census Bureau, National Center for Education Statistics  ",
    backgroundcolor = "grey",
    c = "white",
    fontsize = 17
)

# new background colors
face_color = "#d9d7d4"
ax.set_facecolor(face_color)
fig.set_facecolor(face_color)

# remove spines
for s in ["top", "left", "bottom", "right"]:
    ax.spines[s].set_visible(False)

plt.savefig("after.png", bbox_inches = "tight", facecolor = face_color) # for twitter @MitchsWorkshop
plt.show()
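
Since the gap and padding values in the bar loop were found by trial and error, here is one possible way to derive them instead. It is a sketch under assumptions, not the method used above: it assumes the square root of `s` approximates the marker diameter in points, converts that to y-axis data units through `ax.transData`, and then draws all bars in a single `ax.vlines` call from a pivoted table. `marker_radius_in_data_units`, `draw_connectors`, the `SCORE` column, and the demo values are all hypothetical.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def marker_radius_in_data_units(ax, s):
    """Approximate the radius of a scatter marker of size `s` (points^2) in y-data units."""
    radius_px = (math.sqrt(s) / 2) * ax.figure.dpi / 72  # points -> pixels
    (_, y0), (_, y1) = ax.transData.transform([(0, 0), (0, 1)])  # pixels per y unit
    return radius_px / abs(y1 - y0)


def draw_connectors(ax, scores, order, pad, min_length=0.5, color="#636363"):
    """Draw a bar between the two yearly scores of each state; return the x positions drawn."""
    wide = scores.pivot(index="STATE", columns="YEAR", values="SCORE").reindex(order)
    low = wide.min(axis=1) + pad   # start just above the lower circle
    high = wide.max(axis=1) - pad  # stop just below the upper circle
    visible = ((high - low) > min_length).to_numpy()
    xs = [i for i, keep in enumerate(visible) if keep]
    ax.vlines(xs, low[visible].to_numpy(), high[visible].to_numpy(),
              color=color, linewidth=3)
    return xs


# Hypothetical data: state names and scores are invented
demo = pd.DataFrame({
    "STATE": ["A", "A", "B", "B", "C", "C"],
    "YEAR": [2009, 2019, 2009, 2019, 2009, 2019],
    "SCORE": [270.0, 280.0, 285.0, 284.0, 290.0, 282.0],
})
order = ["A", "B", "C"]

fig, ax = plt.subplots(figsize=(15, 7))
for year, alpha in [(2009, 0.5), (2019, 1.0)]:
    rows = demo[demo["YEAR"] == year].set_index("STATE").reindex(order)
    ax.scatter(rows.index, rows["SCORE"], s=1000, alpha=alpha, color="#253db8")

ax.set_ylim(247, 305)  # set the limits before measuring, since the padding depends on them
pad = marker_radius_in_data_units(ax, s=1000)
drawn = draw_connectors(ax, demo, order, pad)
print(pad, drawn)
```

The padding has to be recomputed whenever the figure size, y limits, or marker size change, which is the same sensitivity the trial-and-error values have, just made explicit.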

# Done!
### And there we have it! Want more or have questions? Stop by [my Twitch channel](https://twitch.tv/MitchsWorkshop) and ask me in real time!