<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Project 1: Standardized Test Analysis

--- 
# Part 1

Part 1 requires knowledge of basic Python.

---

## Problem Statement

The new SAT format was released in March 2016. Since then, participation has shifted in many states as legislatures made different decisions about standardized testing. This project explores trends in SAT and ACT participation for 2017-2019 at two levels. First, it identifies states with decreasing SAT participation rates. Second, it compares the change in participation rates across three groups of states, each located in the same geographic area.

### Contents:
- [Background](#Background)
- [Data Import & Cleaning](#Data-Import-and-Cleaning)
- [Exploratory Data Analysis](#Exploratory-Data-Analysis)
- [Data Visualization](#Visualize-the-Data)
- [Conclusions and Recommendations](#Conclusions-and-Recommendations)

## Background

The SAT and ACT are standardized tests that many colleges and universities in the United States require for their admissions process. These scores are used along with other materials such as grade point average (GPA) and essay responses to determine whether or not a potential student will be accepted to the university.

The SAT has two sections of the test: Evidence-Based Reading and Writing and Math ([*source*](https://www.princetonreview.com/college/sat-sections)). The ACT has 4 sections: English, Mathematics, Reading, and Science, with an additional optional writing section ([*source*](https://www.act.org/content/act/en/products-and-services/the-act/scores/understanding-your-scores.html)). They have different score ranges, which you can read more about on their websites or additional outside sources (a quick Google search will help you understand the scores for each test):
* [SAT](https://collegereadiness.collegeboard.org/sat)
* [ACT](https://www.act.org/content/act/en.html)

Participation and scores vary from state to state. For instance, Central states are known for top SAT scores but only middling ACT scores, and the reason has long been an interesting topic, especially among students from other states. In fact, states take different approaches to standardized tests.

### Choose your Data

There are 10 datasets included in the [`data`](./data/) folder for this project. You are required to pick **at least two** of these to complete your analysis. Feel free to use more than two if you would like, or add other relevant datasets you find online.

* [`act_2017.csv`](./data/act_2017.csv): 2017 ACT Scores by State
* [`act_2018.csv`](./data/act_2018.csv): 2018 ACT Scores by State
* [`act_2019.csv`](./data/act_2019.csv): 2019 ACT Scores by State
* [`act_2019_ca.csv`](./data/act_2019_ca.csv): 2019 ACT Scores in California by School
* [`sat_2017.csv`](./data/sat_2017.csv): 2017 SAT Scores by State
* [`sat_2018.csv`](./data/sat_2018.csv): 2018 SAT Scores by State
* [`sat_2019.csv`](./data/sat_2019.csv): 2019 SAT Scores by State
* [`sat_2019_by_intended_college_major.csv`](./data/sat_2019_by_intended_college_major.csv): 2019 SAT Scores by Intended College Major
* [`sat_2019_ca.csv`](./data/sat_2019_ca.csv): 2019 SAT Scores in California by School
* [`sat_act_by_college.csv`](./data/sat_act_by_college.csv): Ranges of Accepted ACT & SAT Student Scores by Colleges

### Chosen Datasets
* [`act_2017.csv`](./data/act_2017.csv): 2017 ACT Scores by State
* [`act_2018.csv`](./data/act_2018.csv): 2018 ACT Scores by State
* [`act_2019.csv`](./data/act_2019.csv): 2019 ACT Scores by State
* [`sat_2017.csv`](./data/sat_2017.csv): 2017 SAT Scores by State
* [`sat_2018.csv`](./data/sat_2018.csv): 2018 SAT Scores by State
* [`sat_2019.csv`](./data/sat_2019.csv): 2019 SAT Scores by State

Each of these datasets contains a column of state names, the participation rate, and the test scores for each state in the given year.

### Outside Research

Based on your problem statement and your chosen datasets, spend some time doing outside research on state policies or additional information that might be relevant. Summarize your findings below. If you bring in any outside tables or charts, make sure you are explicit about having borrowed them. If you quote any text, make sure that it renders as being quoted. **Make sure that you cite your sources.**

If you break down SAT scores by state, you’ll find that the Midwest outperforms the rest of the nation, and it’s not close. Illinois boasts the highest average score (1807*), while North Dakota (1799), Michigan (1782), Minnesota (1780), Missouri (1773), and Wisconsin (1771) fill out the top five. Meanwhile, Idaho finishes last (1364), while Maine (1380) and South Carolina (1436) round out the bottom three. All scores are out of 2400 and are from 2013. For reference, the average national score in 2013 was 1498. For comparison, here is a heat map of SAT scores by state, where darker is a higher average score:
![Average SAT score by state](../imgs/Average_SAT_Score_by_state_640_px.jpg)
So the Midwest rules the SAT, but apparently, no other test. And it gets even more bizarre: the SAT is unpopular in the Midwest, where almost everyone takes the ACT instead.
Granted, about 20% of students across America end up taking both the SAT and ACT, but it’s much more common to choose just one exam or the other. In the Midwest, that choice is typically the ACT. Apparently, only the best students are taking the SAT in the Midwest. The ~20% of students who take both the SAT and ACT are often the most ambitious: the students who want to exhaust every possible option before turning in their college applications. Less motivated students are much more likely to take the ACT once, then put standardized testing behind them. The Midwest might win in points per test-taker, but its SAT participation rate is lower than that of other states.
[source](https://www.forbes.com/sites/bentaylor/2014/07/17/why-the-midwest-dominates-the-sat/)

### Coding Challenges

1. Manually calculate mean:

    Write a function that takes in values and returns the mean of the values. Create a list of numbers that you test on your function to check to make sure your function works!
    
    *Note*: Do not use any mean methods built-in to any Python libraries to do this! This should be done without importing any additional libraries.

In [1]:
# Code: 
def calculate_mean(ls):
    s = 0
    for i in ls:
        s += i
    return s/len(ls)
calculate_mean([1,2,3,4,5])

3.0

2. Manually calculate standard deviation:

    The formula for standard deviation is below:

    $$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^n(x_i - \mu)^2}$$

    Where $x_i$ represents each value in the dataset, $\mu$ represents the mean of all values in the dataset and $n$ represents the number of values in the dataset.

    Write a function that takes in values and returns the standard deviation of the values using the formula above. Hint: use the function you wrote above to calculate the mean! Use the list of numbers you created above to test on your function.
    
    *Note*: Do not use any standard deviation methods built-in to any Python libraries to do this! This should be done without importing any additional libraries.

In [2]:
# Code:
def sigma(ls):
    mean = calculate_mean(ls)  # compute the mean once, not on every iteration
    s = 0
    for i in ls:
        s += (i - mean)**2
    return round((s/len(ls))**0.5,2)
sigma([1,2,3,4,5]) 

1.41
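
The formula above divides by $n$, which makes it the population standard deviation. As a quick sanity check, the sketch below re-implements the formula and compares the result with `statistics.pstdev` from the standard library. The import is used only for verification, and `population_sd` is an illustrative name rather than part of the project code.

```python
import math
import statistics


def population_sd(values):
    """Population standard deviation: square root of the mean squared deviation."""
    n = len(values)
    mu = sum(values) / n  # mean of all values
    squared_deviations = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)


sample = [1, 2, 3, 4, 5]
manual = population_sd(sample)
reference = statistics.pstdev(sample)  # also divides by n, like the formula
print(round(manual, 2), round(reference, 2))
```

If the two numbers disagree, the usual cause is dividing by $n - 1$ (the sample standard deviation) instead of $n$.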

3. Data cleaning function:
    
    Write a function that takes in a string that is a number and a percent symbol (ex. '50%', '30.5%', etc.) and converts this to a float that is the decimal approximation of the percent. For example, inputting '50%' in your function should return 0.5, '30.5%' should return 0.305, etc. Make sure to test your function to make sure it works!

You will use these functions later on in the project!

In [3]:
# Code:
def clean(with_symbol):
    return float(with_symbol[:-1])/100
clean('30.5%')

0.305
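
Real files sometimes contain stray whitespace or values without a percent sign. The following is a slightly more defensive sketch of the same conversion. The name `percent_to_float` and the decision to accept values without a trailing `%` are assumptions made for illustration.

```python
def percent_to_float(text):
    """Convert a string such as '30.5%' to the decimal fraction 0.305."""
    cleaned = text.strip()        # tolerate surrounding whitespace
    if cleaned.endswith('%'):
        cleaned = cleaned[:-1]    # drop the trailing percent sign
    return float(cleaned) / 100


print(percent_to_float('50%'))
print(percent_to_float(' 30.5% '))
```

Anything that is not a number after stripping still raises a `ValueError`, which is useful for spotting bad entries in the data.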

--- 
# Part 2

Part 2 requires knowledge of Pandas, EDA, data cleaning, and data visualization.

---

*All libraries used should be added here*

In [4]:
# Imports:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Data Import and Cleaning

### Data Import & Cleaning

Import the datasets that you selected for this project and go through the following steps at a minimum. You are welcome to do further cleaning as you feel necessary:
1. Display the data: print the first 5 rows of each dataframe to your Jupyter notebook.
2. Check for missing values.
3. Check for any obvious issues with the observations (keep in mind the minimum & maximum possible values for each test/subtest).
4. Fix any errors you identified in steps 2-3.
5. Display the data types of each feature.
6. Fix any incorrect data types found in step 5.
    - Fix any individual values preventing other columns from being the appropriate type.
    - If your dataset has a column of percents (ex. '50%', '30.5%', etc.), use the function you wrote in Part 1 (coding challenges, number 3) to convert this to floats! *Hint*: use `.map()` or `.apply()`.
7. Rename Columns.
    - Column names should be all lowercase.
    - Column names should not contain spaces (underscores will suffice--this allows for using the `df.column_name` method to access columns in addition to `df['column_name']`).
    - Column names should be unique and informative.
8. Drop unnecessary rows (if needed).
9. Merge dataframes that can be merged.
10. Perform any additional cleaning that you feel is necessary.
11. Save your cleaned and merged dataframes as csv files.
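
Several of the steps above can be tried on a toy dataframe before running them on the real files. The sketch below uses invented state names and values, and the column name `Participation Rate` is only assumed to resemble the real data.

```python
import pandas as pd

# Toy stand-in for one of the state-level files (values are invented).
raw = pd.DataFrame({
    'State': ['Alpha', 'Beta', 'Gamma'],
    'Participation Rate': ['100%', '30.5%', '7%'],
})

print(raw.head())        # step 1: display the first rows
print(raw.isna().sum())  # step 2: count missing values per column
print(raw.dtypes)        # step 5: inspect the data types

# Step 6: convert the percent strings to floats with .map()
raw['Participation Rate'] = raw['Participation Rate'].map(
    lambda s: float(s.rstrip('%')) / 100
)

# Step 7: lowercase column names and replace spaces with underscores
raw.columns = raw.columns.str.lower().str.replace(' ', '_')
print(raw.dtypes)
```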

In [5]:
# Code:
act_2017 = pd.read_csv('F:/GA/Projects/project-1/data/act_2017.csv')
act_2018 = pd.read_csv('F:/GA/Projects/project-1/data/act_2018.csv')
act_2019 = pd.read_csv('F:/GA/Projects/project-1/data/act_2019.csv')
sat_2017 = pd.read_csv('F:/GA/Projects/project-1/data/sat_2017.csv')
sat_2018 = pd.read_csv('F:/GA/Projects/project-1/data/sat_2018.csv')
sat_2019 = pd.read_csv('F:/GA/Projects/project-1/data/sat_2019.csv')

In [6]:
#Keep only columns and rows I need and reassign
act_2017 = act_2017.loc[:,['State','Participation']]
act_2017 = act_2017.iloc[1:,:]
act_2018 = act_2018.loc[:,['State','Participation']]
act_2019= act_2019.loc[:,['State','Participation']]
sat_2017 = sat_2017.loc[:,['State','Participation']]
sat_2018 = sat_2018.loc[:,['State','Participation']]
sat_2019 = sat_2019.loc[:,['State','Participation Rate']]

In [7]:
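#Check for missing values in each dataframe (the merged frames do not exist yet)
for name, frame in [('act_2017', act_2017), ('act_2018', act_2018), ('act_2019', act_2019),
                    ('sat_2017', sat_2017), ('sat_2018', sat_2018), ('sat_2019', sat_2019)]:
    print(name, frame.isna().sum().to_dict())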

In [None]:
#types
act_2017.dtypes

In [None]:
#fixing errors
act_2018['State'] = act_2018['State'].replace({'District of columbia':'District of Columbia'})

In [None]:
#fixing errors
act_2018.drop(19, axis=0, inplace=True)
#df[df['column']=='__'].index

In [None]:
#merging dataframes 2017 and 2018
act_1718 = pd.merge(act_2017, act_2018, how="outer",on='State')

In [None]:
act_2019.drop(51, axis=0, inplace=True)

In [None]:
#merging 2017, 2018 and 2019
act = pd.merge(act_1718, act_2019, how="outer", on="State")

In [None]:
act.rename(columns={'Participation_x':'Participation_2017',
                    'Participation_y':'Participation_2018',
                    'Participation':'Participation_2019'}, 
           inplace=True)
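
The rename of the `_x` and `_y` columns above can be avoided by naming the suffixes during the merge. A minimal sketch with invented values follows. The frame names `rates_2017` and `rates_2018` are hypothetical, and `validate='one_to_one'` is added on the assumption that each state should appear exactly once per file.

```python
import pandas as pd

# Invented values, only to show the column naming.
rates_2017 = pd.DataFrame({'State': ['Alpha', 'Beta'], 'Participation': ['100%', '8%']})
rates_2018 = pd.DataFrame({'State': ['Alpha', 'Beta'], 'Participation': ['100%', '7%']})

merged = pd.merge(
    rates_2017, rates_2018,
    how='outer', on='State',
    suffixes=('_2017', '_2018'),  # replaces the default '_x' / '_y'
    validate='one_to_one',        # raises an error if a state is duplicated
)
print(merged.columns.tolist())
# ['State', 'Participation_2017', 'Participation_2018']
```

With `validate` set, a duplicated state row fails loudly at merge time instead of silently producing extra rows.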

In [None]:
#cleaning SATs
sat_2019.drop([39, 47], axis=0, inplace=True)
sat_2019.head()

In [None]:
#merging SATs
sat_1718 = pd.merge(sat_2017,sat_2018, how="outer", on="State")

In [None]:
sat = pd.merge(sat_1718, sat_2019, how="outer", on="State")

In [None]:
#rename columns
sat.rename(columns={'Participation_x':'Participation_2017','Participation_y':'Participation_2018','Participation Rate':'Participation_2019'}, inplace=True)

In [None]:
#lowercase Names
act.columns = act.columns.str.lower()
sat.columns = sat.columns.str.lower()

In [None]:
act.head()

In [None]:
act.participation_2017 = act.participation_2017.apply(clean)
act.participation_2018 = act.participation_2018.apply(clean)
act.participation_2019 = act.participation_2019.apply(clean)
sat.participation_2017 = sat.participation_2017.apply(clean)
sat.participation_2018 = sat.participation_2018.apply(clean)
sat.participation_2019 = sat.participation_2019.apply(clean)


In [None]:
#save as a csv file
act.to_csv('F:/GA/Projects/project-1/data/ACT.csv', index=False)

In [None]:
#merging act and sat together
df = pd.merge(act, sat, on='state')

In [None]:
#rename columns
df.rename(columns = {'participation_2017_x':'act_participation_2017',
                     'participation_2018_x':'act_participation_2018',
                     'participation_2019_x':'act_participation_2019',
                     'participation_2017_y':'sat_participation_2017',
                     'participation_2018_y':'sat_participation_2018',
                     'participation_2019_y':'sat_participation_2019'},
                      inplace=True)

In [None]:
df.head()

### Data Dictionary

Now that we've fixed our data, and given it appropriate names, let's create a [data dictionary](http://library.ucmerced.edu/node/10249). 

A data dictionary provides a quick overview of features/variables/columns, alongside data types and descriptions. The more descriptive you can be, the more useful this document is.

Example of a Fictional Data Dictionary Entry: 

|Feature|Type|Dataset|Description|
|---|---|---|---|
|**county_pop**|*integer*|2010 census|The population of the county (units in thousands, where 2.5 represents 2500 people).| 
|**per_poverty**|*float*|2010 census|The percent of the county over the age of 18 living below the 200% of official US poverty rate (units percent to two decimal places 98.10 means 98.1%)|

[Here's a quick link to a short guide for formatting markdown in Jupyter notebooks](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html).

Provided is the skeleton for formatting a markdown table, with columns headers that will help you create a data dictionary to quickly summarize your data, as well as some examples. **This would be a great thing to copy and paste into your custom README for this project.**

*Note*: if you are unsure of what a feature is, check the source of the data! This can be found in the README.



|__Feature__|__Type__|__Dataset__|__Description__|
|---|---|---|---|
|__state__|object|ACT/SAT|Name of the state|
|act_participation_2017|float|ACT|ACT participation rate in 2017, as a decimal fraction (0.5 means 50%)|
|act_participation_2018|float|ACT|ACT participation rate in 2018, as a decimal fraction (0.5 means 50%)|
|act_participation_2019|float|ACT|ACT participation rate in 2019, as a decimal fraction (0.5 means 50%)|
|sat_participation_2017|float|SAT|SAT participation rate in 2017, as a decimal fraction (0.5 means 50%)|
|sat_participation_2018|float|SAT|SAT participation rate in 2018, as a decimal fraction (0.5 means 50%)|
|sat_participation_2019|float|SAT|SAT participation rate in 2019, as a decimal fraction (0.5 means 50%)|
|states_geo|object|ACT/SAT|'E': eastern state, 'W': western state, 'C': central state|
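
The Feature and Type columns of a data dictionary can be drafted from the dataframe itself, which helps avoid misspelled column names. The sketch below uses a toy frame with invented values. Descriptions still have to be written by hand.

```python
import pandas as pd

# Toy frame with the same kind of columns as the merged data (values invented).
toy = pd.DataFrame({
    'state': ['Alpha', 'Beta'],
    'act_participation_2017': [1.0, 0.08],
    'sat_participation_2017': [0.05, 0.96],
})

# Build the first two columns of a markdown table from the dtypes.
lines = ['|Feature|Type|', '|---|---|']
for column, dtype in toy.dtypes.items():
    lines.append(f'|{column}|{dtype}|')
print('\n'.join(lines))
```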

## Exploratory Data Analysis

Complete the following steps to explore your data. You are welcome to do more EDA than the steps outlined here as you feel necessary:
1. Summary Statistics.
2. Use a **dictionary comprehension** to apply the standard deviation function you create in part 1 to each numeric column in the dataframe.  **No loops**.
    - Assign the output to variable `sd` as a dictionary where: 
        - Each column name is now a key 
        - That standard deviation of the column is the value 
        - *Example Output :* `{'ACT_Math': 120, 'ACT_Reading': 120, ...}`
3. Investigate trends in the data.
    - Using sorting and/or masking (along with the `.head()` method to avoid printing our entire dataframe), consider questions relevant to your problem statement. Some examples are provided below (but feel free to change these questions for your specific problem):
        - Which states have the highest and lowest participation rates for the 2017, 2018, or 2019 SAT and ACT?
        - Which states have the highest and lowest mean total/composite scores for the 2017, 2018, or 2019 SAT and ACT?
        - Do any states with 100% participation on a given test have a rate change year-to-year?
        - Do any states have >50% participation on *both* tests each year?
        - Which colleges have the highest median SAT and ACT scores for admittance?
        - Which California school districts have the highest and lowest mean test scores?
    - **You should comment on your findings at each step in a markdown cell below your code block**. Make sure you include at least one example of sorting your dataframe by a column, and one example of using boolean filtering (i.e., masking) to select a subset of the dataframe.
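
Step 2 can be sketched without pandas by treating each column as a plain list. The column names and values below are invented, and `population_sd` is a stand-in for the standard deviation function from Part 1.

```python
def population_sd(values):
    """Population standard deviation (divides by n)."""
    mean = sum(values) / len(values)
    return (sum((x - mean) ** 2 for x in values) / len(values)) ** 0.5


# Hypothetical columns stored as plain lists (values are invented).
columns = {
    'act_participation': [0.2, 0.4, 0.6],
    'sat_participation': [0.5, 0.5, 0.5],
}

# One key per column name, one standard deviation per value; no explicit loop.
sd = {name: round(population_sd(values), 3) for name, values in columns.items()}
print(sd)
```

A column whose values are all identical has a standard deviation of zero, which is a quick way to check the function.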

In [None]:
#2:dictionary comprehension
#skip the first column (state) and map each remaining column name to its standard deviation
sd = {k:sigma(v) for (k,v) in df.iloc[:,1:].to_dict('list').items()}
sd


In [None]:
#summary statistics
df.describe()

In [None]:
# #Code:
# #Which states have the highest and lowest participation rates for the 2017, 2018, or 2019 SAT and ACT?
# #min
# df[df['act_participation_2017'] == min(df.act_participation_2017)]
# #ACT min 2017: Maine

# df[df['act_participation_2018'] == min(df.act_participation_2018)]
# #ACT min 2018: Maine

# df[df['act_participation_2019'] == min(df.act_participation_2019)]
# #ACT min 2019: Maine

# df[df['sat_participation_2017'] == min(df.sat_participation_2017)]
# #SAT min 2017: Iowa, Mississippi, North Dakota

# df[df['sat_participation_2018'] == min(df.sat_participation_2018)]
# #SAT min 2018: North Dakota

# df[df['sat_participation_2019'] == min(df.sat_participation_2019)]
# #SAT min 2019: North Dakota


In [None]:
#Which states have the highest and lowest participation rates for the 2017, 2018, or 2019 SAT and ACT?
#min (all tied states are listed)
for col in df.columns:
    if "participation" in col:
        states = df.loc[df[col] == df[col].min(), 'state'].tolist()
        print(f'min_{col}: {", ".join(states)}')

In [None]:
#Which states have the highest and lowest participation rates for the 2017, 2018, or 2019 SAT and ACT?
#max (all tied states are listed)
for col in df.columns:
    if "participation" in col:
        states = df.loc[df[col] == df[col].max(), 'state'].tolist()
        print(f'max_{col}: {", ".join(states)}')

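Which states have the highest and lowest participation rates for the 2017, 2018, or 2019 SAT and ACT?

Where several states tie for the minimum or maximum, the tables below list only the first of them. For example, Iowa, Mississippi and North Dakota all share the lowest SAT participation rate in 2017.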

|__Year__|__State of Minimum Participation in ACT__|__State of Maximum Participation in ACT__|
|---|---|---|
|2017|Maine|Alabama|
|2018|Maine|Alabama|
|2019|Maine|Alabama|




|__Year__|__State of Minimum Participation in SAT__|__State of Maximum Participation in SAT__|
|---|---|---|
|2017|    Iowa    |Connecticut|
|2018|North Dakota|Colorado|
|2019|North Dakota|Colorado|


In [None]:
df.head()

In [None]:
#identify states that have decreasing SAT participation rates.
df[(df['sat_participation_2017'] > df['sat_participation_2018']) & (df['sat_participation_2018'] > df['sat_participation_2019'])]


In [None]:
#identify states that have decreasing ACT participation rates.
(df[(df['act_participation_2017'] > df['act_participation_2018']) & (df['act_participation_2018'] > df['act_participation_2019'])])
#23 states

There are 3 states with decreasing ACT participation rates, and 26 states with decreasing SAT participation rates.
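
Counting the rows of a boolean mask directly avoids counting states by hand. The sketch below uses three hypothetical states with invented rates and the same strict year-over-year comparison as the cells above.

```python
import pandas as pd

# Invented rates for three hypothetical states.
toy = pd.DataFrame({
    'state': ['Alpha', 'Beta', 'Gamma'],
    'sat_participation_2017': [0.50, 0.10, 0.30],
    'sat_participation_2018': [0.40, 0.10, 0.35],
    'sat_participation_2019': [0.30, 0.05, 0.20],
})

# True where the rate falls in both consecutive year pairs.
decreasing = (
    (toy['sat_participation_2017'] > toy['sat_participation_2018'])
    & (toy['sat_participation_2018'] > toy['sat_participation_2019'])
)
print(decreasing.sum())                       # number of matching states
print(toy.loc[decreasing, 'state'].tolist())  # names of matching states
```

Because the comparison is strict, a state whose rate stays flat in one year (Beta here) is not counted as decreasing.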


In [None]:
#sorting ACT participation of 2017 for all states(sorting by a column)
df.loc[: ,['state','act_participation_2017']].sort_values('act_participation_2017').head()

Maine, New Hampshire and Delaware have the three lowest rates of ACT participation among all states in 2017.

In [None]:
#make 3 groups of eastern, western and central states and make a new column which shows this feature for each row in column state 
east = ['Connecticut', 'Delaware', 'Florida', 'Georgia', 'Maine', 'Maryland', 'Massachusetts', 'New Hampshire', 'New York', 'New Jersey', 'North Carolina', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'Vermont', 'Virginia', 'West Virginia', 'District of Columbia']
west = ['Alaska', 'Arizona', 'California', 'Hawaii', 'Idaho', 'Montana', 'Nevada', 'New Mexico', 'Oregon', 'Utah', 'Washington', 'Wyoming']
center = ['North Dakota', 'South Dakota', 'Nebraska', 'Kansas', 'Oklahoma', 'Texas', 'Minnesota', 'Iowa', 'Missouri', 'Arkansas', 'Louisiana', 'Wisconsin', 'Illinois', 'Michigan', 'Indiana', 'Ohio', 'Kentucky', 'Tennessee', 'West Virginia', 'Mississippi', 'Alabama']
print(set(east).intersection(set(center)))
print(set(east).intersection(set(west)))
print(set(west).intersection(set(center)))

[source of western states](https://en.wikipedia.org/wiki/Western_United_States)

[source of eastern states](https://en.wikipedia.org/wiki/Eastern_United_States)

[source of central states](https://en.wikipedia.org/wiki/Central_United_States)
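
The overlap check between the region lists can also be written as one pass over a dictionary, which then doubles as a lookup table. The lists below are abbreviated and purely illustrative, and the rule that the first region wins is an assumption.

```python
# Abbreviated, illustrative region lists (not the full lists used above).
regions = {
    'E': ['Maine', 'Virginia', 'West Virginia'],
    'W': ['Alaska', 'Utah'],
    'C': ['Iowa', 'Ohio', 'West Virginia'],
}

# Report every state that appears in more than one region.
seen = {}
for code, states in regions.items():
    for state in states:
        seen.setdefault(state, []).append(code)
overlaps = {state: codes for state, codes in seen.items() if len(codes) > 1}
print(overlaps)  # {'West Virginia': ['E', 'C']}

# First region wins, so each state maps to exactly one code.
state_to_region = {}
for code, states in regions.items():
    for state in states:
        state_to_region.setdefault(state, code)
print(state_to_region['West Virginia'])  # E
```

A lookup table like this can be passed to `Series.map`. A state that is absent from every list then shows up as a missing value instead of being assigned to a default region.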

In [None]:
#clean data
center.remove('West Virginia')

In [None]:
#assign west, east and center to each state and make a list
states_geo = []

for row in df['state']:
    if row in east:
        states_geo.append('E')
    elif row in west:
        states_geo.append('W')
    else:
        states_geo.append('C')


In [None]:
#make a new column
df['states_geo'] = states_geo
df.head()

In [None]:
#mean of ACT participation rate of each geographic area for each year
df_geo_act = df.groupby('states_geo')[['act_participation_2017', 'act_participation_2018', 'act_participation_2019']].mean()
df_geo_act

In [None]:
#mean of SAT participation rate of each geographic area for each year
df_geo_sat = df.groupby('states_geo')[['sat_participation_2017', 'sat_participation_2018', 'sat_participation_2019']].mean()
df_geo_sat

In [None]:
#fixing issues, make a column for states_geo and reassign it
df_geo_act = df_geo_act.reset_index()
df_geo_sat = df_geo_sat.reset_index()

## Visualize the Data

There's not a magic bullet recommendation for the right number of plots to understand a given dataset, but visualizing your data is *always* a good idea. Not only does it allow you to quickly convey your findings (even if you have a non-technical audience), it will often reveal trends in your data that escaped you when you were looking only at numbers. It is important to not only create visualizations, but to **interpret your visualizations** as well.

**Every plot should**:
- Have a title
- Have axis labels
- Have appropriate tick labels
- Text is legible in a plot
- Plots demonstrate meaningful and valid relationships
- Have an interpretation to aid understanding

Here is an example of what your plots should look like following the above guidelines. Note that while the content of this example is unrelated, the principles of visualization hold:

![](https://snag.gy/hCBR1U.jpg)
*Interpretation: The above image shows that as we increase our spending on advertising, our sales numbers also tend to increase. There is a positive correlation between advertising spending and sales.*

---

Here are some prompts to get you started with visualizations. Feel free to add additional visualizations as you see fit:
1. Use Seaborn's heatmap with pandas `.corr()` to visualize correlations between all numeric features.
    - Heatmaps are generally not appropriate for presentations, and should often be excluded from reports as they can be visually overwhelming. **However**, they can be extremely useful in identifying relationships of potential interest (as well as identifying potential collinearity before modeling).
    - Please take time to format your output, adding a title. Look through some of the additional arguments and options. (Axis labels aren't really necessary, as long as the title is informative).
2. Visualize distributions using histograms. If you have a lot, consider writing a custom function and use subplots.
    - *OPTIONAL*: Summarize the underlying distributions of your features (in words & statistics)
         - Be thorough in your verbal description of these distributions.
         - Be sure to back up these summaries with statistics.
         - We generally assume that data we sample from a population will be normally distributed. Do we observe this trend? Explain your answers for each distribution and how you think this will affect estimates made from these data.
3. Plot and interpret boxplots. 
    - Boxplots demonstrate central tendency and spread in variables. In a certain sense, these are somewhat redundant with histograms, but you may be better able to identify clear outliers or differences in IQR, etc.
    - Multiple values can be plotted to a single boxplot as long as they are of the same relative scale (meaning they have similar min/max values).
    - Each boxplot should:
        - Only include variables of a similar scale
        - Have clear labels for each variable
        - Have appropriate titles and labels
4. Plot and interpret scatter plots to view relationships between features. Feel free to write a custom function, and subplot if you'd like. Functions save both time and space.
    - Your plots should have:
        - Two clearly labeled axes
        - A proper title
        - Colors and symbols that are clear and unmistakable
5. Additional plots of your choosing.
    - Are there any additional trends or relationships you haven't explored? Was there something interesting you saw that you'd like to dive further into? It's likely that there are a few more plots you might want to generate to support your narrative and recommendations that you are building toward. **As always, make sure you're interpreting your plots as you go**.
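
For the histogram prompt, a small helper that draws one subplot per column keeps titles and axis labels consistent. This is a sketch under stated assumptions: `plot_histograms` is a hypothetical helper, the rates are invented, and the `Agg` backend line is there only so the code runs outside a notebook (remove it inside Jupyter).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt


def plot_histograms(data, titles, xlabel):
    """Draw one histogram per column in a single row of subplots."""
    fig, axes = plt.subplots(1, len(data), figsize=(5 * len(data), 4), squeeze=False)
    for ax, values, title in zip(axes[0], data.values(), titles):
        ax.hist(values, bins=5)
        ax.set_title(title)
        ax.set_xlabel(xlabel)
        ax.set_ylabel('Number of states')
    fig.tight_layout()
    return fig


# Invented participation rates, for illustration only.
rates = {
    'act_participation_2017': [1.0, 0.65, 0.31, 0.08, 1.0, 0.73],
    'sat_participation_2017': [0.05, 0.38, 0.83, 0.95, 0.03, 0.50],
}
fig = plot_histograms(
    rates,
    ['ACT participation, 2017', 'SAT participation, 2017'],
    'Participation rate',
)
print(len(fig.axes))
```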

In [None]:
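# Code
#correlations between the numeric columns only (state and states_geo are text)
df.select_dtypes('number').corr()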

In [None]:
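#A correlation heatmap is not very meaningful for this dataframe and problem statement
plt.figure(figsize = (16,9))
corr = df.select_dtypes('number').corr()
#hide the upper triangle, which mirrors the lower one
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
sns.heatmap(corr, mask=mask, square=True, cmap='gist_earth_r', annot=True, vmin=-1, vmax=1)
plt.title('Correlation Between ACT and SAT Participation Rates, 2017-2019');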

In [None]:
df.head()

In [None]:
#size
plt.figure(figsize=(10,5))

#draw the three years side by side so that no bar hides another
x = np.arange(len(df_geo_act))
width = 0.25
plt.bar(x - width, df_geo_act['act_participation_2017'], label='2017', width=width)
plt.bar(x, df_geo_act['act_participation_2018'], label='2018', width=width)
plt.bar(x + width, df_geo_act['act_participation_2019'], label='2019', width=width)
plt.xticks(x, df_geo_act['states_geo'])
plt.legend()
# Create a descriptive title
plt.title('Mean ACT Participation Rate by Region, 2017-2019')
# Add axis labels
plt.xlabel('Region (C = Central, E = East, W = West)')
plt.ylabel('Mean of Participation Rates');


In [None]:
#size
plt.figure(figsize=(10,5))

#draw the three years side by side so that no bar hides another
x = np.arange(len(df_geo_sat))
width = 0.25
plt.bar(x - width, df_geo_sat['sat_participation_2017'], label='2017', width=width)
plt.bar(x, df_geo_sat['sat_participation_2018'], label='2018', width=width)
plt.bar(x + width, df_geo_sat['sat_participation_2019'], label='2019', width=width)
plt.xticks(x, df_geo_sat['states_geo'])
plt.legend()
# Create a descriptive title
plt.title('Mean SAT Participation Rate by Region, 2017-2019')
# Add axis labels
plt.xlabel('Region (C = Central, E = East, W = West)')
plt.ylabel('Mean of Participation Rates');

In [None]:
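plt.plot(df_geo_act['states_geo'], df_geo_act['act_participation_2017'], label='2017')
plt.plot(df_geo_act['states_geo'], df_geo_act['act_participation_2018'], label='2018')
plt.plot(df_geo_act['states_geo'], df_geo_act['act_participation_2019'], label='2019')
plt.legend()
plt.title('Mean ACT Participation Rate by Region, 2017-2019')
plt.xlabel('Region (C = Central, E = East, W = West)')
plt.ylabel('Mean of Participation Rates');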

## Conclusions and Recommendations

Based on your exploration of the data, what are your key takeaways and recommendations? Make sure to answer your question of interest or address your problem statement here.

**To-Do:** *Edit this cell with your conclusions and recommendations.*

Don't forget to create your README!

**To-Do:** *If you combine your problem statement, data dictionary, brief summary of your analysis, and conclusions/recommendations, you have an amazing README.md file that quickly aligns your audience to the contents of your project.* Don't forget to cite your data sources!