<h1 style="text-align: center">
<div style="color: #DD3403; font-size: 60%">Data Science DISCOVERY MicroProject</div>
<span style="">MicroProject: United States Congress</span>
<div style="font-size: 60%;"><a href="https://discovery.cs.illinois.edu/microproject/us-congress/">https://discovery.cs.illinois.edu/microproject/us-congress/</a></div>
</h1>

<hr style="color: #DD3403;">

## Introduction

In this microproject, you will use Exploratory Data Analysis (EDA) techniques and the `groupby` function to find various statistics about the current legislators in the United States Congress.  Let's nerd out!

## Dataset: "congress-legislators"

The [@unitedstates project](https://theunitedstates.io/) maintains various high-quality datasets about the United States government.  Specifically, the `congress-legislators` dataset includes every member "of the United States Congress (1789-Present), congressional committees (1973-Present), committee membership (current only), and presidents and vice presidents of the United States in YAML, JSON, and CSV format."

The URL for the CSV file of current legislators is:
```
https://theunitedstates.io/congress-legislators/legislators-current.csv
```
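
`pd.read_csv` can load a CSV straight from a URL such as the one above, and it also accepts a local path or any file-like object. As a minimal sketch that runs without a network connection, the example below reads a tiny CSV from a string. The rows and the name `df_example` are made up for illustration and are not part of the real dataset.

```python
import io

import pandas as pd

# A tiny, made-up CSV that borrows a few of the dataset's column names (illustrative only).
csv_text = """full_name,birthday,type,party
Alice Example,1970-01-15,rep,Democrat
Bob Example,1985-06-30,sen,Republican
"""

# pd.read_csv accepts a file path, a URL, or any file-like object such as StringIO.
df_example = pd.read_csv(io.StringIO(csv_text))
print(df_example.shape)  # (2, 4)
```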

Load the dataset into a DataFrame called `df`:

In [0]:
...

## Show All Columns

This dataset has **A LOT** of columns!  By default, pandas will not show all of them.  By setting the `display.max_columns` option to `None`, we can view all the columns:

In [0]:
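import pandas as pd

# None means "no limit": show every column instead of truncating the middle ones
pd.set_option('display.max_columns', None)
df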
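
`pd.set_option` changes the setting for the rest of the session. If you only want the wide view temporarily, `pd.option_context` applies a setting inside a `with` block and restores the previous value afterward, and `pd.reset_option("display.max_columns")` returns to the pandas default. A small sketch, where the value `5` is arbitrary:

```python
import pandas as pd

# Read back the current value; after the cell above, this should be None ("no limit").
current = pd.get_option("display.max_columns")

# option_context changes a setting only inside the `with` block,
# so the global setting from the cell above is left untouched.
with pd.option_context("display.max_columns", 5):
    inside = pd.get_option("display.max_columns")
after = pd.get_option("display.max_columns")

print(inside, after == current)  # 5 True
```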

<hr style="color: #DD3403;">

## Puzzle 1: Splitting The House of Representatives and The Senate

In this dataset, each row is one member of Congress.  The United States Congress is bicameral: it is made up of the House of Representatives and the Senate.  In the dataset:

- Each member of the House of Representatives represents a district, which is encoded in the `district` column.  Their `type` column is `rep`.
- Senators are elected to six-year terms, with one-third of the senators up for election every two years.  The `senate_class` column denotes which election class a senator is in.  A senator's `type` column is `sen`.
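
Splitting rows by a column value like `type` is usually done with a boolean mask. The sketch below uses a hypothetical three-row DataFrame (`members`) rather than the real data, so the names and values are illustrative only.

```python
import pandas as pd

# Hypothetical toy data using the dataset's `type` values ("rep" / "sen").
members = pd.DataFrame({
    "full_name": ["A. Rep", "B. Sen", "C. Rep"],
    "type": ["rep", "sen", "rep"],
})

# A boolean mask keeps only the rows where the condition is True;
# the original `members` DataFrame is not modified.
house_example = members[members["type"] == "rep"]
senate_example = members[members["type"] == "sen"]

print(len(house_example), len(senate_example))  # 2 1
```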

Create two new DataFrames, `df_house` and `df_senate`, that contain the members of the House of Representatives and the members of the Senate, respectively:

In [0]:
df_house = ...
df_house

In [0]:
df_senate = ...
df_senate

## 🔬 MicroProject Checkpoint Tests 🔬

In [0]:
### TEST CASE for Puzzle 1: Splitting The House of Representatives and The Senate
#
# What is this cell?
# - This cell contains test cases for the MicroProject. Even though you can modify this
#   cell, you should treat it like it's a read-only cell since it will be replaced with
#   a fresh version when your code is checked.
#
# - If this cell runs without any error in the output, you PASSED all test cases!
#   We try to make these test cases as useful and complete as possible, but there is
#   a chance your code may be incorrect even though you pass the test cases (these
#   tests should be seen as a way to give you confidence that code you believe is
#   correct actually is, not as a robust check to catch all possible errors).
#
# - If this cell results in any errors, check your previous cells, make changes,
#   RE-RUN your code, and then re-run this cell.  Keep repeating this until the cell
#   passes with no errors! :)

tada = "\N{PARTY POPPER}"
assert( len(df) > 500 ), "Your DataFrame `df` appears to not be the full dataset?  Did you modify `df` instead of creating a new `df_house` or `df_senate`?"
assert( len(df_senate) <= 100 ), "Your DataFrame `df_senate` contains more than 100 rows, but there should be 100 (or fewer) senators."
assert( len(df_house) == len(df) - len(df_senate) ), "Your DataFrame `df_house` does not contain everyone who isn't in the senate."
print(f"{tada} Puzzle 1: All Tests Passed! {tada}")

<hr style="color: #DD3403;">

## Puzzle 2: Find the Controlling Party of the House of Representatives

In each chamber of Congress, the "controlling party" is the party that holds a majority of the seats.  Using `groupby`, create a DataFrame `df_house_party` that counts the number of members of each political party in the House of Representatives:

In [0]:
df_house_party = ...
df_house_party
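
As a refresher on the two most common counting patterns, the sketch below groups a hypothetical toy DataFrame by `party`. `groupby(...).count()` returns a DataFrame with the number of non-missing values in every column, so read counts from a column that is never empty (here `full_name`, assuming the same holds in the real data); `groupby(...).size()` returns the number of rows per group as a Series.

```python
import pandas as pd

# Hypothetical toy House data; only the `party` column matters for grouping.
toy_house = pd.DataFrame({
    "full_name": ["A", "B", "C", "D", "E"],
    "party": ["Democrat", "Republican", "Republican", "Democrat", "Republican"],
})

# groupby + count: one row per party; each column holds its count of non-null values.
party_counts = toy_house.groupby("party").count()
print(party_counts)

# groupby + size: a Series with the number of rows in each party.
party_sizes = toy_house.groupby("party").size()
print(party_sizes)
```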

### Puzzle 2 Analysis: How many Democrats and Republicans in the House?

Using the result above, fill out the following cell with the number of Democrats and Republicans in the House:

In [0]:
democrats_in_house = 0
republicans_in_house = 0
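
Rather than copying numbers by hand, you can also read them out of a grouped result with `.loc[row_label, column_label]`. The sketch below builds a hypothetical result shaped like the output of `groupby("party").count()` (the counts `2` and `3` are made up) and uses `idxmax` to name the party with the most members.

```python
import pandas as pd

# Hypothetical grouped result, shaped like groupby("party").count() output.
example_counts = pd.DataFrame(
    {"full_name": [2, 3]},
    index=pd.Index(["Democrat", "Republican"], name="party"),
)

# .loc[row_label, column_label] pulls out a single number.
dems = int(example_counts.loc["Democrat", "full_name"])
reps = int(example_counts.loc["Republican", "full_name"])

# idxmax returns the index label (party) with the largest count.
controlling = example_counts["full_name"].idxmax()
print(dems, reps, controlling)  # 2 3 Republican
```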

## 🔬 MicroProject Checkpoint Tests 🔬

In [0]:
### TEST CASE for Puzzle 2: Find the Controlling Party of the House of Representatives
#
# What is this cell?
# - This cell contains test cases for the MicroProject. Even though you can modify this
#   cell, you should treat it like it's a read-only cell since it will be replaced with
#   a fresh version when your code is checked.
#
# - If this cell runs without any error in the output, you PASSED all test cases!
#   We try to make these test cases as useful and complete as possible, but there is
#   a chance your code may be incorrect even though you pass the test cases (these
#   tests should be seen as a way to give you confidence that code you believe is
#   correct actually is, not as a robust check to catch all possible errors).
#
# - If this cell results in any errors, check your previous cells, make changes,
#   RE-RUN your code, and then re-run this cell.  Keep repeating this until the cell
#   passes with no errors! :)

tada = "\N{PARTY POPPER}"
R = df[ df.party == "Republican" ].head(1).index[0]
D = df[ df.party == "Democrat" ].head(1).index[0]
assert(len(df_house_party) == 2)
assert((df_house_party.index[0] == "Democrat" or df_house_party.index[0] == "Republican" or "party" in df.columns.tolist()))
assert(len(df) - len(df_senate) - df_house.value_counts(df.columns.tolist()[12])[df.sort_values(df.columns.tolist()[6]).iloc[R][df.columns.tolist()[12]]] == democrats_in_house)
assert(len(df) - len(df_senate) - df_house.value_counts(df.columns.tolist()[12])[df.sort_values(df.columns.tolist()[6]).iloc[D][df.columns.tolist()[12]]] == republicans_in_house)
print(f"{tada} Puzzle 2: All Tests Passed! {tada}")


<hr style="color: #DD3403;">

## Puzzle 3: Find the Youngest Member of each Party in the House of Representatives

In the House of Representatives, who is the youngest Democrat and who is the youngest Republican?

**HINT**: The `sort_values` function may be useful!  See our guide ["Sorting a DataFrame Using Pandas"](https://discovery.cs.illinois.edu/guides/Modifying-DataFrames/sorting-a-dataframe-with-pandas/) to learn how to use it. :)
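
Assuming the `birthday` column stores dates as `YYYY-MM-DD` strings (as in the toy data below), sorting the text also sorts chronologically, so in ascending order the youngest member, who has the latest birthday, ends up last. If you are unsure of the format, converting the column with `pd.to_datetime` first is the safer choice. A minimal sketch with made-up members:

```python
import pandas as pd

# Hypothetical toy data; birthdays are ISO "YYYY-MM-DD" strings.
toy = pd.DataFrame({
    "full_name": ["Older Member", "Youngest Member", "Middle Member"],
    "birthday": ["1950-03-01", "1990-11-20", "1972-07-04"],
})

# Ascending order: oldest member first, youngest (latest birthday) last.
by_age = toy.sort_values("birthday")
print(by_age)

# Descending order puts the youngest member first instead.
youngest_first = toy.sort_values("birthday", ascending=False)
print(youngest_first)
```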


In [0]:
...

### Puzzle 3 Analysis: Who are the youngest Democrat and Republican in the House?

Using the result above, provide the **full name** (use the `full_name` column) of the youngest Democrat and the youngest Republican:

In [0]:
youngest_democrat_in_house = ""
youngest_republican_in_house = ""
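
Putting the pieces together, one way to get a single name as a string is to filter to one party, sort by `birthday`, and take `full_name` from the last row with `.iloc[-1]`. The DataFrame `sample_house` below is hypothetical, and its birthdays are assumed to be `YYYY-MM-DD` strings; with the real data you would start from `df_house`.

```python
import pandas as pd

# Hypothetical toy House data with the columns this puzzle uses.
sample_house = pd.DataFrame({
    "full_name": ["Dem One", "Dem Two", "Rep One", "Rep Two"],
    "birthday": ["1960-01-01", "1988-05-05", "1975-02-02", "1992-09-09"],
    "party": ["Democrat", "Democrat", "Republican", "Republican"],
})

# Filter to one party, sort oldest -> youngest, and take the last row's name.
dem_rows = sample_house[sample_house["party"] == "Democrat"].sort_values("birthday")
youngest_dem = dem_rows.iloc[-1]["full_name"]

rep_rows = sample_house[sample_house["party"] == "Republican"].sort_values("birthday")
youngest_rep = rep_rows.iloc[-1]["full_name"]

print(youngest_dem, youngest_rep)  # Dem Two Rep Two
```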

<hr style="color: #DD3403;">

## 🔬 MicroProject Checkpoint Tests 🔬

In [0]:
### TEST CASE for Puzzle 3: Find the Youngest Member of each Party in the House of Representatives
#
# What is this cell?
# - This cell contains test cases for the MicroProject. Even though you can modify this
#   cell, you should treat it like it's a read-only cell since it will be replaced with
#   a fresh version when your code is checked.
#
# - If this cell runs without any error in the output, you PASSED all test cases!
#   We try to make these test cases as useful and complete as possible, but there is
#   a chance your code may be incorrect even though you pass the test cases (these
#   tests should be seen as a way to give you confidence that code you believe is
#   correct actually is, not as a robust check to catch all possible errors).
#
# - If this cell results in any errors, check your previous cells, make changes,
#   RE-RUN your code, and then re-run this cell.  Keep repeating this until the cell
#   passes with no errors! :)

tada = "\N{PARTY POPPER}"
assert(df[ df.party == "Democrat" ].sort_values(df.columns.tolist()[6]).tail(1).values[0][5] == youngest_democrat_in_house)
assert(df[ df.party == "Republican" ].sort_values(df.columns.tolist()[6]).tail(1).values[0][5] == youngest_republican_in_house)
print(f"{tada} Puzzle 3: All Tests Passed! {tada}")

<hr style="color: #DD3403;">

## Submission

You're almost done!  All you need to do is commit your MicroProject to GitHub and run the GitHub Actions Grader:

1.  ⚠️ **Make certain to save your work.** ⚠️ To do this, go to **File => Save All**

2.  After you have saved, exit this notebook and return to https://discovery.cs.illinois.edu/microproject/us-congress/ and complete the section **"Commit and Grade Your Notebook"**.

3. If you see a 100% grade result on your GitHub Action, you've completed this MicroProject! 🎉
