# Perform 5
In this assignment, you will demonstrate your ability to write SQL queries to pull data from a relational database. When you finish, please go to Kernel --> Restart and Run All, and then double check that your notebook looks correct before saving and submitting your .ipynb file (the notebook file) on Gradescope.

## The congress database
A visual schema of the congress database is also available as a PDF along with this practice; we encourage you to start by taking a look to get familiar with it. This database contains the history of the members of the United States congress through the 115th congress (the data end by 2019) as well as a good deal of voting data from 2015-2016. The visual schema shows each table in the database as a yellow box with the table name at the top and the column names listed below. The arrows show which keys/identifiers match between different tables for join operations. At the bottom you can see previews of the tables. One in particular, the `cur_members` table, contains data about members of the 115th congress (note: everywhere we refer to "current" members below, we mean those in the `cur_members` table). Many, though not all, of these members are still serving as of the 117th congress, which began in January 2021.

## How to use SQL and report your results

You are welcome to use any of the following for running your SQL queries in this assignment.

1. The SQLite command line tool, or
2. The basic Python sqlite3 library, or
3. Python sqlite3 and Pandas.

If you use option 2 or 3, you should simply include your code (including the strings containing your SQL queries) in this notebook and print your results. We recommend using multiline strings for your queries for readability; remember that multiline strings in Python are enclosed by triple quotes, for example:
```
my_multiline_string = """Hi 
There!"""
```
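
As a rough illustration of option 2, the sketch below runs a multiline query string through the `sqlite3` library. It uses an in-memory database and a made-up `toy_members` table (both are assumptions for illustration, not part of the congress database) so that it runs on its own.

```python
import sqlite3

# Hypothetical in-memory database with a toy table, so the sketch
# runs without the congress database file.
toy_conn = sqlite3.connect(":memory:")
toy_conn.execute("CREATE TABLE toy_members (id TEXT, last_name TEXT, state TEXT)")
toy_conn.executemany(
    "INSERT INTO toy_members VALUES (?, ?, ?)",
    [("A1", "Adams", "NC"), ("B2", "Baker", "VA"), ("C3", "Clark", "NC")],
)

# A multiline string keeps each SQL clause on its own line.
toy_query = """
SELECT id, last_name
FROM toy_members
WHERE state = 'NC'
ORDER BY last_name;
"""

# execute() returns a cursor; fetchall() collects the rows as tuples.
toy_rows = toy_conn.execute(toy_query).fetchall()
for row in toy_rows:
    print(row)
```

With option 3, the same query string could instead be passed to `pd.read_sql(toy_query, toy_conn)` to get the result as a DataFrame.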

If you use option 1, we recommend that you work with a two-window setup: one window with the SQLite command line tool open and connected to the database, and another window open with a plain text editor where you write and edit your queries. You can execute those queries either by saving them as plain text files and using the `.read` command or by simply copying them into the SQLite command line tool. When you are finished, you should copy *both* your SQL queries *and* your results to this notebook to submit. One easy way to do this in a Markdown cell in your notebook is to use triple ticks \`\`\` before and after where you copy/paste your SQL query and your results; this tells Markdown not to change the formatting of what you write. An example is shown below (double click into this Markdown cell to see how it is written).

```
SELECT id, first_name, last_name
FROM cur_members
WHERE type='sen'
    AND party='Democrat'
LIMIT 3;
```

```
B000944|Sherrod|Brown
C000127|Maria|Cantwell
C000141|Benjamin|Cardin
``` 

## Questions

In [1]:
# Run but do not modify the following code
# to import sqlite3 and pandas, and to connect
# to the congress database
import sqlite3
import pandas as pd
conn = sqlite3.connect("congress")

### Question 1
Who are all of the members of congress from North Carolina (`NC`) in the `cur_members` table? Show all of the information about them from the `cur_members` table.

In [2]:
query = """SELECT *
FROM cur_members
WHERE state='NC';
"""
pd.read_sql(query, conn)

Unnamed: 0,id,first_name,last_name,gender,birthday,religion,type,party,state
0,B001135,Richard,Burr,M,1955-11-30,Methodist,sen,Republican,NC
1,B001251,George,Butterfield,M,1947-04-27,Baptist,rep,Democrat,NC
2,F000450,Virginia,Foxx,F,1943-06-29,Roman Catholic,rep,Republican,NC
3,J000255,Walter,Jones,M,1943-02-10,Catholic,rep,Republican,NC
4,M001156,Patrick,McHenry,M,1975-10-22,,rep,Republican,NC
5,P000523,David,Price,M,1940-08-17,Baptist,rep,Democrat,NC
6,H001067,Richard,Hudson,M,1971-11-04,,rep,Republican,NC
7,P000606,Robert,Pittenger,M,1948-08-15,,rep,Republican,NC
8,M001187,Mark,Meadows,M,1959-07-28,,rep,Republican,NC
9,H001065,George,Holding,M,1968-04-17,,rep,Republican,NC


### Question 2
List the five youngest female (`gender = 'F'`) members of congress from the `cur_members` table in descending order of `birthday`. Show all information about them from the `cur_members` table.

In [4]:
query = """SELECT *
FROM cur_members
WHERE gender = 'F'
ORDER BY birthday DESC
LIMIT 5;
"""
pd.read_sql(query, conn)

Unnamed: 0,id,first_name,last_name,gender,birthday,religion,type,party,state
0,S001196,Elise,Stefanik,F,1984-07-02,,rep,Republican,NY
1,G000571,Tulsi,Gabbard,F,1981-04-12,,rep,Democrat,HI
2,H001056,Jaime,Herrera Beutler,F,1978-11-03,,rep,Republican,WA
3,M001202,Stephanie,Murphy,F,1978-09-16,,rep,Democrat,FL
4,B001300,Nanette,Barragán,F,1976-09-15,,rep,Democrat,CA


### Question 3
Show the average, minimum, and maximum age (in years) of representatives (i.e., `type = 'rep'`). Show the same for senators (i.e., `type = 'sen'`). Compute ages in years as of Feb. 22, 2021 (`'2021-02-22'`). You may compute the age of an individual as simply `'2021-02-22' - birthday`.

Note that this is not *exactly* correct: SQLite converts each date string to a number using its leading digits, so the subtraction only compares the years and ignores the month and day. A more precise calculation of age in years using [sqlite date and time functions](https://sqlite.org/lang_datefunc.html) looks like `(julianday('2021-02-22')-julianday(birthday)) / 365.25`. We will accept solutions using either; you will get very similar results.
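
To make the difference between the two age calculations concrete, here is a small sketch that applies both to a single birthday. The in-memory database and the `toy_people` table are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical one-row table; birthdays are stored as 'YYYY-MM-DD' text.
age_conn = sqlite3.connect(":memory:")
age_conn.execute("CREATE TABLE toy_people (birthday TEXT)")
age_conn.execute("INSERT INTO toy_people VALUES ('1955-11-30')")

age_query = """
SELECT '2021-02-22' - birthday AS simple_age,
       (julianday('2021-02-22') - julianday(birthday)) / 365.25 AS precise_age
FROM toy_people;
"""

simple_age, precise_age = age_conn.execute(age_query).fetchone()

# The simple form subtracts only the leading years (2021 - 1955),
# while the julianday form counts elapsed days between the two dates.
print(simple_age, round(precise_age, 2))
```

Here the simple form reports 66 even though this person's 2021 birthday has not yet occurred, while the `julianday` form gives a value between 65 and 66.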

In [87]:
query1 = """SELECT type,
        MIN('2021-02-22' - birthday) AS minage,
        MAX('2021-02-22' - birthday) AS maxage,
        AVG('2021-02-22' - birthday) AS avgage
    FROM cur_members
    WHERE type IN ('sen', 'rep')
    GROUP BY type;
"""
pd.read_sql(query1, conn)

Unnamed: 0,type,minage,maxage,avgage
0,rep,37,92,62.656818
1,sen,44,88,66.74


### Question 4
Which parties (`party`) have had at least 50 senators (`sen`) past and present? For each, show the `party` and the number of senators (`sen`) they have had. 

Hint: Be sure not to double count persons who served multiple terms in the senate. You can use `COUNT(DISTINCT column)` to get the number of distinct values on given `column`.
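
The effect of `COUNT(DISTINCT column)` can be seen on a toy example. The sketch below assumes a made-up `toy_roles` table in an in-memory database (not the real `person_roles` data) in which one person holds two senate terms.

```python
import sqlite3

# Hypothetical table: person 1 served two senate terms for party 'A'.
cd_conn = sqlite3.connect(":memory:")
cd_conn.execute("CREATE TABLE toy_roles (person_id INTEGER, party TEXT, type TEXT)")
cd_conn.executemany(
    "INSERT INTO toy_roles VALUES (?, ?, ?)",
    [(1, "A", "sen"), (1, "A", "sen"), (2, "A", "sen"),
     (3, "B", "sen"), (3, "B", "rep")],
)

cd_query = """
SELECT party,
       COUNT(*) AS num_terms,
       COUNT(DISTINCT person_id) AS num_people
FROM toy_roles
WHERE type = 'sen'
GROUP BY party
ORDER BY party;
"""

# COUNT(*) counts every matching row (term), whereas
# COUNT(DISTINCT person_id) counts each person only once.
cd_rows = cd_conn.execute(cd_query).fetchall()
for row in cd_rows:
    print(row)
```

For party `A`, the two counts differ (3 terms versus 2 people), which is exactly the double counting the hint warns about.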

In [29]:
query1 = """SELECT party, COUNT(DISTINCT person_id) AS count
FROM person_roles
WHERE type='sen'
GROUP BY party
HAVING count>=50;
"""
pd.read_sql(query1, conn)

Unnamed: 0,party,count
0,Democrat,842
1,Federalist,74
2,Jackson,68
3,Republican,877
4,Whig,74


### Question 5
Among the past and present members of congress who cast at least one vote in the `person_votes` table (i.e., the members whose `person_id`s appear at all in the `person_votes` table), which five members cast the fewest votes? Show their `first_name`s, `last_name`s, and the number of votes they cast in the `person_votes` table. 

In [54]:
query1 = """SELECT p.first_name,p.last_name, COUNT(pv.person_id) AS numvotes
    FROM person_votes AS pv, persons AS p
    WHERE p.id=pv.person_id
    GROUP BY person_id
    ORDER BY numvotes ASC
    LIMIT 5;

"""
pd.read_sql(query1, conn)

Unnamed: 0,first_name,last_name,numvotes
0,John,Boehner,17
1,James,Comer,47
2,Dwight,Evans,47
3,Colleen,Hanabusa,47
4,Alan,Nunnelee,51


### Question 6
In this question, we would like to consider the change in the gender representation of the US House of Representatives over time. Find all of the `start_date`s from the `person_roles` table in which at least 40 women began terms as representatives in the house (i.e., `type='rep'`). Show the `start_date`s and the number of women beginning terms on those dates. Order the results by `start_date` from oldest to most recent.

In [70]:
query1 = """SELECT pr.start_date, COUNT(*) AS numwomen
    FROM person_roles AS pr, persons AS p
    WHERE p.id=pr.person_id
        AND pr.type='rep'
        AND p.gender='F'
    GROUP BY pr.start_date
    HAVING numwomen>=40
    ORDER BY pr.start_date ASC;
"""
pd.read_sql(query1, conn)

Unnamed: 0,start_date,numwomen
0,1993-01-05,48
1,1995-01-04,48
2,1997-01-07,53
3,1999-01-06,58
4,2001-01-03,61
5,2003-01-07,62
6,2005-01-04,68
7,2007-01-04,74
8,2009-01-06,78
9,2011-01-05,75
