# Homework #3

Overall rules:

- Refrain from using code comments to explain what has been done. Document your steps by writing appropriate markdown cells in your notebook.
- Avoid duplicating code. Do not copy and paste code from one cell to another. If copying and pasting is necessary, write a suitable function for the task at hand and call that function.
- When providing parameters to a function, never use global variables. Instead, always pass parameters explicitly and always make use of local variables.
- Document your use of LLMs (ChatGPT, Claude, Copilot, etc.). Either take screenshots of your steps and include them with this notebook, or give me a full log (both questions and answers) in a markdown file named `HW3-LLM-LOG.md`.

Failure to adhere to these guidelines will result in a 25-point deduction for each infraction.

## The Dataset

For this homework, we are going to use the [data warehouse](https://clerk.house.gov/Votes/) for the [US House of Representatives](https://www.house.gov/). The data server has data on each vote going back to 1990. The voting information is in XML format. For example, the code below pulls the data for the 2nd roll call of 1990.


```python
from urllib.request import urlopen
import xmltodict

with urlopen('https://clerk.house.gov/evs/1990/roll002.xml') as url:
    raw = xmltodict.parse(url.read())

raw
```

```text
{'rollcall-vote': {'vote-metadata': {'majority': 'D',
   'congress': '101',
   'session': '2nd',
   'chamber': 'U.S. House of Representatives',
   'rollcall-num': '2',
   'legis-num': 'MOTION',
   'vote-question': 'On Approving the Journal',
   'vote-type': 'YEA-AND-NAY',
   'vote-result': 'Passed',
   'action-date': '24-Jan-1990',
   'action-time': {'@time-etz': '14:25', '#text': '2:25 PM'},
   'vote-desc': None,
   'vote-totals': {'totals-by-party-header': {'party-header': 'Party',
     'yea-header': 'Yeas',
     'nay-header': 'Nays',
     'present-header': 'Answered “Present”',
     'not-voting-header': 'Not Voting'},
    'totals-by-party': [{'party': 'Republican',
      'yea-total': '78',
      'nay-total': '87',
      'present-total': '2',
      'not-voting-total': '8'},
     {'party': 'Democratic',
      'yea-total': '234',
      'nay-total': '2',
      'present-total': '1',
      'not-voting-total': '19'},
     {'party': 'Independent',
      'yea-total': '0',
      'nay-total': 
```

### Pull all the data from 1990 to 2023, and store it for questions below.
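One possible way to organize the download is to save each roll call as a raw XML file on disk, so that later questions can re-read the files without hitting the server again. The sketch below is only an illustration: the function names `roll_call_url` and `download_year`, the `max_rolls` limit, and the directory layout are hypothetical, and it assumes that requesting a roll call number that does not exist raises an `HTTPError`, which is treated as the end of that year.

```python
from pathlib import Path
from urllib.error import HTTPError
from urllib.request import urlopen

# URL prefix taken from the example roll call above.
BASE_URL = "https://clerk.house.gov/evs"


def roll_call_url(year, number):
    """Build the URL of one roll call, e.g. .../evs/1990/roll002.xml."""
    return f"{BASE_URL}/{year}/roll{number:03d}.xml"


def download_year(year, target_dir, max_rolls=2000):
    """Save the roll calls of `year` as XML files under target_dir/year.

    Assumption: a roll call number that does not exist raises HTTPError,
    which is treated as the end of that year's roll calls.
    Returns the number of files available locally afterwards.
    """
    year_dir = Path(target_dir) / str(year)
    year_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    for number in range(1, max_rolls + 1):
        path = year_dir / f"roll{number:03d}.xml"
        if not path.exists():  # skip files fetched on an earlier run
            try:
                with urlopen(roll_call_url(year, number)) as response:
                    path.write_bytes(response.read())
            except HTTPError:
                break
        saved += 1
    return saved
```

Calling `download_year(year, "data")` for each `year` in `range(1990, 2024)` would cover the requested period. Each saved file can then be parsed with `xmltodict.parse` as in the example above.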

## Q1

1. Not all roll calls are votes. For example, some roll calls are QUORUM calls (attendance checks). For each year, find the legislator who was absent the most, along with his/her state.
2. For each year and for each state find out how many legislators there are. For example, in 1990 California had 45 legislators while Vermont had 1.
3. Create a data frame with the following columns:
   - Year
   - State
   - Name of the Legislator
   - His/her party affiliation (Democrat/Republican/Independent)
   - Number of times he/she voted
   - Number of times he/she did not vote
4. Find out who the longest-serving legislator in the US House of Representatives is.
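As a starting point for the data frame in item 3, the sketch below flattens one parsed roll call into one row per legislator and then counts participation. It rests on assumptions that the truncated output above does not confirm: that individual votes live under `raw['rollcall-vote']['vote-data']['recorded-vote']`, that each entry has a `legislator` element with `@party`, `@state` and `#text` fields plus a `vote` field, and that an absence is recorded as the string `'Not Voting'`. The function and column names are hypothetical, and a legislator's name alone may not identify a person uniquely.

```python
import pandas as pd


def recorded_votes(raw, year):
    """Flatten one parsed roll call into one dict per legislator.

    Assumption: the key layout below matches the XML; verify it against
    a full roll call before relying on it.
    """
    votes = raw["rollcall-vote"]["vote-data"]["recorded-vote"]
    if isinstance(votes, dict):  # xmltodict returns a dict for a single child
        votes = [votes]
    rows = []
    for item in votes:
        legislator = item["legislator"]
        rows.append({
            "Year": year,
            "State": legislator["@state"],
            "Name": legislator["#text"],
            "Party": legislator["@party"],
            "Vote": item["vote"],
        })
    return rows


def participation_table(rows):
    """Count, per legislator and year, how often they voted or did not."""
    frame = pd.DataFrame(rows)
    # Assumption: anything other than 'Not Voting' counts as having voted.
    frame["times_voted"] = (frame["Vote"] != "Not Voting").astype(int)
    frame["times_not_voted"] = 1 - frame["times_voted"]
    keys = ["Year", "State", "Name", "Party"]
    return frame.groupby(keys, as_index=False)[
        ["times_voted", "times_not_voted"]
    ].sum()


def most_absent(table):
    """Return, for each year, the row with the most missed roll calls."""
    return table.loc[table.groupby("Year")["times_not_voted"].idxmax()]
```

Whether a "Present" answer should count as voting is a modeling choice. The sketch treats it as a vote, which may not be what you want.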

## Q2

For this question, we are going to measure polarization in the US Congress.

For the example vote above, the YEAs and NAYs are tabulated as follows:


|             |  YEAs   |  NAYs  |
|-------------|---------|--------|
| Democrats   |    234  |     2  |
| Republicans |     78  |    87  |


We are going to measure **polarization** with the following formula, where each difference is taken between the two parties' counts:

$$ \frac{|\text{Difference in YEAs}| + |\text{Difference in NAYs}|}{\text{Total number of votes}} $$

For this particular vote the polarization is

$$ \frac{|234-78|+|2-87|}{234+78+2+87} \approx 0.6 $$
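The calculation above can be sketched in code as follows. The helper names `party_totals` and `polarization` are hypothetical. The key names follow the `vote-totals` structure shown in the example output, and the sketch assumes every roll call reports its totals under the same `yea-total` and `nay-total` keys, which may not hold for all vote types.

```python
def party_totals(raw):
    """Map each party name to its (yeas, nays) pair for one roll call."""
    totals = raw["rollcall-vote"]["vote-metadata"]["vote-totals"]["totals-by-party"]
    return {
        row["party"]: (int(row["yea-total"]), int(row["nay-total"]))
        for row in totals
    }


def polarization(dem, rep):
    """Apply the formula above to two (yeas, nays) pairs.

    Returns None when nobody in either party voted YEA or NAY.
    """
    (dem_yea, dem_nay), (rep_yea, rep_nay) = dem, rep
    total = dem_yea + dem_nay + rep_yea + rep_nay
    if total == 0:
        return None
    return (abs(dem_yea - rep_yea) + abs(dem_nay - rep_nay)) / total
```

For the example roll call, `polarization((234, 2), (78, 87))` evaluates to 241/401, which matches the value of roughly 0.6 computed above.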

1. Measure polarization for each roll call and store it in a data frame with the date information.
2. Plot the results against time.
3. Analyze the results. Did polarization increase, decrease, or stay the same?

## Q3

For this question, we are going to measure whether each legislator voted along or against his/her party line. For example, in the roll call above, two Democratic legislators broke the party line and voted NAY while 234 other Democrats voted YEA. Those legislators were Jacobs from Indiana and Schroeder from Colorado.

1. For each legislator and for each year, find out the number of times they voted in total.
2. For each legislator and for each year, find out the number of times they voted along the party lines, and the number of times they broke the party line.
3. For each year and for each party, count the number of legislators that never broke the party line in that year.
4. For each year, list the top 5 legislators (and their party affiliation) that broke the party line the most.
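The assignment does not define "party line" formally. One reasonable reading, used in the sketch below, is the position (YEA or NAY) taken by the majority of a party's members in a given roll call. This definition is an assumption, as are the function names and the row format: each row is a hypothetical dict with `Name`, `Party` and `Vote` keys, and votes are assumed to be recorded as `'Yea'` or `'Nay'`.

```python
from collections import Counter


def party_positions(rows):
    """Find the majority Yea/Nay position of each party in one roll call.

    A party whose Yea and Nay counts are tied gets no position.
    """
    counts = {}
    for row in rows:
        if row["Vote"] in ("Yea", "Nay"):
            counts.setdefault(row["Party"], Counter())[row["Vote"]] += 1
    positions = {}
    for party, counter in counts.items():
        (top_vote, top_count), *rest = counter.most_common()
        if not rest or rest[0][1] < top_count:
            positions[party] = top_vote
    return positions


def line_breakers(rows):
    """List the names of legislators who voted against their party's majority."""
    positions = party_positions(rows)
    return [
        row["Name"]
        for row in rows
        if row["Vote"] in ("Yea", "Nay")
        and row["Party"] in positions
        and row["Vote"] != positions[row["Party"]]
    ]
```

Applying `line_breakers` to every roll call and counting how often each name appears per year would give the numbers asked for in items 2 to 4. Legislators who did not vote YEA or NAY are ignored here, which is again a choice rather than a requirement.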


## Q4

For this question, we are going to look at the text of each vote question. For example, the vote question for the example roll call is 'On Approving the Journal'. This is an open-ended question: you must design an experiment and choose a specific machine learning algorithm to find the answer, if there is one.

1. For each party, find out whether there are specific issues on which its members tend to vote 'YEA' or 'NAY'. For example, it is widely believed that Democrats vote 'NAY' on issues restricting abortion while Republicans vote 'YEA'. For this question you are looking for a quantifiable connection between the text of the vote question and the likelihood of each party voting 'YEA' or 'NAY'.

2. Now, do the same for each legislator to find out the issues each legislator cares about in each year.