## Federal Courts Project Guide

The ultimate goal of this project is to build a centralized database of federal judgeships across the 13 district appellate courts and the 96 District courts in the United States. Because of the wealth of data involved, and the fact that much of this data is Scattered across many pages and sites, the first step involves researching the domain, and developing a focus and range of data you want to obtain and make available.

Here are three possible angles:

1. Current judgeships, vacancies, and nomination proceedings: with this focus you would download tables the Recent vacancies and appointments, and go further into nomination procedures and Q&A's. This would entail a combination of scraping, conversions of PDFs, in using regular expressions to parse the PDFs.

2. Historical judgeships: with this focus you examined changes in federal judgeships over a certain period of time (perhaps 10 to 20 years). This would entail mainly the scraping of many pages and the integration of data about specific judges, ordered by district.

3. Recent Nominations and confirmations:  this to focus specifically on judges newly nominated or appointed under the current administration. The focus would be more directly on the nomination hearings (Q&As), as well as the search for other data sources regarding the judges--news articles, opinions, writings by the judges.



Your primary goal by Tuesday is to come up with a specific research question: what kind of knowledge do you want to investigate, build and make available through this project. What are the central units of analysis? What do you want to reveal about the federal courts?

Your secondary goal is to view the primary source pages and begin scraping. You do not have to have your central research question right at the beginning of the scraping, but it may help to have a direction.

You're goal by Thursday is to have a finalize architecture for your dataframe(s), any finalized list of sources that you Will scrape/obtain.

**Data Architecture**
The question of architecture is central to this project. Because of the many possible angles, and the highly decentralized state of the primary source data, there is a wide range of designs for tables, rows, columns. You may want to begin scraping some of the main pages to get more familiar with what kind of rows and columns might be involved.

**Interpretive architecture**
This depends I how focused your data frame will be. If you pick specific districts, judges and/or confirmation hearings you may want to do more human reading to assess different ways the framing the politics/legal perspective of the judge or the district's decisions. If you choose to cast a wider net data, then you will want to focus on more quantitative categories for framing this: judges age, District, background, length of appointment, length of vacancy, Number of vacancies, etc.



### Coding considerations:
While there is a great amount of data available, much of it is distributed across multiple pages, sometimes and inconsistent format. If you're interested in scraping nominations and downloading PDFs, you will need to at least briefly (or entirely) use **selenium**. If you want to use beautiful soup, you will have to download links, and the loop through multiple pages to get a complete data set--unless your focus is more specific.

### STEP 1
Scrape the first page of judicial vacancies:

http://www.uscourts.gov/judges-judgeships/judicial-vacancies/current-judicial-vacancies

#### Import your scraping libraries
#### Write your scraping code here

### STEP 2
Scrape the first page of judicial confirmations:

http://www.uscourts.gov/judges-judgeships/judicial-vacancies/confirmation-listing


### STEP 3
Investigate the judicial committee's confirmation postings:

https://www.judiciary.senate.gov/nominations/confirmed

This is relatively straightforward, except that the most interesting information is possibly PDFs of the questionnaires for each candidate. To get the PDFs you need to use selenium (see step 4), but first look this data and assess whether you think it will be useful to you. If so, I will give you the code you need to obtain the PDFs and convert them to text. You can then parse them using regular expressions.

#### Don't necessarily code here
#### Think about where you're going first
#### And read below

### STEP 4
Investigate the judicial committee's hearings on nominees: 

https://www.judiciary.senate.gov/hearings

This one is pretty tricky. It is where you can find PDFs with Q&A's from confirmation hearings. It is a multiple page scrape just to get links to various nomination pages, which then have links to PDFs, which is then have redirects to download the PDFs (you have to use selenium here). 

But before you do the scrape just go through the hearings pages by hand and click on where it says "Nominations". Look at the different Q&A's available and see if you think they will be useful to you. If they will be I can give you most of the code you will need to get the PDFs. Also, I have uploaded a file on slack of one hearings PDFs along with text conversions of them. Take a look at the text conversions, because you'll need to parse them using regular expressions.

If you are interested in more historical data, look into the information on these links:

Archives of vacancies/confirmations (if you want to build more historical data)
http://www.uscourts.gov/judges-judgeships/judicial-vacancies/archive-judicial-vacancies

Present and past judges including resumes:

Appeals courts:
https://www.fjc.gov/history/courts/u.s.-court-appeals-district-columbia-circuit-justices-and-judges

District courts:
https://www.fjc.gov/history/courts/u.s.-district-courts-and-federal-judiciary

#### Think about your focus and what your ultimate architecture should be
#### More to come...

In [1]:
import pandas as pd
import re

** I used the federal judges database which is also a source of Washington Post's article. **

https://www.fjc.gov/history/judges/biographical-directory-article-iii-federal-judges-export

In [2]:
df_judge = pd.read_csv('federal-judicial-service.csv', na_values='  ')
df_judge.head()

Unnamed: 0,nid,Sequence,Judge Name,Court Type,Court Name,Appointment Title,Appointing President,Party of Appointing President,Reappointing President,Party of Reappointing President,...,Ayes/Nays,Confirmation Date,Commission Date,"Service as Chief Judge, Begin","Service as Chief Judge, End","2nd Service as Chief Judge, Begin","2nd Service as Chief Judge, End",Senior Status Date,Termination,Termination Date
0,1394646,1,"Abrams, Leslie Joyce",U.S. District Court,U.S. District Court for the Middle District of...,Judge,Barack Obama,Democratic,,,...,100/0,2014-11-18,2014-11-20,,,,,,,
1,1393931,1,"Abrams, Ronnie",U.S. District Court,U.S. District Court for the Southern District ...,Judge,Barack Obama,Democratic,,,...,96/2,2012-03-22,2012-03-23,,,,,,,
2,1376976,1,"Abruzzo, Matthew T.",U.S. District Court,U.S. District Court for the Eastern District o...,Judge,Franklin D. Roosevelt,Democratic,,,...,,1936-02-12,1936-02-15,,,,,1966-02-15,Death,1971-05-28
3,1376981,1,"Acheson, Marcus Wilson",U.S. District Court,U.S. District Court for the Western District o...,Judge,Rutherford B. Hayes,Republican,,,...,,1880-01-14,1880-01-14,,,,,,Appointment to Another Judicial Position,1891-02-09
4,1376981,2,"Acheson, Marcus Wilson",U.S. Circuit Court (1869-1911),U.S. Circuit Courts for the Third Circuit,Judge,Benjamin Harrison,Republican,,,...,,1891-02-03,1891-02-03,,,,,,Death,1906-06-21


In [3]:
df_judge.dtypes

nid                                    int64
Sequence                               int64
Judge Name                            object
Court Type                            object
Court Name                            object
Appointment Title                     object
Appointing President                  object
Party of Appointing President         object
Reappointing President                object
Party of Reappointing President       object
ABA Rating                            object
Seat ID                               object
Statute Authorizing New Seat          object
Recess Appointment Date               object
Nomination Date                       object
Committee Referral Date               object
Hearing Date                          object
Judiciary Committee Action            object
Committee Action Date                 object
Senate Vote Type                      object
Ayes/Nays                             object
Confirmation Date                     object
Commission

** Using the regular expression, I made a new column 'Circuit' where each judge belongs to. **

In [4]:
df_judge['Circuit'] = df_judge['Court Name'].str.extract('for the ([\w\s]+)')
df_judge['Circuit'] = df_judge['Circuit'].str.replace(' Circuit', '')
df_judge['Circuit'].head()

0          Middle District of Georgia
1       Southern District of New York
2        Eastern District of New York
3    Western District of Pennsylvania
4                               Third
Name: Circuit, dtype: object

In [5]:
df_judge['Circuit'].replace(regex=[r'\bFirst\b', r'\b[\w\s]+Maine\b', r'\b[\w\s]+Massachusetts\b', r'\b[\w\s]+New Hampshire\b', r'\b[\w\s]+Rhode Island\b', r'\b[\w\s]+Puerto Rico\b'], value='1', inplace=True)
df_judge['Circuit'].replace(regex=[r'\bSecond\b', r'\b[\w\s]+Connecticut\b', r'\b[\w\s]+New York\b', r'\b[\w\s]+Vermont\b'], value='2', inplace=True)
df_judge['Circuit'].replace(regex=[r'\bThird\b', r'\b[\w\s]+Delaware\b', r'\b[\w\s]+New Jersey\b', r'\b[\w\s]+Pennsylvania\b', r'\b[\w\s]+Virgin Islands\b'], value='3', inplace=True)
df_judge['Circuit'].replace(regex=[r'\bFourth\b', r'\b[\w\s]+Maryland\b', r'\b[\w\s]+North Carolina\b', r'\b[\w\s]+South Carolina\b', r'\b[\w\s]+Virginia\b', r'\b[\w\s]+West Virginia\b'], value='4', inplace=True)
df_judge['Circuit'].replace(regex=[r'\bFifth\b', r'\b[\w\s]+Louisiana\b', r'\b[\w\s]+Mississippi\b', r'\b[\w\s]+Texas\b'], value='5', inplace=True)
df_judge['Circuit'].replace(regex=[r'\bSixth\b', r'\b[\w\s]+Kentucky\b', r'\b[\w\s]+Michigan\b', r'\b[\w\s]+Ohio\b', r'\b[\w\s]+Tennessee\b'], value='6', inplace=True)
df_judge['Circuit'].replace(regex=[r'\bSeventh\b', r'\b[\w\s]+Illinois\b', r'\b[\w\s]+Indiana\b', r'\b[\w\s]+Wisconsin\b'], value='7', inplace=True)
df_judge['Circuit'].replace(regex=[r'\bEighth\b', r'\b[\w\s]+Arkansas\b', r'\b[\w\s]+Iowa\b', r'\b[\w\s]+Minnesota\b', r'\b[\w\s]+Missouri\b', r'\b[\w\s]+Nebraska\b', r'\b[\w\s]+North Dakota\b', r'\b[\w\s]+South Dakota\b'], value='8', inplace=True)
df_judge['Circuit'].replace(regex=[r'\bNinth\b', r'\b[\w\s]+Alaska\b', r'\b[\w\s]+Arizona\b', r'\b[\w\s]+California\b', r'\b[\w\s]+Hawaii\b', r'\b[\w\s]+Idaho\b', r'\b[\w\s]+Montana\b', r'\b[\w\s]+Oregon\b', r'\b[\w\s]+Nevada\b', r'\b[\w\s]+Washington\b'], value='9', inplace=True)
df_judge['Circuit'].replace(regex=[r'\bTenth\b', r'\b[\w\s]+Colorado\b', r'\b[\w\s]+Kansas\b', r'\b[\w\s]+New Mexico\b', r'\b[\w\s]+Oklahoma\b', r'\b[\w\s]+Utah\b', r'\b[\w\s]+Wyoming\b'], value='10', inplace=True)
df_judge['Circuit'].replace(regex=[r'\bEleventh\b', r'\b[\w\s]+Alabama\b', r'\b[\w\s]+Florida\b', r'\b[\w\s]+Georgia\b'], value='11', inplace=True)
df_judge['Circuit'].replace(regex=[r'\b[\w\s]+Columbia[\w\s]?', r'\bFederal\b'], value='12', inplace=True)
df_judge['Circuit'].replace(regex=[r'Albemarle', r'District of Orleans', r'Edenton'], value='NaN', inplace=True)
df_judge['Circuit'].value_counts()

9      562
2      404
6      397
5      392
3      378
4      332
8      332
11     327
7      285
12     217
10     211
1      166
NaN      6
Name: Circuit, dtype: int64

** Then, from the 'Ayes/Nays' column, I break into two columns, 'Ayes' and 'Nays' which counts the number of vote at the Senate confirmation. **

In [6]:
df_judge['Ayes'] = df_judge['Ayes/Nays'].str.split('/',expand=True)[0].astype(float)
df_judge['Nays'] = df_judge['Ayes/Nays'].str.split('/', expand=True)[1].astype(float)
df_judge.head()

Unnamed: 0,nid,Sequence,Judge Name,Court Type,Court Name,Appointment Title,Appointing President,Party of Appointing President,Reappointing President,Party of Reappointing President,...,"Service as Chief Judge, Begin","Service as Chief Judge, End","2nd Service as Chief Judge, Begin","2nd Service as Chief Judge, End",Senior Status Date,Termination,Termination Date,Circuit,Ayes,Nays
0,1394646,1,"Abrams, Leslie Joyce",U.S. District Court,U.S. District Court for the Middle District of...,Judge,Barack Obama,Democratic,,,...,,,,,,,,11,100.0,0.0
1,1393931,1,"Abrams, Ronnie",U.S. District Court,U.S. District Court for the Southern District ...,Judge,Barack Obama,Democratic,,,...,,,,,,,,2,96.0,2.0
2,1376976,1,"Abruzzo, Matthew T.",U.S. District Court,U.S. District Court for the Eastern District o...,Judge,Franklin D. Roosevelt,Democratic,,,...,,,,,1966-02-15,Death,1971-05-28,2,,
3,1376981,1,"Acheson, Marcus Wilson",U.S. District Court,U.S. District Court for the Western District o...,Judge,Rutherford B. Hayes,Republican,,,...,,,,,,Appointment to Another Judicial Position,1891-02-09,3,,
4,1376981,2,"Acheson, Marcus Wilson",U.S. Circuit Court (1869-1911),U.S. Circuit Courts for the Third Circuit,Judge,Benjamin Harrison,Republican,,,...,,,,,,Death,1906-06-21,3,,


In [7]:
import requests
import json
import numpy as np
import pandas as pd
from pandas.io.json import json_normalize

In [8]:
with open('US_12_Dist.json') as json_data:
    geometry_data = json.load(json_data)

In [9]:
df = pd.DataFrame.from_dict(json_normalize(geometry_data['features']), orient='columns')
df['properties.name'] = df['properties.District_N'].str.replace('District of Columbia', '12')
df['properties.name'] = df['properties.name'].astype(int)
df['properties.headline'] = ['Eleventh Circuit', 'Ninth Circuit', 'Eighth Circuit', 'Tenth Circuit', 'Second Circuit', 'Fourth Circuit', 'District of Colombia', 'Seventh Circuit', 'Sixth Circuit', 'Fifth Circuit', 'First Circuit', 'Third Circuit']
df

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,Feature,4,Fourth Circuit
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,Feature,12,District of Colombia
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,Feature,7,Seventh Circuit
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,Feature,6,Sixth Circuit
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,Feature,5,Fifth Circuit


** First, I analaysed the Trump administration nominee. **

In [10]:
df_Trump = df_judge[df_judge['Appointing President'] == 'Donald J. Trump']
Trump_Nays = df_Trump.groupby('Circuit').mean()['Nays']
Trump_Nays

Circuit
10    26.000000
11    15.200000
12    15.750000
3     43.000000
4     14.000000
5     21.375000
6     19.555556
7     22.250000
8     30.333333
9      0.000000
Name: Nays, dtype: float64

In [11]:
Trump = pd.DataFrame(Trump_Nays)
Trump.to_csv('Trump.csv')
Trump = pd.read_csv('Trump.csv')
Trump.rename(index=str, columns={"Circuit": "properties.name", "Nays": "properties.article"}, inplace=True)
Trump

Unnamed: 0,properties.name,properties.article
0,10,26.0
1,11,15.2
2,12,15.75
3,3,43.0
4,4,14.0
5,5,21.375
6,6,19.555556
7,7,22.25
8,8,30.333333
9,9,0.0


In [12]:
df_merge = pd.merge(df, Trump, how='outer', on=['properties.name','properties.name'])
df_merge

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline,properties.article
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit,15.2
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit,0.0
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit,30.333333
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit,26.0
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit,
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,Feature,4,Fourth Circuit,14.0
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,Feature,12,District of Colombia,15.75
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,Feature,7,Seventh Circuit,22.25
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,Feature,6,Sixth Circuit,19.555556
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,Feature,5,Fifth Circuit,21.375


In [13]:
def color(row):
    if row['properties.article'] > 40:
        return '#FC4E2A'
    elif row['properties.article'] > 30:
        return '#FD8D3C'
    elif row['properties.article'] > 20:
        return '#FEB24C'
    elif row['properties.article'] > 10:
        return '#FED976'
    else:
        return '#FFEDA0'

In [14]:
df_merge['properties.color'] = df_merge.apply(color, axis=1)

In [15]:
df_merge['properties.group_id'] = 0
df_merge['properties.group_name'] = 'Trump'
df_merge.head()

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline,properties.article,properties.color,properties.group_id,properties.group_name
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit,15.2,#FED976,0,Trump
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit,0.0,#FFEDA0,0,Trump
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit,30.333333,#FD8D3C,0,Trump
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit,26.0,#FEB24C,0,Trump
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit,,#FFEDA0,0,Trump


** Then, compare with other presidencies. **

In [16]:
df_Obama = df_judge[df_judge['Appointing President'] == 'Barack Obama']
Obama_Nays = df_Obama.groupby('Circuit').mean()['Nays']
Obama_Nays

Circuit
1     13.000000
10     3.583333
11     1.789474
12    11.333333
2      9.730769
3     11.187500
4      8.842105
5      0.250000
6     11.125000
7      8.307692
8      7.428571
9     12.809524
Name: Nays, dtype: float64

In [17]:
Obama = pd.DataFrame(Obama_Nays)
Obama.to_csv('Obama.csv')
Obama = pd.read_csv('Obama.csv')
Obama.rename(index=str, columns={"Circuit": "properties.name", "Nays": "properties.article"}, inplace=True)
Obama

Unnamed: 0,properties.name,properties.article
0,1,13.0
1,10,3.583333
2,11,1.789474
3,12,11.333333
4,2,9.730769
5,3,11.1875
6,4,8.842105
7,5,0.25
8,6,11.125
9,7,8.307692


In [18]:
df_merge2 = pd.merge(df, Obama, how='outer', on=['properties.name','properties.name'])
df_merge2

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline,properties.article
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit,1.789474
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit,12.809524
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit,7.428571
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit,3.583333
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit,9.730769
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,Feature,4,Fourth Circuit,8.842105
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,Feature,12,District of Colombia,11.333333
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,Feature,7,Seventh Circuit,8.307692
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,Feature,6,Sixth Circuit,11.125
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,Feature,5,Fifth Circuit,0.25


In [19]:
df_merge2['properties.color'] = df_merge2.apply(color, axis=1)
df_merge2

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline,properties.article,properties.color
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit,1.789474,#FFEDA0
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit,12.809524,#FED976
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit,7.428571,#FFEDA0
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit,3.583333,#FFEDA0
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit,9.730769,#FFEDA0
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,Feature,4,Fourth Circuit,8.842105,#FFEDA0
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,Feature,12,District of Colombia,11.333333,#FED976
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,Feature,7,Seventh Circuit,8.307692,#FFEDA0
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,Feature,6,Sixth Circuit,11.125,#FED976
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,Feature,5,Fifth Circuit,0.25,#FFEDA0


In [20]:
df_merge2['properties.group_id'] = 1
df_merge2['properties.group_name'] = 'Obama'
df_merge2

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline,properties.article,properties.color,properties.group_id,properties.group_name
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit,1.789474,#FFEDA0,1,Obama
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit,12.809524,#FED976,1,Obama
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit,7.428571,#FFEDA0,1,Obama
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit,3.583333,#FFEDA0,1,Obama
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit,9.730769,#FFEDA0,1,Obama
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,Feature,4,Fourth Circuit,8.842105,#FFEDA0,1,Obama
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,Feature,12,District of Colombia,11.333333,#FED976,1,Obama
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,Feature,7,Seventh Circuit,8.307692,#FFEDA0,1,Obama
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,Feature,6,Sixth Circuit,11.125,#FED976,1,Obama
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,Feature,5,Fifth Circuit,0.25,#FFEDA0,1,Obama


In [21]:
df_complete = df_merge.append(df_merge2, sort=True)
df_complete

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,properties.article,properties.color,properties.group_id,properties.group_name,properties.headline,properties.name,type
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,15.2,#FED976,0,Trump,Eleventh Circuit,11,Feature
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,0.0,#FFEDA0,0,Trump,Ninth Circuit,9,Feature
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,30.333333,#FD8D3C,0,Trump,Eighth Circuit,8,Feature
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,26.0,#FEB24C,0,Trump,Tenth Circuit,10,Feature
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,,#FFEDA0,0,Trump,Second Circuit,2,Feature
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,14.0,#FED976,0,Trump,Fourth Circuit,4,Feature
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,15.75,#FED976,0,Trump,District of Colombia,12,Feature
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,22.25,#FEB24C,0,Trump,Seventh Circuit,7,Feature
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,19.555556,#FED976,0,Trump,Sixth Circuit,6,Feature
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,21.375,#FEB24C,0,Trump,Fifth Circuit,5,Feature


In [22]:
df_Bush = df_judge[df_judge['Appointing President'] == 'George W. Bush']
Bush_Nays = df_Bush.groupby('Circuit').mean()['Nays']
Bush_Nays

Circuit
1      0.000000
10     5.352941
11     2.647059
12    14.714286
2      0.000000
3      1.565217
4      3.142857
5      3.904762
6      5.368421
7      3.000000
8      4.800000
9      0.526316
Name: Nays, dtype: float64

In [23]:
Bush = pd.DataFrame(Bush_Nays)
Bush.to_csv('Bush.csv')
Bush = pd.read_csv('Bush.csv')
Bush.rename(index=str, columns={"Circuit": "properties.name", "Nays": "properties.article"}, inplace=True)
Bush

Unnamed: 0,properties.name,properties.article
0,1,0.0
1,10,5.352941
2,11,2.647059
3,12,14.714286
4,2,0.0
5,3,1.565217
6,4,3.142857
7,5,3.904762
8,6,5.368421
9,7,3.0


In [24]:
df_merge3 = pd.merge(df, Bush, how='outer', on=['properties.name','properties.name'])
df_merge3

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline,properties.article
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit,2.647059
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit,0.526316
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit,4.8
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit,5.352941
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit,0.0
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,Feature,4,Fourth Circuit,3.142857
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,Feature,12,District of Colombia,14.714286
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,Feature,7,Seventh Circuit,3.0
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,Feature,6,Sixth Circuit,5.368421
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,Feature,5,Fifth Circuit,3.904762


In [25]:
df_merge3['properties.color'] = df_merge3.apply(color, axis=1)
df_merge3

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline,properties.article,properties.color
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit,2.647059,#FFEDA0
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit,0.526316,#FFEDA0
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit,4.8,#FFEDA0
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit,5.352941,#FFEDA0
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit,0.0,#FFEDA0
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,Feature,4,Fourth Circuit,3.142857,#FFEDA0
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,Feature,12,District of Colombia,14.714286,#FED976
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,Feature,7,Seventh Circuit,3.0,#FFEDA0
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,Feature,6,Sixth Circuit,5.368421,#FFEDA0
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,Feature,5,Fifth Circuit,3.904762,#FFEDA0


In [26]:
df_merge3['properties.group_id'] = 2
df_merge3['properties.group_name'] = 'Bush'
df_merge3

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline,properties.article,properties.color,properties.group_id,properties.group_name
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit,2.647059,#FFEDA0,2,Bush
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit,0.526316,#FFEDA0,2,Bush
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit,4.8,#FFEDA0,2,Bush
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit,5.352941,#FFEDA0,2,Bush
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit,0.0,#FFEDA0,2,Bush
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,Feature,4,Fourth Circuit,3.142857,#FFEDA0,2,Bush
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,Feature,12,District of Colombia,14.714286,#FED976,2,Bush
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,Feature,7,Seventh Circuit,3.0,#FFEDA0,2,Bush
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,Feature,6,Sixth Circuit,5.368421,#FFEDA0,2,Bush
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,Feature,5,Fifth Circuit,3.904762,#FFEDA0,2,Bush


In [27]:
df_complete = df_complete.append(df_merge3, sort=True)
df_complete

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,properties.article,properties.color,properties.group_id,properties.group_name,properties.headline,properties.name,type
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,15.2,#FED976,0,Trump,Eleventh Circuit,11,Feature
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,0.0,#FFEDA0,0,Trump,Ninth Circuit,9,Feature
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,30.333333,#FD8D3C,0,Trump,Eighth Circuit,8,Feature
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,26.0,#FEB24C,0,Trump,Tenth Circuit,10,Feature
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,,#FFEDA0,0,Trump,Second Circuit,2,Feature
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,14.0,#FED976,0,Trump,Fourth Circuit,4,Feature
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,15.75,#FED976,0,Trump,District of Colombia,12,Feature
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,22.25,#FEB24C,0,Trump,Seventh Circuit,7,Feature
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,19.555556,#FED976,0,Trump,Sixth Circuit,6,Feature
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,21.375,#FEB24C,0,Trump,Fifth Circuit,5,Feature


In [28]:
df_Clinton = df_judge[df_judge['Appointing President'] == 'William J. Clinton']
Clinton_Nays = df_Clinton.groupby('Circuit').mean()['Nays']
Clinton_Nays

Circuit
1           NaN
10     5.000000
11    12.666667
12    16.000000
2      9.857143
3      8.833333
4      1.000000
5      8.000000
6      0.250000
7      0.500000
8      1.333333
9     17.066667
Name: Nays, dtype: float64

In [29]:
Clinton = pd.DataFrame(Clinton_Nays)
Clinton.to_csv('Clinton.csv')
Clinton = pd.read_csv('Clinton.csv')
Clinton.rename(index=str, columns={"Circuit": "properties.name", "Nays": "properties.article"}, inplace=True)
Clinton

Unnamed: 0,properties.name,properties.article
0,1,
1,10,5.0
2,11,12.666667
3,12,16.0
4,2,9.857143
5,3,8.833333
6,4,1.0
7,5,8.0
8,6,0.25
9,7,0.5


In [30]:
df_merge4 = pd.merge(df, Clinton, how='outer', on=['properties.name','properties.name'])
df_merge4

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline,properties.article
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit,12.666667
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit,17.066667
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit,1.333333
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit,5.0
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit,9.857143
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,Feature,4,Fourth Circuit,1.0
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,Feature,12,District of Colombia,16.0
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,Feature,7,Seventh Circuit,0.5
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,Feature,6,Sixth Circuit,0.25
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,Feature,5,Fifth Circuit,8.0


In [31]:
df_merge4['properties.color'] = df_merge4.apply(color, axis=1)
df_merge4

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline,properties.article,properties.color
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit,12.666667,#FED976
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit,17.066667,#FED976
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit,1.333333,#FFEDA0
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit,5.0,#FFEDA0
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit,9.857143,#FFEDA0
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,Feature,4,Fourth Circuit,1.0,#FFEDA0
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,Feature,12,District of Colombia,16.0,#FED976
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,Feature,7,Seventh Circuit,0.5,#FFEDA0
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,Feature,6,Sixth Circuit,0.25,#FFEDA0
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,Feature,5,Fifth Circuit,8.0,#FFEDA0


In [32]:
df_merge4['properties.group_id'] = 3
df_merge4['properties.group_name'] = 'Clinton'
df_merge4

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,type,properties.name,properties.headline,properties.article,properties.color,properties.group_id,properties.group_name
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,Feature,11,Eleventh Circuit,12.666667,#FED976,3,Clinton
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,Feature,9,Ninth Circuit,17.066667,#FED976,3,Clinton
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,Feature,8,Eighth Circuit,1.333333,#FFEDA0,3,Clinton
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,Feature,10,Tenth Circuit,5.0,#FFEDA0,3,Clinton
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,Feature,2,Second Circuit,9.857143,#FFEDA0,3,Clinton
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,Feature,4,Fourth Circuit,1.0,#FFEDA0,3,Clinton
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,Feature,12,District of Colombia,16.0,#FED976,3,Clinton
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,Feature,7,Seventh Circuit,0.5,#FFEDA0,3,Clinton
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,Feature,6,Sixth Circuit,0.25,#FFEDA0,3,Clinton
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,Feature,5,Fifth Circuit,8.0,#FFEDA0,3,Clinton


In [33]:
df_complete = df_complete.append(df_merge4, sort=True)
df_complete

Unnamed: 0,geometry.coordinates,geometry.type,properties.District_N,properties.article,properties.color,properties.group_id,properties.group_name,properties.headline,properties.name,type
0,"[[[[-87.9870452879999, 35.0075187680001], [-86...",MultiPolygon,11,15.2,#FED976,0,Trump,Eleventh Circuit,11,Feature
1,"[[[[-109.044883728, 36.9986305240002], [-109.0...",MultiPolygon,9,0.0,#FFEDA0,0,Trump,Ninth Circuit,9,Feature
2,"[[[[-89.7169418329998, 36.0015182500001], [-89...",MultiPolygon,8,30.333333,#FD8D3C,0,Trump,Eighth Circuit,8,Feature
3,"[[[-104.052841187, 41.00169754], [-102.9998245...",Polygon,10,26.0,#FEB24C,0,Trump,Tenth Circuit,10,Feature
4,"[[[[-71.9642639159999, 41.3409652710002], [-71...",MultiPolygon,2,,#FFEDA0,0,Trump,Second Circuit,2,Feature
5,"[[[[-75.5417556759999, 39.4506607060001], [-75...",MultiPolygon,4,14.0,#FED976,0,Trump,Fourth Circuit,4,Feature
6,"[[[-77.0261611939999, 38.801475525], [-77.0201...",Polygon,District of Columbia,15.75,#FED976,0,Trump,District of Colombia,12,Feature
7,"[[[[-90.2371749879999, 41.6840248110001], [-90...",MultiPolygon,7,22.25,#FEB24C,0,Trump,Seventh Circuit,7,Feature
8,"[[[[-82.5927886959998, 38.4185943600001], [-82...",MultiPolygon,6,19.555556,#FED976,0,Trump,Sixth Circuit,6,Feature
9,"[[[[-90.8935928339999, 29.0467777250001], [-90...",MultiPolygon,5,21.375,#FEB24C,0,Trump,Fifth Circuit,5,Feature


In [34]:
ok_json = json.loads(df_complete.to_json(orient='records'))

In [35]:
def process_to_geojson(file):
    geo_data = {"type": "FeatureCollection", "features":[]}
    for row in file:
        this_dict = {"type": "Feature", "properties":{}, "geometry": {}}
        for key, value in row.items():
            key_names = key.split('.')
            if key_names[0] == 'geometry':
                this_dict['geometry'][key_names[1]] = value
            if str(key_names[0]) == 'properties':
                this_dict['properties'][key_names[1]] = value
        geo_data['features'].append(this_dict)
    return geo_data

In [36]:
geo_format = process_to_geojson(ok_json)

In [37]:
with open('geo-data.js', 'w') as outfile:
    json.dump(geo_format, outfile)