# Load NY Bills

We need to page through all of the legislation (tens of thousands of bills), 1,000 bills at a time. For each page, extract the bill IDs, then request the text of each bill one by one. After retrieving the bill text, store it in a database on AWS along with some associated metadata.

## First, load the API key

In [1]:
import requests
import time
my_key = open('/Users/joeljoel/ny_bill_keys.txt', 'r').readline().strip()
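
The cell above leaves the file handle open and hard-codes the key path. A minimal sketch of an alternative, assuming the key may also be supplied through an environment variable; the function name `load_api_key` and the variable name `NY_BILL_API_KEY` are hypothetical and not part of the original notebook:

```python
import os
from pathlib import Path


def load_api_key(path, env_var="NY_BILL_API_KEY"):
    """Return the key from `env_var` if it is set, else the first line of `path`."""
    key = os.environ.get(env_var)  # env_var name is an assumption for illustration
    if key:
        return key.strip()
    # The context manager closes the file even if reading fails
    with open(Path(path).expanduser(), "r") as handle:
        return handle.readline().strip()
```

With this helper, the assignment above could become `my_key = load_api_key('/Users/joeljoel/ny_bill_keys.txt')`.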

In [3]:
# Set up the database that will hold the New York bill table
# There will be one table for the New York bills and one for U.S. bills
## Python packages - you may have to pip install sqlalchemy, sqlalchemy_utils, and psycopg2.
from sqlalchemy import create_engine
from sqlalchemy_utils import database_exists, create_database
import psycopg2
import pandas as pd

In [4]:
# Define the database name and the database user
dbname = 'bills_db'
username = 'joeljoel'
## 'engine' is a connection to a database
## Here, we're using postgres, but sqlalchemy can connect to other things too.
engine = create_engine('postgresql://%s@localhost/%s'%(username,dbname))
print(engine.url)

## create a database (if it doesn't exist)
if not database_exists(engine.url):
    create_database(engine.url)
print(database_exists(engine.url))

postgresql://joeljoel@localhost/bills_db
True
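
The connection string above works for a passwordless local user. If the database later moves to a remote host, such as the AWS instance mentioned in the introduction, the URL also needs a host and usually a password. A sketch of building such a URL with the standard library only; the helper name `postgres_url` and its `password` and `port` parameters are assumptions for illustration:

```python
from urllib.parse import quote_plus


def postgres_url(username, dbname, host="localhost", password=None, port=None):
    """Build a postgresql:// URL, percent-encoding the user name and password."""
    user = quote_plus(username)
    if password is not None:
        # Characters such as '@' or '/' in a password would otherwise break the URL
        user += ":" + quote_plus(password)
    netloc = host if port is None else "{0}:{1}".format(host, port)
    return "postgresql://{0}@{1}/{2}".format(user, netloc, dbname)


print(postgres_url("joeljoel", "bills_db"))
```

Called with only a user name and a database name, it reproduces the local URL printed above, so the result can be passed to `create_engine` unchanged.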


## Create a base class for all relevant tables connected to New York bills

In [5]:
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()


## Define the New York Bill table with a unique number, a name, and text

In [6]:
from sqlalchemy import Column, Integer, String
class New_York_Bill(Base):
    __tablename__ = 'ny_bills'
    bill_num = Column(String, primary_key=True)
    bill_name = Column(String)
    bill_text = Column(String)

    def __repr__(self):
        return "<New_York_Bill(bill_num='%s', bill_name='%s', bill_text='%s')>" % (
            self.bill_num, self.bill_name, self.bill_text)
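
Because `__repr__` embeds the whole `bill_text`, displaying a single row prints the entire bill. A sketch of a shortened representation using `textwrap.shorten` from the standard library; the helper name `short_repr` and the default `width` are hypothetical:

```python
import textwrap


def short_repr(bill_num, bill_name, bill_text, width=60):
    """Format a bill like New_York_Bill.__repr__, with the text cut to `width` characters."""
    # shorten() collapses runs of whitespace and truncates at a word boundary
    preview = textwrap.shorten(bill_text or "", width=width, placeholder="...")
    return "<New_York_Bill(bill_num={0!r}, bill_name={1!r}, bill_text={2!r})>".format(
        bill_num, bill_name, preview)
```

Inside the class, the same idea would be `return short_repr(self.bill_num, self.bill_name, self.bill_text)`.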

In [7]:
ny_bills_table = New_York_Bill.__table__

In [8]:
# Actually create the table
Base.metadata.create_all(engine)
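
Since `bill_num` is the primary key, inserting the same bill twice is rejected. That is why the table has to be emptied or dropped before the loader is run again. The behavior can be illustrated offline with the standard-library `sqlite3` module standing in for PostgreSQL; this is a swapped-in database for demonstration only, and the row values are placeholders:

```python
import sqlite3

# In-memory SQLite table with the same three columns as ny_bills
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ny_bills (bill_num TEXT PRIMARY KEY, bill_name TEXT, bill_text TEXT)")
conn.execute("INSERT INTO ny_bills VALUES (?, ?, ?)", ("A454A", "title", "text"))

try:
    # Second insert with the same primary key
    conn.execute("INSERT INTO ny_bills VALUES (?, ?, ?)", ("A454A", "title again", "text"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM ny_bills").fetchone()[0]
print(duplicate_rejected, row_count)
```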

In [9]:
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
session = Session()

In [10]:
# ny_bills_table.drop(engine)
# This seems painful. Drop the table from the command line before running the command below.

In [11]:
#requests.get('http://legislation.nysenate.gov/api/3/bills/2015/A02257?view=only_fullText&key=' + my_key).json()

## Access and store NY bills

In [12]:
# Loop over the bill listing 1,000 bills at a time until we have received every bill
offset = 0
year = 2015
limit = 1000
# limit = 10
key = my_key
my_max = 50000
# my_max = 50
# all requests go through the NY Senate legislation API (legislation.nysenate.gov)
request_string = 'http://legislation.nysenate.gov/api/3/bills/{0}?limit={1}&offset={2}&key={3}'.format(year, 
                                                                                                        limit, 
                                                                                                        offset,
                                                                                                        key)
# Request the first page of bills (at most `limit` of them) and parse the JSON response
all_bills = requests.get(request_string).json()
# print(all_bills)
start_time = time.time()
right_now = start_time
while ((all_bills['responseType'] == 'bill-info list') and offset < my_max):
    # print(all_bills['offsetStart'])
    # update offset to get the next 1000 bills
    offset += limit
    request_string = 'http://legislation.nysenate.gov/api/3/bills/{0}?limit={1}&offset={2}&key={3}'.format(year, 
                                                                                                        limit, 
                                                                                                        offset,
                                                                                                        key)
    all_bills = requests.get(request_string).json()
    
    if (all_bills['responseType'] == 'bill-info list'):
        # unfortunately, we need to access the text of the bills one-by-one
        for i, bill in enumerate(all_bills['result']['items']):
            bill_num = bill['printNo']
            single_request = 'http://legislation.nysenate.gov/api/3/bills/{0}/{1}?view=only_fullText&key={2}'.format(
            year, bill_num, my_key)
            bill_data = requests.get(single_request).json()
            bill_text = bill_data['result']['fullText']
            
            # prepare the bill for upload into the table
            one_bill = New_York_Bill(bill_num=bill_num, bill_name=bill['title'], bill_text=bill_text)
            session.add(one_bill)
            # we may overload the API if we move too quickly, so pause for a hundredth of a second
            time.sleep(.01)
            if i % 100 == 0:
                last_one = right_now
                right_now = time.time()
                print(i, bill_num, "delta", right_now - last_one, "cum delta", right_now - start_time)
    time.sleep(2)
    session.commit()

0 A454A delta 3.930670976638794 cum delta 3.930670976638794
100 S628 delta 85.08962512016296 cum delta 89.02029609680176
200 S722 delta 85.84051990509033 cum delta 174.8608160018921
300 A549 delta 85.78867602348328 cum delta 260.64949202537537
400 A628 delta 87.2665159702301 cum delta 347.91600799560547
500 S855 delta 85.38441205024719 cum delta 433.30042004585266
600 S935 delta 86.1633870601654 cum delta 519.4638071060181
700 A746 delta 84.55093789100647 cum delta 604.0147449970245
800 J22 delta 85.22870182991028 cum delta 689.2434468269348
900 J66 delta 84.82132029533386 cum delta 774.0647671222687
0 A908 delta 87.78514575958252 cum delta 861.8499128818512
100 A982 delta 87.8155951499939 cum delta 949.6655080318451
200 S1056 delta 84.06251215934753 cum delta 1033.7280201911926
300 S1097 delta 84.70354199409485 cum delta 1118.4315621852875
400 A1156A delta 86.0862627029419 cum delta 1204.5178248882294
500 A1232A delta 87.80303716659546 cum delta 1292.3208620548248
600 S1209 delta 82.7
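
Note that the loop above increments `offset` before it processes a page, so the page requested at offset 0 only starts the loop and its bills are never stored. A sketch of a paging generator that yields the first page as well. The fetch function is injected so the sketch runs offline; `iter_bills`, `fake_fetch`, and the `'empty list'` response type are hypothetical stand-ins, and whether the real API counts offsets from 0 or 1 is not checked here:

```python
def iter_bills(fetch_page, limit=1000, max_offset=50000):
    """Yield every bill from every page, starting with the page at offset 0."""
    offset = 0
    while offset < max_offset:
        page = fetch_page(offset, limit)
        if page.get('responseType') != 'bill-info list':
            break
        items = page['result']['items']
        if not items:
            break
        for bill in items:
            yield bill
        # Advance only after the current page has been consumed
        offset += limit


def fake_fetch(offset, limit, total=2500):
    """Hypothetical in-memory replacement for the HTTP request."""
    items = [{'printNo': 'X{0}'.format(n)}
             for n in range(offset, min(offset + limit, total))]
    if not items:
        return {'responseType': 'empty list', 'result': {'items': []}}
    return {'responseType': 'bill-info list', 'result': {'items': items}}


bills = list(iter_bills(fake_fetch))
print(len(bills), bills[0]['printNo'], bills[-1]['printNo'])
```

In the notebook, `fetch_page` would wrap `requests.get(request_string).json()`, and the body of the `for` loop would do the per-bill text request and `session.add` as before.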

## Once the bills are all loaded into PostgreSQL, explore a bit

In [13]:
from sqlalchemy import text
result = session.query(New_York_Bill).from_statement(text("SELECT * FROM ny_bills"))

In [14]:
all_bills = result.all()

In [15]:
len(all_bills)

25613
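
One quick way to explore the loaded rows is to tally bill numbers by their leading letter, which appears to identify where a bill originated (the `A` bills shown in this notebook are Assembly bills). A sketch using `collections.Counter`; the helper name `count_by_prefix` is hypothetical and the sample list reuses bill numbers printed elsewhere in this notebook:

```python
from collections import Counter


def count_by_prefix(bill_nums):
    """Count bill numbers by their first character, skipping empty values."""
    return Counter(num[0] for num in bill_nums if num)


sample = ['A454A', 'S628', 'S722', 'A549', 'J22', 'J66', 'A10753']
print(count_by_prefix(sample))
```

On the full result set this could be called as `count_by_prefix(b.bill_num for b in all_bills)`.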

In [16]:
all_bills[0]

<New_York_Bill(bill_num='A454A', bill_name='Relates to the effectiveness of flexible rating for nonbusiness automobile insurance plans', bill_text='
                           S T A T E   O F   N E W   Y O R K
       ________________________________________________________________________

                                          454

                              2015-2016 Regular Sessions

                                 I N  A S S E M B L Y

                                      (PREFILED)

                                    January 7, 2015
                                      ___________

       Introduced  by  M.  of  A.  CYMBROWITZ  -- read once and referred to the
         Committee on Insurance

       AN ACT to amend the insurance law, in relation to the  effectiveness  of
         flexible rating for nonbusiness automobile insurance plans

         THE  PEOPLE OF THE STATE OF NEW YORK, REPRESENTED IN SENATE AND ASSEM-
       BLY, DO ENACT AS FOLLOWS:

    1    Section 1. 

In [17]:
all_bills[-1]

<New_York_Bill(bill_num='A10753', bill_name='Relates to the appointment of interpreters to be used in parole board proceedings', bill_text='
                           S T A T E   O F   N E W   Y O R K
       ________________________________________________________________________

                                         10753

                                 I N  A S S E M B L Y

                                   December 23, 2016
                                      ___________

       Introduced  by  COMMITTEE ON RULES -- (at request of M. of A. Sepulveda)
         -- read once and referred to the Committee on Codes

       AN ACT to amend the executive law, in relation  to  the  appointment  of
         interpreters to be used in parole board proceedings

         THE  PEOPLE OF THE STATE OF NEW YORK, REPRESENTED IN SENATE AND ASSEM-
       BLY, DO ENACT AS FOLLOWS:

    1    Section 1. Subdivision 8 of section 259-i of  the  executive  law,  as
    2  added  by  a  chapter  of

In [18]:
session.close()