In [1]:
import pandas as pd # works with data frames (tidy tabular data)
import numpy as np # works with arrays and matrices
import requests # conducts web transactions
import json # works with JSON-formatted data
import dotenv # works with .env files
import os # accesses operating system features such as environment variables
import re # regular expressions

API documentation: https://legiscan.com/gaits/documentation/legiscan

Step 1: Register for a LegiScan API key

Step 2: Bring that key into the code WITHOUT copy-pasting the key into this file

In [2]:
dotenv.load_dotenv() # finds and loads (silently) the .env file
legiscan_key = os.getenv('legiscan_key')
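If the .env file is missing or the variable name is misspelled, os.getenv() quietly returns None, and the API calls below then fail with a less obvious error. A minimal guard might look like this; require_env is a hypothetical helper name, not part of python-dotenv.

```python
import os

def require_env(name: str) -> str:
    """Return the environment variable `name`, or raise a clear error if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check that your .env file defines it")
    return value

# Hypothetical usage, after dotenv.load_dotenv():
# legiscan_key = require_env('legiscan_key')
```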

Step 3: Use the API key to access the data we want

An API request is a URL, generally constructed as: root / endpoint ? parameters

1. Find the right root
2. Find the right endpoint (LegiScan doesn't use separate endpoints; it selects the operation with a parameter called 'op')
3. Find the right parameters
4. Learn how this API wants us to supply the API key
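To see what the root and parameters turn into, the URL can be assembled by hand with the standard library. This sketch uses the placeholder 'APIKEY' instead of a real key; requests.get(root, params=params) should build essentially the same URL from the same dictionary.

```python
from urllib.parse import urlencode

root = 'https://api.legiscan.com'
params = {'key': 'APIKEY',   # placeholder; never hard-code your real key
          'op': 'getBill',   # the operation, which this API uses instead of an endpoint
          'id': '1167968'}   # the bill ID

# root / (no endpoint) ? parameters, joined as key=value pairs separated by &
url = root + '/?' + urlencode(params)
print(url)  # https://api.legiscan.com/?key=APIKEY&op=getBill&id=1167968
```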

In [3]:
root = 'https://api.legiscan.com'
params = {'key': legiscan_key,
         'op': 'getBill',
         'id': '1167968'}
r = requests.get(root, params=params)
r

<Response [200]>

In [4]:
myjson = json.loads(r.text)

In [5]:
pd.json_normalize(myjson, record_path = ['bill','texts'])

,doc_id,date,type,type_id,mime,mime_id,url,state_link,text_size,text_hash,alt_bill_text,alt_mime,alt_mime_id,alt_state_link,alt_text_size,alt_text_hash
0,1868195,2019-01-23,Introduced,1,application/pdf,2,https://legiscan.com/MD/text/SB181/id/1868195,https://mgaleg.maryland.gov/2019RS/bills/sb/sb...,85467,423ba752efdfa002d991006e2b358a7f,0,,0,,0,
1,1917357,2019-02-19,Engrossed,4,application/pdf,2,https://legiscan.com/MD/text/SB181/id/1917357,https://mgaleg.maryland.gov/2019RS/bills/sb/sb...,86322,d23002e61f4eba5cfb4e3c4d25890b18,0,,0,,0,
2,2034836,2019-06-07,Chaptered,6,application/pdf,2,https://legiscan.com/MD/text/SB181/id/2034836,https://mgaleg.maryland.gov/2019RS/Chapters_no...,77961,083666935344581572f6df42e7fe6c5c,0,,0,,0,
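The texts list holds one entry per version of the bill (here Introduced, Engrossed, and Chaptered). Rather than always taking the first entry, one could pick a version by type or take the most recent one. This is a sketch: pick_text is a hypothetical helper, and the toy records are copied from the table above with most fields omitted.

```python
def pick_text(texts, doc_type=None):
    """Return the text record of the given type (e.g. 'Chaptered'),
    or the most recent version if doc_type is None. Returns None if nothing matches."""
    if doc_type is not None:
        matches = [t for t in texts if t['type'] == doc_type]
        return matches[0] if matches else None
    if not texts:
        return None
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings
    return max(texts, key=lambda t: t['date'])

texts = [
    {'doc_id': 1868195, 'date': '2019-01-23', 'type': 'Introduced'},
    {'doc_id': 1917357, 'date': '2019-02-19', 'type': 'Engrossed'},
    {'doc_id': 2034836, 'date': '2019-06-07', 'type': 'Chaptered'},
]
print(pick_text(texts)['type'])                  # Chaptered
print(pick_text(texts, 'Introduced')['doc_id'])  # 1868195
```

With live data, the same call would be pick_text(myjson['bill']['texts']).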


The next two questions are:

1. How to find the bill ID in a systematic and automated way
2. How to find machine-readable bill text without web scraping or extracting it from a PDF, if at all possible

To do:
* Install the Python packages on your computer. Type on the command line:
    * pip3 install pandas
    * pip3 install numpy
    * pip3 install requests
    * pip3 install python-dotenv
    * pip3 install PyPDF2
    * pip3 install jupyter
    * pip3 install jupyterlab
* Open the terminal and use cd to move into the folder where you want to save the legiscan_api folder. Then type: git clone https://github.com/jkropko/legiscan_api
* On the command line, type: jupyter lab (this launches JupyterLab), then open the api_access.ipynb file
* In a new terminal window, inside the legiscan_api folder, type: touch .env
* Then: open .env (on macOS; on other systems, open the file in any text editor)
* Inside the .env file type legiscan_key=.... where the dots are your own key. Then save
* Then run every cell in api_access.ipynb to check that it works

## How to find the bill ID in a systematic and automated way


In [6]:
#https://api.legiscan.com/?key=APIKEY&op=getSessionList
params = {'key': legiscan_key,
         'op': 'getSessionList'}
r = requests.get(root, params=params)
r

<Response [200]>

In [7]:
myjson = json.loads(r.text)
session_df = pd.json_normalize(myjson, record_path = ['sessions'])

In [8]:
session_df

,session_id,state_id,year_start,year_end,prefile,sine_die,prior,special,session_tag,session_title,session_name,dataset_hash,session_hash,name
0,2148,1,2025,2025,1,0,0,0,Regular Session,2025 Regular Session,2025 Regular Session,931bedc30216cb40bd36155181e3ea66,931bedc30216cb40bd36155181e3ea66,2025 Regular Session
1,2103,1,2024,2024,0,1,1,0,Regular Session,2024 Regular Session,Regular Session 2024,8163cf95b36859f0f327c0576e729221,8163cf95b36859f0f327c0576e729221,Regular Session 2024
2,2060,1,2023,2023,0,1,1,1,2nd Special Session,2023 2nd Special Session,Second Special Session 2023,c9c74795f8e51bd3b8e4da60ce85d424,c9c74795f8e51bd3b8e4da60ce85d424,Second Special Session 2023
3,2048,1,2023,2023,0,1,1,1,1st Special Session,2023 1st Special Session,First Special Session 2023,72239fe0272a1f14b4f5de3370c32172,72239fe0272a1f14b4f5de3370c32172,First Special Session 2023
4,2014,1,2023,2023,0,1,1,0,Regular Session,2023 Regular Session,Regular Session 2023,d17b533ce7196680956ab4bbbb4cfb99,d17b533ce7196680956ab4bbbb4cfb99,Regular Session 2023
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
885,1435,52,2017,2018,0,1,1,0,Regular Session,2017-2018 Regular Session,115th Congress,0f28f2be28536b4e03913d72f88a5451,0f28f2be28536b4e03913d72f88a5451,115th Congress
886,1156,52,2015,2016,0,1,1,0,Regular Session,2015-2016 Regular Session,114th Congress,84eb8ad3003a508f91ad26247ad731ef,84eb8ad3003a508f91ad26247ad731ef,114th Congress
887,1026,52,2013,2014,0,1,1,0,Regular Session,2013-2014 Regular Session,113th Congress,3c139f1f45314574d030410e6acb24d0,3c139f1f45314574d030410e6acb24d0,113th Congress
888,84,52,2011,2012,0,1,1,0,Regular Session,2011-2012 Regular Session,112th Congress,0474b6bbcefe5106043ce4161d446ba0,0474b6bbcefe5106043ce4161d446ba0,112th Congress


In [9]:
session_ids = session_df['session_id']
session_ids

0      2148
1      2103
2      2060
3      2048
4      2014
       ... 
885    1435
886    1156
887    1026
888      84
889      77
Name: session_id, Length: 890, dtype: int64

In [10]:
session_ids[0]

2148
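session_ids[0] is simply the first row of the list. To choose a session by its attributes rather than its position, one could filter the records in myjson['sessions'] on the fields shown in session_df. In this sketch, latest_regular_session is a hypothetical helper. It assumes state_id, year_start, and special arrive as integers, as they appear in the table, and the toy rows are copied from that table.

```python
def latest_regular_session(sessions, state_id):
    """Return the session_id of the most recent non-special session for state_id, or None."""
    regular = [s for s in sessions
               if s['state_id'] == state_id and s['special'] == 0]
    if not regular:
        return None
    return max(regular, key=lambda s: s['year_start'])['session_id']

sessions = [
    {'session_id': 2148, 'state_id': 1, 'year_start': 2025, 'special': 0},
    {'session_id': 2103, 'state_id': 1, 'year_start': 2024, 'special': 0},
    {'session_id': 2060, 'state_id': 1, 'year_start': 2023, 'special': 1},
    {'session_id': 1435, 'state_id': 52, 'year_start': 2017, 'special': 0},
    {'session_id': 1156, 'state_id': 52, 'year_start': 2015, 'special': 0},
]
print(latest_regular_session(sessions, 1))   # 2148
print(latest_regular_session(sessions, 52))  # 1435
```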

In [11]:
params = {'key': legiscan_key,
         'op': 'getMasterList',
         'id': session_ids[0]}
r = requests.get(root, params=params)
r

<Response [200]>

In [12]:
myjson = json.loads(r.text)

In [13]:
#myjson

In [14]:
myjson = myjson['masterlist']

In [15]:
del myjson['session']

In [16]:
bill_df = pd.DataFrame(myjson).T
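The masterlist object is a dictionary with one entry per bill, keyed '0', '1', '2', ... (JSON object keys are always strings), plus a 'session' entry holding session metadata. That is why 'session' is deleted before transposing. This stdlib sketch does the same reshaping without mutating myjson; masterlist_records is a hypothetical helper and the toy dictionary is abbreviated.

```python
def masterlist_records(masterlist):
    """Return the bill records from a getMasterList 'masterlist' dict,
    skipping the 'session' metadata entry."""
    return [bill for key, bill in masterlist.items() if key != 'session']

toy = {
    'session': {'session_id': 2148, 'session_name': '2025 Regular Session'},
    '0': {'bill_id': 1886114, 'number': 'HB1'},
    '1': {'bill_id': 1886274, 'number': 'HB2'},
}
records = masterlist_records(toy)
print([r['bill_id'] for r in records])  # [1886114, 1886274]
```

Passing such a list to pd.DataFrame(records) gives one row per bill and lets pandas infer a type for each column, whereas transposing a frame built from the dictionary leaves every column with object dtype (as bill_ids shows below).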

In [17]:
bill_df

,bill_id,number,change_hash,url,status_date,status,last_action_date,last_action,title,description
0,1886114,HB1,f57d9a6d3695793c45d5f3542e0a936e,https://legiscan.com/AL/bill/HB1/2025,2025-02-04,1,2025-02-04,"Pending House Ports, Waterways & Intermodal Tr...","Seafood, to assess a fee on seafood dealers de...","Seafood, to assess a fee on seafood dealers de..."
1,1886274,HB2,9dca43a78d8c1dcaa115a86572931b5c,https://legiscan.com/AL/bill/HB2/2025,2025-02-04,1,2025-02-04,Pending House Judiciary,"Vaccines, parental consent for minor to receiv...","Vaccines, parental consent for minor to receiv..."
2,1886289,HB3,b9415781e0e4b077cf3fdafbf8789b88,https://legiscan.com/AL/bill/HB3/2025,2025-02-04,1,2025-02-04,Pending House Judiciary,Crimes and offenses; conviction of illegal ali...,Crimes and offenses; conviction of illegal ali...
3,1886100,HB4,56ffa299090fcee18f61127ba19e5fb6,https://legiscan.com/AL/bill/HB4/2025,2025-02-04,1,2025-02-04,Pending House Judiciary,"Crimes and offenses, further provides for obsc...","Crimes and offenses, further provides for obsc..."
4,1886187,HB5,e12d32ead83183811b7ebc55c84a17ed,https://legiscan.com/AL/bill/HB5/2025,2025-02-04,1,2025-02-04,Pending House Ways and Means General Fund,Alabama State Law Enforcement Agency; salary a...,Alabama State Law Enforcement Agency; salary a...
...,...,...,...,...,...,...,...,...,...,...
64,1885910,SB7,86e88436358829044acdbb8e9da5b400,https://legiscan.com/AL/bill/SB7/2025,2025-02-04,1,2025-02-04,Pending Senate Judiciary,Elections; Alabama Voting Rights Act Commissio...,Elections; Alabama Voting Rights Act Commissio...
65,1886012,SB8,ca931228c107045ccdf0d5b28a9e2c2d,https://legiscan.com/AL/bill/SB8/2025,2025-02-04,1,2025-02-04,Pending Senate Education Policy,Teacher certification; American History and Ci...,Teacher certification; American History and Ci...
66,1886486,SB9,5fb0f8da770a152c25c402ffb58cb49a,https://legiscan.com/AL/bill/SB9/2025,2025-02-04,1,2025-02-04,Pending Senate Judiciary,"Alabama Athletic Commission, Attorney General ...","Alabama Athletic Commission, Attorney General ..."
67,1886487,SB10,ddc551aaf8551e8f8f20b026a6ce5e7e,https://legiscan.com/AL/bill/SB10/2025,2025-02-04,1,2025-02-04,Pending Senate Healthcare,"Alabama Clean Indoor Air Act, renamed Vivian D...","Alabama Clean Indoor Air Act, renamed Vivian D..."


In [18]:
bill_ids = bill_df['bill_id']

In [19]:
bill_ids

0     1886114
1     1886274
2     1886289
3     1886100
4     1886187
       ...   
64    1885910
65    1886012
66    1886486
67    1886487
68    1888831
Name: bill_id, Length: 69, dtype: object
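To automate these steps across more than one session, the getMasterList call can go inside a loop. In this sketch, collect_bill_ids and fetch_masterlist are hypothetical names: fetch_masterlist(session_id) stands in for the request above and is assumed to return myjson['masterlist']. Passing it in as a function keeps the loop testable without network access. The sketch does not handle API errors or usage limits.

```python
def collect_bill_ids(session_ids, fetch_masterlist):
    """Map each session_id to the list of bill_ids in that session.

    fetch_masterlist(session_id) is assumed to return the 'masterlist' dict
    of a getMasterList response (bill entries plus a 'session' entry).
    """
    result = {}
    for sid in session_ids:
        masterlist = fetch_masterlist(sid)
        result[sid] = [bill['bill_id'] for key, bill in masterlist.items()
                       if key != 'session']
    return result

# A fake fetcher standing in for the API call. A real one might be:
# def fetch_masterlist(sid):
#     r = requests.get(root, params={'key': legiscan_key, 'op': 'getMasterList', 'id': sid})
#     r.raise_for_status()
#     return r.json()['masterlist']
def fake_fetch(sid):
    return {'session': {'session_id': sid},
            '0': {'bill_id': sid * 10 + 1},
            '1': {'bill_id': sid * 10 + 2}}

print(collect_bill_ids([2148, 2103], fake_fetch))
# {2148: [21481, 21482], 2103: [21031, 21032]}
```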


In [22]:
params = {'key': legiscan_key,
         'op': 'getBill',
         'id': bill_ids[0]}
r = requests.get(root, params=params)
myjson = json.loads(r.text)



In [23]:
toreturn = {}
toreturn['bill_id'] = myjson['bill']['bill_id']
toreturn['session_id'] = myjson['bill']['session_id']
toreturn['title'] = myjson['bill']['title']
toreturn['description'] = myjson['bill']['description']
toreturn['textlink'] = myjson['bill']['texts'][0]['state_link']

The raw getBill response stored in myjson (truncated):

{'status': 'OK',
 'bill': {'bill_id': 1886114,
  'change_hash': 'f57d9a6d3695793c45d5f3542e0a936e',
  'session_id': 2148,
  'session': {'session_id': 2148,
   'state_id': 1,
   'year_start': 2025,
   'year_end': 2025,
   'prefile': 1,
   'sine_die': 0,
   'prior': 0,
   'special': 0,
   'session_tag': 'Regular Session',
   'session_title': '2025 Regular Session',
   'session_name': '2025 Regular Session'},
  'url': 'https://legiscan.com/AL/bill/HB1/2025',
  'state_link': 'https://alison.legislature.state.al.us/bill-search',
  'completed': 0,
  'status': 1,
  'status_date': '2025-02-04',
  'progress': [{'date': '2025-02-04', 'event': 1},
   {'date': '2025-02-04', 'event': 9}],
  'state': 'AL',
  'state_id': 1,
  'bill_number': 'HB1',
  'bill_type': 'B',
  'bill_type_id': '1',
  'body': 'H',
  'body_id': 11,
  'current_body': 'H',
  'current_body_id': 11,
  'title': 'Seafood, to assess a fee on seafood dealers dealing in imported seafood; imported seafood safety fund created.',
  'descri
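The toreturn cell can be wrapped in a function so that it works on any getBill response, for example inside a loop over bill_ids. In this sketch, bill_summary is a hypothetical name. Unlike the cell above, it returns None for textlink when a bill has no text versions yet, instead of raising an IndexError. The toy response is abbreviated.

```python
def bill_summary(bill_json):
    """Pull a few fields out of a parsed getBill response (the dict called myjson above)."""
    bill = bill_json['bill']
    texts = bill.get('texts', [])
    return {
        'bill_id': bill['bill_id'],
        'session_id': bill['session_id'],
        'title': bill['title'],
        'description': bill['description'],
        # link to the first text version on the state site, if there is one
        'textlink': texts[0]['state_link'] if texts else None,
    }

toy = {'status': 'OK',
       'bill': {'bill_id': 1886114, 'session_id': 2148,
                'title': 'toy title', 'description': 'toy description',
                'texts': []}}
print(bill_summary(toy)['textlink'])  # None
```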

In [None]:
textlink = myjson['bill']['texts'][0]['state_link']



## How to find machine-readable bill text without web scraping or extracting it from a PDF, if at all possible

In [20]:
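import io
import requests
from PyPDF2 import PdfReader

response = requests.get(url=textlink, timeout=120) # download the bill text from the state site
response.raise_for_status() # stop with an error if the download failed
on_fly_mem_obj = io.BytesIO(response.content) # wrap the downloaded bytes as an in-memory file
pdf_file = PdfReader(on_fly_mem_obj) # parse the PDF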

In [21]:
print(pdf_file)

<PyPDF2._reader.PdfReader object at 0xffff69f563f0>
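Before falling back to the state's PDF, it may be worth trying LegiScan's own getBillText operation, which (per the API documentation linked at the top) takes a doc_id from the texts list and returns the document base64-encoded along with its MIME type. The document may still be a PDF, so this avoids scraping the state site but not necessarily PDF parsing. The sketch below only shows the decoding step on a toy payload. decode_bill_text is a hypothetical helper, and the field names text, doc, and mime are assumptions to check against the documentation.

```python
import base64

def decode_bill_text(bill_text_json):
    """Return (mime_type, raw_bytes) from a parsed getBillText response.

    Assumes the document sits under ['text']['doc'] as base64 and the MIME type
    under ['text']['mime']; verify these names against the LegiScan docs.
    """
    text = bill_text_json['text']
    return text['mime'], base64.b64decode(text['doc'])

# Toy payload standing in for json.loads(r.text) after a request with
# params = {'key': legiscan_key, 'op': 'getBillText', 'id': <a doc_id>}
toy = {'status': 'OK',
       'text': {'mime': 'text/html',
                'doc': base64.b64encode(b'<p>Section 1.</p>').decode('ascii')}}
mime, raw = decode_bill_text(toy)
print(mime, raw)  # text/html b'<p>Section 1.</p>'
```

If mime is application/pdf, raw could be passed to PdfReader(io.BytesIO(raw)) in the same way as response.content in the cell above.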


In [22]:
print(pdf_file.pages[3].extract_text())

HB1 INTRODUCED
Page 3place of business,  then  an additional  dealer's licenses must
license shall  be purchased for each separate place of
business, providing the location of each. A vehicle used
solely for transporting  seafoods  seafood  to or from an Alabama
seafood dealer is not considered a place of business. Each
vehicle from which seafood is sold to or purchased from any
person,  firm, or corporation  other than an Alabama seafood
dealer, is a place of business and shall be licensed under
this section.  The  A seafood dealer shall purchase a license
for each  such  vehicle for a fee of one hundred dollars ($100)
per license and the operator of the vehicle shall have the
original license in his or her possession when selling or
buying seafood from that vehicle. Seafood dealers may purchase
seafoods  seafood  only from commercial fishermen validly
licensed in Alabama, Alabama seafood dealers, and any
nonresident seller who is validly licensed to sell  seafoods
seafood  under the 

In [29]:
textlist = [x.extract_text() for x in pdf_file.pages]

In [37]:
fulltext = ' '.join(textlist)

In [55]:
def clean_text(text):
    # Remove page headers/footers and other metadata lines
    # (these patterns are tailored to this particular Alabama bill PDF)
    cleaned_text = re.sub(r'HB\d+ INTRODUCED|Page \d+|K\d+[A-Z]+-\d+|PFD:\s?\d{2}-[A-Za-z]{3}-\d{2,4}|RFD:.*|First Read:.*|ZAK.*\d{4}-\d{4}', '', text)
    
    # Collapse line breaks and repeated whitespace into single spaces
    cleaned_text = re.sub(r'\s+', ' ', cleaned_text)
    
    # Remove all digits: this drops the line numbers, but also section numbers,
    # dollar amounts, and years
    cleaned_text = re.sub(r'\d+', '', cleaned_text)

    # Remove spaces before punctuation
    cleaned_text = re.sub(r'\s([.,;])', r'\1', cleaned_text)

    # Remove subsection labels like (a) and (b)
    cleaned_text = re.sub(r'\s*\([a-zA-Z]\)\s*', ' ', cleaned_text)

    # Remove what is left of dollar amounts after the digits are gone,
    # section signs, and empty parentheses
    cleaned_text = cleaned_text.replace('($)', '')
    cleaned_text = cleaned_text.replace('($,)', '')
    cleaned_text = cleaned_text.replace('§', '')
    cleaned_text = cleaned_text.replace('()', '')

    # Strip any leading or trailing spaces (done last, after all removals)
    cleaned_text = cleaned_text.strip()

    return cleaned_text

In [56]:
clean_text(fulltext)

'HB By Representative Brown      // SYNOPSIS: Under existing law, a seafood dealer must purchase a license to lawfully operate in this state. This bill would assess a fee on certain seafood dealer licensees to be deposited into the Imported Seafood Safety Fund. This bill would also create the Imported Seafood Safety Fund to be used by the Alabama Department of Public Health to inspect imported seafood products for substances that are harmful to humans. A BILL TO BE ENTITLED AN ACT Relating to seafood; to amend Section -- of the Code of Alabama, to assess a fee on certain seafood dealer licensees for deposit into the Imported Seafood Safety Fund; and to create the Imported Seafood Safety Fund for certain imported seafood related uses. BE IT ENACTED BY THE LEGISLATURE OF ALABAMA: Section. Section --, Code of Alabama, is amended to read as follows:                             amended to read as follows: "--  Any person, firm, or corporation who engages in the selling, brokering, trading, 
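Because clean_text removes every digit, it also erases section numbers and dollar amounts, which is why the output reads "Section --" and why the ($) replacements are needed. If the margin line numbers come out of the PDF as lines containing only a number (an assumption; the page printed above does not show them, so check textlist first), a narrower pattern could drop just those lines and keep other digits. drop_line_numbers is a hypothetical helper, and the sample string is adapted from the page 3 text above.

```python
import re

def drop_line_numbers(text):
    """Remove lines that contain nothing but a number (assumed to be margin
    line numbers), then collapse whitespace. Other digits are kept."""
    # (?m) makes ^ and $ match at every line boundary
    text = re.sub(r'(?m)^\s*\d+\s*$', '', text)
    # Collapse the leftover blank lines and repeated spaces
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

sample = "1\nSection 1. A fee of one hundred dollars ($100)\n2\nper license."
print(drop_line_numbers(sample))
# Section 1. A fee of one hundred dollars ($100) per license.
```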