</center></div><div style = 'background-color:indigo'><center><h1 style='font-size: 50px; font-weight: bold; color:goldenrod; border-top: 3px solid goldenrod; padding-top: 10px'>AI Legislative Policy Analysis (CaLPA-AI)</h1><div style='font-size: 35px; font-weight: bold; color: goldenrod'> Part 1 - Preliminary Operations</div><div style='font-size: 30px; font-weight: bold; color: goldenrod; border-bottom: 3px solid goldenrod; padding-bottom: 20px'>v.1.1 (MIT License), Dr. Kostas Alexandridis, GISP, April 2025</div></center></div>

In [None]:
import notebookHeadings
from notebookHeadings import mdt
mdt(level = 0, prjPart = 1, prjComponent = "AI", prjVersion = "1.1")

This is the main notebook for the AI California Legislative Policy Analysis (CALPA) project. The goal of this project is to analyze California legislative bills using natural language processing (NLP) techniques. This notebook will cover the preliminary data processing steps, including data loading, cleaning, and preparation for analysis.
The project is divided into several parts, each focusing on a specific aspect of the analysis. The first part will cover the data loading and cleaning process, while subsequent parts will focus on feature extraction, model training, and evaluation.

<h1 style='font-weight:bold; color:orangered; border-bottom: 2px solid orangered'>1. Preliminaries</h1>

<h2 style='font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px'>1.1. Referencing Libraries and Initialization</h2>

In [None]:
mdt(1, "1. Preliminaries")
mdt(2, "1.1. Referencing Libraries and Initialization")

If you need to reset the kernel, run the following cell:

In [None]:
#%reset

Import the Python libraries required for the project.

In [1]:
# Import required libraries
import os
from dotenv import load_dotenv
import time
from datetime import date
from datetime import datetime
import json
import mimetypes
import glob
import base64
import zipfile
import io
import requests
import pandas as pd

Load the local Python modules containing the classes and functions for the project from the local directory.
- `calpa`: This module contains the main classes and functions for the project, including the `LegiScan` class for the LegiScan API.

In [2]:
# Load the calpa module located in the scripts/python/calpa directory
from calpa import calpa

<h2 style='font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px'>1.2. Project and Workspace Variables</h2>

<h3 style='font-weight:bold; color:lime; padding-left: 50px'>Load Environment Variables</h3>

In [None]:
mdt(2, "1.2. Project and Workspace Variables")
mdt(3, "Load Environment Variables")

Define and maintain the project, workspace, and metadata settings. Below, we use the `dotenv` library to load the environment variables from the `.env` file into the Python environment. These variables configure the project and workspace settings. The environment also contains the LegiScan API key, stored in the `LEGISCAN_API_KEY` environment variable, which is used to access the LegiScan API.

In [3]:
# Load environment variables from .env file
load_dotenv()

True
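Once the environment is loaded, it can help to confirm that the API key is actually present before any API call is made. The following is a minimal sketch, not part of the `calpa` package: the helper name `require_env` and the demo value are hypothetical, and the example reads from an in-memory mapping rather than the real environment.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or raise a clear error.

    `require_env` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is missing or empty; check the .env file."
        )
    return value


# Demonstration with a made-up key held in a plain dictionary
demo_env = {"LEGISCAN_API_KEY": "demo-key-123"}
api_key = require_env("LEGISCAN_API_KEY", env=demo_env)
print(f"API key loaded ({len(api_key)} characters)")
```

In the notebook itself, calling `require_env("LEGISCAN_API_KEY")` without the `env` argument would check the real process environment.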

<h3 style='font-weight:bold; color:lime; padding-left: 50px'>Main Class Instantiation</h3>

In [None]:
mdt(3, "Main Class Instantiation")

Instantiate the main class for the project:
- `legiscan`: This class is used to access the LegiScan API and retrieve legislative data.

In [4]:
# Instantiate the LegiScan class
legiscan = calpa.LegiScan()

Create project metadata for the project

In [5]:
# Create project metadata for the AI project
prjMetadata = calpa.projectMetadata(prjPart=1, prjComponent="AI", prjVersion="1.1")

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 AI Legislative Policy Analysis (CaLPA-AI)
 California Legislative Policy Analysis for Artificial Intelligence Related Bills
 Part 1 - Preliminary Operations
 Version 1.1 (MIT License), Dr. Kostas Alexandridis, GISP
 GitHub Repository: https://github.com/ktalexan/CaLPA
 Last Updated: Apr 29, 2025
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dates: 2010-12-02 through 2025-04-29
Periods: 2009-2010, 2011-2012, 2013-2014, 2015-2016, 2017-2018, 2019-2020, 2021-2022, 2023-2024, 2025-2026


Create the project directories dictionary

In [6]:
# Create the project directories dictionary
prjDirs = calpa.projectDirectories(os.getcwd())

Directory Global Settings:

General:
- Project (pathPrj): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI
- Admin (pathAdmin): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\admin
- Metadata (pathMetadata): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\metadata
- Analysis (pathAnalysis): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\analysis
- Obsidian Vault (pathObsidian): C:\Users\ktale\Knowledge Management\Policy and Governance\Legislation
Scripts:
- Python Calpa Module (pathScriptsCalpa): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\calpa
- Markdown Scripts (pathScriptsMd): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\markdown
- RIS Scripts (pathScriptsRis): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\ris
Data:
- Main Data (pathData): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\data
- Documents (pathDataDocs): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\data\docs
- LegiScan (pathDataLegis): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\d

<h3 style='font-weight:bold; color:lime; padding-left: 50px'>Lookup Data and Variables</h3>

In [None]:
mdt(3, "Lookup Data and Variables")

If needed, you can access the project lookup and dictionary variables. They are embedded in the `codebook` module of the `calpa` package. The codebook module contains the following variables:
1. **LegiScan API Call Dictionaries**: These are codebook dictionaries that map the definitions of the LegiScan API fields for a number of API calls. The dictionaries available are (alphabetically ordered):
   - `dictGetAmendment`: Contains all the fields returned by the LegiScan API for the `getAmendment` call.
   - `dictGetBill`: Contains all the fields returned by the LegiScan API for the `getBill` call.
   - `dictGetBillText`: Contains all the fields returned by the LegiScan API for the `getBillText` call.
   - `dictGetPerson`: Contains all the fields returned by the LegiScan API for the `getPerson` call.
   - `dictGetRollCall`: Contains all the fields returned by the LegiScan API for the `getRollCall` call.
   - `dictGetSessionList`: Contains all the fields returned by the LegiScan API for the `getSessionList` call.
   - `dictGetSupplement`: Contains all the fields returned by the LegiScan API for the `getSupplement` call.
2. **Lookup Variables**: These are lists containing the names or definitions of codes used inside the LegiScan API calls (usually IDs or codes). The lookup variables available are (alphabetically ordered):
   - `lookupBillCode`: Bill code definitions used in the LegiScan API.
   - `lookupBillTextType`: IDs and definitions of the bill text types used in the LegiScan API.
   - `lookupBillType`: IDs and definitions of the bill types used in the LegiScan API.
   - `lookupBodyType`: Definitions of body types used in the LegiScan API.
   - `lookupEventType`: Definitions of event types used in the LegiScan API.
   - `lookupMimeType`: Definitions of mime types used in the LegiScan API.
   - `lookupPartyType`: Definitions of party types used in the LegiScan API.
   - `lookupProgressType`: Definitions of progress types used in the LegiScan API.
   - `lookupReasonType`: Definitions of reason types used in the LegiScan API.
   - `lookupRoleType`: Definitions of role types used in the LegiScan API.
   - `lookupSastType`: Definitions of SAST types used in the LegiScan API.
   - `lookupSponsorType`: Definitions of sponsor types used in the LegiScan API.
   - `lookupStateType`: Definitions of state types used in the LegiScan API (California only).
   - `lookupStatusType`: Definitions of status types used in the LegiScan API.
   - `lookupSupplementType`: Definitions of supplement types used in the LegiScan API.
   - `lookupVoteType`: Definitions of vote types used in the LegiScan API.

The codebook variables are stored in the `codebook` module of the `calpa` package. Referencing a variable (assuming that the `calpa` package is imported) returns it as a dictionary, whose entries you can access by key. For example, to access the `lookupBillCode` variable, you can use the following code:

>```python
># Load the calpa module located in the scripts/python/calpa directory
>from calpa import calpa
>
># Getting the lookup variables from the codebook module directly
>calpa.codebook.lookupBillCode
>
># Assigning the lookup variable to a variable stored in the session
>lookupBillCode = calpa.codebook.lookupBillCode
>```


In [7]:
# Codebook lookup variables
codebookLookupVars = [var for var in dir(calpa.codebook) if var.startswith('lookup')]
codebookDictVars = [var for var in dir(calpa.codebook) if var.startswith('dict')]
print(f"Codebook Lookup Variables:\n{codebookLookupVars}\n")
print(f"Codebook Dictionary Variables:\n{codebookDictVars}\n")

Codebook Lookup Variables:
['lookupBillCode', 'lookupBillTextType', 'lookupBillType', 'lookupBodyType', 'lookupEventType', 'lookupMimeType', 'lookupPartyType', 'lookupProgressType', 'lookupReasonType', 'lookupRoleType', 'lookupSastType', 'lookupSponsorType', 'lookupStateType', 'lookupStatusType', 'lookupSupplementType', 'lookupVoteType']

Codebook Dictionary Variables:
['dictGetAmendment', 'dictGetBill', 'dictGetBillText', 'dictGetPerson', 'dictGetRollCall', 'dictGetSessionList', 'dictGetSupplement']



<h1 style='font-weight:bold; color:orangered; border-bottom: 2px solid orangered'>2. Baseline LegiScan Data</h1>

<h2 style='font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px'>2.1. Session List</h2>

In [None]:
mdt(1, "2. Baseline LegiScan Data")
mdt(2, "2.1. Session List")

Using the LegiScan API, we will retrieve the list of sessions for California. This will be used to get the session ID for the current session and the previous session. The session ID is needed to retrieve the bills for each session.

In [8]:
# Get the list of sessions from LegiScan
sessionList = legiscan.getSessionList()

Convert the session list to a pandas dataframe

In [9]:
# Convert the sessionList to a pandas DataFrame
sessionDf = pd.DataFrame(sessionList)
sessionDf.head()

Unnamed: 0,2009-2010,2011-2012,2013-2014,2015-2016,2017-2018,2019-2020,2021-2022,2023-2024,2025-2026
session_id,30,82,993,1120,1400,1624,1791,2016,2172
state_id,5,5,5,5,5,5,5,5,5
state_abbr,CA,CA,CA,CA,CA,CA,CA,CA,CA
year_start,2009,2011,2013,2015,2017,2019,2021,2023,2025
year_end,2010,2012,2014,2016,2018,2020,2022,2024,2026


We need to compare the session list obtained from the LegiScan API with the previous session list (stored on disk under `data/lookup/sessionList.json`). Here, we load the stored session list into a new dictionary called `sessionListStored`.

In [10]:
# Obtain the stored sessions list from JSON dictionary on disk (data/lookup directory)
sessionListStored = legiscan.getStoredData(dataType = "session")

Now that we have both session lists (the one from the LegiScan API, `sessionList`, and the stored version, `sessionListStored`), we can compare them. We check whether the session list from the LegiScan API is the same as the session list stored on disk. If they differ, we first identify which sessions need updating, and later update the stored session list with the new session list from the LegiScan API. This also reveals any new sessions that have been added to the LegiScan API since the last time we retrieved the session list.

The `matchHash` method of the `LegiScan` class uses hash values to compare the two lists. In this case, the relevant JSON key is the `session_hash` of each `session_id`.
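The idea behind a hash comparison can be sketched as follows. This is not the actual `matchHash` implementation: the function name `find_unmatched`, the dictionary layout, and the hash values are assumptions made for illustration, and the only behavior mirrored from the notebook is that `None` is returned when everything matches.

```python
def find_unmatched(new_items, stored_items, hash_key):
    """Return entries of new_items whose hash is absent or different in stored_items.

    Both arguments are dictionaries keyed by an identifier; each value is a
    dictionary carrying a hash under hash_key. Returns None when all match.
    """
    unmatched = {}
    for item_id, item in new_items.items():
        stored = stored_items.get(item_id)
        # An entry is unmatched if it is new, or if its hash has changed
        if stored is None or stored.get(hash_key) != item.get(hash_key):
            unmatched[item_id] = item
    return unmatched or None


# Made-up hash values for illustration only
new = {
    "2023-2024": {"session_id": 2016, "session_hash": "aaa"},
    "2025-2026": {"session_id": 2172, "session_hash": "ccc"},
}
stored = {
    "2023-2024": {"session_id": 2016, "session_hash": "aaa"},
    "2025-2026": {"session_id": 2172, "session_hash": "bbb"},
}
print(find_unmatched(new, stored, "session_hash"))
```

Comparing hashes rather than full records keeps the check cheap: only entries whose hash differs need to be inspected or downloaded again.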

In [11]:
# Compare the sessionList and sessionListStored dictionaries for any changes
unmatchedSessions = legiscan.matchHash(sessionList, sessionListStored, hashType = "session", silent = True)

# if the unmatchedSessions is empty, print "All sessions match", and delete the unmatchedSessions variable
if unmatchedSessions is None:
    print("All sessions match")
    del unmatchedSessions
else:
    print("Unmatched sessions found")
    # Print the unmatched sessions
    print(unmatchedSessions)

All sessions match


Export the LegiScan query records to the `data/legis/json` directory as a JSON file for future reference.

In [12]:
# Export the sessionList to a JSON file in the data/legiscan/json directory
with open(os.path.join(prjDirs["pathDataLegis"], "json", "sessionList.json"), "w", encoding="utf-8") as f:
    json.dump(sessionList, f, ensure_ascii=False, indent=4)
del f

If needed, update the stored session list with the new session list from the LegiScan API.

In [13]:
# Update the stored sessions list with the new sessionList
with open(os.path.join(prjDirs["pathDataLookup"], "sessionListStored.json"), "w", encoding="utf-8") as f:
    json.dump(sessionList, f, ensure_ascii=False, indent=4)
del f
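The cell above rewrites the stored file on every run. A minimal sketch of guarding that write so it only happens when differences were found is shown below. The helper `save_if_changed` is hypothetical, and it assumes the comparison result is kept in a variable rather than deleted after a full match, as the notebook currently does.

```python
import json
import os
import tempfile


def save_if_changed(data, path, unmatched):
    """Write data to path as JSON only when unmatched entries were found."""
    if not unmatched:
        return False
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
    return True


# Demonstration in a temporary directory, so no project file is touched
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "sessionListStored.json")
    data = {"2025-2026": {"session_id": 2172}}
    # No differences reported: nothing is written
    wrote_first = save_if_changed(data, target, unmatched=None)
    exists_after_first = os.path.exists(target)
    # Differences reported: the file is written
    wrote_second = save_if_changed(data, target, unmatched={"2025-2026": data["2025-2026"]})
    exists_after_second = os.path.exists(target)

print(wrote_first, exists_after_first, wrote_second, exists_after_second)
```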

<h2 style='font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px'>2.2. Session People</h2>

In [None]:
mdt(2, "2.2. Session People")

In this step, we obtain the list of California legislature members (Senate and Assembly) for each legislative session, using the LegiScan API.

The `legiscan.getSessionPeople` method takes a session ID as its argument and returns the list of members for that session. We store the results in a dictionary called `sessionPeople`, keyed by the session period (the same keys as `sessionList`, for example `2009-2010`), with the list of members as the value.

In [14]:
# Get the list of session people from LegiScan
sessionPeople = {}
for key, value in sessionList.items():
    sessionId = value["session_id"]
    sessionPeople[key] = legiscan.getSessionPeople(sessionId)
del key, value, sessionId

As with the legislative session list, we compare the session people list obtained from the LegiScan API with the previous session people list (stored on disk under `data/lookup/sessionPeople.json`). Here, we load the stored session people list into a new dictionary called `sessionPeopleStored`.

In [15]:
# Obtain the stored session People list from JSON dictionary on disk (data/lookup directory)
sessionPeopleStored = legiscan.getStoredData(dataType = "people")

This time, the task is not that simple, since `sessionPeople` lists are nested for each session. The comparison of the Legislature members needs to be done in a loop for each session. In the following code segment, we perform this task in sequential steps:

1. Create a dictionary named `unmatchedPeople` to hold the unmatched session people (will be nested for each session).
2. Loop through the `sessionPeople` and `sessionPeopleStored` dictionaries to compare the session people lists for each session.
3. For each session, compare the session people lists based on the `person_hash` key attribute of both lists, and hold the result in the `unmatched` variable. We use the `matchHash` method of the `legiscan` object to compare the two lists.
4. If there are any unmatched session people, we will update the `unmatchedPeople` dictionary with the unmatched session people (for each session).
5. Finally, we will check if there are any unmatched session people in the `unmatchedPeople` dictionary. If there are, we will update the `sessionPeopleStored` dictionary with the unmatched session people and save it to the disk. 
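The steps above can be sketched as a small per-session comparison. This is an illustration under stated assumptions, not the notebook's implementation: the function names are hypothetical, the sample members and hashes are made up, and the `people` entry is assumed to be a dictionary keyed by person ID, which may differ from the actual LegiScan payload.

```python
def diff_person_hash(new_people, stored_people):
    """Return people whose person_hash is new or changed."""
    return {
        pid: person
        for pid, person in new_people.items()
        if stored_people.get(pid, {}).get("person_hash") != person.get("person_hash")
    }


def compare_by_session(new_by_session, stored_by_session, compare):
    """Apply compare() to each session's "people" entry and collect differences."""
    unmatched_by_session = {}
    for period, payload in new_by_session.items():
        # A session missing from the stored data is treated as entirely new
        stored_people = stored_by_session.get(period, {}).get("people", {})
        diff = compare(payload.get("people", {}), stored_people)
        if diff:
            unmatched_by_session[period] = diff
    return unmatched_by_session


# Made-up members and hashes for illustration only
new = {
    "2025-2026": {
        "people": {
            "101": {"name": "Member A", "person_hash": "h1"},
            "102": {"name": "Member B", "person_hash": "h2"},
        }
    }
}
stored = {"2025-2026": {"people": {"101": {"name": "Member A", "person_hash": "h1"}}}}

result = compare_by_session(new, stored, diff_person_hash)
print(result if result else "All people match")
```

Using `.get(period, {})` for the stored data avoids a `KeyError` when a new legislative session appears that has not been stored yet.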

In [16]:
# Compare the sessionPeople and sessionPeopleStored dictionaries for any changes
# Create a dictionary to store unmatched people
unmatchedPeople = {}
# Iterate through each session and compare the people lists
for key, value in sessionPeople.items():
    unmatchedPeople[key] = {}
    unmatched = legiscan.matchHash(sessionPeople[key]["people"], sessionPeopleStored[key]["people"], hashType = "person", silent = True)
    # If there are unmatched people, store them in the unmatchedPeople dictionary
    unmatchedPeople[key] = unmatched if unmatched is not None else None
del key, value, unmatched

# if the unmatchedPeople is empty, print "All people match", and delete the unmatchedPeople variable
if all(not value for value in unmatchedPeople.values()):
    print("All people match")
    # Delete the unmatchedPeople variable
    del unmatchedPeople
else:
    print("Unmatched people found")
    # Print the unmatched people
    print(unmatchedPeople)

All people match


Export the LegiScan query data for the session people to the `data/legis/json` directory for future reference.

In [17]:
# Export the sessionPeople to a JSON file in the data/legiscan/json directory
with open(os.path.join(prjDirs["pathDataLegis"], "json", "sessionPeople.json"), "w", encoding="utf-8") as f:
    json.dump(sessionPeople, f, ensure_ascii=False, indent=4)
del f

If needed, update the stored session people list with the new session people list from the LegiScan API.

In [18]:
# Update the stored session People list with the new sessionPeople
with open(os.path.join(prjDirs["pathDataLookup"], "sessionPeopleStored.json"), "w", encoding="utf-8") as f:
    json.dump(sessionPeople, f, ensure_ascii=False, indent=4)
del f

<h2 style='font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px'>2.3. Dataset List</h2>

In [None]:
mdt(2, "2.3. Dataset List")

In this section, we obtain the list of datasets and their attributes for each California legislative session from LegiScan. This step is needed to obtain the dataset `access_key` for each session, which is later used to query the dataset data.

The `legiscan.getDatasetList` method retrieves the list of datasets for the legislative sessions. In this notebook it is called without arguments. The result is stored in a dictionary called `datasetList`, which holds the dataset records for each session.

In [19]:
# Get the list of datasets from LegiScan for each legislative session
datasetList = legiscan.getDatasetList()

Load the stored dataset list from disk (the `data/lookup/datasetList.json` file) so that it can be compared with the dataset list from the LegiScan API. Here, we load the stored dataset list into a new dictionary called `datasetListStored`.

In [20]:
# Obtain the stored dataset list from JSON dictionary on disk (data/lookup directory)
datasetListStored = legiscan.getStoredData(dataType = "dataset")

Now that we have both dataset lists (the one from the LegiScan API, `datasetList`, and the stored version, `datasetListStored`), we can compare them. We check whether the dataset list from the LegiScan API is the same as the dataset list stored on disk. If they differ, we first identify which datasets need updating, and later update the stored dataset list with the new dataset list from the LegiScan API. This also reveals any new datasets that have been added to the LegiScan API since the last time we retrieved the dataset list.

The `matchHash` method of the `LegiScan` class uses hash values to compare the two lists. In this case, the relevant JSON key is the `dataset_hash` of each `session_id`.

In [21]:
# Compare the datasetList and datasetListStored dictionaries for any changes
unmatchedDatasets = legiscan.matchHash(datasetList, datasetListStored, hashType = "dataset", silent = True)

# If unmatchedDatasets is None, print "All datasets match" and delete the unmatchedDatasets variable
if unmatchedDatasets is None:
    print("All datasets match")
    del unmatchedDatasets
else:
    print("Unmatched datasets found")
    # Print the unmatched datasets
    print(unmatchedDatasets)

All datasets match


Export the LegiScan query records to the `data/legis/json` directory as a JSON file for future reference.

In [22]:
# export the datasetList to a JSON file in the data/legis/json directory
with open(os.path.join(prjDirs["pathDataLegis"], "json", "datasetList.json"), "w", encoding="utf-8") as f:
    json.dump(datasetList, f, ensure_ascii=False, indent=4)
del f

Update the stored dataset list with the new dataset list from the LegiScan API.

In [23]:
# Update the stored dataset list with the new datasetList
with open(os.path.join(prjDirs["pathDataLookup"], "datasetListStored.json"), "w", encoding="utf-8") as f:
    json.dump(datasetList, f, ensure_ascii=False, indent=4)
del f

<h2 style='font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px'>2.4. Master List</h2>

In [None]:
mdt(2, "2.4. Master List")

This step obtains the master list for each legislative session, which contains the list of bills for that session. The session ID is needed to retrieve the bills for each session, and we use the LegiScan API to retrieve them.

There are two options for this method. The first obtains the master list with bill attributes (when `raw = False`), and the second obtains the raw master list containing only the bill ID and hash (when `raw = True`).

We will use the `legiscan.getMasterList(sessionID, raw)` method and will store the results in a dictionary called `masterList` or `masterListRaw` depending on the option provided in the method invocation.

In [24]:
# Get the Raw Master List from LegiScan for each legislative session
masterListRaw = {}
for key, value in sessionList.items():
    sessionId = value["session_id"]
    masterListRaw[key] = legiscan.getMasterList(sessionId, raw = True)
del key, value, sessionId

In [25]:
# Get the Master List from LegiScan for each legislative session
masterList = {}
for key, value in sessionList.items():
    sessionId = value["session_id"]
    masterList[key] = legiscan.getMasterList(sessionId, raw = False)
del key, value, sessionId

Load the stored master lists (both raw and full) from disk. They are used to compare each bill with the LegiScan API master lists. The stored master lists are kept in the `data/lookup/masterList.json` and `data/lookup/masterListRaw.json` files and contain the list of bills for each session. Here, we load them into new dictionaries called `masterListStored` and `masterListRawStored`.

In [26]:
# Get the stored raw master list from JSON dictionary on disk (data/lookup directory)
masterListRawStored = legiscan.getStoredData(dataType = "master", raw = True)
# Get the stored master list from JSON dictionary on disk (data/lookup directory)
masterListStored = legiscan.getStoredData(dataType = "master", raw = False)

Now that we have both versions of the master lists (the ones from the LegiScan API, `masterList` and `masterListRaw`, and the stored versions, `masterListStored` and `masterListRawStored`), we can compare them. We check whether each master list from the LegiScan API is the same as the master list stored on disk. If they differ, we first identify which bills need updating, and later update the stored master list with the new master list from the LegiScan API. This also reveals any new bills that have been added to the LegiScan API since the last time we retrieved the master list.

The `matchHash` method of the `LegiScan` class uses hash values to compare the two lists. In this case, the relevant JSON key is `change_hash`, compared within each session.

In [27]:
# Compare the masterList and masterListStored dictionaries for any changes
# Create a dictionary to store unmatched bills
unmatchedMasterList = {}
# Iterate through each session and compare the bills lists
for key, value in masterList.items():
    unmatchedMasterList[key] = {}
    unmatched = legiscan.matchHash(masterList[key]["bills"], masterListStored[key]["bills"], hashType = "change", silent = True)
    # If there are unmatched bills, store them in the unmatchedMasterList dictionary
    unmatchedMasterList[key] = unmatched if unmatched is not None else None
del key, value, unmatched

if all(not value for value in unmatchedMasterList.values()):
    print("All master lists match")
    # Delete the unmatchedMasterList variable
    del unmatchedMasterList
else:
    print("Unmatched master lists found")
    # Print the unmatched bills
    print(unmatchedMasterList)

All master lists match


In [28]:
# Compare the masterListRaw and masterListRawStored dictionaries for any changes
# Create a dictionary to store unmatched bills
unmatchedMasterListRaw = {}
# Iterate through each session and compare the bills lists
for key, value in masterListRaw.items():
    unmatchedMasterListRaw[key] = {}
    unmatched = legiscan.matchHash(masterListRaw[key]["bills"], masterListRawStored[key]["bills"], hashType = "change", silent = True)
    # If there are unmatched bills, store them in the unmatchedMasterListRaw dictionary
    unmatchedMasterListRaw[key] = unmatched if unmatched is not None else None
del key, value, unmatched

if all(not value for value in unmatchedMasterListRaw.values()):
    print("All master lists match")
    # Delete the unmatchedMasterListRaw variable
    del unmatchedMasterListRaw
else:
    print("Unmatched master lists found")
    # Print the unmatched bills
    print(unmatchedMasterListRaw)

All master lists match


Export both LegiScan query records (the raw and full master lists) to the `data/legis/json` directory as JSON files for future reference.

In [29]:
# export the raw master list to a JSON file in the data/legis/json directory
with open(os.path.join(prjDirs["pathDataLegis"], "json", "masterListRaw.json"), "w", encoding="utf-8") as f:
    json.dump(masterListRaw, f, ensure_ascii=False, indent=4)
del f

In [30]:
# export the master list to a JSON file in the data/legis/json directory
with open(os.path.join(prjDirs["pathDataLegis"], "json", "masterList.json"), "w", encoding="utf-8") as f:
    json.dump(masterList, f, ensure_ascii=False, indent=4)
del f

If needed, update the stored master lists (raw and full) with the new master lists from the LegiScan API.

In [31]:
# Update the stored raw master list with the new masterListRaw
with open(os.path.join(prjDirs["pathDataLookup"], "masterListRawStored.json"), "w", encoding="utf-8") as f:
    json.dump(masterListRaw, f, ensure_ascii=False, indent=4)
del f

In [32]:
# Update the master list with the new masterList
with open(os.path.join(prjDirs["pathDataLookup"], "masterListStored.json"), "w", encoding="utf-8") as f:
    json.dump(masterList, f, ensure_ascii=False, indent=4)
del f

<h1 style='font-weight:bold; color:orangered; border-bottom: 2px solid orangered'>3. Bill Monitoring Operations</h1>

<h2 style='font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px'>3.1. LegiScan AI Search Query</h2>

In [None]:
mdt(1, "3. Bill Monitoring Operations")
mdt(2, "3.1. LegiScan AI Search Query")

First, we need to update the list of bills to be monitored, based on the LegiScan API search query. The search query retrieves the list of bills that match the term `artificial intelligence` within a given relevance threshold for each legislative session.

We will use the `legiscan.aiSearchQuery` method to retrieve the list of bills that match the search query. The search query is encoded as an internal argument to the method. The method returns a list of bills that match the search query. The list of bills is stored in a dictionary called `aiBillList`.
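The effect of a relevance threshold can be illustrated with a short sketch. This does not show how `aiSearchQuery` works internally: the function `filter_by_relevance`, the sample records, the `relevance` field name, and the choice to treat the threshold as inclusive are all assumptions made for illustration.

```python
def filter_by_relevance(results, threshold=75):
    """Keep search hits whose relevance score meets the threshold (inclusive)."""
    return [hit for hit in results if hit.get("relevance", 0) >= threshold]


# Made-up search hits; bill numbers and scores are placeholders
hits = [
    {"bill_number": "XX1", "relevance": 92},
    {"bill_number": "XX2", "relevance": 75},
    {"bill_number": "XX3", "relevance": 40},
]

kept = filter_by_relevance(hits, threshold=75)
print([hit["bill_number"] for hit in kept])
```

A higher threshold yields fewer, more closely matching bills; a lower one widens the monitoring list at the cost of more false positives.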

In [33]:
# Create a dictionary to store the AI search results
aiBillList = {}
billCount = 0
# Loop through each session in the sessionList
for myPeriod, mySession in sessionList.items():
    # Get the session ID for the current session
    mySessionId = mySession["session_id"]
    # Run the AI search query for the current session
    aiQuery = legiscan.aiSearchQuery(sessionId = mySessionId, threshold = 75)
    # and add it to the aiBillList dictionary
    if aiQuery != {}:
        print(f"{myPeriod}\n")
        aiBillList[myPeriod] = aiQuery
        billCount += len(aiBillList[myPeriod])
print(f"Total number of bills added: {billCount}")
del myPeriod, mySession, mySessionId, billCount, aiQuery

# reorder the aiBillList dictionary by key in ascending order
aiBillList = dict(sorted(aiBillList.items(), key=lambda item: item[0], reverse=False))

SB836: Bill 577638 added to monitor list with stance watch
SB860: Bill 581712 added to monitor list with stance watch
AB1465: Bill 581806 added to monitor list with stance watch
3 bills
2013-2014

ACR215: Bill 1111231 added to monitor list with stance watch
AB2662: Bill 1090551 added to monitor list with stance watch
SB1470: Bill 1092270 added to monitor list with stance watch
3 bills
2017-2018

AB594: Bill 1205261 added to monitor list with stance watch
AB1576: Bill 1216111 added to monitor list with stance watch
SB348: Bill 1210745 added to monitor list with stance watch
SJR6: Bill 1237237 added to monitor list with stance watch
AB459: Bill 1199608 added to monitor list with stance watch
AB976: Bill 1214383 added to monitor list with stance watch
SB444: Bill 1214166 added to monitor list with stance watch
SB730: Bill 1215535 added to monitor list with stance watch
AB156: Bill 1140154 added to monitor list with stance watch
AB485: Bill 1200933 added to monitor list with stance watch
S

Export the LegiScan query records to the `data/legis/json` directory as a JSON file for future reference.

In [34]:
# Export the aiBillList to a JSON file in the data/legis/json directory
with open(os.path.join(prjDirs["pathDataLegis"], "json", "aiBillList.json"), "w", encoding="utf-8") as f:
    json.dump(aiBillList, f, ensure_ascii=False, indent=4)
del f

<h2 style='font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px'>3.2. Stored Monitor Bill List</h2>

In [None]:
mdt(2, "3.2. Stored Monitor Bill List")

Get the stored monitoring list from the disk

In [40]:
# Get the stored AI bill list from JSON dictionary on disk (data/lookup directory)
aiBillListStored = legiscan.getStoredData(dataType = "bills", project = "AI")

Now that we have both AI bill lists (the one from the LegiScan API, `aiBillList`, and the stored version, `aiBillListStored`), we can compare them. We check whether the AI bill list from the LegiScan API is the same as the AI bill list stored on disk. If they differ, we first identify which bills need updating, and later update the stored AI bill list with the new bill list from the LegiScan API. This also reveals any new bills that have been added to the LegiScan API since the last time we retrieved the AI bill list.

The `matchHash` method of the `LegiScan` class uses hash values to compare the two lists. In this case, the relevant JSON key is `change_hash`, compared within each session.

In [41]:
# Create a dictionary to store unmatched AI bills
unmatchedAiBillList = {}
# Loop through each session in the aiBillList and compare it with the stored aiBillList
for myPeriod, mySession in aiBillList.items():
    sessionUnmatched = legiscan.matchHash(mySession, aiBillListStored[myPeriod], hashType = "change", silent = True)
    # If there are unmatched AI bills, store them in the unmatchedAiBillList dictionary
    if sessionUnmatched is not None:
        # If the sessionUnmatched is not empty, add it to the unmatchedAiBillList dictionary
        unmatchedAiBillList[myPeriod] = sessionUnmatched
        print(f"Unmatched AI bills found for {myPeriod}")
        print(sessionUnmatched)
    else:
        # If the sessionUnmatched is empty, print "All AI bills match for {myPeriod}"
        print(f"All AI bills match for {myPeriod}")
del myPeriod, mySession, sessionUnmatched

# If no unmatched bills were found, report it and delete the unmatchedAiBillList variable; otherwise print the unmatched sessions
if not unmatchedAiBillList:
    print("All bills match")
    del unmatchedAiBillList
else:
    print("Unmatched bills found")
    # Print the unmatched sessions
    print(unmatchedAiBillList)

All AI bills match for 2013-2014
All AI bills match for 2017-2018
All AI bills match for 2019-2020
All AI bills match for 2021-2022
All AI bills match for 2023-2024
All AI bills match for 2025-2026
All bills match


If needed, overwrite the stored AI bill list with the list just retrieved from the LegiScan API.

In [37]:
# Update the stored aiBillList with the new aiBillList
with open(os.path.join(prjDirs["pathDataLookup"], "aiBillListStored.json"), "w", encoding="utf-8") as f:
    json.dump(aiBillList, f, ensure_ascii=False, indent=4)
del f

<h2 style='font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px'>3.3. Get LegiScan Bills</h2>

In [None]:
mdt(2, "3.3. Get LegiScan Bills")

Get the full LegiScan record for every bill in each session. Each entry in the stored bill list carries a `bill_id`, which is passed to the LegiScan API to retrieve that bill's details.
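
The project's `legiscan.getBill` wrapper is not shown in this notebook. As a rough sketch, and assuming it calls the public LegiScan `getBill` operation, a direct request could look like the following. The names `build_get_bill_url` and `fetch_bill` are hypothetical, and `YOUR_API_KEY` is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

LEGISCAN_URL = "https://api.legiscan.com/"


def build_get_bill_url(api_key, bill_id):
    """Build the request URL for the LegiScan getBill operation."""
    query = urlencode({"key": api_key, "op": "getBill", "id": bill_id})
    return f"{LEGISCAN_URL}?{query}"


def fetch_bill(api_key, bill_id, timeout=30):
    """Fetch one bill record and return the content of its `bill` key."""
    with urlopen(build_get_bill_url(api_key, bill_id), timeout=timeout) as response:
        payload = json.load(response)
    # LegiScan reports success or failure in the top-level `status` field
    if payload.get("status") != "OK":
        raise RuntimeError(f"LegiScan request failed for bill {bill_id}: {payload}")
    return payload["bill"]


# Only the URL is built here; calling fetch_bill requires a valid API key
print(build_get_bill_url("YOUR_API_KEY", 577638))
```

Because every bill requires its own request, large sessions translate into many API calls, which is one reason to store the results on disk afterwards.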

In [43]:
# Define a dictionary to store AI bills
aiBills = {}
totalBills = 0
# Iterate through the stored AI bill list and fetch the bill details from LegiScan
for myPeriod, myBillList in aiBillListStored.items():
    # Set the key to the legislative session period
    aiBills[myPeriod] = {}
    billCount = len(myBillList)
    print(f"\n{myPeriod} Legislative Session ({billCount} bills)")
    # Iterate through the bills for the current legislative session
    for i, (myBillNumber, myBill) in enumerate(myBillList.items(), start = 1):
        # Get the bill ID
        myBillId = myBill["bill_id"]
        # Print progress: comma-separated, with a newline after the last bill
        print(f"{myBillNumber} ({myBillId})({i}/{billCount})", end = ", " if i < billCount else "\n")
        # Add the LegiScan bill record to the aiBills dictionary
        aiBills[myPeriod][myBillNumber] = legiscan.getBill(billId = myBillId)
    totalBills += billCount
print(f"\nCompleted fetching AI bills from LegiScan (Total: {totalBills} bills)")
del myPeriod, myBillList, myBillNumber, myBillId, myBill, i, billCount, totalBills


2013-2014 Legislative Session (3 bills)
SB836 (577638)(1/3), SB860 (581712)(2/3), AB1465 (581806)(3/3)

2017-2018 Legislative Session (3 bills)
ACR215 (1111231)(1/3), AB2662 (1090551)(2/3), SB1470 (1092270)(3/3)

2019-2020 Legislative Session (16 bills)
AB594 (1205261)(1/16), AB1576 (1216111)(2/16), SB348 (1210745)(3/16), SJR6 (1237237)(4/16), AB459 (1199608)(5/16), AB976 (1214383)(6/16), SB444 (1214166)(7/16), SB730 (1215535)(8/16), AB156 (1140154)(9/16), AB485 (1200933)(10/16), SCR13 (1205432)(11/16), ACR125 (1272951)(12/16), AB3317 (1347660)(13/16), SB752 (1215850)(14/16), AB2269 (1341577)(15/16), AB3339 (1347682)(16/16)

2021-2022 Legislative Session (11 bills)
SB1216 (1593803)(1/11), AB587 (1450081)(2/11), AB13 (1385509)(3/11), AB1545 (1459096)(4/11), SB54 (1385430)(5/11), AB1400 (1458951)(6/11), SB1018 (1590455)(7/11), SR11 (1453532)(8/11), AB2826 (1594657)(9/11), AB2224 (1592488)(10/11), AB1651 (1559219)(11/11)

2023-2024 Legislative Session (71 bills)
SB1288 (1848589)(1/71), S

Export the retrieved bill records to the `data/legis/json` directory as a JSON file for future reference.

In [44]:
# Export the AI bills to a JSON file in the data/legis/json directory
with open(os.path.join(prjDirs["pathDataLegis"], "json", "aiBills.json"), "w", encoding="utf-8") as f:
    json.dump(aiBills, f, ensure_ascii=False, indent=4)
del f

<h2 style='font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px'>3.4. Get Bill Text</h2>

In [None]:
mdt(2, "3.4. Get Bill Text")

Get the LegiScan bill text for each session. This process involves two legiscan functions:

- `legiscan.getBillText`: This function retrieves the bill text for each bill in the session. The document ID is passed as an argument to the function. The function runs the LegiScan API call, and returns the bill text JSON information for each bill (which includes the base64 encoded bill text).
- `legiscan.summarizeBillText`: This function summarizes the bill text for each bill in the session. The bill JSON information is passed as an argument to this function. The function performs a number of tasks:
    - Looks up the `texts` JSON object group of the bill, finds the latest bill text version (the last entry in the list, which has the most recent date), and retrieves its `doc_id` identifier.
    - Uses the `legiscan.getBillText` function above to retrieve the encoded (base64) bill text. It then proceeds to decode the encoded bill text, and converts it to a string.
    - Uses an `Azure OpenAI` API call to create a TL;DR summary of the bill text, along with a list of keywords (tags) for the bill text. 
    - Finally, it constructs and returns a dictionary with the bill number, summarized bill text, the tags, and the bill text itself.
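
The first two steps of the list above can be sketched as follows. This is an illustration under stated assumptions, not the code of `legiscan.summarizeBillText`: the helper names are hypothetical, the `texts` and `doc_id` fields come from the description above, and the `date` and `doc` field names are assumed. The Azure OpenAI summarization step is omitted, and the sample data is made up.

```python
import base64


def latest_text_entry(bill):
    """Return the most recent entry of the bill's `texts` list, or None if it is empty."""
    texts = bill.get("texts", [])
    if not texts:
        return None
    # ISO dates (YYYY-MM-DD) sort correctly as strings; the sort is stable,
    # so among entries with the same date the last one in the list wins
    return sorted(texts, key=lambda entry: entry["date"])[-1]


def decode_bill_text(text_payload, encoding="utf-8"):
    """Decode the base64 `doc` field of a bill text payload into a string."""
    raw = base64.b64decode(text_payload["doc"])
    return raw.decode(encoding, errors="replace")


# Made-up bill record with two text versions
bill = {
    "bill_number": "XX1",
    "texts": [
        {"doc_id": 101, "date": "2024-01-10"},
        {"doc_id": 102, "date": "2024-03-05"},
    ],
}
# Made-up payload standing in for the response to a bill text request
payload = {
    "doc_id": 102,
    "doc": base64.b64encode(b"<p>Example bill text</p>").decode("ascii"),
}

latest = latest_text_entry(bill)
print(latest["doc_id"])
print(decode_bill_text(payload))
```

Decoding straight to a string only makes sense for plain text or HTML documents; a document delivered in another format, such as PDF, would need a format-specific parser after the base64 step.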
 

We will execute the `legiscan.createBillTextSummary` function (which in turn runs `legiscan.summarizeBillText`) for each bill in each legislative session and collect the results in a new dictionary.

In [46]:
aiBillsSummaries = legiscan.createBillTextSummary(aiBills)


2013-2014 Legislative Session (3 bills)
SB836 (1/3), SB860 (2/3), AB1465 (3/3)

2017-2018 Legislative Session (3 bills)
ACR215 (1/3), AB2662 (2/3), SB1470 (3/3)

2019-2020 Legislative Session (16 bills)
AB594 (1/16), AB1576 (2/16), SB348 (3/16), SJR6 (4/16), AB459 (5/16), AB976 (6/16), SB444 (7/16), SB730 (8/16), AB156 (9/16), AB485 (10/16), SCR13 (11/16), ACR125 (12/16), AB3317 (13/16), SB752 (14/16), AB2269 (15/16), AB3339 (16/16)

2021-2022 Legislative Session (11 bills)
SB1216 (1/11), AB587 (2/11), AB13 (3/11), AB1545 (4/11), SB54 (5/11), AB1400 (6/11), SB1018 (7/11), SR11 (8/11), AB2826 (9/11), AB2224 (10/11), AB1651 (11/11)

2023-2024 Legislative Session (71 bills)
SB1288 (1/71), SB896 (2/71), AB2013 (3/71), SB1047 (4/71), AB2652 (5/71), AB3030 (6/71), AB2355 (7/71), SB893 (8/71), SB1120 (9/71), AB1831 (10/71), AB2885 (11/71), SB892 (12/71), SB1235 (13/71), SB942 (14/71), SB970 (15/71), SB721 (16/71), AB2876 (17/71), SB933 (18/71), AB331 (19/71), AB2811 (20/71), SB313 (21/71), A

Export the bill text summaries to the `data/legis/json` directory as a JSON file for future reference.

In [47]:
# Export the AI bill summaries to a JSON file in the data/legis/json directory
with open(os.path.join(prjDirs["pathDataLegis"], "json", "aiBillsSummaries.json"), "w", encoding="utf-8") as f:
    json.dump(aiBillsSummaries, f, ensure_ascii=False, indent=4)
del f

In [48]:
# Export the AI bill summaries to a JSON file in the data/lookup directory for reference
with open(os.path.join(prjDirs["pathDataLookup"], "aiBillsSummariesStored.json"), "w", encoding="utf-8") as f:
    json.dump(aiBillsSummaries, f, ensure_ascii=False, indent=4)
del f

<div style = 'background-color:indigo'><center><h1 style='font-weight:bold; color:goldenrod; border-top: 2px solid goldenrod; border-bottom: 2px solid goldenrod; padding-top: 5px; padding-bottom: 10px'>End of Script</h1></center></div>

In [None]:
mdt(5)