</center></div><div style = 'background-color:indigo'><center><h1 style='font-size: 50px; font-weight: bold; color:goldenrod; border-top: 3px solid goldenrod; padding-top: 10px'>AI Legislative Policy Analysis (CaLPA-AI)</h1><div style='font-size: 35px; font-weight: bold; color: goldenrod'> Part 2 - Markdown Documents Analysis</div><div style='font-size: 30px; font-weight: bold; color: goldenrod; border-bottom: 3px solid goldenrod; padding-bottom: 20px'>v.1.1 (MIT License), Dr. Kostas Alexandridis, GISP, April 2025</div></center></div>

In [None]:
import notebookHeadings
from notebookHeadings import mdt
mdt(level = 0, prjPart = 2, prjComponent = "AI", prjVersion = "1.1")

This is the Part 2 notebook of the AI California Legislative Policy Analysis (CaLPA-AI) project. The goal of the project is to analyze California legislative bills using natural language processing (NLP) techniques. This notebook loads the stored legislative data and generates a markdown document for each AI-related bill.
The project is divided into several parts, each focusing on a specific aspect of the analysis. The first part will cover the data loading and cleaning process, while subsequent parts will focus on feature extraction, model training, and evaluation.

<h1 style="font-weight:bold; color:orangered; border-bottom: 2px solid orangered">1. Preliminaries</h1>

In [None]:
mdt(1, "1. Preliminaries")

<h2 style="font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px">1.1. Referencing Libraries and Initialization</h2>

In [None]:
mdt(2, "1.1. Referencing Libraries and Initialization")

To reset the kernel, if needed, uncomment and run the following cell:

In [None]:
#%reset

Import the Python libraries required for the project.

In [1]:
# Import required libraries
import os
from dotenv import load_dotenv
import time
from datetime import date
from datetime import datetime
import json
import mimetypes
import glob
import base64
import zipfile
import io
import requests
import pandas as pd

Load the local Python modules containing the classes and functions for the project from the local directory.
- `calpa`: This module contains the main classes and functions for the project, including the `LegiScan` class for the LegiScan API.

In [2]:
# Load the calpa module located in the scripts/python/calpa directory
from calpa import calpa

<h2 style="font-weight:bold; color:dodgerblue; border-bottom: 1px solid dodgerblue; padding-left: 25px">1.2. Project and Workspace Variables</h2>

<h3 style="font-weight:bold; color:lime; padding-left: 50px">Load Environment Variables</h3>

In [None]:
mdt(2, "1.2. Project and Workspace Variables")
mdt(3, "Load Environment Variables")

Define and maintain the project, workspace, and metadata settings. Below, the `dotenv` library loads the environment variables from the `.env` file into the Python environment. These variables configure the project and workspace settings and also include the LegiScan API key, stored in the `LEGISCAN_API_KEY` environment variable, which is used to access the LegiScan API.

In [3]:
# Load environment variables from .env file
load_dotenv()

True
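
Once `load_dotenv()` has run, the values from `.env` are available through `os.environ`. The following is a minimal sketch of reading the key defensively. The helper name `require_env` is hypothetical and not part of `calpa`, and the placeholder value is set only so the snippet runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check the .env file.")
    return value


# Demonstration only: provide a placeholder if the key has not been loaded yet.
os.environ.setdefault("LEGISCAN_API_KEY", "placeholder-key")

api_key = require_env("LEGISCAN_API_KEY")
# Report the length rather than the key itself to avoid leaking it in notebook output.
print(f"LegiScan API key loaded ({len(api_key)} characters)")
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by an API call.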

<h3 style="font-weight:bold; color:lime; padding-left: 50px">Main Class Instantiation</h3>

In [None]:
mdt(3, "Main Class Instantiation")

Instantiate the main class for the project:
- `legiscan`: An instance of the `LegiScan` class, used to access the LegiScan API and retrieve legislative data.

In [4]:
# Instantiate the LegiScan class
legiscan = calpa.LegiScan()

Create the project metadata.

In [5]:
# Create project metadata for the AI project
prjMetadata = calpa.projectMetadata(prjPart=2, prjComponent="AI", prjVersion="1.1")

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 AI Legislative Policy Analysis (CaLPA-AI)
 California Legislative Policy Analysis for Artificial Intelligence Related Bills
 Part 2 - Markdown Documents Analysis
 Version 1.1 (MIT License), Dr. Kostas Alexandridis, GISP
 GitHub Repository: https://github.com/ktalexan/CaLPA
 Last Updated: Apr 29, 2025
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dates: 2010-12-02 through 2025-04-29
Periods: 2009-2010, 2011-2012, 2013-2014, 2015-2016, 2017-2018, 2019-2020, 2021-2022, 2023-2024, 2025-2026


Create the project directories dictionary

In [6]:
# Create the project directories dictionary
prjDirs = calpa.projectDirectories(os.getcwd())

Directory Global Settings:

General:
- Project (pathPrj): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI
- Admin (pathAdmin): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\admin
- Metadata (pathMetadata): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\metadata
- Analysis (pathAnalysis): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\analysis
- Obsidian Vault (pathObsidian): C:\Users\ktale\Knowledge Management\Policy and Governance\Legislation
Scripts:
- Python Calpa Module (pathScriptsCalpa): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\calpa
- Markdown Scripts (pathScriptsMd): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\markdown
- RIS Scripts (pathScriptsRis): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\ris
Data:
- Main Data (pathData): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\data
- Documents (pathDataDocs): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\data\docs
- LegiScan (pathDataLegis): c:\Users\ktale\OneDrive\Documents\GitHub\CaLPA-AI\d

<h3 style="font-weight:bold; color:lime; padding-left: 50px">Load Lookup DataFrames</h3>

In [None]:
mdt(3, "Load Lookup DataFrames")

If needed, you can access the project lookup and dictionary variables. They are embedded in the `codebook` module of the `calpa` package. The codebook module contains the following variables:
1. **LegiScan API Call Dictionaries**: These are codebook dictionaries that map the definitions of the LegiScan API fields for a number of API calls. The dictionaries available are (alphabetically ordered):
   - `dictGetAmendment`: Contains all the fields returned by the LegiScan API for the `getAmendment` call.
   - `dictGetBill`: Contains all the fields returned by the LegiScan API for the `getBill` call.
   - `dictGetBillText`: Contains all the fields returned by the LegiScan API for the `getBillText` call.
   - `dictGetPerson`: Contains all the fields returned by the LegiScan API for the `getPerson` call.
   - `dictGetRollCall`: Contains all the fields returned by the LegiScan API for the `getRollCall` call.
   - `dictGetSessionList`: Contains all the fields returned by the LegiScan API for the `getSessionList` call.
   - `dictGetSupplement`: Contains all the fields returned by the LegiScan API for the `getSupplement` call.
2. **Lookup Variables**: These are lists containing the names or definitions of codes used inside the LegiScan API calls (usually IDs or codes). The lookup variables available are (alphabetically ordered):
   - `lookupBillCode`: Bill code definitions used in the LegiScan API.
   - `lookupBillTextType`: IDs and definitions of the bill text types used in the LegiScan API.
   - `lookupBillType`: IDs and definitions of the bill types used in the LegiScan API.
   - `lookupBodyType`: Definitions of body types used in the LegiScan API.
   - `lookupEventType`: Definitions of event types used in the LegiScan API.
   - `lookupMimeType`: Definitions of mime types used in the LegiScan API.
   - `lookupPartyType`: Definitions of party types used in the LegiScan API.
   - `lookupProgressType`: Definitions of progress types used in the LegiScan API.
   - `lookupReasonType`: Definitions of reason types used in the LegiScan API.
   - `lookupRoleType`: Definitions of role types used in the LegiScan API.
   - `lookupSastType`: Definitions of SAST types used in the LegiScan API.
   - `lookupSponsorType`: Definitions of sponsor types used in the LegiScan API.
   - `lookupStateType`: Definitions of state types used in the LegiScan API (California only).
   - `lookupStatusType`: Definitions of status types used in the LegiScan API.
   - `lookupSupplementType`: Definitions of supplement types used in the LegiScan API.
   - `lookupVoteType`: Definitions of vote types used in the LegiScan API.

To access the codebook variables, reference them through the `codebook` module of the `calpa` package (this assumes that the `calpa` package is imported). Each variable is returned as a dictionary, so individual entries can be accessed by key. For example, to access the `lookupBillCode` variable, you can use the following code:

>```python
># Load the calpa module located in the scripts/python/calpa directory
>from calpa import calpa
>
># Getting the lookup variables from the codebook module directly
>calpa.codebook.lookupBillCode
>
># Assigning the lookup variable to a name stored in the session
>lookupBillCode = calpa.codebook.lookupBillCode
>```


In [7]:
# Codebook lookup variables
codebookLookupVars = [var for var in dir(calpa.codebook) if var.startswith('lookup')]
codebookDictVars = [var for var in dir(calpa.codebook) if var.startswith('dict')]
print(f"Codebook Lookup Variables:\n{codebookLookupVars}\n")
print(f"Codebook Dictionary Variables:\n{codebookDictVars}\n")

Codebook Lookup Variables:
['lookupBillCode', 'lookupBillTextType', 'lookupBillType', 'lookupBodyType', 'lookupEventType', 'lookupMimeType', 'lookupPartyType', 'lookupProgressType', 'lookupReasonType', 'lookupRoleType', 'lookupSastType', 'lookupSponsorType', 'lookupStateType', 'lookupStatusType', 'lookupSupplementType', 'lookupVoteType']

Codebook Dictionary Variables:
['dictGetAmendment', 'dictGetBill', 'dictGetBillText', 'dictGetPerson', 'dictGetRollCall', 'dictGetSessionList', 'dictGetSupplement']
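
The same `dir()` filter can be extended to collect the values as well as the names. The sketch below uses a `SimpleNamespace` stand-in for `calpa.codebook` with made-up entries, since the real contents are not shown in this notebook; `collect_by_prefix` is a hypothetical helper, not part of `calpa`.

```python
from types import SimpleNamespace


def collect_by_prefix(module, prefix: str) -> dict:
    """Map each attribute name starting with `prefix` to its value."""
    return {
        name: getattr(module, name)
        for name in dir(module)
        if name.startswith(prefix)
    }


# Stand-in for calpa.codebook; the entries are illustrative placeholders only.
fake_codebook = SimpleNamespace(
    lookupExampleA={1: "first"},
    lookupExampleB={1: "second"},
    dictExample={"field": "description"},
)

lookups = collect_by_prefix(fake_codebook, "lookup")
print(sorted(lookups))
```

With the real package imported, passing `calpa.codebook` in place of `fake_codebook` should work the same way, assuming the variables are plain module attributes.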



<h3 style="font-weight:bold; color:lime; padding-left: 50px">Load Stored Data</h3>

In [None]:
mdt(3, "Load Stored Data")

Load the stored data from the `data` directory. This includes the following datasets:
- `sessionListStored`: The list of legislative sessions.
- `sessionPeopleStored`: The list of legislative session people.
- `datasetListStored`: The list of legislative datasets.
- `masterListRawStored`: The raw list of legislative master datasets.
- `masterListStored`: The list of legislative master datasets.
- `aiBillListStored`: The list of AI legislative bills.
- `aiBills`: The AI legislative bills data.
- `aiBillsSummariesStored`: The list of AI legislative bill text summaries.

In [8]:
# Obtain the stored sessions list from JSON dictionary on disk (data/lookup directory)
sessionListStored = legiscan.getStoredData(dataType = "session")

# Obtain the stored session People list from JSON dictionary on disk (data/lookup directory)
sessionPeopleStored = legiscan.getStoredData(dataType = "people")

# Obtain the stored dataset list from JSON dictionary on disk (data/lookup directory)
datasetListStored = legiscan.getStoredData(dataType = "dataset")

# Get the stored raw master list from JSON dictionary on disk (data/lookup directory)
masterListRawStored = legiscan.getStoredData(dataType = "master", raw = True)
# Get the stored master list from JSON dictionary on disk (data/lookup directory)
masterListStored = legiscan.getStoredData(dataType = "master", raw = False)

# Get the AI monitoring list from disk (data/lookup directory)
aiBillListStored = legiscan.getStoredData(dataType = "bills", project = "AI")

# Get the AI full list of bills from disk (data/legis/json directory)
aiBills = legiscan.getStoredData(dataType = "data", project = "AI")

# Get the AI bill summaries list from disk (data/lookup directory)
aiBillsSummariesStored = legiscan.getStoredData(dataType = "summaries", project = "AI")

<h1 style="font-weight:bold; color:orangered; border-bottom: 2px solid orangered">2. Markdown Data Processing</h1>

In [None]:
mdt(1, "2. Markdown Data Processing")

The main markdown processing function, `createBillMarkdown()`, is a method of the `LegiScan` class in the `calpa` module. Given a legislative period and a bill ID, it looks up the bill in the bills dictionary (obtained from the LegiScan API) and extracts the relevant information into a markdown file. The file is stored in the `markdown/AI` directory of the project and, optionally, mirrored to the relevant Obsidian vault directory. The file is named after the legislative bill ID and is period-aware: it is saved under the subfolder of the relevant legislative session. The markdown file has the following sections:
1. **YAML properties**: This section contains the YAML properties of the legislative bill. The YAML properties are listed in a formal markdown format.
2. **Markdown Information Section**: This section contains the main markdown content of the legislative bill information in the form of callout information boxes. It has the following subsections:
    - **TL;DR Summary Callout**: Contains the Azure OpenAI TL;DR Summary for the legislative bill.
    - **Bill Metadata Callout**: Contains the detailed metadata of the legislative bill.
    - **Bill Citation Callout**: Formats and displays the APA 7th bibliographic citation of the bill.
3. **AI Notes Section**: This section contains the AI notes for the legislative bill. The AI notes are obtained from the `markdown/notes` directory of the project, imported by the function, and formatted for display in the markdown file.
4. **Embedded State Bill Webpage**: This section contains the embedded state bill webpage. The embedded state bill webpage is obtained from the bill metadata and displayed in an embedded `iframe` object inside the markdown file.
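
The four-part layout above can be sketched as a small rendering function. This is an illustration under assumptions, not the `calpa` implementation: the function names, the dictionary keys (`bill_id`, `title`, `state_url`), the callout labels, and the file layout in `bill_markdown_path` are all hypothetical, and the callouts use Obsidian's `> [!type]` syntax.

```python
from pathlib import Path


def render_bill_markdown(bill: dict, summary: str, citation: str, notes: str) -> str:
    """Assemble a bill note: YAML properties, callouts, AI notes, embedded page."""
    bill_id = bill["bill_id"]
    title = bill["title"]
    url = bill["state_url"]
    parts = [
        # 1. YAML properties
        "---",
        f"bill_id: {bill_id}",
        f'title: "{title}"',
        "---",
        "",
        # 2. Information callouts
        "> [!summary] TL;DR",
        f"> {summary}",
        "",
        "> [!info] Bill Metadata",
        f"> - Bill: {bill_id}",
        f"> - Title: {title}",
        "",
        "> [!quote] Citation",
        f"> {citation}",
        "",
        # 3. AI notes
        "## AI Notes",
        notes,
        "",
        # 4. Embedded state bill webpage
        f'<iframe src="{url}" width="100%" height="600"></iframe>',
        "",
    ]
    return "\n".join(parts)


def bill_markdown_path(root: Path, period: str, bill_id: str) -> Path:
    """Place the file under the session subfolder, named after the bill ID."""
    return root / "markdown" / "AI" / period / f"{bill_id}.md"


# Placeholder values for illustration; only the bill ID and period appear in this notebook.
example = {"bill_id": "SB836", "title": "Example title", "state_url": "https://example.org/bill"}
text = render_bill_markdown(example, "One-line summary.", "Citation text.", "No notes yet.")
print(bill_markdown_path(Path("."), "2013-2014", example["bill_id"]))
print(text)
```

Building the document as a list of lines keeps each section easy to locate and reorder; a template engine would be a reasonable alternative for longer notes.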

The code below processes the markdown files as follows: we loop through each legislative period in our bill dataset and, within it, through each legislative bill. For each bill, we call the `createBillMarkdown()` function, passing the period, the bill ID, the bills dictionary, and the bill summaries dictionary; the function generates the corresponding markdown file.

In [12]:
# Process the markdown files for each legislative session and bill
# Set the total bill count to 0
j = 0
for myPeriod, myBillList in aiBills.items():
    # Loop through each period
    print(f"\n{myPeriod} Legislative Session:")
    # Set the bill count to 1
    i = 1
    # Get the number of bills in the period
    l = len(myBillList)
    for billId, bill in myBillList.items():
        # for each bill in the period, generate the markdown file
        legiscan.createBillMarkdown(
            billPeriod = myPeriod,
            billId = billId,
            prjPart = "AI",
            billsDict = aiBills,
            billsSummariesDict = aiBillsSummariesStored,
            obsidianSync = True
        )
        if i < l:
            # If not the last bill, add a comma and space
            print(f"{billId} ({i}/{l})", end=", ")
        else:
            # If the last bill, add a period
            print(f"{billId} ({i}/{l})")
        i += 1
    j += i - 1
# Print the total number of bills processed across all periods
print(f"\nCompleted processing the bill list (Total: {j} bills)")
del j, i, l, myPeriod, myBillList, billId, bill


2013-2014 Legislative Session:
SB836 (1/3), SB860 (2/3), AB1465 (3/3)

2017-2018 Legislative Session:
ACR215 (1/3), AB2662 (2/3), SB1470 (3/3)

2019-2020 Legislative Session:
AB594 (1/16), AB1576 (2/16), SB348 (3/16), SJR6 (4/16), AB459 (5/16), AB976 (6/16), SB444 (7/16), SB730 (8/16), AB156 (9/16), AB485 (10/16), SCR13 (11/16), ACR125 (12/16), AB3317 (13/16), SB752 (14/16), AB2269 (15/16), AB3339 (16/16)

2021-2022 Legislative Session:
SB1216 (1/11), AB587 (2/11), AB13 (3/11), AB1545 (4/11), SB54 (5/11), AB1400 (6/11), SB1018 (7/11), SR11 (8/11), AB2826 (9/11), AB2224 (10/11), AB1651 (11/11)

2023-2024 Legislative Session:
SB1288 (1/71), SB896 (2/71), AB2013 (3/71), SB1047 (4/71), AB2652 (5/71), AB3030 (6/71), AB2355 (7/71), SB893 (8/71), SB1120 (9/71), AB1831 (10/71), AB2885 (11/71), SB892 (12/71), SB1235 (13/71), SB942 (14/71), SB970 (15/71), SB721 (16/71), AB2876 (17/71), SB933 (18/71), AB331 (19/71), AB2811 (20/71), SB313 (21/71), AB3204 (22/71), AB2905 (23/71), SB1220 (24/71), A
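
The counting and progress printing in the loop above can also be written with `enumerate`, which removes the manual counters. The sketch below is a minimal illustration: `process_bill` is a hypothetical no-op standing in for `legiscan.createBillMarkdown()`, and the sample dictionary holds only the bill IDs of the first two sessions printed above, with empty placeholders for the bill data.

```python
def process_bill(period: str, bill_id: str) -> None:
    """Stand-in for the per-bill markdown step; it does nothing here."""


# Sample taken from the first two sessions in the output above.
bills_by_period = {
    "2013-2014": {"SB836": {}, "SB860": {}, "AB1465": {}},
    "2017-2018": {"ACR215": {}, "AB2662": {}, "SB1470": {}},
}

total = 0
for period, bills in bills_by_period.items():
    print(f"\n{period} Legislative Session:")
    count = len(bills)
    for position, bill_id in enumerate(bills, start=1):
        process_bill(period, bill_id)
        # Separate entries with a comma; end the line after the last bill.
        print(f"{bill_id} ({position}/{count})", end=", " if position < count else "\n")
    total += count

print(f"\nCompleted processing the bill list (Total: {total} bills)")
```

Because the loop variables are no longer shared counters, the closing `del` statement becomes optional in this form.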

<div style = 'background-color:indigo'><center><h1 style='font-weight:bold; color:goldenrod; border-top: 2px solid goldenrod; border-bottom: 2px solid goldenrod; padding-top: 5px; padding-bottom: 10px'>End of Script</h1></center></div>

In [None]:
mdt(5)