## Proof-of-concept demonstration of LLM Q&A over municipal codes

### Overview

#### Common problems with systematic querying, coding, and summarization of laws

1. Time and expense (or limits on what can be done with a fixed headcount and budget)

2. Difficulty tracking and maintaining quality metrics
    
3. Front-line staff (paralegals, junior lawyers, RAs) may want to work at a higher level
    
4. Need to specify queries up front

5. Changes to queries **or** changes to the underlying laws/documents require redoing the work

#### Can large language models & embedding models help?

LLM: _(system prompt, context, query) --> **magic** --> output_

Embedding model: _(text block) --> **magic** --> [.453436, .754757, ...]_
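To make the embedding step concrete: retrieval systems typically rank text blocks by the cosine similarity between the query's vector and each block's vector. The sketch below uses tiny hand-made three-dimensional vectors as stand-ins for real embeddings (which have hundreds or thousands of dimensions); the vectors and block labels are illustrative assumptions, not output from any model.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; a real system would get these from an embedding model.
blocks = {
    "drug paraphernalia section": [0.9, 0.1, 0.0],
    "food handling section": [0.1, 0.8, 0.3],
    "noise ordinance section": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

# Rank blocks from most to least similar to the query.
ranked = sorted(blocks, key=lambda name: cosine_similarity(query_vector, blocks[name]), reverse=True)
print(ranked[0])  # → drug paraphernalia section
```

The top-ranked blocks are what a RAG system would place into the LLM's context.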

1. Naive approach: present the query directly to the LLM (or a chatbot interface), relying on parametric memory of its training data

2. (Probably) naive approach #2: fine-tune the model on specialized data (might be helpful in conjunction with the other ideas below)

3. Naive approach #3: place hand-selected, relevant sections of the local code into the LLM context window

4. Use an off-the-shelf Retrieval-Augmented Generation (RAG) system to automate finding potentially relevant context

5. Make RAG better:

    a. Custom document parser based on the structure and semantics of the code

    b. Collect related definitions, second-order info, etc. during parsing and include them in the context for final generation

    c. Use an LLM on the retrieved outputs for reranking and filtering

    d. For final generation, ask the LLM to show its steps (and other prompt engineering tricks)
    
    e. Provide reliable, retrieved citations and excerpts (explainability & verifiability)
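Item 5a can start simply. Municipal code text like the Title 7 excerpt in § 2 marks each section with a numbered heading line (for example `7-24-091 Possession or delivery.`), so a parser can split on those headings and keep each section whole, with its number attached, instead of cutting at arbitrary lengths. A minimal sketch, assuming headings follow that title-chapter-section pattern at the start of a line; the regex, the function name `split_into_sections`, and the second sample section are hypothetical placeholders, and real code text will likely need more patterns.

```python
import re

# Assumption: a section heading starts a line with a number like "7-24-091",
# then spaces/tabs, then a capitalized caption (as in the § 2 excerpt).
SECTION_HEADING = re.compile(r"^(\d+-\d+-\d{3}(?:\.\d+)?)[ \t]+([A-Z].*)$", re.MULTILINE)

def split_into_sections(code_text: str) -> list[dict]:
    """Split raw code text into one chunk per section, keeping its number and caption."""
    matches = list(SECTION_HEADING.finditer(code_text))
    sections = []
    for i, match in enumerate(matches):
        # Each section runs from its heading to the next heading (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(code_text)
        sections.append({
            "section": match.group(1),
            "caption": match.group(2).strip(),
            "text": code_text[match.start():end].strip(),
        })
    return sections

# Sample text; the 7-24-092 caption and body are placeholders, not the real code text.
sample = """ARTICLE II. DRUG PARAPHERNALIA* (7-24-091 et seq.)

7-24-091 Possession or delivery.
   1.   Except as authorized by law, any person who delivers ...

7-24-092 Example caption.
   (placeholder body text)
"""
for s in split_into_sections(sample):
    print(s["section"], "|", s["caption"])
```

Each resulting dict could become one retrievable node, with the section number kept as metadata for citations (item 5e).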

#### Some issues

1. Tech infrastructure

2. Access to ordinances

3. Inconsistent formats + layouts

4. New tech / knowledge transfer

#### Discussion

_What would be helpful to do next?_

### § 0: Dependencies, data setup, and API authentication

In [1]:
%pip install -q -U openai
%pip install -q -U llama-index

Note: you may need to restart the kernel to use updated packages.


In [1]:
import requests, os

# URL points to a copy of Title 7 of the Municipal Code of Chicago (Health and Safety).
# Original source: https://codelibrary.amlegal.com/codes/chicago/latest/chicago_il/0-0-0-2640851
# Downloaded as docx and converted to text using 'textutil -convert txt' on Mac
# (which yields a cleaner file than directly downloading the text version from the website).
# Saved to my Google Drive 'Public' folder, with URL modified to allow direct download.

CHICAGO_CODE_URL = "https://drive.google.com/uc?export=download&id=18GMl2jxKtRYIzWNEHy1zrdXqhSeLeKQE"
LOCAL_PATH = '../data/poc/chicago_code.txt'

dir_path = os.path.dirname(LOCAL_PATH)

# Check the same path the file is written to and read from.
if not os.path.exists(LOCAL_PATH):
    print("Downloading Title 7 of the Chicago Municipal Code...")

    download = requests.get(CHICAGO_CODE_URL, timeout=60)
    download.raise_for_status()  # fail loudly instead of saving an HTTP error page
    chicago_code = download.text

    os.makedirs(dir_path, exist_ok=True)
    with open(LOCAL_PATH, 'w', encoding='utf-8') as f:
        f.write(chicago_code)
else:
    print("Title 7 of the Chicago Municipal Code already downloaded.")
    with open(LOCAL_PATH, 'r', encoding='utf-8') as f:
        chicago_code = f.read()


Title 7 of the Chicago Municipal Code already downloaded.


In [2]:
# The following environment variable should be set:
# OPENAI_API_KEY: API key for OpenAI GPT-3.5 and GPT-4

helpmsg = """
If running in a Google Colab notebook, you can set this environment variable
using 'secrets' (see https://colab.research.google.com/notebooks/secrets.ipynb).
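If running locally, you can set it in your shell or in a .env file
loaded with python-dotenv (see https://pypi.org/project/python-dotenv/).
"""

# Optionally load variables from a local .env file, if python-dotenv is installed.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

openai_api_key = os.getenv('OPENAI_API_KEY')
assert openai_api_key is not None, "OPENAI_API_KEY environment variable not set.\n" + helpmsg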

### § 1: Relying Solely on Parametric Memory – Just Ask the LLM

In [3]:
from openai import OpenAI

openai_client = OpenAI(api_key=openai_api_key)

def chat(prompt, model):
    """Send a single user message to an OpenAI chat model and return the reply text."""
    chat_completion = openai_client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model=model,
    )
    return chat_completion.choices[0].message.content

def gpt35(prompt):
    return chat(prompt, model="gpt-3.5-turbo")

def gpt4(prompt):
    return chat(prompt, model="gpt-4")

llms = {"GPT-3.5 Turbo": gpt35, "GPT-4": gpt4}

In [4]:
from textwrap import fill

prompt = """
Does the Municipal Code of Chicago contain restrictions on drug paraphernalia?
What section of the code contains these restrictions, if any?
What are the penalties for violating any such restrictions?
"""

for name, llm in llms.items():
    print(f"==> Testing {name}'s parametric knowledge of the Chicago Municipal Code...\n")
    response = llm(prompt)
    for line in response.splitlines():
        print(fill(line, width=80))
    print()

==> Testing GPT-3.5 Turbo's parametric knowledge of the Chicago Municipal Code...

Yes, the Municipal Code of Chicago contains restrictions on drug paraphernalia.
These restrictions can be found in Section 4-244 of the code.

The penalties for violating these restrictions can include fines of up to $500
for each offense. Additionally, individuals found in violation of these
restrictions may also face criminal charges and potential imprisonment.

==> Testing GPT-4's parametric knowledge of the Chicago Municipal Code...

Yes, the Municipal Code of Chicago does contain restrictions on drug
paraphernalia.

These offenses are addressed in Section 7-24 of the Municipal Code of Chicago.
According to Section 7-24-225 of the Code, it is unlawful for any person to use,
or to possess with intent to use, drug paraphernalia to plant, propagate,
cultivate, grow, harvest, manufacture, compound, convert, produce, process,
prepare, test, analyze, pack, repack, store, contain, conceal, inject, ingest,
i

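Both models show the problems with relying on parametric memory for legal questions: hallucination, vagueness, unreliable citations, etc. For example, GPT-3.5 Turbo points to "Section 4-244", whereas the drug paraphernalia article in Title 7 is Article II (§ 7-24-091 et seq.), excerpted in § 2 below.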
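One cheap safeguard against unreliable citations is to check that every section number a model cites actually appears in the source text. A minimal sketch, assuming citations look like Chicago-style numbers (`7-24`, `7-24-091`); the function name `unverified_citations` and the short stand-in source are hypothetical. This only catches outright fabrications: a number that does exist can still be cited for the wrong proposition.

```python
import re

# Assumption: cited numbers look like Chicago-style chapter/section numbers,
# e.g. "7-24", "7-24-091", or "4-244".
CITED_NUMBER = re.compile(r"\b\d+-\d+(?:-\d+)?\b")

def unverified_citations(answer: str, source_text: str) -> list[str]:
    """Return the cited numbers in `answer` that never appear in `source_text`."""
    missing = []
    for cited in sorted(set(CITED_NUMBER.findall(answer))):
        # Whole-token match: not preceded by a digit or hyphen, not followed by a digit,
        # so a chapter number like "7-24" still matches inside "7-24-091".
        pattern = r"(?<![\d-])" + re.escape(cited) + r"(?!\d)"
        if re.search(pattern, source_text) is None:
            missing.append(cited)
    return missing

# Stand-in source text; in the notebook this would be `chicago_code` from § 0.
source = "7-24-091 Possession or delivery.\n7-24-094 ..."
answer = "These restrictions can be found in Section 4-244 of the code, and in 7-24-091."
print(unverified_citations(answer, source))  # → ['4-244']
```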

### § 2: In-Context Learning (Manually Supplying Relevant Context)

In [5]:
# pull out the relevant section of the code (markers searched by hand)
start = chicago_code.find("ARTICLE II. DRUG PARAPHERNALIA")
stop = chicago_code.find("ARTICLE II-A. CONTROLLED SUBSTANCES")
assert start != -1 and stop > start, "Article markers not found; check the code text."
article_text = chicago_code[start:stop]
for line in article_text.splitlines():
    print(fill(line, width=80))

ARTICLE II. DRUG PARAPHERNALIA* (7-24-091 et seq.)
* Editor's note – Coun. J. 5-12-99, p. 3327, renumbered Articles II – VI as
Articles III – VIII to accommodate inclusion of §§ 7-24-091 – 7-24-097, as Art.
II.

7-24-091 Possession or delivery.
   1.   Except as authorized by law, any person who delivers, furnishes,
transfers, or possesses with intent to deliver, furnish or transfer, drug
paraphernalia, as defined in Section 720 ILCS 600/2, subparagraph (d)(5) of the
Drug Paraphernalia Control Act, and including glass tubing designed and utilized
for the ingestion of crack or cocaine, knowing, or under circumstances where one
reasonably should know, that such drug paraphernalia or glass tubing will be
used to plant, propagate, cultivate, sow, harvest, compound, convert, produce,
process, prepare, test, analyze, pack, repack, store, contain, conceal, inject,
ingest, inhale, or otherwise introduce into the human body a controlled
substance in violation of the Illinois Controlled Substanc

In [6]:
prompt = f"""
Here is some context -- an Article from Title 7 of the Chicago Municipal Code:

---
{article_text}
---

Based on the context provided, please answer the following questions.
In each case, explain your reasoning step-by-step, and provide specific
references to the relevant section of the code where applicable.

1. Does the Municipal Code of Chicago contain provisions relating to drug paraphernalia?
2. What are the restrictions on drug paraphernalia, if any?
3. Are any such restrictions targeted specifically to minors?
4. What are the penalties for violating any such restrictions?

Responses:
"""

responses = {}
for name, llm in llms.items():
    print(f"==> Testing {name}'s in-context learning...\n")
    response = llm(prompt)
    responses[name] = response
    for line in response.splitlines():
        print(fill(line, width=80))
    print()

==> Testing GPT-3.5 Turbo's in-context learning...

1. Yes, the Municipal Code of Chicago does contain provisions relating to drug
paraphernalia. This can be found in the article titled "ARTICLE II. DRUG
PARAPHERNALIA" within Title 7 of the Chicago Municipal Code.

2. The restrictions on drug paraphernalia include possession, delivery,
manufacture, accomplice liability, delivery to persons under 18 years of age on
school grounds, and license revocation. These restrictions are outlined in
sections 7-24-091 through 7-24-095 of the Municipal Code.

3. Yes, there are restrictions specifically targeted to minors. Section 7-24-094
prohibits any person 18 years of age or over from possessing a hypodermic needle
on school grounds with the intent to deliver it to a person under 18 years of
age for the purpose of injecting controlled substances into the human body.

4. The penalties for violating the restrictions on drug paraphernalia in the
Chicago Municipal Code include fines and imprisonment.
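Supplying context by hand only works while the excerpt, the questions, and the answer all fit in the model's context window. A rough pre-check can flag oversize prompts before calling the API. The sketch below uses the common rule of thumb of about four characters per token for English text, which is only an approximation (a tokenizer such as OpenAI's `tiktoken` gives exact counts); the function names, the answer reserve, and the example limit are assumptions to adjust for the model in use.

```python
import math

def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate tokens with a characters-per-token rule of thumb (approximate, not exact)."""
    return math.ceil(len(text) / chars_per_token)

def fits_in_context(prompt: str, context_limit: int, reserved_for_answer: int = 1000) -> bool:
    """True if the estimated prompt size leaves `reserved_for_answer` tokens free."""
    return rough_token_count(prompt) + reserved_for_answer <= context_limit

# Hypothetical usage: in the notebook the prompt would contain `article_text` from above;
# 8192 is only an example limit, so check the documentation for the model you use.
article_excerpt = "7-24-091 Possession or delivery. " * 200
print(rough_token_count(article_excerpt), fits_in_context(article_excerpt, context_limit=8192))
```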

### § 3: Evaluation of model responses

In [8]:
questions = [
    "1. Does the Municipal Code of Chicago contain provisions relating to drug paraphernalia?",
    "2. What are the restrictions on drug paraphernalia, if any?",
    "3. Are any such restrictions targeted specifically to minors?",
    "4. What are the penalties for violating any such restrictions?",
]

reference_response = responses["GPT-4"]

control_response = """
1. Yes, the Municipal Code of Chicago does contain provisions relating to drug paraphernalia, according to Title 7.
2. The restrictions on drug paraphernalia include possession and use.
3. Yes, there are restrictions targeted specifically to minors. Reference: Illinois law.
4. I am the egg-man. They are the egg-men. I am the walrus. Goo goo g'joob.
"""

In [9]:
import re

def evaluate_response(question, response, reference, llm=gpt4):
    prompt = f"""
    A language model was presented with the following question:
    ---
    {question}
    ---

    The language model's response is in the following block, numbered the same as the question:
    ---
    {response}
    ---
    
    This block contains the reference answer, again numbered the same as the question:
    ---
    {reference}
    ---

    Please evaluate the relevant part of the model response compared to the reference answer,
    for the specific numbered question, and return a single numerical score corresponding to the item
    below that best matches your evaluation:
    1. The model response is an accurate and complete answer to the question, based on the reference answer.
    2. The model response is mostly accurate and complete, but contains some inaccuracies or omissions.
    3. The model response is somewhat accurate and complete, but contains several inaccuracies or omissions.
    4. The model response is not accurate or complete, but contains some relevant information.
    5. The model response is not accurate or complete, and contains no relevant information.

    Your evaluation score:
    """

    feedback = llm(prompt) or ""

    # Take the first standalone digit 1-5, since the grader may wrap the score in words.
    match = re.search(r"\b([1-5])\b", feedback)
    return int(match.group(1)) if match else None


In [10]:
for name, llm in llms.items():
    print(f"\n==> Testing {name}'s in-context learning against reference answers...\n")
    response = responses[name]
    # Pass the full reference answer each time; the grader locates the numbered item.
    for question in questions:
        feedback = evaluate_response(question, response, reference_response)
        if feedback is not None:
            print(f"Feedback for {question}: {feedback}")

print("\n==> Testing control (bad answers) against reference answers...\n")
for question in questions:
    feedback = evaluate_response(question, control_response, reference_response)
    if feedback is not None:
        print(f"Feedback for {question}: {feedback}")


==> Testing GPT-3.5 Turbo's in-context learning against reference answers...

Feedback for 1. Does the Municipal Code of Chicago contain provisions relating to drug paraphernalia?: 1
Feedback for 2. What are the restrictions on drug paraphernalia, if any?: 1
Feedback for 3. Are any such restrictions targeted specifically to minors?: 1
Feedback for 4. What are the penalties for violating any such restrictions?: 1

==> Testing GPT-4's in-context learning against reference answers...

Feedback for 1. Does the Municipal Code of Chicago contain provisions relating to drug paraphernalia?: 1
Feedback for 2. What are the restrictions on drug paraphernalia, if any?: 1
Feedback for 3. Are any such restrictions targeted specifically to minors?: 1
Feedback for 4. What are the penalties for violating any such restrictions?: 1

==> Testing control (bad answers) against reference answers...

Feedback for 1. Does the Municipal Code of Chicago contain provisions relating to drug paraphernalia?: 1
Feed

### § 4: Finding relevant context automatically (basic RAG)

In this section, we use a basic Retrieval-Augmented Generation (RAG) setup, with default text segmentation settings, to automatically find the relevant sections of the code and place them into the LLM context.

The model is GPT-3.5 Turbo.
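"Default text segmentation" means the document is cut into chunks before embedding. The sketch below shows the simplest version of the idea: fixed-size chunks that overlap slightly, so text straddling a boundary appears in both neighbors. It counts words and uses arbitrary sizes; LlamaIndex's default splitter counts tokens and tries to respect sentence boundaries, so treat this as an illustration of the concept, not its implementation. The function name `chunk_words` is hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

text = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(text, chunk_size=4, overlap=1):
    print(chunk)
```

Because the cut points ignore the code's structure, one section can be split across chunks or merged with pieces of its neighbors, which is the motivation for the structure-aware parser in item 5a of the overview.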

In [11]:
from llama_index.llms.openai import OpenAI as LlamaIndexOpenAI
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader, get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

# Use GPT-3.5 Turbo for answer synthesis (aliased import avoids shadowing openai.OpenAI above).
Settings.llm = LlamaIndexOpenAI(model="gpt-3.5-turbo")

documents = SimpleDirectoryReader(dir_path).load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=5,
)
response_synthesizer = get_response_synthesizer()

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    # Optionally drop retrieved chunks whose similarity score is below 0.7:
    # node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/171 [00:00<?, ?it/s]
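`similarity_top_k=5` keeps the five highest-scoring chunks, and the commented-out `SimilarityPostprocessor(similarity_cutoff=0.7)` would additionally drop any of those scoring below 0.7. The plain-Python sketch below mirrors that two-step selection on hypothetical (score, chunk) pairs; it does not call LlamaIndex, and the function name `select_context` and the scores are made up for illustration.

```python
from typing import List, Optional, Tuple

def select_context(
    scored_chunks: List[Tuple[float, str]],
    top_k: int = 5,
    cutoff: Optional[float] = None,
) -> List[Tuple[float, str]]:
    """Keep the top_k highest-scoring chunks, then drop any scoring below `cutoff`."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)[:top_k]
    if cutoff is not None:
        ranked = [pair for pair in ranked if pair[0] >= cutoff]
    return ranked

# Hypothetical similarity scores for six retrieved chunks.
scored = [(0.82, "7-24-091 ..."), (0.64, "chunk B"), (0.91, "7-24-094 ..."),
          (0.71, "chunk D"), (0.55, "chunk E"), (0.69, "chunk F")]
print(select_context(scored, top_k=5))
print(select_context(scored, top_k=5, cutoff=0.7))
```

A cutoff trades recall for precision: fewer weakly related chunks reach the prompt, but a relevant chunk with a middling score can be lost.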

In [12]:
r = query_engine.query(
    "The question focuses on provisions in the Municipal Code of Chicago relating to drug paraphernalia.\n"
    + questions[1]
    + "\nPlease explain your reasoning step-by-step, and provide specific "
    "references to the relevant section of the code where applicable."
)

In [13]:
for question in questions:
    r = query_engine.query(
        question
        + "\nPlease explain your reasoning step-by-step, and provide specific "
        "references to the relevant section of the code where applicable."
    )
    print(f"\n==> Results for question: {question}")
    for line in r.response.splitlines():
        print(fill(line, width=80))


==> Results for question: 1. Does the Municipal Code of Chicago contain provisions relating to drug paraphernalia?
Yes, the Municipal Code of Chicago contains provisions relating to drug
paraphernalia. The relevant sections are 7-24-091 to 7-24-097 in Article II.
Section 7-24-091 prohibits the possession or delivery of drug paraphernalia,
defining the specific actions that are considered unlawful. Section 7-24-092
addresses the manufacture of drug paraphernalia, outlining the penalties for
those involved in its production. Section 7-24-093 discusses accomplice
liability concerning drug paraphernalia, holding individuals accountable for
providing such items to others. Section 7-24-094 pertains to the delivery of
drug paraphernalia to persons under 18 years of age on school grounds. Lastly,
Section 7-24-095 states that violations of these provisions by holders of
business or liquor licenses may lead to license revocation.

==> Results for question: 2. What are the restrictions on drug p
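Item 5e in the overview calls for reliable, retrieved citations. LlamaIndex query responses carry the retrieved chunks in `response.source_nodes`, each with a similarity `score` and the underlying `node`; printing them next to the answer lets a reader check the excerpts the answer was based on. A minimal sketch, assuming that attribute layout; the helper name `format_sources` is hypothetical, and the demo uses `SimpleNamespace` stand-ins so it runs without an index.

```python
from types import SimpleNamespace

def format_sources(response, max_chars: int = 150) -> str:
    """List each retrieved chunk's score and an excerpt, for checking an answer against its sources."""
    lines = []
    for i, source in enumerate(response.source_nodes, start=1):
        score = "n/a" if source.score is None else f"{source.score:.3f}"
        # Collapse whitespace so multi-line code text prints as one excerpt line.
        excerpt = " ".join(source.node.get_content().split())[:max_chars]
        lines.append(f"[{i}] score={score}: {excerpt}")
    return "\n".join(lines)

# Stand-in objects shaped like a LlamaIndex response, for illustration only.
fake_node = SimpleNamespace(
    get_content=lambda: "7-24-091 Possession or delivery.\n   1.   Except as authorized by law ..."
)
fake_response = SimpleNamespace(source_nodes=[SimpleNamespace(score=0.8123, node=fake_node)])
print(format_sources(fake_response))
```

With the query engine above, `print(format_sources(r))` after a query would list the excerpts behind `r.response`.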