# Code for "Getting in the Door: Streamlining Intake in Civil Legal Services with Large Language Models"

This notebook contains the code for the paper "Getting in the Door: Streamlining Intake in Civil Legal Services with Large Language Models", available at:
[https://arxiv.org/abs/2410.03762](https://arxiv.org/abs/2410.03762)

It loads a set of scenarios (from scenarios.tsv) and the intake rules for three jurisdictions (intake_J1.txt, intake_J2.txt, intake_J3.txt), then asks one or more LLMs to predict whether each scenario qualifies under each jurisdiction's intake rules. The results are exported to results.tsv for further analysis.

To run this project, you need to provide your own LLM API keys. To do so, copy .env-example to .env and add the keys for the models you want to use.
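
Before starting a long run, it can help to confirm that the keys you need are actually set. The following is a minimal sketch, not part of the paper's code: the helper `missing_keys` is hypothetical, and it assumes you use the OpenAI and/or Anthropic providers, whose keys LiteLLM reads from `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`. The variable names listed in your .env-example may differ.

```python
import os

# Environment variables LiteLLM reads for two common providers.
# Assumption: extend this mapping for any other provider you use.
KEY_NAMES = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def missing_keys(providers, env=os.environ):
    """Return the names of the key variables that are unset or empty."""
    return [KEY_NAMES[p] for p in providers if not env.get(KEY_NAMES[p])]

# Example: report missing keys after load_dotenv() has run.
for name in missing_keys(["openai"]):
    print(f"Missing API key: {name}")
```

Passing `env` explicitly makes the check easy to try without touching your real environment.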

If you use the code, please cite us as:
```
@misc{steenhuis2024gettingdoorstreamliningintake,
      title={Getting in the Door: Streamlining Intake in Civil Legal Services with Large Language Models}, 
      author={Quinten Steenhuis and Hannes Westermann},
      year={2024},
      eprint={2410.03762},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2410.03762}, 
}
```




# Utility Functions

In [None]:
# Load the API keys from the .env file into environment variables
from dotenv import load_dotenv
load_dotenv()

In [2]:
#Utility functions to call an LLM model

import litellm
from litellm import completion
import json
import re
import time

def do_the_call(messages, model):
    # Request JSON mode for every model except gpt-4-0613, which is called without response_format
    if model != "gpt-4-0613":
        response = completion(
                model=model,
                temperature=0,
                messages=messages,
                response_format={ "type": "json_object" },
            )
    else:
        response = completion(
                model=model,
                temperature=0,
                messages=messages,
            )
    # Print the full response for debugging when the model returns no content
    if response.choices[0].message.content is None:
        print(response)

    return response.choices[0].message.content

def llm_call(system, users, model):
    print(f"Calling LLM {model}")
    messages = [{"role": "system", "content": system}]
    for user in users:
        messages.append({"role": "user", "content": user})
    # Try up to 3 times; report a failure only after every attempt has failed
    for attempt in range(3):
        try:
            stri = do_the_call(messages, model)
            return json.loads(stri)
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            time.sleep(1)  # short pause before retrying
    return {
        "qualifies": "failed",
        "narrative": "LLM failed to return response"
    }
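
When a model is called without JSON mode, it may wrap the JSON object in extra text, which makes `json.loads` fail. The sketch below shows one possible way to recover the object using `re`. The helper `extract_json` is hypothetical and is not used in the paper's experiments; it simply takes everything from the first `{` to the last `}` and tries to parse it.

```python
import json
import re

def extract_json(text):
    """Return a dict parsed from text, or None if no JSON object can be recovered."""
    if not isinstance(text, str):
        return None
    # First try the whole string, then the span from the first "{" to the last "}".
    candidates = [text]
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        candidates.append(match.group(0))
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None
```

A caller could use this in place of `json.loads` and treat a `None` result as a failed attempt.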

# Load and Clean the Data

In [9]:
import pandas as pd

# Load the TSV file
file_path = 'scenarios.tsv'
df = pd.read_csv(file_path, sep='\t')

In [None]:
# Drop the first row
df_cleaned = df.drop(0).reset_index(drop=True)

# Extract the necessary columns and convert them into a list of dictionaries
data_objects = []
for _, row in df_cleaned.iterrows():
    obj = {
        "ID": int(row['ID']),
        "topic": row['topic'],
        "description": row['description'],
        "outcomes": [row['J1 outcome'], row['J2 outcome'], row['J3 outcome']]
    }
    data_objects.append(obj)

print (len(data_objects))

print (data_objects)
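
Because predictions are later compared to these outcomes by exact string match, a stray label or empty cell in the TSV would silently count as an error. The following sketch checks the loaded outcomes. It assumes the outcome columns use the labels "Accept", "Deny", and "Question" (the same labels the predictions are mapped to); the helper `find_unexpected_outcomes` is hypothetical.

```python
# Assumption: the TSV outcome columns use exactly these labels.
EXPECTED_LABELS = {"Accept", "Deny", "Question"}

def find_unexpected_outcomes(data_objects):
    """Return (scenario ID, jurisdiction index, value) for every unexpected outcome."""
    problems = []
    for obj in data_objects:
        for idx, outcome in enumerate(obj["outcomes"]):
            # Ignore surrounding whitespace on string labels; anything else is checked as-is.
            label = outcome.strip() if isinstance(outcome, str) else outcome
            if label not in EXPECTED_LABELS:
                problems.append((obj["ID"], idx, outcome))
    return problems
```

Calling `find_unexpected_outcomes(data_objects)` after loading should return an empty list if every cell holds one of the expected labels.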


In [None]:
intake_rules = []
for num in range(1, 4):
    with open(f'intake_J{num}.txt') as f:
        intake_rules.append(f.read())

intake_rules

# Prompt

In [12]:
system_prompt = '''
Based on the qualification criteria,
assess whether the user meets at least the *minimum* criteria to get a full intake.

Do not decide if getting legal help is wise or if the user can try other steps first unless it is specifically mentioned in the intake criteria.
Rely only on the information about in/out criteria that are provided.

You can have one of 3 possible responses:

1. The user qualifies (true)
2. The user does not qualify (false)
3. You need more information to determine if the user qualifies (null)

Use direct address. Provide a qualified answer in the narrative, without guaranteeing the user will be accepted as a client.

Use simple words in your questions and answers. Do not use legal jargon even if the original rules do. Aim for a 6th grade reading level.

Answer in the form of a JSON object like this:

Example:
{"qualifies": null,
"narrative": "Did you (follow-up question)?"} # if more information is needed

Example:
{"qualifies": true,
"narrative": "You probably qualify because (reason)"}

Example:
{"qualifies": false,
"narrative": "You probably do not qualify because (reason)"}


'''

def get_prompts(criteria, situation):
    criteria_prompt = f"The only criteria you will rely on in your answer are as follows: \n```{ criteria }\n```. Do not add any other requirements."
    situation_prompt = f"The situation of the user is as follows: \n```{ situation }\n```"
    return system_prompt, [criteria_prompt, situation_prompt]
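
The prompt asks for a JSON object with a `qualifies` value of true, false, or null and a `narrative` string, but a model can still return something else (for example the string "true", or a missing key). The sketch below shows one way to map a parsed response to the outcome labels while tolerating such cases. The helper `normalize_response` and the choice to accept string forms are assumptions for illustration, not part of the paper's method.

```python
# String spellings that some models return instead of JSON literals (assumption).
STRING_FORMS = {"true": True, "false": False, "null": None}

def normalize_response(response):
    """Map a parsed LLM response to (label, narrative); the label is "Failed" if malformed."""
    if not isinstance(response, dict) or "qualifies" not in response:
        return "Failed", ""
    qualifies = response["qualifies"]
    if isinstance(qualifies, str):
        qualifies = STRING_FORMS.get(qualifies.strip().lower(), "failed")
    # Identity checks avoid treating 1 or 0 as True or False.
    if qualifies is True:
        label = "Accept"
    elif qualifies is False:
        label = "Deny"
    elif qualifies is None:
        label = "Question"
    else:
        label = "Failed"
    narrative = response.get("narrative")
    if not isinstance(narrative, str):
        narrative = ""
    return label, narrative
```

Note that accepting string forms is more lenient than an exact match on JSON literals, so it could change how borderline responses are scored.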

# Run the experiments

In [None]:
import json
# Let LiteLLM drop request parameters that a given provider does not support
litellm.drop_params = True

jurisdictions = ["Eastern Missouri (St Louis)", "Western Missouri (Central office, Kansas City)", "Mid-Missouri"]
Outcome_mapping = {
    True: "Accept",
    False: "Deny",
    None: "Question",
    "failed": "Failed"
}

#Add models here to test multiple models.
models = ["gpt-4o-2024-08-06"]
results = []

for model in models:
    for i, jurisdiction in enumerate(jurisdictions):
        for data_object in data_objects:
            system, users = get_prompts(intake_rules[i], data_object['description'])
            response_object = llm_call(system, users, model)
            print (response_object)
            print (data_object['outcomes'][i])
            mapped_response = Outcome_mapping[response_object['qualifies']]
            results.append({
                "Description_id": data_object['ID'],
                "Description": data_object['description'],
                "Jurisdiction_id": i,
                "Jurisdiction": jurisdiction,
                "Topic": data_object['topic'],
                "Predicted": mapped_response,
                "Target": data_object['outcomes'][i],
                "Narrative": response_object['narrative'],
                "correct": mapped_response == data_object['outcomes'][i],
                "Model": model
            })


# Export to TSV for further analysis

In [14]:
#Export the results to a TSV file
results_df = pd.DataFrame(results)
results_df.to_csv('results.tsv', sep='\t', index=False)
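
As a starting point for the further analysis, the sketch below computes accuracy per group (for example per jurisdiction or per model) from the exported rows. It uses only the standard library so it can run outside the notebook. The helpers `load_rows` and `accuracy_by` are hypothetical, and the sample rows are made up purely to show the call; they are not results from the paper.

```python
import csv
from collections import defaultdict

def load_rows(path="results.tsv"):
    """Read the exported TSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))

def accuracy_by(rows, key):
    """Return {group: share of rows where Predicted equals Target}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row["Predicted"] == row["Target"]:
            hits[row[key]] += 1
    return {group: hits[group] / totals[group] for group in totals}

# Made-up rows for illustration only; use load_rows() on a real results.tsv.
sample_rows = [
    {"Jurisdiction": "A", "Predicted": "Accept", "Target": "Accept"},
    {"Jurisdiction": "A", "Predicted": "Deny", "Target": "Accept"},
    {"Jurisdiction": "B", "Predicted": "Question", "Target": "Question"},
]
print(accuracy_by(sample_rows, "Jurisdiction"))
```

Comparing `Predicted` to `Target` directly avoids relying on the `correct` column, which is read back from the TSV as text rather than as a boolean.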