## AI x Democracy - Hackathon Sprint

This notebook was co-authored by [@jeremy-dolan](https://github.com/jeremy-dolan) and [@sampatt](https://github.com/sampatt) as part of Apart Research's [AI x Democracy hackathon](https://www.apartresearch.com/event/ai-democracy). Our group's overarching project investigated the vulnerability of U.S. federal agencies' public comment systems to AI-driven manipulation. This notebook demonstrates the automated generation of arbitrarily many diverse, fake public comments from fictitious persons responding to proposed federal rules and regulations (as posted to [regulations.gov](https://www.regulations.gov/)).

Further information on our team's project: https://www.apartresearch.com/project/artificial-advocates-biasing-democratic-feedback-using-ai


In [None]:
!pip install pandas
!pip install numpy
!pip install replicate
!pip install faker

In [2]:
import pandas as pd
import numpy as np
import io
import replicate
import random
import textwrap
from faker import Faker

In [53]:
# Load human dataset and display a random comment

# (Google Colab version)
# from google.colab import files
# uploads = files.upload()
# df = pd.read_csv(uploads['comments.csv'])

# comments from Docket ED-2023-OPE-0123, Document ID ED-2023-OPE-0123-26398
# https://www.regulations.gov/document/ED-2023-OPE-0123-26398
df = pd.read_csv('datasets/human-written/full-comments-dataset.csv')

random_comment = df.sample(n=1)

first_name = random_comment['First Name'].values[0]
last_name = random_comment['Last Name'].values[0]
city = random_comment['City'].values[0]
state_province = random_comment['State/Province'].values[0]
comment = random_comment['Comment'].values[0]

formatted_comment = f"{first_name} {last_name}, from {city}, {state_province}, said:\n\n\"{comment}\""
print(formatted_comment)

Sam Ciraulo, from Troy, NY, said:

"As one who took out student loans as both an undergraduate and graduate student and who worked two jobs to pay off my loans (while raising a family) I am opposed to the proposed regulatory changes being put forth by the administration.   It not only rewards  borrowers for having shirked their obligation to pay (and don’t tell me that it is too difficult to get out from under the debt.  I guarantee that every single one of the borrowers who will benefit from this change has a $1000 cell phone, or owns a car that is less than three or four years old.)

Additionally, I paid my loans off… why should my tax dollars pay off someone else’s loan?  Does not make sense.  

Finally, this is nothing more than pandering for votes by the Biden administration and it frankly it disgusts me!

This rule change is simply bad and should not be implemented"
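
The preview cell above assumes every row has a name, city, and state. As a minimal sketch, assuming some rows in the dataset may leave those fields blank (the source does not say either way), a formatter could skip missing values instead of printing `nan`. The helper names `is_missing` and `format_comment` and the `"Anonymous"` placeholder are hypothetical and not part of the original notebook.

```python
import math


def is_missing(value) -> bool:
    """Treat None, NaN, and blank strings as missing (assumed conventions)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def format_comment(row) -> str:
    """Build the 'Name, from City, State, said:' preview, skipping missing fields.

    `row` can be a dict or a pandas Series keyed by the dataset's column names.
    """
    name = " ".join(
        str(row[key]) for key in ("First Name", "Last Name")
        if not is_missing(row.get(key))
    )
    place = ", ".join(
        str(row[key]) for key in ("City", "State/Province")
        if not is_missing(row.get(key))
    )
    header = name or "Anonymous"  # hypothetical placeholder for unnamed commenters
    if place:
        header += f", from {place}"
    return f'{header}, said:\n\n"{row["Comment"]}"'
```

With the DataFrame loaded above, `print(format_comment(df.sample(n=1).iloc[0]))` would produce the same kind of preview as the original cell.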


In [24]:
# Export 5 random comments from the dataset

# Select sample size
random_comments = df.sample(n=5)

# Select columns to retain
columns_to_retain = ['First Name', 'Last Name', 'City', 'State/Province', 'Comment', 'Category']
filtered_comments = random_comments[columns_to_retain]

# Create a new CSV from these filtered comments
filtered_comments.to_csv('comments_random_filtered.csv', index=False)

# Preview comments
for index, row in filtered_comments.iterrows():
    formatted_comment = f"{index}: {row['First Name']} {row['Last Name']}, from {row['City']}, {row['State/Province']}, said:\n\n\"{row['Comment']}\"\n"
    print(formatted_comment)

714: Todd Simmons, from Tucson, AZ, said:

"This is over complicated 

Instead of amounts you should do dates like if a student was enrolled from 2005-2018 then are immediately eligible for a 50% loan reduction. 

Also im not seeing a needed fix in providing disability. If your going to have tbe disability forgiveness program you need to understand that certain social security situations don't require a "review" period especially for veterans with a 100% total and permanent. I know people who submitted their awards letters who have been disabled for years and couldn't qualify. That's just not right. "

188: Malesha Padgett, from Daytona Beach, FL, said:

"There is no way people should be forced to pay a loan or put food on the table.  America needs to be ashamed of how they treat their own citizens.  People went into debt trying to provide a better life for their families.  Now they can barely pay their bills, live paycheck to paycheck, and get no help from our government.  Loans that 

In [18]:
# Interactively load Replicate API token
from getpass import getpass
import os

REPLICATE_API_TOKEN = getpass()
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
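
When the notebook is re-run in an environment where `REPLICATE_API_TOKEN` is already exported, prompting every time is unnecessary. The following is a small sketch, not the authors' code, that checks the environment first and only falls back to `getpass`. The function name `load_token` and its `env` and `prompt_fn` parameters are hypothetical, added so the logic can be exercised without a terminal.

```python
import os
from getpass import getpass


def load_token(env=os.environ, prompt_fn=getpass) -> str:
    """Return the Replicate token from `env`, prompting only if it is absent."""
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        # Fall back to an interactive prompt that hides the typed input
        token = prompt_fn("Replicate API token: ").strip()
    if not token:
        raise ValueError("No Replicate API token provided")
    return token
```

A cell could then use `os.environ["REPLICATE_API_TOKEN"] = load_token()` in place of the unconditional prompt.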

In [44]:
# Stream a simple one-shot completion to test API access
new_prompt = (
    f"Write a two-sentence comment in {random.choice(['support of', 'opposition to'])} a proposed regulation to forgive student loan debt. "
    f"Create a new alias and location, and use the following comment as an example:\n\n<example>\n{formatted_comment}\n</example>\n"
    "Only respond with the comment, no other explanation. Comment:\n\n"
)

for event in replicate.stream(
    "meta/meta-llama-3-70b-instruct",
    input={
        "prompt": new_prompt,
        "max_tokens": 200
    },
):
    print(str(event), end="")

Ava Morales, from Fresno, CA, said:

"I strongly disagree with this proposal to forgive student loan debt. It's unfair to burden hardworking taxpayers, many of whom had to forgo higher education due to financial constraints, with the responsibility of paying off the debts of others who chose to take out loans they couldn't afford to repay."

In [20]:
# TODO: automatically scrape executive summary and generate alternative short summary

# verbatim 'Executive Summary' from the Proposed Rule document
# https://www.regulations.gov/document/ED-2023-OPE-0123-26398

# Adjacent string literals are concatenated; a single "..." literal cannot span lines
executive_summary = (
    "Since 1980, the total cost to receive a four-year postsecondary credential has nearly tripled, even after accounting for inflation. Pell Grants once covered nearly 80 percent of the cost of a four-year public college degree for students from low- and middle-income families, but now they only cover a third of those costs. This price growth has dramatically increased the need for students to secure student loans, particularly Federal student loans from the Department, to cover their educational costs. The gap between prices and income means that many students from low- and middle-income families have to borrow Federal student loans in addition to grants and out-of-pocket spending so they can earn a postsecondary credential. These trends have resulted in cumulative Federal loan debt of $1.6 trillion and rising for more than 43 million borrowers, which has placed a significant financial burden upon middle-income borrowers and has had an even more devastating impact on vulnerable low-income borrowers.\n\n"
    "After convening the Student Loan Debt Relief negotiated rulemaking committee (Committee) and reaching consensus on various issues discussed in this NPRM, the Department proposes regulations, in accordance with the Secretary's authority to waive repayment of a loan provided by section 432(a) of the HEA, to provide debt relief targeted to address certain specific circumstances as part of a comprehensive effort to address the burden of Federal student loan debt. "
    "The proposed regulations would modify the Department's existing debt collection regulations to provide greater specificity regarding the Secretary's discretion to waive Federal student loan debt and specify the Secretary's authority to waive all or part of any debts owed to the Department based on a number of different circumstances, such as growth in a borrower's loan balance beyond what was owed upon entering repayment, the amount of time since the loan first entered repayment, whether the borrower meets certain criteria for loan forgiveness or discharge under existing authority, and whether a loan was obtained to attend an institution or program that was subject to secretarial actions, that closed prior to secretarial actions, or was associated with closed Gainful Employment programs with high debt-to-earnings rates or low median earnings."
)

# two sentence summary of above generated by GPT4-turbo
short_summary = (
    "The policy addresses the sharp increase in the cost of four-year postsecondary education, "
    "which has outpaced the coverage of Pell Grants and significantly increased reliance on "
    "Federal student loans, leading to a cumulative debt of $1.6 trillion among over 43 million "
    "borrowers. In response, the Department of Education proposes regulations to provide targeted "
    "debt relief by specifying the Secretary's authority to waive portions of Federal student "
    "loan debt under various conditions, aiming to alleviate the financial burden on borrowers, "
    "especially those from vulnerable low-income backgrounds."
)

print(len(executive_summary), len(short_summary))

2331 597


In [30]:
# Fake persona generator

fake = Faker()

def gen_persona() -> dict:
    """Roll a random name, age, and location, and generate a backstory for this fictitious person"""
    fake_first_name = fake.first_name()
    fake_last_name = fake.last_name()
    fake_city = fake.city()           # NB: Faker() returns fictional places, and never large, well-known cities. This biases
    fake_state = fake.state()         #     the biographies. Better would be to sample real cities proportional to population.
    fake_age = random.randint(18, 80)
    prompt = (
        f"You are assisting an author create backgrounds for minor characters in a novel. Nothing too unusual, "
        f"these are just typical folks. Write a one paragraph backstory for {fake_first_name} {fake_last_name}, "
        f"{fake_age} from {fake_city}, {fake_state}.\nBegin your response with 'Background:'\nBackground: "
        )

    fake_biography = ''.join(replicate.run(
        "meta/meta-llama-3-70b-instruct",
        input={"prompt": prompt, "max_tokens": 500, "temperature": 1.0 }))

    return {
        'first_name': fake_first_name,
        'last_name': fake_last_name,
        'city': fake_city,
        'state': fake_state,
        'age': fake_age,
        'biography': fake_biography
    }

In [34]:
# test gen_persona()
persona = gen_persona()
print(f'Generated {persona["first_name"]} {persona["last_name"]}, {persona["age"]}, from {persona["city"]}, {persona["state"]}')
print(textwrap.fill(persona['biography'], width=80))

Generated Jonathan Hawkins, 65, from New Nicholas, Oregon
Background: Jonathan Hawkins, a 65-year-old retiree from New Nicholas, Oregon,
spent most of his working life as a high school history teacher, instilling a
love of the past in generations of students. Born and raised in New Nicholas,
Jonathan's family roots run deep in the small town, where his great-grandfather
was a pioneer settler. After marrying his high school sweetheart, Susan,
Jonathan settled into a comfortable routine, teaching by day and coaching the
school's debate team by night. The couple had two children, both now grown with
families of their own, and Jonathan's proudest moments are still the times his
kids brought home their own history projects, proudly displaying their father's
influence. After Susan's passing five years ago, Jonathan found solace in his
garden, tending to the same roses and vegetables he and his wife once nurtured
together. He remains a respected figure in the community, often sought out for
h
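
The preview above passes the whole biography to a single `textwrap.fill` call, which treats its input as one paragraph and turns any newlines into spaces. If a generated biography contains line breaks, a per-line wrap keeps them, which is the approach the main loop uses when printing generated comments. This is a minimal sketch of that pattern as a reusable helper. The name `wrap_paragraphs` is hypothetical.

```python
import textwrap


def wrap_paragraphs(text: str, width: int = 80) -> str:
    """Wrap each line of `text` separately so existing line breaks survive."""
    return "\n".join(textwrap.fill(line, width=width) for line in text.split("\n"))
```

For example, `print(wrap_paragraphs(persona['biography']))` would wrap long lines at 80 columns while leaving blank lines between paragraphs in place.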

In [43]:
# Main loop - GENERATE COMMENT ON POLICY

# List to store all outputs from the model
all_outputs = []

for _ in range(1): # number of comments to generate
    # Select three random human comments
    random_human_comments = df.sample(n=3)

    # Concatenate into pseudo-XML string
    formatted_comments = '\n'.join(
        f'<comment>\n{row.Comment}\n</comment>\n'
        for row in random_human_comments.itertuples()
    )

    # make a new fake persona
    persona = gen_persona()

    # randomize prompt variables:
    #temperature = random.randint (6, 12) * 0.1   # temperature range 0.6-1.2
    temperature = 0.8
    stance = random.choice(["support of", "opposition to"])
    tone = random.choice(['informal', 'formal', 'analytic', 'fervent', 'diplomatic']) # there's additional potential here. 'sarcastic' is fun but a little over the top.
    length = random.choice(["2 sentences", "1 paragraph", "2 paragraphs"])
    summary_provided = random.choice([
        ['none',      ""],
        ['short',     f"\nHere is a summary of the proposed rule:\n<summary>\n{short_summary}\n</summary>\n"],
        ['executive', f"\nHere is a summary of the proposed rule:\n<summary>\n{executive_summary}\n</summary>\n"],
    ])

    # Assemble the prompt to generate a comment on the proposed rule
    prompt = (f"""\
{persona['biography']}

You are an expert writing assistant tasked with writing a public comment on behalf of {persona['first_name']} {persona['last_name']}.
You should only mention information from {persona['first_name']}'s background if it is directly pertinent to the comment.
The public comment is in regards to a proposed regulation to forgive student loan debt.
The comment should be in {stance} the proposal.
The comment should be {tone} and about {length} long.
{summary_provided[1]}
Use the following comments as examples:
{formatted_comments}
Please respond with only the comment, no other preamble or explanation. Do not start any sentences with the word 'As'. Do not use the word 'firsthand'.
Comment: """)

    print(f'temperature={temperature}, stance={stance}, tone={tone}, length={length}\n')
    print(prompt)

    comment = ''.join(replicate.run(
        "meta/meta-llama-3-70b-instruct",
        input={"prompt": prompt, "max_tokens": 500, "temperature": temperature }))

    all_outputs.append({
        'first_name': persona['first_name'],
        'last_name': persona['last_name'],
        'city': persona['city'],
        'state': persona['state'],
        'response': comment,
        'bio': persona['biography']  # gen_persona() returns the key 'biography', not 'bio'
        })

    print("\n".join(textwrap.fill(line, width=80) for line in comment.split('\n')))
    print('\n------\n\n')

# Convert list of outputs to DataFrame
output_df = pd.DataFrame(all_outputs)

# Save the DataFrame to CSV
output_df.to_csv('llm_outputs_character_gen-run3.csv', index=False)

temperature=0.8, stance=support of, tone=analytic, length=2 paragraphs

Background: James Kent grew up in Lake Victorland, Ohio, where his family had lived for generations. He was the youngest of five siblings, and his parents owned a small hardware store on Main Street. After high school, James worked at the store alongside his parents, eventually taking over the business when they retired. He married his high school sweetheart, Susan, and they had two children, Emily and Ryan. James was an active member of the community, coaching Little League and serving on the local Chamber of Commerce. After Susan passed away suddenly in her early 50s, James continued to run the hardware store, but his heart wasn't in it. He sold the business a few years ago and now spends his days puttering around his garden, volunteering at the local animal shelter, and spoiling his grandkids rotten. Despite the struggles he's faced, James remains a kind and optimistic soul, always willing to lend a helping hand