---
title: Easily Build Customized LLMs
date: "2025-07-15"
image: llm_photo.png
categories: [Python, OPENAI, LLMs, Jupyter, OBBBA]
---

## Introduction

One use case for localized LLMs is helping digest new bills, policies, and/or court decisions quickly and efficiently. Journalists frequently receive brand-new material before it becomes public and cannot store the document on any server.

In this example, we add a complex, detailed document: the summary of the “One Big Beautiful Bill Act,” which some might call the “One Big Bad Bill Act.” Whatever you call it, we’ll refer to it as the OBBBA. The model does a pretty good job, and you can see how the way you structure your query changes the response. Some prompts work better than others.

For best results, I have found that adding other analyses improves the responses and context. Here, though, we use only the OBBBA summary and text from the government website, both to demonstrate how this approach can be used and to test whether it is comprehensive enough for journalists.

This example uses one of OpenAI's smaller, less expensive models for the queries. Since the subject matter is so narrow, we do not need a big model, just one large enough to respond well to our queries.

Adding more PDFs will produce richer, more comprehensive output, and refining your queries or prompts will also improve the results.
 

### Python Packages Required

```python
# Install missing packages with pip in a terminal first
# pip install python-dotenv
from dotenv import load_dotenv
# pip install duckdb
import duckdb
# pip install llama-index-core
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
import os  # standard library, no install needed
# pip install openai
import openai
import textwrap  # standard library, no install needed
# pip install llama-index-vector-stores-duckdb
import llama_index.vector_stores.duckdb
# pip install llama-index-embeddings-openai
from llama_index.embeddings.openai import OpenAIEmbedding
# pip install llama-index-llms-openai
from llama_index.llms.openai import OpenAI
# pip install gradio
import gradio as gr
```
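If one of the imports above fails, it can help to see every missing package at once rather than one error at a time. The sketch below uses only the standard library's `importlib.util.find_spec`; the `missing_modules` helper and the `REQUIRED` list are illustrative additions, not part of the original notebook.

```python
import importlib.util


def missing_modules(module_names):
    """Return the import names from `module_names` that cannot be found.

    find_spec() imports the parent package of a dotted name, so a missing
    parent raises ImportError (ModuleNotFoundError); count that as missing.
    """
    missing = []
    for name in module_names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except (ImportError, ValueError):
            missing.append(name)
    return missing


# Import names used in the cell above (not pip package names)
REQUIRED = [
    "dotenv",
    "duckdb",
    "openai",
    "gradio",
    "llama_index.core",
    "llama_index.vector_stores.duckdb",
    "llama_index.embeddings.openai",
    "llama_index.llms.openai",
]

print("Missing:", missing_modules(REQUIRED) or "none")
```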

The notebook builds a new vector store every time it runs, so we delete the old DuckDB file before storing the new vectors.

```python
file_path = "persist/my_vector_store.duckdb"

# Delete the old vector store, if any, so each run starts fresh
if os.path.exists(file_path):
    os.remove(file_path)
    print("File deleted successfully")
else:
    print("File doesn't exist - first run - it's all good")
```

```
File deleted successfully
```
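The same reset can be written with `pathlib`, where `unlink(missing_ok=True)` (Python 3.8+) removes the need to check for the file first. This is a sketch; `reset_vector_store` is a hypothetical helper name, and the default path simply mirrors the one used above.

```python
from pathlib import Path


def reset_vector_store(file_path="persist/my_vector_store.duckdb"):
    """Delete an old DuckDB vector store file, if one exists.

    Returns True if a file was deleted, False if there was nothing to delete.
    """
    path = Path(file_path)
    existed = path.exists()
    # missing_ok=True means no FileNotFoundError on the first run
    path.unlink(missing_ok=True)
    return existed


if reset_vector_store():
    print("File deleted successfully")
else:
    print("File doesn't exist - first run - it's all good")
```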


Next we load the OpenAI API key and set up the environment so that LlamaIndex, an open-source data framework (a separate project from Meta's Llama models), can use OpenAI's models for indexing and querying.

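```python
# Load OPENAI_API_KEY from secrets/.env into the environment
# (load_dotenv and os were imported in the first cell)
load_dotenv(dotenv_path="secrets/.env")

api_key = os.getenv("OPENAI_API_KEY")

# LlamaIndex's OpenAI classes read OPENAI_API_KEY from the environment.
# A direct OpenAI client is optional; using openai.OpenAI instead of
# `from openai import OpenAI` avoids shadowing LlamaIndex's OpenAI class.
client = openai.OpenAI(api_key=api_key)
```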
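`load_dotenv` reads `KEY=VALUE` lines from the file into `os.environ`, and LlamaIndex's OpenAI classes then pick up `OPENAI_API_KEY` from there. To show the idea, here is a deliberately simplified parser. It is not python-dotenv's implementation and skips features such as variable expansion and multi-line values; `parse_env_text` and `load_env_file` are hypothetical names.

```python
import os


def parse_env_text(text):
    """Parse simple KEY=VALUE lines (a simplified stand-in for python-dotenv)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        # Allow an optional leading "export "
        if line.startswith("export "):
            line = line[len("export "):].strip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        value = value.strip()
        # Strip matching surrounding quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path):
    """Copy parsed values into os.environ without overwriting existing ones,
    which matches load_dotenv's default (override=False)."""
    with open(path, encoding="utf-8") as f:
        for key, value in parse_env_text(f.read()).items():
            os.environ.setdefault(key, value)
```

For example, `load_env_file("secrets/.env")` followed by `os.getenv("OPENAI_API_KEY")` behaves like the cell above for a simple one-line `.env` file.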

Here we import the classes needed to store the index in DuckDB, then load the PDFs from the OBBBA folder.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.duckdb import DuckDBVectorStore

# Vectors are stored in persist/my_vector_store.duckdb
vector_store = DuckDBVectorStore("my_vector_store.duckdb", persist_dir="persist/")

# Load the documents in the OBBBA folder (replace with your own path)
documents = SimpleDirectoryReader("/Users/Eileen/Desktop/GoData/Blog/posts/LLM_Demo/OBBBA/").load_data()
```
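`SimpleDirectoryReader` reads the files it finds in that folder (hidden files are excluded by default), so a stray file can end up in the index. Listing the folder first is a cheap sanity check. This `pathlib` sketch uses a hypothetical `list_documents` helper; LlamaIndex's own `required_exts=[".pdf"]` argument to `SimpleDirectoryReader` is another way to restrict loading to PDFs.

```python
from pathlib import Path


def list_documents(folder, extensions=(".pdf",)):
    """Return sorted file names in `folder` whose suffix is in `extensions`.

    Hidden files (names starting with ".") are skipped, and the suffix check
    is case-insensitive, so "Bill.PDF" counts as a PDF.
    """
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and not p.name.startswith(".") and p.suffix.lower() in wanted
    )
```

For example, `print(list_documents("/Users/Eileen/Desktop/GoData/Blog/posts/LLM_Demo/OBBBA/"))` should show exactly the two PDFs you expect to index.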

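Here `storage_context` tells the index to use the DuckDB vector store defined above. `VectorStoreIndex.from_documents` then splits the PDFs into chunks, embeds each chunk with OpenAI's embedding model, and writes the vectors to that store.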

```python
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```
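LlamaIndex's default splitter respects sentence boundaries and measures chunk size in tokens. The word-based sketch below is a simplified stand-in that only illustrates fixed-size, overlapping chunks; `chunk_words`, `chunk_size`, and `overlap` are hypothetical names and values, not LlamaIndex settings.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of `chunk_size` words that overlap by `overlap` words.

    The overlap keeps text that straddles a boundary visible in both chunks,
    so a provision is less likely to be cut in half at retrieval time.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Larger overlaps repeat more text across chunks (more storage and embedding cost) in exchange for fewer provisions split across a boundary.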

## Deployment 

This is a basic interface so that you can query anything about the PDF(s) you stored. In this case we stored two documents from [https://www.congress.gov/](https://www.congress.gov/):

[https://www.congress.gov/bill/119th-congress/house-bill/1/text](https://www.congress.gov/bill/119th-congress/house-bill/1/text)

[https://www.congress.gov/bill/119th-congress/house-bill/1](https://www.congress.gov/bill/119th-congress/house-bill/1)

We printed them to PDFs and stored them under the OBBBA directory.

Some responses will not be what you expect, and some may be inaccurate or incomplete. Responses are limited in length, so narrow your questions as much as possible and ask for section and provision numbers so you can verify the answers. The prompts below have been tested and can be verified against the documents above. The prompt area is live, so have fun and dig in.
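Because answers can be wrong or incomplete, one quick check is to pull every section number a response cites and confirm each one appears in the bill text. The sketch below assumes citations look like "Sec. 12345" or "Section 12345"; the pattern, `cited_sections`, and `missing_from_source` are assumptions for illustration, and the test text is invented.

```python
import re

# Matches "Sec. 123", "SEC. 123", "Section 123", "section 123(a)" and captures
# only the number. Assumes numeric section identifiers.
SECTION_PATTERN = re.compile(r"\b(?:sec\.|section)\s*(\d+)", re.IGNORECASE)


def cited_sections(text):
    """Return the unique section numbers cited in `text`, in first-seen order."""
    seen = []
    for number in SECTION_PATTERN.findall(text):
        if number not in seen:
            seen.append(number)
    return seen


def missing_from_source(response_text, source_text):
    """Return section numbers cited in the response that never appear in the source."""
    source_sections = set(cited_sections(source_text))
    return [n for n in cited_sections(response_text) if n not in source_sections]
```

Anything returned by `missing_from_source(str(response), bill_text)`, where `bill_text` is the extracted text of the bill, deserves a manual look before it goes into a story.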



**Suggested prompts for potential stories:**

- What is OBBBA?
- How are Medicaid programs affected? List provision or section number for verification. Output in list.
- How are SNAP programs affected? List provision or section number for verification. Output list.
- What are all the environmental reductions, rescinds, and rescissions? List provisions and section number for verification. Output to list.
- What are all the environmental increases and/or establishment appropriations (not rescissions)? List provisions and section number for verification. Output to list.
- What are all the welfare reductions and rescissions? List provisions and section number for verification. Output to list.
- What are all the law enforcement reductions and rescissions? List provisions and section number for verification. Output to list.
- What are all the law enforcement increases and/or appropriations? List provisions and section number for verification. Output to list.
- What are all the welfare appropriations and increases? List provisions and section number for verification. Output to list.
- What is the DOJ fund named “BIDEN”? What is it used for? List by item. What provision is it under? How much is being appropriated for it?
- How is FEMA being appropriated? List the provision or section number along with description for verification. Is FEMA still going to be used for disaster recovery?
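Most of these prompts end with the same verification instruction. A tiny helper keeps that wording consistent as you try new topics; `build_prompt` and `VERIFY_SUFFIX` are illustrative names, and the suffix copies the wording used in the prompts above.

```python
VERIFY_SUFFIX = "List provisions and section number for verification. Output to list."


def build_prompt(question, verify=True):
    """Ensure the question ends with punctuation, then append the verification instruction."""
    question = question.strip()
    if not question.endswith(("?", ".")):
        question += "?"
    return f"{question} {VERIFY_SUFFIX}" if verify else question


topics = [
    "What are all the welfare reductions and rescissions",
    "What are all the law enforcement increases and/or appropriations",
]
for topic in topics:
    print(build_prompt(topic))
```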


```python
import gradio as gr

# Build the query engine once instead of on every request
query_engine = index.as_query_engine()

def answer_query(query):
    """Send the question to the indexed documents and return the answer as text."""
    response = query_engine.query(query)
    return str(response)

gr.Interface(
    fn=answer_query,
    inputs=gr.Textbox(lines=1, placeholder="Enter your query here...", label="Your Query"),
    outputs=gr.Textbox(label="Response"),
).launch(share=False, pwa=True)
```

```
* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.
```
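Each question typed into the interface goes through roughly three steps inside `index.as_query_engine()`: embed the question, retrieve the most similar chunks from the vector store, and hand those chunks plus the question to the LLM. The sketch below imitates those steps offline with a toy word-count "embedding" and cosine similarity instead of OpenAI embeddings; `embed`, `retrieve`, `build_context_prompt`, and the prompt wording are all hypothetical. It also shows why narrow questions help: only the top `k` chunks ever reach the model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]


def build_context_prompt(question, chunks, k=2):
    """Assemble what the LLM would see: retrieved context, then the question."""
    context = "\n---\n".join(retrieve(question, chunks, k))
    return f"Context:\n{context}\n\nAnswer using only the context above.\nQuestion: {question}"
```

In the real notebook, the corresponding knob is the number of retrieved chunks, which can be raised with `index.as_query_engine(similarity_top_k=5)` when answers seem to miss provisions.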


