---
title: Easily Build Customized LLMs
date: "2025-07-15"
image: llm_photo.png
categories: [Python, OPENAI, LLMs, Jupyter, OBBBA]
---


## Introduction

Whether you call it the One Big Beautiful Bill or the One Big Bad Bill, this LLM demonstrates how to query the bill that is now law. How many undiscovered nuggets are buried in this bill is anyone's guess, and in this case an LLM is a useful way to query it. It does a pretty good job, and you can see that how you structure your query makes a difference in the response. Some prompts will work better than others. We list some prompts below to get you started.

One of the use cases for localized LLMs is to help digest new bills, policies, and/or court decisions quickly and efficiently. Journalists frequently encounter brand new material before it becomes public and cannot store the document on any server.

For best results, we have found that adding other analysis improves the responses and context. Here, though, we use only the OBBBA summary from the government website, both to demonstrate how this can be used and to test whether it would be comprehensive enough for journalists to use.

It's important to check that the responses are accurate. The two source documents are listed below. Including the section number in the query makes verifying the information easier, since you can just look it up.

This demo uses a small model from OpenAI, one of the less expensive models for running queries. Since the subject matter is so narrow, we do not need a big model, just one big enough to be fairly responsive to our queries.

Adding more PDFs will give you richer and more comprehensive output. Working on your queries or prompts will also improve the results.
 

### Python Packages Required

In [22]:
# Install any missing packages with pip in a terminal
#pip install python-dotenv
from dotenv import load_dotenv
#pip install duckdb
import duckdb
#pip install llama-index-core
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
import os #built-in module, no install needed
#pip install openai
import openai
import textwrap #built-in module, no install needed
#pip install llama-index-vector-stores-duckdb
import llama_index.vector_stores.duckdb
#pip install llama-index-embeddings-openai
from llama_index.embeddings.openai import OpenAIEmbedding
#pip install llama-index-llms-openai
from llama_index.llms.openai import OpenAI
#pip install gradio
import gradio as gr

The notebook creates a new vector store every time it runs, so we delete any existing store file before building the new one.

In [23]:
file_path = 'persist/my_vector_store.duckdb'

# Check if file exists
if os.path.exists(file_path):
  #Delete the file
  os.remove(file_path)
  print("File deleted successfully")
else:
  print("File doesn't exist - first run - it's all good")

File deleted successfully
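
The same cleanup can be written more compactly with `pathlib`. This is only a sketch of an alternative, not the code used above: the helper name `reset_vector_store` is made up for illustration, and it assumes Python 3.8 or later for `missing_ok`.

```python
from pathlib import Path


def reset_vector_store(path: str) -> bool:
    """Delete the store file at `path` if it exists.

    Returns True when a file was removed and False on a first run,
    when there was nothing to delete.
    """
    store = Path(path)
    existed = store.exists()
    # missing_ok=True makes this a no-op when the file is absent
    store.unlink(missing_ok=True)
    return existed


# Example call, using the same path as the cell above:
# reset_vector_store("persist/my_vector_store.duckdb")
```

The return value lets you print the same "deleted" or "first run" message if you want it.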


Next we load the OpenAI API key and set up the environment so that we can use OpenAI's models through LlamaIndex, an open-source indexing framework. Despite the similar name, LlamaIndex is an independent project and is not part of Meta's Llama models.

In [24]:
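from dotenv import load_dotenv
#load_dotenv()
# Point to the secrets/.env file
load_dotenv(dotenv_path="secrets/.env")

# Read the key that load_dotenv placed in the environment
api_key = os.getenv('OPENAI_API_KEY')

# Note: this import reuses the name OpenAI, replacing the llama_index class imported earlier
from openai import OpenAI
client = OpenAI(api_key=api_key)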
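
If the key is missing, `os.getenv` quietly returns `None` and the failure only shows up later, when the first request is made. A small guard can surface the problem earlier. The sketch below is an optional addition, not part of the original notebook, and the helper name `require_env` is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to secrets/.env or export it in the shell"
        )
    return value


# Example call, after load_dotenv(dotenv_path="secrets/.env") has run:
# api_key = require_env("OPENAI_API_KEY")
```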

Here we import the LlamaIndex classes needed to store the index in DuckDB, create the vector store, and load the PDFs from the OBBBA directory.

In [25]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.duckdb import DuckDBVectorStore
from llama_index.core import StorageContext


vector_store = DuckDBVectorStore("my_vector_store.duckdb", persist_dir="persist/")
documents = SimpleDirectoryReader(input_dir="OBBBA/").load_data()
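
Before loading, it can help to confirm which PDFs are actually sitting in the input directory, since a missing or misnamed file will silently shrink what the index knows. The following is a minimal sketch using only the standard library. The helper name `list_input_files` is hypothetical, and it only looks at the top level of the folder for `.pdf` files, which may differ from everything the reader itself picks up.

```python
from pathlib import Path


def list_input_files(input_dir: str, suffix: str = ".pdf") -> list[str]:
    """Return the sorted names of files in `input_dir` that end with `suffix`."""
    folder = Path(input_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Input directory not found: {input_dir}")
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


# Example call, using the directory from the cell above:
# print(list_input_files("OBBBA/"))
```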

The storage_context tells LlamaIndex to use the DuckDB vector store defined above. Building the index then embeds the loaded documents and writes the resulting vectors to that store.

In [26]:
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
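
Conceptually, a vector index keeps one embedding per chunk of text and, at query time, returns the chunks whose embeddings are closest to the embedding of the question. The toy sketch below illustrates that idea with made-up three-dimensional vectors and placeholder chunk labels. It is not how LlamaIndex or DuckDB implement retrieval, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical chunks with made-up embeddings (illustration only)
chunks = [
    ("chunk A", [0.9, 0.1, 0.0]),
    ("chunk B", [0.1, 0.9, 0.0]),
    ("chunk C", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]  # made-up embedding of a question

print(top_k(query_vec, chunks, k=1))
```

The retrieved chunks are what the model sees as context, which is why the wording of a query changes the answer: a different query lands near different chunks.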

## Deployment 

This is a basic interface so that you can query anything about the PDF(s) you stored. In this case we stored two documents from https://www.congress.gov/: 

[https://www.congress.gov/bill/119th-congress/house-bill/1/text](https://www.congress.gov/bill/119th-congress/house-bill/1/text)  

[https://www.congress.gov/bill/119th-congress/house-bill/1](https://www.congress.gov/bill/119th-congress/house-bill/1)  

We printed them to PDFs and stored them under the OBBBA directory. 

Some responses will not be what you expect, and some may even be inaccurate or incomplete. This is the nature of LLMs, and it is good to see their limitations. The prompts below have been tested and provide pretty accurate information from what I can see. The prompt area is live, so have fun while the funding lasts.

**Useful Prompts:**

**"What is the summarization of OBBBA?"**

**"What are the rescissions, modifications and extensions of the programs and/or organizations listed in the OBBBA? List by appearance with section number and title description. Output in list"**

**"What are the SNAP and Medicaid of the programs and/or organizations listed in the OBBBA? List by appearance with section number and title description. Output in list"**

**"What is the DOJ fund named "BIDEN." What is it used for? List by item. What provision is it under?"**
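
Several of these prompts share a pattern: a topic, followed by a request to list results in order with section number and title. If you want to try many topics, a small template keeps the wording consistent. This is a sketch only. The helper name `build_section_prompt` is hypothetical, and the generated wording is a variation on the prompts above, not one that has been tested against the bill.

```python
def build_section_prompt(topic: str, document: str = "OBBBA") -> str:
    """Build a query that asks for section numbers, which makes answers easier to verify."""
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must not be empty")
    return (
        f"What are the {topic} listed in the {document}? "
        "List by appearance with section number and title description. "
        "Output in list."
    )


print(build_section_prompt("rescissions, modifications and extensions of programs"))
```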


In [29]:
import gradio as gr

# Default Gradio theme; the commented line below is a blue alternative
theme = gr.themes.Default()
#theme = gr.themes.Default(primary_hue="blue", font=["Helvetica", "sans-serif"])

def greet(query):
    # Build a query engine on the index and return the answer as plain text
    query_engine = index.as_query_engine()
    response = query_engine.query(query)
    return str(response)

gr.Interface(
    fn=greet,
    inputs=gr.Textbox(lines=1, placeholder="Enter your query here...", label="Your Query"),
    outputs=gr.Textbox(label="Response"),
).launch(share=True, pwa=True)

* Running on local URL:  http://127.0.0.1:7864
* Running on public URL: https://7b2e0b9c17dd2f97fb.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
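
Since responses need to be checked against the source, it can help to show which passages an answer was drawn from. The sketch below assumes the object returned by `query_engine.query` exposes a `source_nodes` list whose items carry `metadata`, `score`, and `get_content()`, and that PDF metadata includes `file_name` and `page_label`. Those are assumptions about the installed LlamaIndex version, so the code falls back to placeholders when a field is absent. The helper name `format_sources` is hypothetical.

```python
import textwrap


def format_sources(response, width: int = 80, preview_chars: int = 200) -> str:
    """Summarize the retrieved passages behind a response as readable text."""
    lines = []
    nodes = getattr(response, "source_nodes", None) or []
    for i, item in enumerate(nodes, start=1):
        meta = getattr(item, "metadata", None) or {}
        name = meta.get("file_name", "unknown file")
        page = meta.get("page_label", "?")
        score = getattr(item, "score", None)
        score_text = f"{score:.3f}" if score is not None else "n/a"
        # Collapse whitespace and keep only a short preview of the passage
        preview = " ".join(item.get_content().split())[:preview_chars]
        lines.append(f"[{i}] {name}, page {page}, score {score_text}")
        lines.append(
            textwrap.fill(
                preview, width=width, initial_indent="    ", subsequent_indent="    "
            )
        )
    return "\n".join(lines) if lines else "No source passages were returned."


# Possible use inside greet(), in place of `return str(response)`:
# return str(response) + "\n\nSources:\n" + format_sources(response)
```

With the file name and page in hand, a claim in the response can be looked up directly in the PDF.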


