# Lesson 6: APIs to use AI models


In this lesson, you will learn how to use the OpenAI API. You'll also see how the `print_llm_response` and `get_llm_response` functions you have been using work to pass your prompt to the OpenAI API and retrieve the response.

As always, you'll start by loading some functions you need:

In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI

In [35]:
# Get the OpenAI API key from the .env file
load_dotenv('.env', override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
client = OpenAI(api_key = openai_api_key)

In [None]:
def get_llm_response(prompt):
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "You are an AI assistant.",
            },
            {"role": "user", "content": prompt},
        ],
        temperature=0.0,
    )
    response = completion.choices[0].message.content
    return response
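
To make the request structure concrete, here is a minimal sketch of the message list that `get_llm_response` sends. The helper name `build_messages` is hypothetical (it is not part of the OpenAI library or `aisetup`); it only mirrors the two-message layout used in the function above, where the system message sets the assistant's behavior and the user message carries your prompt.

```python
def build_messages(prompt, system_message="You are an AI assistant."):
    """Return the chat message list in the same shape the lesson's function sends."""
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": prompt},
    ]

messages = build_messages("What is the capital of France?")
for message in messages:
    # Show which role carries which piece of text
    print(f"{message['role']:>6}: {message['content']}")
```

The reply comes back as a list of choices; `completion.choices[0].message.content` in the function above picks out the text of the first one.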

In [39]:
pip install pypdf

Note: you may need to restart the kernel to use updated packages.


In [40]:
pip install pymupdf

Note: you may need to restart the kernel to use updated packages.


In [77]:
%pip install pytesseract pillow


Collecting pytesseract
  Downloading pytesseract-0.3.13-py3-none-any.whl.metadata (11 kB)
Downloading pytesseract-0.3.13-py3-none-any.whl (14 kB)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.13

Note: you may need to restart the kernel to use updated packages.


In [80]:
!pip install pdfplumber openai python-dotenv ipywidgets


In [109]:
import pdfplumber
import ipywidgets as widgets
from IPython.display import display

In [118]:
# Step 2: File upload widget for Jupyter
upload_widget = widgets.FileUpload(accept='.pdf', multiple=False)
display(upload_widget)


FileUpload(value=(), accept='.pdf', description='Upload')

In [None]:
# Step 3: Save uploaded PDF file locally
def save_uploaded_file(upload_widget):
    if upload_widget.value and len(upload_widget.value) > 0:
        file_info = upload_widget.value[0]
        file_name = file_info['name']
        content = file_info['content']
        with open(file_name, "wb") as f:
            f.write(content)
        return file_name
    return None
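
The shape of `upload_widget.value` depends on the installed ipywidgets version, so the function above may fail on older installs. The sketch below assumes two layouts: a tuple of dictionaries (ipywidgets 8.x, which matches the `value=()` shown in the widget output) and a dictionary keyed by file name (ipywidgets 7.x). The helper name `first_uploaded_file` is hypothetical, and it takes the widget's `value` rather than the widget itself so it can be tried without a browser.

```python
def first_uploaded_file(value):
    """Return (name, content_bytes) for the first uploaded file, or None.

    Assumes `value` is either a tuple of dicts (ipywidgets 8.x) or a dict
    keyed by file name (ipywidgets 7.x).
    """
    if not value:
        return None
    if isinstance(value, dict):
        # Assumed 7.x layout: {"report.pdf": {"metadata": {...}, "content": b"..."}}
        name, info = next(iter(value.items()))
    else:
        # Assumed 8.x layout: ({"name": "report.pdf", "content": memoryview(...)},)
        info = value[0]
        name = info["name"]
    # bytes() accepts both bytes and memoryview objects
    return name, bytes(info["content"])

# Simulated 8.x-style value with a placeholder payload
print(first_uploaded_file(({"name": "example.pdf", "content": memoryview(b"%PDF-")},)))
```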


In [None]:
# After uploading, run this cell manually
file_path = save_uploaded_file(upload_widget)
print("Saved file:", file_path)

Saved file: SLV053.pdf


In [None]:
# Step 4: Extract raw text from PDF using pdfplumber
def extract_text_from_pdf(file_path):
    text = ""
    with pdfplumber.open(file_path) as pdf:
        for page in pdf.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + "\n"
    return text


In [122]:
# Step 5: Extract form fields (includes radio buttons / checkboxes) using pypdf
from pypdf import PdfReader

def extract_pdf_form_fields(file_path):
    reader = PdfReader(file_path)
    fields = reader.get_fields()
    if fields:
        result = {}
        for key, val in fields.items():
            # val can be a dictionary or direct value
            if isinstance(val, dict):
                value = val.get('/V') or val.get('/AS') or None
                # Some values are NameObjects, convert to str ignoring '/'
                if value and hasattr(value, 'original_bytes'):
                    value = value.original_bytes.decode('utf-8')
                if isinstance(value, str) and value.startswith('/'):
                    value = value[1:]
                result[key] = value
            else:
                result[key] = val
        return result
    return {}
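
In PDF forms, an unchecked checkbox or radio button usually reports the value `Off`, while the name of the "on" state varies from form to form. As a rough sketch, the dictionary returned above could be filtered down to the fields that carry a selection or text. The helper name `selected_form_fields` and the example field names are hypothetical, and the filter assumes the leading `/` has already been stripped.

```python
def selected_form_fields(fields):
    """Keep only fields whose value indicates a selection or entered text.

    Assumes `fields` maps field names to strings (or None) with any leading
    '/' already removed.
    """
    return {
        name: value
        for name, value in fields.items()
        if value not in (None, "", "Off")
    }

# Made-up field names and values for illustration only
example = {"ExportItems": "Yes", "EndUsers": "Off", "FaxNumber": None, "Notes": "purchase order"}
print(selected_form_fields(example))
```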




In [123]:
# Step 6: OpenAI LLM call for JSON extraction from raw text
def get_llm_response(prompt):
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are an AI document parser. Output valid JSON only."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.0,
    )
    return completion.choices[0].message.content


You can now use this function to send a prompt to the LLM. The next step builds that prompt from the PDF text and the form fields:

In [None]:
# Step 7: Convert PDF text and form fields to JSON using the LLM
def pdf_text_to_json(raw_text, form_fields=None):
    # Use the text passed in; read form fields from the uploaded file if none are given
    if form_fields is None:
        form_fields = extract_pdf_form_fields(file_path)

    # Convert form fields dictionary into a string representation for the prompt
    form_fields_str = "\n".join(f"{k}: {v}" for k, v in form_fields.items())

    prompt = f"""
You are a document parser. Your task is to convert the provided PDF text and form data
into structured JSON.

Rules:

1. Include all fields in the output JSON, even if their value is empty or null.
2. For checkboxes or radio buttons:
   - Include only the items that are selected (marked with a leading '✔' or similar).
   - Ignore items without any selection mark.
   - If no items in a checkbox/radio group are selected, set the field to null.
   - Do NOT output boolean true/false for unselected items.
3. Split full names into 'FirstName' and 'LastName' if possible.
4. Group logically related fields together.
5. Output valid JSON only. No explanation, no extra text.
6. Extract data even when it is split across two columns and/or numbered columns.

Here are the filled form fields (checkboxes, radios, etc.):

{form_fields_str}

Here is the extracted text from the PDF:

{raw_text}
"""
    return get_llm_response(prompt)
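
The function returns the model's reply as a plain string, so it still has to be parsed before you can work with it as data. Even when asked for "valid JSON only", models sometimes wrap the reply in a Markdown code fence. The sketch below, with the hypothetical helper `parse_llm_json`, strips such a fence if present and then parses the rest with the standard `json` module. The sample reply is invented for illustration.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker

def parse_llm_json(response_text):
    """Parse a model reply as JSON, tolerating an optional Markdown code fence."""
    text = response_text.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    # Raises json.JSONDecodeError if the reply is still not valid JSON
    return json.loads(text)

# Invented reply, fenced the way a model might return it
reply = FENCE + 'json\n{"ReferenceNumber": "ABC123", "FaxNumber": null}\n' + FENCE
data = parse_llm_json(reply)
print(data["ReferenceNumber"])
```

Catching `json.JSONDecodeError` around the call is a reasonable way to detect replies that need a retry or manual review.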


In [125]:
# Step 8: Run everything and show combined result
if file_path:
    # Extract raw text
    raw_text = extract_text_from_pdf(file_path)
    print("----- Raw Extracted Text -----")
    print(raw_text)
    
    # Extract PDF interactive form fields
    form_fields = extract_pdf_form_fields(file_path)
    print("----- PDF Form Fields -----")
    print(form_fields)

    # Convert raw text to JSON with LLM
    extracted_json = pdf_text_to_json(raw_text)
    print("----- Extracted JSON from Text -----")
    print(extracted_json)
else:
    print("Please upload a PDF file above and rerun this cell.")

----- Raw Extracted Text -----
An official website of the United States government Here's how you know
MENU
Export License Application
Status COMPLETED - APPROVED W/CONDITIONS
Contact Information
Reference Number
SLV0530
1. Contact Person (First Name, Last Name)
Shelley Vybiral
2. Telephone Number 3. Fax Number
6302003543 -
Email
shelley.vybiral@cmcelectronics.us
4. Creation Date
05/30/2025
5. Type of Application
Export License Application
Document Checklist
6. Documents submitted with application
✔ Export Items (BIS-748P-A)
End Users (BIS-748P-B)
BIS-711
✔ Import/End-User Certificate
Technical Specification
Letter of Explanation
Foreign Availability
Other Tell us what to improve
purchase order
7. Documents on file with applicant
BIS-711
Letter of Assurance
Import/End-User Certificate
Nuclear Certification
Other
-
License Information
9. Special Purpose
-
10. Resubmission ACN 11. Replacement License Number
- -
13. Import Certificate Country Import Certificate Number
- -
Applicant Inform

## Modifying the system message to change the LLM behavior 

Try changing or adding details in the "content" of the system message to change how the LLM responds.
* For example, "You are a sarcastic AI assistant."
* Be sure to run the function cell each time you change the system message before you prompt the LLM.

Now give your prompt to the LLM:

In [None]:
prompt = "What is the capital of France?"
response = get_llm_response(prompt)
print(response)

Vary the system prompt a few times to see the behavior change!
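
If you would rather not edit and rerun the function cell for every variation, one option is to pass the system message in as an argument. This is a sketch under that assumption: `get_llm_response_with_system` is a hypothetical name, and it takes the `client` object created at the top of the lesson as a parameter instead of reading it from the notebook's global scope.

```python
def get_llm_response_with_system(client, prompt, system_message="You are an AI assistant."):
    """Send `prompt` with a caller-supplied system message and return the reply text."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt},
        ],
        temperature=0.0,
    )
    return completion.choices[0].message.content

# Usage (needs the `client` created earlier and a valid API key):
# for persona in ["You are an AI assistant.", "You are a sarcastic AI assistant."]:
#     print(get_llm_response_with_system(client, "What is the capital of France?", persona))
```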

## Modify the temperature to change the randomness of the output

Try changing the temperature value to make the model's responses more random, so that they differ from run to run.
* For example, set the temperature to 0.7 or 1.0 and see what happens.
* Be sure to run the function cell each time you change the temperature before you prompt the LLM.

In [None]:
def get_llm_response(prompt):
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "You are an AI assistant.", 
            },
            {"role": "user", "content": prompt},
        ],
        temperature=0.0, # change this to a value between 0 and 2
    )
    response = completion.choices[0].message.content
    return response

Now give your prompt to the LLM:

In [None]:
prompt = "What is the capital of France?"
response = get_llm_response(prompt)
print(response)

Change the temperature to a value greater than 0 and run the prompt cell a few times to see the response change!
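
To build intuition for why higher temperatures give more varied answers, here is a toy illustration of the textbook idea behind temperature: the model's raw scores for candidate tokens are divided by the temperature before being turned into probabilities. This is not OpenAI's implementation, the scores are made up, and treating temperature 0 as "always pick the top score" is a simplifying assumption.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Simplification: temperature 0 puts all probability on the top score
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.7, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the probabilities move closer together, so less likely tokens are sampled more often.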

## Using LLMs through the `aisetup` package

If you have installed aisetup on your own computer, you'll need to run an extra line of code to get your own API key into the notebook and accessible to the `print_llm_response` and `get_llm_response` functions:

In [None]:
from aisetup import authenticate, print_llm_response, get_llm_response

authenticate("YOUR API KEY HERE")

# Print the LLM response
print_llm_response("What is the capital of France?")

# Store the LLM response as a variable and then print
response = get_llm_response("What is the capital of France?")
print(response)

**Note:** Please follow best practices and **don't** expose your API key in any code you write!

You can try this method instead:

In [None]:
from aisetup import authenticate, print_llm_response, get_llm_response
from dotenv import load_dotenv
import os

load_dotenv('.env', override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
authenticate(openai_api_key)

# Print the LLM response
print_llm_response("What is the capital of France?")

# Store the LLM response as a variable and then print
response = get_llm_response("What is the capital of France?")
print(response)
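
`load_dotenv` reads `KEY=VALUE` pairs from a `.env` file into environment variables, which is why `os.getenv('OPENAI_API_KEY')` can find the key afterwards. The sketch below is a deliberately simplified stand-in, not python-dotenv's actual implementation: the helper `read_env_file` is hypothetical, it returns a dictionary instead of setting environment variables, and it ignores features such as multi-line values. The key shown is a placeholder.

```python
import os
import tempfile

def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Remove surrounding whitespace and optional quotes
            values[key.strip()] = value.strip().strip("'\"")
    return values

# Demonstrate with a throwaway file and a placeholder key
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as f:
        f.write("# OpenAI credentials\nOPENAI_API_KEY='sk-placeholder'\n")
    settings = read_env_file(env_path)

print("OPENAI_API_KEY" in settings)
```

Keeping the real `.env` file out of version control (for example, by listing it in `.gitignore`) is what stops the key from leaking along with your code.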

## Extra practice 

Ask the chatbot for help understanding how the `load_dotenv` code works. Ask for step-by-step instructions on how you can create and set up a `.env` file on your own computer.