# Structured output
Structured output is a model's ability to output JSON, a capability acquired during fine-tuning.

To build intuition, let's implement a natural language processing parser that turns unstructured data into structured, tabular data. We can accomplish this as follows:
- pipe the user's input into the LLM -> the LLM outputs JSON -> Python parses the JSON and formats it as HTML
- Without LLMs, this is not an easy task to tackle. It's easy to build a demo, but hard to build a high-quality product that handles edge cases well.
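The last step of that pipeline, turning the model's JSON into HTML, needs nothing beyond the standard library. Below is a minimal sketch, assuming the model returns a JSON array of flat objects such as `{"name": ..., "address": ...}`; `json_to_html_table` is a hypothetical helper, not part of any library.

```python
import html
import json


def json_to_html_table(json_text: str) -> str:
    """Render a JSON array of flat objects as an HTML table (hypothetical helper)."""
    records = json.loads(json_text)

    # Collect column names in first-seen order across all records.
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    # Escape every header and cell so text coming from the LLM cannot inject markup.
    header = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    rows = []
    for record in records:
        # Missing keys become empty cells so every row has the same width.
        cells = "".join(
            f"<td>{html.escape(str(record.get(col, '')))}</td>" for col in columns
        )
        rows.append(f"<tr>{cells}</tr>")

    return f"<table><tr>{header}</tr>{''.join(rows)}</table>"


example = '[{"name": "Ottawa Public Library", "address": "150 Elgin Street, Ottawa"}]'
print(json_to_html_table(example))
```

Escaping each cell with `html.escape` matters here because the table contents come from an LLM and, ultimately, from user input.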



```python
import os

from openai import OpenAI
from openai.types.chat import ChatCompletion


# Note: this name shadows Python's built-in eval().
def eval(prompt: str, message: str, model: str = "gpt-4o") -> ChatCompletion:
    """Send a system prompt plus one user message and return the raw completion."""
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": message},
    ]

    return client.chat.completions.create(
        model=model,
        messages=messages,
    )
```

```python
prompt = """
You are a data parsing assistant.
User provides unstructured data containing addresses.
Your goal is to output it as JSON.
"""
data = """
The Ottawa Public Library is at 150 Elgin Street, Ottawa.
Down the street, Sarah Wilson runs her bakery at 240 Laurier Avenue, Ottawa.
Over in Kanata, Tech Corp's office is at 1385 Terry Fox Drive.
"""

res = eval(prompt=prompt, message=data)
json_data = res.choices[0].message.content

print(json_data)
```

````text
```json
[
    {
        "name": "Ottawa Public Library",
        "address": "150 Elgin Street, Ottawa"
    },
    {
        "name": "Sarah Wilson's Bakery",
        "address": "240 Laurier Avenue, Ottawa"
    },
    {
        "name": "Tech Corp's Office",
        "address": "1385 Terry Fox Drive, Kanata"
    }
]
```
````


We can see that the model didn't return raw JSON: it returned a Markdown-formatted string with the JSON wrapped in a `json` code fence. The reason is that we didn't enable structured output in the API call.
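Without structured output, the caller has to peel off the Markdown before `json.loads` can read it. A minimal sketch of that workaround, using a hypothetical `parse_fenced_json` helper that assumes the reply is either bare JSON or a single fenced block:

```python
import json
import re

# Matches a whole reply that is one Markdown code fence with an optional
# language tag, e.g. ```json ... ```, and captures the content inside it.
FENCE_RE = re.compile(r"^```[\w-]*\s*\n(.*?)\n```\s*$", re.DOTALL)


def parse_fenced_json(text: str):
    """Parse JSON that may be wrapped in a Markdown code fence (hypothetical helper)."""
    stripped = text.strip()
    match = FENCE_RE.match(stripped)
    if match:
        # Keep only what is between the opening and closing fence lines.
        stripped = match.group(1)
    return json.loads(stripped)  # raises json.JSONDecodeError if still not JSON
```

This is brittle: a preamble such as "Sure, here is your data!" before the fence, or a second code block, still makes `json.loads` fail. That is exactly the kind of edge case structured output is meant to remove.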

```python
def eval(prompt: str, message: str, model: str = "gpt-4o") -> ChatCompletion:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": message},
    ]

    return client.chat.completions.create(
        model=model,
        messages=messages,
        # Enable structured output (JSON mode): the reply is constrained to valid JSON
        response_format={"type": "json_object"},
    )


# JSON mode also requires the word "JSON" to appear in the messages; this prompt has it.
prompt = """
You are a data parsing assistant.
User provides unstructured data containing addresses.
Your goal is to output it as JSON.
"""
data = """
The Ottawa Public Library is at 150 Elgin Street, Ottawa.
Down the street, Sarah Wilson runs her bakery at 240 Laurier Avenue, Ottawa.
Over in Kanata, Tech Corp's office is at 1385 Terry Fox Drive.
"""

res = eval(prompt=prompt, message=data)
json_data = res.choices[0].message.content

print(json_data)
```


```json
{
  "addresses": [
    {
      "name": "Ottawa Public Library",
      "address": "150 Elgin Street, Ottawa"
    },
    {
      "name": "Sarah Wilson's Bakery",
      "address": "240 Laurier Avenue, Ottawa"
    },
    {
      "name": "Tech Corp",
      "address": "1385 Terry Fox Drive, Kanata"
    }
  ]
}
```


Now, running the same code returns plain JSON. This is great not only because we don't need to strip anything extra before parsing, but also because it guarantees that the LLM won't include any free-form text such as "Sure, here is your data! {...}".
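With JSON mode on, the content can go straight into `json.loads`. The sketch below assumes OpenAI's documented JSON-mode behavior: the reply is a JSON object (which is likely why the list is now wrapped under an `addresses` key), but it may be cut off if generation stops at the token limit, which the API reports as `finish_reason == "length"`. `parse_json_mode_content` is a hypothetical helper.

```python
import json


def parse_json_mode_content(content: str, finish_reason: str) -> dict:
    """Parse a JSON-mode reply into a dict (hypothetical helper).

    Assumes response_format={"type": "json_object"} was set, so the reply
    should be one JSON object unless generation hit the token limit.
    """
    # A "length" finish means the model was cut off mid-output.
    if finish_reason == "length":
        raise ValueError("reply was truncated; the JSON is probably incomplete")

    data = json.loads(content)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data


# Usage with the `res` object from the cell above (requires an API call):
# choice = res.choices[0]
# addresses = parse_json_mode_content(choice.message.content, choice.finish_reason)["addresses"]
```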


The problem is that we haven't defined the shape of the data; let's call it the *schema*. The schema is now up to the LLM, and it might change depending on the user input. Let's reformat the input data to see this in action.

```python
# Messy web-scraped format with typical HTML artifacts and inconsistent formatting
data_2 = """
[Search Results]
* Ottawa Public Library *
Contact Us > Main Branch
Located at: &nbsp;150 Elgin Street, Ottawa
Status: OPEN NOW! 📚
---------------------
<div class="business-listing">
Sarah's Bakery & Café [⭐️4.8]
Address line 1: 240
Address line 2: Laurier Avenue
City: Ottawa
</div>
...Read More...
---------------------
TECH CORP GLOBAL
www.techcorp.com/contact
📍 1385 Terry Fox Drive
Kanata, Ontario
[Click to view map]
Email: info@techcorp.com
"""

res = eval(prompt=prompt, message=data_2)
json_data = res.choices[0].message.content

print(json_data)
```


```json
{
  "addresses": [
    {
      "name": "Ottawa Public Library - Main Branch",
      "address": {
        "street": "150 Elgin Street",
        "city": "Ottawa"
      },
      "status": "Open Now"
    },
    {
      "name": "Sarah's Bakery & Café",
      "rating": "4.8",
      "address": {
        "line_1": "240 Laurier Avenue",
        "city": "Ottawa"
      }
    },
    {
      "name": "Tech Corp Global",
      "website": "www.techcorp.com/contact",
      "address": {
        "street": "1385 Terry Fox Drive",
        "city": "Kanata",
        "province": "Ontario"
      },
      "email": "info@techcorp.com"
    }
  ]
}
```
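The two JSON-mode replies above disagree on shape: the first has flat `name`/`address` strings, while the second nests `address` as an object and adds keys such as `status`, `rating`, and `website`. A minimal, stdlib-only sketch of detecting that drift, assuming (for illustration only) that the flat shape of the first reply is the one we want; `EXPECTED_FIELDS` and `schema_violations` are hypothetical names:

```python
import json

# Illustrative assumption: we want the flat shape of the first JSON-mode reply,
# i.e. {"addresses": [{"name": str, "address": str}, ...]}
EXPECTED_FIELDS = {"name": str, "address": str}


def schema_violations(reply: str) -> list[str]:
    """Return every way `reply` departs from the assumed schema (hypothetical helper)."""
    data = json.loads(reply)
    entries = data.get("addresses") if isinstance(data, dict) else None
    if not isinstance(entries, list):
        return ["top level is not an object with an 'addresses' list"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        # Required fields must be present with the expected type.
        for field, expected_type in EXPECTED_FIELDS.items():
            if not isinstance(entry.get(field), expected_type):
                problems.append(f"entry {i}: '{field}' is not a {expected_type.__name__}")
        # Any key outside the schema counts as drift too.
        extra = set(entry) - set(EXPECTED_FIELDS)
        if extra:
            problems.append(f"entry {i}: unexpected keys {sorted(extra)}")
    return problems
```

Applied to the two replies above, it would report no problems for the first and at least one for every entry of the second. Detecting drift after the fact is only a check, though; it does not stop the model from choosing a different shape.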
