In [None]:
!pip install openai==1.55.3 httpx==0.27.2 --force-reinstall --quiet


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyter-server 1.24.0 requires anyio<4,>=3.1.0, but you have anyio 4.7.0 which is incompatible.

In [None]:
from openai import OpenAI
import json
import requests
import pandas
from markdown import markdown
from IPython.display import Markdown


# Interacting with LLMs

In this class, we'll use the GPT API via OpenAI's package.
</br></br>
First, we need to set up the package with an API key. We're using a temporary API key from the Economics Observatory so that you don't need to sign up for your own OpenAI key.

This key is limited:
- Maximum spend of <$1 total
- Restricted to gpt-4o-mini
- No support for images, fine-tuning, etc

In [None]:

client = OpenAI(
    base_url="https://qkn7siqnh6.execute-api.eu-west-2.amazonaws.com/", # If you are using the OpenAI API directly, you'd omit this line
    api_key="your github username"                                               # ... and use an api_key direct from OpenAI
)

</br></br>

We can now interact with the LLM with just a few lines of code.

In [None]:
prompt = "Write a haiku about the LSE centre building."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a critically acclaimed poet."},
        {"role": "user", "content": prompt},
    ],
)

response.choices[0].message.content


# Using the API: Source synthesis

</br></br>

LLMs generate text in response to a prompt. They're a useful tool for automating the extraction of 'fuzzy' information from text, that is, information that is not easily quantified or categorised by a computer. For example, it's difficult to write a program that can extract the main arguments from a text, or the ingredients from a recipe.


In [None]:
debate = """
James Cartlidge
(South Suffolk) (Con)
(Urgent Question): To ask the Secretary of State for Defence if he will make a statement on the impact of the Government’s Chagos negotiations on the UK-US defence relationship.
The Minister for the Armed Forces
(Luke Pollard)
I congratulate the hon. Gentleman on securing this urgent question. The Secretary of State has asked me to respond on behalf of the Department.

On 3 October, the UK and Mauritius reached an historic agreement to secure the important UK-US military base on Diego Garcia, which plays a crucial role in regional and international security. The agreement secures the effective operation of the joint facility on Diego Garcia well into the next century. The agreement is strongly supported by our closest friends and allies, including the United States. It has been supported by all relevant US Departments and agencies, following a rigorous scrutiny process.

This base is a key part of UK-US defence relationships, as it enables the United Kingdom and the United States to support operations that demonstrate our shared commitments to regional stability, provide a rapid response to crises and counter some of the most challenging security threats we face. The President of the United States applauded the agreement. To quote him directly:

“It is a clear demonstration that through diplomacy and partnership, countries can overcome long-standing historical challenges to reach peaceful and mutually beneficial outcomes.”

Several other countries and organisations, including India, the African Union, the UN Secretary-General and others, have welcomed and applauded this historic political agreement.

Our primary goal throughout these negotiations, which started over two years ago under the previous Government, was to protect the joint UK-US military base on Diego Garcia. There will be clear commitments in the treaty to robust security arrangements, including arrangements preventing the presence of foreign security forces on the outer islands, so that the base can continue to operate securely and effectively. The operation of the base will continue unchanged, with strong protections from malign influence.

For the first time in 50 years, the base will be undisputed and legally secure. Continued uncertainty would be a gift to our adversaries. That is why the agreement has been welcomed by all parts of the US system, and other critical regional security partners. Agreeing the deal now, on our terms, meant that we were able to secure strong protections that will allow the base to operate as it has done. We look forward to engaging with the upcoming US Administration on this and many other aspects of the UK-US special relationship.

Finally, hon. Members can be reassured that the long-term protection of the base on Diego Garcia has been the shared UK and US priority throughout, and this agreement secures its future. We would not have signed off on an agreement that compromised any of our security interests, or those of the US and our allies and partners.
James Cartlidge
Thank you, Mr Speaker, for granting this urgent question.

At a time when we face the most challenging military threats for years, surely our top priority should be to preserve the strongest possible US-UK relations, given that this is so vital to our national security, yet it appears that the Government are seeking to agree a deal surrendering the sovereignty of the Chagos islands before President Trump is formally in post. We know that the new US Administration are concerned about the Government’s deal because presumptive nominee US Secretary of State Marco Rubio has said that the deal

“poses a serious threat to our national security interests”.

He has also suggested that

“it would provide an opportunity for communist China to gain valuable intelligence on our naval support facility”.

Let us be clear: our military base on Diego Garcia is a vital strategic asset for the UK in the Indian ocean, and it is critical to our presence and posture in the Indo-Pacific region. In particular, it is an especially important base for the United States, and we believe that anything that damages its defence posture, particularly in relation to China, also undermines our national security. We understand that the new Mauritius Government have now launched a review of the deal.

Will the Minister therefore confirm that the Government’s policy really is to try to rush through their Chagos deal before President Trump’s inauguration? Does he not see how that would be hugely disrespectful to the new Administration and President Trump’s democratic mandate? Given that we now know it is common for the MOD to state the cost of overseas bases, will he be transparent and finally tell the House how much we will have to pay to rent back the vital military base that we currently own?

Finally, although we would prefer the Government to cancel the whole deal, at the very least will the Minister pause any further ratification until the new US Administration are in place and the Mauritius Government have concluded their review?

"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are an analyst who summarises parliamentary debates."},
        {"role": "user", "content": "Summarise this debate. Give me very brief responses, with headings for what the debate is about and what the main arguments are (and who made them)." + debate},
    ],
)

Markdown(response.choices[0].message.content)

</br></br>
</br></br>

## Example: Identifying ingredients in a recipe

In this example, we'll use an LLM to parse a natural language recipe. We'll extract the ingredients, the kitchen appliances required, and a suggestion for how to make it healthier.

</br></br>

In [None]:
recipe = """
# Miso-Butter Roast Chicken With Acorn Squash Panzanella

Pat chicken dry with paper towels, season all over with 2 tsp. salt, and tie legs together with kitchen twine. Let sit at room temperature 1 hour.
Meanwhile, halve squash and scoop out seeds. Run a vegetable peeler along ridges of squash halves to remove skin. Cut each half into \u00bd\"-thick wedges; arrange on a rimmed baking sheet.
Combine sage, rosemary, and 6 Tbsp. melted butter in a large bowl; pour half of mixture over squash on baking sheet. Sprinkle squash with allspice, red pepper flakes, and \u00bd tsp. salt and season with black pepper; toss to coat.
Add bread, apples, oil, and \u00bc tsp. salt to remaining herb butter in bowl; season with black pepper and toss to combine. Set aside.
Place onion and vinegar in a small bowl; season with salt and toss to coat. Let sit, tossing occasionally, until ready to serve.
Place a rack in middle and lower third of oven; preheat to 425\u00b0F. Mix miso and 3 Tbsp. room-temperature butter in a small bowl until smooth. Pat chicken dry with paper towels, then rub or brush all over with miso butter. Place chicken in a large cast-iron skillet and roast on middle rack until an instant-read thermometer inserted into the thickest part of breast registers 155\u00b0F, 50\u201360 minutes. (Temperature will climb to 165\u00b0F while chicken rests.) Let chicken rest in skillet at least 5 minutes, then transfer to a plate; reserve skillet.
Meanwhile, roast squash on lower rack until mostly tender, about 25 minutes. Remove from oven and scatter reserved bread mixture over, spreading into as even a layer as you can manage. Return to oven and roast until bread is golden brown and crisp and apples are tender, about 15 minutes. Remove from oven, drain pickled onions, and toss to combine. Transfer to a serving dish.
Using your fingers, mash flour and butter in a small bowl to combine.
Set reserved skillet with chicken drippings over medium heat. You should have about \u00bc cup, but a little over or under is all good. (If you have significantly more, drain off and set excess aside.) Add wine and cook, stirring often and scraping up any browned bits with a wooden spoon, until bits are loosened and wine is reduced by about half (you should be able to smell the wine), about 2 minutes. Add butter mixture; cook, stirring often, until a smooth paste forms, about 2 minutes. Add broth and any reserved drippings and cook, stirring constantly, until combined and thickened, 6\u20138 minutes. Remove from heat and stir in miso. Taste and season with salt and black pepper.
Serve chicken with gravy and squash panzanella alongside."""

In [None]:
prompt = "From this recipe, tell me what are the ingredients and kitchen appliances needed? Give me a suggestion of how to make it healthier. Recipe:"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are an assistant that summarises recipes. Return brief bullet points for responses."},
        {"role": "user", "content": prompt + recipe},
    ]
)

Markdown(response.choices[0].message.content)

</br></br>

## Structuring the prompt and response

This is great, but we've given the LLM normal unstructured text and got back only marginally more structured text. We can do better by giving the LLM instructions to return data in a structured format - JSON. This will allow us to easily extract the information we want from the response.

In [None]:
prompt = """From this recipe, extract the following details and return them in a structured JSON format:
1. A list of ingredients.
2. A list of kitchen appliances required.
3. A suggestion on how to make the recipe healthier.

The response **must** follow this exact JSON structure:
{
    "ingredients": ["List of ingredients as strings"],
    "kitchen_appliances": ["List of kitchen appliances as strings"],
    "health_suggestion": "A single string suggestion to make the recipe healthier"
}

Recipe:"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    # Ask the API to return a valid JSON object so that json.loads can parse it
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "You are a JSON generator for recipes. Always return JSON responses following the specified format."},
        {"role": "user", "content": prompt + recipe},
    ]
)

# Print the JSON response
print(response.choices[0].message.content)

The response we get back is still just text, but we can turn it into a Python dictionary with a single line of code.

In [None]:
parsed_recipe = json.loads(response.choices[0].message.content)
parsed_recipe
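
A reply is not guaranteed to be bare JSON. Models sometimes wrap it in a Markdown code fence, which makes `json.loads` raise a `JSONDecodeError`. Below is a minimal sketch of a more forgiving parser. `parse_json_reply` is a hypothetical helper name, not part of the OpenAI package, and the example reply is made up.

```python
import json
import re


def parse_json_reply(text):
    """Parse an LLM reply as JSON, tolerating an optional Markdown code fence."""
    cleaned = text.strip()
    # Strip a leading ```json (or bare ```) marker and a trailing ``` if present
    cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned)
    cleaned = re.sub(r"\s*```$", "", cleaned)
    return json.loads(cleaned)


# Made-up reply text standing in for response.choices[0].message.content
example_reply = '```json\n{"ingredients": ["chicken", "miso"], "kitchen_appliances": ["oven"]}\n```'
parsed_example = parse_json_reply(example_reply)
print(parsed_example["ingredients"])
```

Replies that are already plain JSON pass through unchanged, so the helper could be used in place of `json.loads` throughout.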

### Exercise: Customise the prompt

What else can we learn about the recipe? Add more questions to the prompt to extract more information.

# Looping over multiple recipes

We've learnt about one recipe, but the value of LLMs comes from their ability to process large amounts of text quickly. We can loop over multiple recipes and extract the information we want.

</br></br>

recipes.json contains 5000 recipes. We'll loop over some to parse their information.

In [None]:

req = requests.get("https://raw.githubusercontent.com/jhellingsdata/RADataHub/refs/heads/main/misc/LLM_practical/recipes.json")
recipes = req.json()

How do these recipes look?

In [None]:
recipes[100]

'# Stuffed Eggplants and Zucchini in a Rich Tomato Sauce (Baatingan w Kusaa Bil Banadoura)\n\n\n## Instructions\n\nTo make the sauce, put the oil into a saucepan or casserole pan with a lid—about 10 inches/25cm wide—and place over medium heat. Add the onions and cook for about 10 minutes, stirring frequently, until soft and caramelized. Add the rest of the sauce ingredients, along with 2½ tsp of salt and a good grind of black pepper. Simmer over medium heat for about 10 minutes, stirring from time to time, then remove from the heat and set aside.\nTo make the stuffing, while the sauce is cooking, place all the ingredients in a large bowl with 1½ tsp of salt and a good grind of black pepper. Mix well, using your hands to make sure that everything is well incorporated. If making in advance, keep in the fridge until ready to use.\nTrim the stalks from the eggplants, then insert a manakra (or peeler or corer) into the eggplant; you want it to be very close to the skin—about ⅛ inch/3mm away

Now we can prepare our loop to extract the information we want from each recipe.

In [None]:
openai_responses = []

subset = recipes[:10]  # Only process the first 10 recipes to limit API spend

for i, recipe in enumerate(subset):
    print(f"Processing recipe {i+1} of {len(subset)}")
    prompt = """From this recipe, extract the following details and return them in a structured JSON format:
        1. Title of the recipe.
        2. The cuisine of the recipe.
        3. A list of ingredients.
        4. A list of kitchen appliances required.
        5. Is this recipe vegetarian or not?
        6. A suggestion on how to make the recipe healthier.
        7. An int indicator (1-10) of how easy the recipe is to make.
        8. An estimated time to prepare the recipe (minutes).

        The response **must** follow this exact JSON structure:
        {
            "title": "Title of the recipe as a string",
            "cuisine": "Cuisine of the recipe as a string",
            "ingredients": ["List of ingredients as strings"],
            "kitchen_appliances": ["List of kitchen appliances as strings"],
            "vegetarian": "Boolean value indicating if the recipe is vegetarian or not",
            "health_suggestion": "A single string suggestion to make the recipe healthier",
            "difficulty": "An int indicator (1-10) of how easy the recipe is to make",
            "prep_time": "An int indicating the estimated time in minutes to prepare the recipe"
        }

            Recipe:"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={
            "type": "json_object"
        },
        messages=[
            {"role": "system", "content": "You are a JSON generator for recipes. Always return JSON responses following the specified format."},
            {"role": "user", "content": prompt + recipe},
        ]
    )

    openai_responses.append(json.loads(response.choices[0].message.content))
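
A single malformed reply would stop the loop above with a `JSONDecodeError`. One way to make it more robust is to catch the error and record which recipes failed. This is sketched here with a hypothetical `ask_llm` function standing in for the `client.chat.completions.create` call, so that it runs without an API key.

```python
import json


def parse_all(recipe_texts, ask_llm):
    """Collect parsed replies, plus the indices of recipes whose reply was not valid JSON.

    ask_llm is a hypothetical callable: it takes the recipe text and returns
    the model's reply as a string.
    """
    parsed, failed = [], []
    for i, text in enumerate(recipe_texts):
        try:
            parsed.append(json.loads(ask_llm(text)))
        except json.JSONDecodeError:
            failed.append(i)  # Keep going; these recipes can be retried later
    return parsed, failed


# Stand-in for the real API call so the sketch runs offline
def fake_llm(text):
    return '{"title": "Toast"}' if "toast" in text else "Sorry, I can't help."


parsed, failed = parse_all(["toast recipe", "???"], fake_llm)
print(parsed, failed)
```

Keeping the failed indices separate means a long run does not need to be restarted from scratch when one reply is unusable.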



Let's take a look at one of them.

In [None]:
openai_responses[0]
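
The prompt asks the model to follow a structure, but nothing enforces it, so it is worth checking each parsed response before analysing it. Below is a rough sketch, assuming the keys and types requested in the prompt above. `check_response` is a hypothetical helper, and the example dictionary is made up.

```python
# Keys requested in the prompt, with the Python type each value is expected to have
EXPECTED_TYPES = {
    "title": str,
    "cuisine": str,
    "ingredients": list,
    "kitchen_appliances": list,
    "vegetarian": bool,
    "health_suggestion": str,
    "difficulty": int,
    "prep_time": int,
}


def check_response(parsed):
    """Return a list of problems found in one parsed response (empty if none)."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in parsed:
            problems.append(f"missing key: {key}")
        elif not isinstance(parsed[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems


# Made-up response with a wrongly typed field and a missing key
example = {
    "title": "Toast",
    "cuisine": "British",
    "ingredients": ["bread"],
    "kitchen_appliances": ["toaster"],
    "vegetarian": "yes",
    "health_suggestion": "Use wholemeal bread",
    "difficulty": 1,
}
print(check_response(example))
```

Running this over every item in `openai_responses` gives a quick sense of how often the model strays from the requested format.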

We've parsed some recipes!

Earlier, I processed the whole dataset and saved the responses. Let's load that file to continue our analysis.

In [None]:
req = requests.get("https://raw.githubusercontent.com/jhellingsdata/RADataHub/refs/heads/main/misc/LLM_practical/full_5k-recipe_responses.json")
openai_responses = req.json()

## Challenge: What charts can you make from this?
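
As a starting point, the list of dictionaries can be loaded into a pandas DataFrame and summarised before plotting. The sketch below uses a few made-up rows in the same shape as the responses. The real data may need cleaning first, for example if cuisine labels are inconsistent.

```python
import pandas as pd

# Made-up rows in the same shape as openai_responses
sample_responses = [
    {"title": "Dal", "cuisine": "Indian", "vegetarian": True, "prep_time": 40},
    {"title": "Roast chicken", "cuisine": "British", "vegetarian": False, "prep_time": 90},
    {"title": "Chana masala", "cuisine": "Indian", "vegetarian": True, "prep_time": 50},
]

df = pd.DataFrame(sample_responses)

# Number of recipes per cuisine, most common first
cuisine_counts = df["cuisine"].value_counts()

# Average preparation time for vegetarian and non-vegetarian recipes
mean_prep = df.groupby("vegetarian")["prep_time"].mean()

print(cuisine_counts)
print(mean_prep)
```

From here, `cuisine_counts.plot(kind="bar")` would draw a bar chart, provided matplotlib is installed.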