# Introduction to multimodal models in Opper

This notebook shows how to use multimodal models in Opper. We use OpenAI's gpt-4o model to extract a recipe from an image, convert its weight units to metric, and scale it to a given number of people.

In [1]:
%pip install -U opperai

from opperai import fn
from opperai.types import ImageContent
from pydantic import BaseModel, Field
from typing import List, Optional

# import os
# os.environ['OPPER_API_KEY'] = 'YOUR_API_KEY'

Note: you may need to restart the kernel to use updated packages.


Our end goal with this exercise is to use gpt-4o to extract a recipe from an image and transform it into a structured format that can be used for further processing. The image we are working with is:

<img src="dumplings.png" width="500">

An image of two pages of a book showing a dumpling recipe with ingredients and instructions.

First we define a function that, given an image, returns a description of it. To pass an image as an argument, we use the special `ImageContent` class.

In [2]:
@fn(model="openai/gpt-4o")
def describe(image: ImageContent) -> str:
    """given an image return the description of that image"""

res = describe(ImageContent.from_path("dumplings.png"))

print(res)

The image features a recipe for 'Canice's Pork and Chive Dumplings' located on pages 60 and 61. It includes a list of ingredients and directions for preparing the dish. The ingredients list includes medium ground pork, chives, garlic, ginger, light soy sauce, Shaoxing rice cooking wine, Chinese chili oil, sesame oil, broth, cornstarch, sugar, white pepper, and salt. The directions are detailed in steps from finely mincing the ingredients to folding, freezing, and cooking the dumplings. Additionally, there are cooking tips for frying and boiling the dumplings, as well as serving suggestions. Page 61 features three images showing the process of assembling the dumplings.


Since we want to extract the recipe, we need a Pydantic model to parse it into. We do this by defining a `Recipe` class that inherits from `BaseModel`.

In [3]:
class Recipe(BaseModel):
    title: str
    ingredients: List[str]
    instructions: List[str]


@fn(model="openai/gpt-4o")
def extract(image: ImageContent) -> Recipe:
    """given an image, extract the recipe"""

res = extract(ImageContent.from_path("dumplings.png"))

print(res)

title='Canice’s Pork and Chive Dumplings' ingredients=['1.5 lb medium grown pork (optional: sub in 0.5 lb of finely chopped shrimp)', '1 package wrappers (1 lb)', '1 bunch flowering chives or Chinese chives', '2-4 garlic cloves, to taste', '2 tbsp fresh ginger', '2 tbsp light soy sauce', '2 tbsp Shaoxing rice cooking wine (the brown varieties have more flavour, avoid the clear wines)', '2 tbsp Chinese chili oil', '1-2 tbsp sesame oil', 'A few tablespoons of broth (optional)', '1 tbsp cornstarch', '1 tsp sugar', '1 tsp white pepper', '1 tsp salt'] instructions=['1. Finely mince or food process the chives, garlic and ginger.', '2. Add to bowl with pork, dump in all the seasoning (and broth if you have it). Stir, in only one direction, until smooth, even a little sticky. ‘Beating in’ the liquid incorporates it into the meat and makes it springy, instead of shrinking while cooking and leaving you with a saggy, empty bag of skin.', '3. Start folding: put about 1 tbsp filling in the centre o

To capture the ingredients in more detail, we define an `Ingredient` model to parse each one into. This model has `item`, `amount` and `unit` fields, plus an optional `notes` field for any remarks about the ingredient.

In [4]:
class Ingredient(BaseModel):
    item: str
    amount: float
    unit: str
    notes: Optional[str] = None

class Recipe(BaseModel):
    title: str
    ingredients: List[Ingredient]
    instructions: List[str]


@fn(model="openai/gpt-4o")
def extract(image: ImageContent) -> Recipe:
    """given an image, extract the recipe"""

recipe = extract(ImageContent.from_path("dumplings.png"))

print(recipe)
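
Because the model fills in `item`, `amount` and `unit` on its own, it can be worth checking the structured result before processing it further. The following is a minimal sketch of such a check, not part of Opper. `IngredientRow` and `find_suspicious` are hypothetical names, a plain dataclass stands in for the Pydantic `Ingredient` model so the snippet runs without an API key, and the sample rows are illustrative values rather than real model output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IngredientRow:
    """Hypothetical stand-in mirroring the fields of the `Ingredient` model."""
    item: str
    amount: float
    unit: str
    notes: Optional[str] = None


def find_suspicious(rows: List[IngredientRow]) -> List[str]:
    """Return one human-readable warning per field that looks incomplete."""
    warnings = []
    for row in rows:
        if not row.item.strip():
            warnings.append("ingredient with empty item name")
        if row.amount <= 0:
            warnings.append(f"{row.item}: amount is {row.amount}")
        if not row.unit.strip():
            warnings.append(f"{row.item}: unit is empty")
    return warnings


# Illustrative rows, not actual model output.
sample = [
    IngredientRow("Light soy sauce", 2.0, "tbsp"),
    IngredientRow("Garlic cloves", 2.0, "", notes="to taste"),
    IngredientRow("Broth", 0.0, "tablespoons", notes="optional"),
]

for warning in find_suspicious(sample):
    print(warning)
```

Rows that trigger a warning are not necessarily wrong (a count of garlic cloves has no natural unit), so treat the list as a prompt for manual review rather than a hard failure.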

Now that we have the recipe, we can convert the units to metric. We do this by defining a function that takes a `Recipe` and returns a `Recipe` with the weight units converted.

In [None]:
@fn(model="openai/gpt3.5-turbo")
def translate_to_metric(recipe: Recipe) -> Recipe:
    """Given a recipe, translate the weight units to metric system, keep tbsp and tsp as is"""


translated = translate_to_metric(recipe)

print(translated)

title='Canice’s Pork and Chive Dumplings' ingredients=[Ingredient(item='Medium ground pork', amount=0.68, unit='kg', notes='optional: sub 0.5 lb of finely chopped shrimp'), Ingredient(item='Wrapper package', amount=1.0, unit='package', notes='(1 lb)'), Ingredient(item='Flowering chives or Chinese chives', amount=1.0, unit='bunch', notes=None), Ingredient(item='Garlic cloves', amount=2.0, unit='', notes='to taste'), Ingredient(item='Fresh ginger', amount=2.0, unit='tbsp', notes=None), Ingredient(item='Light soy sauce', amount=2.0, unit='tbsp', notes=None), Ingredient(item='Shaoxing rice cooking wine', amount=2.0, unit='tbsp', notes='the brown varieties have more flavour, avoid the clear wines'), Ingredient(item='Chinese chili oil', amount=2.0, unit='tsp', notes=None), Ingredient(item='Sesame oil', amount=1.0, unit='tbsp', notes=None), Ingredient(item='Broth', amount=0.0, unit='tablespoons', notes='optional'), Ingredient(item='Cornstarch', amount=1.0, unit='tbsp', notes=None), Ingredient
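
A weight conversion is plain arithmetic, so the model's numbers can be cross-checked with a few lines of deterministic code. The sketch below is one way to do that under stated assumptions: `IngredientRow`, `KG_PER_UNIT` and `to_metric` are hypothetical names, a dataclass stands in for the Pydantic `Ingredient` model, and only `lb` and `oz` are treated as weight units.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Kilograms per avoirdupois pound and ounce (exact by definition).
KG_PER_UNIT = {"lb": 0.45359237, "oz": 0.028349523125}


@dataclass(frozen=True)
class IngredientRow:
    """Hypothetical stand-in mirroring the fields of the `Ingredient` model."""
    item: str
    amount: float
    unit: str
    notes: Optional[str] = None


def to_metric(row: IngredientRow) -> IngredientRow:
    """Convert lb/oz amounts to kg; leave every other unit (tbsp, tsp, ...) as is."""
    factor = KG_PER_UNIT.get(row.unit.lower())
    if factor is None:
        return row
    return replace(row, amount=round(row.amount * factor, 2), unit="kg")


pork = to_metric(IngredientRow("Medium ground pork", 1.5, "lb"))
soy = to_metric(IngredientRow("Light soy sauce", 2.0, "tbsp"))
print(pork.amount, pork.unit)  # 0.68 kg
print(soy.amount, soy.unit)    # 2.0 tbsp
```

The 0.68 kg result agrees with the amount the model returned for the pork. The sketch leaves `notes` untouched, so a weight mentioned only in free text would still need separate handling.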

Finally, we want to scale the recipe to a given number of people. We do this by defining a function that takes a `Recipe` and a number of people and returns a recipe with the ingredients scaled.

Since some ingredients (such as salt) do not scale linearly, we allow the model to add notes about them. We do this by defining a `RecipeWithNotes` class that inherits from `Recipe` and adds a `notes` field.

In [None]:
class RecipeWithNotes(Recipe):
    notes: Optional[str] = Field(None, description="Additional notes for the recipe for example if not all ingredients scale well")

@fn(model="openai/gpt-4o")
def scale(recipe: Recipe, people: int) -> RecipeWithNotes:
    """Given a recipe, scale the ingredients to the number of people"""

scaled = scale(translated, 10)
print(scaled)

title='Canice’s Pork and Chive Dumplings' ingredients=[Ingredient(item='Medium ground pork', amount=1.36, unit='kg', notes='optional: sub 0.5 lb of finely chopped shrimp'), Ingredient(item='Wrapper package', amount=2.0, unit='package', notes='(1 lb each)'), Ingredient(item='Flowering chives or Chinese chives', amount=2.0, unit='bunch', notes=None), Ingredient(item='Garlic cloves', amount=4.0, unit='', notes='to taste'), Ingredient(item='Fresh ginger', amount=4.0, unit='tbsp', notes=None), Ingredient(item='Light soy sauce', amount=4.0, unit='tbsp', notes=None), Ingredient(item='Shaoxing rice cooking wine', amount=4.0, unit='tbsp', notes='the brown varieties have more flavour, avoid the clear wines'), Ingredient(item='Chinese chili oil', amount=4.0, unit='tsp', notes=None), Ingredient(item='Sesame oil', amount=2.0, unit='tbsp', notes=None), Ingredient(item='Broth', amount=0.0, unit='tablespoons', notes='optional'), Ingredient(item='Cornstarch', amount=2.0, unit='tbsp', notes=None), Ingre
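
For comparison, here is a rough sketch of what purely linear scaling looks like when done by hand, with a note for ingredients that are usually adjusted to taste. Everything in it is an assumption for illustration: `IngredientRow`, `scale_rows` and the `TO_TASTE` set are hypothetical, and the base of 5 servings is made up because the recipe text above does not state how many people it serves.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

# Hypothetical set of seasonings that are usually adjusted to taste.
TO_TASTE = {"salt", "white pepper", "sugar"}


@dataclass(frozen=True)
class IngredientRow:
    """Hypothetical stand-in mirroring the fields of the `Ingredient` model."""
    item: str
    amount: float
    unit: str
    notes: Optional[str] = None


def scale_rows(
    rows: List[IngredientRow], base_servings: int, servings: int
) -> Tuple[List[IngredientRow], Optional[str]]:
    """Scale every amount linearly and return a note naming to-taste items."""
    factor = servings / base_servings
    scaled = [replace(r, amount=round(r.amount * factor, 2)) for r in rows]
    flagged = [r.item for r in rows if r.item.lower() in TO_TASTE]
    note = None
    if flagged:
        note = "Scaled linearly, adjust to taste: " + ", ".join(flagged)
    return scaled, note


rows = [
    IngredientRow("Medium ground pork", 0.68, "kg"),
    IngredientRow("Salt", 1.0, "tsp"),
]
# base_servings=5 is an assumed value, not taken from the recipe.
scaled, note = scale_rows(rows, base_servings=5, servings=10)
print(scaled[0].amount)  # 1.36
print(note)
```

A fixed rule like this cannot judge which ingredients scale poorly beyond the hard-coded set, which is the part the `notes` field on `RecipeWithNotes` leaves to the model.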

This is the end of the tutorial. We have now seen how to use images together with multimodal models and structured data in Opper.