Here we use LLAMA with Ollama structured outputs so that the JSON output is handled properly.



In [9]:
from ollama import chat
from pydantic import BaseModel

class Result(BaseModel):
    name: str
    amount: int
    department: str
    interval: str

response = chat(
    messages=[
        {
            'role': 'system',
            'content':
            """
            You are an AI-powered email parsing tool designed to extract donation-related information with high precision and consistency.

            Extraction Guidelines:
            - Currency Detection: Recognize amounts in multiple currencies (e.g., $, €, £, ¥)
            - Name Recognition: Extract full names, first names
            - Interval Parsing: Identify donation frequencies including variations like:
            * "monthly" / "month" 
            * "yearly" / "annual" / "per year"
            * "one-time" / "single" / "once"
            - Department: Identify the department or faculty associated with the donation (e.g., "Computer Science", "Price Faculty of Engineering")

            Parsing Considerations:
            - Handle variations in amount formatting (e.g., "$50", "50 USD", "50.00")
            - Extract information regardless of email section (subject, body, signature)
            - Case-insensitive matching for keywords

            Error Handling:
            - If multiple conflicting extractions exist, prioritize:
            1. Most recent mention
            2. Most explicit statement
            3. Full amount over partial amount
            - Return null if no confident extraction is possible

            Extraction Precision:
            - Aim for 90%+ accuracy in extracting donation details
            - Prioritize complete, unambiguous extractions
            - When in doubt, return null rather than guessing

            Output Format:
            You must always return the extracted data in the following JSON format, and nothing else:
            ```json
            {
            "amount": "[Extracted Amount or null]",
            "interval": "[Extracted Interval or null]",
            "name": "[Extracted Name or null]",
            "department": "[Extracted Department or null]"
            }
            ```
            Important Rules:

            Do not include any additional text, explanations, or comments outside the JSON object.
            If any of the fields cannot be identified from the email, their value must be null.
            If the email contains no donation-related information, return:
            ```json
            {}
            ```
            """,
        },
        {
            'role': 'user',
            'content':
            """
            Hi, a piano I have. Big and good, it is. Students use it, I want. How give it, can I? Olga
            """,
        }
    ],
    model='gemma2:2b',
    format=Result.model_json_schema()
)

donor = Result.model_validate_json(response.message.content)
print(donor)





name='Olga' amount=0 department='Students use it, I want' interval='null'


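We see that this model does better at extracting the more relevant data (the name is correct), but it struggles to handle multiple contexts in a message: the department is filled with a fragment of the email, and the missing amount and interval come back as 0 and the string 'null' rather than as real null values.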
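
One possible reason for the `amount=0` and `interval='null'` values is that the `Result` schema declares every field as a plain `int` or `str`, so a real null cannot be represented even though the prompt asks for one. The sketch below shows a nullable variant of the schema. The class name `NullableResult`, the `null_string_to_none` validator, and the sample payloads are assumptions made for illustration; they are not part of the original notebook, and the payloads are hand-written rather than model output.

```python
from typing import Optional

from pydantic import BaseModel, field_validator


class NullableResult(BaseModel):
    """Hypothetical variant of Result in which every field may be null."""
    name: Optional[str] = None
    amount: Optional[int] = None
    department: Optional[str] = None
    interval: Optional[str] = None

    @field_validator('name', 'department', 'interval', mode='before')
    @classmethod
    def null_string_to_none(cls, value):
        # Treat an empty string or the literal text "null" as a missing value.
        if isinstance(value, str) and value.strip().lower() in ('', 'null'):
            return None
        return value


# Hand-written sample payloads for illustration only (not model output).
sample = '{"name": "Olga", "amount": null, "department": null, "interval": "null"}'
donor = NullableResult.model_validate_json(sample)
print(donor)

# An empty object also validates, because every field defaults to None.
empty = NullableResult.model_validate_json('{}')
print(empty)
```

Passing `NullableResult.model_json_schema()` as the `format` argument in place of `Result.model_json_schema()` would be the corresponding change to the `chat` call. Whether `gemma2:2b` then actually returns null for the piano email has not been tested here.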