In [12]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# LLaMaBot's PydanticBot in under 5 minutes

We want to pull structured data out of unstructured text, and then validate that the extracted data matches the structure we expect.

In [13]:
# load in unstructured text data

with open("../blog_text.txt", "r", encoding="utf8") as f:
    blog_text = f.read()
blog_text[0:100] + "..."

'Large Language Models (LLMs) are having a moment now!\nWe can interact with them programmatically in ...'

Using Pydantic, we can define a class along with some validation rules.

In [14]:
from typing import List
from pydantic import BaseModel, Field, field_validator

class TopicExtract(BaseModel):
    """A list of topics contained in the given text."""
    topics: List[str] = Field(
        description="A list of up to 5 topics that this text is about. Each topic should be a 1 or 2 word description.")

    @field_validator('topics')
    @classmethod
    def validate_topics(cls, topics):
        # validate that the list contains at least 1 and no more than 5 topics
        if len(topics) <= 0 or len(topics) > 5:
            raise ValueError('The list of topics must contain between 1 and 5 items')

        # for each topic the model generated, ensure that the topic contains no more than 2 words
        for topic in topics:
            if len(topic.split()) > 2:
                raise ValueError('A topic can contain AT MOST 2 words')
        return topics
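
To see the validator in action without calling a model, the snippet below feeds hand-written JSON to the class. It is a minimal sketch: the class is repeated (with a shortened description) so the snippet runs on its own, and `good_json` and `bad_json` are illustrative stand-ins for model responses, not real output.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError, field_validator


class TopicExtract(BaseModel):
    """Same shape as the model above, repeated so this snippet runs on its own."""

    topics: List[str] = Field(description="Up to 5 topics, each 1 or 2 words.")

    @field_validator("topics")
    @classmethod
    def validate_topics(cls, topics):
        if not 1 <= len(topics) <= 5:
            raise ValueError("The list must contain between 1 and 5 topics")
        for topic in topics:
            if len(topic.split()) > 2:
                raise ValueError("A topic can contain AT MOST 2 words")
        return topics


# Hand-written JSON standing in for a model response (illustrative only).
good_json = '{"topics": ["OpenAI API", "Blog Writing"]}'
bad_json = '{"topics": ["Large Language Models"]}'  # three words, should fail

parsed = TopicExtract.model_validate_json(good_json)
print(parsed.topics)

try:
    TopicExtract.model_validate_json(bad_json)
except ValidationError as err:
    # Pydantic wraps the ValueError raised inside the validator.
    print(err.errors()[0]["msg"])
```

A response that breaks either rule never becomes a `TopicExtract` instance; it surfaces as a `ValidationError` instead.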

Now we can initialize the PydanticBot and assign this model to it.

In [15]:
import json

from llamabot.prompt_manager import prompt
from llamabot.bot.pydanticbot import PydanticBot


@prompt
def write_system_schema_prompt(schema):
    """You are an expert topic labeller.
    You read text and extract the topics the text is about.

    Your task is to return the topics in a json object that matches the following json_schema:
    ```{{ schema }}```

    Only return an INSTANCE of the schema, do not return the schema itself.
    """

# Pydantic v2 deprecates schema_json(); serialize model_json_schema() instead.
bot = PydanticBot(
    system_prompt=write_system_schema_prompt(json.dumps(TopicExtract.model_json_schema())),
    session_name="session_name",
    model_name="ollama/llama3:latest",
    temperature=0,
    stream_target="stdout",
    pydantic_model=TopicExtract,
)
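
It can help to look at the schema text that ends up in the system prompt. The sketch below prints it for a standalone copy of the model (repeated here, with a shortened description, so the snippet runs without LLaMaBot); it uses only Pydantic v2's `model_json_schema()`.

```python
import json
from typing import List

from pydantic import BaseModel, Field


class TopicExtract(BaseModel):
    """A list of topics contained in the given text."""

    topics: List[str] = Field(description="Up to 5 topics, each 1 or 2 words.")


# model_json_schema() returns a plain dict; json.dumps turns it into the
# kind of string that is interpolated into the system prompt.
schema = TopicExtract.model_json_schema()
print(json.dumps(schema, indent=2))
```

Note that the rules inside a `field_validator` do not appear in the schema. The model only sees the field description, so the limits on topic count and word count are enforced after the response comes back, at validation time.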

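Now we can pass in our text and extract the topics. Because `stream_target="stdout"`, the model's responses are printed as they are generated. The output below shows three attempts: the first two include a topic longer than two words and so fail validation, while the third passes.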

In [16]:
unstructured_text = blog_text[0:1000]

extract = bot(unstructured_text)

Here is the output in JSON format:

```
{
    "topics": [
        "Large Language Models",
        "APIs and Abstractions",
        "Content Generation",
        "Blog Writing",
        "Automation"
    ]
}
```

I apologize for the mistake! Here's another attempt at extracting topics from the text, this time ensuring that each topic is a single phrase or at most 2 words:

```
{
    "topics": [
        "Large Language Models",
        "OpenAI API",
        "LangChain Abstractions",
        "Content Generation",
        "Blog Writing"
    ]
}
```



I apologize for the mistake! Here's another attempt at extracting topics from the text, this time ensuring that each topic is a single phrase or at most 2 words:

```
{
    "topics": [
        "Large Language",
        "OpenAI API",
        "Content Generation",
        "Blog Writing",
        "Automation"
    ]
}
```
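
The repeated attempts above suggest a validate-and-retry loop. The following is a minimal sketch of that general pattern using only the standard library; it is not LLaMaBot's implementation, and `validate_topics`, `extract_with_retries`, `ask_model`, and the canned responses are all hypothetical names and data introduced for illustration.

```python
import json

MAX_WORDS = 2
MAX_TOPICS = 5


def validate_topics(payload: str) -> list[str]:
    """Parse a JSON response and apply the same rules as the validator above."""
    topics = json.loads(payload)["topics"]
    if not 1 <= len(topics) <= MAX_TOPICS:
        raise ValueError("The list must contain between 1 and 5 topics")
    for topic in topics:
        if len(topic.split()) > MAX_WORDS:
            raise ValueError(f"Topic {topic!r} has more than 2 words")
    return topics


def extract_with_retries(ask_model, text: str, max_attempts: int = 3) -> list[str]:
    """Call ask_model until its response validates or attempts run out."""
    message = text
    for _ in range(max_attempts):
        response = ask_model(message)
        try:
            return validate_topics(response)
        except (ValueError, KeyError) as err:
            # Feed the validation error back so the next attempt can correct it.
            message = f"{text}\n\nYour previous answer was rejected: {err}"
    raise RuntimeError(f"No valid response after {max_attempts} attempts")


# Hypothetical stand-in for the LLM: replays canned responses in order.
canned = iter([
    '{"topics": ["Large Language Models", "Automation"]}',
    '{"topics": ["Large Language", "Automation"]}',
])
topics = extract_with_retries(lambda _msg: next(canned), "some blog text")
print(topics)
```

In this sketch the first canned response is rejected for its three-word topic and the second is accepted, which mirrors how the final attempt above shortens "Large Language Models" to "Large Language" to satisfy the rule.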



In [17]:
for topic in extract.topics:
    print(topic)

Large Language
OpenAI API
Content Generation
Blog Writing
Automation
