# Sentiment Analysis with Laurium

This notebook demonstrates how to perform sentiment analysis using the laurium library with:
- Ollama as the LLM platform (Qwen2.5:7b model)
- Pydantic for structured output parsing
- Custom prompts for sentiment classification

## Overview

We'll build a sentiment classifier that:
1. Takes text input
2. Returns structured JSON with sentiment labels (1=positive, 0=negative)
3. Processes data in batches using pandas DataFrames

## Import Required Libraries

First, let's import all the necessary modules from laurium and supporting libraries.

In [None]:
import pandas as pd
from langchain_core.output_parsers import PydanticOutputParser

from laurium.decoder_models import extract, llm, prompts, pydantic_models

## 1. Create LLM Instance

We'll use Ollama with the Qwen2.5:7b model for our sentiment analysis. Setting temperature to 0.0 minimizes sampling randomness, which makes outputs far more consistent from run to run.

In [None]:
# Create LLM instance with Ollama platform
sentiment_llm = llm.create_llm(
    llm_platform="ollama", model_name="qwen2.5:7b", temperature=0.0
)
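
To see why a low temperature makes output more repeatable, the sketch below applies temperature scaling to a softmax over three made-up token scores. The logits and the helper name `softmax_with_temperature` are illustrative assumptions, not part of laurium or Ollama. At exactly 0.0, samplers typically fall back to picking the highest-scoring token rather than dividing by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax probabilities after dividing logits by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is treated as greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens (illustrative only)
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature shrinks, nearly all probability mass moves to the top-scoring token, so repeated runs tend to produce the same answer.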

## 2. Define Schema and Build Prompt

The prompt is crucial for getting structured output.
We define our output schema once and use it in both the prompt and the Pydantic
model, so there is no need to write the JSON format into the prompt by hand.

**Key Points:**
- Clear instructions for sentiment classification
- Define the schema once and reuse it in both the prompt and the Pydantic model
- The JSON format section of the prompt is generated automatically, with no manual specification

In [None]:
schema = {"ai_label": int}  # 1 for positive, 0 for negative
descriptions = {
    "ai_label": "Sentiment classification (1=positive, 0=negative)"
}

# Build prompt with automatic schema integration
system_message = prompts.create_system_message(
    # Adjacent string literals are concatenated, so the first one needs a trailing space
    base_message=(
        "You are a sentiment analysis assistant. "
        "Use 1 for positive sentiment, 0 for negative sentiment."
    ),
    keywords=["positive", "negative"],
)

extraction_prompt = prompts.create_prompt(
    system_message=system_message,
    examples=None,
    example_human_template=None,
    example_assistant_template=None,
    final_query="Analyze this text: {text}",
    schema=schema,  # Automatically formats JSON structure in prompt
    descriptions=descriptions,  # Provides field context to LLM
)
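
One way to picture what "automatic schema integration" could look like: the hypothetical helper below turns the same `schema` and `descriptions` dictionaries into a JSON format hint. The function `render_format_hint` is an illustration only. It is not laurium's implementation, and the text laurium actually generates may differ, so print the prompt to check.

```python
import json


def render_format_hint(schema, descriptions):
    """Build a JSON format hint from a {field: type} schema (hypothetical helper)."""
    # Map each field to its type name, e.g. {"ai_label": "int"}
    shape = {field: typ.__name__ for field, typ in schema.items()}
    lines = ["Respond with JSON in this format:", json.dumps(shape)]
    # Append one line of context per described field
    for field, text in descriptions.items():
        lines.append(f"- {field}: {text}")
    return "\n".join(lines)


schema = {"ai_label": int}
descriptions = {"ai_label": "Sentiment classification (1=positive, 0=negative)"}

print(render_format_hint(schema, descriptions))
```

Because both the hint and the parser would be derived from one dictionary, renaming a field or changing its type only has to happen in one place.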

In [None]:
# Inspect the generated system message
print("Generated system message:")
print(extraction_prompt.messages[0].prompt.template)

## 3. Define Output Schema with Pydantic

We create a dynamic Pydantic model from the same schema and descriptions we
defined above. This keeps what the LLM is instructed to output consistent with
what our parser expects.

In [None]:
OutputModel = pydantic_models.make_dynamic_example_model(
    schema=schema, descriptions=descriptions, model_name="SentimentOutput"
)

# Create Pydantic output parser
parser = PydanticOutputParser(pydantic_object=OutputModel)
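
For intuition, a dynamic model like this can be built with Pydantic's own `create_model`. The sketch below is an assumption about what an equivalent model might look like, not the code behind `make_dynamic_example_model`. The name `SentimentSketch` is hypothetical. It also shows how a reply that breaks the schema is rejected.

```python
import json

from pydantic import Field, ValidationError, create_model

# Build a model with one required int field, mirroring the schema above
SentimentSketch = create_model(
    "SentimentSketch",
    ai_label=(
        int,
        Field(..., description="Sentiment classification (1=positive, 0=negative)"),
    ),
)

# A well-formed reply validates into a typed object
good = SentimentSketch(**json.loads('{"ai_label": 1}'))
print(good.ai_label)

# A reply with the wrong type raises a validation error
try:
    SentimentSketch(**json.loads('{"ai_label": "positive"}'))
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```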

## 4. Create Batch Extractor

The BatchExtractor combines our LLM, prompt, and parser to process multiple texts efficiently.

In [None]:
# Create the batch extractor
extractor = extract.BatchExtractor(
    llm=sentiment_llm, prompt=extraction_prompt, parser=parser
)
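
Conceptually, each row flows through three stages: format the prompt, call the model, parse the reply. The sketch below imitates that flow with a stub in place of the LLM. The names `fake_llm`, `parse_reply`, and `classify` are hypothetical, and this is not how `BatchExtractor` is implemented internally.

```python
import json


def fake_llm(prompt_text):
    """Stand-in for a real model call: a crude keyword rule that returns JSON text."""
    negative_words = ("terrible", "worst")
    is_negative = any(word in prompt_text.lower() for word in negative_words)
    return json.dumps({"ai_label": 0 if is_negative else 1})


def parse_reply(raw):
    """Parse the JSON reply and check that the label is 0 or 1."""
    label = json.loads(raw)["ai_label"]
    if label not in (0, 1):
        raise ValueError(f"unexpected label: {label!r}")
    return label


def classify(texts, template="Analyze this text: {text}"):
    """Run prompt formatting, the stub model, and parsing for each text."""
    return [parse_reply(fake_llm(template.format(text=t))) for t in texts]


labels = classify(
    ["I absolutely love this product!", "This is terrible, worst purchase ever."]
)
print(labels)
```

Keeping the three stages separate means a malformed reply fails in the parsing step, where it is easy to detect, rather than silently producing a bad label.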

## 5. Prepare Test Data

Let's create a small dataset with examples of positive and negative sentiment to test our classifier.

In [None]:
# Create test data with clear positive and negative examples
data = pd.DataFrame(
    {
        "text": [
            "I absolutely love this product!",
            "This is terrible, worst purchase ever.",
            "Great value for money, highly recommend!",
        ]
    }
)

## 6. Process Data and View Results

Now we'll run our sentiment analysis on the test data and examine the results.

In [None]:
# Keep the output in a variable so it can be reused, then display it
results = extractor.process_chunk(data, text_column="text")
results
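
If the returned DataFrame carries the model's answer in an `ai_label` column (an assumption based on the schema above, so check the actual column names in your output), the numeric labels can be mapped to readable names. The sketch uses a hand-built stand-in DataFrame, not real model output.

```python
import pandas as pd

# Stand-in for the extractor output; the values here are written by hand
results = pd.DataFrame(
    {
        "text": [
            "I absolutely love this product!",
            "This is terrible, worst purchase ever.",
        ],
        "ai_label": [1, 0],
    }
)

# Map numeric labels to names and count each class
results["sentiment"] = results["ai_label"].map({1: "positive", 0: "negative"})
counts = results["sentiment"].value_counts().to_dict()
print(results[["text", "sentiment"]])
print(counts)
```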