This notebook uses LangChain with Claude 3.5 Sonnet on AWS Bedrock to classify text into categories or labels, using a chat model with structured output.

[Langchain Documentation](https://python.langchain.com/docs/tutorials/classification/)

Tagging means labeling a document with classes such as:

*   Sentiment
*   Language
*   Style (formal, informal etc.)
*   Covered topics
*   Political tendency

Install the libraries needed to run LangChain with AWS Bedrock:

In [3]:
%pip install --upgrade --quiet langchain langchain-core langchain-aws

Configure your AWS credentials (access key, secret key and region) and confirm that they work.
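As a quick sanity check before loading a model, the sketch below reports which of the standard AWS environment variables are unset. It assumes credentials are supplied through environment variables; if they come from `~/.aws/credentials` or an IAM role instead, this check will report them as missing even though Bedrock calls may still succeed. The helper name `missing_aws_vars` is made up for this illustration.

```python
import os

# Standard environment variable names read by the AWS SDK for Python (boto3).
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_aws_vars(env=None):
    """Return the required AWS variable names that are unset or empty.

    Hypothetical helper for this notebook; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Missing AWS settings:", ", ".join(missing))
else:
    print("AWS environment variables are set.")
```

This only checks that the variables are present, not that the keys are valid. A stricter check would make a real API call, for example `boto3.client("sts").get_caller_identity()`.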

Select a chat model available on AWS Bedrock and load it:

In [5]:
# Ensure your AWS credentials are configured

from langchain.chat_models import init_chat_model

llm = init_chat_model("anthropic.claude-3-5-sonnet-20240620-v1:0", model_provider="bedrock_converse")

Let's specify a Pydantic model with a few properties and their expected type in our schema.

In [8]:
from langchain_core.prompts import ChatPromptTemplate
# from langchain_openai import ChatOpenAI -> Didn't use OpenAI Keys, so commented this line out
from pydantic import BaseModel, Field

tagging_prompt = ChatPromptTemplate.from_template(
    """
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
"""
)


class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")
    aggressiveness: int = Field(
        description="How aggressive the text is on a scale from 1 to 10"
    )
    language: str = Field(description="The language the text is written in")


# Structured LLM
structured_llm = llm.with_structured_output(Classification)

Let's call the LLM for the response

In [9]:
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
prompt = tagging_prompt.invoke({"input": inp})
response = structured_llm.invoke(prompt)

response

Classification(sentiment='Positive', aggressiveness=1, language='Spanish')

If we want dictionary output, we can just call .model_dump()

In [10]:
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
prompt = tagging_prompt.invoke({"input": inp})
response = structured_llm.invoke(prompt)

response.model_dump()

{'sentiment': 'Negative', 'aggressiveness': 9, 'language': 'Spanish'}

In both examples the model returns every requested field with sensible values.

However, because sentiment and language are declared as free-form strings, the labels are not standardized: we may get, for example, sentiments in different languages or casings ('positive', 'enojado' etc.).

The next section shows how to constrain these results.

Finer Control over the Classification Output

Careful schema definition gives us more control over the model's output.

Specifically, we can define:

*   Possible values for each property
*   Description to make sure that the model understands the property
*   Required properties to be returned

Let's redeclare our Pydantic model to control for each of the previously mentioned aspects using enums:

In [11]:
class Classification(BaseModel):
    sentiment: str = Field(..., enum=["happy", "neutral", "sad"])
    aggressiveness: int = Field(
        ...,
        description="describes how aggressive the statement is, the higher the number the more aggressive",
        enum=[1, 2, 3, 4, 5],
    )
    language: str = Field(
        ..., enum=["spanish", "english", "french", "german", "italian"]
    )
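One caveat about the schema above: extra keyword arguments such as `enum=` on `Field` are, as far as I know, only carried into the JSON schema that is sent to the model; Pydantic itself does not validate against them. The sketch below is an alternative, not the approach used in this notebook. It uses `typing.Literal` and numeric bounds so that out-of-range values are rejected when the object is built. The class name `StrictClassification` is made up for this illustration.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class StrictClassification(BaseModel):
    # Literal restricts the field to the listed values and is enforced by Pydantic.
    sentiment: Literal["happy", "neutral", "sad"]
    # ge/le bound the integer to the 1..5 scale used in the schema above.
    aggressiveness: int = Field(
        ge=1,
        le=5,
        description="describes how aggressive the statement is, the higher the number the more aggressive",
    )
    language: Literal["spanish", "english", "french", "german", "italian"]


# A value set that respects the schema is accepted.
ok = StrictClassification(sentiment="happy", aggressiveness=1, language="spanish")

# Values outside the allowed sets raise a ValidationError.
try:
    StrictClassification(sentiment="enojado", aggressiveness=9, language="spanish")
except ValidationError as exc:
    error_fields = sorted(err["loc"][0] for err in exc.errors())
    print(error_fields)  # ['aggressiveness', 'sentiment']
```

A class like this could be passed to `with_structured_output` in the same way as `Classification`; whether the model's answers change is something to check empirically.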

In [None]:
%pip install langchain langchain-aws pydantic

The cell below is the AWS Bedrock equivalent of the OpenAI model setup used in the LangChain documentation.

In [13]:
from langchain_aws import ChatBedrockConverse

# Initialize Claude via Bedrock Converse and bind the Classification schema
llm = ChatBedrockConverse(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",  # or another supported model
    temperature=0
).with_structured_output(Classification)

In [14]:
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
prompt = tagging_prompt.invoke({"input": inp})
llm.invoke(prompt)

Classification(sentiment='happy', aggressiveness=1, language='spanish')

The second input is another Spanish sentence, this time an angry one. The schema has no 'angry' label, so the model has to pick the closest allowed sentiment.

In [16]:
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
prompt = tagging_prompt.invoke({"input": inp})
llm.invoke(prompt)

Classification(sentiment='sad', aggressiveness=5, language='spanish')

The final input is an English sentence with a neutral tone. Let's see what the model returns.

In [17]:
inp = "Weather is ok here, I can go outside without much more than a coat"
prompt = tagging_prompt.invoke({"input": inp})
llm.invoke(prompt)

Classification(sentiment='neutral', aggressiveness=1, language='english')