<a href="https://colab.research.google.com/github/sampathk-hps/langchain-fundamentals-colab/blob/main/LangChain_3_Classify_Text_into_Labels.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Tagging means labeling a document with classes such as:

* Sentiment
* Language
* Style (formal, informal, etc.)
* Covered topics
* Political tendency

Tagging has two components:

1. Function: like extraction, tagging uses a function definition (function calling) to specify how the model should tag a document
2. Schema: defines the properties we want to tag the document with
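
To make the two components concrete, the sketch below assembles by hand the kind of function definition that a tagging schema can be turned into before it is sent to a model. The helper name `build_tagging_function` and the exact dictionary layout are illustrative assumptions modeled on common function-calling formats; they are not LangChain's internal representation.

```python
import json


def build_tagging_function(name: str, properties: dict, required: list) -> dict:
    """Assemble a function-calling style definition from a tagging schema (hypothetical helper)."""
    return {
        "name": name,
        "description": "Tag a passage with the properties listed in the schema.",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# The schema: which properties to tag and how each one is described.
schema_properties = {
    "sentiment": {"type": "string", "description": "The sentiment of the text"},
    "language": {"type": "string", "description": "The language the text is written in"},
}

# The function: the schema wrapped in a definition the model can be asked to "call".
tagging_function = build_tagging_function(
    "Classification", schema_properties, required=["sentiment", "language"]
)
print(json.dumps(tagging_function, indent=2))
```

In the rest of the notebook this step is not written by hand: a Pydantic model plays the role of the schema, and the library derives the function definition from it.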

In [None]:
%pip install -U langchain-core



In [None]:
%pip install -qU langchain-perplexity

In [None]:
import getpass
import os

# The langchain-perplexity integration reads its key from PPLX_API_KEY
if not os.environ.get("PPLX_API_KEY"):
    os.environ["PPLX_API_KEY"] = getpass.getpass("Perplexity API Key:")

from langchain.chat_models import init_chat_model

# Initialize the Perplexity "sonar" chat model
llm = init_chat_model(model="sonar", model_provider="perplexity")

In [None]:
import getpass
import os

# Alternative setup: store the Perplexity key under OPENAI_API_KEY
# (the value is still a Perplexity key, not an OpenAI one)
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Perplexity API Key:")

from langchain.chat_models import init_chat_model

# Initialize the Perplexity chat model
llm = init_chat_model(model='sonar', model_provider='perplexity')


In [None]:
from langchain_core.prompts import ChatPromptTemplate
# BaseModel and Field from Pydantic define the structure (schema) of the desired output
from pydantic import BaseModel, Field

tagging_prompt = ChatPromptTemplate.from_template(
    """
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
"""
)

# Classification is the schema: it lists the expected fields
# (sentiment, aggressiveness, language) with their types and descriptions
class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")
    aggressiveness: int = Field(
        description="How aggressive the text is on a scale from 1 to 10"
    )
    language: str = Field(description="The language the text is written in")

# Structured LLM: wrap the model with with_structured_output so that
# its answer is formatted according to the Classification schema
structured_llm = llm.with_structured_output(Classification)


In [None]:
inp = "This is the best pizza i have ever had. Toppings were so perfect, i loved it!"
prompt = tagging_prompt.invoke({"input": inp})
response = structured_llm.invoke(prompt)

# response
response.model_dump()

{'sentiment': 'positive', 'aggressiveness': 0, 'language': 'english'}
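
The prompt-then-invoke pattern above repeats for every passage, so it can be wrapped in a small helper. The sketch below is one way to do that. `tag_passages`, `EchoPrompt`, and `KeywordModel` are hypothetical names introduced here; the two classes are offline stand-ins that only mimic the `.invoke` interface, so the example runs without an API key.

```python
from typing import Any, Iterable


def tag_passages(passages: Iterable[str], prompt: Any, model: Any) -> list:
    """Fill the prompt for each passage, then pass the result to the model."""
    results = []
    for passage in passages:
        filled_prompt = prompt.invoke({"input": passage})
        results.append(model.invoke(filled_prompt))
    return results


# Offline stand-ins (hypothetical): they expose .invoke like the notebook's objects.
class EchoPrompt:
    def invoke(self, values: dict) -> str:
        return f"Passage:\n{values['input']}"


class KeywordModel:
    def invoke(self, prompt_text: str) -> dict:
        # Toy rule, not a real classifier: look for a single keyword.
        sentiment = "positive" if "loved" in prompt_text else "unknown"
        return {"sentiment": sentiment}


demo = tag_passages(
    ["i loved it!", "The box arrived on Tuesday."],
    prompt=EchoPrompt(),
    model=KeywordModel(),
)
print(demo)
```

With the notebook's objects, the equivalent call would presumably be `tag_passages(texts, tagging_prompt, structured_llm)`, since both expose `.invoke`.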

## Finer control

Careful schema definition gives us more control over the model's output.

Specifically, we can define:

1. The possible values for each property
2. A description of each property, so that the model understands it
3. Which properties are required in the output

In [None]:
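class Classification(BaseModel):
    # Pydantic v2 deprecates extra keyword arguments on Field (such as enum=...),
    # so the allowed values are passed through json_schema_extra instead.
    sentiment: str = Field(..., json_schema_extra={"enum": ["happy", "neutral", "sad"]})
    aggressiveness: int = Field(
        ...,
        description="describes how aggressive the statement is, the higher the number the more aggressive",
        json_schema_extra={"enum": [1, 2, 3, 4, 5]},
    )
    language: str = Field(
        ...,
        description="The language the text is written in",
        json_schema_extra={"enum": ["spanish", "english", "french", "german", "italian"]},
    )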

# Structured LLM
structured_llm = llm.with_structured_output(Classification)

In [None]:
inp = "This is the best pizza i have ever had. Toppings were so perfect, i loved it!"
prompt = tagging_prompt.invoke({"input": inp})
response = structured_llm.invoke(prompt)

# response
response.model_dump()

{'sentiment': 'happy', 'aggressiveness': 1, 'language': 'english'}

In [None]:
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
prompt = tagging_prompt.invoke({"input": inp})
structured_llm.invoke(prompt)

Classification(sentiment='neutral', aggressiveness=5, language='spanish')
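
The allowed values listed in the schema above are hints that travel with the JSON schema; Pydantic itself does not check returned values against them. If local validation is wanted as well, one option is to express the same constraints with `Literal` types and numeric bounds. The class name `StrictClassification` is an assumption made for this sketch, and whether a given model honors the stricter schema would need to be checked against the provider.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class StrictClassification(BaseModel):
    # Literal restricts the field to the listed values and is enforced by Pydantic.
    sentiment: Literal["happy", "neutral", "sad"]
    # ge/le bound the integer to the 1-5 range used in the schema above.
    aggressiveness: int = Field(
        ge=1,
        le=5,
        description="How aggressive the statement is; higher means more aggressive",
    )
    language: Literal["spanish", "english", "french", "german", "italian"]


# A value set inside the allowed ranges is accepted.
ok = StrictClassification(sentiment="happy", aggressiveness=1, language="english")
print(ok.model_dump())

# Values outside the allowed sets are rejected when the object is built.
try:
    StrictClassification(sentiment="angry", aggressiveness=9, language="english")
    error_count = 0
except ValidationError as exc:
    error_count = len(exc.errors())
print(f"validation errors: {error_count}")
```

A class like this can be passed to `llm.with_structured_output(...)` in the same way as `Classification`; an out-of-range answer from the model then surfaces as a validation error rather than passing through silently.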