In [None]:
!pip install -qU langchain langchain-openai

# Classify Text into Labels

Tagging means labeling a document with classes such as:
* Sentiment
* Language
* Style (formal, informal, etc.)
* Covered topics
* Political tendency

Tagging has a few components:
* `function`: like extraction, tagging uses functions to specify how the model should tag a document
* `schema`: defines how we want to tag the document
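The sketch below shows how the two components fit together. It builds an OpenAI-style function (tool) definition by hand from a Pydantic schema, assuming Pydantic v2 (`model_json_schema`). The `Tags` model and the `schema_to_tool` helper are hypothetical. The payload LangChain actually sends may contain extra keys or a different layout.

```python
from pydantic import BaseModel, Field


# Hypothetical tagging schema used only for this illustration.
class Tags(BaseModel):
    """Tags for a short passage."""

    sentiment: str = Field(description="The sentiment of the text")
    language: str = Field(description="The language the text is written in")


def schema_to_tool(model: type[BaseModel]) -> dict:
    """Wrap a Pydantic model's JSON schema in an OpenAI-style function/tool definition."""
    schema = model.model_json_schema()
    return {
        "type": "function",
        "function": {
            "name": model.__name__,  # the function the model is asked to "call"
            "description": schema.get("description", ""),  # taken from the docstring
            "parameters": {
                "type": "object",
                "properties": schema["properties"],  # one entry per tag
                "required": schema.get("required", []),
            },
        },
    }


tool = schema_to_tool(Tags)
print(tool["function"]["name"])  # Tags
print(sorted(tool["function"]["parameters"]["properties"]))  # ['language', 'sentiment']
```

The schema's field names and descriptions are what the model sees. This is why careful field descriptions matter for tagging quality.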

## Setup

In [None]:
import os

langchain_api_key = 'your_langchain_api_key_here'  # Replace with your actual LangChain API key
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_API_KEY'] = langchain_api_key

openai_api_key = 'your_openai_api_key_here'  # Replace with your actual OpenAI API key
os.environ['OPENAI_API_KEY'] = openai_api_key
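Hard-coding keys in a notebook makes them easy to leak. A common alternative is to use the standard-library `getpass` module. The sketch below prompts for a key only when its environment variable is not already set. The helper name `ensure_env` is hypothetical. The variable names are the ones used in the cell above.

```python
import getpass
import os


def ensure_env(var: str, prompt=getpass.getpass) -> str:
    """Set `var` from an interactive prompt only if it is missing or empty."""
    if not os.environ.get(var):
        os.environ[var] = prompt(f"Enter {var}: ")
    return os.environ[var]


# In the notebook you would call, for example:
# ensure_env("OPENAI_API_KEY")
# ensure_env("LANGCHAIN_API_KEY")
```

The `prompt` parameter is only there so the helper can be exercised without a terminal. In normal use the default `getpass.getpass` keeps the key out of the notebook's saved output.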

## Quickstart

Start by specifying a Pydantic model with a few properties and their expected types. This model is our schema.

In [7]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


tagging_prompt = ChatPromptTemplate.from_template(
    """
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
"""
)


class Classification(BaseModel):
    sentiment: str = Field(description='The sentiment of the text.')
    aggressiveness: int = Field(
        description="How aggressive the text is, on a scale from 1 to 10"
    )
    language: str = Field(description="The language the text is written in")
    translated: str = Field(description='The text translated into English')


# LLM
llm = ChatOpenAI(
    model='gpt-3.5-turbo',
    temperature=0
).with_structured_output(Classification)

tagging_chain = tagging_prompt | llm
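With `with_structured_output`, the model is asked to call a function whose arguments match `Classification`, and those arguments are then turned into a `Classification` instance. The sketch below reproduces only that final validation step, offline, with Pydantic v2's `model_validate_json`. The JSON string is a hypothetical model reply; its values are taken from the first Spanish example in this notebook. LangChain's own output parser does more than this (for example, it locates the tool call in the chat response).

```python
import json

from pydantic import BaseModel, Field, ValidationError


class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text.")
    aggressiveness: int = Field(description="How aggressive the text is, on a scale from 1 to 10")
    language: str = Field(description="The language the text is written in")
    translated: str = Field(description="The text translated into English")


# Hypothetical tool-call arguments, as the JSON string a model might return.
raw_args = json.dumps({
    "sentiment": "positive",
    "aggressiveness": 1,
    "language": "Spanish",
    "translated": "I am incredibly happy to have met you! I think we will be very good friends!",
})

result = Classification.model_validate_json(raw_args)
print(result.sentiment, result.aggressiveness)  # positive 1

# Arguments that do not fit the schema are rejected instead of passed through.
try:
    Classification.model_validate_json('{"sentiment": "positive"}')
except ValidationError as err:
    print(err.error_count(), "missing fields")  # 3 missing fields
```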

Test:

In [8]:
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
tagging_chain.invoke({"input": inp})

Classification(sentiment='positive', aggressiveness=1, language='Spanish', translated='I am incredibly happy to have met you! I think we will be very good friends!')

If we want dictionary output, we can call `.model_dump()` (`.dict()` is the deprecated Pydantic v1 name):

In [9]:
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
results = tagging_chain.invoke({"input": inp})
results.model_dump()

{'sentiment': 'negative',
 'aggressiveness': 8,
 'language': 'Spanish',
 'translated': "I am very angry with you! I'm going to give you what you deserve!"}
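`model_dump()` also accepts `include` and `exclude` to keep only the tags you need, and `model_dump_json()` returns a JSON string that is convenient for logging or storage. The minimal sketch below assumes Pydantic v2. It builds the result by hand from the values printed above rather than calling the model.

```python
from pydantic import BaseModel


class Classification(BaseModel):
    # Same fields as the notebook's schema; descriptions omitted for brevity.
    sentiment: str
    aggressiveness: int
    language: str
    translated: str


# Built by hand from the output above instead of calling the model.
results = Classification(
    sentiment="negative",
    aggressiveness=8,
    language="Spanish",
    translated="I am very angry with you! I'm going to give you what you deserve!",
)

# Keep only the tag fields and drop the translation.
tags = results.model_dump(exclude={"translated"})
print(tags)  # {'sentiment': 'negative', 'aggressiveness': 8, 'language': 'Spanish'}

# Compact JSON string with just two of the tags.
print(results.model_dump_json(include={"sentiment", "aggressiveness"}))
```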

It correctly interprets what we want.

## Finer control

Careful schema definition gives us more control over the model's output. We can define:
* Possible values for each property
* Descriptions, so the model understands what each property means
* Required properties to be returned
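Each of these controls ends up in the JSON schema the model is shown. The minimal sketch below assumes Pydantic v2. It uses `Literal` for possible values, `description=` for guidance, and a default value to mark a property as optional; fields without a default are required. The `Review` model and its fields are hypothetical.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field


class Review(BaseModel):
    # Possible values: a Literal becomes an "enum" in the JSON schema.
    sentiment: Literal["happy", "neutral", "sad"]
    # Description: tells the model what the property means.
    style: str = Field(description="Formal or informal register of the text")
    # Optional: it has a default, so it is left out of "required".
    topic: Optional[str] = None


schema = Review.model_json_schema()
print(schema["properties"]["sentiment"]["enum"])  # ['happy', 'neutral', 'sad']
print(schema["required"])  # ['sentiment', 'style']
```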

We can re-declare our Pydantic model to control each of these aspects, using `enum`s to restrict the allowed values:

In [10]:
class ClassificationRefine(BaseModel):
    # `enum=` is copied into the JSON schema the model sees. Pydantic v2 warns that
    # extra Field kwargs are deprecated in favor of `json_schema_extra`.
    sentiment: str = Field(..., enum=['happy', 'neutral', 'sad'])
    aggressiveness: int = Field(
        ...,
        description="Describes how aggressive the statement is; the higher the number, the more aggressive",
        enum=[1, 2, 3, 4, 5],
    )
    language: str = Field(
        ...,
        enum=['spanish', 'english', 'french', 'german', 'italian']
    )
    translated: str = Field(
        ...,
        description="The text translated into English"
    )

In [11]:
tagging_prompt = ChatPromptTemplate.from_template(
"""
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'ClassificationRefine' function.

Passage:
{input}
"""
)

llm = ChatOpenAI(
    model='gpt-3.5-turbo',
    temperature=0
).with_structured_output(ClassificationRefine)

tagging_chain = tagging_prompt | llm

In [12]:
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
tagging_chain.invoke({"input": inp})

ClassificationRefine(sentiment='happy', aggressiveness=1, language='spanish', translated='I am incredibly happy to have met you! I think we will be very good friends!')

In [13]:
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
tagging_chain.invoke({"input": inp})

ClassificationRefine(sentiment='sad', aggressiveness=5, language='spanish', translated='I am very angry with you! I am going to give you what you deserve!')

In [14]:
inp = "Weather is ok here, I can go outside without much more than a coat"
tagging_chain.invoke({"input": inp})

ClassificationRefine(sentiment='neutral', aggressiveness=2, language='english', translated='Weather is ok here, I can go outside without much more than a coat')
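The `enum=` lists above shape what the model is asked to produce, but they are only schema metadata. Pydantic does not check them when it builds `ClassificationRefine`, so an out-of-range value returned by the model would be accepted as-is. `Literal` types both advertise and enforce the allowed values. The sketch below uses two hypothetical models, `SchemaOnly` and `Enforced`, and assumes Pydantic v2, where `json_schema_extra` is the non-deprecated way to attach an `enum`.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class SchemaOnly(BaseModel):
    # enum appears in the JSON schema but is not checked during validation.
    sentiment: str = Field(..., json_schema_extra={"enum": ["happy", "neutral", "sad"]})


class Enforced(BaseModel):
    # Literal is advertised in the schema AND checked during validation.
    sentiment: Literal["happy", "neutral", "sad"]


print(SchemaOnly.model_json_schema()["properties"]["sentiment"]["enum"])  # ['happy', 'neutral', 'sad']
print(SchemaOnly(sentiment="angry").sentiment)  # angry (accepted)

try:
    Enforced(sentiment="angry")
except ValidationError as err:
    print("rejected:", err.errors()[0]["type"])  # rejected: literal_error
```

With `Literal`, an unexpected label surfaces as a validation error instead of silently flowing into downstream code.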