<a href="https://colab.research.google.com/github/ramahasiba/NLP/blob/LangChain/Classify_Text_into_Labels.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [Classify Text into Labels](https://python.langchain.com/docs/tutorials/classification/)

In [1]:
!pip install --upgrade -q langchain-core

In [2]:
!pip install -q python-dotenv

In [5]:
!pip install langchain_openai -q

In [7]:
!pip install -U langchain-groq -q


## Setup

In [3]:
import os
from pprint import pprint
from dotenv import load_dotenv
import getpass

# load_dotenv does not raise when .env is missing; it returns False if nothing was loaded
if not load_dotenv('.env'):
  print('No .env file found')

# Set up LangSmith tracing to inspect exactly what happens inside the chain or agent
os.environ["LANGSMITH_TRACING"] = "true"
if "LANGSMITH_API_KEY" not in os.environ:
  os.environ["LANGSMITH_API_KEY"] = getpass.getpass(
      prompt = "Enter the Langsmith api key:"
  )

if "LANGSMITH_PROJECT" not in os.environ:
  os.environ["LANGSMITH_PROJECT"] = getpass.getpass(
      prompt = "Enter langsmith project name: "
  )
  if not os.environ.get("LANGSMITH_PROJECT"):
    os.environ["LANGSMITH_PROJECT"] = "default"

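# Assigning None to os.environ raises TypeError, so check that the keys loaded from .env exist
for key in ("GROQ_API_KEY", "HF_TOKEN"):
  if not os.getenv(key):
    raise RuntimeError(f"{key} is not set; add it to .env or the environment")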

Enter langsmith project name: ··········


In [8]:
from langchain.chat_models import init_chat_model # A chat model is an instance of the Runnable interface
model_name = "llama3-70b-8192"

llm=init_chat_model(model_name, model_provider="groq")

## Define a Pydantic model with properties and their expected types in the schema

In [9]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

tagging_prompt = ChatPromptTemplate.from_template(
    """
    Extract the desired information from the following passage.

    Only extract the properties mentioned in the 'Classification' function.

    passage: {input}
    """
)

class Classification(BaseModel):
  sentiment: str = Field(description="The sentiment of the text")
  aggressiveness: int = Field(
      description="How aggressive the text is on a scale from 1 to 10"
  )
  language: str = Field(description="The language the text is written in")

# Structured LLM
structured_llm = llm.with_structured_output(Classification)

The `with_structured_output` method takes a schema, which can be specified as a TypedDict class, a JSON Schema, or a Pydantic class. The schema specifies the names, types, and descriptions of the desired output attributes. The method returns a model-like Runnable whose output follows that schema.
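To make the "names, types, and descriptions" concrete, here is a minimal sketch that prints the JSON Schema Pydantic derives from the `Classification` class above. It assumes Pydantic v2 (`model_json_schema`). The exact payload LangChain sends to the provider may differ by integration, so treat this as an illustration of the information the schema carries, not of the wire format.

```python
import json

from pydantic import BaseModel, Field


class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")
    aggressiveness: int = Field(
        description="How aggressive the text is on a scale from 1 to 10"
    )
    language: str = Field(description="The language the text is written in")


# Pydantic v2: derive a JSON Schema dict from the model definition
schema = Classification.model_json_schema()

# Each property carries its type and description; fields without a default are required
print(json.dumps(schema, indent=2))
```

Each field shows up under `properties` with its type and description, and all three are listed under `required` because none of them has a default value.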

In [10]:
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
prompt = tagging_prompt.invoke({
    "input": inp
})

In [11]:
response = structured_llm.invoke(prompt)
response

Classification(sentiment='Positive', aggressiveness=1, language='Spanish')

If any attribute is missing or has the wrong type, Pydantic validation raises an error.
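The validation behaviour can be checked without calling the LLM. The sketch below feeds hand-written dictionaries to the same `Classification` schema, assuming Pydantic v2. The helper `try_parse` is a hypothetical name introduced only for this illustration, and how such a failure surfaces through `with_structured_output` may depend on the provider integration.

```python
from pydantic import BaseModel, Field, ValidationError


class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")
    aggressiveness: int = Field(
        description="How aggressive the text is on a scale from 1 to 10"
    )
    language: str = Field(description="The language the text is written in")


def try_parse(payload: dict):
    """Hypothetical helper: return the parsed model, or a list of (location, error type)."""
    try:
        return Classification.model_validate(payload)
    except ValidationError as exc:
        return [(err["loc"], err["type"]) for err in exc.errors()]


# A complete, well-typed payload parses into a Classification instance
ok = try_parse({"sentiment": "Positive", "aggressiveness": 1, "language": "Spanish"})

# A missing attribute is reported against the field that was left out
missing = try_parse({"sentiment": "Positive", "language": "Spanish"})

# A value that cannot be interpreted as an int is rejected
# (by default a numeric string such as "10" would be coerced to 10 instead)
wrong_type = try_parse(
    {"sentiment": "Positive", "aggressiveness": "very", "language": "Spanish"}
)

print(ok)
print(missing)
print(wrong_type)
```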

### Generate a dictionary output

In [13]:
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
prompt = tagging_prompt.invoke({"input": inp})
response = structured_llm.invoke(prompt)

response.model_dump()

{'sentiment': 'angry', 'aggressiveness': 10, 'language': 'Spanish'}

## Finer Control
Careful schema definition gives us more control over the model's output and leads to more accurate results. We can define:
- The possible values for each property.
- A description for each property, so that the model understands it.
- Which properties are required in the output.


In [None]:
class Classification(BaseModel):
  sentiment: str = Field(..., enum=["happy", "neutral", "sad"])
  aggressiveness: int = Field(
      ...,
      description = "describe how aggressive the statement is, the higher the number the more aggressive",
      enum=[1, 2, 3, 4, 5]
  )
  language: str = Field(
      ...,
      enum=["spanish", "english", 'french', "german", "italian"]
  )
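As far as I understand, in Pydantic v2 an extra keyword such as `enum=` on `Field` only ends up in the generated JSON Schema (and is deprecated in favour of `json_schema_extra`); it is not enforced when the model is validated. A possible alternative, sketched below under the assumption of Pydantic v2, is to express the allowed values with `typing.Literal`, which both appears in the schema and is checked on validation. `StrictClassification` is a hypothetical name used only for this sketch.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class StrictClassification(BaseModel):
    # Literal restricts each field to the listed values and is enforced by validation
    sentiment: Literal["happy", "neutral", "sad"]
    aggressiveness: Literal[1, 2, 3, 4, 5] = Field(
        description="describe how aggressive the statement is, "
        "the higher the number the more aggressive"
    )
    language: Literal["spanish", "english", "french", "german", "italian"]


# Values inside the allowed sets are accepted
ok = StrictClassification(sentiment="neutral", aggressiveness=5, language="english")

# Values outside the allowed sets are rejected, one error per offending field
try:
    StrictClassification(sentiment="Angry", aggressiveness=10, language="Spanish")
    rejected = []
except ValidationError as exc:
    rejected = sorted(err["loc"][0] for err in exc.errors())

print(ok)
print(rejected)
```

Note that `Literal` matching is case-sensitive, so a model answer like `'Spanish'` would fail against `"spanish"`. Whether to normalise the case or list capitalised values is a design choice left open here.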

In [14]:
tagging_prompt = ChatPromptTemplate.from_template(
    """
    Extract the desired information from the following passage.

    Only extract the properties mentioned in the 'Classification' function.

    Passage: {input}
    """
)

structured_llm = llm.with_structured_output(Classification)

In [16]:
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
prompt = tagging_prompt.invoke({"input": inp})
structured_llm.invoke(prompt)

Classification(sentiment='Positive', aggressiveness=1, language='Spanish')

In [18]:
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
prompt = tagging_prompt.invoke({"input": inp})
structured_llm.invoke(prompt)

Classification(sentiment='Angry', aggressiveness=10, language='Spanish')

In [20]:
inp = "Weather is ok here, I can go outside without much more than a coat"
prompt = tagging_prompt.invoke({"input": inp})
structured_llm.invoke(prompt)

Classification(sentiment='neutral', aggressiveness=5, language='English')