# Classification

Useful for tagging.

- Think of the tags on a library book (a document) on the TPL website.

Label classes for a book could be:

- language
- topics covered
- genre


## In this notebook

How can we use tool calling to tag documents?


In [1]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama
from pydantic import BaseModel, Field # pydantic is used for validation (TODO: find out more on this later)

## Schema

What properties do we want to extract from the prompt?

`pydantic` lets us define that schema as a Python class.


In [46]:
# class Classification(BaseModel):
#     sentiment: str = Field(description="The sentiment of the text in english")
#     melancholy: int = Field(description="How melancholic the text is on a scale of 1 to 10")
#     language: str = Field(description="The language the text is written in")

class Classification(BaseModel):
    hello: str = Field(description="pick a letter between A and Z")
    # melancholy: int = Field(description="How melancholic the text is on a scale of 1 to 10")
    # language: str = Field(description="The language the text is written in")
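
To get a feel for what the validation mentioned in the import comment amounts to, here is a small sketch built from the commented-out tagging fields above. It assumes pydantic v2 (for `model_json_schema`); the class name `TaggingSchema` and the sample values are made up for illustration, and nothing here calls the LLM.

```python
from pydantic import BaseModel, Field, ValidationError


# Hypothetical name, so the notebook's own Classification class is left untouched.
class TaggingSchema(BaseModel):
    sentiment: str = Field(description="The sentiment of the text in English")
    melancholy: int = Field(description="How melancholic the text is on a scale of 1 to 10")
    language: str = Field(description="The language the text is written in")


# A well-formed payload is accepted and exposed as typed attributes.
ok = TaggingSchema(sentiment="sad", melancholy=7, language="Latin")

# A payload whose `melancholy` cannot be read as an integer is rejected.
try:
    TaggingSchema(sentiment="sad", melancholy="very", language="Latin")
    rejected = False
except ValidationError:
    rejected = True

# The class can also be exported as a JSON schema, descriptions included.
schema = TaggingSchema.model_json_schema()
print(sorted(schema["properties"]))
```

So the schema does two jobs: it describes the expected fields, and it checks the types of whatever comes back.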

## Prompt template

A `ChatPromptTemplate` standardizes the common part (the boilerplate) of every prompt. Here, the common part is the instructions for producing structured output.

In [None]:
tagging_template = ChatPromptTemplate.from_template(
    template="""
Extract desired information from the following passage.
Only extract the information mentioned in the "Classification" class.
Passage:
{input}
"""
)
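
To see what filling in the template amounts to, here is a plain-Python stand-in that uses `str.format` on the same text. This is only a sketch of the substitution step, not LangChain's implementation: `ChatPromptTemplate.from_template` additionally wraps the result as a chat message. The names `TEMPLATE` and `render` are hypothetical.

```python
# Same boilerplate as the template above, kept as a plain string.
TEMPLATE = """
Extract desired information from the following passage.
Only extract the information mentioned in the "Classification" class.
Passage:
{input}
"""


def render(passage: str) -> str:
    """Substitute the passage into the {input} placeholder."""
    return TEMPLATE.format(input=passage)


rendered = render("At vero eos et accusamus et iusto odio dignissimos ducimus.")
print(rendered)
```

Only the passage changes between calls; the instructions stay the same.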

## LLM

Initialize the model so that its outputs follow a structure (the schema defined above).

In [48]:
llm = ChatOllama(
    temperature=0,
    model="llama3.2"
).with_structured_output(
    schema=Classification
)

## Prompt

Combine the input and the prompt template to create a prompt.

In [49]:
inp = "At vero eos et accusamus et iusto odio dignissimos ducimus, qui blanditiis praesentium voluptatum deleniti atque corrupti, quos dolores et quas molestias excepturi sint, obcaecati cupiditate non provident, similique sunt in culpa, qui officia deserunt mollitia animi, id est laborum et dolorum fuga."

prompt = tagging_template.invoke(
    input={"input": inp}
)

## Output

Finally, pass the prompt to the LLM.

Recall the effort that went into preparing the prompt (literally prompt engineering):

1. The prompt template, which gives a common context to each input
2. The schema for structured output, which clearly defines what output is expected
3. The LLM initialized with the schema

In [52]:
response = llm.invoke(
    input=prompt
)

response

Classification(hello='s')

# Why do this?

Think of **structured output** as an extremely particular way of getting the LLM to answer back.

We're used to seeing ChatGPT blurt out paragraph after paragraph.

We can make better use of the output if we can pass it around an application. That is usually done through APIs, which expect a schema or, more generally, some structure.

Hence we need structured output.
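
As a rough illustration of passing the output around, the sketch below turns a structured response into a dict and a JSON string, then parses it back. It assumes pydantic v2 method names (`model_dump`, `model_dump_json`, `model_validate_json`), and the `response` object is built by hand as a stand-in for what `llm.invoke(...)` returned above.

```python
import json

from pydantic import BaseModel, Field


class Classification(BaseModel):
    hello: str = Field(description="pick a letter between A and Z")


# Stand-in for the object returned by llm.invoke(...); no model is called here.
response = Classification(hello="s")

# A plain dict, convenient inside Python code.
payload = response.model_dump()

# A JSON string, convenient for sending over an API.
as_json = response.model_dump_json()

# The receiving side can parse and validate it against the same schema.
round_trip = Classification.model_validate_json(as_json)

print(payload)
print(json.loads(as_json) == payload)
```

A free-text answer would need custom parsing at this point, while the structured one is already in a shape other code can consume.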

# Practice

OpenAI uses a moderation API to filter potentially harmful input.

I thought of recreating this functionality using structured outputs.

It's as simple as defining the harmful categories in the schema.

This implementation is very rudimentary and not practical.

# Counter-Strike 2 chat moderation

In [81]:
class Moderation(BaseModel):
    violence: float = Field(description="Rate the violence in the sentence between 0 to 1")
    hate: float = Field(description="Rate the hate in the sentence between 0 to 1")
    hacking: float = Field(description="Rate the extent of hacking in the sentence between 0 to 1")

In [90]:
moderation_model = ChatOllama(
    temperature=0,
    model="llama3.2"
).with_structured_output(
    schema=Moderation
)

In [83]:
moderation_template = ChatPromptTemplate.from_template(
    template="""
Give the scores for the potentially harmful categories defined in "Moderation" class.
Passage:
{input}
"""
)

In [84]:
inp = "I swear I will shoot down all puny terrorists who try to peek at A long site!"

prompt = moderation_template.invoke(
    input={"input": inp}
)

In [85]:
response = moderation_model.invoke(
    input=prompt
)

response

Moderation(violence=0.8, hate=0.9, hacking=0.0)

In [86]:
inp = "Shut up SkreemingKroos, you're trash. you have 1-5 KD and chatting shit!?"

prompt = moderation_template.invoke(
    input=inp
)

response = moderation_model.invoke(
    input=prompt
)

response

Moderation(violence=0.0, hate=0.0, hacking=0.0)

In [87]:
inp = "Yo CT, vote out fortnitelover2 he's got aimbot and wall X-ray"

prompt = moderation_template.invoke(
    input=inp
)

response = moderation_model.invoke(
    input=prompt
)

response

Moderation(violence=0.0, hate=1.0, hacking=1.0)

**Note**: why is my output score not between 0 and 1 when I have specified that in the schema?
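
One thing worth keeping in mind here, offered as a possible factor rather than a diagnosis: the range in the schema lives only in the `description` text, which is a hint for the model and is not checked by pydantic. A minimal sketch of enforcing the range with pydantic's `ge` and `le` constraints follows. The class name `BoundedModeration` is hypothetical, and how `ChatOllama` surfaces a failed validation is not covered here.

```python
from pydantic import BaseModel, Field, ValidationError


# Hypothetical variant of the Moderation schema with numeric bounds added.
class BoundedModeration(BaseModel):
    violence: float = Field(ge=0, le=1, description="Rate the violence in the sentence between 0 and 1")
    hate: float = Field(ge=0, le=1, description="Rate the hate in the sentence between 0 and 1")
    hacking: float = Field(ge=0, le=1, description="Rate the extent of hacking in the sentence between 0 and 1")


# Scores inside the range are accepted (the bounds are inclusive).
in_range = BoundedModeration(violence=0.8, hate=0.9, hacking=0.0)

# A score outside the range raises a ValidationError instead of passing silently.
try:
    BoundedModeration(violence=8, hate=0.9, hacking=0.0)
    out_of_range_rejected = False
except ValidationError:
    out_of_range_rejected = True

print(in_range, out_of_range_rejected)
```

With the bounds in place, an out-of-range score becomes an explicit error rather than a value that quietly flows into the rest of the application.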

In [97]:
inp = "bruh fiddlejiddle12 got suspicious peek at middle"

prompt = moderation_template.invoke(
    input=inp
)

response = moderation_model.invoke(
    input=prompt
)

response

Moderation(violence=0.0, hate=0.0, hacking=1.0)
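
To act on scores like these, an application would typically compare them against a cutoff. The sketch below does that on a plain dict of scores, which `response.model_dump()` would provide under pydantic v2. The helper name `flagged_categories` and the 0.5 threshold are assumptions chosen for illustration, not part of any moderation API.

```python
def flagged_categories(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return the categories whose score reaches the (arbitrary) threshold."""
    return [name for name, score in scores.items() if score >= threshold]


# Scores copied from the first moderation output in this notebook.
first_message = {"violence": 0.8, "hate": 0.9, "hacking": 0.0}

# Scores copied from the last moderation output in this notebook.
last_message = {"violence": 0.0, "hate": 0.0, "hacking": 1.0}

print(flagged_categories(first_message))  # ['violence', 'hate']
print(flagged_categories(last_message))   # ['hacking']
```

The threshold would need tuning per category in practice, since the scores from a small local model are not calibrated.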