# Sentiment Analysis App

## Intro
* Sentiment analysis is a very popular feature. For example, it can determine whether a product review is positive or negative.
* Our app will be able to do more than that. It will be a text classification app, also called a "tagging" app.
* In short, we will create an app that classifies text into labels. These labels can be:
    * Sentiment: Sentiment Analysis app.
    * Language: Language Analysis app.
    * Style (formal, informal, etc.): Style Analysis app.
    * Topics or categories: Topic or Category Analysis app.
    * Political tendency: Political Analysis app.
    * Etc.

In [1]:
# !pip install python-dotenv

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
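`load_dotenv` does not raise when the `.env` file or one of its keys is missing, so a quick check right after loading can surface the problem early. The following is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper name, and `GROQ_API_KEY` is the variable read later in this notebook.

```python
import os

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

# Example with an explicit mapping instead of the real environment.
print(missing_env_vars(["GROQ_API_KEY"], environ={"GROQ_API_KEY": ""}))
```

Calling `missing_env_vars(["GROQ_API_KEY"])` without the `environ` argument checks the real process environment.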


## Install LangChain

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [3]:
#!pip install langchain

## Connect with an LLM

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [4]:
#!pip install langchain-openai

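* NOTE: The original lesson uses OpenAI by default. The code in this notebook connects to Llama 3 through Groq instead, so it also needs the `langchain-groq` package and a `GROQ_API_KEY` entry in your `.env` file. You will see how to connect with other open-source LLMs such as Llama 3 or Mistral in a later lesson.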

In [5]:
import os
from langchain_groq import ChatGroq

# Read the API key loaded from .env
groq_api_key = os.getenv("GROQ_API_KEY")
if not groq_api_key:
    raise ValueError("GROQ_API_KEY is not set; add it to your .env file")

# Print only a short prefix so the full key never appears in the output
print(f"Using Groq API key: {groq_api_key[:4]}...")

llm = ChatGroq(
    model_name="llama3-8b-8192",  # Options: llama3-8b-8192, llama3-70b-8192, mixtral-8x7b-32768
    groq_api_key=os.getenv("GROQ_API_KEY")
)

Using Groq API key: gsk_...


* The original lesson uses the `with_structured_output` method supported by OpenAI models. The code below instead redefines `llm`, asks the model for strict JSON in the prompt, and parses the reply manually:

## Tag Definition
* The following code defines the 3 tags we will analyze in this app:
    * sentiment.
    * political tendency.
    * language.

In [24]:
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from langchain_groq import ChatGroq
import os

# Define your schema
class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text, e.g., positive, negative, or neutral")
    political_tendency: str = Field(description="The political tendency of the user, e.g., left, right, or neutral")
    language: str = Field(description="The language the text is written in, e.g., English, Hindi, etc.")

# Set up the LLM with your key (make sure it's loaded correctly)
llm = ChatGroq(
    model_name="llama3-8b-8192",
    groq_api_key=os.getenv("GROQ_API_KEY")  # Ensure this is set in your .env
)

# Prompt asking for strict JSON ({{ and }} are escaped literal braces)
tagging_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are an expert text classifier. For any given passage, extract the following fields:\n"
     "- {{\"sentiment\"}}: positive, negative, or neutral\n"
     "- {{\"political_tendency\"}}: left, right, or neutral\n"
     "- {{\"language\"}}: the language in which the text is written\n"
     "Return the output as strict JSON only, matching this structure:\n"
     "{{\"sentiment\": \"...\", \"political_tendency\": \"...\", \"language\": \"...\"}}"),
    ("human", "{input}")
])

# Pipe the prompt into the model to build the chain
chain = tagging_prompt | llm
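The doubled braces in the system message are escapes. This sketch assumes the template uses Python `str.format`-style substitution (understood to be the default for `ChatPromptTemplate`), in which `{{` and `}}` render as literal braces while `{input}` stays a placeholder. The shortened template text is illustrative only.

```python
# A shortened, illustrative template: doubled braces are escapes, {input} is a placeholder.
template = (
    "Return strict JSON: "
    '{{"sentiment": "...", "language": "..."}}\n'
    "Text: {input}"
)

rendered = template.format(input="Great product!")
print(rendered)
```

Without the doubling, the formatter would treat `{"sentiment": ...}` as a placeholder and fail when no matching variable is supplied.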



In [25]:
trump_follower = "I'm confident that President Trump's leadership and track record will once again resonate with Americans. His strong stance on economic growth and national security is exactly what our country needs at this pivotal moment. We need to bring back the proven leadership that can make America great again!"

In [26]:
biden_follower = "I believe President Biden's compassionate and steady approach is vital for our nation right now. His commitment to healthcare reform, climate change, and restoring our international alliances is crucial. It's time to continue the progress and ensure a future that benefits all Americans."

In [27]:
import json
response = chain.invoke({"input": trump_follower})

In [28]:

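from pydantic import ValidationError

try:
    data = json.loads(response.content)
    result = Classification(**data)  # Validate the parsed fields with Pydantic
    print(result)
except json.JSONDecodeError:
    print("❌ LLM did not return valid JSON:")
    print(response.content)
except (TypeError, ValidationError) as err:
    # The reply was JSON but did not match the Classification schema
    print("❌ JSON did not match the schema:")
    print(err)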

sentiment='positive' political_tendency='right' language='en'
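Chat models sometimes wrap the JSON in a sentence or two, in which case `json.loads(response.content)` fails even though a valid object is present. The following is a hedged sketch of a more tolerant parser; `extract_json_object` is a hypothetical helper, and the slice-between-braces heuristic assumes the reply contains a single top-level object.

```python
import json

def extract_json_object(text):
    """Parse the span from the first '{' to the last '}' as JSON.

    Returns a dict, or None when no parsable object is found.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        value = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None

# Illustrative reply with prose around the JSON object.
reply = 'Sure! Here is the classification: {"sentiment": "positive", "language": "en"} Hope that helps.'
print(extract_json_object(reply))
```

The returned dictionary can then be passed to `Classification(**data)` in the same way as before.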


In [29]:
chain.invoke({"input": biden_follower})

AIMessage(content='{\n  "sentiment": "positive",\n  "political_tendency": "left",\n  "language": "en"\n}', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 153, 'total_tokens': 180, 'completion_time': 0.024694191, 'prompt_time': 0.01808207, 'queue_time': 0.273783569, 'total_time': 0.042776261}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_8b7c3a83f7', 'finish_reason': 'stop', 'logprobs': None}, id='run--7800a171-09a3-43e1-9441-7588b5a08707-0', usage_metadata={'input_tokens': 153, 'output_tokens': 27, 'total_tokens': 180})

* Careful schema definition gives us more control over the model's output.
* Specifically, we can define:
    * Possible values for each property.
    * Description to make sure that the model understands the property.
    * Required properties to be returned.
* Let's redeclare our Pydantic model to control each of these aspects **using enums**:

In [30]:
class Classification(BaseModel):
    sentiment: str = Field(..., enum=["happy", "neutral", "sad"])
    political_tendency: str = Field(
        ...,
        description="The political tendency of the user",
        enum=["conservative", "liberal", "independent"],
    )
    language: str = Field(
        ..., enum=["spanish", "english"]
    )
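As far as we know, extra keywords such as `enum=` on `Field` only annotate the generated JSON schema and do not reject other values when a model instance is built. One way to enforce the allowed values is Python's own `enum` module. The following is a minimal standard-library sketch; the `Sentiment` class and the `coerce_label` helper are illustrative names, not part of the notebook.

```python
from enum import Enum

class Sentiment(str, Enum):
    HAPPY = "happy"
    NEUTRAL = "neutral"
    SAD = "sad"

def coerce_label(enum_cls, raw):
    """Map a raw model string onto an enum member, or raise ValueError."""
    try:
        return enum_cls(raw.strip().lower())
    except ValueError:
        allowed = [member.value for member in enum_cls]
        raise ValueError(f"{raw!r} is not one of {allowed}") from None

print(coerce_label(Sentiment, " Happy ").value)
```

With Pydantic, annotating the field as `sentiment: Sentiment` (or with `typing.Literal`) should give the same enforcement at validation time.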

In [42]:
tagging_prompt = ChatPromptTemplate.from_messages([
    ("system", 
     "Classify the following text and return valid JSON with exactly these fields:\n"
     "sentiment (positive, negative, neutral),\n"
     "political_tendency (left, right, neutral),\n"
     "language (e.g., English, Hindi)\n\n"
     "Return only JSON, no explanation."),
    ("human", "{input}")
])


llm = ChatGroq(model_name="llama3-8b-8192", groq_api_key=os.getenv("GROQ_API_KEY"), temperature=0)




In [43]:
import json

tagging_chain = tagging_prompt | llm  # plain chain, no structured output
response = tagging_chain.invoke({"input": trump_follower})
parsed = json.loads(response.content)  # parse the JSON reply manually

In [44]:
parsed

{'sentiment': 'positive', 'political_tendency': 'right', 'language': 'English'}

In [None]:
# tagging_chain.invoke({"input": biden_follower})

AIMessage(content='There is no "Classification" function mentioned in the passage. The passage is an opinion piece about President Biden\'s approach and policies. It does not contain any information about properties or classification.', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 38, 'prompt_tokens': 84, 'total_tokens': 122, 'completion_time': 0.033941792, 'prompt_time': 0.037375023, 'queue_time': 1.228757428, 'total_time': 0.071316815}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_8b7c3a83f7', 'finish_reason': 'stop', 'logprobs': None}, id='run--79b28c94-78a8-4f3d-8bfe-7fff6c307807-0', usage_metadata={'input_tokens': 84, 'output_tokens': 38, 'total_tokens': 122})
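The reply above is plain prose, so `json.loads` would raise on it. The following is a hedged sketch of a retry wrapper for that case: it calls the model a few times and returns the first reply that parses as a JSON object. `invoke_text` is a hypothetical callable standing in for something like `lambda text: tagging_chain.invoke({"input": text}).content`, and the stub below replaces the real model so the example runs offline.

```python
import json

def classify_with_retry(invoke_text, text, max_attempts=3):
    """Call invoke_text(text) until the reply is a JSON object.

    Returns (parsed_dict, attempts_used), or (None, max_attempts) on failure.
    """
    for attempt in range(1, max_attempts + 1):
        reply = invoke_text(text)
        try:
            parsed = json.loads(reply)
        except json.JSONDecodeError:
            continue  # not JSON, try again
        if isinstance(parsed, dict):
            return parsed, attempt
    return None, max_attempts

# Stub model: answers in prose first, then with JSON.
replies = iter(["I cannot classify this.", '{"sentiment": "positive"}'])
result, attempts = classify_with_retry(lambda _text: next(replies), "some review")
print(result, attempts)
```

With `temperature=0` a retry will often return the same text, so in practice the retry may need a reworded prompt to make a difference.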

## How to execute the code from Visual Studio Code
* In Visual Studio Code, open the file 001-sentiment-analysis.py.
* In the terminal, make sure you are in the directory that contains the file and run:
    * python 001-sentiment-analysis.py