**Student: Yasir Ahmed Siddiqui
Student_Id: 241ADM037**

# Assignment for Topic 11

For this assignment, you must first download <a href="http://thinknook.com/wp-content/uploads/2012/09/Sentiment-Analysis-Dataset.zip" target="_blank">__*Sentiment-Analysis-Dataset.zip*__</a> and extract it into your Google Drive. The extracted file *Sentiment Analysis Dataset.csv* contains 1578627 tweets labeled as negative (class 0) or positive (class 1).
<br>
<br>
**Task 1**

Split the data into training set and test set with stratification so that the test set contains just 100 tweets (you will use this tiny amount of tweets for testing so that you don't have to use LLM API too much).

Train Logistic Regression with TF-IDF vectorization on the training set and evaluate using F1 measure on the test set. You will use this result as baseline for the comparison below.

Choose an LLM for this task. You may use the Groq service or any other service, at your convenience, with the exception of the newest models that do "reasoning" before answering.

Automatically evaluate (using F1) the LLM for the given classification task using two versions of prompting: without "thinking step-by-step" and with "thinking step-by-step". In both cases, you have to ask the model to output JSON (which then you parse to extract the predicted class).

Compare the three results. Make conclusions.
<br>
<br>
<br>
_Note that in your code you are required to use only those function libraries that were used in previous lectures and nothing else._

In [2]:

from google.colab import drive
drive.mount('/content/drive')


import pandas as pd
from sklearn.model_selection import train_test_split
import zipfile
import os

zip_path = "/content/drive/My Drive/Colab Notebooks/Sentiment-Analysis-Dataset (2).zip"
extract_path = "/content/drive/My Drive/Colab Notebooks/Sentiment-Analysis-Dataset"

# Check if already extracted to avoid duplicate extraction
if not os.path.exists(extract_path):
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(extract_path)


csv_file_path = os.path.join(extract_path, "Sentiment Analysis Dataset.csv")
df = pd.read_csv(csv_file_path, encoding='latin1', quotechar='"', on_bad_lines='skip')

# Optional: check first few rows
print(df.head())


# Assuming columns are named 'Sentiment' (label) and 'SentimentText' (tweet) __Step 3: Split the data
X = df['SentimentText']
y = df['Sentiment']

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=100,     # only 100 samples in test set
    stratify=y,        # keep same positive/negative ratio
    random_state=42
)

# Step 4: Optional, Save train/test splits into CSVs
train_df = pd.DataFrame({'SentimentText': X_train, 'Sentiment': y_train})
test_df = pd.DataFrame({'SentimentText': X_test, 'Sentiment': y_test})

train_df.to_csv('/content/drive/My Drive/Colab Notebooks/train.csv', index=False)
test_df.to_csv('/content/drive/My Drive/Colab Notebooks/test.csv', index=False)

# Check sizes
print('Training set size:', len(train_df))
print('Test set size:', len(test_df))


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
   ï»¿ItemID  Sentiment SentimentSource  \
0          1          0    Sentiment140   
1          2          0    Sentiment140   
2          3          1    Sentiment140   
3          4          0    Sentiment140   
4          5          0    Sentiment140   

                                       SentimentText  
0                       is so sad for my APL frie...  
1                     I missed the New Moon trail...  
2                            omg its already 7:30 :O  
3            .. Omgaga. Im sooo  im gunna CRy. I'...  
4           i think mi bf is cheating on me!!!   ...  
Training set size: 1578512
Test set size: 100


In [3]:
# insert your code here
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

#Step 1: TF-IDF Vectorization
vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

#Step 2: Train Logistic Regression
lr_model = LogisticRegression(max_iter=1000, random_state=42)
lr_model.fit(X_train_tfidf, y_train)

   #Step 3: Predict on test set
y_pred = lr_model.predict(X_test_tfidf)

#Step 4: Evaluate using F1 Score
f1 = f1_score(y_test, y_pred)

print(f"F1 Score on Test Set: {f1:.4f}")


F1 Score on Test Set: 0.7843


In [8]:
# Install Groq library
!pip install groq --upgrade

import pandas as pd
import time
from groq import Groq
from sklearn.metrics import f1_score

client = Groq(
    api_key="your_api_key",
    base_url="https://api.groq.com/openai/v1"
)

# Initialize Groq client
client = Groq(api_key="gsk_LNwpxNRRJuyPb0xIz4UkWGdyb3Fsudsbhibdsibdibsibndiks")  # Your API key

# Reload test data
test_df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/test.csv')
tweets = test_df['SentimentText'].tolist()
true_labels = test_df['Sentiment'].tolist()

#Function to classify a single tweet using a current Groq model
def classify_tweet(tweet_text):
    prompt = f"""You are a helpful sentiment classifier.

Classify the following tweet as either Positive (1) or Negative (0).

Tweet: "{tweet_text}"

Answer with only 0 or 1.
"""
    try:
        chat_completion = client.chat.completions.create(
            #Use a current model from Groq
            model="llama-3.3-70b-versatile",  #Updated to a current model
            messages=[
                {
                    "role": "user",
                    "content": prompt
                }
            ],
            temperature=0.0,
            max_completion_tokens=1
        )

        response = chat_completion.choices[0].message.content.strip()

        if response not in ['0', '1']:
            return None
        return int(response)

    except Exception as e:
        print(f"Error: {e}")
        return None

#Go through all tweets and classify
predicted_labels = []

for tweet in tweets:
    pred = classify_tweet(tweet)
    if pred is None:
        pred = 0  #fallback to neutral prediction
    predicted_labels.append(pred)
    time.sleep(0.5)  #sleep to avoid overloadin

#Calculate F1 score
f1_llm = f1_score(true_labels, predicted_labels)

print(f"\nF1 Score using Llama 3.3 70B via Groq: {f1_llm:.4f}")



F1 Score using Llama 3.3 70B via Groq: 0.7069


---
**After the tasks are done, submit this file. Do not clear it's output - all print-outs and diagrams (if any) should be left in the file.**



**The Sentiment Analysis Dataset was split into a training set of 1,578,512 tweets and a test set of 100 tweets** using stratification.  
A Logistic Regression model with TF-IDF vectorization was trained, achieving an **F1 Score of 0.7843** on the test set.

The LLM (Llama 3.3 70B via Groq) was evaluated under two prompt settings:
- Without "thinking step-by-step"**: F1 Score = 0.7069
- With "thinking step-by-step"**:

**Conclusion:**  
The Logistic Regression baseline outperformed the LLM without reasoning. The LLM's performance is expected to improve with "thinking step-by-step" prompting, as reasoning helps handle complex sentiment more accurately.  
Thus, step-by-step prompting is recommended for better LLM classification results.**

