# Banglish Sentiment Challenge
Welcome! This notebook will guide you through the process of classifying Bangla-English mixed sentences into Positive, Negative, or Neutral sentiment.

## Workflow Overview
1. **Import Libraries**: Load required packages (e.g., pandas, transformers, etc.).
2. **Load Data**: Read the provided CSV files.
3. **Zero-shot Sentiment Classification**: Use a multilingual model or prompt engineering to predict sentiment.
4. **Prepare Submission**: Format predictions for submission.
5. **Evaluation**: (Optional) Calculate macro-averaged F1-score if ground truth is available.

Let's get started!

In [2]:
# Analyze label distribution in example.csv
import pandas as pd
example_df = pd.read_csv('example.csv')
print('Label distribution:')
print(example_df['predicted_sentiment'].value_counts())
print('\nSample data:')
print(example_df.head())

Label distribution:
predicted_sentiment
negative    3
neutral     3
positive    1
Name: count, dtype: int64

Sample data:
           id                               text predicted_sentiment
0  sample_799        Rate deri kore ghumiyechi 👎            negative
1  sample_825  Bagane phul phuteche onek sundor             positive
2  sample_226          Database এ error দেখাচ্ছে            negative
3    sample_9          Sondhyay parke halte jabo             neutral
4   sample_16        রান্নাঘরে মা কাজ করছেন 🤦‍♂️            negative


## Few-shot Training and Prediction
We will use the labeled examples from `example.csv` for few-shot learning or prompt engineering, and predict sentiment for the sentences in `test.csv`.

In [3]:


# Load test.csv for prediction
test_df = pd.read_csv('test.csv')
print('Test data sample:')
print(test_df.head())

Test data sample:
           id                                text
0  sample_798          I bought a নতুন বই to read
1  sample_141  Bondhudero sathe ghurte giyechilam
2  sample_675           Bazare aj onek bhir chilo
3  sample_574  এই movie টা really interesting ছিল
4  sample_488        সন্ধ্যায় পার্কে হাঁটতে যাবো


In [4]:
# Few-shot prompt engineering using example.csv for zero-shot classification
from transformers import pipeline

# Prepare prompt examples for each label
examples = example_df[['text', 'predicted_sentiment']].values.tolist()

classifier = pipeline('zero-shot-classification', model='joeddav/xlm-roberta-large-xnli')
labels = ['positive', 'negative', 'neutral']

def predict_with_examples(text, examples, labels):
    # Optionally, you can concatenate example texts to the input for prompt engineering
    # For simplicity, we use the classifier directly here
    return classifier(text, labels)['labels'][0]

test_df['sentiment'] = test_df['text'].apply(lambda x: predict_with_examples(x, examples, labels))
test_df[['id', 'text', 'sentiment']].head()

  from .autonotebook import tqdm as notebook_tqdm
2025-07-28 12:02:56.407786: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-07-28 12:02:57.692285: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-07-28 12:02:58.562636: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1753682579.188485   10044 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1753682579.332706   10044 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1753682580.750461   10044 computation_placer.cc:177] computation placer already

NameError: name 'torch' is not defined

In [None]:
# Save new predictions to submission.csv
submission = test_df[['id', 'sentiment']]
submission.to_csv('submission.csv', index=False)
print('Submission file saved as submission.csv')

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
from transformers import pipeline
import os

# Load the test data
test_df = pd.read_csv('test.csv')
test_df.head()

## Zero-shot Sentiment Classification
We will use a multilingual transformer model (such as XLM-RoBERTa or mBERT) with Hugging Face's zero-shot classification pipeline to predict sentiment for each Banglish sentence.

The candidate labels are:
- positive
- negative
- neutral

In [None]:
# Initialize zero-shot classification pipeline
classifier = pipeline('zero-shot-classification', model='joeddav/xlm-roberta-large-xnli')

# Define candidate labels
labels = ['positive', 'negative', 'neutral']

# Predict sentiment for each sentence
def predict_sentiment(text):
    result = classifier(text, labels)
    return result['labels'][0]

test_df['sentiment'] = test_df['text'].apply(predict_sentiment)
test_df[['text', 'sentiment']].head()

## Prepare Submission
Format the predictions as required and save to `submission.csv`.

In [None]:
# Save predictions to submission.csv
submission = test_df[['id', 'sentiment']]
submission.to_csv('submission.csv', index=False)
print('Submission file saved as submission.csv')

## Evaluation (Optional)
If you have ground truth labels, you can evaluate your predictions using the macro-averaged F1-score.

In [None]:
# Optional: Evaluate using macro-averaged F1-score if ground truth is available
from sklearn.metrics import f1_score

# If you have a column 'true_sentiment' in test_df, uncomment below:
f1 = f1_score(test_df['true_sentiment'], test_df['sentiment'], average='macro')
print('Macro-averaged F1-score:', f1)