
# Install Dependencies

In [1]:
!pip install gdown

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# Download Data

In [2]:
!gdown https://drive.google.com/uc?id=1h8Es1iPcg3Fs2CmUNi7MZeSG6I7nj2xc
!gdown https://drive.google.com/uc?id=1v5R0lS3U3MFXdtNaSiKOAQQ2P7dhc9eT
!gdown https://drive.google.com/uc?id=1_bcMpkPGKZ7tvuTL81YvgMJGqUPr7sIr

Downloading...
From: https://drive.google.com/uc?id=1h8Es1iPcg3Fs2CmUNi7MZeSG6I7nj2xc
To: /content/test.tsv
100% 206k/206k [00:00<00:00, 86.8MB/s]
Downloading...
From: https://drive.google.com/uc?id=1v5R0lS3U3MFXdtNaSiKOAQQ2P7dhc9eT
To: /content/train.tsv
100% 787k/787k [00:00<00:00, 136MB/s]
Downloading...
From: https://drive.google.com/uc?id=1_bcMpkPGKZ7tvuTL81YvgMJGqUPr7sIr
To: /content/valid.tsv
100% 101k/101k [00:00<00:00, 53.5MB/s]


# Load Data

In [3]:
label_map = {"positive": 1, "negative": 0}

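def load_data(filename):
    """Read a TSV file with one 'sentence<TAB>label' pair per line."""
    sentences = []
    labels = []
    # Use a context manager so the file handle is closed after reading.
    with open(filename, encoding="utf-8") as f:
        for line in f:
            parts = line.split("\t")
            sentences.append(parts[0])
            # Map the string label ("positive"/"negative") to 1/0.
            labels.append(label_map[parts[1].strip()])
    return sentences, labels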
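
To see what file format `load_data` expects, the sketch below writes a two-line TSV to a temporary file and parses it with the same logic. The sample sentences and the temporary file are made up for illustration; they are not taken from the downloaded data.

```python
import os
import tempfile

label_map = {"positive": 1, "negative": 0}

def load_data(filename):
    """Same parsing logic as above: one 'sentence<TAB>label' pair per line."""
    sentences, labels = [], []
    with open(filename, encoding="utf-8") as f:
        for line in f:
            parts = line.split("\t")
            sentences.append(parts[0])
            labels.append(label_map[parts[1].strip()])
    return sentences, labels

# Hypothetical two-line file in the assumed 'sentence<TAB>label' format.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False, encoding="utf-8") as tmp:
    tmp.write("A lovely film .\tpositive\n")
    tmp.write("The worst sequel yet .\tnegative\n")
    path = tmp.name

demo_sentences, demo_labels = load_data(path)
os.remove(path)  # clean up the temporary file
print(demo_sentences)
print(demo_labels)
```

A label other than `positive` or `negative` would raise a `KeyError` in `label_map`, which is a reasonable way to surface malformed lines early.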


# Extract Features

In this exercise, we use bag-of-words style features: for each sentence, we count how many of its words appear in a hand-picked list of positive words and how many appear in a list of negative words.

In [60]:
def extract_features(sentence):
    # Edit good_words and bad_words below and check how the accuracy changes.
    # Starting lists:
    # good_words = {"best", "good", "delicious", "lovely"}
    # bad_words = {"worst", "bad", "abysmal"}
    good_words = {
        "best", "good", "delicious", "lovely", "stupendous", "magnificent",
        "beautiful", "magnanimous", "enterprising", "cheerful", "trustworthy",
        "vibrant", "accomplished", "graceful", "sincere", "serene", "divine",
    }
    # Note: the two-word entries ("heart breaking", "heart wrenching") can never
    # match, because the sentence is split into single tokens on spaces below.
    bad_words = {
        "worst", "bad", "abysmal", "horrible", "exasperating", "harrowing",
        "heart breaking", "heart wrenching", "dishonest", "anger", "arrogant",
        "disgraceful", "rude", "apathy",
    }

    return_dict = {"num_positive_words": 0, "num_negative_words": 0}

    # Matching is exact and case-sensitive, so "Good" does not count as "good".
    for word in sentence.split(" "):
        if word in good_words:
            return_dict["num_positive_words"] += 1
        if word in bad_words:
            return_dict["num_negative_words"] += 1
    return return_dict
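
The lookup above compares space-separated tokens exactly, so capitalized words and two-word phrases are never counted. One possible variant, sketched here under assumptions, matches each lexicon entry case-insensitively as a whole word or phrase using regular expressions. The names `GOOD_WORDS`, `BAD_WORDS`, `count_matches` and `extract_features_normalized` are hypothetical, and the short lexicons are for illustration only; this is not the notebook's method.

```python
import re

# Small illustrative lexicons; hypothetical, not the notebook's full lists.
GOOD_WORDS = {"good", "lovely", "delicious"}
BAD_WORDS = {"bad", "rude", "heart breaking"}

def count_matches(text, lexicon):
    """Count whole-word, case-insensitive occurrences of each lexicon entry."""
    total = 0
    for entry in lexicon:
        # The \b anchors keep "bad" from matching inside "badge".
        pattern = r"\b" + re.escape(entry) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total

def extract_features_normalized(sentence):
    """Return the same feature dictionary shape as extract_features."""
    return {
        "num_positive_words": count_matches(sentence, GOOD_WORDS),
        "num_negative_words": count_matches(sentence, BAD_WORDS),
    }

demo = extract_features_normalized(
    "Good food , but a heart breaking ending and a bad badge ."
)
print(demo)
```

Scanning the sentence once per lexicon entry is slower than a set lookup, which is acceptable for small word lists but may matter for large ones.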

# Classification

In [61]:
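def classify(sentence):
    features = extract_features(sentence)

    # Predict positive (1) only when positive words strictly outnumber negative ones.
    # Ties, including sentences with no lexicon words at all, fall through to negative (0).
    if features["num_positive_words"] > features["num_negative_words"]:
        return 1
    else:
        return 0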

# Evaluation

This function computes accuracy: the fraction of predictions that match the gold labels.

In [62]:
def evaluate(predictions, labels):
    correct = 0
    for p, l in zip(predictions, labels):
        if p == l:
            correct += 1
    return float(correct / len(predictions))
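
Accuracy alone does not show which kind of error dominates. As a sketch, the counts of true and false positives and negatives can be tallied with the same loop structure. The helper name `confusion_counts` is hypothetical and not part of the notebook, and the predictions and labels below are made up.

```python
def confusion_counts(predictions, labels):
    """Tally true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, l in zip(predictions, labels):
        if p == 1 and l == 1:
            counts["tp"] += 1
        elif p == 1 and l == 0:
            counts["fp"] += 1
        elif p == 0 and l == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

# Made-up predictions and gold labels, for illustration only.
demo_predictions = [1, 0, 0, 1, 0]
demo_labels = [1, 1, 0, 0, 1]
demo_counts = confusion_counts(demo_predictions, demo_labels)
# Accuracy is the share of correct predictions (tp + tn) over all examples.
demo_accuracy = (demo_counts["tp"] + demo_counts["tn"]) / len(demo_labels)
print(demo_counts, demo_accuracy)
```

A large `fn` count would suggest that most mistakes are positive sentences predicted as negative, which is the likely pattern when few lexicon words match and the classifier falls back to 0.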

# Error analysis

This function collects the instances that the classifier gets wrong. Each entry holds the sentence, the prediction, and the gold label, separated by tabs.

In [63]:
def analysis(sentences, predictions, labels):
    incorrect = []
    for s, p, l in zip(sentences, predictions, labels):
        if p != l:
            incorrect.append(s + "\t" + str(p) + "\t" + str(l))
    return incorrect
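
One way to use these error lines when tuning the word lists is to count which words occur most often in the misclassified sentences. The sketch below assumes the `sentence<TAB>prediction<TAB>label` format produced by `analysis`; the helper `frequent_words` and the sample error lines are hypothetical.

```python
from collections import Counter

def frequent_words(error_lines, top_n=3):
    """Count lowercase alphabetic tokens in the sentence part of each error line."""
    counter = Counter()
    for line in error_lines:
        sentence = line.split("\t")[0]
        # Skip punctuation tokens such as "," and "." by keeping alphabetic words only.
        counter.update(w.lower() for w in sentence.split(" ") if w.isalpha())
    return counter.most_common(top_n)

# Made-up error lines in the assumed format, for illustration only.
demo_errors = [
    "Warm , funny film .\t0\t1",
    "Funny and warm .\t0\t1",
    "A funny ride .\t0\t1",
]
top_words = frequent_words(demo_errors, top_n=2)
print(top_words)
```

Frequent words in missed positive sentences are candidates for `good_words`, though common function words would need to be filtered out on real data.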

# Main method to run the sentiment classifier


In [64]:
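def main(train_file="/content/train.tsv", valid_file="/content/valid.tsv"):
    # The training split is loaded but not used below: the rule-based
    # classifier has no parameters to fit.
    train_sentences, train_labels = load_data(train_file)
    valid_sentences, valid_labels = load_data(valid_file)

    predictions = [classify(s) for s in valid_sentences]
    accuracy = evaluate(predictions, valid_labels)
    print(accuracy)
    # Show the first 10 misclassified sentences as "sentence<TAB>prediction<TAB>gold label".
    print("\n\n\n".join(analysis(valid_sentences, predictions, valid_labels)[:10]))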

In [65]:
main()

0.5286697247706422
And if you 're not nearly moved to tears by a couple of scenes , you 've got ice water in your veins .	0	1


A warm , funny , engaging film .	0	1


Uses sharp humor and insight into human nature to examine class conflict , adolescent yearning , the roots of friendship and sexual identity .	0	1


Visually imaginative , thematically instructive and thoroughly delightful , it takes us on a roller-coaster ride from innocence to experience without even a hint of that typical kiddie-flick sentimentality .	0	1


Nothing 's at stake , just a twisty double-cross you can smell a mile away -- still , the derivative Nine Queens is lots of fun .	0	1


Unlike the speedy wham-bam effect of most Hollywood offerings , character development -- and more importantly , character empathy -- is at the heart of Italian for Beginners .	0	1


You 'll gasp appalled and laugh outraged and possibly , watching the spectacle of a promising young lad treading desperately in a nasty sea , shed an er