Sentiment Analysis using the "Sentiment140" dataset by Stanford. The model characterizes a tweet as either positive or negative.

rkritika1508/Sentiment-Analysis

Sentiment Analysis of Twitter data

This project analyzes tweets using Stanford's "Sentiment140" dataset. The model uses machine learning algorithms and feature extraction techniques to classify a tweet as either positive or negative.

The linked Jupyter notebooks are as follows:

  • First.ipynb - Preparation and cleaning of the dataset

  • Second.ipynb - Analysis and visualisation of the data, and preparation of the data for further visualisation

  • Third.ipynb - Zipf's law and visualisation of tweet tokens

  • Fourth.ipynb - Split the dataset, took the TextBlob sentiment analyzer as the baseline, and applied a Logistic Regression classifier to CountVectorizer features built from unigrams, bigrams and trigrams

  • Fifth.ipynb - Feature extraction using TF-IDF with a Logistic Regression classifier applied on unigrams, bigrams and trigrams. Also analysed the performance of other classification algorithms: Ridge Classifier, Perceptron, Passive-Aggressive Classifier, Stochastic Gradient Descent, LinearSVC, L1-based LinearSVC, KNN, Nearest Centroid, Multinomial NB, Bernoulli NB and AdaBoost

  • Sixth.ipynb - Implemented a Doc2Vec model using Gensim for feature extraction. Used DBOW (Distributed Bag of Words), DMC (Distributed Memory Concatenated), DMM (Distributed Memory Mean), DBOW + DMC and DBOW + DMM on unigrams

  • Seventh.ipynb - Implemented phrase modelling using Gensim. Implemented DBOW, DMC, DMM, DBOW + DMC and DBOW + DMM on bigrams and trigrams. Also applied other classification algorithms to the dataset
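
As a rough illustration of the unigram, bigram and trigram features mentioned for the Fourth and Fifth notebooks, and of the rank-frequency view behind Zipf's law in the Third, here is a minimal standard-library sketch. It is not taken from the notebooks, which use CountVectorizer and TF-IDF. The sample tweets, the whitespace tokenizer, and the helper names `ngrams`, `count_ngrams` and `rank_frequency` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample tweets; the real project uses the Sentiment140 dataset.
TWEETS = [
    "i love this phone",
    "i love this song",
    "i hate this weather",
]


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, max_n=3):
    """Count every 1..max_n-gram over whitespace-tokenized, lowercased texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


def rank_frequency(counts):
    """Return (rank, term, frequency) rows, most frequent first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, term, freq)
            for rank, (term, freq) in enumerate(ordered, start=1)]


unigram_counts = count_ngrams(TWEETS, max_n=1)   # unigrams only
all_counts = count_ngrams(TWEETS, max_n=3)       # unigrams + bigrams + trigrams
table = rank_frequency(unigram_counts)

for rank, term, freq in table[:5]:
    print(rank, term, freq)
```

Zipf's law says that a term's frequency is roughly inversely proportional to its rank, so on a real corpus one would typically plot the `rank_frequency` rows on log-log axes and look for an approximately straight line. Three toy tweets are far too few to show that trend; the sketch only shows how the counts and ranks are built.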
