This repository provides a pipeline for sentiment analysis and topic modelling of tweets.
It has four parts:
- Collecting Twitter data by hashtag
- Training and saving the models (Word2Vec, TF-IDF and SVM)
- Using the models for sentiment classification
- Running topic modelling separately on the tweets of each sentiment
Download the training data (sentiment-labelled tweets) from the following link and place it in the train_data folder: http://thinknook.com/wp-content/uploads/2012/09/Sentiment-Analysis-Dataset.zip
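
The download step can also be scripted. The following is a minimal sketch using only the Python standard library; the function names `fetch_dataset` and `unpack_dataset` are hypothetical, and whether the training scripts expect the zip itself or its extracted contents is not stated here, so the extraction step is an assumption.

```python
import urllib.request
import zipfile
from pathlib import Path

DATASET_URL = "http://thinknook.com/wp-content/uploads/2012/09/Sentiment-Analysis-Dataset.zip"


def fetch_dataset(url=DATASET_URL, dest_dir="train_data"):
    """Download the archive into dest_dir and return the path of the saved zip."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / url.rsplit("/", 1)[-1]
    if not zip_path.exists():  # skip the download if a copy is already there
        urllib.request.urlretrieve(url, str(zip_path))
    return zip_path


def unpack_dataset(zip_path, dest_dir="train_data"):
    """Extract every member of the archive into dest_dir and return their names.

    Assumption: the training scripts read the extracted files from train_data.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Calling `unpack_dataset(fetch_dataset())` from the repository root would download the archive once and extract it next to the zip.
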
A few steps need to be done only once:
- In train_models, run train_word2vec.py to train the Word2Vec model and save it in pickle format
- In train_models, run train_tfidf.py to train the TF-IDF model and save it in pickle format
- In train_models, run train_classifier.py to train the classifier and save it in pickle format
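
The three one-time steps above can be chained in a small driver. This is only a sketch: the script names come from the list above, while `training_commands` and `run_training` are hypothetical helpers, and it assumes the scripts should be started from inside train_models and in the order listed.

```python
import subprocess
import sys

# Script names taken from the list above. The order is an assumption:
# the classifier presumably relies on the saved Word2Vec and TF-IDF models.
TRAINING_SCRIPTS = ["train_word2vec.py", "train_tfidf.py", "train_classifier.py"]


def training_commands(python=sys.executable):
    """Build one command per training script, in the order listed above."""
    return [[python, name] for name in TRAINING_SCRIPTS]


def run_training(models_dir="train_models"):
    """Run each script from inside models_dir and stop at the first failure."""
    for command in training_commands():
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if a script exits with an error
        subprocess.run(command, cwd=models_dir, check=True)
```

Calling `run_training()` from the repository root runs all three scripts once; later runs are only needed when the training data changes.
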
The classifier reaches an accuracy of around 78% on the test dataset.
Once the above steps are complete, the models are ready to make predictions.
Keep running twitter_data.py to collect more tweets.
Once everything is done, run all_together.py to classify the tweets as positive or negative and to run topic modelling on each group separately.
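
The internals of all_together.py are not shown here, but the flow it is described as performing (classify, split by sentiment, then analyse each group on its own) can be sketched as follows. Everything in this block is an assumption for illustration: `toy_predict` is a hypothetical keyword rule standing in for the trained classifier, and `top_terms` is a plain word-frequency count standing in for a real topic model such as LDA.

```python
from collections import Counter, defaultdict


def split_by_sentiment(tweets, predict):
    """Group tweets under the label that predict(tweet) returns."""
    groups = defaultdict(list)
    for tweet in tweets:
        groups[predict(tweet)].append(tweet)
    return dict(groups)


def top_terms(tweets, k=3, stopwords=frozenset({"the", "a", "is", "i", "this", "and"})):
    """Stand-in for topic modelling: the k most frequent non-stopword terms."""
    counts = Counter(
        word
        for tweet in tweets
        for word in tweet.lower().split()
        if word not in stopwords
    )
    return [word for word, _ in counts.most_common(k)]


def toy_predict(tweet):
    """Hypothetical keyword rule used in place of the trained classifier."""
    return "negative" if "hate" in tweet.lower() else "positive"


tweets = [
    "I love this phone",
    "love the camera",
    "I hate the battery",
    "hate this battery life",
]
groups = split_by_sentiment(tweets, toy_predict)
for label, group in groups.items():
    # Each sentiment group is summarised independently of the other
    print(label, top_terms(group, k=2))
```

Keeping the split and the per-group analysis as separate functions means the stand-ins can be swapped for the pickled classifier and a real topic model without changing the overall flow.
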
Possible improvements:
- Improve the classifier by handling negation (for example "not good")
- Improve the classifier by using n-gram phrases
- Use a Convolutional Neural Network to increase the accuracy of the model
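
The first two ideas can be illustrated with plain Python. This is a sketch of one common approach, not code from this repository: tokens that follow a negation word are tagged with a `_NEG` suffix until the next punctuation mark, and n-grams are built as contiguous token windows. The names `tokenize`, `mark_negation` and `ngrams`, and the small negation word list, are assumptions.

```python
import re

NEGATIONS = {"not", "no", "never", "cannot"}
CLAUSE_END = {".", ",", ";", "!", "?"}


def tokenize(text):
    """Lowercase the text and split it into words (keeping contractions) and punctuation."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?|[.,;!?]", text.lower())


def mark_negation(tokens):
    """Append _NEG to each token between a negation word and the next punctuation mark."""
    marked, negated = [], False
    for token in tokens:
        if token in CLAUSE_END:
            negated = False  # the negation scope ends at punctuation
            marked.append(token)
        elif token in NEGATIONS or token.endswith("n't"):
            negated = True
            marked.append(token)
        else:
            marked.append(token + "_NEG" if negated else token)
    return marked


def ngrams(tokens, n=2):
    """Return all contiguous n-token phrases, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = mark_negation(tokenize("I don't like this movie, but the cast is good."))
print(tokens)
print(ngrams(["not", "good", "at", "all"], n=2))
```

With this marking, "like" and "like_NEG" become different features, so the classifier can weight them separately. If the TF-IDF step is built on scikit-learn's `TfidfVectorizer` (an assumption), its `ngram_range` parameter offers a built-in way to include n-gram phrases.
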