This is my fork of our NLP372 project. These are the links to the contributors' GitHub profiles:
Mai Tung Duong
Assem Zhunis
Ern Chern
Nguyen Khanh Thi
Minsoo Kang (강민수)
If you want to run our classifier, you'll need to download the data from Kaggle. (If the data is no longer available, please contact me.)
The main script to use is pipeline.py, but you may need to adjust classifier.py and extractor.py.
The data file should be placed at the same level as the project directory (or you can change the file path in the code to point to your file).
Detecting Emotions on COVID-19 Over Time Using NLP
Dataset: twitter_crosstab.csv, downloaded from saifmohammad.com
Tweets labeled with four emotion categories: anger, fear, joy, sadness
Extractor: returns features scored by their Bayesian posterior probability
Model: bag-of-words, TF-IDF vectorization
Accuracy: ~80%
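The extractor's Bayesian scoring is only named above, not specified. As a minimal sketch, assuming each feature is a lowercase whitespace token and its score for an emotion is the posterior P(emotion | word) estimated from labeled-tweet counts with add-one (Laplace) smoothing, it could look like this. The function `posterior_scores` and the toy tweets are hypothetical and not taken from extractor.py.

```python
from collections import Counter, defaultdict


def posterior_scores(docs, labels):
    """Return {word: {emotion: P(emotion | word)}} using add-one smoothing (hypothetical helper)."""
    word_counts = defaultdict(Counter)  # emotion -> counts of each word in tweets with that label
    doc_counts = Counter(labels)        # emotion -> number of labeled tweets
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = len(docs)
    totals = {e: sum(word_counts[e].values()) for e in doc_counts}  # word tokens per emotion
    scores = {}
    for word in vocab:
        # Unnormalized posterior: smoothed P(word | emotion) * prior P(emotion)
        joint = {
            e: (word_counts[e][word] + 1) / (totals[e] + len(vocab)) * (doc_counts[e] / total_docs)
            for e in doc_counts
        }
        evidence = sum(joint.values())  # P(word), summed over all emotions
        scores[word] = {e: p / evidence for e, p in joint.items()}
    return scores


# Toy usage with made-up tweets, one per emotion
docs = ["i am so angry at the lockdown", "scared of the virus",
        "happy to be vaccinated", "sad and lonely at home"]
labels = ["anger", "fear", "joy", "sadness"]
scores = posterior_scores(docs, labels)
print(max(scores["angry"], key=scores["angry"].get))
```

Because each word's posteriors sum to 1 across emotions, the scores can be compared directly between words when picking the most emotion-indicative features.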
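The exact TF-IDF weighting used by the model is not stated. A minimal pure-Python sketch of bag-of-words TF-IDF vectorization, assuming whitespace tokenization and the common tf × log(N / df) weighting (libraries such as scikit-learn apply smoothed variants), is shown here. The function `tfidf_vectors` is hypothetical and not from classifier.py.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, list of TF-IDF vectors), one vector per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents each word appears in
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {w: math.log(n_docs / df[w]) for w in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # bag-of-words counts for this document
        length = len(toks)
        # Term frequency (normalized by document length) times inverse document frequency
        vectors.append([counts[w] / length * idf[w] if length else 0.0 for w in vocab])
    return vocab, vectors


docs = ["angry about the lockdown", "afraid of the virus", "the lockdown ends"]
vocab, vecs = tfidf_vectors(docs)
print(vocab)
```

A word that appears in every document, such as "the" here, gets an IDF of log(1) = 0, so it contributes nothing to the classifier's input.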
Vectorization: Word2Vec
Clustering: K-means
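Training Word2Vec itself is out of scope for a short example (libraries such as gensim provide it), so this sketch assumes each tweet has already been turned into a fixed-length vector, for example by averaging its word vectors, and then shows plain K-means (Lloyd's algorithm) on those vectors. The function `kmeans`, its parameters, and the toy 2-D points are hypothetical; the project may have used a library implementation instead.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (lists of floats) into k groups; return (centroids, labels). Hypothetical helper."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initialize from k data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance)
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return centroids, labels


# Toy "tweet vectors": two well-separated groups in 2-D
points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Real Word2Vec vectors have many more dimensions, but the assignment and update steps are the same; the cluster labels can then be inspected per emotion or per time period.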
Anger is the dominant emotion expressed about COVID-19. Its main sources may lie in the government's response and policies; however, further investigation is needed to support this hypothesis. Next steps could include classifying results by location and analyzing not only news titles but also the content of the news articles, which may require building a news-specific model. Overall, by using a simple and interpretable model built with Python and NLTK, our project may provide insight into the mental state of the country, which can be taken into account when imposing policies and setting recovery strategies during the hard times of the pandemic.
