This project performs sentiment analysis and topic modeling on a dataset of COVID-19-related tweets. The project classifies tweets into positive, negative, and neutral sentiments while uncovering key topics discussed on Twitter.

ntsation/tweet-sentiment-analysis

Sentiment Analysis and Topic Modeling with Twitter Data

Overview

This notebook performs sentiment analysis and topic modeling on a dataset of COVID-19-related tweets. It uses the TextBlob library for sentiment analysis and the Latent Dirichlet Allocation (LDA) algorithm to uncover latent topics in the tweet text. A classifier is also trained to assign each tweet to one of three sentiment categories: positive, negative, or neutral.

Requirements

Before running the notebook, install the required libraries. The following command installs all dependencies:

pip install pandas matplotlib seaborn textblob scikit-learn wordcloud tqdm

How to Use

  1. Clone the repository:

    • Clone this repository to your local machine or to the environment where you run Jupyter Notebook.
  2. Obtain the dataset:

    • Place the CSV file containing the tweets at the path specified in the notebook. The dataset must include a column with the tweet text.
  3. Run the notebook:

    • Open the notebook in Jupyter Notebook and execute the cells sequentially to perform the analysis.

Notebook Sections

1. Data Loading and Preprocessing

  • The notebook starts by loading the Twitter data from the specified CSV file using the Pandas library.
  • Sentiment analysis is then performed on each tweet using TextBlob, and the sentiment polarity scores are added to the dataframe.

2. Topic Modeling with LDA

  • Topic modeling is performed using the Latent Dirichlet Allocation (LDA) algorithm, a popular technique for discovering hidden topics in large text datasets.
  • The notebook defines the function get_lda_topics() to preprocess the text, create a document-term matrix, and fit the LDA model.

3. Visualizing Topics

  • Functions such as plot_lda_topics() and plot_wordclouds() are used to visualize the topics generated by the LDA model.
  • These visualizations include displaying the most frequent words for each topic and creating word clouds to graphically represent the topics.
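
One possible shape for a function like plot_lda_topics() is sketched below with matplotlib. It assumes the topics are available as lists of (word, weight) pairs; that input format, the function body, and the toy weights are all illustrative assumptions rather than the notebook's code.

```python
import matplotlib.pyplot as plt

def plot_lda_topics(topics):
    """Draw one horizontal bar chart per topic from (word, weight) pairs."""
    fig, axes = plt.subplots(
        1, len(topics), figsize=(5 * len(topics), 3), squeeze=False
    )
    for i, (ax, topic) in enumerate(zip(axes[0], topics)):
        words = [word for word, _ in topic]
        weights = [weight for _, weight in topic]
        ax.barh(words, weights)
        ax.invert_yaxis()  # put the highest-weight word at the top
        ax.set_title(f"Topic {i + 1}")
        ax.set_xlabel("weight")
    fig.tight_layout()
    return fig

# Toy topic weights, invented purely to exercise the plotting code.
toy_topics = [
    [("vaccine", 3.0), ("hospitals", 2.5), ("doses", 1.0)],
    [("lockdown", 3.2), ("schools", 2.1), ("rules", 0.9)],
]
fig = plot_lda_topics(toy_topics)
# In a notebook the figure renders inline; in a script, call plt.show().
```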

4. Visualizing Sentiment Distribution

  • The distribution of sentiment polarity scores is visualized using a histogram, giving a clear view of how sentiments are distributed across the tweets.
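
A histogram of this kind could be produced along the following lines. The polarity values are toy numbers invented for the example, and the bin count is an arbitrary choice.

```python
import matplotlib.pyplot as plt

# Toy polarity scores; in the notebook these would come from the dataframe.
polarities = [-0.8, -0.2, 0.0, 0.0, 0.1, 0.5, 0.9]

fig, ax = plt.subplots()
# Fix the range to [-1, 1], the interval TextBlob polarity scores fall in.
counts, bin_edges, _ = ax.hist(polarities, bins=10, range=(-1.0, 1.0))
ax.set_xlabel("Sentiment polarity")
ax.set_ylabel("Number of tweets")
ax.set_title("Distribution of sentiment polarity")
```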

5. Training a Sentiment Classifier

  • A Naive Bayes classifier is trained to classify tweets into three sentiment categories: positive, negative, and neutral.
  • The model is trained using the labeled dataset and the sentiment polarity scores extracted earlier.
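
The training step might look roughly like the sketch below. The README does not say how polarity scores are turned into labels, so the `label_from_polarity()` helper and its 0.05 threshold are hypothetical, as are the bag-of-words features and the toy data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def label_from_polarity(p, threshold=0.05):
    """Map a polarity score to a label. The threshold is an assumed value."""
    if p > threshold:
        return "positive"
    if p < -threshold:
        return "negative"
    return "neutral"

# Toy tweets and polarity scores, invented for this example.
texts = [
    "great news about the vaccine",
    "wonderful support from nurses",
    "terrible lockdown rules",
    "awful waiting times at clinics",
    "cases reported on monday",
    "briefing scheduled for tuesday",
]
polarities = [0.8, 1.0, -1.0, -1.0, 0.0, 0.0]
labels = [label_from_polarity(p) for p in polarities]

# Bag-of-words counts feed a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

pred = model.predict(["great vaccine news"])
print(pred[0])
```

In practice the data would be split with `train_test_split` before fitting, so that the evaluation uses tweets the model has not seen.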

6. Evaluating the Classifier

  • The performance of the classifier is evaluated using metrics such as precision, recall, F1-score, and accuracy.
  • A confusion matrix and classification report are generated to assess the classifier's effectiveness and identify which sentiment categories were more difficult to classify correctly.
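
The evaluation metrics named above are all available in `sklearn.metrics`. The sketch below applies them to hand-written true and predicted labels, which are illustrative values and not results from the notebook.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Fixed label order so the confusion matrix rows and columns are predictable.
label_order = ["negative", "neutral", "positive"]

# Hand-written labels for illustration only.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

acc = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=label_order)
report = classification_report(y_true, y_pred, labels=label_order, zero_division=0)

print(f"Accuracy: {acc:.3f}")
print(cm)
print(report)
```

Off-diagonal entries in the confusion matrix show which categories are confused with each other; here, for example, one neutral tweet is predicted as positive.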

Results

Running the notebook produces the following outputs:

  • Topic Visualizations: Images displaying the topics generated by the LDA model and the most common words associated with each topic. These help identify the main issues discussed in the COVID-19 tweets.
  • Sentiment Distribution: Graphs showing the distribution of sentiments (positive, negative, and neutral) in the tweets.
  • Classifier Performance: Performance metrics such as model accuracy, the confusion matrix, and the classification report, providing detailed insights into the classifier's effectiveness.
