This notebook performs sentiment analysis and topic modeling on a dataset of COVID-19 related tweets. It uses the TextBlob library for sentiment analysis and the Latent Dirichlet Allocation (LDA) algorithm to uncover latent topics within the tweet text. Additionally, a classifier is trained to categorize tweets into three sentiment categories: positive, negative, and neutral.
Before running the script, ensure you have installed the necessary libraries. You can install all the dependencies with the following command:
```bash
pip install pandas matplotlib seaborn textblob scikit-learn wordcloud tqdm
```
Clone the repository:
- First, clone this repository to your local environment or Jupyter Notebook.
Obtain the dataset:
- Ensure the CSV file containing the tweets is located at the path specified in the notebook. The dataset should include a column with the tweet text.
Run the notebook:
- Open the notebook in Jupyter Notebook and execute the cells sequentially to perform the analysis.
- The notebook starts by loading the Twitter data from the specified CSV file using the Pandas library.
- Sentiment analysis is then performed on each tweet using TextBlob, and the sentiment polarity scores are added to the dataframe (a sketch of these loading and scoring steps appears after this list).
- Topic modeling is performed using the Latent Dirichlet Allocation (LDA) algorithm, a popular technique for discovering hidden topics in large text datasets.
- The notebook defines the function get_lda_topics() to preprocess the text, create a document-term matrix, and fit the LDA model (see the sketch after this list).
- Functions such as plot_lda_topics() and plot_wordclouds() are used to visualize the topics generated by the LDA model (illustrative sketches follow this list).
- These visualizations include displaying the most frequent words for each topic and creating word clouds to graphically represent the topics.
- The distribution of sentiment polarity scores is visualized using a histogram, giving a clear view of how sentiments are distributed across the tweets (see the example after this list).
- A Naive Bayes classifier is trained to classify tweets into three sentiment categories: positive, negative, and neutral.
- The model is trained using the labeled dataset and the sentiment polarity scores extracted earlier.
- The performance of the classifier is evaluated using metrics such as precision, recall, F1-score, and accuracy.
- A confusion matrix and classification report are generated to assess the classifier's effectiveness and identify which sentiment categories are more difficult to classify correctly (a sketch of the training and evaluation steps follows this list).
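The following sketch illustrates the loading and sentiment-scoring steps described above. The file name covid19_tweets.csv and the column name text are assumptions; substitute the path and column used in your copy of the dataset.

```python
import pandas as pd
from textblob import TextBlob

# Load the tweets (file name and text column are placeholders).
df = pd.read_csv("covid19_tweets.csv")

# TextBlob polarity ranges from -1 (most negative) to 1 (most positive).
df["polarity"] = df["text"].astype(str).apply(lambda t: TextBlob(t).sentiment.polarity)

# Map the continuous polarity score to the three sentiment categories.
def label_sentiment(p):
    if p > 0:
        return "positive"
    if p < 0:
        return "negative"
    return "neutral"

df["sentiment"] = df["polarity"].apply(label_sentiment)
```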
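Below is a minimal sketch of what the get_lda_topics() helper could look like, built on scikit-learn's CountVectorizer and LatentDirichletAllocation; the exact preprocessing, number of topics, and return values in the notebook may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def get_lda_topics(texts, n_topics=5, n_words=10):
    """Fit an LDA model on raw texts and return it with the top words per topic."""
    # Build the document-term matrix with English stop-word removal.
    vectorizer = CountVectorizer(stop_words="english", max_df=0.95, min_df=2)
    dtm = vectorizer.fit_transform(texts)

    # Fit the LDA model on the document-term matrix.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=42)
    lda.fit(dtm)

    # Collect the most probable words for each topic.
    vocab = vectorizer.get_feature_names_out()
    topics = [[vocab[i] for i in topic.argsort()[::-1][:n_words]] for topic in lda.components_]
    return lda, vectorizer, topics
```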
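The plotting helpers could be implemented roughly as below; the function names mirror those mentioned above, but the bodies are only an illustrative sketch using matplotlib and the wordcloud package.

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

def plot_lda_topics(lda, vectorizer, n_words=10):
    """Horizontal bar chart of the most frequent words for each topic."""
    vocab = vectorizer.get_feature_names_out()
    for idx, topic in enumerate(lda.components_):
        top = topic.argsort()[::-1][:n_words]
        plt.figure(figsize=(6, 3))
        plt.barh([vocab[i] for i in top][::-1], topic[top][::-1])
        plt.title(f"Topic {idx}")
        plt.tight_layout()
        plt.show()

def plot_wordclouds(lda, vectorizer, n_words=50):
    """One word cloud per topic, weighted by the topic-word weights."""
    vocab = vectorizer.get_feature_names_out()
    for idx, topic in enumerate(lda.components_):
        freqs = {vocab[i]: topic[i] for i in topic.argsort()[::-1][:n_words]}
        cloud = WordCloud(background_color="white").generate_from_frequencies(freqs)
        plt.figure(figsize=(5, 3))
        plt.imshow(cloud, interpolation="bilinear")
        plt.axis("off")
        plt.title(f"Topic {idx}")
        plt.show()
```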
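The sentiment distribution histogram is a single seaborn call, assuming the polarity column created in the first sketch:

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.histplot(df["polarity"], bins=30, kde=True)
plt.title("Distribution of tweet sentiment polarity")
plt.xlabel("Polarity")
plt.show()
```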
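Finally, a sketch of how the Naive Bayes classifier might be trained and evaluated. It assumes bag-of-words features over the tweet text and the polarity-derived sentiment labels from the first sketch; the notebook's actual feature set may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Bag-of-words features; labels are the polarity-based sentiment categories.
X = CountVectorizer(stop_words="english").fit_transform(df["text"].astype(str))
y = df["sentiment"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = MultinomialNB()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```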
When running the notebook, you will obtain the following results:
- Topic Visualizations: Images displaying the topics generated by the LDA model and the most common words associated with each topic. This helps in understanding the main issues discussed in the COVID-19 tweets.
- Sentiment Distribution: Graphs showing the distribution of sentiments (positive, negative, and neutral) in the tweets.
- Classifier Performance: Performance metrics such as model accuracy, the confusion matrix, and the classification report, providing detailed insights into the classifier's effectiveness.