**Live Streaming Twitter NLP Analysis**

This project was inspired by `sentdex` (https://github.com/Sentdex/socialsentiment/).

# What is in this repository

**Jupyter Notebooks**
- `final_notebook.ipynb`: covers training and validation of various models, including BERT and TF-IDF models.

**Python Files**
- `twitter_stream.py`: connects to the Twitter API and streams tweets matching various keywords
- `app.py`: Dash dashboard

**Folders**
- `images`: contains image files
- `models`: contains (1) BERT NLP and (2) spaCy TF-IDF vectorization sentiment analysis models *(BERT model excluded)*
- `data`: contains SQLite3 databases of tweets pulled from Twitter using Tweepy *(only a sample database is included)*
- `datasets`: contains datasets that were used to train various NLP sentiment analysis models
- `src`: contains helper code used in creating the models
- `keys`: contains Twitter API key information *(files excluded)*
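
The `models` folder includes TF-IDF-based sentiment models. As a rough illustration of what TF-IDF vectorization computes, the sketch below uses plain term frequency times `log(N / df)`. It is not the repository's spaCy pipeline, and the lowercase whitespace tokenization and the sample tweets are assumptions made only for illustration.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / df),
    where N is the number of documents and df is the number of
    documents containing the term. Tokenization is a lowercase split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


tweets = [
    "love the new latte",
    "the app keeps crashing",
    "love the app",
]
vecs = tfidf(tweets)
for vec in vecs:
    print({t: round(w, 3) for t, w in sorted(vec.items())})
```

A word that appears in every tweet (here "the") gets a weight of zero, while rarer words such as "crashing" get the highest weights. Library implementations such as scikit-learn's `TfidfVectorizer` add idf smoothing and L2 normalization by default, so their values will differ from this sketch.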

**How to use this project**

1. Add your Twitter API credentials to the `keys` directory, and make sure the path to them is correctly defined in `twitter_stream.py`.
2. Adjust any keywords or queries in `twitter_stream.py`.
3. Set up the SQLite3 database path and run `twitter_stream.py`.
4. Run `app.py`!
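
Steps 3 and 4 suggest that `twitter_stream.py` writes tweets into SQLite and `app.py` reads them back. The sketch below shows one way that hand-off could look with Python's built-in `sqlite3` module. The table name `sentiment`, its columns, the helper functions, and the sample rows are hypothetical and may not match the schema actually used in this repository.

```python
import sqlite3
import time


def init_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database and ensure the hypothetical table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentiment ("
        "unix REAL, tweet TEXT, sentiment REAL)"
    )
    return conn


def save_tweet(conn: sqlite3.Connection, text: str, polarity: float) -> None:
    """Insert one tweet with its polarity score and the current Unix timestamp."""
    conn.execute(
        "INSERT INTO sentiment (unix, tweet, sentiment) VALUES (?, ?, ?)",
        (time.time(), text, polarity),
    )
    conn.commit()


def recent_tweets(conn: sqlite3.Connection, keyword: str, limit: int = 10):
    """Return up to `limit` tweets containing `keyword`, newest first."""
    return conn.execute(
        "SELECT unix, tweet, sentiment FROM sentiment "
        "WHERE tweet LIKE ? ORDER BY unix DESC LIMIT ?",
        (f"%{keyword}%", limit),
    ).fetchall()


# ":memory:" keeps the demo self-contained; a real run would pass a file path
# (the exact path used by twitter_stream.py is not shown here).
conn = init_db(":memory:")
save_tweet(conn, "Starbucks latte was great", 0.8)
save_tweet(conn, "Google search is down again", -0.16)
print(recent_tweets(conn, "Google"))
```

The `?` placeholders let `sqlite3` handle quoting, which matters because tweet text routinely contains apostrophes and quotation marks.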

<img src='images/dashboard_1.png'>
<img src='images/dashboard_2.png'>
<img src='images/dashboard_3.png'>
<img src='images/dashboard_4.png'>

# Live Dashboard

* Currently working on deploying the app through Heroku!

# Motivation

## Business Case
In the <a href='https://github.com/singsang2/tweet-product-sentiment-analysis'>`previous project`</a>, I worked with various NLP models to analyze the sentiment of tweets about different products.


## Goals
The goals of this project are to:

1. stream tweets on various topics (e.g., Microsoft, Starbucks, Google);
2. effectively apply various NLP models (including TextBlob, BERT, and TF-IDF models) to classify those tweets; and
3. flag strongly negative tweets on a dashboard so the user can respond to them effectively.

# Conclusion

The BERT model achieved an accuracy of 84%. However, because of its computation time, TextBlob was used as the initial model to classify the polarity of each tweet, and the BERT model was then used to confirm any tweets that TextBlob classified as strongly negative.
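
The two-stage approach can be sketched as a cascade: a fast scorer runs on every tweet, and the slow model only runs on the few tweets the fast scorer marks as strongly negative. TextBlob's `TextBlob(text).sentiment.polarity` returns a score in [-1.0, 1.0]; here both models are passed in as plain functions so the sketch runs without TextBlob or a BERT checkpoint installed. The `-0.5` cutoff, the `flag_tweets` helper, and the toy stand-in scorers are hypothetical and only illustrate the control flow.

```python
from typing import Callable, Iterable

# Hypothetical cutoff: fast-model polarity at or below this is "strongly negative".
STRONG_NEGATIVE = -0.5


def flag_tweets(
    tweets: Iterable[str],
    fast_polarity: Callable[[str], float],
    slow_is_negative: Callable[[str], bool],
    threshold: float = STRONG_NEGATIVE,
) -> list[str]:
    """Return tweets that both models agree are strongly negative.

    The cheap model scores every tweet; because `and` short-circuits,
    the expensive model only runs on tweets the cheap model already flagged.
    """
    return [
        text
        for text in tweets
        if fast_polarity(text) <= threshold and slow_is_negative(text)
    ]


# Toy stand-ins so the sketch runs without extra installs. In the project,
# the fast model would be TextBlob polarity and the slow one the BERT model.
NEGATIVE_WORDS = {"terrible", "awful", "worst"}
slow_calls: list[str] = []


def toy_polarity(text: str) -> float:
    """Crude lexicon score in [-1, 0]: each negative word subtracts 0.5."""
    hits = sum(word.strip(",.!?") in NEGATIVE_WORDS for word in text.lower().split())
    return -min(1.0, hits / 2)


def toy_confirm(text: str) -> bool:
    """Pretend 'expensive' classifier that records every call it receives."""
    slow_calls.append(text)
    return "worst" in text.lower()


sample = [
    "worst service ever, awful",
    "terrible awful coffee",
    "pretty good day",
    "awful weather",
]
flagged = flag_tweets(sample, toy_polarity, toy_confirm)
print(flagged)
print(f"{len(slow_calls)} of {len(sample)} tweets reached the slow model")
```

The design choice is the same trade-off described above: the slow model's cost is paid only for candidate negatives, and requiring both models to agree reduces false alarms on the dashboard.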



# Future Work

1. Database structure
    - As the database grows, it may become too large for SQLite3 to handle efficiently. We plan to split the database into smaller parts and combine or join them only when a large amount of data is requested.
2. Reply feature
    - Add a feature that lets the user reply to any tweet shown in the dashboard without going to the Twitter page.
3. Multiple keywords
    - Allow multiple keywords to be followed and analyzed at the same time.
4. Table Editing Mode
    - Let the user label flagged tweets in the dashboard as 'resolved', 'false negative', or 'other' for further customer service and data analysis.
5. Further analysis on both positive and negative tweets
    - Look for correlations between tweet trends and how the company is performing, to help guide the company's future direction.
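
For the database-structure item, one option that stays within SQLite is to keep each period's tweets in a separate file and combine them only when needed with `ATTACH DATABASE`. The sketch below assumes hypothetical per-month shard files and the same hypothetical `sentiment` table layout (`unix`, `tweet`, `sentiment`); it is one possible approach, not the planned design.

```python
import os
import sqlite3
import tempfile


def make_shard(path: str, rows: list[tuple[float, str, float]]) -> None:
    """Create one shard file holding a hypothetical `sentiment` table."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sentiment (unix REAL, tweet TEXT, sentiment REAL)"
        )
        conn.executemany("INSERT INTO sentiment VALUES (?, ?, ?)", rows)
        conn.commit()
    finally:
        conn.close()


def query_shards(paths: list[str]) -> list[tuple]:
    """Attach every shard to one connection and read them as a single table."""
    conn = sqlite3.connect(":memory:")
    try:
        selects = []
        for i, path in enumerate(paths):
            alias = f"shard{i}"  # aliases are generated here, never user input
            conn.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
            selects.append(f"SELECT unix, tweet, sentiment FROM {alias}.sentiment")
        sql = " UNION ALL ".join(selects) + " ORDER BY unix"
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    jan = os.path.join(tmp, "tweets_jan.db")  # hypothetical naming scheme
    feb = os.path.join(tmp, "tweets_feb.db")
    make_shard(jan, [(1.0, "jan tweet", 0.2)])
    make_shard(feb, [(2.0, "feb tweet", -0.7)])
    combined = query_shards([feb, jan])
    print(combined)
```

SQLite limits how many databases one connection can attach (10 by default), so this pattern suits a modest number of shards; beyond that, a client/server database may be a better fit.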