Classification and Detection of Disaster Tweets

About

Natural disasters cause an average of around 60,000 deaths per year worldwide. When a natural disaster strikes, witnesses often report it in real time on social media platforms such as Twitter or Facebook, and many people now turn to social media for news because it is faster than traditional media. Since eyewitness reports surface on social media first, rescue operators need to be able to respond quickly, yet there is currently no system in place to alert them to disasters reported on social media.

The goal of this project is to identify tweets that can be classified as Disaster Tweets through the use of machine learning.

In order to achieve the goals set out, we will need to:

  • Find a suitable dataset
  • Clean the dataset
  • Find a suitable model for training
  • Implement the idea (through a website)


Full Presentation

  • Presentation Video: Click Here
  • Full Code in Google Colab: Click Here
  • Website: Click Here

For a detailed walkthrough, please view the source code in the following order:

  1. Data Extraction and Data Cleaning
  2. Data Visualization and Data Pre-processing
  3. Dense Network, LSTM and Bi-LSTM
  4. Comparison and Other Methods
  5. Validation of the Model
  6. Website

Dataset used

We used this dataset provided by Kaggle for our project

Models used

  • Dense Network
  • Long Short-Term Memory (LSTM) Network
  • Bi-directional LSTM Network
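
The sketch below shows, in Keras, roughly what these three architectures look like for binary tweet classification. The vocabulary size, embedding dimension, and layer widths are illustrative assumptions, not the exact hyperparameters used in the notebooks.

```python
# Hedged sketch of the three architectures compared in this project.
# VOCAB_SIZE, MAX_LEN and layer sizes are assumptions, not the tuned values.
from tensorflow.keras import Sequential, layers

VOCAB_SIZE = 10000   # assumed vocabulary size from the tokenizer
MAX_LEN = 50         # assumed padded sequence length

def dense_model():
    # "Dense Network": embeddings pooled into one vector, then dense layers
    return Sequential([
        layers.Embedding(VOCAB_SIZE, 64),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # disaster vs. non-disaster
    ])

def lstm_model():
    return Sequential([
        layers.Embedding(VOCAB_SIZE, 64),
        layers.LSTM(64),
        layers.Dense(1, activation="sigmoid"),
    ])

def bilstm_model():
    return Sequential([
        layers.Embedding(VOCAB_SIZE, 64),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(1, activation="sigmoid"),
    ])

model = dense_model()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```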

Conclusion

  • Among the Dense Network, LSTM, and Bi-directional LSTM models, the Dense Network achieved the highest accuracy.
  • Overfitting slightly reduced the accuracy of the model.
  • With close to 80% accuracy, our model classified tweets correctly most of the time.
  • However, when tested on a new set of data, accuracy dropped to approximately 50%.
  • While the model appears to have learned some universal features, the drop in accuracy was larger than we expected.
  • The drop in accuracy may be caused by differences between datasets, as each dataset has its own unique features.
  • This also suggests that our training dataset may be unrepresentative of the broader pool of disaster-related tweets, highlighting a limitation: our dataset does not cover a broad enough domain.

Future improvement

  • On a larger scale, we would like to try using the Bidirectional Encoder Representations from Transformers (BERT) model (a hedged sketch follows below)
  • Further tune our hyperparameters
  • Generalise our dataset to prevent bias and unrepresentative samples
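
As a rough illustration of the BERT direction, the sketch below loads a pretrained BERT encoder with a two-class classification head via Hugging Face Transformers. The model name, sequence length, and the omitted fine-tuning loop are assumptions, not part of the current project code.

```python
# Hedged sketch only: load a pretrained BERT with a sequence-classification
# head; fine-tuning on the tweet dataset would follow and is not shown.
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2   # disaster vs. non-disaster
)

inputs = tokenizer(
    ["Forest fire near La Ronge Sask. Canada"],
    padding=True, truncation=True, max_length=50, return_tensors="tf",
)
logits = model(**inputs).logits   # raw class scores before any fine-tuning
```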

Takeaways

  • Data Cleaning
    • Used regex to remove unwanted characters (a minimal regex sketch follows after this list)
    • Fixed the class imbalance in the dataset so that it would not skew the model
  • Data Visualisation
    • Prepared and visualised our data using word clouds
  • Data Pre-processing (Text Processing)
    • Use of tokenization
    • Use of sequencing
    • Use of padding (see the tokenization and padding sketch after this list)
  • Machine learning
    • Dense Network using Keras
    • Long Short-Term Memory (LSTM) Network
    • Bi-directional LSTM Network
  • Website
    • Use of Streamlit (see the Streamlit sketch after this list)
    • Deployed on both localhost and a home server for test runs
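
The following is a minimal sketch of the regex-based cleaning mentioned under Data Cleaning; the exact patterns removed in the notebook may differ.

```python
import re

def clean_tweet(text: str) -> str:
    """Remove unwanted characters from a raw tweet (illustrative patterns only)."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # strip URLs
    text = re.sub(r"<.*?>", " ", text)                    # strip HTML tags
    text = re.sub(r"@\w+|#", " ", text)                   # strip mentions and '#'
    text = re.sub(r"[^a-z\s]", " ", text)                 # keep letters only
    return re.sub(r"\s+", " ", text).strip()              # collapse whitespace

print(clean_tweet("Forest fire near La Ronge Sask. Canada http://t.co/x #wildfire"))
# -> "forest fire near la ronge sask canada wildfire"
```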
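
Next, a sketch of the tokenization, sequencing, and padding steps using the Keras preprocessing utilities; num_words and maxlen are assumed values, not the ones tuned in the notebooks.

```python
# Tokenization -> sequencing -> padding, as listed under Data Pre-processing.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tweets = [
    "forest fire near la ronge sask canada",   # cleaned disaster tweet
    "i love my new phone",                     # hypothetical non-disaster tweet
]

tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(tweets)                            # tokenization: build the word index
sequences = tokenizer.texts_to_sequences(tweets)          # sequencing: words -> integer ids
padded = pad_sequences(sequences, maxlen=50, padding="post")  # padding: equal-length rows

print(padded.shape)   # (2, 50) -- ready for the Embedding layer
```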
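
Finally, a minimal Streamlit sketch of the website flow; the file names (model.h5, tokenizer.pickle) and the 0.5 decision threshold are assumptions for illustration, not the deployed app. It would be run locally with `streamlit run app.py`.

```python
# Hypothetical app.py: classify a pasted tweet with the trained model.
import pickle

import streamlit as st
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

@st.cache_resource
def load_artifacts():
    model = load_model("model.h5")                 # assumed path to the trained model
    with open("tokenizer.pickle", "rb") as f:
        tokenizer = pickle.load(f)                 # assumed path to the fitted tokenizer
    return model, tokenizer

st.title("Disaster Tweet Classifier")
tweet = st.text_area("Paste a tweet to classify")

if st.button("Classify") and tweet:
    model, tokenizer = load_artifacts()
    seq = tokenizer.texts_to_sequences([tweet])
    padded = pad_sequences(seq, maxlen=50, padding="post")
    prob = float(model.predict(padded)[0][0])
    st.write(f"Disaster probability: {prob:.2f}")
    st.success("Disaster tweet" if prob >= 0.5 else "Not a disaster tweet")
```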


Contributors

  • woonyee28 - Website, Implementation and Setup of Idea
  • Baby-McBabyFace - Data Cleaning, Data Visualization, Data Pre-processing
  • keenlim - Machine Learning Models, Comparison of data
