This project is part of Udacity's Data Science Nanodegree Program. It builds a Natural Language Processing (NLP) model that categorizes messages from real-life disaster events in real time. The dataset consists of pre-labelled tweets and text messages.
The project is divided into the following key sections:
- Processing data: Building an ETL pipeline to extract, clean, and store the data in a SQLite database.
- Building a machine learning pipeline: Training a classifier to categorize text messages into various categories.
- Running a web app: Displaying model results in real-time.
The project requires the following:
- Python 3.5+
- NumPy, SciPy, Pandas, Scikit-Learn
- NLTK
- SQLAlchemy
- Pickle
- Flask, Plotly
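Assuming a standard pip setup, the libraries above can be installed in one step; NLTK additionally needs its tokenizer and lemmatizer data downloaded (`punkt` and `wordnet` are the resources this kind of project commonly uses):

```shell
# Install the Python dependencies (Pickle ships with the standard library)
pip install numpy scipy pandas scikit-learn nltk sqlalchemy flask plotly

# Download the NLTK data commonly needed for tokenization/lemmatization
python -c "import nltk; nltk.download('punkt'); nltk.download('wordnet')"
```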
Clone the git repository.
Run the ETL pipeline to clean the data and store the processed data in the database:

```shell
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
```
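The ETL step can be sketched as follows. This is a minimal illustration of the extract-transform-load flow, not the project's actual code: the table name `DisasterMessages` and the `categories` string layout (`"related-1;request-0;..."`) are assumptions based on the standard Udacity dataset, and `sqlite3` stands in for SQLAlchemy for brevity.

```python
import sqlite3

import pandas as pd

def clean_data(messages: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Merge the two CSV frames, expand the single 'categories' string
    into one numeric column per label, and drop duplicate rows."""
    df = messages.merge(categories, on="id")
    # "related-1;request-0;..." -> one column per category
    cats = df["categories"].str.split(";", expand=True)
    cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
    for col in cats.columns:
        cats[col] = cats[col].str.split("-").str[-1].astype(int)
    return pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()

def save_data(df: pd.DataFrame, database_path: str) -> None:
    """Write the cleaned frame to a SQLite table (table name assumed)."""
    with sqlite3.connect(database_path) as conn:
        df.to_sql("DisasterMessages", conn, index=False, if_exists="replace")
```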
Run the ML pipeline to load data from the database, train the classifier, and save it as a pickle file:

```shell
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
```
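The training step follows the usual scikit-learn pattern: text features feeding a multi-output classifier, with the fitted pipeline pickled to disk. A minimal sketch under stated assumptions, since the project's exact tokenizer (it uses NLTK) and estimator are not shown here; TF-IDF plus logistic regression is a stand-in:

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

def build_model() -> Pipeline:
    """TF-IDF features feeding one classifier per output category."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
        ("clf", MultiOutputClassifier(LogisticRegression(max_iter=1000))),
    ])

def save_model(model: Pipeline, path: str) -> None:
    """Persist the fitted pipeline as a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```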
Run the web app:

```shell
python run.py
```
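At its core, `run.py` serves a Flask app that feeds user queries to the trained model and renders the predicted categories. A minimal sketch, assuming the pickle path from the step above; the `/go` route name and JSON response shape are illustrative assumptions, not the project's actual interface:

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# In the real app the trained pipeline is loaded once at startup:
# model = pickle.load(open("models/classifier.pkl", "rb"))

@app.route("/go")
def go():
    # Classify the user's query and return the predicted categories.
    query = request.args.get("query", "")
    # labels = model.predict([query])[0]
    return jsonify({"query": query})

# To serve on port 3001 as in the instructions above:
#   app.run(host="0.0.0.0", port=3001)
```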
Access the web app at:
- http://127.0.0.1:3001
- http://192.168.29.170:3001 (the LAN address will differ on your machine)
Author: M S Mohan Kumar
This project is licensed under the MIT License.
- Udacity for providing an excellent Data Science Nanodegree Program.