# Disaster Response Messages ETL/ML Pipelines

> ETL and ML pipelines for processing/flagging disaster response messages.

**A Note on the Code in This Project:**
Code is developed and tested in the Jupyter notebooks in the `notebooks` directory. The scripts that run in a production environment are generated automatically from those notebooks using [nbdev](https://github.com/fastai/nbdev).

## Project Description

Communications from people affected by a disaster, sent either directly or through social media, are a potentially useful source of information for relief agencies during disaster response. The problem is that there are millions of communications, only a small fraction of which are relevant. Agencies have the least capacity to filter these messages manually precisely when a disaster response is under way, so it is useful to be able to classify them automatically.

This project creates a pipeline that processes already-classified messages and uses them to train a model that predicts the classifications of new messages during disaster response.

Inputs to the ETL pipeline are messages received during a disaster and categorizations of the needs expressed by those messages
(e.g., "shelter", "medical_help", "food", "aid_related"). The ETL pipeline transforms the data into a format suitable
as input to a supervised learning algorithm.
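As a rough illustration of that transformation, the sketch below joins messages to their categories by `id` and expands each category string into one 0/1 column per need. It uses only the standard library and is not the code in `process_data.py`. The sample rows, the `name-flag;name-flag` category encoding, the column names, and the `messages` table name are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample inputs; the real CSV layout may differ.
MESSAGES_CSV = """id,message
1,We need tents and blankets
2,Where can I find a doctor?
"""
CATEGORIES_CSV = """id,categories
1,shelter-1;medical_help-0;food-0
2,shelter-0;medical_help-1;food-0
"""


def parse_categories(raw):
    """Turn 'shelter-1;food-0' into {'shelter': 1, 'food': 0} (assumed encoding)."""
    flags = {}
    for item in raw.split(";"):
        name, _, value = item.rpartition("-")
        flags[name] = int(value)
    return flags


def build_rows(messages_file, categories_file):
    """Join messages and categories on 'id', one dict per labeled message."""
    categories_by_id = {
        row["id"]: parse_categories(row["categories"])
        for row in csv.DictReader(categories_file)
    }
    rows = []
    for row in csv.DictReader(messages_file):
        flags = categories_by_id.get(row["id"])
        if flags is None:
            continue  # skip messages that have no labels
        rows.append({"id": int(row["id"]), "message": row["message"], **flags})
    return rows


def save_rows(rows, connection, table="messages"):
    """Write the rows to a SQLite table (the table name is hypothetical)."""
    columns = list(rows[0])
    column_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    connection.execute(f'CREATE TABLE "{table}" ({column_sql})')
    connection.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    connection.commit()


rows = build_rows(io.StringIO(MESSAGES_CSV), io.StringIO(CATEGORIES_CSV))
connection = sqlite3.connect(":memory:")  # in-memory database for the demo
save_rows(rows, connection)
```

Each resulting row holds the message text plus one binary column per category, which is the shape a multi-label supervised learner expects.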

The ML pipeline reads the data generated by the ETL pipeline and trains a model. The purpose of the model is to predict
the needs expressed by new messages so that messages can be automatically classified during a live disaster response.
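Because one message can express several needs at once, this is a multi-label classification problem. The following is a minimal sketch of one way to set that up with scikit-learn, assuming TF-IDF features and one logistic regression per category. The model choice, the toy training messages, and the three-category list are assumptions for illustration and may not match what `train_classifier.py` does.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Hypothetical subset of the need categories.
CATEGORIES = ["shelter", "medical_help", "food"]

# Toy training data, invented for the example.
messages = [
    "We need tents and a place to sleep",
    "My father is injured and needs a doctor",
    "There is no food or rice left in our village",
    "Our house collapsed, we need shelter and food",
    "Please send medicine and a nurse",
    "Children are hungry, we need food",
]
# One row per message, one 0/1 column per category in CATEGORIES.
labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
])

# Text -> TF-IDF vectors -> one independent binary classifier per category.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression())),
])
model.fit(messages, labels)


def predict_categories(text):
    """Return the names of the categories the model flags for one message."""
    flags = model.predict([text])[0]
    return [name for name, flag in zip(CATEGORIES, flags) if flag == 1]


print(predict_categories("We need food and water"))
```

With only six training messages the predictions are not meaningful; the point is the shape of the inputs and outputs, with text going in and one yes/no flag per category coming out.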

Finally, a web app displays some statistics on the dataset used for training and lets the user experiment with the
trained model by entering arbitrary (English-language) message text and seeing the need categories predicted by the model.

## File Descriptions

The source code for the ETL and ML pipelines, as well as usage examples and tests, is in the two Jupyter notebooks in the main directory:
- `ETL_Pipeline_Preparation.ipynb`
- `ML_Pipeline_Preparation.ipynb`

Python scripts used to run the pipelines are autogenerated from these notebooks using [nbdev](https://github.com/fastai/nbdev).

These scripts and other code and resources needed for the web app are under the `udacity_root` folder.

## How to Use

**Step 1:** Run the following commands in the project's `udacity_root` directory to set up your database and model.

Run the ETL pipeline, which cleans the data and stores it in a database:

```shell
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
```

Run the ML pipeline, which trains the classifier and saves it for future use:

```shell
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
```
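Once the ETL command has run, a quick sanity check is to list the tables in the generated database and count their rows. The helper below is a sketch that assumes `DisasterResponse.db` is a SQLite file. It reads the table names from SQLite's own catalog, so it makes no assumption about what the pipeline calls its table. The function name `summarize_database` is hypothetical.

```python
import sqlite3


def summarize_database(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    connection = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        tables = [
            name
            for (name,) in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        return {
            table: connection.execute(
                f'SELECT COUNT(*) FROM "{table}"'
            ).fetchone()[0]
            for table in tables
        }
    finally:
        connection.close()
```

From the `udacity_root` directory, calling `summarize_database("data/DisasterResponse.db")` should return a non-empty dictionary if the ETL step wrote its output successfully.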

**Step 2:** Run the following command in the `app` subdirectory to run the web app.

```shell
python run.py
```

**Step 3:** Go to http://localhost:3001/