Disaster Response Project

Overview

As part of the Udacity Data Scientist Nanodegree Program, this multioutput classification project aims to analyze and classify messages to improve communication during disasters. It uses data provided by Appen (formerly Figure Eight), which contains real messages sent during disaster events.

A web application can be hosted locally to classify new messages.

The result should look like this (screenshots): Home, Visualizing data (1), Visualizing data (2), Message Classification.
About the Project

To run the app locally, or any other step of the project, it is recommended to create a new working environment and install the required libraries. For example, if using Anaconda:

  • conda create -n <env_name> then conda activate <env_name>
  • conda install -c anaconda pip
  • pip install -r requirements.txt

Or simply (although this can raise a "PackagesNotFoundError"):

  • conda create --name <env_name> --file requirements.txt

Directories

Config

Configuration setup is handled in the "config" directory through two files: core.py and config.yml. The file paths needed to run this project (messages and categories data, database file, and model pickle file) are specified there, and a validation is done to ensure everything works as intended. Users therefore do not need to pass file names when running the Python scripts, which prevents errors and increases efficiency.

To re-train the model, e.g. if new data becomes available, the "config.yml" should be updated with the new file names, or the previous data files should be replaced.
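
For illustration, here is a minimal sketch of how core.py might load and validate config.yml; the keys and file names below are hypothetical, not the repository's actual values:

  # Hypothetical sketch of core.py loading and validating config.yml;
  # keys and file names are illustrative assumptions.
  from pathlib import Path
  import yaml

  CONFIG_FILE = Path(__file__).parent / "config.yml"

  def load_config() -> dict:
      """Read config.yml and check that the referenced data files exist."""
      with open(CONFIG_FILE) as f:
          config = yaml.safe_load(f)
      # e.g. {"messages_data": "data/messages.csv",
      #       "categories_data": "data/categories.csv",
      #       "database": "data/DisasterResponse.db",
      #       "model": "models/model.pkl"}
      for key in ("messages_data", "categories_data"):
          if not Path(config[key]).exists():
              raise FileNotFoundError(f"{key}: {config[key]} not found")
      return config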

Data

It contains the raw data (messages and categories) in CSV files, as well as the cleaned data in a SQLite database (.db) file.

It also contains the Python script that applies the entire ETL process, process_data.py: it extracts the data from the CSV files, transforms it, and loads it into a single SQLite database.

To run this script on the command line, from the project folder:
python data/process_data.py
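
As a rough sketch, the ETL flow typically looks like the following; the column names and cleaning steps are assumptions based on the dataset's usual format, not the script's exact code:

  # Hypothetical sketch of the ETL flow in process_data.py.
  import pandas as pd
  from sqlalchemy import create_engine

  def etl(messages_csv, categories_csv, db_path):
      # Extract: load both CSV files and merge them on the shared id column
      messages = pd.read_csv(messages_csv)
      categories = pd.read_csv(categories_csv)
      df = messages.merge(categories, on="id")

      # Transform: split the single "categories" string (e.g. "related-1;request-0")
      # into one binary column per category, then drop duplicates
      cats = df["categories"].str.split(";", expand=True)
      cats.columns = [c.split("-")[0] for c in cats.iloc[0]]
      for col in cats.columns:
          cats[col] = cats[col].str[-1].astype(int)
      df = pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()

      # Load: write the cleaned table into a single SQLite database
      engine = create_engine(f"sqlite:///{db_path}")
      df.to_sql("messages", engine, index=False, if_exists="replace")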

Models

It contains the Python script that handles all the machine learning steps needed for this project, train_classifier.py. It also holds the pickle file containing the best model from the GridSearchCV performed on the training set.

To run this script from the command line, from the project folder:
python models/train_classifier.py
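
As a rough sketch, a multioutput training pipeline with GridSearchCV typically looks like the following; the estimator, parameter grid, and file name are illustrative assumptions, not the script's exact contents:

  # Hypothetical sketch of train_classifier.py's pipeline and grid search.
  import pickle
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.model_selection import GridSearchCV
  from sklearn.multioutput import MultiOutputClassifier
  from sklearn.pipeline import Pipeline

  pipeline = Pipeline([
      ("tfidf", TfidfVectorizer()),                              # text -> TF-IDF features
      ("clf", MultiOutputClassifier(RandomForestClassifier())),  # one classifier per category
  ])

  params = {"clf__estimator__n_estimators": [50, 100]}  # small illustrative grid
  cv = GridSearchCV(pipeline, param_grid=params, cv=3)

  # After fitting on the training set (X = messages, Y = category matrix):
  # cv.fit(X_train, Y_train)
  # with open("models/model.pkl", "wb") as f:
  #     pickle.dump(cv.best_estimator_, f)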

(Screenshot: Training)

App

It contains the necessary files to run the web application. This includes two Python scripts:

  • run.py, which contains the Flask code needed to render the HTML files as well as the Plotly figures
  • functions.py, which contains helper functions used by run.py (keeping the code modular and clean)

In addition, two directories, templates and static, contain the necessary HTML and CSS files.

To access the web application on a local computer, run python app/run.py and open the given URL in a browser.
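
For illustration, a minimal Flask app of this shape might look like the following; the route names, template names, and model path are assumptions, not the repository's actual code:

  # Hypothetical sketch of run.py; routes and paths are illustrative.
  import pickle
  from flask import Flask, render_template, request

  app = Flask(__name__)

  with open("models/model.pkl", "rb") as f:
      model = pickle.load(f)

  @app.route("/")
  def index():
      # Home page with the Plotly visualizations of the training data
      return render_template("master.html")

  @app.route("/go")
  def go():
      # Classify the user's message and display the predicted categories
      query = request.args.get("query", "")
      labels = model.predict([query])[0]
      return render_template("go.html", query=query, labels=labels)

  if __name__ == "__main__":
      app.run(host="0.0.0.0", port=3000, debug=True)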
