The project aims to predict a disaster type [natural, technology, conflict, unknown] from features such as money spent on recovery, fatalities, and the number of people evacuated. The first step is an ETL (extract, transform, load) process: the data is extracted from sources across the Internet (cited in the following subsections) and the resulting data mart is loaded into a MySQL database. Both supervised and unsupervised learning are then performed.
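As a rough illustration of the modeling step, the sketch below runs a supervised classifier and an unsupervised clustering on placeholder rows; the column names (`recovery_cost`, `fatalities`, `people_evacuated`) and the use of scikit-learn are assumptions for illustration, not the project's exact pipeline.

```python
# Minimal sketch of the modeling step (assumed libraries and column names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

# Placeholder rows standing in for the data mart's fact table.
df = pd.DataFrame({
    "recovery_cost":    [1.2e6, 3.4e5, 9.9e6, 0.0, 5.5e5, 2.1e6],
    "fatalities":       [10, 0, 250, 3, 42, 7],
    "people_evacuated": [500, 20, 10000, 0, 1500, 300],
    "disaster_type":    ["natural", "technology", "natural",
                         "unknown", "conflict", "technology"],
})

X = df.drop(columns="disaster_type")
y = df["disaster_type"]

# Supervised learning: predict the disaster type from the numeric features.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))

# Unsupervised learning: cluster the same features without the labels.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print("cluster assignments:", labels)
```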
The input data sets are composed of two files, which are located in the `../data` folder. To run the project:
- Locate `utils.py` -> `sql_connection()` and configure the `passwd` parameter to match your own MySQL database credentials.
- The driver script is `main.py`; to run the project, simply run `python3 main.py`.
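For reference, `sql_connection()` in `utils.py` presumably wraps something like the sketch below; the exact signature is an assumption, but it shows where the `passwd` parameter would be changed (here using `mysql-connector-python`).

```python
# Hypothetical shape of utils.py -> sql_connection(); adjust passwd to your setup.
import mysql.connector

def sql_connection():
    # Edit passwd (and host/user if needed) to match your local MySQL server.
    return mysql.connector.connect(
        host="localhost",
        user="root",
        passwd="your_mysql_password",  # <- change this
    )
```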
While running `main.py`, a MySQL database named `disaster_DB` is created; this is the data mart. All input and output files are stored inside the `../data` folder.
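Creating the data mart boils down to a statement like the following (a sketch, assuming the `sql_connection()` helper shown above):

```python
# Sketch: create the disaster_DB data mart if it does not exist yet.
conn = sql_connection()
cursor = conn.cursor()
cursor.execute("CREATE DATABASE IF NOT EXISTS disaster_DB")
```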
- `data/physical_model` contains the .sql script that creates the MySQL database schema. The script is run automatically by `main.py`, so no additional step is needed to create the database (see the sketch after this list).
- `data/datamart` is a physical representation of the data mart in .txt files, containing a fact table and its corresponding dimension tables.
- `data/clustering` contains screenshots of the clustering runs produced in Python.
- `data/dashboard` contains Tableau screenshots that summarize the input data in a compact representation.
- `data/model_accuracy` contains a screenshot comparing the accuracies of several classifiers.
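A minimal way `main.py` could apply the schema script is sketched below; the file name under `data/physical_model` is illustrative, since the section does not give its exact name.

```python
# Sketch: execute the schema .sql script against disaster_DB statement by statement.
conn = sql_connection()
cursor = conn.cursor()
cursor.execute("USE disaster_DB")

with open("data/physical_model/schema.sql") as f:  # illustrative file name
    script = f.read()

# Run each ;-separated statement in the schema script.
for statement in script.split(";"):
    if statement.strip():
        cursor.execute(statement)
conn.commit()
```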