SQL-Mongo Project – Populations at Risk for COVID19

This project was undertaken as an academic study to understand and apply the concepts of BUAN 6320: DataBase Foundations for Business Analytics. The dataset was chosen by the course instructor, and the analysis done as part of this project is not representative of the overall statistics.

About:

The 3,142 counties of the United States span a diverse range of social, economic, health, and weather conditions. Because of the COVID19 pandemic, over 2,400 of these counties have already experienced some COVID19 cases.

Combining county-level data on health, socioeconomics, and weather can help us identify which populations are at risk for COVID19 and help prepare high-risk communities.

Temperature and humidity may affect the transmissibility of COVID19, but in the United States, warmer regions also tend to have markedly different socioeconomic and health demographics. As such, it's important to be able to control for factors like obesity, diabetes, access to healthcare, and poverty rates, since these factors themselves likely play a role in COVID19 transmission and fatality rates.

The dataset has 3 CSV files and a total of 415 columns:

US_counties_COVID19_health_weather_data.csv
us_county_sociohealth_data.csv
us_county_geometry.csv

The dataset provides all of this information, formatted, cleaned, and ready for analysis. Most columns have little or no missing data. A small number have larger amounts of missing data; see the kernel that generated this dataset for more details: https://www.kaggle.com/johnjdavisiv/us-counties-weather-health-covid19-data

Thanks to John Davis for the dataset!

Note:

The project is in progress and its technical details are subject to change. The documentation will be updated as the project progresses.

DataBase Creation:

Schema Design:

The 415 columns across the 3 source files are divided into the 6 tables listed below.

Tables:

county_details
State_lockdown_details
fips_daily_cases
daily_weather_details
county_socio_health
station_details

Entity relations:

The Project_ER_Schema.mwb file contains the detailed ER schema of the database.
The Project_ER_Schema.sql file can be used to create the DB on a local system.

INSERT_SQL queries Generation:

The sql_insert_generator.ipynb Python notebook takes the 3 CSV files as input and generates INSERT SQL queries for each table in the DB schema. The notebook also performs much of the data processing, such as handling duplicates, splitting the columns, and generating the datasets for the individual tables.
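
The notebook's own code is not reproduced here. As a rough illustration of the idea (splitting columns per table, dropping duplicate keys, and emitting INSERT statements), a minimal sketch might look like the following. The helper names `sql_literal` and `generate_inserts`, the column names, and the sample rows are all hypothetical and made up for this example; the real column-to-table mapping is in the project's schema files.

```python
import csv
import io


def sql_literal(value):
    """Render a CSV field as a quoted SQL string literal; empty fields become NULL.

    Everything is quoted so that codes with leading zeros (such as FIPS
    codes) keep them. This assumes the target database accepts quoted
    values for numeric columns.
    """
    if value is None or value == "":
        return "NULL"
    return "'" + value.replace("'", "''") + "'"


def generate_inserts(rows, table, columns, key):
    """Build one INSERT per distinct key, keeping the first row seen for each key."""
    seen = set()
    statements = []
    for row in rows:
        row_key = tuple(row[c] for c in key)
        if row_key in seen:
            continue  # duplicate handling: skip repeated keys
        seen.add(row_key)
        values = ", ".join(sql_literal(row[c]) for c in columns)
        statements.append(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values});"
        )
    return statements


# Made-up sample data; the column names are assumptions, not the real CSV headers.
SAMPLE_CSV = """fips,county,state,date,cases
00001,O'Brien,StateA,2020-03-24,1
00001,O'Brien,StateA,2020-03-25,4
00002,Sample,StateA,2020-03-24,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# One wide CSV feeds several tables, each with its own column subset and key.
county_sql = generate_inserts(
    rows, "county_details", ["fips", "county", "state"], key=("fips",)
)
daily_sql = generate_inserts(
    rows, "fips_daily_cases", ["fips", "date", "cases"], key=("fips", "date")
)

for statement in county_sql + daily_sql:
    print(statement)
```

Generating plain SQL text is convenient for sharing load scripts, but values are escaped by hand here; when loading directly from Python, parameterized queries are usually the safer choice.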

Data Loading:

Because of the foreign key dependencies between the tables, run the SQL files in the order below. The SQL files are in the Data_Load_Quries folder.

state_lockdown_details.sql
county_details.sql
station_details.sql
fips_daily_cases.sql
daily_weather_details.sql
county_socio_health.sql
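
To illustrate why the load order matters, the sketch below uses an in-memory SQLite database as a stand-in for the project's database. It assumes, purely for illustration, that county_details references state_lockdown_details through a `state` column; the actual keys are defined in Project_ER_Schema.sql. The helper `try_load` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Simplified, assumed versions of two of the project's tables.
conn.executescript(
    """
    CREATE TABLE state_lockdown_details (
        state TEXT PRIMARY KEY
    );
    CREATE TABLE county_details (
        fips  TEXT PRIMARY KEY,
        state TEXT NOT NULL REFERENCES state_lockdown_details(state)
    );
    """
)


def try_load(statements):
    """Run the statements in one transaction; return False if a constraint fails."""
    try:
        with conn:  # commits on success, rolls back on error
            for statement in statements:
                conn.execute(statement)
        return True
    except sqlite3.IntegrityError:
        return False


parent = "INSERT INTO state_lockdown_details VALUES ('StateA')"
child = "INSERT INTO county_details VALUES ('00001', 'StateA')"

# Child before parent: the referenced state does not exist yet, so the load fails.
wrong_order_ok = try_load([child, parent])
# Parent before child: the reference can be resolved, so the load succeeds.
right_order_ok = try_load([parent, child])

print("child first:", wrong_order_ok)
print("parent first:", right_order_ok)
```

The same reasoning applies to the full list: a file should only be run after the files for the tables it references.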

Validation:

After creating the DB and loading the data, use "Validation_Quries.sql" to check that the complete data has been loaded as expected.
Each query is followed by a comment giving the expected row count for its table.
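
The count check can also be automated. The following is a minimal sketch, again using in-memory SQLite as a stand-in; the helper `check_counts`, the `station_id` column, and the expected counts are hypothetical. The real expected counts are the ones commented in Validation_Quries.sql.

```python
import sqlite3


def check_counts(conn, expected):
    """Return {table: (actual, expected)} for every table whose row count differs."""
    mismatches = {}
    for table, want in expected.items():
        # Table names come from a hard-coded dict, not from user input.
        (got,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if got != want:
            mismatches[table] = (got, want)
    return mismatches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_details (station_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO station_details VALUES (?)", [("S1",), ("S2",)])
conn.commit()

# Made-up expected counts for the demo table.
ok = check_counts(conn, {"station_details": 2})
bad = check_counts(conn, {"station_details": 3})

print("matching counts:", ok)
print("mismatched counts:", bad)
```

An empty result means every table holds the expected number of rows; any entry points at a table that was loaded partially or more than once.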
