# COVID-19

<p align="center">
  <img width="180" src="https://user-images.githubusercontent.com/19881320/54484151-b85c4780-4836-11e9-923f-c5e0e5afe866.jpg">
</p>

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)


## Contact Information

William Ponton: [LinkedIn](https://www.linkedin.com/in/williampontoncfsp/) 

Email: [waponton@gmail.com](mailto:waponton@gmail.com)

REPL: [@gorbulus](https://repl.it/@gorbulus)

GitHub: [gorbulus](https://github.com/gorbulus)

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)


## Overview

This notebook is a collection of Data Science basics, examples, and best practices, intended for use as a reference guide.

The guide has five sections that follow the basic steps of the Data Analysis process.  The first step is importing a dataset into your environment so that it can be analyzed.  The second step defines best practices for cleaning and organizing: how to handle ```NULL``` values and how to merge and organize messy data.  Some estimates show that Data Scientists can spend up to 80% of their time cleaning and organizing data for analysis and modeling.

Once the dataset is normalized and cleaned, the guide details common statistical methods and defines the values needed for visualization and for the final statistics in the Interpretation section.  Numerical Analysis is the 'magic' of Data Science, as this step can often expose anomalies and patterns in the data that humans alone might not have been able to find.  Its output also powers the Visualizations presented to stakeholders in the final reporting, and is vital for the subsequent step of Interpretation and Reporting.

Finally, the guide covers creating a deliverable to be passed off to other departments.  The final result must be understandable by every audience it is intended for, so knowing the goals of the project up front is imperative for keeping the results within the scope of the audience's understanding of the analysis.

### Data Science Steps

- 0.0 Importing Data

- 0.1 Cleaning & Organizing

- 0.2 Numerical Analysis

- 0.3 Visualizations

- 0.4 Interpretation & Reporting
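
The five steps above can be sketched end to end on a tiny in-memory table.  This is a minimal illustration rather than this project's actual pipeline: the sample rows, the ```raw_csv``` string, and the choice to fill a missing count with zero are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
raw_csv = io.StringIO(
    "region,date,cases\n"
    "A,1/22/20,2\n"
    "A,1/23/20,\n"
    "B,1/22/20,1\n"
    "B,1/23/20,4\n"
)

# 0.0 Importing Data
df = pd.read_csv(raw_csv)

# 0.1 Cleaning & Organizing: parse dates and fill the missing count with 0
# (filling with 0 is an assumption; the right choice depends on the data).
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")
df["cases"] = df["cases"].fillna(0).astype(int)

# 0.2 Numerical Analysis: total cases per region
totals = df.groupby("region")["cases"].sum()

# 0.3 / 0.4 Visualization and reporting would start from a summary like this.
print(totals)
```

Keeping each step as a separate, commented stage makes it easier to swap in a different cleaning rule without touching the analysis.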

## Data Science with Python

Python has rich Data Science functionality, driven by teams of scientists and engineers solving scientific and engineering problems.  Python's object-oriented design, approachable syntax, and available libraries have made it an industry standard for Data Analysis.  A 2016 survey by [O'Reilly](https://www.oreilly.com/data/free/files/2016-data-science-salary-survey.pdf) shows that ```Python``` is now dominant over ```R``` throughout the Data Science community, with ```Python 3.6``` favored over the soon-to-be-retired ```Python 2.7```.  I also plan to create a Data Science Playbook for ```R``` techniques in the future (I am still learning!).


```Python``` became the fastest-growing programming language of 2019, and remains the industry standard for modeling and analysis in the scientific and engineering industries.  The Scientific Python Stack is the set of libraries that makes Python so powerful for data analysis and statistical prediction.

To get everything running in this project, use ```pip install -r requirements.txt```
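
After installing, a quick check like the one below can confirm that the main libraries are importable.  This is only a sketch: the ```REQUIRED``` list is an assumption based on the imports used later in this README, and the authoritative list is whatever ```requirements.txt``` contains.

```python
import importlib.util

# Import names assumed from this README; adjust to match requirements.txt.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```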

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)



## Project Stack

<p align="center">
  <img src="https://user-images.githubusercontent.com/19881320/54723910-6457a880-4b3f-11e9-850b-8c2be2ff62a8.jpg">
</p>

### Language
- Python 3.6 (replacing legacy Python 2.7 in 2020)
- Cython (a compiled superset of Python that generates C extensions, useful for speeding up numeric code alongside NumPy)

### Scientific & Numeric Power
- SciPy
- NumPy
- scikit-learn

### Interactive Environment
- Anaconda (Python distribution)
- IPython Notebooks
- GitHub (version control)
- RMOTR Notebooks

### Data Science Libraries
- Analysis tools
    - NumPy
    - Pandas
    - Cython
- Visualization tools
    - Matplotlib
    - Seaborn
    - Bokeh


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

# Analyzing the epidemiological outbreak of COVID-19

A visual exploratory data analysis approach.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns

# Jupyter magic: render plots inline (only works inside a notebook)
%matplotlib inline
```

```python
# Intended column names (for reference only; not passed to read_csv below)
column_names = ["Province_State", "Country_Region", "Lat", "Long", "Date", "Confirmed", "Deaths", "Active"]

# Path to the local copy of the dataset
COVID_DATASET = "C:/Users/beefy/covid_19/app_data/covid_df.csv"

# Read the CSV file; the column names come from the file's own header row
covid_df = pd.read_csv(COVID_DATASET, sep=",")

print(covid_df.shape)

covid_df.head()
```

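```
(25465, 9)
```

| Unnamed: 0 | Province/State | Country/Region | Lat | Long | date | confirmed | deaths | recovered | active |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | Thailand | 15.0 | 101.0 | 1/22/20 | 2 | 0 | 0 | 2 |
| 1 | | Japan | 36.0 | 138.0 | 1/22/20 | 2 | 0 | 0 | 2 |
| 2 | | Singapore | 1.2833 | 103.8333 | 1/22/20 | 0 | 0 | 0 | 0 |
| 3 | | Nepal | 28.1667 | 84.25 | 1/22/20 | 0 | 0 | 0 | 0 |
| 4 | | Malaysia | 2.5 | 112.5 | 1/22/20 | 0 | 0 | 0 | 0 |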
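
Before any analysis, it may help to tidy the loaded frame.  The sketch below recreates the five preview rows shown above in memory (so it runs without the CSV file), drops the leftover ```Unnamed: 0``` index column, renames the slash-separated headers, and parses ```date```.  The new column names and the ```%m/%d/%y``` date format are assumptions inferred from the preview; they are not part of the original notebook.

```python
import io

import pandas as pd

# The five preview rows, recreated so the example is self-contained.
sample = io.StringIO(
    "Unnamed: 0,Province/State,Country/Region,Lat,Long,date,"
    "confirmed,deaths,recovered,active\n"
    "0,,Thailand,15.0,101.0,1/22/20,2,0,0,2\n"
    "1,,Japan,36.0,138.0,1/22/20,2,0,0,2\n"
    "2,,Singapore,1.2833,103.8333,1/22/20,0,0,0,0\n"
    "3,,Nepal,28.1667,84.25,1/22/20,0,0,0,0\n"
    "4,,Malaysia,2.5,112.5,1/22/20,0,0,0,0\n"
)
sample_df = pd.read_csv(sample)

# Drop the stray index column and give the headers code-friendly names
# (the snake_case names are a hypothetical choice for this example).
clean_df = sample_df.drop(columns=["Unnamed: 0"]).rename(
    columns={
        "Province/State": "province_state",
        "Country/Region": "country_region",
    }
)

# Parse the date strings so they sort and plot chronologically.
clean_df["date"] = pd.to_datetime(clean_df["date"], format="%m/%d/%y")

# Inspect column types and count missing values per column.
print(clean_df.dtypes)
print(clean_df.isna().sum())
```

Passing ```index_col=0``` to ```pd.read_csv``` is an alternative way to keep the saved index out of the data columns.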


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

```python
confirmed_df = covid_df.drop(columns=["deaths", "recovered", "active"])

# Sum the confirmed column to get the total
confirmed_cases = confirmed_df["confirmed"].sum()
print("Total confirmed cases: ", confirmed_cases)
```

```
Total confirmed cases:  3628957
```


```python
death_df = covid_df.drop(columns=["confirmed", "recovered", "active"])

# Sum the deaths column to get the total
death_cases = death_df["deaths"].sum()
print("Total death cases: ", death_cases)
```

```
Total death cases:  116824
```


```python
recovered_df = covid_df.drop(columns=["confirmed", "deaths", "active"])

# Sum the recovered column to get the total
recovered_cases = recovered_df["recovered"].sum()
print("Total recovered cases: ", recovered_cases)
```

```
Total recovered cases:  1357289
```


```python
active_df = covid_df.drop(columns=["confirmed", "deaths", "recovered"])

# Sum the active column to get the total
active_cases = active_df["active"].sum()
print("Total active cases: ", active_cases)
```

```
Total active cases:  2154844
```
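
The four totals above can also be computed in a single pass, without building an intermediate DataFrame for each column.  A minimal sketch follows; ```demo_df``` and its values are made up for illustration (they are not taken from the dataset) and only share the column names used above.

```python
import pandas as pd

# Illustrative values only; in the notebook this would be covid_df.
demo_df = pd.DataFrame(
    {
        "confirmed": [10, 5, 0],
        "deaths": [1, 0, 0],
        "recovered": [4, 2, 0],
        "active": [5, 3, 0],
    }
)

# Selecting the columns and calling .sum() returns one total per column.
totals = demo_df[["confirmed", "deaths", "recovered", "active"]].sum()

for name, value in totals.items():
    print(f"Total {name} cases: {value}")
```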


```python
# Net cases: confirmed minus deaths and recoveries
net_cases = confirmed_cases - death_cases - recovered_cases
print("Total net cases (active): ", net_cases)
```

```
Total net cases (active):  2154844
```
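
The net figure equals the summed ```active``` column, which suggests that ```active``` was derived as confirmed minus deaths minus recovered.  Matching totals do not guarantee that every row is consistent, so one way to check row by row is sketched below.  The ```demo_df``` frame is made up for illustration and deliberately contains one inconsistent row; in the notebook, ```covid_df``` would take its place.

```python
import pandas as pd

# Illustrative values only; the last row is intentionally inconsistent.
demo_df = pd.DataFrame(
    {
        "confirmed": [10, 5, 7],
        "deaths": [1, 0, 1],
        "recovered": [4, 2, 1],
        "active": [5, 3, 4],
    }
)

# What active should be if it is confirmed minus deaths minus recovered.
expected_active = demo_df["confirmed"] - demo_df["deaths"] - demo_df["recovered"]

# Keep only the rows where the stored value disagrees with the expectation.
mismatch = demo_df[demo_df["active"] != expected_active]

print(f"{len(mismatch)} row(s) where active != confirmed - deaths - recovered")
```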


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)