## Shape of the Curve: COVID-19

Import the following libraries by adding these commands to a cell in your Jupyter Notebook and running the cell.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Next, we create a new DataFrame by importing the CSV file:

In [None]:
covid_df = pd.read_csv("COVID-19 Cases.csv", header=0)
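As a minimal sketch of what `header=0` does, the example below reads a small in-memory CSV instead of the real file. The rows are made up for illustration, and the column set is only an assumption based on the column names that the filters later in this notebook refer to.

```python
import io
import pandas as pd

# Hypothetical sample rows; the real file has many more rows and columns.
sample_csv = io.StringIO(
    "Case_Type,Cases,Difference,Country_Region\n"
    "Confirmed,100,20,Italy\n"
    "Deaths,5,1,Italy\n"
)

# header=0 tells pandas that the first line holds the column names.
sample_df = pd.read_csv(sample_csv, header=0)
print(sample_df.columns.tolist())
print(sample_df.shape)
```

Checking the column names and the shape right after loading is a quick way to confirm that the header row was picked up as expected.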

To verify the DataFrame has loaded correctly, we can run a head() command to display the first few records:

In [None]:
covid_df.head()

Next, we will isolate the data we want to focus on by creating a new DataFrame from the source and applying a few filters to it.

We want to keep only the records where all of the following conditions are true:
* The daily Difference count is greater than zero.
* The Case_Type is Confirmed.
* The Country_Region is Italy:

In [None]:
df_results = covid_df[(covid_df.Difference > 0) & (covid_df.Case_Type == 'Confirmed') & (covid_df.Country_Region == 'Italy')]
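To make the filter easier to follow, here is a hedged sketch that builds the same kind of condition step by step. The rows are made up, and the names `sample_df`, `is_positive`, `is_confirmed`, `is_italy`, and `mask` are introduced only for this illustration.

```python
import pandas as pd

# Made-up rows that mimic the columns used in the filter above.
sample_df = pd.DataFrame({
    "Country_Region": ["Italy", "Italy", "Italy", "Spain"],
    "Case_Type": ["Confirmed", "Confirmed", "Deaths", "Confirmed"],
    "Difference": [0, 250, 12, 300],
})

# Each comparison yields a boolean Series with one True/False per row.
is_positive = sample_df["Difference"] > 0
is_confirmed = sample_df["Case_Type"] == "Confirmed"
is_italy = sample_df["Country_Region"] == "Italy"

# & keeps a row only when all three masks are True for it.
mask = is_positive & is_confirmed & is_italy
sample_results = sample_df[mask]
print(sample_results)
```

Only the second row passes all three conditions. The parentheses in the one-line version are needed because `&` binds more tightly than `>` and `==` in Python.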

To see the results sorted by the Cases column in descending order, we run the following command:

In [None]:
df_results.sort_values(by='Cases', ascending=False)

Now we want to visually display the distribution of the values in the Difference column. We can call the DataFrame's hist() method with its default settings and use the column parameter to select the values to plot:

In [None]:
df_results.hist(column='Difference');
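To see the numbers behind a histogram like this one, the sketch below counts values per bin with NumPy. The values in `sample_diff` are made up for illustration. It assumes the default of 10 equal-width bins, which is what both `hist()` and `np.histogram()` use when no bin count is given.

```python
import numpy as np
import pandas as pd

# Made-up daily differences, for illustration only.
sample_diff = pd.Series([5, 12, 18, 40, 55, 90, 130, 200, 410, 1000])

# Split the range from the minimum to the maximum into 10 equal-width bins.
counts, edges = np.histogram(sample_diff, bins=10)

# Print each bin's range next to the number of values that fall in it.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:8.1f} to {right:8.1f}: {count}")
```

When most values land in the first bin and a few sit far to the right, the distribution is right-skewed, which is worth keeping in mind when reading the summary statistics.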

Use the describe() function on this DataFrame to see summary statistics. To look at a single column, select it by placing the column (field) name in double quotes inside square brackets:

In [None]:
df_results["Difference"].describe()
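The result of describe() on a single column is itself a pandas Series, so individual statistics can be read by label. A minimal sketch, using made-up values rather than the real data:

```python
import pandas as pd

# Made-up values standing in for the Difference column.
sample_diff = pd.Series([10, 20, 30, 40, 100], name="Difference")

summary = sample_diff.describe()
print(summary)

# Single statistics can be read from the summary by their label.
print("mean:", summary["mean"])
print("median:", summary["50%"])
```

A mean well above the median, as in this sample, suggests that a few large values are pulling the average up.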

## Understanding outliers and trends

Create a new DataFrame by importing the CSV file:

In [None]:
covid_df = pd.read_csv("COVID-19 Cases.csv", header=0)

To verify the DataFrame has loaded correctly, we can run a head() command to display the first few records:

In [None]:
covid_df.head()

As in the prior exercise, we will isolate the data we want to focus on by creating a new DataFrame from the source and applying a few filters to it.

We want to keep only the records where all of the following conditions are true:
* The daily Difference count is greater than zero.
* The Case_Type is Confirmed.
* The **Country_Region** is one of several countries. We use the pipe symbol `|` to create an "or" condition that allows multiple values:

In [None]:
df_results = covid_df[
    (covid_df.Difference > 0)
    & (covid_df.Case_Type == 'Confirmed')
    & (
        (covid_df.Country_Region == 'Italy')
        | (covid_df.Country_Region == 'Spain')
        | (covid_df.Country_Region == 'Germany')
    )
]
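When the list of countries grows, chaining `|` conditions gets long. One possible alternative, shown here as a sketch on made-up rows, is the pandas `isin()` method, which tests membership in a list. The names `sample_df`, `countries`, `with_pipes`, and `with_isin` are introduced only for this example.

```python
import pandas as pd

# Made-up rows, for illustration only.
sample_df = pd.DataFrame({
    "Country_Region": ["Italy", "Spain", "Germany", "France", "Italy"],
    "Case_Type": ["Confirmed"] * 5,
    "Difference": [100, 200, 0, 50, 75],
})

countries = ["Italy", "Spain", "Germany"]

# Chained | conditions, as in the cell above.
with_pipes = sample_df[
    (sample_df.Difference > 0)
    & (sample_df.Case_Type == "Confirmed")
    & (
        (sample_df.Country_Region == "Italy")
        | (sample_df.Country_Region == "Spain")
        | (sample_df.Country_Region == "Germany")
    )
]

# isin() checks each row's country against the list in one step.
with_isin = sample_df[
    (sample_df.Difference > 0)
    & (sample_df.Case_Type == "Confirmed")
    & sample_df.Country_Region.isin(countries)
]

# Both filters select the same rows.
print(with_pipes.equals(with_isin))
```

With `isin()`, adding or removing a country only requires editing the `countries` list.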

To see the results, we run the following command:

In [None]:
df_results.head()

To display a box plot by country, we use the boxplot() function with the following parameters:
- ```by=``` groups the data by **Country_Region**.
- ```column=``` isolates the values in the Difference field.
- ```grid=False``` turns off the grid lines in the chart:

In [None]:
df_results.boxplot(by='Country_Region', column=['Difference'], grid=False);
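The points drawn beyond the whiskers of a box plot can also be found numerically. The sketch below assumes the default whisker rule of 1.5 times the interquartile range (IQR) and uses made-up values. The names `sample_df`, `upper_fence`, and `outliers` are hypothetical.

```python
import pandas as pd

# Made-up values: one unusually large Difference for Italy.
sample_df = pd.DataFrame({
    "Country_Region": ["Italy"] * 5 + ["Spain"] * 5,
    "Difference": [10, 20, 30, 40, 500, 5, 15, 25, 35, 45],
})

# First and third quartiles per country.
grouped = sample_df.groupby("Country_Region")["Difference"]
q1 = grouped.quantile(0.25)
q3 = grouped.quantile(0.75)
iqr = q3 - q1

# Upper whisker limit; points beyond it are drawn as outliers.
upper_fence = q3 + 1.5 * iqr

# Look up each row's country fence and keep the rows above it.
row_fence = sample_df["Country_Region"].map(upper_fence)
outliers = sample_df[sample_df["Difference"] > row_fence]
print(outliers)
```

The same approach works for low outliers by comparing against `q1 - 1.5 * iqr`.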

## Finding patterns in data

Create a new DataFrame by importing the CSV file:

In [None]:
covid_df = pd.read_csv("COVID-19 Cases.csv", header=0)

We will now create two new DataFrames, each a subset of the original source. The advantage of naming them generically as **df_results_1** and **df_results_2** is that you can adjust a filter such as **Country_Region** in this one line without changing any code in the later steps:

In [None]:
df_results_1 = covid_df[(covid_df.Case_Type == 'Confirmed') & (covid_df.Country_Region == 'Germany')]

Run a head() command to validate the results:

In [None]:
df_results_1.head()

Next, we create the second DataFrame, which we will compare with the first, using the following command:

In [None]:
df_results_2 = covid_df[(covid_df.Case_Type == 'Confirmed') & (covid_df.Country_Region == 'Italy')]

Run a head() command to validate the results:

In [None]:
df_results_2.head()

Let's profile the data in each DataFrame to better understand it. We use the describe() function to identify key statistics and see how the data is distributed:

In [None]:
df_results_1["Cases"].describe()

In [None]:
df_results_2["Cases"].describe()

We use the plt.scatter() function to create the visualization. It takes two required arguments, the x-axis and y-axis values, which must contain the same number of points. We pass the Cases column from each DataFrame, and we add axis labels and a title to help the audience understand the chart:

In [None]:
plt.scatter(df_results_1["Cases"], df_results_2["Cases"]);
plt.title("# of Cases")
plt.xlabel("Germany Cases")
plt.ylabel("Italy Cases");
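A scatter plot pairs values by position, so it is most meaningful when row *n* in each DataFrame refers to the same day. The sketch below shows one way to align two countries before comparing them. It assumes the data has a `Date` column, which this notebook does not confirm, and all values are made up.

```python
import pandas as pd

# Made-up cumulative counts; the Date column name is an assumption.
germany = pd.DataFrame({
    "Date": ["3/1/2020", "3/2/2020", "3/3/2020", "3/4/2020"],
    "Cases": [100, 150, 200, 260],
})
italy = pd.DataFrame({
    "Date": ["3/2/2020", "3/3/2020", "3/4/2020", "3/5/2020"],
    "Cases": [2000, 2500, 3100, 3900],
})

# An inner merge keeps only the dates present in both DataFrames.
paired = germany.merge(italy, on="Date", suffixes=("_germany", "_italy"))
print(paired)

# Pearson correlation between the two aligned series.
r = paired["Cases_germany"].corr(paired["Cases_italy"])
print(r)
```

The aligned columns `Cases_germany` and `Cases_italy` have equal length and could be passed to plt.scatter() in place of the two separate Series.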