## Covid19 Data Analysis and Predictions
This analysis summarizes modeling, simulation, and analytics work on the worldwide COVID-19 outbreak from the perspective of data science and visual analytics. It examines the impact of best practices and preventive measures across sectors, with the aim of helping outbreaks be managed with the health resources available.

This project is divided into these parts:
1. [Install required packages](#Install-required-packages)
2. [Data Analysis and Manipulation](#Data-Analysis-and-Manipulation)
3. [Data Visualization](#Data-Visualization)


### Install required packages

In [None]:
# Import required Python libraries
import plotly.graph_objs as go
import plotly.io as pio
import plotly.express as px
import pandas as pd


# Import data visualization libraries
import matplotlib.pyplot as plt

# Import Plotly helpers for offline mode and figure factories
import plotly.offline as py
import plotly.figure_factory as ff

# Set the renderer for VS Code; use "browser" instead to open plots in a web browser
# pio.renderers.default = "browser"
pio.renderers.default = "vscode"

In [99]:
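# Optional: install the Jupyter notebook extensions.
# The second command depends on the package installed by the first, which is commented out here.
#! python -m pip install jupyter_contrib_nbextensions
! python -m jupyter contrib nbextension install --user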


usage: jupyter.py [-h] [--version] [--config-dir] [--data-dir] [--runtime-dir]
                  [--paths] [--json] [--debug]
                  [subcommand]

Jupyter: Interactive Computing

positional arguments:
  subcommand     the subcommand to launch

options:
  -h, --help     show this help message and exit
  --version      show the versions of core jupyter packages and exit
  --config-dir   show Jupyter config dir
  --data-dir     show Jupyter data dir
  --runtime-dir  show Jupyter runtime dir
  --paths        show all Jupyter paths. Add --json for machine-readable
                 format.
  --json         output paths as machine-readable json
  --debug        output debug information about paths

Available subcommands: console dejavu events execute kernel kernelspec lab
labextension labhub migrate nbconvert notebook qtconsole run script server
troubleshoot trust

Jupyter command `jupyter-contrib` not found.
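
The message above suggests that the `jupyter contrib` subcommand is unavailable, which would be expected if the `jupyter_contrib_nbextensions` package was never installed (its install line is commented out in the cell). A minimal sketch, using only the standard library, that checks whether a module can be found before running a command that depends on it. `is_installed` is a hypothetical helper introduced here for illustration, not part of Jupyter:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# The module name is taken from the pip command in the cell above.
if is_installed("jupyter_contrib_nbextensions"):
    print("Extension package found; the install command should be available.")
else:
    print("Extension package missing; run the pip install step first.")
```

Running a check like this first avoids the long usage dump that `jupyter` prints when a subcommand is unknown.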


### Data Analysis and Manipulation

In [None]:
# Enable interactive plotting in Jupyter notebooks (initializes Plotly)
py.init_notebook_mode(connected=True)

# Optional: renderer for Google Colab only.
# This overrides the "vscode" renderer set earlier, so remove this line when running elsewhere.
pio.renderers.default = 'colab'

In [None]:
# load covid19 dataset

In [101]:
# To load covid19 dataset 1

dataset1 = pd.read_csv("covid.csv")

# To retrieve top 5 rows
dataset1.head()

Unnamed: 0,Country/Region,Continent,Population,TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",Tot Cases/1M pop,Deaths/1M pop,TotalTests,Tests/1M pop,WHO Region,iso_alpha
0,USA,North America,331198100.0,5032179,,162804.0,,2576668.0,,2292707.0,18296.0,15194.0,492.0,63139605.0,190640.0,Americas,USA
1,Brazil,South America,212710700.0,2917562,,98644.0,,2047660.0,,771258.0,8318.0,13716.0,464.0,13206188.0,62085.0,Americas,BRA
2,India,Asia,1381345000.0,2025409,,41638.0,,1377384.0,,606387.0,8944.0,1466.0,30.0,22149351.0,16035.0,South-EastAsia,IND
3,Russia,Europe,145940900.0,871894,,14606.0,,676357.0,,180931.0,2300.0,5974.0,100.0,29716907.0,203623.0,Europe,RUS
4,South Africa,Africa,59381570.0,538184,,9604.0,,387316.0,,141264.0,539.0,9063.0,162.0,3149807.0,53044.0,Africa,ZAF


In [None]:
# To know the total number of rows and columns
dataset1.shape

(209, 17)

In [50]:
# Total number of elements (rows x columns)
dataset1.size

3553

In [51]:
# Quick overview of the data for cleaning and preprocessing
dataset1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 209 entries, 0 to 208
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Country/Region    209 non-null    object 
 1   Continent         208 non-null    object 
 2   Population        208 non-null    float64
 3   TotalCases        209 non-null    int64  
 4   NewCases          4 non-null      float64
 5   TotalDeaths       188 non-null    float64
 6   NewDeaths         3 non-null      float64
 7   TotalRecovered    205 non-null    float64
 8   NewRecovered      3 non-null      float64
 9   ActiveCases       205 non-null    float64
 10  Serious,Critical  122 non-null    float64
 11  Tot Cases/1M pop  208 non-null    float64
 12  Deaths/1M pop     187 non-null    float64
 13  TotalTests        191 non-null    float64
 14  Tests/1M pop      191 non-null    float64
 15  WHO Region        184 non-null    object 
 16  iso_alpha         209 non-null    object 
dtypes: float64(12), int64(1), object(4)
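
The `info()` output above shows that several columns are mostly empty (for example, `NewCases` has 4 non-null values out of 209). A minimal sketch of how such gaps could be summarized per column; the small frame below uses made-up values and only borrows the column names from dataset 1:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; values are invented for illustration only.
sample = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "TotalCases": [100, 250, 40, 7],
    "NewCases": [np.nan, np.nan, 5.0, np.nan],
    "TotalDeaths": [3.0, np.nan, 1.0, np.nan],
})

# Count missing values per column and express them as a percentage of rows.
missing = sample.isna().sum().to_frame("missing")
missing["missing_pct"] = (missing["missing"] / len(sample) * 100).round(1)

# Put the sparsest columns first.
missing = missing.sort_values("missing", ascending=False)
print(missing)
```

Applied to `dataset1`, the same two lines would rank the columns by how much cleaning they need.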

In [53]:
# To load Covid19 dataset 2 

dataset2 = pd.read_csv("covid_grouped.csv")

# To retrieve top 5 rows
dataset2.head()


Unnamed: 0,Date,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,WHO Region,iso_alpha
0,2020-01-22,Afghanistan,0,0,0,0,0,0,0,Eastern Mediterranean,AFG
1,2020-01-22,Albania,0,0,0,0,0,0,0,Europe,ALB
2,2020-01-22,Algeria,0,0,0,0,0,0,0,Africa,DZA
3,2020-01-22,Andorra,0,0,0,0,0,0,0,Europe,AND
4,2020-01-22,Angola,0,0,0,0,0,0,0,Africa,AGO


In [54]:
# To know the total number of rows and columns
dataset2.shape

(35156, 11)

In [None]:
# Quick overview of the data for cleaning and preprocessing
dataset2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35156 entries, 0 to 35155
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Date            35156 non-null  object
 1   Country/Region  35156 non-null  object
 2   Confirmed       35156 non-null  int64 
 3   Deaths          35156 non-null  int64 
 4   Recovered       35156 non-null  int64 
 5   Active          35156 non-null  int64 
 6   New cases       35156 non-null  int64 
 7   New deaths      35156 non-null  int64 
 8   New recovered   35156 non-null  int64 
 9   WHO Region      35156 non-null  object
 10  iso_alpha       35156 non-null  object
dtypes: int64(7), object(4)
memory usage: 3.0+ MB
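
In the output above, `Date` is stored as `object`, meaning the dates are plain strings. A sketch of one way to convert them so that time-based grouping becomes possible; the sample rows are illustrative (only the `2020-01-22` zero counts mirror the preview above) and the monthly grouping is an assumption about how the data might be used:

```python
import pandas as pd

# Illustrative rows in the same layout as dataset 2; later values are made up.
sample = pd.DataFrame({
    "Date": ["2020-01-22", "2020-01-23", "2020-02-01"],
    "Country/Region": ["Afghanistan"] * 3,
    "Confirmed": [0, 0, 1],
})

# Parse the ISO-formatted strings into proper datetime values.
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y-%m-%d")

# Confirmed is cumulative, so the monthly maximum gives the month-end total.
monthly = sample.groupby(sample["Date"].dt.to_period("M"))["Confirmed"].max()
print(monthly)
```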


In [97]:
# To load Covid19 dataset 3
dataset3 = pd.read_csv("coviddeath.csv")

# To retrieve top 5 rows
dataset3.head()

Unnamed: 0,Data as of,Start Week,End Week,State,Condition Group,Condition,ICD10_codes,Age Group,Number of COVID-19 Deaths,Flag
0,08/30/2020,02/01/2020,08/29/2020,US,Respiratory diseases,Influenza and pneumonia,J09-J18,0-24,122.0,
1,08/30/2020,02/01/2020,08/29/2020,US,Respiratory diseases,Influenza and pneumonia,J09-J18,25-34,596.0,
2,08/30/2020,02/01/2020,08/29/2020,US,Respiratory diseases,Influenza and pneumonia,J09-J18,35-44,1521.0,
3,08/30/2020,02/01/2020,08/29/2020,US,Respiratory diseases,Influenza and pneumonia,J09-J18,45-54,4186.0,
4,08/30/2020,02/01/2020,08/29/2020,US,Respiratory diseases,Influenza and pneumonia,J09-J18,55-64,10014.0,
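
The date columns in dataset 3 use a month/day/year layout, and the death counts are split by age group. A short sketch, built from the five preview rows above, of parsing those dates and totaling deaths per age group; restricting the frame to a few columns is a simplification made here:

```python
import pandas as pd

# The five preview rows shown above, reduced to the columns used here.
rows = [
    ("08/30/2020", "02/01/2020", "08/29/2020", "0-24", 122.0),
    ("08/30/2020", "02/01/2020", "08/29/2020", "25-34", 596.0),
    ("08/30/2020", "02/01/2020", "08/29/2020", "35-44", 1521.0),
    ("08/30/2020", "02/01/2020", "08/29/2020", "45-54", 4186.0),
    ("08/30/2020", "02/01/2020", "08/29/2020", "55-64", 10014.0),
]
columns = ["Data as of", "Start Week", "End Week",
           "Age Group", "Number of COVID-19 Deaths"]
sample = pd.DataFrame(rows, columns=columns)

# Parse the MM/DD/YYYY strings explicitly so day and month are not swapped.
for col in ["Data as of", "Start Week", "End Week"]:
    sample[col] = pd.to_datetime(sample[col], format="%m/%d/%Y")

# Total deaths per age group, then across the five groups shown.
deaths_by_age = sample.groupby("Age Group")["Number of COVID-19 Deaths"].sum()
total_deaths = deaths_by_age.sum()
print(deaths_by_age)
print(total_deaths)
```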


In [None]:
# To retrieve the name of all the columns from Covid19 dataset 1
dataset1.columns

Index(['Country/Region', 'Continent', 'Population', 'TotalCases', 'NewCases',
       'TotalDeaths', 'NewDeaths', 'TotalRecovered', 'NewRecovered',
       'ActiveCases', 'Serious,Critical', 'Tot Cases/1M pop', 'Deaths/1M pop',
       'TotalTests', 'Tests/1M pop', 'WHO Region', 'iso_alpha'],
      dtype='object')

In [96]:
# Drop the columns that are almost entirely empty (3 to 4 non-null values out of 209)
dataset1.drop(['NewCases','NewDeaths','NewRecovered'], 
               axis=1,inplace=True, errors= 'ignore')

# To retrieve 10 random rows from dataset
dataset1.sample(10)

Unnamed: 0,Country/Region,Continent,Population,TotalCases,TotalDeaths,TotalRecovered,ActiveCases,"Serious,Critical",Tot Cases/1M pop,Deaths/1M pop,TotalTests,Tests/1M pop,WHO Region,iso_alpha
161,Tanzania,Africa,59886383.0,509,21.0,183.0,305.0,7.0,8.0,0.4,,,Africa,TZA
207,Vatican City,Europe,801.0,12,,12.0,0.0,,14981.0,,,,Europe,VAT
65,Costa Rica,North America,5098730.0,21070,200.0,7038.0,13832.0,103.0,4132.0,39.0,96110.0,18850.0,Americas,CRI
191,St. Vincent Grenadines,North America,110976.0,56,,46.0,10.0,,505.0,,2447.0,22050.0,,
198,Grenada,North America,112576.0,24,,23.0,1.0,,213.0,,6252.0,55536.0,Americas,GRD
118,Mali,Africa,20302901.0,2552,124.0,1954.0,474.0,,126.0,6.0,25152.0,1239.0,Africa,MLI
140,Cyprus,Asia,1208238.0,1208,19.0,856.0,333.0,,1000.0,16.0,216597.0,179267.0,Europe,CYP
51,Kyrgyzstan,Asia,6534479.0,38659,1447.0,30099.0,7113.0,24.0,5916.0,221.0,267718.0,40970.0,Europe,KGZ
85,Gabon,Africa,2230563.0,7787,51.0,5609.0,2127.0,11.0,3491.0,23.0,85369.0,38272.0,Africa,GAB
91,Luxembourg,Europe,626952.0,7073,119.0,5750.0,1204.0,9.0,11282.0,190.0,623994.0,995282.0,Europe,LUX
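
The drop above names the sparse columns by hand. As an alternative sketch, the columns could be selected by how much of each one is filled in. The 50% cutoff and the sample values are assumptions chosen for illustration, not something the notebook specifies:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; values are invented for illustration only.
sample = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "TotalCases": [100, 250, 40, 7],
    "NewCases": [np.nan, np.nan, 5.0, np.nan],
    "TotalDeaths": [3.0, np.nan, 1.0, 2.0],
})

# Assumed cutoff: keep a column only if at least half of its values are present.
min_fill = 0.5

# Fraction of non-null values per column.
fill_ratio = sample.notna().mean()

# Columns below the cutoff are dropped; the rest are kept unchanged.
sparse_cols = fill_ratio[fill_ratio < min_fill].index.tolist()
cleaned = sample.drop(columns=sparse_cols)
print(sparse_cols)
print(cleaned.columns.tolist())
```

Returning a new frame instead of using `inplace=True` keeps the original data available for comparison.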


### Data Visualization 

In [100]:
# Color scale for the table
colorscale = [[0, '#4d004c'], [0.5, '#f2e5ff'], [1, '#ffffff']]

# Build a table figure from the first 15 rows
table = ff.create_table(dataset1.head(15), colorscale=colorscale)

# Display the table with the configured renderer
# (in a classic Jupyter Notebook, table.show() is an alternative)
pio.show(table)

In [94]:
# Bar chart of total cases for the first 15 countries, colored by total cases
px.bar(dataset1.head(15), x = 'Country/Region',
       y = 'TotalCases', color = 'TotalCases',
       height = 500,hover_data = ['Country/Region', 'Continent'])
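
`head(15)` returns the first 15 rows in file order, so the chart shows the highest case counts only if the CSV is already sorted that way (the preview rows suggest it is). A sketch that makes the ranking explicit with `nlargest`; the figures come from the preview of dataset 1 above, deliberately shuffled:

```python
import pandas as pd

# Five countries from the preview above, listed out of order on purpose.
sample = pd.DataFrame({
    "Country/Region": ["Russia", "USA", "South Africa", "India", "Brazil"],
    "TotalCases": [871894, 5032179, 538184, 2025409, 2917562],
})

# Select the three rows with the highest case counts, regardless of file order.
top3 = sample.nlargest(3, "TotalCases")
print(top3)
```

The resulting frame can be passed to `px.bar` in place of `dataset1.head(15)`, using `nlargest(15, 'TotalCases')` on the full dataset.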

In [93]:
# Bar chart of total cases for the first 15 countries, colored by total deaths
px.bar(dataset1.head(15), x = 'Country/Region', y = 'TotalCases', color = 'TotalDeaths', height = 500, 
       hover_data = ['Country/Region', 'Continent'])

In [92]:
# Bar chart of total cases for the first 15 countries, colored by total recovered
px.bar(dataset1.head(15), x = 'Country/Region', y = 'TotalCases', color = 'TotalRecovered', height = 500, 
       hover_data = ['Country/Region','Continent'])

In [91]:
# Bar chart of total cases for the first 15 countries, colored by total tests
px.bar(dataset1.head(15), x = 'Country/Region', y = 'TotalCases', color = 'TotalTests', height = 500, 
       hover_data = ['Country/Region','Continent'])

In [90]:
# Horizontal bar chart of total tests for the first 15 countries
px.bar(dataset1.head(15), x = 'TotalTests', y = 'Country/Region', color = 'TotalTests', orientation = 'h', height = 500, 
       hover_data = ['Country/Region','Continent'])

In [89]:
# Horizontal bar chart of total tests by continent (first 15 countries)
px.bar(dataset1.head(15), x  = 'TotalTests', y = 'Continent', color = 'Continent', orientation = 'h', height = 500, 
       hover_data = ['Country/Region','Continent'])

In [88]:
# Scatter plot of total cases by continent; marker size and color scale with total cases
px.scatter(dataset1, x = 'Continent', y = 'TotalCases', 
           hover_data = ['Country/Region','Continent'],color= 'TotalCases', size = 'TotalCases', size_max=50)

In [87]:
# Scatter plot of total cases by continent for the first 57 countries, with a logarithmic y-axis
px.scatter(dataset1.head(57), x= 'Continent', y= 'TotalCases', 
           hover_data=['Country/Region','Continent'],color= 'TotalCases',size= 'TotalCases', size_max= 80, log_y=True)

In [86]:
# Scatter plot of total tests by continent for the first 54 countries
px.scatter(dataset1.head(54), x = 'Continent',y = 'TotalTests', 
           hover_data = ['Country/Region','Continent'], color = 'TotalTests', size='TotalTests', size_max=80)
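
`TotalTests` has 191 non-null values out of 209 according to the `info()` output, and a missing value in the column used for marker size may cause Plotly Express to raise an error. A cautious sketch that removes such rows before plotting; the rows are taken from the random sample of dataset 1 shown earlier:

```python
import numpy as np
import pandas as pd

# Rows from the earlier sample output; two of them have no test count.
sample = pd.DataFrame({
    "Country/Region": ["Tanzania", "Vatican City", "Costa Rica", "Gabon"],
    "Continent": ["Africa", "Europe", "North America", "Africa"],
    "TotalTests": [np.nan, np.nan, 96110.0, 85369.0],
})

# Keep only the rows where the size column has a value.
plot_ready = sample.dropna(subset=["TotalTests"])
print(plot_ready)
```

`plot_ready` can then be passed to `px.scatter` with `size='TotalTests'`.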

In [85]:
# Scatter plot of total cases by country for the first 100 countries
px.scatter(dataset1.head(100), x ='Country/Region', y='TotalCases',
           hover_data=['Country/Region','Continent'],color='TotalCases',size='TotalCases', size_max=80)

In [84]:
# Scatter plot of total cases for the first 30 countries, colored by country, with a logarithmic y-axis
px.scatter(dataset1.head(30),x='Country/Region',y='TotalCases',
           hover_data=['Country/Region','Continent'],color='Country/Region',size='TotalCases',size_max=80,log_y=True)