# VISUALIZING COVID-19 USING PLOTLY

# **Introduction**

I am using the Kaggle dataset Novel Corona Virus 2019 Dataset (https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset) to visualize COVID-19 using Plotly.

Plotly (https://chart-studio.plotly.com/feed/#/) is an interactive plotting library used to visualize datasets, either in its Chart Studio or as a library imported into a Python notebook.

In this notebook, we will create choropleth maps, bar graphs, pie charts, and line graphs. I look forward to drawing insights from this dataset.



In [1]:


#Importing relevant libraries
import numpy as np 
import pandas as pd 
import plotly as py
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

# Listing the data files available in the Kaggle input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))



/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_confirmed.csv
/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_deaths_US.csv
/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_recovered.csv
/kaggle/input/novel-corona-virus-2019-dataset/COVID19_open_line_list.csv
/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_deaths.csv
/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv
/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_confirmed_US.csv
/kaggle/input/novel-corona-virus-2019-dataset/COVID19_line_list_data.csv


In [2]:
# Reading the data with pandas. If you run this locally, change the path to where the file is stored.

corona_data=pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv')


In [3]:
#Viewing first three rows of data for quick insight
corona_data.head(3)


Unnamed: 0,SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
0,1,01/22/2020,Anhui,Mainland China,1/22/2020 17:00,1.0,0.0,0.0
1,2,01/22/2020,Beijing,Mainland China,1/22/2020 17:00,14.0,0.0,0.0
2,3,01/22/2020,Chongqing,Mainland China,1/22/2020 17:00,6.0,0.0,0.0


You can see that the earliest observations in the data are dated 01/22/2020.

In [4]:
#Viewing last two rows of data
corona_data.tail(2)


Unnamed: 0,SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
14809,14810,04/13/2020,Yunnan,Mainland China,2020-04-13 23:15:42,184.0,2.0,174.0
14810,14811,04/13/2020,Zhejiang,Mainland China,2020-04-13 23:15:42,1267.0,1.0,1239.0


You can see that the most recent observations in the data are dated 04/13/2020.
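Reading the date range off `head()` and `tail()` works, but it can also be computed directly. The sketch below is a minimal example under the assumption that `ObservationDate` is always stored as `MM/DD/YYYY` text, as in the rows shown above. The small `sample` frame is a stand-in built from three of those rows; in the notebook you would use `corona_data` instead.

```python
import pandas as pd

# Three rows copied from the head()/tail() output above; this is a
# stand-in for the full corona_data frame.
sample = pd.DataFrame({
    "ObservationDate": ["01/22/2020", "04/13/2020", "04/13/2020"],
    "Province/State": ["Anhui", "Yunnan", "Zhejiang"],
    "Confirmed": [1.0, 184.0, 1267.0],
})

# ObservationDate is text in MM/DD/YYYY form, so parse it explicitly.
dates = pd.to_datetime(sample["ObservationDate"], format="%m/%d/%Y")

first_day = dates.min()
last_day = dates.max()
print(first_day.date(), "to", last_day.date())
```

Parsing the column also means later sorting is chronological rather than alphabetical, which matters once a dataset spans more than one year.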

# Creating Choropleth Map

We are using a choropleth map to compare the spread of confirmed cases by region.

In [5]:
choro_map=px.choropleth(corona_data, 
                    locations="Country/Region", 
                    locationmode = "country names",
                    color="Confirmed", 
                    hover_name="Country/Region", 
                    animation_frame="ObservationDate"
                   )

choro_map.update_layout(
    title_text = 'Global Spread of Coronavirus',
    title_x = 0.5,
    geo=dict(
        showframe = False,
        showcoastlines = False,
    ))
    
choro_map.show()
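One caveat with the map above: the dataset has one row per province, so a country such as Mainland China appears many times for the same date, and the colour shown for it may reflect a single province rather than the national total. A possible fix, sketched here on the five rows printed earlier, is to sum per country and date before plotting. The `name_fixes` mapping is my own assumption about which labels a `"country names"` lookup is more likely to recognise; it is not part of the dataset or of Plotly.

```python
import pandas as pd

# The head() and tail() rows shown earlier, all for Mainland China.
sample = pd.DataFrame({
    "ObservationDate": ["01/22/2020"] * 3 + ["04/13/2020"] * 2,
    "Province/State": ["Anhui", "Beijing", "Chongqing", "Yunnan", "Zhejiang"],
    "Country/Region": ["Mainland China"] * 5,
    "Confirmed": [1.0, 14.0, 6.0, 184.0, 1267.0],
})

# Collapse province-level rows into one row per country per date.
by_country = (
    sample.groupby(["Country/Region", "ObservationDate"], as_index=False)["Confirmed"]
    .sum()
)

# Hypothetical mapping from the dataset's labels to more standard names.
name_fixes = {"Mainland China": "China", "US": "United States"}
by_country["Country/Region"] = by_country["Country/Region"].replace(name_fixes)

print(by_country)
```

The resulting frame can be passed to `px.choropleth` in place of `corona_data`, with the same `locations`, `color`, and `animation_frame` arguments.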

# Creating Pie Charts

Pie charts are good visuals for quickly comparing variables by showing percentages. Here I am using a pie chart to show the difference in the number of cases between countries.

In [6]:
pie_chart = px.pie(corona_data, values = 'Confirmed',names='Country/Region', height=600)
pie_chart.update_traces(textposition='inside', textinfo='percent+label')

pie_chart.update_layout(
    title_x = 0.5,
    geo=dict(
        showframe = False,
        showcoastlines = False,
    ))

pie_chart.show()

It's clear that the US leads in the number of cases.

Let's use a pie chart to see the top 10 countries by number of cases. There are many ways to do it. One way is to sort by the number of confirmed cases, keep the highest row for each country, and then take the first 10 rows.


In [7]:
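#Manipulating the dataframe
# Sum the provinces per country and date (numeric columns only), largest totals first
top10 = corona_data.groupby(['Country/Region', 'ObservationDate']).sum(numeric_only=True).reset_index().sort_values('Confirmed', ascending=False)
# Keep each country's highest row, then take the first ten
top10  = top10.drop_duplicates(subset = ['Country/Region'])
top10 = top10.iloc[0:10]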
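Another of the "many ways" is to look only at the latest snapshot, since the counts are cumulative. The sketch below shows that approach on made-up numbers for three placeholder countries (`A`, `B`, `C`); none of these values come from the dataset, and it picks the top 2 instead of the top 10 to keep the example small.

```python
import pandas as pd

# Made-up cumulative counts for placeholder countries (not dataset values).
toy = pd.DataFrame({
    "ObservationDate": ["04/12/2020"] * 4 + ["04/13/2020"] * 4,
    "Country/Region": ["A", "A", "B", "C"] * 2,
    "Province/State": ["A1", "A2", None, None] * 2,
    "Confirmed": [10.0, 5.0, 30.0, 2.0, 12.0, 6.0, 33.0, 3.0],
})
toy["ObservationDate"] = pd.to_datetime(toy["ObservationDate"], format="%m/%d/%Y")

# Keep only the most recent snapshot, then total the provinces per country.
latest = toy[toy["ObservationDate"] == toy["ObservationDate"].max()]
totals = latest.groupby("Country/Region")["Confirmed"].sum()

# nlargest(n) returns the n biggest totals, sorted in descending order.
top = totals.nlargest(2).reset_index()
print(top)
```

The trade-off is that a country with no row on the final date drops out entirely, whereas the sort-and-deduplicate approach keeps each country's highest total regardless of date.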


In [8]:
pie_chart_top10 = px.pie(top10, values = 'Confirmed',names='Country/Region', height=600)
pie_chart_top10.update_traces(textposition='inside', textinfo='percent+label')

pie_chart_top10.update_layout(
    title_x = 0.5,
    geo=dict(
        showframe = False,
        showcoastlines = False,
    ))

pie_chart_top10.show()

We can do the same thing to see the least affected countries between January 22 and April 13. Note that the counts are cumulative as of each observation date; they are not exact current totals.

In [9]:
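#Manipulating the dataframe
# Same aggregation as before: per-country, per-date totals, largest first
last20 = corona_data.groupby(['Country/Region', 'ObservationDate']).sum(numeric_only=True).reset_index().sort_values('Confirmed', ascending=False)
last20  = last20.drop_duplicates(subset = ['Country/Region'])
# Take the final 20 rows; the earlier slice [-20:-1] returned only 19 because it excluded the last row
last20 = last20.iloc[-20:]
last20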

Unnamed: 0,Country/Region,ObservationDate,SNo,Confirmed,Deaths,Recovered
3669,Jersey,03/14/2020,5586,2.0,0.0,0.0
1,"('St. Martin',)",03/10/2020,4675,2.0,0.0,0.0
6770,St. Martin,03/09/2020,4412,2.0,0.0,0.0
2222,Faroe Islands,03/10/2020,4672,2.0,0.0,0.0
7177,The Bahamas,03/18/2020,6706,1.0,0.0,0.0
7181,The Gambia,03/17/2020,6429,1.0,0.0,0.0
7862,Yemen,04/13/2020,14670,1.0,0.0,0.0
7717,Vatican City,03/09/2020,4507,1.0,0.0,0.0
0,Azerbaijan,02/28/2020,2664,1.0,0.0,0.0
2474,"Gambia, The",03/18/2020,6696,1.0,0.0,0.0
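The table above shows the same places under several spellings, for example `('St. Martin',)` next to `St. Martin`, and `The Gambia` next to `Gambia, The`. Grouping treats these as different countries. The helper below is a hypothetical sketch of how such variants might be merged before grouping; `clean_country_name` is my own name, and the rules cover only the patterns visible in this table.

```python
import re
import pandas as pd

def clean_country_name(name: str) -> str:
    """Normalise a few label variants seen in the table (hypothetical helper)."""
    name = name.strip()
    # "('St. Martin',)" -> "St. Martin": drop tuple-style wrapping.
    match = re.fullmatch(r"\('(.*)',\)", name)
    if match:
        name = match.group(1)
    # "Gambia, The" -> "Gambia" and "The Gambia" -> "Gambia".
    if name.endswith(", The"):
        name = name[: -len(", The")]
    if name.startswith("The "):
        name = name[len("The "):]
    return name

# Labels copied from the output table.
labels = pd.Series([
    "('St. Martin',)", "St. Martin", "The Gambia", "Gambia, The", "The Bahamas",
])
cleaned = labels.map(clean_country_name)
print(cleaned.tolist())
```

In the notebook, the same function could be applied with `corona_data['Country/Region'].map(clean_country_name)` before any `groupby`, so each place is counted once.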


In [10]:
pie_chart_last20 = px.pie(last20, values = 'Confirmed',names='Country/Region', height=600)
pie_chart_last20.update_traces(textposition='inside', textinfo='percent+label')

pie_chart_last20.update_layout(
    title_x = 0.5,
    geo=dict(
        showframe = False,
        showcoastlines = False,
    ))

pie_chart_last20.show()

# Number of Confirmed Cases Over Time by Bar Graphs

In [11]:
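# Select the three count columns with a list (double brackets); recent pandas versions reject the bare tuple form
bar_data = corona_data.groupby(['Country/Region', 'ObservationDate'])[['Confirmed', 'Deaths', 'Recovered']].sum().reset_index().sort_values('ObservationDate', ascending=True)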


In [12]:
bar_fig = px.bar(bar_data, x="ObservationDate", y="Confirmed", color='Country/Region', text = 'Confirmed', orientation='v', height=1300,width=1000,
             title='Increase in COVID-19 Cases')
bar_fig.show()

*You can see that the increase in cases between 02/12/2020 and 02/15/2020 was larger than on the surrounding days.*
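Because the bars show cumulative totals, a jump like this is easier to confirm by looking at day-to-day differences. The sketch below uses made-up totals for five days (not values from the dataset) to show how `diff()` turns a cumulative series into daily new cases; the column name `NewCases` is my own.

```python
import pandas as pd

# Made-up cumulative totals for five consecutive days (not dataset values).
cumulative = pd.DataFrame({
    "ObservationDate": pd.to_datetime(
        ["02/11/2020", "02/12/2020", "02/13/2020", "02/14/2020", "02/15/2020"],
        format="%m/%d/%Y",
    ),
    "Confirmed": [100.0, 110.0, 160.0, 175.0, 185.0],
})

# diff() subtracts the previous day's total; the first day has no
# predecessor, so its value is NaN.
cumulative = cumulative.sort_values("ObservationDate")
cumulative["NewCases"] = cumulative["Confirmed"].diff()

# idxmax() skips NaN and returns the row label of the largest increase.
biggest_jump = cumulative.loc[cumulative["NewCases"].idxmax()]
print(cumulative)
print("Largest daily increase on", biggest_jump["ObservationDate"].date())
```

Applied to the notebook's data, the same idea would start from the per-date totals (for example `corona_data.groupby('ObservationDate')['Confirmed'].sum()`) after parsing the dates.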

In [13]:
bar_fig2 = px.bar(bar_data, x="ObservationDate", y="Deaths", color='Country/Region', text = 'Deaths', orientation='v', height=1000,width=900,
             title='COVID-19 Deaths from January 22 to April 13')
bar_fig2.show()

In [14]:
bar_fig3 = px.bar(bar_data, x="ObservationDate", y="Recovered", color='Country/Region', text = 'Recovered', orientation='v', height=1000,width=900,
             title='COVID-19 Recovered Cases from January 22 to April 13')
bar_fig3.show()

We see that the numbers of deaths and recovered cases have both been increasing. We can expect both to keep rising, since the number of confirmed cases also increases day by day.

# Comparing Confirmed Cases, Deaths, and Recovered Cases from January 22 to April 13

In [15]:
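# Total the numeric columns per date; numeric_only avoids errors on the text columns
line_data = corona_data.groupby('ObservationDate').sum(numeric_only=True).reset_index()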

line_data = line_data.melt(id_vars='ObservationDate', 
                 value_vars=['Confirmed', 
                             'Recovered', 
                             'Deaths'], 
                 var_name='Ratio', 
                 value_name='Value')

line_fig = px.line(line_data, x="ObservationDate", y="Value", line_shape="spline",color='Ratio', 
              title='Confirmed Cases, Recovered Cases, and Deaths Over Time')
line_fig.show()

*It's now clear and easy to see the numbers of confirmed cases, recoveries, and deaths. A promising sign is that recovered cases outnumber deaths. Thanks to visualization, we can see the differences at a glance.*
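The line chart relies on `melt` to reshape the data, which is easy to miss in the cell above. As a small illustration, the sketch below applies the same reshaping to the two `tail()` rows shown earlier. The column name `Metric` is my own choice here; the notebook uses `Ratio` for the same role.

```python
import pandas as pd

# The two tail() rows shown earlier, reduced to the numeric columns.
wide = pd.DataFrame({
    "ObservationDate": ["04/13/2020", "04/13/2020"],
    "Confirmed": [184.0, 1267.0],
    "Deaths": [2.0, 1.0],
    "Recovered": [174.0, 1239.0],
})

# One row per date, with the provinces summed.
daily = wide.groupby("ObservationDate", as_index=False).sum()

# melt stacks the three count columns into (Metric, Value) pairs, giving
# one row per date and series, so px.line can colour by the Metric column.
long_form = daily.melt(
    id_vars="ObservationDate",
    value_vars=["Confirmed", "Recovered", "Deaths"],
    var_name="Metric",
    value_name="Value",
)
print(long_form)
```

For these two provinces the long table has three rows for 04/13/2020, one per series, which is exactly the structure the `color` argument needs to draw three separate lines.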