# Introduction

The COVID-19 pandemic has taken the world by storm. The virus has crippled the economies of many developed countries and brought everyday life to a halt. In this report, we analyze COVID-19 data globally and locally to find meaningful insights and patterns.


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.graph_objs as go
import plotly.offline as py
import plotly.express as px
import seaborn as sns

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
import pandas as pd
Case = pd.read_csv("../input/coronavirusdataset/Case.csv")
PatientInfo = pd.read_csv("../input/coronavirusdataset/PatientInfo.csv")
PatientRoute = pd.read_csv("../input/coronavirusdataset/PatientRoute.csv")
Region = pd.read_csv("../input/coronavirusdataset/Region.csv")
SearchTrend = pd.read_csv("../input/coronavirusdataset/SearchTrend.csv")
Time = pd.read_csv("../input/coronavirusdataset/Time.csv")
TimeAge = pd.read_csv("../input/coronavirusdataset/TimeAge.csv")
TimeGender = pd.read_csv("../input/coronavirusdataset/TimeGender.csv")
TimeProvince = pd.read_csv("../input/coronavirusdataset/TimeProvince.csv")
confirmed = pd.read_excel("../input/covid19-coronavirus/time_series_covid19_confirmed.xlsx")
deaths = pd.read_excel("../input/covid19-coronavirus/time_series_covid19_deaths.xlsx")
recovered = pd.read_excel("../input/covid19-coronavirus/time_series_covid19_recovered.xlsx")

In [None]:
china_cases = confirmed.loc[confirmed["Country/Region"]=="China"].sum().reset_index()
china_cases = china_cases.iloc[4:]

saudiarabia_cases = confirmed.loc[confirmed["Country/Region"]=="Saudi Arabia"].sum().reset_index()
saudiarabia_cases = saudiarabia_cases.iloc[4:]


unitedKingdom_cases = confirmed.loc[confirmed["Country/Region"]=="United Kingdom"].sum().reset_index()
unitedKingdom_cases = unitedKingdom_cases.iloc[4:]

southkorea_cases = confirmed.loc[confirmed["Country/Region"]=="Korea, South"].sum().reset_index()
southkorea_cases = southkorea_cases.iloc[4:]

bahrain_cases = confirmed.loc[confirmed["Country/Region"]=="Bahrain"].sum().reset_index()
bahrain_cases = bahrain_cases.iloc[4:]

kuwait_cases = confirmed.loc[confirmed["Country/Region"]=="Kuwait"].sum().reset_index()
kuwait_cases = kuwait_cases.iloc[4:]

timeline = go.Figure()

timeline.add_trace(go.Scatter(x=china_cases["index"], y=china_cases[0], name="Cases in China",
                          line_color='red'))

timeline.add_trace(go.Scatter(x=bahrain_cases["index"], y=bahrain_cases[0], name="Cases in Bahrain",
                          line_color='deepskyblue'))

timeline.add_trace(go.Scatter(x=southkorea_cases["index"], y=southkorea_cases[0], name="Cases in South Korea",
                          line_color='purple'))

timeline.add_trace(go.Scatter(x=kuwait_cases["index"], y=kuwait_cases[0], name="Cases in Kuwait",
                          line_color='darkorange'))

timeline.add_trace(go.Scatter(x=saudiarabia_cases["index"], y=saudiarabia_cases[0], name="Cases in Saudi",
                          line_color='green'))

timeline.add_trace(go.Scatter(x=unitedKingdom_cases["index"], y=unitedKingdom_cases[0], name="Cases in UK",
                          line_color='blue'))

timeline.update_layout(title_text='Spread of Corona over a period of Time', yaxis_type = "log",
                  xaxis_rangeslider_visible=True)
timeline.show()

# Global Overview
We analyzed the spread of COVID-19 in China, Bahrain, South Korea, Kuwait, Saudi Arabia and the United Kingdom. In all of these countries the virus has followed exponential growth. China's exponential growth is receding, while the other countries are still growing exponentially. Essentially, all the countries are following the epidemic curve:

# Epidemic Curve

![epidemic curve](https://miro.medium.com/max/1700/1*O8g9T36LCp7A8jqvP9MNlQ.jpeg)

Credit: https://towardsdatascience.com/classify-growth-patterns-for-covid-19-data-41af4c7adc55

The slowdown in China's spread indicates that it has passed the midpoint of the epidemic curve, while Saudi Arabia, the United Kingdom, Kuwait and Bahrain are still rising exponentially. South Korea's spread is also slowing, less than China's but more than in the other countries listed. We selected South Korea for analysis in the following sections because of the availability of its COVID-19 dataset and the similarity of its containment measures to Saudi Arabia's.
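
One way to make "still rising exponentially" versus "slowing" concrete is to fit a straight line to the logarithm of cumulative cases over a recent window and convert the slope into a doubling time. The following is a minimal sketch under assumptions: the function name `doubling_time_days`, the 7-day window and the synthetic series are illustrative choices, not part of the original analysis.

```python
import numpy as np
import pandas as pd

def doubling_time_days(cumulative, window=7):
    """Estimate the doubling time (in days) from the last `window` points of a
    cumulative case series by fitting a line to log(cases) vs. day index."""
    recent = pd.Series(cumulative).astype(float).tail(window)
    recent = recent[recent > 0]           # log is undefined for zero counts
    if len(recent) < 2:
        return np.nan                      # not enough data to fit a line
    days = np.arange(len(recent))
    slope, _ = np.polyfit(days, np.log(recent.to_numpy()), 1)
    if slope <= 0:
        return np.inf                      # no growth within the window
    return np.log(2) / slope

# Synthetic series (not real data): cases that double every 3 days
synthetic = 10 * 2 ** (np.arange(14) / 3)
print(round(doubling_time_days(synthetic), 2))
```

With the notebook's data, a call such as `doubling_time_days(saudiarabia_cases[0])` would presumably work on the per-country series built above. A lengthening doubling time over successive windows would be consistent with a country moving past the steep part of the curve.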

# Recovered & Deceased

In [None]:
# Filter each country using the deaths table's own "Country/Region" column;
# a mask built from `confirmed` is only valid if both tables share the same rows.
china_cases = deaths.loc[deaths["Country/Region"]=="China"].sum().reset_index()
china_cases = china_cases.iloc[4:]

saudiarabia_cases = deaths.loc[deaths["Country/Region"]=="Saudi Arabia"].sum().reset_index()
saudiarabia_cases = saudiarabia_cases.iloc[4:]


unitedKingdom_cases = deaths.loc[deaths["Country/Region"]=="United Kingdom"].sum().reset_index()
unitedKingdom_cases = unitedKingdom_cases.iloc[4:]

southkorea_cases = deaths.loc[deaths["Country/Region"]=="Korea, South"].sum().reset_index()
southkorea_cases = southkorea_cases.iloc[4:]

bahrain_cases = deaths.loc[deaths["Country/Region"]=="Bahrain"].sum().reset_index()
bahrain_cases = bahrain_cases.iloc[4:]

kuwait_cases = deaths.loc[deaths["Country/Region"]=="Kuwait"].sum().reset_index()
kuwait_cases = kuwait_cases.iloc[4:]

timeline = go.Figure()

# Legend entries name the quantity actually plotted in this figure (deaths)
timeline.add_trace(go.Scatter(x=china_cases["index"], y=china_cases[0], name="Deaths in China",
                          line_color='red'))

timeline.add_trace(go.Scatter(x=bahrain_cases["index"], y=bahrain_cases[0], name="Deaths in Bahrain",
                          line_color='deepskyblue'))

timeline.add_trace(go.Scatter(x=southkorea_cases["index"], y=southkorea_cases[0], name="Deaths in South Korea",
                          line_color='purple'))

timeline.add_trace(go.Scatter(x=kuwait_cases["index"], y=kuwait_cases[0], name="Deaths in Kuwait",
                          line_color='darkorange'))

timeline.add_trace(go.Scatter(x=saudiarabia_cases["index"], y=saudiarabia_cases[0], name="Deaths in Saudi",
                          line_color='green'))

timeline.add_trace(go.Scatter(x=unitedKingdom_cases["index"], y=unitedKingdom_cases[0], name="Deaths in UK",
                          line_color='blue'))

timeline.update_layout(title_text='Deaths over a period of Time', yaxis_type = "log",
                  xaxis_rangeslider_visible=True)
timeline.show()

In [None]:
# Filter each country using the recovered table's own "Country/Region" column;
# a mask built from `confirmed` is only valid if both tables share the same rows.
china_cases = recovered.loc[recovered["Country/Region"]=="China"].sum().reset_index()
china_cases = china_cases.iloc[4:]

saudiarabia_cases = recovered.loc[recovered["Country/Region"]=="Saudi Arabia"].sum().reset_index()
saudiarabia_cases = saudiarabia_cases.iloc[4:]


unitedKingdom_cases = recovered.loc[recovered["Country/Region"]=="United Kingdom"].sum().reset_index()
unitedKingdom_cases = unitedKingdom_cases.iloc[4:]

southkorea_cases = recovered.loc[recovered["Country/Region"]=="Korea, South"].sum().reset_index()
southkorea_cases = southkorea_cases.iloc[4:]

bahrain_cases = recovered.loc[recovered["Country/Region"]=="Bahrain"].sum().reset_index()
bahrain_cases = bahrain_cases.iloc[4:]

kuwait_cases = recovered.loc[recovered["Country/Region"]=="Kuwait"].sum().reset_index()
kuwait_cases = kuwait_cases.iloc[4:]

timeline = go.Figure()

# Legend entries name the quantity actually plotted in this figure (recoveries)
timeline.add_trace(go.Scatter(x=china_cases["index"], y=china_cases[0], name="Recovered in China",
                          line_color='red'))

timeline.add_trace(go.Scatter(x=bahrain_cases["index"], y=bahrain_cases[0], name="Recovered in Bahrain",
                          line_color='deepskyblue'))

timeline.add_trace(go.Scatter(x=southkorea_cases["index"], y=southkorea_cases[0], name="Recovered in South Korea",
                          line_color='purple'))

timeline.add_trace(go.Scatter(x=kuwait_cases["index"], y=kuwait_cases[0], name="Recovered in Kuwait",
                          line_color='darkorange'))

timeline.add_trace(go.Scatter(x=saudiarabia_cases["index"], y=saudiarabia_cases[0], name="Recovered in Saudi",
                          line_color='green'))

timeline.add_trace(go.Scatter(x=unitedKingdom_cases["index"], y=unitedKingdom_cases[0], name="Recovered in UK",
                          line_color='blue'))

timeline.update_layout(title_text='Recovery over a period of Time', yaxis_type = "log",
                  xaxis_rangeslider_visible=True)
timeline.show()

In [None]:
Time['date'] = pd.to_datetime(Time['date'])

fig1 = go.Figure() # Create a fig model
# Add a trace for each column you want presented. We are using Scatter
fig1.add_trace(go.Scatter(x=Time['date'], y=Time['confirmed'], fill='tozeroy', name='confirmed'))
fig1.add_trace(go.Scatter(x=Time['date'], y=Time['released'], fill='tozeroy', name='released'))
fig1.add_trace(go.Scatter(x=Time['date'], y=Time['deceased'], fill='tozeroy', name='deceased'))

# Make it look nice..
fig1.update_layout(
    title = "Confirmed, Released and Deceased over Time",
    yaxis_title = "Number of Cases",
    font = dict(
        family="Arial, monospace",
        size=15,
        color="#7f7f7f"
    )
)
py.iplot(fig1)

# South Korea
We have seen that the spread of the virus follows an exponential pattern, and the confirmed cases reported within South Korea follow the same pattern. Notably, the released and deceased cases represent only a small part of the confirmed cases. This means that many patients have remained under close supervision in hospitals since the start of the pandemic. We would like to investigate the ratio of released to deceased cases over time.

In [None]:
fig2 = go.Figure() # Create a fig model
# Avoid dividing by zero on days with no deaths yet: treat 0 deceased as missing
ratio = Time['released'] / Time['deceased'].replace(0, np.nan)
fig2.add_trace(go.Scatter(x=Time['date'], y=ratio, fill='tozeroy', name='released/deceased'))

# Make it look nice..
fig2.update_layout(
    title = "Released/Deceased Ratio over Time",
    yaxis_title = "Released / Deceased",
    font = dict(
        family="Arial, monospace",
        size=15,
        color="#7f7f7f"
    )
)
py.iplot(fig2)

Deaths relative to released cases were highest in early March. After that point, the number of recovered cases that were released rose steadily. We would like to know the age distribution of the deceased and released cases in South Korea.

In [None]:
PatientInfo['Age'] = 2020-PatientInfo['birth_year']
fig3 = px.histogram(PatientInfo[PatientInfo['state']=='deceased'],x="Age",marginal="box",nbins=20)
fig3.update_layout(
    title = "Number of deceased cases by age",
    xaxis_title="Age",
    yaxis_title="number of cases",
    barmode="group",
    bargap=0.1,
    xaxis = dict(
        tickmode = 'linear',
        tick0 = 0,
        dtick = 10),
    font=dict(
        family="Arial, monospace",
        size=15,
        color="#7f7f7f"
    )
    )
py.iplot(fig3)

fig4 = px.histogram(PatientInfo[PatientInfo['state']=='released'],x="Age",marginal="box",nbins=20)
fig4.update_layout(
    title = "Number of released cases by age",
    xaxis_title="Age",
    yaxis_title="number of cases",
    barmode="group",
    bargap=0.1,
    xaxis = dict(
        tickmode = 'linear',
        tick0 = 0,
        dtick = 10),
    font=dict(
        family="Arial, monospace",
        size=15,
        color="#7f7f7f"
    )
    )
py.iplot(fig4)

The first histogram shows the number of deceased cases by age. The median age is 69, indicating that COVID-19 is deadlier for older people. The few young outliers may have been cases with pre-existing conditions. For released cases, the median age is 37: most survivors come from the younger population. However, there is an outlier aged 84, and many cases above 60 also survived COVID-19.
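
The medians quoted above are read from the box plots, but they can also be computed directly with a group-by. The following is a minimal sketch, assuming the same `birth_year` and `state` columns as `PatientInfo`; the `patients` frame holds made-up rows for illustration only, not the real dataset.

```python
import pandas as pd

# Hypothetical rows standing in for PatientInfo (illustrative values only)
patients = pd.DataFrame({
    "birth_year": [1950, 1945, 1990, 1983, 1936, 1975, None],
    "state": ["deceased", "deceased", "released", "released",
              "released", "isolated", "released"],
})
patients["Age"] = 2020 - patients["birth_year"]

# Drop patients with an unknown birth year, then summarise age per state
age_summary = (
    patients.dropna(subset=["Age"])
    .groupby("state")["Age"]
    .agg(["count", "median", "min", "max"])
)
print(age_summary)
```

Running the same aggregation on `PatientInfo` itself would give the per-state counts alongside the medians, which helps judge how much weight a single outlier should carry.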

In [None]:
PatientInfo['Age'] = 2020-PatientInfo['birth_year']
fig5 = px.histogram(PatientInfo[PatientInfo['state']=='isolated'],x="Age",marginal="box",nbins=20)
fig5.update_layout(
    title = "Age distribution of isolated cases",
    xaxis_title="Age",
    yaxis_title="number of cases",
    barmode="group",
    bargap=0.1,
    xaxis = dict(
        tickmode = 'linear',
        tick0 = 0,
        dtick = 10),
    font=dict(
        family="Arial, monospace",
        size=15,
        color="#7f7f7f"
    )
    )
py.iplot(fig5)

In [None]:

SaudiCity = pd.read_csv("../input/saudicovid19/saudi_cityy.csv" )


temp = SaudiCity.sort_values('Total', ascending=True)
#state_order = SaudiCity['City']

fig = px.bar(temp,x="Total", y="city ", color='Total', title='Total Cases per City in March', orientation='h', width=800,color_discrete_sequence=px.colors.qualitative.Vivid)
fig.show()



# Saudi Arabia
Saudi Arabia is still at the beginning of the epidemic curve. Cases are growing exponentially, as observed in other countries, some of which have applied containment measures. Saudi Arabia's containment measures show some promise of flattening the peak of the epidemic curve. The distribution of cases by city for March is displayed above.

# Conclusion
In this report, we started by showing countries with different geographic areas, demographics and containment measures. We selected South Korea, which has a publicly available and regularly updated dataset, to compare with the growth of cases in Saudi Arabia. Like Saudi Arabia, South Korea has applied strict containment measures. We investigated the age distribution of the cases in South Korea and found that COVID-19 proved more fatal to older people, while younger people tended to be released after recovering. The released-to-deceased ratio for South Korea showed a dip in deceased cases in the middle of the spread in early March, after which there has been a steady increase in recovered cases. We expect Saudi Arabia to follow the same pattern on a smaller scale, bearing in mind the different daily routines and population sizes of the two countries: 51 million for South Korea versus 32.94 million for Saudi Arabia.
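
Because the two populations differ, a per-capita rate is one simple way to put case counts on a comparable footing. The sketch below takes South Korea's population as roughly 51 million and Saudi Arabia's as 32.94 million; the helper `cases_per_million` and the case counts in `example_cases` are placeholders chosen for illustration, not figures from the datasets.

```python
# Populations in people: roughly 51 million and 32.94 million
POPULATION = {"South Korea": 51_000_000, "Saudi Arabia": 32_940_000}

def cases_per_million(cases, country):
    """Convert an absolute case count into cases per million inhabitants."""
    return cases / POPULATION[country] * 1_000_000

# Placeholder counts for illustration only (NOT real data)
example_cases = {"South Korea": 10_200, "Saudi Arabia": 1_647}
for country, cases in example_cases.items():
    print(country, round(cases_per_million(cases, country), 1))
```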

**Credits**: Our contributions in this report are the Saudi Arabia COVID-19 dataset as of March 28th, which maps COVID-19 cases to Saudi cities, and the statistical analysis. The Saudi Arabia cities dataset was compiled manually by Hadeel Al-Dhubaiban, and the illustrations and analysis were done by Asaad AlGhamdi & Hadeel Al-Dhubaiban. The other datasets used are public COVID-19 Kaggle datasets.