
# Introduction

This analysis of COVID-19 data focuses on understanding and monitoring the pandemic situation across countries. It collects and presents the total number of cases, deaths, recoveries, and active cases by country, and examines how confirmed cases, deaths, recoveries, and active cases evolved over time. It also considers the ratio of population to the number of tests conducted as a rough indicator of testing coverage. Finally, with a focus on the most affected countries, it identifies the nations with the highest numbers of confirmed cases, deaths, active cases, and total recoveries.

# Load Data

In [1]:
import os
import pandas as pd

data_folder = '/kaggle/input/corona-virus-report'
data_files = os.listdir(data_folder)

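# NOTE: os.listdir returns entries in arbitrary order, so the positional
# indices used below depend on the listing order in this environment.
data = [pd.read_csv(os.path.join(data_folder, file)) for file in data_files]
world_data, day_wise, group_data, usa_data, province_data = data[4], data[2], data[5], data[3], data[0]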
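
Because `os.listdir` gives no ordering guarantee, unpacking the list by position can silently bind the wrong file to a name. A minimal sketch of one alternative is to key each DataFrame by its file stem. The helper name `load_csvs_by_name` and the file names in the demo are hypothetical and are not the dataset's actual file names.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csvs_by_name(folder):
    """Return a dict mapping each CSV's file stem to its DataFrame."""
    return {
        path.stem: pd.read_csv(path)
        for path in sorted(Path(folder).glob('*.csv'))
    }


# Demo on a throwaway folder; 'daily' and 'world' are made-up file names.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, 'world.csv').write_text('Country/Region,TotalCases\nA,10\n')
    Path(folder, 'daily.csv').write_text('Date,Confirmed\n2020-01-22,5\n')
    frames = load_csvs_by_name(folder)

print(sorted(frames))          # file stems, independent of listing order
print(frames['world'].shape)   # (rows, columns) of one loaded frame
```

Looking frames up by name (`frames['world']`) keeps the assignment stable even if files are added to the folder later.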

# Let's analyze the data

#### Which country has the highest total cases, deaths, recoveries and active cases?

In [2]:
world_data.head()

| | Country/Region | Continent | Population | TotalCases | NewCases | TotalDeaths | NewDeaths | TotalRecovered | NewRecovered | ActiveCases | Serious,Critical | Tot Cases/1M pop | Deaths/1M pop | TotalTests | Tests/1M pop | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USA | North America | 331198100.0 | 5032179 | NaN | 162804.0 | NaN | 2576668.0 | NaN | 2292707.0 | 18296.0 | 15194.0 | 492.0 | 63139605.0 | 190640.0 | Americas |
| 1 | Brazil | South America | 212710700.0 | 2917562 | NaN | 98644.0 | NaN | 2047660.0 | NaN | 771258.0 | 8318.0 | 13716.0 | 464.0 | 13206188.0 | 62085.0 | Americas |
| 2 | India | Asia | 1381345000.0 | 2025409 | NaN | 41638.0 | NaN | 1377384.0 | NaN | 606387.0 | 8944.0 | 1466.0 | 30.0 | 22149351.0 | 16035.0 | South-EastAsia |
| 3 | Russia | Europe | 145940900.0 | 871894 | NaN | 14606.0 | NaN | 676357.0 | NaN | 180931.0 | 2300.0 | 5974.0 | 100.0 | 29716907.0 | 203623.0 | Europe |
| 4 | South Africa | Africa | 59381570.0 | 538184 | NaN | 9604.0 | NaN | 387316.0 | NaN | 141264.0 | 539.0 | 9063.0 | 162.0 | 3149807.0 | 53044.0 | Africa |


In [3]:
world_data.shape

(209, 16)

In [4]:
world_data.columns

Index(['Country/Region', 'Continent', 'Population', 'TotalCases', 'NewCases',
       'TotalDeaths', 'NewDeaths', 'TotalRecovered', 'NewRecovered',
       'ActiveCases', 'Serious,Critical', 'Tot Cases/1M pop', 'Deaths/1M pop',
       'TotalTests', 'Tests/1M pop', 'WHO Region'],
      dtype='object')

**Features Explanation**

- **Country/Region**: This is the name of the country or geographic region associated with the COVID-19 data in the dataset.
- **Continent**: This is the continent on which the country or region is located. For example, North America, Europe, Asia, etc.
- **Population**: The total population of the country or region.
- **TotalCases**: This is the total number of COVID-19 cases reported in the country or region to date.
- **NewCases**: This is the number of new COVID-19 cases reported in a specific time period, often daily or weekly.
- **TotalDeaths**: Total COVID-19 deaths reported in the country or region to date.
- **NewDeaths**: This is the number of new deaths from COVID-19 reported in a specific time period, often daily or weekly.
- **TotalRecovered**: The total number of patients who have recovered from COVID-19 in the country or region.
- **NewRecovered**: The number of patients who recovered from COVID-19 in a specific time period, often daily or weekly.
- **ActiveCases**: The current number of active COVID-19 cases, calculated as total cases minus total deaths and total recoveries.
- **Serious,Critical**: Number of serious or critical cases requiring intensive medical care.
- **Tot Cases/1M pop**: This is the total number of COVID-19 cases per million population.
- **Deaths/1M pop**: This is the total number of deaths from COVID-19 per million population.
- **TotalTests**: The total number of COVID-19 tests that have been carried out in the country or region.
- **Tests/1M pop**: This is the total number of COVID-19 tests per million residents.
- **WHO Region**: The World Health Organization (WHO) region to which the country or region belongs.
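
As a quick check of the derived columns described above, the sketch below recomputes `ActiveCases` and the per-million rates from the USA row shown in `world_data.head()`. The helper `per_million` is a hypothetical name introduced only for this illustration.

```python
# Values copied from the USA row of world_data.head() above.
usa = {
    'Population': 331198100,
    'TotalCases': 5032179,
    'TotalDeaths': 162804,
    'TotalRecovered': 2576668,
    'TotalTests': 63139605,
}

# Active cases: total cases minus deaths and recoveries.
active_cases = usa['TotalCases'] - usa['TotalDeaths'] - usa['TotalRecovered']


def per_million(count, population):
    """Scale a raw count to a rate per one million inhabitants."""
    return count / population * 1_000_000


cases_per_million = per_million(usa['TotalCases'], usa['Population'])
deaths_per_million = per_million(usa['TotalDeaths'], usa['Population'])
tests_per_million = per_million(usa['TotalTests'], usa['Population'])

print(active_cases)
print(round(cases_per_million), round(deaths_per_million), round(tests_per_million))
```

After rounding, the results agree with the `ActiveCases`, `Tot Cases/1M pop`, `Deaths/1M pop`, and `Tests/1M pop` values in that row.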

In [5]:
import plotly.express as px

# Create a color scale for the treemaps
color_scale = px.colors.qualitative.Plotly

columns = ['TotalCases', 'TotalDeaths', 'TotalRecovered', 'ActiveCases']

# Create a dictionary to specify labels for columns
labels = {
    'TotalCases': 'Total Cases',
    'TotalDeaths': 'Total Deaths',
    'TotalRecovered': 'Total Recovered',
    'ActiveCases': 'Active Cases'
}

for i in columns:
    # Create the treemap
    fig = px.treemap(
                     world_data.iloc[:20],
                     values=i,
                     path=['Country/Region'],
                     color=i,
                     color_continuous_scale=color_scale
    )

    # Customize the layout
    fig.update_layout(
        title=f'Treemap of {labels[i]} by Country',
        margin=dict(l=0, r=0, b=0, t=40),
    )

    # Customize the tile borders, hover text, and tile labels
    fig.update_traces(
        marker_line_width=1.5,
        hovertemplate='<b>%{label}</b><br>%{value}<extra></extra>',
        textinfo='label+value+percent parent'
    )

    # Show the plot
    fig.show()

#### What is the trajectory of confirmed cases, deaths, recoveries, and active cases?

In [6]:
day_wise.head()

| | Date | Confirmed | Deaths | Recovered | Active | New cases | New deaths | New recovered | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | No. of countries |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | 555 | 17 | 28 | 510 | 0 | 0 | 0 | 3.06 | 5.05 | 60.71 | 6 |
| 1 | 2020-01-23 | 654 | 18 | 30 | 606 | 99 | 1 | 2 | 2.75 | 4.59 | 60.0 | 8 |
| 2 | 2020-01-24 | 941 | 26 | 36 | 879 | 287 | 8 | 6 | 2.76 | 3.83 | 72.22 | 9 |
| 3 | 2020-01-25 | 1434 | 42 | 39 | 1353 | 493 | 16 | 3 | 2.93 | 2.72 | 107.69 | 11 |
| 4 | 2020-01-26 | 2118 | 56 | 52 | 2010 | 684 | 14 | 13 | 2.64 | 2.46 | 107.69 | 13 |


In [7]:
day_wise.shape

(188, 12)

In [8]:
day_wise.columns

Index(['Date', 'Confirmed', 'Deaths', 'Recovered', 'Active', 'New cases',
       'New deaths', 'New recovered', 'Deaths / 100 Cases',
       'Recovered / 100 Cases', 'Deaths / 100 Recovered', 'No. of countries'],
      dtype='object')

**Features Explanation**

- **Date**: This feature represents the date or timestamp for which the COVID-19 data is reported. It's essential for tracking the progression of the pandemic over time.
- **Confirmed**: The cumulative number of confirmed COVID-19 cases, summed over all reporting countries, up to the given date.
- **Deaths**: The cumulative number of deaths attributed to COVID-19 up to the given date.
- **Recovered**: The cumulative number of individuals who have recovered from COVID-19 up to the given date.
- **Active**: The number of active COVID-19 cases, calculated by subtracting the total deaths and total recoveries from the total confirmed cases.
- **New cases**: This represents the number of new confirmed COVID-19 cases reported on the specific date.
- **New deaths**: The count of new COVID-19-related deaths reported on the specific date.
- **New recovered**: The number of individuals who have newly recovered from COVID-19 on the specific date.
- **Deaths / 100 Cases**: This is a percentage, calculated by dividing the total number of deaths by the total number of confirmed cases and multiplying by 100. It indicates the percentage of cases that result in death.
- **Recovered / 100 Cases**: Similar to 'Deaths / 100 Cases,' this percentage is calculated by dividing the total number of recoveries by the total number of confirmed cases and multiplying by 100. It shows the percentage of cases that have resulted in recovery.
- **Deaths / 100 Recovered**: Calculated by dividing the total number of deaths by the total number of recoveries and multiplying by 100. It gives the number of deaths per 100 recoveries, so it can exceed 100 when deaths outnumber recoveries.
- **No. of countries**: This feature provides the count of the number of countries or regions for which the COVID-19 data is reported on the specific date, indicating the geographic coverage of the dataset.
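
The derived columns above can be reproduced from the cumulative counts. The sketch below does so for the first two rows of `day_wise.head()`. The function name `derived_metrics` is hypothetical and only serves this illustration.

```python
# First two rows of day_wise.head() above.
day1 = {'Confirmed': 555, 'Deaths': 17, 'Recovered': 28}
day2 = {'Confirmed': 654, 'Deaths': 18, 'Recovered': 30}


def derived_metrics(row):
    """Recompute the derived day_wise columns from the cumulative counts."""
    return {
        'Active': row['Confirmed'] - row['Deaths'] - row['Recovered'],
        'Deaths / 100 Cases': round(100 * row['Deaths'] / row['Confirmed'], 2),
        'Recovered / 100 Cases': round(100 * row['Recovered'] / row['Confirmed'], 2),
        'Deaths / 100 Recovered': round(100 * row['Deaths'] / row['Recovered'], 2),
    }


metrics = derived_metrics(day1)
# New cases are the day-over-day change in the cumulative count.
new_cases = day2['Confirmed'] - day1['Confirmed']

print(metrics)
print(new_cases)
```

The recomputed values match the 2020-01-22 row, and the day-over-day difference matches `New cases` for 2020-01-23.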

In [9]:
import plotly.express as px

# Create a color scale for the lines
color_scale = px.colors.qualitative.Set1

fig = px.line(
              day_wise,
              x='Date',
              y=['Confirmed', 'Deaths', 'Recovered', 'Active'],
              color_discrete_sequence=color_scale
)

# Customize the layout with meaningful titles
fig.update_layout(
    title='COVID-19 Cases Progression Over Time',
    xaxis_title='Date',
    yaxis_title='Count',
    legend_title='Category',
    margin=dict(l=50, r=20, b=30, t=60),
)

# Show the plot
fig.show()

#### How can the ratio of population to the number of tests conducted be effectively represented graphically?

In [10]:
# Population divided by total tests, i.e. the number of people per test conducted
top_20 = world_data.iloc[:20].copy()
top_20['Population per Test'] = top_20['Population'] / top_20['TotalTests']

fig = px.bar(
             top_20,
             x='Country/Region',
             y='Population per Test',
             color='Country/Region',
             title='Population per Test by Country',
             color_discrete_sequence=px.colors.qualitative.Set1
)

# Customize the layout
fig.update_layout(
    xaxis_title="Country/Region",
    yaxis_title='Population per Test',
    xaxis_tickangle=-45,  # Rotate x-axis labels for better readability
    margin=dict(l=60, r=20, b=60, t=40),
)

# Show the plot
fig.show()
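
The quantity plotted above is population divided by total tests, that is, the number of people per test conducted, so a lower bar means more testing relative to population. As a small worked example using the USA row from `world_data.head()`, the same number can also be recovered from the `Tests/1M pop` column:

```python
# Values copied from the USA row of world_data.head() above.
population = 331198100
total_tests = 63139605
tests_per_million = 190640  # the 'Tests/1M pop' column

# People per test, computed directly from the raw counts.
people_per_test = population / total_tests
# The same quantity recovered from the per-million column.
people_per_test_from_rate = 1_000_000 / tests_per_million

print(round(people_per_test, 2), round(people_per_test_from_rate, 2))
```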

#### Which 20 nations have been significantly impacted by the coronavirus?

In [11]:
# Create a DataFrame with the top 20 rows from world_data
top_20_data = world_data.iloc[:20]

# Define the colors for the bars
colors = ['rgba(31, 119, 180, 0.8)', 'rgba(255, 127, 14, 0.8)', 'rgba(44, 160, 44, 0.8)', 'rgba(214, 39, 40, 0.8)', 'rgba(148, 103, 189, 0.8)']

# Create a Plotly Express figure
fig = px.bar(
    top_20_data,
    x='Country/Region',
    y=['Serious,Critical', 'TotalDeaths', 'TotalRecovered', 'ActiveCases', 'TotalCases'],
    labels={'x': 'Country/Region', 'value': 'Count'},
    title='COVID-19 Statistics by Country/Region (Top 20)',
    color_discrete_sequence=colors
)

# Customize the layout
fig.update_layout(
    xaxis_title_text="Country/Region",
    yaxis_title="Count",
    xaxis_tickangle=-45,
    legend_title_text='',
    title_text="COVID-19 Statistics by Country (Top 20)",
    title_font_size=20,
    showlegend=True,
)

# Show the figure
fig.show()

#### Which 20 countries have the highest number of confirmed cases?

In [12]:
import plotly.graph_objects as go

# Select the 20 countries with the most total cases
sorted_data = world_data.sort_values(by='TotalCases', ascending=False)[:20]

# Create a color scale for the bars
color_scale = px.colors.sequential.Viridis[::-1]

fig = go.Figure()

# Create a horizontal bar chart
fig.add_trace(
    go.Bar(
        x=sorted_data['TotalCases'],
        y=sorted_data['Country/Region'],
        orientation='h',
        marker=dict(
            color=sorted_data['TotalCases'],
            colorscale=color_scale,
            showscale=True,  # Show the color bar for the bar colors
            colorbar=dict(title=dict(text='Total Cases')),
        ),
        text=sorted_data['TotalCases'],
        textposition='inside',
        texttemplate='%{text}',
    )
)

# Customize the layout
fig.update_layout(
    title='Top 20 Countries by Total Cases',
    xaxis_title='Total Cases',
    yaxis_title='Country',
    showlegend=False,  # Hide the legend
    margin=dict(l=120, r=20, t=70, b=20),  # Adjust margins
)

fig.show()

#### Which 20 countries have the highest number of total deaths?

In [13]:
# Select the 20 countries with the most total deaths
sorted_data = world_data.sort_values(by='TotalDeaths', ascending=False)[:20]

# Create a color scale for the bars
color_scale = px.colors.sequential.Viridis[::-1]

fig = go.Figure()

# Create a horizontal bar chart
fig.add_trace(
    go.Bar(
        x=sorted_data['TotalDeaths'],
        y=sorted_data['Country/Region'],
        orientation='h',
        marker=dict(
            color=sorted_data['TotalDeaths'],
            colorscale=color_scale,
            showscale=True,  # Show the color bar for the bar colors
            colorbar=dict(title=dict(text='Total Deaths')),
        ),
        text=sorted_data['TotalDeaths'],
        textposition='inside',
        texttemplate='%{text}',
    )
)

# Customize the layout
fig.update_layout(
    title='Top 20 Countries by Total Deaths',
    xaxis_title='Total Deaths',
    yaxis_title='Country',
    showlegend=False,  # Hide the legend
    margin=dict(l=120, r=20, t=70, b=20),  # Adjust margins
)

fig.show()

#### Which 20 countries have the highest number of active cases?

In [14]:
# Select the 20 countries with the most active cases
sorted_data = world_data.sort_values(by='ActiveCases', ascending=False)[:20]

# Create a color scale for the bars
color_scale = px.colors.sequential.Viridis[::-1]

fig = go.Figure()

# Create a horizontal bar chart
fig.add_trace(
    go.Bar(
        x=sorted_data['ActiveCases'],
        y=sorted_data['Country/Region'],
        orientation='h',
        marker=dict(
            color=sorted_data['ActiveCases'],
            colorscale=color_scale,
            showscale=True,  # Show the color bar for the bar colors
            colorbar=dict(title=dict(text='Active Cases')),
        ),
        text=sorted_data['ActiveCases'],
        textposition='inside',
        texttemplate='%{text}',
    )
)

# Customize the layout
fig.update_layout(
    title='Top 20 Countries by Active Cases',
    xaxis_title='Active Cases',
    yaxis_title='Country',
    showlegend=False,  # Hide the legend
    margin=dict(l=120, r=20, t=70, b=20),  # Adjust margins
)

fig.show()

#### Which 20 countries have the highest number of recoveries?

In [15]:
# Select the 20 countries with the most recoveries
sorted_data = world_data.sort_values(by='TotalRecovered', ascending=False)[:20]

# Create a color scale for the bars
color_scale = px.colors.sequential.Viridis[::-1]

fig = go.Figure()

# Create a horizontal bar chart
fig.add_trace(
    go.Bar(
        x=sorted_data['TotalRecovered'],
        y=sorted_data['Country/Region'],
        orientation='h',
        marker=dict(
            color=sorted_data['TotalRecovered'],
            colorscale=color_scale,
            showscale=True,  # Show the color bar for the bar colors
            colorbar=dict(title=dict(text='Total Recovered')),
        ),
        text=sorted_data['TotalRecovered'],
        textposition='inside',
        texttemplate='%{text}',
    )
)

# Customize the layout
fig.update_layout(
    title='Top 20 Countries by Total Recovered',
    xaxis_title='Total Recovered',
    yaxis_title='Country',
    showlegend=False,  # Hide the legend
    margin=dict(l=120, r=20, t=70, b=20),  # Adjust margins
)

fig.show()
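
The four ranking cells above differ only in the column they sort by, so the selection step could be factored into a small helper. The following is a minimal sketch under that assumption. The name `top_n` and the demo frame are hypothetical and contain no real data.

```python
import pandas as pd


def top_n(df, column, n=20):
    """Return the n rows with the largest values in `column`, largest first."""
    return df.sort_values(by=column, ascending=False).head(n)


# Tiny made-up frame, only to show the call; not real data.
demo = pd.DataFrame({
    'Country/Region': ['A', 'B', 'C', 'D'],
    'TotalCases': [50, 200, 10, 120],
    'TotalDeaths': [5, 2, 9, 1],
})

for column in ['TotalCases', 'TotalDeaths']:
    ranked = top_n(demo, column, n=2)
    print(column, list(ranked['Country/Region']))
```

With such a helper, each chart cell would only need to pass a different column name and title.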

#### How are cases, deaths, recoveries, and active cases distributed among the 15 most affected countries?

In [16]:
# Names of the first 15 countries in world_data and the columns to plot
country_names = world_data[:15]['Country/Region'].values
cases = ['TotalCases', 'TotalDeaths', 'TotalRecovered', 'ActiveCases']

# Define colors for the pie slices
colors = px.colors.qualitative.Set3

for i in cases:
    fig = px.pie(
        world_data[:15],
        values=i,
        names=country_names,
        hole=0.3,
        title=f'Distribution of {i} in Top 15 Countries',
        color_discrete_sequence=colors
    )

    # Customizing the layout
    fig.update_traces(
        textinfo="percent+label",
        pull=[0.1, 0, 0],
        marker=dict(line=dict(color='white', width=2))
    )

    # You can further customize the layout as needed
    fig.update_layout(
        title_x=0.5,  # Center the title
        legend_title_text='Countries',
        showlegend=True,
    )

    # Show the plot
    fig.show()

#### What is the ratio of deaths to confirmed cases?

In [17]:
death_to_case_ratio = world_data['TotalDeaths']/world_data['TotalCases']                               

# Customize the color scale for the bars
color_scale = px.colors.qualitative.Pastel

fig = px.bar(
    world_data, 
    x='Country/Region', 
    y=death_to_case_ratio, 
    title="Death-to-Case Ratio by Country",
    labels={'Country/Region': 'Country', 'death_to_case_ratio': 'Death-to-Case Ratio'},
    color=death_to_case_ratio,
    color_continuous_scale=color_scale,
)

# Customizing the layout
fig.update_layout(
    xaxis_title="Country",
    yaxis_title="Death-to-Case Ratio",
    title_x=0.5,  # Center the title
)

# Show the plot
fig.show()
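
Plotting all countries in dataset order makes the ratio hard to compare. As a sketch of one way to inspect it, the example below computes deaths per 100 confirmed cases for the five rows shown in `world_data.head()` and ranks them. The column name `Deaths per 100 Cases` is introduced here for illustration only.

```python
import pandas as pd

# Values copied from world_data.head() above.
sample = pd.DataFrame({
    'Country/Region': ['USA', 'Brazil', 'India', 'Russia', 'South Africa'],
    'TotalCases': [5032179, 2917562, 2025409, 871894, 538184],
    'TotalDeaths': [162804, 98644, 41638, 14606, 9604],
})

# Deaths per 100 confirmed cases; rows with missing inputs are dropped.
sample['Deaths per 100 Cases'] = 100 * sample['TotalDeaths'] / sample['TotalCases']
ranked = (
    sample.dropna(subset=['Deaths per 100 Cases'])
          .sort_values('Deaths per 100 Cases', ascending=False)
)

print(ranked[['Country/Region', 'Deaths per 100 Cases']].round(2))
```

Sorting before plotting, or restricting to the top rows, would make the bar chart considerably easier to read.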

#### What is the ratio of deaths to recoveries?

In [18]:
death_to_recovery_ratio = world_data['TotalDeaths']/world_data['TotalRecovered']

# Customize the color scale for the bars
color_scale = px.colors.qualitative.Set2

fig = px.bar(
    world_data, 
    x='Country/Region', 
    y=death_to_recovery_ratio, 
    title="Death-to-Recovery Ratio by Country",
    labels={'Country/Region': 'Country', 'death_to_recovery_ratio': 'Death-to-Recovery Ratio'},
    color=death_to_recovery_ratio,
    color_continuous_scale=color_scale,
)

# Customizing the layout
fig.update_layout(
    xaxis_title="Country",
    yaxis_title="Death-to-Recovery Ratio",
    title_x=0.5,  # Center the title
)

# Show the plot
fig.show()

#### What is the ratio of serious cases to deaths?

In [19]:
serious_to_death_ratio = world_data['Serious,Critical']/world_data['TotalDeaths']

# Customize the color scale for the bars
color_scale = px.colors.qualitative.Plotly

fig = px.bar(
    world_data, 
    x='Country/Region', 
    y=serious_to_death_ratio, 
    title="Serious-to-Death Ratio by Country",
    labels={'Country/Region': 'Country', 'serious_to_death_ratio': 'Serious-to-Death Ratio'},
    color=serious_to_death_ratio,
    color_continuous_scale=color_scale,
)

# Customizing the layout
fig.update_layout(
    xaxis_title="Country",
    yaxis_title="Serious-to-Death Ratio",
    title_x=0.5,  # Center the title
)

# Show the plot
fig.show()

#### Create a visual representation of the complete statistics for a specific country, including confirmed, active, recovered, and deceased cases.

In [20]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

def country_visualization(df, country):
    data = df[df['Country/Region'] == country]
    data = data[['Date', 'Confirmed', 'Deaths', 'Recovered', 'Active']]

    # Create subplots
    fig = make_subplots(rows=1, cols=4, subplot_titles=('Confirmed', 'Deaths', 'Recovered', 'Active'))

    # Define a common trace style
    trace_style = dict(mode='lines+markers', marker=dict(size=4), line=dict(width=2))

    # Create traces and add them to subplots
    for i, column in enumerate(['Confirmed', 'Deaths', 'Recovered', 'Active']):
        trace = go.Scatter(
            name=column,
            x=data['Date'],
            y=data[column],
            **trace_style
        )
        fig.add_trace(trace, row=1, col=i + 1)

    # Configure subplot layout
    fig.update_yaxes(title_text='Count', row=1, col=1)

    # Update subplot titles
    fig.update_layout(title=f'{country} COVID-19 Data', title_x=0.5)

    # Show the plot
    fig.show()

In [21]:
country_visualization(group_data, 'Brazil')
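
`country_visualization` selects rows with a boolean mask, so a misspelled country name yields an empty figure rather than an error. A minimal sketch of a stricter selection step is shown below. The helper `country_series` and the demo frame are hypothetical; only the column names are taken from the function above.

```python
import pandas as pd


def country_series(df, country):
    """Return one country's rows sorted by date, or raise if it is absent."""
    rows = df[df['Country/Region'] == country]
    if rows.empty:
        raise ValueError(f'No rows found for country {country!r}')
    # Parse dates so that sorting and plotting use real timestamps.
    rows = rows.assign(Date=pd.to_datetime(rows['Date']))
    return rows.sort_values('Date').reset_index(drop=True)


# Made-up long-format frame with the same column names as group_data.
demo = pd.DataFrame({
    'Date': ['2020-01-23', '2020-01-22', '2020-01-22'],
    'Country/Region': ['X', 'X', 'Y'],
    'Confirmed': [3, 1, 7],
})

x_rows = country_series(demo, 'X')
print(x_rows)
```

Calling `country_series(demo, 'Z')` raises a `ValueError`, which surfaces the typo before any plotting happens.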