# Interactive Data Visualization with Plotly: COVID-19 Insights

### Objective:
Analyze the global impact of COVID-19 using interactive data visualizations built with Plotly. The focus is on:
- Cleaning and preprocessing the dataset.
- Creating interactive, visually appealing charts that surface key insights.
- Exploring trends, geographical spread, and country-level data.


In [2]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
import matplotlib.pyplot as plt
color_pal = sns.color_palette()
plt.style.use('fivethirtyeight')
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")

Libraries imported successfully!


## Dataset Overview

The dataset contains daily, country-level COVID-19 counts from 22 January 2020 to 16 April 2022. Its columns are:
- **Date**: Reporting date.
- **Country**: Name of the country or region.
- **Confirmed**: Cumulative confirmed cases.
- **Recovered**: Cumulative recovered cases.
- **Deaths**: Cumulative deaths.

- **Source**: the `datasets/covid-19` repository on GitHub (`countries-aggregated.csv`, loaded in the next cell).


In [3]:
# Load dataset
url = "https://raw.githubusercontent.com/datasets/covid-19/main/data/countries-aggregated.csv"
covid_data = pd.read_csv(url)

# Display dataset information
print(f"Dataset Loaded Successfully! Rows: {covid_data.shape[0]}, Columns: {covid_data.shape[1]}")
covid_data.head()


Dataset Loaded Successfully! Rows: 161568, Columns: 5


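|   | Date | Country | Confirmed | Recovered | Deaths |
|---|------|---------|-----------|-----------|--------|
| 0 | 2020-01-22 | Afghanistan | 0 | 0 | 0 |
| 1 | 2020-01-23 | Afghanistan | 0 | 0 | 0 |
| 2 | 2020-01-24 | Afghanistan | 0 | 0 | 0 |
| 3 | 2020-01-25 | Afghanistan | 0 | 0 | 0 |
| 4 | 2020-01-26 | Afghanistan | 0 | 0 | 0 |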


## Data Cleaning and Preprocessing

To ensure accurate analysis, the following steps are performed:
1. **Check for Missing Data**:
   - Count null values in each column.
2. **Date Conversion**:
   - Convert the `Date` column to datetime format.
3. **Create New Features**:
   - `Active` = `Confirmed - Deaths - Recovered`
   - `Death_to_Confirmed` = `Deaths / Confirmed` (undefined, i.e. NaN, where `Confirmed` is 0)
4. **Aggregate Data**:
   - Global (per-date) aggregation for trend analysis.
   - Country-level data on the latest reporting date for cross-country comparisons.
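
The four steps above can be sketched on a few made-up rows that share the dataset's columns. This is an illustrative sketch, not the notebook's own cell: the toy values and the names `df`, `global_toy` and `latest_toy` are hypothetical, and the ratio replaces a zero denominator with NaN so that a country with no confirmed cases gets an undefined ratio rather than `inf`.

```python
import numpy as np
import pandas as pd

# Made-up rows with the dataset's columns (illustrative values, not real data)
df = pd.DataFrame({
    'Date': ['2020-03-01', '2020-03-02', '2020-03-01', '2020-03-02'],
    'Country': ['A', 'A', 'B', 'B'],
    'Confirmed': [100, 120, 0, 5],
    'Recovered': [10, 15, 0, 0],
    'Deaths': [2, 3, 0, 1],
})

# 1. Count missing values per column
missing = df.isnull().sum()

# 2. Parse the reporting date
df['Date'] = pd.to_datetime(df['Date'])

# 3. Derived features; a zero denominator becomes NaN so the ratio is undefined
#    rather than 0/0 (NaN) or x/0 (inf)
df['Active'] = df['Confirmed'] - df['Deaths'] - df['Recovered']
df['Death_to_Confirmed'] = df['Deaths'] / df['Confirmed'].replace(0, np.nan)

# 4a. Global aggregation: one row per date, summed over countries
global_toy = df.groupby('Date')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()

# 4b. Country-level view: each country's row on the latest reporting date
latest_toy = df[df['Date'] == df['Date'].max()]
print(global_toy)
```

Country B's first row has `Confirmed == 0`, so its ratio is NaN, while its second row gives 1 / 5 = 0.2.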


In [7]:
# Check for missing data
print("Missing values per column:")
print(covid_data.isnull().sum())

# Convert Date to datetime format
covid_data['Date'] = pd.to_datetime(covid_data['Date'])

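# Create new features; Confirmed == 0 is replaced by NaN so the ratio is
# undefined there instead of 0/0 (NaN) or x/0 (inf)
covid_data['Active'] = covid_data['Confirmed'] - covid_data['Deaths'] - covid_data['Recovered']
covid_data['Death_to_Confirmed'] = covid_data['Deaths'] / covid_data['Confirmed'].replace(0, np.nan)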

# Summary after feature creation
print("Feature engineering complete!")
covid_data.tail()


Missing values per column:
Date                      0
Country                   0
Confirmed                 0
Recovered                 0
Deaths                    0
Active                    0
Death_to_Confirmed    13112
dtype: int64
Feature engineering complete!


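|   | Date | Country | Confirmed | Recovered | Deaths | Active | Death_to_Confirmed |
|---|------|---------|-----------|-----------|--------|--------|--------------------|
| 161563 | 2022-04-12 | Zimbabwe | 247094 | 0 | 5460 | 241634 | 0.022097 |
| 161564 | 2022-04-13 | Zimbabwe | 247160 | 0 | 5460 | 241700 | 0.022091 |
| 161565 | 2022-04-14 | Zimbabwe | 247208 | 0 | 5462 | 241746 | 0.022095 |
| 161566 | 2022-04-15 | Zimbabwe | 247237 | 0 | 5462 | 241775 | 0.022092 |
| 161567 | 2022-04-16 | Zimbabwe | 247237 | 0 | 5462 | 241775 | 0.022092 |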


In [20]:
covid_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 161568 entries, 0 to 161567
Data columns (total 7 columns):
 #   Column              Non-Null Count   Dtype         
---  ------              --------------   -----         
 0   Date                161568 non-null  datetime64[ns]
 1   Country             161568 non-null  object        
 2   Confirmed           161568 non-null  int64         
 3   Recovered           161568 non-null  int64         
 4   Deaths              161568 non-null  int64         
 5   Active              161568 non-null  int64         
 6   Death_to_Confirmed  148456 non-null  float64       
dtypes: datetime64[ns](1), float64(1), int64(4), object(1)
memory usage: 8.6+ MB


## Visualization 1: Global Trends Over Time

This visualization shows how global confirmed, recovered, death, and active case counts (cumulative, summed over all countries for each date) progressed over time.


In [8]:
# Aggregate data globally
global_data = covid_data.groupby('Date')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()

# Plot using Plotly
fig = px.line(global_data, x='Date', y=['Confirmed', 'Deaths', 'Recovered', 'Active'],
              title="Global Trends of COVID-19 Over Time",
              labels={'value': 'Count', 'variable': 'Case Type'},
              color_discrete_map={'Confirmed': 'blue', 'Deaths': 'red', 'Recovered': 'green', 'Active': 'orange'})
fig.update_layout(legend_title_text="Case Type")
fig.show()
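
Because every count column is cumulative, waves of infection are easier to see in day-over-day increments. A minimal sketch on made-up global totals; the `New_*` column names and the 7-day smoothing window are assumptions, not part of the notebook:

```python
import pandas as pd

# Hypothetical cumulative global totals for four consecutive days
global_toy = pd.DataFrame({
    'Date': pd.date_range('2021-01-01', periods=4, freq='D'),
    'Confirmed': [100, 130, 190, 200],
    'Deaths': [5, 6, 8, 8],
})

# diff() turns a cumulative series into per-day increments; the first day has
# no predecessor, so its NaN is filled with 0
daily = global_toy.set_index('Date')[['Confirmed', 'Deaths']].diff().fillna(0)
daily.columns = ['New_Confirmed', 'New_Deaths']

# A rolling mean (window of 7 days, assumed) smooths short-term reporting noise;
# min_periods=1 lets the first days average over whatever data exists
daily['New_Confirmed_7d'] = daily['New_Confirmed'].rolling(7, min_periods=1).mean()
print(daily)
```

On the notebook's data the equivalent call would be `global_data.set_index('Date')[['Confirmed', 'Deaths']].diff()`, whose result can be passed to `px.line` like the chart above.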


## Visualization 2: Top 10 Countries by Confirmed Cases

This bar chart displays the top 10 countries with the highest confirmed cases. It highlights global hotspots.


In [9]:
# Get the latest data
latest_date = covid_data['Date'].max()
# .copy() so that columns added to latest_data later do not act on a view of covid_data
latest_data = covid_data[covid_data['Date'] == latest_date].copy()

# Top 10 countries
top_countries = latest_data.nlargest(10, 'Confirmed')

# Plot using Plotly
fig = px.bar(top_countries, x='Country', y='Confirmed',
             title="Top 10 Countries by Confirmed Cases",
             labels={'Confirmed': 'Number of Confirmed Cases', 'Country': 'Country'},
             color='Confirmed', color_continuous_scale='Viridis')
fig.show()


## Visualization 3: Geographical Spread of COVID-19

This geographical scatter plot shows how confirmed cases are distributed across countries on the latest reporting date, with marker size and color scaled to the number of cases.


In [10]:
# Plot Geo scatter plot
fig = px.scatter_geo(latest_data, locations="Country", locationmode="country names",
                     color="Confirmed", size="Confirmed",
                     hover_name="Country", title="Geographical Spread of COVID-19",
                     projection="natural earth")
fig.show()


## Visualization 4: COVID-19 Case Status Distribution

This pie chart splits the cumulative confirmed cases on the latest reporting date into active cases, deaths, and recoveries, giving an overall picture of the status of reported infections.


In [26]:
# Totals on the latest reporting date; summing cumulative columns over every
# date would re-add each day's running total
latest_date = covid_data['Date'].max()
latest_totals = covid_data.loc[covid_data['Date'] == latest_date,
                               ['Active', 'Deaths', 'Recovered']].sum()

# Active + Deaths + Recovered = Confirmed, so these slices form a true part-to-whole split
fig = px.pie(
    names=latest_totals.index,
    values=latest_totals.values,
    title=f"COVID-19 Case Status on {latest_date:%Y-%m-%d}",
    color_discrete_sequence=px.colors.sequential.YlOrRd
)

fig.show()
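
Why the components, rather than `Confirmed` itself, belong in a part-to-whole chart: since `Active` is defined as `Confirmed - Deaths - Recovered`, the three components add back up to `Confirmed`, and a pie that also includes `Confirmed` as its own slice counts every case twice. A small check on illustrative totals (the numbers are made up):

```python
import pandas as pd

# Illustrative single-date totals (not real figures)
totals = pd.Series({'Confirmed': 1000, 'Deaths': 20, 'Recovered': 600})
totals['Active'] = totals['Confirmed'] - totals['Deaths'] - totals['Recovered']

# The three components partition Confirmed, so their shares sum to 1
components = totals[['Active', 'Deaths', 'Recovered']]
shares = components / components.sum()

# Using Confirmed as a slice alongside Deaths and Active double-counts cases
double_counted = totals[['Confirmed', 'Deaths', 'Active']].sum()
print(shares)
print(double_counted)
```

Here the components sum to 1000 (the confirmed total), whereas the `Confirmed`, `Deaths`, `Active` slices sum to 1400.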


## Visualization 5: Active Cases by Continent

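This bar chart shows active cases per continent on the latest reporting date, helping to identify which regions have the most active infections. `continent_map` below covers only five countries, so countries without an entry are left out of the continent totals.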


In [12]:
# Adding continent data using a mapping (use an external mapping dataset if available)
continent_map = {
    'US': 'North America', 'India': 'Asia', 'Brazil': 'South America', 'Russia': 'Europe',
    'South Africa': 'Africa', # Add other mappings as needed
}
latest_data['Continent'] = latest_data['Country'].map(continent_map)

# Group by continent
continent_data = latest_data.groupby('Continent')['Active'].sum().reset_index()

# Plot active cases by continent
fig = px.bar(continent_data, x='Continent', y='Active',
             title="Active Cases by Continent",
             labels={'Active': 'Active Cases', 'Continent': 'Continent'},
             color='Active', color_continuous_scale='Plasma')
fig.show()
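
Countries missing from `continent_map` receive NaN from `.map()` and are silently dropped by `groupby`, so it helps to list them and measure how much is excluded. A sketch on hypothetical rows (`latest_toy` is illustrative; the mapping is the one from the cell above):

```python
import pandas as pd

# Partial mapping, as in the notebook cell above
continent_map = {
    'US': 'North America', 'India': 'Asia', 'Brazil': 'South America',
    'Russia': 'Europe', 'South Africa': 'Africa',
}

# Hypothetical latest-date rows
latest_toy = pd.DataFrame({'Country': ['US', 'India', 'France', 'Japan'],
                           'Active': [50, 40, 30, 20]})
latest_toy['Continent'] = latest_toy['Country'].map(continent_map)

# Rows with no continent entry are the ones groupby('Continent') would drop
is_unmapped = latest_toy['Continent'].isna()
unmapped = sorted(latest_toy.loc[is_unmapped, 'Country'])
coverage = 1 - is_unmapped.mean()
dropped_active = latest_toy.loc[is_unmapped, 'Active'].sum()
print(f"Unmapped ({len(unmapped)}): {unmapped}; coverage {coverage:.0%}; "
      f"active cases excluded: {dropped_active}")
```

Running the same three lines on `latest_data` lists every country still missing from `continent_map`.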


## Visualization 6: Deaths-to-Confirmed Ratio by Country

This scatter plot shows each country's ratio of cumulative deaths to confirmed cases (a crude case fatality ratio) on the latest reporting date, with marker size reflecting the number of confirmed cases.


In [13]:
# Plot deaths-to-confirmed ratio
fig = px.scatter(latest_data, x='Country', y='Death_to_Confirmed',
                 size='Confirmed', color='Death_to_Confirmed',
                 title="Deaths-to-Confirmed Ratio by Country",
                 labels={'Death_to_Confirmed': 'Deaths/Confirmed Ratio', 'Country': 'Country'},
                 hover_name='Country', size_max=50)
fig.show()
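
When summarizing these per-country ratios, the pooled ratio `sum(Deaths) / sum(Confirmed)` weights each country by its caseload, while an unweighted mean of the per-country ratios lets small countries count as much as large ones. A worked example on two hypothetical countries:

```python
import numpy as np
import pandas as pd

# Two hypothetical countries with very different caseloads
toy = pd.DataFrame({'Country': ['A', 'B'],
                    'Confirmed': [1000, 10],
                    'Deaths': [10, 1]})

# Per-country ratio, with zero confirmed counts treated as undefined
toy['Death_to_Confirmed'] = toy['Deaths'] / toy['Confirmed'].replace(0, np.nan)

# Pooled ratio: weights each country by its number of confirmed cases
pooled = toy['Deaths'].sum() / toy['Confirmed'].sum()
# Unweighted mean: every country counts equally, however small
unweighted = toy['Death_to_Confirmed'].mean()
print(f"pooled={pooled:.4f}, unweighted mean={unweighted:.4f}")
```

Here the pooled ratio is 11 / 1010 (about 0.011) while the unweighted mean is 0.055, driven by the small country B.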


## Visualization 7: Recovered Cases Distribution

This histogram visualizes the distribution of recovered cases across all countries.


In [15]:
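# One value per country: the peak cumulative recovered count. Rows near the end
# of the series can show 0 recovered (as in the Zimbabwe rows above), so the
# maximum is used rather than the latest value
recovered_by_country = covid_data.groupby('Country', as_index=False)['Recovered'].max()

# Histogram of recovered cases across countries
fig = px.histogram(recovered_by_country, x='Recovered', nbins=30,
                   title="Distribution of Recovered Cases",
                   labels={'Recovered': 'Recovered Cases'})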
fig.show()


## Visualization 8: Confirmed Cases Time-Series for Top 5 Countries

A multi-line chart visualizes the confirmed cases for the top 5 countries over time, showing their progression.


In [16]:
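# Top 5 countries by confirmed cases (top_countries holds 10 rows sorted by
# Confirmed in descending order, so take the first five)
top5_countries = top_countries['Country'].head(5).tolist()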
top5_data = covid_data[covid_data['Country'].isin(top5_countries)]

# Plot time-series for top 5 countries
fig = px.line(top5_data, x='Date', y='Confirmed', color='Country',
              title="Confirmed Cases for Top 5 Countries Over Time",
              labels={'Confirmed': 'Confirmed Cases', 'Country': 'Country'})
fig.show()


## Visualization 9: Heatmap of Confirmed Cases by Continent

This heatmap shows cumulative confirmed cases per continent over time, revealing trends and relative intensity. Only countries listed in `continent_map` are included.


In [17]:
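# Map continents onto the full time series (latest_data holds a single date only);
# countries missing from continent_map get NaN and are dropped by groupby
continent_time_data = (
    covid_data.assign(Continent=covid_data['Country'].map(continent_map))
    .groupby(['Continent', 'Date'])['Confirmed'].sum()
    .reset_index()
)

# Plot heatmap; histfunc='max' shows the largest (most recent) cumulative total in
# each date bin instead of summing running totals across the days in the bin
fig = px.density_heatmap(continent_time_data, x='Date', y='Continent', z='Confirmed',
                         histfunc='max',
                         title="Heatmap of Confirmed Cases by Continent",
                         labels={'Confirmed': 'Confirmed Cases', 'Continent': 'Continent'},
                         color_continuous_scale='Viridis')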
fig.show()
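
If exact per-day values are preferred over the binned aggregation that `px.density_heatmap` performs, a pivot table gives one cell per continent and date. A sketch on hypothetical long-format rows (the data and the name `grid` are illustrative):

```python
import pandas as pd

# Hypothetical long-format rows: one cumulative value per country per day,
# already labelled with a continent
rows = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01', '2021-01-01', '2021-01-02',
                            '2021-01-02', '2021-01-02']),
    'Continent': ['Asia', 'Europe', 'Asia', 'Europe', 'Europe'],
    'Confirmed': [10, 5, 12, 6, 3],
})

# One row per continent, one column per date; each cell sums that day's countries,
# and missing continent/date combinations are filled with 0
grid = rows.pivot_table(index='Continent', columns='Date', values='Confirmed',
                        aggfunc='sum', fill_value=0)
print(grid)
```

The resulting frame can be passed to `px.imshow`, which accepts a DataFrame and uses its index and columns as axis labels.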


## Visualization 10: Bubble Chart for Country Comparison

This bubble chart compares countries by confirmed cases (x-axis) and deaths (y-axis) on the latest reporting date, with bubble size representing active cases.


In [18]:
# Bubble chart for country comparison
fig = px.scatter(latest_data, x='Confirmed', y='Deaths', size='Active', color='Country',
                 hover_name='Country', title="Country Comparison: Confirmed vs Deaths",
                 labels={'Confirmed': 'Confirmed Cases', 'Deaths': 'Death Cases'},
                 size_max=60)
fig.show()


## Conclusion

The interactive visualizations provided insights into the COVID-19 pandemic:

1. **Global Trends**:
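   - Cumulative confirmed cases rose throughout the period covered (January 2020 to April 2022); because the series is cumulative, waves of infection appear as steeper stretches of the curve rather than as peaks.
   - Trends in active cases depend on the `Recovered` column, which is 0 in the latest rows for some countries (for example, Zimbabwe in April 2022), so the later part of the `Active` series likely overstates current infections.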

2. **Regional Insights**:
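   - The active-case chart covers only the five countries in `continent_map` and only the latest date, so it cannot show how regional rankings changed over time; a complete country-to-continent mapping would be needed to confirm shifts such as Asia overtaking North America and Europe.
   - Countries with effective vaccination programs may have seen faster declines in active cases, but this dataset contains no vaccination data to test that.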

3. **Severity Metrics**:
   - Death-to-confirmed ratios varied widely across countries, likely reflecting differences in testing capacity and healthcare access as well as in disease severity.

4. **Visual Storytelling**:
   - Interactive visualizations help stakeholders understand temporal and geographical trends and make the data more accessible.

### Skills Developed:
- Advanced Plotly visualizations.
- Handling time-series and geographical data.
- Deriving actionable insights from global health datasets.
