
# Advanced Visualization Techniques Assignment  
## School-Age Digital Connectivity

---

### Introduction  
In this notebook, we explore the *School-Age Digital Connectivity* dataset using advanced visualization techniques.  
We apply:  
- Static and interactive charts  
- Storytelling with multiple visuals  
- A comprehensive dashboard (8 visuals combined)  

This helps us understand how **region, income, and demographics** affect school-age digital connectivity worldwide.  


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Load dataset
df = pd.read_excel(
    "School-Age-Digital-Connectivity Dataset.xlsx", sheet_name="Total school age"
)
# Assign column names by position (assumes the sheet has exactly these 12 columns, in this order)
df.columns = [
    "ISO3",
    "Countries and areas",
    "Region",
    "Sub-region",
    "Income Group",
    "Total",
    "Rural",
    "Urban",
    "Poorest",
    "Richest",
    "Data source",
    "Time period",
]
# Convert numeric columns
for col in ["Total", "Rural", "Urban", "Poorest", "Richest"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print("Loaded", df.shape[0], "rows")
df.head()

Loaded 87 rows


| | ISO3 | Countries and areas | Region | Sub-region | Income Group | Total | Rural | Urban | Poorest | Richest | Data source | Time period |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DZA | Algeria | MENA | MENA | Upper middle income (UM) | 0.237807 | 0.091202 | 0.322005 | 0.005888 | 0.772502 | Multiple Indicator Cluster Survey | 2018-19 |
| 1 | AGO | Angola | SSA | ESA | Lower middle income (LM) | 0.165507 | 0.017646 | 0.243431 | 0.0 | 0.624589 | Demographic and Health Survey | 2015-16 |
| 2 | ARG | Argentina | LAC | LAC | Upper middle income (UM) | 0.398849 | NaN | NaN | NaN | NaN | Multiple Indicator Cluster Survey | 2011-12 |
| 3 | ARM | Armenia | ECA | EECA | Upper middle income (UM) | 0.809218 | 0.709329 | 0.883609 | 0.46602 | 0.991219 | Demographic and Health Survey | 2015-16 |
| 4 | BGD | Bangladesh | SA | SA | Lower middle income (LM) | 0.366474 | 0.327793 | 0.516416 | 0.087617 | 0.758899 | Multiple Indicator Cluster Survey | 2019 |
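
The preview above shows Argentina with no values for Rural, Urban, Poorest, or Richest, so it is worth checking how many rows are incomplete before charting. The following is a minimal sketch that uses only the five preview rows (a subset of columns), not the full sheet; on the real data the same calls would be applied to `df`.

```python
import pandas as pd

# First five rows of the sheet as printed in the preview (subset of columns).
sample = pd.DataFrame({
    "ISO3": ["DZA", "AGO", "ARG", "ARM", "BGD"],
    "Total": [0.237807, 0.165507, 0.398849, 0.809218, 0.366474],
    "Rural": [0.091202, 0.017646, None, 0.709329, 0.327793],
    "Urban": [0.322005, 0.243431, None, 0.883609, 0.516416],
})

# Count missing values per numeric column.
missing = sample[["Total", "Rural", "Urban"]].isna().sum()
print(missing)

# Countries that would drop out of any Urban-vs-Rural chart.
incomplete = sample.loc[sample[["Rural", "Urban"]].isna().any(axis=1), "ISO3"].tolist()
print(incomplete)
```

Rows listed in `incomplete` still contribute to charts based on `Total` alone, but they are silently absent from the scatter and bubble plots.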



## 1. Bar Chart — Median Total Connectivity by Region  

**Why this chart?**  
Bar charts are effective for comparing **categories**. Here, we compare the **median Total connectivity** across different regions.  

**Insights:**  
- Highlights which regions are more digitally connected.  
- Reveals disparities between world regions.  

**Design principles:**  
- Ordered bars by median values.  
- Simple color scheme for readability.  


In [20]:

median_by_region = df.groupby('Region')['Total'].median().sort_values(ascending=False)
fig = px.bar(median_by_region, x=median_by_region.index, y=median_by_region.values,
             title="Median Total Connectivity by Region", height=800,
             labels={'x': 'Region', 'y': 'Median Total'},
             color=median_by_region.values, color_continuous_scale="Blues")
fig.show()



## 2. Histogram — Distribution of Total Connectivity  

**Why this chart?**  
Histograms show the **distribution** of a numerical variable. We use it for the `Total` connectivity to see how countries are spread across connectivity levels.  

**Insights:**  
- Identifies clusters of countries with similar connectivity.  
- Detects skewness (e.g., many countries with low connectivity).  

**Design principles:**  
- Clear bin size for readability.  
- Neutral colors to focus on distribution shape.  
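
To make the bin-size point concrete, here is a small sketch of how 20 equal-width bins partition values that lie between 0 and 1. It uses NumPy rather than Plotly and only the five `Total` values from the preview rows, so it illustrates the binning idea and is not a reproduction of the chart. Note that Plotly treats `nbins` as a maximum and may choose different bin edges.

```python
import numpy as np

# `Total` values from the five preview rows (fractions between 0 and 1).
totals = np.array([0.237807, 0.165507, 0.398849, 0.809218, 0.366474])

# 20 equal-width bins over [0, 1] give a bin width of 0.05.
counts, edges = np.histogram(totals, bins=20, range=(0.0, 1.0))
bin_width = edges[1] - edges[0]

# Report only the non-empty bins.
for left, n in zip(edges[:-1], counts):
    if n:
        print(f"[{left:.2f}, {left + bin_width:.2f}): {n}")
```

With these five values, two countries (Argentina and Bangladesh) fall into the same 0.35 to 0.40 bin, which is the kind of cluster the histogram is meant to surface.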


In [14]:

# `Total` is stored as a fraction in [0, 1], so the axis is labelled as a share, not "%".
fig = px.histogram(df, x="Total", nbins=20, title="Distribution of Total Connectivity",
                   labels={"Total": "Total Connectivity (share, 0-1)"}, height=800,
                   color_discrete_sequence=["steelblue"])
fig.show()



## 3. Pie/Donut Chart — Distribution of Income Groups  

**Why this chart?**  
Pie charts show **proportions**. Here, we see the share of countries in each `Income Group`.  

**Insights:**  
- Helps connect income status with connectivity potential.  
- Useful for contextual understanding before deeper analysis.  

**Design principles:**  
- Donut style (`hole=0.4`) improves readability.  
- Labels shown on hover for clarity.  
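
The slice sizes of the donut are simply category shares. As a hedged illustration, the sketch below computes them with `value_counts(normalize=True)` on the income groups of the five preview rows only; the shares for the full dataset will differ.

```python
import pandas as pd

# Income groups of the five preview rows.
income = pd.Series(
    [
        "Upper middle income (UM)",
        "Lower middle income (LM)",
        "Upper middle income (UM)",
        "Upper middle income (UM)",
        "Lower middle income (LM)",
    ],
    name="Income Group",
)

# Share of countries per income group; missing labels become "Unknown".
shares = income.fillna("Unknown").value_counts(normalize=True)
print(shares.round(2))
```

Printing the shares next to the chart gives readers exact numbers, which a donut alone conveys only approximately.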


In [13]:

income_counts = df['Income Group'].fillna('Unknown').value_counts()
fig = px.pie(values=income_counts.values, names=income_counts.index, hole=0.4, height=800,
             title="Proportion of Countries by Income Group",
             color_discrete_sequence=px.colors.qualitative.Set2)
fig.show()



## 4. Box Plot — Connectivity by Income Group  

**Why this chart?**  
Box plots show the **distribution** and **spread** of data across categories. We analyze `Total` connectivity across each `Income Group`.  

**Insights:**  
- Reveals inequality within and between groups.  
- Detects outliers (countries performing above/below expectations).  

**Design principles:**  
- Colors used consistently for income groups.  
- Whiskers/outliers clearly marked.  
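
Outliers in a box plot are usually defined by the 1.5 × IQR rule. The sketch below applies that rule with pandas to the five preview rows; the helper name `tukey_outliers` is hypothetical, and Plotly's own quartile method may give slightly different fences than pandas' default interpolation.

```python
import pandas as pd

# `Total` values from the five preview rows.
sample = pd.DataFrame({
    "Countries and areas": ["Algeria", "Angola", "Argentina", "Armenia", "Bangladesh"],
    "Total": [0.237807, 0.165507, 0.398849, 0.809218, 0.366474],
})

def tukey_outliers(frame, column="Total", k=1.5):
    """Return rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = frame[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (frame[column] < q1 - k * iqr) | (frame[column] > q3 + k * iqr)
    return frame[mask]

outliers = tukey_outliers(sample)
print(outliers["Countries and areas"].tolist())
```

On the full dataset the same helper could be applied per income group, for example through `df.groupby("Income Group")`, to list the countries that the box plot draws beyond the whiskers.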


In [12]:

fig = px.box(df, x="Income Group", y="Total", color="Income Group",
             title="Connectivity Distribution by Income Group",
             points="all", height=800)
fig.show()



## 5. Scatter Plot — Urban vs Rural Connectivity  

**Why this chart?**  
Scatter plots show relationships between two variables. We compare `Urban` vs `Rural` connectivity.  

**Insights:**  
- Shows whether urban-rural divide is large.  
- Identifies countries with high urban connectivity but low rural connectivity.  

**Design principles:**  
- Colors by region for clarity.  
- Tooltips with country names.  
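
The urban-rural divide that the scatter plot shows visually can also be expressed as a number. This is a minimal sketch, again on the preview rows only, that computes the gap `Urban - Rural` and ranks countries by it; the `Gap` column is an illustrative addition, not part of the dataset.

```python
import pandas as pd

# Rural and Urban values from the five preview rows (Argentina has none).
sample = pd.DataFrame({
    "Countries and areas": ["Algeria", "Angola", "Argentina", "Armenia", "Bangladesh"],
    "Rural": [0.091202, 0.017646, None, 0.709329, 0.327793],
    "Urban": [0.322005, 0.243431, None, 0.883609, 0.516416],
})

# Keep complete rows, compute the gap, and sort from largest to smallest.
gap = (
    sample.dropna(subset=["Rural", "Urban"])
    .assign(Gap=lambda d: d["Urban"] - d["Rural"])
    .sort_values("Gap", ascending=False)
)
ranking = gap["Countries and areas"].tolist()
print(gap[["Countries and areas", "Gap"]])
```

In the scatter plot, a larger gap corresponds to a point lying further above the diagonal `Urban = Rural`.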


In [15]:

# `Rural` and `Urban` are fractions in [0, 1], so the axes are labelled as shares, not "%".
fig = px.scatter(df, x="Rural", y="Urban", color="Region",
                 hover_name="Countries and areas", height=800,
                 title="Urban vs Rural Connectivity by Region",
                 labels={"Rural":"Rural Connectivity (share, 0-1)","Urban":"Urban Connectivity (share, 0-1)"})
fig.show()



## 6. Bubble Plot — Urban vs Rural (Size = Total)  

**Why this chart?**  
Bubble plots extend scatter plots by adding a **third dimension** (bubble size). Here, bubble size represents `Total` connectivity.  

**Insights:**  
- Highlights countries with overall high/low connectivity.  
- Encodes three variables (rural, urban, and total connectivity) in a single view.  

**Design principles:**  
- Bubble size scaled fairly to avoid distortion.  
- Transparent colors to reduce overlap issues.  
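
Scaling bubbles "fairly" usually means making marker area, not diameter, proportional to the value. Plotly's bubble-chart documentation recommends `sizeref = 2 * max(size) / desired_max_px**2` together with `sizemode="area"`. The sketch below works through that arithmetic for the preview rows; `desired_max_px = 40` is an assumed value chosen for illustration.

```python
import math

# `Total` values from the five preview rows.
totals = {
    "Algeria": 0.237807,
    "Angola": 0.165507,
    "Argentina": 0.398849,
    "Armenia": 0.809218,
    "Bangladesh": 0.366474,
}

desired_max_px = 40  # assumption: diameter of the largest bubble, in pixels
max_total = max(totals.values())

# Scale factor recommended for area-based sizing.
sizeref = 2.0 * max_total / desired_max_px ** 2

# With area proportional to value, diameter grows with the square root of the value.
diameters = {name: desired_max_px * math.sqrt(v / max_total) for name, v in totals.items()}
for name, d in diameters.items():
    print(f"{name}: {d:.1f} px")
```

Because diameter follows the square root, a country with a quarter of the maximum value is drawn at half the maximum diameter, which keeps large values from dominating the chart.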


In [16]:

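# Defensive: drop rows with missing values before sizing bubbles (`size` cannot be NaN).
bubble_df = df.dropna(subset=["Rural", "Urban", "Total"])
# opacity=0.6 keeps overlapping bubbles readable; values are fractions in [0, 1].
fig = px.scatter(bubble_df, x="Rural", y="Urban", size="Total", color="Region", height=800,
                 hover_name="Countries and areas", opacity=0.6,
                 title="Urban vs Rural Connectivity (Bubble Size = Total)",
                 labels={"Rural":"Rural Connectivity (share, 0-1)","Urban":"Urban Connectivity (share, 0-1)"})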
fig.show()



## 7. Heatmap — Correlation of Connectivity Variables  

**Why this chart?**  
Heatmaps are ideal for showing **correlation matrices**. We use it for `Total`, `Rural`, `Urban`, `Poorest`, `Richest`.  

**Insights:**  
- Identifies strong correlations (e.g., Urban vs Richest).  
- Highlights where inequalities exist.  

**Design principles:**  
- Diverging color palette centered at 0.  
- Numeric annotations included.  
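
One detail worth knowing when reading the matrix: `DataFrame.corr()` excludes missing values pair by pair, so different cells can rest on different numbers of countries. The following sketch shows this on the preview rows (three columns only) and counts the observations behind each cell; the `pair_counts` helper matrix is an illustrative addition.

```python
import pandas as pd

# Three numeric columns from the five preview rows (Argentina lacks Rural/Urban).
sample = pd.DataFrame({
    "Total": [0.237807, 0.165507, 0.398849, 0.809218, 0.366474],
    "Rural": [0.091202, 0.017646, None, 0.709329, 0.327793],
    "Urban": [0.322005, 0.243431, None, 0.883609, 0.516416],
})

# Pearson correlation; NaN values are dropped pairwise.
corr = sample.corr()

# Number of rows where both columns of a pair are present.
valid = sample.notna().astype(int)
pair_counts = valid.T @ valid

# The Total-Rural cell matches a correlation computed on complete rows only.
complete = sample.dropna()
check = complete["Total"].corr(complete["Rural"])
print(corr.round(3))
print(pair_counts)
```

Showing `pair_counts` alongside the heatmap makes it clear when a strong correlation is based on only a few countries.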


In [17]:

corr = df[["Total", "Rural", "Urban", "Poorest", "Richest"]].corr()
fig = px.imshow(corr, text_auto=".2f", color_continuous_scale="RdBu", height=800,
                zmin=-1, zmax=1,  # fix the range so the diverging scale is centred at 0
                title="Correlation of Connectivity Variables")
fig.show()



## 8. Choropleth Map — Global Connectivity  

**Why this chart?**  
Maps are essential for **geographic data**. Choropleth maps show `Total` connectivity by country using ISO3 codes.  

**Insights:**  
- Global perspective on connectivity.  
- Reveals regional inequalities (Africa vs Europe, etc.).  

**Design principles:**  
- Sequential color palette (`Blues`) for intensity.  
- Hover labels with country names.  
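
Because the map matches countries by ISO3 code, malformed codes simply do not appear on it. As a precaution, the sketch below flags codes that are not exactly three uppercase letters. The last two entries are hypothetical malformed values added for illustration and do not come from the dataset; a well-formed code can still be unknown to Plotly, so this is a format check only.

```python
import pandas as pd

# Codes from the preview rows, plus two hypothetical malformed entries.
codes = pd.Series(["DZA", "AGO", "ARG", "ARM", "BGD", "bgd ", None], name="ISO3")

# True where the code is exactly three uppercase ASCII letters; missing counts as False.
well_formed = codes.str.fullmatch(r"[A-Z]{3}", na=False)

# Entries that need cleaning before mapping.
bad = codes[~well_formed]
print(bad)
```

On the real data the same check would be run on `df["ISO3"]`, and any flagged rows could be stripped, upper-cased, or inspected by hand.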


In [18]:

fig = px.choropleth(df, locations="ISO3", color="Total",
                    hover_name="Countries and areas",
                    color_continuous_scale="Blues", height=800,
                    title="Global School-Age Digital Connectivity (Total)")
fig.show()



# Comprehensive Dashboard — 8 Visuals Combined  

Here we integrate multiple charts into a **single dashboard layout**.  

**Why a dashboard?**  
- Combines multiple perspectives in one view.  
- Supports storytelling: from regional patterns → income disparities → global map.  

**Included visuals:**  
1. Bar Chart (Median Total by Region)  
2. Scatter Plot (Urban vs Rural)  
3. Pie/Donut (Income Groups)  
4. Histogram (Total)  
5. Box Plot (Income Group)  
6. Heatmap (Correlations)  
7. Bubble Plot (Urban vs Rural, size=Total)  
8. Choropleth Map (Geography)  


In [19]:

# Prepare data
median_by_region = df.groupby('Region')['Total'].median().sort_values(ascending=False)
income_counts = df['Income Group'].fillna('Unknown').value_counts()
corr = df[['Total','Rural','Urban','Poorest','Richest']].corr()

# Create dashboard
fig = make_subplots(
    rows=3, cols=3,
    subplot_titles=(
        "Median Total by Region",
        "Urban vs Rural Scatter",
        "Income Group Distribution",
        "Histogram of Total Connectivity",
        "Boxplot by Income Group",
        "Correlation Heatmap",
        "Bubble Plot (Urban vs Rural, size=Total)",
        "Choropleth Map of Connectivity",
        ""
    ),
    specs=[
        [{"type": "bar"}, {"type": "xy"}, {"type": "domain"}],
        [{"type": "xy"}, {"type": "box"}, {"type": "heatmap"}],
        [{"type": "scatter"}, {"type": "choropleth"}, None]
    ],
    horizontal_spacing=0.1,
    vertical_spacing=0.12
)

# 1. Bar
fig.add_trace(go.Bar(x=median_by_region.index, y=median_by_region.values), row=1, col=1)

# 2. Scatter
fig.add_trace(go.Scatter(x=df['Rural'], y=df['Urban'], mode='markers',
                         text=df['Countries and areas'], marker=dict(size=6, opacity=0.6)),
              row=1, col=2)

# 3. Pie
fig.add_trace(go.Pie(labels=income_counts.index, values=income_counts.values, hole=0.4),
              row=1, col=3)

# 4. Histogram
fig.add_trace(go.Histogram(x=df['Total'], nbinsx=20), row=2, col=1)

# 5. Box
fig.add_trace(go.Box(y=df['Total'], x=df['Income Group']), row=2, col=2)

# 6. Heatmap
fig.add_trace(go.Heatmap(z=corr.values, x=corr.columns, y=corr.columns,
                         colorscale="RdBu", zmid=0), row=2, col=3)

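# 7. Bubble Plot
# `Total` is a fraction in [0, 1]; scale it up to pixel sizes (about 30 px at most),
# since dividing by 5 would give markers smaller than one pixel.
fig.add_trace(go.Scatter(x=df['Rural'], y=df['Urban'],
                         mode='markers',
                         text=df['Countries and areas'],
                         marker=dict(size=df['Total'].fillna(0) * 30, color=df['Total'],
                                     colorscale="Viridis", showscale=False, opacity=0.6)),
              row=3, col=1)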

# 8. Choropleth
fig.add_trace(go.Choropleth(locations=df['ISO3'], z=df['Total'],
                            text=df['Countries and areas'],
                            colorscale="Blues", marker_line_color="white"),
              row=3, col=2)

fig.update_layout(height=1200, width=1400,
                  title_text="Comprehensive Dashboard: School-Age Digital Connectivity",
                  showlegend=False)
fig.show()


## Comprehensive Dashboard Explanation

The dashboard brings together eight complementary visualizations that provide a holistic view of **School-Age Digital Connectivity** across countries, regions, and income groups.  

1. **Median Total by Region (Bar Chart):** Highlights regional disparities in connectivity by ranking regions on their median connectivity.  
2. **Urban vs Rural Scatter Plot:** Reveals the digital divide between urban and rural areas, with each point representing a country.  
3. **Income Group Distribution (Donut Chart):** Provides a quick overview of how countries are classified by income level, which influences connectivity outcomes.  
4. **Histogram of Total Connectivity:** Shows the distribution of total connectivity scores, helping identify whether connectivity is skewed toward high or low values.  
5. **Box Plot by Income Group:** Summarizes variations and outliers in connectivity within each income category, emphasizing inequality within groups.  
6. **Correlation Heatmap:** Explores relationships between Total, Rural, Urban, Poorest, and Richest, showing which indicators move together.  
7. **Bubble Plot (Urban vs Rural, size=Total):** Adds another layer to the scatter plot by encoding total connectivity as bubble size, giving more context.  
8. **Choropleth Map:** Provides a geographical perspective, showing how digital connectivity varies globally.  

### Design Principles Applied
- **Clarity and readability:** Each chart type was chosen to best represent its data.  
- **Comparisons enabled:** Side-by-side visuals make it easy to spot disparities.  
- **Color coding:** Sequential and diverging color scales emphasize patterns while avoiding clutter.  
- **Balance:** A grid layout with eight visuals ensures coverage of multiple perspectives without overwhelming the viewer.  

Overall, this dashboard supports **data storytelling** by combining regional, economic, and geographical dimensions, helping identify both **gaps** and **opportunities** in digital connectivity.  
