# Data Visualization I: Telling Stories with Plotly

This notebook demonstrates how to use Plotly for data visualization, covering basic and advanced plots, customization, interactivity, storytelling, and best practices.

## 1. Import Required Libraries
We'll use pandas for data handling, plotly.express for quick charts, and plotly.graph_objects for finer control over individual traces.

In [59]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

## 2. Load Example Dataset
We'll load the average monthly surface temperature dataset from the data folder.

In [60]:
df = pd.read_csv('../data/average-monthly-surface-temperature.csv')
# Convert 'Day' column to datetime
df['Day'] = pd.to_datetime(df['Day'])
df['Year'] = df['Day'].dt.year

df_2007 = df[df['Year'] == 2007]  # Focus on 2007 for clarity

print(f"Dataset shape: {df.shape}")
print(f"Dataset shape for 2007: {df_2007.shape}")
df_2007.head()

Dataset shape: (220029, 5)
Dataset shape for 2007: (2556, 5)


Unnamed: 0,Entity,Code,Day,Monthly average,Year
804,Afghanistan,AFG,2007-01-15,-2.11214,2007
805,Afghanistan,AFG,2007-02-15,2.100871,2007
806,Afghanistan,AFG,2007-03-15,5.591297,2007
807,Afghanistan,AFG,2007-04-15,15.443104,2007
808,Afghanistan,AFG,2007-05-15,19.199982,2007
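
The date conversion above can also be done while reading the file. The sketch below is only an illustration: it uses a small in-memory CSV with made-up rounded values (the name `csv_text` is hypothetical) rather than the real dataset, and relies on the `parse_dates` argument of `pd.read_csv`.

```python
import io
import pandas as pd

# Tiny stand-in for the real CSV (values rounded, for illustration only)
csv_text = """Entity,Code,Day,Monthly average
Afghanistan,AFG,2007-01-15,-2.1
Afghanistan,AFG,2007-02-15,2.1
"""

# parse_dates converts the 'Day' column to datetime during loading
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["Day"])

# The .dt accessor is available because 'Day' is a datetime column
toy["Year"] = toy["Day"].dt.year
```

Either approach gives a datetime column, which is what the `.dt.year` and `.dt.month` filters in this notebook need.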


In [61]:
df[df['Code'].isna()].Entity.unique()

array(['Africa (NIAID)', 'Antarctica (NIAID)', 'Arctic Ocean (NIAID)',
       'Asia (NIAID)', 'Australia (NIAID)', 'Baltic Sea (NIAID)',
       'Europe (NIAID)', 'Indian Ocean (NIAID)',
       'Mediterranean Region (NIAID)', 'North America (NIAID)',
       'North Atlantic Ocean (NIAID)', 'North Pacific Ocean (NIAID)',
       'Oceania (NIAID)', 'South America (NIAID)',
       'South Atlantic Ocean (NIAID)',
       'South China and Easter Archipelagic Seas (NIAID)',
       'South Pacific Ocean (NIAID)', 'Southern Ocean (NIAID)'],
      dtype=object)

In [62]:
df_2007.nunique()

Entity              213
Code                195
Day                  12
Monthly average    2552
Year                  1
dtype: int64
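
The difference between 213 entities and 195 codes is consistent with the 18 entities without a code listed earlier. A minimal sketch of that split, using made-up toy rows (the names `toy`, `countries`, and `aggregates` are assumptions for illustration):

```python
import pandas as pd

# Toy rows (made up): the regional aggregate has no country code
toy = pd.DataFrame({
    "Entity": ["France", "France", "Europe (NIAID)", "Brazil"],
    "Code": ["FRA", "FRA", None, "BRA"],
    "Monthly average": [5.0, 18.0, 9.0, 25.0],
})

# Rows with a code are treated as countries, the rest as aggregates
is_country = toy["Code"].notna()
countries = toy[is_country]
aggregates = toy[~is_country]

n_entities = toy["Entity"].nunique()
n_codes = toy["Code"].nunique()  # nunique() skips missing values by default
```

Because `nunique()` ignores missing values, entities without a code add to the `Entity` count but not to the `Code` count.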

## 3. Histogram: Distribution of Average Temperature by Country
Let's visualize how the 2007 average temperatures are distributed across countries.

In [63]:
# Average 2007 temperature per country (drop entities without a code, and 'World')
temp_by_entity = df_2007.groupby(['Entity', 'Code'])['Monthly average'].mean().reset_index()
temp_by_entity = temp_by_entity[temp_by_entity['Code'].notna() & (temp_by_entity['Entity'] != 'World')]

# Histogram of the per-country averages
fig = px.histogram(temp_by_entity, x='Monthly average',
                   title='Distribution of Average Temperatures by Country (2007)',
                   labels={'Monthly average': 'Average Temperature (°C)'},
                   nbins=30)
# The y-axis is a computed count, so its title is set on the layout
fig.update_layout(yaxis_title='Number of Countries')
fig.show()

# Alternative: horizontal bar chart of the 20 warmest countries
# top20 = temp_by_entity.rename(columns={'Monthly average': 'AvgTemp'})
# top20 = top20.sort_values('AvgTemp', ascending=False).head(20)
# fig = px.bar(top20, x='AvgTemp', y='Entity',
#              title='Top 20 Countries by Average Temperature (2007)',
#              labels={'AvgTemp': 'Average Temperature (°C)', 'Entity': 'Country'})
# fig.update_layout(height=600, width=1000)
# fig.show()

## 4. Bar Plot: Temperature Range by Country
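A bar plot of the five entities with the widest gap between their warmest and coldest monthly averages in 2007. The grouping is by `Entity`, so regional aggregates are not filtered out here.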

In [64]:
temp_range = df_2007.groupby('Entity')['Monthly average'].agg(['min', 'max']).reset_index()
temp_range['range'] = temp_range['max'] - temp_range['min']
temp_range = temp_range.sort_values('range', ascending=False).head(5)
fig = px.bar(temp_range, x='Entity', y='range',
             title='Temperature Range by Country (2007)',
             labels={'range': 'Temperature Range (°C)', 'Entity': 'Country'})
fig.show()
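
The same min/max/range computation can be written as one chained expression with named aggregation. This is a sketch on made-up toy data, and the column names `min_temp` and `max_temp` are assumptions chosen for readability, not names from the dataset.

```python
import pandas as pd

# Toy data (made up): entity A has a wide seasonal swing, B a narrow one
toy = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B", "B"],
    "Monthly average": [-10.0, 5.0, 20.0, 24.0, 26.0, 25.0],
})

temp_range = (
    toy.groupby("Entity")["Monthly average"]
       .agg(min_temp="min", max_temp="max")  # named aggregation
       .assign(range=lambda d: d["max_temp"] - d["min_temp"])
       .sort_values("range", ascending=False)
       .reset_index()
)
```

Sorting by `range` before plotting keeps the bars ordered from widest to narrowest swing.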

## 5. Line Plot: Temperature Over Time
Let's see how temperature has changed over time in France, with a 60-month rolling average and a linear trend line overlaid on the monthly values.

In [65]:
import numpy as np

fig = px.line(df[df['Entity'] == 'France'], x='Day', y='Monthly average',
              title='Monthly Average Temperature in France Over Time',
              labels={'Monthly average': 'Temperature (°C)', 'Day': 'Date'})

# Add 60-month rolling average
france_data = df[df['Entity'] == 'France'].sort_values('Day').reset_index(drop=True)
france_data['Rolling_Avg_60m'] = france_data['Monthly average'].rolling(window=5*12, min_periods=1).mean()

fig.add_scatter(x=france_data['Day'], y=france_data['Rolling_Avg_60m'],
                mode='lines', name='60-Month Rolling Average',
                line=dict(color='red', dash='dash'))

# Add trend line using numpy polyfit
z = np.polyfit(np.arange(len(france_data)), france_data['Monthly average'], 1)
p = np.poly1d(z)
trend = p(np.arange(len(france_data)))

fig.add_scatter(x=france_data['Day'], y=trend,
                mode='lines', name='Trend Line',
                line=dict(color='green', dash='dot'))

fig.show()
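
Two details of the cell above are worth checking on a small example: what `min_periods=1` does to the start of the rolling average, and what unit the fitted slope has. The sketch below uses a synthetic series (a seasonal cycle plus a made-up warming rate), not the France data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: seasonal cycle plus a made-up 0.01 °C/month drift
n_months = 240
t = np.arange(n_months)
series = pd.Series(10 + 8 * np.sin(2 * np.pi * t / 12) + 0.01 * t)

# min_periods=1 averages whatever is available, so the first values
# are based on fewer than 60 months
partial = series.rolling(window=60, min_periods=1).mean()

# With the default min_periods (equal to the window), the first 59 values are NaN
strict = series.rolling(window=60).mean()

# polyfit on a month index returns a slope in °C per month
slope, intercept = np.polyfit(t, series, 1)
slope_per_decade = slope * 120  # 120 months in a decade
```

Because the fit runs on a month index, the slope needs rescaling before it is quoted per year or per decade. The seasonal cycle can also pull the fitted slope slightly away from the underlying drift, so fitting annual means is a reasonable alternative.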

In [None]:
df_de = df[df['Entity'] == 'Germany'].sort_values('Day').reset_index(drop=True)
df_de_filtered = df_de[(df_de['Day'].dt.month == 1) | (df_de['Day'].dt.month == 7)].copy()
df_de_filtered['month_number'] = df_de_filtered['Day'].dt.month
df_de_filtered

Unnamed: 0,Entity,Code,Day,Monthly average,Year,month_number
0,Germany,DEU,1940-01-15,-9.419646,1940,1
6,Germany,DEU,1940-07-15,16.151615,1940,7
12,Germany,DEU,1941-01-15,-5.572927,1941,1
18,Germany,DEU,1941-07-15,18.397738,1941,7
24,Germany,DEU,1942-01-15,-8.174922,1942,1
...,...,...,...,...,...,...
1008,Germany,DEU,2024-01-15,1.539955,2024,1
1014,Germany,DEU,2024-07-15,18.881859,2024,7
1020,Germany,DEU,2025-01-15,2.019176,2025,1
1026,Germany,DEU,2025-07-15,18.510778,2025,7


In [67]:
# Create marker plot comparing January vs July temperatures for Germany
fig = go.Figure()

# Filter January data
jan_data = df_de_filtered[df_de_filtered['month_number'] == 1]
# Filter July data
jul_data = df_de_filtered[df_de_filtered['month_number'] == 7]

# Add January markers
fig.add_trace(go.Scatter(
    x=jan_data['Year'],
    y=jan_data['Monthly average'],
    mode='markers',
    name='January',
    marker=dict(color='#4292c6', size=8)
))

# Add July markers
fig.add_trace(go.Scatter(
    x=jul_data['Year'],
    y=jul_data['Monthly average'],
    mode='markers',
    name='July',
    marker=dict(color='#cd1b1c', size=8)
))

fig.update_layout(
    title='Germany: January vs July Temperature Trends Over Time',
    xaxis_title='Year',
    yaxis_title='Temperature (°C)',
    plot_bgcolor='white',
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False)
)

fig.show()

In [68]:
# Create scatter plot comparing January vs July temperatures
january_temps = df[df['Day'].dt.month == 1].rename(columns={'Monthly average': 'Jan_Temp'})[['Entity', 'Jan_Temp', 'Year']]

july_temps = df[df['Day'].dt.month == 7].rename(columns={'Monthly average': 'Jul_Temp'})[['Entity', 'Jul_Temp', 'Year']]

df_jan_july = january_temps.merge(july_temps, on=['Entity', 'Year'])

fig = px.scatter(df_jan_july, x='Jan_Temp', y='Jul_Temp', color='Entity')
fig.update_layout(title='January vs July Temperatures by Country',
                  xaxis_title='January Average Temperature (°C)',
                  yaxis_title='July Average Temperature (°C)')
fig.show()
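
The January/July merge above is an inner join, so any entity-year missing one of the two months is dropped. A minimal sketch on made-up toy rows, which also uses the `validate` argument of `merge` to check that each entity-year appears at most once on each side:

```python
import pandas as pd

# Toy data (made up): entity C has no July value
toy = pd.DataFrame({
    "Entity": ["A", "A", "B", "B", "C"],
    "Day": pd.to_datetime(["2007-01-15", "2007-07-15",
                           "2007-01-15", "2007-07-15", "2007-01-15"]),
    "Monthly average": [-5.0, 18.0, 24.0, 26.0, 3.0],
})
toy["Year"] = toy["Day"].dt.year

jan = (toy[toy["Day"].dt.month == 1]
       .rename(columns={"Monthly average": "Jan_Temp"})[["Entity", "Year", "Jan_Temp"]])
jul = (toy[toy["Day"].dt.month == 7]
       .rename(columns={"Monthly average": "Jul_Temp"})[["Entity", "Year", "Jul_Temp"]])

# validate raises MergeError if a key is duplicated on either side
merged = jan.merge(jul, on=["Entity", "Year"], how="inner", validate="one_to_one")
```

Here entity C drops out of `merged` because it has no July row.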

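## 6. Customizing Charts
Titles, axis labels, legend titles, and background colors can all be adjusted through `update_layout`.

In [69]: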
# Compare monthly temperature trends for selected countries in 2007
selected_countries = ['Canada', 'Brazil', 'Egypt', 'Russia']
df_selected = df_2007[df_2007['Entity'].isin(selected_countries)]
fig = px.line(df_selected, x='Day', y='Monthly average',
              color='Entity',
              title='Customized: Monthly Temperature Trends for Selected Countries (2007)',
              labels={'Monthly average': 'Temperature (°C)', 'Day': 'Date'})
fig.update_layout(legend_title_text='Country',
                  plot_bgcolor='rgba(240,240,240,0.95)')
fig.show()

## 7. Interactive Charts
Plotly charts are interactive by default: try zooming, hovering, and filtering.

Let's create an interactive chart showing temperature distribution.

In [70]:
# Box plot of 2007 monthly temperatures for the 8 entities with the highest mean
top_entities = df_2007.groupby('Entity')['Monthly average'].mean().nlargest(8).index
df_top = df_2007[df_2007['Entity'].isin(top_entities)]

fig = px.box(df_top, x='Entity', y='Monthly average',
             title='Temperature Distribution for the 8 Warmest Countries (2007)',
             labels={'Monthly average': 'Temperature (°C)', 'Entity': 'Country'})

fig.update_layout(
    hovermode='x unified',
    showlegend=True
)

fig.show()

## 8. Assignment: Tell a Story with Global Temperature Data

**Create your own analysis of the global temperature dataset using at least three different Plotly chart types.** Your notebook should reveal patterns, trends, and insights about global temperatures over time.

### Requirements:
1. **Explore the temperature dataset** - Load and examine the data
2. **Create at least three different visualizations** using different chart types (you can use bar, line, scatter, box plots, etc.)
3. **Customize your charts** with meaningful titles, labels, colors, and legends
4. **Add interactivity** - Use hover information, zooming, or filtering features
5. **Interpret your findings** - Write markdown cells explaining what patterns you discovered and what they reveal

### Suggested Analyses:
- Compare average temperatures across different countries or regions
- Track how temperatures have changed over specific time periods
- Identify seasonal patterns in temperature data
- Compare temperature ranges (min/max variations) across locations
- Analyze temperature trends across decades
- Explore relationships between different geographic regions

### Tips:
- Use filtering, grouping, and aggregation to focus on specific insights
- Consider different time periods (a single year, a decade, or longer trends)
- Make your visualizations tell a clear story about climate patterns
- Experiment with different color schemes to make comparisons stand out
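
For the decade-level analysis suggested above, one possible starting point is to derive a decade column by integer division and aggregate on it. This is a sketch on made-up toy values, and `Decade` is a hypothetical column name, not part of the dataset.

```python
import pandas as pd

# Toy data (made up): yearly values spanning three decades
toy = pd.DataFrame({
    "Entity": ["A"] * 4,
    "Year": [1989, 1990, 1999, 2000],
    "Monthly average": [9.0, 10.0, 12.0, 13.0],
})

# Integer division maps 1990..1999 to 1990, 2000..2009 to 2000, and so on
toy["Decade"] = (toy["Year"] // 10) * 10

by_decade = (toy.groupby(["Entity", "Decade"], as_index=False)["Monthly average"]
                .mean())
```

The resulting frame has one row per entity and decade, which can go straight into `px.bar` or `px.line` with `x='Decade'`.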