# Data Visualization I: Telling Stories with Plotly

This notebook demonstrates how to use Plotly for data visualization, covering basic and advanced plots, customization, interactivity, storytelling, and best practices.

## 1. Import Required Libraries
We'll use pandas for data handling, plotly.express for concise high-level charts, and plotly.graph_objects for finer control over figures and traces.

In [1]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

## 2. Load Example Dataset
We'll load the average monthly surface temperature dataset from the data folder.

In [2]:
df = pd.read_csv('../data/average-monthly-surface-temperature.csv')
# Convert 'Day' column to datetime
df['Day'] = pd.to_datetime(df['Day'])
df['Year'] = df['Day'].dt.year
df_2007 = df[df['Year'] == 2007]  # Focus on 2007 for clarity
print(f"Dataset shape: {df.shape}")
print(f"Dataset shape for 2007: {df_2007.shape}")
df.head()

Dataset shape: (220029, 5)
Dataset shape for 2007: (2556, 5)


|   | Entity | Code | Day | Monthly average | Year |
|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1940-01-15 | -2.032494 | 1940 |
| 1 | Afghanistan | AFG | 1940-02-15 | -0.733503 | 1940 |
| 2 | Afghanistan | AFG | 1940-03-15 | 1.999134 | 1940 |
| 3 | Afghanistan | AFG | 1940-04-15 | 10.199754 | 1940 |
| 4 | Afghanistan | AFG | 1940-05-15 | 17.942135 | 1940 |
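The 2007 subset has 2,556 rows. If every entity reports all 12 months, that would correspond to 213 entities (2,556 / 12). Before aggregating, it is worth confirming that no entity is missing months. The check below is a sketch. `check_monthly_coverage` is a hypothetical helper, and the toy frame stands in for `df`.

```python
import pandas as pd

def check_monthly_coverage(frame: pd.DataFrame, year: int) -> pd.Series:
    """Hypothetical helper: distinct months reported per entity in `year`.

    Returns only entities with fewer than 12 months, so an empty
    result means every entity is complete for that year.
    """
    subset = frame[frame["Day"].dt.year == year]
    months = (subset.assign(Month=subset["Day"].dt.month)
                    .groupby("Entity")["Month"].nunique())
    return months[months < 12]

# Toy stand-in for df: entity "A" has all 12 months, "B" is missing December
days = pd.to_datetime([f"2007-{m:02d}-15" for m in range(1, 13)])
toy = pd.concat([
    pd.DataFrame({"Entity": "A", "Day": days, "Monthly average": 10.0}),
    pd.DataFrame({"Entity": "B", "Day": days[:11], "Monthly average": 20.0}),
], ignore_index=True)

incomplete = check_monthly_coverage(toy, 2007)
print(incomplete)  # only "B" is listed, with 11 months
```

On the real data, you would call `check_monthly_coverage(df, 2007)`.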


## 3. Bar Plot: Average Temperature by Country
Let's rank countries by their mean temperature in 2007 (the average of their 12 monthly values) and plot the 20 warmest.

In [3]:
# Mean of the 12 monthly averages per country in 2007
temp_by_entity = df_2007.groupby('Entity')['Monthly average'].mean().reset_index()
temp_by_entity.columns = ['Entity', 'AvgTemp']
# Keep the 20 warmest countries
temp_by_entity = temp_by_entity.sort_values('AvgTemp', ascending=False).head(20)
fig = px.bar(temp_by_entity, x='AvgTemp', y='Entity', orientation='h',
             title='Top 20 Countries by Average Temperature (2007)',
             labels={'AvgTemp': 'Average Temperature (°C)', 'Entity': 'Country'})
# Horizontal bars are drawn bottom-up; order the axis so the warmest country is on top
fig.update_yaxes(categoryorder='total ascending')
fig.show()
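Averaging the 12 monthly values gives each month equal weight, even though February is shorter than January. The difference is usually small. If you want a day-weighted annual mean instead, one option is to weight each month by its length using `Series.dt.days_in_month`. The sketch below uses made-up values with an exaggeratedly cold February so that the effect is visible.

```python
import pandas as pd

# Toy stand-in for df_2007: one entity, a cold February, all other months at 10 °C
days = pd.to_datetime([f"2007-{m:02d}-15" for m in range(1, 13)])
toy = pd.DataFrame({"Entity": "A", "Day": days,
                    "Monthly average": [10.0, -20.0] + [10.0] * 10})

# Equal-weight mean, as used for the bar plot above
equal_weight = toy.groupby("Entity")["Monthly average"].mean()

# Day-weighted mean: sum(temperature * days) / sum(days) per entity
toy["days"] = toy["Day"].dt.days_in_month
toy["weighted"] = toy["Monthly average"] * toy["days"]
sums = toy.groupby("Entity")[["weighted", "days"]].sum()
day_weighted = sums["weighted"] / sums["days"]

print(equal_weight["A"], day_weighted["A"])
```

Because February has only 28 days in 2007, its cold value pulls the day-weighted mean down less than the equal-weight mean.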

## 4. Bar Plot: Temperature Range by Country
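Next, a bar plot compares each country's 2007 temperature range, defined as the difference between its warmest and coldest monthly averages. Only the 15 countries with the largest range are shown.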

In [4]:
# Coldest and warmest monthly average per country in 2007
temp_range = df_2007.groupby('Entity')['Monthly average'].agg(['min', 'max']).reset_index()
temp_range['range'] = temp_range['max'] - temp_range['min']
# Keep the 15 countries with the largest seasonal swing
temp_range = temp_range.sort_values('range', ascending=False).head(15)
fig = px.bar(temp_range, x='Entity', y='range',
             title='Top 15 Countries by Temperature Range (2007)',
             labels={'range': 'Temperature Range (°C)', 'Entity': 'Country'})
fig.show()
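The range does not tell you *when* the extremes occur. Calling `idxmax` and `idxmin` on a grouped Series returns the row label of each group's extreme value, and you can look up the month from that label. The sketch below uses toy data with made-up values. On the real data, you would apply the same steps to `df_2007`.

```python
import pandas as pd

days = pd.to_datetime([f"2007-{m:02d}-15" for m in range(1, 13)])
# Toy data: "North" peaks in July, "South" peaks in January (made-up values)
north = [0, 2, 5, 9, 14, 18, 21, 20, 16, 10, 5, 1]
south = [25, 24, 22, 18, 14, 11, 10, 12, 15, 18, 21, 24]
toy = pd.concat([
    pd.DataFrame({"Entity": "North", "Day": days, "Monthly average": north}),
    pd.DataFrame({"Entity": "South", "Day": days, "Monthly average": south}),
], ignore_index=True)

grouped = toy.groupby("Entity")["Monthly average"]
# idxmax/idxmin give the row label of each entity's warmest/coldest month
warmest_rows = grouped.idxmax()
coldest_rows = grouped.idxmin()

extremes = pd.DataFrame({
    "warmest_month": toy.loc[warmest_rows, "Day"].dt.month.to_numpy(),
    "coldest_month": toy.loc[coldest_rows, "Day"].dt.month.to_numpy(),
    "range": (grouped.max() - grouped.min()).to_numpy(),
}, index=warmest_rows.index)
print(extremes)
```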

## 5. Line Plot: Temperature Over Time
Let's plot the full monthly record for one country, France, to see how its temperature has changed over time.

In [5]:
fig = px.line(df[df['Entity'] == 'France'], x='Day', y='Monthly average',
              title='Monthly Average Temperature in France Over Time',
              labels={'Monthly average': 'Temperature (°C)', 'Day': 'Date'})
fig.show()

## 6. Customizing Visualizations
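Plotly Express arguments such as `color` and `labels`, together with `fig.update_layout`, let you control colors, axis labels, legend titles, and the plot background for clarity and impact.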

In [6]:
# Compare monthly temperature trends for selected countries in 2007
selected_countries = ['Canada', 'Brazil', 'Egypt', 'Russia']
df_selected = df_2007[df_2007['Entity'].isin(selected_countries)]
fig = px.line(df_selected, x='Day', y='Monthly average',
              color='Entity',
              title='Monthly Temperature Trends for Selected Countries (2007)',
              labels={'Monthly average': 'Temperature (°C)', 'Day': 'Date'})
fig.update_layout(legend_title_text='Country',
                  plot_bgcolor='rgba(240,240,240,0.95)')
fig.show()

## 7. Interactive Charts
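Plotly charts are interactive by default: drag to zoom, hover to read values, and click legend entries to hide or show individual series.

Let's create an interactive box plot showing how 2007 monthly temperatures are distributed for the eight warmest countries.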

In [7]:
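# Box plot of 2007 monthly temperatures for the 8 warmest countries (by mean)
top_entities = df_2007.groupby('Entity')['Monthly average'].mean().nlargest(8).index
df_top = df_2007[df_2007['Entity'].isin(top_entities)]

# color='Entity' gives one trace per country, so clicking the legend filters them;
# points='all' shows each month as a point whose hover includes its date
fig = px.box(df_top, x='Entity', y='Monthly average', color='Entity',
             points='all', hover_data=['Day'],
             title='Monthly Temperature Distribution for the 8 Warmest Countries (2007)',
             labels={'Monthly average': 'Temperature (°C)', 'Entity': 'Country'})

fig.update_layout(
    hovermode='x unified',
    showlegend=True
)

fig.show()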

## 8. Assignment: Tell a Story with Global Temperature Data

**Create your own analysis of the global temperature dataset using at least three different Plotly chart types.** Your notebook should reveal patterns, trends, and insights about global temperatures over time.

### Requirements:
1. **Explore the temperature dataset** - Load and examine the data
2. **Create at least three different visualizations** using different chart types (you can use bar, line, scatter, box plots, etc.)
3. **Customize your charts** with meaningful titles, labels, colors, and legends
4. **Add interactivity** - Use hover information, zooming, or filtering features
5. **Interpret your findings** - Write markdown cells explaining what patterns you discovered and what they reveal

### Suggested Analyses:
- Compare average temperatures across different countries or regions
- Track how temperatures have changed over specific time periods
- Identify seasonal patterns in temperature data
- Compare temperature ranges (min/max variations) across locations
- Analyze temperature trends across decades
- Explore relationships between different geographic regions

### Tips:
- Use filtering, grouping, and aggregation to focus on specific insights
- Consider different time periods (a single year, a decade, or longer trends)
- Make your visualizations tell a clear story about climate patterns
- Experiment with different color schemes to make comparisons stand out