# Climate Science Quick Start: Global Temperature Analysis

**Duration:** 10-30 minutes  
**Goal:** Analyze 144 years of global temperature data to understand climate change trends

## What You'll Learn

- Load and explore NASA GISTEMP global temperature anomaly data (1880-2024)
- Calculate warming trends using linear regression
- Visualize temperature changes over time
- Detect statistically significant climate patterns
- Understand the scientific evidence for global warming

## Dataset

We'll use the **NASA GISTEMP** dataset:
- Monthly global land-ocean temperature anomalies
- Anomaly = difference from 1951-1980 baseline
- Data from 1880 to present
- Source: NASA Goddard Institute for Space Studies (GISS)

🌍 **No AWS account or API keys needed - let's get started!**

## 1. Setup and Data Loading

In [None]:
# Import libraries (all pre-installed in Colab/Studio Lab)
import warnings
from datetime import datetime

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from scipy import stats

warnings.filterwarnings("ignore")

# Set visualization style
sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)
plt.rcParams["font.size"] = 11

print("✓ Libraries loaded successfully!")
print(f"Analysis date: {datetime.now().strftime('%Y-%m-%d')}")

In [None]:
# Load the NASA GISTEMP global temperature anomaly data
# This is a direct link to the public CSV file
url = "https://data.giss.nasa.gov/gistemp/tabledata_v4/GLB.Ts+dSST.csv"

# Read data (skip first row which is metadata)
df = pd.read_csv(url, skiprows=1)

# Display basic info
print(f"✓ Loaded {len(df)} years of temperature data (1880-{df['Year'].max()})")
print(f"  Columns: {list(df.columns[:14])}...")  # Show first 14 columns
df.head()
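To see why `skiprows=1` is needed, it helps to look at the shape of the file. The sketch below parses a small hand-written sample that imitates the layout: a title line, then a header row with `Year`, monthly columns, and `J-D`. The values are made up and the column list is shortened, so treat it as an assumption about the format rather than a copy of the real file.

```python
import csv
import io

# Hypothetical sample imitating the file layout; values are illustrative only.
sample = """Land-Ocean: Global Means
Year,Jan,Feb,J-D
1880,-0.20,-0.25,-0.18
1881,-0.10,-0.15,-0.09
2099,0.50,***,***
"""

lines = sample.splitlines()
# Skip the first (title) line, as skiprows=1 does in pd.read_csv
reader = csv.DictReader(io.StringIO("\n".join(lines[1:])))

annual = {}
for row in reader:
    # "***" marks a value that is not available (e.g. an incomplete year)
    if row["J-D"] != "***":
        annual[int(row["Year"])] = float(row["J-D"])

print(annual)
```

The incomplete year is dropped because its annual mean is still a placeholder, which is the same filtering the notebook applies in the data preparation step.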

### Understanding Temperature Anomalies

**Temperature Anomaly** = Observed Temperature - Baseline Temperature

- Baseline: Average temperature from 1951-1980
- **Positive anomaly:** Warmer than baseline
- **Negative anomaly:** Cooler than baseline
- Units: Degrees Celsius (°C)

Example: An anomaly of **+1.2°C** means that period was 1.2°C warmer than the 1951-1980 average.
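The definition above can be sketched in a few lines of plain Python. The temperatures here are hypothetical, and the baseline is averaged over only three sample years rather than the full 1951-1980 period, so this only illustrates the arithmetic.

```python
# Hypothetical absolute temperatures in °C (illustrative values, not real data)
temps = {1951: 14.0, 1965: 13.9, 1980: 14.1, 2020: 15.0}

# Baseline: mean over the reference years (here only three sample years)
baseline_years = [y for y in temps if 1951 <= y <= 1980]
baseline = sum(temps[y] for y in baseline_years) / len(baseline_years)

# Anomaly = observed temperature - baseline temperature
anomalies = {y: round(t - baseline, 2) for y, t in temps.items()}
print(anomalies)
```

Years inside the baseline period come out close to zero, while the later year shows a positive anomaly.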

## 2. Data Preparation

In [None]:
# Extract the annual averages
# The 'J-D' column contains the annual mean (January-December)
annual_data = df[["Year", "J-D"]].copy()
annual_data.columns = ["Year", "Anomaly"]

# Remove rows with missing data
annual_data = annual_data[annual_data["Anomaly"] != "***"]
annual_data["Anomaly"] = annual_data["Anomaly"].astype(float)
annual_data["Year"] = annual_data["Year"].astype(int)

print(f"✓ Prepared {len(annual_data)} years of annual temperature anomalies")
print(f"\nData range: {annual_data['Year'].min()} to {annual_data['Year'].max()}")
print(
    f"Temperature anomaly range: {annual_data['Anomaly'].min():.2f}°C to {annual_data['Anomaly'].max():.2f}°C"
)

In [None]:
# Calculate basic statistics
print("\n=== Temperature Anomaly Statistics ===")
print(annual_data["Anomaly"].describe())

# Split data into periods
period_1 = annual_data[annual_data["Year"] <= 1950]
period_2 = annual_data[(annual_data["Year"] > 1950) & (annual_data["Year"] <= 2000)]
period_3 = annual_data[annual_data["Year"] > 2000]

print("\n=== Average Anomaly by Period ===")
print(f"1880-1950: {period_1['Anomaly'].mean():.3f}°C")
print(f"1951-2000: {period_2['Anomaly'].mean():.3f}°C")
print(f"2001-{annual_data['Year'].max()}: {period_3['Anomaly'].mean():.3f}°C")
print(
    f"\n⚠️  Temperature increase from early to recent period: {period_3['Anomaly'].mean() - period_1['Anomaly'].mean():.2f}°C"
)

## 3. Trend Analysis

In [None]:
# Calculate linear regression trend
slope, intercept, r_value, p_value, std_err = stats.linregress(
    annual_data["Year"], annual_data["Anomaly"]
)

# Calculate trend line
annual_data["Trend"] = slope * annual_data["Year"] + intercept

# Calculate warming rate
warming_per_decade = slope * 10
warming_per_century = slope * 100
total_warming = annual_data["Trend"].iloc[-1] - annual_data["Trend"].iloc[0]

print("=== Warming Trend Analysis ===")
print(f"Linear trend slope: {slope:.5f}°C per year")
print(f"Warming rate: {warming_per_decade:.3f}°C per decade")
print(f"Warming rate: {warming_per_century:.2f}°C per century")
print(f"Total warming (1880-{annual_data['Year'].max()}): {total_warming:.2f}°C")
print(f"\nR² value: {r_value**2:.4f} (how well the trend fits the data)")
print(f"P-value: {p_value:.2e} (statistical significance - very low means highly significant)")

if p_value < 0.001:
    print("\n✓ The warming trend is HIGHLY STATISTICALLY SIGNIFICANT (p < 0.001)")
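For readers curious about what the slope from `stats.linregress` represents, here is a minimal sketch of an ordinary least-squares fit in plain Python. The helper name `linear_trend` and the input series are hypothetical; the series is constructed to warm by exactly 0.02°C per year so the result is easy to check.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical series warming by exactly 0.02 °C per year
years = list(range(2000, 2010))
anomalies = [0.40 + 0.02 * (y - 2000) for y in years]

slope, intercept = linear_trend(years, anomalies)
print(f"{slope * 10:.2f} °C per decade")
```

Multiplying the per-year slope by 10 gives the per-decade rate, which is the same conversion used for `warming_per_decade`.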

## 4. Visualizations

In [None]:
# Main visualization: Temperature anomaly over time
fig, ax = plt.subplots(figsize=(14, 7))

# Plot annual anomalies
ax.plot(
    annual_data["Year"],
    annual_data["Anomaly"],
    color="steelblue",
    linewidth=1.5,
    label="Annual Anomaly",
    alpha=0.7,
)

# Plot trend line
ax.plot(
    annual_data["Year"],
    annual_data["Trend"],
    color="red",
    linewidth=2.5,
    label=f"Linear Trend ({warming_per_decade:.2f}°C/decade)",
    linestyle="--",
)

# Add baseline reference
ax.axhline(y=0, color="gray", linestyle="-", linewidth=1, alpha=0.5, label="1951-1980 Baseline")

# Highlight recent warming
ax.fill_between(
    annual_data[annual_data["Year"] >= 2000]["Year"],
    0,
    annual_data[annual_data["Year"] >= 2000]["Anomaly"],
    alpha=0.2,
    color="red",
    label="21st Century Warming",
)

# Formatting
ax.set_xlabel("Year", fontsize=13, fontweight="bold")
ax.set_ylabel("Temperature Anomaly (°C)", fontsize=13, fontweight="bold")
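ax.set_title(
    # Use the last year in the data so the title stays correct as the file is updated
    f"Global Temperature Anomaly (1880-{annual_data['Year'].max()})\nRelative to 1951-1980 Baseline",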
    fontsize=15,
    fontweight="bold",
    pad=20,
)
ax.legend(loc="upper left", fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("📈 The plot shows a clear warming trend that steepens after about 1980")

In [None]:
# Create a 5-year moving average to smooth short-term fluctuations
annual_data["Moving_Avg_5yr"] = annual_data["Anomaly"].rolling(window=5, center=True).mean()

fig, ax = plt.subplots(figsize=(14, 7))

# Plot both annual and smoothed data
ax.plot(
    annual_data["Year"],
    annual_data["Anomaly"],
    color="lightblue",
    linewidth=1,
    label="Annual Anomaly",
    alpha=0.5,
)
ax.plot(
    annual_data["Year"],
    annual_data["Moving_Avg_5yr"],
    color="darkblue",
    linewidth=2.5,
    label="5-Year Moving Average",
)

ax.axhline(y=0, color="gray", linestyle="-", linewidth=1, alpha=0.5)

ax.set_xlabel("Year", fontsize=13, fontweight="bold")
ax.set_ylabel("Temperature Anomaly (°C)", fontsize=13, fontweight="bold")
ax.set_title(
    "Global Temperature with 5-Year Moving Average", fontsize=15, fontweight="bold", pad=20
)
ax.legend(loc="upper left", fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("📊 The moving average smooths year-to-year variability, making the long-term trend clearer")
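As a rough illustration of what a centered rolling mean does, the sketch below computes one by hand. The helper name `centered_moving_average` is hypothetical and it assumes an odd window size. Positions without a full window are left empty, which is why the smoothed line in the plot is missing its first and last two years.

```python
def centered_moving_average(values, window=5):
    """Centered moving average; edges without a full window are None."""
    half = window // 2
    out = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            out.append(None)  # pandas would report NaN here
        else:
            chunk = values[i - half : i + half + 1]
            out.append(sum(chunk) / window)
    return out


values = [1, 2, 3, 4, 5, 6, 7]
print(centered_moving_average(values))
```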

In [None]:
# Acceleration analysis: Compare warming rates across periods
periods = [
    (1880, 1950, "Early Period"),
    (1950, 1980, "Mid-Century"),
    (1980, 2000, "Late 20th Century"),
    (2000, 2024, "21st Century"),
]

warming_rates = []
period_labels = []

print("=== Warming Rate by Period ===")
for start, end, label in periods:
    period_data = annual_data[(annual_data["Year"] >= start) & (annual_data["Year"] <= end)]
    if len(period_data) > 1:
        slope_period, _, _, _, _ = stats.linregress(period_data["Year"], period_data["Anomaly"])
        rate_per_decade = slope_period * 10
        warming_rates.append(rate_per_decade)
        period_labels.append(label)
        print(f"{label} ({start}-{end}): {rate_per_decade:.3f}°C/decade")

# Visualize acceleration
fig, ax = plt.subplots(figsize=(10, 6))
colors = ["skyblue", "lightcoral", "orange", "red"]
bars = ax.bar(period_labels, warming_rates, color=colors, alpha=0.8, edgecolor="black")

ax.set_ylabel("Warming Rate (°C/decade)", fontsize=12, fontweight="bold")
ax.set_title("Acceleration of Global Warming Over Time", fontsize=14, fontweight="bold", pad=15)
ax.grid(True, alpha=0.3, axis="y")

# Add value labels on bars
for bar, rate in zip(bars, warming_rates):
    height = bar.get_height()
    ax.text(
        bar.get_x() + bar.get_width() / 2.0,
        height,
        f"{rate:.3f}°C",
        ha="center",
        va="bottom",
        fontsize=11,
        fontweight="bold",
    )

plt.xticks(rotation=15, ha="right")
plt.tight_layout()
plt.show()

print(
    f"\n⚠️  Warming has accelerated by {(warming_rates[-1] / warming_rates[0]):.1f}x from early period to 21st century"
)

## 5. Statistical Significance Testing

In [None]:
# Compare pre-1980 vs post-1980 temperatures
pre_1980 = annual_data[annual_data["Year"] < 1980]["Anomaly"]
post_1980 = annual_data[annual_data["Year"] >= 1980]["Anomaly"]

# Perform t-test
t_stat, p_val_ttest = stats.ttest_ind(pre_1980, post_1980)

print("=== Statistical Comparison: Pre-1980 vs Post-1980 ===")
print(f"\nPre-1980 average anomaly:  {pre_1980.mean():.3f}°C (n={len(pre_1980)} years)")
print(f"Post-1980 average anomaly: {post_1980.mean():.3f}°C (n={len(post_1980)} years)")
print(f"Difference: {post_1980.mean() - pre_1980.mean():.3f}°C")
print(f"\nT-statistic: {t_stat:.2f}")
print(f"P-value: {p_val_ttest:.2e}")

if p_val_ttest < 0.001:
    print("\n✓ The temperature difference is HIGHLY STATISTICALLY SIGNIFICANT (p < 0.001)")
    print("  If both periods had the same mean, a difference this large would arise")
    print("  by chance less than 0.1% of the time.")
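By default, `stats.ttest_ind` assumes both groups have equal variance. When that is doubtful, Welch's variant (available in SciPy via `equal_var=False`) is a common alternative. The sketch below computes Welch's t statistic by hand to show the arithmetic. The helper name `welch_t` and the sample values are hypothetical, and the sketch does not account for year-to-year autocorrelation in real temperature series.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    # Standard error of the difference in means, without pooling variances
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical anomalies in °C (illustrative values, not real data)
early = [-0.2, -0.1, 0.0, -0.3, -0.1]
late = [0.4, 0.6, 0.5, 0.7, 0.9]

t = welch_t(early, late)
print(f"t = {t:.2f}")
```

The statistic is negative because the earlier sample is passed first and has the lower mean, matching the argument order used in the cell above.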

In [None]:
# Find the warmest years
warmest_years = annual_data.nlargest(10, "Anomaly")[["Year", "Anomaly"]]

print("=== Top 10 Warmest Years on Record ===")
print(warmest_years.to_string(index=False))

recent_years = warmest_years[warmest_years["Year"] >= 2000]
print(f"\n⚠️  {len(recent_years)} of the 10 warmest years occurred since 2000")
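The ranking step can also be sketched without pandas. The records below are hypothetical (year, anomaly) pairs, and the logic mirrors `nlargest` followed by a filter on the year.

```python
# Hypothetical (year, anomaly) pairs; illustrative values only
records = [(1998, 0.61), (2005, 0.68), (2010, 0.72), (1944, 0.21), (2016, 1.01)]

# Sort by anomaly, largest first, and keep the top 3
top3 = sorted(records, key=lambda r: r[1], reverse=True)[:3]

# Count how many of the top entries fall in 2000 or later
since_2000 = sum(1 for year, _ in top3 if year >= 2000)
print(top3, since_2000)
```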

## 6. Key Findings Summary

In [None]:
# Generate summary report
print("=" * 60)
print("CLIMATE ANALYSIS SUMMARY")
print("=" * 60)
print(
    f"\n📅 Data Period: {annual_data['Year'].min()} to {annual_data['Year'].max()} ({len(annual_data)} years)"
)
print("\n🌡️  TEMPERATURE TRENDS:")
print(f"   • Overall warming: {total_warming:.2f}°C since 1880")
print(f"   • Average warming rate (full record): {warming_per_decade:.2f}°C per decade")
print(
    f"   • Warmest year on record: {warmest_years.iloc[0]['Year']:.0f} ({warmest_years.iloc[0]['Anomaly']:.2f}°C)"
)
print(
    f"   • {len(warmest_years[warmest_years['Year'] >= 2000])}/10 warmest years occurred since 2000"
)
print("\n📊 STATISTICAL SIGNIFICANCE:")
print(f"   • Linear trend p-value: {p_value:.2e} (highly significant)")
print(f"   • Pre/post-1980 comparison p-value: {p_val_ttest:.2e} (highly significant)")
print(f"   • R² of linear trend: {r_value**2:.4f} (strong correlation)")
print("\n⚡ ACCELERATION:")
print(f"   • 1880-1950: {warming_rates[0]:.3f}°C/decade")
print(f"   • 2000-2024: {warming_rates[-1]:.3f}°C/decade")
print(f"   • Acceleration factor: {(warming_rates[-1] / warming_rates[0]):.1f}x")
print("\n✅ CONCLUSION:")
print("   The data shows clear, statistically significant evidence of global warming.")
print("   The warming trend is accelerating, with recent decades showing the")
print("   highest temperatures in the entire 144-year record.")
print("=" * 60)

## 🎓 What You Learned

In just 10-30 minutes, you:

1. ✅ Loaded and explored 144 years of global temperature data
2. ✅ Calculated warming trends using statistical methods
3. ✅ Created professional visualizations
4. ✅ Performed significance testing
5. ✅ Analyzed the acceleration of climate change
6. ✅ Understood the scientific evidence for global warming

## 🚀 Next Steps

### Ready for More?

**Tier 1: SageMaker Studio Lab (1-2 hours, free)**
- Analyze multiple climate variables (precipitation, sea level, etc.)
- Use persistent storage for larger datasets
- Build more complex models with saved checkpoints
- Collaborate with team members

**Tier 2: AWS Starter (2-4 hours, $5-15)**
- Store climate data in S3
- Process data with Lambda functions
- Query historical data with Athena
- Set up automated monitoring

**Tier 3: Production Infrastructure (4-5 days, $50-500/month)**
- Multi-model ensemble analysis with 20+ CMIP6 models
- Distributed computing with Dask on AWS Batch
- Zarr data format on S3 for 100GB+ datasets
- AI-powered climate insights with Amazon Bedrock
- Full CloudFormation deployment

## 📚 Learn More

- **Dataset Source:** [NASA GISS Surface Temperature Analysis](https://data.giss.nasa.gov/gistemp/)
- **IPCC Reports:** [Climate Change 2023 Synthesis Report](https://www.ipcc.ch/report/ar6/syr/)
- **NOAA Climate Data:** [National Centers for Environmental Information](https://www.ncei.noaa.gov/)

---

**🤖 Generated with [Claude Code](https://claude.com/claude-code)**