# Append Climate Data Using the Free Open-Meteo API

This notebook fetches the same climate variables as your NetCDF approach, but from the **free** Open-Meteo historical weather API (free for non-commercial use): no API key or credit card required!


In [14]:
import requests
import numpy as np
from datetime import datetime, timedelta
import time
import csv
import pandas as pd

# NO API KEY NEEDED! Open-Meteo is completely free
print("🌟 Using Open-Meteo API - completely FREE, no registration required!")


🌟 Using Open-Meteo API - completely FREE, no registration required!


In [17]:
def get_weather_data(lat, lon, start_date, end_date):
    """
    Fetch historical weather data from Open-Meteo API (FREE!)
    """
    url = "https://archive-api.open-meteo.com/v1/archive"
    params = {
        'latitude': lat,
        'longitude': lon,
        'start_date': start_date,
        'end_date': end_date,
        'daily': [
            'temperature_2m_min',
            'temperature_2m_max', 
            'temperature_2m_mean',
            'relative_humidity_2m_mean',
            'precipitation_sum',
            'wind_speed_10m_max'
        ],
        'timezone': 'auto'
    }
    
    try:
        # A timeout keeps one stalled request from hanging a long batch run
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error fetching weather data: {e}")
        return None
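
Over thousands of requests, a few will probably fail for transient reasons (timeouts, HTTP 429 or 5xx responses). One possible way to soften that is a `requests.Session` that retries automatically. This is only a sketch and not part of the original notebook: the helper name `make_retrying_session` and the retry settings are assumptions you should tune.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total_retries=3, backoff_factor=1.0):
    """Hypothetical helper: a Session that retries transient HTTP failures."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,  # wait longer after each failed attempt
        status_forcelist=[429, 500, 502, 503, 504],
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


session = make_retrying_session()
# Inside get_weather_data you could then call
# session.get(url, params=params, timeout=30) instead of requests.get(...)
```

Reusing one session also keeps the HTTPS connection open between calls, which tends to help when every row triggers its own request.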


## Example Usage

Let's test the function with a sample location and see what data we get back.


In [18]:
# Example: Get weather data for Siena, Italy (where you have mushroom data)
# Coordinates for Siena: 43.318, 11.330
lat, lon = 43.318, 11.330
start_date = "2023-09-01"  # September is mushroom season
end_date = "2023-09-30"

print(f"🔍 Fetching weather data for Siena ({lat}, {lon})")
print(f"📅 Date range: {start_date} to {end_date}")

weather_data = get_weather_data(lat, lon, start_date, end_date)

if weather_data:
    print("✅ Success! Here's what we got:")
    print(f"📊 Keys in response: {list(weather_data.keys())}")
    
    # Show the daily data structure
    daily_data = weather_data['daily']
    print(f"📈 Daily weather variables: {list(daily_data.keys())}")
    print(f"🗓️  Number of days: {len(daily_data['time'])}")
else:
    print("❌ Failed to fetch weather data")


🔍 Fetching weather data for Siena (43.318, 11.33)
📅 Date range: 2023-09-01 to 2023-09-30
✅ Success! Here's what we got:
📊 Keys in response: ['latitude', 'longitude', 'generationtime_ms', 'utc_offset_seconds', 'timezone', 'timezone_abbreviation', 'elevation', 'daily_units', 'daily']
📈 Daily weather variables: ['time', 'temperature_2m_min', 'temperature_2m_max', 'temperature_2m_mean', 'relative_humidity_2m_mean', 'precipitation_sum', 'wind_speed_10m_max']
🗓️  Number of days: 30
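
Since `daily` is a dictionary of equal-length lists keyed by variable name, it maps naturally onto a pandas DataFrame. The sketch below assumes that structure. The helper name `daily_to_frame` is hypothetical, and the stand-in response is trimmed to two days and two variables, with values copied from the sample output printed further down.

```python
import pandas as pd


def daily_to_frame(response):
    """Hypothetical helper: turn the 'daily' block into a date-indexed DataFrame."""
    frame = pd.DataFrame(response["daily"])
    frame["time"] = pd.to_datetime(frame["time"])
    return frame.set_index("time")


# Stand-in for weather_data, trimmed to two days and two variables
sample_response = {
    "daily": {
        "time": ["2023-09-01", "2023-09-02"],
        "temperature_2m_min": [14.9, 14.1],
        "temperature_2m_max": [25.5, 28.4],
    }
}

daily_df = daily_to_frame(sample_response)
print(daily_df)
```

With the real response you would pass `weather_data` instead of `sample_response`.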


In [19]:
# Let's look at the actual data values
if weather_data:
    daily = weather_data['daily']
    
    print("📋 Sample of the weather data:")
    print("="*50)
    
    # Show first 5 days
    for i in range(min(5, len(daily['time']))):
        date = daily['time'][i]
        temp_min = daily['temperature_2m_min'][i]
        temp_max = daily['temperature_2m_max'][i]
        temp_mean = daily['temperature_2m_mean'][i]
        humidity = daily['relative_humidity_2m_mean'][i]
        precipitation = daily['precipitation_sum'][i]
        wind = daily['wind_speed_10m_max'][i]
        
        print(f"📅 {date}")
        print(f"   🌡️  Temp: {temp_min:.1f}°C to {temp_max:.1f}°C (avg: {temp_mean:.1f}°C)")
        print(f"   💧 Humidity: {humidity:.1f}%")
        print(f"   🌧️  Precipitation: {precipitation:.1f}mm")
        print(f"   💨 Wind: {wind:.1f}km/h")
        print()


📋 Sample of the weather data:
==================================================
📅 2023-09-01
   🌡️  Temp: 14.9°C to 25.5°C (avg: 19.9°C)
   💧 Humidity: 70.0%
   🌧️  Precipitation: 0.0mm
   💨 Wind: 11.2km/h

📅 2023-09-02
   🌡️  Temp: 14.1°C to 28.4°C (avg: 21.2°C)
   💧 Humidity: 64.0%
   🌧️  Precipitation: 0.0mm
   💨 Wind: 8.2km/h

📅 2023-09-03
   🌡️  Temp: 16.9°C to 30.5°C (avg: 23.6°C)
   💧 Humidity: 58.0%
   🌧️  Precipitation: 0.0mm
   💨 Wind: 13.8km/h

📅 2023-09-04
   🌡️  Temp: 18.6°C to 27.8°C (avg: 22.6°C)
   💧 Humidity: 56.0%
   🌧️  Precipitation: 0.0mm
   💨 Wind: 22.9km/h

📅 2023-09-05
   🌡️  Temp: 15.4°C to 26.0°C (avg: 20.7°C)
   💧 Humidity: 50.0%
   🌧️  Precipitation: 0.0mm
   💨 Wind: 23.6km/h



## Apply to Your Mushroom Data

Now let's see how to use this with your actual mushroom observation data.


## Add 15-Day Weather History to Mushroom Dataset

Now let's create a function that adds weather data for the 15 days ending on each observation's date (P1 = the observation date, P15 = 14 days earlier) for each observation in your dataset.
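
To make the P1 to P15 labelling concrete, here is a small sketch of the date arithmetic using only the standard library. The helper name `lag_dates` is hypothetical and is not used elsewhere in the notebook.

```python
from datetime import date, timedelta


def lag_dates(observation_date, n_days=15):
    """Hypothetical helper: map P1..Pn labels to calendar dates."""
    return {
        f"P{k}": observation_date - timedelta(days=k - 1)
        for k in range(1, n_days + 1)
    }


window = lag_dates(date(2023, 9, 15))
start_date, end_date = window["P15"], window["P1"]
print(start_date, end_date)  # 2023-09-01 2023-09-15
```

The API request therefore spans `start_date` to `end_date` inclusive, which is 15 days in total.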


In [21]:
def get_15day_weather_history(lat, lon, observation_date):
    """
    Get weather data for the 15 days leading up to and including the observation date
    P1 = observation date, P2 = 1 day before, ..., P15 = 14 days before
    """
    # Parse the observation date. pd.to_datetime handles both plain dates
    # and ISO timestamps with a timezone offset, so one code path suffices.
    if isinstance(observation_date, str):
        try:
            obs_date = pd.to_datetime(observation_date).date()
        except (ValueError, TypeError):
            print(f"⚠️ Could not parse date: {observation_date}")
            return None
    else:
        obs_date = observation_date
    
    # Calculate date range (15 days ending on observation date)
    end_date = obs_date
    start_date = obs_date - timedelta(days=14)
    
    # Get weather data
    weather_data = get_weather_data(lat, lon, start_date.strftime('%Y-%m-%d'), end_date.strftime('%Y-%m-%d'))
    
    if weather_data and 'daily' in weather_data:
        daily = weather_data['daily']
        
        # Create result dictionary with P1-P15 columns
        result = {}
        weather_vars = ['tmin', 'tmax', 'temp', 'rel_humidity', 'precipitation', 'wind_speed']
        api_vars = ['temperature_2m_min', 'temperature_2m_max', 'temperature_2m_mean', 
                   'relative_humidity_2m_mean', 'precipitation_sum', 'wind_speed_10m_max']
        
        # Map API variables to our variable names
        var_mapping = dict(zip(api_vars, weather_vars))
        
        for api_var, our_var in var_mapping.items():
            if api_var in daily:
                values = daily[api_var]
                # The API returns days in chronological order; reverse them so
                # that index 0 is the observation day (P1) and index 14 is P15
                values = list(reversed(values))
                
                for i, value in enumerate(values[:15]):
                    result[f"{our_var}_P{i+1}"] = value
        
        return result
    
    return None

# Test the function
print("🧪 Testing 15-day weather function...")
test_result = get_15day_weather_history(43.318, 11.330, "2023-09-15")
if test_result:
    print("✅ Function works! Sample columns:")
    for key in list(test_result.keys())[:10]:  # Show first 10 columns
        print(f"   {key}: {test_result[key]}")
    print(f"📊 Total columns created: {len(test_result)}")
else:
    print("❌ Function test failed")


🧪 Testing 15-day weather function...
✅ Function works! Sample columns:
   tmin_P1: 16.4
   tmin_P2: 16.1
   tmin_P3: 15.6
   tmin_P4: 15.3
   tmin_P5: 15.5
   tmin_P6: 14.1
   tmin_P7: 14.5
   tmin_P8: 14.7
   tmin_P9: 13.9
   tmin_P10: 13.7
📊 Total columns created: 90


In [22]:
# Load the actual dataset
print("📂 Loading your mushroom dataset...")
df = pd.read_csv('data/inaturalist_boletus_edulis_with_el_aspect_corine.csv')

print(f"📊 Dataset shape: {df.shape}")
print(f"📋 Columns: {list(df.columns)}")
print(f"\n🔍 First few rows:")
print(df.head(2))

# Check coordinate columns
if 'y' in df.columns and 'x' in df.columns:
    print(f"✅ Found coordinates: y (latitude) and x (longitude)")
    print(f"📍 Coordinate ranges:")
    print(f"   Latitude: {df['y'].min():.3f} to {df['y'].max():.3f}")
    print(f"   Longitude: {df['x'].min():.3f} to {df['x'].max():.3f}")
else:
    print("❌ No coordinate columns found")


📂 Loading your mushroom dataset...
📊 Dataset shape: (10000, 9)
📋 Columns: ['Unnamed: 0', 'species', 'location', 'observed_on', 'y', 'x', 'elevation', 'aspect', 'LC']

🔍 First few rows:
   Unnamed: 0         species                          location  \
0           0  Boletus edulis    (60.2299535805, 29.9891631234)   
1           1  Boletus edulis  (40.3352794963, -105.6781500558)   

                 observed_on          y           x  elevation      aspect  \
0  2012-08-11 12:53:00+04:00  60.229954   29.989163       88.0  202.955841   
1  2010-08-07 13:31:00-06:00  40.335279 -105.678150     2941.0  158.927826   

      LC  
0  128.0  
1    NaN  
✅ Found coordinates: y (latitude) and x (longitude)
📍 Coordinate ranges:
   Latitude: -46.074 to 70.336
   Longitude: -155.921 to 174.993


In [23]:
def add_weather_to_dataset(df, max_rows=None, delay_between_requests=0.1):
    """
    Add 15-day weather history to the mushroom dataset
    
    Parameters:
    - df: DataFrame with columns 'y' (lat), 'x' (lon), 'observed_on'
    - max_rows: Maximum number of rows to process (for testing)
    - delay_between_requests: Delay in seconds between API calls to be respectful
    """
    import time
    from tqdm import tqdm
    
    # Create a copy of the dataframe
    result_df = df.copy()
    
    # Limit rows for testing if specified
    if max_rows:
        result_df = result_df.head(max_rows)
        print(f"🔬 Processing first {max_rows} rows for testing...")
    
    # Initialize all weather columns with NaN
    weather_vars = ['tmin', 'tmax', 'temp', 'rel_humidity', 'precipitation', 'wind_speed']
    for var in weather_vars:
        for day in range(1, 16):  # P1 to P15
            col_name = f"{var}_P{day}"
            result_df[col_name] = np.nan
    
    print(f"🌦️ Adding weather data for {len(result_df)} observations...")
    print(f"📊 This will add {len(weather_vars) * 15} = {len(weather_vars) * 15} new columns")
    
    # Track progress
    successful = 0
    failed = 0
    
    # Process each observation
    for idx, row in tqdm(result_df.iterrows(), total=len(result_df), desc="Fetching weather"):
        try:
            lat = row['y']
            lon = row['x'] 
            obs_date = row['observed_on']
            
            # Get weather data for this observation
            weather_data = get_15day_weather_history(lat, lon, obs_date)
            
            if weather_data:
                # Add weather data to the row
                for col_name, value in weather_data.items():
                    if col_name in result_df.columns:
                        result_df.at[idx, col_name] = value
                successful += 1
            else:
                failed += 1
                if failed <= 5:  # Only print first few failures
                    print(f"⚠️ Failed to get weather for row {idx}: lat={lat}, lon={lon}, date={obs_date}")
            
            # Be respectful to the API
            if delay_between_requests > 0:
                time.sleep(delay_between_requests)
                
        except Exception as e:
            failed += 1
            if failed <= 5:
                print(f"❌ Error processing row {idx}: {e}")
    
    print(f"✅ Weather data fetching complete!")
    print(f"📈 Successful: {successful}")
    print(f"❌ Failed: {failed}")
    total = successful + failed
    if total:  # avoid dividing by zero when the input DataFrame is empty
        print(f"📊 Success rate: {successful / total * 100:.1f}%")
    
    return result_df

# Test with a small sample first
print("🧪 Testing with first 3 rows...")
test_df = add_weather_to_dataset(df, max_rows=3, delay_between_requests=0.5)


🧪 Testing with first 3 rows...
🔬 Processing first 3 rows for testing...
🌦️ Adding weather data for 3 observations...
📊 This will add 90 = 90 new columns


Fetching weather: 100%|██████████| 3/3 [00:02<00:00,  1.27it/s]

✅ Weather data fetching complete!
📈 Successful: 3
❌ Failed: 0
📊 Success rate: 100.0%





In [24]:
# Check the results of the test
if 'test_df' in locals():
    print("🔍 Examining test results...")
    print(f"📊 New shape: {test_df.shape}")
    
    # Show the new weather columns
    weather_cols = [col for col in test_df.columns if any(var in col for var in ['tmin_P', 'tmax_P', 'temp_P', 'rel_humidity_P', 'precipitation_P', 'wind_speed_P'])]
    print(f"🌦️ Weather columns added: {len(weather_cols)}")
    print(f"📋 Sample weather columns: {weather_cols[:10]}")
    
    # Show sample data for first row
    if len(test_df) > 0:
        print(f"\n📝 Sample weather data for first observation:")
        first_row = test_df.iloc[0]
        print(f"📅 Date: {first_row['observed_on']}")
        print(f"📍 Location: ({first_row['y']:.3f}, {first_row['x']:.3f})")
        
        # Show temperature data P1-P5
        temp_data = [(col, first_row[col]) for col in weather_cols if 'temp_P' in col][:5]
        for col, val in temp_data:
            print(f"   {col}: {val}°C" if pd.notna(val) else f"   {col}: No data")
    
    print("\n✅ Test successful! Ready to process the full dataset.")


🔍 Examining test results...
📊 New shape: (3, 99)
🌦️ Weather columns added: 90
📋 Sample weather columns: ['tmin_P1', 'tmin_P2', 'tmin_P3', 'tmin_P4', 'tmin_P5', 'tmin_P6', 'tmin_P7', 'tmin_P8', 'tmin_P9', 'tmin_P10']

📝 Sample weather data for first observation:
📅 Date: 2012-08-11 12:53:00+04:00
📍 Location: (60.230, 29.989)
   temp_P1: 12.0°C
   temp_P2: 11.3°C
   temp_P3: 11.7°C
   temp_P4: 14.9°C
   temp_P5: 17.7°C

✅ Test successful! Ready to process the full dataset.


## Process Full Dataset

**⚠️ Important:** The full dataset has 10,000 observations, and each one needs its own API request. Expect the run to take a long time (possibly several hours), since the script pauses between requests to stay within the free API's rate limits.

**💡 Tip:** You might want to process in batches or run this overnight.
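
One way to act on that tip is to save each batch to its own CSV, so an interrupted run can pick up where it stopped. The sketch below is an assumption about how you might organise this, not part of the original notebook: `process_in_batches`, the batch size, and the file naming are all hypothetical, and the demo uses a stand-in function instead of real API calls.

```python
import tempfile
from pathlib import Path

import pandas as pd


def process_in_batches(df, process_fn, out_dir, batch_size=500):
    """Hypothetical helper: run process_fn on consecutive slices of df and
    save each slice to its own CSV so an interrupted run can resume."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    for start in range(0, len(df), batch_size):
        part_path = out_dir / f"batch_{start:06d}.csv"
        if part_path.exists():
            # Batch finished in an earlier run: reuse it instead of re-fetching
            part = pd.read_csv(part_path)
        else:
            part = process_fn(df.iloc[start:start + batch_size])
            part.to_csv(part_path, index=False)
        parts.append(part)
    return pd.concat(parts, ignore_index=True)


# Demo with a stand-in for add_weather_to_dataset (no network access needed)
demo = pd.DataFrame({"y": [43.3, 60.2, 40.3], "x": [11.3, 30.0, -105.7]})
calls = []


def fake_fetch(chunk):
    calls.append(len(chunk))
    return chunk.assign(fetched=True)


with tempfile.TemporaryDirectory() as tmp:
    first = process_in_batches(demo, fake_fetch, tmp, batch_size=2)
    second = process_in_batches(demo, fake_fetch, tmp, batch_size=2)  # read from disk
```

In the notebook you would pass `add_weather_to_dataset` as `process_fn`. Note that batches reloaded from CSV come back with pandas' inferred dtypes, so timestamps are read back as strings.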


In [None]:
# TO PROCESS THE FULL DATASET, REMOVE THE TRIPLE QUOTES AROUND THE BLOCK BELOW
# WARNING: This will take several hours to complete!

"""
# Process the full dataset
print("🚀 Starting full dataset processing...")
print(f"📊 Total observations to process: {len(df)}")
print("⏰ Estimated time: Several hours (with API delays)")

# Process with a small delay to be respectful to the free API
final_df = add_weather_to_dataset(df, delay_between_requests=0.2)

# Save the results
output_file = 'data/inaturalist_boletus_edulis_with_el_aspect_corine_weather.csv'
final_df.to_csv(output_file, index=False)

print(f"✅ Complete! Saved to: {output_file}")
print(f"📊 Final dataset shape: {final_df.shape}")
print(f"🌦️ Weather variables added: tmin, tmax, temp, rel_humidity, precipitation, wind_speed")
print(f"📅 For each variable: P1 (observation day) through P15 (14 days before)")
"""

# For now, let's create a smaller sample for demonstration
print("📝 Creating a sample dataset (100 rows) for demonstration...")
sample_df = add_weather_to_dataset(df.head(100), delay_between_requests=0.1)

# Save the sample
sample_output = 'data/sample_mushrooms_with_weather.csv'
sample_df.to_csv(sample_output, index=False)
print(f"✅ Sample saved to: {sample_output}")

print(f"\n📊 Sample dataset summary:")
print(f"   Shape: {sample_df.shape}")
print(f"   Original columns: {len(df.columns)}")
print(f"   New columns added: {len(sample_df.columns) - len(df.columns)}")
print(f"   Weather variables: 6 (tmin, tmax, temp, rel_humidity, precipitation, wind_speed)")
print(f"   Days per variable: 15 (P1-P15)")
print(f"   Total weather columns: 6 × 15 = 90")


📝 Creating a sample dataset (100 rows) for demonstration...
🌦️ Adding weather data for 100 observations...
📊 This will add 90 = 90 new columns


Fetching weather:  41%|████      | 41/100 [00:15<00:19,  3.06it/s]

Error fetching weather data: 400 Client Error: Bad Request for url: https://archive-api.open-meteo.com/v1/archive?latitude=nan&longitude=nan&start_date=2024-03-16&end_date=2024-03-30&daily=temperature_2m_min&daily=temperature_2m_max&daily=temperature_2m_mean&daily=relative_humidity_2m_mean&daily=precipitation_sum&daily=wind_speed_10m_max&timezone=auto
⚠️ Failed to get weather for row 40: lat=nan, lon=nan, date=2024-03-30 17:24:06+13:00


Fetching weather:  45%|████▌     | 45/100 [00:17<00:17,  3.15it/s]
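
The 400 error in the output above comes from a row whose latitude and longitude are both NaN, so the request can never succeed. A possible fix is to set such rows aside before fetching. The sketch below assumes the `y`/`x` column names from this dataset; the helper name `split_valid_coordinates` is hypothetical, and the demo rows reuse values printed earlier in this notebook.

```python
import numpy as np
import pandas as pd


def split_valid_coordinates(df, lat_col="y", lon_col="x"):
    """Hypothetical helper: separate rows with usable coordinates from the rest."""
    # Series.between returns False for NaN, so missing values are rejected too
    valid = df[lat_col].between(-90, 90) & df[lon_col].between(-180, 180)
    return df[valid], df[~valid]


demo = pd.DataFrame({
    "y": [60.229954, np.nan, 40.335279],
    "x": [29.989163, np.nan, -105.678150],
    "observed_on": [
        "2012-08-11 12:53:00+04:00",
        "2024-03-30 17:24:06+13:00",
        "2010-08-07 13:31:00-06:00",
    ],
})

usable, skipped = split_valid_coordinates(demo)
print(f"Usable rows: {len(usable)}, skipped rows: {len(skipped)}")
```

Passing only `usable` to `add_weather_to_dataset` would avoid spending a request (and a delay) on rows that cannot return data.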

## How to Run the Full Dataset

To process your complete dataset (10,000 observations), enable the full-dataset block above by removing the triple quotes around it, then run the cell. Here's what it will do:

**📊 Input:** `data/inaturalist_boletus_edulis_with_el_aspect_corine.csv` (9 columns)  
**📋 Output:** `data/inaturalist_boletus_edulis_with_el_aspect_corine_weather.csv` (99 columns)

**🌦️ Weather Variables Added:**
- `tmin_P1` to `tmin_P15` - Daily minimum temperature (°C)
- `tmax_P1` to `tmax_P15` - Daily maximum temperature (°C)  
- `temp_P1` to `temp_P15` - Daily mean temperature (°C)
- `rel_humidity_P1` to `rel_humidity_P15` - Daily mean relative humidity (%)
- `precipitation_P1` to `precipitation_P15` - Daily precipitation sum (mm)
- `wind_speed_P1` to `wind_speed_P15` - Daily max wind speed (km/h)

**📅 Time Period Meaning:**
- `P1` = Weather on the observation date  
- `P2` = Weather 1 day before observation
- `P3` = Weather 2 days before observation
- ...
- `P15` = Weather 14 days before observation
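
Because the columns follow a regular `<variable>_P<k>` pattern, you can build the column list for any variable and aggregate across the window. The sketch below is only an illustration: `lag_columns` is a hypothetical helper, and the one-row frame is cut down to a 3-day window using the `tmin` values from the test output earlier in this notebook.

```python
import pandas as pd


def lag_columns(variable, n_days=15):
    """Hypothetical helper: column names for one variable, P1 through Pn."""
    return [f"{variable}_P{k}" for k in range(1, n_days + 1)]


# One-row stand-in limited to a 3-day window
demo = pd.DataFrame({"tmin_P1": [16.4], "tmin_P2": [16.1], "tmin_P3": [15.6]})

# Coldest daily minimum across the window, computed row by row
demo["tmin_window_min"] = demo[lag_columns("tmin", n_days=3)].min(axis=1)
print(demo["tmin_window_min"].iloc[0])  # 15.6
```

On the full output you would keep the default `n_days=15`, and the same pattern works for sums (for example, total precipitation) or means.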

**⏱️ Processing Time:** Approximately 3-5 hours (with API rate limiting)

**💾 File Size:** The output file will probably be on the order of 5-10MB (10,000 rows × 99 mostly short numeric columns)
