# Adjusting Miner Scoring in the Bittensor Subnet: Climate Zone Analysis  

## Introduction  

In this notebook, we aim to lay the groundwork for implementing a climate-based reward function in the Bittensor subnet. This involves mapping grid locations to climate types, analyzing prediction difficulty for each climate zone, and setting up the framework for dynamic reward adjustments.  

### The Reward Function  

The reward function we use to evaluate miner performance is defined as:  

$$
\text{Reward} = \max\left(0, \frac{z - \text{RMSE}}{z}\right)
$$

Where:  
- \( z \): A value that defines the acceptable error margin for a given task.  
- \( \text{RMSE} \): The root mean squared error of the miner’s prediction compared to the ground truth.  

This reward function works as follows:  
- If the miner's prediction is perfect (\( \text{RMSE} = 0 \)), the reward is 1 (maximum reward).  
- As the \( \text{RMSE} \) increases, the reward decreases proportionally.  
- If \( \text{RMSE} \) exceeds \( z \), the reward becomes 0, incentivizing miners to improve their predictions.  
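
The behavior described above can be sketched in a few lines of Python. This is a minimal illustration of the formula, not the subnet's actual scoring code; the function names `rmse` and `reward` are chosen here for the example.

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))

def reward(rmse_value, z):
    """Reward = max(0, (z - RMSE) / z); z must be positive."""
    if z <= 0:
        raise ValueError("z must be positive")
    return max(0.0, (z - rmse_value) / z)

# A perfect prediction earns 1.0; an RMSE at or beyond z earns 0.0.
print(reward(0.0, 3.0), reward(1.5, 3.0), reward(5.0, 3.0))
```

Clamping at zero means that every prediction worse than the threshold is scored the same, however far off it is.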

### Why Adjust \( z \) Values?  

The \( z \) value is crucial because it sets the threshold for acceptable error. A higher \( z \) makes the task easier by allowing more error, while a lower \( z \) makes it harder by narrowing the acceptable margin for error.  

Since climate prediction difficulty varies by region:  
- **Easy Regions**: For example, predicting temperature over large, stable oceans may have low variability, making it easier to forecast accurately. Here, a lower \( z \) value can be used.  
- **Challenging Regions**: Areas like mountains or deserts with significant temperature fluctuations are more difficult to predict. A higher \( z \) value compensates for that difficulty, while a lower \( z \) pushes miners to invest more effort and resources in improving accuracy. Both directions are available as tuning levers.
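
As a worked illustration with hypothetical numbers (not measured values), take a prediction with \( \text{RMSE} = 2 \) and compare a strict threshold \( z = 3 \) with a lenient threshold \( z = 5 \):

$$
\text{Reward}_{z=3} = \max\left(0, \frac{3 - 2}{3}\right) = \frac{1}{3} \approx 0.33,
\qquad
\text{Reward}_{z=5} = \max\left(0, \frac{5 - 2}{5}\right) = \frac{3}{5} = 0.6
$$

The same error earns nearly twice the reward under the higher \( z \), which is why the choice of \( z \) per region directly shapes miner incentives.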

### Initially: Using SOTA Models to Calibrate \( z \)  

To ensure the reward function reflects realistic difficulty levels:  
1. We use a state-of-the-art (SOTA) AI climate forecasting model to compute the average RMSE for each climate type.  
2. The observed RMSE values will guide the initial \( z \) values, ensuring that rewards align with the inherent difficulty of predicting each climate zone.  
3. Over time, \( z \) values can be manually adjusted to focus efforts on regions where improved predictions provide the most value.  
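
The calibration steps above could look roughly like the sketch below. It assumes the baseline model's errors are already available as a per-grid-point dictionary. The function name `initial_z_values`, both input dictionaries, and the `scale` multiplier are hypothetical choices for this example and are not taken from the subnet code.

```python
from collections import defaultdict

def initial_z_values(rmse_by_point, climate_by_point, scale=1.0):
    """Average baseline RMSE per climate class, scaled into an initial z value.

    rmse_by_point: {(lat, lon): rmse} from a baseline model (hypothetical input).
    climate_by_point: {(lat, lon): climate_class}.
    scale: assumed multiplier applied to the mean RMSE of each class.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for point, value in rmse_by_point.items():
        # Points without a known climate class fall back to "UNKNOWN"
        cls = climate_by_point.get(point, "UNKNOWN")
        totals[cls] += value
        counts[cls] += 1
    return {cls: scale * totals[cls] / counts[cls] for cls in totals}

# Toy inputs, for illustration only
baseline_rmse = {(0.0, 0.0): 2.0, (0.0, 0.25): 4.0, (10.0, 10.0): 1.0}
climates = {(0.0, 0.0): "Cfa", (0.0, 0.25): "Cfa", (10.0, 10.0): "Af"}
print(initial_z_values(baseline_rmse, climates, scale=1.5))
```

One design point follows from the reward formula: with `scale=1.0`, a miner that exactly matches the baseline has \( \text{RMSE} = z \) and earns a reward of 0, so a multiplier above 1 is needed if baseline-level accuracy should still be rewarded.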

---

In this notebook, we will:  
1. Visualize the Köppen-Geiger climate zones to understand the diversity of climate types.  
2. Create a grid-based dictionary for efficient climate type lookup.  
3. Use a SOTA climate forecasting model to evaluate RMSE for each climate zone.  
4. Define initial \( z \) values based on the SOTA model’s performance.  

This step-by-step process ensures a fair, data-driven approach to incentivizing accurate climate forecasting in the Bittensor subnet.


## 1. Köppen-Geiger Climate Zones  

The Köppen-Geiger climate classification system is a widely used method for categorizing global climates based on observed temperature, precipitation, and seasonal patterns. It divides the Earth's surface into distinct climate types, each with a unique combination of letters to denote characteristics such as temperature range, humidity levels, and seasonal behavior.  

![Köppen-Geiger Climate Map](../images/Koeppen-Geiger-climate-classification.jpg "Köppen-Geiger Climate Zones Map")  

### Key Features of the Classification System:  
1. **Primary Categories**: The first letter represents the major climate group:  
   - **A**: Tropical  
   - **B**: Dry  
   - **C**: Temperate  
   - **D**: Cold  
   - **E**: Polar  

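2. **Secondary Categories**: The second letter adds details about precipitation patterns (or, for dry and polar climates, the subtype):  
   - **f**: Fully humid  
   - **w**: Winter dry season  
   - **s**: Summer dry season  
   - **m**: Monsoonal  
   - **W** / **S**: Desert / steppe (used with group **B**)  
   - **T** / **F**: Tundra / frost (used with group **E**)  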

3. **Tertiary Categories**: The third letter describes temperature variations:  
   - **a**: Hot summer  
   - **b**: Warm summer  
   - **c**: Cool summer  
   - **d**: Extremely cold winter  
   - **h**: Hot arid  
   - **k**: Cold arid  
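
To illustrate how the letters combine, here is a small sketch that turns a class code such as `Cfa` into a readable description. The function `describe_climate_code` is a hypothetical helper written for this example. The second-letter tables for dry (B) and polar (E) climates follow the standard Köppen-Geiger scheme.

```python
MAIN_GROUPS = {"A": "Tropical", "B": "Dry", "C": "Temperate", "D": "Cold", "E": "Polar"}
PRECIPITATION = {"f": "fully humid", "w": "winter dry season",
                 "s": "summer dry season", "m": "monsoonal"}
TEMPERATURE = {"a": "hot summer", "b": "warm summer", "c": "cool summer",
               "d": "extremely cold winter", "h": "hot arid", "k": "cold arid"}
# Second letters used only with dry (B) and polar (E) climates
DRY_TYPES = {"W": "desert", "S": "steppe"}
POLAR_TYPES = {"T": "tundra", "F": "frost"}

def describe_climate_code(code):
    """Decode a Köppen-Geiger code letter by letter; unknown codes map to 'UNKNOWN'."""
    if not code or code[0] not in MAIN_GROUPS:
        return "UNKNOWN"
    parts = [MAIN_GROUPS[code[0]]]
    if len(code) > 1:
        if code[0] == "B":
            second = DRY_TYPES
        elif code[0] == "E":
            second = POLAR_TYPES
        else:
            second = PRECIPITATION
        parts.append(second.get(code[1], f"unrecognized '{code[1]}'"))
    if len(code) > 2:
        parts.append(TEMPERATURE.get(code[2], f"unrecognized '{code[2]}'"))
    return ", ".join(parts)

for example in ["Cfa", "Dfb", "BWh", "ET"]:
    print(example, "->", describe_climate_code(example))
```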

### Climate Classes as a Basis for \( z \) Values  

In this notebook, we will use all the climate classes visualized in the Köppen-Geiger map as the basis for defining \( z \) values. Each class will have its own \( z \) value, reflecting the prediction difficulty for that climate type. For example:  
- **Cfa (Warm temperate, fully humid, hot summer)**: Found in large parts of the southeastern United States, this region is moderately challenging to predict and might have an initial \( z = 3 \).  
- **Dfb (Cold, fully humid, warm summer)**: Found in much of Canada and northern Europe, this zone might require a slightly higher \( z \) due to greater variability.  

### Visualization and Climate Zones  

A visualization of the Köppen-Geiger climate zones will help us understand the spatial distribution of these classifications. For regions without documented climate data, such as some parts of the oceans, we will assign an **UNKNOWN** class. This ensures comprehensive coverage and enables predictions across all areas of interest.  
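
A quick way to see how much of a grid ends up in the **UNKNOWN** class is to count grid points per class. The sketch below assumes the grid is stored as a nested dictionary `{lat: {lon: climate_class}}`; the helper names and the toy grid are made up for illustration.

```python
from collections import Counter

def class_counts(grid):
    """Count grid points per climate class in a {lat: {lon: cls}} grid."""
    return Counter(cls for row in grid.values() for cls in row.values())

def unknown_fraction(grid, unknown_label="UNKNOWN"):
    """Share of grid points that carry the fallback label."""
    counts = class_counts(grid)
    total = sum(counts.values())
    return counts[unknown_label] / total if total else 0.0

# Toy grid with one land point and three ocean points (illustrative only)
toy_grid = {
    0.0: {0.0: "Af", 0.5: "UNKNOWN"},
    0.5: {0.0: "UNKNOWN", 0.5: "UNKNOWN"},
}
print(class_counts(toy_grid))
print(unknown_fraction(toy_grid))
```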

### Setting \( z \) Values  

The exact \( z \) values for each climate type will be calculated later in this notebook, using a SOTA climate forecasting model to establish a baseline RMSE for each class. These \( z \) values will ensure that rewards in the Bittensor subnet are appropriately scaled to reflect the inherent difficulty of making accurate predictions in each climate zone.  


## 2. Creating a Grid-Based Dictionary for Efficient Climate Type Lookup  
The data source we use, ERA5, provides values on a latitude/longitude grid with 0.25-degree steps. We therefore build a dictionary that maps each of these grid points to its climate class. We start by showing an example data point from ERA5 (a grid with corresponding temperatures), and then we will build the lookup dictionary on the same grid.

In [7]:
# Let's show a simple example of a data point
import sys
sys.path.append('../../')
from climate.data.era5_loader import ERA5DataLoader
import numpy as np

x = ERA5DataLoader()

lat_start = np.random.uniform(x.lat_range[0], x.lat_range[1] - x.area_sample_range[1])
lat_end = lat_start + np.random.uniform(*x.area_sample_range)
lon_start = np.random.uniform(x.lon_range[0], x.lon_range[1] - x.area_sample_range[1])
lon_end = lon_start + np.random.uniform(*x.area_sample_range)

start_time, end_time, predict_hours = x._sample_time_range()

data = x.get_data(
    lat_start=lat_start, 
    lat_end=lat_end, 
    lon_start=lon_start, 
    lon_end=lon_end, 
    start_time=start_time, 
    end_time=end_time
)

# Extract latitude and longitude
lat_lon_points = data[..., :2]  # Select only the latitude and longitude
unique_lat_lon = lat_lon_points.view(-1, 2).unique(dim=0)  # Flatten and get unique points

# Convert to a list of tuples if desired
unique_lat_lon_list = [tuple(point.tolist()) for point in unique_lat_lon]

print("Unique latitude and longitude points:")
print(unique_lat_lon_list)

Unique latitude and longitude points:
[(-74.75, -164.75), (-74.75, -164.5), (-74.75, -164.25), (-74.75, -164.0), (-74.75, -163.75), (-74.75, -163.5), (-74.75, -163.25), (-74.75, -163.0), (-74.75, -162.75), (-74.75, -162.5), (-74.75, -162.25), (-74.75, -162.0), (-74.75, -161.75), (-74.5, -164.75), (-74.5, -164.5), (-74.5, -164.25), (-74.5, -164.0), (-74.5, -163.75), (-74.5, -163.5), (-74.5, -163.25), (-74.5, -163.0), (-74.5, -162.75), (-74.5, -162.5), (-74.5, -162.25), (-74.5, -162.0), (-74.5, -161.75), (-74.25, -164.75), (-74.25, -164.5), (-74.25, -164.25), (-74.25, -164.0), (-74.25, -163.75), (-74.25, -163.5), (-74.25, -163.25), (-74.25, -163.0), (-74.25, -162.75), (-74.25, -162.5), (-74.25, -162.25), (-74.25, -162.0), (-74.25, -161.75), (-74.0, -164.75), (-74.0, -164.5), (-74.0, -164.25), (-74.0, -164.0), (-74.0, -163.75), (-74.0, -163.5), (-74.0, -163.25), (-74.0, -163.0), (-74.0, -162.75), (-74.0, -162.5), (-74.0, -162.25), (-74.0, -162.0), (-74.0, -161.75), (-73.75, -164.75), (-73

As shown above, the latitude and longitude points are spaced 0.25 degrees apart, so we will need to create a climate dictionary for each pair of latitude and longitude, with a 0.25-degree interval between the points.
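
Once such a dictionary exists, looking up an arbitrary coordinate means snapping it to the nearest 0.25-degree grid point first. The sketch below shows one way to do that, assuming a nested `{lat: {lon: climate_class}}` dictionary. The names `snap_to_grid` and `lookup_climate` are hypothetical helpers, not part of the ERA5 loader.

```python
def snap_to_grid(value, step=0.25):
    """Round a coordinate to the nearest multiple of `step`."""
    return round(round(value / step) * step, 2)

def lookup_climate(grid, lat, lon, step=0.25, default="UNKNOWN"):
    """Return the class at the grid point nearest to (lat, lon), or `default`."""
    row = grid.get(snap_to_grid(lat, step), {})
    return row.get(snap_to_grid(lon, step), default)

# Toy grid containing a single known point (illustrative only)
toy_grid = {-74.75: {-164.75: "EF"}}
print(lookup_climate(toy_grid, -74.8, -164.7))
print(lookup_climate(toy_grid, 10.0, 10.0))
```

Returning a default instead of raising keeps the lookup usable for points that fall outside the documented climate data.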

In [9]:
def load_koeppen_geiger_data(file_path="/root/ClimateAI/climate/data/Koeppen-Geiger-ASCII.txt"):
    """Load climate data from the Koeppen-Geiger-ASCII.txt file into a dictionary."""
    all_lats = np.arange(-89.75, 90.25, 0.5)
    all_lons = np.arange(-179.75, 180.25, 0.5)
    climate_data = dict()
    for lat in all_lats:
        for lon in all_lons:
            if float(lat) not in climate_data:
                climate_data[float(lat)] = {}

            # initially set all classes to UNKNOWN
            climate_data[float(lat)][float(lon)] = "UNKNOWN"  

    counter = 0
    
    with open(file_path, "r") as file:
        # Skip the header
        next(file)
        
        # Process each line in the file
        for line in file:
            parts = line.split()
            if len(parts) == 3:  # Ensure valid data line
                lat, lon, cls = parts
                lat = float(lat)
                lon = float(lon)

                # Update the climate class
                climate_data[lat][lon] = cls
                counter += 1

    print(f"Loaded {counter} climate data points.")
    return climate_data

koeppen_geiger_data = load_koeppen_geiger_data()

Loaded 92416 climate data points.


In [10]:
list(koeppen_geiger_data.keys())[:5]  # Show the first 5 keys

[-89.75, -89.25, -88.75, -88.25, -87.75]

The Köppen-Geiger data from this source is provided at a grid resolution of 0.5 degrees. To match the 0.25-degree ERA5 grid, we generate a new Köppen-Geiger grid with 0.25-degree intervals. Because climate classes are categorical labels, they cannot be averaged. Instead, each new point receives the class with the largest total bilinear weight among the surrounding points of the original grid. The `refine_grid` function below implements this weighted vote.
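
The weighted vote can be illustrated on a single grid cell before it is applied to the whole grid. This is a standalone sketch with made-up labels; `bilinear_weights` and `weighted_vote` are names chosen for this example.

```python
from collections import defaultdict

def bilinear_weights(ty, tx):
    """Weights of the four cell corners for fractional offsets ty, tx in [0, 1].

    Corner order: (y0, x0), (y0, x1), (y1, x0), (y1, x1).
    """
    return [(1 - ty) * (1 - tx), (1 - ty) * tx, ty * (1 - tx), ty * tx]

def weighted_vote(weights, labels):
    """Return the label with the largest summed weight (first one wins a tie)."""
    totals = defaultdict(float)
    for weight, label in zip(weights, labels):
        totals[label] += weight
    return max(totals, key=totals.get)

corners = ["Cfa", "Dfb", "Dfb", "Dfb"]
# Close to the (y0, x0) corner, its label dominates despite being outnumbered
print(weighted_vote(bilinear_weights(0.25, 0.25), corners))
# Close to the opposite corner, the majority label wins
print(weighted_vote(bilinear_weights(0.75, 0.75), corners))
```

When refining a 0.5-degree grid to 0.25 degrees, every new point either coincides with an original point or lies exactly halfway between neighbors, so ties are common. `max` then returns the first label encountered, which favors the lower-latitude and lower-longitude corner.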

In [11]:
from bisect import bisect_left, bisect_right
from collections import defaultdict

def refine_grid(original_grid, step=0.25):
    """
    Refine a categorical grid to a finer resolution.

    Each new point takes the class with the largest total bilinear weight
    among the surrounding points of the original grid.

    Args:
        original_grid (dict): Original grid as {lat: {lon: climate_class}}.
        step (float): Step size for the new grid, in degrees.

    Returns:
        dict: Refined grid as {lat: {lon: climate_class}}.
    """
    def combine_values(weights, values):
        """Return the value with the highest total weight."""
        counts = defaultdict(float)
        for weight, value in zip(weights, values):
            counts[value] += weight
        return max(counts, key=counts.get)

    def new_axis(keys):
        """Build the refined coordinates spanning the original axis."""
        return [round(i * step, 2) for i in range(int(keys[0] / step), int(keys[-1] / step) + 1)]

    def bracket(keys, v):
        """Return the nearest original keys at or below and at or above v."""
        lo = keys[max(bisect_right(keys, v) - 1, 0)]
        hi = keys[min(bisect_left(keys, v), len(keys) - 1)]
        return lo, hi

    # Latitude and longitude span different ranges, so each axis gets its own keys.
    # All rows are assumed to share the same longitude keys.
    lat_keys = sorted(original_grid.keys())
    lon_keys = sorted(next(iter(original_grid.values())).keys())

    # Initialize the refined grid
    refined_grid = defaultdict(dict)

    for y in new_axis(lat_keys):
        # Surrounding latitudes and their interpolation weights
        y0, y1 = bracket(lat_keys, y)
        wy0 = 1 - (y - y0) / (y1 - y0) if y1 != y0 else 1
        wy1 = 1 - wy0
        for x in new_axis(lon_keys):
            # Surrounding longitudes and their interpolation weights
            x0, x1 = bracket(lon_keys, x)
            wx0 = 1 - (x - x0) / (x1 - x0) if x1 != x0 else 1
            wx1 = 1 - wx0

            # Collect surrounding values and weights
            weights = [wy0 * wx0, wy0 * wx1, wy1 * wx0, wy1 * wx1]
            values = [
                original_grid[y0][x0], original_grid[y0][x1],
                original_grid[y1][x0], original_grid[y1][x1]
            ]

            # Assign the class with the highest combined weight
            refined_grid[y][x] = combine_values(weights, values)

    # Convert defaultdict to a standard dictionary
    return {k: dict(v) for k, v in refined_grid.items()}


In [12]:
koeppen_grid_025 = refine_grid(koeppen_geiger_data, step=0.25)

In [14]:
import pickle

with open("../data/koeppen_geiger_climate_grid_025.pkl", "wb") as f:
    # Persist the refined 0.25-degree climate grid for later lookups
    pickle.dump(koeppen_grid_025, f)