## Simple baseline with Bioclimatic Cubes — ResNet18 + Binary Cross Entropy

To demonstrate the potential of single-modality data such as the Bioclimatic Cubes, we provide a straightforward baseline that is based on a modified ResNet18 and Binary Cross Entropy but still ranks highly on the leaderboard. The model should learn the relationship between the precise climatic history of a given location and its species composition.

Since there is considerable room to improve on this baseline, we encourage you to experiment with various techniques, architectures, losses, etc.

#### **Have Fun!**

In [1]:
import os
import torch
import tqdm
import numpy as np
import pandas as pd
import torchvision.models as models
import torchvision.transforms as transforms
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import CosineAnnealingLR
from sklearn.metrics import precision_recall_fscore_support

## Data description

The Bioclimatic Cubes are created from **four** monthly GeoTIFF CHELSA (https://chelsa-climate.org/timeseries/) time series climatic rasters with a resolution of 30 arc seconds, i.e. approximately 1 km. The four variables are the precipitation (pr) and the maximum (tasmax), minimum (tasmin), and mean (tas) daily temperatures per month from January 2000 to June 2019. We provide the data in three forms: (i) raw rasters (GeoTIFF images), (ii) a CSV file with pre-extracted values for each location, i.e., surveyId, and (iii) data cubes as tensor objects (.pt).

In this notebook, we will work with just the cubes. The cubes are structured as follows.
**Shape**: `(n_bio, n_year, n_month)`, i.e. `[4, 19, 12]`, where:
- `n_year` = 19 (ranging from 2000 to 2018)
- `n_month` = 12 (ranging from January 01 to December 12)
- `n_bio` = 4 comprising [`pr` (precipitation), `tas` (mean daily air temperature), `tasmin`, `tasmax`]

The data cubes can be loaded as tensors using PyTorch with the following command:

```python
import torch
torch.load('path_to_file.pt')
```
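As a minimal sketch of how a loaded cube can be indexed, the snippet below builds a synthetic tensor in place of a real file. It assumes the channel-first layout `[4, 19, 12]` (variable, year, month) that the model section of this notebook expects, and the variable order `pr`, `tas`, `tasmin`, `tasmax`. The names `VARIABLES` and `FIRST_YEAR` are illustrative helpers, not part of the dataset.

```python
import torch

# Assumed layout: (variable, year, month) = (4, 19, 12)
VARIABLES = ["pr", "tas", "tasmin", "tasmax"]  # illustrative helper, assumed order
FIRST_YEAR = 2000

# Synthetic stand-in for torch.load('path_to_file.pt')
cube = torch.arange(4 * 19 * 12, dtype=torch.float32).reshape(4, 19, 12)

# Precipitation for every year and month: shape (19, 12)
pr = cube[VARIABLES.index("pr")]

# Mean temperature in July 2005: year index 5, month index 6 (both 0-based)
tas_july_2005 = cube[VARIABLES.index("tas"), 2005 - FIRST_YEAR, 7 - 1]

# Per-year mean of each variable: shape (4, 19)
yearly_mean = cube.mean(dim=2)

print(pr.shape, tas_july_2005.item(), yearly_mean.shape)
```

If the cubes you download use a different axis order, adjust the indices accordingly.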

**References:**
- *Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, H.P., Kessler, M. (2017): Climatologies at high resolution for the Earth land surface areas. Scientific Data. 4 170122. https://doi.org/10.1038/sdata.2017.122*

- *Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, H.P., Kessler, M. Data from: Climatologies at high resolution for the earth’s land surface areas. Dryad Digital Repository. http://dx.doi.org/doi:10.5061/dryad.kd1d4*

## Prepare custom dataset loader

We have to slightly update the Dataset to provide the relevant data in the appropriate format.

In [2]:
class TrainDataset(Dataset):
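    def __init__(self, data_dir, metadata, subset, transform=None, num_classes=11255):
        self.subset = subset
        self.transform = transform
        self.data_dir = data_dir
        # Length of the multi-hot label vector built in __getitem__;
        # the default matches the number of classes used in this notebook
        self.num_classes = num_classes
        self.metadata = metadata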
        self.metadata = self.metadata.dropna(subset="speciesId").reset_index(drop=True)
        self.metadata['speciesId'] = self.metadata['speciesId'].astype(int)
        self.label_dict = self.metadata.groupby('surveyId')['speciesId'].apply(list).to_dict()
        
        self.metadata = self.metadata.drop_duplicates(subset="surveyId").reset_index(drop=True)

    def __len__(self):
        return len(self.metadata)

    def __getitem__(self, idx):
        
        survey_id = self.metadata.surveyId[idx]
        sample = torch.load(os.path.join(self.data_dir, f"GLC25-PA-{self.subset}-bioclimatic_monthly_{survey_id}_cube.pt"), weights_only=True)
        species_ids = self.label_dict.get(survey_id, [])  # Get list of species IDs for the survey ID
        label = torch.zeros(self.num_classes).scatter(0, torch.tensor(species_ids), torch.ones(len(species_ids)))

        # Ensure the sample is in the correct format for the transform
        if isinstance(sample, torch.Tensor):
            sample = sample.permute(1, 2, 0)  # Change tensor shape from (C, H, W) to (H, W, C)
            sample = sample.numpy()  

        if self.transform:
            sample = self.transform(sample)

        return sample, label, survey_id
    
class TestDataset(TrainDataset):
    def __init__(self, data_dir, metadata, subset, transform=None):
        self.subset = subset
        self.transform = transform
        self.data_dir = data_dir
        self.metadata = metadata
        
    def __getitem__(self, idx):
        
        survey_id = self.metadata.surveyId[idx]
        sample = torch.load(os.path.join(self.data_dir, f"GLC25-PA-{self.subset}-bioclimatic_monthly_{survey_id}_cube.pt"), weights_only=True)
        
        if isinstance(sample, torch.Tensor):
            sample = sample.permute(1, 2, 0)  # Change tensor shape from (C, H, W) to (H, W, C)
            sample = sample.numpy()

        if self.transform:
            sample = self.transform(sample)

        return sample, survey_id
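The multi-hot target built in `TrainDataset.__getitem__` can be illustrated in isolation. This is a small sketch with a hypothetical list of species IDs and a reduced number of classes. It shows the `scatter` construction next to an equivalent index assignment.

```python
import torch

num_classes = 10           # small illustrative value; the notebook uses 11255
species_ids = [2, 5, 7]    # hypothetical species IDs observed in one survey

# Same construction as in TrainDataset.__getitem__
label = torch.zeros(num_classes).scatter(
    0, torch.tensor(species_ids), torch.ones(len(species_ids))
)

# Equivalent, shorter form using index assignment
label_alt = torch.zeros(num_classes)
label_alt[species_ids] = 1.0

print(label)
```

Note that `torch.tensor([])` defaults to a floating-point dtype, so the `scatter` form would need an explicit `dtype=torch.long` index if a survey ever had an empty species list.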

### Load metadata and prepare data loaders

In [12]:
# Dataset and DataLoader
batch_size = 256
transform = transforms.Compose([
    transforms.ToTensor()
])

# Load Training metadata
path_data = "/home/gt/DATA/geolifeclef-2025"
train_data_path = os.path.join(path_data, "BioclimTimeSeries/cubes/PA-train")
train_metadata_path = os.path.join(path_data, "GLC25_PA_metadata_train.csv")
train_metadata = pd.read_csv(train_metadata_path)
train_dataset = TrainDataset(train_data_path, train_metadata, subset="train", transform=transform)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)

# Load Test metadata
test_data_path = os.path.join(path_data, "BioclimTimeSeries/cubes/PA-test")
test_metadata_path = os.path.join(path_data, "GLC25_PA_metadata_test.csv")
test_metadata = pd.read_csv(test_metadata_path)
test_dataset = TestDataset(test_data_path, test_metadata, subset="test", transform=transform)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=4)
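The shape handling in the datasets can be checked on a synthetic cube. The sketch below assumes a float `(C, H, W)` cube and emulates `transforms.ToTensor()` with a plain `permute`, since `ToTensor` only rescales `uint8` input and otherwise just moves the channel axis first. Under that assumption the `permute` in `__getitem__` followed by the transform returns the original layout and values.

```python
import torch

cube = torch.rand(4, 19, 12)            # synthetic (C, H, W) cube

# As in __getitem__: channels last, then convert to a NumPy array
hwc = cube.permute(1, 2, 0).numpy()     # (H, W, C) = (19, 12, 4)

# Stand-in for transforms.ToTensor() on a float array: channels first again
chw = torch.from_numpy(hwc).permute(2, 0, 1)

print(hwc.shape, tuple(chw.shape))
```

The round trip exists to satisfy the `ToTensor` input convention. A custom transform that works directly on tensors would be one possible alternative.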

## Define and initialize the ModifiedResNet18 model

To use the bioclimatic cubes, which have a shape of [4,19,12] (RASTER-TYPE, YEAR, and MONTH), some minor adjustments must be made to the vanilla ResNet-18. This is just one way to handle the unusual tensor shape, and experimentation is encouraged.

In [13]:
class ModifiedResNet18(nn.Module):
    def __init__(self, num_classes):
        super(ModifiedResNet18, self).__init__()

        self.norm_input = nn.LayerNorm([4,19,12])
        self.resnet18 = models.resnet18(weights=None)
        # We have to modify the first convolutional layer to accept 4 channels instead of 3
        self.resnet18.conv1 = nn.Conv2d(4, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.resnet18.maxpool = nn.Identity()
        self.ln = nn.LayerNorm(1000)
        self.fc = nn.Linear(1000, num_classes)

    def forward(self, x):
        x = self.norm_input(x)
        x = self.resnet18(x)
        x = self.ln(x)
        x = self.fc(x)
        return x

In [14]:
def set_seed(seed):
    # Seed PyTorch's CPU random number generator
    torch.manual_seed(seed)
    # Seed NumPy's global random number generator
    np.random.seed(seed)
    # Seed all CUDA devices if available
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        # Ask cuDNN for deterministic algorithms and disable its auto-tuner
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

set_seed(69)

In [15]:
# Check if cuda is available
device = torch.device("cpu")

if torch.cuda.is_available():
    device = torch.device("cuda")
    print("DEVICE = CUDA")

num_classes = 11255 # Number of all unique classes within the PO and PA data.
model = ModifiedResNet18(num_classes).to(device)

DEVICE = CUDA


## Training Loop

Nothing special, just a standard PyTorch training loop.

In [16]:
# Hyperparameters
learning_rate = 0.0002
num_epochs = 20
positive_weigh_factor = 1.0

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
scheduler = CosineAnnealingLR(optimizer, T_max=25, verbose=True)
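The learning rates reported by the scheduler follow the closed-form cosine annealing curve. The sketch below evaluates that formula in plain Python with the settings used here (`base_lr=0.0002`, `T_max=25`, `eta_min=0`). The function name `cosine_lr` is illustrative.

```python
import math

def cosine_lr(epoch, base_lr=0.0002, t_max=25, eta_min=0.0):
    """Closed-form cosine annealing value after `epoch` scheduler steps."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

for epoch in (0, 1, 10, 20, 25):
    print(epoch, cosine_lr(epoch))
```

Because `T_max=25` while training runs for 20 epochs, the schedule stops at roughly `1.9e-05` instead of reaching `eta_min`. Setting `T_max` equal to `num_epochs` is one option if a full decay is wanted.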



In [17]:
print(f"Training for {num_epochs} epochs started.")

for epoch in range(num_epochs):
    model.train()
    for batch_idx, (data, targets, _) in tqdm.tqdm(enumerate(train_loader), total=len(train_loader), leave=False):
        data = data.to(device)
        targets = targets.to(device)
        optimizer.zero_grad()
        outputs = model(data)
        pos_weight = targets*positive_weigh_factor  # Positive targets are weighted by positive_weigh_factor (1.0 here)
        criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}")
    scheduler.step()
    print("Scheduler:",scheduler.state_dict())

# Save the trained model
model.eval()
torch.save(model.state_dict(), "resnet18-with-bioclimatic-cubes.pth")

Training for 20 epochs started.

Epoch 1/20, Loss: 0.006464727688580751
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 1, 'verbose': True, '_step_count': 2, '_get_lr_called_within_step': False, '_last_lr': [0.0001992114701314478]}

Epoch 2/20, Loss: 0.005606101825833321
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 2, 'verbose': True, '_step_count': 3, '_get_lr_called_within_step': False, '_last_lr': [0.0001968583161128631]}

Epoch 3/20, Loss: 0.004869306460022926
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 3, 'verbose': True, '_step_count': 4, '_get_lr_called_within_step': False, '_last_lr': [0.00019297764858882514]}

Epoch 4/20, Loss: 0.004654968623071909
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 4, 'verbose': True, '_step_count': 5, '_get_lr_called_within_step': False, '_last_lr': [0.00018763066800438636]}

Epoch 5/20, Loss: 0.004428307991474867
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 5, 'verbose': True, '_step_count': 6, '_get_lr_called_within_step': False, '_last_lr': [0.00018090169943749476]}

Epoch 6/20, Loss: 0.004599055740982294
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 6, 'verbose': True, '_step_count': 7, '_get_lr_called_within_step': False, '_last_lr': [0.00017289686274214118]}

Epoch 7/20, Loss: 0.004398504737764597
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 7, 'verbose': True, '_step_count': 8, '_get_lr_called_within_step': False, '_last_lr': [0.000163742398974869]}

Epoch 8/20, Loss: 0.004502722527831793
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 8, 'verbose': True, '_step_count': 9, '_get_lr_called_within_step': False, '_last_lr': [0.00015358267949789966]}

Epoch 9/20, Loss: 0.004673454444855452
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 9, 'verbose': True, '_step_count': 10, '_get_lr_called_within_step': False, '_last_lr': [0.00014257792915650726]}

Epoch 10/20, Loss: 0.00449385354295373
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 10, 'verbose': True, '_step_count': 11, '_get_lr_called_within_step': False, '_last_lr': [0.00013090169943749474]}

Epoch 11/20, Loss: 0.0042219385504722595
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 11, 'verbose': True, '_step_count': 12, '_get_lr_called_within_step': False, '_last_lr': [0.00011873813145857248]}

Epoch 12/20, Loss: 0.004247903358191252
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 12, 'verbose': True, '_step_count': 13, '_get_lr_called_within_step': False, '_last_lr': [0.00010627905195293135]}

Epoch 13/20, Loss: 0.004098457284271717
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 13, 'verbose': True, '_step_count': 14, '_get_lr_called_within_step': False, '_last_lr': [9.372094804706867e-05]}

Epoch 14/20, Loss: 0.004416175652295351
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 14, 'verbose': True, '_step_count': 15, '_get_lr_called_within_step': False, '_last_lr': [8.126186854142755e-05]}

Epoch 15/20, Loss: 0.004540497902780771
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 15, 'verbose': True, '_step_count': 16, '_get_lr_called_within_step': False, '_last_lr': [6.90983005625053e-05]}

Epoch 16/20, Loss: 0.003989492077380419
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 16, 'verbose': True, '_step_count': 17, '_get_lr_called_within_step': False, '_last_lr': [5.742207084349274e-05]}

Epoch 17/20, Loss: 0.0037517051678150892
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 17, 'verbose': True, '_step_count': 18, '_get_lr_called_within_step': False, '_last_lr': [4.6417320502100316e-05]}

Epoch 18/20, Loss: 0.0039222766645252705
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 18, 'verbose': True, '_step_count': 19, '_get_lr_called_within_step': False, '_last_lr': [3.6257601025131026e-05]}

Epoch 19/20, Loss: 0.0038201878778636456
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 19, 'verbose': True, '_step_count': 20, '_get_lr_called_within_step': False, '_last_lr': [2.7103137257858868e-05]}

Epoch 20/20, Loss: 0.003970972262322903
Scheduler: {'T_max': 25, 'eta_min': 0.0, 'base_lrs': [0.0002], 'last_epoch': 20, 'verbose': True, '_step_count': 21, '_get_lr_called_within_step': False, '_last_lr': [1.909830056250527e-05]}
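The loss used in the training loop can be written out for a single logit to show what `pos_weight` does. This is a plain-Python sketch of the binary cross entropy with logits formula, not the PyTorch implementation. The function name `bce_with_logits` is illustrative.

```python
import math

def bce_with_logits(logit, target, pos_weight=1.0):
    """Binary cross entropy on one logit; pos_weight scales only the positive term."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(pos_weight * target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

factor = 1.0  # plays the role of positive_weigh_factor in the training cell
for logit, target in [(2.0, 1.0), (-1.5, 0.0)]:
    # Per-element weight as built in the training loop: targets * factor
    per_element = bce_with_logits(logit, target, pos_weight=target * factor)
    # A single scalar weight shared by all elements
    scalar = bce_with_logits(logit, target, pos_weight=factor)
    print(target, per_element, scalar)
```

Since the positive term is already multiplied by the target, weighting by `targets * factor` gives the same loss as a constant weight of `factor`. With `factor = 1.0` this reduces to the unweighted loss, so raising the factor is one knob to try when positives are rare.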


## Test Loop

Again, nothing special, just a standard inference.

In [13]:
with torch.no_grad():
    all_predictions = []
    surveys = []
    top_k_indices = None
    for data, surveyID in tqdm.tqdm(test_loader, total=len(test_loader)):

        data = data.to(device)
        
        outputs = model(data)
        predictions = torch.sigmoid(outputs).cpu().numpy()

        # Select the top-25 values as predictions
        top_25 = np.argsort(-predictions, axis=1)[:, :25] 
        if top_k_indices is None:
            top_k_indices = top_25
        else:
            top_k_indices = np.concatenate((top_k_indices, top_25), axis=0)

        surveys.extend(surveyID.cpu().numpy())

100%|██████████| 231/231 [00:21<00:00, 10.58it/s]
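The top-k selection in the loop above can be tried on a tiny probability matrix. The sketch below uses made-up values and `k = 2` instead of 25, and also shows `np.argpartition` as a possible alternative that avoids a full sort when the order inside the top-k does not matter.

```python
import numpy as np

# Made-up probabilities for 2 surveys and 4 species
probs = np.array([[0.1, 0.9, 0.3, 0.7],
                  [0.8, 0.2, 0.6, 0.4]])
k = 2  # the notebook uses k = 25

# Full sort in descending order, as in the test loop
top_k_sorted = np.argsort(-probs, axis=1)[:, :k]

# Partial selection: the k largest per row, in no guaranteed order within the row
top_k_partial = np.argpartition(-probs, k - 1, axis=1)[:, :k]

print(top_k_sorted)
```

A fixed `k` predicts the same number of species for every survey. Thresholding the probabilities instead is another option worth experimenting with.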


## Save prediction file! 🎉🥳🙌🤗

In [14]:
data_concatenated = [' '.join(map(str, row)) for row in top_k_indices]

pd.DataFrame(
    {'surveyId': surveys,
     'predictions': data_concatenated,
    }).to_csv("submission.csv", index = False)
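As a sanity check on the file layout, the sketch below writes a miniature submission to an in-memory buffer with the standard library and parses it back. The survey IDs and predictions are hypothetical, and each row holds three IDs instead of 25 to keep the example short.

```python
import csv
import io

# Hypothetical predictions for two surveys (the notebook writes 25 IDs per row)
surveys = [101, 102]
top_k_indices = [[5, 12, 7], [3, 9, 1]]

# Write the same two columns as the submission file
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["surveyId", "predictions"])
for survey_id, row in zip(surveys, top_k_indices):
    writer.writerow([survey_id, " ".join(map(str, row))])

# Read the file back and parse each prediction string into integers
buffer.seek(0)
parsed = {int(r["surveyId"]): [int(s) for s in r["predictions"].split()]
          for r in csv.DictReader(buffer)}

print(buffer.getvalue())
```

Each row pairs one `surveyId` with a space-separated list of species IDs, which is the layout produced by the cell above.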