### To-do list for implementation from scratch

*Resources*

* https://brsoff.github.io/tutorials/beginner/finetuning_torchvision_models_tutorial.html
* https://rumn.medium.com/part-1-ultimate-guide-to-fine-tuning-in-pytorch-pre-trained-model-and-its-configuration-8990194b71e
* https://pytorch.org/vision/stable/models/generated/torchvision.models.vgg16.html#torchvision.models.vgg16

*Step-by-step plan*
* Look at details of how SMILIES implemented VGG.
* Prepare and transform data (according to the needs of the pretrained model).
* Look into what PyTorch calls 'feature extraction' (only changing the classification head), as opposed to 'fine tuning'.

### Imports

In [1]:
import torch
import os
import pandas as pd
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import rasterio
import numpy as np
from torchvision import models, transforms

### Dataset class definition

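According to the torchvision documentation, the inference transforms that ship with the pretrained VGG weights "Accepts PIL.Image, batched (B, C, H, W) and single (C, H, W) image torch.Tensor objects"

* The TIF must either be converted into a format that PIL can read, or be loaded as a single (C, H, W) tensor.
* Open question: when batching, must PIL be used? The quoted line also lists batched (B, C, H, W) tensors as valid input, so probably not.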

In [None]:

class ImageDataset(Dataset):
    def __init__(self, image_dir, labels_df):
        self.image_dir = image_dir
        self.labels_df = labels_df

    def __len__(self):
        return len(self.labels_df)

    def __getitem__(self, idx):
        img_name = self.labels_df.index[idx]
        label = self.labels_df.iloc[idx, 0]
        img_path = os.path.join(self.image_dir, f"{img_name}.tif")

        # Open the image with PIL and copy it inside the `with` block:
        # PIL loads pixel data lazily, so it must be read before the file closes.
        with Image.open(img_path) as im:
            PIL_img = im.copy()

        return PIL_img, label




#######################################################################################



In [3]:
with Image.open("/home/nadjaflechner/Palsa_data/dataset_100m/760_77_50_2016_neg_crop_0.tif") as im:
    print(type(im))

<class 'PIL.TiffImagePlugin.TiffImageFile'>


### Data preparation 

In [None]:

image_dir = "/home/nadjaflechner/Palsa_data/dataset_100m/"
labels_file = "/home/nadjaflechner/Palsa_data/binary_palsa_labels_100m.csv"

# Load the labels from the CSV file
labels_df = pd.read_csv(labels_file, index_col=0).head(5000)

# Split the dataset into training and validation sets
train_df = labels_df.head(4000)
val_df = labels_df.drop(train_df.index)

# Create the datasets
train_dataset = ImageDataset(image_dir, train_df)
val_dataset = ImageDataset(image_dir, val_df)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)