### **1. Generate `train.txt` and `val.txt`**

The text files are used to load the EuroSAT data stored in `../data/`. Each line holds an image path followed by its integer class label:

```
<path_to_image> <label>
```
For example:
```
/path/to/image1.tif    0
/path/to/image2.tif    3
...
```
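A file in this format can be read back into `(path, label)` pairs. The following is only a minimal sketch: the function name `parse_split_lines` is hypothetical and not part of the notebook. It splits each line at the last run of whitespace, so paths that contain spaces still parse.

```python
def parse_split_lines(lines):
    """Parse '<path> <label>' lines into (path, int_label) tuples."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right so paths containing spaces stay intact
        path, label = line.rsplit(maxsplit=1)
        samples.append((path, int(label)))
    return samples


example = ["/path/to/image1.tif    0", "/path/to/image2.tif    3"]
print(parse_split_lines(example))
```

Because the function accepts any iterable of strings, an open file object can be passed directly, e.g. `parse_split_lines(open(some_split_file))`.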

The `.txt` files are generated by the script below:

In [1]:
import os
from glob import glob
import random

def generate_split_txt(root_folder, out_txt_path, split_ratio=0.8, seed=42):
    """
    Creates train/val .txt files from a root image folder organized by class.
    Supports .tif and .jpg files.
    """
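    # Only sub-folders count as classes; stray files in the root are ignored
    class_names = sorted(
        d for d in os.listdir(root_folder)
        if os.path.isdir(os.path.join(root_folder, d))
    )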
    class_to_idx = {cls: idx for idx, cls in enumerate(class_names)}

    all_samples = []
    for cls in class_names:
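        # Sort the paths: glob order is filesystem-dependent, and the seeded
        # shuffle is only reproducible if its input order is fixed
        tif_paths = glob(os.path.join(root_folder, cls, "*.tif"))
        jpg_paths = glob(os.path.join(root_folder, cls, "*.jpg"))
        image_paths = sorted(tif_paths + jpg_paths)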
        for path in image_paths:
            all_samples.append(f"{path} {class_to_idx[cls]}")

    if not all_samples:
        print(f"⚠️  No image files found in: {root_folder}")
        return

    random.seed(seed)
    random.shuffle(all_samples)
    split_idx = int(len(all_samples) * split_ratio)
    train_samples = all_samples[:split_idx]
    val_samples = all_samples[split_idx:]

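    # Create the output folder if needed; end each file with a newline
    os.makedirs(os.path.dirname(out_txt_path) or ".", exist_ok=True)
    with open(out_txt_path.replace(".txt", "_train.txt"), "w") as f:
        f.write("\n".join(train_samples) + "\n")
    with open(out_txt_path.replace(".txt", "_val.txt"), "w") as f:
        f.write("\n".join(val_samples) + "\n")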

    print(f"✅ Created train/val splits for: {root_folder}")
    print(f"   → Train: {len(train_samples)} samples")
    print(f"   → Val:   {len(val_samples)} samples")

# Execution
generate_split_txt("../data/eurosat_ms", "../data_splits/eurosat_ms.txt")
generate_split_txt("../data/eurosat_rgb", "../data_splits/eurosat_rgb.txt")

✅ Created train/val splits for: ../data/eurosat_ms
   → Train: 21600 samples
   → Val:   5400 samples
✅ Created train/val splits for: ../data/eurosat_rgb
   → Train: 21600 samples
   → Val:   5400 samples
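
A quick sanity check on the generated files is that no image appears in both the train and the val split. The following is a minimal sketch, not part of the original notebook: the helper name `find_overlap` is hypothetical, and it works on any iterable of `<path> <label>` lines, so it can be tried without touching the files.

```python
def find_overlap(train_lines, val_lines):
    """Return the set of image paths that occur in both splits."""
    def paths(lines):
        # Keep everything before the last whitespace run (the path part)
        return {ln.rsplit(maxsplit=1)[0] for ln in lines if ln.strip()}
    return paths(train_lines) & paths(val_lines)


train = ["/path/a.tif 0", "/path/b.tif 1"]
val = ["/path/c.tif 0"]
print(find_overlap(train, val))  # prints set() because nothing is shared
```

An empty set means the two splits are disjoint. Open file objects for the `_train.txt` and `_val.txt` files can be passed in place of the lists.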


### **2. Create Training Subsets (10%, 25%, 50%, 75%, 100%)**

The goal is to measure how model performance improves as the training set grows. To keep comparisons across runs fair and meaningful, the validation set stays fixed.

The following text files were generated in step 1 and contain the complete (100%) training split:

```
../data_splits/eurosat_ms_train.txt
../data_splits/eurosat_rgb_train.txt
```

To subsample:

* Randomly select a percentage of the lines from each of these files
* Save them into new files like:

  ```
  ../data_splits/eurosat_ms_train_10.txt
  ../data_splits/eurosat_ms_train_25.txt
  ../data_splits/eurosat_ms_train_50.txt
  ../data_splits/eurosat_ms_train_75.txt
  ```

Do this for both the MS and the RGB training file:

In [2]:
import random

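def subsample_txt_file(input_path, output_prefix, percentages=(10, 25, 50), seed=42):
    # splitlines() drops the line endings, so a missing trailing newline in the
    # input cannot merge two samples into one line after shuffling
    with open(input_path, 'r') as f:
        lines = [ln for ln in f.read().splitlines() if ln.strip()]

    random.seed(seed)
    random.shuffle(lines)

    # Every subset is a prefix of the same shuffled list, so the subsets are
    # nested: the 10% subset is contained in the 25% subset, and so on
    for p in percentages:
        count = int(len(lines) * (p / 100))
        subset = lines[:count]
        out_path = f"{output_prefix}_{p}.txt"
        with open(out_path, 'w') as f_out:
            f_out.write("\n".join(subset) + "\n")
        print(f"Saved {p}% subset to {out_path} ({count} samples)")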

# Example usage
subsample_txt_file("../data_splits/eurosat_ms_train.txt", "../data_splits/eurosat_ms_train", percentages=[10, 25, 50, 75])
subsample_txt_file("../data_splits/eurosat_rgb_train.txt", "../data_splits/eurosat_rgb_train", percentages=[10, 25, 50, 75])

Saved 10% subset to ../data_splits/eurosat_ms_train_10.txt (2160 samples)
Saved 25% subset to ../data_splits/eurosat_ms_train_25.txt (5400 samples)
Saved 50% subset to ../data_splits/eurosat_ms_train_50.txt (10800 samples)
Saved 75% subset to ../data_splits/eurosat_ms_train_75.txt (16200 samples)
Saved 10% subset to ../data_splits/eurosat_rgb_train_10.txt (2160 samples)
Saved 25% subset to ../data_splits/eurosat_rgb_train_25.txt (5400 samples)
Saved 50% subset to ../data_splits/eurosat_rgb_train_50.txt (10800 samples)
Saved 75% subset to ../data_splits/eurosat_rgb_train_75.txt (16200 samples)
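
The subsampling above is purely random rather than stratified, so the class proportions in the smaller subsets may drift slightly from those of the full training split. One way to inspect this is to count the labels per subset. The following is a minimal sketch under that assumption: the helper name `class_counts` is hypothetical and not part of the notebook.

```python
from collections import Counter


def class_counts(lines):
    """Count samples per class label in '<path> <label>' lines."""
    # The label is the last whitespace-separated token on each line
    return Counter(int(ln.rsplit(maxsplit=1)[1]) for ln in lines if ln.strip())


subset = ["/path/a.tif 0", "/path/b.tif 3", "/path/c.tif 0"]
print(class_counts(subset))
```

Passing an open subset file, e.g. `class_counts(open("../data_splits/eurosat_ms_train_10.txt"))`, gives the per-class counts for that subset, which can then be compared against the counts of the full training file.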


In [15]:
# Search for .pth files recursively
for root, dirs, files in os.walk("."):
    for file in files:
        if file.endswith(".pth"):
            print(os.path.join(root, file))