# Train a sensor processing model using a Convolutional Variational Autoencoder 

This notebook uses the Julian-8897-Conv-VAE-PyTorch implementation to train a sensor processing model based on a convolutional variational autoencoder.

The training parameters are described by an exp/run of type "sensorprocessing_conv_vae". Running the code in this notebook produces the model files, which are stored in the experiment directory.

Because the model files have unpredictable, date-time-dependent names, after training a satisfactory model, copy the model name and directory into the experiment/run yaml file, in the model_subdir and model_checkpoint fields.
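
Finding the newest model file can be scripted. The following is a minimal sketch, assuming the checkpoints are `.pth` files stored in (possibly nested) timestamp-named subdirectories of the run's model directory. The helpers `find_latest_checkpoint` and `describe_checkpoint` are hypothetical and not part of the project code.

```python
import pathlib


def find_latest_checkpoint(models_root):
    """Return the most recently modified .pth file under models_root, or None.

    Hypothetical helper: assumes checkpoints are stored as *.pth files in
    (possibly nested) subdirectories of models_root.
    """
    candidates = list(pathlib.Path(models_root).rglob("*.pth"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def describe_checkpoint(models_root):
    """Return (subdir, filename) of the latest checkpoint, relative to models_root."""
    latest = find_latest_checkpoint(models_root)
    if latest is None:
        return None
    relative = latest.relative_to(models_root)
    return str(relative.parent), relative.name
```

The returned pair would be candidates for the model_subdir and model_checkpoint fields, assuming those fields are interpreted relative to the model directory. Check this against how the project reads them before relying on it.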


In [None]:
# NOTEBOOK CHANGES OVERVIEW
# I changed the notebook to calculate validation loss when training the VAE if validation_data_dir is defined in the config file.
# Config file example: experiment_configs/sensorprocessing_conv_vae/sp_vae_128_300epochs_validation.yaml

# CHANGES
# Ignore this cell. This is a workaround.

import os, sys, shutil, pprint, pathlib
from pathlib import Path

NB_DIR = Path.cwd() 
SRC_DIR = (NB_DIR / "..").resolve()
if str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))

from exp_run_config import Config
Config.PROJECTNAME = "BerryPicker"

conv_vae_dir = Path(os.path.expanduser(os.path.expandvars(
    Config()["conv_vae"]["code_dir"]
))).resolve()

assert conv_vae_dir.exists(), f"Conv-VAE-PyTorch not found at: {conv_vae_dir}"
if str(conv_vae_dir) not in sys.path:
    sys.path.insert(0, str(conv_vae_dir))

from demonstration.demonstration import Demonstration, get_simple_transform
from conv_vae import get_conv_vae_config, create_configured_vae_json, train

***ExpRun**: Loading pointer config file:
	/home/al5d/.config/BerryPicker/mainsettings.yaml
***ExpRun**: Loading machine-specific config file:
	~/WORK/BerryPicker/cfg/settings.yaml


### Exp/run initialization
Create the exp/runs that describe the parameters of the training.
Some of the code here is structured so that the notebook can be automated with papermill.
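
As an illustration of the papermill workflow, the sketch below builds the overrides for the cell tagged `parameters` and passes them to `papermill.execute_notebook`. The parameter names match the variables initialized in this notebook. The helper names and any notebook file names passed to them are hypothetical, and papermill must be installed for `run_notebook` to work.

```python
def build_parameters(run, epochs=None, creation_style="exist-ok",
                     external_path=None, data_path=None):
    """Collect the overrides for the cell tagged 'parameters'.

    Values left at None are omitted, so the notebook defaults stay in effect.
    """
    candidates = {
        "run": run,
        "epochs": epochs,
        "creation_style": creation_style,
        "external_path": external_path,
        "data_path": data_path,
    }
    return {k: v for k, v in candidates.items() if v is not None}


def run_notebook(input_path, output_path, **kwargs):
    """Execute the notebook with papermill (imported lazily; must be installed)."""
    import papermill as pm
    return pm.execute_notebook(input_path, output_path,
                               parameters=build_parameters(**kwargs))
```

For instance, calling `run_notebook` with the path of this notebook, an output path, `run="sp_vae_128"` and `epochs=10` would execute a short training run and save the executed copy to the output path.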

In [2]:
# *** Initialize the variables with default values 
# *** This cell should be tagged as parameters     
# *** If papermill is used, some of the values will be overwritten 

# How the exp/run is created: "exist-ok" reuses an existing exp/run instead of recreating it from scratch
creation_style = "exist-ok"

# If not None, set an external experiment path
external_path = None
# If not None, set an output path
data_path = None
# If not None, override the number of epochs specified in the exp/run
epochs = None

# Specify and load the experiment
experiment = "sensorprocessing_conv_vae"
# run = "sp_vae_128" 
# run = "sp_vae_128_300epochs" 
# run = "sp_vae_128_300epochs_validation" 
run = "sp_vae_128_300epochs_validation_leaf" 
# run = "sp_vae_256" 
# run = "sp_vae_256_300epochs" 


In [3]:
if external_path:
    external_path = pathlib.Path(external_path)
    assert external_path.exists()
    Config().set_experiment_path(external_path)
    Config().copy_experiment("sensorprocessing_conv_vae")
    Config().copy_experiment("robot_al5d")
    Config().copy_experiment("demonstration")
if data_path:
    data_path = pathlib.Path(data_path)
    assert data_path.exists()
    Config().set_experiment_data(data_path)

exp = Config().get_experiment(experiment, run, creation_style=creation_style)
if epochs:
    exp["epochs"] = epochs
pprint.pprint(exp)

***ExpRun**: Configuration for exp/run: sensorprocessing_conv_vae/sp_vae_128_300epochs_validation_leaf successfully loaded
Experiment:
    class: ConvVaeSensorProcessing
    data_dir: /home/al5d/WORK/BerryPicker/data/sensorprocessing_conv_vae/sp_vae_128_300epochs_validation_leaf
    epochs: 300
    exp_run_sys_indep_file: /home/al5d/WORK/BerryPicker/src/BerryPicker/src/experiment_configs/sensorprocessing_conv_vae/sp_vae_128_300epochs_validation_leaf.yaml
    experiment_name: sensorprocessing_conv_vae
    image_size:
    - 64
    - 64
    json_template_name: conv-vae-config-default.json
    latent_size: 128
    model_dir: models
    model_name: VAE_Robot
    run_name: sp_vae_128_300epochs_validation_leaf
    save_period: 5
    subrun_name: null
    time_started: '2025-09-20 14:18:52.842151'
    training_data:
    - - leaf
      - '2025_09_17__13_47_21'
      - dev0
    - - leaf
      - '2025_09_17__13_47_55'
      - dev0
    - - leaf
      - '2025_09_17__13_48_24'
      - dev0
    - - l

### Create the training data for the Conv-VAE

We collect the training data for the Conv-VAE by gathering all the pictures from all the demonstrations of a specific task. To select the pictures, create a specific task and copy all the relevant demonstrations there.

The collected pictures are put in a newly created training directory for the run:

```
$experiment\vae-training-data\Images\*.jpg
```

In [4]:
# CHANGES
# Creates validation data if configured

def copy_images_to_training_dir(exp, training_image_dir):
    """Copy the images listed in the training_data field (and in validation_data,
    if validation_data_dir is configured) to their respective Images directories.

    Note: training_image_dir is currently unused; the target directories are
    derived from the exp/run configuration.
    """
    # Always process the training data; add the validation data if configured
    data_directories = {"training": (exp["training_data_dir"], exp["training_data"])}
    if exp.get("validation_data_dir") is not None:
        data_directories["validation"] = (exp["validation_data_dir"], exp["validation_data"])

    transform = get_simple_transform()
    # Iterate over the training and (optional) validation specifications
    for key, (path, demos) in data_directories.items():
        data_dir = pathlib.Path(exp.data_dir(), path)
        image_dir = pathlib.Path(data_dir, "Images")
        # exist_ok=False: fail rather than mix new images into an existing directory
        image_dir.mkdir(exist_ok=False, parents=True)

        count = 0
        print(f"***Train-Conv-VAE***: Copying {key} images to {key} directory")

        for run, demo_name, camera in demos:
            exp_demo = Config().get_experiment("demonstration", run)
            demo = Demonstration(exp_demo, demo_name)
            # Write every step of the demonstration as a consecutively numbered image
            for i in range(demo.metadata["maxsteps"]):
                image_path = pathlib.Path(image_dir, f"{key}_{count:05d}.jpg")
                demo.write_image(i, image_path, camera=camera, transform=transform)
                count += 1
        print(f"***Train-Conv-VAE***: Copying {key} images to {key} directory done")

In [5]:
# Deciding on the location of the training data
training_data_dir = pathlib.Path(exp.data_dir(), exp["training_data_dir"])
training_image_dir = pathlib.Path(training_data_dir, "Images")
# We assume that if the directory exists, it has already been populated with images
if not training_image_dir.exists():
    copy_images_to_training_dir(exp, training_image_dir=training_image_dir)
else:
    print(f"***Train-Conv-VAE***: Training image dir {training_image_dir} already exists. Do not repeat the copying.")            

***Train-Conv-VAE***: Copying training images to training directory
***ExpRun**: Configuration for exp/run: demonstration/leaf successfully loaded
***ExpRun**: Configuration for exp/run: demonstration/leaf successfully loaded
***ExpRun**: Configuration for exp/run: demonstration/leaf successfully loaded
***ExpRun**: Configuration for exp/run: demonstration/leaf successfully loaded
***Train-Conv-VAE***: Copying training images to training directory done
***Train-Conv-VAE***: Copying validation images to validation directory
***ExpRun**: Configuration for exp/run: demonstration/leaf successfully loaded
***ExpRun**: Configuration for exp/run: demonstration/leaf successfully loaded
***Train-Conv-VAE***: Copying validation images to validation directory done
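
Before training, it can be useful to confirm that the copy step produced the expected number of images. The following is a minimal sketch, assuming the `Images` subdirectory layout and the `.jpg` extension used above. `count_images` is a hypothetical helper, not part of the project code.

```python
import pathlib


def count_images(data_dir, pattern="*.jpg"):
    """Count the image files in the Images subdirectory of data_dir.

    Returns 0 if the Images directory does not exist.
    """
    image_dir = pathlib.Path(data_dir, "Images")
    if not image_dir.is_dir():
        return 0
    return sum(1 for p in image_dir.glob(pattern) if p.is_file())
```

In this notebook it could be called as `count_images(pathlib.Path(exp.data_dir(), exp["training_data_dir"]))`, and likewise with `exp["validation_data_dir"]` when validation is configured.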


### Run the training

This cell runs the training. It first creates the json-based configuration file of the Conv-VAE library from the parameters specified in the exp/run. It then calls the library code to perform the training and copies a checkpoint as the final model: the best model if validation data is configured, otherwise the last checkpoint.
If the final model already exists, the training is skipped.

In [6]:
# CHANGES
# Training now includes validation and early stopping if config included
#   FUNCTION: create_configured_vae_json() - adds valid_loader
#   FUNCTION: train() - configures valid_loader before training

model_target_path = pathlib.Path(exp.data_dir(), "model.pth")
json_target_path = pathlib.Path(exp.data_dir(), "config.json")

if model_target_path.exists():
    print("***Train-Conv-VAE*** already completed for this exp/run")
else:
    # Create the vae configuration, based on the experiment
    file = create_configured_vae_json(exp)
    print(file)
    vae_config = get_conv_vae_config(file)
    # actually run the training
    print(f'***Train-Conv-VAE***: Running the trainer from scratch for {vae_config["trainer"]["epochs"]} epochs')
    exp.start_timer("training")
    trainer = train(vae_config)
    # Stop the timer as soon as training returns, whether or not the checkpoint is found
    exp.end_timer("training")
    # If validation data is configured, use the best model; otherwise use the last checkpoint
    if exp.get("validation_data_dir") is not None:
        checkpoint_path = pathlib.Path(trainer.checkpoint_dir, "model_best.pth")
    else:
        checkpoint_path = pathlib.Path(trainer.checkpoint_dir, f"checkpoint-epoch{trainer.epochs}.pth")

    json_path = pathlib.Path(trainer.checkpoint_dir, "config.json")

    if checkpoint_path.exists():
        print(f"***Train-Conv-VAE***: Copying the checkpoint from {checkpoint_path} to {model_target_path}")
        model_target_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(checkpoint_path, model_target_path)
        shutil.copy(json_path, json_target_path)
    else:
        print(f"***Train-Conv-VAE***: The checkpoint file {checkpoint_path} does not exist. Cannot copy it to model.pth")



/home/al5d/WORK/BerryPicker/src/BerryPicker/src/sensorprocessing/conv-vae-config-default.json
{'name': 'VAE_Robot', 'n_gpu': 1, 'arch': {'type': 'VanillaVAE', 'args': {'in_channels': 3, 'latent_dims': 128, 'flow': False}}, 'data_loader': {'type': 'CelebDataLoader', 'args': {'data_dir': '/home/al5d/WORK/BerryPicker/data/sensorprocessing_conv_vae/sp_vae_128_300epochs_validation_leaf/vae-training-data', 'batch_size': 64, 'shuffle': True, 'validation_split': 0.0, 'num_workers': 2}}, 'optimizer': {'type': 'Adam', 'args': {'lr': 0.005, 'weight_decay': 0.0, 'amsgrad': True}}, 'loss': 'elbo_loss', 'metrics': [], 'lr_scheduler': {'type': 'StepLR', 'args': {'step_size': 50, 'gamma': 0.1}}, 'trainer': {'epochs': 300, 'save_dir': '/home/al5d/WORK/BerryPicker/data/sensorprocessing_conv_vae/sp_vae_128_300epochs_validation_leaf/models', 'save_period': 5, 'verbosity': 2, 'monitor': 'min val_loss', 'early_stop': 100, 'tensorboard': True}, 'valid_loader': {'type': 'CelebDataLoader', 'args': {'data_dir':

You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  self._data.total[key] += value * n

***Train-Conv-VAE***: Copying the checkpoint from /home/al5d/WORK/BerryPicker/data/sensorprocessing_conv_vae/sp_vae_128_300epochs_validation_leaf/models/models/VAE_Robot/0920_141856/model_best.pth to /home/al5d/WORK/BerryPicker/data/sensorprocessing_conv_vae/sp_vae_128_300epochs_validation_leaf/model.pth
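
The configuration printed above uses a `StepLR` scheduler with `step_size` 50 and `gamma` 0.1, starting from a learning rate of 0.005. In PyTorch's `StepLR`, the learning rate is multiplied by `gamma` every `step_size` epochs. The sketch below reproduces that schedule in plain Python, without PyTorch. The function name is hypothetical, and the sketch assumes the trainer steps the scheduler once per epoch.

```python
def step_lr(initial_lr, step_size, gamma, epoch):
    """Learning rate after `epoch` scheduler steps under a StepLR-style schedule."""
    return initial_lr * gamma ** (epoch // step_size)


# Values taken from the printed configuration: lr=0.005, step_size=50, gamma=0.1
for epoch in (0, 50, 100, 150, 200, 250):
    print(f"epoch {epoch:3d}: lr = {step_lr(0.005, 50, 0.1, epoch):.1e}")
```

Under these assumptions, the learning rate during the last 50 of the 300 epochs is 0.005 × 0.1^5, which is worth keeping in mind when comparing runs with different epoch counts.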
