# Train a sensor processing model using a Convolutional Variational Autoencoder 

This notebook uses the Julian-8897-Conv-VAE-PyTorch implementation to train a sensor processing model based on a convolutional variational autoencoder.

The parameters of the training are described by an experiment run of type "sensorprocessing_conv_vae". Running the code in this notebook produces the model files, which are stored in the experiment directory.

As the model files will have unpredictable, date-time dependent names, after training a satisfactory model, the model name and directory will need to be copied to the experiment/run yaml file, into the model_subdir and model_checkpoint fields.


In [8]:
import sys
sys.path.append("..")
from exp_run_config import Config
Config.PROJECTNAME = "BerryPicker"

import pathlib
import shutil
import pprint
from demonstration.demonstration import Demonstration, get_simple_transform

# adding the Julian-8897-Conv-VAE-PyTorch into the path
vaepath = pathlib.Path(Config()["conv_vae"]["code_dir"]).expanduser()
sys.path.append(str(vaepath))
print(sys.path)

# At some point in the development, this hack was necessary for some reason. 
# It seems that as of Feb 2025, the code runs on Windows and Linux without it.
#temp = pathlib.PosixPath
#pathlib.PosixPath = pathlib.WindowsPath

from conv_vae import get_conv_vae_config, create_configured_vae_json, train

['C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.13_3.13.2544.0_x64__qbz5n2kfra8p0\\python313.zip', 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.13_3.13.2544.0_x64__qbz5n2kfra8p0\\DLLs', 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.13_3.13.2544.0_x64__qbz5n2kfra8p0\\Lib', 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.13_3.13.2544.0_x64__qbz5n2kfra8p0', 'c:\\Users\\lotzi\\Work\\_VirtualEnvs\\BerryPicker', '', 'c:\\Users\\lotzi\\Work\\_VirtualEnvs\\BerryPicker\\Lib\\site-packages', 'c:\\Users\\lotzi\\Work\\_VirtualEnvs\\BerryPicker\\Lib\\site-packages\\win32', 'c:\\Users\\lotzi\\Work\\_VirtualEnvs\\BerryPicker\\Lib\\site-packages\\win32\\lib', 'c:\\Users\\lotzi\\Work\\_VirtualEnvs\\BerryPicker\\Lib\\site-packages\\Pythonwin', '..', '..', '..', 'c:\\Users\\lotzi\\Work\\_Code\\Conv-VAE-PyTorch', '..', '..', 'c:\\Users\\lotzi\\Work\\_Code\\Conv-VAE-PyTorch']


### Exp/run initialization
Create the exp/runs that describe the parameters of the training.
Some of the code here is structured so that the notebook can be automated with papermill.
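
As a rough sketch of what such automation could look like, the snippet below builds a parameter dictionary matching the variable names of the parameters cell and hands it to `papermill.execute_notebook`. The helper names `build_parameters` and `run_notebook`, as well as the notebook file names in the commented-out call, are hypothetical and only serve as illustration. papermill injects the values in a new cell placed after the cell tagged `parameters`, so they override the defaults set there.

```python
from pathlib import Path


def build_parameters(run, external_path=None, data_path=None, epochs=None,
                     creation_style="exist-ok"):
    """Collect the values that override the defaults of the parameters cell."""
    params = {
        "experiment": "sensorprocessing_conv_vae",
        "run": run,
        "creation_style": creation_style,
    }
    # Only pass the optional overrides that were actually set
    optional = {"external_path": external_path, "data_path": data_path, "epochs": epochs}
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


def run_notebook(notebook, output_notebook, parameters):
    """Execute the notebook with papermill, injecting the given parameters."""
    import papermill as pm  # imported lazily; requires papermill to be installed
    Path(output_notebook).parent.mkdir(parents=True, exist_ok=True)
    pm.execute_notebook(str(notebook), str(output_notebook), parameters=parameters)


params = build_parameters("sp_vae_128", epochs=30)
print(params)
# Hypothetical file names, adjust to the actual notebook location:
# run_notebook("Train_Conv_VAE.ipynb", "output/Train_Conv_VAE_out.ipynb", params)
```

Leaving out the unset overrides keeps the `None` defaults of the parameters cell in effect, which is what the `if external_path:` and `if epochs:` checks later in the notebook rely on.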

In [None]:
# *** Initialize the variables with default values 
# *** This cell should be tagged as parameters     
# *** If papermill is used, some of the values will be overwritten 

# Controls whether an existing exp/run is reused or recreated from scratch
creation_style = "exist-ok"

# If not None, set an external experiment path
external_path = None
# If not None, set an output path
data_path = None
# If not None, set the epochs to something different than the exp
epochs = None

# Specify and load the experiment
experiment = "sensorprocessing_conv_vae"
run = "sp_vae_128" 
# run = "sp_vae_128_300epochs" 
# run = "sp_vae_256" 
# run = "sp_vae_256_300epochs" 

#### Temporary values - these would be overwritten by the flow  #####
creation_style = "exist-ok"
data_path = "c:/Users/lotzi/Work/_Data/BerryPicker-Flows/BC-touch-apple/result"
experiment = "sensorprocessing_conv_vae"
external_path = "c:/Users/lotzi/Work/_Data/BerryPicker-Flows/BC-touch-apple/exprun"
run = "_flow_sp_conv_vae_0001"

In [11]:
if external_path:
    external_path = pathlib.Path(external_path)
    assert external_path.exists()
    Config().set_exprun_path(external_path)
    Config().copy_experiment("sensorprocessing_conv_vae")
    Config().copy_experiment("robot_al5d")
    Config().copy_experiment("demonstration")
if data_path:
    data_path = pathlib.Path(data_path)
    assert data_path.exists()
    Config().set_results_path(data_path)

exp = Config().get_experiment(experiment, run, creation_style=creation_style)
if epochs:
    exp["epochs"] = epochs
pprint.pprint(exp)

***ExpRun**: Experiment config path changed to c:\Users\lotzi\Work\_Data\BerryPicker-Flows\BC-touch-apple\exprun
***ExpRun**: Experiment sensorprocessing_conv_vae copied to
c:\Users\lotzi\Work\_Data\BerryPicker-Flows\BC-touch-apple\exprun\sensorprocessing_conv_vae
***ExpRun**: Experiment robot_al5d copied to
c:\Users\lotzi\Work\_Data\BerryPicker-Flows\BC-touch-apple\exprun\robot_al5d
***ExpRun**: Experiment demonstration copied to
c:\Users\lotzi\Work\_Data\BerryPicker-Flows\BC-touch-apple\exprun\demonstration
***ExpRun**: Experiment data path changed to c:\Users\lotzi\Work\_Data\BerryPicker-Flows\BC-touch-apple\result
***ExpRun**: Configuration for exp/run: sensorprocessing_conv_vae/_flow_sp_conv_vae_0001 successfully loaded
Experiment:
    class: ConvVaeSensorProcessing
    clean_checkpoints: true
    data_dir: c:\Users\lotzi\Work\_Data\BerryPicker-Flows\BC-touch-apple\result\sensorprocessing_conv_vae\_flow_sp_conv_vae_0001
    epochs: 15
    exp_run_sys_indep_file: c:\Users\lotzi\Wor

### Create the training data for the Conv-VAE

We collect the training data for the Conv-VAE by gathering all the pictures from all the demonstrations of a specific task. One can select the pictures by creating a specific task and copying all the relevant demonstrations there.

The collected pictures are put in a newly created training directory for the run:

```
$experiment\vae-training-data\Images\*.jpg
```

In [12]:
def copy_images_to_training_dir(exp, training_image_dir):
    """Copy all the images specified in the training_data field to the training directory."""
    count = 0
    transform = get_simple_transform()
    print("***Train-Conv-VAE***: Copying training images to training directory")
    for val in exp["training_data"]:
        run, demo_name, camera = val
        exp_demo = Config().get_experiment("demonstration", run)
        demo = Demonstration(exp_demo, demo_name)
        for i in range(demo.metadata["maxsteps"]):
            training_image_path = pathlib.Path(training_image_dir, f"train_{count:05d}.jpg")
            demo.write_image(i, training_image_path, camera=camera, transform=transform)
            count += 1
    print("***Train-Conv-VAE***: Copying training images to training directory done")


In [13]:

# Deciding on the location of the training data
training_data_dir = pathlib.Path(exp.data_dir(), exp["training_data_dir"])
training_image_dir = pathlib.Path(training_data_dir, "Images")
# We assume that if the directory exists, it has been previously populated with images
if not training_image_dir.exists():
    training_image_dir.mkdir(exist_ok = False, parents=True)
    copy_images_to_training_dir(exp, training_image_dir=training_image_dir)
else:
    print(f"***Train-Conv-VAE***: Training image dir {training_image_dir} already exists. Do not repeat the copying.")            


***Train-Conv-VAE***: Copying training images to training directory
***ExpRun**: Configuration for exp/run: demonstration/touch-apple successfully loaded
***ExpRun**: Configuration for exp/run: demonstration/touch-apple successfully loaded
***ExpRun**: Configuration for exp/run: demonstration/touch-apple successfully loaded
***ExpRun**: Configuration for exp/run: demonstration/touch-apple successfully loaded
***Train-Conv-VAE***: Copying training images to training directory done
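
The cell above treats the mere existence of the `Images` directory as proof that it was populated. If an earlier run was interrupted during copying, that assumption may not hold. A minimal sketch of a stricter check, assuming only the `train_NNNNN.jpg` naming used by `copy_images_to_training_dir`; the helper `has_training_images` is hypothetical and not part of the project:

```python
import pathlib
import tempfile


def has_training_images(training_image_dir, pattern="train_*.jpg"):
    """Return True if the directory exists and contains at least one matching image."""
    training_image_dir = pathlib.Path(training_image_dir)
    return training_image_dir.is_dir() and any(training_image_dir.glob(pattern))


# Demonstration on a temporary directory that mimics the layout described above
with tempfile.TemporaryDirectory() as tmp:
    image_dir = pathlib.Path(tmp, "vae-training-data", "Images")
    image_dir.mkdir(parents=True)
    empty_result = has_training_images(image_dir)      # directory exists but is empty
    pathlib.Path(image_dir, f"train_{0:05d}.jpg").touch()
    populated_result = has_training_images(image_dir)  # one image is present

print(empty_result, populated_result)  # False True
```

This check still cannot tell a partially copied set from a complete one; comparing the file count with the sum of the `maxsteps` values of the demonstrations would be needed for that.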


### Run the training

This cell runs the training. It first creates the json-based configuration file of the Conv-VAE library from the parameters specified in the exp/run. Then it calls the library code to perform the training, and copies the last checkpoint as the final model.
If the final model already exists, the cell does nothing.

In [None]:
model_target_path = pathlib.Path(exp.data_dir(), "model.pth")
json_target_path = pathlib.Path(exp.data_dir(), "config.json")

if model_target_path.exists():
    print("***Train-Conv-VAE*** already completed for this exp/run")
else:
    # Create the vae configuration, based on the experiment
    file = create_configured_vae_json(exp)
    print(file)
    vae_config = get_conv_vae_config(file)
    # actually run the training
    print(f'***Train-Conv-VAE***: Running the trainer from scratch for {vae_config["trainer"]["epochs"]} epochs')
    exp.start_timer("training")
    trainer = train(vae_config)
    # stop the timer as soon as training returns, whether or not a checkpoint was written
    exp.end_timer("training")
    checkpoint_path = pathlib.Path(trainer.checkpoint_dir, f"checkpoint-epoch{trainer.epochs}.pth")
    json_path = pathlib.Path(trainer.checkpoint_dir, "config.json")

    if checkpoint_path.exists():
        print(f"***Train-Conv-VAE***: Copying the checkpoint from {checkpoint_path} to {model_target_path}")
        model_target_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(checkpoint_path, model_target_path)
        shutil.copy(json_path, json_target_path)
        # clean up the intermediate checkpoints once the final model has been copied
        if exp["clean_checkpoints"]:
            for i in range(trainer.epochs+1):
                checkpoint = pathlib.Path(trainer.checkpoint_dir, f"checkpoint-epoch{i}.pth")
                if checkpoint.exists():
                    print(f"removing checkpoint file: {checkpoint}")
                    checkpoint.unlink()
    else:
        print(f"***Train-Conv-VAE***: The checkpoint file {checkpoint_path} does not exist. Cannot copy it to model.pth")


C:\Users\lotzi\Work\_Code\BerryPicker\src\sensorprocessing\conv-vae-config-default.json
{'name': 'VAE_Robot', 'n_gpu': 1, 'arch': {'type': 'VanillaVAE', 'args': {'in_channels': 3, 'latent_dims': 128, 'flow': False}}, 'data_loader': {'type': 'CelebDataLoader', 'args': {'data_dir': 'c:\\Users\\lotzi\\Work\\_Data\\BerryPicker-Flows\\BC-touch-apple\\result\\sensorprocessing_conv_vae\\_flow_sp_conv_vae_0001\\vae-training-data', 'batch_size': 64, 'shuffle': True, 'validation_split': 0.2, 'num_workers': 2}}, 'optimizer': {'type': 'Adam', 'args': {'lr': 0.005, 'weight_decay': 0.0, 'amsgrad': True}}, 'loss': 'elbo_loss', 'metrics': [], 'lr_scheduler': {'type': 'StepLR', 'args': {'step_size': 50, 'gamma': 0.1}}, 'trainer': {'epochs': 15, 'save_dir': 'c:\\Users\\lotzi\\Work\\_Data\\BerryPicker-Flows\\BC-touch-apple\\result\\sensorprocessing_conv_vae\\_flow_sp_conv_vae_0001\\models', 'save_period': 5, 'verbosity': 2, 'monitor': 'min val_loss', 'early_stop': 10, 'tensorboard': True}}
c:\Users\lotzi

You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  self._data.total[key] += value * n

***Train-Conv-VAE***: Copying the checkpoint from c:\Users\lotzi\Work\_Data\BerryPicker-Flows\BC-touch-apple\result\sensorprocessing_conv_vae\_flow_sp_conv_vae_0001\models\models\VAE_Robot\1205_145441\checkpoint-epoch15.pth to c:\Users\lotzi\Work\_Data\BerryPicker-Flows\BC-touch-apple\result\sensorprocessing_conv_vae\_flow_sp_conv_vae_0001\model.pth
removing checkpoint file: c:\Users\lotzi\Work\_Data\BerryPicker-Flows\BC-touch-apple\result\sensorprocessing_conv_vae\_flow_sp_conv_vae_0001\models\models\VAE_Robot\1205_145441\checkpoint-epoch5.pth
removing checkpoint file: c:\Users\lotzi\Work\_Data\BerryPicker-Flows\BC-touch-apple\result\sensorprocessing_conv_vae\_flow_sp_conv_vae_0001\models\models\VAE_Robot\1205_145441\checkpoint-epoch10.pth
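
The training cell looks for a checkpoint named exactly `checkpoint-epoch{trainer.epochs}.pth`. If that file is missing, for instance because training stopped before the last epoch, a fallback could pick the newest checkpoint that does exist. The following is a minimal sketch that assumes only the `checkpoint-epoch<N>.pth` naming visible in the output above; `latest_checkpoint` is a hypothetical helper, not a function of the Conv-VAE-PyTorch library.

```python
import pathlib
import re
import tempfile

CHECKPOINT_RE = re.compile(r"checkpoint-epoch(\d+)\.pth$")


def latest_checkpoint(checkpoint_dir):
    """Return the checkpoint path with the highest epoch number, or None if there is none."""
    candidates = []
    for path in pathlib.Path(checkpoint_dir).glob("checkpoint-epoch*.pth"):
        match = CHECKPOINT_RE.match(path.name)
        if match:
            # Compare epochs numerically so that epoch 10 ranks above epoch 5
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


# Demonstration with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    nothing_found = latest_checkpoint(tmp)
    for epoch in (5, 10, 15):
        pathlib.Path(tmp, f"checkpoint-epoch{epoch}.pth").touch()
    newest = latest_checkpoint(tmp).name

print(nothing_found, newest)  # None checkpoint-epoch15.pth
```

Parsing the epoch as an integer matters here: sorting the file names as strings would place `checkpoint-epoch5.pth` after `checkpoint-epoch15.pth`.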
