# Train behavior cloning

Train a robot controller based on behavior cloning.
* Load and pre-process the training data, typically from a set of demonstrations specified in an exp/run.
* Train the behavior cloning controller.
* Save the trained controller into the exp/run.

In [1]:
import sys
sys.path.append("..")

from exp_run_config import Config
Config.PROJECTNAME = "BerryPicker"

import pathlib
from tqdm import tqdm
import pprint
import torch
torch.manual_seed(1)

from bc_trainingdata import create_trainingdata_bc
from bc_factory import create_bc_model
from tensorboardX import SummaryWriter

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = "cpu"
print(f"Using device: {device}")

***ExpRun**: Loading pointer config file:
	C:\Users\lboloni\.config\BerryPicker\mainsettings.yaml
***ExpRun**: Loading machine-specific config file:
	G:\My Drive\LotziStudy\Code\PackageTracking\BerryPicker\settings\settings-LotziYoga.yaml
Using device: cuda


In [2]:
experiment = "behavior_cloning"
# run = "bc_mlp_00"
# run = "bc_lstm_00"
run = "bc_lstm_resid_00"
exp = Config().get_experiment(experiment, run, creation_style="discard-old")
pprint.pprint(exp)
spexp = Config().get_experiment(exp["sp_experiment"], exp["sp_run"])

***ExpRun**: No system dependent experiment file
	 G:\My Drive\LotziStudy\Code\PackageTracking\BerryPicker\settings\experiment-config\LotziYoga\behavior_cloning\bc_lstm_resid_00_sysdep.yaml,
	 that is ok, proceeding.
***ExpRun**: Configuration for exp/run: behavior_cloning/bc_lstm_resid_00 successfully loaded
***ExpRun**: Removing existing experiment directory c:\Users\lboloni\Documents\Code\_TempData\BerryPicker-experiments\behavior_cloning\bc_lstm_resid_00
Experiment:
    control_size: 6
    controller: bc_LSTM_Residual
    controller_file: controller.pth
    data_dir: c:\Users\lboloni\Documents\Code\_TempData\BerryPicker-experiments\behavior_cloning\bc_lstm_resid_00
    epochs: 10
    exp_run_sys_indep_file: C:\Users\lboloni\Documents\Code\_Checkouts\BerryPicker\src\experiment_configs\behavior_cloning\bc_lstm_resid_00.yaml
    experiment_name: behavior_cloning
    hidden_size: 32
    loss: MSELoss
    optimizer: Adam
    optimizer_lr: '0.001'
    run_name: bc_lstm_resid_00
    seque

### Training an RNN model
Functions for training an RNN-type model. These models take a sequence of latent encodings $[z_{t-k}, \ldots, z_{t}]$ as input and output the next action $a_{t+1}$.
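As a rough illustration of this input/output contract, the sketch below slices one demonstration into (window, next action) pairs using plain Python lists. The function name `make_windows` and the parameter `k` are hypothetical and not part of the project; the actual pre-processing is done by `create_trainingdata_bc` and may differ in detail.

```python
def make_windows(z, a, k):
    """Pair each window [z[t-k], ..., z[t]] with the next action a[t+1].

    z: list of latent encodings, one per time step
    a: list of actions, one per time step (same length as z)
    k: number of past steps in each window (window length is k + 1)
    """
    if len(z) != len(a):
        raise ValueError("z and a must have the same length")
    inputs, targets = [], []
    # t is the index of the last observation in the window
    for t in range(k, len(z) - 1):
        inputs.append(z[t - k : t + 1])
        targets.append(a[t + 1])
    return inputs, targets


# Toy demonstration: 5 time steps, scalar latents and actions
z = [0.0, 0.1, 0.2, 0.3, 0.4]
a = [10, 11, 12, 13, 14]
inputs, targets = make_windows(z, a, k=2)
```

With 5 time steps and `k=2`, only two windows have a following action, so `inputs` holds two windows of length 3 and `targets` holds the actions at steps 3 and 4.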

In [3]:
def validate_bc_rnn(model, criterion, data, device):
    """Calculate the average validation loss of an RNN-based behavior cloning model under the given criterion.
    Uses the z_validation and a_validation fields in "data", which hold the individual input sequences and their targets.
    CHECK: the target is presumably the last output of the RNN after the whole input sequence has been passed through it.
    The model is reset before each sequence (i.e. state is not carried over between sequences).
    model: an LSTM or similar model that can consume a sequence of inputs
    criterion: any function that calculates the distance between the outputs and the targets
    data: dictionary containing the z_validation and a_validation tensors
    device: the torch device to which the inputs and targets are moved
    """
    num_sequences = data["z_validation"].shape[0]
    model.eval()
    val_loss = 0
    with torch.no_grad():  # Disable gradient computation
        for i in range(num_sequences):
            # Forward pass
            input_seq = data["z_validation"][i].to(device)
            target = data["a_validation"][i].to(device)
            # Reshape for batch compatibility
            input_seq = input_seq.unsqueeze(0)  # Shape: [1, sequence_length, latent_size]
            target = target.unsqueeze(0)        # Shape: [1, control_size]
            outputs = model(input_seq)
            loss = criterion(outputs, target)
            # Accumulate loss
            val_loss += loss.item()
    avg_loss = val_loss / num_sequences
    return avg_loss
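The CHECK note in the docstring can be explored with a toy model. The sketch below is a minimal stand-in, assuming the model takes the LSTM output at the last time step and maps it to an action; `TinyLSTMController` and `latent_size=8` are hypothetical, and the real `bc_LSTM_Residual` controller may be built differently. The hidden size and control size are taken from the printed configuration.

```python
import torch
import torch.nn as nn


class TinyLSTMController(nn.Module):
    """Hypothetical stand-in: maps a latent sequence to one action."""

    def __init__(self, latent_size, hidden_size, control_size):
        super().__init__()
        self.lstm = nn.LSTM(latent_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, control_size)

    def forward(self, x):
        # x: [batch, sequence_length, latent_size]
        # No initial state is passed, so the LSTM starts from zeros on
        # every call; nothing carries over between sequences.
        out, _ = self.lstm(x)
        # Use only the output at the last time step of each sequence.
        return self.head(out[:, -1, :])


torch.manual_seed(1)
model = TinyLSTMController(latent_size=8, hidden_size=32, control_size=6)
model.eval()
seq = torch.randn(10, 8)                 # [sequence_length, latent_size]
with torch.no_grad():
    first = model(seq.unsqueeze(0))      # [1, control_size]
    second = model(seq.unsqueeze(0))     # same input, fresh state
```

Because the state is not kept between calls, feeding the same sequence twice gives the same prediction, which matches the "model is reset before each sequence" behavior described above.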


In [4]:
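def train_bc_rnn(model, optimizer, criterion, data, num_epochs, writer=None):
    """Train an RNN-type (e.g. LSTM) behavior cloning model, one sequence at a time.
    Relies on the global "exp" (for timing) and "device" defined earlier in the notebook.
    If a writer is given, the average training and validation losses are logged to it after every epoch.
    """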
    exp.start_timer("train")
    num_sequences = data["z_train"].shape[0]

    for epoch in tqdm(range(num_epochs)):
        model.train()
        
        # Loop over each sequence in the training set
        training_loss = 0
        for i in range(num_sequences):
            # Prepare input and target
            input_seq = data["z_train"][i].to(device)
            target = data["a_train"][i].to(device)

            # Reshape for batch compatibility
            input_seq = input_seq.unsqueeze(0)  # Shape: [1, sequence_length, latent_size]
            target = target.unsqueeze(0)        # Shape: [1, control_size]

            # Forward pass
            output = model(input_seq)
            loss = criterion(output, target)
            training_loss += loss.item()
            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        
        avg_training_loss = training_loss / num_sequences
        avg_validation_loss = validate_bc_rnn(model, criterion, data, device)
        if writer is not None:
            writer.add_scalar("TrainingLoss", avg_training_loss, epoch)
            writer.add_scalar("ValidationLoss", avg_validation_loss, epoch)
            writer.flush()
        if (epoch+1) % 2 == 0:  # report every second epoch
            print(f'Epoch [{epoch+1}/{num_epochs}], Training Loss: {avg_training_loss:.4f} Validation Loss: {avg_validation_loss:.4f} ')
    print("Training complete.")
    exp.end_timer("train")
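The loop above updates the weights after every single sequence. A common alternative, shown here only as a sketch and not as the notebook's method, is mini-batch training with a `DataLoader`. The model class `LastStepLSTM`, the synthetic tensors, and the batch size of 4 are all assumptions made for illustration; the loss, optimizer and learning rate follow the printed configuration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class LastStepLSTM(nn.Module):
    """Hypothetical stand-in model: LSTM followed by a linear head."""

    def __init__(self, latent_size, hidden_size, control_size):
        super().__init__()
        self.lstm = nn.LSTM(latent_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, control_size)

    def forward(self, x):
        out, _ = self.lstm(x)            # out: [batch, seq_len, hidden_size]
        return self.head(out[:, -1, :])  # prediction from the last step


torch.manual_seed(1)
# Synthetic stand-ins for data["z_train"] and data["a_train"]
z_train = torch.randn(20, 10, 8)   # [num_sequences, sequence_length, latent_size]
a_train = torch.randn(20, 6)       # [num_sequences, control_size]
dataset = TensorDataset(z_train, a_train)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

model = LastStepLSTM(latent_size=8, hidden_size=32, control_size=6)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

model.train()
total_loss = 0.0
for z_batch, a_batch in loader:
    optimizer.zero_grad()
    loss = criterion(model(z_batch), a_batch)
    loss.backward()
    optimizer.step()
    # Weight each batch by its size so the average is per sequence
    total_loss += loss.item() * z_batch.size(0)
avg_training_loss = total_loss / len(dataset)
```

Batching requires all sequences to have the same length, which the `.shape[0]` indexing in `train_bc_rnn` suggests is already the case for `data["z_train"]`.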


### Train the model 

Create and train the behavior cloning model specified by the exp/run.

In [5]:
# Create the model, loss function and optimizer specified by the exp/run
model, criterion, optimizer = create_bc_model(exp, spexp, device)
print(model)

KeyError: 'optimizer-lr'
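The `KeyError` above suggests that the lookup uses the key `optimizer-lr`, while the configuration printed earlier lists `optimizer_lr` (stored as the string `'0.001'`). The cleanest fix is presumably to make the two spellings agree in the YAML file or in `bc_factory`. As a stopgap, a tolerant lookup could look like the sketch below; `get_config_value` is a hypothetical helper that is not part of the project, and a plain dict stands in for the experiment object.

```python
def get_config_value(config, key, cast=None):
    """Look up key in config, accepting '-' and '_' interchangeably."""
    candidates = [key, key.replace("-", "_"), key.replace("_", "-")]
    for candidate in candidates:
        if candidate in config:
            value = config[candidate]
            # Optionally convert, e.g. the string '0.001' to a float
            return cast(value) if cast is not None else value
    raise KeyError(key)


# Stand-in dict with values copied from the printed configuration
config = {"optimizer": "Adam", "optimizer_lr": "0.001"}
lr = get_config_value(config, "optimizer-lr", cast=float)
```

The `cast=float` argument matters here because the learning rate appears in the configuration as a quoted string rather than a number.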

In [None]:

data = create_trainingdata_bc(exp, spexp, device="cpu")
# Training Loop
num_epochs = exp["epochs"]

# Create a SummaryWriter instance for TensorBoard logging
# FIXME: the logdir is hard-coded; it should probably go into the exp/run directory
writer = SummaryWriter(logdir="/home/lboloni/runs/example")
train_bc_rnn(
    model, optimizer, criterion, data=data,
    num_epochs=num_epochs, writer=writer)
writer.close()
controller_path = pathlib.Path(exp.data_dir(), exp["controller_file"])
torch.save(model.state_dict(), controller_path)


***Timer*** data_preparation started
***ExpRun**: Experiment default config C:\Users\lboloni\Documents\Code\_Checkouts\BerryPicker\src\experiment_configs\demonstration\_defaults_demonstration.yaml was empty, ok.
***ExpRun**: No system dependent experiment file
	 G:\My Drive\LotziStudy\Code\PackageTracking\BerryPicker\settings\experiment-config\LotziYoga\demonstration\random-both-cameras_sysdep.yaml,
	 that is ok, proceeding.
***ExpRun**: Configuration for exp/run: demonstration/random-both-cameras successfully loaded
***ExpRun**: Experiment default config C:\Users\lboloni\Documents\Code\_Checkouts\BerryPicker\src\experiment_configs\demonstration\_defaults_demonstration.yaml was empty, ok.
***ExpRun**: No system dependent experiment file
	 G:\My Drive\LotziStudy\Code\PackageTracking\BerryPicker\settings\experiment-config\LotziYoga\demonstration\random-both-cameras_sysdep.yaml,
	 that is ok, proceeding.
***ExpRun**: Configuration for exp/run: demonstration/random-both-cameras successfull

  inputs_list.append(torch.tensor(input_seq))
  targets_list.append(torch.tensor(target))


***ExpRun**: Experiment default config C:\Users\lboloni\Documents\Code\_Checkouts\BerryPicker\src\experiment_configs\demonstration\_defaults_demonstration.yaml was empty, ok.
***ExpRun**: No system dependent experiment file
	 G:\My Drive\LotziStudy\Code\PackageTracking\BerryPicker\settings\experiment-config\LotziYoga\demonstration\random-both-cameras_sysdep.yaml,
	 that is ok, proceeding.
***ExpRun**: Configuration for exp/run: demonstration/random-both-cameras successfully loaded
***ExpRun**: Experiment default config C:\Users\lboloni\Documents\Code\_Checkouts\BerryPicker\src\experiment_configs\demonstration\_defaults_demonstration.yaml was empty, ok.
***ExpRun**: No system dependent experiment file
	 G:\My Drive\LotziStudy\Code\PackageTracking\BerryPicker\settings\experiment-config\LotziYoga\demonstration\random-both-cameras_sysdep.yaml,
	 that is ok, proceeding.
***ExpRun**: Configuration for exp/run: demonstration/random-both-cameras successfully loaded
***Timer*** data_preparation

 20%|██        | 2/10 [00:13<00:57,  7.16s/it]

Epoch [2/10], Training Loss: 2.3125 Validation Loss: 2.2532 


 40%|████      | 4/10 [00:26<00:38,  6.43s/it]

Epoch [4/10], Training Loss: 1.4555 Validation Loss: 1.4097 


 60%|██████    | 6/10 [00:37<00:23,  5.85s/it]

Epoch [6/10], Training Loss: 0.8408 Validation Loss: 0.8982 


 80%|████████  | 8/10 [00:49<00:12,  6.04s/it]

Epoch [8/10], Training Loss: 0.5551 Validation Loss: 0.5984 


100%|██████████| 10/10 [01:04<00:00,  6.43s/it]

Epoch [10/10], Training Loss: 0.3683 Validation Loss: 0.5939 
Training complete.
***Timer*** train finished in 64.273731 seconds
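Since only the `state_dict` is saved, using the controller later requires rebuilding the same architecture and loading the weights into it. The sketch below shows this round trip, assuming a simple `nn.Linear` as a stand-in for the trained model and a temporary directory in place of the exp/run directory; in the notebook the model would presumably be recreated with `create_bc_model`.

```python
import pathlib
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(1)
trained = nn.Linear(8, 6)    # stand-in for the trained controller

with tempfile.TemporaryDirectory() as tmp:
    controller_path = pathlib.Path(tmp, "controller.pth")
    torch.save(trained.state_dict(), controller_path)

    # Rebuild the same architecture, then load the saved weights into it
    restored = nn.Linear(8, 6)
    state = torch.load(controller_path, map_location="cpu")
    restored.load_state_dict(state)
    restored.eval()

# Both models should now produce identical outputs for the same input
x = torch.randn(1, 8)
with torch.no_grad():
    same = torch.equal(trained(x), restored(x))
```

Passing `map_location="cpu"` lets a controller trained on a GPU be loaded on a machine without one.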



