# Causal Knowledge Transfer for Safe Reinforcement Learning

### Train Agents
#### Sumo Network File
When editing the SUMO network (nets/simple_unprotected_right.net.xml), never edit the XML directly. Instead, make the desired changes in nets/netconfig and regenerate the net.xml by running generate_config.sh.
#### Sumo Route File
Route file generation is now part of generate_config.sh.
#### Reward Function
* TODO: Come up with a fitting reward function that penalises collisions
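
Until the reward function is settled, one possible shape is a performance term minus a fixed penalty per collision. The sketch below is only an assumption about how that could look: the function name, its arguments, and the default penalty are hypothetical and are not part of the environment's API.

```python
def collision_penalised_reward(waiting_time_delta: float,
                               collisions: int,
                               collision_penalty: float = 100.0) -> float:
    """Hypothetical reward: performance term minus a penalty per collision.

    waiting_time_delta: reduction in accumulated waiting time since the last
        step (positive means traffic flows better).
    collisions: number of collisions observed in the current step.
    collision_penalty: assumed weight; larger values favour safety.
    """
    if collisions < 0:
        raise ValueError('collisions must be non-negative')
    return waiting_time_delta - collision_penalty * collisions


# A step that improves waiting time by 5 s but causes two collisions
reward = collision_penalised_reward(5.0, collisions=2, collision_penalty=10.0)
```

The penalty weight is the knob for trading safety against performance and would need tuning alongside the training hyperparameters.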
#### RL Training
* TODO: Come up with Hyperparameters for the training loop

**Desired Output: Trained_Model.zip**

### Creating the Sumo Environment

In [None]:
# Make sure SUMO_HOME is set!
sumo_home = %env SUMO_HOME
from pathlib import Path

FRICTION = 1
END_TIME = 3600
REPEAT_PERIOD = 10

config_directory = Path().joinpath('nets', 'simple_unprotected_right')
config_files = {
    'netccfg': config_directory.joinpath('simple_unprotected_right.netccfg'),
    'duarcfg': config_directory.joinpath('simple_unprotected_right.duarcfg'),
    'net.xml': config_directory.joinpath('simple_unprotected_right.net.xml'),
    'rou.xml': config_directory.joinpath('simple_unprotected_right.rou.xml'),
    'routes.rou.xml': config_directory.joinpath('routes.rou.xml'),
    'config.rou.xml': config_directory.joinpath('config.rou.xml'),
}

findAllRoutes = Path(sumo_home).joinpath('tools', 'findAllRoutes.py')
vehicle2flow = Path(sumo_home).joinpath('tools', 'route', 'vehicle2flow.py')

! netconvert --configuration-file {config_files['netccfg']} --default.friction {FRICTION}

! python {findAllRoutes} -n {config_files['net.xml']} -o {config_files['routes.rou.xml']} -s southJunction,westJunction -t junctionEast,junctionNorth

! duarouter --configuration-file {config_files['duarcfg']}

! python {vehicle2flow} {config_files['config.rou.xml']} -o {config_files['rou.xml']} -e {END_TIME} -r {REPEAT_PERIOD}


In [None]:
from env.SumoEnvironmentGenerator import SumoEnvironmentGenerator
from pathlib import Path

environments = SumoEnvironmentGenerator(
    net_file=str(Path().joinpath('nets', 'simple_unprotected_right', 'simple_unprotected_right.net.xml')),
    route_file=str(Path().joinpath('nets', 'simple_unprotected_right', 'simple_unprotected_right.rou.xml')),
    sumocfg_file=str(Path().joinpath('nets', 'simple_unprotected_right', 'simple_unprotected_right.sumocfg')),
    duration=3600,
    learning_data_csv_name=str(Path().joinpath('env', 'training_data', 'output.csv')),
)

### Training and saving the Model

In [None]:
from stable_baselines3.a2c import A2C

%load_ext tensorboard
env = environments.get_training_env()
model = A2C(
    env=env,
    policy='MlpPolicy',
    # learning_rate=0.001,
    # learning_starts=0,
    # train_freq=1,
    # target_update_interval=500,
    # exploration_fraction=0.05,
    # exploration_final_eps=0.01,
    verbose=1,
    tensorboard_log='dqn_sumo_tensorboard'
)
model.learn(100_000, tb_log_name='a2c')
model.save(Path().joinpath('env', 'training_data', 'a2c'))

Giving the model a test run in an evaluation environment

In [None]:
from stable_baselines3.a2c import A2C

env = environments.get_demonstration_env()
# load() is a classmethod: call it on the class and pass the environment directly
model = A2C.load(Path().joinpath('env', 'training_data', 'a2c'), env=env)

obs, info = env.reset()
done = False
while not done:
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()

### Produce Traces
Run the simulation repeatedly to produce traces (data) for causal discovery.

#### Data Selection
TODO: Select which columns we want to do Causal Discovery on
#### Data Summary
TODO: Incorporate old data summary script

**Desired Output: One CSV File containing all interesting data**


In [None]:
from pathlib import Path

import numpy as np
from stable_baselines3.a2c import A2C

simulation_output_path = Path().joinpath('data', 'a2c_50_f0.5')
simulation_output_path.mkdir(parents=True, exist_ok=True)

rng = np.random.default_rng()
random_friction_values = rng.uniform(0.45, 0.55, size=100)

for experiment, friction_coefficient in enumerate(random_friction_values):
    env = environments.get_generation_env(output_prefix=str(simulation_output_path.joinpath(str(experiment).zfill(4))))
    # load() is a classmethod: call it on the class and pass the environment directly
    model = A2C.load(Path().joinpath('env', 'training_data', 'a2c'), env=env)

    obs, info = env.reset()
    # Revisit Friction calculation
    vehicletype = env.sumo.vehicletype
    vehicletype.setDecel('car50Custom', vehicletype.getDecel('car50Custom') * friction_coefficient)
    vehicletype.setEmergencyDecel('car50Custom', vehicletype.getEmergencyDecel('car50Custom') * friction_coefficient)
    for traffic_signal in env.traffic_signals.values():
        for lane in traffic_signal.lanes:
            traffic_signal.sumo.lane.setParameter(lane, 'frictionCoefficient', friction_coefficient)

    done = False
    while not done:
        action, _state = model.predict(obs, deterministic=True)
        obs, _reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
    env.close()

#### Data Summary

In [None]:
import glob
from pathlib import Path
import xml.etree.ElementTree as ElementTree
import pandas as pd

data_folder = Path().joinpath('data')
experiments = glob.glob(str(data_folder.joinpath('*')))

for experiment in experiments:
    experiment_path = Path(experiment)
    statistics_files = glob.glob(str(experiment_path.joinpath('*_statistics.xml')))
    # Path(...).name keeps this independent of the platform's path separator
    run_ids = [Path(path).name.split('_')[0] for path in statistics_files]

    data = []
    for run_id in run_ids:  # avoid shadowing the built-in id()
        statistics_file = experiment_path.joinpath(run_id + '_statistics.xml')
        collisions_file = experiment_path.joinpath(run_id + '_collisions.xml')
        statistics_xml = ElementTree.parse(statistics_file).getroot()
        collisions_xml = ElementTree.parse(collisions_file).getroot()

        row = {
            'experiment': experiment,
            'index': int(run_id),
            'desiredSpeed': 50
        }

        for key, value in {**statistics_xml.find('vehicleTripStatistics').attrib,
                           **statistics_xml.find('safety').attrib}.items():
            match key:
                case 'count' | 'emergencyStops' | 'emergencyBraking':
                    row[key] = int(value)
                case 'collisions':
                    # Default to '' so a missing attribute does not raise a TypeError
                    row['rearEndCollisions'] = sum(
                        'southEast' in child.attrib.get('victim', '') for child in collisions_xml)
                    row['lateralCollisions'] = sum(
                        'southEast' in child.attrib.get('collider', '') for child in collisions_xml)
                    row[key] = row['rearEndCollisions'] + row['lateralCollisions']
                case _:
                    row[key] = float(value)

        data.append(row)

    df = pd.DataFrame(data)
    df.to_csv(experiment_path.joinpath('.summary.csv'), index=False)

In [None]:
import pandas as pd
from pathlib import Path
import seaborn as sns

data_folder = Path().joinpath('data')

data_full_friction = pd.read_csv(data_folder.joinpath('a2c_50', '.summary.csv'))
data_half_friction = pd.read_csv(data_folder.joinpath('a2c_50_f0.5', '.summary.csv'))
data_low_speed = pd.read_csv(data_folder.joinpath('a2cs30', '.summary.csv'))

data = pd.concat([data_full_friction, data_half_friction, data_low_speed], ignore_index=True)

sns.displot(data=data, label='Collisions', x='collisions', hue='experiment', kind='kde')
sns.displot(data=data, label='Collisions', x='rearEndCollisions', hue='experiment', kind='kde')
sns.displot(data=data, label='Collisions', x='lateralCollisions', hue='experiment', kind='kde')
sns.displot(data=data, label='Collisions', x='speed', hue='experiment', kind='kde')

### Causal Discovery
Discover causal graph

TODO: Decide which discovery algorithm to use

TODO: Figure out how to incorporate R code in Jupyter Notebook

**Desired Output: Causal Graph XML File**
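
Independent of which discovery algorithm is chosen, the output format can be pinned down early. The following is a minimal sketch of serialising a directed graph to XML with the standard library. The element and attribute names (`causalGraph`, `node`, `edge`, `source`, `target`) are assumptions, not an agreed schema, and the example edges are placeholders rather than discovered results.

```python
import xml.etree.ElementTree as ElementTree


def causal_graph_to_xml(edges):
    """Serialise (cause, effect) pairs into a hypothetical XML layout."""
    root = ElementTree.Element('causalGraph')
    # Every variable that appears in any edge becomes a node
    for name in sorted({name for edge in edges for name in edge}):
        ElementTree.SubElement(root, 'node', id=name)
    for cause, effect in edges:
        ElementTree.SubElement(root, 'edge', source=cause, target=effect)
    return ElementTree.tostring(root, encoding='unicode')


# Placeholder edges for illustration only; 'friction' is an assumed column name
example_edges = [('friction', 'emergencyBraking'),
                 ('emergencyBraking', 'collisions')]
graph_xml = causal_graph_to_xml(example_edges)
```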

In [None]:
# TODO

### Fit MLMs
Fit MLMs based on Causal Discovery Graph

TODO: Parse Graph XML into MLM parameters / formulae

**Desired Output: MLM**
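
One way to approach the parsing step is to collect the parents of each variable and emit one R-style formula per variable with at least one parent. The sketch below assumes a graph XML whose `edge` elements carry `source` and `target` attributes; that layout, the variable names, and the formula syntax are assumptions that depend on the final graph format and modelling library.

```python
import xml.etree.ElementTree as ElementTree
from collections import defaultdict


def formulae_from_graph_xml(graph_xml: str) -> dict[str, str]:
    """Map each variable with parents to a formula 'target ~ parent1 + parent2'."""
    root = ElementTree.fromstring(graph_xml)
    parents = defaultdict(list)
    for edge in root.iter('edge'):
        parents[edge.attrib['target']].append(edge.attrib['source'])
    return {target: f"{target} ~ {' + '.join(sorted(sources))}"
            for target, sources in parents.items()}


# Placeholder graph for illustration only; not a discovered structure
example_xml = """<causalGraph>
  <edge source="friction" target="emergencyBraking"/>
  <edge source="friction" target="collisions"/>
  <edge source="emergencyBraking" target="collisions"/>
</causalGraph>"""
formulae = formulae_from_graph_xml(example_xml)
```

Variables without parents get no formula here; whether they need a marginal model is left open.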

In [None]:
# TODO

### Produce Interventions

#### Covariate Shift Distribution
* Create a distribution for the covariate (friction) shift
* Sample from distribution
    * Fulfill Assumption: sparse sample data is representative for covariate shift ground truth
* Produce Traces for sparse input data
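
The sampling step above can be sketched as follows, assuming a normal distribution around the shifted friction value. The distribution family, its parameters, and the function name are assumptions chosen for illustration; the trace generation elsewhere in this notebook uses a uniform range instead.

```python
import numpy as np


def sample_friction_shift(n_samples, mean=0.5, std=0.05, low=0.0, high=1.0, seed=None):
    """Draw friction coefficients from an assumed normal covariate shift.

    Values are clipped to [low, high]; clipping puts a small amount of
    probability mass on the bounds when std is large relative to the range.
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(mean, std, size=n_samples)
    return np.clip(samples, low, high)


# A sparse sample: only a handful of friction values to produce traces for
sparse_friction_values = sample_friction_shift(10, seed=0)
```

Each sampled value would then drive one trace-producing simulation run, analogous to the loop over `random_friction_values`.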

#### Crank MLM the other way
* Calculate Intervention Distribution by inputting sparse data into MLM

**Desired Output: Intervention Distribution**

In [None]:
# TODO

### Generate Posterior Distributions
TODO: Generate Posterior Distributions without intervention

TODO: Generate Posterior Distributions with intervention

**Desired Output: Two XML Files**

In [None]:
# TODO

### Query
Compare the distributions and decide which part of the model to retrain.

TODO: Classify the data / model in parts

In [None]:
# TODO

### Evaluation

#### Agent
Compare the resulting agent (training partially continued, depending on the Query) to:
* Old agent (Lower performance bound)
* Completely newly trained agent (upper performance bound)
* (New Agent that is trained completely on new data (without Query))

#### Intervention
As a function of the number of covariate shift samples: Wasserstein distance between the intervention distribution and the ground-truth distribution.

#### MLM
* Wasserstein Distance: Effect of Intervention vs. ground truth effect
* Maybe also as a function of the number of retrain samples
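
Both Wasserstein-based metrics above compare two empirical one-dimensional distributions. A minimal sketch of that comparison follows, computing the 1-Wasserstein distance as the area between the two empirical CDFs. The function names are hypothetical, and the evaluation by sample count assumes the estimate is an array whose first `n` entries correspond to using `n` samples.

```python
import numpy as np


def wasserstein_1d(u_values, v_values) -> float:
    """1-Wasserstein distance between two empirical 1D distributions."""
    u = np.sort(np.asarray(u_values, dtype=float))
    v = np.sort(np.asarray(v_values, dtype=float))
    all_values = np.sort(np.concatenate([u, v]))
    deltas = np.diff(all_values)
    # Empirical CDFs evaluated on each interval between consecutive points
    u_cdf = np.searchsorted(u, all_values[:-1], side='right') / u.size
    v_cdf = np.searchsorted(v, all_values[:-1], side='right') / v.size
    return float(np.sum(np.abs(u_cdf - v_cdf) * deltas))


def distance_by_sample_count(estimate, ground_truth, sample_counts):
    """Distance to the ground truth when only the first n samples are used."""
    return {n: wasserstein_1d(estimate[:n], ground_truth) for n in sample_counts}


# Two samples that differ by a constant shift of 0.5
shift_distance = wasserstein_1d([0.0, 1.0, 2.0], [0.5, 1.5, 2.5])
```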



In [None]:
# TODO

### Ideas
* Maybe no change in friction but rather only in the requirements
* Sophisticated Query: Causal Graph of Transfer Learning --> Generate Posterior for different transfer learning options --> Rank and choose best.
* Collision / Penalty Factor for managing Safety/Performance tradeoff