# Input Drift Experiment Outline

*Needs to be double checked*

**Goal**: Identify drift in streams of unseen chest X-ray images.

**Methods:**

1. **Statistical drift detection:** Use a distance-based measure such as Maximum Mean Discrepancy (MMD) to quantify the separation between the training (reference) data and unseen chest X-rays:
   - We run the standard CheXpert pre-processing steps on the images.
   - Because the test is run on image data, we reduce the dimensionality of the pre-processed images before running the statistical test, and fit the drift detector on the reduced reference data.
   - We detect data drift by running the detector on a batch of X-ray images collected over a pre-defined period of time.
   - The detector returns the **p-value and the threshold** of the test; drift is declared when the p-value falls below the threshold.


2. **Artificial neural network (ANN) based drift detection:** Train an autoencoder (AE) to efficiently compress and encode the reference data:
   - The AE detector tries to reconstruct each input it receives.
   - If an unseen input X-ray cannot be reconstructed well, its reconstruction error is high and the image can be flagged as an outlier (drift).
   - The reconstruction error is measured as the mean squared error (MSE) between the input and the reconstructed instance.
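A minimal sketch of method 1, assuming the images have already been pre-processed and reduced to low-dimensional feature vectors (random Gaussian vectors stand in for them here). It implements an MMD two-sample permutation test with an RBF kernel and a median-heuristic bandwidth in plain NumPy. The function names are hypothetical; this is not Alibi Detect's implementation, which the experiment plans to use.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd2(x: np.ndarray, y: np.ndarray, sigma: float) -> float:
    """Biased estimate of the squared MMD between samples x and y."""
    return float(
        rbf_kernel(x, x, sigma).mean()
        + rbf_kernel(y, y, sigma).mean()
        - 2.0 * rbf_kernel(x, y, sigma).mean()
    )


def mmd_permutation_test(x_ref, x_new, n_permutations=200, p_threshold=0.05, seed=0):
    """Permutation test: is x_new drawn from the same distribution as x_ref?"""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x_ref, x_new])

    # Median heuristic: kernel bandwidth = median pairwise distance in the pooled sample
    sq_norms = np.sum(pooled**2, axis=1)
    dists = np.sqrt(np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * pooled @ pooled.T, 0.0))
    sigma = float(np.median(dists[np.triu_indices_from(dists, k=1)]))

    observed = mmd2(x_ref, x_new, sigma)

    # Null distribution: shuffle the pooled sample and re-split it into two groups
    n_ref = len(x_ref)
    null_stats = np.empty(n_permutations)
    for i in range(n_permutations):
        perm = rng.permutation(len(pooled))
        null_stats[i] = mmd2(pooled[perm[:n_ref]], pooled[perm[n_ref:]], sigma)

    # p-value: share of permuted statistics at least as large as the observed one
    p_val = float((np.sum(null_stats >= observed) + 1) / (n_permutations + 1))
    return {"is_drift": p_val < p_threshold, "p_val": p_val,
            "threshold": p_threshold, "distance": observed}


# Toy stand-ins for dimensionality-reduced image features (8 dims per image)
rng = np.random.default_rng(42)
x_ref = rng.normal(size=(100, 8))
x_new = rng.normal(loc=1.0, size=(100, 8))  # mean-shifted batch, i.e. simulated drift

result = mmd_permutation_test(x_ref, x_new)
print(result)
```

Drift is declared when the p-value falls below the threshold, which is the decision rule described above. With 200 permutations the smallest attainable p-value is 1/201, so the number of permutations bounds how small a threshold is meaningful.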

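A minimal sketch of method 2 under an explicit substitution: PCA acts as a *linear* autoencoder in place of the trained neural autoencoder, and random vectors lying near a 4-dimensional subspace stand in for flattened reference images. The reconstruction error is the per-image MSE, as in the outline; flagging images above the 95th percentile of the reference errors is an assumption, since the outline does not say how the threshold is chosen. Class and variable names are hypothetical.

```python
import numpy as np


class LinearAutoencoder:
    """PCA used as a linear autoencoder (stand-in for a trained neural AE)."""

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, x: np.ndarray) -> "LinearAutoencoder":
        self.mean_ = x.mean(axis=0)
        # Rows of vt are the principal directions, sorted by explained variance
        _, _, vt = np.linalg.svd(x - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def reconstruct(self, x: np.ndarray) -> np.ndarray:
        codes = (x - self.mean_) @ self.components_.T  # encode: project onto top components
        return codes @ self.components_ + self.mean_   # decode: map back to input space

    def reconstruction_error(self, x: np.ndarray) -> np.ndarray:
        """Per-instance mean squared error between input and reconstruction."""
        return np.mean((x - self.reconstruct(x)) ** 2, axis=1)


rng = np.random.default_rng(0)
mixing_ref = rng.normal(size=(4, 64))  # reference "images" live near this 4-D subspace
mixing_new = rng.normal(size=(4, 64))  # drifted data comes from a different subspace


def sample(mixing: np.ndarray, n: int) -> np.ndarray:
    """Low-dimensional latent factors mapped to 64 'pixels', plus small noise."""
    return rng.normal(size=(n, 4)) @ mixing + 0.05 * rng.normal(size=(n, 64))


x_train = sample(mixing_ref, 500)
x_heldout = sample(mixing_ref, 200)
x_drifted = sample(mixing_new, 200)

ae = LinearAutoencoder(n_components=4).fit(x_train)
threshold = np.percentile(ae.reconstruction_error(x_train), 95)

# Share of images whose reconstruction error exceeds the threshold
heldout_rate = float(np.mean(ae.reconstruction_error(x_heldout) > threshold))
drift_rate = float(np.mean(ae.reconstruction_error(x_drifted) > threshold))
print(f"flagged: held-out reference {heldout_rate:.2f}, drifted {drift_rate:.2f}")
```

Held-out reference images should be flagged at roughly the 5% rate implied by the percentile, while images from the shifted distribution reconstruct poorly and are almost all flagged. A nonlinear (convolutional or variational) autoencoder follows the same recipe with a learned encoder and decoder.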

**Requirements:**

- CheXpert training data (reference data)
- PadChest filtered/curated data (new data to be probed for drift)
- Alibi Detect Python library (provides implementations of the drift detectors above)

```python
from pathlib import Path

import azureml.core
from IPython.display import display, Markdown
from azureml.core import Datastore, Experiment, ScriptRunConfig, Workspace, RunConfiguration
from azureml.core.dataset import Dataset
from azureml.core.environment import Environment
from azureml.core.runconfig import DockerConfiguration
from azureml.exceptions import UserErrorException

from model_drift import settings

# Check the core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)
```

Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (cloudpickle 2.0.0 (d:\code\mlopsday2\medimaging-modeldriftmonitoring\.venv\lib\site-packages), Requirement.parse('cloudpickle<2.0.0,>=1.1.0'), {'azureml-dataprep'}).


Azure ML SDK Version:  1.34.0


```python
# Connect to the workspace
subscription_id = '9ca8df1a-bf40-49c6-a13f-66b72a85f43c'
resource_group = 'MLOps-Prototype'
workspace_name = 'MLOps_shared'

try:
    ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)
    # Save config.json locally so later sessions can use Workspace.from_config()
    ws.write_config()
    print('Library configuration succeeded')
except Exception:
    # None of the following cells can run without a workspace, so stop here
    print('Workspace not found')
    raise

print("Workspace:", ws.name)
```

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.


Library configuration succeeded
Workspace: MLOps_shared


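```python
# Get the registered CheXpert dataset, or create and register it on first use
try:
    chex_dataset = Dataset.get_by_name(ws, name='chexpert')
    print('Found existing dataset')
except UserErrorException:
    # Create a FileDataset pointing to all files in the 'chexpert' datastore, recursively
    datastore = Datastore.get(ws, 'chexpert')
    datastore_paths = [(datastore, '/')]
    chex_dataset = Dataset.File.from_files(path=datastore_paths)
    # Register it so that get_by_name finds it on the next run
    chex_dataset = chex_dataset.register(workspace=ws, name='chexpert')
    print('Created and registered dataset')
```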

Found existing dataset


```python
env_name = "vae"

environment_file = settings.CONDA_ENVIRONMENT_FILE
project_dir = Path("./experiment")
pytorch_env = Environment.from_conda_specification(env_name, file_path=str(environment_file))

# Set environment variables before registering, so the registered version includes them
pytorch_env.environment_variables["RSLEX_DIRECT_VOLUME_MOUNT"] = "True"

pytorch_env.register(workspace=ws)
build = pytorch_env.build(workspace=ws)
```

```python
# Copy the CheXpert label files into the experiment directory
import shutil

import pandas as pd

valid_fn = settings.CHEXPERT_VALID_CSV
train_fn = settings.CHEXPERT_TRAIN_CSV

shutil.copy(valid_fn, project_dir.joinpath("valid.csv"))
shutil.copy(train_fn, project_dir.joinpath("train.csv"))

# Keep only frontal views for training
df = pd.read_csv(train_fn)
df = df[df['Frontal/Lateral'] != 'Lateral']
df.to_csv(project_dir.joinpath("train-frontal-only.csv"), index=False)
```


```python
# Set dbg = True for a short smoke-test run
dbg = False

log_refresh_rate = 25
if dbg:
    log_refresh_rate = 2

# Name the experiment
experiment_name = 'chexpert-vae' if not dbg else 'chexpert-vae-dbg'
exp = Experiment(workspace=ws, name=experiment_name)

# Compute cluster for the training run (defined here so this cell runs on its own)
cluster_name = "nc24-uswest2"

print("Experiment:", exp.name)
print("Compute Target:", cluster_name)
print("Environment:", pytorch_env.name)

run_config = RunConfiguration()

run_config.environment = pytorch_env
# Enlarge the container's shared memory, which multi-process data loading relies on
run_config.docker = DockerConfiguration(use_docker=True, shm_size="100G")
run_config.target = cluster_name


args = [
    '--data_folder', chex_dataset.as_named_input('chexpertv1').as_mount(),
    '--run_azure', 1,
    '--output_dir', './outputs',

    '--batch_size', 32,
    "--base_lr", 1e-4,

    "--image_size", 128,
    '--max_epochs', 50 if not dbg else 5,
    '--num_workers', -1,

    '--progress_bar_refresh_rate', log_refresh_rate,
    "--log_every_n_steps", log_refresh_rate,
    "--flush_logs_every_n_steps", log_refresh_rate,
    "--accelerator", "ddp",
    "--channels", 1,

    "--step_size", 3,
    "--lr_scheduler", "plateau",

    "--auto_scale_batch_size", False,
    "--auto_lr_find", False,
    "--train_csv", "train-frontal-only.csv",

    "--width", 320,
    "--z", 64,
    "--layer_count", 3,

    "--terminate_on_nan", True,

    "--log_recon_images", 32
    ]

if dbg:
    # Only a few training batches per epoch in debug mode
    args += [
        '--limit_train_batches', 5,
    ]


config = ScriptRunConfig(
    source_directory=str(project_dir),
    script="train.py",
    arguments=args,
)

config.run_config = run_config
```

Experiment: chexpert-vae
Compute Target: nc24-uswest2
Environment: vae


```python
config.run_config.target = "nc24-uswest2"

run = exp.submit(config)
display(Markdown(f"""
- Experiment: [{run.experiment.name}]({run.experiment.get_portal_url()})
- Run: [{run.display_name}]({run.get_portal_url()})
- Target: {config.run_config.target}
"""))
```

Submitting d:\Code\MLOpsDay2\ModelMonitoring\Model_Monitoring\1.Experiment1_input_analysis\image_based_drift\experiment directory for run. The size of the directory >= 25 MB, so it can take a few minutes.



- Experiment: [chexpert-vae](https://ml.azure.com/experiments/chexpert-vae?wsid=/subscriptions/9ca8df1a-bf40-49c6-a13f-66b72a85f43c/resourcegroups/MLOps-Prototype/workspaces/MLOps_shared&tid=72f988bf-86f1-41af-91ab-2d7cd011db47)
- Run: [dreamy_yogurt_j14cm892](https://ml.azure.com/runs/chexpert-vae_1632705750_17ad5846?wsid=/subscriptions/9ca8df1a-bf40-49c6-a13f-66b72a85f43c/resourcegroups/MLOps-Prototype/workspaces/MLOps_shared&tid=72f988bf-86f1-41af-91ab-2d7cd011db47)


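## HyperDrive hyperparameter tuning

The next cells tune the VAE with Azure ML HyperDrive. HyperDrive launches child runs of the same training script, each with hyperparameters drawn at random from the `choice(...)` search space (number of layers, batch size, image size, latent size `z`, width, base learning rate and KL coefficient). Each child run is expected to log `val/recon_loss_frontal`, and HyperDrive looks for the configuration that minimizes it. Up to 72 runs (`6*12`) are launched, at most 6 at a time, with no early-termination policy.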
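To get a feel for the size of this search, the sketch below copies the search space from the HyperDrive cell into plain Python lists and counts the configurations. It also draws 72 configurations independently, one value per parameter; this only illustrates random sampling over `choice(...)` values and is not HyperDrive's sampler (for example, it may repeat a configuration, and how HyperDrive handles duplicates is not shown here). The `search_space` dict and `sample_config` helper are hypothetical.

```python
import itertools
import random

# Local copy of the search space from the HyperDrive cell; each list mirrors one choice(...)
search_space = {
    "layer": [3, 4],
    "batch_size": [16, 32],
    "image_size": [128, 256],
    "z": [32, 64, 128],
    "width": [160, 240, 320],
    "base_lr": [1e-4, 1e-5, 1e-6],
    "kl_coeff": [0.1],
}

# Total number of distinct configurations in the grid
n_configs = len(list(itertools.product(*search_space.values())))
max_total_runs = 6 * 12


def sample_config(rng: random.Random) -> dict:
    """Draw one value uniformly at random for every hyperparameter."""
    return {name: rng.choice(values) for name, values in search_space.items()}


rng = random.Random(0)
trials = [sample_config(rng) for _ in range(max_total_runs)]
print(n_configs, max_total_runs)  # 216 72
```

With 216 possible configurations and a budget of 72 runs, random sampling covers at most a third of the grid, which is the usual trade-off that makes random search cheaper than an exhaustive grid search.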

```python
from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal, RandomParameterSampling, choice

run_config = RunConfiguration()

cluster_name = "NC24s-v2-usw2-lp"

run_config.environment = pytorch_env
run_config.docker = DockerConfiguration(use_docker=True, shm_size="100G")
run_config.target = cluster_name

# Random search: each child run draws one value per hyperparameter
param_sampling = RandomParameterSampling(
    {
        "layer": choice(3, 4),
        "batch_size": choice(16, 32),
        "image_size": choice(128, 256),
        "z": choice(32, 64, 128),
        "width": choice(160, 240, 320),
        "base_lr": choice(1e-4, 1e-5, 1e-6),
        "kl_coeff": choice(0.1)
    }
)

experiment_name = 'chexpert-vae-tune'
exp = Experiment(workspace=ws, name=experiment_name)

# Reuse the ScriptRunConfig from the single-run cell, pointed at the tuning cluster
config.run_config = run_config
hyperdrive_config = HyperDriveConfig(
    run_config=config,
    hyperparameter_sampling=param_sampling,
    policy=None,  # no early-termination policy
    primary_metric_name='val/recon_loss_frontal',
    primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
    max_total_runs=6 * 12,
    max_concurrent_runs=6,
)
```
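Some sampled names differ from the fixed flags: the search space uses `layer`, while the base arguments pass `--layer_count`. The sketch below shows how this could still work, under two assumptions not confirmed by the notebook: that `train.py` parses its arguments with Python's `argparse` (the script is not shown), and that HyperDrive passes each sampled value as `--<name> <value>` after the fixed arguments. `argparse` then resolves `--layer` as an unambiguous prefix of `--layer_count`, and for a repeated flag the last value wins. The parser below is a hypothetical subset of the real one.

```python
import argparse

# Hypothetical subset of the options train.py is assumed to define
parser = argparse.ArgumentParser()  # allow_abbrev=True by default
parser.add_argument("--batch_size", type=int, default=32)
parser.add_argument("--layer_count", type=int, default=3)
parser.add_argument("--z", type=int, default=64)

# Fixed arguments from the ScriptRunConfig, followed by values a trial might append
fixed = ["--batch_size", "32", "--layer_count", "3", "--z", "64"]
sampled = ["--layer", "4", "--batch_size", "16", "--z", "128"]

parsed = parser.parse_args(fixed + sampled)
print(parsed.layer_count, parsed.batch_size, parsed.z)  # 4 16 128
```

If the real script defined another option starting with `--layer`, or disabled abbreviations, the prefix would no longer resolve and the trials would fail; renaming the key to `layer_count` in the search space avoids relying on this behavior.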

```python
# Start the HyperDrive run
hyperdrive_run = exp.submit(hyperdrive_config)
display(Markdown(f"""
- Experiment: [{hyperdrive_run.experiment.name}]({hyperdrive_run.experiment.get_portal_url()})
- Run: [{hyperdrive_run.display_name}]({hyperdrive_run.get_portal_url()})
- Target: {config.run_config.target}
"""))
```


- Experiment: [chexpert-vae-tune](https://ml.azure.com/experiments/chexpert-vae-tune?wsid=/subscriptions/9ca8df1a-bf40-49c6-a13f-66b72a85f43c/resourcegroups/MLOps-Prototype/workspaces/MLOps_shared&tid=72f988bf-86f1-41af-91ab-2d7cd011db47)
- Run: [sleepy_leg_4q5lsqsg](https://ml.azure.com/runs/HD_bdd55aa3-b6c1-4a43-89b8-c5b00b22c4a3?wsid=/subscriptions/9ca8df1a-bf40-49c6-a13f-66b72a85f43c/resourcegroups/MLOps-Prototype/workspaces/MLOps_shared&tid=72f988bf-86f1-41af-91ab-2d7cd011db47)
