# Use a custom model quickly

Reuse the data (and the dataloaders) from the "ML pipe"

Import the libraries needed

In [1]:
import sys
import os

If you run this notebook on Colab, you need to install the virtual environment with Poetry yourself (as far as I understand):

In [2]:
running_in_colab = 'google.colab' in str(get_ipython())
if running_in_colab:
    print('You are running this on COLAB so installing the environment here')
    os.chdir("/content")    
    !git clone https://github.com/petteriTeikari/minivess_mlops.git
    !pip install poetry
    os.chdir("/content/minivess_mlops")
    !poetry config virtualenvs.in-project true
    # Open question: is "!poetry install --no-ansi" needed here instead?
    !poetry install
    !poetry shell
    # https://stackoverflow.com/a/65440080/6412152
    sys.path.insert(0,'/content/minivess_mlops')
else:
    print('Assuming that you are running this from an IDE,\n'
          'or some other environment where you have your Jupyter kernel created from Poetry files')

Assuming that you are running this from an IDE, or some other environment where you have your Jupyter kernel created from Poetry files
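The check above calls `get_ipython()`, which only exists inside an IPython/Jupyter session. As a minimal sketch (not part of the repo), the same test can be written against `sys.modules` so that it also runs in a plain Python script. The helper name `in_colab` and its `modules` parameter are made up for this example, and the check is a heuristic that assumes the Colab kernel has already loaded the `google.colab` package.

```python
import sys


def in_colab(modules=None) -> bool:
    """Return True if the `google.colab` package is loaded in this interpreter."""
    # Heuristic: Colab kernels load `google.colab` at start-up, so the module
    # name shows up in sys.modules. `modules` can be overridden for testing.
    modules = sys.modules if modules is None else modules
    return 'google.colab' in modules


print('Running in Colab: {}'.format(in_colab()))
```

Passing a custom mapping, e.g. `in_colab({'google.colab': None})`, lets you exercise the Colab branch locally without touching the real interpreter state.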


Import the other modules that are now available from the Poetry environment and from the GitHub repo

In [3]:
from loguru import logger

from src.run_training import parse_args_to_dict
from src.training.experiment import define_experiment_data
from src.utils.config_utils import import_config


* 'schema_extra' has been renamed to 'json_schema_extra'
PyTorch is available but CUDA is not. Defaulting to SciPy for SVD


Define the helper function(s)

In [4]:
def get_dataloaders(experim_dataloaders: dict):
    # Get the "validation" and "train" dataloaders from the dictionary
    fold_name = 'fold0'
    # Use .get() so that a missing fold gives the explicit error below
    # instead of a bare KeyError
    fold_loaders = experim_dataloaders.get(fold_name)
    if fold_loaders is None:
        raise IOError('Fold name = "{}" not found in the dataloaders dictionary'.format(fold_name))
    try:
        train = fold_loaders['TRAIN']
        # The validation dataloaders are keyed by dataset name
        val = fold_loaders['VAL']['MINIVESS']
    except Exception as e:
        raise IOError('Could not get the dataloaders from the dictionary, error = {}'.format(e)) from e
    return train, val
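`get_dataloaders` hard-codes the keys it expects, so when it raises it helps to see what the dictionary really contains. The following is a minimal sketch, assuming the layout implied by the keys used above: fold, then split, then either a dataloader directly (as for `'TRAIN'`) or one more level keyed by dataset name (as for `'VAL'`). The helper `list_dataloaders` is hypothetical and the real dictionary may hold more entries than the demo.

```python
def list_dataloaders(experim_dataloaders: dict) -> list:
    """Return the key paths (fold, split[, dataset]) that lead to a dataloader."""
    paths = []
    for fold_name, splits in experim_dataloaders.items():
        for split_name, loader in splits.items():
            if isinstance(loader, dict):
                # One dataloader per dataset, e.g. under 'VAL'
                for dataset_name in loader:
                    paths.append((fold_name, split_name, dataset_name))
            else:
                paths.append((fold_name, split_name))
    return paths


# Plain lists stand in for the real dataloaders here
demo = {'fold0': {'TRAIN': ['batch'], 'VAL': {'MINIVESS': ['batch']}}}
print(list_dataloaders(demo))
# → [('fold0', 'TRAIN'), ('fold0', 'VAL', 'MINIVESS')]
```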

Input arguments for the training (you can add any of the input arguments supported by `run_training.py` here)

In [5]:
input_args = ['-c', 'tutorials/train_demo']

# Fake these as coming from the command line to match the main code (run_training.py)
# Jupyter puts its own extra arguments into sys.argv, so replace them with these
sys.argv = ['notebook_run'] + input_args
args = parse_args_to_dict()

2023-10-25 02:20:55.408 | INFO     | src.run_training:parse_args_to_dict:73 - Parsed input arguments:
2023-10-25 02:20:55.409 | INFO     | src.utils.general_utils:print_dict_to_logger:40 -   task_config_file: tutorials/train_demo
2023-10-25 02:20:55.409 | INFO     | src.utils.general_utils:print_dict_to_logger:40 -   run_mode: train
2023-10-25 02:20:55.410 | INFO     | src.utils.general_utils:print_dict_to_logger:40 -   data_dir: /mnt/minivess-dvc-cache
2023-10-25 02:20:55.410 | INFO     | src.utils.general_utils:print_dict_to_logger:40 -   output_dir: /mnt/minivess-artifacts
2023-10-25 02:20:55.411 | INFO     | src.utils.general_utils:print_dict_to_logger:40 -   s3_mount: False
2023-10-25 02:20:55.411 | INFO     | src.utils.general_utils:print_dict_to_logger:40 -   local_rank: 0
2023-10-25 02:20:55.411 | INFO     | src.utils.general_utils:print_dict_to_logger:40 -   project_name: MINIVESS_segmentation_TEST
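Overwriting `sys.argv` as above changes it for the rest of the kernel session. If that bothers you, a small context manager can swap the arguments in and restore the originals afterwards. This is a sketch only: `fake_argv` is a made-up helper, and the `argparse` parser below is a hypothetical stand-in for the one in `run_training.py` (only the `-c` flag and the `task_config_file` key are taken from the log above; the long option name and the default are assumptions).

```python
import argparse
import sys
from contextlib import contextmanager


@contextmanager
def fake_argv(args, prog='notebook_run'):
    """Temporarily replace sys.argv, restoring the original on exit."""
    original = sys.argv
    sys.argv = [prog] + list(args)
    try:
        yield
    finally:
        sys.argv = original


# Hypothetical stand-in for the argument parser in run_training.py
parser = argparse.ArgumentParser()
parser.add_argument('-c', '--task_config_file', default='base')

with fake_argv(['-c', 'tutorials/train_demo']):
    # parse_args() without arguments reads sys.argv[1:]
    demo_args = vars(parser.parse_args())

print(demo_args)
# → {'task_config_file': 'tutorials/train_demo'}
```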


Create the config with Hydra from the .yaml file(s)

In [6]:
config, exp_run = import_config(args=args, task_cfg_name=args['task_config_file'])

2023-10-25 02:20:55.710 | INFO     | src.utils.config_utils:hydra_import_config:89 - Initializing Hydra with config_path = "/home/petteri/PycharmProjects/minivess_mlops/notebooks/../configs", job_name = "MINIVESS_segmentation_TEST", version_base = "1.2"
2023-10-25 02:20:55.711 | INFO     | src.utils.config_utils:hydra_import_config:91 - Hydra overrides list = ['+tutorials=train_demo']
2023-10-25 02:20:55.712 | DEBUG    | src.utils.config_utils:get_mounts_from_args:409 - Getting mount names from the args
2023-10-25 02:20:55.713 | INFO     | ml_tests.mount_tests:debug_mounts:9 - Username = petteri, UID = 1000, GID = 1000
2023-10-25 02:20:55.713 | DEBUG    | ml_tests.mount_tests:debug_mounts:12 - MOUNT: /mnt/minivess-dvc-cache
2023-10-25 02:20:55.714 | DEBUG    | ml_tests.mount_tests:debug_mounts:19 -  owned by petteri:petteri (owner:group)
2023-10-25 02:20:55.715 | DEBUG    | ml_tests.mount_tests:debug_mounts:22 -  owned by 1000:1000 (owner:group)
2023-10-25 02:20:55.716 | DEBUG    | ml_
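The log shows that the task config name `tutorials/train_demo` ends up as the Hydra override `+tutorials=train_demo`. The repo code that does this conversion is not shown here, so the following is only a guess at the mapping, with a made-up function name `task_cfg_to_override`: everything before the last `/` is treated as the config group and the rest as the option.

```python
def task_cfg_to_override(task_cfg_name: str) -> str:
    """Turn '<group>/<option>' into the Hydra override '+<group>=<option>'."""
    group, sep, option = task_cfg_name.rpartition('/')
    if not sep or not group or not option:
        raise ValueError('Expected "<group>/<option>", got "{}"'.format(task_cfg_name))
    # The leading '+' tells Hydra to add a config group that is not
    # already part of the defaults list
    return '+{}={}'.format(group, option)


print(task_cfg_to_override('tutorials/train_demo'))
# → +tutorials=train_demo
```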

Create the dataloaders (these now include the data augmentations as well as the data transformations)

In [None]:
_, _, experim_dataloaders, exp_run = (
        define_experiment_data(config=config,
                               exp_run=exp_run))

# Get the "validation" and "train" dataloaders from the dictionary
train, val = get_dataloaders(experim_dataloaders)
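Before plugging in a model, it is worth checking what a batch looks like. A minimal sketch, assuming each batch is a dictionary (the loop further down reads `batch['image']` and `batch['label']`): `peek_batch` is a hypothetical helper, and `FakeTensor` with its made-up shapes only exists so that the example runs without PyTorch or the real data.

```python
def peek_batch(dataloader) -> dict:
    """Return {key: shape} for the first batch of a dataloader."""
    first_batch = next(iter(dataloader))
    # Tensors and arrays expose `.shape`; anything else is reported by type name
    return {key: tuple(value.shape) if hasattr(value, 'shape') else type(value).__name__
            for key, value in first_batch.items()}


class FakeTensor:
    """Stand-in for a tensor so the example runs without PyTorch."""
    def __init__(self, *shape):
        self.shape = shape


# Made-up shapes, not the real MINIVESS batch size or patch size
demo_loader = [{'image': FakeTensor(2, 1, 64, 64, 8),
                'label': FakeTensor(2, 1, 64, 64, 8),
                'note': 'demo'}]
print(peek_batch(demo_loader))
```

With the real dataloaders you would call `peek_batch(train)`; note that this loads and augments one batch, so it can take a moment.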

Now you are ready to train the new model that you just want to test quickly,
without having to battle with the config .yaml files

TODO: maybe add a fastai demo with MLflow autologging here:
[https://github.com/mlflow/mlflow/blob/master/examples/fastai/train.py](https://github.com/mlflow/mlflow/blob/master/examples/fastai/train.py)

In [None]:
# Iterate the dataloaders for demo
no_of_epochs = 3
logger.info('Training for {} epochs'.format(no_of_epochs))
for epoch in range(no_of_epochs):
    
    logger.info('Epoch {}/{}'.format(epoch, no_of_epochs - 1))

    # Train
    logger.info('train with {} batches'.format(len(train))) 
    for i, batch in enumerate(train):
        images, mask = batch['image'], batch['label']

    # Validation
    logger.info('validate with {} batches'.format(len(val)))
    for j, batch in enumerate(val):
        images, mask = batch['image'], batch['label']

logger.info('Training done!')
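The loop above only iterates over the batches. One way to hook in a custom model without touching the loop is to pass the per-batch work in as functions. The sketch below is framework-agnostic and uses made-up names (`run_epochs`, `train_step`, `val_step`): in a real PyTorch `train_step` you would run the forward pass, compute the loss, backpropagate, step the optimizer and return the loss as a float. Here plain numbers stand in for tensors so that the example runs on its own.

```python
def run_epochs(train, val, train_step, val_step, no_of_epochs=3):
    """Loop over the dataloaders and return the mean loss per split and epoch."""
    history = []
    for epoch in range(no_of_epochs):
        # Each step function receives (images, mask) and returns a float loss
        train_losses = [train_step(batch['image'], batch['label']) for batch in train]
        val_losses = [val_step(batch['image'], batch['label']) for batch in val]
        history.append({'epoch': epoch,
                        'train_loss': sum(train_losses) / len(train_losses),
                        'val_loss': sum(val_losses) / len(val_losses)})
    return history


# Toy stand-ins: numbers instead of tensors, absolute error instead of a model
demo_train = [{'image': 1.0, 'label': 0.0}, {'image': 3.0, 'label': 0.0}]
demo_val = [{'image': 2.0, 'label': 1.0}]


def toy_step(images, mask):
    return abs(images - mask)


for row in run_epochs(demo_train, demo_val, toy_step, toy_step, no_of_epochs=2):
    print(row)
```

Keeping the model-specific code in the step functions means the same loop can be reused for different quick experiments. The sketch assumes both dataloaders yield at least one batch; an empty one would cause a division by zero.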