
Custom Code Injection #1

Open
pattonw opened this issue Sep 16, 2020 · 0 comments

It is unlikely that the default config files will be able to cover every desired use case for dacapo, so super users will need to be able to customize as much of the dacapo pipeline as they would like. However, we do want the following:

  1. Minimize repetition: avoid having to handle the master process separately from the worker processes.
  2. Maximize customization: it should be easy to replace any part of the dacapo process (reading data, the training pipeline, evaluation, post-processing, writing to the db, etc.). These should all be generic enough that users can replace any part.
  3. Provide a single, obvious way of overriding specific behavior.

Current state:
Custom local dacapo.PostProcessors are supported via config arguments: you give the path to your post_processor_module, that directory is added to sys.path, and your custom module can then be imported.
Pros:

  1. Allows you to include any custom dacapo.PostProcessor without having to write a custom run script.

Cons:

  1. Modifying the sys.path variable and depending on local files could get unwieldy, with naming conflicts when overriding many parts of the framework.
  2. Local files may not be available to workers; this might need extra work such as configuring mount directories.
  3. Sharing a project requires copying the python environment, the local code, and the configurations.
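To make the current mechanism concrete, here is a minimal sketch of the sys.path-based loading it describes. The function name, argument names, and the expectation that the class subclasses dacapo.PostProcessor are illustrative assumptions, not the actual dacapo config keys or API:

```python
import importlib
import sys


def load_post_processor(module_dir, module_name, class_name):
    """Hypothetical sketch of the current approach: the config supplies a
    directory that gets appended to sys.path, after which the user's
    module can be imported like any other installed module."""
    if module_dir not in sys.path:
        sys.path.append(module_dir)
    module = importlib.import_module(module_name)
    # The returned class would be expected to implement the
    # dacapo.PostProcessor interface.
    return getattr(module, class_name)
```

The naming-conflict con above follows directly from this sketch: if `module_name` shadows an installed package, the wrong module may be imported.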

Proposal #1:
Functional interface, something such as:

def run_all(
    data_configs,
    model_configs,
    post_processor_configs,
    ...,
    task_configs,
    train_iteration = None, # Optional generator that yields training batches
    post_processor = None, # Optional dacapo.PostProcessor
    loss = None, # Optional torch.nn.Module
    ...,
    data = None, # Optional dacapo.Data
):
    ...

Pros:

  1. Easy to replace any part that is exposed in run_all.
  2. The exact same script could run as master and as worker; it would then be the job of dacapo to figure out whether run_all was called by the master or by a worker (probably via an environment variable).
  3. The user has full control. It only depends on local files if the user wants it to, and customization could be very minimal, i.e. a single run.py script containing the custom code.

Cons:

  1. Custom code requires a custom run script: everyone who wants to do something similar (or every similar dacapo project) would need at least a copy of the same boilerplate run.py script.
  2. Confusing function names: every worker would call run_all, but then run only a single specific configuration.
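Pro 2 above hinges on master/worker dispatch inside run_all. A minimal sketch of how dacapo might do that via an environment variable; the variable name DACAPO_WORKER_TASK and the spawn/run return values are assumptions for illustration only:

```python
import os


def run_all(task_configs):
    """Hypothetical dispatch: the same run_all call behaves differently
    depending on whether dacapo launched this process as a worker."""
    worker_task = os.environ.get("DACAPO_WORKER_TASK")
    if worker_task is None:
        # Master: schedule one worker per configuration; each worker
        # would re-run this same script with DACAPO_WORKER_TASK set.
        return [("spawn", cfg) for cfg in task_configs]
    # Worker: run only the single configuration assigned to it.
    return [("run", task_configs[int(worker_task)])]
```

This is also where con 2 shows up: every worker calls run_all, yet each executes only one configuration.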

Proposal #2:
Plugin system: see the python packaging documentation on package plugin systems.
Pros:

  1. Sharing your environment and configurations is enough.
  2. Easier to enforce some structure on custom code, making backwards compatibility easier going forward.
  3. No custom script required. Could provide a command-line tool that handles more than just the default setup.

Cons:

  1. Higher barrier to user customization (could probably be alleviated by providing templates).
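The standard python mechanism for such plugins is entry points: plugin packages register classes in their packaging metadata, and the host discovers them at runtime. A sketch of how dacapo could discover third-party post-processors this way; the group name "dacapo.post_processors" is a hypothetical choice, not an existing dacapo convention:

```python
from importlib.metadata import entry_points


def discover_post_processors(group="dacapo.post_processors"):
    """Collect classes that installed plugin packages registered under
    the given entry-point group. Installing the plugin package into the
    environment is enough; no sys.path manipulation is needed."""
    eps = entry_points()
    # Python 3.10+ supports selecting by group; older versions return
    # a dict keyed by group name.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}
```

This directly supports pro 1 (sharing the environment is enough) and pro 3 (a generic command-line tool can enumerate whatever plugins are installed).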
@pattonw pattonw mentioned this issue Sep 16, 2020
@pattonw pattonw self-assigned this Sep 22, 2020
pattonw added a commit that referenced this issue Mar 24, 2021
pattonw pushed a commit to e11bio/dacapo that referenced this issue Feb 14, 2024