It is unlikely that the default config files will be able to cover all desired use cases for dacapo. It will be necessary to allow super users to customize as much of the dacapo pipeline as they would like. However, we do want the following:

- Minimize repetition: avoid having to handle the master process separately from worker processes.
- Maximize customization: it should be easy to replace any part of the dacapo process (reading data, the training pipeline, evaluation, post-processing, writing to the db, etc.). These should all be generic enough that users can replace any of them.
- Provide a single, obvious way of overriding specific behavior.
Current state:
Custom local `dacapo.PostProcessor`s are supported via config arguments: first you give the path to your `post_processor_module`; this module is added to `sys.path`, and you should then be able to import your custom module.
Pros:
- Allows you to include any custom `dacapo.PostProcessor` without having to write a custom run script.
Cons:
- Modifying the `sys.path` variable and depending on local files could get unwieldy, with naming conflicts likely when overriding many parts of the framework.
- Local files may not be available to workers; this might need extra work such as configuring mount directories.
- Sharing a project requires copying the Python environment, local code, and configurations.
Proposal #1:
Functional interface, something such as:
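A hypothetical sketch of such an interface; every name except `run_all` (the stage functions, their signatures, the `DACAPO_WORKER_TASK` environment variable) is an assumption:

```python
import os


def default_read_data(config):
    """Default stage; any stage can be swapped by passing a replacement."""
    return list(range(config.get("num_samples", 4)))


def default_train(config, data):
    return {"trained_on": len(data)}


def default_evaluate(config, model):
    return {"score": model["trained_on"]}


def run_all(config,
            read_data=default_read_data,
            train=default_train,
            evaluate=default_evaluate):
    """Run the whole pipeline; every exposed stage is replaceable.

    The same script could run as master and as worker: dacapo would
    check something like an environment variable to decide whether this
    process is the master or should run one specific task.
    """
    if os.environ.get("DACAPO_WORKER_TASK"):
        # Worker: restrict work to the single assigned configuration here.
        pass
    data = read_data(config)
    model = train(config, data)
    return evaluate(config, model)
```

Replacing a stage is then just `run_all(config, train=my_train)`, so a custom `run.py` could be as small as that one call.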
Pros:
- Easy to replace any part that is exposed in `run_all`.
- The exact same script could run as master and as worker; it would then be dacapo's job to figure out whether `run_all` was called by the master or by a worker (probably via an environment variable).
- The user has full control and only depends on local files if they want to. Customization could be very minimal, i.e. a single `run.py` script containing custom code.
Cons:
- Custom code requires a custom run script. Everyone who wants to do something similar (or every similar dacapo project) would need at least a copy of the same boilerplate `run.py` script.
- Confusing function names: every worker would call `run_all` but then only run a single specific configuration.
Proposal #2:
Plugin system: see the Python packaging documentation on package plugin systems.
Pros:
- Sharing your environment and configurations is enough.
- Easier to enforce some structure on custom code, making backwards compatibility easier going forward.
- No custom script required. Could make a command-line tool that can handle more than just the default setup.
Cons:
- Higher barrier to user customization (could probably be alleviated by providing templates).