Conversation
cschreep left a comment:
One major component is multi-tenancy. We need to take a config variable (e.g. env var ROGER_DATA_SOURCE={bdc,nida,sparc}) and execute conditional logic in the pipeline to ensure that the correct data, parser, etc. are used.
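A minimal sketch of that conditional dispatch, assuming a hypothetical parser registry (the parser names below are illustrative, not Roger's actual classes):

```python
import os

# Hypothetical registry mapping each tenant/data source to its parser.
# The values here are placeholders standing in for real parser objects.
PARSERS = {
    "bdc": "BDCParser",
    "nida": "NIDAParser",
    "sparc": "SPARCParser",
}

def select_parser():
    """Return the parser keyed by the ROGER_DATA_SOURCE env var."""
    source = os.environ.get("ROGER_DATA_SOURCE", "bdc")
    try:
        return PARSERS[source]
    except KeyError:
        raise ValueError(f"Unsupported ROGER_DATA_SOURCE: {source}")
```

The same lookup could gate dataset fetch and annotation tasks, so each deployment only runs the branch for its own data source.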
To expand on Carl's idea a little: I think we might want to formulate this as metadata. As a case study, examine the pattern used in get_kgx_files in the tranql-translate pipeline. There we have two main variables that drive the tranql-translate graph:
By changing ROGER_KGX_DATASET_VERSION we can build a completely new graph without a code change. Similarly, if we devise a metadata YAML, or some sort of versioned input files such as nida, that the annotate pipeline would use (with the get_db_gap files or the get_topmed_files tasks controlled by it), we could probably avoid having to build multiple pipelines for getting datasets. I think this pattern has the advantage of supporting new datasets and versions without code changes.
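One way to picture the versioned-metadata idea, written as a Python dict standing in for a metadata.yaml (the field names below are a sketch, not an agreed-on schema):

```python
# Hypothetical contents of a versioned metadata.yaml, shown as a dict.
# Bumping dataset_version selects a whole new set of input files
# without touching pipeline code.
METADATA = {
    "dug_inputs": {
        "dataset_version": "v1.0",
        "data_sets": ["topmed", "nida"],
    }
}

def resolve_inputs(metadata):
    """List the versioned input names the annotate pipeline would fetch."""
    cfg = metadata["dug_inputs"]
    version = cfg["dataset_version"]
    return [f"{name}/{version}" for name in cfg["data_sets"]]
```

Under this sketch, tasks like get_db_gap_files or get_topmed_files would iterate over `resolve_inputs(METADATA)` instead of hard-coding their sources.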
Based on our discussion from earlier, I think we are good on this PR; further modifications for the metadata-based approach may be handled on a separate branch.
We can't merge this PR since it isn't backwards-compatible with BDC, which we will be deploying to a new namespace in the very near future.
Here is a summary of the changes. On the config side: when doing annotation, the dug_inputs config is looked up to match metadata.yaml for datasets in dug_inputs, matching the version and the names specified in <annotation_base_data_uri>/<dug_inputs.dataset_version>/. The above applies similarly to KGX files. For environment variables, we can do
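The lookup described above might be sketched roughly as follows; the function and parameter names are assumptions mirroring the <annotation_base_data_uri>/<dug_inputs.dataset_version>/ path from the comment, not Roger's actual config schema:

```python
def dataset_uris(annotation_base_data_uri, dug_inputs, metadata_datasets):
    """Match the dataset names in dug_inputs against metadata.yaml entries
    and build URIs of the form <base>/<version>/<name>.
    All names here are illustrative assumptions."""
    version = dug_inputs["dataset_version"]
    wanted = set(dug_inputs["data_sets"])
    # Keep only datasets that appear both in the config and in metadata.yaml.
    matched = [name for name in metadata_datasets if name in wanted]
    return [f"{annotation_base_data_uri}/{version}/{name}" for name in matched]
```

Datasets listed in the config but absent from metadata.yaml are silently skipped here; a real implementation would probably want to warn or fail instead.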