Remove MLFlow Integration from Base Environment for Streamlined Configuration#421
Why are we removing MLFlow from base, @lvijnck? It's useful even locally; I used it quite a lot myself when doing the embedding benchmark work.
What exactly do you use locally in that case? For lots of use cases it's really in the way, as it adds a stateful component to local testing. That being said, I think we need the ability to run locally either with or without MLFlow, though Kedro does not currently support that.
I used it mainly for comparing PCA plots and convergence plots between different Node2vec/GraphSAGE runs on the subsamples. Once we have a sample of the data, as mentioned in #348, we can also run modelling tests and compare the modelling runs. Having it locally is also good for development work: for the inference pipeline, for example, I used it to extract the model from local MLFlow artifact storage. If it were up to me, I would keep it in base, but maybe let's discuss it in a broader group. I agree that the ideal scenario would be a choice to run the pipeline with or without MLFlow tracking.
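One way to get the "with or without MLFlow" choice discussed above, sketched in plain Python under stated assumptions: pipeline code logs through a small tracker interface, and a hypothetical `ENABLE_MLFLOW` environment variable decides whether that interface forwards to MLflow or silently discards calls. `Tracker`, `NoOpTracker`, `MlflowTracker`, `make_tracker` and `ENABLE_MLFLOW` are illustrative names only; they are not part of Kedro, kedro-mlflow, or this repository.

```python
import os
from typing import Mapping, Optional, Protocol


class Tracker(Protocol):
    """Hypothetical minimal interface that pipeline code would log through."""

    def log_metric(self, key: str, value: float) -> None: ...


class NoOpTracker:
    """Discards every call, so a local run keeps no tracking state."""

    def log_metric(self, key: str, value: float) -> None:
        pass


class MlflowTracker:
    """Forwards calls to MLflow; mlflow is imported only when tracking is on."""

    def __init__(self) -> None:
        import mlflow  # only required when tracking is enabled

        self._mlflow = mlflow

    def log_metric(self, key: str, value: float) -> None:
        self._mlflow.log_metric(key, value)


def make_tracker(env: Optional[Mapping[str, str]] = None) -> Tracker:
    """Choose a tracker from the (hypothetical) ENABLE_MLFLOW variable."""
    env = os.environ if env is None else env
    flag = env.get("ENABLE_MLFLOW", "false").strip().lower()
    return MlflowTracker() if flag in {"1", "true", "yes"} else NoOpTracker()
```

Keeping the `mlflow` import inside `MlflowTracker` means a run with tracking disabled needs neither the package nor a tracking server, which is the property the thread asks for; how this would plug into the project's Kedro hooks is untested here.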
* extract
* add metadata
* rm from base
* add mlflow catalog entries
* add
* add section
* add section
* rm tpl
Description
to add
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Checklist:
…`enhancement` or `bug`)
…`/run-tests` check at the end of PR collaboration work to execute integration tests