Make Namespace Parameterizable for E2E Execution Script #413

Merged
pascalwhoop merged 132 commits into main from fixes/full-matrix-run
Sep 17, 2024

Conversation

@pascalwhoop
Contributor

@pascalwhoop pascalwhoop commented Sep 16, 2024

  • Make the namespace a parameter of the e2e execution script
  • Fix the trials data path: the file is a TSV
  • ...
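
For context, here is a minimal sketch of what passing the namespace as a parameter could look like. It is not the script changed in this PR: it assumes the e2e script submits an Argo workflow through the `argo submit` CLI, and the function names, the `--spec` option, the `workflow.yml` default, and the `argo-workflows` default namespace are all hypothetical.

```python
import argparse
import shlex
from typing import List, Optional

# Hypothetical default; the real namespace name is not stated in this PR.
DEFAULT_NAMESPACE = "argo-workflows"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse CLI options; the namespace is optional and falls back to a default."""
    parser = argparse.ArgumentParser(description="Submit the e2e workflow (illustrative sketch).")
    parser.add_argument("--namespace", default=DEFAULT_NAMESPACE,
                        help="Kubernetes namespace to submit the workflow to")
    parser.add_argument("--spec", default="workflow.yml",
                        help="path to the rendered workflow spec (hypothetical default)")
    return parser.parse_args(argv)


def build_submit_command(spec_path: str, namespace: str) -> List[str]:
    """Build the `argo submit` invocation as an argument list (no shell quoting needed)."""
    return ["argo", "submit", spec_path, "--namespace", namespace]


# Example: target a personal test namespace instead of the default one.
args = parse_args(["--namespace", "dev-test", "--spec", "wf.yml"])
print(shlex.join(build_submit_command(args.spec, args.namespace)))
```

Keeping the namespace as an argument with a default means existing invocations keep working, while a run can be pointed at another namespace without editing the script.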

Comment thread pipelines/matrix/conf/base/embeddings/parameters.yml Outdated
Comment thread pipelines/matrix/conf/test/globals.yml
Comment thread pipelines/matrix/templates/argo_wf_spec.tmpl Outdated
Comment thread pipelines/matrix/src/matrix/pipelines/embeddings/nodes.py Outdated
Comment thread pipelines/matrix/conf/test/globals.yml Outdated
piotrkan and others added 4 commits September 17, 2024 17:18
Co-authored-by: Pascal Bro <pascal@everycure.org>
Co-authored-by: Pascal Bro <pascal@everycure.org>
@piotrkan piotrkan marked this pull request as ready for review September 17, 2024 16:21
@pascalwhoop pascalwhoop enabled auto-merge (squash) September 17, 2024 16:21
@pascalwhoop pascalwhoop merged commit 7c26e4e into main Sep 17, 2024
pascalwhoop added a commit that referenced this pull request Sep 19, 2024
* move all ARGO CD targeting to a new `infra` branch

* run CI on terraform on infra branch instead

* add pre commit for github actions

* github actions changes to make path filtering happen at workflow level

* paths

* s

* also run infra deployment only on specific filters with dorny

* bump

* x

* checkout for infra branch

* add clone permissions

* xi

* x

* bump

* bump

* rm old matrix module

* make deploy dependent on plan

* rm file

* concurrency to 1

* move concurrency for CI

* update to target infra branch

* avoid defaul

* bump

* increase mlflow size again

* mlflow ephemeral storage bug

* x

* x

* increase mlflow size further

* pubmedbert endpoint

* added spec

* deleted obsolete file

* added quick locust for endpoints on k8s

* add tmp gateway for api

* turn on filestore driver

* turn on filestore driver

* do not run plan in env

* bump

* added project reference for gcs backend

* rm backend and provider

* cleanup

* avoid attempt to create bucket

* test different env for terraform

* try with ro user

* test jwt token permissions

* bump

* test with new filter for ref on rw user

* do not lock when planning

* avoid reading

* debug

* try breaking this

* b

* change env

* debug again

* avoid deploy for now

* make openai parameterized via env variable

* ignore cache directories

* parametrize endpoints in makefile

* send random number of requests in locust request

* add joblib caching and proper compliance to OAI response

* bake model into image

* gen fake data with locust

* updated system to behave as expected in scale up-down behavior

* cleanup readme

* update scaling

* introduce script for submitting workflows

* update from RELEASE to RUN

* cleanup

* push

* cleanup

* add convenience script

* add changes

* bump

* Dev/bte trapi deploy helm (#260)

* added helm chart for deploying bte-trapi locally

* changed bte-trapi.yaml template, removed bte-trapi application folder in new branch

* MLFlow to GCS (#293)

* add example

* add work

* push changes

* rm breakpoint

* call save

* rm breakpoint

* reenable save

* rm debugging stuff

* commit changes

* rm mlflow file

* rm lock file

* rm test

* allow proxying

* add the release version to path

* add changes

* rm subpath in mlflow

* Update onboarding.md

* push

* revert

* revert

* disable minio

* rm minio user

* correct

* reenable

* set artifact location

* revert commenting

* Add 1 git-crypt collaborator (#343)

New collaborators:

	225C3B75 ahueb <alan@hueb.org>

* Update index.md (#336)

* Update index.md

Updated onboarding content with remaining information from Notion which hasn't already been pulled across

* Update docs/src/onboarding/index.md

Co-authored-by: Pascal Bro <pascal@everycure.org>

* Update index.md

---------

Co-authored-by: Pascal Bro <pascal@everycure.org>

* New script to retry docker compose_down in CI and debug when it's having issues (#357)

* add new script to debug docker issues

* cleanup structure a bit

* Add Robokop data to ingestion pipeline (#188)

* add

* add todos

* add todo pointers

* Robokop Ingestion Pipeline

added fields for Robokop ingestion.

* update gitignore

* update ignore

* ignore idea files

* add pointers for fabrication

* Edits from Laurens comments

modified files after Laurens comments

* Cleaned up

removed KC "TODO"s. Fixed Typos

* flushing out additional columns per real robokop data

* aligning fabricator column data with schema

* renaming node name

* removing duplicate spark_csv

* updating

* reverting to String

* removing fabricator details

* using LazySparkDataset, removing schema info

* run pre-commit

* Update pipelines/matrix/conf/base/ingestion/catalog.yml

Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com>

* Update pipelines/matrix/conf/base/ingestion/catalog.yml

Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com>

* Update pipelines/matrix/conf/base/ingestion/catalog.yml

Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com>

* Update pipelines/matrix/conf/base/ingestion/catalog.yml

Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com>

* fixing typo

* fix layers

* take subset of columns

* setting header to true

* overriding catalog due to change in raw path

* add

* fix dataset name as - is not supported by bq

* Update spark.yml

* add new node function for robokop nodes

* update

* set unit separator

* add description

---------

Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Co-authored-by: Jason Reilly <jdr0887@gmail.com>
Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com>
Co-authored-by: Pascal Brokmeier <pascal@everycure.org>

* Update pipelines/matrix/conf/base/fabricator/parameters.yml

---------

Co-authored-by: elliottsharp <elliott.sharp@hotmail.com>
Co-authored-by: Pascal Bro <pascal@everycure.org>
Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Co-authored-by: Jason Reilly <jdr0887@gmail.com>
Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com>

* add if statement to only debug on infra branch

* rm 2

* add default artifact root

* connect to pgsql

* connect to correct svc

* rm coc

* Update .github/ISSUE_TEMPLATE/onboarding.md

Co-authored-by: Pascal Bro <pascal@everycure.org>

* Update pipelines/matrix/src/matrix/hooks.py

Co-authored-by: Pascal Bro <pascal@everycure.org>

* list transient errors

* fix test

* moved specs in the right places

* fix wrong reference to old namespace for httproute

* update cert ref

* update cert ref

* update API endpoint to support 2 models

* update memory requirements

* add pdb

* introduce spot based API backing

* Add 1 git-crypt collaborator

New collaborators:

	7BEAB3B9 Joe Sykora <joseph@everycure.org>

* replace from preemptible to spot

* Update infra/modules/stacks/compute_cluster/gke.tf

* Update services/pubmedbert_embeddings/README.md

* not using another namespace for now

* setup correct ep

* fix issue chunyu

* fix chunyu updates

* add embeddings

* move

* update model

* pass model correctly

* more retries

* Add 1 git-crypt collaborator

New collaborators:

	7BEAB3B9 Joe Sykora <joseph@everycure.org>

* replace from preemptible to spot

* bump

* bump

* revert endpoint

* correct model

* enable namespace specification for commands

* fixes

* Update pipelines/matrix/conf/test/globals.yml

* Apply suggestions from code review

Co-authored-by: Pascal Bro <pascal@everycure.org>

* Apply suggestions from code review

Co-authored-by: Pascal Bro <pascal@everycure.org>

* bump

---------

Co-authored-by: Laurens Vijnck <laurens@everycure.org>
Co-authored-by: Alan <alan@hueb.org>
Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com>
Co-authored-by: elliottsharp <elliott.sharp@hotmail.com>
Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Co-authored-by: Jason Reilly <jdr0887@gmail.com>
Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com>
Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com>
pascalwhoop added a commit that referenced this pull request Sep 19, 2024
…git branch content (#427)

* add CLI

* rm scripts e2e

* working cli commands

* submit improved

* new docs for submit command

* fix unit tests

* add cursorrules file

* add notes on AI generation

* Update kedro.md (#412)

Formatting update in 'Dynamic pipelines' section

* Update git-crypt.md (#411)

Added WSL for Windows instructions to install git-crypt and gpg

* Generate documentation for cross val/model selection (#390)

* cross val docs

* Update docs/src/data_science/model_selection.md

Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com>

* Update docs/src/data_science/model_selection.md

Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com>

---------

Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com>

* git-crypt lock -a (#417)

* Add PGP key for new member (#416)

* Add 1 git-crypt collaborator

New collaborators:

	267E0673 Mateusz Wasilewski <mateusz@everycure.org>

* Add 1 git-crypt collaborator

New collaborators:

	4828A68F May-lim <may@everycure.org>

* Fix robokop paths in catalog to work for both cloud and base (#410)

* push all columns to BQ with robo

* rm unneeded special cloud env robokop files

* fix path in globals for test

* Make namespace a parameter for e2e execution script (#413)

* move all ARGO CD targeting to a new `infra` branch

* run CI on terraform on infra branch instead

* add pre commit for github actions

* github actions changes to make path filtering happen at workflow level

* paths

* s

* also run infra deployment only on specific filters with dorny

* bump

* x

* checkout for infra branch

* add clone permissions

* xi

* x

* bump

* bump

* rm old matrix module

* make deploy dependent on plan

* rm file

* concurrency to 1

* move concurrency for CI

* update to target infra branch

* avoid defaul

* bump

* increase mlflow size again

* mlflow ephemeral storage bug

* x

* x

* increase mlflow size further

* pubmedbert endpoint

* added spec

* deleted obsolete file

* added quick locust for endpoints on k8s

* add tmp gateway for api

* turn on filestore driver

* turn on filestore driver

* do not run plan in env

* bump

* added project reference for gcs backend

* rm backend and provider

* cleanup

* avoid attempt to create bucket

* test different env for terraform

* try with ro user

* test jwt token permissions

* bump

* test with new filter for ref on rw user

* do not lock when planning

* avoid reading

* debug

* try breaking this

* b

* change env

* debug again

* avoid deploy for now

* make openai parameterized via env variable

* ignore cache directories

* parametrize endpoints in makefile

* send random number of requests in locust request

* add joblib caching and proper compliance to OAI response

* bake model into image

* gen fake data with locust

* updated system to behave as expected in scale up-down behavior

* cleanup readme

* update scaling

* introduce script for submitting workflows

* update from RELEASE to RUN

* cleanup

* push

* cleanup

* add convenience script

* add changes

* bump

* Dev/bte trapi deploy helm (#260)

* added helm chart for deploying bte-trapi locally

* changed bte-trapi.yaml template, removed bte-trapi application folder in new branch

* MLFlow to GCS (#293)

* add example

* add work

* push changes

* rm breakpoint

* call save

* rm breakpoint

* reenable save

* rm debugging stuff

* commit changes

* rm mlflow file

* rm lock file

* rm test

* allow proxying

* add the release version to path

* add changes

* rm subpath in mlflow

* Update onboarding.md

* push

* revert

* revert

* disable minio

* rm minio user

* correct

* reenable

* set artifact location

* revert commenting

* Add 1 git-crypt collaborator (#343)

New collaborators:

	225C3B75 ahueb <alan@hueb.org>

* Update index.md (#336)

* Update index.md

Updated onboarding content with remaining information from Notion which hasn't already been pulled across

* Update docs/src/onboarding/index.md

Co-authored-by: Pascal Bro <pascal@everycure.org>

* Update index.md

---------

Co-authored-by: Pascal Bro <pascal@everycure.org>

* New script to retry docker compose_down in CI and debug when it's having issues (#357)

* add new script to debug docker issues

* cleanup structure a bit

* Add Robokop data to ingestion pipeline (#188)

* add

* add todos

* add todo pointers

* Robokop Ingestion Pipeline

added fields for Robokop ingestion.

* update gitignore

* update ignore

* ignore idea files

* add pointers for fabrication

* Edits from Laurens comments

modified files after Laurens comments

* Cleaned up

removed KC "TODO"s. Fixed Typos

* flushing out additional columns per real robokop data

* aligning fabricator column data with schema

* renaming node name

* removing duplicate spark_csv

* updating

* reverting to String

* removing fabricator details

* using LazySparkDataset, removing schema info

* run pre-commit

* Update pipelines/matrix/conf/base/ingestion/catalog.yml

Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com>

* Update pipelines/matrix/conf/base/ingestion/catalog.yml

Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com>

* Update pipelines/matrix/conf/base/ingestion/catalog.yml

Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com>

* Update pipelines/matrix/conf/base/ingestion/catalog.yml

Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com>

* fixing typo

* fix layers

* take subset of columns

* setting header to true

* overriding catalog due to change in raw path

* add

* fix dataset name as - is not supported by bq

* Update spark.yml

* add new node function for robokop nodes

* update

* set unit separator

* add description

---------

Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Co-authored-by: Jason Reilly <jdr0887@gmail.com>
Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com>
Co-authored-by: Pascal Brokmeier <pascal@everycure.org>

* Update pipelines/matrix/conf/base/fabricator/parameters.yml

---------

Co-authored-by: elliottsharp <elliott.sharp@hotmail.com>
Co-authored-by: Pascal Bro <pascal@everycure.org>
Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Co-authored-by: Jason Reilly <jdr0887@gmail.com>
Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com>

* add if statement to only debug on infra branch

* rm 2

* add default artifact root

* connect to pgsql

* connect to correct svc

* rm coc

* Update .github/ISSUE_TEMPLATE/onboarding.md

Co-authored-by: Pascal Bro <pascal@everycure.org>

* Update pipelines/matrix/src/matrix/hooks.py

Co-authored-by: Pascal Bro <pascal@everycure.org>

* list transient errors

* fix test

* moved specs in the right places

* fix wrong reference to old namespace for httproute

* update cert ref

* update cert ref

* update API endpoint to support 2 models

* update memory requirements

* add pdb

* introduce spot based API backing

* Add 1 git-crypt collaborator

New collaborators:

	7BEAB3B9 Joe Sykora <joseph@everycure.org>

* replace from preemptible to spot

* Update infra/modules/stacks/compute_cluster/gke.tf

* Update services/pubmedbert_embeddings/README.md

* not using another namespace for now

* setup correct ep

* fix issue chunyu

* fix chunyu updates

* add embeddings

* move

* update model

* pass model correctly

* more retries

* Add 1 git-crypt collaborator

New collaborators:

	7BEAB3B9 Joe Sykora <joseph@everycure.org>

* replace from preemptible to spot

* bump

* bump

* revert endpoint

* correct model

* enable namespace specification for commands

* fixes

* Update pipelines/matrix/conf/test/globals.yml

* Apply suggestions from code review

Co-authored-by: Pascal Bro <pascal@everycure.org>

* Apply suggestions from code review

Co-authored-by: Pascal Bro <pascal@everycure.org>

* bump

---------

Co-authored-by: Laurens Vijnck <laurens@everycure.org>
Co-authored-by: Alan <alan@hueb.org>
Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com>
Co-authored-by: elliottsharp <elliott.sharp@hotmail.com>
Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Co-authored-by: Jason Reilly <jdr0887@gmail.com>
Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com>
Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com>

* Add 1 git-crypt collaborator (#419)

New collaborators:

	1030BF3E malanjary <malanjary@scripps.edu>

* Implement recall@N in pipeline (#311)

* first draft notebooks

* first version but running into testing errors

* resolved error, basic test works

* fix N error on test data

* tidy up test

* added option for multiple values of N

* minor change to class

* modify n_values for testing

* Update pipelines/matrix/tests/pipelines/test_evaluation.py

including suggestion from alexei

Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com>

* resolving merge conflict

---------

Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com>

* Fix exp creation race condition (#420)

* fix race condition

* push fix

* Apply node filtering to clinical trials/drug list/disease lists before integrating (#408)

* add

* add category label to synonymization

* add filtering

* update

* add updates

* revert

* working setup

* add neo4j checking to synonymizer

* fix reporting node

* rn synonymizer updates

* rm comment

* introduce working example

* cleanup

* add logging

* rm write

* add datashader plot

* push

* push

* cleanup

* add changes

* fix file clashing

* add

* add

* add

* expand reporting

* revert all dev changes

* final update

* final update

* variable name cleanup

* add dataset transcoding to docs

* coalesce

* fix erroneous input

* disable test

* add link

* add test

* rm comment

* fix error

---------

Co-authored-by: piotrkan <pkaniewski998@gmail.com>
Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com>

* ulimit and tqdm disable in test and ci

* fix lint

* add username label

* fix

* upgrade kedro

* update

* add instructions

* add svgs

* add all

* add correct img

* correct

* Remove MLFlow for base env (#421)

* extract

* add metadata

* rm from base

* add mlflow catalog entries

* add

* add section

* add section

* rm tpl

* Fix code duplication issue in evaluation pipeline  (#423)

* add

* add category label to synonymization

* add filtering

* update

* add updates

* revert

* working setup

* add neo4j checking to synonymizer

* fix reporting node

* rn synonymizer updates

* rm comment

* introduce working example

* cleanup

* add logging

* rm write

* add datashader plot

* push

* push

* cleanup

* add helper function for disease-centric matrix

* moved remove pairs method

* use helper function in time split method

* checkout main to get rid of unwanted changes

---------

Co-authored-by: Laurens Vijnck <laurens@everycure.org>
Co-authored-by: piotrkan <pkaniewski998@gmail.com>

* fix links

---------

Co-authored-by: may-lim <may@everycure.org>
Co-authored-by: leelancashire <drllancashire@gmail.com>
Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com>
Co-authored-by: Cheng-Han Chung <jchung@renci.org>
Co-authored-by: Laurens Vijnck <laurens@everycure.org>
Co-authored-by: Alan <alan@hueb.org>
Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com>
Co-authored-by: elliottsharp <elliott.sharp@hotmail.com>
Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Co-authored-by: Jason Reilly <jdr0887@gmail.com>
Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com>
Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com>
Co-authored-by: piotrkan <pkaniewski998@gmail.com>
@pascalwhoop pascalwhoop changed the title from "Make namespace a parameter for e2e execution script" to "Make Namespace Parameterizable for E2E Execution Script" Nov 1, 2024
@pascalwhoop pascalwhoop added the "enhancement" label (improving an existing system or feature to work better) Nov 1, 2024
@oliverw1 oliverw1 deleted the fixes/full-matrix-run branch January 21, 2025 21:37

Labels

enhancement improving an existing system or feature to work better.

4 participants