Add IAM as terraform module for code centric IAM management of the project by alexeistepa · Pull Request #628 · everycure-org/matrix

alexeistepa · 2024-11-07T13:50:50Z

Add an IAM module for Terraform so the data science team can run `kedro submit`.

pascalwhoop · 2024-11-07T20:49:12Z

Surprised the infra pipeline isn't triggered now 🤔

pascalwhoop · 2024-11-07T20:49:47Z

Ah, the target branch needs to be "infra".

* Bugfix/gpu resources (#621) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow 
proxying * add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * Update argo-workflows.yaml * point at diff branch * allow 7687 TLS traffic * allow all namespaces to expose neo4j port * gateway only supports 443 and 80 and 8080 * Update infra/argo/applications/dev-namespaces/values.yaml Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> * Release article for v0.2.2 (#611) * require PR description going forward * add stale catcher * push latest * Apply suggestions from code review Co-authored-by: leelancashire <drllancashire@gmail.com> * bump * update post * get done --------- Co-authored-by: leelancashire <drllancashire@gmail.com> * hotfix: documentation deployment failed to load OIDC token * Add test for missing entries in Kedro catalog & remove unused entries (#600) * remove unused datasets * add unit test for unused kedro catalog entries * remove commented out code, create new data_release_with_embeddings pipeline * fix embeddings pipeline * Revert "remove commented out code, create new data_release_with_embeddings pipeline" This reverts commit 2e220eb. 
* comment out unused dataset * Feat/improve argo workflow submission (#565) * refactor pipeline registry * split pipelines into separate modeeling and embeddings steps * split pipelines into separate modeeling and embeddings steps * add unit tests for pipeline registry * fix unit tests for pipeline registry * add missing install statement * trim down pipeline number * formatting * fix pipeline * remove unused comments * refactor pipeline registry * argo CD refactor * rename argo.py to test_argo.py * refactor argo test * remove Argo-specific CLI * rename _generate_argo_config to generate_argo_config * add templates to ignored * refactor and add more submission tests * sort import statements * fix types in test_argo * fix bug in FusableNode * extend gitignore: * add missing TODOs * change dir structure in CLI tests * remove unused comments * add pycov to dependencies * add missing type annotation * add unit tests for _get_feed_dict * add unit tests for run * extract pipeline initialization one function up * fix pipeline / mock usage in test_run * fix all mock usages in test_run * refactor run.py, extracting run functionality to a separate function * continue refactor of run.py * fix test_run_basic * fix test_run_with_fabricator_env_error test * add more tests * intermediate stage of extending unit test coverage * fix entire function * remove unused test * add TODOs and skips to failing tests * add full argo template generation test * fix missing name in template * register integration mark * add tests for .yaml config * fix types & add test_resourc_root fixture * make fixture names identical to function names * add missing fixtures * continue fixes to argo template submission * fix argo template test * refactor argo template generation test * refactor argo template generation test * fix assert statements in argo worfflow template generator * remove argo_node_spec * add complete template to argo config test * add missing test statements * add neo4j container * add 
sample template * add comments, and minor refactor * add TODOs, refactor submit to a separate function * add comment explaining how params are passed * amend argo template * add unit test for submission of workflow * fix submit function * add tests for job submission * add separate test for pipeline objects * switch formatting * continue fixing the submission test * improve submission test * adjust test and submission to test for pipeline dict * update submit * update submit * submit changes * uncomment fixture * fix fixture * unskip multi-pipeline test * add tests for verbose and dry run * fix bad comment pattern in template * add cloud alongside test to pipelines * improve formatting in argo sugmnit commands * extend submit options with separate settings for submission and triggering of pipelines. * submit refactor * argo test fix * fix tests * improve documentation * fix mock_dependencies fixture * add tests for save argo config * improve type checking * refactor location of project bootstrap code * add extensive _submit test * final fix to _submit tests * add pytest to deps * add template * replace template with non-defunct one * continue refactoring... 
* consistently use hyphens in CLI * add argo screenshots * add argo glossary * rename argo_glossary to glossary * add documentation on local argo workflows * add argo docs to config * fix documentation on kedro submit usage * Update docs/src/infrastructure/argo_workflows_locally.md Co-authored-by: Pascal Bro <pascal@everycure.org> * aalign with main * refactor to add run / release schema * test fix * finish pipeline refactor * add spark tests * add spark tags * start refactor of paths in globals in base * update globals * remove comments form base * improve comments in cloud setup * replace int with integration in paths * save evaluation state * fix paths in embeddings * fix paths in matrix generation * update catalog in modelling * debug failing test * add new path structure to test * update lock * fix links to resources in docs * add --load to Makefile * update dependencies * update Kedro catalog to reflect the changed (release- and run- based) architecture of pipeline * update pipelines in accordance with new Kedro Data Catalog structure * remove CLI changes from this PR * add tests for pipeline registry * add missing test fixture * remove commented out paths * reqs update * align version with main * move metric.yml to the modelling dir * in matrix_generation, retain model name * adjust evaluation layers to write to evaluation layer exclusively * change dataset names in catalog * fix name of the dataset used in the pipeline * rename metric so sanity_metrics.yml * rename raws in test env * remove obsolete test dir structure * add separate release and modelling test pipelines * set run and release names in test config to test_run and test_release * swap explicit bucket name for a variable one * add runs / releases to globals for base * remove unnecessary comments * add source_gcs_bucker * use kedro_data as kedro data dir consistently * remove unused commnts * reduce code reuse by using paths from global * test paths are now identical with base * fix file names in 
globals * change kedro_data to kedro * add changes to globals * save pipeline registry * update tests for pipeline registry * change defaults from local to default-run/release-name * improve formatting in globals * pull release_name from env variables * hardcode relases * minor refactor in globals paths * remove obsolete paths from documentation * fix dataset name * rename evaluation result dataset * remove unused datasets * use modelling rather than embedding path * restore dataset name * add layers to catalog * add layers to filesystem * add feat to embeddings * Rename evaluation.{model}.{evaluation}.result to evaluation.{model}.{evaluation}.model_output.result for consistency with layering system * finish adding layers to Kedro catalog * restore create_pipeline as default name for pipeline * fix name of the pipeline * restor previous structure of pipeline registry * move data to ingestion * fix paths * add ingestion as separate directory * restore AMD in Makefile * minor fixes * restore old pipeline naming system * remove obsolete code from example * restore name pipeline * add test for run name sanitization mechanism * fix run name sanitization * fix failing unit tests for CLI * fix test * delete template * parametrize test for feed dict * remove pipeline list submission * move part of test * simplify test * simplify argo workflow template generation test * fix doc of submit * clarify doc * restore old path to mlflow * Update pipelines/matrix/tests/test_argo.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> --------- Co-authored-by: vjsykora <jsykora@Joe> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * add gds secret to workflows namespace * Enable GPU nodes on kubernetes cluster (#598) * add gpu node pools * remove commented out spot nodes * fix accelerator counts and max node counts * add gpu node pool label * fix node pool to g2-standard-16-l4-nodes * add TODO * set 
labels on GPU / non-GPU nodes * ensure workflow template has negative affinity * Bump commit * Update infra/modules/stacks/compute_cluster/gke.tf Co-authored-by: Pascal Bro <pascal@everycure.org> * bump * test * bump * bump * bump * bump * bump --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * change disk type to pd-ssd --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: vjsykora <jsykora@Joe> * Bugfix/gpu fix 2 (#635) * update service account PK (#604) * Docs updates for improved onboarding flow and pipeline overview * init * initial edits to embeddings and modeling * further updates * added time split * add sentence on frequent flyers * include PMBert * changes * missing lnk removed * missing file adde * wip * bump * data-api --------- Co-authored-by: leelancashire <lee@everycure.org> * hotfix: documentation update for presentation * set explicit disk type for GPU nodes --------- Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: leelancashire <lee@everycure.org> * Add IAM as terraform module for code centric IAM management of the project (#628) * add iam terraform module * update variables * update variables * iam codified now * cleanup --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * enable external IP for neo4j * bump * point at different branch * bump * add neo4j auth secet * fix syntax * add moa app * give data minded permission to administer the cluster (#721) * missing line * use new branch for MoA viewer * add data 
release app * update ns * revert * revert * add secret * purge contractor name - change already applied on different branch (#739) * add prom-graf-stack * enable metrics for argo workflows * start developing on grafana for argo-workflows * setup http forwarding again * Extend engineering permissions (#749) * extend gcp permissions for the tech team * add bq permissions * update notebooks role * right size infrastructure * enable vertical pod autoscaling * adjustments in our node configs to be more cost efficient * adjustments in our node configs to be more cost efficient * feat: grant iam.workloadIdentityPoolAdmin to tech_team_group (#760) This should allow those in the group to be able to modify workloadIdentity Federation, which a.o. things is required to get GitHub Actions from non-main and non-infra branches to run the authentication flow. * improved way of applying grafana * Avoid overwriting raw data with fabricator pipeline (#554) * avoid overwriting raw common error text * solves people having permission to overwrite raw data * working argocd in https * debug: allow the tech team to impersonate service accounts (#768) * debug: allow the tech team to impersonate service accounts * Allow set of contributors to merge PRs to infra * Roles modification to test Gemini call (#774) * adjust way we pass in insecure flag * Add argo deployment of kg-dashboard pointing at development branch (#782) Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Big memory /cost optimized nodes (#767) * add new group types * big instances get ssds * fix https redirect and insecure flag for argo * fix path to branch from Kevin * Revert the changes on permission (#779) * remove ml.admin and aiplatform.admin, add ml.developer role to test gemini call * only modify the permissions on dataminded team * revert the changes on permissions --------- Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: Pascal Bro <pascal@everycure.org> * Enable Neo4J endpoint 
for all releases (#803) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow proxying 
* add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * new httproute * bump * Update argo-workflows.yaml * add reverse proxy for exposing via helm * point at diff branch * bump * bump * bump * bump * bump * ensure reverting of listener * bump * WIP * bump * cluster ip * bump * use load balancer instead of ingresss * Update infra/argo/app-of-apps/templates/argo-workflows.yaml * switch to clusterip for service * wip * try * use LB * rm ingress * bump * bump * bump * bump * fix sovler * ssl all the way * enable https on neo4j * enable ssl better and enable prometheus * also enable bloom * x * Update * ibum * bump * add more links to docs * blom * fix paths * working * cleanup * fix neo4j --------- Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck 
<laurens@everycure.org> * update to main * move over * update ci * Add MoA visualizer service (#712) * add deployment * update name : * setup route * change protocol * use cluster ip * add moa_vis to repo * create init script * remove suffix arg * fix env variables * fix env variables * setup init contaier * add deployment comments * push ci * push ci * switch to visualizer * add step with correct permissions * correct paths * fix paths * try fix * fix typo * fix typo * remove unused paths * clean comment * fix makefile * hook up img * fix imports * update init script * update init script * redeploy * redeploy * use env vars * fix env * fix templating * update table names * update sql query * update image tag * tag name * change the correct image tag * use correct ing * update to new data * change path to data * fix path * reintroduce assets * correct path * update filename * add to streamlit * switch to pydantic * fix display cols * remove caching and feedback col * fix conflicts * try symlink * fix interpolation * add files * fix issues * fix ci * fix template * setup correct entrypoint * rm old rs * add base settings * fix init script * fix types * add image pull policy * add gs * add gs * img path as string * bump version * pydantic testing * attempt fixes * push changes * push changes * fix settings * add image version * sync vars * push * update ci --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * move app version on page (#831) * clean up and move app version * bump v * update ci --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> * fix target revisions for various deployments --------- Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter 
<163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: vjsykora <jsykora@Joe> Co-authored-by: leelancashire <lee@everycure.org> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Oliver W. <4733178+oliverw1@users.noreply.github.com> Co-authored-by: Siyan Luo <89979939+Siyan-Luo@users.noreply.github.com> Co-authored-by: Kevin Schaper <kevinschaper@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: Daniel Rhodes <14894770+drhodesbrc@users.noreply.github.com>

…s observability (#834)
sample template * add comments, and minor refactor * add TODOs, refactor submit to a separate function * add comment explaining how params are passed * amend argo template * add unit test for submission of workflow * fix submit function * add tests for job submission * add separate test for pipeline objects * switch formatting * continue fixing the submission test * improve submission test * adjust test and submission to test for pipeline dict * update submit * update submit * submit changes * uncomment fixture * fix fixture * unskip multi-pipeline test * add tests for verbose and dry run * fix bad comment pattern in template * add cloud alongside test to pipelines * improve formatting in argo sugmnit commands * extend submit options with separate settings for submission and triggering of pipelines. * submit refactor * argo test fix * fix tests * improve documentation * fix mock_dependencies fixture * add tests for save argo config * improve type checking * refactor location of project bootstrap code * add extensive _submit test * final fix to _submit tests * add pytest to deps * add template * replace template with non-defunct one * continue refactoring... 
* consistently use hyphens in CLI * add argo screenshots * add argo glossary * rename argo_glossary to glossary * add documentation on local argo workflows * add argo docs to config * fix documentation on kedro submit usage * Update docs/src/infrastructure/argo_workflows_locally.md Co-authored-by: Pascal Bro <pascal@everycure.org> * aalign with main * refactor to add run / release schema * test fix * finish pipeline refactor * add spark tests * add spark tags * start refactor of paths in globals in base * update globals * remove comments form base * improve comments in cloud setup * replace int with integration in paths * save evaluation state * fix paths in embeddings * fix paths in matrix generation * update catalog in modelling * debug failing test * add new path structure to test * update lock * fix links to resources in docs * add --load to Makefile * update dependencies * update Kedro catalog to reflect the changed (release- and run- based) architecture of pipeline * update pipelines in accordance with new Kedro Data Catalog structure * remove CLI changes from this PR * add tests for pipeline registry * add missing test fixture * remove commented out paths * reqs update * align version with main * move metric.yml to the modelling dir * in matrix_generation, retain model name * adjust evaluation layers to write to evaluation layer exclusively * change dataset names in catalog * fix name of the dataset used in the pipeline * rename metric so sanity_metrics.yml * rename raws in test env * remove obsolete test dir structure * add separate release and modelling test pipelines * set run and release names in test config to test_run and test_release * swap explicit bucket name for a variable one * add runs / releases to globals for base * remove unnecessary comments * add source_gcs_bucker * use kedro_data as kedro data dir consistently * remove unused commnts * reduce code reuse by using paths from global * test paths are now identical with base * fix file names in 
globals * change kedro_data to kedro * add changes to globals * save pipeline registry * update tests for pipeline registry * change defaults from local to default-run/release-name * improve formatting in globals * pull release_name from env variables * hardcode relases * minor refactor in globals paths * remove obsolete paths from documentation * fix dataset name * rename evaluation result dataset * remove unused datasets * use modelling rather than embedding path * restore dataset name * add layers to catalog * add layers to filesystem * add feat to embeddings * Rename evaluation.{model}.{evaluation}.result to evaluation.{model}.{evaluation}.model_output.result for consistency with layering system * finish adding layers to Kedro catalog * restore create_pipeline as default name for pipeline * fix name of the pipeline * restor previous structure of pipeline registry * move data to ingestion * fix paths * add ingestion as separate directory * restore AMD in Makefile * minor fixes * restore old pipeline naming system * remove obsolete code from example * restore name pipeline * add test for run name sanitization mechanism * fix run name sanitization * fix failing unit tests for CLI * fix test * delete template * parametrize test for feed dict * remove pipeline list submission * move part of test * simplify test * simplify argo workflow template generation test * fix doc of submit * clarify doc * restore old path to mlflow * Update pipelines/matrix/tests/test_argo.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> --------- Co-authored-by: vjsykora <jsykora@Joe> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * add gds secret to workflows namespace * Enable GPU nodes on kubernetes cluster (#598) * add gpu node pools * remove commented out spot nodes * fix accelerator counts and max node counts * add gpu node pool label * fix node pool to g2-standard-16-l4-nodes * add TODO * set 
labels on GPU / non-GPU nodes * ensure workflow template has negative affinity * Bump commit * Update infra/modules/stacks/compute_cluster/gke.tf Co-authored-by: Pascal Bro <pascal@everycure.org> * bump * test * bump * bump * bump * bump * bump --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * change disk type to pd-ssd --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: vjsykora <jsykora@Joe> * Bugfix/gpu fix 2 (#635) * update service account PK (#604) * Docs updates for improved onboarding flow and pipeline overview * init * initial edits to embeddings and modeling * further updates * added time split * add sentence on frequent flyers * include PMBert * changes * missing lnk removed * missing file adde * wip * bump * data-api --------- Co-authored-by: leelancashire <lee@everycure.org> * hotfix: documentation update for presentation * set explicit disk type for GPU nodes --------- Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: leelancashire <lee@everycure.org> * Add IAM as terraform module for code centric IAM management of the project (#628) * add iam terraform module * update variables * update variables * iam codified now * cleanup --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * enable external IP for neo4j * bump * point at different branch * bump * add neo4j auth secet * fix syntax * add moa app * give data minded permission to administer the cluster (#721) * missing line * use new branch for MoA viewer * add data 
release app * update ns * revert * revert * add secret * purge contractor name - change already applied on different branch (#739) * add prom-graf-stack * enable metrics for argo workflows * start developing on grafana for argo-workflows * setup http forwarding again * Extend engineering permissions (#749) * extend gcp permissions for the tech team * add bq permissions * update notebooks role * right size infrastructure * enable vertical pod autoscaling * adjustments in our node configs to be more cost efficient * adjustments in our node configs to be more cost efficient * feat: grant iam.workloadIdentityPoolAdmin to tech_team_group (#760) This should allow members of the group to modify Workload Identity Federation, which is required, among other things, for GitHub Actions on non-main and non-infra branches to run the authentication flow. * improved way of applying grafana * Avoid overwriting raw data with fabricator pipeline (#554) * avoid overwriting raw common error text * solves people having permission to overwrite raw data * working argocd in https * debug: allow the tech team to impersonate service accounts (#768) * debug: allow the tech team to impersonate service accounts * Allow set of contributors to merge PRs to infra * Roles modification to test Gemini call (#774) * adjust way we pass in insecure flag * Add argo deployment of kg-dashboard pointing at development branch (#782) Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Big memory /cost optimized nodes (#767) * add new group types * big instances get ssds * fix https redirect and insecure flag for argo * fix path to branch from Kevin * Revert the changes on permission (#779) * remove ml.admin and aiplatform.admin, add ml.developer role to test gemini call * only modify the permissions on dataminded team * revert the changes on permissions --------- Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: Pascal Bro <pascal@everycure.org> * Enable Neo4J endpoint 
for all releases (#803) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow proxying 
* add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * new httproute * bump * Update argo-workflows.yaml * add reverse proxy for exposing via helm * point at diff branch * bump * bump * bump * bump * bump * ensure reverting of listener * bump * WIP * bump * cluster ip * bump * use load balancer instead of ingresss * Update infra/argo/app-of-apps/templates/argo-workflows.yaml * switch to clusterip for service * wip * try * use LB * rm ingress * bump * bump * bump * bump * fix sovler * ssl all the way * enable https on neo4j * enable ssl better and enable prometheus * also enable bloom * x * Update * ibum * bump * add more links to docs * blom * fix paths * working * cleanup * fix neo4j --------- Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck 
<laurens@everycure.org> * update to main * move over * update ci * Add MoA visualizer service (#712) * add deployment * update name : * setup route * change protocol * use cluster ip * add moa_vis to repo * create init script * remove suffix arg * fix env variables * fix env variables * setup init contaier * add deployment comments * push ci * push ci * switch to visualizer * add step with correct permissions * correct paths * fix paths * try fix * fix typo * fix typo * remove unused paths * clean comment * fix makefile * hook up img * fix imports * update init script * update init script * redeploy * redeploy * use env vars * fix env * fix templating * update table names * update sql query * update image tag * tag name * change the correct image tag * use correct ing * update to new data * change path to data * fix path * reintroduce assets * correct path * update filename * add to streamlit * switch to pydantic * fix display cols * remove caching and feedback col * fix conflicts * try symlink * fix interpolation * add files * fix issues * fix ci * fix template * setup correct entrypoint * rm old rs * add base settings * fix init script * fix types * add image pull policy * add gs * add gs * img path as string * bump version * pydantic testing * attempt fixes * push changes * push changes * fix settings * add image version * sync vars * push * update ci --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * move app version on page (#831) * clean up and move app version * bump v * update ci --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> * fix target revisions for various deployments * feat/trigger release from gh action (#819) This PR introduces Argo sensors (and eventbus and a few other related services) together with dedicated GitHub Actions to facilitate the data release process. 
In a nutshell, when someone triggers a data-release pipeline (through either the Kedro pipeline "kg_release" or "data_release"), a successful run creates a PR that contains a draft of the release notes and an associated article, together with the subset of parameters that should be published to the Every Cure website, so people can easily find out which parameters were associated with a given data release. --------- Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * Add Grafana and Prometheus (#821) * add prom-graf-stack * revert and better namespace * new prometheus stack * move to better folder * cleanup * bump * routes fix * ip * new work * cleanup * bum * bum * wrong nestig * fix metrics * cleanup * xx * x * new dashboard * add app * update * port change * AI: test health probe * move to separate template * bump * correct type * bump * fix prometheus * add also for argocd * fix path for prometheus * WIP * kubeops * fix httproute * add docs * fix kubeops * fix health checks * right port * fix prom endpoint * cleanup * bump * cleanup * bump * Revert "Enable Neo4J endpoint for all releases (#803)" (#841) This reverts commit ef142fa. * listen to infra branch * de-duplicate data-release yaml files (#843) * debug: drop eventbus * Revert "debug: drop eventbus" This reverts commit 3ac8595. 
* debug: delete 3 major components of Argo Events * debug: re-enable eventbus * debug: re-enable eventsource and sensor * debug: add sync waves to control the deployment order * debug: replace 'in' operation with equality * fix: listen to events in ns argo-workflows During the demo we were listening on an artificial event we created in the data-release namespace * promote role to clusterrole so SA can observe across namespaces * fix: line continuation in curl multiline * debug: put payload on single line * Feat/neo4j endpoint (#842) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake 
data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow proxying * add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * new httproute * bump * Update argo-workflows.yaml * add reverse proxy for exposing via helm * point at diff branch * bump * bump * bump * bump * bump * ensure reverting of listener * bump * WIP * bump * cluster ip * bump * use load balancer instead of ingresss * Update infra/argo/app-of-apps/templates/argo-workflows.yaml * switch to clusterip for service * wip * try * use LB * rm ingress * bump * bump * bump * bump * fix sovler * ssl all the way * enable https on neo4j * enable ssl better and enable prometheus * also enable bloom * x * Update * ibum * bump * add more links to docs * blom * fix paths * working * cleanup * fix neo4j * add fix * rm breakpoint * add readme * rm certs * update gitignore * rm readme * add infra to trigger * fix circular import * fix tests * add explicit not check * add cert to docker test * ensure dir loaded in ci * push fix * fix ci run * retry * fix tests * allow specifying +s from outside * add docstring * add doc * attempt push * fixmountung * add terraform to infra deploy * retry * add defaults 
--------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * rename * use license * Merge/main to infra to main (#854) * Adds the git sha label to the workflow template and aborts submission if git state dirty. (#771) * adds two new labels: git sha of currently active branch and the flag if git is dirty. * merge two git labels into one. * Add the feature to abort if git requirements are unmet * fix moved docstring by accident * feat/trigger release from gh action (#819) This PR introduces Argo sensors (and eventbus and a few other related services) together with dedicated GitHub Actions to facilitate the data release process. In a nutshell, when someone triggers a data-release pipeline (either through the Kedro pipeline "kg_release" or through "data_release"), it will –on success– create a PR that contains a draft of the release notes and an associated article, together with parameters that should be published to the Every Cure website, so people may easily find out which parameters (a subset thereof) were associated with a certain data release. 
--------- Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * disable checking the status of the git repo. Otherwise our data scientists will be blocked. We do, however, need this on main for the final tests. * debug: attempt to circumvent shaded imports * fix: resolve circular import * fix: ensure node has outputs * fix: default kedro pipeline should not encompass data_release --------- Co-authored-by: Oliver W. <4733178+oliverw1@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * Add SILC troubleshooting document (#836) * add notebook * finish * add imgs * Setup Spoke KG integration (#772) * adding targets for kgx output * adding targets for kgx output * adding targets for kgx output * adding spoke targets * missed one instance * adding spoke pipeline nodes * initial commit * moving logic to nodes.py * use release path, followed by kgx * adding spoke version * 
initial commit * adding spoke * adding spoke * fixing typo * initial stab at adding spoke * typo * setting some columns to none as they don't exist in spoke * using node instead of argo_node per Laurens * fixing typo * adding spoke node * removing * removing edits that will be added in a different PR * Update pipelines/matrix/src/matrix/settings.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * nullifying non-existent columns * toPandas() seems to cause memory leak issues; this alternative seems to work more reliably * formatting * removing unused nodes and edges * Update pipelines/matrix/src/matrix/pipelines/ingestion/pipeline.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/src/matrix/settings.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * not using argo_node * not using argo_node * reverting * commenting out spoke again * disable unused --------- Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> * Walkthrough on how to build a new modeling pipeline in the MATRIX kedro pipeline (#757) * add stub of model walkthrough * add custom modelling notebook * add dependency on mljax * remove mljax * save state of walkthrough * checkpoint * update user story * update user story * update user story * update notebook * update notebook * save * sort out deps * story update * story update * story update * update user story to include custom function * extend example to support kedro run * add to experimental section of docs * remove examples from settings.py * remove files linked to example * move into walkthroughs and clean up * exclude walkthroughs from nbstripout * add missing imports and output cells * update requirements --------- Co-authored-by: leelancashire <lee@everycure.org> * Break logic up into modular components for data-release nodes (#838) * refactor: separate config from logic * remove unused 
logger * style: adjust function name * fix: line continuation in curl multiline * fix: tag the commit that generated the data release. The GitHub Checkout action is really more like clone, not allowing you to immediately detach. * fix: remove superfluous JSON string quotes * fix: branch off to comply with protected branch policy * Correct Argo node's output to match the single item returned by its function (#844) Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local> * add label hide-from-release for Release PRs (#852) * continue running even when release info could not be uploaded --------- Co-authored-by: Emil <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: leelancashire <lee@everycure.org> Co-authored-by: Siyan Luo <89979939+Siyan-Luo@users.noreply.github.com> Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local> --------- Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: leelancashire 
<drllancashire@gmail.com> Co-authored-by: vjsykora <jsykora@Joe> Co-authored-by: leelancashire <lee@everycure.org> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Oliver W. <4733178+oliverw1@users.noreply.github.com> Co-authored-by: Siyan Luo <89979939+Siyan-Luo@users.noreply.github.com> Co-authored-by: Kevin Schaper <kevinschaper@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: Daniel Rhodes <14894770+drhodesbrc@users.noreply.github.com> Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local>

* Bugfix/gpu resources (#621) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow 
proxying * add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * Update argo-workflows.yaml * point at diff branch * allow 7687 TLS traffic * allow all namespaces to expose neo4j port * gateway only supports 443 and 80 and 8080 * Update infra/argo/applications/dev-namespaces/values.yaml Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> * Release article for v0.2.2 (#611) * require PR description going forward * add stale catcher * push latest * Apply suggestions from code review Co-authored-by: leelancashire <drllancashire@gmail.com> * bump * update post * get done --------- Co-authored-by: leelancashire <drllancashire@gmail.com> * hotfix: documentation deployment failed to load OIDC token * Add test for missing entries in Kedro catalog & remove unused entries (#600) * remove unused datasets * add unit test for unused kedro catalog entries * remove commented out code, create new data_release_with_embeddings pipeline * fix embeddings pipeline * Revert "remove commented out code, create new data_release_with_embeddings pipeline" This reverts commit 
2e220ebff3a4fbcd6b5b853ea8ef6ee165c64430. * comment out unused dataset * Feat/improve argo workflow submission (#565) * refactor pipeline registry * split pipelines into separate modeeling and embeddings steps * split pipelines into separate modeeling and embeddings steps * add unit tests for pipeline registry * fix unit tests for pipeline registry * add missing install statement * trim down pipeline number * formatting * fix pipeline * remove unused comments * refactor pipeline registry * argo CD refactor * rename argo.py to test_argo.py * refactor argo test * remove Argo-specific CLI * rename _generate_argo_config to generate_argo_config * add templates to ignored * refactor and add more submission tests * sort import statements * fix types in test_argo * fix bug in FusableNode * extend gitignore: * add missing TODOs * change dir structure in CLI tests * remove unused comments * add pycov to dependencies * add missing type annotation * add unit tests for _get_feed_dict * add unit tests for run * extract pipeline initialization one function up * fix pipeline / mock usage in test_run * fix all mock usages in test_run * refactor run.py, extracting run functionality to a separate function * continue refactor of run.py * fix test_run_basic * fix test_run_with_fabricator_env_error test * add more tests * intermediate stage of extending unit test coverage * fix entire function * remove unused test * add TODOs and skips to failing tests * add full argo template generation test * fix missing name in template * register integration mark * add tests for .yaml config * fix types & add test_resourc_root fixture * make fixture names identical to function names * add missing fixtures * continue fixes to argo template submission * fix argo template test * refactor argo template generation test * refactor argo template generation test * fix assert statements in argo worfflow template generator * remove argo_node_spec * add complete template to argo config test * add missing test 
statements * add neo4j container * add sample template * add comments, and minor refactor * add TODOs, refactor submit to a separate function * add comment explaining how params are passed * amend argo template * add unit test for submission of workflow * fix submit function * add tests for job submission * add separate test for pipeline objects * switch formatting * continue fixing the submission test * improve submission test * adjust test and submission to test for pipeline dict * update submit * update submit * submit changes * uncomment fixture * fix fixture * unskip multi-pipeline test * add tests for verbose and dry run * fix bad comment pattern in template * add cloud alongside test to pipelines * improve formatting in argo sugmnit commands * extend submit options with separate settings for submission and triggering of pipelines. * submit refactor * argo test fix * fix tests * improve documentation * fix mock_dependencies fixture * add tests for save argo config * improve type checking * refactor location of project bootstrap code * add extensive _submit test * final fix to _submit tests * add pytest to deps * add template * replace template with non-defunct one * continue refactoring... 
* consistently use hyphens in CLI * add argo screenshots * add argo glossary * rename argo_glossary to glossary * add documentation on local argo workflows * add argo docs to config * fix documentation on kedro submit usage * Update docs/src/infrastructure/argo_workflows_locally.md Co-authored-by: Pascal Bro <pascal@everycure.org> * aalign with main * refactor to add run / release schema * test fix * finish pipeline refactor * add spark tests * add spark tags * start refactor of paths in globals in base * update globals * remove comments form base * improve comments in cloud setup * replace int with integration in paths * save evaluation state * fix paths in embeddings * fix paths in matrix generation * update catalog in modelling * debug failing test * add new path structure to test * update lock * fix links to resources in docs * add --load to Makefile * update dependencies * update Kedro catalog to reflect the changed (release- and run- based) architecture of pipeline * update pipelines in accordance with new Kedro Data Catalog structure * remove CLI changes from this PR * add tests for pipeline registry * add missing test fixture * remove commented out paths * reqs update * align version with main * move metric.yml to the modelling dir * in matrix_generation, retain model name * adjust evaluation layers to write to evaluation layer exclusively * change dataset names in catalog * fix name of the dataset used in the pipeline * rename metric so sanity_metrics.yml * rename raws in test env * remove obsolete test dir structure * add separate release and modelling test pipelines * set run and release names in test config to test_run and test_release * swap explicit bucket name for a variable one * add runs / releases to globals for base * remove unnecessary comments * add source_gcs_bucker * use kedro_data as kedro data dir consistently * remove unused commnts * reduce code reuse by using paths from global * test paths are now identical with base * fix file names in 
globals * change kedro_data to kedro * add changes to globals * save pipeline registry * update tests for pipeline registry * change defaults from local to default-run/release-name * improve formatting in globals * pull release_name from env variables * hardcode relases * minor refactor in globals paths * remove obsolete paths from documentation * fix dataset name * rename evaluation result dataset * remove unused datasets * use modelling rather than embedding path * restore dataset name * add layers to catalog * add layers to filesystem * add feat to embeddings * Rename evaluation.{model}.{evaluation}.result to evaluation.{model}.{evaluation}.model_output.result for consistency with layering system * finish adding layers to Kedro catalog * restore create_pipeline as default name for pipeline * fix name of the pipeline * restor previous structure of pipeline registry * move data to ingestion * fix paths * add ingestion as separate directory * restore AMD in Makefile * minor fixes * restore old pipeline naming system * remove obsolete code from example * restore name pipeline * add test for run name sanitization mechanism * fix run name sanitization * fix failing unit tests for CLI * fix test * delete template * parametrize test for feed dict * remove pipeline list submission * move part of test * simplify test * simplify argo workflow template generation test * fix doc of submit * clarify doc * restore old path to mlflow * Update pipelines/matrix/tests/test_argo.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> --------- Co-authored-by: vjsykora <jsykora@Joe> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * add gds secret to workflows namespace * Enable GPU nodes on kubernetes cluster (#598) * add gpu node pools * remove commented out spot nodes * fix accelerator counts and max node counts * add gpu node pool label * fix node pool to g2-standard-16-l4-nodes * add TODO * set 
labels on GPU / non-GPU nodes * ensure workflow template has negative affinity * Bump commit * Update infra/modules/stacks/compute_cluster/gke.tf Co-authored-by: Pascal Bro <pascal@everycure.org> * bump * test * bump * bump * bump * bump * bump --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * change disk type to pd-ssd --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: vjsykora <jsykora@Joe> * Bugfix/gpu fix 2 (#635) * update service account PK (#604) * Docs updates for improved onboarding flow and pipeline overview * init * initial edits to embeddings and modeling * further updates * added time split * add sentence on frequent flyers * include PMBert * changes * missing link removed * missing file added * wip * bump * data-api --------- Co-authored-by: leelancashire <lee@everycure.org> * hotfix: documentation update for presentation * set explicit disk type for GPU nodes --------- Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: leelancashire <lee@everycure.org> * Add IAM as terraform module for code centric IAM management of the project (#628) * add iam terraform module * update variables * update variables * iam codified now * cleanup --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * enable external IP for neo4j * bump * point at different branch * bump * add neo4j auth secret * fix syntax * add moa app * give data minded permission to administer the cluster (#721) * missing line * use new branch for MoA viewer * add data 
release app * update ns * revert * revert * add secret * purge contractor name - change already applied on different branch (#739) * add prom-graf-stack * enable metrics for argo workflows * start developing on grafana for argo-workflows * setup http forwarding again * Extend engineering permissions (#749) * extend gcp permissions for the tech team * add bq permissions * update notebooks role * right size infrastructure * enable vertical pod autoscaling * adjustments in our node configs to be more cost efficient * adjustments in our node configs to be more cost efficient * feat: grant iam.workloadIdentityPoolAdmin to tech_team_group (#760) This should allow those in the group to modify Workload Identity Federation, which, among other things, is required to get GitHub Actions from non-main and non-infra branches to run the authentication flow. * improved way of applying grafana * Avoid overwriting raw data with fabricator pipeline (#554) * avoid overwriting raw common error text * solves people having permission to overwrite raw data * working argocd in https * debug: allow the tech team to impersonate service accounts (#768) * debug: allow the tech team to impersonate service accounts * Allow set of contributors to merge PRs to infra * Roles modification to test Gemini call (#774) * adjust way we pass in insecure flag * Add argo deployment of kg-dashboard pointing at development branch (#782) Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Big memory /cost optimized nodes (#767) * add new group types * big instances get ssds * fix https redirect and insecure flag for argo * fix path to branch from Kevin * Revert the changes on permission (#779) * remove ml.admin and aiplatform.admin, add ml.developer role to test gemini call * only modify the permissions on dataminded team * revert the changes on permissions --------- Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: Pascal Bro <pascal@everycure.org> * Enable Neo4J endpoint 
for all releases (#803) --------- Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck 
<laurens@everycure.org> * update to main * move over * update ci * Add MoA visualizer service (#712) * add deployment * update name : * setup route * change protocol * use cluster ip * add moa_vis to repo * create init script * remove suffix arg * fix env variables * fix env variables * setup init contaier * add deployment comments * push ci * push ci * switch to visualizer * add step with correct permissions * correct paths * fix paths * try fix * fix typo * fix typo * remove unused paths * clean comment * fix makefile * hook up img * fix imports * update init script * update init script * redeploy * redeploy * use env vars * fix env * fix templating * update table names * update sql query * update image tag * tag name * change the correct image tag * use correct ing * update to new data * change path to data * fix path * reintroduce assets * correct path * update filename * add to streamlit * switch to pydantic * fix display cols * remove caching and feedback col * fix conflicts * try symlink * fix interpolation * add files * fix issues * fix ci * fix template * setup correct entrypoint * rm old rs * add base settings * fix init script * fix types * add image pull policy * add gs * add gs * img path as string * bump version * pydantic testing * attempt fixes * push changes * push changes * fix settings * add image version * sync vars * push * update ci --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * move app version on page (#831) * clean up and move app version * bump v * update ci --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> * fix target revisions for various deployments * feat/trigger release from gh action (#819) This PR introduces Argo sensors (and eventbus and a few other related services) together with dedicated GitHub Actions to facilitate the data release process. 
In a nutshell, when someone triggers a data-release pipeline (either through the Kedro pipeline "kg_release" or through "data_release"), it will –on success– create a PR that contains a draft of the release notes and an associated article, together with parameters that should be published to the Every Cure website, so people may easily find out which parameters (a subset thereof) were associated with a certain data release. --------- Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * Add Grafana and Prometheus (#821) * add prom-graf-stack * revert and better namespace * new prometheus stack * move to better folder * cleanup * bump * routes fix * ip * new work * cleanup * bum * bum * wrong nestig * fix metrics * cleanup * xx * x * new dashboard * add app * update * port change * AI: test health probe * move to separate template * bump * correct type * bump * fix prometheus * add also for argocd * fix path for prometheus * WIP * kubeops * fix httproute * add docs * fix kubeops * fix health checks * right port * fix prom endpoint * cleanup * bump * cleanup * bump * Revert "Enable Neo4J endpoint for all releases (#803)" (#841) This reverts commit ef142fabe2ac3939fecd68b4f135bc8ba8ed23c8. * listen to infra branch * de-duplicate data-release yaml files (#843) * debug: drop eventbus * Revert "debug: drop eventbus" This reverts commit 3ac8595e3edd8beaac05f3f175749335859c4ae2. 
* debug: delete 3 major components of Argo Events * debug: re-enable eventbus * debug: re-enable eventsource and sensor * debug: add sync waves to control the deployment order * debug: replace 'in' operation with equality * fix: listen to events in ns argo-workflows During the demo we were listening on an artificial event we created in the data-release ns * promote role to clusterrole so SA can observe across namespaces * fix: line continuation in curl multiline * debug: put payload on single line * Feat/neo4j endpoint (#842) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake 
data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow proxying * add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * new httproute * bump * Update argo-workflows.yaml * add reverse proxy for exposing via helm * point at diff branch * bump * bump * bump * bump * bump * ensure reverting of listener * bump * WIP * bump * cluster ip * bump * use load balancer instead of ingresss * Update infra/argo/app-of-apps/templates/argo-workflows.yaml * switch to clusterip for service * wip * try * use LB * rm ingress * bump * bump * bump * bump * fix sovler * ssl all the way * enable https on neo4j * enable ssl better and enable prometheus * also enable bloom * x * Update * ibum * bump * add more links to docs * blom * fix paths * working * cleanup * fix neo4j * add fix * rm breakpoint * add readme * rm certs * update gitignore * rm readme * add infra to trigger * fix circular import * fix tests * add explicit not check * add cert to docker test * ensure dir loaded in ci * push fix * fix ci run * retry * fix tests * allow specifying +s from outside * add docstring * add doc * attempt push * fixmountung * add terraform to infra deploy * retry * add defaults 
--------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * rename * use license * Merge/main to infra to main (#854) * Adds the git sha label to the workflow template and aborts submission if git state dirty. (#771) * adds two new labels: git sha of currently active branch and the flag if git is dirty. * merge two git labels into one. * Add the feature to abort if git requirements are unmet * fix moved docstring by accident * feat/trigger release from gh action (#819) This PR introduces Argo sensors (and eventbus and a few other related services) together with dedicated GitHub Actions to facilitate the data release process. In a nutshell, when someone triggers a data-release pipeline (either through the Kedro pipeline "kg_release" or through "data_release"), it will –on success– create a PR that contains a draft of the release notes and an associated article, together with parameters that should be published to the Every Cure website, so people may easily find out which parameters (a subset thereof) were associated with a certain data release. 
--------- Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * disable checking the status of the git repo Otherwise our data scientists will be blocked. We do however need this on main, for the final tests. * debug: attempt to circumvent shaded imports * fix: resolve circular import * fix: ensure node has outputs * fix: default kedro pipeline should not encompass data_release --------- Co-authored-by: Oliver W. <4733178+oliverw1@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * Add SILC troubleshooting document (#836) * add notebook * finish * add imgs * Setup Spoke KG integration (#772) * adding targets for kgx output * adding targets for kgx output * adding targets for kgx output * adding spoke targets * missed one instance * adding spoke pipeline nodes * initial commit * moving logic to nodes.py * use release path, followd by kgx * adding spoke version * 
initial commit * adding spoke * adding spoke * fixing typo * initial stab at adding spoke * typo * setting some columns to none as they don't exist in spoke * using node instead of argo_node per Laurens * fixing typo * adding spoke node * removing * removing edits that will be added in a different PR * Update pipelines/matrix/src/matrix/settings.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * nullifying non-existent columns * toPandas() seems to cause memory leaks issues, this alternative seems to work more reliably * formatting * removing unused nodes and edges * Update pipelines/matrix/src/matrix/pipelines/ingestion/pipeline.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/src/matrix/settings.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * not using argo_node * not using argo_node * reverting * commenting out spoke again * disable unused --------- Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> * Walkthrough on how to build a new modeling pipeline in the MATRIX kedro pipeline (#757) * add stub of model walkthrough * add custom modelling notebook * add dependency on mljax * remove mljax * save state of walkthrough * checkpoint * update user story * update user story * update user story * update notebook * update notbook * save * sort out deps * story update * story update * story update * update user story to include custom function * extend example to support kedro run * add to experimental section of docs * remove examples from settings.py * remove files linked to example * move into walkthroughts and clean up * exclude walkhroughs from nbstripout * add missing imports and output cells * update requirements --------- Co-authored-by: leelancashire <lee@everycure.org> * Break logic up into modular components for data-release nodes (#838) * refactor: separate config from logic * remove unused 
logger * style: adjust function name * fix: line continuation in curl multiline * fix: tag the commit that generated the data release The GitHub Action Checkout is really more like a clone, and it does not allow you to detach immediately. * fix: remove superfluous JSON string quotes * fix: branch off to comply with protected branch policy * Correct Argo node's output to match the single item returned by its function (#844) Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local> * add label hide-from-release for Release PRs (#852) * continue running even when release info could not be uploaded --------- Co-authored-by: Emil <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: leelancashire <lee@everycure.org> Co-authored-by: Siyan Luo <89979939+Siyan-Luo@users.noreply.github.com> Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local> * Fixes retention to 180d + use SSD for grafana + gives people access to submit workflows (#856) * retention 180d * better setting of values * wip * enabled everyone to submit argo workflows * bump * fix setting insecure --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * downscale infrastructure for christmas break * add explorer app * set resources to medium by default * Add the 'in' operator filtering on pipeline name in argo. 
(#920) This ensures both the `kg_release` and `data_release` Kedro pipelines can trigger the release workflow (which creates a draft PR). * add full apoc * Implement Workaround for Release Detection (#935) In combination with #936 , this PR allows us to mark releases (manually), thus also allowing us to trigger the final GitHub Actions workflows. * Adds 3 new git-crypt secret keys (#947) * Add 1 git-crypt collaborator New collaborators: B267AF6E emilkrause <emil.krause@dataminded.com> * Add 1 git-crypt collaborator New collaborators: F896B940 Siyan <siyan.luo@dataminded.com> * Add 1 git-crypt collaborator New collaborators: 317A2E46 Oliver Willekens <oliver.willekens@dataminded.com> --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * bump neo4j memory * bump neo4j memory * fix missing selector for wildcard cert (#976) * Enhancement/trigger test data release (#979) * dummy workflow to trigger our data release * POC dummy workflow * add documentation. * let user set the git sha * removing non-essential labels * removing non-essential labels --------- Co-authored-by: Oliver W. <4733178+oliverw1@users.noreply.github.com> * update branch * listen to infra * move to main * register app * deploy in correct namespace * add supabase token * add exa_workflow key * Take out project-id as a variable in terraform (#987) * take out project-id as a variable * take out other network parameters as a variable * delete bootstrap file module * remove reference to the bootstrap module * Not relying on default values, putting values in a tfvars file. * enable subs to push images to submit jobs (#981) * add key * Fix the label selector in the workflow-controller Service. (#1056) * Change the selector on the ServiceMonitor to match the one from the Service it targets. (#1057) * Add a label onto argo-workflows ServiceMonitor in order to be picked up by Prometheus as a target. 
(#1062) * Remove git-crypt for almost everyone except admins (#1053) * delete SA, no longer versioned * do not encrypt nor version the file anymore * updated iam for secrets manager secret * update docs to remove git-crypt * avoid checking in ci * fix CI? * fix ci * bump disk * increase disk size * revert accidental change * add disk type * fix * version locks * Public data release bucket infra code (#1074) * tmp * working bucket as website * working bucket with LB in front of it * docs updates * docs for landing zone * Update variables.tf * cleanup * add mateusz admin key (#1092) * Add 1 git-crypt collaborator New collaborators: 267E0673 Mateusz Wasilewski <mateusz@everycure.org> * add search and parse router keys --------- Co-authored-by: matwasilewski <mat.p.wasilewski@gmail.com> * Update pipelines/matrix/conf/base/globals.yml --------- Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: vjsykora <jsykora@Joe> Co-authored-by: leelancashire <lee@everycure.org> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Oliver W. 
<4733178+oliverw1@users.noreply.github.com> Co-authored-by: Siyan Luo <89979939+Siyan-Luo@users.noreply.github.com> Co-authored-by: Kevin Schaper <kevinschaper@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: Daniel Rhodes <14894770+drhodesbrc@users.noreply.github.com> Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local> Co-authored-by: matwasilewski <mat.p.wasilewski@gmail.com>

* Bugfix/gpu resources (#621) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow 
proxying * add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * Update argo-workflows.yaml * point at diff branch * allow 7687 TLS traffic * allow all namespaces to expose neo4j port * gateway only supports 443 and 80 and 8080 * Update infra/argo/applications/dev-namespaces/values.yaml Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> * Release article for v0.2.2 (#611) * require PR description going forward * add stale catcher * push latest * Apply suggestions from code review Co-authored-by: leelancashire <drllancashire@gmail.com> * bump * update post * get done --------- Co-authored-by: leelancashire <drllancashire@gmail.com> * hotfix: documentation deployment failed to load OIDC token * Add test for missing entries in Kedro catalog & remove unused entries (#600) * remove unused datasets * add unit test for unused kedro catalog entries * remove commented out code, create new data_release_with_embeddings pipeline * fix embeddings pipeline * Revert "remove commented out code, create new data_release_with_embeddings pipeline" This reverts commit 
2e220ebff3a4fbcd6b5b853ea8ef6ee165c64430. * comment out unused dataset * Feat/improve argo workflow submission (#565) * refactor pipeline registry * split pipelines into separate modeeling and embeddings steps * split pipelines into separate modeeling and embeddings steps * add unit tests for pipeline registry * fix unit tests for pipeline registry * add missing install statement * trim down pipeline number * formatting * fix pipeline * remove unused comments * refactor pipeline registry * argo CD refactor * rename argo.py to test_argo.py * refactor argo test * remove Argo-specific CLI * rename _generate_argo_config to generate_argo_config * add templates to ignored * refactor and add more submission tests * sort import statements * fix types in test_argo * fix bug in FusableNode * extend gitignore: * add missing TODOs * change dir structure in CLI tests * remove unused comments * add pycov to dependencies * add missing type annotation * add unit tests for _get_feed_dict * add unit tests for run * extract pipeline initialization one function up * fix pipeline / mock usage in test_run * fix all mock usages in test_run * refactor run.py, extracting run functionality to a separate function * continue refactor of run.py * fix test_run_basic * fix test_run_with_fabricator_env_error test * add more tests * intermediate stage of extending unit test coverage * fix entire function * remove unused test * add TODOs and skips to failing tests * add full argo template generation test * fix missing name in template * register integration mark * add tests for .yaml config * fix types & add test_resourc_root fixture * make fixture names identical to function names * add missing fixtures * continue fixes to argo template submission * fix argo template test * refactor argo template generation test * refactor argo template generation test * fix assert statements in argo worfflow template generator * remove argo_node_spec * add complete template to argo config test * add missing test 
statements * add neo4j container * add sample template * add comments, and minor refactor * add TODOs, refactor submit to a separate function * add comment explaining how params are passed * amend argo template * add unit test for submission of workflow * fix submit function * add tests for job submission * add separate test for pipeline objects * switch formatting * continue fixing the submission test * improve submission test * adjust test and submission to test for pipeline dict * update submit * update submit * submit changes * uncomment fixture * fix fixture * unskip multi-pipeline test * add tests for verbose and dry run * fix bad comment pattern in template * add cloud alongside test to pipelines * improve formatting in argo sugmnit commands * extend submit options with separate settings for submission and triggering of pipelines. * submit refactor * argo test fix * fix tests * improve documentation * fix mock_dependencies fixture * add tests for save argo config * improve type checking * refactor location of project bootstrap code * add extensive _submit test * final fix to _submit tests * add pytest to deps * add template * replace template with non-defunct one * continue refactoring... 
* consistently use hyphens in CLI * add argo screenshots * add argo glossary * rename argo_glossary to glossary * add documentation on local argo workflows * add argo docs to config * fix documentation on kedro submit usage * Update docs/src/infrastructure/argo_workflows_locally.md Co-authored-by: Pascal Bro <pascal@everycure.org> * aalign with main * refactor to add run / release schema * test fix * finish pipeline refactor * add spark tests * add spark tags * start refactor of paths in globals in base * update globals * remove comments form base * improve comments in cloud setup * replace int with integration in paths * save evaluation state * fix paths in embeddings * fix paths in matrix generation * update catalog in modelling * debug failing test * add new path structure to test * update lock * fix links to resources in docs * add --load to Makefile * update dependencies * update Kedro catalog to reflect the changed (release- and run- based) architecture of pipeline * update pipelines in accordance with new Kedro Data Catalog structure * remove CLI changes from this PR * add tests for pipeline registry * add missing test fixture * remove commented out paths * reqs update * align version with main * move metric.yml to the modelling dir * in matrix_generation, retain model name * adjust evaluation layers to write to evaluation layer exclusively * change dataset names in catalog * fix name of the dataset used in the pipeline * rename metric so sanity_metrics.yml * rename raws in test env * remove obsolete test dir structure * add separate release and modelling test pipelines * set run and release names in test config to test_run and test_release * swap explicit bucket name for a variable one * add runs / releases to globals for base * remove unnecessary comments * add source_gcs_bucker * use kedro_data as kedro data dir consistently * remove unused commnts * reduce code reuse by using paths from global * test paths are now identical with base * fix file names in 
globals * change kedro_data to kedro * add changes to globals * save pipeline registry * update tests for pipeline registry * change defaults from local to default-run/release-name * improve formatting in globals * pull release_name from env variables * hardcode relases * minor refactor in globals paths * remove obsolete paths from documentation * fix dataset name * rename evaluation result dataset * remove unused datasets * use modelling rather than embedding path * restore dataset name * add layers to catalog * add layers to filesystem * add feat to embeddings * Rename evaluation.{model}.{evaluation}.result to evaluation.{model}.{evaluation}.model_output.result for consistency with layering system * finish adding layers to Kedro catalog * restore create_pipeline as default name for pipeline * fix name of the pipeline * restor previous structure of pipeline registry * move data to ingestion * fix paths * add ingestion as separate directory * restore AMD in Makefile * minor fixes * restore old pipeline naming system * remove obsolete code from example * restore name pipeline * add test for run name sanitization mechanism * fix run name sanitization * fix failing unit tests for CLI * fix test * delete template * parametrize test for feed dict * remove pipeline list submission * move part of test * simplify test * simplify argo workflow template generation test * fix doc of submit * clarify doc * restore old path to mlflow * Update pipelines/matrix/tests/test_argo.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> --------- Co-authored-by: vjsykora <jsykora@Joe> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * add gds secret to workflows namespace * Enable GPU nodes on kubernetes cluster (#598) * add gpu node pools * remove commented out spot nodes * fix accelerator counts and max node counts * add gpu node pool label * fix node pool to g2-standard-16-l4-nodes * add TODO * set 
labels on GPU / non-GPU nodes * ensure workflow template has negative affinity * Bump commit * Update infra/modules/stacks/compute_cluster/gke.tf Co-authored-by: Pascal Bro <pascal@everycure.org> * bump * test * bump * bump * bump * bump * bump --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * change disk type to pd-ssd --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: vjsykora <jsykora@Joe> * Bugfix/gpu fix 2 (#635) * update service account PK (#604) * Docs updates for improved onboarding flow and pipeline overview * init * initial edits to embeddings and modeling * further updates * added time split * add sentence on frequent flyers * include PMBert * changes * missing lnk removed * missing file adde * wip * bump * data-api --------- Co-authored-by: leelancashire <lee@everycure.org> * hotfix: documentation update for presentation * set explicit disk type for GPU nodes --------- Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: leelancashire <lee@everycure.org> * Add IAM as terraform module for code centric IAM management of the project (#628) * add iam terraform module * update variables * update variables * iam codified now * cleanup --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * enable external IP for neo4j * bump * point at different branch * bump * add neo4j auth secet * fix syntax * add moa app * give data minded permission to administer the cluster (#721) * missing line * use new branch for MoA viewer * add data 
release app * update ns * revert * revert * add secret * purge contractor name - change already applied on different branch (#739) * add prom-graf-stack * enable metrics for argo workflows * start developing on grafana for argo-workflows * setup http forwarding again * Extend engineering permissions (#749) * extend gcp permissions for the tech team * add bq permissions * update notebooks role * right size infrastructure * enable vertical pod autoscaling * adjustments in our node configs to be more cost efficient * adjustments in our node configs to be more cost efficient * feat: grant iam.workloadIdentityPoolAdmin to tech_team_group (#760) This should allow members of the group to modify Workload Identity Federation, which is required, among other things, to get GitHub Actions from non-main and non-infra branches to run the authentication flow. * improved way of applying grafana * Avoid overwriting raw data with fabricator pipeline (#554) * avoid overwriting raw common error text * solves people having permission to overwrite raw data * working argocd in https * debug: allow the tech team to impersonate service accounts (#768) * debug: allow the tech team to impersonate service accounts * Allow set of contributors to merge PRs to infra * Roles modification to test Gemini call (#774) * adjust way we pass in insecure flag * Add argo deployment of kg-dashboard pointing at development branch (#782) Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Big memory /cost optimized nodes (#767) * add new group types * big instances get ssds * fix https redirect and insecure flag for argo * fix path to branch from Kevin * Revert the changes on permission (#779) * remove ml.admin and aiplatform.admin, add ml.developer role to test gemini call * only modify the permissions on dataminded team * revert the changes on permissions --------- Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: Pascal Bro <pascal@everycure.org> * Enable Neo4J endpoint 
for all releases (#803) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow proxying 
* add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * new httproute * bump * Update argo-workflows.yaml * add reverse proxy for exposing via helm * point at diff branch * bump * bump * bump * bump * bump * ensure reverting of listener * bump * WIP * bump * cluster ip * bump * use load balancer instead of ingresss * Update infra/argo/app-of-apps/templates/argo-workflows.yaml * switch to clusterip for service * wip * try * use LB * rm ingress * bump * bump * bump * bump * fix sovler * ssl all the way * enable https on neo4j * enable ssl better and enable prometheus * also enable bloom * x * Update * ibum * bump * add more links to docs * blom * fix paths * working * cleanup * fix neo4j --------- Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck 
<laurens@everycure.org> * update to main * move over * update ci * Add MoA visualizer service (#712) * add deployment * update name : * setup route * change protocol * use cluster ip * add moa_vis to repo * create init script * remove suffix arg * fix env variables * fix env variables * setup init contaier * add deployment comments * push ci * push ci * switch to visualizer * add step with correct permissions * correct paths * fix paths * try fix * fix typo * fix typo * remove unused paths * clean comment * fix makefile * hook up img * fix imports * update init script * update init script * redeploy * redeploy * use env vars * fix env * fix templating * update table names * update sql query * update image tag * tag name * change the correct image tag * use correct ing * update to new data * change path to data * fix path * reintroduce assets * correct path * update filename * add to streamlit * switch to pydantic * fix display cols * remove caching and feedback col * fix conflicts * try symlink * fix interpolation * add files * fix issues * fix ci * fix template * setup correct entrypoint * rm old rs * add base settings * fix init script * fix types * add image pull policy * add gs * add gs * img path as string * bump version * pydantic testing * attempt fixes * push changes * push changes * fix settings * add image version * sync vars * push * update ci --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * move app version on page (#831) * clean up and move app version * bump v * update ci --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> * fix target revisions for various deployments * feat/trigger release from gh action (#819) This PR introduces Argo sensors (and eventbus and a few other related services) together with dedicated GitHub Actions to facilitate the data release process. 
In a nutshell, when someone triggers a data-release pipeline (either through the Kedro pipeline "kg_release" or through "data_release"), it will –on success– create a PR that contains a draft of the release notes and an associated article, together with parameters that should be published to the Every Cure website, so people may easily find out which parameters (a subset thereof) were associated with a certain data release. --------- Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * Add Grafana and Prometheus (#821) * add prom-graf-stack * revert and better namespace * new prometheus stack * move to better folder * cleanup * bump * routes fix * ip * new work * cleanup * bum * bum * wrong nestig * fix metrics * cleanup * xx * x * new dashboard * add app * update * port change * AI: test health probe * move to separate template * bump * correct type * bump * fix prometheus * add also for argocd * fix path for prometheus * WIP * kubeops * fix httproute * add docs * fix kubeops * fix health checks * right port * fix prom endpoint * cleanup * bump * cleanup * bump * Revert "Enable Neo4J endpoint for all releases (#803)" (#841) This reverts commit ef142fabe2ac3939fecd68b4f135bc8ba8ed23c8. * listen to infra branch * de-duplicate data-release yaml files (#843) * debug: drop eventbus * Revert "debug: drop eventbus" This reverts commit 3ac8595e3edd8beaac05f3f175749335859c4ae2. 
* debug: delete 3 major components of Argo Events * debug: re-enable eventbus * debug: re-enable eventsource and sensor * debug: add sync waves to control the deployment order * debug: replace 'in' operation with equality * fix: listen to events in ns argo-workflows During the demo we were listening on an articial event we created in the data-release ns * promote role to clusterrole so SA can observe across namespaces * fix: line continuation in curl multiline * debug: put payload on single line * Feat/neo4j endpoint (#842) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake 
data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow proxying * add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * new httproute * bump * Update argo-workflows.yaml * add reverse proxy for exposing via helm * point at diff branch * bump * bump * bump * bump * bump * ensure reverting of listener * bump * WIP * bump * cluster ip * bump * use load balancer instead of ingresss * Update infra/argo/app-of-apps/templates/argo-workflows.yaml * switch to clusterip for service * wip * try * use LB * rm ingress * bump * bump * bump * bump * fix sovler * ssl all the way * enable https on neo4j * enable ssl better and enable prometheus * also enable bloom * x * Update * ibum * bump * add more links to docs * blom * fix paths * working * cleanup * fix neo4j * add fix * rm breakpoint * add readme * rm certs * update gitignore * rm readme * add infra to trigger * fix circular import * fix tests * add explicit not check * add cert to docker test * ensure dir loaded in ci * push fix * fix ci run * retry * fix tests * allow specifying +s from outside * add docstring * add doc * attempt push * fixmountung * add terraform to infra deploy * retry * add defaults 
--------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * rename * use license * Merge/main to infra to main (#854) * Adds the git sha label to the workflow template and aborts submission if git state dirty. (#771) * adds two new labels: git sha of currently active branch and the flag if git is dirty. * merge two git labels into one. * Add the feature to abort if git requirements are unmet * fix moved docstring by accident * feat/trigger release from gh action (#819) This PR introduces Argo sensors (and eventbus and a few other related services) together with dedicated GitHub Actions to facilitate the data release process. In a nutshell, when someone triggers a data-release pipeline (either through the Kedro pipeline "kg_release" or through "data_release"), it will –on success– create a PR that contains a draft of the release notes and an associated article, together with parameters that should be published to the Every Cure website, so people may easily find out which parameters (a subset thereof) were associated with a certain data release. 
--------- Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * disable checking the status of the git repo Otherwise our data scientists will be blocked. We do however need this on main, for the final tests. * debug: attempt to circumvent shaded imports * fix: resolve circular import * fix: ensure node has outputs * fix: default kedro pipeline should not encompass data_release --------- Co-authored-by: Oliver W. <4733178+oliverw1@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * Add SILC troubleshooting document (#836) * add notebook * finish * add imgs * Setup Spoke KG integration (#772) * adding targets for kgx output * adding targets for kgx output * adding targets for kgx output * adding spoke targets * missed one instance * adding spoke pipeline nodes * initial commit * moving logic to nodes.py * use release path, followd by kgx * adding spoke version * 
initial commit * adding spoke * adding spoke * fixing typo * initial stab at adding spoke * typo * setting some columns to none as they don't exist in spoke * using node instead of argo_node per Laurens * fixing typo * adding spoke node * removing * removing edits that will be added in a different PR * Update pipelines/matrix/src/matrix/settings.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * nullifying non-existent columns * toPandas() seems to cause memory leaks issues, this alternative seems to work more reliably * formatting * removing unused nodes and edges * Update pipelines/matrix/src/matrix/pipelines/ingestion/pipeline.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/src/matrix/settings.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * not using argo_node * not using argo_node * reverting * commenting out spoke again * disable unused --------- Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> * Walkthrough on how to build a new modeling pipeline in the MATRIX kedro pipeline (#757) * add stub of model walkthrough * add custom modelling notebook * add dependency on mljax * remove mljax * save state of walkthrough * checkpoint * update user story * update user story * update user story * update notebook * update notbook * save * sort out deps * story update * story update * story update * update user story to include custom function * extend example to support kedro run * add to experimental section of docs * remove examples from settings.py * remove files linked to example * move into walkthroughts and clean up * exclude walkhroughs from nbstripout * add missing imports and output cells * update requirements --------- Co-authored-by: leelancashire <lee@everycure.org> * Break logic up into modular components for data-release nodes (#838) * refactor: separate config from logic * remove unused 
logger * style: adjust function name * fix: line continuation in curl multiline * fix: tag the commit that generated the data release The GitHub action Checkout, is really more like clone, not allowing you to immediately detach. * fix: remove superfluous JSON string quotes * fix: branch off to comply with protected branch policy * Correct Argo node's output to match the single item returned by its function (#844) Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local> * add label hide-from-release for Release PRs (#852) * continue running even when release info could not be uploaded --------- Co-authored-by: Emil <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: leelancashire <lee@everycure.org> Co-authored-by: Siyan Luo <89979939+Siyan-Luo@users.noreply.github.com> Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local> * Fixes retention to 180d + use SSD for grafana + gives people access to submit workflows (#856) * retention 180d * better setting of values * wip * enabled everyone to submit argo workflows * bump * fix setting insecure --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * downscale infrastructure for christmas break * add explorer app * set resources to medium by default * Add the 'in' operator filtering on pipeline name in argo. 
(#920) This ensures both the `kg_release` and `data_release` Kedro pipelines can trigger the release workflow (which creates a draft PR). * add full apoc * Implement Workaround for Release Detection (#935) In combination with #936 , this PR allows us to mark releases (manually), thus also allowing us to trigger the final GitHub Actions workflows. * Adds 3 new git-crypt secret keys (#947) * Add 1 git-crypt collaborator New collaborators: B267AF6E emilkrause <emil.krause@dataminded.com> * Add 1 git-crypt collaborator New collaborators: F896B940 Siyan <siyan.luo@dataminded.com> * Add 1 git-crypt collaborator New collaborators: 317A2E46 Oliver Willekens <oliver.willekens@dataminded.com> --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * bump neo4j memory * bump neo4j memory * fix missing selector for wildcard cert (#976) * Enhancement/trigger test data release (#979) * dummy workflow to trigger our data release * POC dummy workflow * add documentation. * let user set the git sha * removing non-essential labels * removing non-essential labels --------- Co-authored-by: Oliver W. <4733178+oliverw1@users.noreply.github.com> * update branch * listen to infra * move to main * register app * deploy in correct namespace * add supabase token * add exa_workflow key * Take out project-id as a variable in terraform (#987) * take out project-id as a variable * take out other network parameters as a variable * delete bootstrap file module * remove reference to the bootstrap module * Not relying on default values, putting values in a tfvars file. * enable subs to push images to submit jobs (#981) * add key * Fix the label selector in the workflow-controller Service. (#1056) * Change the selector on the ServiceMonitor to match the one from the Service it targets. (#1057) * Add a label onto argo-workflows ServiceMonitor in order to be picked up by Prometheus as a target. 
(#1062) * Remove git-crypt for almost everyone except admins (#1053) * delete SA, no longer versioned * do not encrypt nor version the file anymore * updated iam for secrets manager secret * update docs to remove git-crypt * avoid checking in ci * fix CI? * fix ci * bump disk * increase disk size * revert accidental change * add disk type * fix * version locks * Public data release bucket infra code (#1074) * tmp * working bucket as website * working bucket with LB in front of it * docs updates * docs for landing zone * Update variables.tf * cleanup * add mateusz admin key (#1092) * Add 1 git-crypt collaborator New collaborators: 267E0673 Mateusz Wasilewski <mateusz@everycure.org> * add search and parse router keys --------- Co-authored-by: matwasilewski <mat.p.wasilewski@gmail.com> * Update pipelines/matrix/conf/base/globals.yml * add ledger * use ledger services * add secret to argo-workflows * DS Workbenches on Vertex AI for ML researchers (#1102) * WIP * correct type of instance * bump * bump * wip * bum * wup * bump * updates to makefile * bump * works * docs done * wip * docs formatting * better automount * automount * workbench for jacques * avoid zip * bump * cleanup * locked new dependencies * cleanup pubsub stuff * rm function * cleanup * clone without user * cleanup * Update pipelines/matrix/conf/cloud/globals.yml * comment out version by default * cleanup * fix ruff * add repository_dispatch trigger to pipeline submission (#1138) * working deployment (#1143) * rm wrong oauth resource * add lee to workbench user list (#1155) * add to git crypt * fix secrets (#1163) * update to main * setup ssd correctly * add comment * update * go to main * go to main * go to main * retry * add make env * correct make command * retry --------- Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> 
Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: vjsykora <jsykora@Joe> Co-authored-by: leelancashire <lee@everycure.org> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Oliver W. <4733178+oliverw1@users.noreply.github.com> Co-authored-by: Siyan Luo <89979939+Siyan-Luo@users.noreply.github.com> Co-authored-by: Kevin Schaper <kevinschaper@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: Daniel Rhodes <14894770+drhodesbrc@users.noreply.github.com> Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local> Co-authored-by: matwasilewski <mat.p.wasilewski@gmail.com>

* Bugfix/gpu resources (#621) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow 
proxying * add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * Update argo-workflows.yaml * point at diff branch * allow 7687 TLS traffic * allow all namespaces to expose neo4j port * gateway only supports 443 and 80 and 8080 * Update infra/argo/applications/dev-namespaces/values.yaml Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> * Release article for v0.2.2 (#611) * require PR description going forward * add stale catcher * push latest * Apply suggestions from code review Co-authored-by: leelancashire <drllancashire@gmail.com> * bump * update post * get done --------- Co-authored-by: leelancashire <drllancashire@gmail.com> * hotfix: documentation deployment failed to load OIDC token * Add test for missing entries in Kedro catalog & remove unused entries (#600) * remove unused datasets * add unit test for unused kedro catalog entries * remove commented out code, create new data_release_with_embeddings pipeline * fix embeddings pipeline * Revert "remove commented out code, create new data_release_with_embeddings pipeline" This reverts commit 
2e220ebff3a4fbcd6b5b853ea8ef6ee165c64430. * comment out unused dataset * Feat/improve argo workflow submission (#565) * refactor pipeline registry * split pipelines into separate modeeling and embeddings steps * split pipelines into separate modeeling and embeddings steps * add unit tests for pipeline registry * fix unit tests for pipeline registry * add missing install statement * trim down pipeline number * formatting * fix pipeline * remove unused comments * refactor pipeline registry * argo CD refactor * rename argo.py to test_argo.py * refactor argo test * remove Argo-specific CLI * rename _generate_argo_config to generate_argo_config * add templates to ignored * refactor and add more submission tests * sort import statements * fix types in test_argo * fix bug in FusableNode * extend gitignore: * add missing TODOs * change dir structure in CLI tests * remove unused comments * add pycov to dependencies * add missing type annotation * add unit tests for _get_feed_dict * add unit tests for run * extract pipeline initialization one function up * fix pipeline / mock usage in test_run * fix all mock usages in test_run * refactor run.py, extracting run functionality to a separate function * continue refactor of run.py * fix test_run_basic * fix test_run_with_fabricator_env_error test * add more tests * intermediate stage of extending unit test coverage * fix entire function * remove unused test * add TODOs and skips to failing tests * add full argo template generation test * fix missing name in template * register integration mark * add tests for .yaml config * fix types & add test_resourc_root fixture * make fixture names identical to function names * add missing fixtures * continue fixes to argo template submission * fix argo template test * refactor argo template generation test * refactor argo template generation test * fix assert statements in argo worfflow template generator * remove argo_node_spec * add complete template to argo config test * add missing test 
statements * add neo4j container * add sample template * add comments, and minor refactor * add TODOs, refactor submit to a separate function * add comment explaining how params are passed * amend argo template * add unit test for submission of workflow * fix submit function * add tests for job submission * add separate test for pipeline objects * switch formatting * continue fixing the submission test * improve submission test * adjust test and submission to test for pipeline dict * update submit * update submit * submit changes * uncomment fixture * fix fixture * unskip multi-pipeline test * add tests for verbose and dry run * fix bad comment pattern in template * add cloud alongside test to pipelines * improve formatting in argo sugmnit commands * extend submit options with separate settings for submission and triggering of pipelines. * submit refactor * argo test fix * fix tests * improve documentation * fix mock_dependencies fixture * add tests for save argo config * improve type checking * refactor location of project bootstrap code * add extensive _submit test * final fix to _submit tests * add pytest to deps * add template * replace template with non-defunct one * continue refactoring... 
* consistently use hyphens in CLI * add argo screenshots * add argo glossary * rename argo_glossary to glossary * add documentation on local argo workflows * add argo docs to config * fix documentation on kedro submit usage * Update docs/src/infrastructure/argo_workflows_locally.md Co-authored-by: Pascal Bro <pascal@everycure.org> * aalign with main * refactor to add run / release schema * test fix * finish pipeline refactor * add spark tests * add spark tags * start refactor of paths in globals in base * update globals * remove comments form base * improve comments in cloud setup * replace int with integration in paths * save evaluation state * fix paths in embeddings * fix paths in matrix generation * update catalog in modelling * debug failing test * add new path structure to test * update lock * fix links to resources in docs * add --load to Makefile * update dependencies * update Kedro catalog to reflect the changed (release- and run- based) architecture of pipeline * update pipelines in accordance with new Kedro Data Catalog structure * remove CLI changes from this PR * add tests for pipeline registry * add missing test fixture * remove commented out paths * reqs update * align version with main * move metric.yml to the modelling dir * in matrix_generation, retain model name * adjust evaluation layers to write to evaluation layer exclusively * change dataset names in catalog * fix name of the dataset used in the pipeline * rename metric so sanity_metrics.yml * rename raws in test env * remove obsolete test dir structure * add separate release and modelling test pipelines * set run and release names in test config to test_run and test_release * swap explicit bucket name for a variable one * add runs / releases to globals for base * remove unnecessary comments * add source_gcs_bucker * use kedro_data as kedro data dir consistently * remove unused commnts * reduce code reuse by using paths from global * test paths are now identical with base * fix file names in 
globals * change kedro_data to kedro * add changes to globals * save pipeline registry * update tests for pipeline registry * change defaults from local to default-run/release-name * improve formatting in globals * pull release_name from env variables * hardcode relases * minor refactor in globals paths * remove obsolete paths from documentation * fix dataset name * rename evaluation result dataset * remove unused datasets * use modelling rather than embedding path * restore dataset name * add layers to catalog * add layers to filesystem * add feat to embeddings * Rename evaluation.{model}.{evaluation}.result to evaluation.{model}.{evaluation}.model_output.result for consistency with layering system * finish adding layers to Kedro catalog * restore create_pipeline as default name for pipeline * fix name of the pipeline * restor previous structure of pipeline registry * move data to ingestion * fix paths * add ingestion as separate directory * restore AMD in Makefile * minor fixes * restore old pipeline naming system * remove obsolete code from example * restore name pipeline * add test for run name sanitization mechanism * fix run name sanitization * fix failing unit tests for CLI * fix test * delete template * parametrize test for feed dict * remove pipeline list submission * move part of test * simplify test * simplify argo workflow template generation test * fix doc of submit * clarify doc * restore old path to mlflow * Update pipelines/matrix/tests/test_argo.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> --------- Co-authored-by: vjsykora <jsykora@Joe> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * add gds secret to workflows namespace * Enable GPU nodes on kubernetes cluster (#598) * add gpu node pools * remove commented out spot nodes * fix accelerator counts and max node counts * add gpu node pool label * fix node pool to g2-standard-16-l4-nodes * add TODO * set 
labels on GPU / non-GPU nodes * ensure workflow template has negative affinity * Bump commit * Update infra/modules/stacks/compute_cluster/gke.tf Co-authored-by: Pascal Bro <pascal@everycure.org> * bump * test * bump * bump * bump * bump * bump --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * change disk type to pd-ssd --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: vjsykora <jsykora@Joe> * Bugfix/gpu fix 2 (#635) * update service account PK (#604) * Docs updates for improved onboarding flow and pipeline overview * init * initial edits to embeddings and modeling * further updates * added time split * add sentence on frequent flyers * include PMBert * changes * missing lnk removed * missing file adde * wip * bump * data-api --------- Co-authored-by: leelancashire <lee@everycure.org> * hotfix: documentation update for presentation * set explicit disk type for GPU nodes --------- Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: leelancashire <lee@everycure.org> * Add IAM as terraform module for code centric IAM management of the project (#628) * add iam terraform module * update variables * update variables * iam codified now * cleanup --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * enable external IP for neo4j * bump * point at different branch * bump * add neo4j auth secet * fix syntax * add moa app * give data minded permission to administer the cluster (#721) * missing line * use new branch for MoA viewer * add data 
release app * update ns * revert * revert * add secret * purge contractor name - change already applied on different branch (#739) * add prom-graf-stack * enable metrics for argo workflows * start developing on grafana for argo-workflows * setup http forwarding again * Extend engineering permissions (#749) * extend gcp permissions for the tech team * add bq permissions * update notebooks role * right size infrastructure * enable vertical pod autoscaling * adjustments in our node configs to be more cost efficient * adjustments in our node configs to be more cost efficient * feat: grant iam.workloadIdentityPoolAdmin to tech_team_group (#760) This should allow those in the group to be able to modify workloadIdentity Federation, which a.o. things is required to get GitHub Actions from non-main and non-infra branches to run the authentication flow. * improved way of applying grafana * Avoid overwriting raw data with fabricator pipeline (#554) * avoid overwriting raw common error text * solves people having permission to overwrite raw data * working argocd in https * debug: allow the tech team to impersonate service accounts (#768) * debug: allow the tech team to impersonate service accounts * Allow set of contributors to merge PRs to infra * Roles modification to test Gemini call (#774) * adjust way we pass in insecure flag * Add argo deployment of kg-dashboard pointing at development branch (#782) Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Big memory /cost optimized nodes (#767) * add new group types * big instances get ssds * fix https redirect and insecure flag for argo * fix path to branch from Kevin * Revert the changes on permission (#779) * remove ml.admin and aiplatform.admin, add ml.developer role to test gemini call * only modify the permissions on dataminded team * revert the changes on permissions --------- Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: Pascal Bro <pascal@everycure.org> * Enable Neo4J endpoint 
for all releases (#803) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow proxying 
* add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * new httproute * bump * Update argo-workflows.yaml * add reverse proxy for exposing via helm * point at diff branch * bump * bump * bump * bump * bump * ensure reverting of listener * bump * WIP * bump * cluster ip * bump * use load balancer instead of ingresss * Update infra/argo/app-of-apps/templates/argo-workflows.yaml * switch to clusterip for service * wip * try * use LB * rm ingress * bump * bump * bump * bump * fix sovler * ssl all the way * enable https on neo4j * enable ssl better and enable prometheus * also enable bloom * x * Update * ibum * bump * add more links to docs * blom * fix paths * working * cleanup * fix neo4j --------- Co-authored-by: Alan <alan@hueb.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Laurens Vijnck 
<laurens@everycure.org> * update to main * move over * update ci * Add MoA visualizer service (#712) * add deployment * update name : * setup route * change protocol * use cluster ip * add moa_vis to repo * create init script * remove suffix arg * fix env variables * fix env variables * setup init contaier * add deployment comments * push ci * push ci * switch to visualizer * add step with correct permissions * correct paths * fix paths * try fix * fix typo * fix typo * remove unused paths * clean comment * fix makefile * hook up img * fix imports * update init script * update init script * redeploy * redeploy * use env vars * fix env * fix templating * update table names * update sql query * update image tag * tag name * change the correct image tag * use correct ing * update to new data * change path to data * fix path * reintroduce assets * correct path * update filename * add to streamlit * switch to pydantic * fix display cols * remove caching and feedback col * fix conflicts * try symlink * fix interpolation * add files * fix issues * fix ci * fix template * setup correct entrypoint * rm old rs * add base settings * fix init script * fix types * add image pull policy * add gs * add gs * img path as string * bump version * pydantic testing * attempt fixes * push changes * push changes * fix settings * add image version * sync vars * push * update ci --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * move app version on page (#831) * clean up and move app version * bump v * update ci --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> * fix target revisions for various deployments * feat/trigger release from gh action (#819) This PR introduces Argo sensors (and eventbus and a few other related services) together with dedicated GitHub Actions to facilitate the data release process. 
In a nutshell, when someone triggers a data-release pipeline (either through the Kedro pipeline "kg_release" or through "data_release"), it will –on success– create a PR that contains a draft of the release notes and an associated article, together with parameters that should be published to the Every Cure website, so people may easily find out which parameters (a subset thereof) were associated with a certain data release. --------- Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * Add Grafana and Prometheus (#821) * add prom-graf-stack * revert and better namespace * new prometheus stack * move to better folder * cleanup * bump * routes fix * ip * new work * cleanup * bum * bum * wrong nestig * fix metrics * cleanup * xx * x * new dashboard * add app * update * port change * AI: test health probe * move to separate template * bump * correct type * bump * fix prometheus * add also for argocd * fix path for prometheus * WIP * kubeops * fix httproute * add docs * fix kubeops * fix health checks * right port * fix prom endpoint * cleanup * bump * cleanup * bump * Revert "Enable Neo4J endpoint for all releases (#803)" (#841) This reverts commit 378b705c3775ff4fa63f2cdbbb3ae03cd1798a93. * listen to infra branch * de-duplicate data-release yaml files (#843) * debug: drop eventbus * Revert "debug: drop eventbus" This reverts commit 568ec79d5eb7ae2ce57ac224223bb25be2fa26b7. 
* debug: delete 3 major components of Argo Events * debug: re-enable eventbus * debug: re-enable eventsource and sensor * debug: add sync waves to control the deployment order * debug: replace 'in' operation with equality * fix: listen to events in ns argo-workflows During the demo we were listening on an articial event we created in the data-release ns * promote role to clusterrole so SA can observe across namespaces * fix: line continuation in curl multiline * debug: put payload on single line * Feat/neo4j endpoint (#842) * move all ARGO CD targeting to a new `infra` branch * run CI on terraform on infra branch intsead * add pre commit for github actions * github actions changes to make path filtering happen at workflow level * paths * s * also run infra deployment only on specific filters with dorny * bump * x * checkout for infra branch * add clone permissions * xi * x * bump * bump * rm old matrix module * make deploy dependent on plan * rm file * concurrency to 1 * move concurrency for CI * update to target infra branch * avoid defaul * bump * increase mlflow size again * mlflow ephemeral storage bug * x * x * increase mlflow size further * pubmedbert endpoint * added spec * deleted obsolete file * added quick locust for endpoints on k8s * add tmp gateway for api * turn on filestore driver * turn on filestore driver * do not run plan in env * bump * added project reference for gcs backend * rm backend and provider * cleanup * avoid attempt to create bucket * test different env for terraform * try with ro user * test jwt token permissions * bump * test with new filter for ref on rw user * do not lock when planning * avoid reading * debug * try breaking this * b * change env * debug again * avoid deploy for nwo * make openai parameterized via env variable * ignore cache directories * parametrize endpoints in makefile * send random number of requests in locust request * add joblib caching and proper compliance to OAI response * bake model into image * gen fake 
data with locust * updated system to behave as expected in scale up-down behavior * cleanup readme * update scaling * Dev/bte trapi deploy helm (#260) * added helm chart for deploying bte-trapi locally * changed bte-trapi.yaml template, removed bte-trapi application folder in new branch * MLFlow to GCS (#293) * add example * add work * push changes * rm breakpoint * call save * rm breakpoint * reenable save * rm debugging stuff * commit changes * rm mlflow file * rm lock file * rm test * allow proxying * add the release version to path * add changes * rm subpath in mlflow * Update onboarding.md * push * revert * revert * disable miniop * rm minio user * correct * reenable * set artifact location * revert commenting * Add 1 git-crypt collaborator (#343) New collaborators: 225C3B75 ahueb <alan@hueb.org> * Update index.md (#336) * Update index.md Updated onboarding content with remaining information from Notion which hasn't already been pulled across * Update docs/src/onboarding/index.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update index.md --------- Co-authored-by: Pascal Bro <pascal@everycure.org> * New script to retry docker compose_down in CI and debug when it's having issues (#357) * add new script to debug docker issues * cleanup structure a bit * Add Robokop data to ingestion pipeline (#188) * add * add todos * add todo pointers * Robokop Ingestion Pipeline added fields for Robokop ingestion. * update gitignore * update ignore * ignore idea files * add pointers for fabrication * Edits from Laurens comments modified files after Laurens comments * Cleaned up removed KC "TODO"s. 
Fixed Typos * flushing out additional columns per real robokop data * aligning fabricator column data with schema * renaming node name * removing duplicate spark_csv * updating * reverting to String * removing fabricator details * using LazySparkDataset, removing schema info * run pre-commit * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/conf/base/ingestion/catalog.yml Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * fixing typo * fix layers * take subset of columns * setting header to true * overriding catalog due to change in raw path * add * fix dataset name as - is not supported by bq * Update spark.yml * add new node function for robokop nodes * update * set unit seperator * add descriptiopn --------- Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * Update pipelines/matrix/conf/base/fabricator/parameters.yml --------- Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Pascal Bro <pascal@everycure.org> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * add if statement to only debug on infra branch * rm 2 * add default artifact root * connect to pgsql * connect to correct svc * rm coc * Update .github/ISSUE_TEMPLATE/onboarding.md Co-authored-by: Pascal Bro <pascal@everycure.org> * Update 
pipelines/matrix/src/matrix/hooks.py Co-authored-by: Pascal Bro <pascal@everycure.org> * moved specs in the right places * fix wrong reference to old namespace for httproute * update cert ref * update cert ref * update API endpoint to support 2 models * update memory requirements * add pdb * introduce spot based API backing * Update infra/modules/stacks/compute_cluster/gke.tf * Update services/pubmedbert_embeddings/README.md * Add 1 git-crypt collaborator New collaborators: 7BEAB3B9 Joe Sykora <joseph@everycure.org> * replace from preemptible to spot * bump * bump * BioThings Explorer infrastructure (#463) * added bte helm chart * removing resource requests * added templates foldet and deployment.yaml * updated values to match deployment * update revision for bte-infra * lowered resource request * added bte service.yaml * changed nodePort * create a new namespace for every dev that they can work in (#522) * use cluster IP for neo4j * new httproute * bump * Update argo-workflows.yaml * add reverse proxy for exposing via helm * point at diff branch * bump * bump * bump * bump * bump * ensure reverting of listener * bump * WIP * bump * cluster ip * bump * use load balancer instead of ingresss * Update infra/argo/app-of-apps/templates/argo-workflows.yaml * switch to clusterip for service * wip * try * use LB * rm ingress * bump * bump * bump * bump * fix sovler * ssl all the way * enable https on neo4j * enable ssl better and enable prometheus * also enable bloom * x * Update * ibum * bump * add more links to docs * blom * fix paths * working * cleanup * fix neo4j * add fix * rm breakpoint * add readme * rm certs * update gitignore * rm readme * add infra to trigger * fix circular import * fix tests * add explicit not check * add cert to docker test * ensure dir loaded in ci * push fix * fix ci run * retry * fix tests * allow specifying +s from outside * add docstring * add doc * attempt push * fixmountung * add terraform to infra deploy * retry * add defaults 
--------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> * rename * use license * Merge/main to infra to main (#854) * Adds the git sha label to the workflow template and aborts submission if git state dirty. (#771) * adds two new labels: git sha of currently active branch and the flag if git is dirty. * merge two git labels into one. * Add the feature to abort if git requirements are unmet * fix moved docstring by accident * feat/trigger release from gh action (#819) This PR introduces Argo sensors (and eventbus and a few other related services) together with dedicated GitHub Actions to facilitate the data release process. In a nutshell, when someone triggers a data-release pipeline (either through the Kedro pipeline "kg_release" or through "data_release"), it will –on success– create a PR that contains a draft of the release notes and an associated article, together with parameters that should be published to the Every Cure website, so people may easily find out which parameters (a subset thereof) were associated with a certain data release. 
--------- Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * disable checking the status of the git repo Otherwise our data scientists will be blocked. We do however need this on main, for the final tests. * debug: attempt to circumvent shaded imports * fix: resolve circular import * fix: ensure node has outputs * fix: default kedro pipeline should not encompass data_release --------- Co-authored-by: Oliver W. <4733178+oliverw1@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> * Add SILC troubleshooting document (#836) * add notebook * finish * add imgs * Setup Spoke KG integration (#772) * adding targets for kgx output * adding targets for kgx output * adding targets for kgx output * adding spoke targets * missed one instance * adding spoke pipeline nodes * initial commit * moving logic to nodes.py * use release path, followd by kgx * adding spoke version * 
initial commit * adding spoke * adding spoke * fixing typo * initial stab at adding spoke * typo * setting some columns to none as they don't exist in spoke * using node instead of argo_node per Laurens * fixing typo * adding spoke node * removing * removing edits that will be added in a different PR * Update pipelines/matrix/src/matrix/settings.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * nullifying non-existent columns * toPandas() seems to cause memory leaks issues, this alternative seems to work more reliably * formatting * removing unused nodes and edges * Update pipelines/matrix/src/matrix/pipelines/ingestion/pipeline.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * Update pipelines/matrix/src/matrix/settings.py Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * not using argo_node * not using argo_node * reverting * commenting out spoke again * disable unused --------- Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> * Walkthrough on how to build a new modeling pipeline in the MATRIX kedro pipeline (#757) * add stub of model walkthrough * add custom modelling notebook * add dependency on mljax * remove mljax * save state of walkthrough * checkpoint * update user story * update user story * update user story * update notebook * update notbook * save * sort out deps * story update * story update * story update * update user story to include custom function * extend example to support kedro run * add to experimental section of docs * remove examples from settings.py * remove files linked to example * move into walkthroughts and clean up * exclude walkhroughs from nbstripout * add missing imports and output cells * update requirements --------- Co-authored-by: leelancashire <lee@everycure.org> * Break logic up into modular components for data-release nodes (#838) * refactor: separate config from logic * remove unused 
logger * style: adjust function name * fix: line continuation in curl multiline * fix: tag the commit that generated the data release The GitHub action Checkout, is really more like clone, not allowing you to immediately detach. * fix: remove superfluous JSON string quotes * fix: branch off to comply with protected branch policy * Correct Argo node's output to match the single item returned by its function (#844) Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local> * add label hide-from-release for Release PRs (#852) * continue running even when release info could not be uploaded --------- Co-authored-by: Emil <emil.krause.44@gmail.com> Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: leelancashire <lee@everycure.org> Co-authored-by: Siyan Luo <89979939+Siyan-Luo@users.noreply.github.com> Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local> * Fixes retention to 180d + use SSD for grafana + gives people access to submit workflows (#856) * retention 180d * better setting of values * wip * enabled everyone to submit argo workflows * bump * fix setting insecure --------- Co-authored-by: Laurens Vijnck <laurens@everycure.org> Co-authored-by: Laurens <90421718+lvijnck@users.noreply.github.com> * downscale infrastructure for christmas break * add explorer app * set resources to medium by default * Add the 'in' operator filtering on pipeline name in argo. 
(#920) This ensures both the `kg_release` and `data_release` Kedro pipelines can trigger the release workflow (which creates a draft PR). * add full apoc * Implement Workaround for Release Detection (#935) In combination with #936 , this PR allows us to mark releases (manually), thus also allowing us to trigger the final GitHub Actions workflows. * Adds 3 new git-crypt secret keys (#947) * Add 1 git-crypt collaborator New collaborators: B267AF6E emilkrause <emil.krause@dataminded.com> * Add 1 git-crypt collaborator New collaborators: F896B940 Siyan <siyan.luo@dataminded.com> * Add 1 git-crypt collaborator New collaborators: 317A2E46 Oliver Willekens <oliver.willekens@dataminded.com> --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org> * bump neo4j memory * bump neo4j memory * fix missing selector for wildcard cert (#976) * Enhancement/trigger test data release (#979) * dummy workflow to trigger our data release * POC dummy workflow * add documentation. * let user set the git sha * removing non-essential labels * removing non-essential labels --------- Co-authored-by: Oliver W. <4733178+oliverw1@users.noreply.github.com> * update branch * listen to infra * move to main * register app * deploy in correct namespace * add supabase token * add exa_workflow key * Take out project-id as a variable in terraform (#987) * take out project-id as a variable * take out other network parameters as a variable * delete bootstrap file module * remove reference to the bootstrap module * Not relying on default values, putting values in a tfvars file. * enable subs to push images to submit jobs (#981) * add key * Fix the label selector in the workflow-controller Service. (#1056) * Change the selector on the ServiceMonitor to match the one from the Service it targets. (#1057) * Add a label onto argo-workflows ServiceMonitor in order to be picked up by Prometheus as a target. 
(#1062) * Remove git-crypt for almost everyone except admins (#1053) * delete SA, no longer versioned * do not encrypt nor version the file anymore * updated iam for secrets manager secret * update docs to remove git-crypt * avoid checking in ci * fix CI? * fix ci * bump disk * increase disk size * revert accidental change * add disk type * fix * version locks * Public data release bucket infra code (#1074) * tmp * working bucket as website * working bucket with LB in front of it * docs updates * docs for landing zone * Update variables.tf * cleanup * add mateusz admin key (#1092) * Add 1 git-crypt collaborator New collaborators: 267E0673 Mateusz Wasilewski <mateusz@everycure.org> * add search and parse router keys --------- Co-authored-by: matwasilewski <mat.p.wasilewski@gmail.com> * Update pipelines/matrix/conf/base/globals.yml * add ledger * use ledger services * add secret to argo-workflows * DS Workbenches on Vertex AI for ML researchers (#1102) * WIP * correct type of instance * bump * bump * wip * bum * wup * bump * updates to makefile * bump * works * docs done * wip * docs formatting * better automount * automount * workbench for jacques * avoid zip * bump * cleanup * locked new dependencies * cleanup pubsub stuff * rm function * cleanup * clone without user * cleanup * Update pipelines/matrix/conf/cloud/globals.yml * comment out version by default * cleanup * fix ruff * add repository_dispatch trigger to pipeline submission (#1138) * working deployment (#1143) * rm wrong oauth resource * add lee to workbench user list (#1155) * add to git crypt * fix secrets (#1163) * update to main * setup ssd correctly * add comment * update * go to main * go to main * go to main * retry * add make env * correct make command * retry --------- Co-authored-by: Mateusz <39764611+matwasilewski@users.noreply.github.com> Co-authored-by: Pascal Brokmeier <pascal@everycure.org> Co-authored-by: Alan <alan@hueb.org> Co-authored-by: elliottsharp <elliott.sharp@hotmail.com> 
Co-authored-by: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Co-authored-by: Jason Reilly <jdr0887@gmail.com> Co-authored-by: Jason Reilly <jdr0887@users.noreply.github.com> Co-authored-by: leelancashire <drllancashire@gmail.com> Co-authored-by: vjsykora <jsykora@Joe> Co-authored-by: leelancashire <lee@everycure.org> Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com> Co-authored-by: Oliver W. <4733178+oliverw1@users.noreply.github.com> Co-authored-by: Siyan Luo <89979939+Siyan-Luo@users.noreply.github.com> Co-authored-by: Kevin Schaper <kevinschaper@gmail.com> Co-authored-by: Siyan Luo <siyanluo@Siyans-MacBook-Pro.local> Co-authored-by: Daniel Rhodes <14894770+drhodesbrc@users.noreply.github.com> Co-authored-by: emil-k <emil.krause.44@gmail.com> Co-authored-by: Piotr Kaniewski <115791652+piotrkan@users.noreply.github.com> Co-authored-by: All of us at Every Cure <releasebot@everycure.org> Co-authored-by: Siyan Luo <siyanluo@DMM-SiyLuo.local> Co-authored-by: matwasilewski <mat.p.wasilewski@gmail.com>

…oject (#628) * add iam terraform module * update variables * update variables * iam codified now * cleanup --------- Co-authored-by: Pascal Brokmeier <pascal@everycure.org>

alexeistepa marked this pull request as ready for review November 7, 2024 13:52

alexeistepa requested review from lvijnck and matwasilewski as code owners November 7, 2024 13:52

pascalwhoop reviewed Nov 7, 2024

Comment thread infra/deployments/hub/dev/iam.tf Outdated

may-lim linked an issue Nov 8, 2024 that may be closed by this pull request

Create an iam.tf file for the dev hub deployment to manage iam in code #627

Closed

pascalwhoop changed the title ~~add iam terraform module~~ Add IAM as terraform module for code centric IAM management of the project Nov 8, 2024

pascalwhoop changed the base branch from main to infra November 8, 2024 12:40

alexeistepa and others added 4 commits November 8, 2024 13:41

add iam terraform module

2bd6f9c

update variables

cd46308

update variables

a2e6ad4

iam codified now

3b6161b

pascalwhoop force-pushed the feat/iam branch from c834fa4 to 3b6161b Compare November 8, 2024 13:17

cleanup

a392654

pascalwhoop approved these changes Nov 8, 2024

pascalwhoop enabled auto-merge (squash) November 8, 2024 13:19

pascalwhoop assigned matwasilewski and lvijnck and unassigned matwasilewski Nov 11, 2024

pascalwhoop added this to the 10 Experiments per week: Technical Capabilities milestone Nov 11, 2024

pascalwhoop disabled auto-merge November 11, 2024 15:52

pascalwhoop merged commit cd33962 into infra Nov 11, 2024

oliverw1 mentioned this pull request Mar 24, 2025

Use different secrets for Production #1334

Merged

pascalwhoop merged 5 commits into infra from feat/iam

4 participants
