Improve the integration of the Metrics pipeline in the release #19

Merged 21 commits on Sep 16, 2022

ireneisdoomed
Contributor

This PR includes:

  • The changes to adapt the metrics script to work with remote data (#2613)
  • The changes to parametrise the metrics script with a configuration file (#2614)
  • General repo restructure to make it cleaner

If the folder has a `-`, importing the functions from a module would fail.
This drops the argument parsing approach. The metrics generation script is now parametrised from a config file. In addition, the script has been adapted to read remote files rather than local ones.

2614, 2613
This file was necessary when the app was hosted on Heroku. Streamlit Cloud is used now.
In this case, DataFrame is not used only for type checking
These files are now available at gs://otar000-evidence_input/release-metrics/gold-standard
Contributor

@DSuveges DSuveges left a comment

I feel there's something off; we probably need to optimize the creation of the machine or the job submission.

docs/metric-calculation.md
export CLUSTER_NAME=ot-release-metrics
export CLUSTER_REGION=europe-west1

gcloud dataproc clusters create ${CLUSTER_NAME} \
Contributor

This command creates a cluster with a master and two workers. Is there a reason not to use a single-node Dataproc cluster?

Contributor Author

Thanks, I'd simply forgotten to add the flag.
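
For reference, a minimal sketch of what a single-node variant of the cluster creation command could look like. `--single-node` and `--region` are standard `gcloud dataproc clusters create` flags. The remaining flags of the command in `docs/metric-calculation.md` are not visible in this thread, so they are left out rather than guessed. The `create_single_node_cluster` function and the `DRY_RUN` variable are hypothetical helpers added for illustration. By default the sketch only prints the command, so it can be inspected before anything is created.

```shell
export CLUSTER_NAME=ot-release-metrics
export CLUSTER_REGION=europe-west1

# Hypothetical helper: assembles the cluster creation command.
# With DRY_RUN=1 (the default) it only prints the command;
# with DRY_RUN=0 it actually runs gcloud.
create_single_node_cluster() {
  # --single-node requests one machine acting as both master and worker.
  # Any other flags used by the real command are omitted here.
  set -- gcloud dataproc clusters create "${CLUSTER_NAME}" \
    --region="${CLUSTER_REGION}" \
    --single-node
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_single_node_cluster
```

Running the function with `DRY_RUN=0` would submit the request to Dataproc, assuming `gcloud` is installed and authenticated against the right project.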

@ireneisdoomed
Contributor Author

I've been able to run the metrics script in ~20min for the first 22.09 pipeline run following the instructions in the README. Logs available at https://console.cloud.google.com/dataproc/jobs/ac69daf526dd4c6091b73e48e0fa2719/monitoring?region=europe-west1&project=open-targets-eu-dev

@ireneisdoomed ireneisdoomed merged commit e3c887e into il-2638 Sep 16, 2022
@tskir tskir deleted the il-2614 branch October 21, 2023 14:53