Improve the integration of the Metrics pipeline in the release #19
If the folder name contains a `-`, importing functions from a module inside it fails.
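A minimal sketch of why the hyphen matters, assuming a hypothetical folder `metric-calculation` that holds a module `utils.py` (neither name comes from this repository). Python reads `-` as the minus operator, so the import statement does not even parse. Renaming the folder with an underscore is one way around it:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout for illustration only: a folder named
# "metric-calculation" containing a module "utils.py".
root = Path(tempfile.mkdtemp())
pkg = root / "metric-calculation"
pkg.mkdir()
(pkg / "utils.py").write_text("def answer():\n    return 42\n")

# The import statement cannot be parsed: "-" is tokenised as minus.
try:
    compile("from metric-calculation.utils import answer", "<demo>", "exec")
    statement_parses = True
except SyntaxError:
    statement_parses = False

# With an underscore the folder name is a valid identifier and imports work.
renamed = root / "metric_calculation"
pkg.rename(renamed)
sys.path.insert(0, str(root))
answer = importlib.import_module("metric_calculation.utils").answer
```

`importlib.import_module` can load a hyphenated path by string in some setups, but a rename keeps ordinary `import` statements usable everywhere.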
Local files are no longer expected
This drops the argument parsing approach: the metrics generation script is now parametrised from a config. In addition, the script has been adapted to expect remote files and no longer works with local files (2614, 2613).
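As a rough illustration of a config-driven entry point that accepts only remote inputs, here is a sketch using the standard library. The field names, the JSON format and the `gs://example-bucket` paths are all assumptions made for this example. The actual config schema and loader used by the script are not shown in this conversation.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsConfig:
    # All field names are hypothetical, chosen for illustration.
    release: str
    evidence_path: str
    output_path: str


def load_config(text: str) -> MetricsConfig:
    """Parse a JSON config and reject anything that is not a gs:// URI."""
    cfg = MetricsConfig(**json.loads(text))
    for name in ("evidence_path", "output_path"):
        value = getattr(cfg, name)
        if not value.startswith("gs://"):
            raise ValueError(f"{name} must be a gs:// URI, got {value!r}")
    return cfg


# Example config with placeholder bucket names.
example = json.dumps(
    {
        "release": "22.09",
        "evidence_path": "gs://example-bucket/evidence",
        "output_path": "gs://example-bucket/metrics",
    }
)
config = load_config(example)
```

Validating the paths at load time means a leftover local path fails before any job is submitted.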
This file was necessary when the app was hosted on Heroku. Streamlit Cloud is used now.
In this case, DataFrame is used for more than type checking.
These files are now available at gs://otar000-evidence_input/release-metrics/gold-standard
I feel there's something off; we probably need to optimize the creation of the machine or the job submission.
export CLUSTER_NAME=ot-release-metrics
export CLUSTER_REGION=europe-west1

gcloud dataproc clusters create ${CLUSTER_NAME} \
This command creates a cluster with a master and two workers. Is there a reason not to use a single-node Dataproc cluster?
Thanks, I'd simply forgotten to add the flag.
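For reference, a small sketch of what the single-node variant of the command might look like, assembled as an argument list in Python so it can be inspected before running. It assumes the flag in question is gcloud's `--single-node`. The helper name is hypothetical, and the remaining options of the README command are not visible in this excerpt, so they are left out here.

```python
import shlex
from typing import List


def single_node_create_command(cluster_name: str, region: str) -> List[str]:
    """Build (but do not run) a gcloud call for a single-node Dataproc cluster.

    Only the cluster name, region and --single-node are set; any other
    options from the README command would need to be appended.
    """
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        "--region", region,
        "--single-node",  # no worker nodes are provisioned
    ]


# Values taken from the exported variables in the diff above.
command = single_node_create_command("ot-release-metrics", "europe-west1")
print(shlex.join(command))
```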
I've been able to run the metrics script in ~20 min for the first 22.09 pipeline run following the instructions in the README. Logs available at https://console.cloud.google.com/dataproc/jobs/ac69daf526dd4c6091b73e48e0fa2719/monitoring?region=europe-west1&project=open-targets-eu-dev
This PR includes: