fix: work around segfault with >100 jobs in google life sciences backend #1451
Conversation
@moschetti would you mind taking a look at this so we can talk about what might be best practice? E.g., importing and defining inline vs. defining globally vs. limiting the scope vs. making it customizable, etc.
@@ -897,24 +899,27 @@ def _retry_request(self, request, timeout=2, attempts=3):

```python
        attempts: remaining attempts, throw error when hit 0
        """
        import googleapiclient

        import google.auth

        credentials, project_id = google.auth.default(
            scopes=["https://www.googleapis.com/auth/cloud-platform"]
        )
```
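For context, the retry-with-a-fresh-http pattern this change moves toward can be sketched generically. This is a hedged illustration, not the PR's actual code: `retry_request` and `make_http` are hypothetical names, and in the real setup `make_http` would be something like `lambda: google_auth_httplib2.AuthorizedHttp(credentials)` with `credentials` coming from `google.auth.default(...)`.

```python
import time


def retry_request(execute, make_http, timeout=2, attempts=3):
    """Retry a request, building a fresh authorized http object per
    attempt, rather than reusing one stale connection across retries
    (the reuse is what appeared to trigger the segfault).

    execute:   callable taking an http object and performing the request
    make_http: factory returning a fresh (authorized) http object
    """
    last_err = None
    for _ in range(attempts):
        try:
            return execute(make_http())
        except Exception as err:  # the real code would catch narrower errors
            last_err = err
            time.sleep(timeout)
    raise last_err
```

The key design point is that the http object is created inside the loop, so every attempt gets its own connection.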
@vsoch do we authenticate this way in the other places as well? Or could there maybe be an auth object for the entire life sciences executor object, instead of creating a special one here in the retry?
We go about it in different ways, but I suspect that under the hood the methods are similar, since both use the default application credentials (exported to the environment as GOOGLE_APPLICATION_CREDENTIALS). We use the discovery API clients to authenticate clients that are attached to the entire class, e.g., here:
snakemake/snakemake/executors/google_lifesciences.py
Lines 136 to 165 in 417f40d
```python
    def _get_services(self):
        """use the Google Discovery Build to generate API clients
        for Life Sciences, and use the google storage python client
        for storage.
        """
        from googleapiclient.discovery import build as discovery_build
        from oauth2client.client import (
            GoogleCredentials,
            ApplicationDefaultCredentialsError,
        )
        from google.cloud import storage

        # Credentials must be exported to environment
        try:
            creds = GoogleCredentials.get_application_default()
        except ApplicationDefaultCredentialsError as ex:
            log_verbose_traceback(ex)
            raise ex

        # Discovery clients for Google Cloud Storage and Life Sciences API
        self._storage_cli = discovery_build(
            "storage", "v1", credentials=creds, cache_discovery=False
        )
        self._compute_cli = discovery_build(
            "compute", "v1", credentials=creds, cache_discovery=False
        )
        self._api = discovery_build(
            "lifesciences", "v2beta", credentials=creds, cache_discovery=False
        )
        self._bucket_service = storage.Client()
```
The request object is ultimately generated from one of those services, either directly or as the result of a call like pipelines.run(), so I'm curious why the request object isn't coming with its own http already (or maybe some of them are, but not consistently?). E.g., pipelines.run() here is the one that was originally giving us trouble:
```python
operation = pipelines.run(parent=self.location, body=body)
```
In my first attempt I tried passing the credentials created here:

```python
creds = GoogleCredentials.get_application_default()
```

to `google_auth_httplib2.AuthorizedHttp` in `_retry_request`, but it did not accept those credentials, failing with this error: `AttributeError: '_JWTAccessCredentials' object has no attribute 'before_request'`. That led me to this issue, where I found that `google.auth` should be used for the credentials.
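That AttributeError reflects an interface mismatch: `google_auth_httplib2.AuthorizedHttp` expects credentials exposing a `before_request` method (the `google.auth` credentials interface), which `oauth2client` credentials do not have. A minimal illustration with stand-in classes; the class and function names here are hypothetical, not the real libraries:

```python
class NewStyleCredentials:
    """Mimics the google.auth credentials interface: before_request
    mutates the outgoing headers to attach an access token."""

    def before_request(self, request, method, url, headers):
        headers["authorization"] = "Bearer fake-token"


class OldStyleCredentials:
    """Mimics oauth2client credentials: no before_request method,
    hence the AttributeError when handed to AuthorizedHttp."""


def attach_auth(credentials, headers):
    # This mirrors the call AuthorizedHttp makes before each request.
    credentials.before_request(None, "GET", "https://example.invalid", headers)
    return headers
```

Passing an `OldStyleCredentials()` instance to `attach_auth` raises `AttributeError`, matching the shape of the traceback quoted above.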
So I pushed some new commits using Google's recommended way to build the service APIs. It seems to work so far on the test case; I'm still testing on a more complex workflow. Sorry if the commits are a bit messy, I'm still new to contributing and to git.
No need to apologize, you're doing great.
Kudos, SonarCloud Quality Gate passed!
Awesome work @cademirch! So, am I getting it right that using google.auth basically solves the segfault issue?
Let us know once your larger tests pass; I will hold off on merging until then.
Yes, mostly. Though I believe using google_auth_httplib2 for building the requests is the more critical part of the solution.
My larger workflow ran overnight without error, though it did end prematurely.
I don't think this premature ending is related to this issue, though; I've seen this behavior before when running in cloud execution mode.
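To make the "google_auth_httplib2 is the critical part" point concrete: googleapiclient request objects accept an `http` override at execute time, so a freshly authorized http object can be supplied per call instead of reusing the one the service was built with. A stubbed sketch of that mechanism; `FakeRequest` and `run_with_fresh_http` are illustrative stand-ins, not the real API:

```python
class FakeRequest:
    """Stand-in for a googleapiclient HttpRequest: execute() accepts an
    optional http override, which is how a fresh AuthorizedHttp can be
    supplied per call."""

    def __init__(self):
        self.seen_http = []

    def execute(self, http=None):
        # Record which http object was used, then pretend to succeed.
        self.seen_http.append(http)
        return {"done": True}


def run_with_fresh_http(request, make_http):
    # Instead of reusing the http baked into the service at build time,
    # hand execute() a newly built (authorized) http object.
    return request.execute(http=make_http())
```

In the real code, `make_http` would build a `google_auth_httplib2.AuthorizedHttp` around credentials obtained from `google.auth.default(...)`.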
Mhm, that premature ending is kind of worrying as well, of course. I have never seen anything like that. Can you file a separate issue for that, or is one already open?
Yeah, I have only seen it a few times. I think it could be related to
Description

PR to fix segfault issue with `--google-lifesciences` with larger (at least 100 jobs) workflows. #1444

QC

- The documentation (`docs/`) is updated to reflect the changes, or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).