Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TASK: Robust Metamist Integrations #784

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

milo-hyben
Copy link
Collaborator

@milo-hyben milo-hyben commented Jun 5, 2024

This PR is implementing items for task Robust Metamist Integrations on the road map.

Add a re-try mechanism to catch failing to connect to metamist when writing analysis entries.
Gracefully catch and handle failures, when writing to metamist fails.
Add integration tests to validate these behaviours.

TODO

  • Increase test coverage for metamist.py (currently on main 42.8%) to as close to 100% as possible
    WIP, this PR latest coverage at 76%
  • Two functions in metamist are with comments 'not used anywhere' (process_existing_analysis, find_joint_calling_analysis), Should we remove? It would make test coverage at 91% after this PR.
  • Make API call more robust, esp. differentiate between thrown API Exception
    It seems we just need to differentiate between ServiceException and the rest of ApiException.
    ServiceException can be retried or other we should stop.
  • If getting HTTP 504 (Gateway Timeout), try to reconnect, addressing this issue:
    Metamist timeouts killing pipelines #585
    Error Details: It fails on
"/cpg_workflows/status.py", line 73, in complete_analysis_job
 a_id = aapi.create_analysis(project=project_name, analysis=this_analysis)

Exception:

ServiceException(http_resp=r)
metamist.exceptions.ServiceException: (504)
Reason: Gateway Timeout

@vivbak
Copy link
Contributor

vivbak commented Jun 7, 2024

Two functions in metamist are with comments 'not used anywhere' (process_existing_analysis, find_joint_calling_analysis), Should we remove? It would make test coverage at 91% after this PR.

Yes, but could we do it in a separate PR (merged before this one). If something goes wrong it's easier to track down the cause!


from cpg_utils import to_path
from metamist.apis import AnalysisApi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Gross that we used to do this here.

from metamist.exceptions import ApiException
from metamist.models import Analysis, AnalysisStatus

from .metamist import AnalysisStatus, get_metamist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need to keep an eye on this it case it creates any pickling issues. Let's try a test run later!

@milo-hyben milo-hyben marked this pull request as ready for review July 9, 2024 07:34
@milo-hyben milo-hyben requested a review from vivbak July 9, 2024 07:35
logging.error(
f'Error: {e} Call {api_func} failed with payload:\n{str(kwargv)}',
)
# TODO: discuss should we catch all here as well?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly a good idea.

return True


def mock_aapi_update_analysis_timeout_fails(*args, **kwargs):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious what does aapi mean ?

@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=3, min=8, max=30),
retry=retry_if_exception_type(ServiceException),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way for us to be alerted if too many of these exceptions are raised?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants