Client does not retry on intermittent GraphQL API errors #388

@t-persson

Description

If there is an intermittent error on the GraphQL API, the ETOS API returns 400 Bad Request. That status code is not in the ETOS Client's retry list, so the client gives up instead of retrying.

Additional Context

status_forcelist=[500, 502, 503, 504], # Common temporary error status codes

https://github.com/eiffel-community/etos-api/blob/694b8b010f9db864c92a5291bfed13b9ffec3b6e/python/src/etos_api/routers/v0/router.py#L120-L122
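To illustrate why the 400 response is never retried, here is a minimal, standard-library-only sketch of a retry loop driven by a status forcelist. Only the list of status codes comes from the snippet above; `request_with_retry`, `send` and `max_attempts` are hypothetical names used for illustration, and this is not the ETOS Client's actual implementation (a real client would also back off between attempts).

```python
from typing import Callable, Iterable, Optional, Tuple

# Status codes copied from the client snippet quoted in this issue.
STATUS_FORCELIST = (500, 502, 503, 504)


def request_with_retry(
    send: Callable[[], int],
    max_attempts: int = 3,
    status_forcelist: Iterable[int] = STATUS_FORCELIST,
) -> Tuple[Optional[int], int]:
    """Call send() until the status is not retryable or attempts run out.

    `send` is a hypothetical callable that performs one request and
    returns the HTTP status code. Returns (final_status, attempts_used).
    """
    retryable = set(status_forcelist)
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status not in retryable:
            # Anything outside the forcelist, including 400, ends the loop.
            return status, attempt
    return status, max_attempts


# A 503 followed by a 200 is retried and succeeds on the second attempt.
flaky = iter([503, 200])
print(request_with_retry(lambda: next(flaky)))

# A 400 is returned to the caller immediately, without a retry.
bad_request = iter([400, 200])
print(request_with_retry(lambda: next(bad_request)))
```

With this logic, a 400 from the ETOS API is treated as a final answer even when the underlying GraphQL fault was temporary.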

Logs

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/anyio/_core/_tasks.py", line 115, in fail_after
    yield cancel_scope
  File "/usr/local/lib/python3.9/site-packages/gql/client.py", line 1537, in _execute
    result = await self.transport.execute(
  File "/usr/local/lib/python3.9/site-packages/gql/transport/aiohttp.py", line 308, in execute
    async with self.session.post(self.url, ssl=self.ssl, **post_args) as resp:
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 1425, in __aenter__
    self._resp: _RetType = await self._coro
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 730, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1059, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/usr/local/lib/python3.9/site-packages/aiohttp/streams.py", line 672, in read
    await self._waiter
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fea700dcdc0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/etos_api/routers/v0/router.py", line 115, in _start
    artifact = await wait_for_artifact_created(
  File "/usr/local/lib/python3.9/site-packages/etos_api/routers/v0/utilities.py", line 67, in wait_for_artifact_created
    artifacts = await query_handler.execute(query % artifact_identifier)
  File "/usr/local/lib/python3.9/site-packages/etos_api/library/graphql.py", line 52, in execute
    return await session.execute(gql(query))
  File "/usr/local/lib/python3.9/site-packages/gql/client.py", line 1628, in execute
    result = await self._execute(
  File "/usr/local/lib/python3.9/site-packages/gql/client.py", line 1537, in _execute
    result = await self.transport.execute(
  File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.9/site-packages/anyio/_core/_tasks.py", line 118, in fail_after
    raise TimeoutError
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/opentelemetry/trace/__init__.py", line 589, in use_span
    yield span
  File "/usr/local/lib/python3.9/site-packages/opentelemetry/sdk/trace/__init__.py", line 1105, in start_as_current_span
    yield span
  File "/usr/local/lib/python3.9/site-packages/opentelemetry/trace/__init__.py", line 452, in start_as_current_span
    yield span
  File "/usr/local/lib/python3.9/site-packages/etos_api/routers/v0/router.py", line 57, in start_etos
    return await _start(etos, span)
  File "/usr/local/lib/python3.9/site-packages/etos_api/routers/v0/router.py", line 120, in _start
    raise HTTPException(
fastapi.exceptions.HTTPException: 400: Could not connect to GraphQL.

Expected Behavior

We should return a status code that the ETOS Client treats as retryable (for example 503, which is already in its `status_forcelist`), maybe with a `Retry-After` header, so that the client retries and the intermittent fault does not affect a test run.
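One possible shape for this, as a sketch only: map temporary GraphQL failures to 503 with a `Retry-After` header and keep 400 for everything else. The helper name `graphql_failure_response`, the 10 second delay and the choice of which exceptions count as temporary are all assumptions, not taken from the ETOS API code. In the router, the returned values could be passed to FastAPI's `HTTPException(status_code=..., detail=..., headers=...)`.

```python
import asyncio
from http import HTTPStatus
from typing import Dict, Tuple

# Hypothetical delay; the right value depends on the deployment.
RETRY_AFTER_SECONDS = 10

# Assumption: timeouts and connection errors are treated as temporary.
# asyncio.TimeoutError is listed separately because it is a distinct
# class from the built-in TimeoutError before Python 3.11.
TEMPORARY_ERRORS = (TimeoutError, asyncio.TimeoutError, ConnectionError)


def graphql_failure_response(
    exc: BaseException,
) -> Tuple[HTTPStatus, Dict[str, str], str]:
    """Return (status, headers, detail) for a failed GraphQL query."""
    if isinstance(exc, TEMPORARY_ERRORS):
        return (
            HTTPStatus.SERVICE_UNAVAILABLE,
            {"Retry-After": str(RETRY_AFTER_SECONDS)},
            "GraphQL is temporarily unavailable.",
        )
    # Non-temporary failures keep the current 400 response.
    return HTTPStatus.BAD_REQUEST, {}, "Could not connect to GraphQL."


status, headers, detail = graphql_failure_response(TimeoutError())
print(int(status), headers, detail)
```

Because 503 is already in the client's forcelist, the `TimeoutError` shown in the logs above would then lead to a retry rather than a failed test run.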

Steps To Reproduce

No response

The version of this project/repo, if applicable

No response

The version/edition of the Eiffel Protocol used, if applicable

No response

Labels: bug