TDL-24687: Enhance tap performance#150
Conversation
|
|
||
| LOGGER = singer.get_logger() | ||
|
|
||
| DEFAULT_WAIT = 60 # Default wait time for backoff |
There was a problem hiding this comment.
As stated in zendesk documentation
| try: | ||
| response_json = await response.json() | ||
| except Exception: # pylint: disable=broad-except | ||
| response_json = {} |
There was a problem hiding this comment.
The except Exception block is too broad. It would be better to catch specific exceptions. Also in case of exception we should log the warning.
There was a problem hiding this comment.
I have created a separate function raise_for_error_for_async for the response validation.
| @backoff.on_exception(backoff.expo, | ||
| (ConnectionError, ConnectionResetError, Timeout, ChunkedEncodingError, ProtocolError),#As ConnectionError error and timeout error does not have attribute status_code, | ||
| max_tries=5, # here we added another backoff expression. | ||
| factor=2) |
| """ | ||
| Perform an asynchronous GET request | ||
| """ | ||
| while True: |
There was a problem hiding this comment.
Using while True: loop might lead into infinite loop if for some reason we don't receive 200 status code. I think we should have some max. retry limits applied here. Also can't we raise custom exceptions and handle these retries in backoff logic itself?
There was a problem hiding this comment.
Removed while loop and added backoff mechanism to retry 5 times.
|
|
||
| from tap_zendesk import http, streams | ||
|
|
||
| class TestASyncTicketAudits(unittest.TestCase): |
There was a problem hiding this comment.
The test does not cover scenarios where the paginate_ticket_audits function might raise exceptions.
There was a problem hiding this comment.
Added scenario to validate exception
| async def mock_get_objects(session, ticket_id): | ||
| return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}] | ||
|
|
||
|
|
||
|
|
||
| instance = streams.TicketAudits(None, {}) | ||
| instance.stream = 'ticket_audits' | ||
|
|
||
|
|
||
| # Run the sync method | ||
| async def run_test(): |
There was a problem hiding this comment.
| async def mock_get_objects(session, ticket_id): | |
| return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}] | |
| instance = streams.TicketAudits(None, {}) | |
| instance.stream = 'ticket_audits' | |
| # Run the sync method | |
| async def run_test(): | |
| async def mock_get_objects(session, ticket_id): | |
| return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}] | |
| instance = streams.TicketAudits(None, {}) | |
| instance.stream = 'ticket_audits' | |
| # Run the sync method | |
| async def run_test(): |
|
|
||
| @aioresponses() | ||
| @patch('asyncio.sleep', return_value=None) | ||
| def test_call_api_async_conflict(self, mocked, mock_sleep): |
There was a problem hiding this comment.
Added docstring in all the functions
RushiT0122
left a comment
There was a problem hiding this comment.
Requested changes inline.
Addressed all the requested changes |
…-io/tap-zendesk into TDL-24687/enhance-tap-performance
vishalp-dev
left a comment
There was a problem hiding this comment.
please check the comment added
Description of change
In the current setup:
We first fetch tickets.
For each ticket, we make three separate API calls:
One for ticket_comments
One for ticket_metrics
One for ticket_audits
With 10,000 tickets, this results in over 30,000 API calls, which takes a long extraction time. So, we have added following potential fix to boos the tap performance,
Reduce API calls by side-loading
When fetching a ticket, it is possible to also fetch the ticket_metrics in a single call as a side load, eliminating the need for a separate API call for ticket_metrics.
Similarly, when fetching ticket_audits, we can also fetch ticket_comments as a side load, removing the need for an additional API call to retrieve ticket_comments.
Make API calls asynchronously
More details can be found here in the ticket.
Manual QA steps
Risks
Rollback steps
AI generated code
https://internal.qlik.dev/general/ways-of-working/code-reviews/#guidelines-for-ai-generated-code