TDL-24687: Enhance tap performance #150

Merged
prijendev merged 15 commits into master from TDL-24687/enhance-tap-performance on Nov 7, 2024

@prijendev prijendev commented Oct 28, 2024

Description of change

In the current setup:

We first fetch tickets.
For each ticket, we make three separate API calls:

  • One for ticket_comments

  • One for ticket_metrics

  • One for ticket_audits

With 10,000 tickets, this results in over 30,000 API calls, which makes extraction very slow. To boost the tap's performance, we have made the following changes:

  • Reduce API calls by side-loading

    • Ticket Metrics Side Load:
      When fetching a ticket, it is possible to also fetch the ticket_metrics in a single call as a side load, eliminating the need for a separate API call for ticket_metrics.
    • Ticket Audits and Comments Side Load:
      Similarly, when fetching ticket_audits, we can also fetch ticket_comments as a side load, removing the need for an additional API call to retrieve ticket_comments.
    • Combining these side loads cuts the total number of API calls significantly, from over 30,000 to roughly 10,000 for the same 10,000 tickets.
  • Make API calls asynchronously

    • The total time required for the tap to complete in sync mode has been reduced by 90% compared to the current version.
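
The call-count arithmetic and the concurrency idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tap's code: `fetch_audits_with_comments`, `sync_audits`, and the concurrency limit are hypothetical names and values, and the network call is simulated with `asyncio.sleep`.

```python
import asyncio

TICKETS = 10_000

# Before: three per-ticket calls (comments, metrics, audits).
calls_before = 3 * TICKETS
# After: metrics ride along with the ticket fetch and comments ride along
# with the audits fetch, leaving one per-ticket call.
calls_after = 1 * TICKETS


async def fetch_audits_with_comments(ticket_id, semaphore):
    """Hypothetical stand-in for one per-ticket request (audits + comments)."""
    async with semaphore:
        await asyncio.sleep(0)  # simulated network wait
        return {"ticket_id": ticket_id, "audits": [], "comments": []}


async def sync_audits(ticket_ids, max_concurrency=10):
    """Issue the per-ticket requests concurrently, bounded by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [fetch_audits_with_comments(t, semaphore) for t in ticket_ids]
    # gather() returns results in the same order as the tasks were passed in.
    return await asyncio.gather(*tasks)


results = asyncio.run(sync_audits(range(1, 6)))
print(calls_before, calls_after, len(results))
```

The semaphore is one way to keep concurrency bounded so that a large ticket list does not open thousands of requests at once; the limit of 10 is an arbitrary value chosen for the sketch.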

More details can be found in the ticket.

Manual QA steps

  • Verify that the discover mode is working as expected.
  • Verify that sync mode is working as expected.
  • Verify the number of records, state, and schema for each sync.
  • Verify that the tap works as expected when run with a state as well.

Risks

Rollback steps

  • revert this branch

AI generated code

https://internal.qlik.dev/general/ways-of-working/code-reviews/#guidelines-for-ai-generated-code

  • this PR has been written with the help of GitHub Copilot or another generative AI tool

@prijendev prijendev changed the title from "Tdl 24687/enhance tap performance" to "TDL-24687: Enhance tap performance" Oct 28, 2024
Comment thread tap_zendesk/http.py

LOGGER = singer.get_logger()

DEFAULT_WAIT = 60 # Default wait time for backoff

Contributor Author

As stated in the Zendesk documentation.
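
One plausible use of this constant, sketched under assumptions: a rate-limited (HTTP 429) Zendesk response normally carries a `Retry-After` header giving the wait in seconds, and a default can cover the case where the header is missing or unusable. `wait_seconds` is a hypothetical helper for illustration, not a function from this PR.

```python
DEFAULT_WAIT = 60  # seconds, mirroring the constant in the diff


def wait_seconds(headers, default=DEFAULT_WAIT):
    """Return the Retry-After value in seconds, or the default if absent or invalid."""
    raw = headers.get("Retry-After")
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Header missing (None) or not an integer string.
        return default
    return value if value > 0 else default
```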

Comment thread tap_zendesk/http.py Outdated
try:
    response_json = await response.json()
except Exception: # pylint: disable=broad-except
    response_json = {}

Contributor

The except Exception block is too broad; it would be better to catch specific exceptions. Also, we should log a warning when an exception occurs.

Contributor Author

I have created a separate function raise_for_error_for_async for the response validation.
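
The reviewer's suggestion (narrow exception, plus a warning) might look like the sketch below. It is not the `raise_for_error_for_async` function from the PR: `parse_json_body` is a hypothetical helper that works on an already-read body string, so it uses only the standard library. With aiohttp, `response.json()` may additionally raise `aiohttp.ContentTypeError` when the content type is not JSON; that is worth confirming against the library version in use.

```python
import json
import logging

LOGGER = logging.getLogger(__name__)


def parse_json_body(text):
    """Parse a response body, returning {} (with a warning) if it is not JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Only malformed JSON is swallowed; any other error still propagates.
        LOGGER.warning("Response body is not valid JSON: %s", err)
        return {}
```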

Comment thread tap_zendesk/http.py Outdated
Comment on lines +213 to +216
@backoff.on_exception(backoff.expo,
(ConnectionError, ConnectionResetError, Timeout, ChunkedEncodingError, ProtocolError),#As ConnectionError error and timeout error does not have attribute status_code,
max_tries=5, # here we added another backoff expression.
factor=2)

Contributor

@RushiT0122 RushiT0122 Oct 28, 2024

Fix the indentation.

Contributor Author

Fixed it

Comment thread tap_zendesk/http.py Outdated
"""
Perform an asynchronous GET request
"""
while True:

Contributor

Using a while True: loop might lead to an infinite loop if, for some reason, we never receive a 200 status code. I think we should apply a maximum retry limit here. Also, can't we raise custom exceptions and handle these retries in the backoff logic itself?

Contributor Author

Removed the while loop and added a backoff mechanism to retry up to 5 times.
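
For readers following the thread, the bounded-retry idea can be sketched as below. The PR itself relies on the backoff library's decorator; this is a hand-rolled, standard-library stand-in, and `RetryableError` and `call_with_retries` are hypothetical names introduced only for illustration.

```python
import asyncio


class RetryableError(Exception):
    """Hypothetical custom exception for responses that should be retried."""


async def call_with_retries(request, max_tries=5, base_delay=1.0, sleep=asyncio.sleep):
    """Await request() up to max_tries times, doubling the delay after each failure."""
    for attempt in range(1, max_tries + 1):
        try:
            return await request()
        except RetryableError:
            if attempt == max_tries:
                raise  # give up instead of looping forever
            await sleep(base_delay * 2 ** (attempt - 1))
```

Unlike a `while True:` loop, the final failure is re-raised, so the caller sees the error once the attempt budget is spent. The `sleep` parameter exists only so tests can substitute a no-op.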

Comment thread tap_zendesk/http.py Outdated
Comment thread tap_zendesk/streams.py Outdated
Comment thread tap_zendesk/streams.py Outdated
Comment thread tap_zendesk/streams.py
Comment thread tap_zendesk/streams.py Outdated

from tap_zendesk import http, streams

class TestASyncTicketAudits(unittest.TestCase):

Contributor

The test does not cover scenarios where the paginate_ticket_audits function might raise exceptions.

Contributor Author

Added a scenario to validate the exception.
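
A test of the kind requested could be shaped roughly like this. It is a sketch, not the test added in the PR: `paginate_ticket_audits_stub` is a hypothetical stand-in with an invented signature, used only so the example runs without the tap installed.

```python
import asyncio
import unittest


async def paginate_ticket_audits_stub(get_objects, ticket_id):
    """Hypothetical stand-in: awaits the fetcher and lets its errors propagate."""
    return await get_objects(ticket_id)


class TestPaginateTicketAuditsErrors(unittest.TestCase):
    def test_exception_propagates(self):
        """A failure inside the fetcher should surface to the caller."""
        async def failing_get_objects(ticket_id):
            raise RuntimeError(f"API error for ticket {ticket_id}")

        with self.assertRaises(RuntimeError):
            asyncio.run(paginate_ticket_audits_stub(failing_get_objects, 1))


# Run the case programmatically so the sketch is self-checking.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPaginateTicketAuditsErrors)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```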

Comment thread test/unittests/test_async_ticket_audits.py Outdated
Comment on lines +21 to +31
async def mock_get_objects(session, ticket_id):
return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}]



instance = streams.TicketAudits(None, {})
instance.stream = 'ticket_audits'


# Run the sync method
async def run_test():

Contributor

@RushiT0122 RushiT0122 Oct 28, 2024

Suggested change
async def mock_get_objects(session, ticket_id):
return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}]
instance = streams.TicketAudits(None, {})
instance.stream = 'ticket_audits'
# Run the sync method
async def run_test():
async def mock_get_objects(session, ticket_id):
return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}]
instance = streams.TicketAudits(None, {})
instance.stream = 'ticket_audits'
# Run the sync method
async def run_test():


@aioresponses()
@patch('asyncio.sleep', return_value=None)
def test_call_api_async_conflict(self, mocked, mock_sleep):

Contributor

Docstring is missing.

Contributor Author

Added docstrings to all the functions.

Comment thread test/unittests/test_http.py Outdated
Contributor

@RushiT0122 RushiT0122 left a comment

Requested changes inline.

@prijendev
Contributor Author

Requested changes inline.

Addressed all the requested changes

@prijendev prijendev requested a review from RushiT0122 October 28, 2024 11:34
Comment thread tap_zendesk/http.py
Comment thread tap_zendesk/http.py
@RushiT0122 RushiT0122 self-requested a review October 30, 2024 08:27
Comment thread tap_zendesk/streams.py
Comment thread tap_zendesk/streams.py
Member

@vishalp-dev vishalp-dev left a comment

Please check the added comment.

@prijendev prijendev requested a review from vishalp-dev November 7, 2024 08:00
@prijendev prijendev merged commit f75a006 into master Nov 7, 2024
prijendev added a commit that referenced this pull request Nov 8, 2024
prijendev added a commit that referenced this pull request Nov 8, 2024
3 participants