Skip to content

[TDL-22683] Use Date Windowing Pagination#113

Merged
RushiT0122 merged 70 commits intomasterfrom
TDL-22683-date-windowing-pagination
May 13, 2024
Merged

[TDL-22683] Use Date Windowing Pagination#113
RushiT0122 merged 70 commits intomasterfrom
TDL-22683-date-windowing-pagination

Conversation

@shantanu73
Copy link
Contributor

@shantanu73 shantanu73 commented Feb 20, 2024

Description of change

  1. Implemented date windowing for all multi-threaded streams to optimize the process of fetching stream data and resolve infinite loop issues in extraction runs.
  2. Added a limit to buffer size of fetched records to release so that memory consumption can be optimized.
  3. Removed performance metrics calculation which was adding additional overhead on tap performance.
  4. Fixed data loss issue in interrupted sync scenario.

Manual QA steps

  • Ran sync runs, and python memory profiling to determine tap memory consumption and stream performance.

Risks

Rollback steps

  • revert this branch

cosimon and others added 30 commits December 20, 2023 18:14
1) Fixed Date windowing logic for multithreaded_bookmark_generator.
2) Added date_windowing instance variable as true in the constructor of activities stream.
1) Removed endpoint params from activities generator.
2) Overrode modify_reques_params method for Loan Transactions generator.
3) Removed the method _all_fetch_batch_steps from multithreaded bookmark generator.
4) Modified the implementation for _all_fetch_batch_steps method in multithreaded offset generator to include date windowing.
5) Added new method modify_reques_params in multithreaded offset generator.
1) Set max threads back to 20.
2) Fixed bug in Date windowing implementation.
1) Changed number of threads from 5 to default 20 for gl_journal_entries stream.
2) Added date window implementation for gl_journal_entries stream.
1) Removed redundant initialization of date_windowing for Activities, GL Journal entries & Loan transactions streams.
2) Changed max threads for Deposit transactions stream to default value.
@RushiT0122 RushiT0122 force-pushed the TDL-22683-date-windowing-pagination branch from bbeec12 to c4232d6 Compare May 3, 2024 13:19
@RushiT0122 RushiT0122 force-pushed the TDL-22683-date-windowing-pagination branch from c4232d6 to 5b691a9 Compare May 3, 2024 13:21

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maintain the spacing as on previous lines

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed already.

@@ -1,35 +0,0 @@
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to remove these tests completely?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it because it was increasing execution time and not adding much value to the test suite.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there only a few records, or do we have no records at all? If we do have a small number of records, we can reduce the page limit and test this specific stream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

window_size implementation fixed the issue, removing clients from untestable_streams().

self.base_url = base_url
self.page_size = page_size
try:
self.window_size = int(float(window_size)) if window_size else DEFAULT_DATE_WINDOW_SIZE

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we do integer typecasting directly instead of, converting to float first?

Copy link
Contributor

@RushiT0122 RushiT0122 May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> int("10.0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '10.0'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the unused import time

@RushiT0122 RushiT0122 merged commit 79a9963 into master May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments