Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release/32 #62

Merged
merged 45 commits into from
Jan 28, 2022
Merged

Release/32 #62

merged 45 commits into from
Jan 28, 2022

Conversation

DownstreamDataTeam
Copy link
Contributor

@DownstreamDataTeam DownstreamDataTeam commented Dec 8, 2021

Description of change

Started refactoring the Mambu Tap, only for Loan Accounts (for the moment).
We decided on refactoring due to the nature of the changes we would need to make in order to fix a bug in Loan Accounts.
The problem there was that accrued_interest is changed constantly, but records are filtered only by modified_date (which does not update on accrued_interest changes). As such, the only solution is filtering by modified_date OR account_appraisal_date.

Manual QA steps

  • Checked tap output to be identical to what it was before, only Loan Accounts was changed (filter by ModifiedDate OR AppraisalDate)

Risks

Rollback steps

  • revert this branch

@cmerrick
Copy link
Contributor

cmerrick commented Dec 8, 2021

Hi @DownstreamDataTeam, thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.

…kback_window for the loan_transactions stream
…/32'

[ECDDC-500] Lookback window bug fix

See merge request mambucom/product/ecosystem/mambu-marketplace/connectors/singer/tap-mambu!10
…'release/32'"

This reverts merge request !10
@dmosorast
Copy link
Contributor

I was able to give this PR a look over today, and I think it seems good, structurally. The refactoring pattern of "Generator" and "Processor" isn't something we've seen before, but makes sense. The way I'm reading this is that you're trying to set up the tap extraction as a sort-of pipeline in itself to give some flexibility with handling certain "types" of each stream (Please let me know if that's not a correct understanding).

I'm not ready to give a deep PR review line-by-line, but I was able to run through the tap-tester tests today, and it looks like they're failing with regard to the loan_repayments stream.

Tests test_start_date.StartDateTest, test_pagination.PaginationTest, and test_sync_canary.SyncCanaryTest failed because the loan_repayments stream did not emit any records.

Test test_automatic_fields.AutomaticFieldsTest failed with a KeyError on what looks like the PK for the stream. Here's the stacktrace, hope it's helpful for your review:

2021-12-21 22:03:19,823Z    tap - CRITICAL 'encoded_key'
2021-12-21 22:03:19,826Z    tap - Traceback (most recent call last):
2021-12-21 22:03:19,826Z    tap -   File "/usr/local/share/virtualenvs/tap-mambu/bin/tap-mambu", line 33, in <module>
2021-12-21 22:03:19,826Z    tap -     sys.exit(load_entry_point('tap-mambu', 'console_scripts', 'tap-mambu')())
2021-12-21 22:03:19,826Z    tap -   File "/usr/local/share/virtualenvs/tap-mambu/lib/python3.8/site-packages/singer/utils.py", line 229, in wrapped
2021-12-21 22:03:19,826Z    tap -     return fnc(*args, **kwargs)
2021-12-21 22:03:19,826Z    tap -   File "/opt/code/tap-mambu/tap_mambu/__init__.py", line 50, in main
2021-12-21 22:03:19,826Z    tap -     sync(client=client,
2021-12-21 22:03:19,826Z    tap -   File "/opt/code/tap-mambu/tap_mambu/sync.py", line 905, in sync
2021-12-21 22:03:19,827Z    tap -     total_records = sync_endpoint_refactor(
2021-12-21 22:03:19,827Z    tap -   File "/opt/code/tap-mambu/tap_mambu/tap_mambu_refactor/__init__.py", line 29, in sync_endpoint_refactor
2021-12-21 22:03:19,827Z    tap -     return processor.process_streams_from_generators(generators=generators)
2021-12-21 22:03:19,827Z    tap -   File "/opt/code/tap-mambu/tap_mambu/tap_mambu_refactor/TapProcessors/processor.py", line 58, in process_streams_from_generators
2021-12-21 22:03:19,827Z    tap -     if min_record_value == self.generator_values[iterator][self.deduplication_key]:
2021-12-21 22:03:19,827Z    tap - KeyError: 'encoded_key'

Copy link
Contributor

@dmosorast dmosorast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took so long, I think I understand the gist and this looks good to me. Just a few comments of small potential for cleanup. I'm assuming that in general, the code will get cleaner as you move streams over to the generator/processor style. I'll start looking at what we need to get it released.

else:
self.generators[0].max_bookmark_value = transformed_record[bookmark_field]

if bookmark_field and (bookmark_field in transformed_record):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be merged with the above if block.

if self.last_batch_size < self.limit:
raise StopIteration()
self.offset += self.limit
# self.write_bookmark()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be deleted. Since the sorting parameter is id and the bookmark key is a datetime, bookmarking at this point seems like it wouldn't work out.

@dmosorast dmosorast merged commit 7c883aa into singer-io:master Jan 28, 2022
@singer-bot
Copy link

You did it @DownstreamDataTeam!

Thank you for signing the Singer Contribution License Agreement.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants