TDL-17000: fix breaking changes and TDL-17017: Fix bookmark for conversation_parts stream. #45

Merged: 21 commits merged into crest-master on Jan 5, 2022

Conversation

@savan-chovatiya savan-chovatiya commented Dec 30, 2021

Description of change

TDL-17000: fix breaking changes

  • Updated epoch_milliseconds_to_dt_str() function to convert epoch to UTC time.
  • Added time_extracted to write_record function.
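
The two changes above can be illustrated with a minimal standard-library sketch. It is not the tap's code: the body of `epoch_milliseconds_to_dt_str()` and the `build_record_message()` helper are assumptions made for illustration, and the output format string is assumed as well.

```python
from datetime import datetime, timedelta, timezone

# Assumed output format; the tap may format timestamps differently.
DT_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_milliseconds_to_dt_str(epoch_ms: int) -> str:
    """Convert epoch milliseconds to a UTC timestamp string.

    Integer timedelta arithmetic avoids both float rounding and any
    dependence on the machine's local timezone.
    """
    return (EPOCH + timedelta(milliseconds=epoch_ms)).strftime(DT_FORMAT)


def build_record_message(stream: str, record: dict, time_extracted=None) -> dict:
    """Hypothetical helper: build a Singer-style RECORD message that
    carries a time_extracted value (defaults to the current UTC time)."""
    extracted = time_extracted or datetime.now(timezone.utc)
    return {
        "type": "RECORD",
        "stream": stream,
        "record": record,
        "time_extracted": extracted.strftime(DT_FORMAT),
    }
```

For example, `epoch_milliseconds_to_dt_str(1640995200000)` corresponds to midnight UTC on January 1, 2022, regardless of the timezone of the machine running the tap.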

TDL-17017: Fix bookmark for conversation_parts stream.

  • Inherited ConversationParts from BaseStream and added custom sync and get_records methods, which work as follows:

    • All conversation_parts are emitted for the parent records (conversations) returned based on the start date/bookmark.
    • The bookmark is updated with the parent's replication key after the child records for one parent have been collected.
  • Updated the search query for the conversations stream to fetch records whose replication key is greater than or equal to the bookmark, because the last stable version (1.1.3) also included records whose replication key equals the bookmark value.
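
A rough sketch of the parent/child flow described above, under stated assumptions: the function name, the flat `state` dictionary, the `fetch_parts` and `emit` callables, and the `updated_at` key are all hypothetical stand-ins, not the tap's actual API.

```python
from typing import Callable, Dict, Iterable, List


def sync_conversation_parts(
    conversations: Iterable[dict],
    fetch_parts: Callable[[dict], List[dict]],
    state: Dict[str, int],
    emit: Callable[[dict], None],
) -> Dict[str, int]:
    """Emit child records per parent and bookmark after each parent."""
    bookmark = state.get("conversation_parts", 0)
    # Assumption: parents are processed in ascending replication-key order,
    # so an interrupted run can resume safely from the stored bookmark.
    for parent in sorted(conversations, key=lambda c: c["updated_at"]):
        if parent["updated_at"] < bookmark:
            continue  # already covered by a previous sync
        for part in fetch_parts(parent):
            emit(part)
        # Advance the bookmark only after all parts of this parent are out.
        state["conversation_parts"] = max(
            state.get("conversation_parts", 0), parent["updated_at"]
        )
    return state
```

Because the comparison is "greater than or equal", a parent whose replication key equals the bookmark is synced again on the next run; that trades a few duplicate records for not missing any.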

Manual QA steps

  • Verified that the bookmark is written as a UTC timestamp.
  • Verified that all records are emitted with the time_extracted field.
  • Verified that the bookmark is written correctly after syncing the conversation_parts of each conversation, using the parent's updated_at value.

Risks

Rollback steps

  • revert this branch

@savan-chovatiya savan-chovatiya changed the base branch from master to crest-master December 30, 2021 10:30
tap_intercom/streams.py (outdated)
"""
# Get bookmark of parent stream `conversations` from tap_state
parent_bookmark = singer.get_bookmark(tap_state,
self.parent.tap_stream_id,

This approach isn't in line with our best practices. All streams should bookmark independently. State dependencies like this cause code that is very hard to read and maintain (as seen with the duplicate concept of "tap_state" versus "state" here).

So, the conversation_parts will need a custom sync method regardless of its approach so it can make its own way through the parent stream and store its own bookmarks as it needs.

Additionally, the parent should be rather low volume (I'd expect below 100k), so any optimization gained from this approach is lost in code complexity.
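
To make the "bookmark independently" point concrete, here is a small sketch of a Singer-style state in which each stream owns its own entry. The two helpers are simplified standard-library stand-ins written for this example, not the singer-python functions themselves, and the `updated_at` key is an assumption.

```python
def get_bookmark(state: dict, stream: str, key: str, default=None):
    """Read one stream's bookmark without touching any other stream."""
    return state.get("bookmarks", {}).get(stream, {}).get(key, default)


def write_bookmark(state: dict, stream: str, key: str, value) -> dict:
    """Write one stream's bookmark; other streams' entries are untouched."""
    state.setdefault("bookmarks", {}).setdefault(stream, {})[key] = value
    return state


state = {}
write_bookmark(state, "conversations", "updated_at", "2022-01-01T00:00:00Z")
write_bookmark(state, "conversation_parts", "updated_at", "2021-12-30T00:00:00Z")
# Each stream resumes from its own value, even though both values
# originate from the parent's replication key.
```

With this layout, deselecting or resetting `conversations` has no effect on where `conversation_parts` resumes.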

@@ -373,7 +377,7 @@ def get_records(self, bookmark_datetime=None, is_parent=False) -> Iterator[list]
yield from records


-class ConversationParts(Conversations):
+class ConversationParts(FullTableStream):

As a design consideration, this stream is a sort of custom concept, and I'd expect that it can probably inherit directly from the BaseStream class to do its custom bookmarking independent of the other more straightforward sync logic.

I would also still call this an incremental stream, as it bookmarks on its parent's replication key. This approach is just a more coarse-grained incremental extraction, but it's closer to incremental than full table. It was mislabeled as full table in the past, which caused this confusion in the first place.

We updated conversation_parts sync as per suggestion. Inherited ConversationParts from BaseStream and kept it incremental by writing a custom sync which will bookmark on its parent replication key.

'operator': '=',
'value': self.dt_to_epoch_seconds(bookmark_datetime)
}]
},

Updated the search query for the conversations stream to fetch records greater than or equal to the bookmark from the API, the same as v1.1.3.
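
A hedged sketch of what a "greater than or equal to" search payload might look like, extrapolated from the `'operator': '='` fragment above. The function name, the `updated_at` field, and the outer `OR` structure are assumptions for illustration; the tap's real payload may differ and may include pagination and sort options.

```python
def build_conversation_search_query(
    bookmark_epoch_seconds: int, field: str = "updated_at"
) -> dict:
    """Combine '>' and '=' under OR to express '>=' on the bookmark."""
    return {
        "query": {
            "operator": "OR",
            "value": [
                {"field": field, "operator": ">", "value": bookmark_epoch_seconds},
                {"field": field, "operator": "=", "value": bookmark_epoch_seconds},
            ],
        }
    }
```

Including the `=` clause means records that share the bookmark's exact timestamp are fetched again rather than skipped.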

@savan-chovatiya savan-chovatiya changed the title from "TDL-17017: Fix bookmark for conversation_parts stream." to "TDL-17000: fix breaking changes and TDL-17017: Fix bookmark for conversation_parts stream." on Dec 31, 2021
@dmosorast dmosorast left a comment

Just a few minor changes and cleanup, I think overall this looks good.

@dmosorast dmosorast left a comment

Looks good to me!

@hpatel41 hpatel41 merged commit 159281d into crest-master Jan 5, 2022
@hpatel41 hpatel41 mentioned this pull request Jan 5, 2022
KrisPersonal pushed a commit that referenced this pull request Jan 5, 2022
* TDL-17000: fix breaking changes and TDL-17017: Fix bookmark for conversation_parts stream. (#45)

* Reverted conversation_parts to full_table for parent as V1.1.3

* Updated bookmark test to handle conversation_parts

* Updated comments

* Added code to select parent_stream if only child stream is selected same as v1.1.3

* Added code comments

* Resolved review comment

* Resolved review comment

* Added code comments

* Added bookmark mechanism for conversation_parts same as before but independent of conversations

* Added code comments

* Added query parameter to get conversations greater than or equal to bookmark

* Adding changes of PR42 for better unit tests

* Added logger messages

* Added unit test

* Updated unit test

* Updated unit test

* Updated variables in unit test

* Added time_extracted for conversation_parts

* Resolved review comments

* Added companies to untestable for bookmark test

* Reverted changes of last commit

* Tdl 17003 Added back missing field for contact streams (#43)

* Reverted the logic as per old version

* Removed leads as well as it is not supported by tap

* Tdl 17006 revert back companies to incremental (#44)

* Reverted companies to incremental stream

* Debugging integration test

* Removed print statement

* Added companies to untestable for bookmark test

* TDL-17002: Make the logger messages more descriptive (#46)

* added logger messages

* removed unnecessary loggers

* removed unnecessary loggers

* updated the code to use stream name instead of parent stream name

Co-authored-by: savan-chovatiya <80703490+savan-chovatiya@users.noreply.github.com>
Co-authored-by: Umang Agrawal <80704207+umangagrawal-crest@users.noreply.github.com>
This pull request was closed.
5 participants