Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TDL-19319 Make replication key as automatic inclusion #195

Merged

Conversation

prijendev
Copy link
Contributor

Description of change

  • Replaced replication key of deals stream with property_hs_lastmodifieddate field.
  • Updated replication method of deals stream in the tap to INCREMENTAL.
  • Updated replication key in the base.py of tap-tester.
  • Updated automatic_field and start_date tap tester test case.

Manual QA steps

Risks

Rollback steps

  • revert this branch

prijendev and others added 4 commits June 6, 2022 14:41
* Added versionTimestamp in contacts stream.

* Updated bookmark test case.

* Updated comments.

* Updated replication method of contacts stream.
* Initial commit for request_timeout

* Updated unit test case error

* Removed path of unittest file

* Updated config file

* Updated readme file
@prijendev prijendev changed the title Initial commit. TDL-19319 Make replication key as automatic inclusion Jun 7, 2022
@@ -599,14 +599,23 @@ def sync_companies(STATE, ctx):
def has_selected_custom_field(mdata):
top_level_custom_props = [x for x in mdata if len(x) == 2 and 'property_' in x[1]]
for prop in top_level_custom_props:
if mdata.get(prop, {}).get('selected') == True:
if (mdata.get(prop, {}).get('selected') == True) or (mdata.get(prop, {}).get('inclusion') == "automatic"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return True if the custom field is automatic.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prijendev can you add this statement as the Code comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment in the code.

@@ -599,14 +599,23 @@ def sync_companies(STATE, ctx):
def has_selected_custom_field(mdata):
top_level_custom_props = [x for x in mdata if len(x) == 2 and 'property_' in x[1]]
for prop in top_level_custom_props:
if mdata.get(prop, {}).get('selected') == True:
if (mdata.get(prop, {}).get('selected') == True) or (mdata.get(prop, {}).get('inclusion') == "automatic"):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prijendev can you add this statement as the Code comment

@@ -105,7 +105,7 @@ def expected_metadata(self): # DOCS_BUG https://stitchdata.atlassian.net/browse
"deals": {
self.PRIMARY_KEYS: {"dealId"},
self.REPLICATION_METHOD: self.INCREMENTAL,
self.REPLICATION_KEYS: {"hs_lastmodifieddate"},
self.REPLICATION_KEYS: {"property_hs_lastmodifieddate"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this include the property_ prefix from the denesting method? None of the other replicaiton keys have this. And I think it will break every customer's connection that is replicating this stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, hs_lastmodifieddate is the bookmark key for the deals stream. hs_lastmodifieddate is available in the properties field at the nested level.
For example, the response structure:

{
    "dealId": 1,
    "properties": {
        "hs_lastmodifieddate": 1653590080602
    }
}

As hs_lastmodifieddate is not available at the 1st level, it can not be marked as automatic inclusion and we can not select that particular field even. Also, tap includes all nested fields of the properties field as custom fields in the schema by appending the prefix property_ along with each field.

For example, the schema of the deals stream looks like as below,

{
    "dealId": {"type": ["null", "integer"]},
    "properties": {"type": "object", "properties": {
            "hs_lastmodifieddate": {},
            "hs_likelihood_to_close": {}
        }
    },
    "property_hs_lastmodifieddate": {},
    "property_hs_likelihood_to_close": {}
}

That's why we are just marking property_hs_lastmodifieddate as the inclusion of automatic. So, I am not sure in which case it will break the customer's connection because we have not removed or updated any fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this nested structure is true of the other streams as well, but companies for example denests the key and does not prefix property_ onto the key. I think it will break downstream for customers because the value of the replication key is changing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kspeer825 We have changed the bookmark key just for the deals stream. This change will not affect any other streams.
If bookmark(last_modifieddate) is already available for the deals stream, then tap will use only that value for the first time. (#196 )
After completing sync, the tap will write an updated bookmark in the new bookmark key(i.e. property_lastmodified_date).
We haven't removed the existing bookmark as well, so in case we need to revert back, we can able to.
After sync, the bookmark will look like the below.

  "bookmarks": {
        "stream_id_1": {
            "old_bookmark": "2022-04-08T00:00:00",
            "current_bookmark": "2022-04-15T00:00:00"
        }
    }

We have incorporated this change to fix the customer's issue. So, @KrisPersonal shall we plan an alpha release first for the particular customer?

@prijendev prijendev requested a review from kspeer825 June 8, 2022 13:47
@prijendev prijendev changed the base branch from crest-master to master June 15, 2022 06:41
@@ -69,7 +69,7 @@ def test_run(self):
expected_keys = self.expected_automatic_fields().get(stream)

# BUG_TDL-9939 https://jira.talendforge.org/browse/TDL-9939 Replication keys are not included as an automatic field for these streams
if stream in {'companies', 'deals', 'subscription_changes', 'email_events'}:
if stream in {'companies', 'subscription_changes', 'email_events'}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this change for the other streams with replication keys that are missing automatic inclusion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kspeer825 Yes, the replication key is not automatic for companies, subscription_changes, and email_events streams.

  1. The companies stream has the same issue(link) of the replication key as the deals stream. So, can we create a card to resolve this issue and work on that?
  2. subscription_changes and email_events streams are retrieving records in the date window. These streams are saving the start date of the last window as a bookmark value in startTimestamp field.
    email_events stream has a created field in the API response and subscription_changes has a timestamp field in the response that we may use as a replication key for the respective stream. But, we are not sure whether filter parameters(Start timestamp, End timestamp) are following these keys or not as we can't find any doc stating the same, so we asked a question for the same in the community(link).
    Possible solutions:
    1. We can add a new startTimestamp field in the record explicitly and mark that field as an automatic to keep current behavior as is.
    2. Update the bookmark key for email_events and subscription_changes as created and timestamp respectively if we got a positive reply from the support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Sounds good
  2. Are you able to verify the usage of these keys by exploratory testing? If you have a record before the StartTimestamp, another record after the EndTimestamp, and another record between those two I think you'd answer that question. I supposed you'd actually need two more records at either of those fence posts to see if the API treats either inclusively.

Copy link
Contributor Author

@prijendev prijendev Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per discussion with @cosimon , we concluded to make a change in the replication key just for the deals stream and company stream as this problem is for 2 streams only. We are not making any changes in subscription_changes and email_events as of now.

@prijendev prijendev requested a review from kspeer825 June 29, 2022 14:42
Copy link
Contributor

@kspeer825 kspeer825 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

prijendev and others added 3 commits July 29, 2022 11:16
* Clearing bookmark for existing connection.

* Updated bookmark logic to utilize older bookmark key.

* Updated logic to keep older bookmark key in the state.

* Added unittest cases.

* Updated unit test cases.

* Resolved review comments.
prijendev and others added 2 commits September 8, 2022 20:44
* fixed pylint issues TDL-15447

* changing methods to functions since there is no-self-use

* pylint issues

* more pylint issues

* added pylint disable comment for some scenarios

* remove duplicate-code from config.yaml

* pylint comment for test_bookmarks

* remove duplicate code

* removed wrong-import-order
@kethan1122 kethan1122 merged commit 681bc8f into master Sep 13, 2022
@HarrisonMarcRose HarrisonMarcRose deleted the TDL-19319-make-replication-key-as-automatic-inclusion branch October 20, 2022 15:16
Marcos314 added a commit to gupy-io/tap-hubspot that referenced this pull request May 11, 2023
* Qa/save test artifacts (singer-io#201)

* add logging and save test logs as artifacts

* name should be static method so it plays well with BaseCase

Co-authored-by: kspeer <kspeer@stitchdata.com>

* skip fields missing from schema for contact_lists (singer-io#202)

Co-authored-by: kspeer <kspeer@stitchdata.com>

* add limitExempt to all fields test workaround (singer-io#203)

Co-authored-by: kspeer <kspeer@stitchdata.com>

* TDL-19319 Make replication key as automatic inclusion (singer-io#195)

* TDL 19241 add version timestamp in contacts (singer-io#191)

* Added versionTimestamp in contacts stream.

* Updated bookmark test case.

* Updated comments.

* Updated replication method of contacts stream.

* TDL 16686 Implement request timeout (singer-io#177)

* Initial commit for request_timeout

* Updated unit test case error

* Removed path of unittest file

* Updated config file

* Updated readme file

* Initial commit.

* Updated interruptes sync offset test case

* Resolved comments.

* Replaced replication key of deals stream with property_hs_lastmodifieddate field.

* Updated discover test case.

* Updated ingterrupted_sync test case.

* Utilize bookmark of existing connection (singer-io#196)

* Clearing bookmark for existing connection.

* Updated bookmark logic to utilize older bookmark key.

* Updated logic to keep older bookmark key in the state.

* Added unittest cases.

* Updated unit test cases.

* Resolved review comments.

* Updated get_start to use old bookmark for companies stream  method.

* Tdl 15447 fix pylint issues (singer-io#204)

* fixed pylint issues TDL-15447

* changing methods to functions since there is no-self-use

* pylint issues

* more pylint issues

* added pylint disable comment for some scenarios

* remove duplicate-code from config.yaml

* pylint comment for test_bookmarks

* remove duplicate code

* removed wrong-import-order

Co-authored-by: kethan1122 <105211331+kethan1122@users.noreply.github.com>

---------

Co-authored-by: Kyle Speer <54034650+kspeer825@users.noreply.github.com>
Co-authored-by: kspeer <kspeer@stitchdata.com>
Co-authored-by: Prijen Khokhani <88327452+prijendev@users.noreply.github.com>
Co-authored-by: kethan1122 <105211331+kethan1122@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants