Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingest: Extract ledger entry changes from Tx Meta in a deterministic order #5070

Merged
merged 5 commits into from Oct 6, 2023

Conversation

tamirms
Copy link
Contributor

@tamirms tamirms commented Oct 3, 2023

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

In stellar/stellar-core#3952 we discovered that the tx meta emitted by captive core can appear in a random order. This is problematic for Horizon because we would like ingestion to always be deterministic and reproducible. For some releases we test our ingestion system by ingesting history twice, once with the release candidate and the second time with the previous stable version and then we assert that content produced by both runs are the same. If tx meta can appear in a random order this can result in horizon effects being produced in a different order.

This commit is an attempt to make the ordering of Tx Meta consistent in Horizon ingestion.

Known limitations

stellar/stellar-core#3954 is an attempt to fix this issue in core. Once that change is included in a core stable release we can revert this PR.

@tamirms tamirms requested a review from a team October 3, 2023 08:26
@MonsieurNicolas
Copy link
Contributor

This change is surprising to me: the key order in meta has never been deterministic (since 2015), so what changed in Horizon and/or clients?

If this is it to help with testing, it seems odd to take a dependency on ordering in production?

Also, meta already stored basically everywhere (Horizon DB, core-DB in "legacy mode") is already non-canonical so this change may not be enough.

@MonsieurNicolas
Copy link
Contributor

I mean: if it makes the logic simpler, you can try to normalize meta in some way across all data sources, but I am not sure it's something you really want.

@tamirms
Copy link
Contributor Author

tamirms commented Oct 3, 2023

@MonsieurNicolas

This change is surprising to me: the key order in meta has never been deterministic (since 2015), so what changed in Horizon and/or clients?
If this is it to help with testing, it seems odd to take a dependency on ordering in production?

Having horizon ingestion be deterministic and reproducible makes it a lot easier for us catch bugs in release candidates. The most straightforward way to guarantee that horizon ingestion is deterministic is to ensure that the order of tx meta is deterministic. What would be the alternative approach?

Also, meta already stored basically everywhere (Horizon DB, core-DB in "legacy mode") is already non-canonical so this change may not be enough.

This change will re-sort tx meta before Horizon consumes it in ingestion. The sorting will be applied regardless if the source of the tx meta is captive core or the core-db in legacy mode.

@MonsieurNicolas
Copy link
Contributor

Having horizon ingestion be deterministic and reproducible makes it a lot easier for us catch bugs in release candidates. The most straightforward way to guarantee that horizon ingestion is deterministic is to ensure that the order of tx meta is deterministic. What would be the alternative approach?

Maybe it depends at which layer you test, which is why I was asking about what changed in Horizon: I thought you would just compare ingested state with expected state (so set comparisons) and not really worry about the format of the diff.

@tamirms
Copy link
Contributor Author

tamirms commented Oct 5, 2023

@MonsieurNicolas Horizon stores some ledger entries in its database and those are periodically compared against the ledger entries from history archive snapshots. That comparison is a set comparison like you described (where ordering doesn't matter).

But we also compare horizon effects between the release candidate and the last stable release. The ordering of meta does affect the ordering of horizon effects in some cases.

Here is an example comparing the effects response for a specific operation in horizon staging and horizon production

In staging the effects response looks like:

[
      {
        "id": "0207289144773263361-0000000001",
        "paging_token": "207289144773263361-1",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "trustline_deauthorized",
        "type_i": 24,
        "created_at": "2023-09-24T22:46:03Z",
        "trustor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4",
        "asset_type": "credit_alphanum12",
        "asset_code": "NICKEL"
      },
     ..., // other effects omitted for brevity  
      {
        "id": "0207289144773263361-0000000008",
        "paging_token": "207289144773263361-8",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "claimable_balance_sponsorship_created",
        "type_i": 69,
        "created_at": "2023-09-24T22:46:03Z",
        "balance_id": "00000000362c689c6734eca6de952a0cbb4eaf28545885307e66710747aa03935f446199",
        "sponsor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4"
      },
      {
        "id": "0207289144773263361-0000000009",
        "paging_token": "207289144773263361-9",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "claimable_balance_sponsorship_created",
        "type_i": 69,
        "created_at": "2023-09-24T22:46:03Z",
        "balance_id": "00000000ba96b670dfbd77fe9cf207336439806022dd77dfd1fcdda0f3a8c48559e11dee",
        "sponsor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4"
      },
      {
        "id": "0207289144773263361-0000000010",
        "paging_token": "207289144773263361-10",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "liquidity_pool_removed",
        "type_i": 94,
        "created_at": "2023-09-24T22:46:03Z",
        "liquidity_pool_id": "3e1337116f38a6ff6b4161cb4455ea4f3f2777562e52dbeffebaa0582cc7e561"
      }
    ]

In production the effects response looks like:

[
      {
        "id": "0207289144773263361-0000000001",
        "paging_token": "207289144773263361-1",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "trustline_deauthorized",
        "type_i": 24,
        "created_at": "2023-09-24T22:46:03Z",
        "trustor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4",
        "asset_type": "credit_alphanum12",
        "asset_code": "NICKEL"
      },
      {
        "id": "0207289144773263361-0000000008",
        "paging_token": "207289144773263361-8",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "claimable_balance_sponsorship_created",
        "type_i": 69,
        "created_at": "2023-09-24T22:46:03Z",
        "balance_id": "00000000ba96b670dfbd77fe9cf207336439806022dd77dfd1fcdda0f3a8c48559e11dee",
        "sponsor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4"
      },
     ..., // other effects omitted for brevity  
      {
        "id": "0207289144773263361-0000000009",
        "paging_token": "207289144773263361-9",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "claimable_balance_sponsorship_created",
        "type_i": 69,
        "created_at": "2023-09-24T22:46:03Z",
        "balance_id": "00000000362c689c6734eca6de952a0cbb4eaf28545885307e66710747aa03935f446199",
        "sponsor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4"
      },
      {
        "id": "0207289144773263361-0000000010",
        "paging_token": "207289144773263361-10",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "liquidity_pool_removed",
        "type_i": 94,
        "created_at": "2023-09-24T22:46:03Z",
        "liquidity_pool_id": "3e1337116f38a6ff6b4161cb4455ea4f3f2777562e52dbeffebaa0582cc7e561"
      }
    ]

The responses are almost identical. The only difference is in the claimable_balance_sponsorship_created effects. In horizon staging 00000000362c689c6734eca6de952a0cbb4eaf28545885307e66710747aa03935f446199 is first and is followed by 00000000ba96b670dfbd77fe9cf207336439806022dd77dfd1fcdda0f3a8c48559e11dee. In horizon production the order is reversed: 00000000ba96b670dfbd77fe9cf207336439806022dd77dfd1fcdda0f3a8c48559e11dee is first and is followed by 00000000362c689c6734eca6de952a0cbb4eaf28545885307e66710747aa03935f446199.

After doing some digging, I found that this issue of random tx meta ordering affecting the effects response has occurred before in #3942 . If you look at the conversation in the issue, @2opremio argues that core should provide a deterministic ordering and he was met with objections from Jon Jove who was concerned about the impact of core's performance.

The way we fixed this issue is by adding sorting logic locally in the ingestion processor:

883ccb3

We could fix this new issue in the same way by adding sorting logic locally in this part of the code:

for _, change := range changes {
if err = wrapper.addLedgerEntrySponsorshipEffects(change); err != nil {
return nil, err
}
wrapper.addSignerSponsorshipEffects(change)
}

However, I believe a more robust solution would be sorting tx meta at the source either in core (the ideal solution) or from the ingestion library which reads tx meta from core (which is what this PR does).

// but if that is the case, we fall back to the original ordering of the changes
// by using a stable sorting algorithm.
func sortChanges(changes []Change) []Change {
sort.Stable(newSortableChanges(changes))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since sortableChanges can mutate the underlying array in changes passed from caller as part of Swap method, should a copy of changes be made to avoid side effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the intent was to mutate changes and avoid making a copy of the slice. I can make sortChanges() have a void return value to make that intent more clear

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, that works also, thanks!

@@ -139,6 +139,7 @@ func (r *LedgerChangeReader) Read() (Change, error) {
Post: nil,
}
}
changes = sortChanges(changes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this now results in all changes being (re)sorted, which is great for the goal stated, do you think it may have potential to trigger false warnings from verify-range as that compares history table dumps, at least in the interim while the production dumps are from a horizon release prior to the one having this change, maybe it's just an aspect worth mentioning in the changelog notes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it may have potential to trigger false warnings from verify-range ?

yes, that's true but we actually already have false positives from verify-range due to random ordering of tx meta from core. In fact, I discovered this issue by debugging verify-range failures on the soroban release candidate.

The aim of this PR is to fix this issue for all releases going forward. If we want to compare a release candidate with an older version then we would need to cherry-pick this commit which sorts the ingestion changes deterministically on top of the older version. Then we could run verify-range on the release candidate and compare it against the older version + cherry-picked commit.

Good call about adding this to the changelog notes. I can do that

@MonsieurNicolas
Copy link
Contributor

Thanks @tamirms for the background. I didn't realize this was needed for effects that don't have good global "IDs".
Now that I understand what this is, I think this change looks fine as Horizon may emit effects from basically anything, right?

Soroban-RPC / other ingesters may not need this though (but maybe we have a lot of spare CPU there)?

As far as doing the sort at the core layer: as we'd be looking at some pretty bad migration (to eventually remove this from Horizon), it does not seem to be worth doing at this time (we may revisit this if we do SPVs/parallel execution).

@tamirms
Copy link
Contributor Author

tamirms commented Oct 6, 2023

Now that I understand what this is, I think this change looks fine as Horizon may emit effects from basically anything, right?

Yes, horizon effects try to correlate a meta change to a specific operation. In the cases where there is more than 1 meta change of the same ledger entry type associated to an operation we can ran into ordering issues unless we perform a sort.

Soroban-RPC / other ingesters may not need this though (but maybe we have a lot of spare CPU there)?

Yes, this is true, soroban-rpc does not do anything similar to effects so it does not need tx meta to be ordered consistently

@tamirms tamirms merged commit 4ecb643 into stellar:release-horizon-v2.27.0 Oct 6, 2023
36 checks passed
tamirms added a commit to tamirms/go that referenced this pull request Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants