Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

horizon: Improve performance of migrations/64_add_payment_flag_history_ops.sql #5056

Merged
merged 6 commits into from
Sep 22, 2023

Conversation

tamirms
Copy link
Contributor

@tamirms tamirms commented Sep 19, 2023

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

The horizon /payments endpoint returns all operations that make payments. Prior to protocol 20, a payment operation could be identified by looking solely at the operation type:

https://github.com/sreuland/go/blob/ae4b37a2ada56c57f1034bd8bb7a3c29367d4566/services/horizon/internal/db2/history/operation.go#L242-L246

In protocol 20, invoke host operations which perform transfers on the stellar asset contract are also considered payments. However, not all invoke host operations are considered to be payments (for example deploying a contract has nothing to do with payments).

To identify invoke host operation payments we decided to add a new column is_payment and an index on that column:

https://github.com/sreuland/go/blob/5aaa5c6c9117769294aaf847e6ed9a6aaffc18c4/services/horizon/internal/db2/history/operation.go#L243-L250

is_payment would default to null on all pre-existing rows and the operations processor would set the is_payment column to true for invoke host operations that performed stellar asset contract transfers.

One problem we did not anticipate with this approach is that it would take a long time to run the db migration on a full history DB. Adding the new is_payment column is very quick. However, adding the index takes a long time because Postgres actually indexes null column values (see https://dba.stackexchange.com/questions/210292/does-postgresql-index-null-values ) which means that Postgres would need to write a value to the index for every row in the history_operations table.

This PR modifies the index so that it is a partial index which ignores all null is_payment values. With this modification nothing is stored for all the pre-existing rows in history_operations. However, running the migration still takes 4-5 hours on staging because Postgres still needs to scan each row to determine what should be included in the index (see https://dba.stackexchange.com/questions/234590/add-an-index-on-a-new-column).

Why

The partial index is a significant improvement over the previous migration. The space occupied by the partial index is minimal when compared to the full index. Although the partial index still takes 4-5 hours to construct, it is still significantly faster than the full index because it avoids writing any data to the index.

Known limitations

Although we have ran longer migrations in the past, 5 hours is still a very long time for a db migration. Some Horizon partners do not have a backup cluster and this migration will incur downtime.

There is another way we could solve the problem of including stellar asset contract transfers in /payments which would not require any migrations at all:

We could create a synthetic operation type for the subset of invoke host operations which are payments. The operations processor would examine each invoke host operation and, if the operation performed stellar asset contract transfers, the operations processor would set the operation type to the synthetic constant. Otherwise, the operations processor would set the operation type to xdr.OperationTypeInvokeHostFunction. And when querying for payment operations in the db we would include the synthetic payment type in the sql filter:

func (q *OperationsQ) OnlyPayments() *OperationsQ {
	q.sql = q.sql.Where(sq.Eq{"hop.type": []xdr.OperationType{
		xdr.OperationTypeCreateAccount,
		xdr.OperationTypePayment,
		xdr.OperationTypePathPaymentStrictReceive,
		xdr.OperationTypePathPaymentStrictSend,
		xdr.OperationTypeAccountMerge,
		// StellarAssetContractTransferOperationType is not a real xdr.OperationType
		// it is a synthetic type which represents the subset of invoke host operations
		// which are payments
		StellarAssetContractTransferOperationType,
	}})
	return q
}

This solution is obviously hacky and we would need to ensure that there is no possible future collision between the StellarAssetContractTransferOperationType constant and real xdr.OperationType values. We would also need to make sure that StellarAssetContractTransferOperationType is never exposed in the operations / payments response (whenever a row has an operation type of StellarAssetContractTransferOperationType we would need to rewrite the operation type to xdr.OperationTypeInvokeHostFunction in the response). However, it does save us from having to run a long running migration.

@tamirms tamirms requested a review from a team September 19, 2023 12:25
Comment on lines 4 to 5
CREATE INDEX "index_history_operations_on_is_payment" ON history_operations (is_payment)
WHERE is_payment IS NOT NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth it to benchmark this, because typically a partial index won't even be used by the optimizer if the where conditions in the query don't exactly match those in the index. And it looks like the only place where we're querying this, we do not have an IS NOT NULL clause

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, this should be

CREATE INDEX "index_history_operations_on_is_payment" ON history_operations (is_payment)
    WHERE is_payment = true;

@mollykarcher
Copy link
Contributor

I would agree with you, I think 5 hours is probably unacceptable. Fine for us given blue/green setup, but a lot of partners do upgrades/migrations live.

I think we should maybe take a step back and determine if this index is actually even needed. And if it is, how much of a difference it will really make for this query's performance. If it's worthwhile, then we should revisit your secondary idea of the synthetic operation type, and probably do a larger brainstorming with the team to try and see if we can come up with any other ideas.

Comment on lines 4 to 5
CREATE INDEX "index_history_operations_on_is_payment" ON history_operations (is_payment)
WHERE is_payment = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is our migration utility okay with this? Editing an existing migration file, that is. I'm not familiar with what we're using, but typically most tools will check if the hash of the migraiton files have changed from the time they were executed, and bork at this. If that's the case, and if someone has already run this migration, they might not be able to self-recover. Unfortunately, there might not be any way around that other than potentially outlining clear remediation steps that they need to take.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the migrations library does not check the hash. If someone has already applied the previous migration we will need to ask them to migrate down and then apply this migration

@mollykarcher mollykarcher linked an issue Sep 19, 2023 that may be closed by this pull request
@tamirms tamirms force-pushed the payments-index branch 2 times, most recently from 64ec4b9 to f00ce6b Compare September 22, 2023 09:05
@tamirms
Copy link
Contributor Author

tamirms commented Sep 22, 2023

In light of the investigation documented in #5059 (comment) , I am removing index_history_operations_on_is_payment because it is not actually used in the payments query.

@tamirms
Copy link
Contributor Author

tamirms commented Sep 22, 2023

@mollykarcher please take another look, I have removed the index.

Comment on lines -4 to -8
CREATE INDEX "index_history_operations_on_is_payment" ON history_operations USING btree (is_payment);

-- +migrate Down

DROP INDEX "index_history_operations_on_is_payment";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but let's just make sure we deploy/test this to testnet staging (ie. somewhere where this migration has already been run) to double check there are no issues with it

@tamirms tamirms merged commit 5bfe563 into stellar:release-horizon-v2.27.0 Sep 22, 2023
36 checks passed
@tamirms tamirms deleted the payments-index branch September 22, 2023 14:57
tamirms added a commit to tamirms/go that referenced this pull request Dec 11, 2023
…y_ops.sql (stellar#5056)

Remove index_history_operations_on_is_payment because it is not actually used in the payments query.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

services/horizon: fix long-running payment migration
2 participants