Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add getEvents backed by DB #215

Open
wants to merge 44 commits into
base: events-db-backend
Choose a base branch
from
Open

Conversation

psheth9
Copy link
Contributor

@psheth9 psheth9 commented Jun 13, 2024

Changes:

  • Fetch events from DB
  • Parse LCM on the fly with the use of stored events metadata
  • Remove in-memory events store

Pending Changes:

  • Migration
  • Benchmarking storage

Design Doc here
Parent Epic #115

Fixes: #190

@psheth9 psheth9 mentioned this pull request Jun 13, 2024
Base automatically changed from ingest-events-in-db to events-db-backend June 18, 2024 04:42
@psheth9
Copy link
Contributor Author

psheth9 commented Jun 18, 2024

ledgerCloseMetaWithEvents in get_events_test does not seem to work during DB ingestion.

Error logs:
unknown tx hash in LedgerCloseMeta: 4705598f90aa1c2affc48ffb10f42ab1a4d9545fcb8ed0adfd6b2ae141aa1c21

But also at the same time txMetaWithEvents in transaction_test works fine for event ingestion so it has to do something with the mocked LCM object itself.

@Shaptic
Copy link
Contributor

Shaptic commented Jun 18, 2024

@psheth9 you're changing the hash of the transaction when you inject events into it after creation which makes it invalid

@psheth9
Copy link
Contributor Author

psheth9 commented Jun 20, 2024

you're changing the hash of the transaction when you inject events into it after creation which makes it invalid

This is not the case as I am not touching any fields related to hash. Also ledgerCloseMetaWithEvents is being used everywhere (in old code) to validate get_events RPC so this should work automatically with new code. Unfortunately it does not :(

@psheth9 psheth9 marked this pull request as ready for review June 20, 2024 15:24
@psheth9 psheth9 changed the title [DRAFT] Add getEvents backed by DB [WIP] Add getEvents backed by DB Jun 20, 2024
@psheth9 psheth9 requested a review from Shaptic June 20, 2024 15:37
@psheth9 psheth9 self-assigned this Jun 21, 2024
Comment on lines 192 to 205

eventMigrationName := "EventsTable"
eventFactory := newEventTableMigration(
ctx,
logger.WithField("migration", eventMigrationName),
cfg.EventLedgerRetentionWindow,
cfg.NetworkPassphrase,
)
m2, err := newGuardedDataMigration(ctx, eventMigrationName, eventFactory, db)
if err != nil {
return nil, fmt.Errorf("creating guarded transaction migration: %w", err)
}

return multiMigration{m1, m2}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider creating a loop with the migration above

@@ -333,6 +359,16 @@ func (h eventsRPCHandler) getEvents(request GetEventsRequest) (GetEventsResponse
limit = request.Pagination.Limit
}
}
end := events.Cursor{Ledger: request.StartLedger + ledgerbucketwindow.OneDayOfLedgers}
Copy link
Contributor

@2opremio 2opremio Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for limiting the maximum number of ledgers to query? If so it should probably be parameterized in the command-line like we do for transactions or at the very very least it should be commented and a constant should be used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a day worth of ledgers can be quite a lot. We may need to rethink this based on how expensive it is to query events.

Copy link
Contributor

@2opremio 2opremio Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to limit by number of events and not ledgers

Copy link
Contributor Author

@psheth9 psheth9 Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

limiting by events still won't solve the issue with current filters. We need to restrict scanning of ledgers for given request.

Comment on lines 385 to 398
f := func(event xdr.DiagnosticEvent, cursor events.Cursor, ledgerCloseTimestamp int64, txHash *xdr.Hash) bool {
cursorID := cursor.String()
if _, exists := cursorSet[cursorID]; exists {
return true
}

if request.Matches(event) && cursor.Cmp(start) >= 0 {
cursorSet[cursorID] = struct{}{}
found = append(found, entry{cursor, ledgerCloseTimestamp, event, txHash})
}
return uint(len(found)) < limit
}

err = h.dbReader.GetEvents(ctx, cursorRange, contractIDs, f)
Copy link
Contributor

@2opremio 2opremio Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about the efficiency of this. When events were stored in-memory it was probably fine to scan through a potentially large range of events and filter them by applying a function to all of them.

I suspect that the same approach can be prohibitive when moving the events to a database and extending the default history to 7 days.

It's good that we filter by contract ID on the database but to ensure this performs decently we may need to either:

  1. Be more restrictive (e.g. enforce providing contract IDs, with a cap on how many contracts to provide)
  2. Move more of the filtering to the DB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless I would like to see a benchmark for event querying

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this should never scan 7 days of history as we are restricting fetch query by this

Some context around Ledger restriction for given request here in design doc

Ledger restriction parameter (N) should be more explicit and ideally should be part of the config. so for any given get request we will only look at start: start + N ledger. Also it might be also useful to include lastProcessedLedger as a part of response so client understands what to use as startLedger in follow up query. I will add this change once we agree.

I also liked the idea of making filters more restrictive because current filters are more loose ( contract_ids, event_type, topics all are optional) but this can be a additive change along with ledger restriction part.

Copy link
Contributor

@2opremio 2opremio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets us much closer! I did leave quite a few comments. The most important ones concern the data format stored in the DB and the scanning approach.

@psheth9 psheth9 requested a review from tamirms June 25, 2024 17:13
@psheth9 psheth9 added the api-change Any changes to existing API or addition of new API label Jun 26, 2024
psheth9 and others added 12 commits June 26, 2024 16:58
* Unify transaction and event retention windows

Use a new, `unified history-retention-window` flag and mark the event/transaction window
configuration parameters as deprecated.

Until we remove the event/transaction flag, `history-retention-window` will be initialized to
the maximum of the three.

* Appease the linter
* Add LedgerRangeReader interface

* Add GetLedgerRange implementation for ledgers table

* Use GetLedgerRange from the ledgers table for getHealth and getFeeStats

* Change varible name

* Add GetLedgerRange method for transactions table

* Add test for GetLedgerRange for ledgers table

* Add GetLedgerRange in ConstantLedgerReader

* Add test for GetLedgerRange in transactions table

* Fix nil pointer bugs in getTransactions

* Remove latest ledger assertion

* Comment out the latest ledger assertion

* Remove GetLedgerRange from meta table and use ledgerRangeGetter for getHealth and getFeeStats

* Remove ledgerCloseTime

* Revert newline change

* Revert

* Remove assertions

* revert

* Change interface name

* insert txns during integration test setup - 1

* insert txns during integration test setup - 2

* insert txns during integration test setup - 3

* Fix linting errors

* Fix linting errors - 2

* Fix linting errors - 3

* Fix linting errors - 4

* Fix linting errors - 5

* Revert

* Revert-2

* change camel-case naming

* change camel-case naming - 2

* Fix linting errrors - 6

* Simplify ledger range query

* Simplify ledger range query - 2

* Simplify ledger range query - 3

* Add benchmarking for GetLedgerRange

* Fix linter issues - 6

* Remove else condition

* Optimise the GetLedgerRange query

* Fix intrange linter

* Use require.NoError

* Move comment to definition

* Fix intrange linter - 2

* Fix nomnd linter

* Fix intrange linter

* Remove db/util.go and add txMeta methods to infrastructure

* forgot to gci files again :/

* Remove migration FIXME

* Fix linter checks

* Add migration for lcm sequence index

* Revert transaction_test.go changes

* Add newline

* Add GetLedgerRange implementation in meta table

* Use new GetLedgerRange for getHealth and getFeeStats

* Add GetLedgerRange to ConstantLedgerReader

* Remove GetLedgerRange from transactions code and use the meta table one

* Remove unnecessary file changes

* Fix linting errors

* Fix linting errors - 2

* Fix linting errors - 3

* Fix linting errors - 4

* Remove unnecessary constants

* Use sq.Select instead of sq.Expr

* Add nolint

* Add nolint - 2

* Handle single row case in ledger range

* Remove index migration

* Order the ledger results in ASC order

* exchange the expected and actual values in assert
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Any changes to existing API or addition of new API
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

None yet

5 participants