-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace "type" index with multi-column index on "id" and "type" #69
base: main
Are you sure you want to change the base?
Conversation
It may not be used by the framework but I do tend to find querying on type useful for ad-hoc diagnostics. Is that enough to push to having multiple indexes or would you suggest we leave it up to the consumer of the gem as to how they index their DB tables in that case? |
@scottyp-env do you use the I think it would be better if the gem will create indexes that it requires and leave the rest to the user. for example, I have found it to be useful to have an index for "id" and "created_at" for debugging events but I don't expect the gem to implement that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your reasoning sounds good @daniel-niknam-envato!
@@ -38,7 +38,7 @@ def create_events(db: EventSourcery::Postgres.config.event_store_database, | |||
column :created_at, :'timestamp without time zone', null: false, default: Sequel.lit("(now() at time zone 'utc')") | |||
index [:aggregate_id, :version], unique: true | |||
index :uuid, unique: true | |||
index :type | |||
index [:id, :type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will be affected by this change?
I did a quick search of the create_events
and create_event_store
methods in this repo but I only found their usage in the benchmarking scripts and test setup. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the above index will have impact on EventSourcery::Postgres::EventStore#get_next_from method performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that adding the index to each application's events table will improve the performance of the get_next_from
method.
My questions is more about what is depending on this schema file and what will be benefited from this change.
In the changelog, it is mentioned that this change doesn't affect existing projects. That made me wonder what is the outcome of this change.
If someone creates a new Event Sourcery project, will the latest schema from this file be used somehow?
CHANGELOG.md
Outdated
@@ -15,10 +16,27 @@ and this project adheres to [Semantic Versioning](http://semver.org/). | |||
- Use GitHub Actions for the CI build instead of Travis CI ([#66]). | |||
- This project now uses `main` as its default branch ([#68]). | |||
- Documentation updated to refer to `main` and links updated accordingly. | |||
- The `events` table `type` modified to include both `id` and `type` ([#69]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of an item in the "Changed" section, how about splitting it into two items?
- One item in the "Added" section for the addition of the new composite index
- Another item in the "Removed" section for the removal of the
type
index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good thinking. I thought it would be easier to follow when everything is in one place.
4d405fd
to
cc78aba
Compare
I'm investigating further, but I think the proposed index could be vastly improved by swapping the order of the columns. The PostgreSQL documentation on multi-column indexes notes:
The context I'm investigating involves a database which has the
Given that gap, something is running the following query, which utilizes an index scan:
In this context, In the next couple of days, I'm looking to verify that replacing the |
I need to verify further (to make sure it wasn't influenced by the process of generating the indexes), but initial results look promising:
|
I took a snapshot of the DB with the new index, restored it as a new DB, and re-ran the query. It was slower, but was still a significant improvement from the
For the database I'm investigating, I'll be proposing:
This new index has the following properties:
As noted by others in this PR, this is a form of stopgap measure, but it does help in scenarios where events aren't seen for significant periods of time. |
Context
With the current design, every time a new event is stored in the database, every projector/reactor will receive a signal from PostgreSQL and eventually they will query the database for new events despite the fact that the new event could be irrelevant to them. As a result, queries like this:
will be executed multiple times. Fixing this issue won't be very simple and requires changes in both event sourcery and this gem. However, we could modify the existing index on the
events.type
column and make it a multi-column index on bothevents.id
andevents.type
. Although this change will not fix the issue entirely, it will improve the event sourcery performance significantly.Adding the above index will have impact on EventSourcery::Postgres::EventStore#get_next_from method performance.
Recent investigation
In the recent investigation, we've found out that having a multi-column index for both
events.id
andevents.type
columns on the events table could improve performance significantly. We processed around 3M events on the same AWS RDS instance multiple times, here is the result:Process duration before adding the index: 264 minutes
Process duration after adding the index: 103 minutes
Questions
What needs to be done for existing apps?
We can use gemspec post-install message to notify the existing applications about the change. At the moment, the event-sourcery gem does not provide a "migration" command so it would be better to help them by providing a SQL query for modifying the index. The documentation regarding creating a new index exists in the CHANGELOG.md file and that could be enough too.
Would this change would solve all performance issues?
No. There are other parts of the system that should be fixed or improved. For example, I can think of the following ideas:
Why not just add another index?
I was not able to find a query for the
events.type
column alone so I don't see any benefit for adding a new index when the existing one is not used.