Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace "type" index with multi-column index on "id" and "type" #69

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

daniel-niknam-envato
Copy link
Contributor

@daniel-niknam-envato daniel-niknam-envato commented Oct 27, 2021

Context

With the current design, every time a new event is stored in the database, every projector/reactor will receive a signal from PostgreSQL and eventually they will query the database for new events despite the fact that the new event could be irrelevant to them. As a result, queries like this:

SELECT * FROM "events" WHERE ((id >= ?) AND ("type" IN (?))) ORDER BY "id" LIMIT ?

will be executed multiple times. Fixing this issue won't be very simple and requires changes in both event sourcery and this gem. However, we could modify the existing index on the events.type column and make it a multi-column index on both events.id and events.type. Although this change will not fix the issue entirely, it will improve the event sourcery performance significantly.

Adding the above index will have impact on EventSourcery::Postgres::EventStore#get_next_from method performance.

Recent investigation

In the recent investigation, we've found out that having a multi-column index for both events.id and events.type columns on the events table could improve performance significantly. We processed around 3M events on the same AWS RDS instance multiple times, here is the result:

Process duration before adding the index: 264 minutes
Process duration after adding the index: 103 minutes

Questions

What needs to be done for existing apps?

We can use gemspec post-install message to notify the existing applications about the change. At the moment, the event-sourcery gem does not provide a "migration" command so it would be better to help them by providing a SQL query for modifying the index. The documentation regarding creating a new index exists in the CHANGELOG.md file and that could be enough too.

Would this change would solve all performance issues?

No. There are other parts of the system that should be fixed or improved. For example, I can think of the following ideas:

  • Better polling mechanism to trigger loading events only when the newly created event is watched by the event processor (reactors/projectors).
  • Time-based polling for applications that do not require real-time event processing
  • Configurable event poller that supports native Postgres and Redis pub/sub.

Why not just add another index?

I was not able to find a query for the events.type column alone so I don't see any benefit for adding a new index when the existing one is not used.

@daniel-niknam-envato daniel-niknam-envato self-assigned this Oct 27, 2021
@scottyp-env
Copy link

It may not be used by the framework but I do tend to find querying on type useful for ad-hoc diagnostics. Is that enough to push to having multiple indexes or would you suggest we leave it up to the consumer of the gem as to how they index their DB tables in that case?

@daniel-niknam-envato
Copy link
Contributor Author

It may not be used by the framework but I do tend to find querying on type useful for ad-hoc diagnostics. Is that enough to push to having multiple indexes or would you suggest we leave it up to the consumer of the gem as to how they index their DB tables in that case?

@scottyp-env do you use the events.type column only in those queries or is it with the conjunction of other columns?

I think it would be better if the gem will create indexes that it requires and leave the rest to the user. for example, I have found it to be useful to have an index for "id" and "created_at" for debugging events but I don't expect the gem to implement that.

Copy link

@scottyp-env scottyp-env left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your reasoning sounds good @daniel-niknam-envato!

@daniel-niknam-envato daniel-niknam-envato marked this pull request as ready for review October 28, 2021 15:30
CHANGELOG.md Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@@ -38,7 +38,7 @@ def create_events(db: EventSourcery::Postgres.config.event_store_database,
column :created_at, :'timestamp without time zone', null: false, default: Sequel.lit("(now() at time zone 'utc')")
index [:aggregate_id, :version], unique: true
index :uuid, unique: true
index :type
index [:id, :type]
Copy link

@Domon Domon Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will be affected by this change?

I did a quick search of the create_events and create_event_store methods in this repo but I only found their usage in the benchmarking scripts and test setup. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the above index will have impact on EventSourcery::Postgres::EventStore#get_next_from method performance.

Copy link

@Domon Domon Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that adding the index to each application's events table will improve the performance of the get_next_from method.

My questions is more about what is depending on this schema file and what will be benefited from this change.

In the changelog, it is mentioned that this change doesn't affect existing projects. That made me wonder what is the outcome of this change.

If someone creates a new Event Sourcery project, will the latest schema from this file be used somehow?

CHANGELOG.md Outdated
@@ -15,10 +16,27 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
- Use GitHub Actions for the CI build instead of Travis CI ([#66]).
- This project now uses `main` as its default branch ([#68]).
- Documentation updated to refer to `main` and links updated accordingly.
- The `events` table `type` modified to include both `id` and `type` ([#69]).
Copy link

@Domon Domon Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of an item in the "Changed" section, how about splitting it into two items?

  • One item in the "Added" section for the addition of the new composite index
  • Another item in the "Removed" section for the removal of the type index

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking. I thought it would be easier to follow when everything is in one place.

@liamdawson
Copy link

liamdawson commented Dec 9, 2021

I'm investigating further, but I think the proposed index could be vastly improved by swapping the order of the columns. The PostgreSQL documentation on multi-column indexes notes:

the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned.

The context I'm investigating involves a database which has the ('id', 'type') index, and one event type that hasn't been "seen" for an excessively long duration:

=> SELECT * FROM projector_tracker ORDER BY last_processed_event_id ASC;
 id | name | last_processed_event_id
----+------------+-------------------------
 30 | a    |               716812009
 24 | b    |               977842924
...
 16 | c    |               988610648

Given that gap, something is running the following query, which utilizes an index scan:

=> EXPLAIN SELECT * FROM "events" WHERE ((id >= 716812010) AND ("type" IN ('a'))) ORDER BY "id" LIMIT 1000;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Limit  (cost=0.70..20456.54 rows=1000 width=516)
   ->  Index Scan using events_id_type_index on events  (cost=0.70..11185312.84 rows=546803 width=516)
         Index Cond: ((id >= 716812010) AND ((type)::text = 'a'::text))
(3 rows)

In this context, \di+ events_id_type_index reports the events_id_type_index is 52 GB in size. Scaling that proportionally to the number of rows scanned suggests that PostgreSQL needs to scan ~27% of the index to determine which of the events match the appropriate type. On an m4.xlarge RDS instance with 2,000 provisioned IOPS, that took ~165s to return 0 rows, given a relatively cold cache.

In the next couple of days, I'm looking to verify that replacing the 'type' index with a ('type', 'id') index vastly improves this query's performance. Additionally, I believe that an index that begins with the id column isn't particularly distinct from doing a sequential scan of the table proper, unless all of the information used by the query is contained in the index.

@liamdawson
Copy link

I need to verify further (to make sure it wasn't influenced by the process of generating the indexes), but initial results look promising:

=> EXPLAIN (VERBOSE, ANALYZE, BUFFERS, TIMING) SELECT * FROM "events" WHERE ((id >= 716812010) AND ("type" IN ('a'))) ORDER BY "id" LIMIT 1000;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.70..3878.35 rows=1000 width=516) (actual time=3.174..3.176 rows=0 loops=1)
   Output: id, uuid, aggregate_id, type, body, version, correlation_id, causation_id, created_at
   Buffers: shared hit=3 read=5
   ->  Index Scan using events_type_id_index on public.events  (cost=0.70..2122202.62 rows=547291 width=516) (actual time=3.171..3.172 rows=0 loops=1)
         Output: id, uuid, aggregate_id, type, body, version, correlation_id, causation_id, created_at
         Index Cond: (((events.type)::text = 'a'::text) AND (events.id >= 716812010))
         Buffers: shared hit=3 read=5
 Planning time: 0.157 ms
 Execution time: 3.202 ms
(9 rows)

@liamdawson
Copy link

liamdawson commented Dec 10, 2021

I took a snapshot of the DB with the new index, restored it as a new DB, and re-ran the query. It was slower, but was still a significant improvement from the "id", "type" index version:

=> EXPLAIN (VERBOSE, ANALYZE, BUFFERS, TIMING) SELECT * FROM "events" WHERE ((id >= 716812010) AND ("type" IN ('a'))) ORDER BY "id" LIMIT 1000;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.70..12.58 rows=2 width=512) (actual time=83.039..83.045 rows=0 loops=1)
   Output: id, uuid, aggregate_id, type, body, version, correlation_id, causation_id, created_at
   Buffers: shared hit=3 read=5
   ->  Index Scan using events_type_id_index on public.events  (cost=0.70..12.58 rows=2 width=512) (actual time=83.035..83.037 rows=0 loops=1)
         Output: id, uuid, aggregate_id, type, body, version, correlation_id, causation_id, created_at
         Index Cond: (((events.type)::text = 'a'::text) AND (events.id >= 716812010))
         Buffers: shared hit=3 read=5
 Planning time: 1515.274 ms
 Execution time: 83.102 ms
(9 rows)

For the database I'm investigating, I'll be proposing:

  1. Create events_type_id_index

    CREATE INDEX CONCURRENTLY events_type_id_index ON events ("type", "id");
  2. Verify the index is in use for the given query

  3. Drop the events_type_index

  4. Monitor PostgreSQL index statistics, and drop the events_id_type_index unless it's still being used

This new index has the following properties:

  • It can continue to be used wherever the events_type_index was previously used with a minimal loss of efficiency, because
  • The events_type_id_index is only ~2% bigger in my tests (52GB vs. 51GB)

As noted by others in this PR, this is a form of stopgap measure, but it does help in scenarios where events aren't seen for significant periods of time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants