MySQL Projections skipping events (SingleStreamStrategy) #189

Closed
fritz-gerneth opened this issue Jan 30, 2019 · 49 comments · Fixed by #221

@fritz-gerneth
Contributor

Issue: under unknown circumstances the PdoEventStoreReadModelProjector skips applying events to registered handlers.

I have only seen this happen a few times over the past year. But since missing events in projections cause quite a few problems, I have been trying to figure out the cause ever since. I'm creating this issue for tracking and, hopefully, input from others.

Setup: MySQL event store & projections with SingleStreamStrategy running in their own process. DB server and the projection process running on different VM hosts.

Projection options:

PdoEventStoreReadModelProjector::OPTION_PERSIST_BLOCK_SIZE => 1,
PdoEventStoreReadModelProjector::OPTION_PCNTL_DISPATCH => true,
PdoEventStoreReadModelProjector::OPTION_LOCK_TIMEOUT_MS => 20000,
PdoEventStoreReadModelProjector::OPTION_UPDATE_LOCK_THRESHOLD => 15000,

Symptoms: take a simple set of IssueCreated, IssueUpdated, IssueDeleted events of an exemplary Issue AR. Events occur in the typical order IssueCreated -> IssueUpdated (n times) -> IssueDeleted. The projection simply maintains a list of all (undeleted) issues and their latest values. Each event has a handler registered to insert, update or delete the corresponding row.
For 99.9% of the time this works perfectly fine, as expected. Yet for unknown reasons, very rarely, some event is not handled. The effect on the projection differs depending on the event (e.g. IssueUpdated -> update lost) but is particularly bad for IssueCreated, as the row is missing entirely (and subsequent events may rely on its presence).
Resetting the projection solves the issue and all events are applied as expected. But that is pretty impractical in a production environment where a replay takes days each time.
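
For context, the wiring looks roughly like the sketch below - a minimal illustration only, assuming prooph's read-model projection API; the event names and read-model operations are placeholders, not our actual code.

use Prooph\EventStore\Pdo\Projection\PdoEventStoreReadModelProjector;
use Prooph\EventStore\Projection\ProjectionManager;
use Prooph\EventStore\Projection\ReadModel;

function runIssueProjection(ProjectionManager $projectionManager, ReadModel $readModel): void
{
    $projection = $projectionManager->createReadModelProjection('issue_list', $readModel, [
        PdoEventStoreReadModelProjector::OPTION_PERSIST_BLOCK_SIZE => 1,
        PdoEventStoreReadModelProjector::OPTION_PCNTL_DISPATCH => true,
        PdoEventStoreReadModelProjector::OPTION_LOCK_TIMEOUT_MS => 20000,
        PdoEventStoreReadModelProjector::OPTION_UPDATE_LOCK_THRESHOLD => 15000,
    ]);

    $projection
        ->fromStream('gcl_collection')
        ->when([
            // prooph binds $this inside these closures to the handler context,
            // which exposes the read model
            'IssueCreated' => function ($state, $event): void {
                $this->readModel()->stack('insert', $event->payload()); // new row
            },
            'IssueUpdated' => function ($state, $event): void {
                $this->readModel()->stack('update', $event->payload()); // latest values
            },
            'IssueDeleted' => function ($state, $event): void {
                $this->readModel()->stack('delete', $event->payload()); // row removed
            },
        ])
        ->run();
}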

Debugging:
To help me make this problem visible in the first place I slightly modified the [PdoEventStoreReadModelProjector](https://github.com/prooph/pdo-event-store/blob/master/src/Projection/PdoEventStoreReadModelProjector.php#L575) to simply log a line about which event it is now dispatching:

foreach ($events as $key => $event) {
	$this->logger->info(sprintf(
		'Dispatching event %s[%s]',
		$streamName,
		$key
	));

	if ($this->triggerPcntlSignalDispatch) {
		\pcntl_signal_dispatch();
	}

In normal operations I get sequential messages in this format (gcl_collection is my stream's name):

2019-01-29T12:51:11+00:00 INFO (6): Dispatching event gcl_collection[1]
2019-01-29T12:51:12+00:00 INFO (6): Dispatching event gcl_collection[2]
2019-01-29T12:51:13+00:00 INFO (6): Dispatching event gcl_collection[3]
2019-01-29T12:51:14+00:00 INFO (6): Dispatching event gcl_collection[4]

Now there obviously might be gaps in the event numbers due to ConcurrencyExceptions when inserting many events in parallel. But this still tells me whether an event has been handled at all: any number that made it into the stream should also show up in my log as being dispatched at least once.

Having this deployed and running for a few weeks, I finally could capture the problem again (on our low-volume testing environment this time). My stream contained a created event which had been skipped by the projection:

2019-01-29T12:51:11+00:00 INFO (6): Dispatching event gcl_collection[111]
2019-01-29T12:51:31+00:00 INFO (6): Dispatching event gcl_collection[112]
2019-01-29T12:53:11+00:00 INFO (6): Dispatching event gcl_collection[113]
// Missing event 114 here
2019-01-30T11:58:12+00:00 INFO (6): Dispatching event gcl_collection[115]
2019-01-30T11:58:12+00:00 INFO (6): Dispatching event gcl_collection[116]
2019-01-30T11:58:12+00:00 INFO (6): Dispatching event gcl_collection[117]
2019-01-30T11:58:29+00:00 INFO (6): Dispatching event gcl_collection[118]

Looking at the event-stream itself this event is clearly here though:

| event_id | created_at | aggregate_id | aggregate_version |
| --- | --- | --- | --- |
| 72e6eebc-723c-4a7e-a5e7-ced418c3d08d | 2019-01-29 12:51:10.970799 | GCL:COL-e5baa4c0-23c3-11e9-a495-e7066c3b34e9 | 27 |
| d02d3f6b-6fc5-4592-b88f-1a8e735b2ec4 | 2019-01-29 12:51:31.127142 | GCL:COL-e5baa4c0-23c3-11e9-a495-e7066c3b34e9 | 28 |
| fd2c0841-3198-4d60-8204-ce9e44d489eb | 2019-01-29 12:53:11.159632 | GCL:COL-e5baa4c0-23c3-11e9-a495-e7066c3b34e9 | 29 |
| 5dd81617-2cd7-4539-ac65-5eb9609481da | 2019-01-30 11:58:12.775746 | GCL:COL-50aa81c1-2486-11e9-a9f8-e1ff3db07a2f | 1 |
| df5f4332-9537-4c2e-85be-e5e28519b763 | 2019-01-30 11:58:12.801373 | GCL:COL-50aa81c1-2486-11e9-a9f8-e1ff3db07a2f | 2 |
| e415335f-6889-4652-8d86-d89e3f45166b | 2019-01-30 11:58:12.801517 | GCL:COL-50aa81c1-2486-11e9-a9f8-e1ff3db07a2f | 3 |
| 686f835a-de20-48e8-8dfd-792e7614dcac | 2019-01-30 11:58:12.811411 | GCL:COL-50aa81c1-2486-11e9-a9f8-e1ff3db07a2f | 4 |

This is not limited to specific aggregate roots but can happen for any event, unrelated to the aggregate version or such. I have not seen this to happen during replays.

This only seems to happen when I insert events at a very high rate. In fact I think it has only happened so far when a single process rapidly creates many events (not necessarily on the same AR though). My current working theory is that, while rows are inserted in that order, they become visible to selects in a different order (e.g. the projection selects and gets event 115 while 114 only becomes visible to the select a millisecond later). But then I'm not deep enough into MySQL internals to know if this is possible.

Any pointers on how to continue investigating this issue are welcome.

@codeliner
Member

that's a serious problem ...

We had a similar issue with our mongoDB v6 event-store. Same symptoms:

  • only seems to happen when I insert events at a very high rate
  • while ordered in that way rows become visible to selects in a different order

Some background. We have an event sourced service that processes SAP changesets (IDOC XML files). A Java Middleware sits between our service and the SAP ERP. When there is a mass update in SAP (during data migration for example) the Java Middleware takes all updates and forwards them to us at once. This results in thousands of events within a couple of seconds. Under that load we experienced the same issue.

Back then we used MongoDB 3.6. No auto-increment fields and no transactions available. Our custom-made projections used the aggregate version + timestamp to remember stream positions, until we saw that events were being skipped when the write model was on fire.

Replays worked just fine, which made the problem hard to debug.

We did exactly the same to investigate the issue: Log processing of events in the projections and compare the log with the event streams. We identified the problem:

Events with higher versions became visible to the projection slightly earlier.

To be honest: we also discussed (@sandrokeil and I) whether this could happen with prooph v7 projections, too. But since they rely on transactions and a stream sequence, we thought the problem was only related to our custom implementation for MongoDB .... Looks like we were wrong :(

How did we solve the Problem?

One option was to let projections detect "gaps" and, if a gap is detected, wait a moment and re-read the stream. That's not a 100% guarantee but could work if the sleep time is high enough.

Finally we went for a 100% safe solution. It looks like this:
We have an event metadata enricher that adds a "projection_xyz_handled" flag to each event's metadata with an initial value of false. When a projection has processed the event it sets the flag to true.
When we want to replay, we simply reset the flags.

First we thought that this would produce a lot of load on our MongoDB cluster, but Mongo can handle it just fine. The only real bottleneck for us is when we want to deploy a new projection. In that case we need to add a new flag to every existing event in the stream (several millions) and Mongo needs to update the index. But we don't deploy new projections that often ....

Our final solution can't be implemented in prooph v7 projections, but maybe the gap detection is an acceptable workaround. It could be activated with an additional projection option.

I'm 99% sure that your working theory is correct @fritz-gerneth

My current working theory is that, while rows are inserted in that order, they become visible to selects in a different order (e.g. the projection selects and gets event 115 while 114 only becomes visible to the select a millisecond later). But then I'm not deep enough into MySQL internals to know if this is possible.

@prolic What do you think?

Not sure if this is a MySQL issue or whether it could happen with Postgres, too. I don't want to bet against it ....
Does AxonFramework support projections on RDBMS backends? /cc @mrook
If so, how do they avoid the problem? A projection could read the WAL of the database directly. Kafka Connect does that for example.

@mrook

mrook commented Jan 30, 2019

@fritz-gerneth Axon supports any projection, as long as you can write the event handler for it. With the default SequencingPolicy (which is SequentialPerAggregatePolicy), events are guaranteed to arrive at the projection event handler(s) in the order they were published on the event bus - per aggregate.

@prolic
Member

prolic commented Jan 31, 2019

I read a little about auto increments in MySQL. The problem is that the auto-increment value is assigned on insert, not on commit.

For example:

  • Start two transactions in two terminals.
  • INSERT INTO ... (first in terminal one, then in terminal two).
  • Commit in terminal two.
  • SELECT * FROM ... - you see a gap.
  • Commit terminal one.
  • SELECT * FROM ... - the gap is closed.
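
A runnable sketch of that experiment with two PDO connections (table name and credentials are placeholders):

$dsn = 'mysql:host=127.0.0.1;dbname=test';
$a = new PDO($dsn, 'user', 'pass');
$b = new PDO($dsn, 'user', 'pass');
$reader = new PDO($dsn, 'user', 'pass');

$a->exec('CREATE TABLE IF NOT EXISTS demo (no BIGINT AUTO_INCREMENT PRIMARY KEY, payload VARCHAR(32))');

// Both sessions insert inside open transactions; the auto-increment value is
// assigned at insert time, so A gets the lower number.
$a->beginTransaction();
$a->exec("INSERT INTO demo (payload) VALUES ('from A')");
$b->beginTransaction();
$b->exec("INSERT INTO demo (payload) VALUES ('from B')");

// Only B commits: the reader now sees B's (higher) number but not A's -> a gap.
$b->commit();
print_r($reader->query('SELECT no FROM demo ORDER BY no')->fetchAll(PDO::FETCH_COLUMN));

// Once A commits as well, the gap closes - but a projection that already moved
// past B's number will never go back and read A's row.
$a->commit();
print_r($reader->query('SELECT no FROM demo ORDER BY no')->fetchAll(PDO::FETCH_COLUMN));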

Probably the same is true for Postgres as well. So in case there is a gap, we need to detect it, back off and read again. Really painful, but I don't see a better solution.

@prolic
Member

prolic commented Jan 31, 2019

@codeliner your approach with using metadata for this should also work, but I prefer to keep the event immutable.

@fritz-gerneth
Contributor Author

fritz-gerneth commented Jan 31, 2019

Appreciate all your input! :)

@mrook : from what I understand this solves a similar but different problem: making sure all selected events arrive in sequence (which they do for projections here). The issue here is more about making sure we don't skip events when querying the event bus. I suppose this guarantee is implemented quite differently in Axon. But then my knowledge of Axon is old and very basic :)

@prolic : Do you have any resource detailing this behavior (for personal education :))?

I managed to reproduce this scenario in a slightly different setup as well: an AR appends 5 events in one call where the first event has a significantly larger payload than the others (e.g. a large complex JSON object vs. two small string keys). All events were published in the same appendTo call. But again this goes into MySQL internals I have no experience with :)

Modifying events is something I neither want nor can do. We reset / change our projections quite rapidly, often for testing purposes only, and many streams are used by many projections concurrently.
Since we cannot avoid gaps in the first place, I agree some gap detection / handling is pretty much the only way around this.
I think the best way would be to extract the loading of streams into a strategy.

foreach ($this->streamPositions as $streamName => $position) {
    try {
        $streamEvents = $this->eventLoadStrategy->load($streamName, $position, $this->metadataMatcher);
    } catch (Exception\StreamNotFound $e) {
        // stream does not exist (yet) - ignore and continue with the next one
        continue;
    }

    // .....
}

The default one would do exactly what the projector does now.

class DefaultEventLoadStrategy
{
    private $eventStore;

    public function __construct(EventStore $eventStore)
    {
        $this->eventStore = $eventStore;
    }

    public function load(string $streamName, $position, $metadataMatcher)
    {
        return $this->eventStore->load(new StreamName($streamName), $position + 1, null, $metadataMatcher);
    }
}

We could then add a second strategy supporting gap detection and some (configurable) delay before retrying. If the retry still starts with a gap, we assume it is a real gap.

class GapDetectingEventLoadStrategy
{
    private $eventStore;

    private $sleepTime;

    public function __construct(EventStore $eventStore, int $sleepTime)
    {
        $this->eventStore = $eventStore;
        $this->sleepTime = $sleepTime;
    }

    public function load(string $streamName, $position, $metadataMatcher)
    {
        $events = $this->loadEvents($streamName, $position, $metadataMatcher);
        $nextEventNumber = $events->key();

        if ($nextEventNumber === $position + 1) {
            return $events;
        }

        // gap right at the read position: back off once and re-read
        sleep($this->sleepTime);

        return $this->loadEvents($streamName, $position, $metadataMatcher);
    }

    private function loadEvents(string $streamName, $position, $metadataMatcher)
    {
        return $this->eventStore->load(new StreamName($streamName), $position + 1, null, $metadataMatcher);
    }
}

That way it would be backwards compatible and you could configure the trade-off between performance and the safety of not missing an event. Am I thinking too naively here, or missing any other vital parts or better approaches?

A more definitive approach might be to look at aggregate versions. But this has its own downsides as well:

  • Events are not necessarily aggregate events
  • Still possible to miss the first event of an aggregate
  • Need to keep track of the version of each aggregate (I'd like to avoid this as we have hundreds of thousands of ARs in some streams; updating this JSON object every time puts quite a load on the DB server)

@prolic
Member

prolic commented Jan 31, 2019

@fritz-gerneth researching for this problem I found this article https://www.matthewbilyeu.com/blog/auto-incremented-values-can-appear-out-of-order-if-inserted-in-transactions/ - I hope it helps.

About adding an eventLoadStrategy: I think we should not do that, as it further complicates the usage of the event store (we already have lots of configurable options, persistence strategies, ...). It's better to have a useful default value for the sleep timeout and give a hint in the docs on why it was added and what impact changing the default has.

Another possible solution is a completely different way of inserting rows, namely without any autoincrement. But I haven't worked this out yet, so I don't know if it is possible.

@mrook

mrook commented Jan 31, 2019

@fritz-gerneth I see, and yes it's implemented differently. Projections (event handlers) in Axon typically do not directly query the event bus (they can, but there's really no point), but instead are fed events through a processor, which either follows the live stream or replays from a certain position.

@sandrokeil
Member

Another possible solution is a completely different way of inserting rows, namely without any autoincrement. But I haven't worked this out yet, so I don't know if it is possible.

I would really appreciate that. We only need the autoincrement value to sort the events and read them forward and backward. If we use a stream/watch mechanism like in MongoDB or Postgres, events arrive in the order they are saved to the DB. As we can see, the autoincrement value does not really reflect the order of events. I have discussed this a little with @codeliner and we have no real solution for the sorting problem. Microseconds or nanoseconds are not precise enough.

The question is, how can we preserve the order of events when they are saved at the same time and how is it sortable ascending and descending?

@prolic
Member

prolic commented Jan 31, 2019

EventStore (http://eventstore.org/) also has incrementing event positions, starting from 0. So all its events are positioned as 0, 1, 2, 3, 4, ... and there are no gaps, never ever.

I looked into how they solved this on their side (although they don't use SQL) and their solution (I don't know if that's the only reason why they do this, and I suppose it's not the main reason) is to use a single writer thread only. All write operations get sent to that writer queue, so there is no possibility of any race conditions.

As we don't have a server implementation here but merely a thin layer on top of a PDO connection, the closest we can do to get to the same behaviour is creating a lock for each write, so only one process can ever write and all others have to wait until it's finished. This will definitely reduce the write performance, but maybe that's even acceptable and definitely worth a try for the prooph/event-store v8 sql implementation.

To reduce the time the lock is acquired, maybe the best way to do this is with a stored procedure, but this would be a BC break in v7.

@fritz-gerneth
Contributor Author

@mrook : I think Axon's processors are then pretty much the equivalent of projections here (the TrackingEventProcessor to be precise). I think the GapAwareTrackingToken [source] is your answer to this issue :)

Implementation of a TrackingToken that uses the global insertion sequence number of the event to determine tracking order and additionally stores a set of possible gaps that have been detected while tracking the event store.

By storing the sequence numbers of gaps, i.e. sequence numbers of events that may have been inserted but have not been committed to the store, consumers are able to track the event store uninterruptedly even when there are gaps in the sequence numbers of events. If a gap is detected the event store can check if meanwhile this gap has been filled each time a new batch of events is fetched.

Digging through the source code on GitHub, my understanding of how Axon does this (with this token) is that it's a more aggressive at-least-once approach:

  • A token does not only represent a single position but also a set of possible gaps
  • When querying for new events, not only the next position but all gaps are queried for as well
  • Gaps that have fallen too far behind the current index are eventually assumed to be actual gaps and skipped

Doing something similar on our end would certainly work as well but would require a bit of a change in the event store API to allow multiple range queries.

As a side note: I'm dismissing my former approach. Giving it a second thought made it clear that it does not reliably solve the issue: it only catches gaps right at the start of an iteration, not in between.

@fritz-gerneth
Contributor Author

As we don't have a server implementation here but merely a thin layer on top of a PDO connection, the closest we can do to get to the same behaviour is creating a lock for each write, so only one process can ever write and all others have to wait until it's finished. This will definitely reduce the write performance, but maybe that's even acceptable and definitely worth a try for the prooph/event-store v8 sql implementation.

The performance impact of doing so would greatly depend on the stream strategy. Per aggregate there should not be any major impact, since aggregates already kind of enforce this through event versions. But for the others this would be a table lock affecting writes to either all aggregates of the same type or all aggregates on the system.
Even not allowing concurrent writes to different ARs of the same type already has severe impact in scenarios with high write volumes - which is exactly the scenario this fix would apply to. Not saying this is particularly bad, just that the impact might be huge.
Would there be some BC way of using GET_LOCK to enforce this on a per-stream basis?
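
Something along these lines is what I have in mind - a hypothetical wrapper around the existing insert logic, not the library's actual appendTo() implementation:

function appendWithStreamLock(PDO $connection, string $streamName, callable $doAppend): void
{
    $lockName = $streamName . '_writeLock';

    // -1 = wait indefinitely for the named lock (MySQL)
    $stmt = $connection->prepare('SELECT GET_LOCK(?, -1)');
    $stmt->execute([$lockName]);

    try {
        $doAppend(); // the usual transactional insert of the events
    } finally {
        $release = $connection->prepare('SELECT RELEASE_LOCK(?)');
        $release->execute([$lockName]);
    }
}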

Besides this I'd follow up on the gap-tracking stream positions. This might require a bit of work to actually track & store the gaps alongside the current position, but would mitigate this issue too (the longer we allow gaps to be tracked, the safer it becomes). There are two major challenges I'd see for this though:

  • The current event stream interface only allows loading a single range of events. When querying with gaps we'd have at least two ranges: one for the current position and one for each gap we still track. This could be handled with multiple calls to load, one per range, but doing it in a single go would certainly be more performant.
  • For each stream we'd have to store the gaps alongside the current position, requiring a new format for stored stream positions. I think this could be opt-in behavior with a big warning to 'not change this for running projections', or an on-the-fly migration mechanism (see the sketch below).
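
To illustrate the second point, the stored position could look something like this (purely illustrative, not an existing prooph API):

final class GapAwarePosition
{
    private $position;

    /** @var int[] event numbers we skipped and may still see later */
    private $gaps;

    public function __construct(int $position, array $gaps = [])
    {
        $this->position = $position;
        $this->gaps = $gaps;
    }

    /** Advance to the event number just read, remembering every number we jumped over. */
    public function advanceTo(int $seen): void
    {
        for ($no = $this->position + 1; $no < $seen; $no++) {
            $this->gaps[] = $no;
        }
        $this->position = max($this->position, $seen);
        $this->gaps = array_values(array_diff($this->gaps, [$seen])); // a tracked gap was filled
    }

    /** Ranges the next load has to query: every tracked gap plus the open-ended tail. */
    public function rangesToLoad(): array
    {
        $ranges = [];
        foreach ($this->gaps as $gap) {
            $ranges[] = [$gap, $gap];
        }
        $ranges[] = [$this->position + 1, null];

        return $ranges;
    }
}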

(To be clear: I'd be willing to implement both changes, for me that's merely a matter on if this is fixed in here or with a custom projection in our code-base)

@prolic
Member

prolic commented Feb 1, 2019

@fritz-gerneth if you have high write throughput right now, maybe you can test using "GET_LOCK" for a few hours in production and see if the impact really is that huge? Maybe it's still acceptable. The main problem I see is the additional roundtrip, which leads to longer locking times. That's why I think having stored procedures in place might be a better solution than sending multiple queries from PHP.

@fritz-gerneth
Contributor Author

Before testing this in production I wanted to see how this behaves and if it can actually solve the problem. I created a simple set of scripts to illustrate & test this issue:

  • A generator script that inserts a fixed number of events into a table, optionally using GET_LOCK
  • A wrapper to invoke n instances of the generator in parallel (to simulate n parallel producers)
  • A consumer counting gaps while selecting the latest set of events (i.e. behaving like our projections)

The generator can be invoked like this:

./event-gen-bulk.sh 2 50000 1

Arguments are:

  • Number of producers to start
  • Number of events to insert (each producer)
  • 0/1 to use GET_LOCK or not

Be sure to invoke the consumer before starting the generators though

 php event-consumer.php

Interestingly enough, this issue seems to decline as the number of parallel producers increases (no locking):

Producers | Events Per Producer | Gaps  | Events per gap
-------------------------------------------------------
        2 | 50005               |  350  | 29
        4 | 50005               | 1916  | 104
        6 | 50005               | 1153  | 260
        8 | 50005               |   23  | 17393
       10 | 50005               |    2  | 250025

So my next test was wrapping the insert in GET_LOCK("test-ar", -1):

Producers | Events Per Producer | Events/sec / producer | Events/sec total | Lock 
---------------------------------------------------------------------------------
        2 | 50005               | 3300                  | 6600             | no
        2 | 50005               | 1620                  | 3240 (-49%)      | yes
        4 | 50005               | 1541                  | 6164             | no
        4 | 50005               |  707                  | 2828 (-45%)      | yes
        6 | 50005               | 1230                  | 7380             | no
        6 | 50005               |  409                  | 2454 (-33%)      | yes
        8 | 50005               |  962                  | 7696             | no
        8 | 50005               |  279                  | 2232 (-29%)      | yes
       10 | 50005               |  830                  | 8300             | no
       10 | 50005               |  250                  | 2500 (-30%)      | yes

Now those values are to be taken with a (big) grain of salt as the numbers have great variance, but they give at least some indication. The good news is: this does take care of any gaps on the consumer.

Two things to note:

  • The GET_LOCK and RELEASE_LOCK statements need to be outside of the transaction. When they were inside the transaction they had no effect and I still saw gaps on the consumer (see the sketch below).
  • This does not include network latency. Scripts & DB were on the same server.
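
For reference, the producer-side ordering that worked looks roughly like this (placeholder table name; note the lock sits outside the transaction):

function appendEvents(PDO $pdo, array $rows): void
{
    $pdo->query("SELECT GET_LOCK('test-ar', -1)"); // acquired before the transaction

    try {
        $pdo->beginTransaction();
        $stmt = $pdo->prepare(
            'INSERT INTO events (event_id, event_name, payload, metadata, created_at) VALUES (?, ?, ?, ?, ?)'
        );
        foreach ($rows as $row) {
            $stmt->execute($row);
        }
        $pdo->commit();
    } catch (\Throwable $e) {
        $pdo->rollBack();
        throw $e;
    } finally {
        $pdo->query("SELECT RELEASE_LOCK('test-ar')"); // released after commit/rollback
    }
}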

Pushing such a number of events onto a single table concurrently is certainly not daily business, so the performance impact in any daily situation is probably (much) less, if measurable at all.
For any situation where this might happen, it pushes resource consumption from the consumer (having to deal with gaps) and the database (reduced select load) to the producer (having to wait until the lock is available). Again, not sure if this is really a bad thing or just something to be aware of.
Ultimately I think this comes down to preference. Using locks introduces a potential bottleneck for writes. How much of a bottleneck this would be in the worst case is hard to say, as I don't have an append-only-optimized database instance running on dedicated hardware. But then, if I had one and required that kind of performance, I certainly could roll a custom implementation and disable locking the event store for inserts.

Summary: using GET_LOCK would eliminate the problem of gaps by pushing the responsibility to the write side. As with any single-writer approach, performance is worse than with concurrent writes, but probably negligible for usual use cases. That could then be an option for the event store, maybe?

@prolic
Member

prolic commented Feb 1, 2019

As I thought. Can you try the same with a stored procedure? So that a) GET_LOCK, b) start transaction, c) write data, d) commit transaction and e) release lock take only one DB roundtrip?

@fritz-gerneth
Contributor Author

Since the number of rows we want to insert in one transaction varies, I'm not entirely sure what the best way would be with stored procedures... the only solutions I can find rely on passing the data in as a string and doing some string-splitting afterwards... which is a solution I'd hardly want to start with. But then I'm no expert with stored procedures and maybe there's a better solution?

@fritz-gerneth
Contributor Author

fritz-gerneth commented Feb 2, 2019

Using this stored procedure I re-ran the test sequence:

DROP PROCEDURE IF EXISTS `insert_event_streams`;

DELIMITER $$

CREATE PROCEDURE insert_event_streams (
    IN eventstream VARCHAR(255),
    IN eventrows JSON
)
BEGIN

    DECLARE lock_name VARCHAR(260) DEFAULT CONCAT('_lock_', eventstream);
    DECLARE lock_status INT; -- to keep query buffer clean

    DECLARE event_index INT DEFAULT 0;
    DECLARE event_data JSON;

    DECLARE exit handler for sqlexception
    BEGIN

      ROLLBACK;
      SELECT RELEASE_LOCK(lock_name) INTO lock_status;

    END;

    -- Prepared statements do not support declared local variables but only constants and session variables
    -- https://bugs.mysql.com/bug.php?id=13572
    SET @insert_sql = CONCAT('INSERT INTO `', eventstream, '` (`event_id`, `event_name`, `payload`, `metadata`, `created_at`) VALUES (?, ?, ?, ?, ?);');

    PREPARE insert_stmt FROM @insert_sql;

    SELECT GET_LOCK(lock_name, -1) INTO lock_status;
    START TRANSACTION;

    WHILE event_index < JSON_LENGTH(eventrows)
    DO

        SET event_data = JSON_EXTRACT(eventrows, CONCAT('$[', event_index, ']'));

        SET @event_id = JSON_UNQUOTE(JSON_EXTRACT(event_data, '$."event_id"'));
        SET @event_name = JSON_UNQUOTE(JSON_EXTRACT(event_data, '$.event_name'));
        SET @event_payload = JSON_EXTRACT(event_data, '$.payload');
        SET @event_metadata = JSON_EXTRACT(event_data, '$.metadata');
        SET @event_date = JSON_UNQUOTE(JSON_EXTRACT(event_data, '$.created_at'));

        EXECUTE insert_stmt USING @event_id, @event_name, @event_payload, @event_metadata, @event_date;

        SET event_index = event_index + 1;

    END WHILE;

    COMMIT;
    SELECT RELEASE_LOCK(lock_name) INTO lock_status;

    DEALLOCATE PREPARE insert_stmt;

END$$

DELIMITER ;
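
Called from PHP roughly like this (illustrative values only):

$pdo = new PDO('mysql:host=127.0.0.1;dbname=event_store', 'user', 'pass');

$rows = json_encode([
    [
        'event_id'   => 'e2b4c1d0-0000-0000-0000-000000000001',
        'event_name' => 'IssueCreated',
        'payload'    => ['title' => 'example'],
        'metadata'   => ['_aggregate_version' => 1],
        'created_at' => '2019-02-02 10:00:00.000000',
    ],
]);

// One roundtrip: lock, transaction, inserts, commit and unlock all happen inside the procedure.
$stmt = $pdo->prepare('CALL insert_event_streams(?, ?)');
$stmt->execute(['gcl_collection', $rows]);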

Results are all within the margin of variation, no real benefit visible here. I'll re-run those tests on a remote server again tomorrow to see how much network latency affects this.

@fritz-gerneth
Contributor Author

Apparently it was too late yesterday - I figured out I had run the wrong scripts to test the stored procedure :) I will re-run these on our staging servers once we have the permissions set up there to do so.
Meanwhile I already ran the two other script versions (no locking, locking) against a remote database server. Due to time constraints, only for 2 and 10 producers:

Producers | Events Per Producer | Events/sec / producer | Events/sec total | Lock 
---------------------------------------------------------------------------------
        2 | 50005               |  224                  | 448              | remote, no lock
        2 | 50005               |   95                  | 190              | remote, lock
       10 | 50005               |  217                  | 2170             | remote, no lock
       10 | 50005               |   42                  | 420              | remote, lock

Looking at where query time is spent most, without locking:
[screenshot: 10concurrent-no-lock]

And with locking:
[screenshot: 10concurrent-lock]

Seeing most time is spent waiting for the lock, I would not expect much difference with using a stored procedure.

@prolic
Member

prolic commented Feb 3, 2019 via email

@fritz-gerneth
Contributor Author

At last, here's the data for using the stored procedure:

Producers | Events Per Producer | Events/sec / producer | Events/sec total | Lock 
---------------------------------------------------------------------------------
        2 | 50005               | 3300                  | 6600             | local, no lock
        2 | 50005               | 1620                  | 3240 (49%)       | local, lock
        2 | 50005               |  224                  | 448              | remote, no lock
        2 | 50005               |   95                  | 190              | remote, lock
        4 | 50005               | 1541                  | 6164             | local, no lock
        4 | 50005               |  707                  | 2828 (45%)       | local, lock
        6 | 50005               | 1230                  | 7380             | local, no lock
        6 | 50005               |  409                  | 2454 (33%)       | local, lock
        8 | 50005               |  962                  | 7696             | local, no lock
        8 | 50005               |  279                  | 2232 (29%)       | local, lock
       10 | 50005               |  830                  | 8300             | local, no lock
       10 | 50005               |  250                  | 2500 (30%)       | local, lock
       10 | 50005               |  217                  | 2170             | remote, no lock
       10 | 50005               |  42                   | 420              | remote, lock
       10 | 50005               |   53                  | 530              | remote, stored procedure

[chart: grafik]

@prolic
Member

prolic commented Feb 3, 2019

@fritz-gerneth @codeliner My thoughts based on the investigation by @fritz-gerneth (Thanks again for this, awesome job!):

  • If I read this correctly, stored procedures are not needed, as the performance difference is very small -> That's really great because it simplifies a lot.
  • Depending on the number of producers, the write performance drops by about 30%-50%, but is still in an acceptable range. -> It would be interesting to see how PostgreSQL performs here in comparison. However, writing a couple of thousand events per second is still super fast.
  • For the SQL implementation of event-store v8 I'm also planning to use GET_LOCK, because I need a single-writer-only approach to work out the internals.

That said, I would go with the GET_LOCK approach. However I would like to know if

  • PostgreSQL needs the same patch
  • Possible performance impact on PostgreSQL

Note: In PostgreSQL there is no GET_LOCK, you have to use pg_advisory_lock(key bigint) and pg_advisory_unlock(key bigint). You can also use pg_advisory_xact_lock within a transaction, the lock will be automatically released once you commit that transaction.
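
A sketch of what that could look like on the Postgres side (the bigint key derived by hashing the stream name is an assumption, not a finished implementation):

function appendWithAdvisoryLock(PDO $pdo, string $streamName, callable $insertEvents): void
{
    // derive a stable 64-bit key from the stream name
    $key = unpack('q', substr(sha1($streamName . '_writeLock', true), 0, 8))[1];

    $pdo->beginTransaction();

    try {
        $stmt = $pdo->prepare('SELECT pg_advisory_xact_lock(CAST(? AS BIGINT))');
        $stmt->execute([$key]);

        $insertEvents();  // the usual INSERTs into the stream table

        $pdo->commit();   // releases the advisory lock as well
    } catch (\Throwable $e) {
        $pdo->rollBack(); // also releases the lock
        throw $e;
    }
}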

@fritz-gerneth
Contributor Author

If there are no objections, I'll write up a PR tomorrow to:

  • Add this feature to the appendTo method of the event stores for MySQL & MariaDB
  • Pessimistic locking disabled by default, can be enabled via constructor argument
  • Add some documentation on the readme on when to enable this
  • Lock names will be something like $streamName_writeLock

I'd skip Postgres for now, mostly because I have little experience there and we'd have to come up with a lock-name strategy. I think the issue is documented here and in the readme. Should someone be able to reproduce this issue on Postgres, the solution is here :)

@prolic
Member

prolic commented Feb 6, 2019

@fritz-gerneth sounds good to me, thank you very much! I'll try to look into Postgres this weekend, I hope I find the time.

fritz-gerneth added a commit to funct/pdo-event-store that referenced this issue Feb 7, 2019
@fritz-gerneth
Contributor Author

fritz-gerneth commented Feb 17, 2019

SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;

How does this behave with respect to multiple concurrent sessions though?

unfortunately, still getting gaps... just fewer

gaps are to be expected (e.g. on concurrency-exceptions). the issue (hopefully?) to be solved is that events are read out of order. I have outlined the approach for actual gap detection or tracking on the consumer side.

update: it could be the gap detection consumer. I'm getting sparse gaps even with GET_LOCKS enabled. Perhaps the inserting is too fast for the reading and there are false gaps? Going to increase the LIMIT in the event consumer to 100

what's your setup on this? what exactly are you seeing here?

@markkimsal

markkimsal commented Feb 18, 2019

It seems that, even though isolation level appears to block other transactions in a multi-terminal setup, it doesn't affect the view of the data from a 3rd party i.e. the event-consumer.php script. I still get gaps. I also get gaps with SELECT GET_LOCK. The only thing that stops pkey gaps is doing LOCK TABLES _gcl_... WRITE and UNLOCK TABLES inside the transaction.

I'm testing with php 7.1 on mariadb 10.3 (fork of mysql 5.5.5) and mysql 5.7

I tried setting SERIALIZABLE in my.cnf and also flushing the InnoDB logs after every commit. Neither had any effect.

transaction-isolation = SERIALIZABLE
innodb_flush_log_at_trx_commit = 1

It seems to be that, when there is a gap found by event-consumer.php, that gap is not visible to either event-generator.php script; within their own transactions, they see a consistent, non-gap view of the data.

@fritz-gerneth
Contributor Author

fritz-gerneth commented Feb 18, 2019

It seems to be that, when there is a gap found by event-consumer.php, that gap is not visible to either event-generator.php script; within their own transactions, they see a consistent, non-gap view of the data.

Yes, that's the whole issue here. Even when looking at the database later on, the gaps are not there anymore.

I'm testing with php 7.1 on mariadb 10.3 (fork of mysql 5.5.5)

I see you're using my test scripts too. Those are not compatible with MariaDB out of the box. I'm using a timeout of -1, which MariaDB does not support but instead throws an SQL error for. Took me a day to figure this out :( See this thread in my PR. For testing you can just set it to a large value instead. As this fails silently, you're effectively testing without locks on MariaDB.

@markkimsal

Well, if the problem is in the consumer, the producers don't need any changes. With the addition of the SERIALIZABLE isolation level we can be certain that TXA commits and finishes before TXB, but the consumer only sees TXB in the results. Even changing settings like innodb_flush_log_at_trx_commit=1 doesn't seem to ensure that the reader can read concurrent inserts in chronological order.

Adding LOCK IN SHARE MODE or FOR UPDATE to the end of the select statement in the event-consumer seems to ensure the consumer can see a consistent state of the data. There's no need for locks or transactions on the producer/generator side. With autocommit=1 this implicit transaction ends as soon as the select ends, but it seems to make sure all the index locks and gap locks are flushed.

Can anyone verify that select ... for update eliminates pkey gaps?
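
For reference, the locking read I mean looks roughly like this (placeholder table name):

function loadBatch(PDO $pdo, int $position, int $limit = 20): array
{
    $pdo->beginTransaction();

    $stmt = $pdo->prepare(
        'SELECT no, event_id, payload FROM events WHERE no > ? ORDER BY no ASC LIMIT ? FOR UPDATE'
    );
    $stmt->bindValue(1, $position, PDO::PARAM_INT);
    $stmt->bindValue(2, $limit, PDO::PARAM_INT);
    $stmt->execute();

    $events = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $pdo->commit(); // nothing was modified; committing just ends the locking read

    return $events;
}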

@fritz-gerneth
Contributor Author

Adding LOCK IN SHARE MODE or FOR UPDATE to the end of the select statement in the event-consumer seems to ensure the consumer can see a consistent state of the data.

Why should this ensure the correct sequential order of reads when reading? This only prevents issues with concurrent writes on the same row. But that's not happening here anyway.
Besides this would make reading a stream single consumer only.

well, if the problem is in the consumer the producers don't need any changes

which problem are you trying to solve? out of order reading of events (and thus skipping) or event number gaps? those are different things and require different solutions. this issue here only deals with the first one.
with the current GET_LOCK we are effectively doing a table lock, so there is no way parallel producers can write to the stream at the same time, e.g. there is always only one active transaction inserting.

do you have any sample data / output of what you are seeing on your side? why do you think it is the consumer that is having troubles? are you still seeing the error with the -1 thing fixed?

@Mattin

Mattin commented Aug 28, 2019

Btw, this is still an issue. GET_LOCK is not doing anything for this; projections from the pg event-store are still skipping 5-10% of events, which is huge. I tried transaction isolation too, but the results are the same. Anybody with a solution to this?

@codeliner
Member

@Mattin Do you have some more details about your setup? 5-10% of events is really huge. We did not see such massive skips before. For example how many events per second do you have?

Did you try to run the test scripts to verify your issue? #189 (comment)

@fritz-gerneth
Contributor Author

Providing an update on this from my side - I have seen events being skipped very rarely as well (but still orders of magnitude less likely than before). Unfortunately I have not yet had the time to look into this in more detail and only have a working theory:

I have only observed this in situations where I persist multiple events on the same AR. In that case a projection sometimes skips one of them for yet unknown reasons. Event numbers are all in sequence, so that part should be fine. Replaying the projection fixes the issue. So I assume this is some nasty in-flight concurrency behaviour in MySQL. I'm not a MySQL expert, and I don't know one who's into it at this level either. Current plan of mitigation would be to add some sliding window event number gap-detection mechanism to the projections.

The alternative might be some bug in the projections fetching incorrect stream positions in some situations. I'll have to add more logging to the projections to really look into this further.

@prolic
Member

prolic commented Sep 10, 2019 via email

@prolic
Member

prolic commented Sep 10, 2019

Can anyone try transaction isolation level SERIALIZABLE?

@prolic prolic reopened this Sep 10, 2019
@markkimsal

I tried all these transaction levels:

//$connection->exec('set autocommit=0');
//$connection->exec('set session transaction isolation level READ COMMITTED');
//$connection->exec('set session transaction isolation level READ UNCOMMITTED');
//$connection->exec('set session transaction isolation level SERIALIZABLE');
//$connection->exec('set autocommit=1');

The only thing that fixes it for me with the supplied demo scripts (event-gen-bulk.sh etc) is adding ... FOR UPDATE to the select in the consumer. This ensures consistent reads as mysql is always using the MVCC and processing inserts concurrently.

Transactions on the writer don't do anything because they only ensure consistency for that one write operation, which is only appending rows. The reader seems to be encountering a block of 20 rows, but 10 of them from AR1 are still waiting to finish writing the key index, while 10 rows from AR2 are flushed first (because MVCC has to pick one thing to finish first when getting concurrent writes). The rows are there (this is why replaying the projection works, the pkeys are all in order), but are not available to selects that allow dirty reads (or phantom reads, not sure which).

bash ./event-gen-bulk.sh 2 2000 &
php ./event-consumer.php    # with ... limit 20 for update
[ar1] Inserted 2005 events in 51.669422149658 seconds (38.804382100357 events/second)
[ar0] Inserted 2005 events in 52.137250185013 seconds (38.45619001549 events/second)
^CDetected 0 gaps in 4010 events (processed - events / gap)[1]+  Done 
bash ./event-gen-bulk.sh 2 2000 &
php ./event-consumer.php   # with only limit 20
Gap detected at 350, got 356 instead
Gap detected at 445, got 451 instead
Gap detected at 620, got 626 instead
Gap detected at 700, got 706 instead
Gap detected at 1410, got 1416 instead
Gap detected at 2050, got 2056 instead
Gap detected at 2540, got 2546 instead
Gap detected at 2745, got 2751 instead
Gap detected at 3020, got 3026 instead
Gap detected at 3525, got 3531 instead
Gap detected at 3595, got 3601 instead
Gap detected at 3620, got 3626 instead
Gap detected at 3645, got 3651 instead
Gap detected at 3695, got 3701 instead
[ar0] Inserted 2005 events in 53.846303939819 seconds (37.235610493171 events/second)
[ar1] Inserted 2005 events in 54.046936035156 seconds (37.097385107933 events/second)
^C^CDetected 14 gaps in 4010 events (processed 286.42857142857 events / gap)[1]+  Done

I also tried ... lock in share mode but I got one gap in about 10 test runs. (lock in share mode is now just for share)

A consistent read is a non-locking, non-blocking read that takes a snapshot of the DB and doesn't see changes happening after the select's point in time.
https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html

If other sessions simultaneously update the same table, the anomaly means that you might see the table in a state that never existed in the database.

A locking read is triggered with FOR UPDATE or FOR SHARE

With FOR SHARE, a locking read occurs instead: A SELECT blocks until the transaction containing the freshest rows ends (see Section 15.7.2.4, “Locking Reads”).

The types of locks differ with each isolation level, but even the most basic REPEATABLE READ default isolation level will work:
https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html#isolevel_repeatable-read

This is the default isolation level for InnoDB. Consistent reads within the same transaction read the snapshot established by the first read. This means that if you issue several plain (nonlocking) SELECT statements within the same transaction, these SELECT statements are consistent also with respect to each other. See Section 15.7.2.3, “Consistent Nonlocking Reads”.

For locking reads (SELECT with FOR UPDATE or FOR SHARE), UPDATE, and DELETE statements, locking depends on whether the statement uses a unique index with a unique search condition, or a range-type search condition.
...
For other search conditions (range-type conditions), InnoDB locks the index range scanned, using gap locks or next-key locks to block insertions by other sessions into the gaps covered by the range. For information about gap locks and next-key locks, see Section 15.7.1, “InnoDB Locking”.

@fritz-gerneth
Contributor Author

bash ./event-gen-bulk.sh 2 2000 &

I think you are starting the generators without the locking strategy. The locking is controlled by the third parameter. You might want to check the output of the generator scripts; they print whichever strategy is being used.

With locks enabled there should never be two writers active at the same time, therefore index updates would not be an issue.
In any case, making the isolation level more restrictive would not resolve this issue. Stricter levels only affect repeatable reads within a transaction (with inserts/updates). Single selects would not be affected beyond eliminating dirty reads.
Not sure if SELECT .. FOR UPDATE might work. We would only enforce single-read-write for a stream. But isolation levels already guarantee we only read committed rows anyway. Again, this is only helpful if we want to make sure the selected value remains unchanged within the transaction. Since we do not update / change rows, I cannot see how this would have any effect.

@codeliner
Member

Current plan of mitigation would be to add some sliding window event number gap-detection mechanism to the projections.

I'd really like to solve the problem differently, but looking at the history of this issue and the fact that other libraries in other languages solved the problem with a sliding-window gap detection, too (e.g. AxonFramework), makes me wonder if there is any other solution to this problem :(

As I stated earlier in this issue, when we hit the problem in a project that makes heavy use of projections, we solved the problem by adding projection flags to event metadata. We use a custom mongoDB event store (prooph v6 originally and then modified to be compatible with prooph v7 event-store interface). Each projection has its own flag and when it handled an event, it sets its flag to true. This way, we are 100% sure that every projection sees every single event no matter what happens on the write side.

A sliding window gap detection would work very similar to that approach with the benefit that events stay immutable and the consumer is responsible for gap detection.

@codeliner
Member

Reply from Allard Buijze (CTO AxonIQ) on twitter:

The problem is that insert order (where counter is incremented) and commit order (where event becomes visible) differ. We solved it in Axon by keeping track of ‘gaps’ so they can be consumed once visible. Consider using a real event store for event sourcing.

https://twitter.com/allardbz/status/1171504819191762944

@danizord

danizord commented Sep 10, 2019

We have 2 different scenarios here:

Multiple writers

  • Writer A inserts event 1
  • Writer B inserts event 2
  • Writer B commits
  • Projection reads event 2
  • Writer A commits
  • Projection skipped event 1

I think this case seems to be solved by the GET_LOCK() mechanism.

Single writer

  • Writer A inserts event 1
  • Writer A inserts event 2
  • Writer A commits
  • Projection skips event 1 and goes straight to event 2

After reading some logs here, it seems to be happening even with GET_LOCK() (which has no effect here), but it is very rare, thus hard to reproduce :/

@markkimsal

bash ./event-gen-bulk.sh 2 2000 &

I think you are starting the generators without the locking strategy. The locking is controlled by a third parameter. Might want to check the output of the generator scripts, it prints which ever strategy it is using.

As I've said multiple times, transactions on the writers have no effect on the problem. The writers do not need a consistent view of the table.

With locks enabled there should never be two writers active at the same time, therefore index updates would not be an issue.

In theory, yes. But, index updates are already not the issue.

Global table locks on the writers do not guarantee a consistent view of the data from the select's side of things.

Non-blocking reads always have a chance to see the database in a state that it never existed - at least in mysql.

@codeliner
Member

I'm trying to understand the discussion and would like to ask a few questions to make sure that I got it right.

Let's look at @danizord's example with the single writer. Two events, one transaction.
My current view of the world is that as long as the transaction is not committed, no other process can see those two events. They don't exist yet for everything except the writer that performs the operation and the database engine.

If I understand @markkimsal correctly, this assumption is not entirely true.

Non-blocking reads always have a chance to see the database in a state that it never existed - at least in mysql.

Hence, his suggestion to use SELECT ... FOR UPDATE to avoid dirty reads.

@fritz-gerneth says on the other hand, that this solution would enforce single-read-write for a stream.

I guess this is bad for read performance? At least it sounds a bit like that, but I'm not an expert.

@fritz-gerneth

But isolation levels already guarantee we only read committed rows anyway.

Is this really the case? If I understand @markkimsal correctly, it's not. And his suggestion to use SELECT ... FOR UPDATE would be a workaround to enforce the behavior that we would like to enforce with isolation levels.

We should also make sure that we're not looking at the wrong end. A bug in the projections could also be the root of the problem. TBH I don't think this is the case, because the projection logic is simple enough to be considered robust and stable. But in Germany we say "man soll niemals nie sagen!" (Never say never!)

@codeliner
Member

I also reread the article linked by @prolic earlier in the thread:

https://www.matthewbilyeu.com/blog/auto-incremented-values-can-appear-out-of-order-if-inserted-in-transactions/

I think the moral of the story is: don't rely on the order in which values in an AUTO_INCREMENT column appear. There could be rows in transactional flight with smaller values than the max of the auto-inc column.

I'm starting to believe that this issue not only affects concurrent inserts from different transactions, but also inserts of multiple events within the same transaction. It makes sense somehow, because the auto-increment is increased at insert time, not at commit time. So this is true for a single transaction, too. If the consumer reads while the writer is inside the transaction, the consumer might see some of the newly inserted rows.
GET_LOCK only ensures single-writer access, but when the consumer does not use GET_LOCK with the same name, it seems that the consumer is able to see some of the rows while they are still in transactional flight.

@codeliner
Member

It's been a while since the last update on this issue. Now we're also facing the issue with a Postgres Event Store and need a solution. So I've added a simple gap detection mechanism (WIP), see #221

Feedback is very much appreciated!
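
For illustration only (not the actual #221 code), the consumer-side idea boils down to something like this - retry a few times with increasing delays before accepting a gap as real:

function hasGap(array $eventNumbers, int $fromPosition): bool
{
    $expected = $fromPosition + 1;
    foreach ($eventNumbers as $no) {
        if ($no !== $expected) {
            return true; // a number was skipped - possibly an uncommitted insert
        }
        $expected++;
    }

    return false;
}

function nextBatch(callable $loadFrom, int $position, array $retryDelaysMs = [0, 5, 50, 500]): array
{
    $events = [];

    foreach ($retryDelaysMs as $delayMs) {
        usleep($delayMs * 1000);

        $events = $loadFrom($position); // batch keyed by event number
        if (!hasGap(array_keys($events), $position)) {
            return $events; // contiguous batch, safe to apply
        }
    }

    return $events; // retries exhausted: accept the gap as real and move on
}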

@webdevilopers

We can confirm this issue with PostgreSQL for event_streams and projections tables. We use MongoDB for Projections exclusively.

We will try the gap detection and then give further feedback.

Thank you so far everybody!

@Mattin

Mattin commented Jun 7, 2021

@webdevilopers I can confirm that GapDetection works as it should with Postgres. I have just overridden the projection command with GapDetection and it worked like a charm. No skipped events since March 2020 :).

Here is Command class for quick try: https://gist.github.com/Mattin/d668fc62073649aa2d473cbda245dd5b
