Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unreference watchers #149

Closed
wants to merge 18 commits into from
Closed

Unreference watchers #149

wants to merge 18 commits into from

Conversation

enumag
Copy link
Member

@enumag enumag commented Nov 8, 2020

Superseeds #140

Still need to check your comment from #117 (comment). It might be easier now with the exceptions fixed.

@enumag enumag force-pushed the feature/unreference-heartbeat branch 2 times, most recently from da4ff6c to 2870746 Compare November 8, 2020 10:32
@enumag enumag force-pushed the feature/unreference-heartbeat branch from 2870746 to 5ae1719 Compare November 8, 2020 10:34
@coveralls
Copy link

coveralls commented Nov 8, 2020

Pull Request Test Coverage Report for Build 806

  • 55 of 56 (98.21%) changed or added relevant lines in 14 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 71.874%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/Internal/Message/TcpConnectionErrorMessage.php 0 1 0.0%
Totals Coverage Status
Change from base Build 797: 0.3%
Covered Lines: 3432
Relevant Lines: 4775

💛 - Coveralls

@@ -110,6 +119,10 @@ public function should_close_connection_after_configured_amount_of_failed_reconn
$closed->resolve(true);
});

// Need a watcher to keep the loop running after connection is closed.
$watcher = Loop::delay(180000, function (): void {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is highly suspicious. I never encountered a scenario where this would be needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all. Let me explain. In this test you're creating a connection which can't succeed and will fail with timeout, yes? This means that afterwards there is no active watcher anymore and the loop would just end immediately. But you still want to use the broken connection in order to call appendToStreamAsync and expect it to throw InvalidOperationException.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only needed for this particular test case. In a real life application you would be okay with the loop ending immediately. But we still need the test because again in real life application the loop might stay active because of something else entirely.

@enumag
Copy link
Member Author

enumag commented Nov 27, 2020

@enumag During testing I found that when I remove all $connection->close() calls from the test suite, the tests hang forever, thus I'd say it doesn't resolve the issue we are trying to solve.

@prolic Okay, I tried commenting out the ->close() calls. Most of them don't cause any problem if removed but a few of them indeed do cause the tests to hang.

So I changed \Amp\PHPUnit\AsyncTestCase to call $this->setTimeout(20000); at the beginning of each test. With this the AsyncTestCase will jump out from a hanging test after the timeout and fail it with a message.

Then I added the AMP_DEBUG_TRACE_WATCHERS env variable which activates Amp's TracingDriver. With this I get some helpful dump which watchers are keeping the loop up.

This is the command I ended up with to run the tests (mainly a note for myself):

docker-compose -f docker/unittest/docker-compose.yml restart && docker-compose -f docker/unittest/docker-compose.yml run -e AMP_DEBUG_TRACE_WATCHERS=1 php /app/vendor/bin/phpunit -c phpunit/phpunit.xml --exclude-group=ignore --debug --stop-on-failure

To my surprise, the watcher causing the issue isn't from prooph but from Amphp:

Enabled, referenced watchers keeping the loop running: Watcher ID: q
#0 /app/vendor/amphp/amp/lib/Loop.php:194 Amp\Loop\TracingDriver->onReadable()
#1 /app/vendor/amphp/byte-stream/lib/ResourceInputStream.php:102 Amp\Loop::onReadable()
#2 /app/vendor/amphp/socket/src/ResourceSocket.php:75 Amp\ByteStream\ResourceInputStream->__construct()
#3 /app/vendor/amphp/socket/src/ResourceSocket.php:40 Amp\Socket\ResourceSocket->__construct()
#4 /app/vendor/amphp/socket/src/DnsConnector.php:134 Amp\Socket\ResourceSocket::fromClientSocket()
#5 Amp\Socket\DnsConnector::Amp\Socket\{closure}()
#6 /app/vendor/amphp/amp/lib/Coroutine.php:118 Generator->send()
#7 /app/vendor/amphp/amp/lib/Internal/Placeholder.php:46 Amp\Coroutine->Amp\{closure}()
#8 /app/vendor/amphp/amp/lib/Internal/PrivatePromise.php:23 class@anonymous/app/vendor/amphp/amp/lib/Deferred.php0x7f82d5856efa->onResolve()
#9 /app/vendor/amphp/amp/lib/Internal/Placeholder.php:143 Amp\Internal\PrivatePromise->onResolve()
#10 /app/vendor/amphp/amp/lib/Deferred.php:52 class@anonymous/app/vendor/amphp/amp/lib/Deferred.php0x7f82d5856efa->resolve()
#11 /app/vendor/amphp/amp/lib/functions.php:282 Amp\Deferred->resolve()
#12 /app/vendor/amphp/amp/lib/Internal/Placeholder.php:149 Amp\Promise\{closure}()
#13 /app/vendor/amphp/amp/lib/Deferred.php:52 class@anonymous/app/vendor/amphp/amp/lib/Deferred.php0x7f82d5856efa->resolve()
#14 /app/vendor/amphp/socket/src/DnsConnector.php:85 Amp\Deferred->resolve()
#15 /app/vendor/amphp/amp/lib/Loop/NativeDriver.php:216 Amp\Socket\DnsConnector::Amp\Socket\{closure}()
#16 /app/vendor/amphp/amp/lib/Loop/NativeDriver.php:96 Amp\Loop\NativeDriver->selectStreams()
#17 /app/vendor/amphp/amp/lib/Loop/Driver.php:138 Amp\Loop\NativeDriver->dispatch()
#18 /app/vendor/amphp/amp/lib/Loop/Driver.php:72 Amp\Loop\Driver->tick()
#19 /app/vendor/amphp/amp/lib/Loop/TracingDriver.php:27 Amp\Loop\Driver->run()
#20 /app/vendor/amphp/amp/lib/Loop.php:95 Amp\Loop\TracingDriver->run()
#21 /app/vendor/amphp/phpunit-util/src/AsyncTestCase.php:65 Amp\Loop::run()
#22 /app/vendor/phpunit/phpunit/src/Framework/TestCase.php:1536 Amp\PHPUnit\AsyncTestCase->runAsyncTest()
#23 /app/vendor/amphp/phpunit-util/src/AsyncTestCase.php:132 PHPUnit\Framework\TestCase->runTest()
#24 /app/vendor/phpunit/phpunit/src/Framework/TestCase.php:1142 Amp\PHPUnit\AsyncTestCase->runTest()
#25 /app/vendor/phpunit/phpunit/src/Framework/TestResult.php:730 PHPUnit\Framework\TestCase->runBare()
#26 /app/vendor/phpunit/phpunit/src/Framework/TestCase.php:883 PHPUnit\Framework\TestResult->run()
#27 /app/vendor/phpunit/phpunit/src/Framework/TestSuite.php:669 PHPUnit\Framework\TestCase->run()
#28 /app/vendor/phpunit/phpunit/src/Framework/TestSuite.php:669 PHPUnit\Framework\TestSuite->run()
#29 /app/vendor/phpunit/phpunit/src/Framework/TestSuite.php:669 PHPUnit\Framework\TestSuite->run()
#30 /app/vendor/phpunit/phpunit/src/TextUI/TestRunner.php:673 PHPUnit\Framework\TestSuite->run()
#31 /app/vendor/phpunit/phpunit/src/TextUI/Command.php:148 PHPUnit\TextUI\TestRunner->run()
#32 /app/vendor/phpunit/phpunit/src/TextUI/Command.php:101 PHPUnit\TextUI\Command->run()
#33 /app/vendor/phpunit/phpunit/phpunit:61 PHPUnit\TextUI\Command::main()

Namely it's this watcher. This doesn't make much sense since the watcher is supposed to be deactivated here but that's what I got.

Actually no. It's the Socket.

@enumag
Copy link
Member Author

enumag commented Nov 27, 2020

Ah wait, nevermind. The socket created here is actually the same one that ends up in \Prooph\EventStoreClient\Transport\Tcp\TcpPackageConnection::$connection. Which means it's not properly unreferenced. Okay, more digging needed. 😆

@enumag
Copy link
Member Author

enumag commented Nov 27, 2020

@prolic Ah okay. It's actually quite clear in the end. The unreferencing system works fine. What's hanging are the tests for subscriptions - namely those with a subscription still active at the end of the test. When this happens there are two options:

  • either manually close the connection in tearDown (which effectively kills the running subscription), this is what we were doing until now
  • or end the subscription at the end of each such test - then I'm confident that the tests would not hang anymore even without explicit connection close because the connection would no longer keep the loop alive

@enumag enumag closed this Nov 27, 2020
@enumag enumag reopened this Nov 27, 2020
@prolic
Copy link
Member

prolic commented Nov 27, 2020

* or end the subscription at the end of each such test - then I'm confident that the tests would not hang anymore even without explicit connection close because the connection would no longer keep the loop alive

This sounds good to me.

@enumag
Copy link
Member Author

enumag commented Nov 27, 2020

Okay, the tests did actually identify a bug. The issue is that if connecting to a subscription fails with AccessDenied / NotAuthenticated / MaximumSubscribersReached etc. the tests still hang. It's because the subscriptionDropped callback (which I'm using to keep track of active subscriptions) is never called.

I should be able to fix this but it will still take some time.

@enumag
Copy link
Member Author

enumag commented Nov 28, 2020

@prolic Okay, there is an issue with catchup subscriptions not counting the references correctly. Or rather when EventStoreCatchUpSubscription::stop() is called, the internal EventStoreSubscription actually isn't cancelled (= $this->subscription->unsubscribe() is not called). Do you have an idea why? Should we add $this->subscription->unsubscribe() into the EventStoreCatchUpSubscription::stop() method? If so, where exactly in the stop method should it be?

@enumag
Copy link
Member Author

enumag commented Nov 28, 2020

Apparently it's supposed to be called by EventStoreCatchUpSubscription::dropSubscription() but it seems that this method is never called either. I have no idea how this should be fixed.

@prolic
Copy link
Member

prolic commented Nov 28, 2020

This looks all good, I fixed those hanging tests for you. The problem was, that once a subscription was started, there is still some watcher active. So we need to either close the connection or stop the subscription. I did that in another branch and all tests are green again.

@enumag
Copy link
Member Author

enumag commented Nov 28, 2020

@prolic I'm aware, I did the same locally. But that's not a good fix. You were right earlier that the tests should NOT hang even with all ->close() calls removed. In most cases I just need to fix the tests to close the subscription but for catch up subscription there is an actual bug in the library - there is an active watcher when there shouldn't be one. I'll fix everything else and let you know so that you can have a look at it without anything else failing.

@prolic
Copy link
Member

prolic commented Nov 28, 2020 via email

@enumag
Copy link
Member Author

enumag commented Nov 29, 2020

Ah sorry the problem is something else... The problem isn't that dropSubscription() is not called, it's that it's called too early. It's called when $this->subscription is still null.

Basically the subscription starts processing events before the promise returned by $this->connection->subscribeToAllAsync() is resolved. This shouldn't be the case of course.

The problematic test, or one of them, is ProophTest\EventStoreClient\subscribe_to_all_catching_up_should::call_dropped_callback_when_an_error_occurs_while_processing_an_event. Then like 3 tests in ProophTest\EventStoreClient\subscribe_to_stream_catching_up_should.

I pushed my other fixes.

@enumag enumag force-pushed the feature/unreference-heartbeat branch from bf0bc49 to c04a3de Compare November 29, 2020 08:23
@prolic
Copy link
Member

prolic commented Nov 30, 2020

Thanks, I'll have a look asap.

@enumag
Copy link
Member Author

enumag commented Dec 9, 2020

ping @prolic

@prolic
Copy link
Member

prolic commented Dec 29, 2020

I had a look yesterday, couldn't resolve it, I try to get some time for it these upcoming days.

@prolic
Copy link
Member

prolic commented Mar 22, 2021

More than 2 months ago? I am really sorry @enumag
I will try to work on this tomorrow.

@enumag
Copy link
Member Author

enumag commented Jun 25, 2021

ping @prolic :-P

@prolic
Copy link
Member

prolic commented Jun 30, 2021

I tried 2 days ago, couldn't solve it, hopefully this weekend I have some time again.

@enumag
Copy link
Member Author

enumag commented Jun 30, 2021

Yeah it's a tough one. Would have solved it myself if it was easy.

@prolic prolic closed this Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants