Handle issue with inconsistent zxid on reconnection #28
Have the same issue.
Hey @nikitagromov! Thank you for your contribution, but I believe this particular case needs some additional tests. I'd prefer to see tests for that case first, and then we can decide where it should be fixed.
@cybergrind If you attempt to connect to the ZooKeeper server with a wrong zxid, the server closes your connection without any response, so you can't detect whether the session is still present on the server.
I will add tests that reproduce this issue.
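A minimal sketch of how a client could tell this silent close apart from a normal handshake reply, assuming ZooKeeper's framing of each packet with a 4-byte big-endian length prefix. `read_connect_response` and `demo` are hypothetical names, not aiozk API, and the sketch feeds `asyncio.StreamReader` objects by hand instead of opening a real socket.

```python
import asyncio
import struct
from typing import Optional, Tuple


async def read_connect_response(reader: asyncio.StreamReader) -> Optional[bytes]:
    """Read one length-prefixed frame, or return None if the server hung up first.

    A server that refuses the handshake (e.g. because the client's last zxid
    is ahead of its own) just closes the socket, so EOF arrives before a frame.
    """
    try:
        header = await reader.readexactly(4)
        (length,) = struct.unpack(">i", header)
        return await reader.readexactly(length)
    except asyncio.IncompleteReadError:
        # EOF before a complete frame: the server closed without responding.
        return None


async def demo() -> Tuple[Optional[bytes], Optional[bytes]]:
    # A reader that immediately sees EOF, as if the server dropped us.
    rejected = asyncio.StreamReader()
    rejected.feed_eof()

    # A reader that receives one well-formed 3-byte frame.
    accepted = asyncio.StreamReader()
    accepted.feed_data(struct.pack(">i", 3) + b"abc")
    accepted.feed_eof()

    return await read_connect_response(rejected), await read_connect_response(accepted)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (None, b'abc')
```

A `None` here only says "no reply"; it cannot say whether the session still exists on the server, which is exactly the ambiguity described above.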
In that case we probably cannot reach the state where:
Yep, because we don't get a response from the server :) I agree with you that I should change the state to
@cybergrind I've updated the PR. I've also added an error log for the case when the repair_loop task fails with an unexpected exception.
aiozk/session.py
await self.set_existing_watches()
self.conn.start_read_loop()
await self.set_existing_watches()
except Exception as error:
Which errors do we expect here?
If the sole purpose of this try/except block is to log the error, registering a logging callback with self.repair_loop_task.add_done_callback would probably look cleaner.
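A sketch of the suggested `add_done_callback` approach using only the standard `asyncio` and `logging` modules. `repair_loop` here is a hypothetical stand-in that fails on purpose, and `log_task_failure` is an illustrative name, not part of aiozk.

```python
import asyncio
import logging

log = logging.getLogger("repair_loop_example")


def log_task_failure(task: "asyncio.Task[None]") -> None:
    """Done-callback: log an unexpected exception from a background task."""
    if task.cancelled():
        return  # cancellation is a normal shutdown path, nothing to log
    exc = task.exception()  # also marks the exception as retrieved
    if exc is not None:
        log.error("repair loop failed", exc_info=exc)


async def repair_loop() -> None:
    # Hypothetical stand-in for the real repair loop; fails on purpose.
    raise RuntimeError("unexpected failure")


async def main() -> None:
    task = asyncio.create_task(repair_loop())
    task.add_done_callback(log_task_failure)
    await asyncio.sleep(0.01)  # give the task and its callback time to run


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    asyncio.run(main())
```

Compared with wrapping the loop body in `try`/`except Exception`, the callback keeps the loop's own code free of logging concerns and still sees any exception that escapes it. Checking `task.cancelled()` first matters because `task.exception()` raises `CancelledError` on a cancelled task.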
yep 👍
await zk.start()
# simulate failed connection
await zk.session.close()
zk.session.last_zxid = 1231231241312312
If I comment out this line and run the test on the codebase without any fixes, it still doesn't work.
We probably need a test that fails if we override the zxid and passes if we don't override it.
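One way to get a test with such a control case, sketched against a hypothetical in-memory model (`ToyServer`, `reconnect`, and `run_case` are illustrative, not aiozk code). The unfixed retry loop connects when the zxid is left alone and hangs when it is overridden, and `asyncio.wait_for` turns the hang into a timeout failure.

```python
import asyncio
from typing import Optional


class ToyServer:
    """Hypothetical model of the server-side zxid check during a handshake."""

    def __init__(self, last_zxid: int) -> None:
        self.last_zxid = last_zxid

    def handshake(self, client_zxid: int) -> Optional[str]:
        # The server refuses a client that has seen a newer zxid than its own
        # by closing the connection; None models "closed without a response".
        if client_zxid > self.last_zxid:
            return None
        return "connected"


async def reconnect(server: ToyServer, client_zxid: int) -> str:
    """Unfixed behaviour: retry forever with the same stale zxid."""
    while True:
        result = server.handshake(client_zxid)
        if result is not None:
            return result
        await asyncio.sleep(0)  # yield so wait_for can cancel us


async def run_case(override_zxid: bool, timeout: float = 0.1) -> bool:
    """Return True if the client reconnects before the timeout."""
    server = ToyServer(last_zxid=100)
    client_zxid = 1231231241312312 if override_zxid else 100
    try:
        await asyncio.wait_for(reconnect(server, client_zxid), timeout)
        return True
    except asyncio.TimeoutError:
        return False


if __name__ == "__main__":
    print(asyncio.run(run_case(False)), asyncio.run(run_case(True)))  # True False
```

With a real client, the same pair of cases could be one parametrized test that differs only in whether `zk.session.last_zxid` is overridden.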
It should fail with a timeout exception; I will check it.
I mean that the code without this line should pass the test:
# zk.session.last_zxid = 1231231241312312
On the codebase without the fixes it will not work even if we comment out zk.session.last_zxid. We can't reconnect to ZooKeeper because the session stays in the closing state. I will add a test that checks only session reconnection.
What are the minimal changes needed to support this test? I've added
`closing=False/connected=False` lines and it is still not working.
On Mon, Mar 25, 2019 at 5:27 PM Nikita wrote:
In aiozk/test/test_client.py:
> @@ -45,6 +45,22 @@
await asyncio.wait_for(zk.session.close(), 2)
+
***@***.***
+async def test_inconsistent_zxid():
+ async def coro():
+ zk = get_client()
+ await zk.start()
+ # simulate failed connection
+ await zk.session.close()
+ zk.session.last_zxid = 1231231241312312
to end of and test:
will pass on the codebase without the fix.
@nikitagromov Thank you for your contribution. It was a pleasure to work with you =)
We can get an infinite loop of reconnections when the zxid has changed on the server and we attempt to connect with the old zxid.
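A hedged sketch of one way to break the loop, on a hypothetical model rather than aiozk's actual code (`ToyServer` and `connect` are illustrative names): when the server closes the connection without responding, drop the old session id and zxid and retry as a new session, with a bound on attempts. This is not necessarily what PR #28 merged. A real client might first try other servers in the ensemble, and starting a new session means the old session's ephemeral nodes and watches are gone.

```python
import asyncio
from typing import Optional, Tuple


class ToyServer:
    """Hypothetical model of a server whose history is behind the client's."""

    def __init__(self, last_zxid: int) -> None:
        self.last_zxid = last_zxid

    def handshake(self, session_id: int, client_zxid: int) -> Optional[int]:
        # None models "closed the connection without a response".
        if client_zxid > self.last_zxid:
            return None
        # session_id 0 asks for a new session; 42 is an arbitrary new id.
        return session_id or 42


async def connect(
    server: ToyServer, session_id: int, last_zxid: int, max_attempts: int = 3
) -> Tuple[int, int]:
    """Return (session_id, last_zxid) once the handshake succeeds."""
    for _ in range(max_attempts):
        sid = server.handshake(session_id, last_zxid)
        if sid is not None:
            return sid, last_zxid
        # Retrying with the same state would be rejected forever, so give up
        # on the old session and ask for a fresh one.
        session_id, last_zxid = 0, 0
        await asyncio.sleep(0)
    raise ConnectionError("server kept closing the connection")


if __name__ == "__main__":
    print(asyncio.run(connect(ToyServer(100), 7, 1231231241312312)))  # (42, 0)
```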