-
Notifications
You must be signed in to change notification settings - Fork 7.9k
WIP: Fix fwrite() on non-blocking SSL sockets #2330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'd love to try it, but I couldn't reproduce the issue back then with aerys locally - so I probably am not a big help here. |
I need to double check few things and do a bit more testing to be more sure that it doesn't break anything. It would be also nice to have some Win testing. Btw. anyone should be able to reproduce with the supplied test. But it's still WIP as more testing required. I will remove WIP from the title and merge it once I'm more sure that it's all fine. |
I checked this fix on my case. It's work! |
Whats your case? Any details? What/How did you test this patch? |
I have problem similar described in issue https://github.com/amphp/aerys/issues/107. |
@bukka What's the state here? |
@DaveRandom and @brzuchal discovered this same problem last night, and after working with them we also realized that I apologize for stepping on anyone's toes. How should we proceed? @bukka Feel free to replace what I committed with a single statement or move it to |
I want to look on the whole thing a little bit later in March and think about a better implementation in general. About this one. I have tested it with non blocking on Linux which seems fine however haven't done much testing with blocking to be sure it has no influance (meanig it really won't get set). In general these modes could possibly cause some BC issues so we should be careful. That's why I set it just for non blocking where it doesn't work already. @trowski have you done some testing with blocking - I haven't checked your tests properly so not sure if it has any influance but code seems to set it always. Also have you been testing that on Win to be sure that it works fine in there as well? I think it's always better to test it properly rather than break some apps... We should be really sure about all possible consequances before we ship it though... If you are not 100% sure, might be bettet to revert it and wait a little bit once I have got a bit more time in March. |
@bukka I tried the same test with a blocking stream and didn't have any issues. However, I agree that the fix should only be applied to non-blocking streams. Since the issue only exists with non-blocking streams there's no reason to inadvertently change behavior for blocking streams. I think the chances of BC breaks applying the fix to only non-blocking streams is minimal. Behavior below the buffer size causing issues continues to be the same and of course the patch fixes the large buffer issue. I have not personally tested on Windows, though the tests on appveyor seem to pass. I'd like to apply the fix to only non-blocking streams and ship this in the next point releases. This is an issue we're having now and is preventing usage of non-blocking streams for messages over about 128K. |
@trowski You can leave it. I will rewrite it with my version when I'm back as it's cleaner and just for non-blocking. I realised that the next release should be tagged in the week starting on 13th March so I have enough time to look into it. Btw. we really need to stop adding test for TLS 1.2 only as it's not good for future testing (e.g. once the 1.3 support is released) and there should be no issue with lower versions anyway for this particular. I actually think that our setting of a single constant is just something that we should deprecate and rather add min and max versions. But that's a completely different issue of course. Anyway don't worry about anything. I will take a look in March and will then close this PR as well ;) |
It reliably fails without the xp_ssl.c so it is good with smaller usleep
The #74159 is duplicate
Please squash PRs with dirty history before merging. |
It's not a dirty history and it's rebased which is actually what our official flow is: |
Will this be in the release this week or the next release? |
By dirty history I mean that it contains multiple commits performing "fixup" of previous commits, e.g. to fix indentation, slightly change style etc. If they are not (interactively) rebased or squashed before merging, these individual commits show in blames etc. When blaming code I want to see "Fixed bug #XXX" and not click through a series of "fixed indent" and "tidy code". |
@kelunik It should be in the next tagged release which should be 7.0.18 and 7.1.4 (at least that's where I updated NEWS) @nikic well not all of them are for the same things. For example the indent fix and skip is for another test that was already there. Also there is a clean up of the mode setting which is not related. Considering that squashing is not really our requirement by the linked workflow, I don't see any reason for it in here (read I just can't be bothered to squash only some bits... :) ). |
@nikic Also to see all changes you can easily view the merge commit which gives you exactly that so I don't really understand what the advantage of squashing is... |
@bukka In this case the merge commit will not show up in (non-global) commit logs and blames (trivial merge). I'm not saying that everything should be squashed together, but the history should be meaningful. It's fine to keep the reindent of an existing file as a separate commit, but having multiple separate commits for small tweaks to code that was introduced in the same PR is not meaningful. |
@bukka could you please check the AppVeyor fails on ext\openssl\tests\bug72333.phpt? Thanks. |
@weltling Looks like win in AppVeyor kills the socket quite quickly so this sort of test won't probably work there. Might be probably better to skip it on win or in just AppVeyour if it works on your local win machine. If you have a little bit of time to run the test locally on Win and possibly double check the functionality (if the fix works) - if you are able to give a try of the piece of code in https://bugs.php.net/bug.php?id=72333 that would be awesome. I will hopefully have time to finally properly setup my PHP Win env in the next few weeks but definitely not this week so your help would be appreciated! |
@bukka i was previously checking the test locally, same fails there about "connection was forcibly closed". Timely it looks busy as well these days, but willing to help or support the fix testing, ofc. Just let the code from the BT run, here's what it gives out:
Not sure it's right, as the second write gives 0 back. Clear, it'd be better to know exactly, what goes on, before adding skips. BTW you can also signup to AppVeyor locally using the GitHub account. Then it'll pickup pushes to your fork. Thanks. |
Looks like the test is also intermittent on Travis, e.g. failing in https://travis-ci.org/php/php-src/jobs/211471345
Of course, depending on frequency this may be okay, server tests tend to be unreliable. |
@weltling That result for the BT is expected. 0 is there because socket is not available for read. The main thing is that there is no error ;) |
@bukka Thanks for the patch. When i run the test locally on Windows, the test blocks now. Seems there's still some test issue, but might be a local issue, too. From the test code, and what you've expressed in the last comment, shouldn't we also break when (edit) Thanks. |
A quick try yet - decreasing the select timeout to 1 second makes the test to pass. Seems the client indeed hangs, but not in the loop - in the select. Still, one second probably too slow for the test run :/ Thanks. |
@bukka thanks for taking care. Seems this is a tricky case :/ I've still have no stable result locally with ext\openssl\tests\bug72333.phpt. Now was checking with 32- and 64-bit and OpenSSL 1.1 on master, whereby it shouldn't matter with the exact versions. Locally, i still have the test hanging sporadically, but running through some times with a warning "completed successfully", sometimes passing. I'm going to observe it more the comming week. What were the exact queue setting you mentioned? Thanks. |
@bukka, thinking more - are you sure this high timeout on select is needed? OFC it is hard to check what exactly happens on CI, but i guess there's a good potential it hangs elsewhere, not only on Windows, even then it passes afterwards. For now it looks like some dead lock in the network related test code. I guess the big timeout is the issue, will experiment on that yet. Thanks. |
@weltling I lowered the timeout to 1 sec as it should be the top when this happens anyway and if it goes over 1 sec, than it would hang for longer. It is kind of strange that it's not visible in AppVeyor but you get it. Just to explain what the test is about. The point is to make |
@bukka this looks more or less stable now, tested quick with the current 7.0 and master without rebuild. Probably it'll be even better do have a smaller select timeout. It looks like a bit wrong assumption in the test code, in the current and in the previous version as well, that leads to the dead lock. The principal fact is, that select blocks, disregarding whether the socket itself is non blocking. Thanks for the explanation, likely the behavior to enforce is indeed a tricky thing, as the I/O buffers are controlled by the systems. It is also likely, that AppVeyor was already doing same - hanging for the full timeout and then running the test through. That's what i meant by telling it's hard to say what happens on CI. As the only checked part is that the operation is finished without warnings, the test could still pass. The previous test version was actually doing same locally - let it run, it exhausts the timeout and the test runs through. I couldn't say whether this was same in the very first test version, probably it were just hanging too long so i didn't wait for the timeout and thought it just hangs. The diff to your new version is anyway that i don't see any extra occasional warnings anymore, teh "operation completed successfully" warnings i was mentioning above. We still could decrease timeout, but in general the test should be fine now. Many thanks! |
This is still work in progress but want to see what Travis thinks about that.
@bwoebi this should hopefully address https://github.com/amphp/aerys/issues/107 so if you could give a try that would be great.
@weltling I haven't setup my win machine yet so if you have time and can test it on windows that would be awesome. I have got a separate server and client tests in here (you run server first and then client and see what it does ;)) which is a bit more user friendly for manual testing:
https://github.com/bukka/php-util/tree/master/tests/openssl/asyncw