WIP: Fix fwrite() on non-blocking SSL sockets #2330

bukka · 2017-01-22T20:52:25Z

This is still work in progress but want to see what Travis thinks about that.

@bwoebi this should hopefully address https://github.com/amphp/aerys/issues/107 so if you could give a try that would be great.

@weltling I haven't setup my win machine yet so if you have time and can test it on windows that would be awesome. I have got a separate server and client tests in here (you run server first and then client and see what it does ;)) which is a bit more user friendly for manual testing:

https://github.com/bukka/php-util/tree/master/tests/openssl/asyncw

bwoebi · 2017-01-22T21:01:21Z

I'd love to try it, but I couldn't reproduce the issue back then with aerys locally - so I probably am not a big help here.

krakjoe · 2017-01-23T17:18:01Z

@bukka @bwoebi is there any one else that could/can reproduce the issue, or provide some kind of feedback here ?

bukka · 2017-01-24T12:08:02Z

I need to double check few things and do a bit more testing to be more sure that it doesn't break anything. It would be also nice to have some Win testing. Btw. anyone should be able to reproduce with the supplied test. But it's still WIP as more testing required. I will remove WIP from the title and merge it once I'm more sure that it's all fine.

Fedott · 2017-01-28T23:40:08Z

I checked this fix on my case. It's work!

staabm · 2017-01-29T00:31:51Z

I checked this fix on my case. It's work!

Whats your case? Any details? What/How did you test this patch?

Fedott · 2017-01-29T00:46:16Z

Whats your case? Any details? What/How did you test this patch?
Show all checks

I have problem similar described in issue https://github.com/amphp/aerys/issues/107.
I compiled php in docker container and run own application.
I loaded big files via protocol http2 (used openssl). Files loaded correctly.
Communication over websocket with openssl work too.

kelunik · 2017-02-24T09:19:34Z

@bukka What's the state here?

trowski · 2017-02-24T16:00:09Z

@DaveRandom and @brzuchal discovered this same problem last night, and after working with them we also realized that SSL_MODE_ENABLE_PARTIAL_WRITE and SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER were needed to make non-blocking stream writes succeed when given a large buffer. We had no idea this PR existed, and I already committed a patch with essentially the same fix, however with the modes set in php_openssl_setup_crypto. (Setting the modes separately is a leftover from trying to determine if both options were needed, definitely should be one statement).

I apologize for stepping on anyone's toes. How should we proceed? @bukka Feel free to replace what I committed with a single statement or move it to php_openssl_enable_crypto if that's where the options should be set.

bukka · 2017-02-24T21:39:25Z

I want to look on the whole thing a little bit later in March and think about a better implementation in general.

About this one. I have tested it with non blocking on Linux which seems fine however haven't done much testing with blocking to be sure it has no influance (meanig it really won't get set). In general these modes could possibly cause some BC issues so we should be careful. That's why I set it just for non blocking where it doesn't work already.

@trowski have you done some testing with blocking - I haven't checked your tests properly so not sure if it has any influance but code seems to set it always. Also have you been testing that on Win to be sure that it works fine in there as well? I think it's always better to test it properly rather than break some apps... We should be really sure about all possible consequances before we ship it though... If you are not 100% sure, might be bettet to revert it and wait a little bit once I have got a bit more time in March.

trowski · 2017-02-24T22:01:24Z

@bukka I tried the same test with a blocking stream and didn't have any issues. However, I agree that the fix should only be applied to non-blocking streams. Since the issue only exists with non-blocking streams there's no reason to inadvertently change behavior for blocking streams.

I think the chances of BC breaks applying the fix to only non-blocking streams is minimal. Behavior below the buffer size causing issues continues to be the same and of course the patch fixes the large buffer issue.

I have not personally tested on Windows, though the tests on appveyor seem to pass.

I'd like to apply the fix to only non-blocking streams and ship this in the next point releases. This is an issue we're having now and is preventing usage of non-blocking streams for messages over about 128K.

bukka · 2017-02-25T15:24:59Z

@trowski You can leave it. I will rewrite it with my version when I'm back as it's cleaner and just for non-blocking. I realised that the next release should be tagged in the week starting on 13th March so I have enough time to look into it.

Btw. we really need to stop adding test for TLS 1.2 only as it's not good for future testing (e.g. once the 1.3 support is released) and there should be no issue with lower versions anyway for this particular. I actually think that our setting of a single constant is just something that we should deprecate and rather add min and max versions. But that's a completely different issue of course.

Anyway don't worry about anything. I will take a look in March and will then close this PR as well ;)

It reliably fails without the xp_ssl.c so it is good with smaller usleep

The #74159 is duplicate

nikic · 2017-03-14T18:52:55Z

Please squash PRs with dirty history before merging.

bukka · 2017-03-14T18:59:42Z

It's not a dirty history and it's rebased which is actually what our official flow is:

https://wiki.php.net/vcs/gitworkflow

kelunik · 2017-03-14T19:04:56Z

Will this be in the release this week or the next release?

nikic · 2017-03-14T19:08:09Z

By dirty history I mean that it contains multiple commits performing "fixup" of previous commits, e.g. to fix indentation, slightly change style etc. If they are not (interactively) rebased or squashed before merging, these individual commits show in blames etc. When blaming code I want to see "Fixed bug #XXX" and not click through a series of "fixed indent" and "tidy code".

bukka · 2017-03-14T19:23:11Z

@kelunik It should be in the next tagged release which should be 7.0.18 and 7.1.4 (at least that's where I updated NEWS)

@nikic well not all of them are for the same things. For example the indent fix and skip is for another test that was already there. Also there is a clean up of the mode setting which is not related. Considering that squashing is not really our requirement by the linked workflow, I don't see any reason for it in here (read I just can't be bothered to squash only some bits... :) ).

bukka · 2017-03-14T19:29:38Z

@nikic Also to see all changes you can easily view the merge commit which gives you exactly that so I don't really understand what the advantage of squashing is...

nikic · 2017-03-14T19:52:24Z

@bukka In this case the merge commit will not show up in (non-global) commit logs and blames (trivial merge).

I'm not saying that everything should be squashed together, but the history should be meaningful. It's fine to keep the reindent of an existing file as a separate commit, but having multiple separate commits for small tweaks to code that was introduced in the same PR is not meaningful.

weltling · 2017-03-15T08:30:27Z

@bukka could you please check the AppVeyor fails on ext\openssl\tests\bug72333.phpt?

Thanks.

bukka · 2017-03-15T20:15:31Z

@weltling Looks like win in AppVeyor kills the socket quite quickly so this sort of test won't probably work there. Might be probably better to skip it on win or in just AppVeyour if it works on your local win machine. If you have a little bit of time to run the test locally on Win and possibly double check the functionality (if the fix works) - if you are able to give a try of the piece of code in https://bugs.php.net/bug.php?id=72333 that would be awesome.

I will hopefully have time to finally properly setup my PHP Win env in the next few weeks but definitely not this week so your help would be appreciated!

weltling · 2017-03-15T23:20:27Z

@bukka i was previously checking the test locally, same fails there about "connection was forcibly closed". Timely it looks busy as well these days, but willing to help or support the fix testing, ofc. Just let the code from the BT run, here's what it gives out:

int(0)
string(0) ""
int(65536)
int(0)

Not sure it's right, as the second write gives 0 back. Clear, it'd be better to know exactly, what goes on, before adding skips.

BTW you can also signup to AppVeyor locally using the GitHub account. Then it'll pickup pushes to your fork.

Thanks.

nikic · 2017-03-15T23:28:19Z

Looks like the test is also intermittent on Travis, e.g. failing in https://travis-ci.org/php/php-src/jobs/211471345

========DIFF========
001+ Warning: fwrite(): SSL: Connection reset by peer in /home/travis/build/php/php-src/ext/openssl/tests/ServerClientTestCase.inc(96) : eval()'d code on line 11
========DONE========
FAIL Bug #72333: fwrite() on non-blocking SSL sockets doesn't work [ext/openssl/tests/bug72333.phpt]

Of course, depending on frequency this may be okay, server tests tend to be unreliable.

bukka · 2017-03-16T20:26:46Z

@weltling @nikic Should be sorted by 0c8ad36

bukka · 2017-03-16T20:28:04Z

@weltling That result for the BT is expected. 0 is there because socket is not available for read. The main thing is that there is no error ;)

weltling · 2017-03-17T01:05:05Z

@bukka Thanks for the patch. When i run the test locally on Windows, the test blocks now. Seems there's still some test issue, but might be a local issue, too. From the test code, and what you've expressed in the last comment, shouldn't we also break when (edit) false === $result? From the test - the client would've anyway written less that server wanted to read, so it is likely the client loops through select endlessly, as the server didn't close the socket. I'll be able to check more close on monday, probably. Still, what was intended to be checked, from the current test the actual size the client wanted to write doesn't matter.

Thanks.

weltling · 2017-03-17T01:17:00Z

A quick try yet - decreasing the select timeout to 1 second makes the test to pass. Seems the client indeed hangs, but not in the loop - in the select. Still, one second probably too slow for the test run :/

Thanks.

bukka · 2017-03-17T19:03:45Z

@weltling I have committed another fix 53e2c91 that should consume data by server so the client should be able to write more hopefully. AppVeyor seems fine but your local setup might have a smaller queue so thought that it might be the issue. Let me know once you get time to try it. ;) Thanks

weltling · 2017-03-19T22:34:40Z

@bukka thanks for taking care. Seems this is a tricky case :/ I've still have no stable result locally with ext\openssl\tests\bug72333.phpt. Now was checking with 32- and 64-bit and OpenSSL 1.1 on master, whereby it shouldn't matter with the exact versions. Locally, i still have the test hanging sporadically, but running through some times with a warning "completed successfully", sometimes passing. I'm going to observe it more the comming week. What were the exact queue setting you mentioned?

Thanks.

weltling · 2017-03-19T23:32:13Z

@bukka, thinking more - are you sure this high timeout on select is needed? OFC it is hard to check what exactly happens on CI, but i guess there's a good potential it hangs elsewhere, not only on Windows, even then it passes afterwards. For now it looks like some dead lock in the network related test code. I guess the big timeout is the issue, will experiment on that yet.

Thanks.

bukka · 2017-03-20T16:27:10Z

@weltling I lowered the timeout to 1 sec as it should be the top when this happens anyway and if it goes over 1 sec, than it would hang for longer. It is kind of strange that it's not visible in AppVeyor but you get it.

Just to explain what the test is about. The point is to make write in the client (used by OpenSSL) write less than requested (so the returned size is lower than the supplied data size). It is more difficult to create such case on localhost because OS usually caches (at least linux does) it in the network buffer (the queue that I was talking about) so there must be enough data supplied by the client that are not consumed by the server (that happens when there is a usleep after short read). In your case I guess that the client is not capable to continue to write anything even though there should be enough space in the queue. Not really sure about the exact reason though but currently seems to me like something Windows related as it does not happen on Linux but might be wrong.

weltling · 2017-03-20T17:28:32Z

@bukka this looks more or less stable now, tested quick with the current 7.0 and master without rebuild. Probably it'll be even better do have a smaller select timeout. It looks like a bit wrong assumption in the test code, in the current and in the previous version as well, that leads to the dead lock. The principal fact is, that select blocks, disregarding whether the socket itself is non blocking.

Thanks for the explanation, likely the behavior to enforce is indeed a tricky thing, as the I/O buffers are controlled by the systems. It is also likely, that AppVeyor was already doing same - hanging for the full timeout and then running the test through. That's what i meant by telling it's hard to say what happens on CI. As the only checked part is that the operation is finished without warnings, the test could still pass. The previous test version was actually doing same locally - let it run, it exhausts the timeout and the test runs through. I couldn't say whether this was same in the very first test version, probably it were just hanging too long so i didn't wait for the timeout and thought it just hangs. The diff to your new version is anyway that i don't see any extra occasional warnings anymore, teh "operation completed successfully" warnings i was mentioning above. We still could decrease timeout, but in general the test should be fine now.

Many thanks!

krakjoe added the Bug label Jan 23, 2017

krakjoe added the Waiting on Author label Jan 23, 2017

bukka added 8 commits March 14, 2017 18:13

Fix bug #72333 (fwrite() on non-blocking SSL sockets does not work)

17e9fc9

Remove extra SSL mode settings

53cc72e

Tidy up setting of SSL_MODE_RELEASE_BUFFERS

760ff21

Fix indent and add comment

411f4b1

Speed up test for bug72333

f0e67d1

It reliably fails without the xp_ssl.c so it is good with smaller usleep

Tidy up test for bug72333

8e5d080

Skip bug74159 test for OpenSSL lower than 1.0.1

ba70478

Replace NEWS entry for #74159 with #72333

960e5cd

The #74159 is duplicate

bukka force-pushed the bug72333 branch from a78e7ab to 960e5cd Compare March 14, 2017 18:41

php-pulls merged commit 960e5cd into php:PHP-7.0 Mar 14, 2017

kelunik mentioned this pull request Mar 14, 2017

js or css files on HTTP/2 amphp/http-server#107

Closed

oknet mentioned this pull request Oct 26, 2017

Assertion when setting proxy.config.ssl.max_record_size=-1 apache/trafficserver#2705

Closed

WIP: Fix fwrite() on non-blocking SSL sockets #2330

WIP: Fix fwrite() on non-blocking SSL sockets #2330

Uh oh!

Conversation

bukka commented Jan 22, 2017

Uh oh!

bwoebi commented Jan 22, 2017

Uh oh!

krakjoe commented Jan 23, 2017

Uh oh!

bukka commented Jan 24, 2017

Uh oh!

Fedott commented Jan 28, 2017

Uh oh!

staabm commented Jan 29, 2017

Uh oh!

Fedott commented Jan 29, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kelunik commented Feb 24, 2017

Uh oh!

trowski commented Feb 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bukka commented Feb 24, 2017

Uh oh!

trowski commented Feb 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bukka commented Feb 25, 2017

Uh oh!

nikic commented Mar 14, 2017

Uh oh!

bukka commented Mar 14, 2017

Uh oh!

kelunik commented Mar 14, 2017

Uh oh!

nikic commented Mar 14, 2017

Uh oh!

bukka commented Mar 14, 2017

Uh oh!

bukka commented Mar 14, 2017

Uh oh!

nikic commented Mar 14, 2017

Uh oh!

weltling commented Mar 15, 2017

Uh oh!

bukka commented Mar 15, 2017

Uh oh!

weltling commented Mar 15, 2017

Uh oh!

nikic commented Mar 15, 2017

Uh oh!

bukka commented Mar 16, 2017

Uh oh!

bukka commented Mar 16, 2017

Uh oh!

weltling commented Mar 17, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

weltling commented Mar 17, 2017

Uh oh!

bukka commented Mar 17, 2017

Uh oh!

weltling commented Mar 19, 2017

Uh oh!

weltling commented Mar 19, 2017

Uh oh!

bukka commented Mar 20, 2017

Uh oh!

weltling commented Mar 20, 2017

Uh oh!

Uh oh!

Fedott commented Jan 29, 2017 •

edited

Loading

trowski commented Feb 24, 2017 •

edited

Loading

trowski commented Feb 24, 2017 •

edited

Loading

weltling commented Mar 17, 2017 •

edited

Loading