ERL-1420: Stuck in ssl:send when internal_active_n != 1 #4349

Open
OTP-Maintainer opened this issue Nov 26, 2020 · 11 comments

Labels
bug (issue is reported as a bug), help wanted (issue not worked on by OTP; help wanted from the community), priority:low, team:PS (assigned to OTP team PS)

Comments

@OTP-Maintainer

Original reporter: essen
Affected version: Not Specified
Component: ssl
Migrated from: https://bugs.erlang.org/browse/ERL-1420


This issue only occurs on *Windows*. Tested on Windows 10.

This is a followup to https://bugs.erlang.org/browse/ERL-960 where the attached test case now works reliably, but the more general problem seems to remain.

I have test cases that send data heavily via ssl:send. When a single test is run, there is typically no problem. When multiple tests run at the same time with internal_active_n=1, there is no problem. When multiple tests run at the same time with internal_active_n=$DEFAULT, ssl:send gets stuck.
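
For readers trying to reproduce this, here is a minimal sketch of one way to pin the value. `internal_active_n` is an application environment parameter of the `ssl` application, so it can be set in a config file passed to `erl -config`. The file name and the value 1 are illustrative assumptions and are not taken from the Ranch test suite.

```erlang
%% sys.config (hypothetical file name), loaded with: erl -config sys
%% Setting internal_active_n to 1 asks ssl to handle its internal socket
%% in the older {active, once} style instead of active N.
[
  {ssl, [
    {internal_active_n, 1}
  ]}
].
```

Leaving the parameter out keeps whatever default the installed ssl version uses, which is the $DEFAULT case described above.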

I do not have an easy-to-use test case to reproduce this issue. However, it should be reproducible by cloning Ranch and running a specific test suite. I can reproduce this with all versions of OTP >= 22.0 at least.

    git clone https://github.com/ninenines/ranch
    cd ranch
    git checkout windows-ssl-active-n
    make ct-sendfile t=ssl

You should see timeouts and timetrap timeouts. If I remove the timetrap timeout, the tests simply never finish.

I'd appreciate help in solving this issue.
@OTP-Maintainer
Author

ingela said:

Thanks for clarifying about Windows. I have not had time to try to reproduce this yet. Hopefully next week.

@OTP-Maintainer
Author

dgud said:

I did not manage to reproduce this on my Windows machine (via WSL) with either 23.1.1 or 23.0.3, which are the versions I have installed on my home machine.



{noformat}
X64:/mnt/e/src/ranch:windows-ssl-active-n
>PATH=/mnt/c/Program\ Files/erl-23.0.3/bin:$PATH make ct-sendfile t=ssl
 GEN    test-build
 GEN    ct-sendfile
Converting "test" to "e:/src/ranch/test" and re-inserting with add_patha/1


Common Test v1.19 starting (cwd is e:/src/ranch)



CWD set to: "e:/src/ranch/logs/ct_run.ct_ranch@Dan.2020-12-08_16.41.21"

TEST INFO: 1 test(s), 10 case(s) in 1 suite(s)

Testing src.ranch.sendfile_SUITE.groups: Starting test, 10 test cases
Testing src.ranch.sendfile_SUITE.groups: TEST COMPLETE, 10 ok, 0 failed of 10 test cases

Updating e:/src/ranch/logs/index.html ... done
Updating e:/src/ranch/logs/all_runs.html ... done

{noformat}

I ran the suite ~10 times on each release.

@OTP-Maintainer
Author

essen said:

I can reproduce it reliably on a Windows 10 VM running on VirtualBox on a Windows 10 host and the tests running either via BuildKite or in an MSYS2 environment. I can probably give you access to this environment directly. I'll see tomorrow about trying against 23.1.1 (instead of 23.1) to match your version and see if perhaps that fixes it. I forgot that on Windows I do not currently run patch releases.

If the problem remains I will see if I can provide you with access to that environment.

@OTP-Maintainer
Author

essen said:

I can reproduce the issue with both 23.1.1 and 23.1.5.

I should be able to give you access to the VM if you like.

@OTP-Maintainer
Author

ingela said:

Thanks for the offer. We are pretty busy right now, and it feels like there are too many unknowns for such a debugging session at the moment. What version of VirtualBox are you running? Is it the latest? We have experienced problems with VirtualBox in other scenarios where the whole OS would just hang. Now we run VMware instead. By the way, which is your host OS? Maybe you could use erlang:halt/1 with a string to produce an Erlang crash dump and maybe find some clue.
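
A minimal sketch of the crash dump suggestion, assuming a shell attached to the node where ssl:send is stuck. Calling erlang:halt/1 with a string makes the runtime write a crash dump with that string as its slogan and then exit. The slogan text here is arbitrary.

```erlang
%% Run in an Erlang shell on the stuck node.
%% A string argument makes the emulator write a crash dump (erl_crash.dump
%% by default, or the path given in ERL_CRASH_DUMP) and then exit.
%% The slogan below is a made-up example.
erlang:halt("ssl:send stuck with default internal_active_n").
```

The resulting dump can then be inspected with crashdump_viewer:start() from the observer application, for example to look at the state and message queues of the processes involved in the stuck connection.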

@OTP-Maintainer
Author

ingela said:

We cannot reproduce this, so we cannot find the problem.

OTP-Maintainer added the bug, help wanted, team:PS, and priority:medium labels on Feb 10, 2021
@max-au
Contributor

max-au commented Feb 15, 2021

@essen this bug actually reminds me very much of a tricky corruption I fixed in #2701 - could you test/see if this is actually fixed with the PR?

@essen
Contributor

essen commented Feb 15, 2021

Based on the PR merge and release dates I believe your fix is part of 23.1, with which I could observe this issue.

@max-au
Contributor

max-au commented Feb 15, 2021

Very true, it is in OTP 23.1. It also seems to be Windows-specific, which I must admit I haven't tested.
Does setting internal_active_n to 2 work? If not, it means that only {active, once} works on Windows.

@essen
Contributor

essen commented Feb 17, 2021

I do not remember. I'll try and get back to you on that when I can.

@essen
Contributor

essen commented Feb 28, 2021

Sorry it took so long to come back to you.

Unfortunately it's harder to tell what's happening now because I'm getting timetrap_timeout even when internal_active_n is set to 1. It happens with 23.2 as well as older versions.

I suspect it is more a combination of VirtualBox+Windows+OTP than it is a plain OTP issue.

I'm happy to just close this, as I'm unlikely to have time to debug this in the near future.

u3s self-assigned this May 17, 2023