Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure in Windows build in master #1104

Closed
ronaldtse opened this issue Apr 21, 2020 · 15 comments
Closed

Failure in Windows build in master #1104

ronaldtse opened this issue Apr 21, 2020 · 15 comments
Assignees
Labels
Projects

Comments

@ronaldtse
Copy link
Contributor

Description

In the master build the Windows build seems to fail by timeout; i.e.

[736]
+ cp /d/a/rnp/rnp/cmake/CTestCostData.txt /home/runneradmin/local-builds/rnp-build/Testing/Temporary
+ ctest -j2 -R '.*' --output-on-failure
Test project C:/hostedtoolcache/windows/Ruby/2.5.8/x64/msys64/home/runneradmin/local-builds/rnp-build
        Start 156: setupTestData
  1/166 Test #156: setupTestData .................................................................   Passed    0.24 sec
        Start 160: cli_tests-Encryption
        Start 159: cli_tests-Compression

Steps to Reproduce

  1. Build: https://github.com/rnpgp/rnp/runs/605970058?check_suite_focus=true
@ni4
Copy link
Contributor

ni4 commented Apr 21, 2020

@ronaldtse Looks like it's something with GHA (see comment rnpgp/ruby-rnp#67 (comment)).
At least recently merged PR #1097 didn't make any windows-related changes.

@ronaldtse
Copy link
Contributor Author

Got it, thanks! Will look out for the fix to come...

@ronaldtse
Copy link
Contributor Author

This is the latest failure: https://github.com/rnpgp/rnp/actions/runs/91859235

D:/a/rnp/rnp/src/lib/rnp.cpp: In function 'rnp_result_t key_to_json(json_object*, rnp_key_handle_t, uint32_t)':
437
D:/a/rnp/rnp/src/lib/rnp.cpp:6324:70: error: 'TRUE' was not declared in this scope
438
 6324 |     json_object *jsorevoked = json_object_new_boolean(key->revoked ? TRUE : FALSE);
439
      |                                                                      ^~~~
440
D:/a/rnp/rnp/src/lib/rnp.cpp:6324:77: error: 'FALSE' was not declared in this scope
441
 6324 |     json_object *jsorevoked = json_object_new_boolean(key->revoked ? TRUE : FALSE);
442
      |                                                                             ^~~~~
443
make[2]: *** [src/lib/CMakeFiles/librnp.dir/build.make:551: src/lib/CMakeFiles/librnp.dir/rnp.cpp.obj] Error 1
444
make[2]: Leaving directory '/home/runneradmin/local-builds/rnp-build'
445
make[1]: *** [CMakeFiles/Makefile2:1215: src/lib/CMakeFiles/librnp.dir/all] Error 2
446
make[1]: *** Waiting for unfinished jobs....
447

@ni4
Copy link
Contributor

ni4 commented Apr 30, 2020

Okay, it seems I reproduced this hang locally. Will take a look...

@dewyatt
Copy link
Contributor

dewyatt commented Apr 30, 2020

Okay, it seems I reproduced this hang locally. Will take a look...

That's good news, maybe we don't have to wait for it to resolve itself after all. I wonder if it's hanging reading from a pipe or something. It's odd we haven't really changed anything though.

@ni4
Copy link
Contributor

ni4 commented Apr 30, 2020

@dewyatt I think it could be something updated in ctest or google test. MinGW uses latest ones (and it's a pain to use older versions, since those are not always available). Will investigate further.

@ni4
Copy link
Contributor

ni4 commented May 1, 2020

Update: I tried to downgrade MinGW packages without luck. Ended up attaching lldb to the hanged process, giving no callstack and hang in WaitForMultipleObjects(). So I think it is something in MinGW/MSYS core (search give some comparable problem in summer 2019).
I will give up it for a while and probably get back to it afterwards.

@dewyatt
Copy link
Contributor

dewyatt commented May 3, 2020

So I tried having a go at this too. I can see rnp.exe and rnpkeys.exe both hang after main().
Some observations:

  • It became a problem for us when the botan package was updated to 2.14.0 (because of commit randombit/botan@d1318e6).
  • If librnp is built as a static library, the issue goes away.

I'm not sure if it's a bug in winpthread's C++11 impl, or in botan's thread pool, or with our configuration.

I think some workarounds would be:

  • Use mingw-w64-x86_64-libbotan 2.13.0
  • Build botan ourselves with --without-os-feature=threads
  • Maybe use mingw-mcf (and keep threads enabled) which I believe offers an alternative C++11 thread impl

I also found at least one unrelated windows-specific bug, I'll do a PR for that next week.

@ni4 Are your findings looking similar?

@ni4
Copy link
Contributor

ni4 commented May 3, 2020

@dewyatt Thanks for your investigations. Actually I didn't suspect Botan update as a possible source of problems. Actually I was unable to get back to the case where things work well, so will try it tomorrow.

@ni4
Copy link
Contributor

ni4 commented May 4, 2020

@dewyatt I tracked down Botan to the minimalistic test which hangs when run alone:

TEST_F(rnp_tests, test_botan_hang_rsa)
{
    botan_rng_t rng = NULL;
    assert_int_equal(botan_rng_init(&rng, NULL), 0);

    botan_privkey_t rsa_key = NULL;
    assert_int_equal(botan_privkey_create(&rsa_key, "RSA", "1024", rng), 0);
    /* call below hangs the test if run with '1' as third parameter */
    assert_int_equal(botan_privkey_check_key(rsa_key, rng, 1), 0);
    botan_privkey_destroy(rsa_key);
    botan_rng_destroy(rng);
}

Internally it uses PK_Signer/PK_Verifier to test whether key is able to sign, and most likely this is the source of hangs (signing with RSA hangs as well).

Since now I need to focus on other things, what about to downgrade Botan in CI to the 2.13, and later get back to this (probably, creating issue in Botan's GitHub)?

@dewyatt
Copy link
Contributor

dewyatt commented May 4, 2020

@ni4 Confirmed, that makes sense since the thread pool is utilized in rsa_private_op (I suspect decryption would hang too).

Since now I need to focus on other things, what about to downgrade Botan in CI to the 2.13

Sure we should be able to do that. I'll do a PR today.

and later get back to this (probably, creating issue in Botan's GitHub)?

I did some brief testing and wasn't able to reproduce the hang with your minimal test when building with MSVC (with threads enabled), only with mingw-w64+winpthreads. So I lean towards it not being a bug in botan itself, but still maybe botan's configure.py can disable threads for this platform.

@ni4
Copy link
Contributor

ni4 commented May 8, 2020

To save it for later - I think this line from Botan's release notes introduced the hang: Use the library thread pool instead of a new thread for RSA computations, improving signature performance by up to 20%. (GH #2257).

@dewyatt
Copy link
Contributor

dewyatt commented May 8, 2020

To save it for later - I think this line from Botan's release notes introduced the hang: Use the library thread pool instead of a new thread for RSA computations, improving signature performance by up to 20%. (GH #2257).

Right that's the commit I linked to earlier - randombit/botan@d1318e6

@ni4
Copy link
Contributor

ni4 commented May 8, 2020

@dewyatt wow, somehow mislooked that, sorry, my fault.

@ni4
Copy link
Contributor

ni4 commented Jan 19, 2021

Closing this since it was fixed via #1371 and msys2/MINGW-packages#7640

@ni4 ni4 closed this as completed Jan 19, 2021
Ongoing automation moved this from To do to Done Jan 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Ongoing
  
Done
Development

No branches or pull requests

3 participants