segfaults and other stability issues #24

Open
AlexDaniel opened this issue Aug 19, 2016 · 34 comments
Labels: all bots (issues affecting all, or most, of the bots) · blocked ☹ (underlying issue is outside this repo, a ticket was filed) · whateverable (issues affecting the bot framework, and therefore all of the bots)

Comments

@AlexDaniel
Member

See this: http://irclog.perlgeek.de/perl6/2016-08-19#i_13055233

I'm pretty sure it is not our fault, but we still have to rakudobug it.

@AlexDaniel
Member Author

AlexDaniel commented Sep 19, 2016

  • Seems like RT #129291 is the most common problem at this moment. Once that is fixed, we will probably see other issues.

@AlexDaniel AlexDaniel added the “all bots” label (issues affecting all, or most, of the bots) Sep 21, 2016
@AlexDaniel
Member Author

AlexDaniel commented Oct 8, 2016

@AlexDaniel AlexDaniel changed the title from ‘“pointer … to past fromspace” and other weird crashes’ to ‘segfaults and other crashes’ Oct 8, 2016
AlexDaniel added a commit that referenced this issue Oct 9, 2016
We've been using the non-parallel version of this script for a while now.
Instead, we just run this script several times on different ranges.

See whateverable issue #24 and RT #129781
@AlexDaniel
Member Author

AlexDaniel commented Dec 17, 2016

  • RT #129781 was fixed. The next problem is that the process is not killed if there is a lot of output on the stdout of a Proc::Async. See RT #130370; this is not a problem for us, though, because a workaround was added in commit c564d8d.

@AlexDaniel AlexDaniel added the “testneeded” label (issue is generally resolved but tests were not written yet) Jan 6, 2017
AlexDaniel added a commit that referenced this issue Jan 10, 2017
Otherwise we choke ourselves trying to process the output.

See RT #130370

Partially addresses #24
@AlexDaniel
Member Author

As of today, there are no segfaults. I still have to write tests for some of the cases mentioned here, but generally it is no longer an issue.

@AlexDaniel AlexDaniel reopened this Mar 12, 2017
AlexDaniel added a commit that referenced this issue Mar 23, 2017
For a long time this script suffered from various instabilities
in rakudo (see issue #24). As of today, some of the bugs have been fixed,
but at the same time the script itself has been wiggled into a stable
state. I am afraid to touch it.

Therefore, committing what we have without any clean up.

In fact, there are some issues I can see right now, but the script is
proven to work in practice… 🙈
@AlexDaniel AlexDaniel removed the “testneeded” label (issue is generally resolved but tests were not written yet) Jul 27, 2017
@AlexDaniel
Member Author

AlexDaniel commented Jul 27, 2017

Getting stuff like this:

MoarVM panic: Internal error: invalid thread ID 284 in GC work pass

Didn't look into it deeply at all, but leaving a note here anyway.

This can be reproduced by running t/bisectable.p6 on the server (sometimes you may get lucky and the whole file will pass, but usually it crashes halfway through).

@tony-o

tony-o commented Jul 27, 2017 via email

@AlexDaniel
Copy link
Member Author

Seems to be alright now after fixes by @jnthn++.

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Aug 6, 2017

  • Currently most bots are leaking memory (which is why some things are slower than they were before).

@AlexDaniel
Copy link
Member Author

The leak was reported in RT #131879, and it has now been fixed well enough that it does not leak as much anymore. Memory usage still grows if you keep throwing non-existent commits at the bots, but given 16 GB of RAM on the server this is hardly a problem.

Right now, the bots are stable.

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Aug 26, 2017

  • Quotable does not work (and has not been working for a while): RT #131961
    Greppable is affected too, but it is more or less usable.

@AlexDaniel AlexDaniel added the “blocked ☹” (underlying issue is outside this repo, a ticket was filed) and “whateverable” (issues affecting the bot framework, and therefore all of the bots) labels Aug 26, 2017
AlexDaniel added a commit that referenced this issue Aug 27, 2017
Right now Quotable does not work anyway (see issue #24), but it should
start working as soon as the issue is resolved.
@AlexDaniel
Copy link
Member Author

AlexDaniel commented Sep 5, 2017

RT #131961 is resolved, waiting for the next bug to appear now.

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Sep 5, 2017

  • Well, I didn't have to wait long. Most bots can't pass their tests, and I don't know why yet. Things seem to hang.

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Sep 5, 2017

@AlexDaniel
Copy link
Member Author

Well, I guess it's not. Things are still broken though.

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Oct 1, 2017

@AlexDaniel
Copy link
Member Author

OK, RT #132191 turned out to be an issue in IRC-Client (it was relying on a rakudo bug).

Now there are at least two other problems. Bisectable fails with this output:

ok 60 - Did you mean “HEAD” (new)?
# Failed to get expected result in 11.04535627 seconds (11 nominal)
not ok 61 - Did you mean “HEAD” (old)?
# Failed test 'Did you mean “HEAD” (old)?'
# at /home/bisectable/git/whateverable/t/lib/Testable.pm6 (Testable) line 81
# expected: ["testable742093, Cannot find revision “DEAD” (did you mean “HEAD”?)"]
#  matcher: 'infix:<~~>'
#      got: []
# Test failed. Stopping test suite, because PERL6_TEST_DIE_ON_FAIL environmental variable is set to a true value.
# Failed to get expected result in 11.04317088 seconds (11 nominal)
not ok 62 - _
# Failed test '_'
# at /home/bisectable/git/whateverable/t/lib/Testable.pm6 (Testable) line 81
# expected: [-> ;; $_? is raw { #`(Block|84942264) ... }]
#  matcher: 'infix:<~~>'
#      got: []
# Test failed. Stopping test suite, because PERL6_TEST_DIE_ON_FAIL environmental variable is set to a true value.

There is no reason why test 61 would fail. Actually, it passes if you put it higher in that file. I don't know what's going on there, but most likely it's an issue in rakudo.

The second problem is that it runs another test after the first one failed, even though PERL6_TEST_DIE_ON_FAIL is set. Why? It should not do that.

@AlexDaniel
Copy link
Member Author

Ah OK, the ‘_’ test is an issue in whateverable. Never mind that. Why the other test fails in the first place is beyond me, however.

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Nov 19, 2017

The code involves a lot of calls to the Text::Diff::Sift4 module, but nothing special really. This issue didn't exist a few releases ago, and I am not sure exactly when it appeared.

The same test works fine in committable.t, and actually if you move these tests higher in the bisectable.t file, they will pass. Really weird stuff going on.

@AlexDaniel
Copy link
Member Author

Alright, some progress on that! First of all, it doesn't hang, it segfaults. I thought it was hanging because the test suite does not really detect when the bot process dies unexpectedly, so there was no easy way to notice. Now I have some code that will help catch this in the future; I will commit it soon.

Now, the segfault happens in the react block here:
https://github.com/perl6/whateverable/blob/e9ccebadca9a44e4a27a2325737308828568786b/lib/Whateverable.pm6#L220-L232

So, that's easy now, right? Just run it under valgrind and you'll immediately see the issue…

Ha-ha.

Nope. You run it under valgrind, and the issue goes away. 💩

I suspect that we may be seeing something like rakudo/rakudo#1202 here, but it's hard to tell.

@AlexDaniel
Copy link
Member Author

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Nov 28, 2017

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Nov 29, 2017

  • Bots are currently leaking memory like crazy. I will probably turn off some of them so that they don't max out the memory usage on the server.

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Dec 8, 2017

  • Just had this intermittent failure:
Cannot find method 'specialize' on object of type NQPClassHOW

On this line:
https://github.com/perl6/whateverable/blob/46337991a954885fe4c535319275bbb6f797b391/lib/Whateverable.pm6#L326

I cannot reproduce it, so we will just let it be…

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Jan 17, 2018

AlexDaniel added a commit that referenced this issue Jan 27, 2018
@AlexDaniel
Copy link
Member Author

AlexDaniel commented Feb 8, 2018

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Feb 13, 2018

Again, there's nothing special about this test. And if you look closely, the preceding tests have been commented out because they were previously causing a different segfault. Here's the ticket: rakudo/rakudo#1259

@AlexDaniel AlexDaniel changed the title from ‘segfaults and other crashes’ to ‘segfaults and stability issues’ Feb 16, 2018
@AlexDaniel AlexDaniel changed the title from ‘segfaults and stability issues’ to ‘segfaults and other stability issues’ Feb 16, 2018
@AlexDaniel
Copy link
Member Author

Bots no longer leak memory like crazy, so that issue is resolved. Bisectable still can't get through its tests though.

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Mar 6, 2018

@AlexDaniel
Copy link
Member Author

OK, issue #296 can be worked around like this:

-my $host-arch = $*KERNEL.hardware;
+my $host-arch = ‘x86_64’;
$host-arch = ‘amd64’|‘x86_64’ if $host-arch eq ‘amd64’|‘x86_64’;
-$host-arch = $*KERNEL.name ~ ‘-’ ~ $host-arch;
+$host-arch = ‘linux’ ~ ‘-’ ~ $host-arch;

Heh. I'm not committing this to the repo because I hope it will get resolved relatively quickly.

@lizmat
Copy link
Contributor

lizmat commented Mar 7, 2018 via email

@AlexDaniel
Copy link
Member Author

@lizmat tried that, no difference. Further discussion here: rakudo/rakudo#1595

@AlexDaniel
Copy link
Member Author

AlexDaniel commented Mar 26, 2018

@AlexDaniel
Copy link
Member Author

  • Some sort of rakudo bug I guess? 8e206d5

@AlexDaniel
Copy link
Member Author

In 2023, I hit a few issues that were part of #388:


3 participants