Thread panicked #485

Closed · joshfactorial opened this issue Feb 9, 2023 · 22 comments · Fixed by #500

Comments

@joshfactorial

I'm getting a weird error when I try to use this profiler as part of a batch submission job. The error logs I'm seeing look like this:

=fil-profile= Memory usage will be written out at exit, and stored in profile_200M..
=fil-profile= You can also run the following command while the program is still running to write out peak memory usage up to that point: kill -s SIGUSR2 43523
thread '<unnamed>' panicked at 'already borrowed: BorrowMutError', filpreload/src/lib.rs:138:33
stack backtrace:

Then, that's it. The program never starts. The same command on the command line works fine. Here is my input command:

fil-profile --no-browser -o profile_200M run -m \
    neat --log-level DEBUG \
        --no-log \
        model-seq-err \
        -o DefaultSingleEndedBinned \
        -i reads/sub200M_read1.fq \
        --overwrite
@itamarst
Collaborator

itamarst commented Feb 9, 2023

Sorry it didn't work!

What version of Fil are you using, what version of Python, and what OS?

@joshfactorial
Author

fil-profile v 2022.7.1
RHEL 7.9
slurm 21.08.8-2
Python 3.10.8

@itamarst
Collaborator

Hm, I guess I should get Conda packages updated...

@itamarst
Collaborator

You can try installing with pip meanwhile as a workaround, to see if that helps; the latest version there is 2023.1.0.

@itamarst
Collaborator

Conda-Forge now has up-to-date packages (2023.1.0), so if that's what you were using, can you retest? Thank you!

@joshfactorial
Author

I installed the update, but haven't had a chance to check the results yet.

@joshfactorial
Author

Okay, I'm getting a different error now, but it's unrelated. I think that means it at least got past the thread-panicked problem.

@itamarst
Collaborator

Great (and not so great). Tell me more about the new error!

@joshfactorial
Author

Okay, well our server was down for a bit, but I was able to fix the issues I was seeing in my code and re-run, and I'm still getting this error: thread '<unnamed>' panicked at 'already borrowed: BorrowMutError', filpreload/src/lib.rs:144:29

@itamarst
Collaborator

Sorry it's not working, I'll take a look.

@itamarst
Collaborator

Oh and can you:

  1. Set environment variable RUST_BACKTRACE=1, rerun, and then post the whole traceback it prints
  2. Tell me which version and OS exactly you're running on?

Thank you!

@joshfactorial
Author

Release:


NAME="Red Hat Enterprise Linux Server"
VERSION="7.9 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.9"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.9 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.9:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"

The backtrace seems to have revealed nothing:

=fil-profile= Memory usage will be written out at exit, and opened automatically in a browser.
=fil-profile= You can also run the following command while the program is still running to write out peak memory usage up to that point: kill -s SIGUSR2 16747
thread '<unnamed>' panicked at 'already borrowed: BorrowMutError', filpreload/src/lib.rs:144:29
stack backtrace:
slurmstepd: error: *** JOB 4604 ON vfc002 CANCELLED AT 2023-03-14T14:04:15 DUE TO TIME LIMIT ***

@joshfactorial
Author

However long I let it run, it starts, hits that error, and then keeps running for the full duration while doing nothing. The profile is just a bunch of memory usage on imports, then nothing.

@itamarst
Collaborator

itamarst commented Mar 14, 2023

Just talking through the code to remind myself (have to head out soon):

  1. At a broad level, re-entrancy is supposed to be prevented by a flag that gets incremented and decremented around calls into Rust from _filpreload.c. The general structure is "if should_track_memory(): increment() then run() then decrement()".
  2. All the code appears to be structured that way, but that's an assumption that should be verified more carefully.
  3. CORRECTED: The specific issue is inside the add_allocation() function, which suggests that something somewhere is doing some other interaction with the callstack (set/clear/start/finish) that triggers an allocation and somehow re-enters the tracker (see the sketch after this list).
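
To make that concrete, here is a minimal, self-contained Rust sketch (not Fil's actual code; the names are made up for illustration) of how that kind of re-entrancy produces exactly this panic: a thread-local RefCell is mutably borrowed by the allocation tracker, some callstack work done while that borrow is still held allocates and re-enters the tracker, and the second borrow_mut() fails with "already borrowed: BorrowMutError".

use std::cell::RefCell;

thread_local! {
    // Stand-in for the per-thread tracking state guarded by a RefCell.
    static ALLOCATIONS: RefCell<Vec<usize>> = RefCell::new(Vec::new());
}

fn add_allocation(size: usize) {
    ALLOCATIONS.with(|state| {
        let mut allocations = state.borrow_mut(); // first mutable borrow
        // Suppose recording the callstack itself allocates and re-enters
        // the tracker while the first borrow is still held...
        record_callstack();
        allocations.push(size);
    });
}

fn record_callstack() {
    // ...then this second borrow_mut() on the same thread panics with
    // "already borrowed: BorrowMutError", matching the reported error.
    ALLOCATIONS.with(|state| state.borrow_mut().push(0));
}

fn main() {
    add_allocation(200);
}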

@joshfactorial
Author

Could it be something in the way the slurm scheduler works? I apologize ahead of time for my lack of knowledge of all the inner workings lol.

@joshfactorial
Author

It's an HPC cluster I'm running it on. I do have some ability to add some libraries and such, if that's the problem.

@itamarst
Collaborator

Probably not slurm. It's possible it's Red Hat 7.9? Which, BTW, is losing extended support in a year, after which you'll have to pay extra for security updates (something to hassle the cluster administrator about).

But that's just random guessing; I would have to figure out the mechanism. I will read the code some more and think.

Could you tell me what libraries you're using? Are you using threads?

@joshfactorial
Author

It's all Python:

python 3.10
biopython 1.79
pkginfo
matplotlib
numpy
tqdm
pyyaml
pip
scipy
pytest
bedtools
htslib
pybedtools
pysam
frozendict
poetry 1.3

As far as Red Hat goes, it's been a whole thing that I stay out of lol. Above my paygrade.

@joshfactorial
Author

It's single-threaded at the moment. I tried requesting more processors from the server to see if that solved it, but it had no effect.

@itamarst
Collaborator

Hm. I think I may've found one place that could be causing the issue.
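
To make the guard idea from earlier concrete, here is a hypothetical Rust sketch of the flag-style re-entrancy guard, with made-up names; this is only an illustration and not necessarily what the eventual fix does. The idea is to bail out when the tracker is re-entered on the same thread instead of taking a second mutable borrow.

use std::cell::{Cell, RefCell};

thread_local! {
    // Re-entrancy flag, analogous to the counter that gets incremented
    // and decremented around calls into Rust from _filpreload.c.
    static IN_TRACKER: Cell<bool> = Cell::new(false);
    static ALLOCATIONS: RefCell<Vec<usize>> = RefCell::new(Vec::new());
}

fn add_allocation(size: usize) {
    IN_TRACKER.with(|flag| {
        if flag.get() {
            // Already inside the tracker on this thread: skip instead of
            // taking a second mutable borrow and panicking.
            return;
        }
        flag.set(true);
        ALLOCATIONS.with(|state| state.borrow_mut().push(size));
        flag.set(false);
    });
}

fn main() {
    add_allocation(200);
    ALLOCATIONS.with(|state| println!("tracked {} allocations", state.borrow().len()));
}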

itamarst linked a pull request Mar 14, 2023 that will close this issue
@itamarst
Collaborator

I will try to do a release later today.

@itamarst
Collaborator

A release with the fix is out.
