Some Bots benchmarks are not working #28

Open
p-jacquot opened this issue Apr 29, 2021 · 4 comments

@p-jacquot
Contributor

I noticed that some Bots benchmarks do not work with Hermitux.
The programs below were run on a nova node of Grid5000 (g5k). Other benchmarks run without difficulty on this node.
The failing ones are:

strassen.omp-tasks

I think the latest bug fixes in Hermitux may have introduced a regression for this benchmark. Before those fixes, I could run strassen with the n parameter set to 1024. Now it fails even with that value.
Here is the error printed by the program:

0x000000000020d803
/root/hermitux/hermitux-kernel/libkern/string.c:37 (discriminator 3)

Here are the last lines of the kernel logs:

[0.070][0:1][ERROR] Page Fault Exception (14) on core 0 at cs:ip = 0x8:0x20d803, fs = 0x4d6508, gs = 0, rflags 0x11086, task = 1, addr = 0xffffff9f92746000, error = 0x2 [ supervisor data write not present ]
[0.070][0:1][ERROR] rax 0xffffff9f92746000, rbx 0x8000000000, rcx 0xffffff9f92746000, rdx 0xffffff9f92747000, rbp 0xffffffffcfc93a30, rsp 0xa2e108 rdi 0xffffff9f92746000, rsi 0, r8 0x1, r9 0x3f24e8c5a001, r10 0x22, r11 0x1246, r12 0xffffff800fc93a30, r13 0x1, r14 0x8, r15 0x1f92747
[0.070][0:1][ERROR] Heap 0x4db000 - 0x4de000
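
As a reading aid for the `error = 0x2` field in these logs, here is a minimal sketch that decodes the low bits of an x86 page-fault error code. The function names are hypothetical and not part of Hermitux. The bit layout is the standard x86 one (bit 0 present, bit 1 write, bit 2 user, bit 3 reserved bit, bit 4 instruction fetch).

```python
def decode_page_fault_error(code: int) -> dict:
    """Split an x86 page-fault error code into its low five flag bits."""
    return {
        "present": bool(code & 0x01),            # 0: page not present, 1: protection violation
        "write": bool(code & 0x02),              # 0: read access, 1: write access
        "user": bool(code & 0x04),               # 0: supervisor mode, 1: user mode
        "reserved_bit": bool(code & 0x08),       # reserved bit set in a paging structure
        "instruction_fetch": bool(code & 0x10),  # fault caused by an instruction fetch
    }


def describe(code: int) -> str:
    """Render the flags in the same wording as the kernel log (hypothetical helper)."""
    flags = decode_page_fault_error(code)
    mode = "user" if flags["user"] else "supervisor"
    if flags["instruction_fetch"]:
        kind = "instruction fetch"
    else:
        kind = "data write" if flags["write"] else "data read"
    cause = "protection violation" if flags["present"] else "not present"
    return f"{mode} {kind} {cause}"


print(describe(0x2))  # supervisor data write not present
```

With `0x2`, only the write bit is set, which matches the `supervisor data write not present` text in the log: a write, in supervisor mode, to a page that is not mapped.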

alignment.for-omp-tasks

This one has never worked with Hermitux. Here is the error printed by the program:

Sequence format is Pearson
Multiple Pairwise Alignment (20 sequences)
0x0000000000485901
??:?

Here are the last lines of the kernel logs:

[0.000][0:1][ERROR] Page Fault Exception (14) on core 0 at cs:ip = 0x8:0x485901, fs = 0x4dadc8, gs = 0, rflags 0x11206, task = 1, addr = 0x9b4b68, error = 0x2 [ supervisor data write not present ]
[0.000][0:1][ERROR] rax 0xa16abc, rbx 0xa1b910, rcx 0x4, rdx 0x4, rbp 0xa16af0, rsp 0x9b4b60 rdi 0x15, rsi 0x28, r8 0, r9 0x956, r10 0x2, r11 0x9, r12 0x1, r13 0x4, r14 0xa, r15 0x672
[0.000][0:1][ERROR] Heap 0x4e0000 - 0x4e2000

uts.omp-tasks

I only started running this benchmark a few days ago, and I noticed that it does not work either. It is known for creating a large number of tasks. I don't know whether that helps, but it may be relevant because the crash seems to be located in the OpenMP runtime functions.

The following output was obtained with OMP_NUM_THREADS=1 and HERMIT_CPUS=1:

uts:

Root branching factor                = 2000.000000
Root seed (0 <= 2^31)                = 23
Probability of non-leaf node         = 0.333344
Number of children for non-leaf node = 3
E(n)                                 = 1.000032
E(s)                                 = -31250.000000
Compute granularity                  = 1
Random number generator              = SHA-1 (state size = 20B)
Root node at 0xa33710
GUEST PAGE FAULT @0x9b3ff8 (RIP @0x4781de)
0x00000000004781de
kmp_tasking.cpp:?

Here are the last lines of the kernel logs:

[0.060][0:1][ERROR] Page Fault Exception (14) on core 0 at cs:ip = 0x8:0x4783c9, fs = 0x4d6628, gs = 0, rflags 0x11246, task = 1, addr = 0x9affd8, error = 0x2 [ supervisor data write not present ]
[0.060][0:1][ERROR] rax 0x485cc0, rbx 0x75d693d0a4c0, rcx 0x21a, rdx 0, rbp 0x9b00b0, rsp 0x9affe0 rdi 0, rsi 0x75d693d0a4c0, r8 0x188, r9 0xa32e0, r10 0x75d693d0a300, r11 0xc50ff8ec, r12 0x4d1a80, r13 0x75d693d0a380, r14 0, r15 0x9b0220
[0.060][0:1][ERROR] Heap 0x4db000 - 0x4de000
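
One quick check on these dumps is to compare the faulting address with the stack pointer. The sketch below does that for the alignment.for-omp-tasks and uts.omp-tasks logs, using the `addr` and `rsp` values copied from this report. The helper name is hypothetical, and the reading of the result is only an assumption to verify.

```python
def fault_offset_from_rsp(addr: int, rsp: int) -> int:
    """Signed distance in bytes from the stack pointer to the faulting address."""
    return addr - rsp


# (addr, rsp) pairs copied from the kernel logs in this report.
faults = {
    "alignment.for-omp-tasks": (0x9b4b68, 0x9b4b60),
    "uts.omp-tasks": (0x9affd8, 0x9affe0),
}

offsets = {
    name: fault_offset_from_rsp(addr, rsp)
    for name, (addr, rsp) in faults.items()
}

for name, offset in offsets.items():
    print(f"{name}: fault is {offset:+d} bytes from rsp")
```

Both faults land within 8 bytes of `rsp`. This may suggest that the write targets the stack region, for instance a stack that grew past its mapped pages, but that would need to be confirmed in a debugger. The strassen fault is different: its address is a high kernel address and the instruction pointer resolves to `libkern/string.c`.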

Providing the benchmarks

I'd like to give you the executables that are crashing so that you can reproduce the errors. How should I proceed?

@olivierpierre
Collaborator

Thanks for reporting these bugs. Could you create folders with self-contained versions of the benchmarks, following the models present here, so that we have an easy way of reproducing the issues?

@olivierpierre
Collaborator

Could you double-check alignment_for? It's working fine on my computer. Maybe it has been solved by one of the latest fixes?

@p-jacquot
Contributor Author

Strange, I checked on my personal laptop just before making the pull request and it wasn't working.
I'll dig a bit to see whether it was an error on my side.

@olivierpierre
Collaborator

After testing on a Grid5000 machine, I can indeed see the issue:

polivier@nova-18:~/hermitux/apps/bots/alignment/alignment_for$ make test
OMP_NUM_THREADS=4 \
HERMIT_VERBOSE=0 HERMIT_ISLE=uhyve HERMIT_TUX=1 \
HERMIT_DEBUG=0 HERMIT_SECCOMP=0 HERMIT_MEM=4G \
HERMIT_CPUS=4 /home/polivier/hermitux/hermitux-kernel/prefix/bin/proxy /home/polivier/hermitux/hermitux-kernel/prefix/x86_64-hermit/extra/tests/hermitux \
prog  -f ../prot.20.aa
Sequence format is Pearson
Multiple Pairwise Alignment (20 sequences)
GUEST PAGE FAULT @0x99fc00 (RIP @0x4024e1)
0x00000000004024e1
/home/polivier/hermitux/apps/bots/alignment/alignment_for/alignment.c:273
