Demo sometimes hangs on Laplace test with qemu and SMP #12

Closed
jschwe opened this issue May 6, 2020 · 4 comments

@jschwe
Contributor

jschwe commented May 6, 2020

When running the rusty-hermit demo on Windows or macOS with 2 cores on QEMU, the demo gets stuck in the Laplace loop. On Windows, adding a println! inside the loop makes the hang go away for me and the demo runs fine. Can someone explain what is happening here?

The problem only occurs on QEMU when

  1. running on Windows or macOS (Ubuntu always worked at first). Update: it appears all platforms are affected, but not always.
  2. SMP=2. With only one core the code always works.

The following was only tested on Windows.
With SMP=3 the behaviour is:

  • Without my println! fix: it hangs (just as before)
  • With the println! fix: iteration 0 and 1 complete and it hangs after them

With SMP=4

Because of this issue, hermit-os/kernel#40 is stuck: its pipeline does not complete on Windows and macOS.

This patch fixed the demo for me on Windows with SMP=2:

Index: demo/src/tests/laplace.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- demo/src/tests/laplace.rs	(revision 4b74fa4889f60c27eda89c40d3b22598ffc54ff5)
+++ demo/src/tests/laplace.rs	(date 1588778712995)
@@ -55,7 +55,7 @@
 			iteration(&current[0], &mut next[0], size_x, size_y);
 		}
 		matrix.swap(0, 1);
-
+		println!("Finished laplace iteration {}", counter);
 		counter += 1;
 	}
 
@jschwe changed the title from "Demo hangs on Laplace test with qemu and SMP on windows and macOS" to "Demo sometimes hangs on Laplace test with qemu and SMP" on May 8, 2020
@jschwe
Contributor Author

jschwe commented May 8, 2020

Under Ubuntu with a 4-core CPU I've also made the following observations. Execution hangs at the matrix calculation stage and uses only one of my host CPU cores, which is utilized to 100%. This happens only when compiling with the debug profile. With the release profile everything runs fine regardless of the number of CPUs, except that with 4 QEMU cores the release build also slows to a crawl. I'm sure the multiplication would eventually finish, but execution is much slower (as seen from the inserted printlns).

| num qemu cores    | time to multiply a matrix   | time for laplace |
|-------------------|-----------------------------|------------------|
| 1 with 64MB RAM   | 10.374278 s                 | 8.782785 s       |
| 2 with 64MB RAM   | 10.408395 s                 | 10.239706 s      |
| 3 with 64MB RAM   | 136.162215 s                | 12.804455 s      |
| 3 with 256MB RAM  | terminated after 10 minutes | 13.389563 s      |
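
To make the jump between 2 and 3 cores easier to see, here is a minimal sketch that expresses each run as a slowdown relative to the single-core run. The timings are copied from the measurements above; the names `RUNS` and `slowdowns` are made up for this illustration, and the 256 MB run is left out because its multiplication was terminated.

```rust
/// Timings copied from the measurements above:
/// (configuration, matrix multiplication in s, laplace in s).
/// The 256 MB run is omitted because the multiplication was aborted.
const RUNS: [(&str, f64, f64); 3] = [
    ("1 core, 64 MB", 10.374278, 8.782785),
    ("2 cores, 64 MB", 10.408395, 10.239706),
    ("3 cores, 64 MB", 136.162215, 12.804455),
];

/// Slowdown factor of every run relative to the first (single-core) run.
fn slowdowns() -> Vec<(&'static str, f64, f64)> {
    let (_, base_mul, base_lap) = RUNS[0];
    RUNS.iter()
        .map(|&(name, mul, lap)| (name, mul / base_mul, lap / base_lap))
        .collect()
}

fn main() {
    for (name, mul, lap) in slowdowns() {
        println!("{name}: matrix x{mul:.2}, laplace x{lap:.2}");
    }
}
```

With these numbers the matrix multiplication on 3 cores takes roughly 13 times as long as on 1 core, while the Laplace test takes less than 1.5 times as long.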

@jschwe
Contributor Author

jschwe commented May 13, 2020

Compiling libhermit-rs with LevelFilter::Debug (set in libhermit-rs/src/logging.rs) gives useful insight!
Attached are three logfiles. They show that, when multiple cores are available, something happens during the Laplace test that leads to [1][DEBUG] Only Idle Task is available.
I included the relevant section (of the 2-core run) below. Things that may be related:

  • Deallocation of some stack after creating task 9.
  • sys_notify: invalid address to condition variable
  • The address of deallocated TLS is very near to previously created stack. Do they extend in different directions?
  • Switching of FPU owner: Is there only one FPU (owner) even though we have two cores, or is this just an inaccurate log message?
[0][DEBUG] Creating new task 8
[0][DEBUG] Create stacks at 0x3971000 with a size of 1088 KB
[1][DEBUG] Received TLB Flush Interrupt
[1][DEBUG] Received TLB Flush Interrupt
[1][DEBUG] Received TLB Flush Interrupt
[1][DEBUG] Received TLB Flush Interrupt
[0][DEBUG] Set up TLS at 0x3a85120, tdata_size 0x0, tls_size 0x108
[0][DEBUG] Creating task 8 with priority 2 on core 1
[1][DEBUG] Received Wakeup Interrupt
[1][DEBUG] Task is available.
[1][DEBUG] Switching task from 1 to 8 (stack 0x383BD50 => 0x3982F60)
[0][DEBUG] Creating new task 9
[1][DEBUG] Switching FPU owner from task 6 to 8
[0][DEBUG] Create stacks at 0x3B9B000 with a size of 1088 KB
[1][DEBUG] Deallocating stacks at 0x3A86000 with a size of 1088 KB
[0][DEBUG] Received TLB Flush Interrupt
[1][DEBUG] Deallocate TLS at 0x3b9a000 (size 0x1000)
[0][DEBUG] Received TLB Flush Interrupt
[1][DEBUG] Received TLB Flush Interrupt
[1][DEBUG] sys_notify: invalid address to condition variable
[0][DEBUG] Set up TLS at 0x3a86120, tdata_size 0x0, tls_size 0x108
[0][DEBUG] Creating task 9 with priority 2 on core 0
[1][DEBUG] Received TLB Flush Interrupt
[1][DEBUG] Create condition variable queue
[1][DEBUG] Blocking task 8
[1][DEBUG] Only Idle Task is available.
[1][DEBUG] Switching task from 8 to 1 (stack 0x3982F60 => 0x383BD50)
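
As a quick check on the question about neighbouring addresses, the ranges from the excerpt above can be worked out by hand. The sketch below only does the arithmetic. The `Range` type and the `ranges` function are made up for this illustration, and it assumes that "1088 KB" in the log means 1088 * 1024 bytes.

```rust
/// Half-open address range [start, end). Hypothetical helper type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Range {
    start: u64,
    end: u64,
}

impl Range {
    fn new(start: u64, size: u64) -> Self {
        Range { start, end: start + size }
    }
    fn contains(&self, addr: u64) -> bool {
        self.start <= addr && addr < self.end
    }
    fn overlaps(&self, other: &Range) -> bool {
        self.start < other.end && other.start < self.end
    }
}

/// Assumption: the log's "KB" means 1024 bytes.
const KB: u64 = 1024;

/// Addresses and sizes copied from the log excerpt:
/// (freed stacks, freed TLS, task 9 stacks, task 9 TLS).
fn ranges() -> (Range, Range, Range, Range) {
    (
        Range::new(0x3A86000, 1088 * KB), // "Deallocating stacks at 0x3A86000"
        Range::new(0x3b9a000, 0x1000),    // "Deallocate TLS at 0x3b9a000"
        Range::new(0x3B9B000, 1088 * KB), // "Create stacks at 0x3B9B000"
        Range::new(0x3a86120, 0x108),     // "Set up TLS at 0x3a86120"
    )
}

fn main() {
    let (freed_stack, freed_tls, task9_stack, task9_tls) = ranges();
    println!("freed stacks: {:#x}..{:#x}", freed_stack.start, freed_stack.end);
    println!("freed TLS:    {:#x}..{:#x}", freed_tls.start, freed_tls.end);
    println!("task 9 stack: {:#x}..{:#x}", task9_stack.start, task9_stack.end);
    println!("task 9 TLS:   {:#x}..{:#x}", task9_tls.start, task9_tls.end);
    println!("freed TLS overlaps task 9 stack: {}", freed_tls.overlaps(&task9_stack));
    println!("task 9 TLS inside freed stacks:  {}", freed_stack.contains(task9_tls.start));
}
```

Under that assumption the freed TLS page ends exactly where the stacks of task 9 begin, so the two are adjacent but do not overlap. The TLS of task 9 lands inside the stack region that core 1 had just deallocated. That looks like ordinary reuse of freed memory and does not show a bug by itself, especially since the interleaving of log lines from two cores may not reflect the exact order of events.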

debug_log_2C_64M.txt
debug_log_4C_128M.txt
debug_log_1C_64M.txt
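
Since the messages of all cores are interleaved in these files, it can help to look at one core at a time. Below is a minimal sketch of a filter for lines of the form `[core][LEVEL] message`, as seen in the excerpt above. `LogLine`, `parse_line` and `messages_for_core` are hypothetical helpers for this illustration and not part of libhermit-rs.

```rust
/// One parsed line of the form `[core][LEVEL] message`.
#[derive(Debug, PartialEq)]
struct LogLine<'a> {
    core: u32,
    level: &'a str,
    message: &'a str,
}

/// Parses a single line; returns None for lines that do not match the format.
fn parse_line(line: &str) -> Option<LogLine<'_>> {
    let rest = line.strip_prefix('[')?;
    let (core, rest) = rest.split_once(']')?;
    let rest = rest.strip_prefix('[')?;
    let (level, message) = rest.split_once(']')?;
    Some(LogLine {
        core: core.trim().parse().ok()?,
        level,
        message: message.trim_start(),
    })
}

/// Collects the messages logged by one core, in their original order.
fn messages_for_core(log: &str, core: u32) -> Vec<&str> {
    log.lines()
        .filter_map(parse_line)
        .filter(|l| l.core == core)
        .map(|l| l.message)
        .collect()
}

fn main() {
    // A few lines taken from the 2-core excerpt.
    let excerpt = "\
[0][DEBUG] Creating task 9 with priority 2 on core 0
[1][DEBUG] Received TLB Flush Interrupt
[1][DEBUG] Create condition variable queue
[1][DEBUG] Blocking task 8
[1][DEBUG] Only Idle Task is available.";
    for msg in messages_for_core(excerpt, 1) {
        println!("{msg}");
    }
}
```

To run it on a whole logfile, read the file with `std::fs::read_to_string` and pass the contents to `messages_for_core`.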

Edit: Tested on Windows with rusty-hermit at commit 2bee7b3.

@stlankes
Contributor

Can you check the current version in the devel branch? I hope that I have fixed the issue.

@jschwe
Contributor Author

jschwe commented May 17, 2020

I tested locally on Windows with 4 cores and this seems to have solved the problem.

The pipeline in hermit-os/kernel#40, which tested the commit on Ubuntu, Windows and macOS with 1 and 2 cores on QEMU, also completed without problems. Previously the pipeline would get stuck when using SMP.

@jschwe jschwe closed this as completed May 17, 2020