Odd Program Failure #2199

Closed
tweekley49 opened this issue Feb 19, 2020 · 12 comments

Comments

@tweekley49

tweekley49 commented Feb 19, 2020

[Screenshot of the program failure]

I can't figure out why I am getting a program failure. Could it be because I am running out of memory, and could that also explain the kernel panic? I am benchmarking a fairly large problem. Oddly enough, on this system it reports a program failure, but on another system it runs for a moment and then hits a kernel panic. The picture below shows the kernel panic. Any thoughts? (It isn't the program itself; it runs completely fine outside the unikernel.)

[Screenshot of the kernel panic]

@tweekley49 changed the title from "cd" to "Odd Program Failure" on Feb 19, 2020
@fwsGonzo
Member

I don't know; that's a weird crash. You could always give it more RAM: just change the "mem" value in vm.json. Also, try updating your branch to my latest simpler_clone branch. There's no need to run conan for anything.

@tweekley49
Author

I will do that and let you know of the results. I think I am on the latest simpler_clone branch, but I will verify that.

@tweekley49
Author

[Screenshot of the error]
I am STUMPED. I can run the program everywhere else without any problem, but for some reason I get the error above when I run it inside IncludeOS. I made sure QEMU had what it needed, so I am perplexed.

@fwsGonzo
Member

fwsGonzo commented Feb 21, 2020

It's probably a bug in IncludeOS, then: it's calling TKILL on an invalid tid, which is never good. I don't have anything for you to try, but can you show your code? Do you use cooperative threads? Are you using SMP for multiprocessing?

@tweekley49
Author

The code I am using is at github.com/ProfessorWest/splash2-posix; I am running radix from the kernel folder.

The code isn't pretty; we just put everything into one file for simplicity. No cooperative threads are used, and I updated src/musl/futex.cpp as you recommended to stabilize multiprocessing.

@fwsGonzo
Member

I'll take a look at this when I have some time. It's definitely an interesting test. It looks like it uses a lot of pthread functionality that might not be tested yet.

@tweekley49
Author

Awesome! Do let me know your findings!

@tweekley49
Author

I fixed the problem: in vm.json, I had to assign more CPUs and make sure I gave it enough memory too!

@fwsGonzo
Member

Good to hear! Yes, there is no SMP unless you pass -smp XXX to QEMU.

@tweekley49
Author

Next, I will try booting it on bare metal with the benchmark to measure the speedup from QEMU to bare metal.

@tweekley49
Author

tweekley49 commented Mar 2, 2020

Interestingly enough, the TKILL error appears to be random: sometimes it happens, sometimes it doesn't. Sadly, specifying SMP and adequate memory was not the fix after all.

@fwsGonzo
Member

fwsGonzo commented Mar 7, 2020

Yep, that's a known problem. I've been trying to fix it for a while, but I'm taking a break now. There is clearly some shared state in the thread code, so the only multiprocessing-safe interface is the SMP interface itself. Threads do work, but only if they start properly: if you use a thread pool and create all the threads at startup, then as long as it doesn't crash at the start, it will stay up indefinitely.

So the problem lies in thread creation. I don't have any ideas at the moment.
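The workaround fwsGonzo describes (create every worker thread once at startup, then only hand those threads work) can be sketched as a fixed-size pool. This is a minimal, hypothetical C++ sketch, not IncludeOS code or API: the class name StartupThreadPool is made up, it uses only standard std::thread, std::mutex and std::condition_variable (which sit on top of pthreads under musl), and it assumes, per the comment above, that thread creation is the fragile step. It does not fix the underlying bug; it only confines thread creation to the constructor.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch (not an IncludeOS API): a fixed-size thread pool whose
// workers are all created in the constructor and reused afterwards, so thread
// creation happens only once, at startup.
class StartupThreadPool {
public:
    explicit StartupThreadPool(std::size_t num_threads) {
        workers_.reserve(num_threads);
        for (std::size_t i = 0; i < num_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Signal shutdown, let the workers drain the queue, then join them.
    ~StartupThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    StartupThreadPool(const StartupThreadPool&) = delete;
    StartupThreadPool& operator=(const StartupThreadPool&) = delete;

    // Queue a task for an existing worker; never spawns a new thread.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!stopping_ && "submit() called during shutdown");
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    // Number of workers; fixed after construction.
    std::size_t size() const { return workers_.size(); }

private:
    // Each worker sleeps until there is work or shutdown is requested.
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;  // queue drained, exit
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run the task outside the lock
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

For example, construct StartupThreadPool pool(4); once at boot (one worker per vCPU given to QEMU via -smp is a reasonable starting guess, not a requirement), then call pool.submit(...) for each unit of work. If the guest survives the constructor, no further threads are ever created.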


2 participants