Valheim Dedicated Server Crashes Post Unity 2022 Update #1182
Comments
Valheim server still works using Box64 v0.2.4 on my end (RPI4 4GB, Ubuntu 64bit), but fails to start using v0.2.6. However, starting with v0.2.4 leads to a server crash after an arbitrary amount of time (leaving behind mono_crash memory dumps) |
@sea212 did you try with current version also? Can you bisect the issue if it's still broken? |
@ptitSeb I'll provide more information shortly. |
When I tried to bisect, I realized that I can't reliably reproduce the error where the server hangs on launch. Sometimes it does; most of the time it seems to work. |
If it's a random issue, you can try to start the server with |
To clarify, should I stay on a more recent build of box64, or revert to 0.2.4 before trying this flag? Is there any way to make them smaller? Thanks again for the help |
Stay on current, it would be easier for me. Also, if it just works with
You can use |
@ptitSeb Ok,
I ran this below and was able to generate a more interesting log
In particular
Which is odd, previously libpulse-mainloop was only needed for crossplay, so maybe the default behavior is now crossplay on? |
I'm on an RPI 5 running Ubuntu Server 23.10, getting the same issue as those above - silent crashing after an arbitrary period of time. I'm running
To note, I'm seeing occasional
Let me know if there's anything else I can provide. |
In addition to the above, I took a log of my last run of the game using the following shell script:
#!/bin/bash
export templdpath=$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=./linux64:$LD_LIBRARY_PATH
export SteamAppId=892970
export BOX64_SHOWBT=1
export BOX64_LOG=1
export BOX64_TRACE_FILE=log.txt
export BOX64_DYNAREC_STRONGMEM=1
echo "Starting server PRESS CTRL-C to exit"
# Tip: Make a local copy of this script to avoid it being overwritten by steam.
# NOTE: Minimum password length is 5 characters & Password cant be in the server name.
# NOTE: You need to make sure the ports 2456-2458 is being forwarded to your server through your local router & firewall.
./valheim_server.x86_64 \
-name "something" \
-port 2456 \
-world "something" \
-password "secret" \
-nographics \
-batchmode \
-public 0
export LD_LIBRARY_PATH=$templdpath
Pardon my ignorance, but it's curious to see so many errors about |
I am having a similar issue. I do not have BOX64_DYNAREC_STRONGMEM set to 1. What's interesting is that sometimes the Valheim service will run fine for a bit (it ran for up to 8 hours once!), and sometimes it will only live for roughly 5 minutes before failing silently. Recently, though, the Valheim service has typically been failing silently after an hour, and usually sooner in recent days; I can't get it to run anywhere near 8 hours again. If it would help resolve this sooner, I can run it and capture trace logs. I am a novice at Linux and bash, but I think I could handle running a script similar to the one mentioned above. |
@nitroinferno what if you run with |
@ptitSeb How do I run the valheim service with BOX64_DYNAREC_STRONGMEM=1 from terminal? For example I run the server exec from valheim.service, and typically do 'systemctl start valheim' right in the terminal. |
The simplest would be to use either
[valheim_server.x86_64]
BOX64_DYNAREC_STRONGMEM=1
And the parameter will automatically be picked up. |
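For anyone unsure how to get that section into an rc file without editing by hand, here is a minimal sketch. The helper name `add_box64rc_section` is made up for illustration and is not part of box64; the demo writes to a throwaway temp file rather than a real rc file, and it assumes the rc file uses the `[executable]` plus `KEY=VALUE` layout quoted above.

```shell
#!/bin/bash
# Hypothetical helper (not part of box64): append a per-executable section
# to an rc file, unless a section for that executable already exists.
# $1 = rc file path, $2 = executable name, remaining args = KEY=VALUE settings.
add_box64rc_section() {
    local rc_file=$1 exe=$2
    shift 2
    # Skip if the exact section header line is already present.
    if [ -f "$rc_file" ] && grep -qxF "[$exe]" "$rc_file"; then
        return 0
    fi
    {
        printf '[%s]\n' "$exe"
        printf '%s\n' "$@"
    } >> "$rc_file"
}

# Demo against a throwaway file; review the output before touching a real rc file.
demo_rc=$(mktemp)
add_box64rc_section "$demo_rc" valheim_server.x86_64 BOX64_DYNAREC_STRONGMEM=1
cat "$demo_rc"
rm -f "$demo_rc"
```

Since the server here is started through systemd, an rc file has the advantage that the unit file does not need to change; restarting the service should be enough for the setting to apply.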
@ptitSeb I already ran it using |
Thank you, I set it under /etc/box64.box64rc, since the service runs from the /etc/ directory. The program seems to be staying alive now (could also be luck of the draw). It's been running for just over 2 hours, which it hasn't done for a while; it has been quite some time since it ran this long without the silent failure. I will continue to monitor it. To sea212's point, I do have mono_crash.mem files within the ~/home//valheim_server/ directory. There are 7 files, to be specific. I believe these are from the silent failures earlier in the day; I did not reboot since modifying box64.box64rc. Update: Seems it was luck of the draw that it ran 2 hours. After restarting, it didn't stay online for longer than 5 minutes, and subsequent restarts wouldn't last beyond 15 minutes. |
Did a bit more digging. My server ran for 40 hours straight with no issues hosting up to 3 players at different times, until it silently crashed again at 40 hours, when no one was logged in. Since then it typically fails after 15-45 minutes. I went into the directory of the non-responsive PID ( /proc/$[PID] ) to do some investigation. I found the following: Using sudo cat /proc/$[PID]/syscall it returns something along the lines of: This does not change while the program is stuck.
Using sudo cat /proc/$[PID]/stack it always returns the following once the program is stuck:
This is all a bit over my head - but hoping this may help determine whether this is box64 related or not.. |
I believe there may be 2 issues at hand. The crashes on initialization / startup sometimes generate the mono_crash.mem files (these crashes are RARE, at least in my case). I overclocked my rpi4 to 2GHz and it stopped any and all crashing on startup / initialization. I suspect they may be somehow tied to the lower clock rate of the CPU? (I have no gauge of whether this makes logical sense or not.) I created a service which monitors whether the primary .x86_64 PID has become stuck in 'Sleeping' status as seen in htop. This is an easy workaround for the sporadic failed starts. My main issue is how to diagnose, or provide useful info on, when the main process gets stuck in the 'sleeping' state. I tried running with BOX64_LOG=1 but I don't think it provided any useful info; it looked pretty similar to #1182 (comment). I tried running BOX64_LOG=2 and 3, and both generated an enormous log file and couldn't get through a startup / initialization routine. I believe the RPI4 isn't powerful enough to handle all this logging and initialization simultaneously. From strace -c I get a lot of errors reported for futex, restart_syscall, and read.
I suspect the issue is a deadlock or resource contention causing the sporadic silent failures, where the main process cannot be returned to 'running' status after going to 'sleep'. Are there any BOX64 settings I can tweak to avoid this, or methods I can use to generate more useful log data? Note: I tried using gdb to get useful trace logs both when running and when stuck in the sleeping state. I'm not sure how useful these are; I will attach one of each to this post. |
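The monitoring service mentioned above was not posted, so the following is only a sketch of one way such a check could work, not the actual service. It assumes Linux's /proc/PID/stat layout and assumes that a healthy server always burns at least some CPU within the sampling window; a process that is merely idle would be a false positive. The function names `cpu_ticks` and `looks_stuck` are invented for illustration.

```shell
#!/bin/bash
# Print the total CPU ticks (utime + stime) a process has used so far.
cpu_ticks() {
    local pid=$1 stat rest
    stat=$(cat "/proc/$pid/stat") || return 1
    rest=${stat##*) }        # drop "pid (comm) "; comm may contain spaces
    set -- $rest             # $1 is now the state field, so utime/stime are $12/$13
    echo $(( ${12} + ${13} ))
}

# Succeed (exit 0) if the process used no CPU at all during the window.
# $1 = PID, $2 = window in seconds (assumed default: 60).
looks_stuck() {
    local pid=$1 window=${2:-60} before after
    before=$(cpu_ticks "$pid") || return 2
    sleep "$window"
    after=$(cpu_ticks "$pid") || return 2
    [ "$before" -eq "$after" ]
}

echo "CPU ticks used by this shell so far: $(cpu_ticks $$)"
```

A watchdog could call something like `looks_stuck "$(pidof valheim_server.x86_64)" 60 && systemctl restart valheim` from a timer. Comparing CPU time rather than the 'S' state alone avoids restarting a process that is only briefly asleep between ticks.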
Thanks for the analysis, those are some interesting stats. It seems this is a multitasking issue. My feeling is that |
Hi, author of the guide in OP here. I've not had much time to look into this myself and I also brought down my original Ampere instance in OCI. The instance was running the latest version of Valheim I'm in the process of spinning up a new OCI instance using the same specs and will do a clean install of Valheim to narrow down the exact packages. Edit: I forgot to add that the Valheim server was running overnight with no signs of issues (total runtime was about 20 hours prior to teardown). Edit2: Instance is up and has been running for about 3 hours; no additional steps have been done other than running the installation script from my guide. Update 1: The server usually logs the amount of connections every 10 minutes. Update 2: After restarting from the last run and letting it run overnight, the server ran for just over an hour and deadlocked with no players online. Update 3: It seems the packages I mentioned were a false report; I must have used them for something else. I installed the Terraria Server and it started without adding more packages than were already installed. In any case, I will start looking at using box64rc to see how the Valheim server behaves. |
Hi All, [valheim_server.x86_64] EDIT: |
Similar to @ebarrragn I set the following: Server's been MUCH more stable and running for long hours on a Pi 4 4GB. It ran for 43 hours the other day, which it has never done before for me. I believe modifying the settings may be the solution. I will have to try with bigblock = 3. |
I've been trying to run the server in OCI on Ampere trying out a variety of environment variables but not getting any positive results, using the following environment variables:
The tracefile is spammed of the following segfaults triggered by FillBlock:
https://gist.github.com/husjon/a94b6760e036e83d0b67a04e3916033d#file-valheim-box64-deadlock-01-log Update:
normal startup log: https://gist.github.com/husjon/a94b6760e036e83d0b67a04e3916033d#file-valheim-box64-normal-log |
Make sure you are using latest version box64 also. Or at least the same version across those various tests. |
I've been on v0.2.6 for all tests so far, but I'll take a look at building and testing towards latest on main a bit later. |
After settings Update 1: The server has now been running for close to 30 hours and still chugging along, still I feel it might be too early to say for sure so I'll continue to monitor the server. |
My server is encountering a crash after 5-15mins which subsequently throws a "failed to connect" when trying to reconnect. We end up needing to restart the server to connect successfully only for it to eventually crash again. This was with just 1-2 people to test the variables. I'm running the suggested variables mentioned above:
|
@Pohtaytoh Try setting the bigblock above to =2. Just a shot in the dark, but mine's been consistently working 40+ hours with it set to that along with the other settings above, though I'm hosting on an rpi4 rather than OCI Ampere. |
@nitroinferno environment variables set directly on the executable should take precedence over any configuration files (at least, that's what is normal). I think the reason why you didn't see any output about them being set is because you did not have |
Ah that makes sense, thanks @husjon! Another interesting thing is that when the instance autosaves I get a bunch of sigpwr and sigxcpu signals; it also sometimes throws segfaults when running (the current run of the program isn't throwing segfaults though). I am using v0.2.7 57ca9df built on Jan 18 2024. I'm a novice (just started using Linux in Dec.) and not even sure how to update the box64 version or whether I need to run through the cmake steps again. Another interesting bit is that my program creates 44 threads (I believe the Ampere creates 36?) and consumes a LOT of CPU time (see end code block). Guessing the following may be due to the weakness of an rpi4 4GB?
Lot of cpu usage compared to actual runtime
|
@nitroinferno depending on how you installed box64 but if you built it yourself after cloning the repository you could enter the folder, run As for the cpu load, it could be due to the lower cpu frequency on the Raspberry Pi 4 (1.5GHz). Regarding threads, currently my OCI instance runs with 40 threads for Valheim with box64 I've attached strace to see if I see anything similar regarding signals. |
Good and bad news, server crashed after about 3.5 hours, this was without any variables set other than LOG and TRACEFILE. Update: second run lasted 2 hours. I will start looking into configuring box64 now. |
It sounds like a potential solution for now would be to run box64 on v0.2.6 since that appears to be working. |
@Pohtaytoh for OCI Ampere I agree that v0.2.6 should be fine as long as the configuration is set correctly. |
Ok, I reverted to box64 v0.2.6 on my server and it seems much more stable so far. Will report back in a day to see if we get the same results you did @husjon Update1: It's been just about 8hrs since I reverted to 0.2.6, and this has been the longest my server has been up without crashing since the Valheim update. I've been logging in every hour or so and had a friend log in with me at one point. Very stable so far. Update2: Confirmed this morning (~19hrs since revert) that the server is still up. This is the longest we've been able to keep it running so far, and I believe this was the missing piece to replicating @husjon's results. Results: Running on oracle cloud's Ampere A1 instance, VM.Standard.A1.Flex with the following environmental variables:
|
With the exact same config file (strongmem parameter), is the latest box64 not stable with the server? |
Also, note that the new box64 has a secret strongmem=4 parameter, that should be equivalent to the strongmem=3 of v0.2.6 |
I tried running latest box64 with the following configuration but it will not start up correctly.
It gets to a point shortly after startup (after about 10-15 seconds) where it logs the following and locks completely.
On a normal startup, this line stays for just a few moments until the next lines pop up, like:
I tried adjusting strongmem up to 4, no luck as of yet. The best so far where I've been able to connect is with latest box64 and the following configuration:
Update: Seems like bigblock was causing me some issues, got it running now with the following:
As in I had to explicitly set bigblock to 0 Update 2: So far the server has been running for 24 hours using
Update 3: Server still chugging along (48hours) using the above mentioned configuration (from Update 2). |
That looks promising. After setting the parameters to:
My OCI instance is now running the server for 11 hours straight without crashing. That is the longest it was able to run since November last year. On a side note, I noticed the CPU usage for the process increased considerably, is that expected with strongmem = 4? |
Yes, the higher the strongmem parameter, the higher the CPU usage for the same workload. That's expected. |
3 days in and the server has been behaving really nicely using the following configuration on v0.2.7-ea86b0e3.
I have not been able to play much, but hopping onto it every now and then, the important part is that the server is running, which it was barely able to do before. I did drop a comment in my guide letting people know of a procedure if they wanted to try out v0.2.6, using the parameters mentioned in #1182 (comment). So far I've yet to hear anything (which could mean either no one has tried it yet or they are still testing). :) |
Just to chime in as well. Going on 4 days, and server is still going strong since we reverted to 0.2.6. |
Hi, sorry if I'm not good at explaining. Here the crash report: ================================================================= My settings: Dynarec for ARM64, with extension: ASIMD AES CRC32 PMULL ATOMICS SHA1 SHA2 PageSize:4096 Running on Cortex-A76 with 4 Cores
Any help? thank you |
Update from my last comment #1182 (comment). Almost forgot about the server since I haven't had time to play over the last week, but just connected to it and is still running with the following configuration (also mentioned in previous comment)
A few people in my guide have also started trying the procedure using In my case I'm happy with the results, as soon as I hear more from the few extra people testing, I'll update here. |
Hello, I can also help with testing. I have a "VM.Standard.A1.Flex" instance with 20GB memory. I am still new to this stuff so some of these things are beyond me, but would you be able to help me with a couple questions?
Thanks! |
My server has been running on my raspberry pi 4 now that only has 4GB of RAM for over a week with no crashes! Me and my friend hop on and play for hours at a time no problems. Still using the version I mentioned before: v0.2.7 57ca9df Valheim developers minimum recommended RAM is 8GB. So 20 is probably a little overkill 10 GB should be fine. |
@dre3002 that should be fine. |
Does it work with crossplay too? |
Again I've been too busy with work etc so completely forgot about the Valheim server. Not sure if increasing STRONGMEM could help. The crash seems to be Mono related from what I can see from the systemd logs.
Update 2024-03-22: After the restart (after the mentioned crash) the server is still running fine (Box64 with Dynarec v0.2.7 ea86b0e built on Feb 17 2024 12:21:23). [valheim_server.x86_64]
BOX64_DYNAREC_BLEEDING_EDGE=0
BOX64_DYNAREC_STRONGMEM=3 |
Upon updating box64 from v. 0.2.7 ed2697d to v. 0.2.7 eda857c, my Valheim OCI server is now once again silently crashing.
But now it doesn't make any difference to use those values.
Is anyone else having a similar issue? |
I have not had much time to look into it myself but since my last update (#1182 (comment)) the server was running for just over a month (using v0.2.7 ea86b0e). I just rebuilt box64 with the commit (v0.2.7 eda857c) mentioned by @tiagojofran and restarted the server, no issues so far, but I'll check in on it a bit later today. |
I forgot about the server, bogged down with work. @tiagojofran you didn't mention, but did you also update the Valheim server? I'll update it now, restart it and let it run for a while. Updated: Just updated the server to |
Hi, @husjon. |
Hm, I've not been able to replicate the issue @tiagojofran Shouldn't make much of a difference, but these are the environment variables I use for box64.
Update 2024.04.17: server is still chugging along with the version mentioned above. |
Thank you for testing it, @husjon, I think we can now assume that the issue is probably only affecting my system. |
@husjon hoping you can shed some light on the use of v0.2.7 eda857c or v0.2.7 eda857c. When you stated it's still chugging along with respect to v0.2.7 eda857c, did it have faults as I do below? Most of these faults seem to be due to the mono lib, since these generate the mono-blob files. I am still using the v0.2.6 stable branch with the following params.
For whatever reason, setting BIGBLOCK to 0 causes a lot of issues on a Pi; I flagged multiple short-duration faults on Apr 28th when I tried Bigblock=0. Either way, for the most part it runs smoothly for anywhere from 12-24 hours. I am thinking about rolling it forward to v0.2.7 eda857c, but wasn't sure whether that was less faulty with respect to Mono-related crashes, or whether v0.2.7 eda857c was either.
nitro@raspberrypi:~ $ journalctl -u valheim | grep -i exited
Mar 04 13:02:06 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=9/KILL
Mar 04 13:13:17 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=9/KILL
Mar 04 21:03:57 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=9/KILL
Mar 05 23:03:18 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Mar 08 06:03:06 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Mar 10 09:36:38 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Mar 11 04:08:51 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Mar 11 22:21:56 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Mar 14 02:02:16 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Mar 18 04:39:19 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Mar 19 00:13:36 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Mar 19 14:46:33 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Mar 21 06:03:23 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Mar 21 22:08:01 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Mar 27 13:42:31 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Mar 28 08:59:51 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Mar 29 16:48:01 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Mar 30 00:19:53 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Apr 01 07:07:59 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 02 08:55:50 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 02 16:28:19 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 04 10:43:54 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Apr 08 16:03:09 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 08 23:50:41 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 09 23:22:53 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Apr 11 10:55:55 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 12 15:43:33 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 15 15:40:13 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 18 12:22:02 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Apr 19 11:27:54 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=9/KILL
Apr 19 20:43:00 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Apr 20 22:19:27 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Apr 21 12:08:44 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 21 12:12:38 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 22 04:03:24 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 23 14:15:34 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Apr 23 19:50:41 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Apr 26 09:05:31 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 26 12:27:22 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Apr 27 09:02:43 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 27 23:57:28 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=9/KILL
NOTE:~ $ Apr 28 20:32:54 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
NOTE:~ $ Apr 28 21:05:41 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
NOTE:~ $ Apr 28 21:09:31 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=9/KILL
NOTE:~ $ Apr 28 21:45:20 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=9/KILL
Apr 29 09:09:45 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Apr 29 12:23:10 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
Apr 30 11:27:14 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
May 01 19:19:13 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT
May 02 00:08:26 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=6/ABRT |
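A long journal listing like the one above is easier to compare across box64 versions when it is tallied by exit status. This is a small sketch using standard tools only; the function name `count_exit_statuses` is invented for illustration, and the sample lines in the demo are copied from the listing above.

```shell
#!/bin/bash
# Read journal lines on stdin and print "<count> <status>" per exit status,
# most frequent first.
count_exit_statuses() {
    grep -o 'status=[0-9]*/[A-Z]*' | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Demo on three lines taken from the listing above.
count_exit_statuses <<'EOF'
Mar 04 13:02:06 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=9/KILL
Mar 05 23:03:18 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
Mar 08 06:03:06 raspberrypi systemd[1]: valheim.service: Main process exited, code=killed, status=5/TRAP
EOF
```

On the real machine the same tally would come from `journalctl -u valheim | count_exit_statuses`. Adding `--since` to `journalctl` would restrict it to the period a given box64 build was in use.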
@nitroinferno from what I've gathered from the previous comments, there is quite a difference between running it on OCI and on a Raspberry Pi. As for status on the server, it is still running.
I see you have a Raspberry Pi 4, so it will not be an exact match but I'll do some testing on my Raspberry Pi 5 (I unfortunately do not have a Pi 4 available) over the weekend. |
Oh wow, 2 weeks uptime with no faults! With this I think I'll roll my box64 forward to v0.2.7 eda857c. I'll give it a test run and report back. What's interesting is that when I was using build v0.2.7 57ca9df it ran best with BIGBLOCK=3, whereas v0.2.6 cannot use bigblock=3 at all; it will not even get through the start sequence. There were not a lot of crashes with v0.2.7 57ca9df; it would instead silently lock up about as often, at 12-24 hour intervals. |
Hello there,
I had been running Valheim dedicated server on Ampere arm based server, using the below install script:
https://gist.github.com/husjon/c5225997eb9798d38db9f2fca98891ef
This was working for quite some time.
However, as of a November 7th update of the server and clients to Unity 2022, the server will disconnect clients and then crash about 20 seconds after a client connects. I can confirm I am pulling and compiling a fairly recent version of box64:
Finally, I would like to submit logs, but the logfiles generated are over 350 MB in size. How would I submit them here?
I am generating these logs via
BOX64_LOG=2 BOX64_TRACE_FILE=valheim_arm_crash.txt /home/ubuntu/valheim_server/valheim_server.x86_64 -nographics -batchmode -port 2456 -public 1 -name serverName -password sometemporarydummypassword -savedir /home/ubuntu/valheim_data
Thank you for your time and for this awesome project.
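On the log size question, one generic option is to keep only the tail of the trace file and compress it before attaching. This sketch assumes that the lines closest to the crash are the most useful ones, which may not hold for every failure. The helper name `shrink_log` and the default of 20000 lines are arbitrary choices for illustration.

```shell
#!/bin/bash
# Keep the last N lines of a log and gzip them next to the original file.
# $1 = log file, $2 = number of lines to keep (assumed default: 20000).
shrink_log() {
    local log=$1 lines=${2:-20000}
    tail -n "$lines" "$log" | gzip -9 > "$log.tail.gz"
}

# Demo on a throwaway file with 100 numbered lines.
demo_log=$(mktemp)
seq 1 100 > "$demo_log"
shrink_log "$demo_log" 10
gzip -dc "$demo_log.tail.gz" | wc -l
rm -f "$demo_log" "$demo_log.tail.gz"
```

For the command above that would be `shrink_log valheim_arm_crash.txt`, which leaves the original file untouched and writes `valheim_arm_crash.txt.tail.gz` beside it.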