Segfault in Map::getSectorNoGenerateNoLock whilst pathfinding #13837

Open
pietru2004 opened this issue Sep 23, 2023 · 26 comments
Labels
Bug Issues that were confirmed to be a bug @ Script API @ Server / Client / Env.

Comments

@pietru2004

pietru2004 commented Sep 23, 2023

Minetest version

Minetest 5.8.0-dev - server
Minetest 5.7.0 - client
docker

Operating system and version

Linux server - docker

Summary

My server shuts itself down without any indication of why.

I run Minetest using a docker compose file that looks like this:

version: '3.7'
services:
  minetest:
    image: registry.gitlab.com/minetest/minetest/server:master
    restart: unless-stopped
    command: " --port 25564 --gameid minetest --world server_world_technic" #--verbose 
    volumes:
      - ./data:/var/lib/minetest
      - ./conf:/etc/minetest
    #environment:
    #  port: 25564
    #  gameid: "1"
    #  worldname: "World"
    ports:
      - 25564:25564/udp

Server world config
world.txt
Server debug log
debug.txt

The part I mean is:
[image]
At this point I get "Access denied. Connection timeout."
I think it might be related to Docker downloading the dev version (I thought it would download the stable version, as I was following the production server setup for the Docker part of running the server).
Also, running the server with --verbose does not change anything; the output still looks similar to the log.

GDB Output

Thread 8 "Server" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 122]
0x000055a0bb647599 in Map::getSectorNoGenerateNoLock(irr::core::vector2d<short>) ()
#0  0x000055a0bb647599 in Map::getSectorNoGenerateNoLock(irr::core::vector2d<short>) ()
#1  0x000055a0bb647963 in Map::getNode(irr::core::vector3d<short>, bool*) ()
#2  0x000055a0bb6a6687 in GridNodeContainer::initNode(irr::core::vector3d<short>, PathGridnode*) ()
#3  0x000055a0bb6a7ff1 in MapGridNodeContainer::access(irr::core::vector3d<short>) ()
#4  0x000055a0bb6a6c74 in Pathfinder::updateAllCosts(irr::core::vector3d<short>, irr::core::vector3d<short>, int, int) ()
#5  0x000055a0bb6a6d20 in Pathfinder::updateAllCosts(irr::core::vector3d<short>, irr::core::vector3d<short>, int, int) ()
   [...snip...]
#743 0x000055a0bb6a6d20 in Pathfinder::updateAllCosts(irr::core::vector3d<short>, irr::core::vector3d<short>, int, int) ()
#744 0x000055a0bb6a9906 in Pathfinder::getPath(irr::core::vector3d<short>, irr::core::vector3d<short>, unsigned int, unsigned int, unsigned int, PathAlgorithm) ()
#745 0x000055a0bb6a9ed0 in get_path(Map*, NodeDefManager const*, irr::core::vector3d<short>, irr::core::vector3d<short>, unsigned int, unsigned int, unsigned int, PathAlgorithm) ()
#746 0x000055a0bb52a7f2 in ModApiEnvMod::l_find_path(lua_State*) ()
#747 0x000055a0bb4fbfe6 in script_exception_wrapper(lua_State*, int (*)(lua_State*)) ()
#748 0x00007f4976fdcb4b in ?? () from /usr/local/lib/libluajit-5.1.so.2
#749 0x00007f4976ff13b5 in lua_pcall () from /usr/local/lib/libluajit-5.1.so.2
#750 0x000055a0bb508a0d in ScriptApiEntity::luaentity_Step(unsigned short, float, collisionMoveResult const*) ()
#751 0x000055a0bb57ef4d in LuaEntitySAO::step(float, bool) ()
#752 0x000055a0bb6f44f6 in std::_Function_handler<void (ServerActiveObject*), ServerEnvironment::step(float)::{lambda(ServerActiveObject*)#2}>::_M_invoke(std::_Any_data const&, ServerActiveObject*&&) ()
#753 0x000055a0bb577d31 in server::ActiveObjectMgr::step(float, std::function<void (ServerActiveObject*)> const&) ()
#754 0x000055a0bb707fa7 in ServerEnvironment::step(float) ()
#755 0x000055a0bb6ea4dd in Server::AsyncRunStep(bool) ()
#756 0x000055a0bb6ed2ae in ServerThread::run() ()
#757 0x000055a0bb599c76 in Thread::threadProc(Thread*) ()
#758 0x00007f4976c7e349 in ?? () from /usr/lib/libstdc++.so.6
#759 0x00007f497726b1f5 in ?? () from /lib/ld-musl-x86_64.so.1
#760 0x0000000000000000 in ?? ()

Steps to reproduce

Not sure...

@pietru2004 pietru2004 added the Unconfirmed bug Bug report that has not been confirmed to exist/be reproducible label Sep 23, 2023
@pietru2004
Author

Note: I tried switching from master to 5.7.0 and it still happens...

@rubenwardy
Member

rubenwardy commented Sep 23, 2023

Hi, we need to get the output of the debugger to work out where it's crashing. To do this, include --debugger when starting the application from the terminal. You may need to modify your docker container to set the entry point to something else, and then connect with bash and run minetest manually:

# Open a bash shell in the docker container
docker exec -it minetest_minetest_1 bash

# Run Minetest manually
minetestserver --port 25564 --gameid minetest --world server_world_technic --debugger

When it crashes, type bt full to get a backtrace, and then paste that here

@pietru2004
Author

it told me

2023-09-23 19:44:32: WARNING[Main]: Couldn't find a debugger to use. Try installing gdb or lldb.

2023-09-23 19:44:32: WARNING[Main]: Continuing without debugger

@pietru2004
Author

Also, I think it is caused by forgotten_monsters, because after disabling it the server seems to have stopped shutting down...

@pietru2004
Author

I ran the same world in singleplayer and it seems to run just fine...

@rubenwardy
Member

rubenwardy commented Sep 23, 2023

it told me

2023-09-23 19:44:32: WARNING[Main]: Couldn't find a debugger to use. Try installing gdb or lldb.

2023-09-23 19:44:32: WARNING[Main]: Continuing without debugger

You need to have GDB installed inside the Docker container.

@pietru2004
Author

Is it a mod, or do I run apt install gdb?

@appgurueu
Contributor

GDB is the GNU debugger. apt install gdb should work on any Debian-based system.

@pietru2004
Author

Because the Minetest Dockerfile is based on Alpine, I found I needed to use apk add gdb. Also, it seems bt full does not work with this.

@pietru2004
Author

pietru2004 commented Sep 23, 2023

I got a result:
result.txt

@pietru2004
Author

Also, it seems to disappear when I disable the mod/modpack forgotten_monsters.

@rubenwardy rubenwardy changed the title Server self closing without any info Segfault in Map::getSectorNoGenerateNoLock whilst pathfinding Sep 23, 2023
@pietru2004
Author

@rubenwardy I need to ask: was it necessary to edit my message? I mean, now it takes more space... no offence.

@rubenwardy
Member

It's better to take more space than to hide the actual segfault message. Really, it could even be added to the opening post.

@pietru2004
Author

hm... ok

@rubenwardy
Member

rubenwardy commented Sep 23, 2023

GitHub limits the length and adds a scrollbar

@pietru2004
Author

pietru2004 commented Sep 23, 2023

I can agree on that...
I got an idea:

Summary Goes Here ...this is hidden, collapsable content...
<details>
 <summary>Summary Goes Here</summary>
 ...this is hidden, collapsable content...
</details>

@rubenwardy
Member

rubenwardy commented Sep 23, 2023

Looks like GitHub only limits the length on Desktop, and not on mobile, so I'll reduce the length

@SmallJoker
Member

A few observations:

  1. There is nothing special happening in Map::getSectorNoGenerateNoLock, thus m_pathf->m_map in GridNodeContainer::initNode might be invalid.
  2. Pathfinder previously used m_map->getNode, an argument against the invalid pointer hypothesis

Interestingly, this error only happens after approx. 700 iterations, which makes me wonder whether this might be an issue related to async. Would you please be so kind as to provide the output of thread apply all bt in gdb? Perhaps trim repetitions manually and/or upload it as a text file to GitHub.

@pietru2004
Author

pietru2004 commented Sep 24, 2023

Erm, I need to build a custom image for this...

FROM registry.gitlab.com/minetest/minetest/server:5.7.0

USER root
RUN apk add gdb 
USER minetest

ENTRYPOINT /bin/sh

#
#CMD ["--config", "/etc/minetest/minetest.conf"]
#/usr/local/bin/minetestserver --config /var/lib/minetest/.minetest/minetest.conf  --port 25564 --gameid minetest --world server_world_technic --debugger

How do I run gdb? I mean, what arguments do I pass if I run it as the command?

@pietru2004
Author

Also:
[image]

@pietru2004
Author

I tried this command:

/usr/bin/gdb -q --batch -iex "set confirm off" -ex run -ex "thread apply all bt" --args /usr/local/bin/minetestserver --config /var/lib/minetest/.minetest/minetest.conf --port 25564 --gameid minetest --world server_world_technic

And got:

Thread 8 "Server" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 31]
0x000056067eeed599 in Map::getSectorNoGenerateNoLock(irr::core::vector2d<short>) ()
[Current thread is 8 (LWP 31)

@pietru2004
Author

Also, I might be wrong, but I think it occurs at a specific time of day... going to check it.

@pietru2004
Author

I was able to get the server to crash within about 3 minutes, 3 times, each during in-game hours 21-23.

@sfan5
Member

sfan5 commented Sep 24, 2023

This is a simple stack overflow just like #7132 (in fact our containers use Alpine too).

@appgurueu
Contributor

Alright, so the best fix would probably be to convert the recursive implementation into an iterative one, manually managing the stack?
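For illustration, here is a minimal sketch of what "manually managing the stack" could look like. It is not Minetest's Pathfinder code: the `Grid` type, the `propagateCosts` name, the 2D layout and the unit step cost are all assumptions made for the example. The point is only that pending work lives in a heap-allocated `std::vector` instead of in nested call frames, so the depth of the search no longer depends on the thread's stack size.

```cpp
#include <cassert>
#include <climits>
#include <utility>
#include <vector>

// Hypothetical stand-in for the pathfinder grid: true means walkable.
using Grid = std::vector<std::vector<bool>>;

// Propagates step costs from (sx, sy) to every reachable cell without
// recursion. Unreachable cells keep the value INT_MAX.
std::vector<std::vector<int>> propagateCosts(const Grid &grid, int sx, int sy)
{
	const int h = static_cast<int>(grid.size());
	const int w = h > 0 ? static_cast<int>(grid[0].size()) : 0;
	assert(sy >= 0 && sy < h && sx >= 0 && sx < w);

	std::vector<std::vector<int>> cost(h, std::vector<int>(w, INT_MAX));
	if (!grid[sy][sx])
		return cost;

	// Explicit work stack on the heap; it replaces the recursive calls.
	std::vector<std::pair<int, int>> pending;
	cost[sy][sx] = 0;
	pending.emplace_back(sx, sy);

	const int dx[4] = {1, -1, 0, 0};
	const int dy[4] = {0, 0, 1, -1};

	while (!pending.empty()) {
		const int x = pending.back().first;
		const int y = pending.back().second;
		pending.pop_back();

		for (int i = 0; i < 4; i++) {
			const int nx = x + dx[i];
			const int ny = y + dy[i];
			if (nx < 0 || nx >= w || ny < 0 || ny >= h || !grid[ny][nx])
				continue;
			// Revisit a neighbour only if we found a cheaper way to reach it.
			const int candidate = cost[y][x] + 1;
			if (candidate < cost[ny][nx]) {
				cost[ny][nx] = candidate;
				pending.emplace_back(nx, ny);
			}
		}
	}
	return cost;
}
```

With a last-in-first-out stack a cell may be revisited several times before its cost settles; swapping the vector for a FIFO queue would settle each cell once when all steps cost the same.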

@sfan5
Member

sfan5 commented Sep 24, 2023

possible fixes:

  • convert the algorithm to iterative
  • fix the algorithm bug that causes this high recursion (if there is one)
  • set higher stack sizes for musl (only helps if the algo is working correctly)
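As a rough sketch of the third option, the POSIX thread attribute API allows a stack size to be requested explicitly when a thread is created. This issue does not show how Minetest creates its server thread, so the helper names `runWithStackSize` and `markDone` are hypothetical and the 8 MiB figure in the usage note is only an illustrative value.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Hypothetical helper: runs fn(arg) on a new thread that was created with an
// explicitly requested stack size, then waits for it to finish.
// Returns 0 on success, or the error number reported by the pthread call.
int runWithStackSize(std::size_t stack_bytes, void *(*fn)(void *), void *arg)
{
	assert(fn != nullptr);

	pthread_attr_t attr;
	int rc = pthread_attr_init(&attr);
	if (rc != 0)
		return rc;

	// Ask for a specific stack size instead of relying on the libc default.
	rc = pthread_attr_setstacksize(&attr, stack_bytes);
	if (rc == 0) {
		pthread_t thread;
		rc = pthread_create(&thread, &attr, fn, arg);
		if (rc == 0)
			rc = pthread_join(thread, nullptr);
	}

	pthread_attr_destroy(&attr);
	return rc;
}

// Hypothetical thread body used for demonstration: sets an int flag to 1.
void *markDone(void *arg)
{
	*static_cast<int *>(arg) = 1;
	return nullptr;
}
```

A call such as `runWithStackSize(8u * 1024u * 1024u, markDone, &flag)` requests an 8 MiB stack. As noted in the list above, a larger stack only raises the recursion limit; it does not help if the algorithm recurses without bound.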

@Zughy Zughy added Bug Issues that were confirmed to be a bug and removed Unconfirmed bug Bug report that has not been confirmed to exist/be reproducible labels Oct 3, 2023