Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NPC's have a (Lua) memory leak & crash potential #3552

Open
2 tasks done
Erza opened this issue Aug 11, 2021 · 4 comments · May be fixed by #3553
Open
2 tasks done

NPC's have a (Lua) memory leak & crash potential #3552

Erza opened this issue Aug 11, 2021 · 4 comments · May be fixed by #3553

Comments

@Erza
Copy link
Contributor

Erza commented Aug 11, 2021

Before creating an issue, please ensure:

  • This is a bug in the software that resides in this repository, and not a
    support matter (use https://otland.net/forums/support.16/ for support)
  • This issue is reproducible without changes to the C++ code in this repository

Steps to reproduce (include any configuration/script required to reproduce)

  1. Stick this code in a talkaction
local tests = {
    npc = 1,
    monster = 2,
    item = 3,
}

local activeTest = tests.npc

local monsterName = "Enslaved Dwarf"
local npcName = "Banker"
local itemId = 2148

local function runTest(playerPos)
    for i = 1, 5000 do
        if activeTest == tests.npc then
            local npc = Game.createNpc(npcName, playerPos)
            npc:remove()
        elseif activeTest == tests.monster then
            local monster = Game.createMonster(monsterName, playerPos)
            monster:remove()
        elseif activeTest == tests.item then
            local item = Game.createItem(itemId, 100, playerPos)
            item:remove()
        end
    end
end

function onSay(player, words, param)
    local playerPos = player:getPosition()
    
    local start = os.clock()
    runTest(playerPos)
    local reloadStart = os.clock()
    Game.reload(RELOAD_TYPE_TALKACTIONS)
    local testEnd = os.clock()
    print("Test #1: Test: "..reloadStart - start.." - Reload: "..testEnd - reloadStart.." - Total: "..testEnd - start)

    start = os.clock()
    runTest(playerPos)
    reloadStart = os.clock()
    Game.reload(RELOAD_TYPE_TALKACTIONS)
    testEnd = os.clock()
    print("Test #2: Test: "..reloadStart - start.." - Reload: "..testEnd - reloadStart.." - Total: "..testEnd - start)

    start = os.clock()
    runTest(playerPos)
    reloadStart = os.clock()
    Game.reload(RELOAD_TYPE_TALKACTIONS)
    testEnd = os.clock()
    print("Test #3: Test: "..reloadStart - start.." - Reload: "..testEnd - reloadStart.." - Total: "..testEnd - start)

    playerPos:sendMagicEffect(CONST_ME_BLOCKHIT)
    print(("-"):rep(50))
    return false
end
  1. Run the talkaction a few times and keep an eye on the memory usage of the process

Expected behaviour

NPC's cleanly deallocate their memory upon being removed

Actual behaviour

NPC's do not deallocate their memory upon being removed

Observations

  1. After commenting out this call to lua_gc, reload times stay consistent, which gives evidence of the Lua environment getting trashed
    lua_gc(g_luaEnvironment.getLuaState(), LUA_GCCOLLECT, 0);
  2. Using local activeTest = tests.monster or local activeTest = tests.item does not produce the same memory leak
  3. It doesn't matter which reload type we use, whether it's RELOAD_TYPE_TALKACTIONS, RELOAD_TYPE_CREATURESCRIPTS or RELOAD_TYPE_ACTIONS etc, the entire Lua environment is affected

Test Outputs

With the call to lua_gc enabled (increasing timings)
Test #1: Test: 3.841 - Reload: 0.059000000000001 - Total: 3.9
Test #2: Test: 3.913 - Reload: 0.11 - Total: 4.023
Test #3: Test: 3.833 - Reload: 0.167 - Total: 4
--------------------------------------------------
Test #1: Test: 3.832 - Reload: 0.221 - Total: 4.053
Test #2: Test: 3.841 - Reload: 0.274 - Total: 4.115
Test #3: Test: 3.896 - Reload: 0.33 - Total: 4.226
--------------------------------------------------
Test #1: Test: 3.897 - Reload: 0.383 - Total: 4.28
Test #2: Test: 3.916 - Reload: 0.439 - Total: 4.355
Test #3: Test: 3.934 - Reload: 0.492 - Total: 4.426
--------------------------------------------------
Test #1: Test: 3.954 - Reload: 0.547 - Total: 4.501
Test #2: Test: 3.971 - Reload: 0.594 - Total: 4.565
Test #3: Test: 3.988 - Reload: 0.657 - Total: 4.645
--------------------------------------------------
Test #1: Test: 4.006 - Reload: 0.706 - Total: 4.712
Test #2: Test: 3.993 - Reload: 0.76400000000001 - Total: 4.757
Test #3: Test: 4.019 - Reload: 0.815 - Total: 4.834
--------------------------------------------------
Test #1: Test: 4.017 - Reload: 0.87 - Total: 4.887
Test #2: Test: 4.017 - Reload: 0.929 - Total: 4.946
Test #3: Test: 4.061 - Reload: 0.982 - Total: 5.043
--------------------------------------------------
With the call to lua_gc disabled (consistent timings)
Test #1: Test: 3.875 - Reload: 0.0040000000000013 - Total: 3.879
Test #2: Test: 3.921 - Reload: 0.0040000000000013 - Total: 3.925
Test #3: Test: 3.911 - Reload: 0.0040000000000013 - Total: 3.915
--------------------------------------------------
Test #1: Test: 4.046 - Reload: 0.0030000000000001 - Total: 4.049
Test #2: Test: 3.844 - Reload: 0.0040000000000013 - Total: 3.848
Test #3: Test: 3.967 - Reload: 0.0039999999999978 - Total: 3.971
--------------------------------------------------
Test #1: Test: 4.112 - Reload: 0.0049999999999955 - Total: 4.117
Test #2: Test: 4.124 - Reload: 0.0039999999999978 - Total: 4.128
Test #3: Test: 3.94 - Reload: 0.0039999999999978 - Total: 3.944
--------------------------------------------------
Test #1: Test: 3.974 - Reload: 0.0040000000000049 - Total: 3.978
Test #2: Test: 4.007 - Reload: 0.0039999999999978 - Total: 4.011
Test #3: Test: 4.217 - Reload: 0.0049999999999955 - Total: 4.222
--------------------------------------------------
Test #1: Test: 4.528 - Reload: 0.0049999999999955 - Total: 4.533
Test #2: Test: 4.186 - Reload: 0.0039999999999907 - Total: 4.19
Test #3: Test: 4.046 - Reload: 0.0030000000000001 - Total: 4.049
--------------------------------------------------
Test #1: Test: 4.108 - Reload: 0.0039999999999907 - Total: 4.112
Test #2: Test: 4.33 - Reload: 0.0040000000000049 - Total: 4.334
[Warning - NpcScript::NpcScript] Can not load script: bank.lua
not enough memory

Memory Graphs

NPC Test

NPC Test

Monster Test

Monster Test

Item Test

Item Test

Impact on the server

Eventually, the Lua environment will run out of available memory to allocate and you will get the following error, after which the server will shortly crash

[Warning - NpcScript::NpcScript] Can not load script: bank.lua
not enough memory
@nekiro
Copy link
Member

nekiro commented Aug 11, 2021

Probably the npc system is at fault (I think it's known from having memory leaks), if you empty the lua files to bare minimum is it still leaking? Or if you use another npc.

@Erza
Copy link
Contributor Author

Erza commented Aug 11, 2021

For all of these tests I was using 33333 iterations per test, to get to a total of 100k spawns

local function runTest(playerPos)
    for i = 1, 33333 do
        if activeTest == tests.npc then
            local npc = Game.createNpc(npcName, playerPos)
            npc:remove()
        elseif activeTest == tests.monster then
            local monster = Game.createMonster(monsterName, playerPos)
            monster:remove()
        elseif activeTest == tests.item then
            local item = Game.createItem(itemId, 100, playerPos)
            item:remove()
        end
    end
end

Results

The first dip is always after the talkaction finished executing, the 2nd dip is always after manually running /reload npc after execution finishes

Test 1 (Removing the script="bank.lua" tag from the Banker.xml file entirely

Looks like we got perfect memory deallocation in this case

Test 2 (Bare minimum NPC script file)

It's seemingly less with an "empty" NPC script file, but I can still observe a leak coming from data/npc/lib/npc.lua (more specifically, data/npc/lib/npcsystem.lua)

Test 3 (Adding KeywordHandler)


Test 4 (Adding NpcHandler)


Test 5 (Parsing parameters & enabling NpcHandler callbacks)


Test 6 (Adding FocusModule)


Test 7 (Adding a few keywords to listen for)


@Zbizu
Copy link
Contributor

Zbizu commented Feb 4, 2023

There's some exception with code 0xE24C4A02 spamming the log when running a debugger
it's being launched from this line (and has npc onThink in stacktrace):

int ret = lua_pcall(L, nargs, nresults, error_index);

could someone confirm this with unmodded tfs?

@Zbizu Zbizu mentioned this issue Jun 4, 2023
2 tasks
@EPuncker
Copy link
Contributor

EPuncker commented Jun 29, 2023

#4483

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants