Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

leaking memory if socket send failed? #194

Closed
mikewen opened this issue Feb 8, 2015 · 35 comments
Closed

leaking memory if socket send failed? #194

mikewen opened this issue Feb 8, 2015 · 35 comments
Assignees
Labels

Comments

@mikewen
Copy link

mikewen commented Feb 8, 2015

I am running 3 ESP-12 modules, post sensors data to Internet every 60 seconds.

I noticed that if socket send failed, the heap size reduced by about 2K, if too many fails, it will cause out of memory panic and reboot.

Is there a way to listen to socket send fail event and release memory?

@arnarg
Copy link

arnarg commented Feb 9, 2015

This may be more of an hack than a solution. If the "sent" event handler doesn't run when it fails you could do something like this:

conn:on("sent", function(conn)
    delivered = true
end)

function send(data)
    conn:send(data)
    delivered = false
    tmr.alarm(0, timeout, 0, function()
        if delivered == false then
            -- cleanup code
        end
    end)
end

@mikewen
Copy link
Author

mikewen commented Feb 10, 2015

Thanks!

It would help to release some memory.

Right now, if not able to send, NodeCMU will just stuck and not release the memory used by socket.
It should have timeout option.

@mikewen
Copy link
Author

mikewen commented Feb 10, 2015

Sorry, the hack does not work. It will free the memory, but it would not able to send any more.

@myplacedk
Copy link

Not being able to send after freeing up memory could be related to #202.

@Toshik
Copy link

Toshik commented Feb 19, 2015

Maybe there is possibility to have timeout configured for outbound sockets?

@dperlman
Copy link

I'm seeing a memory leak on connections whether they are successful or not.

I'm not clear on all the conditions that lead to the leak, but the following code reproduces it. Note that this code isn't doing anything special at all; it just opens a connection, sends a request, and closes the connection when it receives a reply. I ran this a few times, then waited several minutes for the lazy garbage collection, but it made no difference; memory gets permanently eaten with each run of this function. Note that in this case, this happens even with successful send, so it is different than this issue's description, but it seems like it must be related.

function TestSockLeak(ip, port)
    collectgarbage()
    print("start heap:", node.heap())
    local c=net.createConnection(net.TCP, 0)
    c:on("connection", function(conn)
        print("event 'connection' happened, sending")
        c:send("HEAD / HTTP/1.1\r\n\r\n")
    end)
    c:on("sent", function(conn)
        print("event 'sent' happened")
    end)
    c:on("receive", function(conn, p)
        print("event 'receive' happened, received: "..p:gsub("\r\n(.*)","...etc.",1))
        conn:close()
    end)
    c:on("disconnection", function(conn)
        print("event 'disconnection' happened")
        collectgarbage()
        print("collected garbage, heap now", node.heap())
    end)
    c:connect(port, ip)
end



> TestSockLeak("google.com",80)
start heap: 16152
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 200 OK...etc.
event 'disconnection' happened
collected garbage, heap now 15768
TestSockLeak("google.com",80)
start heap: 15592
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 200 OK...etc.
event 'disconnection' happened
collected garbage, heap now 15256
TestSockLeak("google.com",80)
start heap: 15280
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 200 OK...etc.
event 'disconnection' happened
collected garbage, heap now 14984
TestSockLeak("google.com",80)
start heap: 14848
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 200 OK...etc.
event 'disconnection' happened
collected garbage, heap now 14520
TestSockLeak("google.com",80)
start heap: 14512
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 200 OK...etc.
event 'disconnection' happened
collected garbage, heap now 14216

@TerryE
Copy link
Collaborator

TerryE commented Nov 3, 2015

See #719

@TerryE TerryE closed this as completed Nov 3, 2015
@jmattsson
Copy link
Member

Actually, I'll re-open this one Terry. Using the TestSockLeak() function here I'm seeing a steadily declining heap even on 1.4. If I get some time I'll look into this, though it'd be great is someone else beats me to it ;)

@jmattsson jmattsson reopened this Nov 4, 2015
@TerryE
Copy link
Collaborator

TerryE commented Nov 4, 2015

+1 I am assuming that a few of the issues will either need to stay open or be reopened, and a memory leak on core functionality is definitely a valid issue to be maintained. Sorry for missing this

@3dot3
Copy link

3dot3 commented Nov 4, 2015

I can confirm this leaking still there though it is much better than dev120

@czarny
Copy link

czarny commented Jan 13, 2016

Actually problem seems to be connected with referencing local variable c in 'connection' callback function.
Memory management performs much better (even without explicit garbage collector calls) when the send method is called on conn variable.

conn:send("HEAD / HTTP/1.1\r\n\r\n")

@dvv
Copy link
Contributor

dvv commented Jan 14, 2016

FYI, my way of leakless GET (by trial and error, the point seems in releasing conn):

local conn = net.createConnection(net.TCP, 0)
conn:on("connection", function()
  conn:send(("GET /home/pub?%s HTTP/1.0\r\n\r\n"):format(s), function()
    conn:close()
    conn = nil
    collectgarbage()
  end)
end)
conn:connect(80, "192.168.1.1")

@czarny
Copy link

czarny commented Feb 16, 2016

@dvv This is exactly the same problem, you are using local conn variable in the callback closure.
The code will start leaking when you will start a second request before the first end up.
Here is example which doesn't leak and doesn't require garbage collector call:

local conn = net.createConnection(net.TCP, 0)
conn:on("connection", function(c)
   c:send("GET / HTTP/1.1\r\n\r\n")
end)
conn:on("receive", function(c)
   c:close()
end)
conn:connect(port, ip)

@TerryE
Copy link
Collaborator

TerryE commented Feb 16, 2016

It's not so much as conn is a local, but that using conn and not c in conn:on("some-event", function() conn:send("abc") end) creates an upvalue in the abstract function which refers to itself. This circular dependency means that the entry is left in the registry and this will frustrate the GC. You can track registry entries by using for k,v in pairs(debug.getregistery() print (k,v) end

@dvv
Copy link
Contributor

dvv commented Feb 17, 2016

@czarny My code is not meant to send like a machinegun. The code runs in a timer callback and it never leaked a byte for several months which gave me reason to state it as leakless. Other variations didn't.

@marcelstoer
Copy link
Member

The example in the docs isn't very nice (and should have a "see also" to the HTTP module) but fortunately it doesn't make that mistake http://nodemcu.readthedocs.org/en/dev/en/modules/net/#example_5.

@TerryE TerryE self-assigned this Feb 17, 2016
@TerryE
Copy link
Collaborator

TerryE commented Feb 17, 2016

Marcel, IMO, there's something wrong with the clean-up logic here. Apps programmers shouldn't be able to create this sort of unsweepable loop in Lua, so I've added this to my list.

@czarny
Copy link

czarny commented Feb 17, 2016

@marcelstoer This example will cause problems too.
I agree with @TerryE, this is surely bug in GC.

@marcelstoer
Copy link
Member

@TerryE and @czarny now you confused me...

@marcelstoer This example will cause problems too.

This or that example i.e. the one in the docs?

there's something wrong with the clean-up logic here.

Again, "here" being...what?

Are you saying the example in docs is a bad example in general? AFAICS it doesn't create an upvalue. What am I missing?

@czarny
Copy link

czarny commented Feb 17, 2016

I am talking about example in the net module documentation.

sk = net.createConnection(net.TCP, 0)
sk:on("receive", function(sck, c) print(c) end )
sk:connect(80,"192.168.0.66")
sk:on("connection", function(sck,c)
    -- Wait for connection before sending.
    sk:send("GET / HTTP/1.1\r\nHost: 192.168.0.66\r\nConnection: keep-alive\r\nAccept: */*\r\n\r\n")
end)

Callback registered for 'connection' event keeps reference to sk variable, and sk keeps reference to callback closure what creates a reference cycle. To resolve this problem callback closure should use sck variable what will break the cycle.
Please note that example above is completely fine in terms of lua language. But in case of nodemcu it causes memory leak.

@marcelstoer
Copy link
Member

Grr...thanks, I found the missing 'c'.
/OT
So, yes the example is bad not just because it causes a memory leak but also because the code ain't very readable. The abbreviated variable names sk and sck are too similar, particularly because there's also a c. I'll prepare a PR to fix that.
/OT

@TerryE
Copy link
Collaborator

TerryE commented Feb 17, 2016

Marcel, let me track down the bug before we update this one sample. Thanks.

@TerryE TerryE added the bug label Feb 19, 2016
@TerryE
Copy link
Collaborator

TerryE commented Feb 19, 2016

OK, I've got my head around this. It's definitely a bug. I'll post some test cases to explain the issue. The underlying reason is that there are design flaws in the net module that will be quite difficult to patch over. This isn't surprising really. It took me a year to get me head around how the SDK works and how the Lua runtime and GC model plays against this -- mainly because so much isn't documented and so has to be worked out by retro-engineering the functionality and behaviours. The net module had to be the one of the first to be written, so we should be surprised that we got caught by this Catch-22.

In fact the penny only dropped because I decided to completely rework the net module, and in doing so realised the issue. The new version isn't even at the PR stage because it is a fundamental restructuring, but it is less than the size of the original net.c source and also smaller in binary. (There is a lot of cut&paste in the original, and redundant checking).

Anyway, I'll post these examples to explain the issues once I've tested them on a current dev build.

@pjsg
Copy link
Member

pjsg commented Feb 19, 2016

I'm trying to get my head around it too -- I don't see where the base reference is to the sk is comign from... Yes, the callback has a reference to sk, but it should be that all of that is unreachable and hence ought to be GCed.

The reference cycle itself should not cause a problem as the GC is mark and sweep rather than reference count based.

@TerryE
Copy link
Collaborator

TerryE commented Feb 19, 2016

Phillip, have a look and play with this gist and variations thereof. Try doing a restart and then doing the test1 multiple times. Not that I swapped out from google to a local apache instance for my connections. You will need to do similar.
Yes, the GC mark and sweep will still fail -- as in this case -- if the close down logic has execution paths which don't unref the cb's in the registry.

Replace the last timer by

tmr.alarm(0,10000,0, function() enum_reg_funcs() g1=nil end)
tmr.alarm(1,20000,0, function() enum_reg_funcs() g2=nil end)
tmr.alarm(2,30000,0, enum_reg_funcs)

Now dofile it and let the GC clean up. Do the same but re dofile it after 8 secs, say, when the connections have been closed but the clean down hasn't happened. Or set g1 and g2 to nil at the same time. Uaaaghhh

@TerryE
Copy link
Collaborator

TerryE commented Feb 19, 2016

There are so many execution paths that steadily fill the registry.

@wuyuanyi135
Copy link

I was using mqtt module and I found the heap is not steady (it should be). I guess the publish sometimes failed and the memory leaked. Would be nice if the timeout option can free the failed requests.

@pjsg
Copy link
Member

pjsg commented Mar 6, 2016

Which version of the firmware are you using? Various fixes were made in the dev branch.
Can you post some sample code that demonstrates the problem?

@amancinellib
Copy link

amancinellib commented Apr 28, 2016

I'm using nodemcu_float_0.9.6-dev_20150704.bin. I had the same problem when I send HTTP GET every 10 seconds, on each connection about 700 bytes of heap was consumed. I solved the problem, open only one connection (with Keep Alive) and inside timer I just send data. Here is the code:

SERVER_IP="127.0.0.1"
SERVER_PORT=80
CONST_CONN_SENT="sent"
CONST_CONN_RECEIVE="receive"
CONST_CONN_CONNECTION="connection"
HTTP_REQ_GET="GET /Service?id=NMCU001&data=XXXXX  HTTP/1.1\r\nHost: www.somehost.com\r\nConnection: keep-alive\r\nAccept: */*\r\n\r\n"

-- SETUP ONLY ONE TCP CONNECTION AND CALL BACKS
local conn=net.createConnection(net.TCP, false)
conn:on(CONST_CONN_RECEIVE, function(conn, pl) 
            print(pl)
            pl=nil;
            collectgarbage();
        end)
conn:on(CONST_CONN_SENT, function(conn) collectgarbage("collect") end)
-- CONNECT TO HOST
conn:connect(SERVER_PORT,SERVER_IP)
-- FINISH SETUP

-- FIRE TIMER EVERY 5 seconds TO SEND HTTP REQUEST
tmr.alarm(1, 5000, 1, function()
if (wifi.sta.getip() == nil) then
    tmr.stop(1)
    conn=nil;
    collectgarbage();
else
    print("timer:invoked()="..node.heap())
    conn:send(HTTP_REQ_GET)
    collectgarbage();
end

@TerryE
Copy link
Collaborator

TerryE commented Apr 28, 2016

@amancinellib, what happens on a very old release isn't relevant.

@amancinellib
Copy link

I'm using build 0.96 dev, is a very old release ?

On 28/04/2016, at 14:58, Terry Ellison notifications@github.com wrote:

@amancinellib, what happens on a very old release isn't relevant.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub

@marcelstoer
Copy link
Member

Oh yes, it is. You should build your own firmware, ideally from the dev branch: http://nodemcu.readthedocs.io/en/dev/en/build/

@devsaurus
Copy link
Member

Running the test code on the current dev branch exhibits the same behaviour as described in #1303: Heap drops during 5 invocations and then toggles between discrete values.

TestSockLeak("google.com",80)
start heap: 43120
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 302 Found...etc.
event 'disconnection' happened
collected garbage, heap now 42696
TestSockLeak("google.com",80)
start heap: 42840
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 302 Found...etc.
event 'disconnection' happened
collected garbage, heap now 42504
TestSockLeak("google.com",80)
start heap: 42672
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 302 Found...etc.
event 'disconnection' happened
collected garbage, heap now 42304
TestSockLeak("google.com",80)
start heap: 42488
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 302 Found...etc.
event 'disconnection' happened
collected garbage, heap now 42120
TestSockLeak("google.com",80)
start heap: 42312
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 302 Found...etc.
event 'disconnection' happened
collected garbage, heap now 41936
TestSockLeak("google.com",80)
start heap: 42128
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 302 Found...etc.
event 'disconnection' happened
collected garbage, heap now 42160
TestSockLeak("google.com",80)
start heap: 42312
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 302 Found...etc.
event 'disconnection' happened
collected garbage, heap now 41936
TestSockLeak("google.com",80)
start heap: 42128
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 302 Found...etc.
event 'disconnection' happened
collected garbage, heap now 42160
TestSockLeak("google.com",80)
start heap: 42312
> event 'connection' happened, sending
event 'sent' happened
event 'receive' happened, received: HTTP/1.1 302 Found...etc.
event 'disconnection' happened
collected garbage, heap now 41936

@marcelstoer
Copy link
Member

Has been fixed in the meantime as per Arnim's comment -> closing.

@temp3l
Copy link

temp3l commented Jan 17, 2017

fixed in master Master 81ec366
leak still/again occurs in current dev

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests