Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UIPEthernet stops working after some time #30

Closed
nilrog opened this issue Jan 20, 2014 · 35 comments · Fixed by #44
Closed

UIPEthernet stops working after some time #30

nilrog opened this issue Jan 20, 2014 · 35 comments · Fixed by #44
Labels

Comments

@nilrog
Copy link
Contributor

nilrog commented Jan 20, 2014

I have a sketch, written for the stock Ethernet library, that posts information to http://emoncms.org with 30 secs interval. When I run this sketch on an Uno + Ethernet shield it justs keeps posting every 30 secs.

But when I recompile this sketch for UIPEthernet and run in to a Nano + ENC28J60 shield it has so far been running for at most 20 hours, but the track record is down as low as < 1 hour. I have had to add additional error handling and timeout handling to the sketch to improve stability with UIPEthernet. With the stock Ethernet library it "just worked".

What happens is that the sketch just stops posting. I not yet had the possibility to see this lockup when the Nano was connected to my Mac. Because where I have it wired up for longterm testing I have no possibility to log serial out.

I am currently running the lastest "master".

I know this is a bit vague, but I will try to gather more info. I just wanted to toss this out here since you have way more knowledge of the internals in UIPEthernet and might have a clue about what is happening :)

/Roger

@ntruchsess
Copy link
Owner

My best guess as you mention Serial output not being connected: writing to an unconnected Serial might causes stability-issue when used in sketches that use a good amount of memory as UIPEthernet does (stock Ethernet need nearly no memory since everything is done on the WIZ5100 chip). It may help to just connect to Serial-monitor to prevent an overflow of the Serial-device buffers. There is an open issue with Arduino about this: https://github.com/arduino/Arduino/issues/1724. I allways use some '#ifdef debug' statements to remove all debug-logging from the sketch for production use.

@nilrog
Copy link
Contributor Author

nilrog commented Jan 20, 2014

Well, I can try to remove those printouts and see what happens. But I have been running sketches on the same hardware with NanodeUIP instead and serial printouts without problems before. But that library requires you to write code that cannot be reused easily with Wiznet based hardware. So that's why i'm hoping this library would work :)

@gibsop1
Copy link

gibsop1 commented Jan 20, 2014

I had this same issue, however since running the very latest master, with the connect returning 0 instead of -1 fix in, I have not encountered this issue. I had my code running for around 50 hours before I turned it off. This was working as a server. Code I'm using: https://gist.github.com/gibsop1/ba4cd682913144fecd18

@nilrog
Copy link
Contributor Author

nilrog commented Jan 20, 2014

Issue #29 was reverted in the latest master since the stock Ethernet library also return negative values. They should update their API documentation because it says true/false but in case of connect that requires DNS lookup they will return negative values.

And in my code i'm checking the return value and only accept > 0.

I have now removed all my serial output and started up the sketch again so we will see if this run lasts any longer.

@gibsop1
Copy link

gibsop1 commented Jan 20, 2014

Ah about thanks. I'll update my code with >0.

Thanks
Paul

On Mon, Jan 20, 2014 at 8:27 PM, nilrog notifications@github.com wrote:

Issue #29 #29 was
reverted in the latest master since the stock Ethernet library also return
negative values. They should update their API documentation because it says
true/false but in case of connect that requires DNS lookup they will return
negative values.

And in my code i'm checking the return value and only accept > 0.

I have now removed all my serial output and started up the sketch again so
we will see if this run lasts any longer.


Reply to this email directly or view it on GitHubhttps://github.com//issues/30#issuecomment-32794125
.

@nilrog
Copy link
Contributor Author

nilrog commented Jan 21, 2014

Serial.print was not the problem. I removed all those printouts yesterday and tonight the sketch has stopped posting again. There is something in here that is not 100% ok but I don't yet know what it is.

@ntruchsess
Copy link
Owner

ok, can you provide the complete code that is running on the arduino? Either here or as link to your repo?

@nilrog
Copy link
Contributor Author

nilrog commented Jan 25, 2014

I have done some more tests and tracings and there seems to be a problem in the connect() call when you use a hostname. After a while the connect() call will still return 1 (or > 0 which I am testing for) but nothing is sent to the remote host.

Snooping with Wireshark reveals that the DNS-request for the remote host is sent, and a reply to that is returned, when the sketch calls connect(). But the actual HTTP-request is never sent from the sketch. So the sketch is then waiting for a response that will never come.

I have put part of the network trace in this gist, https://gist.github.com/nilrog/27745bbf923388c94ec4

The problem begins with the request in frame 481. The response for this request seems to be duplicated (in frame 483 and 486) and from that point on UIPEthernet stops working properly.

@ntruchsess
Copy link
Owner

I assume 192.168.2.11 is the Arduino?

Frame 486 is just a retransmission of Frame 483. This is caused by the duplicate-ack-packet in Frame 485 which tells 176.227.193.50 that it didn't receive or process frame 481. Frame 486 (dup of 483) is acknowledged in 487, after that 176.227.193.50 agains send the FIN,ACK to close the connection... That looks all good - retransmissions are not an error, since packets can (and may) allways be dropped. And in this case the retransmitted http-response in frame 486 seems to have been processed by 192.168.2.11 as it acknowledge the close of connection afterwards.

The strange thing is that from this point on every connection-attempt from 192.168.2.11 is going to fail because the ACKs of 192.168.2.11 (e.g. Frame 497) to the SYN,ACK-packets from 176.227.193.50 (Frame 496) seem not to reach its target as 176.227.193.50 retransmits the SYN,ACK-packet (Frame 498), ignores the retransmitted ACK of 192.168.2.11 (Frame 499) and then closes the connection (Frame 500) which is acknoledged by 192.168.2.11 in Frame 501. This frame obviously reached 176.227.193.50, as it's acked in Frame 502 and the connection is closed.

From what I can see 192.168.2.11 works fine, as it's 176.227.193.50 not accepting the new connections. This could also be a firewall/router issue (possible triggered by the retransmitted ACK in the previous connection) when the ACKs from 192.168.2.11 would be dropped on the way not reaching 176.227.193.50.

connect()-method returns with return-value of 1 as soon the connection is established which is when the ACK (see Frame 497) to 176.227.193.50's SYN,ACK-packet is sent. Bad thing is that the connection is closed by 176.227.193.50 just after that so your sketch never gets to send out the request.

  • Norbert

@nilrog
Copy link
Contributor Author

nilrog commented Jan 25, 2014

Yes, 192.168.2.11 is my Arduino.

The thing is that this is only an issue with UIPEthernet. I have an identical Arduino that has been running with EtherCard posting to the same server for a month now without problems. No hiccups, it just keeps on working. I have also used NanodeUIP, which also uses UIP, on the same hardware and it "just works". But both of these libraries have their drawbacks and they are also not so user-friendly as the stock Ethernet API.

If I reset the Arduino when it is in this state everything starts working again. No other measures taken. So I do not believe this is an issue with a router. It's just that I have not been able to put my finger on what is wrong yet :(

But I will continue digging :)

@ntruchsess
Copy link
Owner

@nilrog: The wireshark log gives evidence that the connections are closed by 176.227.193.50 aprox. 20 seconds after the ACK to 176.227.193.50 SYN,ACK-packet was sent. This can also be just because of configured timeout on 176.227.193.50, or could be because of not having received the ACK - we cannot know without having access to the webservers log. What we know is your sketch is not sending further packets within these 20 sec after the connection is established. Can you please post the complete code your sketch runs (Either here or as a gist)?

@nilrog
Copy link
Contributor Author

nilrog commented Jan 27, 2014

By the time that 20 secs has gone the Arduino has already closed the connection. If no response, or no complete response has been received within 10 secs i'm calling client.stop().

The 20 sec seem to be the default timeout on Linux.

@ntruchsess
Copy link
Owner

Can you please post the complete code your sketch runs (Either here or as a gist)?

@nilrog
Copy link
Contributor Author

nilrog commented Jan 28, 2014

Sure, but it just connects and prints the request to the client.
https://gist.github.com/nilrog/990536d313e6c9db1fe9

@nilrog
Copy link
Contributor Author

nilrog commented Jan 28, 2014

I created a pull request (#33) that moves all debug printouts from RAM to FLASH. That was needed otherwise the sketch would soon crash due to lack of free RAM.

@nilrog
Copy link
Contributor Author

nilrog commented Jan 28, 2014

Here is serial output where I have turened on UIPETHERNET_DEBUG_CLIENT and UIPETHERNET_DEBUG. Unfortunately I couldn't get the full serial printout since the Arduino IDE seems to have a limit to how much it can store. But I have extracted what should be one attempt to post.

Posting update...
network_send uip_packet: 1, hdrlen: 42
--------------
receivePacket: 255
readPacket type IP, uip_len: 87
freeing packet: 255
network_send uip_buf (uip_len): 58, packet: 1
--------------
receivePacket: 255
readPacket type IP, uip_len: 60
UIPClient uip_connected
UIPClient allocation failed
network_send uip_buf (uip_len): 54, packet: 1
freeing packet: 255
connected, state: 255, first packet in: 0
>>> Making HTTP request...
<<< Waiting...
--------------
receivePacket: 255
readPacket type IP, uip_len: 60
network_send uip_buf (uip_len): 54, packet: 1
freeing packet: 255
--------------
receivePacket: 255
readPacket type IP, uip_len: 174
freeing packet: 255
Timeout #2

<<< Disconnecting!
--------------
receivePacket: 255
readPacket type IP, uip_len: 60
network_send uip_buf (uip_len): 54, packet: 1
freeing packet: 255
--------------
receivePacket: 255
readPacket type IP, uip_len: 60
freeing packet: 255

(Updated serial printout to show the correct flow of events from the start of one post.)

@ntruchsess
Copy link
Owner

UIPClient allocation failed <<-- that means that all available slots to manage connection-data are in use.

Maybe this should better be checked in connect()-method even before the connection is initiated. Currently it's checked when the first attempt to use the established connection is made so connect returns 1, which is misleading. You could detect this situation by checking the return-value of Client::write() (which would be 0 in this case).
Nevertheless, the reason for this allocation failure is that all slots in UIPClient::all_data have state set to non-0. That means there are 4 connections with outstanding unread data that went out of scope without having stop called on the client-object. There's either a race-condition in your code or UIPEthernet or your code allows the call of connect() without having called stop() or it's related to the use of Timer1 (There's an unconfirmed report #26 that Timer do not work with UIPEthernet that I still wait for code to reprodue for).

To narrow it down:

  1. add a line 'if (client.connected()) { client.stop() }; right before the call to 'client.connect()' to be 100% sure this didn't slip through.
  2. If this doesn't make a difference, replace the use of Timer1 with a simple check a la 'if (timeout - millis() < 0)' and see whether it still happens.

@nilrog
Copy link
Contributor Author

nilrog commented Jan 28, 2014

Ok, I will try #2 tonight when I get home.

@nilrog
Copy link
Contributor Author

nilrog commented Jan 29, 2014

I have tested removing the Timer and resorted to counting millis instead. But the result is the same. The sketch stops working and UIPEthernet is out of resources. As I expected the use of Timer shouldn't have interfered.

Tonight i'm going to test adding a call to client.stop() before I dispatch a new request. But I can't see in my code how this isn't already done, unless UIPEthernet is returning some really weird statuses for connected() and available(). In the state where i'm waiting for response I have two 10 second timers that call client.stop(). One in the loop where i'm reading the response and one that will always trigger, as can be seen in the serial logs.

But I will try it anyway just to try and narrow down on this problem.

Btw, what version of uIP did you use as base for UIPEthernet?

@ntruchsess
Copy link
Owner

Thank you for testing this. I for myself wouln't have expected Timer1 to interfere with UIPEthernet, it's just that I have this report #26 without further info.

There is another way UIPEthernet can run out of available connections - if the remote side fails to acknowledge the outgoing packets (because it just might be disconnected from network) this outstanding data will be cleaned up not before the tcp-timeouts are expired and the connection is closed by uip due to timeout. Calling Client.stop() only removes unread incoming data - outstanding data-to-send is disassociated from the client-object but still kept for (re-)transmission. So the uip-connection slot stays allocated longer than the client-object itself is valid.
I havn't seen evidence for this to have happened in your Wireshark-log though.

uip is from contiki-OS: https://github.com/contiki-os/contiki/tree/master/core/net

@nilrog
Copy link
Contributor Author

nilrog commented Jan 29, 2014

No problem...I want to crack this damned bug :P

I can take the easy route and go back to EtherCard, which is enough for this project. But I like the idea of a unified Ethernet API regardless of what hardware you are using. This API is also so much easier to use that what EtherCard etc. are using. So unless I hit a dead end I will continue to try and hunt this down :)

I checked the serial log a bit further and to me it looks like UIPClient suddenly stops "closing"...

This is how it looks from the beginning (grep "UIPClient "):

UIPClient uip_connected
UIPClient allocated state: 19
UIPClient uip_newdata, uip_len:0
UIPClient uip_acked
UIPClient uip_acked
UIPClient uip_acked
UIPClient uip_acked
UIPClient uip_acked
UIPClient uip_newdata, uip_len:213
UIPClient uip_closed

But after a while it starts to look like this:

UIPClient uip_connected
UIPClient allocated state: 19
UIPClient uip_newdata, uip_len:0
UIPClient uip_acked
UIPClient uip_acked
UIPClient uip_acked
UIPClient uip_acked
UIPClient uip_acked

And the response from the server never reaches the sketch. The sketch will continue to post thinking all is ok, since UIPEthernet has not flagged anything, and later on UIPEthernet runs out of resources.

What is weird is that I cannot see that the sketch prints "Disconnected" before it prints "Posting update". But I cannot see how the sketch can start posting again before it has called client.stop().

After a while I start seeing "Timeout #2" from the sketch which indicates that reading the response has taken more than 10 seconds, and then I also see "Disconnecting", which is expected since i'm stopping the client. The serial log is unfortunately too big to put in a gist...at least it gets cut-off when I try to create it. But if you want to have it I can share it somewhere else.

This is basically how the state machine in the sketch looks like. And this is what I have been using many times before with EtherCard, NanodeUIP etc. The remote server is also up and running all the time because I have another, EtherCard based, sketch that is posting 24/7 without hiccups.

state_need_to_post_update -> state_posting_update (waiting for response for up to 10 seconds) -> state_noconnection (after client.stop() has been called).

The only way the sketch can get back to state "state_need_to_post_update" is if the timer/millis has expired AND the sketch is in state "state_noconnection". And it cannot get to "state_noconnection" without calling client.stop().

I'm going to test tonight to either add some checks for client.connected() and call client.stop() before posting, or i'm going to add some more debug printouts to try and see in what state the client is.

@ntruchsess
Copy link
Owner

do you have a Wireshark-log that catches the packets at 'But after a while it starts to look like this:...'? That would show wether there is a response that is not processed by UIPEthernet or not (And it would proove whether there is unacknowledged outgoing data or not).

@nilrog
Copy link
Contributor Author

nilrog commented Jan 29, 2014

Not from the latest run when I "know" that this happened. The problem is also that it's hard to find out when in the Wireshark log it would be. I need to add some timestamps to the serial logs to be able to correlate the two logs.

I will see what I can do tonight.

@nilrog
Copy link
Contributor Author

nilrog commented Jan 29, 2014

Here is one occasion where it seems like the sketch starts to post again before it has disconnected. But it hasn't. Because I added a printout before posting if the client is still connected.

The corresponding wireshark log is in this gist https://gist.github.com/nilrog/19327348563aefb18cae

802145: Posting update...
network_send uip_packet: 1, hdrlen: 42
--------------
receivePacket: 255
readPacket type IP, uip_len: 87
freeing packet: 255
network_send uip_buf (uip_len): 58, packet: 1
--------------
receivePacket: 255
readPacket type IP, uip_len: 60
UIPClient uip_connected
UIPClient allocated state: 16
UIPClient uip_newdata, uip_len:0
network_send uip_buf (uip_len): 54, packet: 1
freeing packet: 255
connected, state: 16, first packet in: 0
>>> Making HTTP request...
UIPClient.write: writePacket(1) pos: 0, buf[0-1]: 'P'
UIPClient.write: writePacket(1) pos: 1, buf[0-1]: 'U'
UIPClient.write: writePacket(1) pos: 2, buf[0-1]: 'T'
UIPClient.write: writePacket(1) pos: 3, buf[0-1]: ' '
UIPClient.write: writePacket(1) pos: 4, buf[0-1]: '/'
UIPClient.write: writePacket(1) pos: 5, buf[0-1]: 'i'
UIPClient.write: writePacket(1) pos: 6, buf[0-1]: 'n'
UIPClient.write: writePacket(1) pos: 7, buf[0-1]: 'p'
UIPClient.write: writePacket(1) pos: 8, buf[0-1]: 'u'
UIPClient.write: writePacket(1) pos: 9, buf[0-1]: 't'
UIPClient.write: writePacket(1) pos: 10, buf[0-1]: '/'
UIPClient.write: writePacket(1) pos: 11, buf[0-1]: 'p'
UIPClient.write: writePacket(1) pos: 12, buf[0-1]: 'o'
UIPClient.write: writePacket(1) pos: 13, buf[0-1]: 's'
UIPClient.write: writePacket(1) pos: 14, buf[0-1]: 't'
UIPClient.write: writePacket(1) pos: 15, buf[0-1]: '.'
UIPClient.write: writePacket(1) pos: 16, buf[0-1]: 'j'
network_send uip_packet: 2, hdrlen: 54
UIPClient.write: writePacket(1) pos: 17, buf[0-1]: 's'
UIPClient.write: writePacket(2) pos: 0, buf[0-1]: 's'
UIPClient.write: writePacket(2) pos: 1, buf[0-1]: 'o'
UIPClient.write: writePacket(2) pos: 2, buf[0-1]: 'n'
UIPClient.write: writePacket(2) pos: 3, buf[0-1]: '?'
UIPClient.write: writePacket(2) pos: 4, buf[0-1]: 'n'
--------------
receivePacket: 255
readPacket type IP, uip_len: 60
UIPClient uip_acked
eatblock(1): 1 2 0 0 0 -> 2 0 0 0 0 
freeing packet: 255
UIPClient.write: writePacket(2) pos: 5, buf[0-1]: 'o'
UIPClient.write: writePacket(2) pos: 6, buf[0-1]: 'd'
UIPClient.write: writePacket(2) pos: 7, buf[0-1]: 'e'
UIPClient.write: writePacket(2) pos: 8, buf[0-1]: '='
UIPClient.write: writePacket(2) pos: 9, buf[0-2]: '13'
UIPClient.write: writePacket(2) pos: 11, buf[0-1]: '&'
UIPClient.write: writePacket(2) pos: 12, buf[0-1]: 'a'
UIPClient.write: writePacket(2) pos: 13, buf[0-1]: 'p'
UIPClient.write: writePacket(2) pos: 14, buf[0-1]: 'i'
UIPClient.write: writePacket(2) pos: 15, buf[0-1]: 'k'
UIPClient.write: writePacket(2) pos: 16, buf[0-1]: 'e'
UIPClient.write: writePacket(2) pos: 17, buf[0-1]: 'y'
UIPClient.write: writePacket(2) pos: 18, buf[0-1]: '='
UIPClient.write: writePacket(2) pos: 19, buf[0-32]: 'xxxxxxxxxxxxxxxxxxxxxxxxx'
UIPClient.write: writePacket(2) pos: 51, buf[0-1]: '&'
UIPClient.write: writePacket(2) pos: 52, buf[0-1]: 'j'
UIPClient.write: writePacket(2) pos: 53, buf[0-1]: 's'
network_send uip_packet: 1, hdrlen: 54
UIPClient.write: writePacket(2) pos: 54, buf[0-1]: 'o'
UIPClient.write: writePacket(1) pos: 0, buf[0-1]: 'o'
UIPClient.write: writePacket(1) pos: 1, buf[0-1]: 'n'
UIPClient.write: writePacket(1) pos: 2, buf[0-1]: '='
UIPClient.write: writePacket(1) pos: 3, buf[0-1]: '{'
UIPClient.write: writePacket(1) pos: 4, buf[0-1]: '"'
UIPClient.write: writePacket(1) pos: 5, buf[0-1]: 'p'
UIPClient.write: writePacket(1) pos: 6, buf[0-1]: 'u'
UIPClient.write: writePacket(1) pos: 7, buf[0-1]: 'l'
UIPClient.write: writePacket(1) pos: 8, buf[0-1]: 's'
UIPClient.write: writePacket(1) pos: 9, buf[0-1]: 'e'
UIPClient.write: writePacket(1) pos: 10, buf[0-1]: '"'
UIPClient.write: writePacket(1) pos: 11, buf[0-1]: ':'
UIPClient.write: writePacket(1) pos: 12, buf[0-1]: '"'
UIPClient.write: writePacket(1) pos: 13, buf[0-2]: '24'
UIPClient.write: writePacket(1) pos: 15, buf[0-1]: '"'
UIPClient.write: writePacket(1) pos: 16, buf[0-1]: '}'
UIPClient.write: writePacket(1) pos: 17, buf[0-1]: ' '
UIPClient.write: writePacket(1) pos: 18, buf[0-1]: 'H'
UIPClient.write: writePacket(1) pos: 19, buf[0-1]: 'T'
UIPClient.write: writePacket(1) pos: 20, buf[0-1]: 'T'
UIPClient.write: writePacket(1) pos: 21, buf[0-1]: 'P'
UIPClient.write: writePacket(1) pos: 22, buf[0-1]: '/'
UIPClient.write: writePacket(1) pos: 23, buf[0-1]: '1'
UIPClient.write: writePacket(1) pos: 24, buf[0-1]: '.'
UIPClient.write: writePacket(1) pos: 25, buf[0-1]: '1'
UIPClient.write: writePacket(1) pos: 26, buf[0-1]: '
'
UIPClient.write: writePacket(1) pos: 27, buf[0-1]: '
'
UIPClient.write: writePacket(1) pos: 28, buf[0-1]: 'H'
UIPClient.write: writePacket(1) pos: 29, buf[0-1]: 'O'
UIPClient.write: writePacket(1) pos: 30, buf[0-1]: 'S'
UIPClient.write: writePacket(1) pos: 31, buf[0-1]: 'T'
UIPClient.write: writePacket(1) pos: 32, buf[0-1]: ':'
UIPClient.write: writePacket(1) pos: 33, buf[0-1]: ' '
UIPClient.write: writePacket(1) pos: 34, buf[0-11]: 'emoncms.org'
UIPClient.write: writePacket(1) pos: 45, buf[0-1]: '
'
UIPClient.write: writePacket(1) pos: 46, buf[0-1]: '
'
UIPClient.write: writePacket(1) pos: 47, buf[0-1]: 'C'
UIPClient.write: writePacket(1) pos: 48, buf[0-1]: 'o'
UIPClient.write: writePacket(1) pos: 49, buf[0-1]: 'n'
UIPClient.write: writePacket(1) pos: 50, buf[0-1]: 'n'
UIPClient.write: writePacket(1) pos: 51, buf[0-1]: 'e'
UIPClient.write: writePacket(1) pos: 52, buf[0-1]: 'c'
UIPClient.write: writePacket(1) pos: 53, buf[0-1]: 't'
UIPClient.write: writePacket(1) pos: 54, buf[0-1]: 'i'
--------------
receivePacket: 255
readPacket type IP, uip_len: 60
UIPClient uip_acked
eatblock(2): 2 1 0 0 0 -> 1 0 0 0 0 
freeing packet: 255
UIPClient.write: writePacket(1) pos: 55, buf[0-1]: 'o'
UIPClient.write: writePacket(1) pos: 56, buf[0-1]: 'n'
UIPClient.write: writePacket(1) pos: 57, buf[0-1]: ':'
UIPClient.write: writePacket(1) pos: 58, buf[0-1]: ' '
UIPClient.write: writePacket(1) pos: 59, buf[0-1]: 'c'
network_send uip_packet: 2, hdrlen: 54
UIPClient.write: writePacket(1) pos: 60, buf[0-1]: 'l'
UIPClient.write: writePacket(2) pos: 0, buf[0-1]: 'l'
UIPClient.write: writePacket(2) pos: 1, buf[0-1]: 'o'
UIPClient.write: writePacket(2) pos: 2, buf[0-1]: 's'
UIPClient.write: writePacket(2) pos: 3, buf[0-1]: 'e'
UIPClient.write: writePacket(2) pos: 4, buf[0-1]: '
'
--------------
receivePacket: 255
readPacket type IP, uip_len: 60
UIPClient uip_acked
eatblock(1): 1 2 0 0 0 -> 2 0 0 0 0 
freeing packet: 255
UIPClient.write: writePacket(2) pos: 5, buf[0-1]: '
'
UIPClient.write: writePacket(2) pos: 6, buf[0-1]: '
'
UIPClient.write: writePacket(2) pos: 7, buf[0-1]: '
'
<<< Waiting...
network_send uip_packet: 1, hdrlen: 54
--------------
receivePacket: 255
readPacket type IP, uip_len: 60
UIPClient uip_acked
eatblock(2): 2 0 0 0 0 -> 0 0 0 0 0 
freeing packet: 255
--------------
receivePacket: 255
readPacket type IP, uip_len: 267
--------------
receivePacket: 255
readPacket type IP, uip_len: 174
freeing packet: 255

833639: Posting update...

At this moment the sketch is still working. But i'm a bit confused as to what i'm seeing.

The library handles multiple connections right? And afaik I cannot see which connection it is that is used. Could it be that with time UIPEthernet starts to loose track of which connection is handling which request?

@ntruchsess
Copy link
Owner

Yes, the library handles multiple connections, but a single instance of Client can only be associated with a single connection. Calling 'connect()' will associate the client with a new connection. If you happen to not have called stop() before connecting again the previous connection will be orphaned, but not discarded, if there is any unread inbound data left. That seems to have happened here, your log shows that it didn't read the 'HTTP 1.1 200 OK'-response that is being found in the wireshark-trace.
I guess I better close any previous connection associated with a client-instance if connect()-method is called without stop to prevent this.

btw, I have an idea why your stat-machine fails (and it propably does fail with ethercard or wiz5100 as well - it's just these libraries seem to be more robust when being used improperly): the enum used for the states is 16-bit internally if the gcc compiler switch -fshort-enums is not being used. This results in assignments that eventually fail when interrupt is happening in between the two bytes being assigned. As UIPEthernet internally uses some 16-bit integers as well (because the signatures of stock Ethernet-api require it) there propably are some places that require masking of interupts too. Try replacing the enum with values defined as 'const byte'

@nilrog
Copy link
Contributor Author

nilrog commented Jan 31, 2014

It's the Arduino serial terminal that is borked, not the state machine, which I was suspecting but had ignored. I tried tonight again and captured the serial data with a better terminal and now it all looks much better...from a sketch point-of-view...up until UIPEthernet stops working just as before. I did change the state variable to a plain uint8_t with defined values instead of using the enum to avoid what you were saying could happen. But that didn't make any difference. I also added printouts that shows everytime the sketch changes state.

In the serial output I can see that every time the sketch wants to send a request the sketch is in state STATE_NOCONNECTION, which it gets to after calling client.stop(). So what we were seeing up until now was not the whole truth...thanks to the Arduino serial terminal.

After a long run the sketch suddenly stops receiving data from UIPEthernet and goes to "Timeout 2". This means that the client is not connected and it has been >10 secs since the request was sent down to UIPEthernet. This happens 2-3 times and then UIPEthernet starts spitting out "UIPClient allocation failed", right after "UIPClient uip_connected". So UIPEthernet has told my sketch that it was successfully connected but it cannot allocate any resources to send out what the sketch is writing to it. So I guess UIPEthernet disconnects, otherwise "Timeout 1" would have kicked in, but it doesn't. So when the sketch is in state STATE_POSTING_UPDATE waiting for the response the client is not connected.

Here is an example of a timeout:

12599923: State = STATE_NOCONNECTION
--------------
receivePacket: 255
readPacket type IP, uip_len: 174
freeing packet: 255
12629577: State = STATE_NEED_TO_POST_UPDATE

12629323: Posting update...
network_send uip_packet: 1, hdrlen: 42
--------------
receivePacket: 255
readPacket type IP, uip_len: 87
freeing packet: 255
--------------
receivePacket: 255
readPacket type IP, uip_len: 174
freeing packet: 255
>>> Making HTTP request...
<<< Waiting...
12644663: State = STATE_POSTING_UPDATE
Timeout #2


12654375: <<< Disconnecting!
12654688: State = STATE_NOCONNECTION

@ntruchsess
Copy link
Owner

short update: I'm setting up a testcase using your code and try to reproduce on my side. Sorry I'm not faster doing these things, but this is a freetime-only-project and time to work on issues is a volatile resource.
Your contribution and time to investigate this error is very wellcome!

P.S. I've added a call to stop() to UIPClient::connect() 36a0479, included in branch https://github.com/ntruchsess/arduino_uip/tree/dev and https://github.com/ntruchsess/arduino_uip/tree/Arduino_1.5.5_dev
I'll use this version for testing.

P.P.S: Test is running now. I'm not posting to emoncms.org but to local apache installed on Raspberry pi, serial is attached to Pi running picomon. Repeat-interval is down to 5 seconds....

P.P.P.S:
3.5 hours, aprox. 2500 requests, that's about the same as 20h with 30sec repeat interval - still running, no issues yet.
... now after 4,5h the error occured. So nothing really new until now, but reproduced. Will inverstigate further, so stay tuned ;-)

@nilrog
Copy link
Contributor Author

nilrog commented Feb 2, 2014

No worries...we will get this bug eventually :)

I don't expect you to work 24/7 on this. You have done a great job so far with this library. Finally we have a hardware agnostic Ethernet API that allows us to write sketches that can use either Wiznet chips or the ENC28J60. Both EtherCard and NanodeUIP are great, tailored, libraries for the ENC28J60. But they have their own set of limitations and quirks making them hard to work with. The Arduino Ethernet API in contrast is dead simple. And now we can use that with this chip too :)

Glad to hear you got it reproduced locally. I did a quick test with a small bottle.py web server a few days ago but wasn't able to reproduce it then. Perhaps I didn't wait long enough...

I don't think there is an issue that the client isn't stopped before a new request is made. I think it is more likely that something in the interwork between uIP and UIPEthernet goes wrong sometimes. Or that there is some subtle bug in UIPEthernet that sometimes pops up.

I noticed there are a few compiler warnings when compiling UIPEthernet that I have neglected before. I have written issues for them (#34, #35, #36, #37 and #38). But I have not yet had time to look into and try and fix all of them. And GitHub is messing with me tonight so I cannot push the changes I have done :(

There are some other warnings too but they are less severe so I haven't written any issues for those.

@ntruchsess
Copy link
Owner

I could prove that calling stop before connect doesn't solve this issue (did include stop() as very first call in connect-method). So there is definitly something going on that is not obvious as the place that disassociates the clients data-member from the uip-connection (https://github.com/ntruchsess/arduino_uip/blob/master/UIPClient.cpp#L398) on remote-close doesn't disassociate the data-member from the client-object but only sets the data's state. The data-object is disassociated during read (https://github.com/ntruchsess/arduino_uip/blob/master/UIPClient.cpp#L258) if remote disconnect occured before or in stop (https://github.com/ntruchsess/arduino_uip/blob/master/UIPClient.cpp#L110). But in both these cases data remains associated with the uip_connection to be cleaned up after successful close of connection (in https://github.com/ntruchsess/arduino_uip/blob/master/UIPClient.cpp#L398).
There still is the chance the remote doesn't acknowledge the FIN so this last step will be delayed until tcp-timeout has happened, but in this case this uip_connection is not usable during connect and if all uip_connections are in use this way connect would fail.

So I think it's either not interrupt-safe or other race-condition or memory-corruption. All three very hard to find though.

I'm currently running a test with no Serial output at all and no Timer (to minimize interrupts), watching the webservers-log to see whether it's still active. (aprox. 1000 requests so far...)

Completely disabling interrupts within UIPEthernet for testing-purpose by just wrapping all calls to UIPEthernet in pairs of cli() and sei() didn't work, Client.connect() would never return (which is something I'm still a bit surprised, but it could be that millis() fails within interrupt-disabled sections so connect() wouldn't send out further packets as internal timeouts never expire...)

@ntruchsess
Copy link
Owner

good news. Guess I found the issue!
Changed: https://github.com/ntruchsess/arduino_uip/blob/dev/UIPClient.cpp#L98
included in 027171e

ntruchsess added a commit that referenced this issue Feb 10, 2014
fix #30, caused by leaked client.data-members when remote closed connection with no unread inbound data left. Add explicit check for state!=0 to detect this in client.close();
@nilrog
Copy link
Contributor Author

nilrog commented Feb 11, 2014

Great job hunting this one down! :)

I tried running my sketch over night and it was still working this morning. A few timeouts had happened during night but they didn't cause UIPEthernet to run out of resources. So i'm a happy camper now :)

@ntruchsess
Copy link
Owner

@nilrog: great!
While it sometimes feels a bit bad having produced only about 8 characters of sourcecode (including the whitespace) in 10 hours of work it immediatly feels better when you envision what difference these few characters can make :-)

@nilrog
Copy link
Contributor Author

nilrog commented Feb 12, 2014

The best fixes are the ones that requires few changes :)

I will try to run the sketch for a few days and see how it works.
I did however see that sometimes getHostByName returns -1, which is TIMED_OUT. Not sure why though as I didn't have any trace running. And I didn't recall seeing that before.

@nilrog
Copy link
Contributor Author

nilrog commented Feb 14, 2014

48+ hours and counting now. I think it is safe to say that this library is quite stable now :)

Haven't seen anymore getHostByName == -1 so that might have been a temporary network glitch.

@lephu
Copy link

lephu commented May 7, 2014

Thank ntruchsess and nilrog for finding and fixing the bug. After update the code and run sketch for over 18 hours, my arduino still hang out. I use the same hardware as nilrog used: minipro + enc28j60

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants