Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to upstream lwIP + ESP8266 patchset instead of sloppy vendor code-drop #263

Open
pfalcon opened this issue May 21, 2017 · 19 comments

Comments

@pfalcon
Copy link
Owner

pfalcon commented May 21, 2017

https://github.com/kadamski/esp-lwip existed for a long time, but had following issues:

  1. It was unclear what methodology was used to create it.
  2. At some vendor SDK update, it stopped working, and due to p.1, hardly anyone knew or cared to fix it.

That's why I went for partial solution of https://github.com/pfalcon/esp-open-lwip/ , though rebasing onto upstream was always the goal.

I finally went for that, making followed steps:

  1. Cross-referenced vendor history-less codedrop as captured in https://github.com/pfalcon/esp-open-lwip/ against upstream git (tool to be posted).
  2. The "most matching" revision turned out to be STABLE-1_4_0-RC2.
  3. Diff against https://github.com/pfalcon/esp-open-lwip/ was made then, and all changes were carefully categorized in 3 categories: a) useless dirt, to be dropped; b) changes required to just get lwIP run on esp8266; c) changes beyond that. c) turned out to be backported cherry-picks from newer upstream version and some truly magic vendor changes.
  4. When I had a look at upstream + b) changes from above, and compared to https://github.com/kadamski/esp-lwip , they suddenly appeared to be almost matches. The only difference is approaches: I made shy removals to get to the core of changes (took 2 days), while @kadamski threw away as much as possible, then added something with commits like "without this, it crashes" or "without this, pings don't work". I definitely was impressed by the bravery of the approach, as confirmed by my coward approach from the other side, and went to make https://github.com/kadamski/esp-lwip work with SDK 2.0+.
  5. Of course, nothing worked initially, but in few hours, it did.
  6. Then I proceeded to test how well it works, and oops, standard MicroPython smoke test https://github.com/pfalcon/micropython-projs/blob/master/esp8266-tests/test_dl.py wasn't able to download 6MB even once.
  7. I then switched back to my approach, trying to minimize it to the level of https://github.com/kadamski/esp-lwip. That had the same issues with the test.
  8. Then I paid attention that https://github.com/kadamski/esp-lwip lwipopts.h used TCP_MSS 1460 which is huge and known to lead to memory overflow. Just switching it to lwIP's default of 536 didn't help, but I got an idea that there may be further important config changes dug in improperly made lwipopts.h from the vendor codedrop.
  9. Finally, when those config options were extracted and cleaned up, https://github.com/pfalcon/micropython-projs/blob/master/esp8266-tests/test_dl.py showed much better behavior. I still had a case when it stopped, with WiFi TX buffers exhausted. But currently it has downloaded 100MB and counting.

Results of this work:

  1. Repo with my rebasing approach: https://github.com/pfalcon/lwip-esp8266
  2. My fork of https://github.com/kadamski/esp-lwip : https://github.com/pfalcon/esp-lwip . It's already patches and reworked (original commits are updated, squashed, and patched).

The plan remains to use @kadamski's work, but with my extracted patchset around to investigate any issues.

This was referenced May 21, 2017
@pfalcon
Copy link
Owner Author

pfalcon commented May 21, 2017

@dpgeorge: FYI.

@pfalcon
Copy link
Owner Author

pfalcon commented May 21, 2017

I still had a case when it stopped, with WiFi TX buffers exhausted.

That's definitely a noticeable issue:

Free WiFi driver buffers of type:
0: 0 (1,2 TX)
1: 0 (4 Mngmt TX(len: 0x41-0x100))
2: 8 (5 Mngmt TX (len: 0-0x40))
3: 4 (7)
4: 7 (8 RX)
lwIP PCBs:

@pfalcon
Copy link
Owner Author

pfalcon commented May 21, 2017

Suspicion lies on packet retransmission issues, e.g. vendor codedrop has pretty magic code like pfalcon/lwip-esp8266@79dc343

@dpgeorge
Copy link

@pfalcon, ack, nice work! It's a shame thought that it's not possible to use lwip "out of the box". Are there plans to upgrade to lwip v2?

@pfalcon
Copy link
Owner Author

pfalcon commented May 22, 2017

Are there plans to upgrade to lwip v2?

Well, the idea is that it will make it possible, but I don't have any specific plans to got that far, more realistic idea is to upgrade minor release by minor release (and even that requires better testing process than we have now).

@dpgeorge
Copy link

Well, the idea is that it will make it possible, but I don't have any specific plans to got that far,

Ok. lwip v2 had some changes to how netif's are added/initialised so that makes it difficult to upgrade if Espressif have hidden that initialisation code in a binary blob.

@kadamski
Copy link

@pfalcon: Great work. I unfortunately never had time to maintain the code I've made. I'm glad you did the work. Our methodology was very similar, I remember spending a lot of time looking at the disassembly of the Esperssif binary blobs too, though :)

@LongHairedHacker
Copy link

Hi @pfalcon,
I've been playing around with your reworked fork (https://github.com/pfalcon/esp-lwip) and while it compiles fine for me, there seem to be some missing symbols: os_malloc, os_realloc, os_zalloc, os_free.
Those are, as far is I can tell, not provided by esp-open-sdk (I'm using esp-open-sdk based on 2.1.0) and kadamski has defines for them: https://github.com/kadamski/esp-lwip/blob/esp8266-1.4.1/config/lwipopts.h#L40

Are they missing intentionally or am I doing something wrong?

@pfalcon
Copy link
Owner Author

pfalcon commented Jun 8, 2017

@LongHairedHacker : I don't have time to look into that issue now (or soon), but feel free to provide exact steps on how you got to that issue so it can be investigated later.

@davydnorris
Copy link

Doesn't seem to be any way to create an issue on the /esp-lwip repo directly so I've had to comment here.

Compiled this last night and got some warnings/errors:

  • implicit definitions for some functions, in particular mem_malloc/mem_calloc/mem_free. Suggest adding #include "mem.h" to lwipopts.h and using os_malloc/os_calloc/os_free in the defines because vPortMalloc has extra parameters that will cause compile errors when the function is properly defined. the os_* versions have those parameters taken care of. This will also solve the problem reported by @LongHairedHacker

  • error in compiling dhcpserver.c - dhcpserver.h is missing. It's actually there but the path in the .c file is wrong or the file itself is in the wrong place. I wasn't sure which it was so leave it up to you whether to move it or change the path in dhcpserver.c

@pfalcon
Copy link
Owner Author

pfalcon commented Jun 10, 2017

@kadamski : Thanks for ping-back ;-).

Our methodology was very similar, I remember spending a lot of time looking at the disassembly of the Esperssif binary blobs too, though :)

Well, I've been spending enough time looking at the disassembly too. And if you look e.g. at https://pfalcon.github.io/xtensa-subjects/2.0.0-p20160809/out.html#esf_buf_alloc , you'll see that it allocates different buffer structures than you saw when you looked at that, and there likely were number of changes across SDK evolution. So, for the above particular work I actually didn't look at the disasm, instead following more black-boxish idea that while there're many changes across SDK versions, only some of them would affect lwIP integration. That's still heuristic and I'm not sure whether to continue digging your version, or stick with 1.4.0-rc2 with more or less complete vendor patchset I did initially.

@pfalcon
Copy link
Owner Author

pfalcon commented Jun 10, 2017

Doesn't seem to be any way to create an issue on the /esp-lwip repo directly

Github by default disables issue tracker in forked projects, considering that any communication should happen upstream. I've enabled it now.

Suggest adding #include "mem.h" to lwipopts.h and using os_malloc/os_calloc/os_free in the defines because vPortMalloc has extra parameters that will cause compile errors when the function is properly defined. the os_* versions have those parameters taken care of.

There're various conflicts between lwIP headers and SDK headers. I definitely worked that around for it to be able not just compile, but also run, but it's not fully clean/correct, and might have left in my git stash.

error in compiling dhcpserver.c - dhcpserver.h is missing.

I used dhcpserver from SDK2.0 even for kadamski's upgraded repo. Bits and pieces might have left in git stash too. Generally, the situation with dhcpserver isn't clear - it's unclear what it's licensing is, etc. I'd have an idea to replace Espressif's proprietary version e.g. with one from esp-open-rtos, but that's written with threaded style, so needs conversion to native lwIP API, and then long testing for any possible regressions comparing to proprietary server.

But the current situation is, to remind, that version upgraded from kadamski's work has obvious regressions. So, I'm not sure which version to work with, there're 2 ways:

  1. Take kadamski's version and fix regression.
  2. Take vendor-based version and keep removing patches while checking regressions on each step.

Both ways are long and boring, but only no.2 is sustainable.

@pfalcon
Copy link
Owner Author

pfalcon commented Jun 10, 2017

To xref other work:

  1. @d-a-v's work on "non-OS" SDK upgrade to lwIP 2.0: https://github.com/kadamski/esp-lwip/issues/8#issuecomment-304589345
  2. esp-open-rtos work to upgrade FreeRTOS-based env to lwIP 2.0: LwIP v2 support SuperHouse/esp-open-rtos#389

@davydnorris
Copy link

What i was saying is that if you clone the repo and do the make like in your instructions it breaks. You get warnings (interpreted as errors) about implicit declarations for vPortMalloc etc. and you get an error about not being able to find dhcpserver.h.

If you fix the implicit declaration error by adding mem.h before your define, you then get errors about too few parameters. These are fixed if you use os_malloc instead of vPortMalloc because that expands and supplies values for the extra parameters, and the code is still the same.

If you go looking to fix the missing include, it is there in the tree but is not in the right place. Either it needs to be moved, or the reference in dhcpserver.c needs to be changed to suit.

If you are not expecting the current repo to build then that's fine, but right now it won't complete the build.

The one other thing I did in the makefile was to make it suitable for a non-standalone toolchain by putting in $(SDK_BASE) and pointing to my SDK. I figure it's easier to play with a modified libmain that way while keeping your toolchain operational

@davydnorris
Copy link

davydnorris commented Jul 1, 2017

@pfalcon have you seen the latest lwIP related commits in ESP_NONOS_SDK from Espressif? Does this start to make life easier for us?

BTW I am trying to get some code out right now but as soon as I am done I'll be jumping into the lwIP2 stuff d-a-v has been working on - this is really good.

@pfalcon
Copy link
Owner Author

pfalcon commented Jul 1, 2017

@davydnorris: I didn't, I'm still in "reduced performance" mode. If you have any specific links, those would be welcome.

@davydnorris
Copy link

espressif/ESP8266_NONOS_SDK@cf83aba

I don't ever remember lwIP source being a part of the SDK, and now it's also in a third party folder of its own with separate build instructions. Is this just what we already know or does this give us some more hints?

@c0d3z3r0
Copy link

@d-a-v
Copy link

d-a-v commented May 3, 2019

Maybe you should try @someburner 's fork
https://github.com/someburner/esp-open-sdk/tree/sdk2-lwip212
and report here ?

note: lwIP-2.x is not usable with nonos-sdk firmware (any version) without an adaptation layer
so espressif lwIP-2.x is not for us (but for esp8266-rtos-sdk or more generally their IDF subsystem)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants