Bug with on-demand fast DHCP timer with multiple interfaces (IDFGH-9738) #53

Closed
rojer opened this issue Mar 29, 2023 · 5 comments


@rojer
Contributor

rojer commented Mar 29, 2023

I'm pretty sure this is supposed to be outside the NETIF_FOREACH loop. It is even indented as such, yet it is inside the loop, and may, depending on when tmr_restart gets set, cause every new iteration to schedule multiple additional timers.
This blows up pretty spectacularly if a DHCP server is slow or down. I am looking at a core dump of a device that ran out of memory with 3000+ dhcp_fine_timeout_cb timers :)

cc @freakyxue
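A rough sketch of the failure mode described above, as a toy model rather than the actual lwIP code. All names here (`fine_tick_buggy`, `fine_tick_fixed`, `simulate`, `request_timeout`, `NUM_IFACES`) are hypothetical, and it assumes both interfaces keep a DHCP request pending on every tick, as they would with a slow or unreachable server. Each tick callback returns how many follow-up timers it schedules; the buggy variant checks the restart flag inside the per-interface loop, the fixed one checks it once after the loop.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_IFACES 2  /* assumption: e.g. one wifi and one ethernet netif */

/* Hypothetical per-interface state: > 0 means a DHCP request is pending.
 * Assumption: the server never answers, so these stay pending forever. */
static int request_timeout[NUM_IFACES] = { 5, 5 };

/* Buggy shape: the "schedule a timer" step sits inside the loop, so once
 * the flag is set, every remaining iteration schedules another timer. */
static int fine_tick_buggy(void)
{
    bool restart = false;
    int scheduled = 0;
    for (int i = 0; i < NUM_IFACES; i++) {
        if (request_timeout[i] > 0) {
            restart = true;
        }
        if (restart) {
            scheduled++;  /* stands in for a sys_timeout()-style call */
        }
    }
    return scheduled;
}

/* Fixed shape: decide inside the loop, schedule at most once after it. */
static int fine_tick_fixed(void)
{
    bool restart = false;
    for (int i = 0; i < NUM_IFACES; i++) {
        if (request_timeout[i] > 0) {
            restart = true;
        }
    }
    return restart ? 1 : 0;
}

/* Start with one pending timer; each round, every pending timer fires
 * and schedules whatever its callback asks for. Returns the number of
 * timers pending after the given number of rounds. */
static long simulate(int (*tick)(void), int rounds)
{
    assert(rounds >= 0);
    long pending = 1;
    for (int r = 0; r < rounds; r++) {
        long next = 0;
        for (long t = 0; t < pending; t++) {
            next += tick();
        }
        pending = next;
    }
    return pending;
}
```

In this model `simulate(fine_tick_buggy, 12)` ends with 4096 pending timers because the count doubles every round, while `simulate(fine_tick_fixed, 12)` stays at 1. The real growth rate depends on which interfaces have requests pending and when, so treat the numbers as illustrative only.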

@david-cermak
Collaborator

The bug has already been fixed in 86df9f4 and d5e56d0

@rojer
Contributor Author

rojer commented Mar 29, 2023

Indeed, looks like it's been fixed on the 2.1.3 branch. But then it needs to be backported to 2.1.2; IDF 4.4 is still broken as of today.

@binary1230

Thanks for the fix!

I just ran into what is almost certainly this bug on IDF 5.0.1 by total random chance. Tracked it here after a day of work trying to figure out what was up.

We had two netifs (wifi and ethernet) on our device. Our access point was powered up and the ESP32 could associate to it, but by mistake the ethernet cable from the AP to the router was not plugged in. So the ESP was never able to reach a DHCP server, triggering this bug, and an out-of-memory crash within 15-20 seconds of boot. Nasty.

Separately: not 100% sure it's the same, but I was also recently getting some really odd out-of-memory crashes in the same areas (with DHCP and sys_ timers allocations) when a device was on the edge of wifi range.


I would just +1 this and ask: is there an ETA for when a stable IDF version might pick this up? Seems like both 4.4 and 5.0.1 are broken. If not, do you think it's reasonably safe to cherry-pick those two commits into a fork and use that in production?

Thanks!

@AxelLin
Contributor

AxelLin commented May 20, 2023

> indeed, looks like it's been fixed on the 2.1.3 branch. but then it needs to be backported to 2.1.2, IDF 4.4 is still broken as of today.

@david-cermak
The IDF v4.4 and v4.3 branches also need the fix.
BTW, I can't find d5e56d0 in the esp-lwip 2.1.2-esp branch.

@espressif-abhikroy
Collaborator

This bug is fixed in commit 8dad8d3 in esp-lwip 2.1.3 and in commit 8290c3b in esp-lwip 2.1.2.
This issue will be closed, and if there is any other issue, a new issue can be opened or this one can be reopened.
