Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Panic] add user configurable reboot delay in seconds (IDFGH-8506) #9962

Conversation

chipweinberger
Copy link
Contributor

@chipweinberger chipweinberger commented Oct 12, 2022

Issue: #9462

Adds a CONFIG_ESP_SYSTEM_PANIC_REBOOT_DELAY_SECONDS option to the panic handler.

Very simple change & tested on a ESP32-S3, and ESP32-S2.

Why?

  • "print & reboot" is annoying. Boot loops log way too much, forever & ever. There's no time to look at the stack trace. I always need to unplug the device, and then scroll back and look for the point it crashed.
  • "print & halt" is not good for production or beta testing. A device that does not reboot is not useable.

This middle approach is the best of both, especially for beta testing, and is sorely needed! =)

@espressif-bot espressif-bot added the Status: Opened Issue is new label Oct 12, 2022
@github-actions github-actions bot changed the title [Panic] add user configurable reboot delay in seconds [Panic] add user configurable reboot delay in seconds (IDFGH-8506) Oct 12, 2022
@chipweinberger chipweinberger force-pushed the user/chip/panic-reboot-delay branch 3 times, most recently from f5c87b5 to 8607d3f Compare October 12, 2022 23:15
@Alvin1Zhang
Copy link
Collaborator

Thanks for your contribution.

@chipweinberger
Copy link
Contributor Author

I have been testing this change for a ~week without issue. Very nice quality of life change =)

@0xjakob
Copy link
Collaborator

0xjakob commented Oct 18, 2022

@chipweinberger We will discuss this PR tomorrow morning, then let you know about the result.

@chipweinberger
Copy link
Contributor Author

Any word?

This feature is awesome! If I do say so myself 😜

@chipweinberger
Copy link
Contributor Author

@0xjakob bump.

I've been using this myself for weeks now. Users will really appreciate the option.

@chipweinberger
Copy link
Contributor Author

chipweinberger commented Nov 14, 2022

@igrr Just hoping for an update. positive or negative, just want to house-clean.

@0xjakob
Copy link
Collaborator

0xjakob commented Nov 15, 2022

Hi @chipweinberger, sorry for the huge delay! Part of the reason is that we had some discussion going on internally about this. Let me explain:

One issue is that we're not sure if it's not a tooling problem. E.g., on the command line using idf.py monitor, you can always press Ctrl+] to stop the output (if not, let us know, it's probably a bug). Using tools to interrupt the output, or alternatively log to a log file, wouldn't require any additional configuration in IDF. What tool are you using to monitor output from devices?

Another minor risk is that something goes wrong inside the code where the WDT is disabled. It's a minor risk but might still be an issue for people who have devices in beta-phase. Maybe adding a brief explanation in the Kconfig option would be nice. Alternatively, changing the WDT to match the delay might be possible, but I'm not familiar with WDT.

Another option would be to wrap panic_restart(), but that would still be on the user side. In case you want to try, you can use the wrappers example.

So far, we haven't reached a final conclusion what is the best solution.

Copy link
Collaborator

@0xjakob 0xjakob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code in general looks good to me. Just one very minor note about coding style.

components/esp_system/panic.c Outdated Show resolved Hide resolved
@chipweinberger
Copy link
Contributor Author

Thanks for the update!

What tool are you using to monitor output from devices?

VSCode. AFAIK, there is no way to "pause" the VSCode console after a crash. I always need to pull the cable.

you can always press Ctrl+] to stop the output

Even still, by the time I've noticed the crash my device will have logged 1000 lines immediately on boot, so it's quite a bit of annoying scrolling backwards...

Another minor risk is that something goes wrong inside the code where the WDT is disabled.

disable_all_wdts(); is already called in the panic handler code, so this should not add additional risk.

I don't see the reason not to add it, given that it is just an "option", with no obvious downside AFAICT. I've really enjoyed the past month of using it. But of course, it is your decision ultimately.

@chipweinberger chipweinberger force-pushed the user/chip/panic-reboot-delay branch 3 times, most recently from a4cd7b0 to 36f0e15 Compare November 15, 2022 12:02
@chipweinberger
Copy link
Contributor Author

Not sure why precommit just failed... was working before.

"fatal: could not read Username for 'https://gitlab.com/': No such device or address"

@chipweinberger
Copy link
Contributor Author

Not sure how to fix precommit. All I did was fix a bracket.

fatal: could not read Username for 'https://gitlab.com/': No such device or address

@0xjakob
Copy link
Collaborator

0xjakob commented Nov 15, 2022

Not sure how to fix precommit. All I did was fix a bracket.

fatal: could not read Username for 'https://gitlab.com/': No such device or address

Sorry for the inconvenience, this is an issue with a mirror of a third party project. The fix has been merged already on our internal gitlab server, but needs to run the pipeline on master to sync to github. With the next sync, it should be available here, please rebase then.

@chipweinberger
Copy link
Contributor Author

chipweinberger commented Nov 17, 2022

updated.

Please decide to merge or deny! Don't want too many open PRs 😃 thanks @0xjakob !

@0xjakob
Copy link
Collaborator

0xjakob commented Nov 28, 2022

sha=71a8410547c4718393cc4dee354bf7d99cb0b2fa

@0xjakob 0xjakob added the PR-Sync-Merge Pull request sync as merge commit label Nov 28, 2022
Copy link
Collaborator

@0xjakob 0xjakob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @chipweinberger,

we finally discussed this PR internally and found a couple of unsolved questions that we would like to address before we merge it. Feel free to resolve them yourself, otherwise we would like to add another commit addressing these.

First of all, the concern is that disabling the watchdog timers for such a long period is quite risky. It would be better to adjust the timeout of the watchdog according to the reboot delay. We furthermore would like to test somehow that the configuration works, probably inside the test applications. Also, would you be OK to adjust the code so that there is no continuous reboot print anymore but just a one-time print, like "Waiting %d seconds before reboot"?

@chipweinberger
Copy link
Contributor Author

That sounds OK to me.

It is great if you make the adjustments you like.

@espressif-bot espressif-bot added Status: Selected for Development Issue is selected for development and removed Status: Opened Issue is new labels Dec 2, 2022
@espressif-bot espressif-bot added Status: In Progress Work is in progress Status: Reviewing Issue is being reviewed and removed Status: Selected for Development Issue is selected for development Status: In Progress Work is in progress labels Dec 14, 2022
@0xjakob
Copy link
Collaborator

0xjakob commented Dec 22, 2022

@chipweinberger We merged the feature. It's available on master now.

@espressif-bot espressif-bot added Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally and removed Status: Reviewing Issue is being reviewed labels Dec 22, 2022
@chipweinberger
Copy link
Contributor Author

I noticed! looks great. thanks for you work getting it in.

@chipweinberger chipweinberger deleted the user/chip/panic-reboot-delay branch May 20, 2023 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR-Sync-Merge Pull request sync as merge commit Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants