Fix repeat reboot bug when rfinit data is filled with 0xff #1500

Closed

@vowstar
Collaborator
vowstar commented Sep 15, 2016

Fix repeat reboot bug when rfinit data is filled with 0xff.

  • This PR is compliant with the contributing guidelines (if not, please describe why).
  • I have thoroughly tested my contribution.
  • The code changes are reflected in the documentation at docs/en/*.

After erasing the flash or filling it with 0xff, the latest commit 19f80ab reboots repeatedly.

To verify this, I used the commands below:

# erase flash
../tools/esptool.py --port /dev/ttyUSB0 erase_flash
# programming firmware
../tools/esptool.py --port /dev/ttyUSB0 write_flash -fm dio -fs 32m -ff 40m 0x00000 ../bin/0x00000.bin 0x10000 ../bin/0x10000.bin

After that the board rebooted repeatedly, so I changed the baud rate to 74880 and listened on the serial port:

cat /dev/ttyUSB0

and this message is printed again and again:

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x40100000, len 26420, room 16 
tail 4
chksum 0xea
load 0x3ffe8000, len 2200, room 4 
tail 4
chksum 0x83
load 0x3ffe8898, len 8, room 4 
tail 4
chksum 0x1f
csum 0x1f
rf_cal[0] !=0x05,is 0xFF

The reason is printed at 74880 baud on each reboot:
rf_cal[0] !=0x05,is 0xFF

So we should not believe Espressif's data sheet when it says esp_init_data_default.bin is optional: the RF init data must be written to flash before running the NodeMCU firmware.

This change writes the RF init data to flash when running make flash512k and make flash4m, which fixes the problem.
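The boot message above can be turned into a quick offline check on a dump of the init data sector. This is only a minimal sketch: it assumes that `rf_cal[0]` in the log refers to the first byte of the init data block, and `init_data_status` is a hypothetical helper name, not something from the firmware or esptool.

```python
ERASED_BYTE = 0xFF          # every byte reads back as 0xFF after erase_flash
EXPECTED_FIRST_BYTE = 0x05  # value the boot log says rf_cal[0] must have


def init_data_status(block: bytes) -> str:
    """Classify a dump of the init data sector (hypothetical helper).

    Assumption: rf_cal[0] in the boot log is the first byte of this block.
    """
    if not block:
        return "empty"
    if block[0] == EXPECTED_FIRST_BYTE:
        return "ok"
    if all(b == ERASED_BYTE for b in block):
        return "erased"  # the case that triggers the reboot loop in this PR
    return "corrupt"
```

A block read from a freshly erased chip would be classified as `"erased"`, which is the state that produces the `rf_cal[0] !=0x05,is 0xFF` message.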

@pjsg
Collaborator
pjsg commented Sep 15, 2016

Does this reset the saved SSIDs etc?

@marcelstoer
Collaborator

I don't support this for several reasons:

  • the flashing instructions don't become any clearer with this addition, they'll be less cohesive instead
  • all the necessary information is in the "Upgrading" section already
  • the make file contains one specific version of init data hard coded which is a pain to maintain
@jmattsson
Collaborator

So let me get this straight:

  1. Espressif initially did not include default init data, but required you to always flash that as well as the firmware.
  2. NodeMCU then started including a default init data block that would get automatically written if none present, to save users from the flashing hassle.
  3. The SDK then started doing the same, leading to lovely confusions over which init data was being used.
  4. We (well, I) removed the init data block from NodeMCU to remove the confusion.
  5. The SDK introduces the new rfinit data block, but requires you to always flash that as well as the firmware.

If that's what's going on, it sounds to me like we should be back to step 2 here.

@vowstar
Collaborator
vowstar commented Sep 16, 2016 edited
  • Verify that the SSID and password still exist after writing esp_init_data_default.bin
  • Rewrite the flashing instructions I changed to make them clearer
  • Don't use a specific version of the init data; keep it the same as the SDK's

Hi @pjsg, I've tested that writing esp_init_data_default.bin to flash does not reset the password and SSID; they are remembered until we erase them. I used the command below to verify that, and I can confirm that the Wi-Fi SSID and password are not stored in esp_init_data_default.bin's sector.

python tools/esptool.py --port /dev/ttyUSB0 write_flash -fm dio -fs 32m -ff 40m 0x3fc000 bin/esp_init_data_default.bin

So we can do this with no worries 😄

Hi @marcelstoer, your suggestions are very good, and I will improve the PR.

I have now added an address table listing the init data addresses and blank addresses, and fixed a broken link. I feel the document is clearer than before.

I also deleted all the hardcoded data and now use esp_init_data_default.bin from Espressif's SDK rather than a specific version of esp_init_data_default.bin.

Hi @jmattsson, Espressif's SDK has always confused me, and will confuse me in the future; many strange changes will keep appearing 😂
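The address table can be derived from the flash size rather than listed by hand. The sketch below is illustrative only: the 0x3fc000 address for a 4 MiB chip comes from the command above, while the rule "init data in the fourth-last 4 KiB sector, blank.bin in the second-last" is an assumption generalised from that value and from Espressif's usual layout, and `param_addresses` is a hypothetical helper.

```python
SECTOR = 0x1000  # 4 KiB flash sector


def param_addresses(flash_bytes: int) -> dict:
    """Return assumed flash offsets for the parameter files.

    Assumption: init data sits in the fourth-last sector and blank.bin
    in the second-last sector of the chip.
    """
    return {
        "esp_init_data_default.bin": flash_bytes - 4 * SECTOR,
        "blank.bin": flash_bytes - 2 * SECTOR,
    }


for label, size in (("512 KiB", 512 * 1024), ("4 MiB", 4 * 1024 * 1024)):
    addresses = param_addresses(size)
    print(label, {name: hex(addr) for name, addr in addresses.items()})
```

For the 4 MiB case this reproduces the 0x3fc000 offset used in the write_flash command in this comment.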

vowstar added some commits Sep 16, 2016
@vowstar vowstar Update docs/en/flash.md to make it more clear.Add blank.bin address d…
…escription.

Add address table to list init data addresses and blank addresses.
Add RF & SoC configuration and Wi-Fi configuration to overwrite confused  and
blank.bin
f81fa30
@vowstar vowstar Don't use specific version of esp_init_data_default.bin- Delete all h…
…ardcode and use esp_init_data_default.bin form espressif's SDK
7f3e4b8
@marcelstoer
Collaborator
marcelstoer commented Sep 18, 2016 edited

I really think this issue is much bigger than the few code changes this PR proposes. I hope I don't open a can of worms but here's my story.

I can't contribute code but I spend a lot of time answering questions on Stack Overflow and esp8266.com. 80% of them are from users stuck with flashing the firmware. 10% are from users who don't understand the asynchronous nature of NodeMCU. 9% ask how to implement a (HTTP) server and the remaining 1% are about deep sleep and hardware issues.
To give you an idea, here's a recent example: http://www.esp8266.com/viewtopic.php?p=55196#p55196
-> we can make the strongest impact if we improve tools & documentation for firmware flashing

One can always argue that users didn't read the manual thoroughly (they don't) or that they were simply too stupid to understand. One could also argue that if people "don't get it" it's because we don't explain things well enough. Or because the available tools are not simple enough.
If we want people to use our firmware, and I assume we all want that, then the entry barrier should be as low as possible.

My guess is that only a very small fraction of our users actually build the firmware from scratch. And not even all of those few probably use the flash target in our Makefile.
-> we can make the strongest impact if we improve tools & documentation for all the others

I see the following measures:

  • The flashing docs should be re-written from ground up. IMHO the current page has all the necessary information but its form/structure isn't appropriate for our biggest target group.
  • The NodeMCU flasher for Windows is becoming a liability. It wouldn't be such a big problem I guess if it wasn't associated with 'NodeMCU'. I've never used it because I'm not on Windows but it:
    • seems unmaintained
    • ships a very outdated default firmware (0.9.x). This makes people in the support forums then claim "but it works if I flash the firmware with NodeMCU flasher" not realizing that what they flashed is an ancient firmware.
    • has some odd UX quirks (judging from screen shots I see)
  • We decided a long time ago to drop the flash targets from our Makefile, as we want to get rid of our esptool.py copy.
@vowstar
Collaborator
vowstar commented Sep 19, 2016 edited

Making the ESP8266 easy to use is my objective. Let me tell the history of NodeMCU Flasher.

In mid 2015, the ESP8266 was fairly closed hardware. To get all the technical documents, I had to sign a non-disclosure agreement with Espressif, and could then download them from Espressif's FTP server using a password Espressif gave me. The early tool for programming the ESP8266 was named XTCOM_UTIL. It was also under Espressif's NDA and was very hard to use: it ran only under Windows XP and was buggy. When I wanted to change the address at which to load a binary, I had to restart it, and at that time I had not yet built an automatic download circuit, so I had to adjust switches on every flash. It drove me almost crazy, so I began developing ESP8266-Flasher (which became esp8266-flasher). But I did not know the protocol, so I used a logic analyzer to watch XTCOM_UTIL's data and guessed the protocol for writing the ESP8266's flash. That is how esp8266-flasher (now nodemcu-flasher) was born. However, the protocol is based on observation and even guesswork, so it is unstable.

To solve the ESP8266 programming problem and make flashing easy and automatic, I developed nodemcu-devkit-0.1, which used a capacitor on DTR to reset the chip, like Arduino, and RTS to drive GPIO0. That failed, because the capacitor on DTR can generate a negative voltage when DTR is triggered. Sometimes a negative voltage on the ESP8266's reset pin caused it to boot abnormally (I caught it with an oscilloscope), so it was very unstable. Then I developed the next version, which wired DTR to RST and RTS to GPIO0, and it worked. But whenever I plugged or unplugged a USB disk on my laptop (while the USB-to-UART port was not open), the ESP8266 rebooted. I found that when the OS re-enumerates USB devices, the CH340/CP2102/PL2303 changes DTR's level, so the ESP8266 reboots.

To prevent this I tried many methods, up to nodemcu-devkit-0.9. Quite by chance, I found that the ESP8266's ROM bootloader does not check GPIO0's level at boot time; it waits 90~300 ms before doing so. In this time slot, we can use combinational logic on DTR and RTS to change the levels of the ESP8266's GPIO0 and RST. When the OS re-enumerates USB devices, DTR and RTS change at the same time, so the ESP8266 is not affected. Only when DTR and RTS are at different levels do GPIO0 and RST change. Because the ESP8266's GPIO0 check is delayed, I can reset the ESP8266 first and then change GPIO0, and ignore any USB re-enumeration by the OS.

But I found the CH340 hard to use, especially under Mac OS X (a digital signature issue, which I have reported to the CH340's vendor, WCH), so I developed nodemcu-devkit-1.0, which uses the CP2102. I considered using the FT232, but it is more expensive than the ESP8266 itself. Price aside, I would prefer the FT232. The CP2102 and FT232 are friendly to OS X.
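The DTR/RTS gating described above can be modelled as a small truth table. This is a hedged sketch, not the devkit schematic: the comment only states that nothing happens when both lines are at the same level, so the choice of which output goes low for which input combination is an assumption (it follows the auto-reset table commonly documented for esptool), and `auto_reset_outputs` is a hypothetical name.

```python
def auto_reset_outputs(dtr: int, rts: int) -> tuple:
    """Return (RST, GPIO0) levels for the given DTR/RTS states (1 = high).

    Grounded in the comment: equal DTR and RTS leave both outputs high.
    Assumption: DTR alone pulls RST low, RTS alone pulls GPIO0 low.
    """
    if dtr == rts:
        return (1, 1)  # e.g. USB re-enumeration toggles both lines together
    if dtr == 1:
        return (0, 1)  # assumed: chip held in reset, GPIO0 released
    return (1, 0)      # assumed: reset released, GPIO0 low selects the bootloader


# Print the full truth table for inspection.
for dtr in (0, 1):
    for rts in (0, 1):
        print(dtr, rts, auto_reset_outputs(dtr, rts))
```

Because the ROM bootloader samples GPIO0 late, a flasher can step through the "reset" state and then the "GPIO0 low" state in sequence, while any event that moves both lines together has no effect.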

Espressif became more open after nodemcu-devkit-0.9 and open-sourced much of the SDK's code. Espressif has now opened the firmware_download_protocol, so it is easier to do the same thing.

Now it seems that nodemcu-flasher is becoming the next XTCOM_UTIL; it is old. I always use esptool-ck and esptool.py instead, but I think an easy, cross-platform GUI flash tool is important for beginners (I also use Linux). esptool.py is now super fast because of the Cesanta stub. I think writing a beautiful GUI frontend for esptool may be a good idea.

@pjsg
Collaborator
pjsg commented Sep 21, 2016

I use the cesanta version of esptool.py as it programs nice and fast and it just works. I feel that the user experience of flashing an image to the nodemcu board should be:

somecommand <specify flash size> <specify port> <specify firmware image>

If rf cal data needs to be programmed, then we should do that. If other stuff needs to be done, then the firmware should do it (at least by default).

If all else fails, and the board won't boot, then the sequence:

erase all flash
program using above command

should have the best chance of making it work again.
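A wrapper along these lines could build the esptool.py invocations from just those three inputs. It is a rough sketch under several assumptions: a single combined firmware image flashed at 0x00000, the `-fm dio` and `-ff 40m` settings copied from the commands earlier in this thread, init data placed in the fourth-last sector, and hypothetical names (`flash_plan`, the default `init_data` path). It only returns argument lists and does not run anything.

```python
def flash_plan(flash_bytes, port, image,
               init_data="bin/esp_init_data_default.bin",
               erase_first=False):
    """Return esptool.py argument lists for flashing (hypothetical helper)."""
    tool = ["python", "tools/esptool.py", "--port", port]
    # esptool of this era takes the flash size in megabits, e.g. 32m for 4 MiB.
    flash_size = f"{flash_bytes * 8 // (1024 * 1024)}m"
    # Assumption: init data lives in the fourth-last 4 KiB sector.
    init_addr = hex(flash_bytes - 0x4000)

    commands = []
    if erase_first:
        commands.append(tool + ["erase_flash"])
    commands.append(tool + [
        "write_flash", "-fm", "dio", "-fs", flash_size, "-ff", "40m",
        "0x00000", image,      # assumed single combined firmware image
        init_addr, init_data,  # always write the RF init data as well
    ])
    return commands


for command in flash_plan(4 * 1024 * 1024, "/dev/ttyUSB0", "firmware.bin",
                          erase_first=True):
    print(" ".join(command))
```

With `erase_first=True` the plan matches the recovery sequence above: erase all flash, then program firmware and init data in one write_flash call.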

@jimparis
Contributor

@marcelstoer, you wrote

all the necessary information is in the "Upgrading" section already

and

One can always argue that users didn't read the manual thoroughly (they don't) or that they were simply too stupid to understand.

The manual currently states that erasing the flash completely is all you need to do:

If there is no init data block found during SDK startup, the SDK will install one itself. [...]
Hence, here are two strategies to update the SDK init data:

  • Erase flash completely. This will also erase the (Lua) files you uploaded to the device! The SDK will install the init data block during startup.

This is misleading and wrong, at least for 1.5.4.1, and I would not argue that users who believed it are "simply too stupid to understand".

@marcelstoer
Collaborator

This is misleading and wrong, at least for 1.5.4.1

Meaning that erasing alone does not work? I can't verify that atm. Feel free to create a PR to correct that.

I would not argue that users who believed it are "simply too stupid to understand".

Neither do I. That sentence is in conditional, 3rd-person form on purpose. My point is that if lots of people don't understand our docs then it may be our fault. I hope this becomes clear if you read the entire comment.

@jimparis
Contributor
jimparis commented Oct 3, 2016

Yeah, erasing alone does not work. I will file a PR. I did understand the point of your comment, just wanted to make sure people were aware that the documentation is no longer accurate (which is a different problem than improving its form/structure).

@jmattsson
Collaborator

While I had been hoping to stay out of this, the rf_cal[0] != 0x05 issue hit me here as well, so I took some time off from the ESP32 work to sort this out. Please give PR #1526 a spin. It should fix this issue at the root. At least until Espressif pulls this rug out from underneath us too...

The main downside is that we'll once again run the risk of going out-of-sync with the SDK's esp_init_data_default.bin. I left a comment in the code on how I think we might avoid that, but I don't have time to pursue that avenue at this point. Someone else is quite welcome to if that PR gets merged.

@jimparis jimparis referenced this pull request Oct 4, 2016
Merged

Reimplement esp_init_data_default without hardcoded data #1527

3 of 4 tasks complete
@marcelstoer
Collaborator

Superseded by #1525 and #1527.

@marcelstoer marcelstoer closed this Oct 5, 2016