mega-20191116 Core260 upgrade experience: DS18B20's error; Analog re-calibrate #2786

Closed
Domosapiens opened this issue Nov 28, 2019 · 26 comments
Labels
Category: Build (related to building/IDE/releases); Type: Discussion (open-ended discussion, compared to a specific question)

Comments

@Domosapiens

Domosapiens commented Nov 28, 2019

Before upgrade:

  • 2 WEMOS D1 units
  • with 2 and 4 DS18B20 sensors respectively, all working correctly
  • both running version mega-20190522
  • update reason: 6-7 reboots/day

Both units upgraded to mega-20191116 core 260 sdk 222 4M1M

Problem (the unit with 4 sensors):

  • All DS18B20 give 0.0

  • None of the DS18B20 sensors produce log entries

  • No scheduling?

  • Cold start, same problem

  • (re-) Submit of first DS18B20 task (all actions with ErrorStateValue -127) does not help

  • (re-) Submit of second DS18B20 task: sensor is working !

  • (re-) Submit again of first DS18B20 task: both sensors not working

Try again:

  • (re-) Submit of second DS18B20 task: sensor is working
  • (re-) Submit of third DS18B20 task : sensor is working
  • (re-) Submit of fourth DS18B20 task : sensor is working!
  • (re-) Submit of first DS18B20 task: All sensors stop working!!!

Other unit with 2 sensors

  • re-flash with normal mega-20191116 4M1M
  • (re-) Submit of first DS18B20: not working
  • (re-) Submit of second DS18B20 task: sensor is working !
  • (re-) Submit again of first DS18B20 task: both sensors not working

.
.
.
Conclusion:
The first sensor is skipped; sensors 2, 3 and 4 are working.

Bug search suggestion:
An off-by-one bug (counting from 0 instead of 1, or vice versa)?

Additional info:
Bug introduced after mega-20190809
(other units of mine, with mega-20190809, don't have this DS18B20 problem)

Related to #2585 ??
First sensor not scheduled ???
#2745 ??
First sensor of a type not scheduled ???

(no need to discuss the hardware, that was working for > 3 months on both units)

@jimmys01
Contributor

How about wiping the board with blank.bin, then reinstalling ESPEasy and redoing all the configuration? Make a backup first.
I remember having to do that on some of my DS18B20 nodes.

@TD-er
Member

TD-er commented Nov 28, 2019

Can you disconnect one of the not working sensors (and disable the plugin for it) and check if the non working sensor starts working then?

@uzi18
Contributor

uzi18 commented Nov 28, 2019

@TD-er I also have reports of some kind of instability, including on my own builds. I will ask my friends for more details.

@TD-er
Member

TD-er commented Nov 28, 2019

Make sure you also check how many are connected (and active) on the same string.

@uzi18
Contributor

uzi18 commented Nov 28, 2019

@TD-er but this is related to configuration submit or rules editing/sending

@TD-er
Member

TD-er commented Nov 28, 2019

Ah, OK, I thought it was more about the init being called when submitting a task config, and maybe that would be the issue here. When the plugins are initialized at boot, they are all init'ed in quick succession.

Well, you're the official Dallas expert here :)

@uzi18
Contributor

uzi18 commented Nov 28, 2019

and DHT ;)

@Domosapiens
Author

Just after a cold start.
Modifying the DS18B20 task, results in:
FS : Daily flash write rate exceeded! (powercycle to reset this)
In GUI and Log

Just after a cold start.
Removing the DS18B20 task, results in:
FS : Daily flash write rate exceeded! (powercycle to reset this)
Only in Log

So something is writing to flash extremely often.

@TD-er
Member

TD-er commented Nov 28, 2019

Hmm that's strange.
Can you see in the Tools => info page how often it has written to flash?

@Domosapiens
Author

[OT for this problem]
I took the unit off the wall and onto my bench.
Re-powered with nothing connected.
Daily flash write counter at 0 ... problem gone.

I thought ... the off cycle was too short (due to an overkill of capacitors for stability ;)

Then after a number of experiments, say 10 different configurations.
Daily flash write rate exceeded!
Ok

But after a long off cycle, the problem is still there:
Flash Writes: | 101 daily / 357 boot

Solution:
disconnect all sensors/actuators .... then the counter is reset.
Configuration like this:
https://www.letscontrolit.com/forum/viewtopic.php?f=2&t=5955#p32473
But why is disconnecting 3 extra plugs the solution??

@Domosapiens
Author

Back to the observations:

  • 2 DS18B20 sensors, as Task 4 and Task 5, worked correctly for more than 3 months with mega-20190522 ... but with 6-7 reboots/day
  • both DS18B20 sensors are connected to 1 plug (addresses ending in 79 and ee)
  • update to mega-20191116

TD-er asked: "Can you disconnect one of the not working sensors (and disable the plugin for it) and check if the non working sensor starts working then?"

Spoiler alert .... Task4 position is corrupted !

  • removed both DS18B20 tasks
  • cold reboot
  • added first DS18B20 number 79 as task4: no result in log
  • added second DS18B20 number ee as task5: ID ee with Temp in log
  • modified first DS18B20 number 79 task4: no result in log, ID ee not in log anymore
  • modified second DS18B20 number ee task5: ID ee with Temp back in the log
  • modified second DS18B20 to number 79 task5: ID 79 and Temp back in the log

And then:
Moved from Task4 to Task11 (the last free one)

Both sensors are working correctly.
The Task4 position is corrupting or corrupted ??

@TD-er
Member

TD-er commented Nov 28, 2019

Do you have rules active?
Is espeasy p2p active?
What controllers do you use?

@Domosapiens
Author

Yes, 3 pages of rules, but ..
w.r.t. temperature: only to check for error readings ( >0 )
(but log shows ...NO readings)

Yes, P2P active. 2 units. Both mega-20191116.
No task sharing
Only SendTo command from one, to the other

2 Domoticz HTTP controllers are active (2 different IP's)

@Domosapiens
Author

Some wrap-up:
For Unit1, moving the DS18B20 task from Task 4 to Task 11 made both DS18B20 sensors work.

So I tried the same for Unit2 and swapped the DS18B20 Task 4 with Task 11 (in this case a Switch input) and YES: now all 4 DS18B20 tasks are operational.

To me it seems that some strange out-of-bounds memory access is going on.
Is something writing to the memory reserved for Task 4 ??

About the reboot mania:
Unit 1 seems very stable (I saw more than 24 hours on the uptime counter before I did other tests and rebooted).
Unit 2, with 4 DS18B20 sensors, seems less stable (4 reboots per day); not sure why yet.

Hope these observations will help.

@uzi18
Contributor

uzi18 commented Dec 3, 2019

@Domosapiens please show your configuration (screenshots).
How does it work with P2P disabled?

@TD-er
Member

TD-er commented Dec 3, 2019

One explanation for this behavior can be that you increased the "distance" between the tasks.
The tasks are iterated in the order they appear in the list to see if something has to be done.
If there are other tasks in between that do actual work, then this may introduce some delay between Dallas related operations for different tasks.
So if you place them on task 4 and 11, with no tasks in between, I expect there will be no difference to when they run on task 4 and 5.
However, if you have some active tasks in between, the behavior may be different.
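The effect described above can be sketched with a toy model. This is not ESPEasy's actual scheduler; the `Task` class, the `work_ms` durations and the task names are hypothetical values chosen only to illustrate how busy tasks in between can add delay between two Dallas tasks.

```python
from dataclasses import dataclass


@dataclass
class Task:
    index: int     # 1-based position in the task list
    name: str
    work_ms: int   # hypothetical time the task spends when it runs


def start_offsets(tasks):
    """Map task index -> ms since the start of one pass at which it begins."""
    offsets = {}
    elapsed = 0
    for task in sorted(tasks, key=lambda t: t.index):  # list order
        offsets[task.index] = elapsed
        elapsed += task.work_ms
    return offsets


def gap_between(tasks, first, second):
    """Delay between the end of task `first` and the start of task `second`."""
    offsets = start_offsets(tasks)
    by_index = {t.index: t for t in tasks}
    return offsets[second] - (offsets[first] + by_index[first].work_ms)


adjacent = [Task(4, "DS18B20 A", 20), Task(5, "DS18B20 B", 20)]
spread_empty = [Task(4, "DS18B20 A", 20), Task(11, "DS18B20 B", 20)]
spread_busy = spread_empty + [Task(7, "LCD", 50), Task(9, "Switch", 5)]

print(gap_between(adjacent, 4, 5))       # no tasks in between
print(gap_between(spread_empty, 4, 11))  # empty slots add no delay
print(gap_between(spread_busy, 4, 11))   # active tasks in between add delay
```

In this model, positions 4 and 11 behave like 4 and 5 as long as the slots in between are empty; only active tasks in between change the timing.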

@Domosapiens
Author

@uzi18 : to help your bug search ..

Unit with 2 DS18B20 sensors
image
Uptime limited by an intentional cold reboot

Situation after re-ordering:
image

Task11 sensor
image

Task5 sensor
image

Code related to P2P, just SendTo:
Function: when movement is detected, also illuminate the LCD of the other unit.
Both units have the same mega-20191116 release

`On Detect#Switch do
  IF [Detect#Switch]=1
    LCDCMD,on
    SendToHTTP 192.168.1.7,8080,/json.htm?type=command&param=switchlight&idx=218&switchcmd=On
    SendTo 30,"event,LCD_ON"
    //SendTo 19,"event,LCD_ON"
    //SendTo 14,"event,LCD_ON"
    timerSet,1,600 // for testing, later 60
  EndIf
Endon

On Touch#Switch=0 do
  LCDCMD,on
  SendToHTTP 192.168.1.7,8080,/json.htm?type=command&param=switchlight&idx=218&switchcmd=On
  SendTo 30,"event,LCD_ON"
  timerSet,1,600 // for testing, later 60
Endon

On Rules#Timer=1 do
  LCDCMD,off
  SendToHTTP 192.168.1.7,8080,/json.htm?type=command&param=switchlight&idx=218&switchcmd=Off
  SendTo 30,"event,LCD_OFF"
  //SendTo 19,"event,LCD_OFF"
  //SendTo 14,"event,LCD_OFF"
Endon
`

  • No time available right now for experiments with P2P disabled

Unit with 4 DS18B20 sensors

Second unit config:
image
Uptime is limited intentionally!

After my task 4 - task 8 swap:
image

@Domosapiens
Author

@TD-er ,
Gijs ...
for both units (!) it was moving "a" sensor away from its current position that solved the problem.
By coincidence (??) it was Task 4.
The Task 4 sensor was preventing correct operation of the Task 5 sensor.
And .. for the second unit ... also of the Task 6 and Task 7 sensors.

I remember the discussion about 12/24 Tasks and Task/Memory allocation.
The current definition and allocation of those memory blocks is error-prone because the two are unrelated.
Just my wild guess for this problem: memory corruption.

About mega20191116 release:
The 2 sensor unit shows a huge "reboot" improvement.
The 4 sensor unit shows less (??) improvement ??
(sorry, conclusions are disturbed by irregularities of my logging machine)

In general I run sensor tasks at different intervals, like 29, 30, 31, 33 seconds (or 58, 59, 60, 61), to create some randomness for the scheduler.
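A rough way to see what staggered intervals buy is to compute how long it takes before all tasks fall due at the same moment again. The sketch below assumes all tasks start together at t=0 and that intervals are exact whole seconds, which a real scheduler will not guarantee; the function name is made up for this example.

```python
from functools import reduce
from math import gcd


def seconds_until_all_coincide(intervals):
    """Least common multiple of the intervals, in seconds."""
    return reduce(lambda a, b: a * b // gcd(a, b), intervals)


staggered = [29, 30, 31, 33]
identical = [30, 30, 30, 30]

print(seconds_until_all_coincide(identical))          # 30
print(seconds_until_all_coincide(staggered))          # 296670
print(seconds_until_all_coincide(staggered) / 86400)  # roughly 3.4 days
```

Under these assumptions, four tasks on the same 30 second interval collide on every run, while the 29/30/31/33 set only lines up once every few days. Pairs of tasks still coincide more often than that.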

@TD-er
Member

TD-er commented Dec 4, 2019

Was one of them (or both) running a test firmware? (see firmware name in the sysinfo tab)

@Domosapiens
Author

No, no test version.
But ... thanks for the hint to look at sysinfo

About mega20191116 release:
The 2 sensor unit shows a huge "reboot" improvement.

Running: ESP82xx Core 2_6_0, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support

The 4 sensor unit shows less (??) improvement ??

Running ESP82xx Core 2_5_2, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.1.2 PUYA support

Will update the 4 sensor unit to Core 2_6_0, also.

@TD-er
Member

TD-er commented Dec 4, 2019

Maybe even go for core 2.6.1 or 2.6.2?
Those have some more improvements.

@Domosapiens
Author

I have not seen core 2.6.1 or 2.6.2 builds in the mega-20191116 release
(self compilation is too complicated for a 1955-er).

My first objective is to get my 7 production units to an acceptable reliability level.
I am still hoping/waiting for some stability, not for the latest of the (untested) latest.

(second unit looks promising: running now for 8 hours without a reboot) !

@ghtester

ghtester commented Dec 5, 2019

BTW, self-compilation is not that complicated anymore thanks to the great Vagrant environment prepared by TD-er. All you need is a machine with (preferably) Ubuntu 18 LTS that is connected to the Internet and able to run a virtual machine (so some RAM and CPU features are needed, but even a 10-year-old PC is enough). If you are interested, I can help you with it.
Regarding stability: I am still using the 20180311 build on one ESP node which just watches the temperature with a DS18B20 sensor, publishes the data to Thingspeak and to another local DB through HTTP, and sends an e-mail if the temperature is out of range. The uptime was over 150 days when I recently checked.
In the latest releases the stability is also significantly improved, but it depends on which plugins / features you are using...

@Domosapiens
Author

@ghtester

"not that complicated anymore"

Funny .... and then you use an amount of jargon that this ESP user doesn't (want to) understand.
The last time I touched UNIX was in the seventies.


ESP_Easy_mega-20191116_normal_core_260_sdk222_alpha_ESP8266_4M1M, is a great release!!!
First unit up for 3 days 16 hours, second for 2 days 21 hours

A third unit, previously running v2.0.0-dev13, has now also been updated, with a positive experience: 13 hours uptime:
image

To share my experience, some minor remarks:

  • DS18B20 Task5: a re-select of address and resolution, a change of Error State Value from NAN to -127, and a re-save were needed (i.o.w. the sensor was lost)
  • compared to the previous units: this unit has no DS18B20 task on Task4, so therefore no repetition of the problem?
  • pulse counter (water-flow meter) seems to run smoothly!
  • Task3&4: Generic System Info tasks need a re-save
  • Task11: The Analog input needs re-calibration !! It reads 10% higher than before. (The boiler limit of 8 bar now reads 8.8 bar, which is impossible because of a safety valve.)

Q: Is this Analog input change a known and explainable effect?
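As an illustration of what such a re-calibration amounts to, here is a minimal sketch of a two-point linear mapping from raw ADC value to pressure. The raw values (0, 880, 1000, 1100) are hypothetical and not taken from this unit; only the 8 bar versus 8.8 bar readings come from the report above, and the sketch assumes the shift is purely proportional.

```python
def two_point_map(raw, raw1, out1, raw2, out2):
    """Linear interpolation through the points (raw1, out1) and (raw2, out2)."""
    if raw1 == raw2:
        raise ValueError("calibration points must have different raw values")
    return out1 + (raw - raw1) * (out2 - out1) / (raw2 - raw1)


# Hypothetical old calibration: raw 0 -> 0 bar, raw 1000 -> 10 bar.
# If the raw reading at the same pressure is now 10% higher,
# 8 bar shows up as raw 880 and is displayed as 8.8 bar.
displayed = two_point_map(880, 0, 0.0, 1000, 10.0)

# Re-calibrated second point: raw 1100 -> 10 bar.
corrected = two_point_map(880, 0, 0.0, 1100, 10.0)

print(displayed)  # about 8.8
print(corrected)  # about 8.0
```

If the offset at zero also moved, both calibration points would need to be measured again rather than only scaling the upper one.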

@Domosapiens Domosapiens changed the title from "Complex error with DS18B20's after upgrade to mega-20191116" to "mega-20191116 Core260 upgrade experience: DS18B20's error; Analog re-calibrate" Dec 7, 2019
@TD-er
Member

TD-er commented Dec 7, 2019

"Q: Is this Analog input change a known and explainable effect?"

The only thing I can imagine is that the plugin now does some more filtering and also no longer samples during WiFi (re)connect attempts.
So it is now probably more stable (with oversampling enabled).
If you use a calibration, it is also possible the interpolation is now more accurate.

v2.0.0-dev13 used core 2.3.x and we're now using a newer core.
Maybe something changed in the core with regards to the internal calibration of the ADC done in the factory?

@TD-er TD-er added the Category: Build and Type: Discussion labels Dec 7, 2019
@Domosapiens
Author

Great experience with mega-20191116 Core260 !
At this moment 4 units are running this release. I have seen only intended reboots.
I see on these units uptimes of 11, 7, 3, 3 days. Congrats !

Today another production unit was loaded with the mega-20191116 Core260 update.
And ... surprise ... this unit, with a DS18B20 as Task4, has the same DS18B20 problem as mentioned at the beginning.
In short: re-saving Task4 stops all DS18B20 measurements (seen in Log).
T5, T6 and T11 work correctly when re-saved after T4.
Re-saving T4 stops all DS18B20 measurements again (seen in Log).

image

The solution ... swap Task4 with Task9, a system info uptime task.
All 4 DS18B20 tasks are now working (seen in Log)

B.t.w.
This unit measures the input and output temperature of a floor heating unit and adjusts the pump speed via PWM according to the temperature difference between in and out. The larger the difference, the higher the speed. No difference means no speed.
In addition, this unit has a flow counter to calculate the instantaneous power in kW: Flow [L/s] * DeltaT * 4.2
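The formula above can be written out as a small sketch. It assumes water at about 1 kg per litre and a specific heat of 4.2 kJ/(kg·K), so L/s times K times 4.2 gives kW. The pump mapping is only an illustration of "larger difference, higher speed": the `full_speed_delta_t` of 10 K and the 0..1023 PWM range are assumptions, not values from this unit.

```python
SPECIFIC_HEAT_WATER_KJ_PER_KG_K = 4.2  # from the formula above
WATER_DENSITY_KG_PER_L = 1.0           # assumption: about 1 kg per litre


def heat_power_kw(flow_l_per_s, temp_in_c, temp_out_c):
    """Instantaneous thermal power in kW: Flow [L/s] * DeltaT [K] * 4.2."""
    delta_t = abs(temp_in_c - temp_out_c)
    mass_flow_kg_per_s = flow_l_per_s * WATER_DENSITY_KG_PER_L
    return mass_flow_kg_per_s * SPECIFIC_HEAT_WATER_KJ_PER_KG_K * delta_t


def pump_pwm(delta_t, full_speed_delta_t=10.0, max_pwm=1023):
    """Proportional pump speed: 0 at no difference, max_pwm at full_speed_delta_t.

    Both default values are hypothetical.
    """
    fraction = min(max(delta_t / full_speed_delta_t, 0.0), 1.0)
    return int(max_pwm * fraction)


print(heat_power_kw(0.1, 35.0, 30.0))  # about 2.1 kW at 0.1 L/s and 5 K
print(pump_pwm(0.0))                   # 0: no difference, no speed
print(pump_pwm(5.0))                   # 511: half of the assumed range
```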
