Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rrd4j] Persistence warnings #15220

Closed
mstormi opened this issue Jul 10, 2023 · 13 comments · Fixed by #15223
Closed

[rrd4j] Persistence warnings #15220

mstormi opened this issue Jul 10, 2023 · 13 comments · Fixed by #15223
Labels
bug An unexpected problem or unintended behavior of an add-on

Comments

@mstormi
Copy link
Contributor

mstormi commented Jul 10, 2023

Since recent persistence optimization by @J-N-K I'm seeing a lot of messages like the one below and so do several others on the forum. Running S3515.

I think it happens when I update(+persist) an item multiple times in one second.
To be clear, it's okay that this will only persist one value per second.
But the remaining updates should not result in WARN level output so at least the log level should be lowered to DEBUG.

2023-07-10 10:41:07.473 [WARN ] [org.openhab.persistence.rrd4j.internal.RRD4jPersistenceService   ] - RRD4J does not sup                                               port adding past value this=1688978467, last update=1688978468. Discarding Netzeinspeisung - -8.0
@mstormi mstormi added the bug An unexpected problem or unintended behavior of an add-on label Jul 10, 2023
@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/openhab-4-0-milestone-discussion/145133/539

@rkoshak
Copy link
Contributor

rkoshak commented Jul 10, 2023

I see this warning but the Items are not persisted every second. In my most recent instance I saw

2023-07-09 23:11:04.887 [WARN ] [d4j.internal.RRD4jPersistenceService] - RRD4J does not support adding past value this=1688965860, last update=1688965861. Discarding JennsOfficeSensors_Temperature - 76.53

The events in question from a full minute before and after the two changes to that Item.

2023-07-09 23:10:00.533 [INFO ] [openhab.event.ChannelTriggeredEvent ] - mqtt:broker:broker:dadsoh_out triggered dadsoh/out/DadsOH_Heartbeat/state#2023-07-09T23:10:00.484-0600
2023-07-09 23:10:00.539 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'DadsOH_Heartbeat' changed from 2023-07-09T23:05:00.487-0600 to 2023-07-09T23:10:00.484-0600
2023-07-09 23:10:07.485 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Mainfloorsensors_Humidity' changed from 53 % to 52.9 %
2023-07-09 23:10:12.629 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.53 °F to 76.35 °F
2023-07-09 23:10:17.854 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'RichOfficeSensors_LightLevel' changed from 6 to 5
2023-07-09 23:10:19.831 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'vWeather_Conditions' changed from clear sky to few clouds
2023-07-09 23:10:19.833 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'vWeather_Temp' changed from 60.476 °F to 60.242 °F
2023-07-09 23:10:19.834 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'vCloudiness' changed from 0 % to 20 %
2023-07-09 23:10:19.835 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'vWeather_Conditions_Icon' changed from raw type (image/png): 2817 bytes to raw type (image/png): 2922 bytes
2023-07-09 23:10:25.663 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Mainfloorsensors_Humidity' changed from 52.9 % to 53 %
2023-07-09 23:10:28.754 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.35 °F to 76.53 °F
2023-07-09 23:10:29.690 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Mainfloorsensors_Humidity' changed from 53 % to 52.9 %
2023-07-09 23:10:33.729 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Mainfloorsensors_Humidity' changed from 52.9 % to 53 %
2023-07-09 23:10:35.736 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Mainfloorsensors_Humidity' changed from 53 % to 52.9 %
2023-07-09 23:10:36.821 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.53 °F to 76.35 °F
2023-07-09 23:10:40.861 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.35 °F to 76.53 °F
2023-07-09 23:10:44.893 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.53 °F to 76.35 °F
2023-07-09 23:10:47.850 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Mainfloorsensors_Humidity' changed from 52.9 % to 53 %
2023-07-09 23:10:49.872 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Mainfloorsensors_Humidity' changed from 53 % to 52.9 %
2023-07-09 23:10:54.975 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.35 °F to 76.53 °F
2023-07-09 23:10:55.912 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Mainfloorsensors_Humidity' changed from 52.9 % to 53 %
2023-07-09 23:10:57.551 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'MBRHumidifier_HumidityLevel' changed from 0.54 to 0.56
2023-07-09 23:10:57.552 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'MBRHumidifier_MistLevel' changed from 2 to 1
2023-07-09 23:10:57.657 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'UPS_BatteryRuntime' changed from 1854 s to 1713 s
2023-07-09 23:10:57.657 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'UPS_UPSLoad' changed from 25 % to 31 %
2023-07-09 23:10:59.001 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Humidity' changed from 40.5 % to 40.6 %
2023-07-09 23:10:59.003 [INFO ] [hab.event.GroupItemStateChangedEvent] - Item 'MinIndoorHumidity' changed from 0.405 to 40.6 % through JennsOfficeSensors_Humidity
2023-07-09 23:11:01.024 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Humidity' changed from 40.6 % to 40.5 %


2023-07-09 23:11:01.024 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.53 °F to 76.35 °F


2023-07-09 23:11:01.028 [INFO ] [hab.event.GroupItemStateChangedEvent] - Item 'MinIndoorHumidity' changed from 0.406 to 40.5 % through JennsOfficeSensors_Humidity
2023-07-09 23:11:01.928 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'MasterBedroomSensors_Humidity' changed from 51.7 % to 52 %


2023-07-09 23:11:09.097 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.35 °F to 76.53 °F


2023-07-09 23:11:14.076 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Mainfloorsensors_Humidity' changed from 53 % to 52.9 %
2023-07-09 23:11:16.093 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Mainfloorsensors_Humidity' changed from 52.9 % to 53 %
2023-07-09 23:11:21.197 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.53 °F to 76.35 °F
2023-07-09 23:11:34.698 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'NatesBedroomSensors_LightLevel' changed from 252 to 253
2023-07-09 23:11:36.519 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'mqtt:topic:broker:basement_waveplus' changed from ONLINE to OFFLINE
2023-07-09 23:11:36.567 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'mqtt:topic:broker:family' changed from OFFLINE to ONLINE
2023-07-09 23:11:36.570 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'FamilyRoomWaveplus_BarometricPressure' changed from 23.305657827750395 inHg to 23.305067227972696 inHg
2023-07-09 23:11:36.572 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'FamilyRoomWaveplus_Temperature' changed from 66.614 °F to 66.596 °F
2023-07-09 23:11:36.573 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'FamilyRoomWaveplus_CO2' changed from 635 ppm to 622 ppm
2023-07-09 23:11:36.574 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'FamilyRoomWaveplus_VolatileOrganicChemicals' changed from 89 ppb to 91 ppb
2023-07-09 23:11:37.353 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.35 °F to 76.53 °F
2023-07-09 23:11:41.386 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.53 °F to 76.35 °F
2023-07-09 23:11:58.005 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'MBRHumidifier_HumidityLevel' changed from 0.56 to 0.55
2023-07-09 23:12:00.092 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'UPS_BatteryRuntime' changed from 1713 s to 1854 s
2023-07-09 23:12:00.093 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'UPS_UPSLoad' changed from 31 % to 25 %
2023-07-09 23:12:02.050 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'MasterBedroomSensors_Humidity' changed from 52 % to 51.8 %
2023-07-09 23:12:13.664 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 76.35 °F to 76.53 °F

I've added a spaces to highlight the two events for the Item in question.

The warning indicates that the second event is the one that was rejected.

Things of note:

  1. The events took place eight seconds apart.
  2. Only two other changes occurred to other Items between those two Temperature Item changes.
  3. This is a reasonably powerful machine with a load of around 1 so it's unlikely that it's been waiting around for eight seconds waiting to save events.
  4. The timestamps in the warning message don't make sense. It's saying that the event that occurred at 23:11:09 actually occurred at 23:11:00 (this=1688965860). It's correct in the last update's time of 23:11:01 (last update = 1688965860).

This is pretty consistent for me. In another caseI have an warning for an Item that was changed almost exactly two seconds apart. In this case the events occurred at 23:00:05 and 23:00:07. It threw out the second one but the timestamp for that event is 23:00:05 (this=1688965205) and the last update is 23:00:07 (last update=1688965207).

Looking at more it seems like it's somehow swapping the timestamps for the previous event and the latter event and rejecting setting the data point because of that. Could there be a TOCTOU bug or maybe the variables are swapped or something like that?

I've reviewed about half a dozen of these and they seem to all follow this similar pattern. The update is rejected but the reported timestamp is wrong in the rejection message and considering rounding the timestamps are swapped (i.e. the "this" timestamp is for the previous event and the latest update timestamp is for the current event).

@J-N-K
Copy link
Member

J-N-K commented Jul 10, 2023

Can you set org.openhab.persistence.rrd4j.internal to DEBUG? It should then print statements what is stored together with the timestamps.

@J-N-K J-N-K transferred this issue from openhab/openhab-core Jul 10, 2023
@rkoshak
Copy link
Contributor

rkoshak commented Jul 10, 2023

Done and amazingly enough an example just happened immediately after changing the log level. :-D

openhab.log

20│2023-07-10 11:29:03.207 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'JennsOfficeSensors_Temperature' as value '71.85' with timestamp 1689010142 in rrd4j database                 │
20│2023-07-10 11:29:03.208 [WARN ] [d4j.internal.RRD4jPersistenceService] - RRD4J does not support adding past value this=1689010140, last update=1689010142. Discarding JennsOfficeSensors_Tempe│
20│rature - 72.03

relevant events.log

20│2023-07-10 11:26:07.073 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 71.67 °F to 71.85 °F                                              │
20│2023-07-10 11:29:00.585 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'JennsOfficeSensors_Temperature' changed from 71.85 °F to 72.03 °F                                              │

There's almost three minutes between the two events this time. Unfortunately I didn't turn on the logging before 11:29 so I don't have the logs from when that change to 71.85 occurred to tell me why it took so long to save. I'll continue to monitor the logs and post when I have something more complete.

Based on observations, I'm wondering if the problem is if a change occurs within a second of the everyMinute strategy save. That seems to be what happened in this particular case. I'll continue to monitor.

Edit: It happened again and indeed again it was within a second of the everyMinute dump of Item states.

@J-N-K J-N-K changed the title Persistence warnings [rrd4j] Persistence warnings Jul 10, 2023
@J-N-K
Copy link
Member

J-N-K commented Jul 10, 2023

What persistence strategy do you have? is to "everyUpdate, everyMinute"?

@rkoshak
Copy link
Contributor

rkoshak commented Jul 10, 2023

Yes.

Strategies {
        everyMinute  : "0 */1 * * * ?"
        default = everyChange
}
Items {
        * : strategy = everyMinute, everyChange
}

I use MapDB for restoreOnStartup so I created a .persist file. Based on the docs everyMinute/everyChange matches the default saving strategies which is why I used both. I wanted the default and to just disable restoreOnStartup.

@J-N-K
Copy link
Member

J-N-K commented Jul 10, 2023

I guess that might be the issue (and I have no good idea how to solve that). When "everyChange" is triggered at second 0 of a minute, they may be scheduled with the same timestamp.

@rkoshak
Copy link
Contributor

rkoshak commented Jul 10, 2023

they may be scheduled with the same timestamp.

Does rrd4j save the everyMinute as a single transaction? It kind of looks like that might be the case and it makes sense that it would but it's hard to tell.

If so, would it make sense to adjust the timestamp on events that come in during that transaction so it's not before the most recently saved record? At least with my warning entries and watching the debug logs it's looking like new events are blocked from being saved while the everyMinute is running. By the time the everyMinute is done the timestamp for the new event is now in the past.

I'm sure it's worthy of a debate, but it seems like it would be less wrong to save the value with an off by one or two seconds timestamp than it is to throw away the value entirely meaning it won't actually be saved to the database (assuming it doesn't change in the mean time) until one minute later.

@J-N-K
Copy link
Member

J-N-K commented Jul 10, 2023

Behind the scenes there is a scheduled job in core that calls the persistence service for each item that has the time-based strategy defined. This happens very quickly (probably faster than the persistence service is able to store them, because that happens sequentially) and if a large number of items is persisted, each of them is scheduled for storage as a single job. If an update occurs, it is also scheduled for storage, most likely after all time-based. That may explain the difference in the timestamps, opening 100 files, checking them, writing the new data and closing them might take quite some time.

@rkoshak
Copy link
Contributor

rkoshak commented Jul 10, 2023

That may explain the difference in the timestamps, opening 100 files, checking them, writing the new data and closing them might take quite some time.

Based on the logs, I can confirm that it takes about 2.5 seconds on my system to complete the everyMinute. It seems to take between 2 to 20 msec per Item. I've about 300 Items that gets saved to rrd4j. There are some 100+ msec gaps here and there too.

@J-N-K
Copy link
Member

J-N-K commented Jul 10, 2023

This has always been the case, but the event handler was blocked until a write completed. This is why there were less (or no) collisions.

@J-N-K
Copy link
Member

J-N-K commented Jul 10, 2023

One solution would be to use the timestamp when the data is really written to the file instead of the timestamp when .store is called. That may create times that are a few seconds off, but I believe for RRDs this is not such a great issue as data is usually binned on larger scales.

@rkoshak
Copy link
Contributor

rkoshak commented Jul 10, 2023

I personally would be happy with that. A few seconds this way or that is better than just short of a whole minute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants