Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rrd4j] Persistence stops working after some time #15237

Closed
Flole998 opened this issue Jul 6, 2023 · 15 comments
Closed

[rrd4j] Persistence stops working after some time #15237

Flole998 opened this issue Jul 6, 2023 · 15 comments
Labels
bug An unexpected problem or unintended behavior of an add-on

Comments

@Flole998
Copy link
Member

Flole998 commented Jul 6, 2023

This appears to be a regression in the latest Milestone.

After some time (it seems like it is every full hour) charts aren't loading anymore. When I restart openHAB there's a flat line that begins for example at 15:00 in the charts, so I believe no data was written since then, even though the values changed later aswell (I can see that in the event log). That indicates, that the RRD4J persistance service I am using apparently completely stopped working, it can't be queried anymore and doesn't write data anymore.

My persistence config looks like this:

Strategies {
	// for rrd charts, we need a cron strategy
	everyTenSeconds : "0/10 * * * * ?"
	everyMinute : "0 * * * * ?"
	everyHour   : "0 0 * * * ?"
	everyDay    : "0 0 0 * * ?"
}
Items {
	PersistenceLog*		: strategy = everyTenSeconds, everyMinute, everyHour, everyDay, restoreOnStartup
	RRDPersistence*		: strategy = everyTenSeconds, everyMinute, everyHour, everyDay, restoreOnStartup
}

Could it be that everyMinute and everyHour try to update at the same time, causing this issue? It was working fine on Milestone 3, so I believe this should be adressed before the release.

@Flole998 Flole998 added the bug An unexpected problem or unintended behavior of an add-on label Jul 6, 2023
@J-N-K
Copy link
Member

J-N-K commented Jul 6, 2023

I'm not sure. Does it only happen with a special persistence service or for all persistence services? And can you explain, why you would like to apply all those strategies? At 00:00 the same value is persisted four times, what would be the use-case for that?

@Flole998
Copy link
Member Author

Flole998 commented Jul 6, 2023

I am only using RRD4J and MapDB, so I can't speak for the other services. I ramped up the debug level now, in 30 minutes we should know more about what's going on. I might be able to ramp up the debug logging for mapdb aswell to check if that one stops working aswell or if this is a RRD4J-Issue.

I tried changing the strategies to only "everyTenSeconds, restoreOnStartup" to rule this out but then I am getting an "invalid header" message after a restart from the rrd4j persistence service, I assume this somehow defines the format of the RRD4J file? The config was originally done for openHAB 1.x and I have carried it through since then, and I believe back then it was mentioned in the docs or in the forum, I know for sure that I copied it from somewhere.

@Flole998
Copy link
Member Author

Flole998 commented Jul 6, 2023

So it does apparently attempt to add it twice:

RRD4J does not support adding past value this=1688655600, last update=1688655601. Discarding TestItem - 100.0

But then things magically stop. If I attempt to load a chart I can see the "Querying persistence" message but the chart never loads.

Apparently it is still working to some extend as I am still seeing a message Ignoring item 'Weather_Condition' since its type String is not supported every 10 seconds.

MapDB is still working according to the logs, so this is probably at the wrong place and should be moved to openhab-addons as it seems to affect rrd4j only.

Actually I think I know what's wrong: In

the database is opened, in there's a return without ever closing the database again. So this could lead to the database beeing left open and if it is "locked" it will get stuck on the next attempt to open it. Most likely this issue is at the wrong place then as it affects openHAB-addons (more specifically the rrd4j-addon), so please go ahead and move it.

As I don't have an IDE setup at the moment, could you please add the change so the DB is closed properly @J-N-K ? Thanks a lot!

@Flole998
Copy link
Member Author

Flole998 commented Jul 6, 2023

There's probaly another case where things can go wrong, with the same cause: In https://github.com/openhab/openhab-addons/blob/07e640387c1fe5dcea53add4d5edac788b182200/bundles/org.openhab.persistence.rrd4j/src/main/java/org/openhab/persistence/rrd4j/internal/RRD4jPersistenceService.java#L260C23-L260C23 the function calls itself, but without closing the DB first. So it will get stuck there aswell.

@J-N-K
Copy link
Member

J-N-K commented Jul 6, 2023

Especially the last one looks like a good candidate. Since I never managed to reproduce the issue in my setup, it is a bit hard to properly fix it. I'll have a look.

@Flole998
Copy link
Member Author

Flole998 commented Jul 6, 2023

I'd just properly close the database before calling return (leaking it isn't a good idea anyways), and probably add a parameter to the function to pass an already opened database for the second case (should be a performance benefit aswell: Keeping it open vs. closing and opening it again). If you submit a PR there's an automated build I believe so I could drop in the bundle in my addons dir and see if it's fixed, right?

@Flole998
Copy link
Member Author

Flole998 commented Jul 6, 2023

I believe this could even be tested by a unit test: Call internalStore with the same parameters twice and then again with an increased timestamp and in that case data must be added to the DB. If it isn't, this bug has occurred.

@Flole998
Copy link
Member Author

Flole998 commented Jul 6, 2023

What I previously wrote is only somewhat complete: Another issue is, that by not calling close() there is a file descriptor leaked every time apparently. I have verified that using lsof. So we should really be sure to call close() when we are done. Otherwise this will cause issues in the long term that we won't be able to see when testing a milestone release for only a few days/weeks. I had a case where the same item was in 2 groups which each had the same persistence strategy (this will cause the same issue) and since I started openHAB like 4 hours ago I have now collected 648 open file descriptors for that single rrd file.

And an additional test: After one of those "double-saves" every hour I have a total of 144000 RRD files open.

@J-N-K
Copy link
Member

J-N-K commented Jul 14, 2023

Can you check with latest code? That should improve the situation for RRD4J.

I don't think this is a general issue, I'll transfer it to add-ons and label it as RRD4J

@J-N-K J-N-K transferred this issue from openhab/openhab-core Jul 14, 2023
@J-N-K J-N-K changed the title Persistence stops working after some time [rrd4j] Persistence stops working after some time Jul 14, 2023
@Flole998
Copy link
Member Author

I can check it once the new build is available. I have fixed my configuration now to avoid those double-saves which also fixed the issue mostly.

@Bitrider11
Copy link

I think I have the same or equal problem. After update from openHAB 4.0.0.M4 to 4.0.0.M5 Persistence rrd4j stops working. It looks like, that there is a problem with the database of the rrd4j datafile. Header is corrupt or there is an incompatibility with the database file.

Error message when openHAB starts:
2023-07-19 20:47:54.027 [ERROR] [d4j.internal.RRD4jPersistenceService] - Could not create rrd4j database file '/var/lib/openhab/persistence/rrd4j/WC_Day4_Night_TemperatureWindChill.rrd': Invalid file header. File [/var/lib/openhab/persistence/rrd4j/WC_Day4_Night_TemperatureWindChill.rrd] is not a RRD4J RRD file

@jlaur
Copy link
Contributor

jlaur commented Jul 19, 2023

I also had a problem with invalid headers in all .rrd files shortly after installing 4.0M5. Unfortunately I destroyed all evidence by deleting them all, and it hasn't reappeared. I thought I might had cut the power from the Raspberry Pi before completing a shutdown, but thinking again, probably this was not the case.

@Flole998
Copy link
Member Author

That's not the same issue I was having. I caused the issue by modifying the persistence strategy in an attempt to prevent double (or triple) persisting every hour and I think the rrd4j file format is based on that, so it was complaining about the structure no longer being correct.

The "real" issue I was having was, that rrd4j files were opened again and again leaking file descriptors in certain cases until openHAB eventually ran out of available file descriptors/hit the limit allowed. That appears to be resolved now, so this can be closed.

@mhett
Copy link

mhett commented Oct 23, 2023

In which openhab version this issue (missing close() commands) has been fixed / will be fixed?

@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/all-persistance-stopped-saving-data-suddenly-mapdb-not-active/150379/7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

No branches or pull requests

6 participants