"From each to each" QU conversion definitions makes QU conversion resolving take very long #2297

Closed
berrnd opened this issue Aug 6, 2023 · 38 comments

berrnd (Member) commented Aug 6, 2023

Noticed in a database copy @TurnrDev sent me for debugging.
Also references: https://www.reddit.com/r/grocy/comments/15jq2qx/comment/jv2kx7x/?context=3

Before v4.0.0 this wasn't a problem, since only 1 level of QU conversion hierarchy was supported (the rest was simply ignored).

Now the hierarchy depth is unlimited (and (!) such a "from each to each" definition approach is practically no longer needed), so this becomes a problem: when resolving QU conversions, the view quantity_unit_conversions_resolved then tries to create trillions of rows, which takes very long and ultimately produces a webserver timeout (which can look as if the upgrade / database migrations failed when this happens during the upgrade from <= v3.3.2).

So in conclusion this is a classic GIGO problem - normally nothing Grocy cares about, but let's somehow work around it so that at least the upgrade to >= v4.0.0 doesn't fail due to timeouts.

@berrnd berrnd added the bug label Aug 6, 2023
@berrnd berrnd added this to the vNEXT milestone Aug 6, 2023
@berrnd berrnd added enhancement and removed bug labels Aug 6, 2023
@berrnd berrnd changed the title from 'Bug: Circular QU conversion definitions make QU conversion resolving take endlessly' to 'Circular QU conversion definitions make QU conversion resolving take endlessly' Aug 6, 2023
TurnrDev commented Aug 6, 2023

I really appreciate you looking into this :) I thought I was being smart by adding the QU Conversions, but evidently not!

Does the unlimited conversion hierarchy mean that if there exists a QU Conversion from Cups to Millilitre and another from Millilitre to Litre, the system can now calculate Cups to Litre? If so, that's excellent - I can remove a load of my QU Conversions!

berrnd (Member, Author) commented Aug 6, 2023

Does the unlimited conversion hierarchy mean that if there exists a QU Conversion from Cups to Millilitre and another from Millilitre to Litre, the system can now calculate Cups to Litre?

Exactly, that was the whole point of all this - so yes.

@TurnrDev: This comment was marked as off-topic.

@berrnd: This comment was marked as off-topic.

berrnd (Member, Author) commented Aug 7, 2023

Here is a minimal reproducible example for this - it's basically an empty v4.0.1 database with

  • a single product of @TurnrDev's database
  • all QUs and default QU conversions of @TurnrDev's database
  • and no product-specific QU conversions (since they are reproducibly not the problem here).

For the default QU conversions there is essentially a "from each QU to each QU" definition, so (almost) every QU has an "absolute" conversion to every other QU defined (not literally each to each, but most of them). This wasn't a problem before the new recursive approach and (I guess) was used to get practically (kind of) transitive QU conversions without needing more than 1 level (which was the limit before).

A SELECT * FROM quantity_unit_conversions_resolved runs indefinitely there.

I've tried a couple of things but haven't had success so far in tracking down the root cause. The WHERE clauses commented with -- Prevent cycles in the recursive handling of the closure CTE should be pretty safe I would say, however that's somehow not the case - example code ref:

-- First recursive case: Add a product-associated conversion to the chain
SELECT
    c.depth + 1,
    c.product_id,
    c.from_qu_id,
    s.to_qu_id,
    c.factor * s.factor,
    c.path || s.to_qu_id || '/'
FROM closure c
JOIN conversion_factors s
    ON c.product_id = s.product_id
    AND c.to_qu_id = s.from_qu_id
WHERE c.path NOT LIKE ('%/' || s.to_qu_id || '/%') -- Prevent cycles
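For context, this is how that path-based cycle prevention is supposed to behave, shown as a self-contained toy example (toy table and data, not the actual view definition):

-- Toy example: a two-unit cycle (1 -> 2 and 2 -> 1); without the path
-- check the recursion would never terminate.
CREATE TABLE conv(from_qu_id INTEGER, to_qu_id INTEGER, factor REAL);
INSERT INTO conv VALUES (1, 2, 10.0), (2, 1, 0.1);

WITH RECURSIVE closure(from_qu_id, to_qu_id, factor, path) AS (
    SELECT from_qu_id, to_qu_id, factor,
           '/' || from_qu_id || '/' || to_qu_id || '/'
    FROM conv
    UNION ALL
    SELECT c.from_qu_id, s.to_qu_id, c.factor * s.factor,
           c.path || s.to_qu_id || '/'
    FROM closure c
    JOIN conv s ON c.to_qu_id = s.from_qu_id
    WHERE c.path NOT LIKE ('%/' || s.to_qu_id || '/%') -- Prevent cycles
)
SELECT * FROM closure; -- 2 rows; each unit appears at most once per path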

The "each to each" QU conversion definition is no longer needed with the new transitive approach, so another option would be maybe to identify the unneeded ones and to automatically remove them (manually doing this fixes this). The question would be how to identify them.

Pinging @esclear just in case you have some time and desire to help look into this again - the example database file is attached below.

grocy_v4.0.1_#2297.zip

esclear (Contributor) commented Aug 7, 2023

I think I'll have a chance to look at this tomorrow.

berrnd (Member, Author) commented Aug 8, 2023

I've tried to further isolate the problem based on the example database:

It's something about those two units / their conversion factors:

SELECT *
FROM quantity_units
WHERE id IN (23, 25)

[Screenshot: the two quantity units returned by the query]

Deleting those two units (DELETE FROM quantity_units WHERE id IN (23, 25)) or all corresponding conversion factors (DELETE FROM quantity_unit_conversions WHERE from_qu_id IN (23, 25) OR to_qu_id IN (23, 25)) reproducibly prevents the infinite loop; resolving is then just a little slow (about 7000 ms for the single product on my development machine).

esclear (Contributor) commented Aug 8, 2023

Okay, I just started looking into this.
My first observation: the query does actually terminate, after 128 s on my machine.
That is too long, no question, but I wanted to verify termination first, as I wrote the query with guaranteed termination in mind.

Also, this example only generated 73 rows for me, which I find interesting, as I had larger result tables when trying this out initially.

Gotta do some debugging/benchmarking first.

berrnd (Member, Author) commented Aug 8, 2023

The query does actually terminate

True, I guess I just haven't waited long enough so far - it terminated after 217 seconds just now on my machine. So the "prevent cycles" WHERE clauses seem to work as expected and it's about something else (my focus was on that part of the query when I tried to narrow it down yesterday).

esclear (Contributor) commented Aug 8, 2023

Okay, so the cause is pretty clear to me: a lot of redundant conversions are being generated (but they are discarded afterwards).

I faintly remember trying to prevent that in the recursive CTE using the depth column and/or a SELECT on the recursive CTE.
Though I think that due to some limitations I was unable to do that correctly in SQLite. :/

berrnd (Member, Author) commented Aug 8, 2023

Please don't invest too much time into this - if it's unsolvable with the current state of the query (which apart from that works more than great), I would also have no problem with dropping all default QU conversions during the update (rolling that out for everybody as a migration).

Practically nobody will define such "each to each conversions" again, since there is now no need for that. And for any unit list I can think of, it shouldn't be that much work to define the needed factors again after upgrading from v3.x.

esclear (Contributor) commented Aug 8, 2023

So the issue is not really related to default QU conversions and could also happen with any of the other conversion types.

The problem is that there is no way of knowing whether a particular conversion created by one of the recursive cases already exists.
At least no straightforward way that I can think of, or was able to think of in the past.

The obvious solution would be to check, in the cases where a duplicate would be created, whether that conversion already exists.
I would do that with another NOT EXISTS condition that checks whether closure already contains the conversion.
The problem here is that we'd then have multiple recursive references within one CTE case, which is not supported by SQLite.

So yeah, as this could happen in basically any installation, I'd really like to fix this.
I'll think about this a bit, maybe I'll be able to achieve this some other way.

berrnd (Member, Author) commented Aug 8, 2023

Thanks a lot for your time looking into this and the explanations.

I have to admit that I'm not that deep into recursive SQL CTEs at all, but I would say that somehow using a "conversion identifier" (let's take a string <product_id>#<from_qu_id>-><to_qu_id>), tracking/appending it (similar to path) in closure for each recursive case, plus another "WHERE NOT CONTAINS" (using INSTR) on that for closure should do the trick. But practically it does not (what I've tried is attached below).

query.sql.txt

esclear (Contributor) commented Aug 8, 2023

The identifier, as you have described it, would only prevent SQLite from reusing the conversion in another recursive iteration/join with this row.
This is already handled by the path, however (although one might be a bit coarser than the other).

The problem, however, is that this CTE is effectively an exhaustive search over all paths.
And we can't "prune" redundant paths, due to limitations on recursive CTEs in SQLite.

This is way less of a problem if all conversions form an acyclic graph.
(The problem is not that we can have cycles, but that we can have multiple ways of reaching the same conversion, which can cause exponential growth in the number of rows the query produces.)
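To make that concrete, here is a toy example (illustrative table, not the actual view) of such a "diamond": two parallel routes from unit 1 to unit 3, so the same conversion 1 -> 3 is reached via two different paths - and every additional chained diamond doubles the number of paths again:

-- Diamond: 1 -> 2 -> 3 and 1 -> 4 -> 3 (both routes have total factor 4)
CREATE TABLE conv(from_qu_id INTEGER, to_qu_id INTEGER, factor REAL);
INSERT INTO conv VALUES (1, 2, 2.0), (2, 3, 2.0), (1, 4, 4.0), (4, 3, 1.0);

WITH RECURSIVE closure(from_qu_id, to_qu_id, factor, path) AS (
    SELECT from_qu_id, to_qu_id, factor,
           '/' || from_qu_id || '/' || to_qu_id || '/'
    FROM conv
    UNION ALL
    SELECT c.from_qu_id, s.to_qu_id, c.factor * s.factor,
           c.path || s.to_qu_id || '/'
    FROM closure c
    JOIN conv s ON c.to_qu_id = s.from_qu_id
    WHERE c.path NOT LIKE ('%/' || s.to_qu_id || '/%')
)
SELECT * FROM closure WHERE from_qu_id = 1 AND to_qu_id = 3;
-- Returns two rows for the same conversion: paths /1/2/3/ and /1/4/3/

The path check can't discard the second row, because neither path contains a repeated unit - only the end result is redundant.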

This could be solved for arbitrary data, e.g. by using an extension to SQLite such as https://github.com/abetlen/sqlite3-bfsvtab-ext.
Though this is platform-dependent, I think, and thus not trivially cross-platform / users would need to install the plugin.

Storing the conversions as a proper table and updating it using triggers might also be possible and would allow for performance improvements.
The only possible downside I see here is that this table might get somewhat large for a large number of products.

Another way would be to handle the calculations in PHP, where the search / calculation could be optimized much more.
I didn't do this originally because writing this query was the least effort.
But I'd gladly have a shot at implementing this on the PHP side, though I'd need some pointers regarding where this should ideally go.
If the view is used in other SQL queries, this would require another solution for those.

berrnd (Member, Author) commented Aug 9, 2023

This is way less of a problem, if all conversions form an acyclic graph.

Which should practically be the normal case when not using the (no longer needed) "each to each conversions" approach - correct me if I'm wrong / if there is a simple example using a "natural form of definition" that causes this. Of course that's not a solution, but I just want to say that in practice it's maybe not a really relevant problem in the end. It also definitely didn't happen to everyone who upgraded so far (including me on my personal instance).

 

Also, storing the conversions as a proper table and updating it using triggers might also be possible and allows for performance improvements.

I already did this (I noticed that a SELECT * on the view is rather fast, but joining on it from other queries makes things slow), but only by reusing / querying the view and updating the corresponding resolved conversions whenever a product or conversion definition was added/changed/deleted (code ref). Since it only reuses the current view, it doesn't change anything about the problem. (The involved triggers are disabled/modified in the example database provided above, since otherwise conversions or products couldn't be modified there.)

 

But I'd gladly have a shot at implementing this on the PHP side, though I'd need some pointers regarding where this should go ideally.

Currently big parts of the general logic are done in SQL views/triggers/etc., as this was my initial main idea of how to do it for this project. Throughout the PHP code, kind of "low level" access to the database happens everywhere (using lessql). This means there is currently no single PHP function like GetConversionsForProduct(int $productId) which all other places use; it's rather the case that other views (and also PHP code, e.g. here) use the quantity_unit_conversions_resolved view directly.

So without changing the base of everything, a PHP function calculating the resolved conversions / filling a quantity_unit_conversions_resolved table would be kind of easily possible. Any object add/edit/delete action happens through those API endpoints, which are generic ones for all entities/database tables, but the corresponding actions needed for (re)calculating/updating the resolved conversions could be hooked in there (PHP code ref of the endpoints).

If you want to give such a PHP function like UpdateResolvedConversionsForProduct(int $productId) (and maybe another one handling the add/change of a unit) a shot, I can gladly take care of firing it where needed.

 
And of course no longer using SQLite would be another option. I've always said that I won't do that for Grocy unless a use case arises where it doesn't fulfill the needs, well ... but in the meantime this project is kind of married to SQLite in terms of having it deeply integrated and also relying on SQLite-specific syntax/functions here and there...

@berrnd berrnd changed the title from 'Circular QU conversion definitions make QU conversion resolving take endlessly' to '"From each to each" QU conversion definitions make QU conversion resolving take endlessly' Aug 9, 2023
@berrnd berrnd changed the title from '"From each to each" QU conversion definitions make QU conversion resolving take endlessly' to '"From each to each" QU conversion definitions make QU conversion resolving take very long' Aug 9, 2023
berrnd (Member, Author) commented Aug 9, 2023

If the view is used in other sql queries, this would require another solution for those.

Just to add: It's technically possible to call PHP functions from within SQLite queries; I already did this in one place where a user setting had to be taken into account in a view (code ref PHP, code ref SQL). (User settings are also stored in the database, but default settings from config.php may apply when the setting was not yet modified/set by the user - that was the reason a bridge to PHP was required there.)

But the big downside of this is that it makes debugging "inside the database" very hard (since the user defined functions registered by PHP are simply unknown when not "connecting through Grocy")...

alkuzman (Contributor) commented Aug 9, 2023

Hey guys, I have been following this discussion and tried to optimize this query. I managed to bring it down to a 50 ms execution time by creating separate closures: one for product-specific conversions and one for default conversions. This results in tables that allow huge performance gains through pruning.

Please have a look and see if you can spot any issues. I have tested it on my DB (result: 3536 rows returned in 62 ms) and on the DB shared by @berrnd in this issue (result: 73 rows returned in 38 ms).

The good thing is that the more user-defined conversions are present, the more the query will prune.

new_query.txt
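The rough shape of the idea, as a simplified sketch (this is not the actual query from new_query.txt; it assumes, as in the Grocy schema, that default conversions are the rows with product_id IS NULL):

-- The default conversions only depend on the units, so their closure can
-- be computed once over the small unit graph instead of once per product:
WITH RECURSIVE default_closure(from_qu_id, to_qu_id, factor, path) AS (
    SELECT from_qu_id, to_qu_id, factor,
           '/' || from_qu_id || '/' || to_qu_id || '/'
    FROM quantity_unit_conversions
    WHERE product_id IS NULL
    UNION ALL
    SELECT dc.from_qu_id, c.to_qu_id, dc.factor * c.factor,
           dc.path || c.to_qu_id || '/'
    FROM default_closure dc
    JOIN quantity_unit_conversions c
        ON c.product_id IS NULL
        AND dc.to_qu_id = c.from_qu_id
    WHERE dc.path NOT LIKE ('%/' || c.to_qu_id || '/%')
)
SELECT * FROM default_closure;
-- The product-specific closure is built the same way over the product
-- conversions, and the two closures are then combined, which keeps each
-- individual recursion small.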

esclear (Contributor) commented Aug 9, 2023

Also, storing the conversions as a proper table and updating it using triggers might also be possible and allows for performance improvements.

I already did this (noticed that a SELECT * on the view is rather fast, but joining on it from other queries make things slow), but only by reusing / querying the view and updating the corresponding resolved conversions when a product or conversion definition was added/changed/deleted (code ref). But since it only reuses the current view, it doesn't change anything for the problem. (The involved triggers are disabled/modified in the example database provided above due to then not being able to modify conversions or products.)

This isn't really what I had in mind. My idea is that basically whenever a conversion is added / updated / deleted, we recompute the parts of the table that use that conversion.
So we'd have the "total" conversion table available as a database table at all times, and a recalculation would only happen on an update and only for the conversions affected.
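Roughly along these lines, as a sketch (the trigger and cache table names are illustrative; and as written here it still re-queries the view for the affected product, so a truly incremental version would replace the inner SELECT with a targeted recalculation):

-- Keep the resolved conversions in a real table and refresh only the
-- affected product's rows when a conversion changes:
CREATE TRIGGER resolved_cache_on_conversion_insert
AFTER INSERT ON quantity_unit_conversions
BEGIN
    DELETE FROM resolved_conversions_cache
    WHERE product_id IS NEW.product_id;

    INSERT INTO resolved_conversions_cache
    SELECT * FROM quantity_unit_conversions_resolved
    WHERE product_id IS NEW.product_id;
END;
-- Analogous AFTER UPDATE / AFTER DELETE triggers would be needed, and a
-- change to a default conversion (product_id IS NULL) would have to
-- invalidate the rows of all products, not just one.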

berrnd (Member, Author) commented Aug 9, 2023

Thanks a lot @alkuzman!

I've compared whether the output of the view is the same as before on the default demo database, my personal one and the one @TurnrDev sent me => that's the case.

I've tested whether @TurnrDev's original private v3.x database passes the upgrade to v4.0.1 with the new view from @alkuzman without timeouts => that's also the case.

@TurnrDev: I will send you this database shortly via mail - would be cool if you could give it a test drive to check whether the conversion factors and such work practically how you expect.

@alkuzman: As already noted above, a SELECT * from the view is quite fast, but joining on it is, however, slightly slower (especially noticeable e.g. for the recipes_pos_resolved view when having a lot of recipes / meal plan data). So I'd like to keep the new caching approach introduced in 1d7f7b2, which speeds this and more up by a lot.

For that, the path column was used to update the corresponding derived conversions when a conversion definition changes (trigger SQL ref, if you want to check whether this safely covers all changes). I've added that column again to the final output of the view; here is the updated statement:

qu_resolved_alkuzman_with_path.txt

So from that, all looks great I would say - again thanks a lot!

If nobody has any remarks, I will then merge that last state of the view - or you can of course also open a PR for it if you prefer. But there's no need to hurry on any of that.

alkuzman (Contributor) commented Aug 9, 2023

This time it would be best if you merge the change, since I haven't set up the dev env (so I cannot test) and just worked directly on the query. I would try to set everything up, but I guess it will take some time since I've never worked with PHP before.

Just a quick question: do you prefer forks and then PRs against this repo, or can I create a branch on this repo and then open a PR?

berrnd (Member, Author) commented Aug 9, 2023

Do you prefer forks and then PRs on this repo

That's the common/normal way I guess, at least the one used here in the past - so that would be good.

Database schema changes are no more than creating a new file (<NewNumber>.sql) in the migrations folder, just like the others already there.

But for this case, existing migrations also need to be touched to make the overall upgrade from v3.x work (no big thing in the end). I can gladly take care of that once everyone agrees that practically everything works (for me it did so far 👍).

berrnd (Member, Author) commented Aug 9, 2023

If anyone else (probably affected by that problem) is following this discussion:

If you want to give the changes @alkuzman contributed a test drive (after making a backup, if you're using your "production" instance!) - assuming you have a working v3.3.2 (or older) installation:

migrations.zip

esclear (Contributor) commented Aug 9, 2023

Hmm, I just had a look at the query in qu_resolved_alkuzman_with_path.txt and this is very nice, @alkuzman.
I tried something similar before, but that didn't quite pan out.

When playing around with some data, I think I have found an issue with the conversions, which might also be present in the current query.
I hope I'll be able to have a closer look in the coming days and check this with proper test data which also covers cases like the one I'm currently thinking about.

alkuzman (Contributor) commented Aug 9, 2023

Hi @esclear, I kind of expected that I'm not covering all cases. If you have a case or a general idea of what might be the problem, please keep me in the loop. I still haven't erased my whiteboard :). I will give it another try tomorrow.

mattmahn commented:
When I was trying to update from v3.3.2 to v4, I apparently had the same issue in my database. I adjusted the PHP max_execution_time to 10 minutes, and the migrations / page loads never finished; 1 CPU core was pegged at 100% and RAM & disk usage slowly increased. Remembering this new unlimited hierarchy feature, I went through all my products to remove any redundant mass-mass or volume-volume QU conversions. There was marginal improvement.
Since I was still having issues on v4, I guess that means I have cyclical default QU conversions defined.

Anyway, I tried the new DB migration @berrnd referred to and I can mostly use v4.0.1 now. All the pages load acceptably fast, except Recipes and Meal Plan - I assume because there are many more products and QU conversions involved? I only have <12 recipes defined with ingredients, so there's not a lot to process there…

With this new 0208.sql, the resources are consumed much more rapidly, so that's nice in a something-is-very-wrong sort of way 😅

TheCDC commented Aug 10, 2023

@mattmahn are you me? I've tried the same kinds of things by increasing the timeouts on both the PHP and nginx ends (using the Linux server image). I've also tried the grocy-docker image (https://github.com/grocy/grocy-docker) and had the same performance issues.

Every page that uses units is much slower than before.

I see some of the latest commits on main that cache the resolved unit conversions and am very excited for the next version.

berrnd (Member, Author) commented Aug 10, 2023

Thanks for the feedback!

@mattmahn product-specific conversions are not the problem here (there was another one, unrelated to this problem, which was sorted out in v4.0.1); here it's about the default "each to each conversion definitions" (so the ones defined at unit level, not at product level).

If the timeout problem is now gone for you, it was most likely about that - you should clean up the now unneeded "in-between" conversions, which should further improve performance (since the view in question then simply has less to do). I also added that retrospectively to the v4.0.0 changelog, explained by a simple practical example (and will do so again for the next release), in the hope that it's now clear what the problem is about practically:

  • Explained by a practical example: When a conversion between Teaspoons and Milliliters and another one between Milliliters and Liters exists (and so forth; unlimited levels), Grocy can now calculate Teaspoons to Liters (before a direct conversion definition between Teaspoons and Liters was required)
  • Heads up: If you have such "each to each absolute conversion definitions" currently (for the example above the conversion between Teaspoons and Liters), you should clean them up, since they are no longer needed
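With concrete numbers for the example above (factors assumed just for the sake of illustration): if 1 Teaspoon = 5 Milliliters (factor 5) and 1 Milliliter = 0.001 Liters (factor 0.001), the resolved Teaspoons -> Liters factor is simply the product of the factors along the path, 5 * 0.001 = 0.005 - so a direct Teaspoons -> Liters definition (factor 0.005) is redundant now and can be removed.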

Maybe clearing out all default conversions and redefining them once is the even simpler/faster approach, since most likely not that many definitions are needed now.

 

I see some of the latest commits on main that cache the resolved unit conversions

And other expensive stock data calculations (like getting a product's current price, taking the "default consume rule" into account and such) are now also cached (already part of v4.0.1). I've tried that on a massive database and the performance improvements were huge (with the little downside that editing/saving products, QU conversions or stock transactions now takes a little longer, since on those events the related cached data is recalculated/updated - but that's practically barely noticeable).

alkuzman (Contributor) commented Aug 10, 2023

Hey, I looked into the query again, and after the changes made by @berrnd some duplicates were introduced. Here is the fixed version.

qu_resolved_alkuzman_with_path_no_dups.txt

After the comment from @esclear I also tried to review whether the query covers all the cases, and I ended up writing an alternative query that joins both closures in a different way, which makes it easier to prove that it works. But it is a bit slower: I get 90 ms instead of 65 ms. However, both queries return identical results, so I believe it should be correct. @esclear please let me know if you find something.

qu_resolved_alkuzman_alternative.txt

berrnd (Member, Author) commented Aug 10, 2023

qu_resolved_alkuzman_alternative_query.txt

Thanks again - I think you've shared the wrong file (?), the query is identical to your first one posted above.

alkuzman (Contributor) commented Aug 10, 2023

Sorry, I updated the comment with the correct file. And I optimized the alternative a bit more, with more pruning; now it runs in around 70 ms on my DB and my machine.

Edit: The more QU conversions I add, the more the gap between the queries increases in favor of the first query.

berrnd (Member, Author) commented Aug 10, 2023

qu_resolved_alkuzman_alternative.txt

Did another test round with the different databases used for this so far: the output is still the same on all of them; performance is slightly slower, as already mentioned, but still far away from being a disaster.

If the alternative query covers more potential edge cases as you say, I think that's better (also keeping the new caching approach in mind, where the view itself only plays a role when changing products or QU conversions - all references needing resolved QU conversions will use the cached data (table cache__quantity_unit_conversions_resolved), which should be as fast as a simple / properly indexed table can be).
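For reference, "properly indexed" here means roughly something like the following (a simplified sketch matching the typical lookup, not necessarily the exact index definition used):

-- Covers the usual lookup: resolve the conversions of one product
-- between two given units
CREATE INDEX ix_cache_qu_conversions_resolved
ON cache__quantity_unit_conversions_resolved (product_id, from_qu_id, to_qu_id);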

alkuzman (Contributor) commented:

If the alternative query covers more potential edge cases as you say

I say they are identical, but I cannot prove it.

esclear (Contributor) commented Aug 10, 2023

So here is one example where this query fails:
Consider the units A, B, C, and D with a default conversion B -> C (factor 3), and product-specific conversions A -> B (factor 2) and C -> D (factor 5).

A  --2-->  B  ~~3~~>  C  --5-->  D

With the query, I get the following result:

id  product_id  from_qu_id  from_qu_name  from_qu_name_plural  to_qu_id  to_qu_name  to_qu_name_plural  factor              path
-1  1           1           A             As                   2         B           Bs                 2.0                 /1/2/
-1  1           1           A             As                   3         C           Cs                 6.0                 /1/2/3/
-1  1           2           B             Bs                   1         A           As                 0.5                 /2/1/
-1  1           2           B             Bs                   3         C           Cs                 3.0                 /2/3/
-1  1           2           B             Bs                   4         D           Ds                 15.0                /2/3/4/
-1  1           3           C             Cs                   1         A           As                 0.166666666666667   /3/2/1/
-1  1           3           C             Cs                   2         B           Bs                 0.333333333333333   /3/2/
-1  1           3           C             Cs                   4         D           Ds                 5.0                 /3/4/
-1  1           4           D             Ds                   2         B           Bs                 0.0666666666666667  /4/3/2/
-1  1           4           D             Ds                   3         C           Cs                 0.2                 /4/3/

Between those ten conversions, two are missing, namely the transitive conversions A <----> D with factors of 30 and 1/30 respectively.
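(The missing factor is just the product of the factors along the only existing path: 2 * 3 * 5 = 30 for A -> D via /1/2/3/4/, and its reciprocal 1/30 for D -> A.)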

The current view / query does calculate the correct result:

id  product_id  from_qu_id  from_qu_name  from_qu_name_plural  to_qu_id  to_qu_name  to_qu_name_plural  factor              path
-1  1           1           A             As                   2         B           Bs                 2.0                 /1/2/
-1  1           1           A             As                   3         C           Cs                 6.0                 /1/2/3/
-1  1           1           A             As                   4         D           Ds                 30.0                /1/2/3/4/
-1  1           2           B             Bs                   1         A           As                 0.5                 /2/1/
-1  1           2           B             Bs                   3         C           Cs                 3.0                 /2/3/
-1  1           2           B             Bs                   4         D           Ds                 15.0                /2/3/4/
-1  1           3           C             Cs                   1         A           As                 0.166666666666667   /3/2/1/
-1  1           3           C             Cs                   2         B           Bs                 0.333333333333333   /3/2/
-1  1           3           C             Cs                   4         D           Ds                 5.0                 /3/4/
-1  1           4           D             Ds                   1         A           As                 0.0333333333333333  /4/3/2/1/
-1  1           4           D             Ds                   2         B           Bs                 0.0666666666666667  /4/3/2/
-1  1           4           D             Ds                   3         C           Cs                 0.2                 /4/3/

Here's a zipped version of this case as a database file.
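For reference, the test data boils down to this (simplified to the relevant columns):

-- Units A..D (ids 1..4) and the three conversions from the diagram above;
-- product_id IS NULL marks the one default conversion B -> C.
INSERT INTO quantity_units (id, name, name_plural) VALUES
    (1, 'A', 'As'), (2, 'B', 'Bs'), (3, 'C', 'Cs'), (4, 'D', 'Ds');

INSERT INTO quantity_unit_conversions (product_id, from_qu_id, to_qu_id, factor) VALUES
    (NULL, 2, 3, 3.0), -- default:          B -> C, factor 3
    (1,    1, 2, 2.0), -- product-specific: A -> B, factor 2
    (1,    3, 4, 5.0); -- product-specific: C -> D, factor 5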

alkuzman (Contributor) commented Aug 11, 2023

Great work @esclear. I totally missed the case where multiple disjoint graphs are present in the product-specific conversions - I was only looking into disjoint graphs in the default conversions.

Have you tried both versions of the query, and do both fail?

I will not be able to revisit the query until tomorrow evening because I am traveling, but I have a feeling that it should not be that hard to solve.

alkuzman (Contributor) commented:
Hey, I was thinking a bit during my flight and realized that I made the mistake of building the last closure with the default conversions instead of with the product conversions (felt kind of stupid :) ). The fix now covers all the cases and works properly on all databases. As a bonus, it is the fastest query so far :).

@esclear it would be nice if you could have a look (thanks).
qu_esclear_usecase_fixed.txt

@aptx4869: This comment was marked as resolved.

@berrnd: This comment was marked as resolved.

berrnd (Member, Author) commented Aug 15, 2023

qu_esclear_usecase_fixed.txt

Did another test with that, checking performance and output - everything still looks good to me for the different test cases/databases used here.

Unless anyone has further remarks on this, I'm going to merge and release it this coming weekend. Thanks again for all the help on this topic.

@berrnd berrnd closed this as completed in 9b119da Aug 19, 2023