"From each to each" QU conversion definitions makes QU conversion resolving take very long #2297
I really appreciate you looking into this :) I thought I was being smart by adding the QU Conversions, but evidently not! Does the unlimited conversion hierarchy mean that if there exists a QU Conversion for Cups to Millilitre and another from Millilitre to Litre, the system can now calculate Cups to Litre? If so, that's excellent, I can remove a load of my QU Conversions!
Exactly, that was the whole point of all that, so yes.
Here is a minimal reproducible example for this - it's basically an empty v4.0.1 database with
For the default QU conversions there is essentially a definition of "from each to each QU", so every QU has the "absolute" conversion to each other QU defined (not really each to each, but most of them). This wasn't a problem before the new recursive approach and (I guess) was used to have practically (kind of) transitive QU conversions without having more than 1 level (which was the limit before). I've tried a couple of things, but haven't had success so far in tracking down what the root cause here is. The WHERE clauses commented with "Prevent cycles" are at lines 55 to 67 in e2ebc03.
The "each to each" QU conversion definition is no longer needed with the new transitive approach, so another option would be maybe to identify the unneeded ones and to automatically remove them (manually doing this fixes this). The question would be how to identify them. Pinging @esclear just for the case you have some time and desire to help looking into this again, the example database file is attached below. |
I think I'll have a chance to look at this tomorrow.
I've tried to further isolate the problem based on the example database: It's something about those two units / their conversion factors:
Deleting those two units (
Okay, I just started looking into this. Also, this example only generated 73 rows for me, which I find interesting, since I had larger resulting tables when trying this out initially. Gotta do some debugging/benchmarking first.
True, I guess I just didn't wait long enough so far - it terminated after 217 seconds just now on my machine. So the "Prevent cycles" WHERE clauses seem to work as expected and it's about something else (my focus was on that part of the query when I tried to narrow it down yesterday).
Okay, so the cause is pretty clear to me, namely that a lot of redundant conversions are being generated (but they are discarded afterwards). I faintly remember preventing that in the recursive CTE using the
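(For context - possibly what is being alluded to here: the standard built-in way in SQLite to discard duplicate rows during the recursion itself is `UNION` instead of `UNION ALL`. A minimal sketch, same illustrative schema as above:)

```sql
-- With UNION instead of UNION ALL, SQLite only continues the recursion with
-- rows that are not already in the result set, so exact duplicates are
-- discarded during the recursion rather than afterwards. Caveat: with a
-- per-row path column this no longer helps, since the differing paths make
-- the rows distinct; floating point factors can also differ slightly
-- between paths.
WITH RECURSIVE conversions_resolved (from_qu_id, to_qu_id, factor) AS (
    SELECT from_qu_id, to_qu_id, factor
    FROM quantity_unit_conversions
    UNION
    SELECT r.from_qu_id, c.to_qu_id, r.factor * c.factor
    FROM conversions_resolved r
    JOIN quantity_unit_conversions c ON c.from_qu_id = r.to_qu_id
    WHERE c.to_qu_id != r.from_qu_id
)
SELECT * FROM conversions_resolved;
```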
Please don't invest too much time into this - if it's unsolvable for the current state of the query (which otherwise works more than great), I would also have no problem with dropping all default QU conversions during the update (rolling that out for everybody as a migration). Practically nobody will define such "each to each" conversions again, since now there is no need for that. And for any unit list I can think of, it shouldn't be that much work to define the needed factors again after upgrading from v3.x.
So the issue is not really related to default QU conversions and could also happen with any other of the conversion types. The problem is that there is no way of knowing whether a particular conversion created by one of the recursive cases already exists. The obvious solution would be to check, in the cases where a duplicate would be created, whether the conversion that would be created already exists. So yeah, as this could happen in basically any installation, I'd really like to fix this.
Thanks a lot for your time looking into this and the explanations. I have to admit that I'm not that deep into recursive SQL CTEs at all, but I would say then somehow using a "conversion identifier" (let's take a string
The identifier, as you have described it, would only prevent SQLite from reusing the conversion in another recursive iteration/join with this row. The problem is, however, that this CTE is effectively an exhaustive search over all paths. This is way less of a problem if all conversions form an acyclic graph. This could be solved for arbitrary data, e.g. by using an extension to SQLite such as https://github.com/abetlen/sqlite3-bfsvtab-ext. Also, storing the conversions as a proper table and updating it using triggers might be possible and would allow for performance improvements. Another way would be to handle the calculations in PHP, where the search / calculation could be optimized much more.
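The trigger idea from the last paragraph could look roughly like this (a sketch only - for simplicity it rebuilds the whole materialized table from the existing view on every change, whereas an incremental update per affected unit would be the actual goal; analogous `AFTER UPDATE` and `AFTER DELETE` triggers would be needed as well):

```sql
-- Materialize the resolved conversions into a real table once
CREATE TABLE quantity_unit_conversions_materialized AS
SELECT * FROM quantity_unit_conversions_resolved;

-- Keep it in sync whenever the underlying conversions change
CREATE TRIGGER qu_conversions_changed
AFTER INSERT ON quantity_unit_conversions
BEGIN
    DELETE FROM quantity_unit_conversions_materialized;
    INSERT INTO quantity_unit_conversions_materialized
    SELECT * FROM quantity_unit_conversions_resolved;
END;
```

The payoff is that the expensive recursion then only runs when conversions change, while every read hits a plain table.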
Which should practically be the normal case when not using the (no longer needed) "each to each conversions" approach - correct me if I'm wrong / if there is a simple example using a "natural form of definition" that causes this. Of course that's not a solution, but I just want to say that it's maybe not a really relevant problem in practice at the end. It definitely didn't happen to everyone who upgraded so far (including me on my personal instance).
I already did this (noticed that a
Currently big parts of the general logic are done in SQL views/triggers/etc., as this was my initial main idea of how to do it for this project. Everywhere throughout the PHP code, kind of "low level" access to the database happens (using lessql). This means that there is currently no single PHP function like … So without changing the base of everything, a PHP function calculating the resolved conversions / filling a … If you want to give such a PHP function like …
Just to add: It's technically possible to call PHP functions from within SQLite queries - I already did this for one place where there was the need to take a user setting into account in a view (code ref PHP, code ref SQL). (User settings are also stored in the database, but default settings from … But the big downside of this is that it makes debugging "inside the database" very hard (since the user-defined functions registered by PHP are simply unknown when not "connecting through Grocy")...
Hey guys, I have been following this discussion and I tried to optimize this query. I managed to bring it down to 50ms execution time. I did so by creating separate closures: one for product-specific conversions and one for default conversions. This results in tables that allow huge performance gains by pruning. Please have a look and see if you can spot any issues. I have tested it on my DB ( … The good thing is that the more user-defined conversions are present, the more the query will prune.
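A rough sketch of the two-closure idea as described (my own illustration, assuming a `quantity_unit_conversions` table with a nullable `product_id` column - not the actual contributed query):

```sql
WITH RECURSIVE
-- Closure over default (unit-level) conversions only
default_closure (from_qu_id, to_qu_id, factor, path) AS (
    SELECT from_qu_id, to_qu_id, factor,
           '/' || from_qu_id || '/' || to_qu_id || '/'
    FROM quantity_unit_conversions
    WHERE product_id IS NULL
    UNION ALL
    SELECT d.from_qu_id, c.to_qu_id, d.factor * c.factor,
           d.path || c.to_qu_id || '/'
    FROM default_closure d
    JOIN quantity_unit_conversions c
      ON c.product_id IS NULL AND c.from_qu_id = d.to_qu_id
    WHERE d.path NOT LIKE '%/' || c.to_qu_id || '/%'
),
-- Closure over product-specific conversions only
product_closure (product_id, from_qu_id, to_qu_id, factor, path) AS (
    SELECT product_id, from_qu_id, to_qu_id, factor,
           '/' || from_qu_id || '/' || to_qu_id || '/'
    FROM quantity_unit_conversions
    WHERE product_id IS NOT NULL
    UNION ALL
    SELECT p.product_id, c.to_qu_id, p.factor * c.factor,
           p.path || c.to_qu_id || '/'
    FROM product_closure p
    JOIN quantity_unit_conversions c
      ON c.product_id = p.product_id AND c.from_qu_id = p.to_qu_id
    WHERE p.path NOT LIKE '%/' || c.to_qu_id || '/%'
)
-- Each closure stays small on its own, which is where the pruning comes from
SELECT product_id, from_qu_id, to_qu_id, factor FROM product_closure
UNION ALL
SELECT NULL, from_qu_id, to_qu_id, factor FROM default_closure;
```

How the two closures are then joined together (default steps interleaved with product-specific steps) is exactly the subtle part that the later comments in this thread revolve around.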
This isn't really what I had in mind. My idea is basically that whenever a conversion is added / updated / deleted, we could recompute the parts of the table that use that conversion.
Thanks a lot @alkuzman! I've checked whether the output of the view is the same as before on the default demo database, my personal one and the one @TurnrDev sent me => that's the case. I've tested whether @TurnrDev's original private v3.x database passes the upgrade to v4.0.1 with the new view from @alkuzman without timeouts => that's also the case. @TurnrDev: I will send you this database shortly via mail; it would be cool if you can give it a test drive to check whether the conversion factors and such work practically how you expect them. @alkuzman: As already noticed above, a … For that: qu_resolved_alkuzman_with_path.txt So from that, all looks great I would say - again, thanks a lot! If nobody has any remarks, I would then merge that last state of the view, or you can of course also open a PR for it if you prefer that. But no need to hurry on that at all.
This time it would be best if you merge the change, since I haven't set up the dev env (I cannot test) and just worked directly on the query. I would try to set up everything, but I guess it will take some time, since I've never worked with PHP before. Just a quick question: do you prefer forks and then PRs on this repo, or can I create a branch on this repo and then PR?
That's the common normal way I guess, at least the one used here in the past - so that would be good. Database schema changes are no more than creating a new file ( … But for this case, also existing migrations need to be touched to make the overall upgrade from v3.x work (no big thing in the end). I can gladly take care of that if everyone says that practically everything works (for me it did so far 👍).
If anyone else (probably affected by that problem) is following this discussion: If you want to give the changes @alkuzman contributed a test drive (after making a backup if using your "production" instance!) - assuming you have a working v3.3.2 (or older) installation:
Hmm, I just had a look at the query in … When playing around with some data, I think that I have found an issue with the conversions, which might also be present in the current query.
Hi @esclear, I kind of expected that I'm not covering all cases. If you have a case or a general idea of what might be the problem, please keep me in the loop. I still haven't erased my whiteboard :). I will give it another try tomorrow.
When I was trying to update from v3.3.2 to v4, I apparently had the same issue in my database. I adjusted the PHP … Anyway, I tried the new DB migration @berrnd referred to and I can mostly use v4.0.1 now. All the pages load acceptably fast, except Recipes and Meal Plan - I assume because there's many more products and QU conversions? I only have <12 recipes defined with ingredients, so there's not a lot to process there… With this new …
@mattmahn are you me? I've tried the same kinds of things by increasing the timeouts on both the PHP and nginx ends (using the Linux server image). I've also tried the grocy-docker image (https://github.com/grocy/grocy-docker) and had the same performance issues. Every page that uses units is much slower than before. I see some of the latest commits on main that cache the resolved unit conversions and am very excited for the next version.
Thanks for the feedback! @mattmahn product specific conversions are not the problem here (there was another, unrelated one, which was sorted out in v4.0.1) - here it's about the default "each to each" conversion definitions (so the ones defined at unit level, not at product level). If the timeout problem is now gone for you, it was most likely about that - you should clean up the now unneeded "in-between" conversions, which should further improve performance (since the view in question then simply has less to do). I also added that retrospectively, explained by a simple practical example, to the v4.0.0 changelog (and will do that another time for the next release), in the hope that it's now clear what the problem is about practically:
Maybe clearing out all default conversions and redefining them once is the even simpler/faster approach, since most likely not that many definitions are needed now.
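A possible starting point for identifying such redundant default conversions automatically (a sketch under the same illustrative schema as above, assuming an `id` primary key column; it only detects two-step chains, not longer ones):

```sql
-- List default conversions whose factor is already implied by chaining two
-- other default conversions - candidates for removal.
SELECT direct.id
FROM quantity_unit_conversions direct
JOIN quantity_unit_conversions step1
  ON step1.from_qu_id = direct.from_qu_id AND step1.to_qu_id != direct.to_qu_id
JOIN quantity_unit_conversions step2
  ON step2.from_qu_id = step1.to_qu_id AND step2.to_qu_id = direct.to_qu_id
WHERE direct.product_id IS NULL
  AND step1.product_id IS NULL
  AND step2.product_id IS NULL
  AND abs(step1.factor * step2.factor - direct.factor) < 1e-9;
```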
And also other expensive stock data calculations (like getting a product's current price, taking the "default consume rule" into account and such) are now cached (already part of v4.0.1). I've tried that on a massive database and the performance improvements were huge (with the little downside that editing/saving products, QU conversions or stock transactions now takes a little longer, since at those events the related cached data is recalculated/updated - but that's practically barely noticeable).
Hey, I looked into the query again and after the changes made by @berrnd some duplicates were introduced. Here is the fixed version: qu_resolved_alkuzman_with_path_no_dups.txt After the comment from @esclear I also tried to review whether the query covers all the cases, and I ended up writing an alternative query that joins both closures in a different way, which is easier to prove correct. But it is a bit slower: I get 90ms instead of 65ms. However, both queries return identical results, so I believe it should be correct. @esclear please let me know if you find something.
Thanks again - I think you've shared the wrong file (?), the query is identical to your first one posted above.
Sorry. I updated the comment with the correct file. And I optimized the alternative a bit more, with more pruning. Now it runs in around 70ms on my DB and my machine. Edit: the more QU conversions I add, the more the gap between the queries increases in favor of the first query.
Did another test round with the different databases used for this so far: the output is still the same on all of them; performance is slightly slower, as already mentioned, but still far away from being a disaster. If the alternative query covers more potential edge cases, as you say, I think that's better (also keeping the new caching approach in mind, where the view itself only plays a role when changing products or QU conversions - all references needing resolved QU conversions will use the cached data (table …
I say they are identical, but I cannot prove it.
So one example where this query fails:
With the query, I get the following result:
Between those ten conversions, two are missing, namely the transitive conversions of … The current view / query does calculate the correct result:
Here's a zipped version of this case in the database.
Great work @esclear. I totally missed the case where multiple disjoint graphs are present in the product-specific conversions - I was only looking into disjoint graphs in the default conversions. Have you tried both versions of the query - do both not work? I will not be able to revisit the query until tomorrow evening, because I am traveling. But I have a feeling that it should not be that hard to solve.
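For readers following along, a hypothetical reconstruction of this class of problem (made-up data, not the attached database):

```sql
-- The product-specific conversions form two disconnected components
-- (Pack <-> Gram and Litre <-> Bottle); only the default conversion
-- Gram -> Litre connects them. A resolver that closes over each conversion
-- type separately can miss the combined path Pack -> Gram -> Litre -> Bottle.
INSERT INTO quantity_units (id, name) VALUES
    (1, 'Gram'), (2, 'Pack'), (3, 'Litre'), (4, 'Bottle');
INSERT INTO quantity_unit_conversions (product_id, from_qu_id, to_qu_id, factor) VALUES
    (NULL, 1, 3, 0.001),  -- default: Gram -> Litre
    (42,   2, 1, 500.0),  -- product 42: Pack -> Gram
    (42,   3, 4, 2.0);    -- product 42: Litre -> Bottle
```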
Hey, I was thinking a bit during my flight and I realized that I made a mistake doing the last closure with the default conversions instead of with the product conversions (felt kind of stupid :) ). The fix now covers all the cases and works properly on all databases. As a bonus, it is the fastest query so far :). @esclear it would be nice if you could have a look (thanks!).
Did another test with that, checking performance and output - all still looks good to me for the different test cases/databases used here. Unless anyone has further remarks on that, I'm going to merge and release that this coming weekend. Thanks again for all the help on that topic.
Before v4.0.0 that wasn’t a problem, since only 1 level of QU conversion hierarchy was supported (the rest was just ignored).
Now the levels / hierarchy are unlimited (and (!) such a "from each to each" definition approach is practically no longer needed), therefore that's a problem: on resolving QU conversions, the view quantity_unit_conversions_resolved then tries to create trillions of rows, which takes very long and produces a webserver timeout at the end (which may look like the upgrade / database migrations failed, when that happens during the upgrade from <= v3.3.2).

So in conclusion a classic GIGO problem - normally nothing Grocy cares about, but let's somehow work around it to at least make the upgrade to >= v4.0.0 not fail due to timeouts.
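A rough back-of-the-envelope calculation (my own illustration of the order of magnitude, not from the issue): with $n$ units and a conversion defined between every ordered pair, the exhaustive search over simple paths enumerates

$$\underbrace{n\,(n-1)}_{\text{ordered pairs}} \cdot \sum_{k=0}^{n-2} \frac{(n-2)!}{(n-2-k)!} \;\approx\; e \cdot n!$$

paths in total. Already at $n = 15$ units this gives $e \cdot 15! \approx 3.6 \cdot 10^{12}$ rows - trillions, as described above.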