New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the ability to fetch the user categories for a book to proxy ... #316
Conversation
…data. Add a formatter function to use it.
It's been a while since I looked at this code, so this is mostly of the top off my head, but it seems that the least disruptive, would be to have user_categories_for_book() check if the field is composite itself, and if it is, call get_value_with_cache with a single PM instance common to all such calls, and otherwise use fast_field_for() |
…oxy metadata. Add a formatter function to use it." This reverts commit 29e29fd.
My logic for the above preventing recursion is that composite_getter() in db.lazy adds the current field into cache with a dummy value before calling the formatter, so it should not call user_cagtegories_for_books again, but I may be wrong. |
…mns in user_categories_for_books().
It works if I pass in the proxy_metadata object. It doesn't work if I create the proxy_metadata object in user_categories_for_books(). The reason is that the cache that composite_getter is using is not the cache of the new PM object, so it evaluates the template again. The good thing about your suggestion is that no other API in cache.py is touched. I ended up adding the current PM object as a fourth parameter to all the getters. This adds a "push" of a single object instance onto the stack at call time, which is faster than any test to see if the getter is user_categories_getter. |
I refactored your patch to:
Please check that everything still works. |
Regarding (2) above, I agree that adding an extra param to all getters Also, I'm not quite clear as to why constructing the PM object in EDIT: To expand on my confusion, if constructing the PM object in user_categories_for_books() does not work, it would imply that this sequence, works:
but this one does not
Since the first thing user_categories_for_books() does is construct the PM object, that doesn't make sense to me. |
Two issues here. First, constructing the PM object in user_categories_for_books(). This does not work because the caches are local to the PM instance. Consider the following sequence:
Infinite recursion. There is no termination condition. Passing the pm instance prevents the recursion because that forces using a common cache between the various processing objects. So yes, proxy_metadata_map must not be optional. Regarding performance: an if requires 2 to 4 instructions (load, cmp, branch), where the instructions can be complex if the comparison is something like a string. A parameter requires a single instruction (push). Thus a parameter is faster if the if test is done for 50% of the accesses. Given your changes, I suspect that most accesses are of fields in SIMPLE_GETTER, so the if will be frequently executed. I think it would be better to remove "user_categories" from SIMPLE_GETTER and do the test before fetching field metadata. I will make that change so you can see what I mean. No problem if you use the change or not. |
On Fri, Oct 24, 2014 at 11:14:35PM -0700, Charles Haley wrote:
The part that confuses me, is how are these two calls different: pm = db.get_proxy_metadata() and db.ucfb() The latter constructs a new PM object at the start. So if the latter is In other words, why does the use case of calling ucfb() from inside a |
Currently it works like neither of your examples. Backing up a bit, we see that the formatter is handed a pm instance obtained from somewhere. It uses that pm instance to fetch user_categories. PM.user_categories_getter does not create a new pm instance but instead passes itself to db.ucfb. In the course of processing, db.ucfb will use this same pm instance to evaluate any composite columns, meaning that the cache in the pm object contains data, stopping the recursion. In effect we have this sequence:
It works because the pm is created outside the recursion. It will infinitely recurse if a new PM instance is created and used anywhere inside the recursion cycle. Your examples create such an instance, so neither of them will work. |
BTW: the cached value returned by the formatter is probably the recursive composite field text, but I haven't bothered to check that. |
Maybe I'm just being dense, but The call chain involving the template function is
At stage (4), the PM instance passed into ucfb() is exactly the same Therefore, if instead, at this point no pm instance had been passed in, To put it another way, The template function has to pass in a PM instance, that much is clear. But do other, non-recursive users of this API have to also pass in a PM |
In other words, can the following call ever lead to infinite recursion: db.user_categories_for_books(book_id) given that the user_categories() template function is written so that it |
It isn't true that the PM instance is exactly the same as a new one. Looking at pm.composite_getter() we see that the cache for the current pm instance is filled in with
I am not sure what you mean by "other, non-recursive users". Regarding your second comment "can the following call ever lead to infinite recursion?", the answer is yes. If a user category makes reference to a composite column that uses the user_categories() template function then there will be infinite recursion unless the pm instance is passed along the chain. This happens because the check for recursive composite columns is defeated by creating a new pm instance. |
On Sat, Oct 25, 2014 at 12:42:30AM -0700, Charles Haley wrote:
That I can buy, but if that is the explanation for why it does not pm = db.proxy_metadata(book_id) And that means this API effectively can only be used in a very special And if that is the case, I dont think this API should be exposed in the Makes sense? |
Thinking about it, it cant even be used in the PM object, since db.get_proxy_metadata(book_id).user_categories can recurse infinitely. Some I guess it will have to be used only in the template function and the PM object will need to be changed back to returning an empty user_categories object. |
Your example will not infinitely recurse. db.ucfb(pm) will attempt to evaluate the composite column. Further down we discover that the column has never been evaluated, which will recursively call db.ucfb(pm) where pm is the original pm. At this point the cache has been prepared and the new call to db.ucfb will not recurse again. I have no problem with marking db.ucfb as internal. Re last comment: I don't see why your example will infinitely recurse. The new PM instance will be passed to db.ucfb. Another new PM instance will not be created, so the cache will work. |
Because, your explanation for why it does not recurse when called from the template function is that composite_getter() populates the cache db.get_proxy_metadata(book_id).user_categories never calls composite_getter |
Yes it does, assuming that the composite column is involved in the user category. field.get_value_with_cache() calls mi.get(colkey) which will call composite_getter. |
On Sat, Oct 25, 2014 at 01:07:51AM -0700, Charles Haley wrote:
But it does not call it at stage (4) of my function call trace above. Anyway, I think this discussion will be better served with an example db.ucfb(book_id) |
This program runs fine. import math cache = init_cache(library_path = 'd:/cbh_data/calibre.git/library.test_small') for id_ in cache.all_book_ids(): |
On Sat, Oct 25, 2014 at 01:12:20AM -0700, Charles Haley wrote:
And does for id_ in cache.all_book_ids(): fail? |
I have a metadata.db that fails if ucfb() creates a new PM instance instead of using the one passed in as an argument. I don't see a way to attach files here so I will email it separately. |
On Sat, Oct 25, 2014 at 01:33:29AM -0700, Charles Haley wrote:
That will help, thanks. That will allow me to investigate the cause an |
Oh and note, my claim is that db.ucfb(book_id) will not recurse infintely. Not that db.ucfb(book_id, pm) will not recurse infinitely even if ucfb() ignores pm. I agree that the latter will recurse infinitely. |
How are the two different? If pm is not passed as an argument then ucfb must create an instance. If ucfb ignores the argument then it must create an instance. |
On Sat, Oct 25, 2014 at 01:42:32AM -0700, Charles Haley wrote:
Because, in one case subsequent calls to ucfb() from evaluating the With the metadata.db you sent me, I see no infinite recursions with the calibre-debug -c "from calibre.library import db; db = db('/t').new_api; import pprint; pprint.pprint(db.user_categories_for_books(db.all_book_ids()))" This is with the code from my master branch. If I change ucfb() to ignore proxy_metadata_map, I get an infinite Therefore, I conclude that proxy_metadata_map can indeed be optional. |
Re: "And does for id_ in cache.all_book_ids(): fail?" I think I see at least some of what you are saying. The short answer is "no, it doesn't fail". The longer answer is that it doesn't fail because it only creates a new PMM if the one passed in is None, which will never happen except if ucfb() is called by someone other than proxy_metadata. Re your latest comment: I think we are in agreement. |
On Sat, Oct 25, 2014 at 01:52:24AM -0700, Charles Haley wrote:
So are you OK with the code as is in master, or is there anything you |
I changed the latest pull request to contain only the changes to the formatter function. Those changes should be merged. |
...metadata. Add a formatter function to use it.
@kovid: I am not completely happy with this implementation. The changes to cache.py are required to avoid infinite recursion. The method user_categories_for_book calls field_for() which calls composite_for(). The composite_for method created a new ProxyMetadata object and then attempts to evaluate the composite column in question. If that column was the one with a template calling user_categories_for_book, it recurses because the new ProxyMetadata object doesn't see the original's cache.
As you can see, I put a stop to the recursion by passing in the instance of proxy_metadata for the field_for method to use. I thing that makes me nervous with this approach is that it changes a fundamental API. I considered trying to use more basic APIs in user_categories_for_book, but that has the drawbacks of code cloning and being more difficult.
If you have an alternate idea to avoid the recursion, I am all ears.
BTW: the fact that composite_for creates a new ProxyMetadata instance raises the question of whether the caches are as effective as they should/could be. It seems that the value will be cached twice and then discarded once.