New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default table query cache accidently (?) updated after repeated searches #132

Closed
pylipp opened this Issue Apr 9, 2017 · 2 comments

Comments

2 participants
@pylipp

pylipp commented Apr 9, 2017

In a program I have a database containing two tables: the default one and an "extra" one. When querying the tables one by one in a repeated manner, I noticed that the default table's query cache is accidently updated. Consider the following script:

from tinydb import TinyDB, where

db = TinyDB("db.json")

db.table("extra").insert_multiple(
        [{"name": "foo", "value": 42}, {"name": "bar", "value": -1337}])

def search_default_table_then_rest(query_impl, *table_names):
    elements = db.search(query_impl)
    for table_name in table_names:
        elements.extend(db.table(table_name).search(query_impl))
    return elements

def search_all_tables(query_impl, *table_names):
    elements = []
    for table_name in db.tables():
        elements.extend(db.table(table_name).search(query_impl))
    return elements

# construct a simple query
greater_than = (where("value") > 0)
# define the search function (exchange with search_all_tables)
search = search_default_table_then_rest

# correctly yields one element as result
first_search = search(greater_than, "extra")
print("1st search: " + str(first_search))

# using search_default_table_then_rest:
#     query cache of default table is updated, two elements are returned!
# using search_all_tables:
#     query cache of default table is empty, one element is returned
print(db._query_cache) # forwarded to default table
print(db.table("extra")._query_cache)
second_search = search(greater_than, "extra")
print("2nd search: " + str(second_search))

# clear query caches, result is fine again
for table_name in db.tables():
    db.table(table_name).clear_cache()
third_search = search(greater_than, "extra")
print("3rd search: " + str(third_search))

# purge to avoid polluting the current directory when executing script repeatedly
db.purge_tables()

I have defined two search functions for querying the database (t.i. finding entries in both tables).

Using search_default_table_then_rest prints the output:

1st search: [{'name': 'foo', 'value': 42}]
{QueryImpl('>', ('value',), 0): [{'name': 'foo', 'value': 42}]}
{QueryImpl('>', ('value',), 0): [{'name': 'foo', 'value': 42}]} 
2nd search: [{'name': 'foo', 'value': 42}, {'name': 'foo', 'value': 42}]
3rd search: [{'name': 'foo', 'value': 42}]

Apparently, the default table's query cache is updated with elements of the extra table after the first search, hence yielding a duplicate result in the second search. Clearing the cache remedies the issue.

However, using search_all_tables (please substitute in the script) gives the correct output:

1st search: [{'value': 42, 'name': 'foo'}]
{QueryImpl('>', ('value',), 0): []}
{QueryImpl('>', ('value',), 0): [{'value': 42, 'name': 'foo'}]}
2nd search: [{'value': 42, 'name': 'foo'}]
3rd search: [{'value': 42, 'name': 'foo'}]

I was wondering whether this behaviour is intended. Obviously the issue can be avoided by selecting a function along search_all_tables, yet it could be a bit of a trap.

I'm using tinydb v3.2.1 (Python3).

Sidenote: I encountered this because my program design contains a search function that is supposed to always search the default table but search the extra table only if it was defined.

pylipp added a commit to pylipp/financeager that referenced this issue Apr 10, 2017

Workaround a tinydb bug.
- default table query cache accidently updates (see
msiemens/tinydb#132)

@msiemens msiemens closed this in a1fa290 Apr 12, 2017

@msiemens

This comment has been minimized.

Show comment
Hide comment
@msiemens

msiemens Apr 12, 2017

Owner

Hey, thanks for reporting this. I've looked into this and here's what I found: When a query is cached, TinyDB returns the list of the cached results:

        if cond in self._query_cache:
            return self._query_cache[cond]

Now, as default with Python, we return a reference to the cache list. When your code calls elements.extend(...), it operates on the cache list. Basically, you add stuff to the query cache. If you copy the results first (e.g. by using elements = db.search(query_impl)[:]), it works as expected.

I'd say that TinyDB should handle this case better as modifying the results list of a search shouldn't mess up the query cache. The fix is fairly simple (see a1fa290), we now just copy the result list before returning it. This should fix the problem.

Owner

msiemens commented Apr 12, 2017

Hey, thanks for reporting this. I've looked into this and here's what I found: When a query is cached, TinyDB returns the list of the cached results:

        if cond in self._query_cache:
            return self._query_cache[cond]

Now, as default with Python, we return a reference to the cache list. When your code calls elements.extend(...), it operates on the cache list. Basically, you add stuff to the query cache. If you copy the results first (e.g. by using elements = db.search(query_impl)[:]), it works as expected.

I'd say that TinyDB should handle this case better as modifying the results list of a search shouldn't mess up the query cache. The fix is fairly simple (see a1fa290), we now just copy the result list before returning it. This should fix the problem.

@pylipp

This comment has been minimized.

Show comment
Hide comment
@pylipp

pylipp Apr 13, 2017

Thank you for the fix and the explanation! Makes perfect sense (also I should know by now that it's likely to be references on list if something gives me a headache in Python ;)).

pylipp commented Apr 13, 2017

Thank you for the fix and the explanation! Makes perfect sense (also I should know by now that it's likely to be references on list if something gives me a headache in Python ;)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment