Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: replicate legacy behavior for using cache with queries #613

Merged

Conversation

chrisrossi
Copy link
Contributor

Corrects the issue pointed out in #586, that we weren't doing the right
thing for deleted entities. Also corrects an issue noticed while fixing
that, where the cache wasn't being updated with entities from queries.
Behavior should now match legacy in both these regards.

Fixes #586

Corrects the issue pointed out in googleapis#586, that we weren't doing the right
thing for deleted entities. Also corrects an issue noticed while fixing
that, where the cache wasn't being updated with entities from queries.
Behavior should now match legacy in both these regards.

Fixes googleapis#586
@chrisrossi chrisrossi requested a review from a team February 22, 2021 23:29
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Feb 22, 2021
@product-auto-label product-auto-label bot added the api: datastore Issues related to the googleapis/python-ndb API. label Feb 22, 2021
@@ -379,6 +379,18 @@ def _next_batch(self):
for result_pb in response.batch.entity_results
]

if result_type == RESULT_TYPE_FULL:
# If we cached a delete, remove it from the result set. This may come cause
# some queries to return less than their limit even if there are more
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is an underlying condition of Datastore as well. 👍

Returns:
Any: The NDB entity for this result, if it is cached, otherwise
`_KEY_NOT_IN_CACHE`. May also return `None` if entity was deleted which
will cause `None` to be recorded in the cache.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was the None return true of legacy as well then? Seems a bit odd but likely not problematic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is true of legacy as well.

@crwilcox crwilcox added the automerge Merge the pull request once unit tests and other checks pass. label Mar 23, 2021
@gcf-merge-on-green gcf-merge-on-green bot merged commit edd1185 into googleapis:master Mar 23, 2021
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Mar 23, 2021
rwhogg added a commit to rwhogg/python-ndb that referenced this pull request Feb 22, 2023
In certain circumstances, we were not respecting use_cache
for queries, unlike legacy NDB, which is quite emphatic
about supporting them.
(See https://github.com/GoogleCloudPlatform/datastore-ndb-python/blob/59cb209ed95480025d26531fc91397575438d2fe/ndb/query.py#L186-L187)

In googleapis#613 we tried to match legacy NDB behavior by updating the cache
using the results of queries. We still do that, but now we respect
use_cache, which was a valid keyword argument for Query.fetch()
and friends, but was not passed down to the context cache when
needed.

As a result, the cache could mysteriously accumulate lots of memory
usage and perhaps even cause you to hit memory limits, even if it was
seemingly disabled and it didn't look like there were any objects
holding references to your query results.
This is a problem for certain batch-style workloads where you know
you're only interested in processing a certain entity once.

Fixes googleapis#752
rwhogg added a commit to rwhogg/python-ndb that referenced this pull request Feb 22, 2023
In certain circumstances, we were not respecting use_cache
for queries, unlike legacy NDB, which is quite emphatic
about supporting them.
(See https://github.com/GoogleCloudPlatform/datastore-ndb-python/blob/59cb209ed95480025d26531fc91397575438d2fe/ndb/query.py#L186-L187)

In googleapis#613 we tried to match legacy NDB behavior by updating the cache
using the results of queries. We still do that, but now we respect
use_cache, which was a valid keyword argument for Query.fetch()
and friends, but was not passed down to the context cache when
needed.

As a result, the cache could mysteriously accumulate lots of memory
usage and perhaps even cause you to hit memory limits, even if it was
seemingly disabled and it didn't look like there were any objects
holding references to your query results.
This is a problem for certain batch-style workloads where you know
you're only interested in processing a certain entity once.

Fixes googleapis#752
rwhogg added a commit to rwhogg/python-ndb that referenced this pull request Feb 22, 2023
In certain circumstances, we were not respecting use_cache
for queries, unlike legacy NDB, which is quite emphatic
about supporting them.
(See https://github.com/GoogleCloudPlatform/datastore-ndb-python/blob/59cb209ed95480025d26531fc91397575438d2fe/ndb/query.py#L186-L187)

In googleapis#613 we tried to match legacy NDB behavior by updating the cache
using the results of queries. We still do that, but now we respect
use_cache, which was a valid keyword argument for Query.fetch()
and friends, but was not passed down to the context cache when
needed.

As a result, the cache could mysteriously accumulate lots of memory
usage and perhaps even cause you to hit memory limits, even if it was
seemingly disabled and it didn't look like there were any objects
holding references to your query results.
This is a problem for certain batch-style workloads where you know
you're only interested in processing a certain entity once.

Fixes googleapis#752
rwhogg added a commit that referenced this pull request Feb 27, 2023
In certain circumstances, we were not respecting use_cache
for queries, unlike legacy NDB, which is quite emphatic
about supporting them.
(See https://github.com/GoogleCloudPlatform/datastore-ndb-python/blob/59cb209ed95480025d26531fc91397575438d2fe/ndb/query.py#L186-L187)

In #613 we tried to match legacy NDB behavior by updating the cache
using the results of queries. We still do that, but now we respect
use_cache, which was a valid keyword argument for Query.fetch()
and friends, but was not passed down to the context cache when
needed.

As a result, the cache could mysteriously accumulate lots of memory
usage and perhaps even cause you to hit memory limits, even if it was
seemingly disabled and it didn't look like there were any objects
holding references to your query results.
This is a problem for certain batch-style workloads where you know
you're only interested in processing a certain entity once.

Fixes #752
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the googleapis/python-ndb API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NDB Queries may return None values as results due to eventual consistency
2 participants