Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fully implement and test projection filtering #2352

Merged
merged 4 commits into from
Sep 28, 2017
Merged

Conversation

sutartmelson
Copy link
Contributor

Fixes #2343

:type fields: `str, list of strings, or tuple of strings for fields to be included from the
document, or dict for an inclusion or exclusion projection`.
:param overwrite: Additional document key(s) to be included or not excluded in fields.
:type overwrite: `str, list of strings, or tuple of strings for fields to be included from the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that other places in Girder are flexible like this, but IMO, it's burdensome to implement this sort of argument flexibility. str, list, and tuple have distinctive behavior and semantics, and there's only one that's actually appropriate here: list. I see nothing wrong with expecting overwrite to always be passed as a list.

if fields is None:
return fields

overwrite = list(overwrite)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this, we're not mutating overwrite.

# If this is a list/tuple/set, that means inclusion
return True
# Inclusion projection (str, list, or tuple)
copy = list(fields)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the parent API allows a str, I suppose we need to support that case here. As-is, this will turn each of the characters in the string into a list element, so you need to specially check if fields is a string and place it in a list with [fields].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug wasn't caught because the tests don't ever pass a standalone str. Since we need to support that, we should test it at least once.


:param fields: A mask for filtering result documents by key, or None to return the full
document, passed to MongoDB find() as the `projection` param.
:type fields: `str, list of strings, or tuple of strings for fields to be included from the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to support fields being a list (which is correct), or a str (for legacy purposes), but a tuple makes no sense semantically (IMO, that's intended for structures where the ordinal position has semantic meaning). I'd cut all the references to tuple throughout this PR. It will still work, but there's no reason we need to explicitly document that.

copy = list(fields)
for entry in overwrite:
if entry not in copy:
copy.append(entry)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be simpler (and theoretically more efficient) as:

copy = list( set(fields) | set(overwrite) )

if k not in whitelist:
del doc[k]
else:
fields = list(fields)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to make a copy of fields here (iterating through a tuple works the same, anyway). If it's a string, we would need to cast it to a list.

Also, let's add a test for the case when fields is a string.

del doc[k]
else:
fields = list(fields)
for k in list(doc.keys()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to iterate over keys, it's most memory-efficient to do:

for k in six.viewkeys(doc):

in Girder, or:

for k in doc.viewkeys():

in Python2, or

for k in doc.keys()

in Python3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem I was having is that the dictionary is changing since I'm deleting keys. So copying doc.keys() to a list solved that problem.

del doc[k]
else:
fields = list(fields)
for k in list(doc.keys()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is now, it's got time complexity O(doc * fields). It's possible to get it down to O(fields * log(doc)), which for fields << doc would actually matter.

@@ -1267,24 +1338,16 @@ def load(self, id, level=AccessType.ADMIN, user=None, objectId=True,

# Ensure we include access and public, they are needed by requireAccess
loadFields = copy.copy(fields)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to copy here, if we're doing it inside _overwriteFields.

retval = _overwriteFields(exclusionProjDict, ['newValue'])
self.assertEqual(retval, exclusionProjDict)
retval = _overwriteFields(inclusionProjDict, ['newValue'])
self.assertEqual(retval, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's test the fact that retval should not be mutated by _overwriteFields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retval is never passed to _overwriteFields? I'm not really sure what that would be testing against?

Or are you saying the first argument, in this case inclusionProjDict should not be mutated by _overwriteFields?

brianhelba
brianhelba previously approved these changes Sep 27, 2017
Copy link
Member

@brianhelba brianhelba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@brianhelba
Copy link
Member

@sutartmelson Actually, there are 2 uncovered lines in _isInclusionProjection. Would it be easy to add another test statement to cover these?

@sutartmelson
Copy link
Contributor Author

@brianhelba yup, I'm on it.

@brianhelba
Copy link
Member

@manthey Can you take a look?

@manthey
Copy link
Member

manthey commented Sep 27, 2017

It looks fine.

One petty comment: at least in early versions of python newdict = original.copy() was mildly faster than copying a dict via newdict = dict(original), so to my eyes seeing the constructor is less efficient (and less obvious) than using original.copy() or dict.copy(original). I haven't checked, so it is distinctly possible that they are internally the same at this point.

Copy link
Member

@brianhelba brianhelba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 more petty issues, that I noticed while I was using this code in my own PR.

:param exc: If not found, throw a ValidationException instead of
returning None.
:type fields: list or dict
:param exc: If not found, throw a ValidationException instead of returning None.
:type exc: bool
:raises ValidationException: If an invalid ObjectId is passed.
:returns: The matching document, or None if no match exists.
"""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to for an extra newline here.

else:
loadFields = list(set(loadFields) | {'access', 'public'})
loadFields = fields
overwriteFields = {'access', 'public'}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put this definition inside the if not force: block.

@brianhelba
Copy link
Member

@sutartmelson This still LGTM. Feel free to fix the nits raised by myself and @manthey in an additional commit if you'd like (I'll reapprove), or just merge this yourself if you're happy as-is.

@brianhelba
Copy link
Member

@sutartmelson Nevermind, let's merge as-is. I'll implement my suggestions in a forthcoming PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants