New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix dereference #21
Fix dereference #21
Conversation
If reference is not found while dereferencing it should set to None.
- Remove ReferenceField._is_instance propery as it is useless (field is shared between model instances). - Rewrite dereferencing to fix some issues: - Dereference unhashable ids (still auto dereferencing does not work for this case). - Set reference field value to None if there is no document with such id (as auto dereferencing does). - Fix bug when some fields reference to different collections but have the same id. So that only one of that field will be set with correct value as the document_map was {id --> document}. - Dereference DBRefs in dict-like fields.
There are also some thoughts about unhashable ids:
class User(MongoModel):
name = fields.CharField(primary_key=True)
class Comment(EmbeddedMongoModel):
body = fields.CharField()
user = fields.ReferenceField(User)
class Post(MongoModel):
comments = fields.EmbeddedDocumentListField(Comment)
user = User(name='bob').save()
post = Post(comments=[
Comment('comment1',user=user),
Comment('comment2',user=user),
Comment('comment3',user=user),
Comment('comment4',user=user)]).save()
# if we dereference such post
# a reference_map for 'user' collection will be:
# {'user': ['bob', 'bob', 'bob', 'bob']} and db request will be something like:
# db.user.find({_id: {$in:['bob', 'bob', 'bob', 'bob']})
# and we can't use ``set`` instead of ``list`` as id could be unhashable... |
pymodm/dereference.py
Outdated
@@ -91,13 +95,16 @@ def _find_references(model_instance, reference_map, fields=None): | |||
|
|||
|
|||
def _resolve_references(database, reference_map): | |||
document_map = _ObjectMap() | |||
document_map = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be cleaner with a defaultdict(_ObjectMap)
.
test/test_dereference.py
Outdated
# check auto_dereferncing | ||
# hand.refresh_from_db() | ||
# self.assertIsInstance(hand.cards[0], Card) | ||
# self.assertIsInstance(hand.cards[1], Card) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be removed.
test/test_dereference.py
Outdated
comment.refresh_from_db() | ||
with no_auto_dereference(comment): | ||
dereference(comment) | ||
self.assertIsNone(comment.post) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be nice to show that the id for the deleted object is still accessible if accessed without automatic dereferencing.
Hello again! Thank you for bringing these issues to our attention, and thanks again for fixing them! I propose that we hold off on scanning all dictionaries for DBRefs for now. It would probably be better to implement a DBRefField or GenericReferenceField that stores its references as DBRefs, so that we can refine our scan to only those fields that are likely to contain DBRefs instead of scanning all dictionaries. I think I know what needs to happen to make auto dereferencing work with unhashable ids:
As for the issue with reference_map, are you referring to the duplicate ids within the Thanks again for the pull request! Please let me know if I can help you in any way. |
As it would be better to implement a DBRefField or GenericReferenceField that stores its references as DBRefs.
Remove unnecessary _get_value and _set_value functions as container is always a model_instance._data which is a dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work. Thanks for helping out.
pymodm/fields.py
Outdated
# if value is a dict convert in to model | ||
# so we can properly generate SON | ||
if isinstance(value, dict): | ||
return related_model.from_document(value).to_son() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing that SON is going to add over a dict is a deterministic key ordering. If it's not SON to begin with, then we've already lost that, so there's no point in trying to convert it to SON here.
Probably the cases for this should be:
- Check if instance of self.related_model
- Check if instance of dict
- Raise ValidationError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right if it is a dict we can't use it in mongodb as {'a': 1, 'b': 2} and {'b': 2, 'a': 1} are different values for mongodb. So it checks:
- Is it a SON value (it has been converted already and we do not need to convert it again).
- Is it an instance.
- Is it a dict - then we try to create a model instance and then create a son (as model should always generate SON with the same keys order, doesn't it?)
So to_mongo will always generate SON object with the same keys order. If it is needed to check if SON is correctly converted to Model Instance too then we can combine dict and SON under one branch but again with converting to model first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. to_son
iterates the fields of the model class and outputs keys in a deterministic order, as you noted. That makes sense now. Carry on!
pymodm/fields.py
Outdated
|
||
# TODO: Should we raise a ValidationError as we could not | ||
# convert value to SON? | ||
return value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. Maybe something along the lines of:
raise ValidationError('%s is not a valid %s' % (value, self.related_model.__name__))
pymodm/fields.py
Outdated
def to_mongo(self, value): | ||
return value.to_son() | ||
return EmbeddedDocumentField._value_to_mongo(self.related_model, | ||
value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to move _value_to_mongo
to RelatedModelFieldsBase as an instance/classmethod and call it something like _model_to_document
, to make it clearer what it's doing. Then, we can call it here like self._model_to_document(...)
.
pymodm/fields.py
Outdated
@@ -1106,7 +1125,8 @@ def to_python(self, value): | |||
for item in value] | |||
|
|||
def to_mongo(self, value): | |||
return [doc.to_son() for doc in value] | |||
return [EmbeddedDocumentField._value_to_mongo(self.related_model, doc) | |||
for doc in value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems strange to call this method in another class that isn't in this class' hierarchy. See above comment for a suggestion on how to clear this up.
pymodm/fields.py
Outdated
elif isinstance(value, self.related_model): | ||
return value | ||
try: | ||
doc = self.related_model.from_document(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just return
here and avoid setting doc = value
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we can but that requires to change from_document
. And maybe that is the right way.
Here
Line 265 in ce603f5
cls_name = dct.get('_cls') |
_cls
field in value we create a cls
from that field and it does not check if the cls
is subclass of current cls
so we can do something like this without any exception:
post = User.from_document({'title': 'I am a Post', '_cls': 'Post'})
assert isinstance(post, Post) == True
Without these changes we should check if doc
is instance of related_model
or not. It could be a reference id in form of Document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. For example, value
could be a dict that represents an embedded document, which is also the primary key. That makes sense now. Thank you for your explanation. I prefer it this way to changing from_document
.
pymodm/fields.py
Outdated
return value | ||
try: | ||
doc = self.related_model.from_document(value) | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think only ValueError
is possible here, from this line:
Line 248 in ce603f5
raise ValueError( |
ValidationError
s should be triggered by this. Are there other cases I'm not thinking of?
pymodm/fields.py
Outdated
elif self.model._mongometa._auto_dereference: | ||
# Attempt to dereference the value as an id. | ||
dereference_id = _import('pymodm.dereference.dereference_id') | ||
return dereference_id(self.related_model, value) | ||
|
||
# TODO: should we convert value to mongo here? | ||
# return self.related_model._mongometa.pk.to_mongo(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. We should do that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this change this test
pymodm/test/test_related_fields.py
Line 60 in ce603f5
Comment(post=1234).full_clean() |
start to fail cause it try to convert 1234 into ObjectId. As 1234 is not a valid value is that ok that full_clean does not raise ValidationError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of that test is to show that you can assign the id of a referenced document directly and it will pass validation. However, the test value "1234" is not valid. My opinions:
-
Convert using "to_python" (not to_mongo) here, which could yield a different result. Since we're asking for the python value, and not the mongo one, that's what's expected here.
-
The test is using a bad test value. We should change it to a valid ObjectId hex string like
"58b477046e32ab215dca2b57"
. -
I think there's a separate problem with validation happening in ReferenceField. Really, the test that fails with your changes should already be failing. We should just use
to_mongo
as our validation function, like we do with Decimal128Field. We don't actually need the separatevalidate_related_model
function, since checking that the pk is defined is already done into_mongo
. -
I notice that the error message strings from
validate_related_model
andto_mongo
are slightly different, so that may cause some tests to fail. Those tests should be updated to reflect the new error message.
@@ -150,6 +150,9 @@ class CardIdentity(EmbeddedMongoModel): | |||
suit = fields.IntegerField( | |||
choices=(HEARTS, DIAMONDS, SPADES, CLUBS)) | |||
|
|||
class Meta: | |||
final = final_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
pymodm/fields.py
Outdated
@@ -1213,6 +1240,4 @@ def __get__(self, inst, owner): | |||
return self | |||
|
|||
def __set__(self, inst, value): | |||
MongoModel = _import('pymodm.base.models.MongoModel') | |||
super(ReferenceField, self).__set__(inst, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this isn't doing anything anymore, we can just remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your explanations.
pymodm/fields.py
Outdated
# if value is a dict convert in to model | ||
# so we can properly generate SON | ||
if isinstance(value, dict): | ||
return related_model.from_document(value).to_son() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. to_son
iterates the fields of the model class and outputs keys in a deterministic order, as you noted. That makes sense now. Carry on!
pymodm/fields.py
Outdated
elif isinstance(value, self.related_model): | ||
return value | ||
try: | ||
doc = self.related_model.from_document(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. For example, value
could be a dict that represents an embedded document, which is also the primary key. That makes sense now. Thank you for your explanation. I prefer it this way to changing from_document
.
Use to_mongo instead of separate function.
Merged! Many thanks for the hard work you put into this. These changes will be in the forthcoming 0.3.1 release. |
What do these changes do:
ReferenceField._is_instance
property as it is useless (field isshared between model instances).
this case as reference id is treated as dereferenced document not as an id).
None
if there is no document with suchid (as auto dereferencing does).
the same id. So that only one of that field will be set with correct
value as the document_map was {id --> document}. Example:
is within dict-like fields (
DictField
,OrderedDictField
). So dereference traverses such containers to findDBRef
values and try to dererference them.