Implement recursive type checking in PynamoDB #601

Open
wants to merge 15 commits into
base: master

@chensjlv chensjlv commented Mar 13, 2019

Currently, MapAttribute (even for non-raw MapAttribute) and Model use different serialization and type check logic, and that creates a lot of unexpected behavior especially when it comes to nested data structures such as nested ListAttribute and MapAttribute. This commit proposes changes to move serialization and type check to AttributeContainer, and replace or add that logic to MapAttribute and Model to make the behavior consistent between the two.

Changes

  • Move serialization and type check to AttributeContainer, so
    • MapAttribute and Model share the same serialization and type check logic
  • Add new serialization to MapAttribute, while still keeping the old one for empty MapAttribute
  • Replace Model serialization and type check with the new serialization

Old behavior

  • MapAttribute is only type checked on the root level
  • Nested MapAttribute and ListAttribute values are not type-checked correctly

New behavior

  • Every Attribute is recursively checked on serialization, even when it is a list of dicts (a ListAttribute with the `of` parameter assigned)
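
For readers skimming the thread, the idea of a recursive check can be sketched roughly as follows. This is a toy illustration over plain dicts and lists, not PynamoDB's code: `check`, `first_error`, `EMPLOYEE`, and `OFFICE` are hypothetical names, and the schema format is an assumption made only for this example.

```python
# Toy schema format (an assumption for this sketch, not PynamoDB's API):
#   - a Python type such as int or str means "value must be of this type"
#   - a dict means "nested mapping with these required fields"
#   - a one-element list [schema] means "list whose items match schema"

def check(schema, value, path="root"):
    """Recursively validate value against schema; raise on the first problem."""
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            raise TypeError(f"{path}: expected a mapping")
        for name, sub_schema in schema.items():
            # A missing key and an explicit None are treated the same way.
            if value.get(name) is None:
                raise ValueError(f"{path}.{name}: cannot be None")
            check(sub_schema, value[name], f"{path}.{name}")
    elif isinstance(schema, list):
        if not isinstance(value, list):
            raise TypeError(f"{path}: expected a list")
        for index, item in enumerate(value):
            check(schema[0], item, f"{path}[{index}]")
    elif not isinstance(value, schema):
        raise TypeError(f"{path}: expected {schema.__name__}")


def first_error(schema, value):
    """Return the first validation message, or None if value is valid."""
    try:
        check(schema, value)
    except (TypeError, ValueError) as exc:
        return str(exc)
    return None


EMPLOYEE = {"office_employee_id": int, "person": str}
OFFICE = {"office_id": int, "employees": [EMPLOYEE]}
```

With this sketch, an office whose second employee lacks `person` is rejected at the nested level rather than only at the root, which is the behavior the list above describes.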

Optional TODO for next step

  1. Give an option to not recursively check for performance or backward-compatibility because the old type checking is less strict

Runtime Test

Interestingly, the old ListAttribute example doesn't even work, but now it does.
https://pynamodb.readthedocs.io/en/latest/attributes.html#list-attributes

In [1]: from pynamodb.attributes import ListAttribute, NumberAttribute, MapAttribute, UnicodeAttribute
      : from pynamodb.models import Model
      :
      : class OfficeEmployeeMap(MapAttribute):
      :     office_employee_id = NumberAttribute()
      :     person = UnicodeAttribute()
      :
      : class Office(Model):
      :     class Meta:
      :         table_name = 'OfficeModel'
      :         region='us-west-2'
      :     office_id = NumberAttribute(hash_key=True)
      :     employees = ListAttribute(of=OfficeEmployeeMap)
      :
      : # Example usage:
      : emp1 = OfficeEmployeeMap(
      :     office_employee_id=123,
      :     person='justin'
      : )
      :
      : emp2 = OfficeEmployeeMap(
      :     office_employee_id=125,
      :     person='lita'
      : )
      :
      : emp4 = OfficeEmployeeMap(
      :     office_employee_id=126,
      : )
      :
      : o = Office(
      :     office_id=3,
      :     employees=[emp1, emp2, emp4]
      : )

In [2]: o.save()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-1e08ac7410ac> in <module>
----> 1 o.save()

~/Github/PynamoDB/pynamodb/models.py in save(self, condition, conditional_operator, **expected_values)
    450         """
    451         self._conditional_operator_check(conditional_operator)
--> 452         args, kwargs = self._get_save_args()
    453         if len(expected_values):
    454             kwargs.update(expected=self._build_expected_values(expected_values, PUT_FILTER_OPERATOR_MAP))

~/Github/PynamoDB/pynamodb/models.py in _get_save_args(self, attributes, null_check)
   1189         """
   1190         kwargs = {}
-> 1191         serialized = self._serialize(null_check=null_check)
   1192         hash_key = serialized.get(HASH)
   1193         range_key = serialized.get(RANGE, None)

~/Github/PynamoDB/pynamodb/models.py in _serialize(self, attr_map, null_check)
   1320         attributes = pythonic(ATTRIBUTES)
   1321         # use the new recursive type check and serialization
-> 1322         serialized_container = self._serialize_container(self, null_check)
   1323         attrs = {attributes: {}}
   1324

~/Github/PynamoDB/pynamodb/attributes.py in _serialize_container(cls, attr_container, null_check)
    305                 serialized = cls._serialize_container(value, null_check)
    306             else:
--> 307                 serialized = cls._serialize_value(attr, value, null_check)
    308             if NULL in serialized:
    309                 continue

~/Github/PynamoDB/pynamodb/attributes.py in _serialize_value(attr, value, null_check)
    284             serialized = None
    285         else:
--> 286             serialized = attr.serialize(value)
    287
    288         if serialized is None:

~/Github/PynamoDB/pynamodb/attributes.py in serialize(self, values)
    940             else:
    941                 attr_key = _get_key_for_serialize(v)
--> 942             rval.append({attr_key: attr_class.serialize(v)})
    943         return rval
    944

~/Github/PynamoDB/pynamodb/attributes.py in serialize(self, values)
    797         # if not raw MapAttribute, use the new recursive type check and serialization
    798         if not self.is_raw():
--> 799             rval = self._serialize_container(values, null_check=True)
    800         else:
    801             rval = {}

~/Github/PynamoDB/pynamodb/attributes.py in _serialize_container(cls, attr_container, null_check)
    305                 serialized = cls._serialize_container(value, null_check)
    306             else:
--> 307                 serialized = cls._serialize_value(attr, value, null_check)
    308             if NULL in serialized:
    309                 continue

~/Github/PynamoDB/pynamodb/attributes.py in _serialize_value(attr, value, null_check)
    288         if serialized is None:
    289             if not attr.null and null_check:
--> 290                 raise ValueError("Attribute '{0}' cannot be None".format(attr.attr_name))
    291             return {NULL: True}
    292

ValueError: Attribute 'person' cannot be None

Assigned as a list of dicts

In [1]: from pynamodb.attributes import ListAttribute, NumberAttribute, MapAttribute, UnicodeAttribute
      : from pynamodb.models import Model
      :
      : class OfficeEmployeeMap(MapAttribute):
      :     office_employee_id = NumberAttribute()
      :     person = UnicodeAttribute()
      :
      : class Office(Model):
      :     class Meta:
      :         table_name = 'OfficeModel'
      :         region='us-west-2'
      :     office_id = NumberAttribute(hash_key=True)
      :     employees = ListAttribute(of=OfficeEmployeeMap)
      :     test = ListAttribute(of=OfficeEmployeeMap)
      :
      : o=Office(**{'office_id':1, 'employees':[{
      :                'office_employee_id':123,
      :                'person': 'justin'
      :            }]})
      :
      : o.test = [
      :         {
      :             'office_employee_id':123,
      :             'person': 'justin'
      :         },
      :         {
      :             'office_employee_id':125,
      :             'person': 'lita'
      :
      :         },
      :         {
      :             'office_employee_id':126,
      :             # should fail here
      :         }
      :     ]
      :
      : o.save()

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-1-d31c1d3a7264> in <module>
     35     ]
     36
---> 37 o.save()

~/Github/PynamoDB/pynamodb/models.py in save(self, condition, conditional_operator, **expected_values)
    450         """
    451         self._conditional_operator_check(conditional_operator)
--> 452         args, kwargs = self._get_save_args()
    453         if len(expected_values):
    454             kwargs.update(expected=self._build_expected_values(expected_values, PUT_FILTER_OPERATOR_MAP))

~/Github/PynamoDB/pynamodb/models.py in _get_save_args(self, attributes, null_check)
   1189         """
   1190         kwargs = {}
-> 1191         serialized = self._serialize(null_check=null_check)
   1192         hash_key = serialized.get(HASH)
   1193         range_key = serialized.get(RANGE, None)

~/Github/PynamoDB/pynamodb/models.py in _serialize(self, attr_map, null_check)
   1319         """
   1320         attributes = pythonic(ATTRIBUTES)
-> 1321         serialized_container = self._serialize_container(type(self), self, null_check)
   1322         attrs = {attributes: {}}
   1323

~/Github/PynamoDB/pynamodb/attributes.py in _serialize_container(cls, attr_container, values, null_check)
    308                 serialized = {MAP_SHORT: cls._serialize_container(type(value), value, null_check)}
    309             else:
--> 310                 serialized = cls._serialize_value(attr, value, null_check)
    311             if NULL not in serialized:
    312                 attrs[name] = serialized

~/Github/PynamoDB/pynamodb/attributes.py in _serialize_value(attr, value, null_check)
    284             serialized = None
    285         else:
--> 286             serialized = attr.serialize(value)
    287
    288         if serialized is None:

~/Github/PynamoDB/pynamodb/attributes.py in serialize(self, values)
    940             else:
    941                 attr_key = _get_key_for_serialize(v)
--> 942             rval.append({attr_key: attr_class.serialize(v)})
    943         return rval
    944

~/Github/PynamoDB/pynamodb/attributes.py in serialize(self, values)
    798     def serialize(self, values):
    799         if not self.is_raw():
--> 800             return self._serialize_container(type(self), values, null_check=True)
    801
    802         rval = {}

~/Github/PynamoDB/pynamodb/attributes.py in _serialize_container(cls, attr_container, values, null_check)
    304                 value = getattr(values, name)
    305             else:
--> 306                 value = values[name]
    307             if isinstance(value, MapAttribute) and type(value) is not MapAttribute:
    308                 serialized = {MAP_SHORT: cls._serialize_container(type(value), value, null_check)}

KeyError: 'person'

Normal case

In [3]: from pynamodb.attributes import ListAttribute, NumberAttribute, MapAttribute, UnicodeAttribute
      : from pynamodb.models import Model
      :
      : class OfficeEmployeeMap(MapAttribute):
      :     office_employee_id = NumberAttribute()
      :     person = UnicodeAttribute()
      :
      : class Office(Model):
      :     class Meta:
      :         table_name = 'OfficeModel'
      :         region='us-west-2'
      :     office_id = NumberAttribute(hash_key=True)
      :     employees = ListAttribute(of=OfficeEmployeeMap)
      :
      : # Example usage:
      : emp1 = OfficeEmployeeMap(
      :     office_employee_id=123,
      :     person='justin'
      : )
      :
      : emp2 = OfficeEmployeeMap(
      :     office_employee_id=125,
      :     person='lita'
      : )
      :
      : emp4 = OfficeEmployeeMap(
      :     office_employee_id=126,
      :     person='garrett'
      : )
      :
      : o = Office(
      :     office_id=3,
      :     employees=[emp1, emp2, emp4]
      : )

In [4]: o.save()
Out[4]: {'ConsumedCapacity': {'CapacityUnits': 1.0, 'TableName': 'OfficeModel'}}

- MapAttribute and Model share the same serialization and type check logic
- Add new serialization to MapAttribute, while still keeps the old one
- Replace Model serialization and type check with the new serialization
attrs = {}
for name, attr in attr_container.get_attributes().items():
    value = getattr(attr_container, name)
    if isinstance(value, MapAttribute) and type(value) != MapAttribute:
Contributor

!= -> is not?
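
A small sketch of why the identity comparison is usually preferred here: classes are compared by identity with `is`, whereas `!=` goes through the metaclass's comparison methods, which a metaclass could in principle override. The classes below are hypothetical stand-ins, not PynamoDB's own.

```python
class MapAttribute:
    """Hypothetical stand-in for a base container class."""


class OfficeEmployeeMap(MapAttribute):
    """Hypothetical typed subclass."""


def is_typed_subclass(value):
    # isinstance() accepts the base class and any subclass; the identity
    # check then excludes instances of the bare base class itself.
    return isinstance(value, MapAttribute) and type(value) is not MapAttribute


print(is_typed_subclass(MapAttribute()))       # False
print(is_typed_subclass(OfficeEmployeeMap()))  # True
```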

@classmethod
def _serialize_container(cls, attr_container, null_check=True):
    """
    Retrive attribute container recursively and do null check at the same time.
Contributor

Retrieve

@@ -1318,17 +1318,12 @@ def _serialize(self, attr_map=False, null_check=True):
:param null_check: If True, then attributes are checked for null
"""
attributes = pythonic(ATTRIBUTES)
# use the new recursive type check and serialization
Contributor

nit - this comment will become obsolete once we merge

    serialized = cls._serialize_container(value, null_check)
else:
    serialized = cls._serialize_value(attr, value, null_check)
if NULL in serialized:
Contributor

nit - I know it's been even worse in the original code, but we can do:

if NULL not in serialized:
    attrs[name] = serialized

continue

rval[attr_name] = {attr_key: serialized}
# if not raw MapAttribute, use the new recursive type check and serialization
Contributor

re "new" - comment will be obsolete once merged

rval[attr_name] = {attr_key: serialized}
# if not raw MapAttribute, use the new recursive type check and serialization
if not self.is_raw():
    rval = self._serialize_container(values, null_check=True)
Contributor

@ikonst ikonst Mar 13, 2019

nit - rval = —> return and then we can save on indent for the next lines

@@ -145,7 +145,6 @@ class MapAttribute(Generic[_KT, _VT], Attribute[Mapping[_KT, _VT]], metaclass=Ma
@overload
def __get__(self: _MT, instance: Any, owner: Any) -> _MT: ...
def is_type_safe(self, key: Any, value: Any) -> bool: ...
def validate(self) -> bool: ...
Contributor

Are we changing API contract here?

Author

good point, what do you suggest here? recursive check as well?

Author

I removed it because the old same level check is no longer used, but can definitely add it back if it changes the behavior too much for certain users.

Contributor

validate and is_type_safe do not start with an underscore, so they're part of the API. It's hard to say how much they're used in practice, but we should probably bump the major version if we remove them. Is it possible to separate that change from what you're doing? Is the previous way, where this method was used during serialization, not suitable anymore?

Member

These should probably have been made private from the beginning, but agreed that we don't want to break the API contract. We can mark these for removal in 5.0, but let's maintain their behavior for now (even if we move the implementation elsewhere)
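
One common way to keep a public method working while signalling its eventual removal is a thin wrapper that emits a `DeprecationWarning` and delegates to the relocated implementation. The sketch below only illustrates that pattern; `MapAttributeSketch` and `_validate` are hypothetical names, and the check it performs is an assumption, not PynamoDB's actual validation logic.

```python
import warnings


class MapAttributeSketch:
    """Hypothetical stand-in; not PynamoDB's MapAttribute."""

    def __init__(self, **values):
        self._values = values

    def _validate(self):
        # Hypothetical private home for the check: every value must be set.
        return all(v is not None for v in self._values.values())

    def validate(self):
        # Public name kept so existing callers keep working.
        warnings.warn(
            "validate() is deprecated and may be removed in a future major release",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self._validate()
```

Callers see the same return value as before, and the warning gives them time to migrate before a major-version bump.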

@chensjlv
Author

thanks for the review, gonna update.

@ikonst
Contributor

ikonst commented Mar 13, 2019

ok, I didn't do a functional review yet, just nitpicking :)

@chensjlv
Author

chensjlv commented Mar 13, 2019

np, this is a big change. Just figured out I need to support ListAttribute's input as a list of dicts as well.

@chensjlv
Author

Going to sleep. We can sync tomorrow.

- it is not about the cls, it is the values
@chensjlv
Author

chensjlv commented Mar 13, 2019

Just added the change to make it work for ListAttribute of raw MapAttributes input as a list of dicts.
Tested on my side with a giant nested model. It is 5 levels deep with some crazy nested ListAttributes and MapAttributes, and it caught all the type errors for non-raw MapAttributes.

Now it works on list of dicts as well

In [1]: from pynamodb.attributes import ListAttribute, NumberAttribute, MapAttribute, UnicodeAttribute
      : from pynamodb.models import Model
      :
      : class OfficeEmployeeMap(MapAttribute):
      :     office_employee_id = NumberAttribute()
      :     person = UnicodeAttribute()
      :
      : class Office(Model):
      :     class Meta:
      :         table_name = 'OfficeModel'
      :         region='us-west-2'
      :     office_id = NumberAttribute(hash_key=True)
      :     employees = ListAttribute(of=OfficeEmployeeMap)
      :     test = ListAttribute(of=OfficeEmployeeMap)
      :
      : o=Office(**{'office_id':1, 'employees':[{
      :                'office_employee_id':123,
      :                'person': 'justin'
      :            }]})
      :
      : o.test = [
      :         {
      :             'office_employee_id':123,
      :             'person': 'justin'
      :         },
      :         {
      :             'office_employee_id':125,
      :             'person': 'lita'
      :
      :         },
      :         {
      :             'office_employee_id':126,
      :             # should fail here
      :         }
      :     ]
      :
      : o.save()

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-1-d31c1d3a7264> in <module>
     35     ]
     36
---> 37 o.save()

~/Github/PynamoDB/pynamodb/models.py in save(self, condition, conditional_operator, **expected_values)
    450         """
    451         self._conditional_operator_check(conditional_operator)
--> 452         args, kwargs = self._get_save_args()
    453         if len(expected_values):
    454             kwargs.update(expected=self._build_expected_values(expected_values, PUT_FILTER_OPERATOR_MAP))

~/Github/PynamoDB/pynamodb/models.py in _get_save_args(self, attributes, null_check)
   1189         """
   1190         kwargs = {}
-> 1191         serialized = self._serialize(null_check=null_check)
   1192         hash_key = serialized.get(HASH)
   1193         range_key = serialized.get(RANGE, None)

~/Github/PynamoDB/pynamodb/models.py in _serialize(self, attr_map, null_check)
   1319         """
   1320         attributes = pythonic(ATTRIBUTES)
-> 1321         serialized_container = self._serialize_container(type(self), self, null_check)
   1322         attrs = {attributes: {}}
   1323

~/Github/PynamoDB/pynamodb/attributes.py in _serialize_container(cls, attr_container, values, null_check)
    308                 serialized = {MAP_SHORT: cls._serialize_container(type(value), value, null_check)}
    309             else:
--> 310                 serialized = cls._serialize_value(attr, value, null_check)
    311             if NULL not in serialized:
    312                 attrs[name] = serialized

~/Github/PynamoDB/pynamodb/attributes.py in _serialize_value(attr, value, null_check)
    284             serialized = None
    285         else:
--> 286             serialized = attr.serialize(value)
    287
    288         if serialized is None:

~/Github/PynamoDB/pynamodb/attributes.py in serialize(self, values)
    940             else:
    941                 attr_key = _get_key_for_serialize(v)
--> 942             rval.append({attr_key: attr_class.serialize(v)})
    943         return rval
    944

~/Github/PynamoDB/pynamodb/attributes.py in serialize(self, values)
    798     def serialize(self, values):
    799         if not self.is_raw():
--> 800             return self._serialize_container(type(self), values, null_check=True)
    801
    802         rval = {}

~/Github/PynamoDB/pynamodb/attributes.py in _serialize_container(cls, attr_container, values, null_check)
    304                 value = getattr(values, name)
    305             else:
--> 306                 value = values[name]
    307             if isinstance(value, MapAttribute) and type(value) is not MapAttribute:
    308                 serialized = {MAP_SHORT: cls._serialize_container(type(value), value, null_check)}

KeyError: 'person'

@chensjlv
Author

chensjlv commented Mar 13, 2019

Also how can I run all the test cases locally? Seems like some need a bit of setup.

Yi Chen added 2 commits March 13, 2019 09:52
but both

so now serialization pass in both attribute type and its value, and
when a MapAttribute is passed in as a dict, we can still use its class's
metadata to check for the values
@chensjlv
Author

I think I fixed all the issues I can think of off the top of my head, so I'm gonna stop committing and wait for suggestions.

@chensjlv
Author

Passed all test cases.

@chensjlv chensjlv changed the title Add serialization and type check to AttributeContainer Implement recursive type checking in PynamoDB Mar 15, 2019
@@ -7,4 +7,4 @@
"""
__author__ = 'Jharrod LaFon'
__license__ = 'MIT'
-__version__ = '3.3.3'
+__version__ = '3.3.13'
Member

Can you remove this?

@@ -143,6 +142,7 @@ def commit(self):
delete_items=delete_items
)
unprocessed_items = data.get(UNPROCESSED_ITEMS, {}).get(self.model.Meta.table_name)
self.pending_operations = []
Member

Why move this? Looks like it may introduce a bug when data is None after the batch write
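
One possible reading of this concern, sketched with plain dicts: if the response is `None`, calling `.get` on it raises, so any statement placed after that call is skipped and the pending operations are never reset. The function names and the `"UnprocessedItems"` key below are assumptions for illustration; this is not the library's `commit` implementation.

```python
def commit_clear_first(batch, data):
    batch["pending"] = []                       # reset before touching data
    return data.get("UnprocessedItems", {})     # raises if data is None


def commit_clear_last(batch, data):
    unprocessed = data.get("UnprocessedItems", {})  # raises if data is None
    batch["pending"] = []                           # never reached on error
    return unprocessed


def pending_after(commit, data):
    """Run commit on a fresh batch and report what is still pending."""
    batch = {"pending": ["put-1"]}
    try:
        commit(batch, data)
    except AttributeError:
        pass  # NoneType has no .get
    return batch["pending"]


print(pending_after(commit_clear_first, None))  # []
print(pending_after(commit_clear_last, None))   # ['put-1']
```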

if not self.is_raw():
return type(self)(**deserialized_dict)
return deserialized_dict
if values:
Member

Why did you need this check?

@@ -272,6 +272,56 @@ def _set_attributes(self, **attributes):
raise ValueError("Attribute {0} specified does not exist".format(attr_name))
setattr(self, attr_name, attr_value)

@staticmethod
def _serialize_value(attr, value, null_check=True):
Member

can you add type stubs for these methods in the .pyi file?

else:
_null_check = null_check
if value is None and attr.default is not None:
if attr.default:
Member

Is this check necessary given the above line? Wouldn't it break defaults of [] or False?
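
The question can be illustrated with a standalone sketch. `resolve_truthy` mirrors the shape of the reviewed lines, and `resolve_not_none` drops the inner truthiness test; both are hypothetical helpers written for this example, not functions from the PR.

```python
def resolve_truthy(value, default):
    # Same shape as the reviewed lines: the inner truthiness test means
    # falsy defaults such as [] or False are silently skipped.
    if value is None and default is not None:
        if default:
            value = default() if callable(default) else default
    return value


def resolve_not_none(value, default):
    # Only "default is not None" decides whether a default applies.
    if value is None and default is not None:
        value = default() if callable(default) else default
    return value


print(resolve_truthy(None, False))    # None
print(resolve_not_none(None, False))  # False
print(resolve_truthy(None, []))       # None
print(resolve_not_none(None, []))     # []
```

A callable default such as `list` is truthy, so both variants return a fresh `[]` in that case; the difference only shows up for falsy literal defaults.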

_null_check = null_check
if value is None and attr.default is not None:
if attr.default:
value = attr.default() if callable(attr.default) else attr.default
Member

What's the rationale behind assigning defaults here? Why do we need a second place to do this (the first being _set_defaults)?

@chensjlv
Author

WOW, I was wondering if this PR will ever be looked at. I will look into this again when I have time. We are still using a forked 3.x version for our product.
