Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add key-value to schema.Dict fails silently #393

Closed
masipcat opened this issue Dec 21, 2018 · 36 comments
Closed

Add key-value to schema.Dict fails silently #393

masipcat opened this issue Dec 21, 2018 · 36 comments

Comments

@masipcat
Copy link
Contributor

Hi! Here is my weekly issue/pr :P

I'm opening this issue because I found something weird with fields of type dict with default values (maybe list are affected as well)

Example

I have a custom content-type like this:

class ICustomContentType(IItem):
    images = schema.Dict(
        key_type=schema.TextLine(),
        value_type=schema.TextLine(),
        required=False,
        default={}
    )

And I implemented an endpoint that adds a key-value pair to field images.

@configure.service(
    method="PATCH",
    name="@update-dict",
    permission="guillotina.AddContent"
)
async def update_default_dict(context, request):
    context.images['new-key'] = 'value'
    request._txn.register(context)
    assert len(context.images) == 0  # wat?

After this endpoint retrieve the object and the field images is {}.

BUT if I replace the previous code with context.images = {'new-key': 'value'} then everything works fine.

Is this a bug or I'm not using default value of field properly?

Reproduce

I implemented a test that reproduces the problem: masipcat@9be9d11

Thank you!

@vangheem
Copy link
Member

Isn't that because using default of {} means that's like a global. Can you change that to defaultFactory=dict and see if that works?

@masipcat
Copy link
Contributor Author

masipcat commented Dec 21, 2018

@vangheem still fails :( I thought that default={} was somehow "translated" internally to defaultFactory=dict

@vangheem
Copy link
Member

functionally, sure; however, if you think about just regular python, providing a dict as a default param is dangerous.

For example, you should never do this:

def foobar(opt={}):
  opt['foobar'] = 5  # this value will stay in the default param forever

You should do this instead:

def foobar(opt=None):
  if opt is None:
    opt = {}

That was my point.

@masipcat
Copy link
Contributor Author

@vangheem yeah, I know, learned this the hard way in the past... hehe.
Also, I thought that guillotina was creating a copy of the default every time __getattr__ is called to avoid sharing the same reference between objects.

@vangheem
Copy link
Member

Ahh, you're right. I forgot.

@masipcat
Copy link
Contributor Author

Hey, I think I found the problem

160  	async def update_default_dict(context, request):
161  	    context.images['new-key'] = 'value'
162  	    import pdb; pdb.set_trace()
163  ->	    request._txn.register(context)
164  	    assert len(context.images) == 0  # wat?
(Pdb) id(context.images)
140677027687088
(Pdb) id(context.images)
140677020557888
(Pdb) id(context.images)
140677027688240
(Pdb) id(context.images)
140677020557960
(Pdb) id(context.images)
140677020558032

@masipcat
Copy link
Contributor Author

(Replying your message I inspired myself)

@vangheem
Copy link
Member

ahh right, that's a terrible side affect of the default value on __getattr__ implementation.

Argh, not sure what the right answer is here.

If we set the value here: https://github.com/plone/guillotina/blob/master/guillotina/content.py#L258

with say self.__dict__[name] = value

Then you won't be able to change the default value later if you've never modified the value.

I know we've talked about this before and decided to leave current behavior.

@masipcat for your case, you're better off leaving default as None and doing a code in your code if it is none and then setting it manually to {}. Make sense?

\cc @bloodbare

@ale-rt
Copy link
Member

ale-rt commented Dec 21, 2018

I also stumble on this in the past and the most reasonable solution I found was to provide a custom __getattr__ that overrides the dexterity one:

    def __getattr__(self, name):
        super_value = super(MyClass, self).__getattr__(name)
        if name == 'my_field' and not super_value:
            self.my_field = super_value
            safeWrite(self)
        return super_value

This will make your default persist once it is looked up (it might be a write on read).

@vangheem
Copy link
Member

@ale-rt yup, actually guillotina enforces not allowing writes on GET requests though.

Also has the negative side affect of not being able to latently change a default value.

@masipcat
Copy link
Contributor Author

masipcat commented Dec 21, 2018

@masipcat for your case, you're better off leaving default as None and doing a code in your code if it is none and then setting it manually to {}. Make sense?

@vangheem I can change my code as a temporary fix and ensure that this field is always set when created but IMHO the current behavior doesn't make sense for mutable objects. I'd suggest to:

  1. set the attribute on read
    or
  2. don't allow to set a default value for mutable objects...

@vangheem
Copy link
Member

I think I'm leaning toward #2.

@masipcat
Copy link
Contributor Author

  1. A third option could be implement a proxy for lists and dicts that does a set-on-write. I don't know about the complexity of this...

@vangheem
Copy link
Member

I'm actually most in favor of the simplicity of option #1 but I know we refrained from doing that before.

If we can convince @bloodbare that we're okay with #1 :)

Also, anyone have an opinion on what kind of version bump would be required for something like this? Depending on how we look at this as internal implementation vs part of the API affects if this would require major/minor/patch number...

@masipcat
Copy link
Contributor Author

Here is my try to convince @bloodbare:

I understand the default attribute as in SQL: if no value is provided for a field during creation, we store the default value instead of NULL.

Also, I understand that the current behavior makes sense for some cases. So I'd suggest something like this:

class ...(Internface):

    default_value = schema.Placeholder(
        value={},
        object=schema.Dict()
    )

Maybe the syntax doesn't make sense but the idea is to provide a mechanism to say explicitly "if no value is set to this field, return this value instead".

@jordic
Copy link
Contributor

jordic commented Dec 21, 2018

the behavior of providing a default value in case of a null value seems a bit wired to me (at least sounds like unexpected behaviours.. nothing on the object.. but a value at runtime)

I think I have had some issues with something like this on the past.

Just my opinion is that if something is provided as default this has to be stored on the object.. (not transformed later at runtime)

@jordic
Copy link
Contributor

jordic commented Dec 21, 2018

@vangheem not sure if I'm wrong.. but is possible that we fixed something like thia on the registry?

@vangheem
Copy link
Member

@jordic I agree.

@masipcat we do already have the "missing_value" functionality. We could just tweak __getattr__ to use missing_value if provided. If default is provide, then it will make sure to also do a setattr so it doesn't hit __getattr__ next time around and is stored on the object. I actually think then that'd be the "correct" behavior.

@jordic @bloodbare what are your thoughts on ^^^

@masipcat
Copy link
Contributor Author

@vangheem I like this solution!!

@jordic
Copy link
Contributor

jordic commented Dec 21, 2018

sounds good to me!

@bloodbare
Copy link
Member

bloodbare commented Dec 21, 2018

@vangheem I don't understand why a setattr needs to happen when we do a getattr:

On a ro operation (GET) getting always the default value or the default factorized value should be enough. In my opinion, the main value of default values are migrations and upgrades of the schema, so delaying setting the value of a field on the real object its a great feature. Also keeping the size of the db small is important, if you have 20M documents a simple default field serialized is a huge amount of disc space.

On a rw operation (POST/PATCH) getting the default value or the default factorized value enables to assume that a list exists and just append to it, so it's important that in that case, we provide a mechanism that saves the recently created object. Cloud we make that if comes from defaultFactory we save on the object after getting on rw operations? If there is no defaultFactory we just return the default value (that should be used for immutable values) or missing_value. (We should also find how to not follow on that situation when we are serializing all the object as result of the operation doing a get to all fields and saving all the default values).

Is likely what you said?

R

@jordic
Copy link
Contributor

jordic commented Dec 21, 2018

The problem here is about ergonomics. As an example.. you have an schema and you add a new field but you miss to evolve it throught a migration, you end up with something that exists but fails when you try to patch it.. because it doesn't exists (it exisits because you are getting the default)this feels a bit wired (because you end with inconsistent state due a performance optimitzation?) ... correct me if I'm wrong :)

@jordic
Copy link
Contributor

jordic commented Dec 21, 2018

perhaps it's as easy as fail on boot saying that you have inconsitent schema changes.. or something like this.. but this case ahould fail explicit

@bloodbare
Copy link
Member

@jordic if you add a new field why you need to evolve it? A migration should only be needed if you are indexing it on ES, no?

If you modify a field that exists from one type to another, you MUST do a migration

If you modify the default value of a field and with the actual behavior you don't need to do anything.

@masipcat
Copy link
Contributor Author

In my opinion, the main value of default values are migrations and upgrades of the schema, so delaying setting the value of a field on the real object its a great feature.

@bloodbare If that's the purpose of default, maybe we could leave the default as it is. Then we could say that the purpose of missing_value is to set a value when is not provided. Most of the time this will happen during creation but could happen as well when doing __getattr__() for an object created before the field was defined.

@jordic
Copy link
Contributor

jordic commented Dec 21, 2018

for the first case.. because it has a default that when you GET, it works.. but if you PATCH it doesn't exists on the instance.. (is what i think).

for the second case it's something not explicit.. neither well documented and my thought on it is that it should be explicit and fail at least at startup.

@bloodbare
Copy link
Member

@masipcat I'm ok with having a field attribute that will set up its value or a factory to set the value (maybe should be a factory as this case mostly affects mutable objects). Should be on creation or on the attribute getter, agree. What I don't like is that, in any case, writing on the DB needs to be done on a GET operation.

@masipcat
Copy link
Contributor Author

What I don't like is that, in any case, writing on the DB needs to be done on a GET operation.

@bloodbare Can we prevent marking the object as dirty when the request method is get? Or check the method of the current request in the getattr to decide what to do . I don't know...

@vangheem
Copy link
Member

vangheem commented Dec 22, 2018 via email

@vangheem
Copy link
Member

I actually think simply doing a setattr when the default __getattr__ is called will work just fine.

With any deserialization, we only ever do what is being deserialized. We won't be storing complete default values on content.

On GET/serialization, we aren't saving the object.

@masipcat
Copy link
Contributor Author

@vangheem So the fix would be just this line? masipcat@ec561f9

@vangheem
Copy link
Member

Yup, that should work. Regular "setattr(self, name,value)" works too!

@masipcat
Copy link
Contributor Author

I'll change the line with setattr(self, name,value). Green light to do a PR with this solution? cc/ @bloodbare @vangheem

@vangheem
Copy link
Member

I'm fine with it :)

@bloodbare
Copy link
Member

I'm also fine with it! Sorry to missunderstood that besides setattr we were going to do _p_register.

R

@masipcat
Copy link
Contributor Author

@vangheem: Also, anyone have an opinion on what kind of version bump would be required for something like this? Depending on how we look at this as internal implementation vs part of the API affects if this would require major/minor/patch number...

@bloodbare ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants