Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instances share objects after copy or created from unpacking a dict #249

Closed
gangefors opened this issue Aug 25, 2018 · 7 comments
Closed

Comments

@gangefors
Copy link
Contributor

gangefors commented Aug 25, 2018

This is related to #154 and I'm using pydantic v0.12.1

Here is a simple example to show what I mean.

>>> class Model(pydantic.BaseModel):
...     my_dict: dict
...     
>>> model1 = Model(**{'my_dict': {'my_list': [1,2,3]}})
>>> model1
<Model my_dict={'my_list': [1, 2, 3]}>
>>> model2 = model1.copy()
>>> model2
<Model my_dict={'my_list': [1, 2, 3]}>
>>> model2.my_dict['my_list'] = []
>>> model2
<Model my_dict={'my_list': []}>
>>> model1
<Model my_dict={'my_list': []}>

Another example when created by unpacking a dictionary.

>>> my_dict = {'my_dict': {'my_list': [1,2,3]}}
>>> model1 = Model(**my_dict)
>>> model1
<Model my_dict={'my_list': [1, 2, 3]}>
>>> model2 = Model(**my_dict)
>>> model2
<Model my_dict={'my_list': [1, 2, 3]}>
>>> model2.my_dict['my_list'] = []
>>> model1
<Model my_dict={'my_list': []}>
>>> model2
<Model my_dict={'my_list': []}>

BaseModel.dict() seems to create copies correctly and can be used instead of .copy() to create a new object as seen with model3 in this example.

>>> model1 = Model(**{'my_dict': {'my_list': [1,2,3]}})
>>> model1
<Model my_dict={'my_list': [1, 2, 3]}>
>>> model2 = model1.copy()
>>> model2
<Model my_dict={'my_list': [1, 2, 3]}>
>>> model3 = Model(**model1.dict())
>>> model3
<Model my_dict={'my_list': [1, 2, 3]}>
>>> model2.my_dict['my_list'] = []
>>> model1
<Model my_dict={'my_list': []}>
>>> model2
<Model my_dict={'my_list': []}>
>>> model3
<Model my_dict={'my_list': [1, 2, 3]}>

I think this is quite serious since you can't know if you are working with a true copy of a model or not.

I see two things that need to be fixed.

.copy() should create a deepcopy that doesn't share any objects with the model it was copied from. User created classes have to support deepcopy operations to work correctly.

BaseModel(**dict) need to make sure that all items in the dict are deepcopied from the unpacked dictionary.

@samuelcolvin
Copy link
Member

Hi, thanks for reporting.

I agree entirely that .copy() should take a deep copy.

BaseModel(**dict) is a more nuanced case. In 99% of cases people wouldn't care about coping the values and copying the values will be significantly slower. Example take the case (roughtly aiohttp but any other web framework and many other cases would be the case):

async def my_view(request):
    try:
        m = Model(**await request.json())
    except ValidationError as e:
        raise BadRequestError(**e.dict())
    ... do stuff with m

This is the primary use case for pydantic (at least for me) and I want it to remain as performant as possible (both in terms of CPU and memory). I really don't want to copy all the data from the result of .json() here.

We also can't add keyword arguments to Model.__init__() for fear of conflicting with input data. Instead I would suggest we add a class method Model.parse_copy().

For the same kinds of reason I don't really want .dict() to make a copy by default (for example .json() uses .dict() and definitely wouldn't require a copy) but here we could add an argument to deep copy the data

Does that make sense? Would you be willing to submit a PR for some or all of this?

@samuelcolvin
Copy link
Member

BTW, you can reference other issues with #<issue number> you don't have to create a markdown link to the issue, this also has the advantage of showing the reference on the referenced issue.

@gangefors
Copy link
Contributor Author

Totally agree that preformance is key.

If a true copy is needed at initialization the user can do that without the need for a seperate method. And a user can always do a deepcopy after converting dict(). So let's just focus on the initial problem I had, the copy() method (I noticed the other behaviours while testing).

Do you want to change the way copy() works or should we add a deepcopy() method to seperate the two?
Shallow copies might be enough for many use cases so changing that behaviour might not be the best call.

@samuelcolvin
Copy link
Member

I think we add a deep=False argument to copy() and are explicit in the changelog and docs about what's happening.

I doubt many people actually use copy() directly.

@gangefors
Copy link
Contributor Author

We use it during unit test where we want to have a base data model to start with and then just change what's needed for the test. That's how I ran into this issue, made a copy of the data, first test emptied a list in a dict and then the next test had nothing to check for despite copying the base data.

So I do think there is good use for it.

I'll look into doing a PR for this. Don't know then I can finish it though, but will try to have something for review in a week or so.

@samuelcolvin
Copy link
Member

Great, thanks so much. I agree it's useful.

@gangefors
Copy link
Contributor Author

Took a little longer than expected before I had time to do this, but now it's done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants