Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed issue #5926: Ensure that html forms (multipart form data) respe… #5927

Merged
merged 4 commits into from Apr 20, 2018
Merged

Fixed issue #5926: Ensure that html forms (multipart form data) respe… #5927

merged 4 commits into from Apr 20, 2018

Conversation

anx-ckreuzberger
Copy link
Contributor

@anx-ckreuzberger anx-ckreuzberger commented Apr 9, 2018

Fixes #5926 Fixes #5807

When making multi-part form-data requests to a DRF serializer with a nested list (serializer), utils.html.parse_html_list() always returned an empty list for not provided list-fields (see #5926 for more details).

I fixed this by adding a default parameter to parse_html_list, which is returned if len(ret.keys()) == 0.

I've also added a failing test case in test_serializer_nested.py, which works with my fix but doesn't work with the current master.

@anx-ckreuzberger
Copy link
Contributor Author

I've added the tests written by @awbacker of PR #5812 with his permission.

This should increase test coverage for ListFields.

Copy link
Collaborator

@carltongibson carltongibson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @anx-ckreuzberger.

Thanks for the effort here: both the analysis and the fix.

This looks correct to me. I'd guess no-one is deliberately using the current behaviour so I'll mark it for v3.8.3.

I'm just ask @tomchristie to have a look in case he has anything to add.

@@ -12,7 +12,7 @@ def is_html_input(dictionary):
return hasattr(dictionary, 'getlist')


def parse_html_list(dictionary, prefix=''):
def parse_html_list(dictionary, prefix='', default=[]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this argument? At all call sites we're passing empty. (Maybe there are third-party users. This would maintain BC for such users.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two (and a half) reasons why I implemented it this way

Reason 1: Compatibility

The way it used to be was this:

def parse_html_list(dictionary, prefix=''):
    ret = {}
    # magic happens here :)
    return [ret[item] for item in sorted(ret)]

Now if the ret dict being empty, parse_html_list returns an empty list.

As you already said, the default=[] would help to avoid a breaking change for 3rd party libraries (and to be fair, parse_html_list

Reason 2 (and a half): Codestyle and DRY

The most obvious alternative would be this:

def parse_html_list(dictionary, prefix=''):
    ret = {}
    # magic happens here :)
    from rest_framework.fields import empty
    return [ret[item] for item in sorted(ret)] if len(ret.keys()) > 0 else empty

However, we can (should) not import empty at the top of utils/html.py, as it is a utility file, and it will lead to cyclic imports. Hence we would have to import it within the parse_html_list function. And that just looks wrong to me from multiple perspectives (code style, complexity, expectations on code).

The other (less obvious and less dry) alternative is to look at all call sites of parse_html_list and change the call from

return html.parse_html_list(dictionary, prefix=self.field_name, default=empty)

to

ret_val = html.parse_html_list(dictionary, prefix=self.field_name)
return ret_val if len(ret_val) == 0 else empty

We would have to do that at several places within the code, but it seems okay for me.

Also, I want to address your statement:

At all call sites we're passing empty

Actually, I deliberately did not touch the ListField.to_internal_value method here:
https://github.com/anx-ckreuzberger/django-rest-framework/blob/3d51bcf54ea84e5030db01bb6dbd274abbc4d0bc/rest_framework/fields.py#L1621-L1631

I am unsure if returning empty for the ListField here would cause any issues or not. I only modified those calls to parse_html_list, where to parent method returns empty in certain cases anyway, such as here:
https://github.com/anx-ckreuzberger/django-rest-framework/blob/3d51bcf54ea84e5030db01bb6dbd274abbc4d0bc/rest_framework/fields.py#L1606-L1619

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks @anx-ckreuzberger. That all make sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_html_list isn't public API, and isn't subject to the deprecation policy. We certainly shouldn't use mutable arguments there either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally forgot that using [] as a mutable argument is a bad idea.

What should we do about parse_html_list? I can just switch the default=[] to default=None and I handle this differently in ListField.to_internal_value?

https://github.com/anx-ckreuzberger/django-rest-framework/blob/3d51bcf54ea84e5030db01bb6dbd274abbc4d0bc/rest_framework/fields.py#L1621-L1631

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what I'll overwrite with default=list for the function calls means exactly. I don't think there's any reason we should be using the base list type here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I didn't get my question across to you :) Would you rather want to see

def parse_html_list(dictionary, prefix='', default=list):
    if default is list:
        default = list()

or

def parse_html_list(dictionary, prefix='', default=None):
    if default is list:
        default = list()

edit:
or any of the above, but instead of list() we use []

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither. I don't think if default is list makes sense - why would we be passing the bare type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the comment of @carltongibson to avoid having a mutable as a default parameter.

You could pass list itself. Then:

if default is list:
    default = list()

This allows using None as an actual default.

But I'll just set the default to None, and discard the if default is list idea :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

"""
When there are no keys passed, there is no default, and required=False
The field should return an array of 1 item, blank
* Not sure if this is desired behavior, but it is logical at least
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just have to decide that this is the expected behaviour. Anybody running in to it as wrong is welcome to implement a custom solution.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this. If you pass the ?key= there is no way to know if you meant 'blank string' or 'no value'. In the case you want a default value when that happens, you should need to set the default. If we could say "blank is not a valid value" then that would change things, but I highly doubt that would happen. Maybe a separate option allow_blank_str

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to add an option. We're far enough down the rabbit hole that we'll wait for people to hit problems before we try and address them. (Chances are that never occurs.)

I'd be happy if we just remove the last line of the docstring here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'll remove the last line of the docstring here! Thanks @awbacker for providing more infos :)

@carltongibson carltongibson added this to the 3.8.3 Release milestone Apr 13, 2018
@@ -59,7 +61,7 @@ def parse_html_list(dictionary, prefix=''):
ret[index][key] = value
else:
ret[index] = MultiValueDict({key: [value]})
return [ret[item] for item in sorted(ret)]
return [ret[item] for item in sorted(ret)] if len(ret.keys()) > 0 else default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we just have return [...] if ret else default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unsure if an empty dictionary would always be False (for both, python2 and python3), but it seems to be the case.
I'll rewrite that statement.

@anx-ckreuzberger
Copy link
Contributor Author

Alright, I've updated the files according to the review. Thanks everyone for the help!

  • parse_html_list no longer uses a mutable [] as a default param
  • ListField.to_internal_value calls parse_html_list with default=[] so it still works as it used to work

@carltongibson
Copy link
Collaborator

ListField.to_internal_value calls parse_html_list with default=[] so it still works as it used to work

Yes. Good. I think I had in mind (for no sensible reason) that you'd set the default as None but then return a list. (This would of course be silly.) This way is much better.

Thanks for the effort @anx-ckreuzberger!

@anx-ckreuzberger
Copy link
Contributor Author

Is there anything left to do for me, before weekend hits?

@carltongibson
Copy link
Collaborator

Hi @anx-ckreuzberger. No. I think you're good. 🙂. I'm basically happy to merge it but was just sitting with it for a while. (It's on my list for tomorrow really.) Thanks for the effort!

@tomchristie
Copy link
Member

I guess what might be helpful from my POV is a quick description in this thread explaining which of each of those four cases uses default=empty vs. which use default=[]. I'm happy with how the changes look codewise, but it's not obvious to me from a glance exactly why their different in different cases, or how to review if they're being set to the correct default in each case.

In other words, an explanation of why each of these four cases has the correct default now...

@tomchristie
Copy link
Member

And thanks so much for you work on this, too - it's looking good.

@anx-ckreuzberger
Copy link
Contributor Author

Will do those explanations tonight :)

@anx-ckreuzberger
Copy link
Contributor Author

I'm describing all calls of html.parse_html_list in the order that @tomchristie requested.

TL;DR: I found a potential issue in ListSerializer.to_internal_value while reviewing my changes, so please hold on merging.

1: rest_framework/fields.py - ListField

ListField.get_value

Here we are using return html.parse_html_list(dictionary, prefix=self.field_name, default=empty) to copy the behaviour for the case that the input is not a multipart-form (html input), but JSON/XML. Look at the following source code:

https://github.com/anx-ckreuzberger/django-rest-framework/blob/1cfd42d46c13bc2ad7639cc8e642ee79e8e30af7/rest_framework/fields.py#L1606-L1619

It says return dictionary.get(self.field_name, empty), meaning that if the requested field is not listed in the dictionary, it should return empty. So when parse_html_list is not able to find the requested field, it will also return empty.

ListField.to_internal_value

Here we are using data = html.parse_html_list(data, default=[]) (used to be data = html.parse_html_list(data). This part would have returned an empty list before my changes. We've changed the behaviour of parse_html_list, such that it would return None instead of an empty list, hence we are overriding this by setting the param default=[].
Furthermore, this method requires to always return a list (or something similar to a list).

2. rest_framework/serializers.py - ListSerializer

ListSerializer.get_value

Here we are using return html.parse_html_list(dictionary, prefix=self.field_name, default=empty).
Same as with ListField.get_value, ListSerializer.get_value returns emptyin the case of non multipart-form:
return dictionary.get(self.field_name, empty)
So we are just making sure that the behaviour is the same here.

ListSerializer.to_internal_value

Here are using data = html.parse_html_list(data, default=empty). This method of ListSerializer expects a list, else it will raise an Exception:
https://github.com/anx-ckreuzberger/django-rest-framework/blob/1cfd42d46c13bc2ad7639cc8e642ee79e8e30af7/rest_framework/serializers.py#L640-L646

I am a little bit unsure on why I am having default=empty here instead of default=[] (I guess it should be the same as in ListField.to_internal_value).
I will try to verify that tomorrow, so please hold on merging until I can confirm that this behaviour is right, or wrong. Maybe I'll have to add a unit test for it.

@anx-ckreuzberger
Copy link
Contributor Author

I was right in my last comment. ListSerializer.to_internal_value should not use default=empty, but default=[].

I managed to create a failing test for my implementation of ListSerializer.to_internal_value.

class TestEmptyListSerializer:
    def setup(self):
        class ExampleListSerializer(serializers.ListSerializer):
            child = serializers.IntegerField()

        self.Serializer = ExampleListSerializer

    def test_nested_serializer_with_list_json(self):
        input_data = []

        serializer = self.Serializer(data=input_data)

        assert serializer.is_valid()
        assert serializer.validated_data == []

    def test_nested_serializer_with_list_multipart(self):
        input_data = QueryDict('')
        serializer = self.Serializer(data=input_data)

        assert serializer.is_valid()
        assert serializer.validated_data == []

I can confirm that this test works with DRF 3.8.2 (before my changes), and my changes break this test.

To fix it, I just created another commit where I change ListSerializer.to_internal_value such that it uses data = html.parse_html_list(data, default=[]) (instead of default=empty).

@carltongibson
Copy link
Collaborator

Hi @anx-ckreuzberger. Thanks for this. It looks great.

@carltongibson carltongibson merged commit f148e4e into encode:master Apr 20, 2018
@rpkilby rpkilby modified the milestones: 3.8.3 Release, 3.9 Release Aug 29, 2018
pchiquet pushed a commit to pchiquet/django-rest-framework that referenced this pull request Nov 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants