Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parameter without removing duplicated value in query string #126

Conversation

rennerocha
Copy link
Contributor

This should fix #125.

@codecov
Copy link

codecov bot commented Mar 27, 2019

Codecov Report

Merging #126 into master will increase coverage by 0.06%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #126      +/-   ##
==========================================
+ Coverage    95.3%   95.37%   +0.06%     
==========================================
  Files           7        7              
  Lines         469      476       +7     
  Branches       93       97       +4     
==========================================
+ Hits          447      454       +7     
  Misses         15       15              
  Partials        7        7
Impacted Files Coverage Δ
w3lib/url.py 97.98% <100%> (+0.07%) ⬆️

@Gallaecio
Copy link
Member

@rennerocha Could you add tests for the case where the key you are adding or replacing already exists among the parameters? Specially when it exists twice in the existing URL. I think only the first match is replaced, I’m not sure whether or not that is intended.

@rennerocha
Copy link
Contributor Author

You are right: add_or_replace_parameters("http://domain/test?arg1=v1&arg2=v2&arg1=v3", {"arg1": "v4") will return http://domain/test?arg1=v4&arg2=v2&arg1=v3 .

Not sure what should be the correct behavior. Another option is remove the duplicates and return http://domain/test?arg1=v4&arg2=v2, but I believe that removing existing parameters is exactly what we want to avoid.

Any other thoughts?

@Gallaecio
Copy link
Member

Well, given the name of the function and the previous behavior, I suspect that we should remove all existing parameters matching the specified name.

@rennerocha
Copy link
Contributor Author

@Gallaecio Did the change as you suggested. The only drawback is that it will not be possible to add duplicated values using this function, as the input is a dictionary.

w3lib/url.py Outdated
del params[name]
elif name not in changing_params:
new_args.append((name, value))
new_args.extend([(name, value) for name, value in params.items()])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for list comprehension here, only new_args.extend(params.items()) is enough.
You can use list(params.items()) to make it explicit.
Also, new_args += params.items() would be more pythonic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I removed the side-effect over the input parameters, I need this list comprehension to filter the values that I don't want (the modified ones). Replaced .extend to += anyway.

w3lib/url.py Outdated
new_args = OrderedDict(args)
new_args.update(params)
new_args = []
changing_params = list(params.keys())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it converted to a list? a set would be more correct and .keys() method call is not required: changing_params = set(params).

w3lib/url.py Outdated
new_args = []
changing_params = list(params.keys())
for name, value in args:
if name in params.keys():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to get the list of .keys(), it should be possible to test like if name in params if params is a dict.

w3lib/url.py Outdated
for name, value in args:
if name in params.keys():
new_args.append((name, params[name]))
del params[name]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think mutating params is a good idea, I'd prefer to flag seen params in another set and ignore them on line 215.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something in the line of

  def _add_or_replace_parameters(url, params):
      parsed = urlsplit(url)
      args = parse_qsl(parsed.query, keep_blank_values=True)
      acc = []
      seen = set()
      for k, v in args:
          if k in params:
              if k not in seen:
                  acc.append((k, params[k]))
              seen.add(k)
          else:
              acc.append((k, v))

      acc += ((k, v) for k, v in params.items() if k not in seen)
      query = urlencode(acc)
      return urlunsplit(parsed._replace(query=query))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think mutating params is a good idea, I'd prefer to flag seen params in another set and ignore them on line 215.

As the input argument is a dict, it makes sense that it shouldn't be modified by the function. I included a test case to avoid this scenario as well.

@codecov-io
Copy link

Codecov Report

Merging #126 into master will increase coverage by 0.06%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #126      +/-   ##
==========================================
+ Coverage    95.3%   95.37%   +0.06%     
==========================================
  Files           7        7              
  Lines         469      476       +7     
  Branches       93       97       +4     
==========================================
+ Hits          447      454       +7     
  Misses         15       15              
  Partials        7        7
Impacted Files Coverage Δ
w3lib/url.py 97.98% <100%> (+0.07%) ⬆️

@codecov-io
Copy link

codecov-io commented Jul 4, 2019

Codecov Report

Merging #126 into master will decrease coverage by 0.11%.
The diff coverage is 92.85%.

@@            Coverage Diff             @@
##           master     #126      +/-   ##
==========================================
- Coverage    95.3%   95.19%   -0.12%     
==========================================
  Files           7        7              
  Lines         469      479      +10     
  Branches       93       98       +5     
==========================================
+ Hits          447      456       +9     
  Misses         15       15              
- Partials        7        8       +1
Impacted Files Coverage Δ
w3lib/url.py 97.52% <92.85%> (-0.4%) ⬇️

@rennerocha
Copy link
Contributor Author

@dangra and @ejulio I did some changes after your comments. Could you please review it again?

w3lib/url.py Outdated
if name in params:
if name not in seen_params:
new_args.append((name, params[name]))
seen_params.add(name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💄 It does not matter much, but…

Suggested change
seen_params.add(name)
seen_params.add(name)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's good 💅 :)

w3lib/url.py Outdated
if name not in seen_params:
new_args.append((name, params[name]))
seen_params.add(name)
elif name not in changing_params:
Copy link
Member

@Gallaecio Gallaecio Jul 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was checking the coverage data, and I may be missing something, but I think this could be replaced by an else, and the changing_params variable may be unnecessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else block should be fine per #126 (comment) and
no need for changing_params either

@Gallaecio
Copy link
Member

@rennerocha We may have a new release soon 😀. Please let us know if you think you’ll have time to address #126 (comment) in the following days.

@rennerocha
Copy link
Contributor Author

@rennerocha We may have a new release soon grinning. Please let us know if you think you’ll have time to address #126 (comment) in the following days.

Hi! Yes, I will work on this comment in the next days. Hopefully on time for the next release!

@rennerocha
Copy link
Contributor Author

@Gallaecio I removed the nested conditional. I believe it is clearer now. Could you please review it again?

@Gallaecio
Copy link
Member

@dangra @ejulio Can you guys have a look at the latest changes, see if this is ready to merge?

@kuza
Copy link

kuza commented Oct 7, 2019

Hi guys @dangra @ejulio @Gallaecio, how about merge this pull?

@Gallaecio Gallaecio merged commit 165f165 into scrapy:master Oct 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in NEW add_or_replace_parameter (w3lib 1.20.0)
7 participants