Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Backwards compatible per key priorities #1586

Merged

Conversation

@jdemaeyer
Copy link
Contributor

@jdemaeyer jdemaeyer commented Nov 6, 2015

This is a partial reversal of #1149 that provides per-key priorities in a backwards compatible fashion.

The backwards incompatible API change in #1149 was that when writing to dictionary settings, their contents were updated, not completely overwritten. This PR brings back the old behaviour of completely discarding the previous contents of the dictionary. If the dictionary setting written to is a BaseSettings instance, it is not overwritten but cleared and then updated (resulting in the same contents, but conserving per-key priorities).

The price for this is:

  • we cannot get rid of the _BASE settings
  • it will again be much harder to disable single components from the command line or from a spider's custom_settings (because disabling a single component requires writing down the complete dictionary without that component). @kmike had the idea that we could add some helper magic for updating (not overwriting) dictionaries, e.g. allow people to do scrapy ... -s update:EXTENSIONS={"blah": null}, or custom_settings: { "update:EXTENSIONS": {"blah": None}}. I'll implement that in a follow-up PR to discuss it and see if it makes unforeseen trouble.

As a bonus, I added backwards compatibility to the build_component_list() utility function, albeit it not being public. @rolando maybe you could check whether it is indeed backwards compatible with how you use it? Also take a look at the now-public BaseSettings.getcomposite() method. It helps in separating a utility function that's meant to convert a dictionary into a sorted list from how that dictionary is created.

@kmike
Copy link
Member

@kmike kmike commented Nov 6, 2015

@jdemaeyer there is a failure in docs check, could you please fix it?

@jdemaeyer jdemaeyer force-pushed the jdemaeyer:fix/backwards-compatible-per-key-priorities branch from 1a35a0c to 0c97a75 Nov 6, 2015
@codecov-io
Copy link

@codecov-io codecov-io commented Nov 6, 2015

Current coverage is 82.89%

Merging #1586 into master will not affect coverage as of 5049915

Powered by Codecov. Updated on successful CI builds.

@jdemaeyer
Copy link
Contributor Author

@jdemaeyer jdemaeyer commented Nov 6, 2015

ugh damnit, thanks, fixed.

@rmax
Copy link
Contributor

@rmax rmax commented Nov 6, 2015

@jdemaeyer yes, our code works as expected on this PR. Thanks.

@jdemaeyer jdemaeyer mentioned this pull request Nov 9, 2015
37 of 37 tasks complete
@kmike
Copy link
Member

@kmike kmike commented Nov 9, 2015

I'm checking https://github.com/scrapy/scrapy/compare/4a9f76ebd3242b2f91e238c26ff38874e9ee1960...jdemaeyer:fix/backwards-compatible-per-key-priorities?name=fix%2Fbackwards-compatible-per-key-priorities&short_path=7d64b15#diff-7d64b15bbaac9444c9da295924bf42bf:

  • How to disable download handlers? Is this feature supported? It looks like relevant code and docs are deleted.
  • Are lists in ITEM_PIPELINES supported again after this PR?
  • getcomposite public function: I think it is better to extract "If only one argument is given, a second argument with _BASE appended will automatically be added" part into another function.
  • I think getpriority can be written a bit simpler, one option is:
if name not in self:
    return None
return self.attributes[name].priority
@jdemaeyer
Copy link
Contributor Author

@jdemaeyer jdemaeyer commented Nov 9, 2015

How to disable download handlers? Is this feature supported? It looks like relevant code and docs are deleted.

Yep, it's supported and got lost while I reverted the docs to "with _BASE", will put it back in. They're disabled by setting their class to None in the DOWNLOAD_HANDLERS dict (much like all other components can be disabled).

Are lists in ITEM_PIPELINES supported again after this PR?

Nope. Though with this PR we could support it again (although it has been deprecated for over two years).

getcomposite public function: I think it is better to extract "If only one argument is given, a second argument with _BASE appended will automatically be added" part into another function.

Hm, calling getcomposite with only one argument is useless then. But I'm fine with adding a new method if you think it's more consistent.

I think getpriority can be written a bit simpler, one option is:

Looks good, we should update __getitem__() then as well.

@kmike kmike added this to the Scrapy 1.1 milestone Nov 10, 2015
@kmike
Copy link
Member

@kmike kmike commented Nov 10, 2015

Hm, calling getcomposite with only one argument is useless then. But I'm fine with adding a new method if you think it's more consistent.

Do we have use cases for a public .getcomposite method with 1+ arguments, i.e. is it used in scrapy? If not then we can drop it for now and implement only _BASE support. Do you need it for addons PR?

@kmike
Copy link
Member

@kmike kmike commented Nov 10, 2015

Nope. Though with this PR we could support it again (although it has been deprecated for over two years).

Ok, I'm fine with dropping support for ITEM_PIPELINE as a list, but we should remember to add it to the release notes. It is easy to miss because this change was not in a separate PR.

@jdemaeyer
Copy link
Contributor Author

@jdemaeyer jdemaeyer commented Nov 10, 2015

All done. Instead of getwithbase() (getcomposite() before) we could also go back to merging the settings in build_component_list(), as it was before #1149. But I really think that function is not where the merging should take place.

@kmike kmike changed the title Backwards compatible per key priorities [MRG+1] Backwards compatible per key priorities Nov 10, 2015
@kmike
Copy link
Member

@kmike kmike commented Nov 10, 2015

+1 to merge. //cc @curita @dangra

is the one closer to the spider.

To decide which order to assign to your middleware see the
:setting:`SPIDER_MIDDLEWARES_BASE` setting and pick a value according to where
middleware performs a different action and your middleware could depend on some

This comment has been minimized.

@curita

curita Nov 11, 2015
Member

Line 34 from the original version got deleted (https://github.com/scrapy/scrapy/pull/1586/files#diff-17a393aa4b57b33782da9b32c3422e73L34). It should be "<...> and pick a value according to where you want to insert the middleware. The order does matter because each middleware performs <...>".

This comment has been minimized.

@kmike

kmike Nov 11, 2015
Member

i also noticed that, but the sentence reads fine without this line :)

@curita
Copy link
Member

@curita curita commented Nov 11, 2015

There's a remaining _getcomposite() in scrapy/commands/runspider.py.

@curita
Copy link
Member

@curita curita commented Nov 11, 2015

Not sure what's the long term plan for *_BASE settings now, but if we plan to keep them for good we could implement DEFAULT_REQUEST_HEADERS_BASE as Paul suggested.

:type name: string
"""
compdict = self.get(name + '_BASE', {}).copy()
compdict.update(self.get(name, {}))

This comment has been minimized.

@curita

curita Nov 11, 2015
Member

Wanted to point out here that since we're promoting <name>_BASE settings to BaseSettings instances, .copy() does a deep copy of <name>_BASE, but we're passing <name> as it is when updating that copy. We could be consistent and use self.get(name, {}).copy() but doesn't matter too much since we don't have mutable values in default dict settings.

if priority >= self.priority:
if isinstance(self.value, BaseSettings):
self.value.clear()
self.value.update(value, priority=priority)

This comment has been minimized.

@curita

curita Nov 11, 2015
Member

It could be a little odd having a reference for value (gotten from .get() for example) and having it changed when setting something new, as set() was supposed to override the value, not change it. Maybe we could create a new BaseSettings instance?

This comment has been minimized.

@kmike

kmike Nov 11, 2015
Member

good catch, +1

@curita
Copy link
Member

@curita curita commented Nov 11, 2015

Pointed out some minor issues, but after reviewing these changes I still kind of prefer the approach in #1149. Here are some of my (truly subjective) thoughts about it:

  • It has better design. *_BASE settings are redundant having per-key priorities.
  • It has better usability. We usually want to update a set of values in those dict settings (not just *_BASE, I often want to update project-defined components with spider ones). I'd rather prefer updating the dictionaries by default and having a helper to occasionally override them.
  • We could improve its backward compatibility a little, by defining the old *_BASE settings with copies of the new values, but without using them.

Well, having explained myself, I don't think it's worth of a discussion really, this PR seems the way to go. Probably we should try to follow any backward compatible option if there is one, and the magic helper for updating dictionaries you were talking about is just what I'm looking for. So +1 on my side once the nitpicks I commented before get fixed 👍

@jdemaeyer
Copy link
Contributor Author

@jdemaeyer jdemaeyer commented Nov 11, 2015

Damnit! :/ thanks alot for the catches/feedback everyone! I've implemented Julia's feedback. There is now no deep-copying at all in getwithbase(): If we decide to allow Python objects (instead of paths) in the component settings (6f91110, part of #1272), users will expect that these exact objects are used in Scrapy, not deep copies of them.

I'll squash commits once you've looked at the changes.

@jdemaeyer jdemaeyer force-pushed the jdemaeyer:fix/backwards-compatible-per-key-priorities branch from 9176af4 to 4f36476 Nov 11, 2015
@curita
Copy link
Member

@curita curita commented Nov 11, 2015

Perfect! Thank you both @jdemaeyer and @kmike :) I'll be merging it now.

curita added a commit that referenced this pull request Nov 11, 2015
…key-priorities

[MRG+1] Backwards compatible per key priorities
@curita curita merged commit 54216d7 into scrapy:master Nov 11, 2015
1 of 2 checks passed
1 of 2 checks passed
codecov/patch 93.00% of diff hit (target 100.00%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants