[MRG+1] Per-key priorities for dict-like settings by promoting dicts to Settings instances #1149

Merged

@jdemaeyer
Contributor

@jdemaeyer jdemaeyer commented Apr 11, 2015

Expand settings priorities by assigning per-key priorities for the dict-like settings (e.g. DOWNLOADER_MIDDLEWARES), instead of just a single priority for the whole dictionary. This allows updating these settings from multiple locations without having to care (too much) about order. It is a prerequisite for the add-on system (#1272, #591).

There are two main updates:

  1. All default dictionary settings are promoted to the BaseSettings class (formerly Settings). They behave just like dictionaries, but honour per-key priorities when being written to.
  2. Since their rationale becomes obsolete with per-key priorities, all X_BASE settings are deprecated, with default entries now living in the X setting.
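
A minimal sketch of the per-key priority idea described above. This is a simplified stand-in, not Scrapy's actual BaseSettings code: the class name PrioritizedDict and its methods are hypothetical, and the numeric priority table is assumed to mirror Scrapy's named priorities.

```python
# Simplified stand-in for the idea behind BaseSettings; not Scrapy's code.
# Assumption: these numbers mirror Scrapy's named settings priorities.
PRIORITIES = {'default': 0, 'command': 10, 'project': 20, 'spider': 30, 'cmdline': 40}


class PrioritizedDict:
    """Dict-like store that remembers the priority each key was written with."""

    def __init__(self):
        self._data = {}  # key -> (value, numeric priority)

    def set(self, key, value, priority='project'):
        prio = PRIORITIES[priority]
        # Only overwrite when the new write is at least as important.
        if key not in self._data or prio >= self._data[key][1]:
            self._data[key] = (value, prio)

    def update(self, mapping, priority='project'):
        for key, value in mapping.items():
            self.set(key, value, priority)

    def as_dict(self):
        return {key: value for key, (value, _prio) in self._data.items()}


middlewares = PrioritizedDict()
middlewares.update({'mw.A': 100, 'mw.B': 200}, priority='default')
middlewares.update({'mw.A': 150}, priority='spider')
# A later write with lower priority does not clobber the 'spider' value of mw.A,
# but it can still add the new key mw.C.
middlewares.update({'mw.A': 999, 'mw.C': 300}, priority='project')
result = middlewares.as_dict()
```

Because each key carries its own priority, the order in which the three updates arrive matters much less than it would with a single priority for the whole dictionary.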

And several smaller updates:

  1. Settings is a subclass of BaseSettings. It loads the default settings and promotes dictionaries within them to BaseSettings instances.
  2. BaseSettings has a complete dictionary-like interface.
  3. All dictionary-like component/handler lists allow disabling components/handlers by setting the value of a key/value pair to None. A new helper, scrapy.utils.python.without_none_values(), was introduced for this. This was previously not supported by FEED_STORAGES, FEED_EXPORTERS, and DEFAULT_REQUEST_HEADERS.
  4. The scrapy.utils.conf.build_component_list() helper has been updated according to the deprecation of _BASE settings, as the (base, custom) call signature does not make much sense anymore. It's still backwards-compatible.
  5. ITEM_PIPELINES can no longer be provided as a list.

Comes with many new/updated tests and documentation.

@jdemaeyer
Contributor Author

@jdemaeyer jdemaeyer commented Apr 12, 2015

Implementing per-key priorities like this would render #1110 obsolete.

@jdemaeyer
Contributor Author

@jdemaeyer jdemaeyer commented May 25, 2015

I've resolved issue 1 by providing a BaseSettings class that contains all the methods. Settings is now a subclass which overrides __init__() and loads the default settings.
Issue 3 is resolved by completely resolving a Settings-setting when the update value is not a mapping, and I've outsourced the duplicate priority string resolving (issue 4).
Issue 5 should probably get its own PR.
@curita I needed to merge master into this since automatic merging was no longer possible at some point. But now I have this huge commit in this PR. Do I wait until we're ready to merge this and then rebase before we do?

@@ -13,7 +13,7 @@ class DownloadHandlers(object):
     def __init__(self, crawler):
         self._handlers = {}
         self._notconfigured = {}
-        handlers = crawler.settings.get('DOWNLOAD_HANDLERS_BASE')
+        handlers = crawler.settings.get('DOWNLOAD_HANDLERS_BASE', {})

@curita

curita May 27, 2015
Member

We could use crawler.settings.getdict() (which has {} as default already) to get both 'DOWNLOAD_HANDLERS_BASE' and 'DOWNLOAD_HANDLERS'.

@nramirezuy

nramirezuy May 27, 2015
Contributor

Not just that: it also allows you to pass DOWNLOAD_HANDLERS from the command line.
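
A rough sketch of why getdict() helps with command-line values, assuming values passed with -s arrive as JSON-encoded strings. The function below is a simplified stand-in that takes a plain dict as its settings argument; it is not Scrapy's implementation.

```python
import json


def getdict(settings, name, default=None):
    """Return a setting as a plain dict, decoding JSON strings if needed."""
    value = settings.get(name, default or {})
    if isinstance(value, str):
        # Values given on the command line are strings, so decode them first.
        value = json.loads(value)
    return dict(value)


# Hypothetical settings store: one value came from "-s NAME=VALUE".
settings = {'DOWNLOAD_HANDLERS': '{"ftp": null}'}
handlers = getdict(settings, 'DOWNLOAD_HANDLERS')
missing = getdict(settings, 'DOWNLOAD_HANDLERS_BASE')  # falls back to {}
```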

@@ -195,8 +195,7 @@ def item_scraped(self, item, spider):
         return item

     def _load_components(self, setting_prefix):
-        conf = dict(self.settings['%s_BASE' % setting_prefix])
-        conf.update(self.settings[setting_prefix])
+        conf = dict(self.settings[setting_prefix])

@curita

curita May 27, 2015
Member

Shouldn't we keep '_BASE' loading and updating here for backward compatibility, as in the other cases?

@curita

curita May 27, 2015
Member

Also it's probably better to replace that call with conf = self.settings.getdict(setting_prefix).

@@ -19,6 +19,12 @@
'cmdline': 40,
}

def get_settings_priority(priority):

# that does not map keys to values, e.g. a list). I included
# this b/c the test_cmdline/__init__.py test uses a deprecated
# configuration API (EXTENSIONS is a list) and fails otherwise.
# Should it stay in?

@curita

curita May 27, 2015
Member

Let's remove the and isinstance(value, Mapping) check: build_component_list already handles the case of custom being a list or a tuple by returning it right away, without updating '_BASE'. Other settings that don't use build_component_list should break, since passing lists wasn't an option before.

@jdemaeyer

jdemaeyer May 28, 2015
Author Contributor

The thing is that the EXTENSIONS test was only an example of what developers might actually want to do. What if they have a setting that is a BaseSettings instance, but for some reason they decide to replace it with a different kind of value, say None? The isinstance(value, Mapping) check allows for this.
I guess the alternative is to make developers actively acknowledge that they're losing the BaseSettings instance and all its values/priorities by deleting it, then inserting a new setting with the type they want. While that is more explicit, it's probably not expected behaviour, given that "type replacement" for all other types of settings does not raise exceptions.

@jdemaeyer

jdemaeyer Jun 2, 2015
Author Contributor

On second thought, now that only the default dicts are promoted, and given that these should never be replaced by any incompatible type, I will remove the and isinstance(value, Mapping). It also causes problems with passing json-encoded strings.

# configuration API (EXTENSIONS is a list) and fails otherwise.
# Should it stay in?
self.value.update(value, priority)
self.priority = max((priority, self.priority))

@curita

curita May 27, 2015
Member

The idea here is that self.priority is the highest priority within the internal BaseSettings? In that case, I think we should look up the priorities inside the provided value instead of relying on the passed priority. Examples like this one won't get the expected priority otherwise:

> s = Settings()
> s.set('option', BaseSettings({'key': 'value'}, priority='spider'))
> s.getpriority('option')
  20  # priority 'project' instead of 'spider'

@jdemaeyer

jdemaeyer May 28, 2015
Author Contributor

Yep, that was the idea. I've updated the code so it works as expected, thereby introducing a maxpriority() method to the BaseSettings class since there are two places in the code where I need to find the maximum priority. It's probably not very useful for anything else though, does it clutter up the API too much?
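
A small sketch of the priority lookup discussed here, using a plain dict of (value, priority) pairs as a hypothetical stand-in for a BaseSettings instance. The helper name max_priority is illustrative only, and the numbers 20 and 30 are assumed to correspond to 'project' and 'spider'.

```python
def max_priority(entries):
    """Return the highest priority among (value, priority) entries.

    Returns None when there are no entries to inspect.
    """
    if not entries:
        return None
    return max(priority for _value, priority in entries.values())


inner = {'key': ('value', 30)}  # inner entry written with priority 30 ('spider')
outer_priority = 20             # priority passed to the outer set() call ('project')

# The attribute's priority should reflect the larger of the two.
effective = max(outer_priority, max_priority(inner))
empty = max_priority({})
```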

@@ -45,14 +62,12 @@ def __str__(self):
__repr__ = __str__


-class Settings(object):
+class BaseSettings(MutableMapping):

@curita

curita May 27, 2015
Member

I really like this change 👍

@@ -88,19 +106,30 @@ def getdict(self, name, default=None):
value = json.loads(value)
return dict(value)

def getpriority(self, name):

@curita

curita May 27, 2015
Member

Needed method 👍

else:
if isinstance(value, dict):
value = BaseSettings(value, priority)
self.attributes[name] = SettingsAttribute(value, priority)

@curita

curita May 27, 2015
Member

After giving it some more thought, I think we should deal with BaseSettings instances explicitly instead of promoting dictionaries. That way users won't get unexpected or backward incompatible behaviors in their own dict settings, and we can choose specifically which settings are going to be BaseSettings by just declaring their values to be that.

We can still do some sort of implicit propagation only in the Settings class while loading the default values, checking if a setting is a dict and replacing or setting it (depending on when it's done) to a BaseSettings object instead. Another option, if that's not possible, could be to just import BaseSettings into default_settings.py and use it there for the dict values. For that matter, maybe we could move Settings to another module within scrapy/settings/ so we get rid of the default_settings import in scrapy/settings/__init__.py.

@jdemaeyer

jdemaeyer May 28, 2015
Author Contributor

I agree. Now that we split Settings and BaseSettings, explicit promotion of only the default settings seems a more sane choice.

del self.attributes[name]

def __delitem__(self, name):
self.delete(name)

@curita

curita May 27, 2015
Member

I'm not sure I like the new delete() method; it seems like users could get confused about its usage and its differences from and similarities with set(). To keep the dict-like interface, I think we could keep __delitem__ with a simpler implementation, such as deleting the mapping in self.attributes.

@jdemaeyer

jdemaeyer May 28, 2015
Author Contributor

I like having the option of "Delete this setting unless someone has updated it with a higher priority than what I'm doing", but would yield to the majority if that is considered useless. ;)
But I think you're right in that __delitem__ should ignore priorities and just delete from attributes.

@@ -13,7 +15,7 @@ def build_component_list(base, custom):
"""
     if isinstance(custom, (list, tuple)):
         return custom
-    compdict = base.copy()
+    compdict = BaseSettings(base, priority = 'default')

@curita

curita May 27, 2015
Member

A note about coding style in this line and in general: though we're not enforcing it, we try to comply with pep8 as closely as we can for code newly added to scrapy (the only rule we agreed to change is the 80 chars max line width rule; we settled on a more flexible 100 chars). This pull request mostly follows pep8, but I'd try to stick to the rule of "There shouldn't be spaces around default arguments nor immediately after parenthesis, brackets or braces".

@jdemaeyer

jdemaeyer May 28, 2015
Author Contributor

I definitely try to stick to PEP8, but I have to admit that I frequently break the "no spaces for keyword arguments" rule b/c it feels so unnatural to me. Will fix :)

@curita
Member

@curita curita commented May 27, 2015

Currently DOWNLOAD_HANDLERS is the only setting that does not use the dictionary values for ordering (but instead uses them to store the path of handler classes) and thus cannot use build_component_list(). However, this may change in the future; maybe most of the code from DownloadHandlers.__init__() should be outsourced into a new helper function (similar to build_component_list but without ordering) in scrapy.utils.conf?

FEED_STORAGES and FEED_EXPORTERS are dictionaries with paths that don't use ordering either, and those should handle *_BASE settings (DEFAULT_REQUEST_HEADERS is a dict as well, but it's used differently). I like the idea of outsourcing the common code from FEED_* and DOWNLOAD_HANDLERS loading and build_component_list, and it makes sense to do so in this PR. In particular, I'd like to see a single place handling the _BASE setting loading and updating with the regular setting, so we can deprecate it more easily sometime in the future. Another thing it's useful for is ensuring that all settings can be disabled by assigning them the value None. Right now I think both FEED_* settings don't support this, for example (and they should).

@curita I needed to merge master into this since automatic merging was no longer possible at some point. But now I have this huge commit in this PR. Do I wait until we're ready to merge this and then rebase before we do?

Don't worry, just rebase upstream/master when you can (we'll definitely need that before merging but can be done at any time you want) and force push the new git history.

@jdemaeyer
Contributor Author

@jdemaeyer jdemaeyer commented May 29, 2015

Thank you guys for the feedback!

I now have:

  1. Disabled auto-promotion of dicts, only dicts from default_settings will now be promoted to BaseSettings instances
  2. Updated the SettingsAttribute.priority behaviour for SettingsAttributes that contain a BaseSettings instance. It now always reflects the maximum of:
    • the highest priority in the BaseSettings instance
    • the highest priority ever given to SettingsAttribute.set() (even when that priority is higher than anything in BaseSettings)
  3. Updated BaseSettings.update() so it can also deal with JSON-encoded strings, this solves the issue raised by @nramirezuy and avoids downgrading BaseSettings to dicts (losing all per-key priorities) wherever possible
  4. Deprecated scrapy.utils.conf.build_component_list() since the (base, custom) signature is no longer needed. I did not want to reuse the function name though since people might be using it in their extensions.
  5. As replacement, introduced scrapy.utils.conf.build_components() that only takes a single dict, removes all entries containing None as value, then builds the list or returns the dict, depending on the parameter to_list=True.
  6. Removed all references to _BASE settings (and their merging with the non-_BASE setting) in the complete codebase, instead replacing it with calls to the function scrapy.utils.conf._get_composite_setting(settings, settingname). This function, and the calls to it, should only be in the code until we decide to finally remove all support for the _BASE settings. I guess this is not exactly what @curita had hoped for, since we will still have to replace all calls to _get_composite_setting() when we remove _BASE support, and replace them with a simple settings[settingname]. But the alternative (that I could see) would have been to include settings in the signature of build_components(), which seemed not very modular.
  7. Enabled disabling items through setting their dict value to None for all default dict settings (including the default request headers) by placing calls to build_component_list() and remove_none_values() where appropriate.
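
To illustrate item 5, here is a minimal sketch of what a helper like build_components() could look like, based only on the description above (drop entries whose value is None, then return either an ordered list or the filtered dict). The body is an assumption for illustration, not the code from this PR.

```python
def build_components(compdict, to_list=True):
    """Drop disabled (None) entries, then order the remaining keys by value."""
    enabled = {key: order for key, order in compdict.items() if order is not None}
    if not to_list:
        return enabled
    # Lower order values come first, as with middleware/pipeline ordering.
    return [key for key, _order in sorted(enabled.items(), key=lambda kv: kv[1])]


components = {'a.Pipeline': 300, 'b.Pipeline': None, 'c.Pipeline': 100}
as_list = build_components(components)
as_dict = build_components(components, to_list=False)
```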

To do:

  • Update first post with summary of changes
  • Tests
    • Call Settings.update() with json string
    • build_components()
    • Updating default dicts from the command line
  • Check PEP8 compliance
@curita curita mentioned this pull request May 29, 2015
@jdemaeyer
Contributor Author

@jdemaeyer jdemaeyer commented Jun 7, 2015

Alright, this PR is now at a stage where I'm fairly happy with it and would remove the WIP tag as soon as I've written/updated the documentation and rebased. Still very open for feedback of course :) I've updated the first post as an overview if you haven't followed this PR.

@curita I made some changes to the _map_keys() function that lives inside of build_component_list() so that it honours per-key priorities. As giving both base and custom becomes deprecated with this PR though, and per-key priorities weren't available before anyways, maybe that's not necessary and just clutters up the code?

self.settings_module = settings_module
Settings.__init__(self, **kw)

@curita

curita Jun 11, 2015
Member

Is there a reason for swapping these two lines?

@jdemaeyer

jdemaeyer Jun 17, 2015
Author Contributor

Yes, took me quite a while to figure out though ;D It became necessary after I moved the dict promotion into __init__(). Settings.__init__() uses __getitem__() during the promotion of default dictionaries to BaseSettings instances, which in turn accesses self.settings_module, so it needs to be defined before calling __init__()
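
The ordering problem described here can be reproduced with a small, self-contained sketch. All class names below are hypothetical stand-ins; the only point is that a base initializer which calls __getitem__() needs the attribute that __getitem__() reads to exist already.

```python
class Base:
    def __init__(self):
        # Mimics a base initializer that reads items during setup.
        self.loaded = self['DEFAULTS']

    def __getitem__(self, name):
        # Depends on an attribute that the subclass must have set already.
        return getattr(self.settings_module, name, None)


class Module:
    DEFAULTS = {'a': 1}


class Child(Base):
    def __init__(self, settings_module):
        self.settings_module = settings_module  # must be assigned first
        Base.__init__(self)


class BrokenChild(Base):
    def __init__(self, settings_module):
        Base.__init__(self)  # fails: settings_module is not set yet
        self.settings_module = settings_module


child = Child(Module)
try:
    BrokenChild(Module)
    broke = False
except AttributeError:
    broke = True
```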

# It's for internal use in the transition away from the _BASE settings and
# will be removed along with _BASE support in a future release
basename = name + "_BASE"
if basename in self.attributes:

@curita

curita Jun 11, 2015
Member

nitpick: This line should be if basename in self: and other similar references could be replaced too.

warnings.warn('_BASE settings are deprecated.',
category=ScrapyDeprecationWarning)
compsett = BaseSettings(self[name + "_BASE"], priority='default')
compsett.update(self.get(name))

@curita

curita Jun 11, 2015
Member

nitpick: self[name] instead of self.get(name)

@jdemaeyer

jdemaeyer Jun 17, 2015
Author Contributor

True. I forgot that __getitem__() doesn't throw KeyErrors either. I like all your nitpicks ;)


def maxpriority(self):
if len(self) > 0:
return max(self.getpriority(name) for name in self.attributes)

@curita

curita Jun 11, 2015
Member

nitpick: for name in self instead of for name in self.attributes. I think there are other similar calls that could be replaced.

if basename in self.attributes:
warnings.warn('_BASE settings are deprecated.',
category=ScrapyDeprecationWarning)
compsett = BaseSettings(self[name + "_BASE"], priority='default')

@curita

curita Jun 11, 2015
Member

I think getting this setting should be self.getdict(name + "_BASE") instead, the old deprecated _BASE could be a json dict.

@jdemaeyer

jdemaeyer Jun 17, 2015
Author Contributor

Since __init__() calls update(), it can deal with JSON strings

return type(custom)(convert(c) for c in custom)
compdict = BaseSettings(_map_keys(compdict), priority='default')
compdict.update(_map_keys(custom))
# End backwards support

@curita

curita Jun 11, 2015
Member

I think we should remove support for the old signature, build_component_list is an internal helper for scrapy, there's no need to maintain backward compatibility for it and it makes this function slightly more difficult to understand.

@jdemaeyer

jdemaeyer Jun 19, 2015
Author Contributor

I'll have to update a couple of tests but I like dropping this, too

for k, v in six.iteritems(compdict):
prio = compdict.getpriority(k)
compbs.set(convert(k), v, priority=prio)
return compbs

@curita

curita Jun 11, 2015
Member

It took me a bit to understand this, I think this implementation is halfway between the old behavior and the new one, though I'm probably missing something.

If you call _check_components in a compdict that is a BaseSettings instance, you're making sure that no keys convert to the same path/value, regardless of their priority. In that case adding them converted into a new BaseSettings shouldn't make any difference since there aren't any clashes, we can stick to {convert(k): v for k, v in six.iteritems(compdict)} (there's no need to return a BaseSettings too).

Another possibility should be to not call _check_components in a BaseSettings, and check manually if there are any two keys that convert to the same path and also have the same priority so we can't tell which should be overridden first (commented in the description of #1267).

It's true that there weren't per-key priorities before; users working with old paths should get an error if any two keys clash when they switch to scrapy 1.0, but it's probably a nicer deprecation of the old paths for users skipping 1.0 and updating directly to 1.1. If possible I would prefer the second option of checking whether any clashing keys have the same priority, though I think it's not that critical; we can implement the first one and delete this otherwise.

@jdemaeyer

jdemaeyer Jun 19, 2015
Author Contributor

In that case adding them converted into a new BaseSettings shouldn't make any difference since there aren't any clashes, we can stick to {convert(k): v for k, v in six.iteritems(compdict)} (there's no need to return a BaseSettings too).

Keeping track of the priorities (i.e. returning BaseSettings) was necessary when there were base and custom BaseSettings objects with different priorities. But when we drop support for the (base, custom) signature you're right, returning a simple dict should be fine.

I'll update this and include the edge case of different paths with different priorities
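
A rough sketch of the second option discussed above: convert keys, let the higher-priority entry win when two keys map to the same path, and raise only when the clash cannot be resolved because the priorities are equal. The function name, the separate priorities mapping, and the ALIASES rename table are all hypothetical and only serve the illustration.

```python
def convert_keys(compdict, priorities, convert):
    """Map keys through convert(); on a clash keep the higher-priority entry.

    Raises ValueError when two keys convert to the same path with equal priority.
    """
    result = {}
    winner_prio = {}
    for key, value in compdict.items():
        path = convert(key)
        prio = priorities[key]
        if path in result:
            if prio == winner_prio[path]:
                raise ValueError('Ambiguous entries for %s' % path)
            if prio < winner_prio[path]:
                continue  # an earlier entry with higher priority wins
        result[path] = value
        winner_prio[path] = prio
    return result


ALIASES = {'old.path.Mw': 'new.path.Mw'}  # hypothetical rename table


def convert(key):
    return ALIASES.get(key, key)


merged = convert_keys(
    {'old.path.Mw': 100, 'new.path.Mw': 200},
    {'old.path.Mw': 0, 'new.path.Mw': 20},
    convert,
)

try:
    convert_keys(
        {'old.path.Mw': 100, 'new.path.Mw': 200},
        {'old.path.Mw': 20, 'new.path.Mw': 20},
        convert,
    )
    ambiguous = False
except ValueError:
    ambiguous = True
```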

@@ -21,7 +21,8 @@ def _get_mwlist_from_settings(cls, settings):
category=ScrapyDeprecationWarning, stacklevel=1)
# convert old ITEM_PIPELINE list to a dict with order 500
item_pipelines = dict(zip(item_pipelines, range(500, 500+len(item_pipelines))))
-return build_component_list(settings['ITEM_PIPELINES_BASE'], item_pipelines)
+settings.set('ITEM_PIPELINES', item_pipelines)

@curita

curita Jun 11, 2015
Member

There are a couple of issues here:

  • There's no guarantee that settings.set('ITEM_PIPELINES', item_pipelines) overrides the stored ITEM_PIPELINES, that settings.set() call should be made with a priority higher than the stored one for that key.
  • We didn't modify the settings before, and not too long ago the settings were frozen at this point, so this would have thrown an error. A solution for it could be to make a copy and set ITEM_PIPELINES there.
  • This is incompatible with the goal of allowing addons to modify settings, addons won't be able to add pipelines easily if ITEM_PIPELINES is a list.

My vote is to remove all this deprecation support for lists, it was introduced way too long ago (0.20 release: fc388f4).

@curita

curita Jun 11, 2015
Member

I recalled that ITEM_PIPELINES can't be a list with the changes in SettingsAttribute.set(): ITEM_PIPELINES has a BaseSettings instance as its value, so if a user tries to set a list here it's going to throw an error when trying to merge it into those BaseSettings.

@jdemaeyer

jdemaeyer Jun 19, 2015
Author Contributor

Good point. Providing a list still worked when the and isinstance(value, Mapping) was still present in SettingsAttribute.set(), but I also prefer dropping support for lists, especially as it will lead to unexpected behaviour for add-on developers.

@curita
Member

@curita curita commented Jun 11, 2015

I love the unification of the code, really like the design decisions you took there. I pointed out a couple of remaining details but I think the overall functionality is well defined, you should be able to start with the documentation.

@jdemaeyer jdemaeyer force-pushed the jdemaeyer:enhancement/settings-per-key-priorities branch from 2f288fa to 4192d8b Jun 19, 2015
@jdemaeyer
Contributor Author

@jdemaeyer jdemaeyer commented Jun 19, 2015

Alright, I think this PR is ready for final review. I've incorporated your recent feedback (nitpicks, BaseSettings handling in build_component_list, remove custom support in build_component_list, remove non-dict ITEM_PIPELINES support), written the documentation and some more tests, rebased into two feature/test/doc-complete commits, and removed the WIP tag. :)

jdemaeyer added a commit to jdemaeyer/scrapy that referenced this pull request Aug 17, 2015
@dangra
Member

@dangra dangra commented Aug 19, 2015

+1 to merge. It needs a note about backwards incompatibilities introduced by this PR and how to update users' code if possible.

@jdemaeyer
Contributor Author

@jdemaeyer jdemaeyer commented Aug 20, 2015

+1 to merge. It needs a note about backwards incompatibilities introduced by this PR and how to update users' code if possible.

I'm happy to add a note, would that go into news.rst? Except for ITEM_PIPELINES being no longer allowed to be a list, which was deprecated since 0.20, there shouldn't be any backwards incompatibilities though. The BaseSettings behave just like dicts, and the _BASE settings still work (and throw a ScrapyDeprecationWarning if used).

@dangra
Member

@dangra dangra commented Aug 20, 2015

Removing item pipelines list support is fine and it is not a backward incompatible change.

The change that worries me a bit is the new behaviour for dictionary settings whose values are merged rather than replaced.

compsett = BaseSettings(self[name + "_BASE"], priority='default')
compsett.update(self[name])
return compsett
else:

@kmike

kmike Aug 20, 2015
Member

else is not necessary, let's drop it

@jdemaeyer

jdemaeyer Aug 24, 2015
Author Contributor

Now that I've seen it with a little more distance, I think this whole function doesn't really achieve what I had intended.

When users override XY_BASE, they explicitly don't want any of Scrapy's defaults for that component setting. But line 206 pulls Scrapy's defaults (which now live in XY) back in, and even worse overwrites the user's XY_BASE settings where they have the same keys (say when the user simply changed some orders).

I guess either line 206 could be changed so that only those keys from XY that have a priority higher than default are considered, or support for _BASE settings could be dropped altogether with this PR. After all, they have always been marked as "never edit this", the behaviour of the dict-like settings changes with this PR anyway, and we could get rid of this non-public helper function.

def __str__(self):
return str(self.attributes)

__repr__ = __str__

@kmike

kmike Aug 20, 2015
Member

I think repr should be something like '%s(%s)' % (self.__class__, self.attributes)

@kmike

kmike Aug 20, 2015
Member

because otherwise it'd be hard to distinguish Settings objects from regular dicts in console
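
As a quick illustration of the suggestion, a repr that includes the class name makes the object easy to tell apart from a plain dict. SettingsLike is a hypothetical stand-in class, and this sketch uses self.__class__.__name__ (rather than self.__class__) as an assumption to keep the output short.

```python
class SettingsLike:
    """Hypothetical stand-in with an `attributes` dict, as in the diff above."""

    def __init__(self, attributes):
        self.attributes = attributes

    def __repr__(self):
        # Prefix the class name so the object is not mistaken for a dict.
        return '%s(%s)' % (self.__class__.__name__, self.attributes)


text = repr(SettingsLike({'A': 1}))
plain = repr({'A': 1})
```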

@jdemaeyer

jdemaeyer Aug 24, 2015
Author Contributor

Agreed, I'll change that

compdict.update(_map_keys(custom))
items = (x for x in six.iteritems(compdict) if x[1] is not None)
return [x[0] for x in sorted(items, key=itemgetter(1))]
def remove_none_values(compdict):

@kmike

kmike Aug 20, 2015
Member

This utility function looks too generic for scrapy.utils.conf - what about moving it to scrapy.utils.python?

I'd expect remove_none_values to remove values in place. What about without_none_values? We can also extend it to handle iterables.
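
A possible shape for such a helper, sketched under the assumptions stated in the comment above: it returns a new object rather than mutating its argument, and it accepts both mappings and plain iterables. This is an illustration, not necessarily the code that ended up in scrapy.utils.python.

```python
def without_none_values(iterable):
    """Return a copy of a mapping or iterable with None values removed."""
    if hasattr(iterable, 'items'):
        return {key: value for key, value in iterable.items() if value is not None}
    # For lists, tuples, sets etc., rebuild the same container type.
    return type(iterable)(value for value in iterable if value is not None)


original = {'some.pipeline': None, 'other.pipeline': 300}
cleaned = without_none_values(original)
trimmed = without_none_values([1, None, 2])
```

The "without_" name signals that the input is left untouched, which is the distinction raised in the comment.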

@jdemaeyer

jdemaeyer Aug 24, 2015
Author Contributor

+1 will change/extend

@jdemaeyer
Contributor Author

@jdemaeyer jdemaeyer commented Aug 26, 2015

Here's a proposed release note:

Dictionary settings are no longer overridden
--------------------------------------------

With this release, Scrapy's dictionary-like settings (e.g. ``ITEM_PIPELINES``,
see full list below) will receive per-key settings priorities (see
:ref:`topics-api-settings`). Internally, this is achieved by promoting them to
:class:`~scrapy.settings.BaseSettings` instances. Instances of this class behave
just like dictionaries, with one notable exception: When being written to, the 
:class:`~scrapy.settings.BaseSettings` instance is *updated*, not overwritten::

    >>> settings.set('ITEM_PIPELINES', {'path.one': 100})
    >>> settings.set('ITEM_PIPELINES', {'path.two': 200})
    >>> print dict(settings['ITEM_PIPELINES'])
    {'path.two': 200, 'path.one': 100}

This facilitates easy enabling, disabling, or configuring of single components.
E.g., if you want to disable the pipeline ``some.pipeline`` from the command
line, it is now sufficient to set the dictionary value for just this pipeline to
``None``::

    scrapy crawl example.com -s 'ITEM_PIPELINES={"some.pipeline": null}'

instead of having to provide the full dictionary of all other enabled pipelines.

This update applies to all dictionary-like settings that have a default value,
namely:

* :setting:`DEFAULT_REQUEST_HEADERS`,
* :setting:`DOWNLOAD_HANDLERS`,
* :setting:`DOWNLOADER_MIDDLEWARES`,
* :setting:`EXTENSIONS`,
* :setting:`FEED_STORAGES`,
* :setting:`FEED_EXPORTERS`,
* :setting:`ITEM_PIPELINES`,
* :setting:`SPIDER_MIDDLEWARES`, and
* :setting:`SPIDER_CONTRACTS`.

If your code relies on completely replacing, not updating, one of these
settings, you can work around the new behaviour by deleting the dictionary-like
setting before writing to it::

    >>> del settings['SPIDER_MIDDLEWARES']
    >>> settings.set('SPIDER_MIDDLEWARES', {'my.middleware': 100})
    >>> print dict(settings['SPIDER_MIDDLEWARES'])
    {'my.middleware': 100}

However, keep in mind that Scrapy's standard procedure to disable components
from the dictionary-like settings is to set their value to ``None``.


Deprecation of ``_BASE`` settings
---------------------------------

The new update-not-overwrite behaviour of Scrapy's dictionary settings (see
above) renders the ``_BASE`` settings that were previously used to store
Scrapy's defaults (e.g. ``DOWNLOADER_MIDDLEWARES_BASE``) obsolete.
Consequently, Scrapy's defaults have been moved into the regular settings
(e.g. :setting:`DOWNLOADER_MIDDLEWARES`).

While setting the ``XY_BASE`` setting will still work, please update your code
to override (i.e. update) the ``XY`` setting instead, and set the value of the
default components that you wish to disable to ``None``.
@jdemaeyer jdemaeyer force-pushed the jdemaeyer:enhancement/settings-per-key-priorities branch 2 times, most recently from 03349ff to d9577da Oct 27, 2015
@jdemaeyer jdemaeyer force-pushed the jdemaeyer:enhancement/settings-per-key-priorities branch from d9577da to 03f1720 Oct 27, 2015
@jdemaeyer
Contributor Author

@jdemaeyer jdemaeyer commented Oct 27, 2015

Rebased onto current master and updated the function that handles backwards-compatibility for users who explicitly set _BASE settings:

They will now get the expected behaviour: none of Scrapy's defaults are used for any setting where they have manually set a _BASE setting. E.g. doing this:

# settings.py
DOWNLOADER_MIDDLEWARES_BASE = {}

will now result in no downloader middlewares being enabled (and a warning that _BASE settings are deprecated).
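
One way this behaviour could be achieved, following the option raised earlier in the thread (only consider entries of the regular setting whose priority is higher than default). The sketch below works on plain dicts; the function name get_composite, the separate priorities mapping, and the use of DeprecationWarning instead of ScrapyDeprecationWarning are simplifications for illustration.

```python
import warnings

DEFAULT_PRIORITY = 0  # assumed numeric value of the 'default' priority


def get_composite(settings, priorities, name):
    """settings: name -> dict; priorities: name -> {key: numeric priority}."""
    basename = name + '_BASE'
    if basename not in settings:
        return dict(settings.get(name, {}))
    warnings.warn('_BASE settings are deprecated.', DeprecationWarning)
    composite = dict(settings[basename])
    for key, value in settings.get(name, {}).items():
        # Entries still at default priority are Scrapy's defaults; skip them.
        if priorities.get(name, {}).get(key, DEFAULT_PRIORITY) > DEFAULT_PRIORITY:
            composite[key] = value
    return composite


settings = {
    'DOWNLOADER_MIDDLEWARES_BASE': {},
    'DOWNLOADER_MIDDLEWARES': {'scrapy.default.Mw': 100, 'my.Mw': 543},
}
priorities = {'DOWNLOADER_MIDDLEWARES': {'scrapy.default.Mw': 0, 'my.Mw': 20}}

with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    enabled = get_composite(settings, priorities, 'DOWNLOADER_MIDDLEWARES')
```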

Not sure about the codecov test failing. It says only 95 % of this diff is hit because there are a couple of places that weren't covered before but where I switched the syntax to use the new helpers, e.g. like this:

-            valid_output_formats = (
-                list(self.settings.getdict('FEED_EXPORTERS').keys()) +
-                list(self.settings.getdict('FEED_EXPORTERS_BASE').keys())
-            )
+            feed_exporters = without_none_values(self.settings._getcomposite('FEED_EXPORTERS'))
+            valid_output_formats = feed_exporters.keys()

These should definitely be covered but I don't think that belongs in this PR.

dangra added a commit that referenced this pull request Oct 28, 2015
…priorities

[MRG+1] Per-key priorities for dict-like settings by promoting dicts to Settings instances
@dangra dangra merged commit dd9f777 into scrapy:master Oct 28, 2015
1 of 2 checks passed
codecov/patch 95.00% of diff hit (target 100.00%)
continuous-integration/travis-ci/pr The Travis CI build passed
@kmike kmike mentioned this pull request Oct 30, 2015
@redapple redapple added this to the Scrapy 1.1 milestone Jan 25, 2016
@redapple
Contributor

@redapple redapple commented Jan 25, 2016

Removed "backward-incompatible" tag after #1586 merged
