[MRG+1] [image_pipeline] bring back uppercase class attributes #1989
Changes from 11 commits
4cef1a1
6c67db3
a62d4b0
d715172
ee39d11
72e4d5f
c6d1686
acbfdc6
539d34b
10b79c9
fa4d0cd
c22cc10
9818c97
ceecf3b
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -191,6 +191,11 @@ For the Images Pipeline, set :setting:`IMAGES_URLS_FIELD` and/or | |
If you need something more complex and want to override the custom pipeline | ||
behaviour, see :ref:`topics-media-pipeline-override`. | ||
|
||
If you have multiple image pipelines inheriting from ImagePipeline and you want to have different settings in different pipelines | ||
you can set setting keys preceded with uppercase name of your pipeline class. E.g. if your pipeline is called | ||
MyPipeline and you want to have custom IMAGES_URLS_FIELD you define setting MYPIPELINE_IMAGES_URLS_FIELD and your custom | ||
settings will be used. | ||
|
||
|
||
Maybe it would be valid to add something like "Otherwise the custom pipeline will inherit settings values from default pipeline."

I think this follows from context, but it probably won't harm to add this.
||
Additional features | ||
=================== | ||
|
@@ -214,6 +219,14 @@ specifies the delay in number of days:: | |
|
||
The default value for both settings is 90 days. | ||
|
||
If you have pipeline that subclasses FilesPipeline and you'd like to have different setting | ||
for it you can set setting keys preceded by uppercase class name. E.g. given pipeline class | ||
called MyPipeline you can set setting key: | ||
|
||
MYPIPELINE_FILES_EXPIRES = 180 | ||
|
||
and pipeline class MyPipeline will have expiration time set to 180. | ||
|
||
.. _topics-images-thumbnails: | ||
|
||
Thumbnail generation for images | ||
|
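As a rough illustration of the documentation added in this diff, a project with two image pipelines might carry settings like the ones below. The class names `AvatarPipeline` and `ProductPipeline`, the `myproject.pipelines` module path, the store path, and the field values are all hypothetical. The prefixed keys follow the uppercase-class-name rule as this diff describes it, which is an assumption about the proposed behaviour rather than a confirmed released API.

```python
# settings.py (sketch; every project-specific name here is hypothetical)

# Two pipelines, assumed to subclass ImagesPipeline in myproject/pipelines.py.
ITEM_PIPELINES = {
    "myproject.pipelines.AvatarPipeline": 100,
    "myproject.pipelines.ProductPipeline": 200,
}

IMAGES_STORE = "/path/to/images"

# Unprefixed keys: read by ImagesPipeline itself.
IMAGES_URLS_FIELD = "image_urls"
IMAGES_MIN_WIDTH = 0

# Prefixed keys: uppercase class name + "_" + the usual setting name.
# As proposed in the diff, only the class with the matching name reads them.
AVATARPIPELINE_IMAGES_URLS_FIELD = "avatar_urls"
AVATARPIPELINE_IMAGES_MIN_WIDTH = 64
PRODUCTPIPELINE_IMAGES_URLS_FIELD = "product_image_urls"
```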
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -214,23 +214,37 @@ class FilesPipeline(MediaPipeline): | |
""" | ||
|
||
MEDIA_NAME = "file" | ||
EXPIRES = 90 | ||
STORE_SCHEMES = { | ||
'': FSFilesStore, | ||
'file': FSFilesStore, | ||
's3': S3FilesStore, | ||
} | ||
DEFAULT_FILES_URLS_FIELD = 'file_urls' | ||
DEFAULT_FILES_RESULT_FIELD = 'files' | ||
|
||
It does not fix the backward incompatibility (#1985). It must be
Here arises one question: should both (e.g.
I think both should be present in order to totally fix the backward incompatibility:

    DEFAULT_FILES_URLS_FIELD = 'file_urls'
    FILES_URLS_FIELD = DEFAULT_FILES_URLS_FIELD
    DEFAULT_FILES_RESULT_FIELD = 'files'
    FILES_RESULT_FIELD = DEFAULT_FILES_RESULT_FIELD = 'files'

Note: same in ImagePipeline.
||
def __init__(self, store_uri, download_func=None, settings=None): | ||
if not store_uri: | ||
raise NotConfigured | ||
|
||
if isinstance(settings, dict) or settings is None: | ||
settings = Settings(settings) | ||
|
||
|
||
cls_name = "FilesPipeline" | ||
self.store = self._get_store(store_uri) | ||
self.expires = settings.getint('FILES_EXPIRES') | ||
self.files_urls_field = settings.get('FILES_URLS_FIELD') | ||
self.files_result_field = settings.get('FILES_RESULT_FIELD') | ||
self.expires = settings.getint( | ||
self._key_for_pipe('FILES_EXPIRES', cls_name), self.EXPIRES | ||
) | ||
if not hasattr(self, "FILES_URLS_FIELD"): | ||
self.FILES_URLS_FIELD = self.DEFAULT_FILES_URLS_FIELD | ||
if not hasattr(self, "FILES_RESULT_FIELD"): | ||
self.FILES_RESULT_FIELD = self.DEFAULT_FILES_RESULT_FIELD | ||
self.files_urls_field = settings.get( | ||
self._key_for_pipe('FILES_URLS_FIELD', cls_name), self.FILES_URLS_FIELD | ||
) | ||
self.files_result_field = settings.get( | ||
self._key_for_pipe('FILES_RESULT_FIELD', cls_name), self.FILES_RESULT_FIELD | ||
) | ||
|
||
super(FilesPipeline, self).__init__(download_func=download_func) | ||
|
||
|
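The review suggestion in this hunk, declaring both the `DEFAULT_*` and the plain uppercase attribute on the class, can be sketched without Scrapy as follows. `FilesPipelineSketch` and `LegacyPipeline` are hypothetical stand-ins and a plain dict replaces the `Settings` object, so this only illustrates the attribute lookup order, not the real pipeline.

```python
class FilesPipelineSketch(object):
    """Hypothetical stand-in for FilesPipeline (no Scrapy required)."""

    # Both names exist at class level, so no hasattr() check is needed
    # in __init__ and old subclasses that override FILES_URLS_FIELD
    # keep working.
    DEFAULT_FILES_URLS_FIELD = 'file_urls'
    FILES_URLS_FIELD = DEFAULT_FILES_URLS_FIELD

    def __init__(self, settings=None):
        # A plain dict stands in for scrapy.settings.Settings here.
        settings = settings or {}
        # An explicit setting wins; otherwise use the class attribute.
        self.files_urls_field = settings.get(
            'FILES_URLS_FIELD', self.FILES_URLS_FIELD
        )


class LegacyPipeline(FilesPipelineSketch):
    """Older user code that overrides the uppercase attribute."""
    FILES_URLS_FIELD = 'attachments'


default_field = FilesPipelineSketch().files_urls_field
legacy_field = LegacyPipeline().files_urls_field
```

With this layout the subclass override is picked up through ordinary attribute lookup, and a value passed in the settings still takes precedence.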
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,25 +38,53 @@ class ImagesPipeline(FilesPipeline): | |
|
||
MEDIA_NAME = 'image' | ||
|
||
# Uppercase attributes kept for backward compatibility with code that subclasses | ||
# ImagesPipeline. They may be overridden by settings. | ||
MIN_WIDTH = 0 | ||
MIN_HEIGHT = 0 | ||
EXPIRES = 0 | ||
THUMBS = {} | ||
DEFAULT_IMAGES_URLS_FIELD = 'image_urls' | ||
DEFAULT_IMAGES_RESULT_FIELD = 'images' | ||
|
||
def __init__(self, store_uri, download_func=None, settings=None): | ||
super(ImagesPipeline, self).__init__(store_uri, settings=settings, download_func=download_func) | ||
|
||
if isinstance(settings, dict) or settings is None: | ||
settings = Settings(settings) | ||
|
||
self.expires = settings.getint('IMAGES_EXPIRES') | ||
self.images_urls_field = settings.get('IMAGES_URLS_FIELD') | ||
self.images_result_field = settings.get('IMAGES_RESULT_FIELD') | ||
self.min_width = settings.getint('IMAGES_MIN_WIDTH') | ||
self.min_height = settings.getint('IMAGES_MIN_HEIGHT') | ||
self.thumbs = settings.get('IMAGES_THUMBS') | ||
cls_name = "ImagesPipeline" | ||
self.expires = settings.getint( | ||
self._key_for_pipe('IMAGES_EXPIRES', cls_name), self.EXPIRES | ||
) | ||
if not hasattr(self, "IMAGES_RESULT_FIELD"): | ||
self.IMAGES_RESULT_FIELD = self.DEFAULT_IMAGES_RESULT_FIELD | ||
if not hasattr(self, "IMAGES_URLS_FIELD"): | ||
self.IMAGES_URLS_FIELD = self.DEFAULT_IMAGES_URLS_FIELD | ||
|
||
default_images_urls_field = getattr(self, "IMAGES_URLS_FIELD", "DEFAULT_IMAGES_URLS_FIELD") | ||
Good point. This would never happen because this attribute is set one line above. It was dead code, so I removed it.
||
self.images_urls_field = settings.get( | ||
self._key_for_pipe('IMAGES_URLS_FIELD', cls_name), default_images_urls_field | ||
) | ||
default_images_result_field = getattr(self, "IMAGES_RESULT_FIELD", "DEFAULT_IMAGES_RESULT_FIELD") | ||
The same issue: the string "DEFAULT_IMAGES_RESULT_FIELD" shouldn't be a default value.
||
self.images_result_field = settings.get( | ||
self._key_for_pipe('IMAGES_RESULT_FIELD', cls_name), default_images_result_field | ||
) | ||
self.min_width = settings.getint( | ||
self._key_for_pipe('IMAGES_MIN_WIDTH', cls_name), self.MIN_WIDTH | ||
) | ||
self.min_height = settings.getint( | ||
self._key_for_pipe('IMAGES_MIN_HEIGHT', cls_name), self.MIN_HEIGHT | ||
) | ||
self.thumbs = settings.get( | ||
self._key_for_pipe('IMAGES_THUMBS', cls_name), self.THUMBS | ||
I'd cut some code and define

    resolve = partial(self._key_for_pipe, base_class_name=cls_name)

and then

    self.thumbs = settings.get(resolve('IMAGES_THUMBS'), self.THUMBS)
||
) | ||
|
||
@classmethod | ||
def from_settings(cls, settings): | ||
s3store = cls.STORE_SCHEMES['s3'] | ||
s3store.AWS_ACCESS_KEY_ID = settings['AWS_ACCESS_KEY_ID'] | ||
s3store.AWS_SECRET_ACCESS_KEY = settings['AWS_SECRET_ACCESS_KEY'] | ||
|
||
store_uri = settings['IMAGES_STORE'] | ||
return cls(store_uri, settings=settings) | ||
|
||
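The issue the reviewers raise about the `getattr` defaults in this hunk can be reproduced in isolation. The third argument of `getattr` is returned as-is when the attribute is missing, so passing the attribute's name as a string yields that string, not the attribute's value. `PipelineSketch` is a hypothetical stand-in class.

```python
class PipelineSketch(object):
    """Hypothetical stand-in; IMAGES_URLS_FIELD is deliberately absent."""
    DEFAULT_IMAGES_URLS_FIELD = 'image_urls'


pipe = PipelineSketch()

# Fallback is the literal string "DEFAULT_IMAGES_URLS_FIELD".
buggy = getattr(pipe, "IMAGES_URLS_FIELD", "DEFAULT_IMAGES_URLS_FIELD")

# Fallback is the value stored in the DEFAULT_* attribute.
fixed = getattr(pipe, "IMAGES_URLS_FIELD", pipe.DEFAULT_IMAGES_URLS_FIELD)
```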
|
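The `functools.partial` suggestion from the review can be tried out with a small self-contained sketch. The classes below are simplified stand-ins for the ones in the diff (plain dict instead of `Settings`, only two settings resolved), so the names and constructor signature are assumptions made for illustration.

```python
from functools import partial


class MediaPipelineSketch(object):
    """Stand-in for MediaPipeline; helper body copied from the diff."""

    def _key_for_pipe(self, key, base_class_name):
        class_name = self.__class__.__name__
        if class_name == base_class_name:
            return key
        return "{}_{}".format(class_name.upper(), key)


class ImagesPipeline(MediaPipelineSketch):
    MIN_WIDTH = 0
    THUMBS = {}

    def __init__(self, settings):
        # Bind base_class_name once instead of repeating it per setting.
        resolve = partial(self._key_for_pipe,
                          base_class_name="ImagesPipeline")
        self.min_width = int(
            settings.get(resolve('IMAGES_MIN_WIDTH'), self.MIN_WIDTH)
        )
        self.thumbs = settings.get(resolve('IMAGES_THUMBS'), self.THUMBS)


class MyPipeline(ImagesPipeline):
    pass


pipe = MyPipeline({'MYPIPELINE_IMAGES_MIN_WIDTH': '110'})
```

Each lookup then shrinks to a single `resolve(...)` call, which also makes it easier to stay within a line-length limit.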
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,6 +27,22 @@ def __init__(self, spider): | |
def __init__(self, download_func=None): | ||
self.download_func = download_func | ||
|
||
|
||
def _key_for_pipe(self, key, base_class_name): | ||
""" | ||
Allow setting settings for user defined MediaPipelines that inherit from base. | ||
This function doesn't allow anything by itself; I think this comment belongs either in the docs or in the caller code. Also, the sentence is quite hard to read; the example below is much clearer :) Maybe it makes sense to just expand the example.

A doctest can be a good way to demonstrate what the function does.
||
|
||
User can define setting key: | ||
|
||
MYPIPELINENAME_IMAGE_SETTING_NAME = <some value> | ||
|
||
and it will override default settings and class attributes. | ||
""" | ||
class_name = self.__class__.__name__ | ||
if class_name == base_class_name: | ||
return key | ||
return "{}_{}".format(class_name.upper(), key) | ||
|
||
@classmethod | ||
I am not sure if

Yeah, this is why I placed it there. I see no harm in keeping it in MediaPipeline, and there is potential benefit in the future, so I think it's OK to keep it there.
||
def from_crawler(cls, crawler): | ||
try: | ||
|
Could you please use an 80-column limit?
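Following up on the doctest idea raised for `_key_for_pipe`, one possible shape is sketched below. The method body is taken from the diff; the docstring wording, the stand-in `MediaPipeline` class, and the way the doctest is run here are illustrative assumptions, not what was eventually merged.

```python
import doctest


class MediaPipeline(object):
    """Minimal stand-in holding only the helper under discussion."""

    def _key_for_pipe(self, key, base_class_name):
        """
        Return the settings key to look up for this pipeline class.

        >>> class ImagesPipeline(MediaPipeline): pass
        >>> class MyPipeline(ImagesPipeline): pass
        >>> ImagesPipeline()._key_for_pipe('IMAGES_THUMBS', 'ImagesPipeline')
        'IMAGES_THUMBS'
        >>> MyPipeline()._key_for_pipe('IMAGES_THUMBS', 'ImagesPipeline')
        'MYPIPELINE_IMAGES_THUMBS'
        """
        class_name = self.__class__.__name__
        if class_name == base_class_name:
            return key
        return "{}_{}".format(class_name.upper(), key)


# Run only this docstring's examples and collect the outcome.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    MediaPipeline._key_for_pipe.__doc__,
    {"MediaPipeline": MediaPipeline},
    "_key_for_pipe",
    None,
    0,
)
results = doctest.DocTestRunner(verbose=False).run(test)
```

In a real module the explicit runner would normally be unnecessary, since a test runner with doctest collection enabled would pick the examples up.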