-
Notifications
You must be signed in to change notification settings - Fork 10.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document FilesPipeline.file_path and ImagesPipeline.file_path #3609
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3609 +/- ##
==========================================
- Coverage 84.72% 84.71% -0.02%
==========================================
Files 168 168
Lines 9460 9460
Branches 1407 1407
==========================================
- Hits 8015 8014 -1
Misses 1188 1188
- Partials 257 258 +1
|
docs/topics/media-pipeline.rst
Outdated
You can override this method to customize the download path of each file. | ||
|
||
For example, to download all files into the ``files`` folder with their | ||
original filenames:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be a bit more clear on that it is just an example, which works for a certain kind of URLs - http://example.com/foo.png
would be stored properly, but e.g. all files like http://example.com/file?name=foo.png
would be stored as "file", overwriting each other. Possible solutions:
- say that it is an example, which works on websites where file URLs follow certain rules
- make it more robust (but less readable) by adding full URL hash to the path, in addition to guessed file name.
A full implementation should probably also check Content-Disposition header, to get a file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve gone with option 1, and also tried to make it obvious that the example implementation does not take subdirectories into account either.
docs/topics/media-pipeline.rst
Outdated
import os | ||
from urllib.parse import urlparse | ||
|
||
def file_path(self, request, response, info): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. I found this definition wouldn't work, without assigning default values to the keyword arguments response
and info
:
def file_path(self, request, response, info): | |
def file_path(self, request, response=None, info=None): |
Just like in:
scrapy/scrapy/pipelines/files.py
Line 460 in b364bfb
def file_path(self, request, response=None, info=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll look into it in detail later, but if I recall right those two fields are only needed when also using some deprecated methods in subclasses. Are you using the latest Scrapy version? (1.6.0).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verified, that is only needed to support subclasses of these pipelines that still use the deprecated file_key
method.
f2f3234
to
0802375
Compare
docs/topics/media-pipeline.rst
Outdated
from urllib.parse import urlparse | ||
|
||
def file_path(self, request, response, info): | ||
return 'files/' + os.path.basename(urlparse(request.url).path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make it a bit more clear, could you please add some boilerplate, e.g.
# ...
class MyFilesPipeline(FilesPipeline):
# ...
def file_path(self, request, response, info):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve gone with the whole implementation, as it only required a few more lines.
Looks good, thanks @Gallaecio! |
Fixes #2253