FilesDownloader with GCS downloading up-to-date files again #4346
Comments
The pipeline just relies on the
Well, I checked on an example, and this date seemed right to me.
@lblanche what would be the easiest way to check this? Should I just create an empty bucket and put it into the spider settings?
@wRAR Yes, that's what I did.
Hello, can I work on this issue?
@michalp2213 sure
I cannot reproduce this bug. @lblanche, are you sure you set up permissions for the bucket correctly? The first time I tried reproducing it, I had a setup where the service account I used had write permissions, but for some reason calling
Thanks @michalp2213 for the investigation and the fix!
@lblanche: if you have a chance, please test this again with the
Unfortunately, I do not have access to this project anymore. I think I can trust @michalp2213's answer.
Description
It seems that when using Google Cloud Storage, the Files pipeline does not behave as expected with files that are already up to date.
Steps to Reproduce
1. git clone https://github.com/QYQ323/python.git
2. In settings.py, set FILES_STORE to a GCS bucket: FILES_STORE = 'gs://mybucket/'
3. scrapy crawl examples
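For reference, a minimal settings.py for this reproduction might look like the sketch below. The setting names are standard Scrapy settings, but it assumes the stock FilesPipeline rather than a custom subclass, and the project ID and pipeline priority are placeholders.

```python
# settings.py (sketch): enable the stock FilesPipeline and store files on GCS.
ITEM_PIPELINES = {
    "scrapy.pipelines.files.FilesPipeline": 1,  # priority value is a placeholder
}

# Bucket (optionally with a prefix) where downloaded files are stored.
FILES_STORE = "gs://mybucket/"

# Google Cloud project that owns the bucket (placeholder value).
GCS_PROJECT_ID = "my-project"

# Files stored fewer than this many days ago should not be downloaded again.
# 90 is Scrapy's default, matching the expectation in this report.
FILES_EXPIRES = 90
```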
Expected behavior:
Files should not be downloaded again when the spider is run several times in a row. If a file is already on GCS (in the same folder), it should not be downloaded again, provided it was uploaded less than 90 days ago.
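The expected behavior amounts to a freshness check: look up the stored object and skip the download only if it exists and was last updated fewer than FILES_EXPIRES days ago. The sketch below illustrates that logic; it is not Scrapy's actual code. It assumes a google.cloud.storage.Bucket-like object whose `get_blob()` returns None for a missing object, or a blob with an `updated` timezone-aware datetime and an `md5_hash`. The helpers `stored_file_info` and `needs_download` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stored_file_info(bucket, path: str) -> dict:
    """Return {'checksum', 'last_modified'} for an existing blob, or {} if absent.

    `bucket` is assumed to behave like google.cloud.storage.Bucket.
    """
    blob = bucket.get_blob(path)
    if blob is None:
        return {}
    return {"checksum": blob.md5_hash, "last_modified": blob.updated}


def needs_download(info: dict, expires_days: float, now: Optional[datetime] = None) -> bool:
    """True if the file is missing, has no timestamp, or is older than expires_days."""
    last_modified = info.get("last_modified")
    if last_modified is None:
        return True  # nothing stored yet (or no timestamp): download it
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_modified > timedelta(days=expires_days)
```

In this sketch, anything that makes the lookup miss the existing object (a different path or bucket) leads straight to a re-download. A caller that treats a failed lookup, such as a permissions error from `get_blob()`, as "not stored" would likewise re-download every file on every run.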
Actual behavior:
Every time the spider is launched, every file is downloaded again.
Reproduces how often: 100%
Versions
Scrapy : 1.8.0
lxml : 4.5.0.0
libxml2 : 2.9.10
cssselect : 1.1.0
parsel : 1.5.2
w3lib : 1.21.0
Twisted : 19.10.0
Python : 3.8.1 (default, Jan 8 2020, 16:15:59) - [Clang 4.0.1 (tags/RELEASE_401/final)]
pyOpenSSL : 19.1.0 (OpenSSL 1.1.1d 10 Sep 2019)
cryptography : 2.8
Platform : macOS-10.15.3-x86_64-i386-64bit