Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add status (downloaded, uptodate) to files information #4486

Merged
merged 2 commits into from May 18, 2020
Merged

Add status (downloaded, uptodate) to files information #4486

merged 2 commits into from May 18, 2020

Conversation

ilias-ant
Copy link
Contributor

@ilias-ant ilias-ant commented Apr 12, 2020

  • adds file status to the downloaded files information, as emitted by media pipelines.
  • updates the documentation accordingly.

File status can be a very useful metric to many Scrapy-based applications, especially the ones that heavily rely on FilesPipeline or ImagesPipeline. This extra information available can be utilized with various (and creative) ways, enabling the end-user to have more control to his file processing algorithms, creating more synergy with other features such as file expiration policy or file_status_count statistics.

fixes #2893

@ilias-ant ilias-ant changed the title Add status (downloaded, uptodate) to files information #2893 Add status (downloaded, uptodate) to files information Apr 12, 2020
@Gallaecio
Copy link
Member

Gallaecio commented Apr 15, 2020

Do you think you could add tests that cover the usage of the other two statuses (or modify existing tests to include checks for them, if there are tests where they may be already covered but not checked through an assert)?

@ilias-ant
Copy link
Contributor Author

ilias-ant commented Apr 15, 2020

Gladly! I'll need a bit of time to familiarize myself with the respective test suite and find a proper way to set up the necessary logic to "create" uptodate, cached images - any quick general suggestion on that? :)

@Gallaecio
Copy link
Member

Gallaecio commented Apr 15, 2020

If it can be done with a bit of mock usage, I would go that way.

Worst case scenario, for complex tests that require running spiders, we have test spiders that are run with subprocess (see, for example, test_crawler.py). I guess that is also a possibility. But I would try to avoid having to setup some mock server if possible.

@codecov
Copy link

codecov bot commented Apr 19, 2020

Codecov Report

Merging #4486 into master will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4486      +/-   ##
==========================================
- Coverage   84.79%   84.79%   -0.01%     
==========================================
  Files         164      164              
  Lines        9887     9880       -7     
  Branches     1468     1464       -4     
==========================================
- Hits         8384     8378       -6     
+ Misses       1248     1246       -2     
- Partials      255      256       +1     
Impacted Files Coverage Δ
scrapy/pipelines/files.py 61.66% <100.00%> (ø)
scrapy/utils/trackref.py 82.85% <0.00%> (-2.86%) ⬇️
scrapy/core/downloader/handlers/http11.py 92.59% <0.00%> (-0.32%) ⬇️
scrapy/commands/fetch.py 89.13% <0.00%> (-0.24%) ⬇️
scrapy/linkextractors/lxmlhtml.py 92.10% <0.00%> (-0.21%) ⬇️
scrapy/core/downloader/webclient.py 97.95% <0.00%> (-0.03%) ⬇️
scrapy/item.py 98.57% <0.00%> (ø)
scrapy/commands/parse.py 20.21% <0.00%> (ø)
scrapy/pipelines/media.py 97.29% <0.00%> (ø)
scrapy/http/request/__init__.py 100.00% <0.00%> (ø)
... and 3 more

@ilias-ant
Copy link
Contributor Author

ilias-ant commented Apr 19, 2020

I have covered all 3 available status(es):

  • by enriching existing tests for the cases of downloaded, uptodate.
  • by creating a new test for the case of cached, providing a suitable mock response. I've covered this case in the most "unittest" case possible, as i couldn't find a way to produce a more "functional" test.

I'd be glad to hear any additional feedback on this, in case i missed or misinterpreted something from the earlier comments :)

@kmike kmike merged commit febe82a into scrapy:master May 18, 2020
2 checks passed
@kmike
Copy link
Member

kmike commented May 18, 2020

Thanks @ilias-ant!

@kmike kmike added this to the v2.2 milestone May 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants