Support Path Objects Issue #5739 #5801

Merged: 18 commits, Mar 4, 2023

alexpdev
Contributor

Located a couple of places where using a Path object raises an exception. Relates to Issue #5739.

The issue can be reproduced by using a Path object in the "IMAGES_STORE" or "FILES_STORE" settings when configuring a file or image pipeline.

Code to reproduce:

from pathlib import Path
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'
    custom_settings = {
        "IMAGES_URLS_FIELD": 'image_urls',
        "IMAGES_RESULT_FIELD": 'images',
        "IMAGES_STORE": Path.home() / 'Images',   # <-- here
        "ITEM_PIPELINES": {
            "scrapy.pipelines.images.ImagesPipeline": 1,
        }
    }

The unit tests I added will also trigger the error when tested without the changes.

And the exception that is raised:

2023-01-15 07:18:46 [twisted] CRITICAL:
Traceback (most recent call last):
  File "...\venv\lib\site-packages\twisted\internet\defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)

  ...
  File "...\venv\lib\site-packages\scrapy\pipelines\files.py", line 383, in _get_store
    return store_cls(uri)
  File "...\venv\lib\site-packages\scrapy\pipelines\files.py", line 43, in __init__
    if '://' in basedir:
TypeError: argument of type 'WindowsPath' is not iterable
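
For reference, the failure in the traceback can be reproduced without Scrapy: the `in` operator is not defined for pathlib paths, so the `'://' in basedir` check raises as soon as a Path arrives. Below is a minimal sketch of one way to guard against it, assuming the store only needs a string base directory. `StoreSketch` is a hypothetical stand-in, not Scrapy's actual store class, and normalizing with `os.fspath` is an illustrative choice, not necessarily what this PR does.

```python
import os
from pathlib import Path


class StoreSketch:
    """Hypothetical stand-in for a filesystem store (not Scrapy's class)."""

    def __init__(self, basedir):
        # Normalize str or os.PathLike (e.g. pathlib.Path) to a plain str
        # before running any string-only checks on it.
        basedir = os.fspath(basedir)
        if "://" in basedir:
            # Drop a URI scheme such as "file://" and keep the rest.
            basedir = basedir.split("://", 1)[1]
        self.basedir = basedir


# The failure mode from the traceback: Path objects do not support `in`.
error_message = None
try:
    "://" in Path("Images")
except TypeError as exc:
    error_message = str(exc)

# With normalization, Path and str inputs are both accepted.
store_from_path = StoreSketch(Path("Images"))
store_from_str = StoreSketch("file://Images")
```

On versions without the fix, passing `str(Path.home() / 'Images')` in the setting sidesteps the error, since the `in` check then operates on a plain string.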

@codecov

codecov bot commented Jan 25, 2023

Codecov Report

Merging #5801 (7268f08) into master (2b3a8f0) will decrease coverage by 0.16%.
The diff coverage is 100.00%.

❗ Current head 7268f08 differs from pull request most recent head 59f5250. Consider uploading reports for the commit 59f5250 to get more accurate results

@@            Coverage Diff             @@
##           master    #5801      +/-   ##
==========================================
- Coverage   88.95%   88.79%   -0.16%     
==========================================
  Files         162      162              
  Lines       11011    11016       +5     
  Branches     1796     1796              
==========================================
- Hits         9795     9782      -13     
- Misses        938      954      +16     
- Partials      278      280       +2     
Impacted Files                        Coverage Δ
scrapy/pipelines/files.py             71.90% <100.00%> (+0.47%) ⬆️
scrapy/robotstxt.py                   75.30% <0.00%> (-13.59%) ⬇️
scrapy/utils/test.py                  60.81% <0.00%> (-5.41%) ⬇️
scrapy/core/downloader/__init__.py    90.97% <0.00%> (-1.51%) ⬇️
scrapy/core/http2/stream.py           91.32% <0.00%> (-0.58%) ⬇️

@wRAR wRAR closed this Feb 14, 2023
@wRAR wRAR reopened this Feb 14, 2023
Member

@wRAR wRAR left a comment

Thanks!

@wRAR wRAR merged commit 88eec7a into scrapy:master Mar 4, 2023
@alexpdev alexpdev deleted the path_object_error_#5739 branch March 5, 2023 03:44