crawl -o and -O options do not work with absolute paths on Windows #5969

Closed
@Prometheus3375

Description

Absolute paths on Windows contain : as part of the drive name (for example C:), which leads to unexpected behaviour when such a path is passed to -o or -O.
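To illustrate the kind of ambiguity described above, here is a minimal sketch, assuming the option value is split into URI and format on its last colon. That assumption is inferred from the error message reported for Case 1 below and is not taken from Scrapy's source. The helpers naive_split and drive_aware_split are hypothetical names invented for this example; they are not Scrapy functions.

```python
import re

# Hypothetical helpers for illustration only; they are not part of Scrapy.


def naive_split(value):
    """Split '<URI>:<FORMAT>' on the last colon, ignoring drive letters."""
    uri, sep, fmt = value.rpartition(":")
    return (uri, fmt) if sep else (value, None)


# Matches a leading drive letter followed by a path separator.
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:[\\/]")


def drive_aware_split(value):
    """Same split, but set a leading drive letter aside before splitting."""
    drive, rest = "", value
    if _DRIVE_PREFIX.match(value):
        drive, rest = value[:2], value[2:]
    uri, sep, fmt = rest.rpartition(":")
    return (drive + uri, fmt) if sep else (value, None)


# The drive colon is mistaken for the URI/format separator.
print(naive_split(r"C:\path\to\output.json"))
# The drive letter is kept with the path; no explicit format is found.
print(drive_aware_split(r"C:\path\to\output.json"))
# An explicit format after the final colon is still recognised.
print(drive_aware_split(r"C:\path\to\output.json:json"))
```

With the naive split, the text after the drive colon ends up in the format slot, which is consistent with the format string shown in the Case 1 error. The drive-aware variant is only one possible way to disambiguate and is not presented as the fix adopted by the project.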

Steps to Reproduce

Case 1: run command scrapy crawl spider_name -O "C:\path\to\output.json"

Case 2: run command scrapy crawl spider_name -O "C:\path\to\output.json:json"

Expected behavior

For both commands, the crawl results should be saved to the file C:\path\to\output.json.

Actual behavior

Case 1: error

Unrecognized output format '\path\to\output.json'. Set a supported one (('json', 'jsonlines', 'jsonl', 'jl', 'csv', 'xml', 'marshal', 'pickle')) after a colon at the end of the output URI (i.e. -o/-O <URI>:<FORMAT>) or as a file extension.

Case 2: no errors, but the file is not created. Note: in this example I use the system disk, but the file is not created on any other disk either (i.e., permissions are not the issue).
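One thing that may be worth checking for Case 2, offered as a hedged observation rather than a diagnosis: Python's standard urllib.parse.urlsplit interprets a leading drive letter as a URL scheme. The report does not establish that this is why the file is missing, and the sketch below does not call any Scrapy code. The file URI literal is an example written for this illustration.

```python
from urllib.parse import urlsplit

windows_path = r"C:\path\to\output.json"
# Example file URI for the same location (written by hand for illustration).
file_uri = "file:///C:/path/to/output.json"

# The drive letter before the colon is parsed as the scheme, lowercased.
drive_scheme = urlsplit(windows_path).scheme
# A file URI keeps the drive letter inside the path component instead.
file_parts = urlsplit(file_uri)

print(drive_scheme)
print(file_parts.scheme)
print(file_parts.path)
```

Any code that picks a storage backend from the parsed scheme would see the drive letter rather than a file location for the raw Windows path. Whether that applies here is an assumption that would need to be confirmed against the Scrapy version listed below.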

Reproduces how often

Always

Versions

Scrapy       : 2.9.0
lxml         : 4.9.2.0
libxml2      : 2.9.12
cssselect    : 1.2.0
parsel       : 1.8.1
w3lib        : 2.1.1
Twisted      : 22.10.0
Python       : 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]
pyOpenSSL    : 23.2.0 (OpenSSL 3.1.1 30 May 2023)
cryptography : 41.0.1
Platform     : Windows-10-10.0.19044-SP0
