Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for windows absolute pathlib.Path objects in FeedExporter #5939

Merged
merged 9 commits into from Jun 22, 2023

Conversation

alexpdev
Copy link
Contributor

This PR is related to #5739 as well as some conflicting information in the docs.

On the Feed Exports page, in the FEEDS subsection it mentions that you can use a URI or a pathlib.Path object as the target.

FEEDS
-----
.. versionadded:: 2.1
Default: ``{}``
A dictionary in which every key is a feed URI (or a :class:`pathlib.Path`
object) and each value is a nested dictionary containing configuration
parameters for the specific feed.
This setting is required for enabling the feed export feature.

Earlier on the same page however under the Storage Backends section it mentions that you can use a standard string path as long as its an absolute path, but only on Unix systems.

Local filesystem
----------------
The feeds are stored in the local filesystem.
- URI scheme: ``file``
- Example URI: ``file:///tmp/export.csv``
- Required external libraries: none
Note that for the local filesystem storage (only) you can omit the scheme if
you specify an absolute path like ``/tmp/export.csv``. This only works on Unix
systems though.

Both of these statements are accurate however, if one were to miss the second statement and only read the first you might think, as I did, that using an absolute pathlib.Path object on windows would work fine, when in fact it doesn't. For example using a Path(__file__).parent / "items.json" fails when used on windows OS.

My proposed solution to this, which is implemented in this PR is to simply get rid of the "only on Unix" caveat and make it work for windows as well. I achieved this by adding a conditional statement in the FeedExporter.__init__ method that tests if the uri used in the settings is a Path object, and if it is then to convert it to a filesystem uri.

I also included an accompanying unit test that would fail prior to the changes.

@codecov
Copy link

codecov bot commented May 26, 2023

Codecov Report

Merging #5939 (584ab43) into master (1b78e48) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 584ab43 differs from pull request most recent head 30c2693. Consider uploading reports for the commit 30c2693 to get more accurate results

@@           Coverage Diff           @@
##           master    #5939   +/-   ##
=======================================
  Coverage   88.88%   88.88%           
=======================================
  Files         162      162           
  Lines       11243    11244    +1     
  Branches     1826     1826           
=======================================
+ Hits         9993     9994    +1     
  Misses        965      965           
  Partials      285      285           
Impacted Files Coverage Δ
scrapy/extensions/feedexport.py 95.82% <100.00%> (+0.01%) ⬆️

@wRAR
Copy link
Member

wRAR commented May 26, 2023

Nice, should the doc be updated though?

@alexpdev
Copy link
Contributor Author

alexpdev commented May 26, 2023

Nice, should the doc be updated though?

I think both of those statements are still True. Because you still cannot use a string absolute path on windows. This only allows for using the absolute pathlib.Path. Maybe there is a way that I could allow for string absolute window paths as well. I will check.


Edit: I did some experimenting and was unable to come up with a way to support a string absolute path on windows. However I did make changes in the documentation that clarifies that using the a pathlib.Path object is supported.

Let me know if I should make any further changes.

@Gallaecio
Copy link
Member

Thanks!

@Gallaecio Gallaecio merged commit 04ee330 into scrapy:master Jun 22, 2023
42 checks passed
@alexpdev alexpdev deleted the feedexport_windows_pathlib_#5739 branch June 22, 2023 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants