Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace os.path with pathlib? #4916

Closed
eumiro opened this issue Dec 5, 2020 · 2 comments
Closed

Replace os.path with pathlib? #4916

eumiro opened this issue Dec 5, 2020 · 2 comments

Comments

@eumiro
Copy link

eumiro commented Dec 5, 2020

Since scrapy is a Python 3.6+ project, it could use the pathlib.Path objects and methods to manipulate directories and files instead of the string manipulators in os.path.* and gain a lot of code readability. I have converted most of the code of scrapy to pathlib.Path, but a single PR would be too complex to review. Are you interested in this change? Would you review my PRs step by step? Any ideas?

@nyov
Copy link
Contributor

nyov commented Dec 13, 2020

From what it looks like to me, much of scrapy's pathlib usage would just be turning strings into Path objects, only to join() them and turn them back into strings - which wouldn't cater to the benefits of Path objects at all.
But I could be wrong. Since you say you already converted code, you probably know better than me if that perception is false.

But to answer your question - I guess I wouldn't care, as long as:

  • it's not degrading codebase readability -- nice if you say it would actually enhance it by some margin
  • it's not using Path objects in a way or place where it would impact performance or memory footprint of a running crawler negatively, compared to existing code
  • it's not changing functionality; such as creating absolute paths (Path.resolve()), when the code didn't use or require absolute paths before

Further, I don't think a single PR should be too complex for review. It would be nice instead to see all the repeating patterns together, to find refactoring opportunities. To me, that looks like the single big benefit of doing this change: you might find repeating patterns like makedir_if_not_exists(mydir) to replace os.path.exists/os.makedirs combos, etc.
But of course you can split the changes over several commits to the PR branch (or "patch-queue", if you like), to make it more digestible.

Some people depending on github web reviews might be unable to review more complex patchsets, I'm aware; but we've had a few big ones for PRs in the past, and I believe scrapy's maintainers do have the skill to locally check out patches and use QoL tools like hub/gh/... for code reviews. So this shouldn't pose a limitation for you, thankfully.
If you look at PR's like #1149 or #1586, you'll see how they've grown over time (and I'm just picking some old ones so they don't get confused by the unrelated mention).

@Gallaecio
Copy link
Member

#5682

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants