Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add filewalks #32

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Conversation

timotheecour
Copy link
Member

@timotheecour timotheecour commented Oct 24, 2020

migrated from nim-lang/Nim#15598 (refs nim-lang/Nim#15598 (comment))

links

@juancarlospaco
Copy link
Contributor

But you already have globs on stdlib, I think we should stop adding 10 walks to Nim,
try to fix stdlib globs instead please and do not add even more, DRY,
ideally we should have 2 or 3 walks max but that work properly, even if it needs to have when defined(): blocks.

Code looks Ok but still...

@ghost
Copy link

ghost commented Oct 27, 2020

@juancarlospaco no, you missed the discussion, addition of globs to the stdlib was rejected because for this module it makes more sense to put it into Fusion. See nim-lang/Nim#15598

@juancarlospaco
Copy link
Contributor

@Yardanico https://nim-lang.github.io/Nim/globs.html

@ghost
Copy link

ghost commented Oct 27, 2020

@juancarlospaco seems like a bug though, it wasn't merged. You can see that it's not in https://github.com/nim-lang/Nim/tree/devel/lib/std

@timotheecour
Copy link
Member Author

timotheecour commented Oct 27, 2020

@Yardanico nim-lang.github.io/Nim/globs.html

this is from lib/std/private/globs.nim which I had added in nim-lang/Nim#14501 for kochdocs, it's in private hence not intended for public use, just like since, etc. nim docs should make that clear (still useful to show docs for those, just like we have docs for compiler).

But you already have globs on stdlib, I think we should stop adding 10 walks to Nim,

since it's not public and I started relying on it from outside (in fusion in #24 do add doc generation to fusion) as well as for future intended use, it was time to make it a proper module, hence my original attempt to add it to stdlib here nim-lang/Nim#15598. Note that making it a separate nimble project (or using pkg/globs) would not allow using it in either kochdocs nor from within fusion.

I wrote this because the existing walkDirs, walkDir, walkDirRec are not flexible enough, in particular walkDirRec doesn't allow efficiently traversing large directory structures with a filter (a very common use case), and lack important options commonly found in other libraries or tools (eg linux find program) and its API is lacking (eg yielded paths don't specify their kind). It would also allow simplifying code and avoiding common pitfalls when re-implementing such functionality in other tools eg see nimgrep PR nim-lang/Nim#15612 (comment)

See also design discussion and comparison against pkg/globs in nim-lang/Nim#15598, it was designed with pkg/globs in hindsight.

walkFiles is designed to be both simple to use yet general and flexible enough to fit most use cases thanks to optional callbacks.

@timotheecour
Copy link
Member Author

timotheecour commented Nov 13, 2020

ping @Araq ; generally useful and would help with things like nimgrep (eg nim-lang/Nim#15962 (comment)) which now has to change to slightly slower closure iterators to avoid code bloat from multiple yields

as well as in other places, eg would avoid the ugly import std/private/globs from within fusion which i did as a least worst option in #24

@juancarlospaco
Copy link
Contributor

LGTM 👍

@timotheecour
Copy link
Member Author

PTAL, rebased for conflict bitrot

src/fusion/filewalks.nim Outdated Show resolved Hide resolved
src/fusion/filewalks.nim Outdated Show resolved Hide resolved
src/fusion/filewalks.nim Outdated Show resolved Hide resolved
src/fusion/filewalks.nim Outdated Show resolved Hide resolved
Copy link
Member Author

@timotheecour timotheecour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

src/fusion/filewalks.nim Outdated Show resolved Hide resolved
src/fusion/filewalks.nim Outdated Show resolved Hide resolved
src/fusion/filewalks.nim Outdated Show resolved Hide resolved
src/fusion/filewalks.nim Outdated Show resolved Hide resolved
src/fusion/filewalks.nim Show resolved Hide resolved
@timotheecour
Copy link
Member Author

replying to @Araq nim-lang/Nim#16709 (comment)

What I dislike about your walkPathsOpt is the callbacks, iterators that require callbacks are alien.

there is precedent in stdlib, eg sequtils.filter: iterator filter*[T](s: openArray[T], pred: proc(x: T): bool {.closure.}): T =, and I don't see what's wrong with callbacks in iterators.

Filtering should always be an if inside the iteration

that's not what follow: FollowCallback is about. follow controls whether to recurse down a directory, not whether to yield an entry. You cannot emulate this in user code in iterator body, eg:

for e in walkPaths(src):
  if e.path.lastPathPart != ".git": doSomeWork(e)

this would be very inefficient as it'd recurse down ".git" only to be filtered out later by the iterator body.

SortCmpCallback also cannot be emulated in user code in the iteration body, nor can the BFS vs DFS iteration order.

I intentionally did not include a yieldFilter because leaf-level filtering should be in iteration body, see this comment in the PR:

* a yieldFilter, regex match etc isn't needed because caller can filter at
  call site, without loss of generality, unlike `follow`; this simplifies the API.

, else you might as well make the iterator a proc that takes callbacks.

that would give a crippled, foreign-looking API, that doesn't interoperate with stdlib, eg:
with this PR, this works:

import std/[sequtils,sugar]
let s1 = toSeq(walkPaths(dir))
let s2 = collect: (for e in walkPaths(dir): e.path)

with a proc, this wouldn't work. Likewise, iterator allows the usual break/continue/return/yield control flow, a proc doesn't (nim-lang/Nim#15655 is a separate topic).

Also, the new better walkPathsOpt should be in a module of its own.

that's already the case in this PR (and was also the case in the stdlib version of this PR nim-lang/Nim#15598)

@juancarlospaco
Copy link
Contributor

So... how this is going ?.

@timotheecour
Copy link
Member Author

So... how this is going ?.

@Araq this should be moved back to a PR against stdlib (as was intially the case refs nim-lang/Nim#15598). This is 100% stdlib territory, general purpose enough, and would benefit existing code in tools/ and elsewhere that reimplement this (poorly, with duplicate yield and worse API)

The main thing missing in this API is resumable error handling (via a callback to handle IO errors), but this can be done in followup work.

@Araq
Copy link
Member

Araq commented Apr 30, 2021

The API is alien, we use named parameters instead of the "object builder" pattern. It's also not clear to me why "sorting" is important, looking at https://docs.python.org/3/library/glob.html for a comparison I see none of these shenanigans. Also, ideally new APIs embrace the AbsoluteFile/Dir/etc design, I'm tired of type-unsafe path handling code.

The main thing missing in this API is resumable error handling (via a callback to handle IO errors), but this can be done in followup work.

For me it's the one problem of the old API that is worth solving.

@timotheecour
Copy link
Member Author

timotheecour commented May 18, 2021

just replying on this point for now, will followup with other points later:

It's also not clear to me why "sorting" is important

because it's easy to support efficiently in the iterator (as i did) with O(N) space where N=max entries per dir, impossible to support efficiently in the callee (it'd require O(P) where P = total number of files/dirs listed recursively), and most of all because it's useful; a quick google search reveals people keep searching for this (with bad solutions, involving having to first get all results and then sort)

here's a post from today:
https://discord.com/channels/371759389889003530/371759389889003532/844205260049612820

Clonkk[Matrix]
BOT
 — Today at 6:30 AM
Is it possible to order walkDir alphabetically ?
Vindaar — Today at 6:32 AM
I suppose no, because it's probably yields by inode number or something like this. I guess you will have to store the files in a temp seq and sort that instead

(although that person talks about walkDir, where caller can easily solve this with a toSeq.sorted, the main problem is recursive listing)

@Araq
Copy link
Member

Araq commented May 19, 2021

because it's easy to support efficiently in the iterator (as i did) with O(N) space where N=max entries per dir, impossible to support efficiently in the callee

That's convincing, thanks. However, we could simply always enable it and document that it does sort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants