Design and refactoring / general code quality #382

alexmaco · 2019-01-04T17:50:10Z

This is intended as a tracking issue for general code quality and design discussions. @sharkdp, feel free to close this when you think quality has reached an overall satisfactory level.

To start, I'll quote a recent comment by @sharkdp from a discussion on a PR thread:

There are many things I have in mind that would hopefully help to improve the code quality:

Group related functions into impl blocks of appropriate structs such that we don't have to pass around so many parameters.

Think about a unified interface for filters (file type, file size, modification date, file extension, ..).
Maybe a trait-based interface where the filters would be stored in a Vec<Box<dyn Filter>>.

Extract some of the functionality like the OsStr-based path handling into separate libraries / crates.
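To illustrate the first point, here is a minimal sketch that assumes a hypothetical `SizeLimits` struct (not one of fd's actual types). Bounds that a free function would otherwise receive as separate parameters are stored once on the struct, and the check becomes a method in its `impl` block.

```rust
/// Hypothetical size bounds (in bytes) that would otherwise be threaded
/// through every call as separate `min`/`max` parameters.
struct SizeLimits {
    min: Option<u64>,
    max: Option<u64>,
}

impl SizeLimits {
    /// Returns true if `size` satisfies every configured bound;
    /// an unset bound (`None`) always passes.
    fn is_within(&self, size: u64) -> bool {
        self.min.map_or(true, |min| size >= min) && self.max.map_or(true, |max| size <= max)
    }
}

fn main() {
    let limits = SizeLimits { min: Some(10), max: Some(100) };
    // Call sites carry one `SizeLimits` value instead of two loose parameters.
    for size in [5, 50, 500] {
        println!("{size} bytes: {}", limits.is_within(size));
    }
}
```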


alexmaco · 2019-01-04T18:09:30Z

First off, I think I can poke at the dyn Filter idea. This will probably require 2 traits, a Filter and a MetadataFilter, the idea being that the dyn Filter ones will be called first, and then, if necessary, the metadata obtained and the dyn MetadataFilter ones called until one fails.
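The two-trait split can be sketched as follows. This is a minimal illustration, not fd's implementation. It assumes hypothetical `Extension` and `MinSize` filters and uses plain `std::path::Path` and `std::fs::Metadata` in place of fd's real entry type. Path-only filters run first, and `fs::metadata` (an extra syscall) is only called if they all pass and at least one metadata filter is configured.

```rust
use std::fs::{self, Metadata};
use std::path::Path;

/// Cheap checks that only need the path (name, extension, depth, ...).
trait Filter {
    fn matches(&self, path: &Path) -> bool;
}

/// Checks that need file metadata (size, owner, mtime, ...).
trait MetadataFilter {
    fn matches(&self, metadata: &Metadata) -> bool;
}

/// Hypothetical path filter: keep files with the given extension.
struct Extension(&'static str);

impl Filter for Extension {
    fn matches(&self, path: &Path) -> bool {
        path.extension().and_then(|ext| ext.to_str()) == Some(self.0)
    }
}

/// Hypothetical metadata filter: keep files of at least this many bytes.
struct MinSize(u64);

impl MetadataFilter for MinSize {
    fn matches(&self, metadata: &Metadata) -> bool {
        metadata.len() >= self.0
    }
}

fn entry_matches(
    path: &Path,
    filters: &[Box<dyn Filter>],
    metadata_filters: &[Box<dyn MetadataFilter>],
) -> bool {
    // Phase 1: cheap, path-only checks; `all` stops at the first failure.
    if !filters.iter().all(|f| f.matches(path)) {
        return false;
    }
    // Phase 2: only pay for the metadata lookup when something needs it.
    if metadata_filters.is_empty() {
        return true;
    }
    match fs::metadata(path) {
        Ok(metadata) => metadata_filters.iter().all(|f| f.matches(&metadata)),
        // Sketch choice: an entry whose metadata cannot be read does not match.
        Err(_) => false,
    }
}

fn main() {
    let path = std::env::temp_dir().join("two_phase_filter_demo.txt");
    fs::write(&path, b"hello, world").expect("failed to write demo file");

    let filters: Vec<Box<dyn Filter>> = vec![Box::new(Extension("txt"))];
    let metadata_filters: Vec<Box<dyn MetadataFilter>> = vec![Box::new(MinSize(5))];
    println!("{}: {}", path.display(), entry_matches(&path, &filters, &metadata_filters));

    fs::remove_file(&path).expect("failed to remove demo file");
}
```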

What I'm unsure about is the performance aspect. A quick look around makes me suspect that, although the virtual call penalty is negligible on modern/desktop CPUs with indirect branch predictors, smaller CPUs, including Atom, don't have one. So until it can be measured (hopefully on several CPUs; I only have access to some i7s), I think a defensive choice is to go with regular jumps inside a function, perhaps partitioning out the different sections for readability.
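For comparison, the "regular jumps inside a function" option can still keep filters modular by using an enum instead of trait objects. Each check is an ordinary `match` on the variant rather than a call through a vtable pointer, and the compiler is free to inline the per-variant logic. This is a minimal sketch; `PathFilter` and its variants are hypothetical.

```rust
use std::path::Path;

/// Hypothetical closed set of filters, dispatched with `match` instead of `dyn`.
enum PathFilter {
    Extension(String),
    MaxNameLen(usize),
}

impl PathFilter {
    fn matches(&self, name: &str) -> bool {
        match self {
            PathFilter::Extension(ext) => {
                Path::new(name).extension().and_then(|e| e.to_str()) == Some(ext.as_str())
            }
            // Length in bytes, which is enough for this sketch.
            PathFilter::MaxNameLen(max) => name.len() <= *max,
        }
    }
}

/// True if every filter accepts `name`; stops at the first rejection.
fn matches_all(filters: &[PathFilter], name: &str) -> bool {
    filters.iter().all(|f| f.matches(name))
}

fn main() {
    let filters = [PathFilter::Extension("rs".to_string()), PathFilter::MaxNameLen(10)];
    for name in ["main.rs", "main.py", "very_long_name.rs"] {
        println!("{name}: {}", matches_all(&filters, name));
    }
}
```

The trade-off is that adding a filter means adding a variant and a `match` arm in one place, instead of a new type implementing a trait.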

sharkdp · 2019-01-05T09:05:50Z

Thank you for creating this ticket!

First off, I think I can poke at the dyn Filter idea. This will probably require 2 traits, a Filter and a MetadataFilter, the idea being that the dyn Filter ones will be called first, and then, if necessary, the metadata obtained and the dyn MetadataFilter ones called until one fails.

Exactly what I was thinking!

What I'm unsure about is the performance aspect. A quick look around makes me suspect that, although the virtual call penalty is negligible on modern/desktop CPUs with indirect branch predictors, smaller CPUs, including Atom, don't have one.

I thought about it as well, but I don't think it will be a problem. Yes, these calls will be in the "hot" part of the code, but my guess would be that we are IO-limited anyway and the virtual function call overhead will hopefully be negligible. But we should definitely measure it, yes 👍.

I think a defensive choice is to go with regular jumps inside a function, perhaps partitioning out the different sections for readability.

That would definitely also be an improvement: anything that reduces the size of that monolithic worker-thread logic helps.

sharkdp · 2020-04-16T08:29:00Z

I did a lot of refactoring for v8.0. The whole module structure has changed, things have been moved around and renamed. The error handling is completely new.

The main points listed in the first comment are still not addressed, but it should be much more pleasant to work with fd's code now. I'm going to leave this ticket open.

sharkdp · 2021-08-08T21:25:54Z

I think this would be an excellent place for new contributors to start. There are many places where some refactoring work could improve the code quality a lot.

If you want to work on this, PLEASE submit small PRs that can be reviewed easily.

sharkdp · 2021-08-09T06:20:50Z

Another thing that would need some refactoring is the tests. It would be great to split these into multiple files. We could also think about using assert_cmd instead of our own custom testmod logic.

jacobchrismarsh · 2021-08-10T03:57:43Z

I like this idea. I'll take a stab at refactoring some tests via assert_cmd 😄


sharkdp · 2021-08-10T19:08:23Z

I like this idea. I'll take a stab at refactoring some tests via assert_cmd 😄

Awesome! Maybe just do a few first, to see if that works out as a concept.

Asha20 · 2021-08-11T17:43:58Z

I think we can tidy up spawn_senders quite a bit, but since it's a large chunk of code I figured I'd post my thoughts here first, start a discussion and get some feedback.

Perhaps we could refactor it to look something like this (code might not be 100% correct, but you get the general idea).

In a separate file, we define our traits and filter structs:

```rust
trait Filter {
    fn filter(&self, entry: &DirEntry) -> Option<ignore::WalkState>;
}

trait MetadataFilter {
    fn filter(&self, metadata: &Metadata) -> Option<ignore::WalkState>;
}

// structs and their trait impls also go here
```

Then, we refactor this part of spawn_senders like this:

```rust
let entry = ...;

let filters: Vec<Box<dyn Filter>> = vec![
    Box::new(MinDepth::new(config.min_depth)),
    // other filters here
];

let result = filters
    .iter()
    .map(|x| x.filter(entry))
    .find_map(|x| x);

if let result = Some(x) {
    return x;
}

let metadata = entry.metadata();

let metadata_filters: Vec<Box<dyn MetadataFilter>> = vec![
    Box::new(OwnerConstraint::new(config.owner_constraint)),
    Box::new(SizeConstraint::new(config.size_constraints)),
    // other metadata filters here
];

return metadata_filters
    .iter()
    .map(|x| x.filter(&metadata))
    .find_map(|x| x)
    .unwrap_or(ignore::WalkState::Continue);
```

Basically, the goal is to unify the filtering interface (as mentioned above), so that we can use the right iterator tool to find the first failing filter and exit early.

What do you think?
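A compilable version of the first phase of this proposal might look like the following. It is a sketch under stated assumptions: `WalkState` is a local stand-in for `ignore::WalkState` so that it builds without the crate, `Entry` is a hypothetical simplified entry, and `MinDepth`/`SkipHidden` are illustrative filters rather than fd's. Each filter returns `None` to let the entry through or `Some(state)` to reject it, and `find_map` stops at the first rejection.

```rust
/// Local stand-in for `ignore::WalkState` (only the variants used here).
#[derive(Debug, Clone, Copy, PartialEq)]
enum WalkState {
    Continue,
    Skip,
}

/// Hypothetical, simplified directory entry.
struct Entry {
    depth: usize,
    name: String,
}

/// `None` = the entry passes; `Some(state)` = reject it and hand `state` to the walker.
trait Filter {
    fn filter(&self, entry: &Entry) -> Option<WalkState>;
}

struct MinDepth(usize);

impl Filter for MinDepth {
    fn filter(&self, entry: &Entry) -> Option<WalkState> {
        // Too shallow: don't report this entry, but keep walking below it.
        if entry.depth < self.0 { Some(WalkState::Continue) } else { None }
    }
}

struct SkipHidden;

impl Filter for SkipHidden {
    fn filter(&self, entry: &Entry) -> Option<WalkState> {
        // Hidden entry: reject it and, if it is a directory, don't descend into it.
        if entry.name.starts_with('.') { Some(WalkState::Skip) } else { None }
    }
}

/// Result of the first rejecting filter, or `None` if every filter passes.
fn first_rejection(filters: &[Box<dyn Filter>], entry: &Entry) -> Option<WalkState> {
    filters.iter().find_map(|f| f.filter(entry))
}

fn main() {
    let filters: Vec<Box<dyn Filter>> = vec![Box::new(MinDepth(1)), Box::new(SkipHidden)];
    let entries = [
        Entry { depth: 0, name: "root".to_string() },
        Entry { depth: 1, name: ".git".to_string() },
        Entry { depth: 2, name: "main.rs".to_string() },
    ];
    for entry in &entries {
        match first_rejection(&filters, entry) {
            Some(state) => println!("{}: rejected, walker gets {:?}", entry.name, state),
            None => println!("{}: passes, would be reported", entry.name),
        }
    }
}
```

The metadata phase would follow the same shape, with a second list of filters that is only consulted once the entry's metadata has been fetched.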

tavianator · 2021-08-11T20:12:13Z

I like that approach, and I don't think the performance issues with indirect calls mentioned above will be significant. (You could avoid Box<dyn> with something like filter.chain(filter).chain(filter)...) A couple of random points:

```rust
let result = filters
    .iter()
    .find_map(|x| x.filter(entry));  // No need for .map().find_map()

if let Some(x) = result { // This was backwards
    return x;
}
```
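The `filter.chain(filter)` idea can be sketched with a combinator in the style of `Iterator::chain`. This is a minimal sketch that assumes a hypothetical `Filter` trait with a `matches(&self, name: &str)` method and illustrative `NotHidden`, `HasExtension` and `MaxLen` filters. `chain` wraps two filters in a generic `Chain<A, B>` struct, so the composed filter has a concrete nested type, every `matches` call is statically dispatched, and no `Box<dyn>` is needed.

```rust
use std::path::Path;

trait Filter {
    fn matches(&self, name: &str) -> bool;

    /// Combines `self` with `next`; the result passes only if both pass.
    fn chain<F: Filter>(self, next: F) -> Chain<Self, F>
    where
        Self: Sized,
    {
        Chain(self, next)
    }
}

/// Two filters composed at compile time; `&&` short-circuits on the first failure.
struct Chain<A, B>(A, B);

impl<A: Filter, B: Filter> Filter for Chain<A, B> {
    fn matches(&self, name: &str) -> bool {
        self.0.matches(name) && self.1.matches(name)
    }
}

struct NotHidden;

impl Filter for NotHidden {
    fn matches(&self, name: &str) -> bool {
        !name.starts_with('.')
    }
}

struct HasExtension(&'static str);

impl Filter for HasExtension {
    fn matches(&self, name: &str) -> bool {
        Path::new(name).extension().and_then(|e| e.to_str()) == Some(self.0)
    }
}

struct MaxLen(usize);

impl Filter for MaxLen {
    fn matches(&self, name: &str) -> bool {
        name.len() <= self.0
    }
}

fn main() {
    // Concrete type: Chain<Chain<NotHidden, HasExtension>, MaxLen>.
    let filter = NotHidden.chain(HasExtension("rs")).chain(MaxLen(20));
    for name in ["main.rs", ".hidden.rs", "lib.py"] {
        println!("{name}: {}", filter.matches(name));
    }
}
```

The trade-off is that the shape of the chain is fixed at compile time, which makes it less convenient than a `Vec<Box<dyn Filter>>` when the set of active filters depends on command-line options.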

Asha20 · 2021-08-11T20:22:22Z

I can never remember the if let syntax haha; it just feels so backwards!

In any case, can I start working on a PR or should we wait for more opinions?

tavianator · 2021-08-11T20:28:06Z

If you want to cook up a PR that would be great! That way we can see what the improvement really looks like.

yashwant-nagarjuna · 2022-05-08T00:02:30Z

Another thing that would need some refactoring is the tests. It would be great to split these into multiple files. We could also think about using assert_cmd instead of our own custom testmod logic.

Is this still open? If so, I can start working on it. 😄

tmccombs · 2022-05-10T03:44:14Z

I believe it is.

1Dragoon · 2022-07-09T01:55:20Z

Has anybody considered the idea of refactoring this into essentially two different projects:

- A back-end crate that other Rust code can use
- A front end that implements what users currently see

I ask because, before finding out that fd existed, I was working on my own multi-threaded copy tool. Part of its implementation was a multi-threaded file scanning capability, along with globbing, path expansion, etc., but the only part I've completed is a very primitive version of the scan function. Now that I've seen fd, it would make more sense to fork it and implement the copy functionality (along with progress tracking) on top. My current work is here:

https://github.com/1Dragoon/fcp

(It's a slow-moving project, and it was my motivation for recently extending indicatif's functionality to include stateful tracking.)

I was going to do something similar for move, delete, etc., with the added twist of detecting file locks and prompting the user either to close the programs that have those files open or to let the tool close them on the user's behalf, prompting for interactive privilege escalation if needed (on Windows at least, where credentials can be entered out of band, which obviates security concerns). That work can be found here if anybody is interested:

https://github.com/1Dragoon/locky

tmccombs · 2022-07-09T06:20:24Z

This was brought up before in #203.

I don't think it's completely out of the question, but I'd like to understand better what value fd as a library would offer over using the ignore crate directly.

alexmaco mentioned this issue Jan 4, 2019

Preserve non utf8 names #309

Closed

sharkdp added the idea label Jan 8, 2019

hazelweakly mentioned this issue Jan 10, 2019

Design directions for fd; explicit goals/non-goal w.r.t being a find alternative #390

Closed

Repository owner deleted a comment from hazelweakly Jan 10, 2019

alexmaco mentioned this issue Jan 26, 2019

Some cleanup and refactor #398

Merged

sharkdp added this to the v8.0 milestone Apr 2, 2020

sharkdp removed this from the v8.0 milestone Apr 16, 2020

sharkdp added help wanted and removed idea labels Dec 1, 2020

sharkdp added the good first issue label Aug 8, 2021

sharkdp changed the title ~~Design and refactoring discussion~~ Design and refactoring / general code quality Aug 8, 2021

sharkdp pinned this issue Aug 8, 2021

tmccombs added a commit to tmccombs/fd that referenced this issue Aug 10, 2021

Refactor file types check to be on impl of FileTypes

9e11b2c

Relates to sharkdp#382

tmccombs mentioned this issue Aug 10, 2021

Refactor file types check to be on impl of FileTypes #827

Merged

sharkdp pushed a commit that referenced this issue Aug 10, 2021

Refactor file types check to be on impl of FileTypes

115ae93

Relates to #382

Asha20 mentioned this issue Aug 12, 2021

Refactor filters in spawn_senders #829

Open

niklasmohrin mentioned this issue Aug 21, 2021

Some readability refactorings #833

Merged


sharkdp unpinned this issue Oct 23, 2023
