Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design and refactoring / general code quality #382

Open
alexmaco opened this issue Jan 4, 2019 · 15 comments
Open

Design and refactoring / general code quality #382

alexmaco opened this issue Jan 4, 2019 · 15 comments

Comments

@alexmaco
Copy link
Contributor

alexmaco commented Jan 4, 2019

This is intended as a tracking issue for general code quality and design discussions. @sharkdp feel free to close this when you think quality has reached an overall satisfactory level.

To start, I'll quote a recent comment by @sharkdp from a discussion on a PR thread:

There are many things I have in mind that would hopefully help to improve the code quality:

  • Group related functions into impl blocks of appropriate structs such that we don't have to pass around so many parameters.
  • Think about a unified interface for filters (file type, file size, modification data, file extension, ..).
    Maybe a trait-based interface where the filters would be stored in a Vec<Box<dyn Filter>>.
  • Extract some of the functionality like the OsStr-based path handling into separate libraries / crates.
@alexmaco
Copy link
Contributor Author

alexmaco commented Jan 4, 2019

First off, I think I can poke at the dyn Filter idea. This will probably require 2 traits, a Filter and a MetadataFilter, the idea being that the dyn Filter ones will be called first, and then, if necessary, the metadata obtained and the dyn MetadataFilter ones called until one fails.

What I'm unsure about, is the performance aspect. A quick look around makes me suspect that, although virtual call penalty is negligible on modern/desktop CPUs with indirect branch predictors, smaller CPUs, including Atom, don't have one. So until it can be measured (hopefully on several CPUs, i only have access to some i7s), I think a defensive choice is to go with regular jumps inside a function, perhaps partitioning out the different sections for readability.

@sharkdp
Copy link
Owner

sharkdp commented Jan 5, 2019

Thank you for creating this ticket!

First off, I think I can poke at the dyn Filter idea. This will probably require 2 traits, a Filter and a MetadataFilter, the idea being that the dyn Filter ones will be called first, and then, if necessary, the metadata obtained and the dyn MetadataFilter ones called until one fails.

Exactly what I was thinking!

What I'm unsure about, is the performance aspect. A quick look around makes me suspect that, although virtual call penalty is negligible on modern/desktop CPUs with indirect branch predictors, smaller CPUs, including Atom, don't have one.

I thought about it as well, but I don't think it will be a problem. Yes, these calls will be in the "hot" part of the code, but my guess would be that we are IO-limited anyway and the virtual function call overhead will hopefully be negligible. But we should definitely measure it, yes 👍.

I think a defensive choice is to go with regular jumps inside a function, perhaps partitioning out the different sections for readability.

That would definitely also be an improvement. Anything that reduces the size of that monolithic worker thread logic.

@sharkdp
Copy link
Owner

sharkdp commented Apr 16, 2020

I did a lot of refactoring for v8.0. The whole module structure has changed, things have been moved around and renamed. The error handling is completely new.

The main points listed in the first comment are still not addressed, but it should be much more pleasant to work with fds code now. I'm going to leave this ticket open.

@sharkdp sharkdp removed this from the v8.0 milestone Apr 16, 2020
@sharkdp sharkdp added help wanted and removed idea labels Dec 1, 2020
@sharkdp
Copy link
Owner

sharkdp commented Aug 8, 2021

I think this would be an excellent place for new contributors to start. There are many places where some refactoring work could improve the code quality a lot.

If you want to work on this, PLEASE submit small PRs that can be reviewed easily.

@sharkdp sharkdp changed the title Design and refactoring discussion Design and refactoring / general code quality Aug 8, 2021
@sharkdp sharkdp pinned this issue Aug 8, 2021
@sharkdp
Copy link
Owner

sharkdp commented Aug 9, 2021

Another thing that would need some refactoring is the tests. It would be great to split these into multiple files. We could also think about using assert_cmd instead of our own custom testmod logic.

@jacobchrismarsh
Copy link

I like this idea. I'll take a stab at refactoring some tests via assert_cmd 😄

@sharkdp
Copy link
Owner

sharkdp commented Aug 10, 2021

I like this idea. I'll take a stab at refactoring some tests via assert_cmd smile

Awesome! Maybe really do just a few first. To see if that works out as a concept.

@Asha20
Copy link
Contributor

Asha20 commented Aug 11, 2021

I think we can tidy up spawn_senders quite a bit, but since it's a large chunk of code I figured I'd post my thoughts here first, start a discussion and get some feedback.

Perhaps we could refactor it to look something like this (code might not be 100% correct, but you get the general idea).

In a separate file, we define our traits and filter structs:

trait Filter {
    fn filter(entry: &DirEntry) -> Option<ignore::WalkState>;
}

trait MetadataFilter {
    fn filter(metadata: &Metadata) -> Option<ignore::WalkState>;
}

// structs and their trait impls also go here

Then, we refactor this part of spawn_senders like this:

let entry = ...;

let filters: Vec<Box<dyn Filter>> = vec![
    MinDepth::new(config.min_depth),
    // other filters here
];

let result = filters
    .iter()
    .map(|x| x.filter(entry))
    .find_map(|x| x);

if let result = Some(x) {
    return x;
}

let metadata = entry.metadata();

let metadata_filters: Vec<Box<dyn MetadataFilter>> = vec![
    OwnerConstraint::new(config.owner_constraint),
    SizeConstraint::new(config.size_constraints),
    // other metadata filters here
]

return metadata_filters
    .iter()
    .map(|x| x.filter(entry))
    .find_map(|x| x)
    .unwrap_or(ignore::WalkState::Continue);

Basically, the goal is to unify the filtering interface (like mentioned above), so that we can use the right iterator tool to find the first failing filter and exit early.

What do you think?

@tavianator
Copy link
Collaborator

I like that approach, and I don't think the performance issues with indirect calls mentioned above will be significant. (You could avoid Box<dyn> with something like filter.chain(filter).chain(filter)...) A couple random points:

let result = filters
    .iter()
    .find_map(|x| x.filter(entry));  // No need for .map().find_map()

if let Some(x) = result { // This was backwards
    return x;
}

@Asha20
Copy link
Contributor

Asha20 commented Aug 11, 2021

I can never remember the if let syntax haha; it just feels so backwards!

In any case, can I start working on a PR or should we wait for more opinions?

@tavianator
Copy link
Collaborator

If you want to cook up a PR that would be great! That way we can see what the improvement really looks like.

@yashwant-nagarjuna
Copy link

Another thing that would need some refactoring is the tests. It would be great to split these into multiple files. We could also think about using assert_cmd instead of our own custom testmod logic.

Is this still open? If so, I can start working on it. 😄

@tmccombs
Copy link
Collaborator

I believe it is

@1Dragoon
Copy link

1Dragoon commented Jul 9, 2022

Has anybody considered the idea of refactoring this into being essentially two different projects:

  1. A back end crate that other rust code can use
  2. A front end that implements what the users currently see

The reason I ask is because prior to finding out that this existed, I was working on my own multi-threaded copy tool, and part of its implementation was having a multi-threaded file scanning capability, along with implementation for globbing, path expansion, etc, but the only part I've completed is a very primitive version of the scan function, with nothing else. Though now that I've seen this it would make more sense to fork this and then just implement the copy functionality (along with progress tracking) on top. My current work is here:

https://github.com/1Dragoon/fcp

(It's a slow moving project, and was my motivation for recently extending indicatif's functionality to include stateful tracking)

Was going to do something similar for move, delete, etc, though with the added twist of detecting file locks and prompting the user to either close the programs that have those files open, or offering to just close them for the user, prompting for interactive privilege escalation if needed (on windows at least, where credentials can be entered out of band thus obviating security concerns.) That work can be found here if anybody is interested:

https://github.com/1Dragoon/locky

@tmccombs
Copy link
Collaborator

tmccombs commented Jul 9, 2022

This was brought up before in #203.

I don't think it's completely out of the question, but I'd like to understand better what the value of fd as a library has over using the ignore crate directly.

@sharkdp sharkdp unpinned this issue Oct 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants