fix issue 1145 #1146

tommady · 2022-12-14T09:27:12Z

fix issue 1145

according to the GNU doc

The default makefile names `GNUmakefile', `makefile' and `Makefile'

results

DriesMeerman

Might be nice to put the starts_with calls into an array of files with certain prefixes
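
A minimal sketch of that suggestion, assuming the prefixes are the two used by the `starts_with` checks in this PR. The names `IMMEDIATE_PREFIXES` and `has_immediate_prefix` are hypothetical and not part of exa:

```rust
/// Hypothetical prefix table; the entries mirror the `starts_with` checks in this PR.
const IMMEDIATE_PREFIXES: &[&str] = &["readme", "makefile"];

/// Returns true if the lowercased file name starts with any listed prefix.
fn has_immediate_prefix(name: &str) -> bool {
    let lower = name.to_lowercase();
    IMMEDIATE_PREFIXES
        .iter()
        .any(|prefix| lower.starts_with(*prefix))
}
```

Adding a new prefix then means adding one array entry instead of another `||` branch.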

src/info/filetype.rs

marbx · 2023-02-22T17:59:00Z

Could you please create a file makefile123 and run this PR, as shown in your screenshots? Because makefile123 starts with makefile, the PR treats it as a makefile.

…

On Mon, Feb 20, 2023, 17:40 tommady wrote:

In src/info/filetype.rs:

> @@ -22,9 +22,10 @@ impl FileExtensions {
>      #[allow(clippy::case_sensitive_file_extension_comparisons)]
>      fn is_immediate(&self, file: &File<'_>) -> bool {
>          file.name.to_lowercase().starts_with("readme") ||
> +        file.name.to_lowercase().starts_with("makefile") ||

I just followed the GNU doc default description as I mentioned in the PR. Other requests I think is beyond this PRs scope.
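
The false positive described above can be reproduced with a small sketch. Both function names are hypothetical; the prefix variant mirrors the line added in the quoted diff, and the exact variant uses the three default names from the GNU doc:

```rust
/// Prefix check, mirroring the line added in the quoted diff.
fn is_makefile_by_prefix(name: &str) -> bool {
    name.to_lowercase().starts_with("makefile")
}

/// Exact check against the three default names from the GNU doc.
fn is_makefile_exact(name: &str) -> bool {
    ["GNUmakefile", "makefile", "Makefile"].contains(&name)
}
```

With these definitions, `is_makefile_by_prefix("makefile123")` is true while `is_makefile_exact("makefile123")` is false. The prefix check also misses `GNUmakefile`, because its lowercased form starts with `gnu`.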

tommady · 2023-02-23T04:17:26Z

oh! I see what you meant ahah
sure will fix that, thank you!

tommady · 2023-02-23T07:32:46Z

new result

marbx · 2023-02-23T20:31:13Z

Your commit 92f8eba fixes what I meant.
Thank you for your swift reply :-)

I have a question regarding performance.

The function name_is_one_of() is called with 35 file names and is implemented with

exa/src/fs/file.rs

Line 502 in f3ca1fe

choices.contains(&&self.name[..])

choices is the only parameter of the function and holds the file names:

exa/src/fs/file.rs

Line 501 in f3ca1fe

pub fn name_is_one_of(&self, choices: &[&str]) -> bool {

I am a Rust beginner, so I looked it up in the standard library documentation: &[T] is a "shared slice".

The method contains() of the slice primitive has a comment that says

This operation is O(n).
Note that if you have a sorted slice, binary_search may be faster.

Is this the method used?

"O(n)" means linear search.
I have the impression that 35 file names could be large enough to justify sorting and using a binary search.

And for the sake of performance, the sorting could be done once, by keeping the array sorted in the source.

What do you think?
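
For reference, a minimal sketch of the two lookups side by side. The list is a hypothetical, shortened stand-in for the real one, already sorted in byte order (uppercase letters sort before lowercase):

```rust
/// Hypothetical, shortened list; it must stay sorted in byte order for `binary_search`.
const SORTED_NAMES: &[&str] = &[
    "Cargo.toml",
    "GNUmakefile",
    "Makefile",
    "build.gradle",
    "makefile",
];

/// Linear scan, O(n), works on any order.
fn found_linear(name: &str) -> bool {
    SORTED_NAMES.contains(&name)
}

/// Binary search, O(log n), only correct on a sorted slice.
fn found_binary(name: &str) -> bool {
    SORTED_NAMES.binary_search(&name).is_ok()
}
```

Both functions return the same answer as long as the slice is sorted; on an unsorted slice `binary_search` may report a present name as missing.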

tommady · 2023-02-24T02:17:40Z

ya, I agree with your point.
I will do that!

BTW, I am a Rust beginner too~
thanks for your suggestion!

marbx · 2023-02-25T17:20:50Z

@tommady, I appreciate the collaboration!
The sorted, vertical list from ba9c762 is much more readable than the unsorted list across 6 lines, I think.

I also think it goes without saying that any new file name needs to be added at its sorted position.
I leave it to the exa maintainers whether a comment like this would make sense:

// binary_search() requires sorted file names
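
Besides a comment, the ordering could be checked mechanically, for example from a unit test. A minimal sketch, assuming a hypothetical excerpt of the list; `is_strictly_sorted` is not an existing exa function:

```rust
/// Hypothetical excerpt of the sorted list from this PR.
const SORTED_NAMES: &[&str] = &["GNUmakefile", "Makefile", "makefile"];

/// True if every entry is strictly less than its successor in byte order,
/// which is the order `binary_search` on `&str` relies on.
fn is_strictly_sorted(names: &[&str]) -> bool {
    names.windows(2).all(|pair| pair[0] < pair[1])
}
```

A test asserting `is_strictly_sorted(SORTED_NAMES)` would fail as soon as someone appends a name at the wrong position.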

marbx · 2023-02-25T17:39:16Z

On second thought:
the function name_is_one_of() is currently used only here, and this PR would make the function unused.
Unused code is a bad thing.

As there could be other users, I would like to suggest to move the binary search into the function, together with a break-even calculation.

Along the lines:
if length of choices is below 20: contains(), else: binary_search()

So the potential other users would profit, too.

Final argument: the name_is_one_of() function is currently "too simple". Its cyclomatic complexity is probably 1.
In other words: it could use some "flesh" :-)
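
The break-even idea could be sketched like this. It is written as a free function for illustration (the real name_is_one_of() is a method on the file type), and the threshold of 20 is the placeholder from the comment above, not a measured value:

```rust
/// Break-even length; 20 is a placeholder, not a measured result.
const BINARY_SEARCH_THRESHOLD: usize = 20;

/// Looks up `name` in `choices`.
/// Lists at or above the threshold must be sorted in byte order.
fn name_is_one_of(name: &str, choices: &[&str]) -> bool {
    if choices.len() < BINARY_SEARCH_THRESHOLD {
        // Short list: a linear scan needs no ordering.
        choices.contains(&name)
    } else {
        // Long list: binary search, valid only on sorted input.
        choices.binary_search(&name).is_ok()
    }
}
```

The drawback of this design is a hidden precondition: callers with long lists must keep them sorted, while callers with short lists need not.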

marbx · 2023-02-25T19:00:46Z

Also, we must not use binary_search() on only 35 file names!
It is about 19% slower than contains() (25 ns vs. 21 ns per iteration).

I benchmarked contains() and binary_search() on 35 and 85 words:
https://github.com/marbx/LearnRust/blob/main/BenchmarkContainsBinarySearch/src/lib.rs

test tests::bench_binary_search_with_35_words ... bench:          25 ns/iter (+/- 0)
test tests::bench_binary_search_with_85_words ... bench:          33 ns/iter (+/- 1)
test tests::bench_contains_with_35_word       ... bench:          21 ns/iter (+/- 0)
test tests::bench_contains_with_85_words      ... bench:          58 ns/iter (+/- 5)
test tests::bench_xor                         ... bench:          93 ns/iter (+/- 2)

Break-even must be between 35 and 85; I will try to find it in the next few days.

marbx · 2023-02-25T19:28:07Z

Renaming the functions changes the benchmarked times!

test tests::bench_35_words_binary_search ... bench:          25 ns/iter (+/- 0)
test tests::bench_35_words_contains      ... bench:          25 ns/iter (+/- 0)
test tests::bench_85_words_binary_search ... bench:          33 ns/iter (+/- 1)
test tests::bench_85_words_contains      ... bench:          63 ns/iter (+/- 1)
test tests::bench_xor                    ... bench:          92 ns/iter (+/- 3)

tommady · 2023-02-25T19:52:25Z

wow
so based on your comments, this PR's filenames are only 34 which means we should use the contains method right?

and thank you for your survey which made me learn so much.

marbx · 2023-02-25T20:54:51Z

I thought so 1 hour ago, but I realized that in the "real world" there are many more files which are not in the list. The numbers above come from a benchmark "1 file absent, 1 file present".
This is unrealistic.

So I changed the benchmark to: "4 files absent, 1 file present" and got:

test tests::bench_35_words_binary_search ... bench:          72 ns/iter (+/- 1)
test tests::bench_35_words_contains      ... bench:          78 ns/iter (+/- 8)
test tests::bench_85_words_binary_search ... bench:          99 ns/iter (+/- 2)
test tests::bench_85_words_contains      ... bench:         138 ns/iter (+/- 1)

This I understand, because contains() must go through the whole list before it can conclude that the file name is absent, so this benchmark hurts it a lot more than binary_search().

As you see, this benchmark is in favor of binary_search() already for 35 file names.
I would say that in the "real world", there are more than four times as many files outside the list as inside it.

In short: binary_search() is safer.

Could you please run and play with the benchmark? I don't know how trustworthy the measurements are.

I just realized: for a directory (or tree listing) larger than 35 entries, one would have to reverse the search and (binary) search the 35 file names in the directory/listing.

If you have a directory of 100 files, exa currently searches 100 times over a list of 35 entries.
While it is enough to search 35 times in a list of 100 entries.
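
A minimal sketch of the reversed search, assuming the directory listing is already sorted in byte order (sorting it first would add its own O(n log n) cost). The function name and parameters are hypothetical:

```rust
/// For each known name, binary-search the directory listing.
/// `sorted_listing` must already be sorted in byte order (an assumption of this sketch).
fn immediate_files_present<'a>(known: &[&'a str], sorted_listing: &[&str]) -> Vec<&'a str> {
    known
        .iter()
        .copied()
        .filter(|name| sorted_listing.binary_search(name).is_ok())
        .collect()
}
```

This performs one binary search per known name rather than one lookup per directory entry.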

tommady · 2023-02-25T22:02:12Z

or just a stupid idea: we can put the immediate files into a static hashmap, which would make every lookup O(1).

WDYT?

marbx · 2023-02-26T10:53:50Z

Not a stupid idea at all - I had the same one.

A hashmap is the fastest of the three methods, even if created within each benchmark function. So hashmap is the way forward.

test tests::bench_35_words_binary_search         ... bench:          72 ns/iter (+/- 5)
test tests::bench_35_words_contains              ... bench:          69 ns/iter (+/- 27)
test tests::bench_35_words_hashmap_local_mutable ... bench:          57 ns/iter (+/- 1)
test tests::bench_85_words_binary_search         ... bench:          99 ns/iter (+/- 2)
test tests::bench_85_words_contains              ... bench:         138 ns/iter (+/- 2)
test tests::bench_85_words_hashmap_local_mutable ... bench:          57 ns/iter (+/- 4)
test tests::bench_xor                            ... bench:          93 ns/iter (+/- 3)

The measurement supports the O(1) expectation.

It is easy to set up a local, mutable hashmap.
I have not yet figured out how to set up a static, global, immutable hashmap.

tommady · 2023-02-26T12:19:45Z

I saw that exa already uses the "lazy_static" dependency,
so it should not be hard to achieve the goal. Let me do a new commit!
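
For illustration, a std-only sketch of a static, immutable set. It swaps in `std::sync::OnceLock` for `lazy_static` so the example needs no external crate; the function names are hypothetical, and the three entries are just the makefile names from this PR:

```rust
use std::collections::HashSet;
use std::sync::OnceLock;

/// Lazily built, process-wide, immutable set of names.
/// `OnceLock` is used here as a stand-in for the `lazy_static!` macro.
fn immediate_names() -> &'static HashSet<&'static str> {
    static NAMES: OnceLock<HashSet<&'static str>> = OnceLock::new();
    // The closure runs only on the first call; later calls reuse the same set.
    NAMES.get_or_init(|| HashSet::from(["GNUmakefile", "Makefile", "makefile"]))
}

/// Expected O(1) membership test against the static set.
fn is_immediate(name: &str) -> bool {
    immediate_names().contains(name)
}
```

The set is built once on first use, so the construction cost is not paid per lookup.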

marbx · 2023-02-26T12:25:58Z

Rust-PHF is a library to generate efficient lookup tables at compile time using perfect hash functions.

Unfortunately, PHF is slower than the dynamically created hashmap!

test tests::bench_35_words_binary_search         ... bench:          74 ns/iter (+/- 2)
test tests::bench_35_words_contains              ... bench:          76 ns/iter (+/- 1)
test tests::bench_35_words_hashmap_local_mutable ... bench:          59 ns/iter (+/- 1)
test tests::bench_35_words_hashmap_static_phf    ... bench:          71 ns/iter (+/- 3)
test tests::bench_35_words_static_phf_set        ... bench:          70 ns/iter (+/- 1)
test tests::bench_85_words_binary_search         ... bench:          98 ns/iter (+/- 2)
test tests::bench_85_words_contains              ... bench:         228 ns/iter (+/- 24)
test tests::bench_85_words_hashmap_local_mutable ... bench:          59 ns/iter (+/- 1)
test tests::bench_85_words_hashmap_static_phf    ... bench:          71 ns/iter (+/- 1)
test tests::bench_85_words_static_phf_set        ... bench:          70 ns/iter (+/- 2)
test tests::bench_xor                            ... bench:          91 ns/iter (+/- 0)

What am I doing wrong?
https://github.com/marbx/LearnRust/blob/b41ff2cd5c4c73118c2fea9e2e8ba39f5e1a3425/BenchmarkContainsBinarySearch/src/lib.rs#L510

tommady · 2023-02-26T12:38:20Z

or a more simple one?
try the simple match method.
https://www.reddit.com/r/rust/comments/5mnj3y/which_has_better_performance_a_hashmap_or_a/

marbx · 2023-02-26T13:41:05Z

The first time, the lazy_static HashMap is as fast as the dynamically created one.
This makes the lazy_static HashMap the (current) winner.

However, the second time there is no speed-up.
This may have to do with the benchmark.

I'll try match

test tests::bench_35_words_binary_search         ... bench:          73 ns/iter (+/- 3)
test tests::bench_35_words_contains              ... bench:          94 ns/iter (+/- 10)
test tests::bench_35_words_hashmap_local_mutable ... bench:          60 ns/iter (+/- 3)
test tests::bench_35_words_hashmap_static_phf    ... bench:          70 ns/iter (+/- 2)
test tests::bench_35_words_lazy_hashset_1        ... bench:          60 ns/iter (+/- 4)
test tests::bench_35_words_lazy_hashset_2        ... bench:          60 ns/iter (+/- 1)
test tests::bench_35_words_static_phf_set        ... bench:          70 ns/iter (+/- 3)
test tests::bench_85_words_binary_search         ... bench:          99 ns/iter (+/- 3)
test tests::bench_85_words_contains              ... bench:         138 ns/iter (+/- 4)
test tests::bench_85_words_hashmap_local_mutable ... bench:          58 ns/iter (+/- 3)
test tests::bench_85_words_hashmap_static_phf    ... bench:          70 ns/iter (+/- 1)
test tests::bench_85_words_lazy_hashset_1        ... bench:          59 ns/iter (+/- 2)
test tests::bench_85_words_lazy_hashset_2        ... bench:          59 ns/iter (+/- 3)
test tests::bench_85_words_static_phf_set        ... bench:          70 ns/iter (+/- 2)
test tests::bench_xor                            ... bench:          91 ns/iter (+/- 0)

marbx · 2023-02-26T14:00:26Z

match takes 0 ns for 35 words and 121 ns for 85 words, about twice as long as the hash lookups.
This makes no sense.

test tests::bench_35_words_binary_search         ... bench:          72 ns/iter (+/- 3)
test tests::bench_35_words_contains              ... bench:          78 ns/iter (+/- 3)
test tests::bench_35_words_hashmap_local_mutable ... bench:          59 ns/iter (+/- 1)
test tests::bench_35_words_hashmap_static_phf    ... bench:          70 ns/iter (+/- 2)
test tests::bench_35_words_lazy_hashset_1        ... bench:          58 ns/iter (+/- 1)
test tests::bench_35_words_lazy_hashset_2        ... bench:          59 ns/iter (+/- 4)
test tests::bench_35_words_match                 ... bench:           0 ns/iter (+/- 0)
test tests::bench_35_words_static_phf_set        ... bench:          70 ns/iter (+/- 2)
test tests::bench_85_words_binary_search         ... bench:          99 ns/iter (+/- 4)
test tests::bench_85_words_contains              ... bench:         229 ns/iter (+/- 13)
test tests::bench_85_words_hashmap_local_mutable ... bench:          58 ns/iter (+/- 4)
test tests::bench_85_words_hashmap_static_phf    ... bench:          70 ns/iter (+/- 2)
test tests::bench_85_words_lazy_hashset_1        ... bench:          58 ns/iter (+/- 1)
test tests::bench_85_words_lazy_hashset_2        ... bench:          58 ns/iter (+/- 2)
test tests::bench_85_words_match                 ... bench:         121 ns/iter (+/- 6)
test tests::bench_85_words_static_phf_set        ... bench:          69 ns/iter (+/- 2)
test tests::bench_xor                            ... bench:          92 ns/iter (+/- 5)
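
A reading of 0 ns/iter often suggests that the optimizer removed the measured work, for example because the inputs were compile-time constants. Whether that happened here is an assumption, since the benchmark code is only linked, not shown, in this thread. A minimal sketch of a timing loop that guards against it with `std::hint::black_box`; `time_lookups` is a hypothetical helper, not part of the linked benchmark:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `lookup` on `name` `iterations` times and returns the elapsed time
/// together with the number of hits. `black_box` hides the input and the
/// result from the optimizer so the call is less likely to be folded away.
fn time_lookups<F: Fn(&str) -> bool>(lookup: F, name: &str, iterations: u32) -> (Duration, u32) {
    let start = Instant::now();
    let mut hits = 0;
    for _ in 0..iterations {
        if black_box(lookup(black_box(name))) {
            hits += 1;
        }
    }
    (start.elapsed(), hits)
}
```

Returning the hit count gives the caller a value to check, which also keeps the loop body observable.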

marbx

matches! macro increases readability.
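
A minimal sketch of what such a check can look like, using the three default makefile names from the PR description. The function name is hypothetical:

```rust
/// Exact-name check with the `matches!` macro; no list, sorting, or hashing needed.
fn is_default_makefile(name: &str) -> bool {
    matches!(name, "GNUmakefile" | "makefile" | "Makefile")
}
```

Because the comparison is exact, a name such as makefile123 is not treated as a makefile.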

tommady · 2023-02-26T16:05:03Z

Ya, I think we should leave all those measurements for the maintainer to judge.

The clear winner for this PR (where the number of file names is below 35) is the matches! macro, which is without doubt the fastest.

ariasuni · 2023-02-28T09:47:53Z

Does it make a difference in terms of final executable size?

tommady · 2023-03-04T15:29:33Z

Does it make a difference in terms of final executable size?

from this PR's modification, it added three strings

`GNUmakefile'
`makefile'
`Makefile'

is this the reason?

1stDimension · 2023-04-27T12:03:40Z

The issues with CI are not present in current master c697d06. I believe if you rebase your branch onto it the checks will pass. I tried to create a similar environment in my fork: 1stDimension#3.

ariasuni · 2023-09-08T17:23:37Z

Closing this, since exa is unmaintained (see #1243), and this has been merged in the active fork eza. Thanks!

fix issue 1145

5c7620e

DriesMeerman approved these changes Dec 15, 2022

marbx reviewed Feb 19, 2023

address comment

92f8eba

address comment with using the binary search

ba9c762

use matches to speed up comparsion performance

9101535

marbx reviewed Feb 26, 2023

sbatial mentioned this pull request Jul 29, 2023

All the open pulls from exa syphar/zetta#11

Open

63 tasks

sbatial mentioned this pull request Jul 29, 2023

All the exa issues eza-community/eza#65

Closed

63 tasks

cafkafk mentioned this pull request Jul 29, 2023

refactor: filetype.rs eza-community/eza#28

Merged

5 tasks

ariasuni force-pushed the master branch from 36c003b to fb05c42 · September 5, 2023 23:14

ariasuni closed this Sep 8, 2023
