Sorting behaves really weird #942

Open
sagehane opened this issue Sep 21, 2022 · 8 comments

@sagehane

Notes:

lf version:

$ lf --version
r27

Platform:

$ nixos-version
22.11.20220916.da6a058 (Raccoon)

I have LC_COLLATE set to C.

Expected/Desired Behaviour:

Ideally, lf should sort the way sort does, respecting the system locale, or at least sort in a consistent, sensible manner.

For example, I would expect lf to sort the files in a directory similarly to what sort does:

$ ls | sort
0
0-0
00
000
000-0
000-001
0000
000000
00a
0a
a
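
For reference, the listing above is plain byte order, which is what sort produces when LC_COLLATE is C. A minimal Go sketch that sorts the same names bytewise is shown here; the helper name bytewiseSorted is made up for illustration and is not part of lf.

```go
package main

import (
	"fmt"
	"sort"
)

// bytewiseSorted is a hypothetical helper (not from lf). It returns a copy
// of names sorted by plain byte comparison, which is what Go's sort.Strings does.
func bytewiseSorted(names []string) []string {
	out := append([]string(nil), names...)
	sort.Strings(out)
	return out
}

func main() {
	// The file names from this issue, in the order lf displayed them.
	names := []string{"000-001", "000000", "00a", "0a", "00", "000-0", "0000", "000", "0", "0-0", "a"}
	for _, n := range bytewiseSorted(names) {
		fmt.Println(n)
	}
}
```

Since '-' (0x2D) sorts before '0' (0x30), which sorts before 'a' (0x61), this should print the same order as the ls | sort output above.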

Actual Behaviour:

lf currently sorts this directory in the following order:

 000-001
 000000
 00a
 0a
 00
 000-0
 0000
 000
 0
 0-0
 a

I don't see why 000 should be listed before 0 but after 00.


For clarity, all the files were created with touch, so none of them are directories and nothing other than their names should affect their order.

@kmarius
Contributor

kmarius commented Sep 22, 2022

That's a quirk of natural sorting. You can get lexicographical sorting with set sortby name.

@sagehane
Author

sagehane commented Sep 23, 2022

Never heard of that, but how does that explain stuff like the order of:

  • 00a -> 000, 0a -> 00, 0 -> a
  • 000000 -> 00 -> 0000 -> 000 -> 0
  • 000-0 -> 0000 but 00 -> 0-0

It's not even sorted properly; it's just inconsistent. This doesn't seem to be the result of "multi-digit numbers are treated atomically, i.e., as if they were a single character."


Also: set sortby name does seem to work as intended, thanks.

@kmarius
Contributor

kmarius commented Sep 23, 2022

It does look like the natural sort here is not up to spec. I checked with a C implementation I had lying around and the order is

0
0-0
0a
00
00a
000
000-0
0000
00000
a

@lahwaacz

Just checked, and lf sorts these files the same way as your C implementation. I'm on a different platform, though: Arch Linux.

@kmarius
Contributor

kmarius commented Sep 24, 2022

I think the comparison function naturalLess in misc.go is broken: repeatedly doing set reverse! gives different results.
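
A sort that gives different results on repeated runs usually points to a comparison function that is not a consistent ordering. One way to check this is a brute-force test over a small set of names, sketched below. This does not reproduce lf's naturalLess; findViolation and the deliberately broken comparator are hypothetical names used only for illustration, and the check covers irreflexivity, asymmetry, and transitivity only.

```go
package main

import "fmt"

// findViolation is a hypothetical checker (not from lf). It tries every pair
// and triple of names and returns a description of the first violation of
// irreflexivity, asymmetry, or transitivity, or "" if it finds none.
func findViolation(names []string, less func(a, b string) bool) string {
	for _, a := range names {
		if less(a, a) {
			return fmt.Sprintf("irreflexivity: %q < %q", a, a)
		}
		for _, b := range names {
			if less(a, b) && less(b, a) {
				return fmt.Sprintf("asymmetry: %q < %q and %q < %q", a, b, b, a)
			}
			for _, c := range names {
				if less(a, b) && less(b, c) && !less(a, c) {
					return fmt.Sprintf("transitivity: %q < %q < %q but not %q < %q", a, b, c, a, c)
				}
			}
		}
	}
	return ""
}

func main() {
	names := []string{"0", "00", "000", "0-0", "0a", "a"}

	// Plain byte comparison is a consistent ordering, so no violation is expected.
	bytewise := func(a, b string) bool { return a < b }
	fmt.Printf("bytewise: %q\n", findViolation(names, bytewise))

	// A deliberately broken comparator, made up for this example.
	broken := func(a, b string) bool { return len(a) != len(b) }
	fmt.Printf("broken:   %q\n", findViolation(names, broken))
}
```

Passing the real comparison function and the file names from this issue as arguments would show whether, and where, the ordering breaks down.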

@sagehane
Author

sagehane commented Sep 24, 2022

Sorry for going off-topic, but I found it interesting that the natural sort in the C implementation above seems to treat 0s as having value, in that 0 != 00. Otherwise 0a < 00a could not hold, and no x could satisfy 0a < x < 00a.

I guess it's kind of logical that 0 < 00, in the same sense that 1 < 11 holds. But does this mean that 10 < 011?
Would your implementation sort something like 010, 011, 10, 11 as:

10
11
010
011

Going by the description on Wikipedia (I have yet to find an actual spec for natural sort), I would expect it to sort like:

10
010
11
011
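
The expectation above can be sketched in Go as follows. This is only one possible reading of "natural sort", not lf's naturalLess and not the C implementation discussed in this thread. The assumptions are: digit runs are compared by numeric value, and when two names are otherwise equal, the one with fewer leading zeros comes first.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

func isDigit(c byte) bool { return '0' <= c && c <= '9' }

// digitRunEnd returns the index just past the run of digits starting at s[i].
func digitRunEnd(s string, i int) int {
	for i < len(s) && isDigit(s[i]) {
		i++
	}
	return i
}

// naturalLess is an illustrative sketch (assumed semantics, see the text above).
func naturalLess(a, b string) bool {
	i, j := 0, 0
	tie := 0 // first difference in digit-run length; negative means a had fewer leading zeros
	for i < len(a) && j < len(b) {
		if isDigit(a[i]) && isDigit(b[j]) {
			ei, ej := digitRunEnd(a, i), digitRunEnd(b, j)
			ra, rb := a[i:ei], b[j:ej]
			na, nb := strings.TrimLeft(ra, "0"), strings.TrimLeft(rb, "0")
			// Without leading zeros, a longer run is a larger number.
			if len(na) != len(nb) {
				return len(na) < len(nb)
			}
			if na != nb {
				return na < nb
			}
			if tie == 0 {
				tie = len(ra) - len(rb)
			}
			i, j = ei, ej
			continue
		}
		if a[i] != b[j] {
			return a[i] < b[j]
		}
		i++
		j++
	}
	if i < len(a) || j < len(b) {
		return j < len(b) // the name that ran out first sorts first
	}
	return tie < 0
}

func main() {
	names := []string{"011", "11", "010", "10"}
	sort.Slice(names, func(x, y int) bool { return naturalLess(names[x], names[y]) })
	fmt.Println(names) // [10 010 11 011]
}
```

Recording the leading-zero difference only as a final tie-break keeps names such as 10 and 010 adjacent while still giving every pair of distinct names a definite order.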

@kmarius
Contributor

kmarius commented Sep 24, 2022

@sagehane They are actually sorted as

010
011
10
11

which seems odd. I checked the implementation (taken from https://github.com/sourcefrog/natsort/blob/master/strnatcmp.c) and while it mentions it is skipping leading zeroes, it doesn't actually do it.

Some things a natural sort could handle are listed here: https://rosettacode.org/wiki/Natural_sorting

@sagehane
Author

Welp, I find it inconsistent that the C implementation would presumably sort something like:

01
001
0001
1

I guess the takeaway is that natural sorting is hard to implement.

3 participants