
Work incorrectly #26

Closed
Alsan opened this issue Aug 31, 2022 · 4 comments

@Alsan

Alsan commented Aug 31, 2022

The command fc-list : family | huniq does not produce the same result as fc-list : family | sort -u. I have confirmed that the piped lines are terminated by 0x0a, just like in the example echo -e "foo\nbar\nfoo\nbaz" | huniq.

@koraa
Owner

koraa commented Aug 31, 2022

The command echo -e "foo\nbar\nfoo\nbaz" produces a trailing newline character. This is preserved by huniq and huniq -t.

$ echo -e "foo\nbar\nfoo\nbaz" | hexdump -C
00000000  66 6f 6f 0a 62 61 72 0a  66 6f 6f 0a 62 61 7a 0a  |foo.bar.foo.baz.|
$ echo -e "foo\nbar\nfoo\nbaz" | huniq | hexdump -C
00000000  66 6f 6f 0a 62 61 72 0a  62 61 7a 0a              |foo.bar.baz.|
$ echo -e "foo\nbar\nfoo\nbaz" | huniq -t | hexdump -C
00000000  66 6f 6f 0a 62 61 72 0a  62 61 7a 0a              |foo.bar.baz.|

The command echo -n -e "foo\nbar\nfoo\nbaz" (-n option) produces no trailing newline. A trailing newline is added by huniq; this can be avoided by using huniq -t.

$ echo -n -e "foo\nbar\nfoo\nbaz" | hexdump -C
00000000  66 6f 6f 0a 62 61 72 0a  66 6f 6f 0a 62 61 7a     |foo.bar.foo.baz|
$ echo -n -e "foo\nbar\nfoo\nbaz" | huniq | hexdump -C
00000000  66 6f 6f 0a 62 61 72 0a  62 61 7a 0a              |foo.bar.baz.|
$ echo -n -e "foo\nbar\nfoo\nbaz" | huniq -t | hexdump -C
00000000  66 6f 6f 0a 62 61 72 0a  62 61 7a                 |foo.bar.baz|
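The four hexdumps above can be summarised with a small Python model. This is a hypothetical sketch, not huniq's actual implementation. The function dedup_bytes and its mirror_final_newline flag are invented names, and mirror_final_newline=True stands in for the -t behaviour observed above. The model splits the input on 0x0a, keeps the first occurrence of each line, and then decides whether the output should end with a newline.

```python
def dedup_bytes(data: bytes, mirror_final_newline: bool = False) -> bytes:
    """Hypothetical model of the hexdumps above (not huniq's real code).

    mirror_final_newline=False: every output line ends with a newline,
    like plain huniq. mirror_final_newline=True: the output ends with a
    newline only if the input did, like huniq -t as observed above.
    """
    if not data:
        return b""
    ends_with_newline = data.endswith(b"\n")
    lines = data.split(b"\n")
    if ends_with_newline:
        lines.pop()  # drop the empty piece after the final newline

    seen = set()  # lines already written; order of first occurrence is kept
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)

    out = b"\n".join(unique)
    if ends_with_newline or not mirror_final_newline:
        out += b"\n"
    return out


with_nl = b"foo\nbar\nfoo\nbaz\n"   # echo -e "foo\nbar\nfoo\nbaz"
without_nl = b"foo\nbar\nfoo\nbaz"  # echo -n -e "foo\nbar\nfoo\nbaz"
print(dedup_bytes(with_nl))             # b'foo\nbar\nbaz\n'
print(dedup_bytes(with_nl, True))       # b'foo\nbar\nbaz\n'
print(dedup_bytes(without_nl))          # b'foo\nbar\nbaz\n'
print(dedup_bytes(without_nl, True))    # b'foo\nbar\nbaz'
```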

Sounds to me like you need to use -t. Does this solve your problem?

@Alsan
Author

Alsan commented Sep 1, 2022

It seems not. Here are my tests:

Test 1: get the font list, sort it, and take the first 3 entries

fc-list : family | sort -u | head -n 3

The output:

1942 report
3270Medium Nerd Font
3270Medium Nerd Font Mono

Test 2: get the font list, sort it using huniq -t, and take the first 3 entries

fc-list : family | huniq -t | head -n 3

The output:

Noto Sans Canadian Aboriginal,Noto Sans CanAborig Th
Sarasa UI J,更紗ゴシック UI J
Noto Sans Gurmukhi
thread 'main' panicked at 'failed printing to stdout: Broken pipe (os error 32)', library/std/src/io/stdio.rs:1015:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Note that the broken pipe panic seems to be a separate problem.

Hex dumps of both commands:

Hex dump using sort -u:

fc-list : family | sort -u | head -n 3 | hexdump -C
00000000  31 39 34 32 20 72 65 70  6f 72 74 0a 33 32 37 30  |1942 report.3270|
00000010  4d 65 64 69 75 6d 20 4e  65 72 64 20 46 6f 6e 74  |Medium Nerd Font|
00000020  0a 33 32 37 30 4d 65 64  69 75 6d 20 4e 65 72 64  |.3270Medium Nerd|
00000030  20 46 6f 6e 74 20 4d 6f  6e 6f 0a                 | Font Mono.|
0000003b

Hex dump using huniq -t:

fc-list : family | huniq -t | head -n 3 | hexdump -C
thread 'main' panicked at 'failed printing to stdout: Broken pipe (os error 32)', library/std/src/io/stdio.rs:1015:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
00000000  4e 6f 74 6f 20 53 61 6e  73 20 43 61 6e 61 64 69  |Noto Sans Canadi|
00000010  61 6e 20 41 62 6f 72 69  67 69 6e 61 6c 2c 4e 6f  |an Aboriginal,No|
00000020  74 6f 20 53 61 6e 73 20  43 61 6e 41 62 6f 72 69  |to Sans CanAbori|
00000030  67 20 54 68 0a 53 61 72  61 73 61 20 55 49 20 4a  |g Th.Sarasa UI J|
00000040  2c e6 9b b4 e7 b4 97 e3  82 b4 e3 82 b7 e3 83 83  |,...............|
00000050  e3 82 af 20 55 49 20 4a  0a 4e 6f 74 6f 20 53 61  |... UI J.Noto Sa|
00000060  6e 73 20 47 75 72 6d 75  6b 68 69 0a              |ns Gurmukhi.|
0000006c

As you can see, both commands produce line-feed-separated lines, but huniq clearly did not sort the list.

I wondered whether it uses the space as the separator, so I tested echo -n -e "foo bar bar\nbar\nfoo\nbaz" | huniq. That produced the expected result, so that is not the cause.

@koraa
Owner

koraa commented Sep 1, 2022

Huniq doesn't sort lists. Its goal is not to sort its output, merely to remove duplicate lines while leaving their order untouched.

This is useful because sometimes preserving the input order is exactly what you need. Huniq is also more efficient than sort | uniq because it uses hash tables to detect duplicates instead of sorting the whole input…

If you need sorted output, you can use the usual sort | uniq or huniq | sort. The latter may be a bit more efficient because huniq removes the duplicates first, so sort has fewer lines to process.
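The difference between the two approaches can be illustrated with a short Python sketch. The sketch makes several assumptions. huniq_like is a hypothetical stand-in that keeps the first occurrence of each line using a hash set, and it is not huniq's code. sort_uniq models sort | uniq by sorting and then collapsing runs of equal lines with itertools.groupby. uniq only removes adjacent repeats, which is why it is normally paired with sort. The fonts list is made-up sample data that reuses family names from the output above.

```python
from itertools import groupby


def huniq_like(lines):
    """Hypothetical model: drop repeated lines, keep first-occurrence order."""
    seen = set()  # hash set: expected O(1) membership test per line
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


def sort_uniq(lines):
    """Model of `sort | uniq`: sort, then collapse runs of equal lines."""
    return [key for key, _run in groupby(sorted(lines))]


# Made-up sample input with one duplicate.
fonts = ["Noto Sans Gurmukhi", "Sarasa UI J", "Noto Sans Gurmukhi", "1942 report"]

# Duplicates removed, input order kept: no sorting happens.
print(huniq_like(fonts))          # ['Noto Sans Gurmukhi', 'Sarasa UI J', '1942 report']

# Both pipelines give the same sorted, duplicate-free list...
print(sort_uniq(fonts))           # ['1942 report', 'Noto Sans Gurmukhi', 'Sarasa UI J']
print(sorted(huniq_like(fonts)))  # ['1942 report', 'Noto Sans Gurmukhi', 'Sarasa UI J']
# ...but in the `huniq | sort` version, sorting only sees the 3 unique lines.
```

Real sort compares lines according to the current locale. Its order can therefore differ from Python's code-point ordering unless LC_ALL=C is set.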

@koraa
Owner

koraa commented Sep 28, 2022

Thank you for submitting this issue; I am not sure if further action is needed. Feel free to comment if this needs to be reopened.
