
Beat xxd #66

Closed
mike239x opened this issue Jul 9, 2019 · 7 comments · Fixed by #73

mike239x commented Jul 9, 2019

I did a bit of benchmarking and I can't help but notice that xxd is faster than hexyl.
On my machine, on a file of about 700 MB:

$ time xxd myfile > /dev/null

real	0m43.245s
user	0m42.950s
sys	0m0.272s

$ time hexyl --color=never --no-squeezing --border=none myfile > /dev/null

real	1m10.967s
user	1m1.371s
sys	0m9.592s

It would be nice to beat xxd in speed... I got no idea how to do it though.


sharkdp commented Jul 9, 2019

Thank you for the feedback.

I agree, it would be nice, but nothing more than that. I don't really see a problem with the current speed, as I don't think that performance (at the current level) is critical for a hexdump tool. hexyl processes around 10 MiB of data per second, and it outputs text much faster than terminal emulators can handle (in terminator, hexyl is a factor of 5 slower when I write to the TTY).

In which real-world use case would we really need it to be faster?
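Discussions of dump-tool throughput often come down to output buffering: a hexdump emits millions of short lines, and issuing a separate write for each one is costly. As a general illustration only (this is not hexyl's actual code, and all names below are invented), here is a minimal Python sketch contrasting per-line writes with a single buffered write:

```python
import io

def dump_per_line(data, out):
    # One small write per 16-byte row: when `out` is unbuffered,
    # each write is a separate (slow) I/O operation.
    for i in range(0, len(data), 16):
        out.write(f"{i:08x}: {data[i:i + 16].hex(' ')}\n")

def dump_buffered(data, out):
    # Accumulate all rows and write once; typically the first fix
    # for a slow dumper.
    lines = [f"{i:08x}: {data[i:i + 16].hex(' ')}"
             for i in range(0, len(data), 16)]
    out.write("\n".join(lines) + "\n")

data = bytes(range(256)) * 64  # 16 KiB of sample data
buf1, buf2 = io.StringIO(), io.StringIO()
dump_per_line(data, buf1)
dump_buffered(data, buf2)
assert buf1.getvalue() == buf2.getvalue()  # same output, fewer writes
```

In Rust, the analogous fix is wrapping stdout in a `std::io::BufWriter` so that rows are flushed in large chunks.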

mike239x (Author) commented

I tried to find a "real-world use case" but failed. I would say it is an ideological thing... something along the lines of "software shouldn't get slower with time, but faster".

I'll take a look at the source code in my free time; maybe (though unlikely) I'll find a way to improve it :)


sharkdp commented Jul 10, 2019

> "software shouldn't get slower with time, but faster"

I would agree. But hexyl is about adding additional functionality (the colorized output). It's not trying to be a 1:1 replacement for xxd.


remexre commented Jul 13, 2019

Real-world use case: I've got a pretty big file that's mostly zeroes, with a kilobyte or so of nonzero data. Reading from /tmp, hexyl takes 55.791 s, hexdump -C takes 1.091 s, and xxd > /dev/null takes 40.528 s.


sharkdp commented Jul 13, 2019

@remexre Thank you.

If someone wants to work on this, here is a reproducible benchmark (I'm using hyperfine):

#!/bin/bash

dd if=/dev/zero    bs=10M count=1 >  data
dd if=/dev/urandom bs=1k  count=1 >> data

hyperfine --warmup 3 \
    'hexyl data' \
    'hexyl --no-squeezing data' \
    'hexdump -C data' \
    'hexdump -v -C data' \
    'xxd data' \
    --export-markdown results.md

(Note that hexdump's equivalent of `--no-squeezing` is the `-v` flag.)

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `hexyl data` | 1.037 ± 0.023 | 1.014 | 1.078 | 63.1 |
| `hexyl --no-squeezing data` | 1.289 ± 0.022 | 1.261 | 1.319 | 78.4 |
| `hexdump -C data` | 0.016 ± 0.001 | 0.016 | 0.018 | 1.0 |
| `hexdump -v -C data` | 1.921 ± 0.014 | 1.902 | 1.943 | 116.8 |
| `xxd data` | 0.707 ± 0.008 | 0.701 | 0.729 | 43.0 |

Apparently, hexdump's "squeezing" mode is really effective.
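For reference, "squeezing" means collapsing runs of identical output rows into a single `*` line, which lets the tool skip almost all formatting work on a mostly-zero file. A minimal Python sketch of the idea (my own illustration, not hexdump's or hexyl's actual implementation):

```python
def squeeze_dump(data, width=16):
    """Yield hexdump-style lines, collapsing repeated rows into a single '*'."""
    prev = None
    squeezing = False
    for i in range(0, len(data), width):
        row = data[i:i + width]
        if row == prev:
            if not squeezing:
                yield "*"
                squeezing = True
            # Skip formatting entirely for repeated rows: this is why
            # squeezing is so fast on zero-filled files.
            continue
        squeezing = False
        prev = row
        yield f"{i:08x}  {row.hex(' ')}"

# 4 zero rows followed by one 0xff row: the 3 repeated zero rows
# collapse into a single "*".
lines = list(squeeze_dump(b"\x00" * 64 + b"\xff" * 16))
```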

sharkdp mentioned this issue Oct 8, 2019

sharkdp commented Oct 8, 2019

see #73

fmillion commented

> Real-world use case -- I've got a pretty big file that's mostly zeroes, with a k or so of nonzero data. Reading from /tmp, hexyl takes 55.791s, hexdump -C takes 1.091s, xxd >/dev/null takes 40.528s.

Old issue, but here's another use case for posterity. I want to compare two disk images, and I want to see not only where the data differs but also what the differing data is, in a hexdump format. To do that, I like using tools like this to produce a plaintext version of the data that can then be diffed. Storing the huge files isn't an issue (either the diff can be piped directly, or the huge dumps can be stored on a compressed filesystem).
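The diff-two-dumps workflow above can be sketched with Python's difflib; `hex_lines` here is a toy stand-in for `xxd`/`hexyl` output, and the image contents are invented for the example:

```python
import difflib

def hex_lines(data, width=16):
    # Toy stand-in for xxd/hexyl output: one offset-prefixed line per row.
    return [f"{i:08x}: {data[i:i + width].hex(' ')}"
            for i in range(0, len(data), width)]

image_a = b"\x00" * 48
image_b = b"\x00" * 16 + b"\xde\xad" + b"\x00" * 30  # differs in the second row

# A unified diff of the dumps pinpoints both where the images differ
# and what the differing bytes are.
diff = list(difflib.unified_diff(hex_lines(image_a), hex_lines(image_b),
                                 fromfile="a.img", tofile="b.img", lineterm=""))
changed = [line for line in diff if line.startswith(("-0", "+0"))]
```

In a shell, the same idea is `diff <(xxd a.img) <(xxd b.img)`, which avoids storing the dumps at all.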

4 participants