Replace all uses of format!() #23

lilyball · 2019-01-10T05:55:43Z

Replace all uses of format!()

Previously we were allocating 2 strings per byte, plus another for the offset. Instead we can just write all this directly to stdout.

Also replace the calculation of the ANSI escape code for every single byte. This produces a significant speedup.

I tested this by ensuring the output of dumping a 1.3MB file (the hexyl binary itself) was byte-for-byte identical to without this PR.

Benchmark output:

Benchmark #1: ./original_hexyl ./original_hexyl
  Time (mean ± σ):      1.462 s ±  0.005 s    [User: 1.413 s, System: 0.046 s]
  Range (min … max):    1.456 s …  1.469 s
 
Benchmark #2: target/release/hexyl ./original_hexyl
  Time (mean ± σ):     289.6 ms ±   1.5 ms    [User: 243.8 ms, System: 44.1 ms]
  Range (min … max):   288.2 ms … 293.1 ms
 
Summary
  'target/release/hexyl ./original_hexyl' ran
    5.05 ± 0.03 times faster than './original_hexyl ./original_hexyl'

sharkdp · 2019-01-10T07:23:13Z

Thank you very much for your contribution!

I'm not too happy with this solution. If we go with this, I think we should generate this lookup table somehow instead of having this huge array in the source code.

Couldn't we apply the same method as you did in #24 and replace the format!/write! combination with a single write!? Maybe this is not quite as fast, but it's not like hexyl is really a performance-critical program. It would be great if it were a little faster, but its surely not a big issue (Printing a 1MB hex dump in the terminal would produce 60,000 lines of output... and most of the execution time would actually be spent in the terminal emulator).

lilyball · 2019-01-10T07:27:24Z

We could use a single write!(), it was just slightly faster to do it this way and I didn't think a single lookup table would be problematic. It could of course be moved elsewhere in the file (or even into its own file). It could also be generated in code (say, when creating the Printer) if desired. I'll benchmark the difference between this and just a write! in a minute.

lilyball · 2019-01-10T07:30:01Z

Here's the difference I see:

Benchmark #1: ./hexyl_lookup_table ./original_hexyl
  Time (mean ± σ):      1.171 s ±  0.005 s    [User: 1.122 s, System: 0.046 s]
  Range (min … max):    1.165 s …  1.181 s
 
Benchmark #2: ./hexyl_write ./original_hexyl
  Time (mean ± σ):      1.216 s ±  0.015 s    [User: 1.165 s, System: 0.047 s]
  Range (min … max):    1.197 s …  1.233 s
 
Summary
  './hexyl_lookup_table ./original_hexyl' ran
    1.04 ± 0.01 times faster than './hexyl_write ./original_hexyl'

Another benefit of using a lookup table is we could stuff the ANSI color escapes in there as well (though in this case we'd definitely want to generate it at runtime), which would avoid having to categorize each byte in order to color it and avoid having to synthesize ANSI escapes for each byte.

sharkdp · 2019-01-10T07:32:10Z

Another benefit of using a lookup table is we could stuff the ANSI color escapes in there as well (though in this case we'd definitely want to generate it at runtime), which would avoid having to categorize each byte in order to color it and avoid having to synthesize ANSI escapes for each byte.

That is true, yes. That could be a more significant speedup.

In this case, I'd like to structure the lookup-table login into a new file and struct.

lilyball · 2019-01-10T07:37:45Z

Wow, getting rid of the ANSI escape calculation per byte is a huge win:

Benchmark #1: ./hexyl_lookup_table ./original_hexyl
  Time (mean ± σ):      1.193 s ±  0.005 s    [User: 1.143 s, System: 0.046 s]
  Range (min … max):    1.188 s …  1.206 s
 
Benchmark #2: target/release/hexyl ./original_hexyl
  Time (mean ± σ):     739.4 ms ±   1.7 ms    [User: 691.2 ms, System: 45.8 ms]
  Range (min … max):   737.4 ms … 743.8 ms
 
Summary
  'target/release/hexyl ./original_hexyl' ran
    1.61 ± 0.01 times faster than './hexyl_lookup_table ./original_hexyl'

I'll push that momentarily.

sharkdp · 2019-01-10T07:41:02Z

Awesome! Thank you very much.

This looks clean enough to leave it in the main file. And we can still refactor later..

sharkdp · 2019-01-10T07:41:45Z

How much do we gain by using get_unchecked? Could we just use [..] and get rid of the unsafe block?

lilyball · 2019-01-10T07:42:42Z

If we apply the same trick to the char version we get even faster:

Benchmark #1: ./original_hexyl ./original_hexyl
  Time (mean ± σ):      1.462 s ±  0.005 s    [User: 1.413 s, System: 0.046 s]
  Range (min … max):    1.456 s …  1.469 s
 
Benchmark #2: target/release/hexyl ./original_hexyl
  Time (mean ± σ):     289.6 ms ±   1.5 ms    [User: 243.8 ms, System: 44.1 ms]
  Range (min … max):   288.2 ms … 293.1 ms
 
Summary
  'target/release/hexyl ./original_hexyl' ran
    5.05 ± 0.03 times faster than './original_hexyl ./original_hexyl'

I'll test the non-unsafe version in a minute.

lilyball · 2019-01-10T07:44:57Z

The unsafe is just noise.

Previously we were allocating 2 strings per byte, plus another for the offset. Instead we can just write all this directly to stdout. Also replace the calculation of the ANSI escape code for every single byte. This produces a significant speedup.

sharkdp · 2019-01-10T07:46:51Z

Fantastic, thank you very much for these real time updates! 😄

I'd like to get this merged so I can include it in v0.3.1, which I'd like to release now. Happy to work on even more improvements in other PRs.

lilyball · 2019-01-10T07:47:11Z

Sounds good to me. I need to go to bed anyway.

sharkdp · 2019-01-10T08:02:28Z

We're faster than hexdump -C now 🎆 😄

▶ hyperfine 'hexyl $(which hexyl)' 'hexdump -C $(which hexyl)'
Benchmark #1: hexyl $(which hexyl)
  Time (mean ± σ):     169.6 ms ±   3.4 ms    [User: 147.7 ms, System: 21.7 ms]
  Range (min … max):   165.4 ms … 179.4 ms    17 runs
 
Benchmark #2: hexdump -C $(which hexyl)
  Time (mean ± σ):     190.0 ms ±   2.5 ms    [User: 188.2 ms, System: 1.7 ms]
  Range (min … max):   186.1 ms … 193.8 ms    15 runs
 
Summary
  'hexyl $(which hexyl)' ran
    1.12 ± 0.03 times faster than 'hexdump -C $(which hexyl)'

lilyball mentioned this pull request Jan 10, 2019

Stop putting background color on spaces #25

Closed

lilyball force-pushed the remove_format branch from ce909cc to 4ae0bf5 Compare January 10, 2019 06:35

lilyball force-pushed the remove_format branch 2 times, most recently from f9d9aaa to 3c9ae4f Compare January 10, 2019 07:39

lilyball force-pushed the remove_format branch from 3c9ae4f to ceaabb2 Compare January 10, 2019 07:43

Replace all uses of format!()

3d17daa

Previously we were allocating 2 strings per byte, plus another for the offset. Instead we can just write all this directly to stdout. Also replace the calculation of the ANSI escape code for every single byte. This produces a significant speedup.

lilyball force-pushed the remove_format branch from ceaabb2 to 3d17daa Compare January 10, 2019 07:45

sharkdp merged commit afb6669 into sharkdp:master Jan 10, 2019

lilyball deleted the remove_format branch January 10, 2019 18:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace all uses of format!() #23

Replace all uses of format!() #23

lilyball commented Jan 10, 2019 •

edited

Loading

sharkdp commented Jan 10, 2019

lilyball commented Jan 10, 2019

lilyball commented Jan 10, 2019

sharkdp commented Jan 10, 2019

lilyball commented Jan 10, 2019

sharkdp commented Jan 10, 2019

sharkdp commented Jan 10, 2019

lilyball commented Jan 10, 2019

lilyball commented Jan 10, 2019

sharkdp commented Jan 10, 2019 •

edited

Loading

lilyball commented Jan 10, 2019

sharkdp commented Jan 10, 2019

Replace all uses of format!() #23

Replace all uses of format!() #23

Conversation

lilyball commented Jan 10, 2019 • edited Loading

sharkdp commented Jan 10, 2019

lilyball commented Jan 10, 2019

lilyball commented Jan 10, 2019

sharkdp commented Jan 10, 2019

lilyball commented Jan 10, 2019

sharkdp commented Jan 10, 2019

sharkdp commented Jan 10, 2019

lilyball commented Jan 10, 2019

lilyball commented Jan 10, 2019

sharkdp commented Jan 10, 2019 • edited Loading

lilyball commented Jan 10, 2019

sharkdp commented Jan 10, 2019

lilyball commented Jan 10, 2019 •

edited

Loading

sharkdp commented Jan 10, 2019 •

edited

Loading