Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace all uses of format!() #23

Merged
merged 1 commit into from
Jan 10, 2019
Merged

Conversation

lilyball
Copy link
Contributor

@lilyball lilyball commented Jan 10, 2019

Replace all uses of format!()

Previously we were allocating 2 strings per byte, plus another for the offset. Instead we can just write all this directly to stdout.

Also replace the calculation of the ANSI escape code for every single byte. This produces a significant speedup.

I tested this by ensuring the output of dumping a 1.3MB file (the hexyl binary itself) was byte-for-byte identical to without this PR.

Benchmark output:

Benchmark #1: ./original_hexyl ./original_hexyl
  Time (mean ± σ):      1.462 s ±  0.005 s    [User: 1.413 s, System: 0.046 s]
  Range (min … max):    1.456 s …  1.469 s
 
Benchmark #2: target/release/hexyl ./original_hexyl
  Time (mean ± σ):     289.6 ms ±   1.5 ms    [User: 243.8 ms, System: 44.1 ms]
  Range (min … max):   288.2 ms … 293.1 ms
 
Summary
  'target/release/hexyl ./original_hexyl' ran
    5.05 ± 0.03 times faster than './original_hexyl ./original_hexyl'

@sharkdp
Copy link
Owner

sharkdp commented Jan 10, 2019

Thank you very much for your contribution!

I'm not too happy with this solution. If we go with this, I think we should generate this lookup table somehow instead of having this huge array in the source code.

Couldn't we apply the same method as you did in #24 and replace the format!/write! combination with a single write!? Maybe this is not quite as fast, but it's not like hexyl is really a performance-critical program. It would be great if it were a little faster, but its surely not a big issue (Printing a 1MB hex dump in the terminal would produce 60,000 lines of output... and most of the execution time would actually be spent in the terminal emulator).

@lilyball
Copy link
Contributor Author

We could use a single write!(), it was just slightly faster to do it this way and I didn't think a single lookup table would be problematic. It could of course be moved elsewhere in the file (or even into its own file). It could also be generated in code (say, when creating the Printer) if desired. I'll benchmark the difference between this and just a write! in a minute.

@lilyball
Copy link
Contributor Author

Here's the difference I see:

Benchmark #1: ./hexyl_lookup_table ./original_hexyl
  Time (mean ± σ):      1.171 s ±  0.005 s    [User: 1.122 s, System: 0.046 s]
  Range (min … max):    1.165 s …  1.181 s
 
Benchmark #2: ./hexyl_write ./original_hexyl
  Time (mean ± σ):      1.216 s ±  0.015 s    [User: 1.165 s, System: 0.047 s]
  Range (min … max):    1.197 s …  1.233 s
 
Summary
  './hexyl_lookup_table ./original_hexyl' ran
    1.04 ± 0.01 times faster than './hexyl_write ./original_hexyl'

Another benefit of using a lookup table is we could stuff the ANSI color escapes in there as well (though in this case we'd definitely want to generate it at runtime), which would avoid having to categorize each byte in order to color it and avoid having to synthesize ANSI escapes for each byte.

@sharkdp
Copy link
Owner

sharkdp commented Jan 10, 2019

Another benefit of using a lookup table is we could stuff the ANSI color escapes in there as well (though in this case we'd definitely want to generate it at runtime), which would avoid having to categorize each byte in order to color it and avoid having to synthesize ANSI escapes for each byte.

That is true, yes. That could be a more significant speedup.

In this case, I'd like to structure the lookup-table login into a new file and struct.

@lilyball
Copy link
Contributor Author

Wow, getting rid of the ANSI escape calculation per byte is a huge win:

Benchmark #1: ./hexyl_lookup_table ./original_hexyl
  Time (mean ± σ):      1.193 s ±  0.005 s    [User: 1.143 s, System: 0.046 s]
  Range (min … max):    1.188 s …  1.206 s
 
Benchmark #2: target/release/hexyl ./original_hexyl
  Time (mean ± σ):     739.4 ms ±   1.7 ms    [User: 691.2 ms, System: 45.8 ms]
  Range (min … max):   737.4 ms … 743.8 ms
 
Summary
  'target/release/hexyl ./original_hexyl' ran
    1.61 ± 0.01 times faster than './hexyl_lookup_table ./original_hexyl'

I'll push that momentarily.

@lilyball lilyball force-pushed the remove_format branch 2 times, most recently from f9d9aaa to 3c9ae4f Compare January 10, 2019 07:39
@sharkdp
Copy link
Owner

sharkdp commented Jan 10, 2019

Awesome! Thank you very much.

This looks clean enough to leave it in the main file. And we can still refactor later..

@sharkdp
Copy link
Owner

sharkdp commented Jan 10, 2019

How much do we gain by using get_unchecked? Could we just use [..] and get rid of the unsafe block?

@lilyball
Copy link
Contributor Author

If we apply the same trick to the char version we get even faster:

Benchmark #1: ./original_hexyl ./original_hexyl
  Time (mean ± σ):      1.462 s ±  0.005 s    [User: 1.413 s, System: 0.046 s]
  Range (min … max):    1.456 s …  1.469 s
 
Benchmark #2: target/release/hexyl ./original_hexyl
  Time (mean ± σ):     289.6 ms ±   1.5 ms    [User: 243.8 ms, System: 44.1 ms]
  Range (min … max):   288.2 ms … 293.1 ms
 
Summary
  'target/release/hexyl ./original_hexyl' ran
    5.05 ± 0.03 times faster than './original_hexyl ./original_hexyl'

I'll test the non-unsafe version in a minute.

@lilyball
Copy link
Contributor Author

The unsafe is just noise.

Previously we were allocating 2 strings per byte, plus another for the
offset. Instead we can just write all this directly to stdout.

Also replace the calculation of the ANSI escape code for every single
byte. This produces a significant speedup.
@sharkdp
Copy link
Owner

sharkdp commented Jan 10, 2019

Fantastic, thank you very much for these real time updates! 😄

I'd like to get this merged so I can include it in v0.3.1, which I'd like to release now. Happy to work on even more improvements in other PRs.

@lilyball
Copy link
Contributor Author

Sounds good to me. I need to go to bed anyway.

@sharkdp sharkdp merged commit afb6669 into sharkdp:master Jan 10, 2019
@sharkdp
Copy link
Owner

sharkdp commented Jan 10, 2019

We're faster than hexdump -C now 🎆 😄

▶ hyperfine 'hexyl $(which hexyl)' 'hexdump -C $(which hexyl)'
Benchmark #1: hexyl $(which hexyl)
  Time (mean ± σ):     169.6 ms ±   3.4 ms    [User: 147.7 ms, System: 21.7 ms]
  Range (min … max):   165.4 ms … 179.4 ms    17 runs
 
Benchmark #2: hexdump -C $(which hexyl)
  Time (mean ± σ):     190.0 ms ±   2.5 ms    [User: 188.2 ms, System: 1.7 ms]
  Range (min … max):   186.1 ms … 193.8 ms    15 runs
 
Summary
  'hexyl $(which hexyl)' ran
    1.12 ± 0.03 times faster than 'hexdump -C $(which hexyl)'

@lilyball lilyball deleted the remove_format branch January 10, 2019 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants