
Brilirs #189

Merged
merged 5 commits into from
Apr 7, 2022

Conversation

Pat-Lafon
Contributor

Most of this PR is cleanup via:

  • 3539fd3: Mostly comments and stylistic changes.
  • 51779f9: Reworks build.rs to improve its control flow and to avoid re-creating the tab-completions file when it already exists.
  • 7c3822c: Addresses #![allow(clippy::too_many_arguments)] by refactoring the interpreter to use a State struct. This is similar to the State struct used in brili, except that there is only one instance, which is passed by &mut.
  • e03f0be: Previously, indexing was done by keeping track of u32 values and then casting to usize; it now uses usize values directly.
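
The State-struct refactor in 7c3822c can be illustrated with a minimal sketch. The field and function names below are assumptions for illustration, not brilirs's actual definitions:

```rust
// Hypothetical sketch of the clippy::too_many_arguments refactor:
// instead of threading many parameters through every interpreter
// helper, bundle them into one `State` and pass it by `&mut`.
// Names here are illustrative, not brilirs's real code.

struct State {
    env: Vec<i64>,          // variable values, indexed by slot
    instruction_count: u64, // e.g. a profiling counter
}

// Before: fn write_var(env: &mut Vec<i64>, count: &mut u64, ...).
// After: a single mutable borrow of the whole interpreter state.
fn write_var(state: &mut State, dest: usize, value: i64) {
    state.env[dest] = value;
    state.instruction_count += 1;
}

fn main() {
    let mut state = State { env: vec![0; 4], instruction_count: 0 };
    write_var(&mut state, 2, 42);
    assert_eq!(state.env[2], 42);
    assert_eq!(state.instruction_count, 1);
}
```

Since there is only one `State` instance, the `&mut` borrow threads through the call tree without needing clones or interior mutability.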

I've been sporadically adding more comments to the code base as I've forgotten what my past self meant by some of it... see https://xkcd.com/1421/

2dfee99 tackles the second half of optimizing function calls in brilirs, started in #188. Previously, functions were stored in a HashMap<String, BBFunction> in BBProgram, which meant that on each function call the string name had to be hashed and looked up in the map. Since all of the functions are known statically, we can instead create a Vec<BBFunction> to index into, similar to the optimization done for the variable names of instructions (originally introduced in the blog post https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/faster-interpreter/).
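
The idea can be sketched as follows; the types and helper below are assumptions for illustration, not brilirs's actual implementation:

```rust
use std::collections::HashMap;

// Illustrative sketch (names are assumptions, not brilirs's real types):
// resolve call targets to Vec indices once, up front, so the hot
// interpreter loop indexes a Vec<BBFunction> instead of hashing a
// String on every dynamic function call.
struct BBFunction {
    name: String,
}

// One-time pass at load time: function name -> Vec index.
fn build_index(funcs: &[BBFunction]) -> HashMap<&str, usize> {
    funcs
        .iter()
        .enumerate()
        .map(|(i, f)| (f.name.as_str(), i))
        .collect()
}

fn main() {
    let funcs = vec![
        BBFunction { name: "main".into() },
        BBFunction { name: "helper".into() },
    ];
    let index_of = build_index(&funcs);

    // A `call helper` instruction is rewritten to hold this index,
    // so every dynamic call is a plain Vec access with no hashing.
    let target = index_of["helper"];
    assert_eq!(funcs[target].name, "helper");
}
```

The hashing cost is paid once per call site during setup instead of once per dynamic call.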

The performance on function_call.bril:

| brili | Old brilirs | Proposed brilirs |
| --- | --- | --- |
| 7.1 s ± 0.1 s | 328 ms ± 2 ms | 251 ms ± 2 ms |

In comparison to #188, there is a much bigger reduction in run time, which was surprising to me. It could be that hashing and indexing into a map are more expensive than I had thought. It could also be that this benchmark is too simple for there to be a significant benefit from #188. Since it just calls the same function repeatedly, each allocation is the same size (same number of variables on the stack), which lets the allocator be fast, whereas the performance of #188 shouldn't be as affected by changes in stack size.

Either way, good benchmarking is non-trivial and all of these performance numbers should be taken with a grain of salt.

@sampsyo
Owner

sampsyo commented Apr 7, 2022

Super nice; this all looks great! Since the win is pretty big for the "function calls only" benchmark, I guess it might be educational to know what the performance impact is on "normal" benchmarks as well… it might be measurable, despite the relative rarity of function calls in other benchmarks.

In any case, this is a great way to extend the index-based interpretation philosophy beyond just variables! 👏

@sampsyo sampsyo merged commit fa46e23 into sampsyo:main Apr 7, 2022
@Pat-Lafon
Contributor Author

So I'm only considering benchmarks with a dynamic instruction count greater than ~750,000, which is currently about what you need to push the lower end of brilirs's run time above 5 ms. One could presumably use a different benchmarking tool and a less noisy setup to be more precise.

| benchmark | brili | Old brilirs | Current brilirs |
| --- | --- | --- | --- |
| ackermann.bril | 252 ms ± 2 ms | 13 ms ± 1 ms | 11 ms ± 1 ms |
| eight-queens.bril | 226 ms ± 2 ms | 8 ms ± 1 ms | 7 ms ± 1 ms |
| mat-mul.bril | 404 ms ± 3 ms | 14 ms ± 1 ms | 14 ms ± 1 ms |
| function-call.bril | 7.1 s ± 0.1 s | 330 ms ± 2 ms | 252 ms ± 2 ms |
| alloc_large.bril | 1.2 s ± 0.1 s | 694 ms ± 3 ms | 692 ms ± 2 ms |

I've rounded everything up to the relevant significant figure. eight-queens and mat-mul both see no benefit, which makes sense since they use looping instead of recursion.

I've thrown in alloc_large.bril, which is actually a test case but has some very surprising performance behavior. It is unusual in that it is a small program that doesn't use recursion and creates a lot of memory allocations. brilirs's performance makes some sense, in that there hasn't been any attempt at optimizing how it manages the heap (it's basically a copy-paste of brili's implementation). However, I think it's important to note that function-call executes about 10× as many dynamic instructions, so it's both that brilirs does quite poorly and that brili does surprisingly well with this kind of program.

Basically, you are what you optimize for.

@Pat-Lafon Pat-Lafon deleted the brilirs branch April 7, 2022 16:52
@sampsyo
Owner

sampsyo commented Apr 7, 2022

That all makes a lot of sense! Thanks for adding the extra color here, though; that's definitely interesting to see.
