Used some standard Nim features making it faster #30
Conversation
Well, this definitely goes beyond a "naive" implementation. Let's have it as a separate implementation.
I wouldn't really say it's that far from naïve; it's basically a port from the C++ version. But a separate implementation is fine. Should I just create a separate file in the Nim folder?
@PMunch is there a perf difference between `proc {.inline.}` and templates here?
I am still using Nim 0.18 and here are my results:
We are starting to have a tight competition on this edge :) Please put this solution in a separate file.
Nothing considerable:
FWIW, I've looked at the code and the changes, and IMO this shouldn't be a separate implementation, but the (only) implementation used. |
I prefer to follow the rule of using the least powerful construct available among procs, templates, and macros.
Pfft, D using no D runtime and only the C stdlib is cheating 😃 My Nim version uses only standard Nim features and the garbage collector. Just kidding, good job by the D guys to push it that far; it would be interesting to see if I could push Nim even further, but then it wouldn't really be Nim. The Nim code you have there is something you could expect to find in a library or the stdlib. In fact, save for the stdout.write, this is pure Nim which can run on the JS target as well. Performance isn't super great there though (native JavaScript code: 1.340s, 49MiB; Nim-compiled JavaScript code: 2.855s, 25MiB).
That's true @mratsim, both have their pros and cons (maybe you wanted to run it twice?). In this case though the performance is about the same; templates appear to be slightly faster, but that could just be measurement inaccuracy.
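Since the thread compares `proc {.inline.}` against templates, here is a minimal sketch of the two approaches, with hypothetical helper names (this is not the benchmark code):

```nim
# Hypothetical example contrasting an inlined proc with a template.
# Both usually compile to the same machine code with -d:release.

proc maxInline(a, b: int): int {.inline.} =
  # A regular proc; {.inline.} hints the backend C compiler to inline it.
  if a > b: a else: b

template maxTemplate(a, b: int): int =
  # A template; substituted at the Nim AST level before code generation.
  # Note: `a` and `b` each appear twice, so side-effecting arguments
  # would be evaluated twice -- a classic template pitfall.
  (if a > b: a else: b)

doAssert maxInline(3, 7) == 7
doAssert maxTemplate(3, 7) == 7
```

The double-evaluation risk of templates is one practical argument for the "least powerful construct" rule mentioned below.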
I completely agree that it is the code quality you would expect to see in the stdlib and the best libraries out there, but our original idea was to see where one could arrive in each language with an initial implementation (i.e. before you start profiling it, if that ever even happens). Yet, given the huge interest in the topic (I have merged 22 PRs in the last 24 hours), I am going to create a separate table in the main README, where we can show off all the "dirty tricks" versions.
The key speedup for D happens to be unrelated to no-RT/C stdlib; it's about making
Aah, I thought about doing that as well, but I couldn't be bothered. Might have a peek at your implementation and see if I can copy it over to Nim to see if we get any further speedup. @frol I get your point, and I agree that
@PMunch Well, none of the other solutions (except C++, which is used as the "bare metal" baseline implementation) breaks the common API, and that is by design. If that is changed, we will have to redo everything from scratch.
From the readme:
Well, the "initial implementation" hugely depends on the person writing the code. Somebody less experienced would write something slow and/or memory inefficient (been there, done that), while somebody more experienced would write something similar to @PMunch's version (call it "idiomatic a.k.a. common practices" Nim-version, if you will) right from the start. I still think this is not "show off all the dirty tricks" version, but the regular one. |
Please, let's have two versions:
BTW, is there a way to get a statically linked executable with Nim? |
Statically linked to what? BTW, here is what dirty tricks look like in Nim: #32
@frol Similar to how you do it in C, pass
To all the runtimes (including libc), like |
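For illustration, one commonly used invocation (an assumption on my side, not a flag quoted from this thread) is to forward `-static` to the backend C linker via `--passL`:

```shell
# Assumed invocation, not from the thread: Nim compiles through a C
# backend, so linker flags can be forwarded with --passL. Note that
# fully static linking against glibc has caveats; musl is often used.
nim c -d:release --passL:-static main.nim
```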
Use *Node instead of Node (same as D/C++). This changes performance from:

    real 0m3.840s
    user 0m4.647s
    sys  0m0.087s

to:

    real 0m0.653s
    user 0m0.709s
    sys  0m0.014s

On my machine (Core i7-6700).
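In Nim terms, the `*Node` change in this commit title corresponds to using an unmanaged `ptr` instead of a GC-traced `ref`. A rough sketch with an assumed minimal node type (not the benchmark's actual treap node):

```nim
# Assumed minimal type for illustration: contrast a GC-managed ref
# with a raw ptr, which is what "*Node instead of Node" refers to.
type
  NodeObj = object
    value: int
  RefNode = ref NodeObj   # traced by the garbage collector
  PtrNode = ptr NodeObj   # raw pointer, manual lifetime (like C++ Node*)

let r: RefNode = RefNode(value: 1)   # GC-managed allocation
let p: PtrNode = create(NodeObj)     # manual, zeroed allocation
p.value = 2
doAssert r.value == 1
doAssert p.value == 2
dealloc(p)                           # must be freed manually
```

Skipping the GC bookkeeping is where the large wall-clock difference above comes from, at the cost of manual memory management.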
…ation options to speed Swift solution by 20%
* Make gitignore more local, and add .vs directory.
* (minor) reformat project file
* Don't use prerelease sdk; appears if anything to have slightly higher startup cost
* Allow using C# 7.1 features - particularly tuples.
* Inline/simplify
* Remove redundant namespace qualification
* Remove redundant initializers
* Use expression-bodied-constructor
* Use var
* Elide redundant arguments
* More var
* Use expression-bodied members for one-liners
* Use else-if to emphasize mutually exclusive options
* Use tuples rather than special purpose class
* Mark non-instantiable class static
* Update readme
…ead of a notEmpty boolean field
- Avoids using 'return' as specified in NEP1
- Merge 'let' and 'var' statements into blocks

NEP1: https://nim-lang.org/docs/nep1.html
The Tree object is fairly lightweight, so there shouldn't be any good reason to place it on the heap instead of the stack.
This makes Nim faster than the C++ raw-pointer version
Sorry about that, I didn't mean for all those commits to end up there.
I updated the manual version to also use my optimisations. And moved my optimisations into
@PMunch Thank you!
@mratsim +1
@yglukhov I think the problem was that it didn't follow the same pattern as the others, since it takes in var arguments instead of returning a compound type. If it's just the templates, I guess we could change it. I notice a very slight dip in performance from 0.282s to 0.291s averaged over 1000 runs (minimum was 0.193s and 0.195s respectively).
That is exactly right. |
Why is Nim with
@PMunch I have just restructured the existing README and am rerunning all the results at the moment. Bumped into Go: #42 (comment)
Changed a couple of procedures into templates and made the code a bit cleaner by removing superfluous `result =` statements. Also switched to `echo` with no newline like the C++ version, and switched to taking in editable `Node`s instead of having to return tuples of `Node`s (similar to how C++ uses pointers). This makes Nim faster than the C++ raw-pointer version when run with the markAndSweep GC, and the standard GC uses less memory than C++ raw pointers (at least in my testing, averaged over 100 runs), although it is a bit slower. On the devel branch the markAndSweep GC also drops to ~800KiB of memory usage, but leaving it at 5MiB for now is fine if you don't want to try that.

EDIT: By the way, to enable LTO, compile like this:

    nim c -d:release --gc:markAndSweep --passC:-flto main.nim
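The switch from returning tuples to taking `var` parameters can be sketched like this, with simplified signatures and assumed names (not the actual treap procs):

```nim
# Illustrative sketch: a tuple-returning proc forces the caller to
# allocate and destructure a compound result, while var parameters
# let the proc write its results in place, mirroring how the C++
# version passes Node* out-parameters.

proc splitTuple(x: int): (int, int) =
  # Returns both halves packed into a tuple.
  (x div 2, x - x div 2)

proc splitVar(x: int; lo, hi: var int) =
  # Writes both halves directly into caller-provided locations.
  lo = x div 2
  hi = x - lo

var a, b: int
splitVar(7, a, b)
doAssert splitTuple(7) == (3, 4)
doAssert a == 3 and b == 4
```

For small value types the two usually optimize identically, but with GC-managed `ref` results the in-place style can avoid extra allocations, which fits the measurements reported above.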