Performance should be measured and improved #8

Open
kwalcock opened this issue Sep 26, 2022 · 8 comments

Comments

@kwalcock
Owner

No description provided.

@kwalcock
Owner Author

kwalcock commented Sep 26, 2022

Time to tokenize 4 sentences 10,000 times. The Scala version looped only 1,000 times, and its measured time was then multiplied by 10. Scala used the j4rs interface, which can still be improved.

| Language | Time | Notes |
|----------|------|-------|
| Python | 57 sec | serial |
| Scala | 1112 sec | serial |
| Scala | 680 sec | parallel, 4 threads with sentences in parallel |
| Scala | 126 sec | parallel, 8 threads with documents in parallel |

Code for Python is here and code for Scala is here.
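For readers without the linked code, here is a minimal sketch of the measurement described above: time a fixed number of loops over four sentences, then scale a shorter run up to the 10,000-loop figure. It is not the benchmark actually used. The sentences, the `tokenize` stand-in (a plain `str.split`), and the names `time_loops` and `extrapolate` are all assumptions made for illustration.

```python
import time

# Hypothetical input: four sentences, one noticeably longer than the others.
SENTENCES = [
    "The quick brown fox jumps over the lazy dog.",
    "Tokenizers split text into pieces.",
    "Short one.",
    "This sentence is deliberately much longer than the other three so that "
    "it dominates the per-document cost, which matters once work is split up.",
]


def tokenize(sentence):
    """Stand-in for the real tokenizer call; not the library under test."""
    return sentence.split()


def time_loops(loops, sentences=SENTENCES):
    """Return the wall-clock seconds needed to tokenize all sentences `loops` times."""
    start = time.perf_counter()
    for _ in range(loops):
        for sentence in sentences:
            tokenize(sentence)
    return time.perf_counter() - start


def extrapolate(measured_seconds, measured_loops, target_loops):
    """Scale a short run linearly, e.g. 1,000 measured loops up to 10,000."""
    return measured_seconds * target_loops / measured_loops


if __name__ == "__main__":
    elapsed = time_loops(1_000)
    estimate = extrapolate(elapsed, 1_000, 10_000)
    print(f"1,000 loops: {elapsed:.4f} sec; estimated 10,000 loops: {estimate:.4f} sec")
```

Linear extrapolation assumes that startup and warm-up costs are negligible compared to the loop itself, which may not hold for short runs on the JVM.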

FYI @MihaiSurdeanu

@MihaiSurdeanu

Thanks! This is incredibly bad :)
How can a thin Rust wrapper be so much slower than Python?

@kwalcock
Owner Author

I believe this particular thin Rust wrapper serializes everything to JSON text and then deserializes it. That includes converting each int of an array to text, then to Integer, then to int, and so on. I decided to try a straight JNI version. However, I think it would be useful to try out the current interface (as soon as I can publish it) while waiting for a faster version, because so much else downstream needs to be tried out and might not work.
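To give a rough feel for why a text round trip can be expensive, the sketch below compares sending an array of ints through JSON text with simply copying the array. It only illustrates the general effect in Python and says nothing about how j4rs is implemented internally; the array size, repeat count, and function names are assumptions.

```python
import json
import time


def roundtrip_via_json(token_ids):
    """Turn every int into text and parse it back, as a text-based bridge might."""
    return json.loads(json.dumps(token_ids))


def copy_directly(token_ids):
    """Copy the ints without any text conversion."""
    return list(token_ids)


def time_calls(func, arg, repeats):
    """Return the wall-clock seconds needed to call func(arg) `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    return time.perf_counter() - start


if __name__ == "__main__":
    token_ids = list(range(1_000))  # hypothetical token id array
    repeats = 1_000
    json_seconds = time_calls(roundtrip_via_json, token_ids, repeats)
    copy_seconds = time_calls(copy_directly, token_ids, repeats)
    print(f"JSON round trip: {json_seconds:.4f} sec")
    print(f"direct copy:     {copy_seconds:.4f} sec")
```

Both paths return equal lists, so any difference in time is pure conversion overhead.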

@MihaiSurdeanu

Agreed on both points!

@kwalcock
Owner Author

kwalcock commented Oct 3, 2022

A straight JNI version is faster than j4rs, but the key appears to be using the release build rather than the debug build. In C programs the difference is usually fairly small, around 2x, but here for Rust the speedup is about 16x (717 sec vs. 45 sec, single-threaded). Scala is now on par with Python: 43 sec vs. 45 sec is within the variation between runs.

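| Language | Variation | Build | Threads | Time (sec) | Notes |
|----------|-----------|-------|---------|------------|-------|
| Python | N/A | N/A | 1 | 43 | |
| Scala | JNI | debug | 1 | 717 | |
| Scala | JNI | debug | 4 | 439 | by sentence |
| Scala | JNI | debug | 8 | 92 | by document |
| Scala | JNI | release | 1 | 45 | |
| Scala | JNI | release | 4 | 46 | by sentence |
| Scala | JNI | release | 8 | 21 | by document |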
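The ratios behind the "about 16 times" figure can be recomputed from the table. The snippet below is only a worked calculation on the numbers reported above; the dictionary and function names are chosen for illustration.

```python
# Times in seconds, copied from the table above, keyed by thread count.
DEBUG_SECONDS = {1: 717, 4: 439, 8: 92}
RELEASE_SECONDS = {1: 45, 4: 46, 8: 21}
PYTHON_SECONDS = 43


def release_speedups(debug, release):
    """Return debug time divided by release time for each thread count."""
    return {threads: debug[threads] / release[threads] for threads in debug}


if __name__ == "__main__":
    for threads, speedup in release_speedups(DEBUG_SECONDS, RELEASE_SECONDS).items():
        print(f"{threads} thread(s): release is {speedup:.1f}x faster than debug")
    ratio = RELEASE_SECONDS[1] / PYTHON_SECONDS
    print(f"Scala release vs. Python, single-threaded: {ratio:.2f}x")
```

The single-threaded ratio is 717 / 45, roughly 15.9, which matches the 16x quoted in the comment. The multi-threaded ratios are smaller.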

@MihaiSurdeanu

Awesome!!

However, the multi-threaded version is not showing the expected speedup. Do you think JNI has some syncs in there that we are not aware of?

@kwalcock
Owner Author

kwalcock commented Oct 3, 2022

For the release version above, I was still multiplying by 10, and maybe in those 2.1 measured seconds there wasn't enough time to make a difference, or some of my processors were just busy with other things. Here are some more measurements that show a 5x speedup.

The "by sentence" parallelism isn't the best test because one of the four sentences is long and the other threads have to wait for it to finish. The parallelism is also applied in an inner loop, so its overhead is incurred 10,000 times. The speedup from "by document" parallelism should ideally approach the number of processors. A 5x speedup seems close enough to 8 that I don't think something I've done is getting in the way.

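| Language | Variation | Build | Threads | Time (sec) | Notes |
|----------|-----------|-------|---------|------------|-------|
| Scala | JNI | release | 1 | 36 | |
| Scala | JNI | release | 1 | 35 | |
| Scala | JNI | release | 4 | 31 | by sentence |
| Scala | JNI | release | 4 | 30 | by sentence |
| Scala | JNI | release | 8 | 7 | by document |
| Scala | JNI | release | 8 | 7 | by document |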
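The difference between the two kinds of parallelism can be illustrated with a toy scheduling model. This is a sketch under stated assumptions, not a model fitted to the measurements: the sentence costs (three short, one long, in arbitrary units), the zero default overhead, and the greedy assignment of work to threads are all hypothetical.

```python
# Hypothetical per-sentence costs in arbitrary units: three short, one long.
COSTS = [1, 1, 1, 5]


def makespan(costs, threads):
    """Greedily give each task (longest first) to the least-loaded thread
    and return the load of the busiest thread."""
    loads = [0.0] * threads
    for cost in sorted(costs, reverse=True):
        loads[loads.index(min(loads))] += cost
    return max(loads)


def serial_time(costs, documents):
    """One thread processes every sentence of every document in turn."""
    return sum(costs) * documents


def by_sentence_time(costs, documents, threads, overhead=0.0):
    """Sentences of one document run in parallel; documents run one after
    another, so the longest sentence and the overhead are paid every time."""
    return documents * (makespan(costs, threads) + overhead)


def by_document_time(costs, documents, threads):
    """Whole documents are spread across threads."""
    return makespan([sum(costs)] * documents, threads)


if __name__ == "__main__":
    documents = 10_000
    base = serial_time(COSTS, documents)
    sentence_speedup = base / by_sentence_time(COSTS, documents, 4)
    document_speedup = base / by_document_time(COSTS, documents, 8)
    print(f"by sentence, 4 threads: {sentence_speedup:.1f}x")
    print(f"by document, 8 threads: {document_speedup:.1f}x")
```

With these assumed costs, the by-sentence speedup is capped at 8 / 5 = 1.6x regardless of thread count, while the by-document speedup reaches the thread count. The measured numbers (roughly 1.2x and 5x) fall below both ideals, as would be expected once overhead and other load on the machine are included.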

@MihaiSurdeanu

nice!
