Integrate Eigen Library to Remove BLAS Dependency #8
Comments
Really happy you're looking at this! We'd like to use Eigen in the machine learning library, |
OK, great. Hope I can help; I'll have to check that out. We want to remove the dependency on BLAS. I assume this means we want to remove numpy as a requirement as well? |
numpy doesn't require you to have a BLAS library; it falls back to its own basic C implementation. I don't usually rely on numpy for performance-critical stuff, but it's a nice type to return, so we'll probably keep the dependency. What we want to avoid is a situation where, to make the library perform adequately, you have to modify your system, compile some code, etc. We want |
OK thank you for the detail on the build and numpy. Here is the feature branch: https://github.com/init-random/sense2vec/tree/eigen-integration |
Had no idea this would look so simple! We need to come up with a benchmark for this. Maybe just fetch the similarity results for the top N words? |
Just the initial commit... but it does not look like it should be too bad. |
There are no benchmark examples, sorry. I can have a look if it's confusing? It should just take the most frequent N words from the vocabulary and run the similarity queries for them. |
Not confusing, just wanted to follow a standard if there were existing benchmarks. Happy to take a look. |
What corpus are you using for merge_text.py? Is that what you want the benchmark off of? |
We actually don't need a corpus here, just the trained model. You've downloaded that, right? We want to benchmark the similarity queries. The corpus is the Reddit comment corpus. |
Got it... No, I had not downloaded the model yet, I had built my own. I'll do that, thanks. |
I added a simple Eigen benchmark. It grabs the top 50, 100, 500, and 1000 most frequent model terms and, for each word in the top N, finds the top 50 most similar tokens. If you were looking for something different, let me know.
|
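For reference, the benchmark described above can be sketched roughly like this. This is a hypothetical standalone version: the function and variable names are mine, the vectors are assumed to be unit-length (so a dot product is cosine similarity), and the real script would read the sense2vec model instead of taking vectors as input.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

static const int DIM = 128;  // sense2vec vectors discussed in this thread are 128-d

// Plain-loop dot product over DIM floats.
float dot(const float* a, const float* b) {
    float s = 0.f;
    for (int i = 0; i < DIM; ++i) s += a[i] * b[i];
    return s;
}

// Return the indices of the k vectors in vocab most similar to query.
// vocab is assumed ordered by term frequency and unit-normalized.
std::vector<int> most_similar(const std::vector<std::vector<float>>& vocab,
                              const float* query, int k) {
    std::vector<std::pair<float, int>> scored;
    for (int i = 0; i < (int)vocab.size(); ++i)
        scored.push_back({dot(vocab[i].data(), query), i});
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                      [](const std::pair<float, int>& x,
                         const std::pair<float, int>& y) { return x.first > y.first; });
    std::vector<int> out;
    for (int i = 0; i < k; ++i) out.push_back(scored[i].second);
    return out;
}
```

The benchmark then just loops n over {50, 100, 500, 1000}, calls `most_similar` with k = 50 for each of the first n vocabulary entries, and times the whole run.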
Thanks, I just ran a comparison: math.h:
eigen:
I would expect eigen to perform as good or better. Any idea what could be wrong? |
I'll look into this to see where the differences may be. Thanks. |
I wrote a C script to time just the dot product. The math.h implementation seems about 13x faster. Each iteration is 500000 dot-product calculations on 128-length float vectors. Maybe it is the creation of the Eigen vectors that takes the time.
|
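The timing setup described above can be sketched as follows. This is a minimal reconstruction, not the original script (which was plain C): the baseline is a plain-loop dot product over 128 floats, repeated 500000 times and timed with `std::chrono`. The function names are placeholders of mine.

```cpp
#include <chrono>

static const int DIM = 128;
static const int ITERS = 500000;

// Plain-loop ("math.h style") dot product used as the baseline.
float dot(const float* a, const float* b) {
    float s = 0.f;
    for (int i = 0; i < DIM; ++i) s += a[i] * b[i];
    return s;
}

// Run ITERS dot products and return the elapsed wall time in seconds.
double time_dot(const float* a, const float* b) {
    volatile float sink = 0.f;  // keep the optimizer from deleting the loop
    auto t0 = std::chrono::steady_clock::now();
    for (int it = 0; it < ITERS; ++it)
        sink += dot(a, b);
    auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(t1 - t0).count();
}
```

An Eigen variant of `dot` dropped into the same loop gives a like-for-like comparison; only the dot-product implementation should differ between the two timings.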
It looks like for the Eigen implementation the dot product is about 40% of the compute time, and creation of the vectors is 60% (30% for each vector). So even with no vector-creation overhead, the Eigen dot product would still take about 1 sec (2.6 s × 40%), which is still 5x the math.h implementation. |
BLAS timings below. Have you looked at FLENS? http://apfel.mathematik.uni-ulm.de/~lehn/FLENS/index.html Do you think it is worthwhile to look into it?
|
Looking at the code, I suspect Eigen copies the vector every time. Alternatively, could you run the timings on dot/norm of Eigen vs. math.h outside sense2vec? I can help you later with integrating it more deeply into sense2vec. |
No, we didn't look into FLENS, yet. Any reliable performance numbers would be great to make a good decision here. |
FLENS looks like it's asking for a system BLAS for optimisation. We're anxious to avoid that. Have you seen the implementations here? https://bitbucket.org/eigen/eigen/src/c3c494ec0a006d25dd6e6d65864b0fb51fe4da56/blas/?at=default This issue is starting to block release of the neural network code in |
The timings provided are standalone, outside sense2vec. In essence the code is
I can take a look into FLENS timings. Also, I think it has its own BLAS implementation. From http://apfel.mathematik.uni-ulm.de/~lehn/FLENS/index.html -- "FLENS gives you a generic implementation of BLAS". I can install it and see. Numpy is a dependency. What if there is a BLAS used there (check with np.__config__.show())? Would it be OK to link against that BLAS and otherwise use Eigen or FLENS? I suppose that is not the case for |
FLENS did not need to be linked against a BLAS implementation. Timings below; they are much better than Eigen's. This library can also optionally be linked with BLAS (-DWITH_OPENBLAS and other BLAS implementations are supported), and those numbers look very good. The compiler needs the -std=c++11 flag; not sure if that is an issue.
And here are the timings when linked with OpenBLAS.
|
Please take a look at the simd branch (https://github.com/spacy-io/sense2vec/tree/simd). It relies on |
Great. Interested in taking a look. I'll create a new branch for this implementation. |
Just to follow up on this. Here are the timings for SIMD.
Here is the code I used for this comparison.
The
|
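For context, a minimal SSE version of the dot product looks like the sketch below. This is not the code from the simd branch (which isn't shown in this thread), just an illustration of the technique, assuming an x86 target with SSE; 128 is a multiple of 4, so no scalar tail loop is needed.

```cpp
#include <immintrin.h>

static const int DIM = 128;  // multiple of 4: every element is covered by a 4-lane step

// SSE dot product: multiply-accumulate four floats at a time,
// then sum the four accumulator lanes at the end.
float dot_simd(const float* a, const float* b) {
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < DIM; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i),
                                         _mm_loadu_ps(b + i)));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Note that without optimization flags (at least -O2 or -O3) intrinsics code can easily benchmark slower than a plain loop that the compiler auto-vectorizes, which matters for the timing discrepancy discussed below.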
Are the numbers milliseconds per iteration? The results from your benchmark don't align with mine; I assume something was overlooked. Two ideas:
You should be very suspicious as long as the simd approach is slower than a naive math.h implementation. |
Timings are seconds for the complete 500000 iterations, but you are correct on both counts. I'll re-run the SIMD timings this evening with the proper compile flags. Thanks. |
The first step will be to integrate the Eigen library into the codebase and have both BLAS and Eigen paths. When formalized we can then remove the BLAS dependency.