Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed kgraph, caching train/test data #21

Merged
merged 1 commit into from
Jun 9, 2016
Merged

Conversation

aaalgo
Copy link
Contributor

@aaalgo aaalgo commented Jun 9, 2016

  • Fixed KGraph performance issue (substantial changes to pykgraph implementation to remove the per-batch query overhead, which cannot be amortized with evaluated with batch-size 1).
  • Allow saving and reusing KGraph index.
  • Caching train test data to speedup evaluation.

@@ -213,18 +213,28 @@ def __init__(self, metric, P):
self._metric = metric

def fit(self, X):
os.environ['OMP_THREAD_LIMIT'] = '40'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if i love this, but i guess it's fine for now

i'm trying to make up my mind whether we should limit ann-benchmarks to 1 core or not. one idea i've been thinking about is let's not impose any limitations but run query using a thread pool – only downside is if thrashing kills the L2/L3 cache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for index construction which follows existing code of NmslibReuseIndex. Search is still single threaded (I'll post about construction time separately.)
Regarding search time parallelization, KGraph batch search achieves 3.4x speedup on a 4 core CPU with hyperthreading on the glove dataset.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah i know it's only for index construction. I think it's fine to use multiple threads for index construction. Maybe I'll add support for that in Annoy too

@erikbern
Copy link
Owner

erikbern commented Jun 9, 2016

nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants