-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Qihui/store many vectors #76
Qihui/store many vectors #76
Conversation
xieqihui
commented
Feb 7, 2018
- Implement store_many_vectors() for RedisStorage() using pipeline(). This has significantly increased speed.
- Add store_many_vectors() support for Storage() and MemoryStorage() by simply using a for loop of store_vector().
…. Significantly increase the speed
nearpy/storage/storage_memory.py
Outdated
Store a batch of vectors. | ||
Stores vector and JSON-serializable data in bucket with specified key. | ||
""" | ||
for idx, v in enumerate(vs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest
if data is None:
data = itertools.repeat(data)
for v, k, d in zip(vs, bucket_keys, data):
self.store_vector(hash_name, k, v, d)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way you can use this code as default implementation at the base Storage
class. I am not sure if this is a good idea though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have revised the storage_memory following your suggestion.
However, I think it may not be good to implement this as default implementation at the base Storage. At least we find that Redis can use a more natural way to achieve it more efficiently. Leaving store_many_vectors() as an abstract method may force future contributors to think about efficient ways when implementing for other storage options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xieqihui I agree. Let's keep it an abstract method.
""" | ||
if data is None: | ||
data = itertools.repeat(data) | ||
for v, k, d in zip(vs, bucket_keys, data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better performance in Python 2 use
from future.builtins import zip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this line in commit 8d97980
nearpy/storage/storage_redis.py
Outdated
""" | ||
with self.redis_object.pipeline() as pipeline: | ||
for idx, v in enumerate(vs): | ||
redis_key = self._format_redis_key(hash_name, bucket_keys[idx]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest extracting internal method _add_vector(self, hash_name, bucket_keys, vs, data, redis)
and using it as
def store_vector(self, hash_name, bucket_key, v, data):
self._add_vector(hash_name, bucket_key, v, data, self.redis)
def store_many_vectors(self, hash_name, bucket_keys, vs, data):
with self.redis_object.pipeline() as pipeline:
for bucket_key, v in zip(bucket_keys, vs):
self._add_vector(hash_name, bucket_key, v, data, pipeline)
pipeline.execute()
It should help with code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay. I haven't added unit tests for this PR. Pls give me one week's time, i will finish it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xieqihui I was inactive for a long time here as well, no problem. We all have real jobs right :)
Take your time! Anything between 1 and 4 weeks would be nice, so that we move one a bit.
Thanks for responding so fast!
Cheers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pixelogik Two of the issues I pointed out in comments are still here and I would like to see them fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amorgun Thanks for the feedback. It's in progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this suggestion. I included this modification in commit 4e001a1
@pixelogik @amorgun I have addressed all your feedback and added test cases. However, the CI checks failed in setting up environment for Python 3.3 tests. It doesn't seem to be related to my code. Could you please help check it? |
@pixelogik The request looks good to me. I fixed the issue with CI in #80 |