Qihui/store many vectors #76

xieqihui · 2018-02-07T14:11:48Z

Implement store_many_vectors() for RedisStorage() using pipeline(), which significantly speeds up batch inserts.
Add store_many_vectors() support for Storage() and MemoryStorage(), the latter by simply looping over store_vector().
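To see why pipelining helps, here is a minimal sketch that counts network round trips. `CountingClient` and `CountingPipeline` are hypothetical stand-ins, not redis-py classes. They only model the idea that each direct command costs one round trip, while a pipeline queues commands and sends them together in one `execute()` call.

```python
class CountingClient:
    """Hypothetical client: every direct command costs one round trip."""

    def __init__(self):
        self.round_trips = 0
        self.data = {}

    def rpush(self, key, value):
        self.round_trips += 1
        self.data.setdefault(key, []).append(value)

    def pipeline(self):
        return CountingPipeline(self)


class CountingPipeline:
    """Hypothetical pipeline: commands are queued and sent in one round trip."""

    def __init__(self, client):
        self.client = client
        self.queued = []

    def rpush(self, key, value):
        self.queued.append((key, value))  # nothing is sent yet

    def execute(self):
        self.client.round_trips += 1  # one round trip for the whole batch
        for key, value in self.queued:
            self.client.data.setdefault(key, []).append(value)
        self.queued = []


def store_one_by_one(client, keys, values):
    for key, value in zip(keys, values):
        client.rpush(key, value)


def store_batched(client, keys, values):
    pipe = client.pipeline()
    for key, value in zip(keys, values):
        pipe.rpush(key, value)
    pipe.execute()


keys = ["bucket_%d" % (i % 3) for i in range(1000)]
values = list(range(1000))

a, b = CountingClient(), CountingClient()
store_one_by_one(a, keys, values)
store_batched(b, keys, values)
print(a.round_trips, b.round_trips)  # 1000 1
```

Real redis-py pipelines likewise buffer commands until `execute()` is called; the actual speedup depends on network latency and batch size.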


amorgun · 2018-02-07T14:42:40Z

nearpy/storage/storage_memory.py

+        Store a batch of vectors.
+        Stores vector and JSON-serializable data in bucket with specified key.
+        """
+        for idx, v in enumerate(vs):


I suggest

    import itertools

    if data is None:
        data = itertools.repeat(data)
    for v, k, d in zip(vs, bucket_keys, data):
        self.store_vector(hash_name, k, v, d)
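The suggested loop relies on `zip()` stopping at its shortest argument, so an infinite `itertools.repeat(None)` simply pairs every vector with `None`. A quick check with plain lists standing in for vectors and bucket keys (`pair_up` is a hypothetical helper used only for illustration):

```python
import itertools


def pair_up(vs, bucket_keys, data=None):
    # Without data, pair every vector with None; zip() stops at the shortest input
    if data is None:
        data = itertools.repeat(data)
    return list(zip(vs, bucket_keys, data))


print(pair_up([[1, 2], [3, 4]], ["k1", "k2"]))
# [([1, 2], 'k1', None), ([3, 4], 'k2', None)]
print(pair_up([[1, 2], [3, 4]], ["k1", "k2"], [{"id": 1}, {"id": 2}]))
# [([1, 2], 'k1', {'id': 1}), ([3, 4], 'k2', {'id': 2})]
```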

By the way, you could use this code as the default implementation in the base Storage class. I am not sure if this is a good idea, though.

I have revised storage_memory.py following your suggestion.
However, I don't think this should be the default implementation in the base Storage class. Redis, at least, offers a more natural and more efficient way to do it (pipelining). Leaving store_many_vectors() as an abstract method forces future contributors to think about an efficient approach when implementing other storage backends.

@xieqihui I agree. Let's keep it an abstract method.
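A minimal sketch of keeping `store_many_vectors()` abstract with Python's `abc` module. The class names mirror NearPy's `Storage` and `MemoryStorage`, but these are simplified, hypothetical versions (the real classes have more methods, and the real in-memory layout may differ). The point is that a backend which forgets to implement the batch method fails at instantiation time.

```python
import abc
import itertools


class Storage(abc.ABC):
    """Simplified base class: every backend must provide both methods."""

    @abc.abstractmethod
    def store_vector(self, hash_name, bucket_key, v, data):
        """Store one vector (and optional data) in a bucket."""

    @abc.abstractmethod
    def store_many_vectors(self, hash_name, bucket_keys, vs, data):
        """Store a batch; each backend picks its most efficient strategy."""


class MemoryStorage(Storage):
    def __init__(self):
        self.buckets = {}

    def store_vector(self, hash_name, bucket_key, v, data):
        self.buckets.setdefault(hash_name, {}).setdefault(bucket_key, []).append((v, data))

    def store_many_vectors(self, hash_name, bucket_keys, vs, data):
        # In memory there is no round-trip cost, so a plain loop is enough
        if data is None:
            data = itertools.repeat(data)
        for v, k, d in zip(vs, bucket_keys, data):
            self.store_vector(hash_name, k, v, d)


class IncompleteStorage(Storage):
    def store_vector(self, hash_name, bucket_key, v, data):
        pass  # store_many_vectors() is missing on purpose


try:
    IncompleteStorage()
except TypeError as e:
    print("TypeError:", e)  # abstract method not implemented

s = MemoryStorage()
s.store_many_vectors("lsh", ["a", "b", "a"], [[1], [2], [3]], None)
print(s.buckets)  # {'lsh': {'a': [([1], None), ([3], None)], 'b': [([2], None)]}}
```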

amorgun · 2018-02-12T08:12:22Z

nearpy/storage/storage_memory.py

+        """
+        if data is None:
+            data = itertools.repeat(data)
+        for v, k, d in zip(vs, bucket_keys, data):


In Python 2 the built-in zip() builds the whole list of tuples in memory, so for better performance use

from future.builtins import zip
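`future.builtins.zip` is meant to give Python 2 the lazy, iterator-returning `zip()` of Python 3 (on Python 3 the import just provides the built-in). A Python 3 check of that lazy behaviour, using only the standard library:

```python
import itertools
from collections.abc import Iterator

vs = ([i, i + 1] for i in range(5))      # a generator of vectors
pairs = zip(vs, itertools.repeat(None))  # no tuples are built yet
print(isinstance(pairs, Iterator))       # True
print(next(pairs))                       # ([0, 1], None)
print(sum(1 for _ in pairs))             # 4
```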

Added this line in commit 8d97980

amorgun · 2018-02-13T08:08:27Z

nearpy/storage/storage_redis.py

+        """
+        with self.redis_object.pipeline() as pipeline:
+            for idx, v in enumerate(vs):
+                redis_key = self._format_redis_key(hash_name, bucket_keys[idx])


I suggest extracting an internal method _add_vector(self, hash_name, bucket_key, v, data, redis) and using it as

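    import itertools

    def store_vector(self, hash_name, bucket_key, v, data):
        # Single insert: write directly through the Redis client
        self._add_vector(hash_name, bucket_key, v, data, self.redis_object)

    def store_many_vectors(self, hash_name, bucket_keys, vs, data):
        # Batch insert: data is per vector (as in MemoryStorage), or None
        if data is None:
            data = itertools.repeat(data)
        with self.redis_object.pipeline() as pipeline:
            # Queue every write on the pipeline, then send them all at once
            for bucket_key, v, d in zip(bucket_keys, vs, data):
                self._add_vector(hash_name, bucket_key, v, d, pipeline)
            pipeline.execute()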

It should help reduce code duplication.
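A runnable sketch of the suggested refactoring. `FakeRedis` and `FakePipeline` are hypothetical in-memory stand-ins for a redis-py client and pipeline, and the `_add_vector` body (an `rpush` of a JSON string under an assumed key format) is an illustration only, not NearPy's actual storage format. The helper just needs an object with an `rpush()` method, so the same code writes either directly or into a pipeline.

```python
import itertools
import json


class FakeRedis:
    """Hypothetical stand-in for a Redis client: rpush() writes immediately."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)

    def pipeline(self):
        return FakePipeline(self)


class FakePipeline:
    """Hypothetical stand-in for a pipeline: rpush() is queued until execute()."""

    def __init__(self, client):
        self.client, self.queued = client, []

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.queued = []  # discard anything that was never executed

    def rpush(self, key, value):
        self.queued.append((key, value))

    def execute(self):
        for key, value in self.queued:
            self.client.rpush(key, value)
        self.queued = []


class RedisStorageSketch:
    def __init__(self, redis_object):
        self.redis_object = redis_object

    def _add_vector(self, hash_name, bucket_key, v, data, redis):
        # Assumed key format and JSON payload, for illustration only
        redis_key = "nearpy_%s_%s" % (hash_name, bucket_key)
        redis.rpush(redis_key, json.dumps({"vector": list(v), "data": data}))

    def store_vector(self, hash_name, bucket_key, v, data):
        self._add_vector(hash_name, bucket_key, v, data, self.redis_object)

    def store_many_vectors(self, hash_name, bucket_keys, vs, data):
        if data is None:
            data = itertools.repeat(data)
        with self.redis_object.pipeline() as pipeline:
            for bucket_key, v, d in zip(bucket_keys, vs, data):
                self._add_vector(hash_name, bucket_key, v, d, pipeline)
            pipeline.execute()


one = RedisStorageSketch(FakeRedis())
many = RedisStorageSketch(FakeRedis())
for k, v in zip(["a", "b"], [[1.0], [2.0]]):
    one.store_vector("h", k, v, None)
many.store_many_vectors("h", ["a", "b"], [[1.0], [2.0]], None)
print(one.redis_object.lists == many.redis_object.lists)  # True
```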

@amorgun @xieqihui Are you happy with the state of the pull request? I would merge it then.

Sorry for the long delay. I haven't added unit tests for this PR yet. Please give me one week and I will finish it up.

@xieqihui I was inactive for a long time here as well, no problem. We all have real jobs, right? :)

Take your time! Anything between 1 and 4 weeks would be nice, so that we move on a bit.

Thanks for responding so fast!

Cheers

@pixelogik Two of the issues I pointed out in comments are still here and I would like to see them fixed.

@amorgun Thanks for the feedback. It's in progress.

Thanks for this suggestion. I included this modification in commit 4e001a1

xieqihui · 2018-08-06T10:28:45Z

@pixelogik @amorgun I have addressed all your feedback and added test cases. However, the CI checks failed while setting up the environment for the Python 3.3 tests, which doesn't seem to be related to my code. Could you please take a look?

pixelogik · 2018-08-06T19:04:31Z

@xieqihui I have no time this week because of work. Do you have capacity @amorgun ?

amorgun · 2018-08-07T09:47:49Z

@pixelogik The request looks good to me. I fixed the issue with CI in #80

xieqihuiPG added 2 commits February 7, 2018 22:06

add Engine.store_many_vectors() for uploading to Redis using pipeline…

5cffee9

…. Significantly increase the speed

add store_many_vectors() to Storage() and MemoryStorage()

8c33cb5

xieqihui mentioned this pull request Feb 7, 2018

Please fix the Redis store_many_vectors_case #75

Open

amorgun reviewed Feb 7, 2018


improve store_many_vectors in MemoryStorage

9616a23

amorgun reviewed Feb 12, 2018


amorgun reviewed Feb 13, 2018


xieqihuiPG added 4 commits August 6, 2018 17:29

import future zip

8d97980

add _add_vector() in storage_redis.py

4e001a1

add test cases for store_many_vectors() for RedisStorageTest

936e9bb

import future zip in storage tests

efb6d13

pixelogik merged commit efb6d13 into pixelogik:master Oct 21, 2018
