
Python3/Anaconda compatibility #12

Closed
huu4ontocord opened this issue Mar 12, 2017 · 7 comments

Comments

@huu4ontocord

I got it working for Anaconda3 by doing the following:

In cluster_pruning.py:

123c123
< records_index = np.arange(features.shape[0])
---
> records_index = list(np.arange(features.shape[0]))

131c131
< np.arange(clusters_selection.shape[0]))
---
> list(np.arange(clusters_selection.shape[0])))

223c223
< if feature <> None and record <> None:
---
> if feature != None and record != None:

273a274
> elements = list(elements)

In matrix_distance.py:

123c123,124
< arg_index = np.random.choice(len(scores), k, replace=False)
---
> lenScores = len(scores)
> arg_index = np.random.choice(lenScores, min(lenScores, k), replace=False)
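
(As an aside, not part of the diff: the min() guard matters because np.random.choice with replace=False raises a ValueError whenever it is asked for more samples than the population contains. A minimal, made-up illustration:)

import numpy as np

scores = np.array([0.9, 0.7, 0.4])   # hypothetical candidate scores
k = 5                                # caller asks for more results than exist

# The unguarded call fails:
#   np.random.choice(len(scores), k, replace=False)
#   ValueError: Cannot take a larger sample than population when 'replace=False'

# The guarded call from the patch above caps the sample size:
lenScores = len(scores)
arg_index = np.random.choice(lenScores, min(lenScores, k), replace=False)
print(arg_index)                     # at most len(scores) distinct indices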

329a331

In __init__.py:

7c7
< from cluster_pruning import ClusterIndex, MultiClusterIndex
---
> from .cluster_pruning import ClusterIndex, MultiClusterIndex

I think you should just create two more files for ClusterIndex and MultiClusterIndex. Otherwise it will cause issues with importing in Python 3 and with backwards compatibility for Python 2.
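
For what it's worth, here is a minimal sketch of an __init__.py that should import cleanly under both Python 2 and Python 3 (the try/except fallback is only one possible approach, and it assumes cluster_pruning.py sits inside the pysparnn package):

from __future__ import absolute_import

try:
    # Explicit relative import: required by Python 3, also valid inside a Python 2 package.
    from .cluster_pruning import ClusterIndex, MultiClusterIndex
except (ImportError, ValueError):
    # Fallback for code that still relies on the old implicit import under Python 2.
    from cluster_pruning import ClusterIndex, MultiClusterIndex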

@spencebeecher
Contributor

Wow, thanks! I'll take a look.

Do you have any intuition for why this change needs to happen?
records_index = np.arange(features.shape[0])
to
records_index = list(np.arange(features.shape[0]))

The other changes you suggest should be compatible. And this line
if feature != None and record != None:
should be something like
if (not feature is None) and (not record is None):
Thanks again!!!
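
As an aside on the is None suggestion: if feature or record can ever be a numpy array, != None triggers an elementwise comparison, and putting the resulting array in an if statement fails, while the identity check does not. A made-up snippet, not code from this repo:

import numpy as np

feature = np.array([1.0, 2.0, 3.0])

# if feature != None: ...
# On recent numpy this builds an elementwise boolean array, and "if <array>"
# raises: ValueError: The truth value of an array ... is ambiguous.

if feature is not None:   # identity check: unambiguous no matter what type feature is
    print("feature is set")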

@huu4ontocord
Author

np.arange produces a numpy array rather than a plain Python list (much like range in Python 3 no longer gives you a list), so you need to wrap a list() call around it wherever the code expects list behaviour.

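A small standalone sketch of the difference (the concrete failure mode depends on what the calling code does with records_index; concatenation versus broadcasting is just one example):

import numpy as np

features = np.zeros((5, 3))                      # hypothetical feature matrix

as_array = np.arange(features.shape[0])          # ndarray of row indices
as_list = list(np.arange(features.shape[0]))     # plain Python list of row indices

print(as_list + [99])    # list concatenation: appends 99 to the indices
print(as_array + [99])   # NOT concatenation: broadcasting adds 99 to every index
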
Btw, check out https://github.com/known-ai/KeyedVectorsANN

I folded your code into Gensim's KeyedVectors.

It was easier to fold all the code into one file, but I can refactor to use the pysparnn package once it is compatible with Python 3. I made some changes to add a new method, most_similar, and to store indexes as the records_data instead of the actual words. This saves some space.
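
Roughly, the space-saving idea looks like this. This is only a sketch: the ClusterIndex constructor and search signatures are assumed from this repo, and the vocabulary and vectors are made up.

import numpy as np
from pysparnn.cluster_pruning import ClusterIndex

words = ['apple', 'banana', 'cherry']       # hypothetical vocabulary
vectors = np.random.rand(len(words), 50)    # hypothetical word vectors

# Store row indices as records_data instead of the word strings themselves.
index = ClusterIndex(vectors, list(range(len(words))))

# Map the returned indices back to words only at query time.
results = index.search(vectors[:1], k=2, return_distance=False)
top_words = [words[int(i)] for i in results[0]]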

My model is 260 MB, and I'd like to find out how to reduce this size. I suspect it's mostly duplicates of the matrices.

Feel free to email me directly at ontocord@gmail.com

@spencebeecher
Contributor

Thanks @known-ai ! I made the requested changes in this diff - 1f976fa

@spencebeecher
Contributor

spencebeecher commented Mar 18, 2017

I'll send you an email.

@spencebeecher
Contributor

I am not sure that there is much extra that is kept around in memory.

Check this modification to DenseMatrix, dense_matrix-Copy1.pdf, which also includes a study of data sizes.

  • The input features matrix is about the size of the ClusterIndex data structure.
  • You can reduce the memory footprint by 4x (so long as your data fits well into an int16) - see the DenseIntCosineDistance class and the rough sketch below.
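
A back-of-the-envelope sketch of where the 4x comes from (float64 stores 8 bytes per value, int16 stores 2), assuming the int16 features are produced by uniformly scaling the float matrix:

import numpy as np

features = np.random.rand(10000, 300)                     # hypothetical float64 feature matrix

scale = np.iinfo(np.int16).max / np.abs(features).max()   # fit the values into the int16 range
features_i16 = np.round(features * scale).astype(np.int16)

print(features.nbytes / features_i16.nbytes)              # 4.0, i.e. a 4x smaller matrix

Since cosine distance is unchanged by a uniform positive scaling, the quantization mostly costs a little rounding precision rather than changing which neighbours come back.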

@huu4ontocord
Author

huu4ontocord commented Mar 19, 2017 via email

@spencebeecher
Contributor

^ Very cool. I think there is probably a 'better' (for some definition of better) way to pick the clusters other than picking at random. I am going to leave this open, but I'll close it in two weeks if the thread dies down.
