This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

Large data sets cannot be used. #7

Closed
younghj opened this issue Oct 7, 2016 · 2 comments

younghj commented Oct 7, 2016

When I run the script with facebookresearch's fastText, it cannot accommodate the data, although the sample script seems to work well.
The failure is on this line of the example script:
cp = snn.ClusterIndex(feat, data_to_return)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-61d2fb78a7b1> in <module>()
----> 1 cp = snn.ClusterIndex(feat, data_to_return)

/home/jung4351/pysparnn/pysparnn/cluster_pruning.py in __init__(self, sparse_features, records_data, distance_type, matrix_size, parent)
    144             records_index = np.arange(sparse_features.shape[0])
    145             clusters_size = min(self.matrix_size, num_records)
--> 146             clusters_selection = random.sample(records_index, clusters_size)
    147             clusters_selection = sparse_features[clusters_selection]
    148

/usr/lib/python3.4/random.py in sample(self, population, k)
    309             population = tuple(population)
    310         if not isinstance(population, _Sequence):
--> 311             raise TypeError("Population must be a sequence or set.  For dicts, use list(d).")
    312         randbelow = self._randbelow
    313         n = len(population)

TypeError: Population must be a sequence or set.  For dicts, use list(d).

@spencebeecher (Contributor)

Hi @younghj! Could you please provide a little more information? Which data file are you referencing, and how are you constructing the 'feat' variable?

Feel free to attach a notebook file!

@spencebeecher (Contributor)

I am marking this as invalid until you supply more info. Thanks!

On closer inspection, it appears you might be passing a dictionary rather than a list as the data_to_return variable.
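
The traceback also allows a second reading, offered here as an assumption rather than a confirmed diagnosis. The value handed to random.sample on line 146 is records_index, which line 144 builds with np.arange, not data_to_return. Under Python 3, random.sample rejects any population that is not a sequence, and a NumPy array is not registered as one. The sketch below reproduces that behaviour in isolation and shows one possible workaround. The helper name pick_cluster_rows and its seed parameter are hypothetical and are not part of pysparnn.

```python
import random
from collections.abc import Sequence

import numpy as np


def pick_cluster_rows(num_records, clusters_size, seed=None):
    """Hypothetical helper: pick distinct row indices without replacement.

    Not pysparnn code. It samples from range(), which is a real Sequence,
    so random.sample accepts it on Python 3.
    """
    rng = random.Random(seed)
    return rng.sample(range(num_records), min(clusters_size, num_records))


records_index = np.arange(10)

# A NumPy array is not registered as collections.abc.Sequence.
print(isinstance(records_index, Sequence))  # prints False

# Passing the array straight to random.sample fails on Python 3.
try:
    random.sample(records_index, 3)
except TypeError as exc:
    print("TypeError:", exc)

# Converting the array to a list first is accepted.
picked_from_list = random.sample(records_index.tolist(), 3)

# The helper avoids building the array at all.
rows = pick_cluster_rows(num_records=10, clusters_size=3, seed=0)
print(sorted(rows))
```

If this reading is right, the error would not depend on the size of the data set or on the type of data_to_return. np.random.choice(num_records, size=clusters_size, replace=False) would be another way to draw distinct indices while staying in NumPy.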
