Fix sparse input data (missing attributes). #2




jaley commented Sep 27, 2012

When adding data using make-instance, only the first n attributes appear to exist, where n is the size of the input data vector. This means input data can't be sparse, even though the code that follows appears to support it by explicitly labeling the values as pairs.

It seems like the right thing to do is to get the number of attributes from the dataset instead. That's what this patch does. It might also be nice to support SparseInstance instead of only Instance, but I couldn't figure out a nice way to do this!
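
To make the idea concrete, here is a minimal sketch in plain Java, not the actual clj-ml or Weka code. The class `DenseRowSketch` and the method `toDenseRow` are hypothetical names, and `Double.NaN` is used as an assumed stand-in for a missing value. The point is only that the row length comes from the dataset's attribute count rather than from the number of values supplied:

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class DenseRowSketch {

    // Builds a row with one slot per dataset attribute.
    // Slots not mentioned in valuesByIndex stay "missing" (NaN here, by assumption).
    static double[] toDenseRow(int numAttributes, Map<Integer, Double> valuesByIndex) {
        double[] row = new double[numAttributes];
        Arrays.fill(row, Double.NaN);
        for (Map.Entry<Integer, Double> entry : valuesByIndex.entrySet()) {
            int index = entry.getKey();
            if (index < 0 || index >= numAttributes) {
                throw new IllegalArgumentException("Attribute index out of range: " + index);
            }
            row[index] = entry.getValue();
        }
        return row;
    }
}
```

With five attributes in the dataset and only two (index, value) pairs supplied, the resulting row still has five slots, three of them missing.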

The type hint should probably be weka.core.Instances since that is where .numAttributes originates from and it would allow this fn to be used with plain Instances objects as well.

Ah, I just noticed that the type hint was on the original code as well and you just copied it over. It would be better to use Instances but I'll just merge this in since it won't hurt anything. Thanks!

bmabey commented Sep 27, 2012

I have never used SparseInstance, but it would be a nice thing to support, especially considering how memory-inefficient Weka is at storing things. It seems that to support creating a SparseInstance, the creating function would need to take a map instead of a vector; we could then use the dataset info to get the indices for use with this constructor: SparseInstance(double weight, double[] attValues, int[] indices, int maxNumValues).
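
A rough sketch of that map-to-indices step, in plain Java without any Weka dependency. `SparseLayoutSketch` and `fromMap` are hypothetical names, and the list of attribute names stands in for whatever the dataset would report. The two arrays it produces are shaped like the `attValues` and `indices` arguments of the constructor quoted above; sorting the indices in ascending order is a precaution taken here, not something confirmed in this thread:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
final class SparseLayoutSketch {
    final int[] indices;    // attribute positions that have a value, ascending
    final double[] values;  // values parallel to indices

    private SparseLayoutSketch(int[] indices, double[] values) {
        this.indices = indices;
        this.values = values;
    }

    // Resolves each attribute name to its position in the dataset's attribute list.
    static SparseLayoutSketch fromMap(List<String> attributeNames,
                                      Map<String, Double> valuesByName) {
        TreeMap<Integer, Double> byIndex = new TreeMap<>();
        for (Map.Entry<String, Double> entry : valuesByName.entrySet()) {
            int index = attributeNames.indexOf(entry.getKey());
            if (index < 0) {
                throw new IllegalArgumentException("Unknown attribute: " + entry.getKey());
            }
            byIndex.put(index, entry.getValue());
        }
        int[] indices = new int[byIndex.size()];
        double[] values = new double[byIndex.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> entry : byIndex.entrySet()) {
            indices[i] = entry.getKey();
            values[i] = entry.getValue();
            i++;
        }
        return new SparseLayoutSketch(indices, values);
    }
}
```

Keying the input by attribute name rather than by position also means a value cannot silently land on the wrong attribute; an unknown name fails fast instead.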

bmabey merged commit 9ad6014 into leadtune:master Sep 27, 2012

bmabey commented Sep 27, 2012

BTW, with this patch, how can you be sure that the attribute values passed in will correspond to the correct attributes?
