When adding data using make-instance, only the first n attributes appear to exist, where n is the size of the input data vector. This means input data can't be sparse, although the code that follows appears to support it by explicitly labeling the values as pairs.
It seems like the right thing to do is to get the number of attributes from the dataset instead, which is what this patch does. It might also be nice to support SparseInstance instead of only Instance, but I couldn't figure out a nice way to do this.
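To illustrate the sizing idea without pulling in Weka, here is a minimal sketch in plain Java. The class DensePadding and the method padToDataset are hypothetical names, not part of Weka or this library, and the numAttributes argument stands in for what the dataset's numAttributes() would return. Unset slots are filled with NaN, which is how Weka encodes a missing value.

```java
import java.util.Arrays;

public class DensePadding {
    // Hypothetical helper: size the value array by the dataset's attribute
    // count rather than by the length of the input vector.
    public static double[] padToDataset(double[] input, int numAttributes) {
        if (input.length > numAttributes) {
            throw new IllegalArgumentException("More values than attributes");
        }
        double[] values = new double[numAttributes];
        // Weka represents a missing value as NaN, so unset slots read as missing.
        Arrays.fill(values, Double.NaN);
        System.arraycopy(input, 0, values, 0, input.length);
        return values;
    }

    public static void main(String[] args) {
        // Two input values against a dataset with four attributes.
        double[] padded = padToDataset(new double[] {1.0, 2.0}, 4);
        System.out.println(Arrays.toString(padded)); // prints [1.0, 2.0, NaN, NaN]
    }
}
```

Note that this still assigns values by position, so it does not by itself let a caller set, say, only the third attribute.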
Create new instances to have as many attributes as the dataset, not the (potentially sparse) input vectors.
The type hint should probably be weka.core.Instances since that is where .numAttributes originates from and it would allow this fn to be used with plain Instances objects as well.
Ah, I just noticed that the type hint was on the original code as well and you just copied it over. It would be better to use Instances but I'll just merge this in since it won't hurt anything. Thanks!
I have never used SparseInstance, but it would be a nice thing to support, especially considering how memory-inefficient weka is at storing things. It seems that in order to support creating a SparseInstance, the creating function would need to take a map instead of a vector, and then we could use the dataset info to get the indices for use with this constructor: SparseInstance(double weight, double[] attValues, int[] indices, int maxNumValues).
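A rough sketch of the map-to-arrays step, in plain Java so it runs without Weka on the classpath. SparseArgs and fromMap are hypothetical names, and the attributeNames list is only a stand-in for looking up attribute indices in the dataset header. The entries are emitted in ascending index order, which I believe is what the SparseInstance constructor expects, though that is worth checking against the Weka version in use.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SparseArgs {
    public final double[] values;
    public final int[] indices;

    SparseArgs(double[] values, int[] indices) {
        this.values = values;
        this.indices = indices;
    }

    // Hypothetical helper: attributeNames stands in for the dataset header.
    public static SparseArgs fromMap(List<String> attributeNames, Map<String, Double> data) {
        // TreeMap keeps the entries ordered by attribute index.
        TreeMap<Integer, Double> byIndex = new TreeMap<>();
        for (Map.Entry<String, Double> e : data.entrySet()) {
            int idx = attributeNames.indexOf(e.getKey());
            if (idx < 0) {
                throw new IllegalArgumentException("Unknown attribute: " + e.getKey());
            }
            byIndex.put(idx, e.getValue());
        }
        double[] values = new double[byIndex.size()];
        int[] indices = new int[byIndex.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> e : byIndex.entrySet()) {
            indices[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
        return new SparseArgs(values, indices);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c", "d", "e");
        Map<String, Double> data = new HashMap<>();
        data.put("d", 4.0);
        data.put("b", 2.0);
        SparseArgs s = fromMap(names, data);
        System.out.println(Arrays.toString(s.indices)); // prints [1, 3]
        System.out.println(Arrays.toString(s.values));  // prints [2.0, 4.0]
        // With Weka available, these would feed the constructor quoted above:
        // new SparseInstance(1.0, s.values, s.indices, names.size());
    }
}
```

Keying the input by attribute name also addresses the ordering concern, since each value is matched to its attribute explicitly instead of by position.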
BTW, with this patch, how can you be sure that the attribute values passed in will correspond to the correct attributes?