[MRG] EHN handling sparse matrices whenever possible #316
What does this implement/fix? Explain your changes.
Enhancement to handle sparse matrices (as well as pandas DataFrames, lists, etc.).
ADASYN, SMOTE, and ClusterCentroids cannot be supported straightforwardly. I need to give them more thought.
Any other comments?
@@            Coverage Diff             @@
##           master     #316      +/-   ##
==========================================
- Coverage   98.19%   97.96%   -0.23%
==========================================
  Files          66       66
  Lines        3978     3924      -54
==========================================
- Hits         3906     3844      -62
- Misses         72       80       +8
It is a bit tricky for the
For SMOTE, we can effectively compute the sparse matrix, but I am not sure it makes sense to have a lot of zeros there. It would make sense inside a categorical SMOTE, I think.
For the review: I did not modify the existing tests. Instead, I added a common test which checks that sampling from dense input gives the same result as sampling from sparse input.
Regarding ClusterCentroids, I think a PR that uses nearest-neighbors voting instead of generating the centroids would be good.
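To make the idea of that common test concrete, here is a minimal sketch of a dense-versus-sparse consistency check. It is not the test added in this PR: `toy_under_sample` and `check_sparse_matches_dense` are hypothetical names, and the toy sampler is only a stand-in that relies on row indexing, which works the same way on NumPy arrays and SciPy CSR matrices.

```python
import numpy as np
from scipy import sparse


def toy_under_sample(X, y, random_state=0):
    """Hypothetical stand-in sampler (not an imbalanced-learn class).

    Keeps, for every class, as many rows as the minority class has.
    Only row indexing is used, so dense and CSR inputs both work.
    """
    rng = np.random.RandomState(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    idx.sort()
    return X[idx], y[idx]


def check_sparse_matches_dense(sampler, X, y):
    """Assert that sampling a CSR copy of X matches sampling dense X."""
    X_sparse = sparse.csr_matrix(X)
    X_res_dense, y_res_dense = sampler(X, y)
    X_res_sparse, y_res_sparse = sampler(X_sparse, y)
    # The output should stay sparse when the input was sparse.
    assert sparse.issparse(X_res_sparse)
    np.testing.assert_allclose(X_res_sparse.toarray(), X_res_dense)
    np.testing.assert_array_equal(y_res_sparse, y_res_dense)
    return True


# Small imbalanced data set with many zeros (assumed data, for illustration).
rng = np.random.RandomState(42)
X = rng.rand(20, 5)
X[X < 0.7] = 0.0
y = np.array([0] * 14 + [1] * 6)
result = check_sparse_matches_dense(toy_under_sample, X, y)
```

The same random seed is used for both calls, so any difference between the two results would come from how the input format is handled rather than from the random draw.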
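One possible reading of that suggestion, sketched under assumptions: rather than returning the generated centroids (which are dense in general), return the original sample closest to each centroid, so a sparse input stays sparse. The function name `nearest_samples_to_centroids` is hypothetical, and the centroids are assumed to have been computed elsewhere, for example by k-means.

```python
import numpy as np
from scipy import sparse


def nearest_samples_to_centroids(X, centroids):
    """Return sorted unique row indices of X closest to each centroid.

    Hypothetical helper: `centroids` is a dense (n_centroids, n_features)
    array assumed to come from some clustering step.
    """
    # Squared Euclidean distance: ||x||^2 - 2 x.c + ||c||^2
    if sparse.issparse(X):
        x_sq = np.asarray(X.multiply(X).sum(axis=1)).ravel()
        cross = np.asarray(X @ centroids.T)
    else:
        x_sq = (X ** 2).sum(axis=1)
        cross = X @ centroids.T
    c_sq = (centroids ** 2).sum(axis=1)
    d2 = x_sq[:, None] - 2.0 * cross + c_sq[None, :]
    # For each centroid (column), pick the row with the smallest distance.
    return np.unique(d2.argmin(axis=0))


X_dense = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.4], [10.0, 10.6]])

idx_dense = nearest_samples_to_centroids(X_dense, centroids)
X_csr = sparse.csr_matrix(X_dense)
idx_sparse = nearest_samples_to_centroids(X_csr, centroids)

# Selecting rows keeps the sparse format of the input.
X_selected = X_csr[idx_sparse]
```

Because the output is a subset of the original rows, this variant never materialises new dense points, which is what makes it compatible with sparse input.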
I can imagine a parameter
@chkoar WDYT? I know that you have been thinking about this for a while.