SparseArray compatibility issues with e.g. sklearn: Series.shape and SparseArray.shape outputs are not consistent #21126
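```python
import pandas as pd
import numpy as np

# Random 5x5 matrix of 0.0/1.0 values stored as a sparse frame.
# DataFrame.to_sparse belongs to the pandas version this report targets;
# it was removed in pandas 1.0.
spdf = pd.DataFrame(np.random.rand(5, 5) > 0.7).astype(float).to_sparse(fill_value=0)

# Compare the frame's shape with the shape of its underlying values
spdf.shape == spdf.values.shape
```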
Packages that internally convert pandas DataFrames to arrays (such as sklearn) require vectors and matrices to have specific shapes. Using sparse data in pandas produces unpredictable results, because sklearn internally converts the pandas objects to SparseArray arrays and checks their sizes. The size of
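As a rough illustration of the size check described above, here is a minimal sketch assuming pandas >= 1.0, where `DataFrame.to_sparse` has been replaced by `astype(pd.SparseDtype(...))`. `shapes_consistent` is a hypothetical helper (not part of pandas or sklearn). It compares pandas shapes with the shapes obtained through `np.asarray`, which is similar to the conversion that array-consuming libraries perform. It does not reproduce the behavior of the older pandas version this report is about.

```python
import numpy as np
import pandas as pd

# Assumption: pandas >= 1.0, so sparse columns are built with SparseDtype
# instead of the removed DataFrame.to_sparse.
rng = np.random.default_rng(0)
dense = pd.DataFrame((rng.random((5, 5)) > 0.7).astype(float))
spdf = dense.astype(pd.SparseDtype(float, fill_value=0.0))


def shapes_consistent(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every pandas shape matches the shape of
    the corresponding array obtained via np.asarray / the backing array."""
    # Whole frame vs. the dense 2-D ndarray a consumer would get
    if df.shape != np.asarray(df).shape:
        return False
    for col in df.columns:
        series = df[col]
        # .array is the backing ExtensionArray (a SparseArray here)
        if series.shape != series.array.shape:
            return False
        # Series vs. the dense 1-D ndarray a consumer would get
        if series.shape != np.asarray(series).shape:
            return False
    return True


print(spdf.dtypes.iloc[0])
print(shapes_consistent(spdf))
```

Going through `np.asarray` rather than `.values` keeps the check close to what a consumer receives, since `.values` on a sparse column returns the SparseArray itself rather than a plain ndarray.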
Also, depending on the order of column selection (before or after accessing `.values`), the results differ, as the output of
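To illustrate the two orders mentioned above (select a column, then convert; or convert the whole frame, then select), here is a minimal sketch, again assuming pandas >= 1.0 and the `SparseDtype` API. `selection_order_agrees` is a hypothetical helper, not part of pandas. It checks that both orders yield arrays with the same shape and values; it does not reproduce the older behavior reported here.

```python
import numpy as np
import pandas as pd

# Assumption: pandas >= 1.0 (SparseDtype replaces DataFrame.to_sparse)
rng = np.random.default_rng(1)
spdf = pd.DataFrame((rng.random((5, 5)) > 0.7).astype(float)).astype(
    pd.SparseDtype(float, fill_value=0.0)
)


def selection_order_agrees(df: pd.DataFrame) -> bool:
    """Hypothetical helper: compare 'select column, then convert' with
    'convert whole frame, then select column' for every column."""
    whole = np.asarray(df)  # convert first: dense 2-D ndarray
    for i, col in enumerate(df.columns):
        select_first = np.asarray(df[col])  # select first, then convert
        convert_first = whole[:, i]  # column i of the converted frame
        if select_first.shape != convert_first.shape:
            return False
        if not np.array_equal(select_first, convert_first):
            return False
    return True


print(selection_order_agrees(spdf))
```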
I would expect the last statement in the code above to return True for all vectors in spdf.