ENH: add sparse op for int64 dtypes #13848
sinhrks added the Enhancement, Numeric, Sparse labels Jul 30, 2016
sinhrks added this to the 0.19.0 milestone Jul 30, 2016
sinhrks referenced this pull request Jul 30, 2016: Merged ENH: Sparse int64 and bool dtype support enhancement #13849
sinhrks added the Dtypes label Jul 30, 2016
codecov-io commented Jul 30, 2016
Current coverage is 85.28% (diff: 98.00%)
@@            master   #13848   diff @@
==========================================
  Files          139      139
  Lines        50020    50046    +26
  Methods          0        0
  Messages         0        0
  Branches         0        0
==========================================
+ Hits         42657    42682    +25
- Misses        7363     7364     +1
  Partials         0        0
|
|
Haven't had a chance to look through the code yet, but what are the rules around alignment and potentially recasting the dtype?

import numpy as np
import pandas as pd
s1 = pd.SparseSeries(np.arange(4), dtype=np.int64, fill_value=0)
s2 = pd.SparseSeries(np.arange(4), index=range(1, 5), dtype=np.int64, fill_value=0)
s1 + s1 # OK
s1 + s2 # error

Traceback (most recent call last):
  File "script.py", line 8, in <module>
    s1 + s2 # error
  File "/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/sparse/series.py", line 56, in wrapper
    return _sparse_series_op(self, other, op, name)
  File "/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/sparse/series.py", line 81, in _sparse_series_op
    series=True)
  File "/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/sparse/array.py", line 119, in _sparse_array_op
    sparse_op = getattr(splib, opname)
AttributeError: module 'pandas._sparse' has no attribute 'sparse_add_float64' |
|
@TomAugspurger The latter case appears to work on my branch; the error suggests that sparse.pyx was not recompiled properly. I'm adding more tests related to alignment :) |
|
My bad, just got to that section of the code. Recompiled and it does indeed work |
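For readers following the traceback: the failing line resolves a compiled kernel by name with `getattr(splib, opname)`, so a stale extension build that lacks the expected function surfaces as an `AttributeError`. The sketch below only illustrates that lookup pattern. The `sparse_<op>_<dtype>` naming is inferred from the error message, and `lookup_sparse_op`, `stale_build`, and `fresh_build` are hypothetical stand-ins, not pandas code.

```python
from types import SimpleNamespace


def lookup_sparse_op(module, name, dtype):
    """Resolve a kernel named 'sparse_<name>_<dtype>' on `module` (assumed naming)."""
    opname = "sparse_{name}_{dtype}".format(name=name, dtype=dtype)
    # getattr raises AttributeError when the module has no such attribute
    return getattr(module, opname)


# Hypothetical stand-ins for a stale and a freshly compiled extension module.
stale_build = SimpleNamespace(sparse_add=lambda x, y: x + y)
fresh_build = SimpleNamespace(
    sparse_add_float64=lambda x, y: x + y,
    sparse_add_int64=lambda x, y: x + y,
)

# The fresh build exposes the dtype-specific name, so the lookup succeeds.
op = lookup_sparse_op(fresh_build, "add", "int64")
result = op(2, 3)

# The stale build only has the old name, so the same lookup fails.
try:
    lookup_sparse_op(stale_build, "add", "float64")
    stale_error = None
except AttributeError as exc:
    stale_error = str(exc)
```

In this toy version the fix is the same as in the thread: rebuild so that the module actually exposes the names the Python layer asks for.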
jreback commented on an outdated diff Aug 1, 2016
| @@ -301,6 +301,29 @@ For ``MultiIndex``, values are dropped if any level is missing by default. Speci | ||
| ``Index.astype()`` now accepts an optional boolean argument ``copy``, which allows optional copying if the requirements on dtype are satisfied (:issue:`13209`) | ||
| +.. _whatsnew_0190.sparse: | ||
| + | ||
| +Sparse changes | ||
| +~~~~~~~~~~~~~~ | ||
| + | ||
| +These changes conform sparse data to support more dtypes, and for work to make a smoother experience with data handling. |
|
|
jreback commented on an outdated diff Aug 1, 2016
| @@ -301,6 +301,29 @@ For ``MultiIndex``, values are dropped if any level is missing by default. Speci | ||
| ``Index.astype()`` now accepts an optional boolean argument ``copy``, which allows optional copying if the requirements on dtype are satisfied (:issue:`13209`) | ||
| +.. _whatsnew_0190.sparse: | ||
| + | ||
| +Sparse changes | ||
| +~~~~~~~~~~~~~~ | ||
| + | ||
| +These changes conform sparse data to support more dtypes, and for work to make a smoother experience with data handling. | ||
| + | ||
| +- Sparse data structure now can preserve ``dtype`` after arithmetic op (:issue:`13848`) | ||
| + |
|
|
jreback commented on the diff Aug 1, 2016
| @@ -420,7 +459,12 @@ def astype(self, dtype=None): | ||
| dtype = np.dtype(dtype) | ||
| if dtype is not None and dtype not in (np.float_, float): | ||
| raise TypeError('Can only support floating point data for now') | ||
| - return self.copy() | ||
| + | ||
| + if self.dtype == dtype: | ||
| + return self.copy() | ||
| + else: | ||
| + return self._simple_new(self.sp_values.astype(dtype), | ||
| + self.sp_index, float(self.fill_value)) |
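The `astype` hunk above can be read as: reject non-float targets, copy when the dtype already matches, otherwise cast only the stored values and keep the sparse index. A minimal sketch of that logic follows, assuming NumPy is available. `TinySparse` is a hypothetical stand-in for the real sparse array class and `copy()` here is a simplification; only the attribute names `sp_values`, `sp_index`, and `fill_value` come from the diff.

```python
import numpy as np


class TinySparse:
    """Hypothetical container: stored values, their positions, one fill value."""

    def __init__(self, sp_values, sp_index, fill_value):
        self.sp_values = np.asarray(sp_values)
        self.sp_index = np.asarray(sp_index)
        self.fill_value = fill_value

    @property
    def dtype(self):
        return self.sp_values.dtype

    def copy(self):
        return TinySparse(self.sp_values.copy(), self.sp_index.copy(),
                          self.fill_value)

    def astype(self, dtype):
        dtype = np.dtype(dtype)
        # Mirror the guard in the diff: only float targets are accepted.
        if dtype != np.dtype(np.float64):
            raise TypeError('Can only support floating point data for now')
        if self.dtype == dtype:
            return self.copy()
        # Cast the stored values only; the positions are reused unchanged.
        return TinySparse(self.sp_values.astype(dtype), self.sp_index,
                          float(self.fill_value))


ints = TinySparse(np.array([5, 7], dtype=np.int64), [0, 3], 0)
floats = ints.astype(np.float64)   # int64 -> float64 casts sp_values
same = floats.astype(float)        # already float64, so this is a copy
```

Casting `fill_value` with `float(...)` is safe in this sketch only because the guard restricts targets to floating point, which matches the restriction visible in the hunk.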
jreback
Contributor
|
|
rebase in light of changes #13787 |
|
thanks! nice cleanup |
jreback closed this in 45d54d0 Aug 3, 2016
sinhrks deleted the sinhrks:sparse_op2 branch Aug 3, 2016
|
FYI: 8ec7406 as we no longer depend on generated; was causing recompilation of algos.pyx every time :< |
|
small dtype adj needed on windows
|
|
Thx, will fix. |
sinhrks commented Jul 30, 2016
git diff upstream/master | flake8 --diff
As a first step for #667, numeric op can now preserve int64 dtype. On current master, dtype is reset to float64 after op.
NOTE: int64 SparseSeries.__floordiv__ test is skipped because dense Series also has inconsistency in nan / inf handling (#13843). Currently it outputs the same result as float64.
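The dtype point in the description can be illustrated outside pandas with plain NumPy arrays. This is a rough sketch under assumptions, not the PR's implementation: `sparse_add_same_index` is a hypothetical helper that handles only the simplest case, two sparse vectors storing values at identical positions, by applying the op to the stored values and to the fill values separately.

```python
import numpy as np


def sparse_add_same_index(values_a, fill_a, values_b, fill_b):
    """Add two sparse vectors whose stored values sit at identical positions."""
    values = values_a + values_b   # op on the stored values
    fill = fill_a + fill_b         # op on the fill values
    return values, fill


a = np.array([1, 2, 3], dtype=np.int64)
b = np.array([10, 20, 30], dtype=np.int64)

# Operating on the int64 values directly keeps the int64 dtype.
kept_values, kept_fill = sparse_add_same_index(a, np.int64(0), b, np.int64(0))

# For contrast, upcasting the inputs to float64 first yields a float64 result.
float_values, float_fill = sparse_add_same_index(
    a.astype(np.float64), 0.0, b.astype(np.float64), 0.0)
```

Floor division is deliberately left out of the sketch, since the note above says its nan / inf handling was still inconsistent at the time.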