API: integer Extension Array #20700

Closed
3 tasks
jreback opened this issue Apr 15, 2018 · 4 comments · Fixed by #21160
Labels
Dtype Conversions (Unexpected or buggy dtype conversions); ExtensionArray (Extending pandas with custom dtypes or arrays); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Numeric Operations (Arithmetic, Comparison, and Logical operations)
Milestone

Comments

@jreback
Contributor

jreback commented Apr 15, 2018

xref #8640

I could easily imagine an ExtensionArray implemented as a numpy array of the appropriate dtype plus a bitmask, in order to fully support integer NA across the board. I don't think this would be too hard. As a bonus, it would be zero-copy compatible with the pyarrow implementation (for the future).

Making these the actual default (e.g. when integers are inferred with or without nulls) might be non-trivial, but let's implement them first. These would give rise to a hierarchy of dtypes, e.g. IntegerDtype, Int8Dtype.
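
To make the "numpy array plus mask" idea concrete, here is a minimal sketch. It is not the code from any pandas branch: the class name `MaskedIntArray` and its methods are hypothetical, and it uses a one-byte-per-element boolean mask rather than a packed bitmask, purely to keep the example short.

```python
import numpy as np


class MaskedIntArray:
    """Toy integer array with missing values: integer data plus a boolean mask.

    Hypothetical illustration only; not a pandas class.
    """

    def __init__(self, values, dtype=np.int8):
        # True in the mask marks a missing entry (None or NaN in the input).
        self.mask = np.array([v is None or v != v for v in values], dtype=bool)
        # Missing slots get a placeholder (0 here); only the mask marks them as NA.
        filled = [0 if m else v for v, m in zip(values, self.mask)]
        self.data = np.array(filled, dtype=dtype)

    def __len__(self):
        return len(self.data)

    def to_list(self):
        # Present missing entries as None, everything else as a Python int.
        return [None if m else int(v) for v, m in zip(self.data, self.mask)]


arr = MaskedIntArray([1, 2, 3, float("nan")])
print(arr.data.dtype)  # int8
print(arr.to_list())   # [1, 2, 3, None]
```

The point of the design is that the data buffer keeps its integer dtype even when values are missing, because the missing-ness lives in a separate buffer.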

@jreback jreback added the Missing-data, Dtype Conversions, Numeric Operations, Difficulty Advanced, and ExtensionArray labels Apr 15, 2018
@jreback jreback added this to the 0.24.0 milestone Apr 15, 2018
@jreback
Contributor Author

jreback commented Apr 15, 2018

jreback added a commit to jreback/pandas that referenced this issue May 13, 2018
@jreback
Contributor Author

jreback commented May 13, 2018

Here is a fully functional (extension-wise) integer NA: https://github.com/jreback/pandas/tree/intna
It doesn't break anything and coexists with the existing dtypes.

I have enabled inference to accept the new types with a Registry, e.g.

In [1]: pd.Series([1,2,3, np.nan], dtype='Int8')
Out[1]: 
0      1
1      2
2      3
3    NaN
dtype: Int8

so construction is pretty flexible now.
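
As a rough illustration of how a registry can map a string such as 'Int8' to a dtype, here is a small sketch. The `Registry` class, its `register` and `find` methods, and the `name` and `numpy_dtype` attributes are all assumptions made for this example; only the names IntegerDtype, Int8Dtype, and 'Int8' come from the thread.

```python
import numpy as np


class IntegerDtype:
    """Hypothetical base class for nullable integer dtypes."""

    name = None
    numpy_dtype = None


class Int8Dtype(IntegerDtype):
    name = "Int8"
    numpy_dtype = np.dtype("int8")


class Int64Dtype(IntegerDtype):
    name = "Int64"
    numpy_dtype = np.dtype("int64")


class Registry:
    """Hypothetical lookup table from dtype strings to dtype classes."""

    def __init__(self):
        self._dtypes = {}

    def register(self, dtype_cls):
        # Key each dtype class by its string name.
        self._dtypes[dtype_cls.name] = dtype_cls

    def find(self, name):
        # Return a dtype instance for a registered name, or None if unknown.
        cls = self._dtypes.get(name)
        return cls() if cls is not None else None


registry = Registry()
registry.register(Int8Dtype)
registry.register(Int64Dtype)

found = registry.find("Int8")
print(type(found).__name__, found.numpy_dtype)  # Int8Dtype int8
print(registry.find("int8"))                    # None
```

In this sketch the capitalised 'Int8' and numpy's lowercase 'int8' are distinct keys, so the new types can sit alongside the existing numpy-backed ones.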

next up is ops

cc @TomAugspurger @jorisvandenbossche

@jorisvandenbossche
Member

Cool!

Is your intention to do a PR to add this to pandas, or to have it as a separate package for now?

jreback added a commit to jreback/pandas that referenced this issue May 14, 2018
@jreback
Contributor Author

jreback commented May 14, 2018

It still needs quite a bit more tests and work (arithmetic ops are done, but comparison ops and more indexing tests are still needed).

But I think directly in pandas. Note that this does not actually switch the base inference (e.g. [1, 2, 3] still resolves to int64); we can do that at a later point. I suspect we will have to change quite a lot of tests, as we assume float conversions in a myriad of ways.
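
For context on the float conversions mentioned above, the sketch below shows the underlying numpy behaviour (a single NaN turns an integer array into a float one) next to one way a masked design could keep integer results under addition. The helper `masked_add` is hypothetical and is not taken from the branch.

```python
import numpy as np

# Plain integers keep an integer dtype...
ints = np.array([1, 2, 3])
# ...but a single NaN forces the whole array to float.
with_nan = np.array([1, 2, np.nan])
print(ints.dtype.kind, with_nan.dtype.kind)  # i f


def masked_add(a_data, a_mask, b_data, b_mask):
    """Hypothetical helper: add two (data, mask) integer arrays."""
    # Values are added as integers; a result slot is missing if either input was.
    return a_data + b_data, a_mask | b_mask


data, mask = masked_add(
    np.array([1, 2, 3], dtype="int64"), np.array([False, False, True]),
    np.array([10, 20, 30], dtype="int64"), np.array([False, True, False]),
)
print(data.dtype, mask.tolist())  # int64 [False, True, True]
```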

jreback added a commit to jreback/pandas that referenced this issue May 21, 2018
jreback added a commit to jreback/pandas that referenced this issue May 23, 2018
jreback added a commit to jreback/pandas that referenced this issue May 24, 2018
jreback added a commit to jreback/pandas that referenced this issue May 24, 2018
jreback added a commit to jreback/pandas that referenced this issue May 24, 2018
jreback added a commit to jreback/pandas that referenced this issue May 24, 2018
jreback added a commit to jreback/pandas that referenced this issue May 25, 2018
jreback added a commit to jreback/pandas that referenced this issue May 25, 2018
jreback added a commit to jreback/pandas that referenced this issue May 29, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 4, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 4, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 5, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 7, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 7, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 7, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 8, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 8, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 8, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 10, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 11, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 12, 2018
jreback added a commit to jreback/pandas that referenced this issue Jul 16, 2018
jreback added a commit that referenced this issue Jul 20, 2018
* ENH: add integer-na support via an ExtensionArray

closes #20700
closes #20747
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018
* ENH: add integer-na support via an ExtensionArray

closes pandas-dev#20700
closes pandas-dev#20747
2 participants