-
-
Notifications
You must be signed in to change notification settings - Fork 17.8k
-
-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: IntervalIndex.to_tuples doesn't handle NA correctly #18756
Comments
hmm I don't really know why we have this method, its not semantically correct because you lose the cc @shoyer |
My original reasoning for this method was to make it easier to use other APIs that need tuples, e.g., to convert an IntervalIndex into levels of a MultiIndex with Part of the need for this is that it surprisingly difficult to create a NumPy array of tuples, and this method makes that easy. For example, consider:
That said, this doesn't come up that often and the Given that the behavior is marginally useful, unambiguous (aside from handling missing values) and self-contained, I think we should lean towards keeping it around. With regards to the handling of NaN, I did choose this existing behavior intentionally. This ensures that the result has a homogeneous type (a list of all tuples), with values given by tuples of length 2. On the other hand, I do see why returning scalars could make more sense in some cases. Maybe a keyword argument |
ok @shoyer makes sense about the rationale for this method. |
@shoyer : Thanks, that makes sense. I like the import numpy as np
import pandas as pd
import pandas.util.testing as tm
from pandas.core.common import _asarray_tuplesafe
data1 = zip(np.array([1, 3, np.nan]), np.array([2, 4, np.nan]))
data2 = zip(np.array([1, 3, np.nan]), np.array([2, 4, np.nan]))
idx1 = pd.Index(_asarray_tuplesafe(data1))
idx2 = pd.Index(_asarray_tuplesafe(data2)) Then In [2]: idx1
Out[2]: Index([(1.0, 2.0), (3.0, 4.0), (nan, nan)], dtype='object')
In [3]: idx2
Out[3]: Index([(1.0, 2.0), (3.0, 4.0), (nan, nan)], dtype='object')
In [4]: idx1 == idx2
Out[4]: array([ True, True, False], dtype=bool) But the problem is In [5]: idx1.equals(idx2)
Out[5]: False
In [6]: tm.assert_index_equal(idx1, idx2)
---------------------------------------------------------------------------
AssertionError: Index are different
Index values are different (33.33333 %)
[left]: Index([(1.0, 2.0), (3.0, 4.0), (nan, nan)], dtype='object')
[right]: Index([(1.0, 2.0), (3.0, 4.0), (nan, nan)], dtype='object') So I'm not quite sure what should be done for testing. Looks like fixing I guess the other workaround would be to use |
this might be a shortcoming of |
Code Sample, a copy-pastable example if possible
Problem description
A tuple of
np.nan
is returned instead of justnp.nan
. Note thatvalues
andtolist
both yieldnp.nan
:Expected Output
np.nan
instead of a tuple ofnp.nan
:Output of
pd.show_versions()
INSTALLED VERSIONS
commit: d2fd22e
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0.dev0+353.gd2fd22e
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
pyarrow: 0.6.0
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.0
pandas_gbq: None
pandas_datareader: None
The text was updated successfully, but these errors were encountered: