API: more consistent error message for MultiIndex.from_arrays #25189

Merged on Feb 20, 2019 (7 commits)
8 changes: 7 additions & 1 deletion pandas/core/indexes/multi.py
@@ -324,11 +324,17 @@ def from_arrays(cls, arrays, sortorder=None, names=None):
                    codes=[[0, 0, 1, 1], [1, 0, 1, 0]],
                    names=['number', 'color'])
         """
+        error_msg = "Input must be a list / sequence of array-likes."
Member

We could maybe think about how to improve the actual message as well. On a first read I interpreted it as "Input must be [a list] or [a sequence of array-likes]", while of course it means "[list or sequence] of array-likes", which confused me.

To be true to the code, what it actually needs to be is a "list-like of list-likes", which is also not that nice to write.
I am wondering if a more strict error message (stricter than what we allow), something like "Input must be a list of arrays" is not actually easier to understand for users.

Member Author

I was also thinking that the messages should be changed from "Input must be..." to something along the lines of "'arrays' parameter of MultiIndex.from_arrays must be...", and then regurgitate whatever is in the docstring.

Member Author

> I am wondering if a more strict error message (stricter than what we allow), something like "Input must be a list of arrays" is not actually easier to understand for users.

IIUC the reason that a sequence is accepted is to provide backward compatibility with zip. So sequence does not necessarily need to be mentioned in the docstring.
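
The zip point can be illustrated without pandas. In Python 3, `zip` returns a lazy iterator rather than a list, so it has no `len()` and can only be consumed once, which is presumably why the diff materialises iterators with `arrays = list(arrays)` before validating further. A minimal sketch (the variable names are illustrative and not taken from the PR):

```python
from collections.abc import Iterator

numbers = [1, 1, 2, 2]
colors = ["red", "blue", "red", "blue"]

pairs = zip(numbers, colors)            # lazy iterator in Python 3
is_lazy = isinstance(pairs, Iterator)   # iterators have no len() and are single-pass

materialised = list(pairs)              # the same step as `arrays = list(arrays)`
leftover = list(pairs)                  # the iterator is exhausted after one pass
```

Once the iterator has been turned into a list, the later length checks can call `len()` and index into it safely.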

         if not is_list_like(arrays):
-            raise TypeError("Input must be a list / sequence of array-likes.")
+            raise TypeError(error_msg)
         elif is_iterator(arrays):
             arrays = list(arrays)

+        # Check if elements of array are list-like
Contributor

Do we fully test this? (Test with tuples as well.)

Member Author

@jreback: Not fully. I've added a test that is basically a cut and paste from another test. We could parameterise now or refactor in a follow-on PR. I prefer the latter, since some other refactoring of the tests may be possible and may detract from the current change.

+        for array in arrays:
+            if not is_list_like(array):
+                raise TypeError(error_msg)
+
         # Check if lengths of all arrays are equal or not,
         # raise ValueError, if not
         for i in range(1, len(arrays)):
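
The validation order in this hunk can be sketched as a standalone function. This is only an approximation under stated assumptions: `is_list_like_approx` and `validate_arrays` are hypothetical names, and the list-like check below is a rough stand-in for pandas' `is_list_like` (iterable, but not a string), not its actual implementation.

```python
from collections.abc import Iterable, Iterator

ERROR_MSG = "Input must be a list / sequence of array-likes."


def is_list_like_approx(obj):
    """Rough stand-in for pandas' is_list_like (assumption, not the real one)."""
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def validate_arrays(arrays):
    """Hypothetical helper mirroring the checks added in this diff."""
    # 1. The outer container itself must be list-like.
    if not is_list_like_approx(arrays):
        raise TypeError(ERROR_MSG)
    # 2. Iterators (e.g. zip) are materialised so they can be re-read.
    if isinstance(arrays, Iterator):
        arrays = list(arrays)
    # 3. Every element must be list-like too, with the same error message.
    for array in arrays:
        if not is_list_like_approx(array):
            raise TypeError(ERROR_MSG)
    return arrays
```

With this ordering, inputs such as `1`, `'a'`, `[1]` and `[[1], 2]` all fail with the same `TypeError` text, which is the consistency the PR title asks for.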
19 changes: 15 additions & 4 deletions pandas/tests/indexes/multi/test_constructor.py
@@ -142,6 +142,15 @@ def test_from_arrays_iterator(idx):
         MultiIndex.from_arrays(0)
 
 
+def test_from_arrays_tuples(idx):
+    arrays = tuple(tuple(np.asarray(lev).take(level_codes))
+                   for lev, level_codes in zip(idx.levels, idx.codes))
+
+    # tuple of tuples as input
+    result = MultiIndex.from_arrays(arrays, names=idx.names)
+    tm.assert_index_equal(result, idx)
+
+
 def test_from_arrays_index_series_datetimetz():
     idx1 = pd.date_range('2015-01-01 10:00', freq='D', periods=3,
                          tz='US/Eastern')
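
For readers unfamiliar with the levels/codes representation, the way `test_from_arrays_tuples` builds its input can be sketched in plain Python. The sample `levels` and `codes` below are illustrative data, not the `idx` fixture, and the inner generator is a pure-Python substitute for `np.asarray(lev).take(level_codes)`:

```python
levels = [[1, 2], ["blue", "red"]]        # illustrative sample data
codes = [[0, 0, 1, 1], [1, 0, 1, 0]]      # positions into each level

# For each level, look up the label at every code position,
# producing one tuple per level: a tuple of tuples overall.
arrays = tuple(
    tuple(lev[code] for code in level_codes)
    for lev, level_codes in zip(levels, codes)
)
```

Here `arrays` holds one tuple of labels per level, which is the tuple-of-tuples shape the test passes to `MultiIndex.from_arrays`.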
@@ -254,11 +263,13 @@ def test_from_arrays_empty():


 @pytest.mark.parametrize('invalid_sequence_of_arrays', [
-    1, [1], [1, 2], [[1], 2], 'a', ['a'], ['a', 'b'], [['a'], 'b']])
+    1, [1], [1, 2], [[1], 2], [1, [2]], 'a', ['a'], ['a', 'b'], [['a'], 'b'],
+    (1,), (1, 2), ([1], 2), (1, [2]), 'a', ('a',), ('a', 'b'), (['a'], 'b'),
+    [(1,), 2], [1, (2,)], [('a',), 'b'],
+    ((1,), 2), (1, (2,)), (('a',), 'b')
+])
 def test_from_arrays_invalid_input(invalid_sequence_of_arrays):
-    msg = (r"Input must be a list / sequence of array-likes|"
-           r"Input must be list-like|"
-           r"object of type 'int' has no len\(\)")
+    msg = "Input must be a list / sequence of array-likes"
     with pytest.raises(TypeError, match=msg):
         MultiIndex.from_arrays(arrays=invalid_sequence_of_arrays)
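
The effect of tightening `msg` can be checked with the `re` module alone, since `pytest.raises(match=...)` applies the pattern with `re.search`. In the sketch below, the three candidate messages are taken from the alternatives in the old pattern; which code path produced each of them before this PR is not shown in the diff and is not assumed here.

```python
import re

old_pattern = (r"Input must be a list / sequence of array-likes|"
               r"Input must be list-like|"
               r"object of type 'int' has no len\(\)")
new_pattern = "Input must be a list / sequence of array-likes"

messages = [
    "Input must be a list / sequence of array-likes.",
    "Input must be list-like",
    "object of type 'int' has no len()",
]

# The old alternation accepts any of the three messages;
# the new pattern accepts only the single consistent one.
old_hits = [bool(re.search(old_pattern, m)) for m in messages]
new_hits = [bool(re.search(new_pattern, m)) for m in messages]
```

The old test would therefore pass regardless of which of the three errors was raised, while the new one passes only for the unified message.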
