Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Melting with not present column does not produce error #23575

Merged
merged 34 commits into from
Nov 21, 2018
Merged
Show file tree
Hide file tree
Changes from 31 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
855985d
check for columns in dataframe
michaelsilverstein Nov 8, 2018
40fdb05
check for columns in dataframe
michaelsilverstein Nov 8, 2018
9670da2
check difference with Index; use {} str formatting
michaelsilverstein Nov 13, 2018
3ffc870
missing.any()
michaelsilverstein Nov 13, 2018
8139f78
started test
michaelsilverstein Nov 13, 2018
0a94650
added to whatsnew
michaelsilverstein Nov 13, 2018
d0f6d23
PEP criteria
michaelsilverstein Nov 13, 2018
6c76161
`missing.empty` to accommodate MultiIndex
michaelsilverstein Nov 13, 2018
ad3d926
rm `*`
michaelsilverstein Nov 13, 2018
e097a87
rm comment
michaelsilverstein Nov 13, 2018
5ff3a32
add test for id_var and multiple missing
michaelsilverstein Nov 13, 2018
fcbda15
reformat error statement; Value->KeyError
michaelsilverstein Nov 13, 2018
3175b34
simplified test
michaelsilverstein Nov 13, 2018
515fb9f
Issue -> GH
michaelsilverstein Nov 13, 2018
c7d6fcf
PEP criteria
michaelsilverstein Nov 13, 2018
5911cc3
PEP criteria
michaelsilverstein Nov 13, 2018
47ca7fc
test not working now
michaelsilverstein Nov 13, 2018
d0ee9c5
regex compatible match
michaelsilverstein Nov 14, 2018
c75ab23
PEP criteria
michaelsilverstein Nov 14, 2018
32ed22c
move test to TestMelt() class
michaelsilverstein Nov 14, 2018
e629b2a
PEP
michaelsilverstein Nov 14, 2018
89de406
PEP
michaelsilverstein Nov 14, 2018
1d13f4a
PEP
michaelsilverstein Nov 14, 2018
479b761
Merge branch 'master' into dev_melt_column_check
michaelsilverstein Nov 14, 2018
01e8d74
resolving conflicts
michaelsilverstein Nov 15, 2018
6762b21
Merge branch 'master' of https://github.com/pandas-dev/pandas into de…
michaelsilverstein Nov 15, 2018
eae7716
Merge branch 'master' of https://github.com/pandas-dev/pandas into de…
michaelsilverstein Nov 15, 2018
fba641f
handle multiindex columns
michaelsilverstein Nov 15, 2018
06b7cdb
test single var melt with multiindex
michaelsilverstein Nov 15, 2018
39c746b
test single var melt with multiindex
michaelsilverstein Nov 15, 2018
af170e1
pep8 and index sorting
michaelsilverstein Nov 16, 2018
4c9bc9f
rm extra description
michaelsilverstein Nov 21, 2018
c59d29f
add comment
michaelsilverstein Nov 21, 2018
0db8838
add MI tests
michaelsilverstein Nov 21, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ New features
- :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (:issue:`22684`)
- :func:`DataFrame.to_string` now accepts ``decimal`` as an argument, allowing
the user to specify which decimal separator should be used in the output. (:issue:`23614`)
- :func:`pd.melt` now requires `id_vars` and `value_vars` to be in the ``DataFrame``
michaelsilverstein marked this conversation as resolved.
Show resolved Hide resolved

.. _whatsnew_0240.enhancements.extension_array_operators:

Expand Down Expand Up @@ -1418,6 +1419,7 @@ Reshaping
- Bug in :func:`pandas.concat` when concatenating a multicolumn DataFrame with tz-aware data against a DataFrame with a different number of columns (:issue:`22796`)
- Bug in :func:`merge_asof` where confusing error message raised when attempting to merge with missing values (:issue:`23189`)
- Bug in :meth:`DataFrame.nsmallest` and :meth:`DataFrame.nlargest` for dataframes that have a :class:`MultiIndex` for columns (:issue:`23033`).
- Bug in :func:`pandas.melt` when passing column names that are not present in ``DataFrame`` (:issue:`23575`)
- Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)

.. _whatsnew_0240.bug_fixes.sparse:
Expand Down
17 changes: 17 additions & 0 deletions pandas/core/reshape/melt.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
from pandas import compat
from pandas.core.arrays import Categorical
from pandas.core.frame import _shared_docs
from pandas.core.indexes.base import Index
from pandas.core.reshape.concat import concat
from pandas.core.tools.numeric import to_numeric

Expand All @@ -24,6 +25,10 @@
def melt(frame, id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None):
# TODO: what about the existing index?
if isinstance(frame.columns, ABCMultiIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not especially familiar with melt and multi-index columns, but I don't think this is quite right.

It seems like you need to specify col_level when you have a MI in the columns, so you should probably just be checks against frame.columns.levels[col_level] when you have a MI.

However, it doesn't quite seem that a col_level is required when there's a MI in the columns. The default of pd.melt(df) seems to work, but any time I specified an id_vars or value_vars without col_level I get an uninformative error message. I'm not sure what's going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to provide col_level for MI when only melting on one level like this example from the docstring (that I added a new test for):

pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])

But you don't need to specify col_level when using all levels of MI:
pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I am doing at L28 is gathering column names from all levels. There are other checks to make sure that melting is performed properly, this will just check to make sure that whatever you pass, it is in your df

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think this is ok, can you provdie a comment on what is going on.

cols = [x for c in frame.columns for x in c]
else:
cols = list(frame.columns)
if id_vars is not None:
if not is_list_like(id_vars):
id_vars = [id_vars]
Expand All @@ -32,7 +37,13 @@ def melt(frame, id_vars=None, value_vars=None, var_name=None,
raise ValueError('id_vars must be a list of tuples when columns'
' are a MultiIndex')
else:
# Check that `id_vars` are in frame
id_vars = list(id_vars)
missing = Index(np.ravel(id_vars)).difference(cols)
if not missing.empty:
raise KeyError("The following 'id_vars' are not present"
" in the DataFrame: {missing}"
"".format(missing=list(missing)))
else:
id_vars = []

Expand All @@ -45,6 +56,12 @@ def melt(frame, id_vars=None, value_vars=None, var_name=None,
' columns are a MultiIndex')
else:
value_vars = list(value_vars)
# Check that `value_vars` are in frame
missing = Index(np.ravel(value_vars)).difference(cols)
if not missing.empty:
michaelsilverstein marked this conversation as resolved.
Show resolved Hide resolved
raise KeyError("The following 'value_vars' are not present in"
" the DataFrame: {missing}"
"".format(missing=list(missing)))
frame = frame.loc[:, id_vars + value_vars]
else:
frame = frame.copy()
Expand Down
36 changes: 36 additions & 0 deletions pandas/tests/reshape/test_melt.py
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,14 @@ def test_vars_work_with_multiindex(self):
result = self.df1.melt(id_vars=[('A', 'a')], value_vars=[('B', 'b')])
tm.assert_frame_equal(result, expected)

def test_single_vars_work_with_multiindex(self):
expected = DataFrame({
'A': {0: 1.067683, 1: -1.321405, 2: -0.807333},
'CAP': {0: 'B', 1: 'B', 2: 'B'},
'value': {0: -1.110463, 1: 0.368915, 2: 0.08298}})
result = self.df1.melt(['A'], ['B'], col_level=0)
tm.assert_frame_equal(result, expected)

def test_tuple_vars_fail_with_multiindex(self):
# melt should fail with an informative error message if
# the columns have a MultiIndex and a tuple is passed
Expand Down Expand Up @@ -233,6 +241,34 @@ def test_pandas_dtypes(self, col):
expected.columns = ['klass', 'col', 'attribute', 'value']
tm.assert_frame_equal(result, expected)

def test_melt_missing_columns_raises(self):
# GH-23575
# This test is to ensure that pandas raises an error if melting is
# attempted with column names absent from the dataframe

# Generate data
df = pd.DataFrame(np.random.randn(5, 4), columns=list('abcd'))

# Try to melt with missing `value_vars` column name
msg = "The following '{Var}' are not present in the DataFrame: {Col}"
with pytest.raises(
KeyError,
match=msg.format(Var='value_vars', Col="\\['C'\\]")):
df.melt(['a', 'b'], ['C', 'd'])

# Try to melt with missing `id_vars` column name
with pytest.raises(
KeyError,
match=msg.format(Var='id_vars', Col="\\['A'\\]")):
df.melt(['A', 'b'], ['c', 'd'])

# Multiple missing
with pytest.raises(
KeyError,
match=msg.format(Var='id_vars',
Col="\\['not_here', 'or_there'\\]")):
df.melt(['a', 'b', 'not_here', 'or_there'], ['c', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do an example with an MI and columns that are not in the top level of the MI, ideally try with and w/o col_level as well.



class TestLreshape(object):

Expand Down