BUG: Parse two date columns broken in read_csv with multiple headers #15378

stephenrauch
Contributor

Fix for GH15376

In io/parsers/_try_convert_dates(), when columns were selected by
column index from a set of columns with multi-level names, the column
name was converted to a string. This appears to be a bug, since the
name was a tuple before the conversion. It causes problems
downstream when that name is used to look up a
column: the lookup fails because the desired column is keyed by
the tuple, not by its string representation.
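
The failure mode described above can be illustrated with a plain dictionary. This is a minimal sketch, not the parser's actual code: `data_dict`, `columns`, and `column_by_position` are hypothetical stand-ins, assuming only that the column data is keyed by the tuple names that a multi-level header produces.

```python
# Hypothetical stand-in for the parser's column store: with a multi-level
# header, each column name is a tuple, and the data is keyed by that tuple.
data_dict = {
    ("D", "date"): ["2001-01-05", "2001-01-06"],
    ("T", "time"): ["09:00:00", "00:00:00"],
    ("A", "a"): [0.0, 1.0],
}
columns = list(data_dict)


def column_by_position(position, stringify):
    """Resolve a positional index to a name, then look the column up."""
    name = columns[position]
    if stringify:
        # The behaviour the PR describes as a bug: the tuple becomes
        # the string "('D', 'date')", which is not a key in data_dict.
        name = str(name)
    return data_dict.get(name)


found = column_by_position(0, stringify=False)   # lookup by tuple succeeds
missing = column_by_position(0, stringify=True)  # lookup by string fails
```

Keeping the name as a tuple is what lets the later lookup succeed; the string form only looks similar when printed.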


    def test_parse_date_time_multi_level_column_name(self):
        # GH 15376
        result = conv.parse_date_time(self.dates, self.times)
Contributor

not sure what these 2 lines are doing, remove.

2001-01-05, 00:00:00, 1., 11.
"""
        datecols = {'date_time': [0, 1]}
        df = read_table(StringIO(data), sep=',', header=[0, 1],
Contributor

use self.read_csv, this tests on all parsers (c/python)
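
The pattern behind this comment can be sketched with the standard library alone. This is an assumption-laden illustration, not pandas' real test layout: `ParserTests`, `TestCsvModuleParser`, `TestSplitParser`, and `run_shared_tests` are hypothetical names, and the two toy parsers merely stand in for the C and Python engines. The shared test calls `self.read_csv`, so it runs once per parser subclass.

```python
import csv
import unittest
from io import StringIO


class ParserTests:
    """Shared tests; each subclass supplies its own read_csv."""

    def read_csv(self, text):
        raise NotImplementedError

    def test_rows(self):
        # Calling self.read_csv (not a module-level function) is what
        # makes this test exercise whichever parser the subclass provides.
        rows = self.read_csv("a,b\n1,2\n")
        self.assertEqual(rows, [["a", "b"], ["1", "2"]])


class TestCsvModuleParser(ParserTests, unittest.TestCase):
    def read_csv(self, text):
        return list(csv.reader(StringIO(text)))


class TestSplitParser(ParserTests, unittest.TestCase):
    def read_csv(self, text):
        return [line.split(",") for line in text.splitlines()]


def run_shared_tests():
    """Run the shared test under every parser subclass."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (TestCsvModuleParser, TestSplitParser):
        suite.addTests(loader.loadTestsFromTestCase(case))
    # Send runner output to a buffer so the sketch stays quiet.
    return unittest.TextTestRunner(stream=StringIO(), verbosity=0).run(suite)
```

Calling `run_shared_tests()` executes `test_rows` twice, once per subclass. A test that calls a module-level reader directly would only ever cover one implementation.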

        datecols = {'date_time': [0, 1]}
        df = read_table(StringIO(data), sep=',', header=[0, 1],
                        parse_dates=datecols, date_parser=conv.parse_date_time)
        self.assertIn('date_time', df)
Contributor

construct an expected frame, and use assert_frame_equal
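
A minimal sketch of what this suggestion might look like, assuming a current pandas where `assert_frame_equal` is importable from `pandas.testing`. The data and the manual column merge are illustrative stand-ins for the parsed result, not the test that was merged.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Illustrative data only, not the data used in the PR's test.
raw = pd.DataFrame({
    "date": ["2001-01-05", "2001-01-06"],
    "time": ["09:00:00", "00:00:00"],
    "a": [0.0, 1.0],
})

# Stand-in for the parsing step: merge the two text columns into a
# single datetime column placed first.
result = raw.drop(columns=["date", "time"])
result.insert(0, "date_time", pd.to_datetime(raw["date"] + " " + raw["time"]))

# Build the full expected frame explicitly.
expected = pd.DataFrame({
    "date_time": pd.to_datetime(
        pd.Series(["2001-01-05 09:00:00", "2001-01-06 00:00:00"])
    ),
    "a": [0.0, 1.0],
})

# Compares values, dtypes, column labels, and index in one call,
# which is stricter than checking that a column name is present.
assert_frame_equal(result, expected)
```

Compared with `assertIn('date_time', df)`, this also catches wrong values, wrong dtypes, and unexpected extra columns.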

@@ -580,3 +580,4 @@ Bug Fixes
- Bug in ``Series.replace`` and ``DataFrame.replace`` which failed on empty replacement dicts (:issue:`15289`)
- Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
- Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
- Bug in ``.read_csv()`` which caused ``parse_dates={'datetime': [0, 1]}`` to fail with multiline headers (:issue:`15376`)
Contributor

don't put this as the last line, instead use an empty space, otherwise you will get conflicts.

Bug in .read_csv() where parse_dates with a list-of-integers specified would fail with multiline headers

@jreback jreback added the Bug and IO CSV (read_csv, to_csv) labels Feb 12, 2017
codecov-io commented Feb 12, 2017

Codecov Report

Merging #15378 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15378      +/-   ##
==========================================
- Coverage   90.37%   90.37%   -0.01%     
==========================================
  Files         135      135              
  Lines       49440    49454      +14     
==========================================
+ Hits        44681    44693      +12     
- Misses       4759     4761       +2
Impacted Files            Coverage Δ
pandas/io/parsers.py      95.51% <100%> (ø)
pandas/core/common.py     91.02% <ø> (-0.34%)
pandas/core/frame.py      97.82% <ø> (-0.05%)
pandas/tools/concat.py    97.62% <ø> (ø)
pandas/core/generic.py    96.33% <ø> (ø)
pandas/io/excel.py        79.64% <ø> (+0.24%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5a8883b...030f5ec.

Fix for GH15376

In `io/parsers/_try_convert_dates()`, when columns were selected by
column index from a set of columns with multi-level names, the column
`name` was converted to a string. This appears to be a bug, since the
`name` was a tuple before the conversion. It causes problems
downstream when that name is used to look up a
column: the lookup fails because the desired column is keyed by
the tuple, not by its string representation.
2001-01-06, 00:00:00, 1.0, 11.
"""
        datecols = {'date_time': [0, 1]}
        result = read_csv(StringIO(data), sep=',', header=[0, 1],
Contributor

should be self.read_csv, but I can fix on the merge

Contributor Author

Thanks. A few more of these and hopefully I'll get it.

Contributor

@jreback Feb 16, 2017

haha np. parser tests are a little tricky to understand because of this actually.

jreback commented Feb 16, 2017

ok ping on green.

@jreback jreback added this to the 0.20.0 milestone Feb 16, 2017

jreback commented Feb 23, 2017

can you update

@stephenrauch
Contributor Author

@jreback, You asked for update 4 days back, but I thought this was OK. If you still need something, please let me know what.

@jreback jreback closed this in fb7dc7d Feb 27, 2017

jreback commented Feb 27, 2017

closed via: fb7dc7d

thanks @stephenrauch

this test was in the wrong place (I had made a comment above, but not sure if you saw it).

In fact I think all of the pandas/tests/io/test_date_converters are in the wrong place and should simply be in pandas/tests/io/parsers/parse_dates.py (or equiv), so that they run under each parser. My guess is that this is an older file.

I'll create an issue about this.

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
In `io/parsers/_try_convert_dates()` when selecting
columns based on a column index from a set of columns with
multi-level names, the column `name` was converted to a string. This
appears to be a bug since the `name` was a tuple before the
conversion. This causes problems downstream when there is an attempt
to use this name to lookup a column, and that lookup fails because
the desired column is keyed from the tuple, not its string
representation

closes pandas-dev#15376

Author: Stephen Rauch <stephen.rauch+github@gmail.com>

Closes pandas-dev#15378 from stephenrauch/fix_read_csv_merge_datetime and squashes the following commits:

030f5ec [Stephen Rauch] BUG: Parse two date columns broken in read_csv with multiple headers