ENH: Improve error message for read_csv header argument containing non int types. GH16338. (#16351)

Adds the error "header must be integer or list of integers" when the header argument is a list, tuple or NumPy array containing non-integers, or is a non-integer scalar such as a string. Initially intended for read_csv, but applies to other parsing functions that share the same header handling. GH16338 refers to a case in which the user mixes up the "names" and "header" arguments.

Revising PR16351 based on feedback

Fixing lint issues in PR16351

Adding release note in whatsnew v0.21.0 for PR16351
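The header check described in this commit has two branches: a list-like header must contain only integers, and a scalar header must be an integer or None. The following is a minimal pure-Python sketch of those rules, not the pandas code itself. `check_header`, `_is_integer` and `ERROR_MSG` are hypothetical names; `_is_integer` only approximates pandas' internal `is_integer`, and NumPy arrays are left out so the sketch has no dependencies (the committed check also accepts `np.ndarray`).

```python
from numbers import Integral

ERROR_MSG = "header must be integer or list of integers"


def _is_integer(value):
    # Hypothetical stand-in for pandas' internal is_integer(): accepts
    # integral values but, like pandas, does not count bools as integers.
    return isinstance(value, Integral) and not isinstance(value, bool)


def check_header(header):
    """Hypothetical restatement of the header validation added in #16351."""
    if isinstance(header, (list, tuple)):
        # Multi-row (MultiIndex) header: every entry must be a row number.
        # The real check also covers np.ndarray; omitted here to avoid NumPy.
        if not all(map(_is_integer, header)):
            raise ValueError(ERROR_MSG)
    elif header is not None and not _is_integer(header):
        # Scalar header: must be a single row number, or None for no header.
        raise ValueError(ERROR_MSG)


# Valid: row numbers or None pass silently.
for ok in (0, None, [0, 1], (0, 2)):
    check_header(ok)

# Invalid: column labels (the GH16338 mix-up) or other non-integers.
for bad in (["a", "b"], "string_header", [0, "a"], 1.5):
    try:
        check_header(bad)
    except ValueError as exc:
        print(f"{bad!r}: {exc}")
```

Checking the list-like case first mirrors the committed code, where the integer check is the first test inside the multi-index branch, before the as_recarray and index_col checks.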
mjlove12 authored and TomAugspurger committed May 24, 2017
1 parent b0038ac commit 7271f50
Showing 3 changed files with 16 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.21.0.txt
@@ -94,6 +94,7 @@ Indexing
 I/O
 ^^^
 
+- Bug in ``pd.read_csv()`` in which non integer values for the header argument generated an unhelpful / unrelated error message (:issue:`16338`)
 
 
 Plotting
6 changes: 6 additions & 0 deletions pandas/io/parsers.py
@@ -1164,6 +1164,8 @@ def __init__(self, kwds):
         # validate header options for mi
         self.header = kwds.get('header')
         if isinstance(self.header, (list, tuple, np.ndarray)):
+            if not all(map(is_integer, self.header)):
+                raise ValueError("header must be integer or list of integers")
             if kwds.get('as_recarray'):
                 raise ValueError("cannot specify as_recarray when "
                                  "specifying a multi-index header")
@@ -1184,6 +1186,10 @@ def __init__(self, kwds):
                     raise ValueError("index_col must only contain row numbers "
                                      "when specifying a multi-index header")
 
+        # GH 16338
+        elif self.header is not None and not is_integer(self.header):
+            raise ValueError("header must be integer or list of integers")
+
         self._name_processed = False
 
         self._first_chunk = True
9 changes: 9 additions & 0 deletions pandas/tests/io/parser/header.py
@@ -277,3 +277,12 @@ def test_no_header(self):
         tm.assert_index_equal(df.columns, Index(lrange(5)))
 
         tm.assert_index_equal(df2.columns, Index(names))
+
+    def test_non_int_header(self):
+        # GH 16338
+        msg = 'header must be integer or list of integers'
+        data = """1,2\n3,4"""
+        with tm.assert_raises_regex(ValueError, msg):
+            self.read_csv(StringIO(data), sep=',', header=['a', 'b'])
+        with tm.assert_raises_regex(ValueError, msg):
+            self.read_csv(StringIO(data), sep=',', header='string_header')
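GH16338 arose from passing column labels through `header`, which expects row numbers, when they belong in `names`. The following sketch, assuming a recent pandas install, contrasts the intended call with the mix-up that `test_non_int_header` exercises. The error wording shown is the one added by this commit; later pandas versions may phrase related errors differently, so the code only captures the message.

```python
from io import StringIO

import pandas as pd

data = "1,2\n3,4"

# Intended usage: column labels go in ``names``; no row of the data is
# consumed as a header, so both rows are kept.
df = pd.read_csv(StringIO(data), names=["a", "b"])
print(df)

# The GH16338 mix-up: labels passed to ``header`` instead of ``names``.
# ``header`` only accepts row numbers, so this raises ValueError.
try:
    pd.read_csv(StringIO(data), header=["a", "b"])
except ValueError as exc:
    err_msg = str(exc)
else:
    err_msg = None
print(err_msg)
```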
