csv.DictReader inconsistency #47686

mishok13 · 2008-07-24T15:30:21Z

BPO	3436
Nosy	@gvanrossum, @smontanaro, @warsaw, @rhettinger, @ncoghlan
Files	csv.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/smontanaro'
closed_at = <Date 2008-08-08.22:53:44.546>
created_at = <Date 2008-07-24.15:30:21.282>
labels = ['type-bug', 'library']
title = 'csv.DictReader inconsistency'
updated_at = <Date 2008-08-08.22:53:44.544>
user = 'https://bugs.python.org/mishok13'

bugs.python.org fields:

activity = <Date 2008-08-08.22:53:44.544>
actor = 'skip.montanaro'
assignee = 'skip.montanaro'
closed = True
closed_date = <Date 2008-08-08.22:53:44.546>
closer = 'skip.montanaro'
components = ['Library (Lib)']
creation = <Date 2008-07-24.15:30:21.282>
creator = 'mishok13'
dependencies = []
files = ['11021']
hgrepos = []
issue_num = 3436
keywords = ['patch']
message_count = 21.0
messages = ['70207', '70213', '70214', '70238', '70310', '70311', '70341', '70342', '70343', '70413', '70434', '70442', '70501', '70521', '70532', '70537', '70538', '70544', '70547', '70917', '70921']
nosy_count = 6.0
nosy_names = ['gvanrossum', 'skip.montanaro', 'barry', 'rhettinger', 'ncoghlan', 'mishok13']
pr_nums = []
priority = 'normal'
resolution = 'accepted'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue3436'
versions = ['Python 2.6', 'Python 3.0']

mishok13 · 2008-07-24T15:30:19Z

I had to use the csv module recently and ran into a "problem" with
DictReader. I had to get the headers of the CSV file and only after that
iterate through each row. But AFAIU there is no way to do it, other than
subclassing. So, basically, right now we have this:

Python 3.0b2+ (unknown, Jul 24 2008, 12:15:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> r = csv.DictReader(open('test.csv'))
>>> r.fieldnames
>>> next(r)
{'baz': '13', 'foo': '42', 'bar': '27'}
>>> r.fieldnames
['foo', 'bar', 'baz']

I think it would be much more useful if DictReader populated 'fieldnames'
when __init__ is called, so that it would look like this:
>>> r = csv.DictReader(open('test.csv'))
>>> r.fieldnames
['foo', 'bar', 'baz']

The easy way to do this is to subclass csv.DictReader.
The hard way to do this is to apply the patches I'm attaching. :)
These patches also remove the redundant check for self.fieldnames being
None on each next()/__next__() call.
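
For illustration, the subclassing workaround mentioned above might look like the following sketch. The class name `EagerDictReader` is hypothetical and not part of the csv module; the code is written for Python 3 and assumes the header is the first row of the input.

```python
import csv
import io


class EagerDictReader(csv.DictReader):
    """Hypothetical subclass that reads the header row at construction time."""

    def __init__(self, f, fieldnames=None, *args, **kwds):
        super().__init__(f, fieldnames, *args, **kwds)
        if fieldnames is None:
            try:
                # Consume the first row right away and use it as the header.
                self.fieldnames = next(self.reader)
            except StopIteration:
                # Empty input: leave fieldnames unset.
                pass


# io.StringIO stands in for an open CSV file.
sample = io.StringIO("foo,bar,baz\n42,27,13\n")
r = EagerDictReader(sample)
header = r.fieldnames      # available before any data row is read
first_row = next(r)
```

This does the read inside `__init__`, which is exactly the behavior questioned in the replies below.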

rhettinger · 2008-07-24T16:18:50Z

I think this is the wrong approach. It would be better to have a
separate getheader() method. Having __init__ do the deed is at odds
with other uses of __init__ that only do setup but don't start reading.

mishok13 · 2008-07-24T16:29:59Z

And how should this method look?
Something like this, I suppose:
def getheader(self):
    if self.fieldnames is None:
        try:
            self.fieldnames = self.reader.next()
        except StopIteration:
            pass
    return self.fieldnames

Well, adding a new API after beta2 is a "no-no" as I understand it, so this
getheader() method can be added only in the 2.7/3.1 releases. Should I post
updated patches or just live with it?

smontanaro · 2008-07-25T02:41:16Z

That would be a fairly easy change to the DictReader class (see
the attached patch) but probably can't be applied at this point
in the 2.6 release cycle even though all csv module tests pass
with it.

smontanaro · 2008-07-27T03:23:42Z

I should also point out that I've generally used this technique to
populate the fieldnames attribute from the file:

    f = open("somefile.csv", "rb")
    rdr = csv.DictReader(f, fieldnames=csv.reader(f).next())

So it is fairly trivial to set the fieldnames attribute before actually
reading any data rows.
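
A rough Python 3 spelling of the same technique, as a sketch: the `io.StringIO` object and its contents are stand-ins for a real file, and `next(...)` replaces the Python 2 `.next()` method call.

```python
import csv
import io

# Stand-in for an open CSV file; with a real file, use open(path, newline="").
f = io.StringIO("foo,bar,baz\n42,27,13\n")

# Read the header row with a plain reader, then hand the same file object
# to DictReader so that it starts at the first data row.
header = next(csv.reader(f))
rdr = csv.DictReader(f, fieldnames=header)

rows = list(rdr)
```

Because both readers pull lines from the same file object, the DictReader picks up where the plain reader stopped.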

ncoghlan · 2008-07-27T04:15:47Z

Like Raymond, I have issues with the idea of implicitly reading the
headers in __init__, but would be fine with the idea of a separate
method in 2.7/3.1.

As far as working around the absence of such a method goes, I personally
use itertools.chain if I happen to need the headers before I start
iterating:

r = csv.DictReader(open('test.csv'))
first = next(r)
# Do something with r.fieldnames
for row in chain(first, r):
    # Do something with each row

smontanaro · 2008-07-28T10:59:30Z

The consensus seems to be that __init__ shouldn't "magically" read the
header row, even though not specifying a fieldnames arg tells the
DictReader exactly that: the column headers are in the first row. Given
that, my argument is that we not make any changes (no getheaders method,
etc.) since at least a couple of different ways to do what you want have
already been mentioned.

mishok13 · 2008-07-28T11:03:06Z

I'm ok with that. :)
Looks like you can close this one as "won't fix".

smontanaro · 2008-07-28T11:17:41Z

Done...

gvanrossum · 2008-07-30T00:17:25Z

I know this has been closed, but perhaps the fieldnames attribute could
be made into a property that reads the first line of the file if it
hasn't been read yet?
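
One way to picture the property idea is the sketch below. It is not the patch attached to this issue: `LazyHeaderReader` is a hypothetical, simplified stand-in for DictReader that ignores restkey, restval and dialect handling, and exists only to show the header being read on first access.

```python
import csv
import io


class LazyHeaderReader:
    """Hypothetical dict-style reader whose header is read on first access."""

    def __init__(self, f, fieldnames=None):
        self._fieldnames = fieldnames
        self.reader = csv.reader(f)

    @property
    def fieldnames(self):
        # Read the header row only when someone asks for it.
        if self._fieldnames is None:
            try:
                self._fieldnames = next(self.reader)
            except StopIteration:
                # Empty input: there is no header to report.
                pass
        return self._fieldnames

    def __iter__(self):
        return self

    def __next__(self):
        names = self.fieldnames  # makes sure the header row is consumed first
        row = next(self.reader)
        return dict(zip(names, row))


r = LazyHeaderReader(io.StringIO("foo,bar,baz\n42,27,13\n"))
lines_before = r.reader.line_num   # nothing has been read yet
header = r.fieldnames              # first access reads the header row
first_row = next(r)
```

Construction does no reading; the first look at `fieldnames` (or the first `next()` call) does.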

smontanaro · 2008-07-30T17:41:01Z

Guido> I know this has been closed, but perhaps the fieldnames attribute
Guido> could be made into a property that reads the first line of the
Guido> file if it hasn't been read yet?

It's a nice thought. I tried the straightforward implementation in my
sandbox and one of the more obscure tests failed. I have yet to look into
the cause.

Skip

ncoghlan · 2008-07-30T22:58:36Z

Re-opened for consideration of GvR's suggestion.

ncoghlan · 2008-07-31T13:23:21Z

I personally like the idea of making fieldnames a property - while
having merely reading an attribute cause disk I/O is slightly
questionable, it seems like a better option than returning a misleading
value for that property and also a better option than reading the first
line of the file in __init__.

Hopefully Skip can track down that obscure failure and this change can
be made at some point in the future.

mishok13 · 2008-07-31T15:24:55Z

I like the idea of the fieldnames attribute being a property, so I've
uploaded patches that implement it as such.
Both patches ran through make test without problems.

smontanaro · 2008-07-31T22:56:35Z

Nick,

Working with Andrii's patch I'm trying to add a couple test cases to
make sure the methods you and I both demonstrated still work. Mine is
no problem, but I must be doing something wrong trying to use/adapt your
example. I freely admit I am not an itertools user, but I can't get
your example to work as written:

>>> r = csv.DictReader(open("foo.csv", "rb"))
>>> r.fieldnames
['f1', 'f2', 'f3']
>>> r.next()
{'f1': '1', 'f2': '2', 'f3': 'abc'}
>>> r = csv.DictReader(open("foo.csv", "rb"))
>>> first = next(r)
>>> first
{'f1': '1', 'f2': '2', 'f3': 'abc'}
>>> import itertools
>>> for x in itertools.chain(first, r):
...   print x
... 
f1
f2
f3

If I place first in a list it works:

>>> r = csv.DictReader(open("foo.csv", "rb"))
>>> first = next(r)
>>> for x in itertools.chain([first], r):
...   print x
... 
{'f1': '1', 'f2': '2', 'f3': 'abc'}

That makes intuitive sense to me. Is that what you intended?

S

smontanaro · 2008-08-01T00:59:52Z

I added a comment to Andrii's patch and added simple test cases
which check to make sure the way Nick and I currently use the
DictReader class (or at least envision using it) still works.

smontanaro · 2008-08-01T01:06:05Z

Andrii, If my view of the Python 3.0 development process is correct and
this change makes it into the 2.6 code, one of the 3.0 developers will
merge to the py3k branch.

mishok13 · 2008-08-01T08:21:07Z

Oh, so this is what the process looks like...
/me removes patches
I've uploaded both py3k and trunk patches just because I'm fixing things
the other way round -- first I write a patch for 3.0 and only after that
I backport it to 2.6. Stupid me. :)

ncoghlan · 2008-08-01T10:10:32Z

Skip's patch looks good to me (as Skip discovered, I left out the
necessary step of putting the first row back into an iterable before
invoking chain in my example code)
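
A small Python 3 sketch of the corrected workaround, with the first row wrapped in a list before it is passed to chain(). The sample data is made up for illustration.

```python
import csv
import io
from itertools import chain

# io.StringIO stands in for an open CSV file.
r = csv.DictReader(io.StringIO("f1,f2,f3\n1,2,abc\n4,5,def\n"))
first = next(r)
header = r.fieldnames  # known once the first row has been read

# Wrap the first row in a list so chain() yields the row itself
# rather than iterating over its keys.
rows = list(chain([first], r))
```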

warsaw · 2008-08-08T22:06:29Z

Making an existing attribute a property is a nice, API-neutral way to
handle this. Let's call the inconsistency a bug and this a bug fix
<wink> so that it's fine to add to 2.6 and 3.0 at this point.

smontanaro · 2008-08-08T22:53:44Z

Committed as revision 65605.

mishok13 mannequin added the stdlib and type-bug labels Jul 24, 2008

smontanaro closed this as completed Jul 28, 2008

ncoghlan reopened this Jul 30, 2008

smontanaro closed this as completed Aug 8, 2008

smontanaro self-assigned this Aug 8, 2008

ezio-melotti transferred this issue from another repository Apr 10, 2022
