csv.DictReader inconsistency #47686

Closed
mishok13 mannequin opened this issue Jul 24, 2008 · 21 comments

@mishok13
Mannequin

mishok13 mannequin commented Jul 24, 2008

BPO 3436
Nosy @gvanrossum, @smontanaro, @warsaw, @rhettinger, @ncoghlan
Files
  • csv.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/smontanaro'
    closed_at = <Date 2008-08-08.22:53:44.546>
    created_at = <Date 2008-07-24.15:30:21.282>
    labels = ['type-bug', 'library']
    title = 'csv.DictReader inconsistency'
    updated_at = <Date 2008-08-08.22:53:44.544>
    user = 'https://bugs.python.org/mishok13'

    bugs.python.org fields:

    activity = <Date 2008-08-08.22:53:44.544>
    actor = 'skip.montanaro'
    assignee = 'skip.montanaro'
    closed = True
    closed_date = <Date 2008-08-08.22:53:44.546>
    closer = 'skip.montanaro'
    components = ['Library (Lib)']
    creation = <Date 2008-07-24.15:30:21.282>
    creator = 'mishok13'
    dependencies = []
    files = ['11021']
    hgrepos = []
    issue_num = 3436
    keywords = ['patch']
    message_count = 21.0
    messages = ['70207', '70213', '70214', '70238', '70310', '70311', '70341', '70342', '70343', '70413', '70434', '70442', '70501', '70521', '70532', '70537', '70538', '70544', '70547', '70917', '70921']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'skip.montanaro', 'barry', 'rhettinger', 'ncoghlan', 'mishok13']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue3436'
    versions = ['Python 2.6', 'Python 3.0']

    @mishok13
    Mannequin Author

    mishok13 mannequin commented Jul 24, 2008

    I had to use the csv module recently and ran into a "problem" with
    DictReader. I had to get the headers of a CSV file and only after that iterate
    through each row. But AFAIU there is no way to do it, other than
    subclassing. So, basically, right now we have this:

    Python 3.0b2+ (unknown, Jul 24 2008, 12:15:52)
    [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import csv
    >>> r = csv.DictReader(open('test.csv'))
    >>> r.fieldnames
    >>> next(r)
    {'baz': '13', 'foo': '42', 'bar': '27'}
    >>> r.fieldnames
    ['foo', 'bar', 'baz']
    
    I think it would be much more useful if DictReader populated 'fieldnames'
    when its __init__ method is called, so that it would look like this:
    >>> r = csv.DictReader(open('test.csv'))
    >>> r.fieldnames
    ['foo', 'bar', 'baz']

    The easy way to do this is to subclass csv.DictReader.
    The hard way to do this is to apply the patches I'm attaching. :)
    These patches also remove the redundant check for self.fieldnames being None
    on each next()/__next__() call.

    @mishok13 mishok13 mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jul 24, 2008
    @rhettinger
    Contributor

    I think this is the wrong approach. It would be better to have a
    separate getheader() method. Having __init__ do the deed is at odds
    with other uses of __init__ that only do setup but don't start reading.

    @mishok13
    Mannequin Author

    mishok13 mannequin commented Jul 24, 2008

    And how should this method look?
    Something like this, I suppose:
        def getheader(self):
            if self.fieldnames is None:
                try:
                    self.fieldnames = self.reader.next()
                except StopIteration:
                    pass
            return self.fieldnames

    Well, adding new API after beta2 is a "no-no" as I understand, so this
    getheader() method can be added only in 2.7/3.1 releases. Should I post
    updated patches or just live with it?

    @smontanaro
    Contributor

    That would be a fairly easy change to the DictReader class (see
    the attached patch) but probably can't be applied at this point
    in the 2.6 release cycle even though all csv module tests pass
    with it.

    @smontanaro
    Contributor

    I should also point out that I've generally used this technique to
    populate the fieldnames attribute from the file:

        f = open("somefile.csv", "rb")
        rdr = csv.DictReader(f, fieldnames=csv.reader(f).next())

    So it is fairly trivial to set the fieldnames attribute before actually
    reading any data rows.
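
A rough Python 3 equivalent of the technique above, as a sketch only: the original uses the Python 2 `.next()` method and binary mode, while this version assumes the built-in `next()` and a text-mode file. The in-memory `io.StringIO` sample stands in for "somefile.csv" and its contents are made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for "somefile.csv".
f = io.StringIO("foo,bar,baz\n42,27,13\n")

# A throwaway csv.reader consumes only the header row; the file object
# is then positioned at the first data row.
header = next(csv.reader(f))

# Passing fieldnames explicitly means DictReader treats every remaining
# row as data and never tries to read a header itself.
rdr = csv.DictReader(f, fieldnames=header)
rows = list(rdr)
```

Here `rdr.fieldnames` is available before any data row has been read, because it was supplied by the caller rather than discovered during iteration.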

    @ncoghlan
    Contributor

    Like Raymond, I have issues with the idea of implicitly reading the
    headers in __init__, but would be fine with the idea of a separate
    method in 2.7/3.1.

    As far as working around the absence of such a method goes, I personally
    use itertools.chain if I happen to need the headers before I start
    iterating:

    r = csv.DictReader(open('test.csv'))
    first = next(r)
    # Do something with r.fieldnames
    for row in chain(first, r):
        # Do something with each row

    @smontanaro
    Contributor

    The consensus seems to be that __init__ shouldn't "magically" read the
    header row, even though by not specifying a fieldnames arg you are
    telling the DictReader exactly where to find the column headers. Given
    that, my argument is that we not make any changes (no getheaders
    method, etc.) since at least a couple of different ways to do what you
    want have already been mentioned.

    @mishok13
    Mannequin Author

    mishok13 mannequin commented Jul 28, 2008

    I'm ok with that. :)
    Looks like you can close this one as "won't fix".

    @smontanaro
    Contributor

    Done...

    @gvanrossum
    Member

    I know this has been closed, but perhaps the fieldnames attribute could
    be made into a property that reads the first line of the file if it
    hasn't been read yet?

    @smontanaro
    Contributor

    Guido> I know this has been closed, but perhaps the fieldnames attribute
    Guido> could be made into a property that reads the first line of the
    Guido> file if it hasn't been read yet?

    It's a nice thought. I tried the straightforward implementation in my
    sandbox and one of the more obscure tests failed. I have yet to look into
    the cause.

    Skip

    @ncoghlan
    Contributor

    Re-opened for consideration of GvR's suggestion.

    @ncoghlan ncoghlan reopened this Jul 30, 2008
    @ncoghlan
    Contributor

    I personally like the idea of making fieldnames a property. While
    having a mere attribute read cause disk I/O is slightly questionable,
    it seems like a better option than returning a misleading value for
    that property, and also a better option than reading the first line of
    the file in __init__.

    Hopefully Skip can track down that obscure failure and this change can
    be made at some point in the future.
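
The property idea can be sketched roughly as follows. This is not the attached patch or the stdlib implementation: `LazyHeaderDictReader` is a hypothetical, simplified stand-in that omits DictReader's restkey/restval and blank-row handling, and exists only to show a header being read on first access to `fieldnames` rather than in `__init__`.

```python
import csv
import io


class LazyHeaderDictReader:
    """Hypothetical, simplified reader with a lazy ``fieldnames`` property."""

    def __init__(self, f, fieldnames=None, **kwds):
        # Setup only: nothing is read from ``f`` here.
        self._fieldnames = fieldnames
        self.reader = csv.reader(f, **kwds)

    @property
    def fieldnames(self):
        # Read the header row the first time the attribute is accessed.
        if self._fieldnames is None:
            try:
                self._fieldnames = next(self.reader)
            except StopIteration:
                pass  # empty input: fieldnames stays None
        return self._fieldnames

    def __iter__(self):
        return self

    def __next__(self):
        names = self.fieldnames  # makes sure the header is consumed first
        row = next(self.reader)  # raises StopIteration at end of input
        return dict(zip(names, row))


r = LazyHeaderDictReader(io.StringIO("foo,bar,baz\n42,27,13\n"))
names_before_iteration = r.fieldnames
first_row = next(r)
```

Keeping the read inside the property means callers who pass `fieldnames` explicitly never trigger an implicit header read.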

    @mishok13
    Mannequin Author

    mishok13 mannequin commented Jul 31, 2008

    I like the idea of the fieldnames attribute being a property, so I've
    uploaded patches that implement it as such.
    Both patches ran through make test without problems.

    @smontanaro
    Contributor

    Nick,

    Working with Andrii's patch I'm trying to add a couple test cases to
    make sure the methods you and I both demonstrated still work. Mine is
    no problem, but I must be doing something wrong trying to use/adapt your
    example. I freely admit I am not an itertools user, but I can't get
    your example to work as written:

    >>> r = csv.DictReader(open("foo.csv", "rb"))
    >>> r.fieldnames
    ['f1', 'f2', 'f3']
    >>> r.next()
    {'f1': '1', 'f2': '2', 'f3': 'abc'}
    >>> r = csv.DictReader(open("foo.csv", "rb"))
    >>> first = next(r)
    >>> first
    {'f1': '1', 'f2': '2', 'f3': 'abc'}
    >>> import itertools
    >>> for x in itertools.chain(first, r):
    ...   print x
    ... 
    f1
    f2
    f3

    If I place first in a list it works:

    >>> r = csv.DictReader(open("foo.csv", "rb"))
    >>> first = next(r)
    >>> for x in itertools.chain([first], r):
    ...   print x
    ... 
    {'f1': '1', 'f2': '2', 'f3': 'abc'}

    That makes intuitive sense to me. Is that what you intended?

    S
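
The difference between the two calls can be reproduced in Python 3 with a short sketch. The sample data is made up and held in an `io.StringIO` rather than "foo.csv". `itertools.chain` iterates over each of its arguments, and iterating a dict yields its keys, which is why passing `first` directly produces field names instead of the row.

```python
import csv
import io
from itertools import chain

# Hypothetical sample data standing in for "foo.csv".
r = csv.DictReader(io.StringIO("f1,f2,f3\n1,2,abc\n4,5,def\n"))
first = next(r)

# chain() iterates each argument; iterating a dict yields its keys.
keys_only = list(chain(first, []))

# Wrapping the row in a list makes chain() yield the row itself,
# followed by the rows still left in the reader.
rows = list(chain([first], r))
```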

    @smontanaro
    Contributor

    I added a comment to Andrii's patch and added simple test cases
    which check to make sure the way Nick and I currently use the
    DictReader class (or at least envision using it) still works.

    @smontanaro
    Contributor

    Andrii, If my view of the Python 3.0 development process is correct and
    this change makes it into the 2.6 code, one of the 3.0 developers will
    merge to the py3k branch.

    @mishok13
    Mannequin Author

    mishok13 mannequin commented Aug 1, 2008

    Oh, so this is what the process looks like...
    /me removes patches
    I've uploaded both py3k and trunk patches just because I'm fixing things
    the other way round -- first I write a patch for 3.0 and only after that
    I backport it to 2.6. Stupid me. :)

    @ncoghlan
    Contributor

    ncoghlan commented Aug 1, 2008

    Skip's patch looks good to me (as Skip discovered, I left out the
    necessary step of putting the first row back into an iterable before
    invoking chain in my example code).

    @warsaw
    Member

    warsaw commented Aug 8, 2008

    Making an existing attribute a property is a nice, API-neutral way to
    handle this. Let's call the inconsistency a bug and this a bug fix
    <wink> so that it's fine to add to 2.6 and 3.0 at this point.

    @smontanaro
    Contributor

    Committed as revision 65605.
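
On an interpreter that includes this change, the session from the original report should behave as requested. A minimal check, assuming a current Python 3 and using made-up in-memory data in place of 'test.csv':

```python
import csv
import io

# Hypothetical sample data standing in for 'test.csv'.
r = csv.DictReader(io.StringIO("foo,bar,baz\n42,27,13\n"))

# Accessing the attribute reads the header row if it has not been read yet.
names = r.fieldnames

# The first call to next() then returns the first data row, not the header.
first = next(r)
```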

    @smontanaro smontanaro self-assigned this Aug 8, 2008
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022