Add named tuple reader to CSV module #46143
Comments
Here's a proof-of-concept patch. If approved, will change from The idea corresponds to what is currently done by the dict reader but A writer is not needed because named tuples can be fed into the |
Barry, any thoughts on this? |
I'd personally be kind of surprised if Barry had any thoughts on this. Skip |
An implementation of a namedtuple reader and writer. Created a writer for the case where user would like to specify e.g.
    Nt = namedtuple('LessFields', 'f1 f3')
    nt = Nt(f1='one', f3=2)
    mywriter.writerow(nt) # writes one,missing,2
any thoughts on case where defined fieldname has a leading e.g. Leading underscores may be present in an unsighted csv file, Cheers, |
Consider providing a hook to a function that converts non-conforming
    class NamedTupleReader:
        def __init__(self, f, fieldnames=None, restkey=None, restval=None,
                     dialect="excel", fieldnamer=None, *args, **kwds):
            . . .
I'm going to either post a recipe to do the renaming or provide a static
    >>> renamer(['abc', 'def', '1', '_hidden', 'abc', 'p', 'abc'])
    ['abc', 'x_def', 'x_1', 'x_hidden', 'x_abc', 'p', 'x1_abc'] |
In r69480, named tuples gained the ability to automatically rename |
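For readers unfamiliar with the renaming behaviour mentioned here, the following is a small sketch of `collections.namedtuple` with `rename=True`. The header list is an illustrative assumption modelled on the `renamer` example above. Note that the standard library replaces offending names with positional ones (`_1`, `_2`, ...) rather than the `x_` prefixes produced by that proposed recipe.

```python
from collections import namedtuple

# Illustrative header (an assumption): it contains a keyword ('def'),
# a non-identifier ('1'), a leading underscore, and a duplicate ('abc').
header = ['abc', 'def', '1', '_hidden', 'abc', 'p']

# rename=True swaps each invalid or duplicate name for '_<position>'.
Row = namedtuple('Row', header, rename=True)

print(Row._fields)
```

Valid, unique names such as `abc` and `p` are kept unchanged; only the problematic positions are renamed.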
Updated NamedTupleReader to give a rename=False keyword argument. Two new tests for the rename keyword. Cheers, |
I am totally new to Python dev. I reinvented a NamedTupleReader Consider doing some filtering on a csv file, like so.
    sample_data = [
        'title,latitude,longitude',
        'OHO Ofner & Hammecke Reinigungsgesellschaft mbH,48.128265,11.610848',
        'Kitchen Kaboodle,45.544241,-122.715728',
        'Walgreens,28.339727,-81.596367',
        'Gurnigel Pass,46.731944,7.447778'
    ]
    def filter_with_dict_reader_writer():
        accepted_rows = []
        for row in csv.DictReader(sample_data):
            if float(row['latitude']) > 0.0 and float(row['longitude']) > 0.0:
                accepted_rows.append(row)
        field_names = csv.reader(sample_data).next()
        output_writer = csv.DictWriter(open('accepted_by_dict.csv', 'w'),
                                       field_names)
        output_writer.writerow(dict(zip(field_names, field_names)))
        output_writer.writerows(accepted_rows)
You have to work so hard to maintain the headers when you write the file NamedTupleReader and NamedTupleWriter should be inverses. This means
    def filter_with_named_tuple_reader_writer():
        accepted_rows = []
        for row in csv.NamedTupleReader(sample_data):
            if float(row.latitude) > 0.0 and float(row.longitude) > 0.0:
                accepted_rows.append(row)
        output_writer = csv.NamedTupleWriter(
            open('accepted_by_named_tuple.csv', 'w'))
        output_writer.writerows(accepted_rows)
I patched on top of the existing NamedTupleWriter patch adding support |
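For comparison, the same filtering task can be sketched with only the stock `csv.reader` and `csv.writer` plus `collections.namedtuple`, written for Python 3. This is not the proposed `NamedTupleReader`/`NamedTupleWriter` API. The function name `filter_rows`, the shortened sample data, and the use of an in-memory buffer are assumptions made for illustration. The header is written explicitly from `_fields`, so no implicit header logic is involved.

```python
import csv
import io
from collections import namedtuple

# Shortened copy of the sample data used in the comment above.
sample_data = [
    'title,latitude,longitude',
    'Kitchen Kaboodle,45.544241,-122.715728',
    'Gurnigel Pass,46.731944,7.447778',
]

def filter_rows(lines, out):
    """Copy rows with positive latitude and longitude from lines to out."""
    reader = csv.reader(lines)
    Row = namedtuple('Row', next(reader))   # header row supplies the field names
    writer = csv.writer(out)
    writer.writerow(Row._fields)            # header written explicitly, exactly once
    for row in map(Row._make, reader):
        if float(row.latitude) > 0.0 and float(row.longitude) > 0.0:
            writer.writerow(row)            # a namedtuple is an ordinary sequence

out = io.StringIO()
filter_rows(sample_data, out)
print(out.getvalue())
```

Because named tuples are plain tuples, `csv.writer` needs no special support to write them; only the header row requires a decision about who writes it.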
My previous patch could write the header twice. But I am not sure about |
The two latest patches (ntreader4.diff and Barry or Skip, is this something you want in your module? |
Raymond> Barry or Skip, is this something you want in your module? Sorry, I haven't really looked at this ticket other than to notice its Skip |
I think it would be useful to have. |
Hrm... I replied twice by email. Only one comment appears to have
You're assuming that one instance of these classes will read or write an |
Let me be more explicit. I don't know how it implements it, but I think
Skip |
Skip> Let me be more explicit. I don't know how it implements it, but I
I agree with Skip, we mustn't have a 'wroteheader' flag internal to the Currently to write a 'header' row with a csv.writer you could (for I would not like to see another flag added to the initialisation process Cheers, |
I want to make sure I understand. Am I correct in believing that Skip I agree that we should not unconditionally write headers, but I think I believe the implicit header writing is very elegant, and the only It also seems wrong to require the construction of "header" namedtuple
    >>> Point._make(Point._fields)
    Point(x='x', y='y')
To me, that just looks weird and non-obvious. That Point instance |
Rob> I agree that we should not unconditionally write headers, but I I don't think you should write them by default. I've worked with lots of Skip |
More concretely, I don't think this is so onerous:
    names = ["col1", "col2", "color"]
    writer = csv.DictWriter(open("f.csv", "wb"), fieldnames=names, ...)
    writer.writerow(dict(zip(names, names)))
    ...
or
    f = open("f.csv", "rb")
    names = csv.reader(f).next()
    reader = csv.DictReader(f, fieldnames=names, ...)
    ...
Skip |
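As a point of reference for the explicit-header approach, later releases of the csv module (2.7 and 3.2 onward, to the best of my knowledge) added `DictWriter.writeheader()`, which does the `dict(zip(names, names))` step on request without any hidden state. A minimal Python 3 sketch, using an in-memory buffer in place of `f.csv` and an invented data row:

```python
import csv
import io

names = ["col1", "col2", "color"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=names)
writer.writeheader()   # same effect as writer.writerow(dict(zip(names, names)))
writer.writerow({"col1": 1, "col2": 2, "color": "red"})   # illustrative row

buf.seek(0)
reader = csv.DictReader(buf)   # fieldnames omitted: the first row is taken as the header
rows = list(reader)
print(reader.fieldnames, rows)
```

The header is still just a row that the caller chooses to write, which matches the position argued in this comment.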
I did a search on Google code for the DictReader constructor. I On Thu, Feb 26, 2009 at 8:00 PM, Skip Montanaro <report@bugs.python.org> wrote:
|
My experience has been the same as Skip's. |
Rob> I still don't like the lack of symmetry of supporting implicit A header is nothing more than a row in the CSV file with special Skip |
Added a patch against the py3k branch. In csv.rst, removed the reference to reader.next() as a public method. |
Jervis> in csv.rst removed reference to reader.next() as a public method. Because? I've not seen any discussion in this issue or in any other forums Skip |
I don't understand why NamedTupleReader requires the fieldnames array |
I don't know how NamedTuple objects work, but in many situations you |
I retract my previous comment. I don't use the DictReader the way it |
Jervis> in csv.rst removed reference to reader.next() as a public method. Skip> Because? I've not seen any discussion in this issue or in any I agree, this should be applied separately. |
Antoine> I don't understand why NamedTupleReader requires the The NamedTupleReader does take the namedtuple class as the fieldnames Given the confusion, I accept that the documentation needs to be improved. The NamedTupleReader and Writer were created to follow as closely as |
Ok, I got misled by the documentation ("The contents of *fieldnames* are |
Updated version of docs for 2.7 and 3k. |
See also this python-ideas thread: http://mail.python.org/pipermail/python-ideas/2010-April/006991.html |
Type conversion is a whole 'nuther kettle of fish. This particular thread is long and complex enough that it shouldn't be made more complex. |
I suggest that this is closed unless anyone shows an active interest in it. |
Closing as no response to msg110598. |
Re-opening because we ought to do something along these lines at some point. The DictReader and DictWriter are inadequate for preserving order and they are unnecessarily memory intensive (one dict per record). FWIW, the non-conforming field name problem has already been solved by recent improvements to collections.namedtuple using rename=True. |
Unassigning, this needs fresh thought and a fresh patch from someone who can devote a little deep thinking on how to solve this problem cleanly. In the meantime, it is no problem to simply cast the CSV tuples into named tuples. |
Here's the class I have been using for reading namedtuples from CSV files:
    from collections import namedtuple
    from itertools import imap
    import csv
    class CsvNamedTupleReader(object):
        __slots__ = ('_r', 'row', 'fieldnames')
        def __init__(self, *args, **kwargs):
            self._r = csv.reader(*args, **kwargs)
            self.row = namedtuple("row", self._r.next())
            self.fieldnames = self.row._fields
        def __iter__(self):
            #FIXME: how about this? return imap(self.row._make, self._r[:len(self.fieldnames)]
            return imap(self.row._make, self._r)
        dialect = property(lambda self: self._r.dialect)
        line_num = property(lambda self: self._r.line_num)
This class wraps csv.reader since it doesn't seem to be possible to inherit from it. It uses itertools.imap to iterate over the rows output by csv.reader and convert them to the namedtuple class. One thing that needs fixing (marked with FIXME above) is what to do in the case of a row which has more fields than the header row. The simplest solution is simply to truncate such a row, but perhaps more options are needed, similar to those offered by DictReader. |
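One possible way to handle the FIXME case is sketched below for Python 3. It is an assumption about how truncation could work, not part of the class above: the generator name `read_namedtuples` and the `restval` parameter (borrowed in spirit from DictReader) are hypothetical. Rows longer than the header are truncated and shorter rows are padded.

```python
import csv
from collections import namedtuple

def read_namedtuples(lines, restval=None, **kwargs):
    """Yield one namedtuple per data row; the first row supplies the field names."""
    reader = csv.reader(lines, **kwargs)
    Row = namedtuple('Row', next(reader), rename=True)
    width = len(Row._fields)
    for values in reader:
        # Drop surplus fields, then pad short rows with restval.
        # A negative repeat count yields an empty list, so full rows are unchanged.
        values = values[:width] + [restval] * (width - len(values))
        yield Row._make(values)

rows = list(read_namedtuples(['a,b', '1,2,3', '4']))
print(rows)
```

Silently discarding surplus fields may hide malformed input, so a real implementation might prefer to raise an error or collect the extras, as DictReader does with `restkey`.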
As my contribution during the sprints at PyCon 2015, I've tweaked Jervis's patch a little and updated the tests/docs to work with Python 3.5. My only real change was placing the basic reader object inside a generator expression that filters out empty lines. Being partial to functional programming I find this removes some of the code clutter in __next__(), letting that method focus on turning rows into tuples. Hopefully this will rekindle the discussion! |
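The generator-expression idea described here can be sketched roughly as follows. This is not the code from the patch: the class name `NamedTupleRows` and its attributes are hypothetical, and it assumes the first non-empty row is the header.

```python
import csv
from collections import namedtuple

class NamedTupleRows:
    """Hypothetical iterator over CSV rows as named tuples, skipping empty lines."""

    def __init__(self, lines, **kwargs):
        reader = csv.reader(lines, **kwargs)
        # csv.reader yields [] for a blank line; the generator filters those out
        # so that __next__ only has to turn rows into tuples.
        self._rows = (row for row in reader if row)
        self._row_type = namedtuple('Row', next(self._rows))

    def __iter__(self):
        return self

    def __next__(self):
        return self._row_type._make(next(self._rows))

rows = list(NamedTupleRows(['x,y', '', '1,2', '', '3,4']))
print(rows)
```

In this sketch an input with no non-empty rows makes the constructor raise StopIteration, which a fuller implementation would probably want to handle explicitly.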
Skip or Barry, do you want to look at this? |
Friendly reminder that this exists. I know everyone's busy and this is marked as low-priority, but I'm gonna keep bumping this till we add a solution :) |
I looked at this six years ago. I still haven't found a situation where I pined for a NamedTupleReader. That said, I have no objection to committing it if others, more well-versed in current Python code and NamedTuples than I, give it a pass. Note that I added a couple comments to the csv.py diff, but nobody either updated the code or explained why I was out in the weeds in my comments. |
FWIW, I relinquished my check-in privileges quite a while ago. This should S |