Proposal to implement comment rows in csv module #42112
Comments
Sometimes csv files contain comment rows, for …

    csv_reader = csv.reader(fp)
    for row in csv_reader:
        if row[0][0] != '#':  # assuming no blank lines
            print row

I propose adding a "commentchar" parameter to the csv …

    csv_reader = csv.reader(fp, commentchar='#')
    for row in csv_reader:
        print row

This requires only relatively minor changes to the … Note that that implementation adds SKIPPED_RECORD as a … It should be irrelevant, but this has been developed on …
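For comparison, the row-level workaround above can be written for Python 3 so that it no longer assumes there are no blank lines or empty first fields. This is a minimal sketch, not part of the csv module: `skip_comment_rows` is a hypothetical helper name, and it filters rows after parsing rather than changing the parser the way the proposed `commentchar` parameter would.

```python
import csv
import io


def skip_comment_rows(reader, commentchar="#"):
    """Yield parsed rows, dropping those whose first field starts with commentchar.

    Unlike ``row[0][0] != '#'``, this does not fail on blank lines (which
    csv.reader yields as []) or on rows whose first field is empty.
    """
    for row in reader:
        if row and row[0].startswith(commentchar):
            continue
        yield row


sample = "# comment\na,b\n\n,x\n1,2\n"
for row in skip_comment_rows(csv.reader(io.StringIO(sample))):
    print(row)
```

Because the filter runs after parsing, comment lines are still tokenized by the reader; only rows whose first field starts with the comment character are dropped.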
I'm not inclined to clutter the C code with further complications. Why …
Sorry - I haven't been keeping up with the existing … Basically, I noticed that the csv module has a bias towards … My submission was intended in the "batteries included" …

True, I could do any of those things, but it would be … In any case, if your vote goes from your apparent -0 to -1, …

Cheers,
Something else just occurred to me. What about writing csv files with …
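csv.writer has no notion of comments, so one way to write a commented file with the current module is to emit the comment lines directly on the underlying file object and hand only the data rows to the writer. A minimal Python 3 sketch; `write_with_comments` and its parameters are hypothetical:

```python
import csv
import io


def write_with_comments(f, rows, comments=(), commentchar="#"):
    """Write comment lines, then CSV rows, to the text file object f."""
    writer = csv.writer(f)
    # Reuse the writer's line terminator ("\r\n" for the default excel
    # dialect) so comment lines and data rows end the same way.
    terminator = writer.dialect.lineterminator
    for comment in comments:
        f.write(commentchar + " " + comment + terminator)
    writer.writerows(rows)


buf = io.StringIO()
write_with_comments(buf, [["event", "time"], ["Rachael Sage", "8:00pm"]],
                    comments=["generated from the events list"])
print(buf.getvalue())
```

One caveat with this design: with the default QUOTE_MINIMAL setting, a data field that happens to begin with the comment character is not quoted, so a line-level comment filter would later misread that row as a comment.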
Iain - There was some positive response to your patch from …

Skip
Here are the documentation and test diffs. I'm glad to hear of the positive feedback. I couldn't find … On a related point, I noticed that the csv documentation is …

Thanks,
Sorry, I'm coming back to this after a long hiatus... I'm still not …

    #!/usr/bin/env python

    import csv
    import StringIO

    class CommentedFile:
        def __init__(self, f, commentstring="#"):
            self.f = f
            self.commentstring = commentstring

        def next(self):
            # Skip every line that begins with the comment string.
            line = self.f.next()
            while line.startswith(self.commentstring):
                line = self.f.next()
            return line

        def __iter__(self):
            return self

    f = StringIO.StringIO('''\
    "event","performers","start","end","time"
    # Rachel Sage
    "","Rachael Sage","2008-01-03","2008-01-03","8:00pm"
    # Others
    "","Tung-N-GRoeVE","2008-01-16","2008-01-16","9:30pm-2:00am"
    "","Bossa Nova Beatniks","2007-11-11","2007-11-11","11:11pm"
    "","Special Consensus","2006-10-06","2006-10-06",""
    ''')

    for row in csv.reader(CommentedFile(f)):
        print row

The output of the above is as expected:

    ['event', 'performers', 'start', 'end', 'time']
    …

This has the added benefit that comment lines aren't restricted to single …

Skip
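The CommentedFile recipe above is written for Python 2 (`next()`, the `StringIO` module, the `print` statement). A sketch of the same idea for Python 3, assuming only that the wrapped object is an iterator of text lines (an open file or `io.StringIO`), renames `next` to `__next__` and uses `io.StringIO`:

```python
import csv
import io


class CommentedFile:
    """Wrap an iterator of lines, skipping lines that begin with commentstring."""

    def __init__(self, f, commentstring="#"):
        self.f = f
        self.commentstring = commentstring

    def __next__(self):
        # next() raises StopIteration at end of input, which ends csv.reader.
        line = next(self.f)
        while line.startswith(self.commentstring):
            line = next(self.f)
        return line

    def __iter__(self):
        return self


f = io.StringIO('''\
"event","performers","start","end","time"
# Rachel Sage
"","Rachael Sage","2008-01-03","2008-01-03","8:00pm"
# Others
"","Tung-N-GRoeVE","2008-01-16","2008-01-16","9:30pm-2:00am"
"","Bossa Nova Beatniks","2007-11-11","2007-11-11","11:11pm"
"","Special Consensus","2006-10-06","2006-10-06",""
''')

rows = list(csv.reader(CommentedFile(f)))
for row in rows:
    print(row)
```

In Python 3 the same filter can also be a generator expression, e.g. `csv.reader(line for line in f if not line.startswith('#'))`; the class form is mainly useful when extra state needs to live on the wrapper.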
Assigning to Andrew (as the primary C lib author). Andrew, please …
I think it's a reasonable enough request - I've certainly had to … Some thoughts: …

Skip - are you happy making the changes, or should I dust off my …
-1 on the change. Comments in CSV files are not a common use case. Also, it is already trivial to implement filtering using str.startswith …
Comment lines are a *very* common case in scientific and statistical data.

+1 for the change.
Comment lines in csv data may be common in some areas, but they are not part of any standard, and they are not the only possible extension to csv files (for example: ignoring empty lines, or a terminal \ for line continuation...).

Currently all members of Dialect deal with the format of the records, not with the extraction of records from the file (lineterminator is used only when writing).

The "CommentedFile" filter above is a good recipe for all cases, and is easy to use. I recommend closing this issue.
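To illustrate the point that each of these extensions can live outside Dialect as its own line filter, here is a Python 3 sketch of three small generators chained in front of csv.reader. The function names are hypothetical, and the backslash-continuation rule (a line ending in `\` is joined to the next one) is an assumed convention, since no standard defines it:

```python
import csv
import io


def drop_comments(lines, prefix="#"):
    # Skip physical lines that begin with the comment prefix.
    return (line for line in lines if not line.startswith(prefix))


def drop_blank(lines):
    # Skip lines that contain only whitespace.
    return (line for line in lines if line.strip())


def join_continuations(lines):
    # Join any line ending in a backslash with the line that follows it.
    pending = ""
    for line in lines:
        body = line.rstrip("\r\n")
        if body.endswith("\\"):
            pending += body[:-1]
            continue
        yield pending + line
        pending = ""
    if pending:
        # Input ended on a continuation line; emit what was collected.
        yield pending


sample = "# exported data\na,b\n\n1,\\\n2\n"
lines = io.StringIO(sample)
rows = list(csv.reader(join_continuations(drop_blank(drop_comments(lines)))))
print(rows)  # [['a', 'b'], ['1', '2']]
```

Because these filters work on physical lines, they share the limitation raised later in this thread for multi-line quoted fields.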
Hello,

Shouldn't the comment char definition belong in a dialect class? The reader would still have to be modified to skip these lines, but having this char in the dialect would not require a change to the csv.reader signature.

Kind regards
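A sketch of that idea, assuming a hypothetical `commentchar` attribute on a Dialect subclass. csv.reader only reads the dialect attributes it knows about, so the extra attribute is simply ignored by the reader itself; in this sketch a small wrapper (also hypothetical) reads it and pre-filters lines, rather than modifying the reader:

```python
import csv
import io


class CommentedExcel(csv.excel):
    # Hypothetical extra attribute; csv.reader does not look at it.
    commentchar = "#"


def dialect_reader(f, dialect=csv.excel, **fmtparams):
    """Return a csv.reader, dropping lines that start with dialect.commentchar.

    Dialects without a commentchar attribute behave exactly as before.
    """
    commentchar = getattr(dialect, "commentchar", None)
    if commentchar:
        f = (line for line in f if not line.startswith(commentchar))
    return csv.reader(f, dialect=dialect, **fmtparams)


sample = "# exported data\nevent,time\nRachael Sage,8:00pm\n"
for row in dialect_reader(io.StringIO(sample), dialect=CommentedExcel):
    print(row)
```

Keeping the attribute on the dialect leaves the `csv.reader` signature unchanged, though here the skipping still happens outside the parser.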
I'm still -1 on the feature - not standard enough, and easy to implement outside the csv module.
Amaury> Comment lines in csv data may be common in some areas, but they …

Or different people's notion of how to comment strings. Precisely because … Trying to accommodate the myriad ways people might decide to …

Skip
Agreed with Skip, Raymond and Amaury.
Antoine> Since the csv module returns you an iterator, it's easy enough …

I prefer to do this sort of stuff as a pre-processing step, so I generally …

Skip
Here is another -1 for this proposed feature. Having comments in the csv fields and providing a way to deal with them would complicate matters more than required. Several ways of accomplishing it have been suggested here. Like the others, I too recommend closing it. (It is assigned to andrewmcnamara, so I guess he would close it.)
Okay, while I am sympathetic to the points raised by the people asking for this enhancement, I'm persuaded to reject it by the arguments that the potential benefit is outweighed by the increase in complexity (code and documentation). While the attached patch from Iain requires relatively minor and innocuous changes to the state machine, it only satisfies a limited subset of users, and the effect of complexity is geometric. In other words, to understand the state machine, you need to understand the relation of each state to every other state, etc.

As to why the core of the module is implemented in C, the sort of stateful parsing done by the module is one area where python does not excel (Ha!). Stripping comments, on the other hand, CAN be done with reasonable efficiency from python (and considerably more flexibility).

Also, it has been my experience that CSV files that use comments typically have other non-standard features as well, requiring additional pre- and post-processing in any case.
Note that there is one case that cannot easily be addressed via pre-processing: where the comment character coincidentally appears at the start of a line within a multi-line quoted field. For example:

    # This is a comment
    …

What this should produce is debatable, but it would be hard to make it produce:

    ["1", "2", "This is field^M#3"]

without implementing it within the parser.
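The following Python 3 sketch makes the failure concrete for the default excel dialect, using a newline where the example above shows ^M. `naive_comment_filter` and `quote_aware_comment_filter` are hypothetical names. The second one is only a heuristic: it assumes quote characters are balanced inside quoted fields (doubled `""` escapes are fine), that no escapechar is used, and that no stray quote characters appear inside unquoted fields. Those are exactly the details the parser already tracks, which is why a pre-processor can only approximate it:

```python
import csv
import io


def naive_comment_filter(lines, commentchar="#"):
    # Line-level filter: drops every physical line starting with commentchar,
    # even one that continues a multi-line quoted field.
    return (line for line in lines if not line.startswith(commentchar))


def quote_aware_comment_filter(lines, commentchar="#", quotechar='"'):
    # Heuristic: an odd number of quote characters on a line toggles whether
    # we are inside a quoted field. A line is treated as a comment only when
    # it starts outside any quoted field.
    inside_quotes = False
    for line in lines:
        if not inside_quotes and line.startswith(commentchar):
            continue
        if line.count(quotechar) % 2 == 1:
            inside_quotes = not inside_quotes
        yield line


data = '# This is a comment\n1,2,"This is field\n#3"\n'

naive = list(csv.reader(naive_comment_filter(io.StringIO(data))))
aware = list(csv.reader(quote_aware_comment_filter(io.StringIO(data))))
print(naive)  # the "#3" continuation line has been lost
print(aware)
```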