Filter out single and double quotes from CSV #66

Open
mfhepp opened this issue Mar 13, 2022 · 1 comment

mfhepp commented Mar 13, 2022

Sometimes, it is unavoidable that CSV files contain single or double quotes for string values.

Pantable seems to include them in the resulting tables, which does not look very nice:

prefix, city_or_region, comments, Status
'030', 'Berlin', 'My comment', True
'069', 'Frankfurt', , False
'089', 'Munich', 'Another comment', True

[Screenshot, 2022-03-13: the table as rendered by Pantable]

It would be nice to be able to tell Pantable to remove single and double quotes from field values.

Contributor

alerque commented Mar 15, 2022

I don't think those are necessarily even valid CSV files; mixing single quotes and no quoting in a single file is just ... bogus. The spec (RFC 4180) only allows double quotes. I'm not saying those things (and other problems, such as tab/space issues or delimiters other than commas) don't happen in the wild. I just suggest you normalize your CSV data with other, more robust tooling (I recommend the csvkit tools) as soon as you receive it, before you hand it off to tools like this one that assume properly formed data. Downstream tools should not each be responsible for parsing every broken format that might exist in the wild.
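
As a rough illustration of that normalization step, here is a minimal sketch that uses Python's standard `csv` module rather than csvkit. It assumes the input is comma-delimited, uses `'` as its quote character, and may have a space after each comma, as in the sample in the issue. The function name `normalize_single_quoted_csv` is hypothetical. The output is standard CSV, which quotes fields with double quotes only when they need it.

```python
import csv
import io


def normalize_single_quoted_csv(text: str) -> str:
    """Re-emit single-quoted CSV as standard, double-quoted CSV (sketch).

    Assumes ',' as the delimiter and "'" as the quote character.
    """
    # skipinitialspace=True drops the blank after each comma, so the
    # following "'" is recognized as an opening quote character.
    reader = csv.reader(io.StringIO(text), quotechar="'", skipinitialspace=True)
    out = io.StringIO()
    # The default writer uses '"' with QUOTE_MINIMAL: a field is quoted only
    # if it contains the delimiter, a quote character, or a line break.
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


sample = (
    "prefix, city_or_region, comments, Status\n"
    "'030', 'Berlin', 'My comment', True\n"
    "'069', 'Frankfurt', , False\n"
    "'089', 'Munich', 'Another comment', True\n"
)
print(normalize_single_quoted_csv(sample))
```

For the sample, the quotes disappear and the empty `comments` field stays empty (`069,Frankfurt,,False`). Values such as `030` remain plain text, so their leading zeros survive. The normalized file can then be handed to Pantable.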
