Add comment char parameter to to_csv method #27637
Comments
I'm not sure we would want to do this automatically. As you say,
I'm not sure if we would want to deviate from the stdlib here. cc @gfyoung if you have thoughts.
To clarify, when you say "nan", I think you mean the "#" doesn't show up, right? Have you considered escaping those "#"? Then you can pass in an
As soon as a "#" character is encountered, the rest of the line is ignored because it's interpreted as comment. But if the # is in a quoted string, this doesn't happen.
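The difference between quoted and unquoted fields can be checked with a small experiment. This is only a sketch: the two-column data below is made up for illustration and is not taken from the issue.

```python
import io

import pandas as pd

# Hypothetical sample data: the same rows, once with "a#" unquoted and once quoted.
unquoted_csv = "col1,col2\na#,1\nb,2\n"
quoted_csv = 'col1,col2\n"a#",1\nb,2\n'

# With comment="#", everything after an unquoted "#" is treated as a comment.
df_unquoted = pd.read_csv(io.StringIO(unquoted_csv), comment="#")

# Inside a quoted field the "#" is kept as a literal character.
df_quoted = pd.read_csv(io.StringIO(quoted_csv), comment="#")

print(df_unquoted)
print(df_quoted)
```

In the unquoted case the rest of the line after "#" is dropped, so the remaining columns of that row have nothing to parse; in the quoted case the row is read in full.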
That's actually a good solution as well. Except that you have to pre-process the columns beforehand. Might go for this one though. 👍 I have to admit this can be a feature request for quite a specific use case. Can imagine it's perhaps not worth the work.
IMO, this isn't generally useful enough to warrant a new keyword.
@mthaak : I unfortunately would have to agree here with @TomAugspurger. That being said: give the escaping attempt a shot. If it works for you, we could consider adding this to our cookbook, as it's not entirely unreasonable that people would want to do this.
In case you would like to add it to the cookbook: escaping the comment characters +
Code Sample
results in my.csv:
(a# is not quoted)
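The original sample is not preserved on this page. As a rough stand-in, the sketch below assumes a hypothetical two-column frame and compares the default quoting of to_csv with quoting=csv.QUOTE_NONNUMERIC (the constant whose value is 2), which quotes every non-numeric field.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in data; not the frame used in the issue.
df = pd.DataFrame({"name": ["a#", "b"], "value": [1, 2]})

# Default quoting (csv.QUOTE_MINIMAL): "#" is not a special character for the
# writer, so the field a# is written without quotes.
default_text = df.to_csv(index=False)

# QUOTE_NONNUMERIC quotes all non-numeric fields, including a#.
nonnumeric_text = df.to_csv(index=False, quoting=csv.QUOTE_NONNUMERIC)

# Reading the fully quoted file back with comment="#" keeps the literal string.
roundtrip = pd.read_csv(io.StringIO(nonnumeric_text), comment="#")

print(default_text)
print(nonnumeric_text)
print(roundtrip)
```

The cost of this approach is that every string field is quoted, not only the ones that contain the comment character.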
Problem description
We would like to use the "#" character as a comment indicator, so that lines starting with that character are automatically ignored. However, when fields contain the "#" character and pd.read_csv("my.csv", comment="#") is used to read the CSV, those fields are read as nan. When those fields are quoted, they are read as literal strings (which is the behavior we want). So we want to_csv to automatically quote fields containing the comment character "#". (The workaround we have now is to set quoting=2 (non-numeric), so that all strings are quoted by default.)
Expected Output
results in my.csv:
A more universal solution could be to allow passing a list of characters to quote for the quoting parameter, e.g. df.to_csv("my.csv", quoting=['"', '#', ","]).
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-25-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.24.2
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.11
numpy: 1.16.4
scipy: 1.3.0
pyarrow: None
xarray: None
IPython: 7.6.0
sphinx: 2.1.1
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: 0.9.3
psycopg2: 2.8.3 (dt dec pq3 ext lo64)
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None