Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent CRLF behaviour between Windows and Linux #1362

Closed
daviewales opened this issue May 2, 2022 · 4 comments
Closed

Inconsistent CRLF behaviour between Windows and Linux #1362

daviewales opened this issue May 2, 2022 · 4 comments

Comments

@daviewales
Copy link
Contributor

Small description

If I open a CSV with LF line endings in VisiData on Windows, make a change, then save the file, I get CRCRLF line endings.
Note that this is not just CRLF line endings, which is what you get on Linux (see #1076). Note that there are somehow two CR characters before the LF!

Expected result

The line endings should either be LF (preserve original) or CRLF (Python and VisiData default for CSV writer on Linux).
The line endings should not be CRCRLF...

Actual result with screenshot
If you get an unexpected error, please include the full stack trace that you get with Ctrl-E.

Steps to reproduce with sample data and a .vd

Original file (note the LF (0a) line ending:
lf.csv

$ xxd lf.csv
00000000: 612c 620a 312c 320a                      a,b.1,2.

New file (note the CRCRLF (0d 0d 0a) line ending!)
lf.csv

xxd lf.csv
00000000: 612c 620d 0d0a 332c 340d 0d0a            a,b...3,4...

CommandLog (zipped):
lf.zip

Additional context
Please include the version of VisiData: saul.pw/VisiData v2.7.1

@daviewales daviewales added the bug label May 2, 2022
@saulpw
Copy link
Owner

saulpw commented May 3, 2022

Thanks for the report, @daviewales. You're right, this is incorrect. Ideally it would detect the csv options and set them on the source sheet so it would save in an identical format (though derived sheets wouldn't inherit this format, which may or may not be desired).

@daviewales
Copy link
Contributor Author

The strange thing is that on Linux, it does: LF -> CRLF. But on Windows we get LF -> CRCRLF. I haven't checked the source, but there's something strange going on there... (I know that VisiData doesn't exactly support Windows, but it's strange all the same.)

@saulpw
Copy link
Owner

saulpw commented May 5, 2022

Interesting. This is due to options.csv_lineterminator being set to '\r\n' by default, and on windows \n is CRLF. So you can set this option manually to \x0a to get LF-only line-endings. I guess the default should be explicit bytes as well.

@saulpw saulpw closed this as completed May 5, 2022
daviewales added a commit to daviewales/visidata that referenced this issue May 5, 2022
Fixes saulpw#1362

According to the docs for csv.writer:

> If csvfile is a file object, it should be opened with newline=''
> https://docs.python.org/3/library/csv.html#csv.writer

Additionally:

> If newline='' is not specified, newlines embedded inside quoted fields will
> not be interpreted correctly, and on platforms that use \r\n linendings on
> write an extra \r will be added. It should always be safe to specify
> newline='', since the csv module does its own (universal) newline handling.
> https://docs.python.org/3/library/csv.html#id3

This commit adds `newline=None` as a keyword argument for `open_text` in
`path.py`. It then uses this new keyword argument when opening a file
for writing in `loaders/csv.py`
@daviewales
Copy link
Contributor Author

@saulpw I did some more digging, and I think the issue is because the CSV file should be opened with newline='' for csv.writer.

See this note on the csv.writer docs:
https://docs.python.org/3/library/csv.html#id3

I'll send through a pull request with a fix.

daviewales added a commit to daviewales/visidata that referenced this issue May 5, 2022
Fixes saulpw#1362

According to the docs for csv.writer:

> If csvfile is a file object, it should be opened with newline=''
> https://docs.python.org/3/library/csv.html#csv.writer

Additionally:

> If newline='' is not specified, newlines embedded inside quoted fields will
> not be interpreted correctly, and on platforms that use \r\n linendings on
> write an extra \r will be added. It should always be safe to specify
> newline='', since the csv module does its own (universal) newline handling.
> https://docs.python.org/3/library/csv.html#id3

This commit adds `newline=None` as a keyword argument for `open_text` in
`path.py`. It then uses this new keyword argument when opening a file
for writing in `loaders/csv.py`
saulpw pushed a commit that referenced this issue May 7, 2022
Fixes #1362

According to the docs for csv.writer:

> If csvfile is a file object, it should be opened with newline=''
> https://docs.python.org/3/library/csv.html#csv.writer

Additionally:

> If newline='' is not specified, newlines embedded inside quoted fields will
> not be interpreted correctly, and on platforms that use \r\n linendings on
> write an extra \r will be added. It should always be safe to specify
> newline='', since the csv module does its own (universal) newline handling.
> https://docs.python.org/3/library/csv.html#id3

This commit adds `newline=None` as a keyword argument for `open_text` in
`path.py`. It then uses this new keyword argument when opening a file
for writing in `loaders/csv.py`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants