mlr silently truncating some TSV files since #923 #1102

Closed
johnkerl opened this issue Oct 6, 2022 · 5 comments

@johnkerl
Owner

johnkerl commented Oct 6, 2022

Since #923, mlr is silently truncating some files. I do not know which files are affected, and I cannot share the one file I know triggers the issue because of data security concerns. Here's the output:

Original file:

$ wc -l temp.tsv
308170 temp.tsv

TSV file type:

$ mlr --tsv cat temp.tsv | wc -l
20408

CSV file type with tab separator:

$ mlr --csv --fs "\t" cat temp.tsv | wc -l
308170

Version 6.0.0, TSV file type:

$ mlr --tsv cat temp.tsv | wc -l
308170

Even if my file is somehow invalid, this probably should throw an error instead of silently truncating the file. Is there something I should be looking for in the file so I can make a reprex for you?

Originally posted by @BEFH in #923 (comment)

@BEFH
Contributor

BEFH commented Oct 6, 2022

PR is #923, thanks.

johnkerl changed the title from "Since this PR, mlr is silently truncating some files. I do not know which files, and cannot share the one file I know is causing the issue because of data security issues. Here's the output:" to "mlr silently truncating some TSV files since #923" on Oct 6, 2022
johnkerl added a commit that referenced this issue on Dec 29, 2022
johnkerl reopened this on Dec 29, 2022
@johnkerl
Owner Author

This is fixed for TSV (in my code) but the issue persists for CSV where the bug is in the Go-CSV library -- this needs to be worked around as well ...

@BEFH
Contributor

BEFH commented Dec 29, 2022 via email

@johnkerl
Owner Author

Concretely:

$ cat test.csv
a
1
2

4
5
$ mlr5 --icsv --ojson cat test.csv
{ "a": 1 }
{ "a": 2 }
{ "a": "" }
{ "a": 4 }
{ "a": 5 }
$ mlr --icsv --ojson cat test.csv
[
{
  "a": 1
},
{
  "a": 2
},
{
  "a": 4
},
{
  "a": 5
}
]
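
For anyone who wants to see the CSV behavior in isolation, here is a minimal sketch. It assumes the "Go-CSV library" mentioned in this thread is Go's standard `encoding/csv` package, whose documentation states that blank lines are ignored. The helper name `readRecords` is made up for illustration and is not part of Miller.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readRecords is a hypothetical helper (not Miller code). It parses CSV text
// with Go's standard encoding/csv reader and returns every record it yields,
// header included.
func readRecords(input string) ([][]string, error) {
	return csv.NewReader(strings.NewReader(input)).ReadAll()
}

func main() {
	// Same content as test.csv above: a header, two rows, a blank line, two rows.
	input := "a\n1\n2\n\n4\n5\n"

	records, err := readRecords(input)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}

	// The input has six lines, but the blank one never shows up as a record.
	fmt.Println(len(records), records)
	// prints: 5 [[a] [1] [2] [4] [5]]
}
```

The reader returns no error for the blank line; it simply never surfaces it, which would match the missing `{ "a": "" }` record in the `mlr --icsv` output above.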

@johnkerl
Owner Author

This is fixed for TSV (in my code) but the issue persists for CSV where the bug is in the Go-CSV library -- this needs to be worked around as well ...

@skitt has kindly opened #1164 to track the problem for CSV, so I'll close this issue now that the problem is solved for TSV.
