Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV empty-line regression since 6.0 #1164

Closed
skitt opened this issue Dec 30, 2022 · 9 comments
Closed

CSV empty-line regression since 6.0 #1164

skitt opened this issue Dec 30, 2022 · 9 comments

Comments

@skitt
Copy link
Contributor

skitt commented Dec 30, 2022

This is forwarded from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1022740

--ragged in 6.x doesn’t work as it used to in 5.x.

In 5.x,

$ echo -e "a,b\n1,2\n\n5,6" | mlr --ragged --csv cut -f a
a
1

5

But in 6.x,

$ echo -e "a,b\n1,2\n\n5,6" | mlr --ragged csv cut -f a
a
1
5
@aborruso
Copy link
Contributor

This seems something similar to this cc @johnkerl

@johnkerl
Copy link
Owner

Indeed #1102 is related -- that's for TSV; this one for CSV.

@johnkerl
Copy link
Owner

johnkerl commented Jan 1, 2023

@skitt @aborruso the problem is even more general:

$ echo -e "a,b\n1,2\n\n5,6" | cat
a,b
1,2

5,6
$ echo -e "a,b\n1,2\n\n5,6" | mlr --csv cat
a,b
1,2
5,6

By moving from my own homegrown CSV parser written in C to the Go CSV library (for the Miller 6 Go port) I fell victim to this "feature" of the Go CSV library. This needs to be worked around.

@johnkerl
Copy link
Owner

johnkerl commented Jan 1, 2023

cat csv0.go

package main

import (
	"fmt"
	"io"
	"os"
	"strings"

	"encoding/csv"
)

// IsEOF handles the following problem: reading past end of files opened with
// os.Open returns the error which is io.EOF. Reading past close of pipes
// opened with popen (e.g.  Miller's prepipe, where the file isn't 'foo.dat'
// but rather the process 'gunzip < foo.dat |') returns not io.EOF but an error
// with 'file already closed' within it. See also
// https://stackoverflow.com/questions/47486128/why-does-io-pipe-continue-to-block-even-when-eof-is-reached
func IsEOF(err error) bool {
	if err == nil {
		return false
	} else if err == io.EOF {
		return true
	} else if strings.Contains(err.Error(), "file already closed") {
		return true
	} else {
		return false
	}
}

func main() {
	csvReader := csv.NewReader(os.Stdin)
	for {
		csvRecord, err := csvReader.Read()
		if IsEOF(err) {
			break
		}
		if err != nil && csvRecord == nil {
			// See https://golang.org/pkg/encoding/csv.
			// We handle field-count ourselves.
			fmt.Printf("Error: %v\n", err)
			break
		}
		fmt.Printf("CSV record: %v\n", csvRecord)
	}
}

cat x

a,b,c
1,2,3
,,
7,8,9

csv0 < x

CSV record: [a b c]
CSV record: [1 2 3]
CSV record: [  ]
CSV record: [7 8 9]

cat y

a
1
2

4
5

csv0 < y

CSV record: [a]
CSV record: [1]
CSV record: [2]
CSV record: [4]
CSV record: [5]

@johnkerl
Copy link
Owner

johnkerl commented Jan 1, 2023

And (I should have seen this much sooner)
https://pkg.go.dev/encoding/csv#Reader

Blank lines are ignored.

@johnkerl
Copy link
Owner

johnkerl commented Jan 1, 2023

This was reported to the golang authors but (sadly) the report got lots of pushback there :(
golang/go#39119

@johnkerl johnkerl changed the title --ragged regression since 6.0 CSV empty-line regression since 6.0 Jan 1, 2023
@johnkerl
Copy link
Owner

johnkerl commented Jan 1, 2023

I'll make a 6.6.0 for this, hopefully tomorrow or next day

@johnkerl
Copy link
Owner

johnkerl commented Jan 1, 2023

... or apparently today :)

https://github.com/johnkerl/miller/releases/tag/v6.6.0

@aborruso
Copy link
Contributor

aborruso commented Jan 1, 2023

Thank you very much!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants