
encoding/csv: Reading is slow #16791

Closed

ALTree opened this issue Aug 18, 2016 · 38 comments

@ALTree (Member) commented Aug 18, 2016

$ go version
go version go1.7 linux/amd64

Reading CSV files is, out of the box, quite slow (tl;dr: 3x slower than a simple Java program, 1.5x slower than the obvious Python code). A typical example:

package main

import (
    "encoding/csv"
    "fmt"
    "io"
    "log"
    "os"
)

func main() {
    f, err := os.Open("mock_data.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    r := csv.NewReader(f)
    for {
        line, err := r.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        if line[0] == "42" {
            fmt.Println(line)
        }
    }
}

Python3 equivalent:

import csv
with open('mock_data.csv') as f:
    r = csv.reader(f)
    for row in r:
        if row[0] == "42":
            print(row)

Equivalent Java code [EDIT: not actually equivalent, please see pauldraper comment below for a better test]

import java.io.BufferedReader;
import java.io.FileReader;

public class ReadCsv {
    public static void main(String[] args) {
        BufferedReader br;
        String line;
        try {
            br = new BufferedReader(new FileReader("mock_data.csv"));
            while ((line = br.readLine()) != null) {
                String[] data = line.split(",");
                if (data[0].equals("42")) {
                    System.out.println(line);
                }
            }
        } catch (Exception e) {}
    }
}

Tested on a 50MB, 1'000'002-line CSV file generated as:

data = ",Carl,Gauss,cgauss@unigottingen.de,Male,30.4.17.77\n"
with open("mock_data.csv", "w") as f:
    f.write("id,first_name,last_name,email,gender,ip_address\n")
    f.write(("1"+data)*int(1e6))
    f.write("42"+data)

Results:

Go:       avg 1.489 secs
Python:   avg 0.933 secs  (1.5x faster)
Java:     avg 0.493 secs  (3.0x faster)

Go's error reporting is obviously better than what you get from that Java code, and I'm not sure about Python, but people have been complaining about encoding/csv's slowness, so it's probably worth investigating whether the csv package can be made faster.

@ALTree ALTree changed the title encoding/csv: Read is slow encoding/csv: Reading is slow Aug 18, 2016

@freeformz (Contributor) commented Aug 18, 2016

What if you change r := csv.NewReader(bufio.NewReader(f)) to r := csv.NewReader(f)?

@ALTree (Member, Author) commented Aug 18, 2016

Yeah I'm dumb, csv.NewReader already uses bufio.NewReader. The performance is exactly the same, but I edited the snippet.

@bradfitz (Member) commented Aug 18, 2016

CPU:

(pprof) top 20  
2.18s of 2.24s total (97.32%)  
Dropped 13 nodes (cum <= 0.01s)  
Showing top 20 nodes out of 42 (cum >= 0.02s)  
      flat  flat%   sum%        cum   cum%  
     0.45s 20.09% 20.09%      0.46s 20.54%  bufio.(*Reader).ReadRune  
     0.38s 16.96% 37.05%      0.68s 30.36%  bytes.(*Buffer).WriteByte  
     0.30s 13.39% 50.45%      0.30s 13.39%  bytes.(*Buffer).grow  
     0.22s  9.82% 60.27%      0.29s 12.95%  runtime.mallocgc  
     0.19s  8.48% 68.75%      0.63s 28.12%  encoding/csv.(*Reader).readRune  
     0.15s  6.70% 75.45%      1.63s 72.77%  encoding/csv.(*Reader).parseField  
     0.10s  4.46% 79.91%      0.78s 34.82%  bytes.(*Buffer).WriteRune  
     0.08s  3.57% 83.48%      2.14s 95.54%  encoding/csv.(*Reader).parseRecord  
     0.06s  2.68% 86.16%      0.06s  2.68%  bytes.(*Buffer).Truncate  
     0.05s  2.23% 88.39%      0.28s 12.50%  runtime.rawstringtmp  
     0.04s  1.79% 90.18%      0.33s 14.73%  runtime.slicebytetostring  
     0.03s  1.34% 91.52%      0.03s  1.34%  runtime.memclr  
     0.02s  0.89% 92.41%      2.18s 97.32%  csv_perf.BenchmarkCSV  
     0.02s  0.89% 93.30%      2.16s 96.43%  encoding/csv.(*Reader).Read  
     0.02s  0.89% 94.20%      0.02s  0.89%  runtime.(*mspan).countFree  
     0.02s  0.89% 95.09%      0.02s  0.89%  runtime.heapBitsSetType  
     0.02s  0.89% 95.98%      0.23s 10.27%  runtime.rawstring  
     0.01s  0.45% 96.43%      0.07s  3.12%  bytes.(*Buffer).Reset  
     0.01s  0.45% 96.88%      0.02s  0.89%  runtime.lock  
     0.01s  0.45% 97.32%      0.02s  0.89%  runtime.scanblock  


(pprof) top --cum 20  
2.09s of 2.24s total (93.30%)  
Dropped 13 nodes (cum <= 0.01s)  
Showing top 20 nodes out of 42 (cum >= 0.06s)  
      flat  flat%   sum%        cum   cum%  
         0     0%     0%      2.23s 99.55%  runtime.goexit  
     0.02s  0.89%  0.89%      2.18s 97.32%  csv_perf.BenchmarkCSV  
         0     0%  0.89%      2.18s 97.32%  testing.(*B).run1.func1  
         0     0%  0.89%      2.18s 97.32%  testing.(*B).runN  
     0.02s  0.89%  1.79%      2.16s 96.43%  encoding/csv.(*Reader).Read  
     0.08s  3.57%  5.36%      2.14s 95.54%  encoding/csv.(*Reader).parseRecord  
     0.15s  6.70% 12.05%      1.63s 72.77%  encoding/csv.(*Reader).parseField  
     0.10s  4.46% 16.52%      0.78s 34.82%  bytes.(*Buffer).WriteRune  
     0.38s 16.96% 33.48%      0.68s 30.36%  bytes.(*Buffer).WriteByte  
     0.19s  8.48% 41.96%      0.63s 28.12%  encoding/csv.(*Reader).readRune  
     0.45s 20.09% 62.05%      0.46s 20.54%  bufio.(*Reader).ReadRune  
     0.04s  1.79% 63.84%      0.33s 14.73%  runtime.slicebytetostring  
     0.30s 13.39% 77.23%      0.30s 13.39%  bytes.(*Buffer).grow  
     0.22s  9.82% 87.05%      0.29s 12.95%  runtime.mallocgc  
     0.05s  2.23% 89.29%      0.28s 12.50%  runtime.rawstringtmp  
     0.02s  0.89% 90.18%      0.23s 10.27%  runtime.rawstring  
         0     0% 90.18%      0.08s  3.57%  runtime.makeslice  
     0.01s  0.45% 90.62%      0.07s  3.12%  bytes.(*Buffer).Reset  
         0     0% 90.62%      0.07s  3.12%  runtime.systemstack  
     0.06s  2.68% 93.30%      0.06s  2.68%  bytes.(*Buffer).Truncate  

Memory:

(pprof) top 30
917.56MB of 917.56MB total (  100%)
      flat  flat%   sum%        cum   cum%
  917.56MB   100%   100%   917.56MB   100%  encoding/csv.(*Reader).parseRecord
         0     0%   100%   917.56MB   100%  csv_perf.BenchmarkCSV
         0     0%   100%   917.56MB   100%  encoding/csv.(*Reader).Read
         0     0%   100%   917.56MB   100%  runtime.goexit
         0     0%   100%   783.55MB 85.40%  testing.(*B).launch
         0     0%   100%   134.01MB 14.60%  testing.(*B).run1.func1
         0     0%   100%   917.56MB   100%  testing.(*B).runN
(pprof) list parseRecord
Total: 917.56MB
ROUTINE ======================== encoding/csv.(*Reader).parseRecord in /home/bradfitz/go/src/encoding/csv/reader.go
  917.56MB   917.56MB (flat, cum)   100% of Total
         .          .    238:           haveField, delim, err := r.parseField()
         .          .    239:           if haveField {
         .          .    240:                   // If FieldsPerRecord is greater than 0 we can assume the final
         .          .    241:                   // length of fields to be equal to FieldsPerRecord.
         .          .    242:                   if r.FieldsPerRecord > 0 && fields == nil {
  543.55MB   543.55MB    243:                           fields = make([]string, 0, r.FieldsPerRecord)
         .          .    244:                   }
  374.01MB   374.01MB    245:                   fields = append(fields, r.field.String())
         .          .    246:           }
         .          .    247:           if delim == '\n' || err == io.EOF {
         .          .    248:                   return fields, err
         .          .    249:           } else if err != nil {
         .          .    250:                   return nil, err

We probably can't do much about the slice per row, but we could probably add an interned string table cache to *Reader, keeping an LRU of the most recent N rows. Maybe per-column. And maybe not a map, but just a window of the past N (3, 5?) values per column.

You would have to take care of keeping the cost of that string interning mechanism cheaper than the savings from reduced GC.
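A toy sketch of the per-column window idea (the type name, window size, and API here are my own inventions for illustration, not a concrete proposal):

```go
package main

import "fmt"

// internWindow keeps the last few distinct values seen in one column and
// returns an already-allocated string when the incoming bytes match one,
// so repetitive columns don't allocate a fresh string per row.
type internWindow struct {
	vals [4]string // small fixed window; the size is an arbitrary choice
	n    int
}

func (w *internWindow) intern(b []byte) string {
	for _, v := range w.vals {
		if v == string(b) { // the conversion in a comparison avoids allocating
			return v
		}
	}
	s := string(b) // allocate once and remember it
	w.vals[w.n%len(w.vals)] = s
	w.n++
	return s
}

func main() {
	var w internWindow
	a := w.intern([]byte("Male"))
	b := w.intern([]byte("Male"))
	fmt.Println(a == b) // second call returns the cached string
}
```

Whether this wins depends entirely on how repetitive the data is; the cost of the linear scan has to stay below the saved allocation and GC work.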

I haven't looked at the rest of the CPU profile.

Peeking on bytes and doing ReadByte instead of the relatively expensive ReadRune would probably help a lot.

@bradfitz bradfitz added this to the Unplanned milestone Aug 18, 2016

@nussjustin (Contributor) commented Aug 18, 2016

FYI I built and ran the Go code from this issue with tip (604efe1) and compared it to tip + CL 24723 applied, and I got around a 10% improvement.

I also wrote a little benchmark:

func BenchmarkCSV(b *testing.B) {
    d, err := ioutil.ReadFile("mock_data.csv")

    if err != nil {
        b.Fatal(err)
    }

    br := bytes.NewReader(d)

    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        br.Reset(d)

        r := csv.NewReader(br)

        for {
            _, err := r.Read()

            if err == io.EOF {
                break
            }

            if err != nil {
                b.Fatal(err)
            }
        }
    }
}

Results for 10 runs each (tip & tip+cl 24723)

name   old time/op    new time/op    delta
CSV-8     1.27s ± 1%     1.13s ± 2%  -11.49%  (p=0.000 n=10+9)

name   old alloc/op   new alloc/op   delta
CSV-8     160MB ± 0%     144MB ± 0%  -10.00%   (p=0.000 n=9+9)

name   old allocs/op  new allocs/op  delta
CSV-8     7.00M ± 0%     2.00M ± 0%  -71.43%   (p=0.000 n=9+9)

CPU profile with cl 24723:

(pprof) top20
1.18s of 1.18s total (  100%)
Showing top 20 nodes out of 37 (cum >= 0.01s)
      flat  flat%   sum%        cum   cum%
     0.33s 27.97% 27.97%      0.35s 29.66%  bufio.(*Reader).ReadRune
     0.17s 14.41% 42.37%      0.17s 14.41%  bytes.(*Buffer).grow
     0.16s 13.56% 55.93%      0.33s 27.97%  bytes.(*Buffer).WriteByte
     0.16s 13.56% 69.49%      0.95s 80.51%  encoding/csv.(*Reader).parseField
     0.08s  6.78% 76.27%      1.17s 99.15%  encoding/csv.(*Reader).parseRecord
     0.07s  5.93% 82.20%      0.42s 35.59%  encoding/csv.(*Reader).readRune
     0.06s  5.08% 87.29%      0.12s 10.17%  runtime.mallocgc
     0.05s  4.24% 91.53%      0.05s  4.24%  runtime.heapBitsSetType
     0.04s  3.39% 94.92%      0.37s 31.36%  bytes.(*Buffer).WriteRune
     0.02s  1.69% 96.61%      0.02s  1.69%  runtime.memmove
     0.01s  0.85% 97.46%      0.10s  8.47%  runtime.makeslice
     0.01s  0.85% 98.31%      0.01s  0.85%  runtime.memclr
     0.01s  0.85% 99.15%      0.04s  3.39%  runtime.rawstringtmp
     0.01s  0.85%   100%      0.01s  0.85%  syscall.Syscall
         0     0%   100%      1.18s   100%  _/home/justinn/Workspace_test.BenchmarkCSV
         0     0%   100%      0.02s  1.69%  bufio.(*Reader).fill
         0     0%   100%      0.01s  0.85%  bytes.(*Buffer).ReadFrom
         0     0%   100%      0.02s  1.69%  bytes.(*Reader).Read
         0     0%   100%      1.17s 99.15%  encoding/csv.(*Reader).Read
         0     0%   100%      0.01s  0.85%  io/ioutil.ReadFile
(pprof) top20 --cum
1.14s of 1.18s total (96.61%)
Showing top 20 nodes out of 37 (cum >= 0.02s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      1.18s   100%  _/home/justinn/Workspace_test.BenchmarkCSV
         0     0%     0%      1.18s   100%  runtime.goexit
         0     0%     0%      1.18s   100%  testing.(*B).run1.func1
         0     0%     0%      1.18s   100%  testing.(*B).runN
         0     0%     0%      1.17s 99.15%  encoding/csv.(*Reader).Read
     0.08s  6.78%  6.78%      1.17s 99.15%  encoding/csv.(*Reader).parseRecord
     0.16s 13.56% 20.34%      0.95s 80.51%  encoding/csv.(*Reader).parseField
     0.07s  5.93% 26.27%      0.42s 35.59%  encoding/csv.(*Reader).readRune
     0.04s  3.39% 29.66%      0.37s 31.36%  bytes.(*Buffer).WriteRune
     0.33s 27.97% 57.63%      0.35s 29.66%  bufio.(*Reader).ReadRune
     0.16s 13.56% 71.19%      0.33s 27.97%  bytes.(*Buffer).WriteByte
     0.17s 14.41% 85.59%      0.17s 14.41%  bytes.(*Buffer).grow
     0.06s  5.08% 90.68%      0.12s 10.17%  runtime.mallocgc
     0.01s  0.85% 91.53%      0.10s  8.47%  runtime.makeslice
     0.05s  4.24% 95.76%      0.05s  4.24%  runtime.heapBitsSetType
     0.01s  0.85% 96.61%      0.04s  3.39%  runtime.rawstringtmp
         0     0% 96.61%      0.04s  3.39%  runtime.slicebytetostring
         0     0% 96.61%      0.03s  2.54%  runtime.rawstring
         0     0% 96.61%      0.02s  1.69%  bufio.(*Reader).fill
         0     0% 96.61%      0.02s  1.69%  bytes.(*Reader).Read

Memory profile with cl 24723:

(pprof) top10
1971674 of 1971675 total (  100%)
Dropped 2 nodes (cum <= 9858)
      flat  flat%   sum%        cum   cum%
   1971674   100%   100%    1971674   100%  encoding/csv.(*Reader).parseRecord
         0     0%   100%    1971675   100%  _/home/justinn/Workspace_test.BenchmarkCSV
         0     0%   100%    1971674   100%  encoding/csv.(*Reader).Read
         0     0%   100%    1971675   100%  runtime.goexit
         0     0%   100%    1971675   100%  testing.(*B).run1.func1
         0     0%   100%    1971675   100%  testing.(*B).runN

@martisch (Member) commented Aug 19, 2016

Did a quick test, and it seems that for cases that are mostly ASCII we could get another speedup by replacing bytes.Buffer with plain []byte appends in two places. This avoids some of the overhead of the WriteRune -> (if < utf8.RuneSelf) -> WriteByte -> grow bytes.Buffer chain for writing each byte.

Quick test with the mentioned mock_data.csv:

name    old time/op  new time/op  delta
CSV-2    1.48s ± 0%   1.10s ± 0%  -25.64%  (p=0.000 n=19+19)
Read-2  5.48µs ± 0%  5.05µs ± 0%   -7.77%  (p=0.000 n=20+20)

And it would work in addition to the patch from @nussjustin.

Quick prototype patch: CL 27430

If removing the bytes.Buffer dependency is deemed an acceptable tradeoff, I can prepare a polished patch.
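The bytes.Buffer-to-[]byte swap being described might look roughly like this (a sketch of the idea, not the actual patch; fieldBuf is a made-up name):

```go
package main

import "fmt"

// fieldBuf replaces bytes.Buffer with a bare []byte: appending one byte is a
// cheap, inlineable operation, with none of bytes.Buffer's method-call and
// bookkeeping overhead on the per-byte hot path.
type fieldBuf struct {
	b []byte
}

func (f *fieldBuf) writeByte(c byte) { f.b = append(f.b, c) }
func (f *fieldBuf) reset()           { f.b = f.b[:0] } // keeps capacity for the next field
func (f *fieldBuf) field() string    { return string(f.b) }

func main() {
	var f fieldBuf
	for _, c := range []byte("Gauss") {
		f.writeByte(c)
	}
	fmt.Println(f.field()) // "Gauss"
	f.reset()
	fmt.Println(len(f.b)) // 0, but the backing array is retained
}
```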

@gopherbot commented Aug 19, 2016

CL https://golang.org/cl/27430 mentions this issue.

@hirochachacha (Contributor) commented Aug 19, 2016

I think we can use https://golang.org/pkg/bufio/#Reader.ReadSlice:

line, err := r.ReadSlice('\n')

and then use bytes.IndexByte on the line directly, so extra allocations (outside of *bufio.Reader's internal buffer) are required only when quoted fields contain escape sequences.

@ulikunitz (Contributor) commented Aug 22, 2016

I wrote a bare-bones CSV reader: https://gist.github.com/ulikunitz/676de633513152c41ab832158a07ac57. It doesn't do CRLF conversion, doesn't count lines or columns, and supports no options.

Here is the benchstat output against 1.7, with BenchmarkRead from reader_test.go and the BenchmarkCSV proposed by @nussjustin.

name    old time/op  new time/op  delta
Read-8  9.63µs ± 3%  8.02µs ± 2%  -16.78%   (p=0.000 n=9+10)
CSV-8    1.78s ± 0%   0.86s ± 1%  -51.76%  (p=0.000 n=10+10)

I propose using it as the starting point and making it compatible with the current implementation.

@kokes commented Aug 24, 2016

Replacing [rR]eadRune with [rR]eadByte gives me around an 18% speedup; see my implementation. This is obviously just for testing, since I rip out the rune handling altogether.

From what I can see, we read runes just because we can have a rune delimiter (and/or comment). We can add a private byte comma to Reader and populate it iff the separator is byte-sized, which is both the default (an actual comma) and the most likely case (has anyone ever split a CSV on a rune?). And if we have a byte-sized separator, we can replace all rune reading and writing with byte reading and writing.

I can imagine the Read method testing the size and then dispatching to a different method, which only uses byte reading, writing, and switching.

if r.Comma < utf8.RuneSelf {
    r.comma = byte(r.Comma)
    record, err = r.parseRecordByBytes()
} else {
    record, err = r.parseRecord()
}

(Or it could equally be incorporated into parseField, but there would be a lot of switching back and forth, depending on the byte/rune context.)

(r.Comment would be handled similarly, but that's a rather minor issue.)

(I don't know much about runes, but could we maybe just read bytes and stop on the first byte of the delimiter rune, regardless of its length? And if it's a multibyte character, then check the subsequent bytes for a match. I don't know how many false positives this would yield, though.)

@ericlagergren (Contributor) commented Aug 24, 2016

@kokes

...if it's a multibyte character, then check the subsequent bytes for a match. I don't know how many false positives this would yield though.)

utf8 does just that: https://golang.org/pkg/unicode/utf8/#example_DecodeRune
However: http://research.swtch.com/utf8
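Concretely, UTF-8's design means a byte-wise scan for a multibyte delimiter cannot produce false positives: lead bytes and continuation bytes occupy disjoint ranges, so one rune's encoding never appears inside another's. A small sketch (indexRune is a made-up helper):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// indexRune scans for the delimiter's encoded byte sequence. Because no
// well-formed UTF-8 encoding occurs inside another rune's encoding, matching
// the first byte and then confirming the rest yields no false positives.
func indexRune(s []byte, delim rune) int {
	var enc [utf8.UTFMax]byte
	n := utf8.EncodeRune(enc[:], delim)
	for i := 0; i+n <= len(s); i++ {
		if s[i] == enc[0] && string(s[i:i+n]) == string(enc[:n]) {
			return i
		}
	}
	return -1
}

func main() {
	// '→' as a hypothetical multibyte separator (3 bytes in UTF-8)
	fmt.Println(indexRune([]byte("a→b→c"), '→'))
}
```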

@weberc2 commented Aug 29, 2016

Apologies if this is the wrong place for this comment (perhaps it should be its own feature request?). I was wondering if it would be possible to implement an efficient streaming API. I'm thinking of a single buffer reused across all rows, with calls that return a ([][]byte, error) where the subslices point into the buffer (and are only valid until the next call). The standard Read() and ReadAll() methods could use this API, allocating a new buffer before each call so as to provide the usual guarantee that the returned slices won't be affected by subsequent calls.

@weberc2 commented Aug 29, 2016

Here's a naive example: https://play.golang.org/p/zbMdK8rCTH

type CSVReader struct {
    Scanner *bufio.Scanner
}

func (r CSVReader) StreamingRead() ([][]byte, error) {
    if r.Scanner.Scan() {
        return bytes.Split(r.Scanner.Bytes(), []byte(",")), nil
    }
    if err := r.Scanner.Err(); err != nil {
        return nil, err
    }
    return nil, io.EOF
}

BenchmarkStdRead-4                    50          30504491 ns/op         2486373 B/op    85013 allocs/op
BenchmarkStreamingRead-4             300           4916641 ns/op         1924190 B/op     5006 allocs/op

@forax commented Sep 3, 2016

It's almost OT, but the equivalent of defer in Java is try-with-resources, i.e. the try with parentheses, so the code in Java should be

public static void main(String[] args) throws IOException {
  try(BufferedReader reader = Files.newBufferedReader(Paths.get("mock_data.csv"))) {
    String line;
    while ((line = reader.readLine()) != null) {
      String[] data = line.split(",");
      if (data[0].equals("42")) {
        System.out.println(line);
      }
    }
  } // close() is implicitly called here
}

or with the Stream API of Java 8

public static void main(String[] args) throws IOException {
  try(Stream<String> stream = Files.lines(Paths.get("mock_data.csv"))) {
    stream
      .filter(l -> l.split(",")[0].equals("42"))
      .forEach(System.out::println);
  } // close() is implicitly called here
}
@pauldraper commented Sep 3, 2016

The Java code isn't a CSV parser.

Using Apache Commons CSV,

import java.io.*;
import java.nio.charset.Charset;
import java.util.Iterator;
import org.apache.commons.csv.*;

class CsvTest {
    public static void main(String[] args) throws IOException {
        File data = new File("mock_data.csv");
        Charset encoding = Charset.defaultCharset();
        try (CSVParser parser = CSVParser.parse(data, encoding, CSVFormat.DEFAULT)) {
            Iterator<CSVRecord> iterator = parser.iterator();
            while (iterator.hasNext()) {
                CSVRecord line = iterator.next();
                if (line.get(0).equals("42")) {
                    System.out.println(line);
                }
            }
        }
    }
}

$ TIMEFORMAT=%R

$ go version
go version go1.6 linux/amd64
$ time ./csv-test > /dev/null
3.049
$ time ./csv-test > /dev/null
3.238

$ python3 --version
Python 3.4.3
$ time python csv-test.py > /dev/null
0.586
$ time python csv-test.py > /dev/null
0.581

$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-0ubuntu4~14.04-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
$ time java -cp .:commons-csv-1.4.jar CsvTest > /dev/null
1.698
$ time java -cp .:commons-csv-1.4.jar CsvTest > /dev/null
1.552

Compared to Go 1.6 (latest version with a .deb), my test put Python at 5.3x faster and Java at 1.9x faster.

@ALTree (Member, Author) commented Sep 3, 2016

@pauldraper thanks for posting this. I'm really not a Java programmer, and the Java code I posted in the first message is not actually useful for a fair comparison.

@olalonde

This comment has been minimized.

Copy link

commented Sep 3, 2016

@pwaller (Contributor) commented Sep 3, 2016

For another data point, I observe that it is possible to go >5x faster than the stdlib parser, if you can throw out quote handling and avoid allocations. Whenever I've needed to read large amounts of CSV-style data, I first make sure separators don't appear in the data, then no quoting is required and you can use a much simpler CSV parser. My favourites are the ASCII characters Unit Separator and Record Separator (man ascii, \x1f & \x1e), which don't usually appear in the data.

https://github.com/pwaller/usv/blob/master/usv.go

Of course this isn't a viable option if you want to be able to read standard CSV files, but it does seem to hint that there is a big performance loss somewhere in the standard library CSV decoder.
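A minimal sketch of that separator-based approach (assuming, as described, that \x1f and \x1e are guaranteed never to occur in the data, so no quoting is needed; this is not the linked usv package, just an illustration):

```go
package main

import (
	"bytes"
	"fmt"
)

// parseUSV splits rows on the ASCII Record Separator (0x1e) and fields on the
// Unit Separator (0x1f). With the separators guaranteed absent from the data
// there is no quoting or escaping to handle at all.
func parseUSV(data []byte) [][][]byte {
	var rows [][][]byte
	for _, rec := range bytes.Split(data, []byte{0x1e}) {
		if len(rec) == 0 {
			continue // a trailing separator leaves an empty record
		}
		rows = append(rows, bytes.Split(rec, []byte{0x1f}))
	}
	return rows
}

func main() {
	data := []byte("1\x1fCarl\x1fGauss\x1e42\x1fAda\x1fLovelace\x1e")
	for _, row := range parseUSV(data) {
		fmt.Printf("%s\n", bytes.Join(row, []byte(" | ")))
	}
}
```

All the returned fields alias the input slice, so this does no per-field copying at all.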

@weberc2 commented Sep 3, 2016

@pauldraper Someone on HN pointed out that your Java benchmark didn't first warm up the JIT. This is probably an important caveat.

@pauldraper commented Sep 3, 2016

@weberc2, it's important if you have a long-running process. It all depends on what you want to measure.

If I'm writing a command-line CSV parser, then warming it up would be unrealistic.

This is an apples/oranges/pears comparison anyway. Compiled vs. interpreted + compiled vs. JIT: completely different performance characteristics. The point is that even in the best light, the csv module could use some love.


FYI, for the record, I think you overestimate the importance of warm-up time here. PyPy uses JIT + pure Python csv. For me, it's 4x faster than the Go baseline and 2x faster than Java.

(For Java, Apache Commons CSV is just not particularly fast. I chose the most "standard" library. If you want something faster but more complicated, look at Jackson or another library.)

@kokes commented Sep 5, 2016

I took the approach I mentioned earlier (operating on bytes instead of runes) and extended it a bit. What I'm outlining here is an effort to achieve feature parity (and thus a drop-in replacement for the current reader) with improved performance.

The main idea is as follows: instead of constructing each field rune by rune (or even byte by byte), why not check each byte and, when an action is required (e.g. a newline/separator/carriage return/quotation mark is hit), slice out everything buffered so far, starting at the end of the previous field? This way we eliminate one allocation per field (a costly one at that). (We may even call string() directly on the underlying slice, since string copies the data out.)

This can be done quite easily for unquoted fields, because they are always contiguous in memory, so we just slice them out. There is a bit more work for quoted fields with non-lazy quotes within: e.g. "this is ""a field"" with quotes" would need to be sliced three times to cut out the double quotes, but that's still less work than byte-by-byte appending.
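The three-slice handling of doubled quotes might look like this (unquote is a made-up helper operating on the field's contents without its outer quotes):

```go
package main

import (
	"bytes"
	"fmt"
)

// unquote collapses doubled quotes by appending the spans between them, so a
// field with two inner quote pairs costs three appends rather than one append
// per byte.
func unquote(field []byte) string {
	var out []byte
	for {
		i := bytes.Index(field, []byte(`""`))
		if i < 0 {
			return string(append(out, field...))
		}
		out = append(out, field[:i+1]...) // keep one of the two quotes
		field = field[i+2:]
	}
}

func main() {
	// contents of `"this is ""a field"" with quotes"` minus the outer quotes
	fmt.Println(unquote([]byte(`this is ""a field"" with quotes`)))
}
```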

In order to use this slicing of loaded data, I opted for a fixed-size buffer that I read into and then slice (or convert to string, to be precise; since string() copies the data, we're good) once the end of a field is hit. At the end of a CSV row, we take the remaining buffer and move it to the beginning, so that the next Read() call can reuse it. I'm not quite sure it's okay that we advance the reader more than necessary: if we just want to read one line that's shorter than 256 bytes (my arbitrary buffer size), we will still have advanced the reader by that much. If we're in the middle of a field at the end of our buffer, we save the remainder to a separate slice, read new data, and stitch the two together afterwards.

There are still some issues, obviously:

  • saving intermediate data between reads is quite wonky and does not work at times (bloody off by ones)
  • at the end of a Read(), do we copy the remainder of the buffer to the beginning or do we just save the position we're at for the next Read() to pick it up? The former is implemented now, the latter might be a bit quicker for narrow columns
  • it currently supports just the basics: no quotes within fields (be they proper quotes or lazy ones), carriage returns are folded into newlines only at the end of a CSV line, not within quoted fields, and there is no support for comments, etc.

I did some simple benchmarking, but it's a bit premature, since the output is not always correct. The speedup is quite nice though, around 2-3x, but let's wait a bit for more sophisticated testing.

https://gist.github.com/kokes/3924255b9629b0bef87f8127d473cd56

@infogulch commented Sep 6, 2016

I wanted to benchmark Read() (e.g. how many allocations are made per record), so I added a benchmark that reads b.N rows.

Turns out it makes columns+1 allocations (surprise, surprise). I was able to bring this down to 2 allocations (regardless of the number of columns) by concatenating the entire record into one byte buffer, converting it to a string once at the end and slicing the string up into the fields with field lengths that were recorded during record parsing. (ReadNLarge has 26 columns, ReadNSmall has 3.)
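The concatenate-then-slice trick can be sketched as follows (sliceRecord is a made-up helper illustrating the idea, not the actual change):

```go
package main

import "fmt"

// sliceRecord concatenates all field bytes into one buffer while recording
// each field's length, converts the buffer to a string once, and slices that
// single string into the per-field strings: one string allocation per record
// instead of one per field.
func sliceRecord(fields [][]byte) []string {
	var buf []byte
	lens := make([]int, len(fields))
	for i, f := range fields {
		buf = append(buf, f...)
		lens[i] = len(f)
	}
	s := string(buf) // the one shared allocation
	out := make([]string, len(fields))
	for i, n := range lens {
		out[i], s = s[:n], s[n:]
	}
	return out
}

func main() {
	rec := sliceRecord([][]byte{[]byte("1"), []byte("Carl"), []byte("Gauss")})
	fmt.Println(rec)
}
```

Every field is a substring of the same backing string, which is exactly why retaining one field keeps the whole record's bytes alive.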

benchmark                 old ns/op     new ns/op     delta
BenchmarkRead-4           7724          6463          -16.33%
BenchmarkReadNLarge-4     2690          1487          -44.72%
BenchmarkReadNSmall-4     382           271           -29.06%

benchmark                 old allocs     new allocs     delta
BenchmarkRead-4           41             30             -26.83%
BenchmarkReadNLarge-4     27             2              -92.59%
BenchmarkReadNSmall-4     4              2              -50.00%

benchmark                 old bytes     new bytes     delta
BenchmarkRead-4           5844          5704          -2.40%
BenchmarkReadNLarge-4     442           448           +1.36%
BenchmarkReadNSmall-4     51            51            +0.00%

The only problem I see with this approach is if the user needs to keep around one column, which prevents the rest of a large record from being cleaned up by the GC (is the GC able to trim and reallocate unreachable strings after they've been sliced?). Changing this back to allocating every field separately makes the delta negligible.

One bonus is that it's now trivial to implement a zero-alloc ReadStream() ([][]byte, error) interface. The performance win here is more dramatic, if slightly off-topic:

benchmark                 old ns/op     new ns/op     delta
BenchmarkRead-4           7579          6333          -16.44%
BenchmarkReadNLarge-4     2733          1126          -58.80%
BenchmarkReadNSmall-4     377           153           -59.42%

benchmark                 old allocs     new allocs     delta
BenchmarkRead-4           41             30             -26.83%
BenchmarkReadNLarge-4     27             0              -100.00%
BenchmarkReadNSmall-4     4              0              -100.00%

benchmark                 old bytes     new bytes     delta
BenchmarkRead-4           5844          5704          -2.40%
BenchmarkReadNLarge-4     442           0             -100.00%
BenchmarkReadNSmall-4     51            0             -100.00%

@nussjustin (Contributor) commented Sep 6, 2016

@infogulch Yeah, your change is similar to mine (https://golang.org/cl/24723), where I basically did the same thing.

The only problem I see with this approach is if the user needs to keep around one column which prevents the rest of a large record from being cleaned up by the GC (is the GC able to trim and reallocate unreachable strings after they've been sliced?). Changing this back to allocate every field separately makes the delta negligible.

No, the GC doesn't trim the unreachable memory; it only collects the whole allocation at once. But the case where a user holds onto a single column, preventing the rest from being GC'ed, is in my experience not that common, and the user can always copy the column into a new string.
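For example, to detach a kept column from the record's backing allocation (the []byte round-trip forces a copy; strings.Clone, added much later in Go 1.18, does the same thing):

```go
package main

import "fmt"

// detach copies a string so it no longer pins the larger record-backing
// allocation it was sliced from.
func detach(s string) string {
	return string([]byte(s))
}

func main() {
	record := []string{"42", "Carl", "Gauss"} // fields may share one backing array
	kept := detach(record[0])                 // safe to retain past the record's lifetime
	fmt.Println(kept)
}
```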

@siadat (Contributor) commented Sep 6, 2016

The benchmarked code for all languages includes print functions, even though their output is redirected to /dev/null, so we are benchmarking the printing functions as well. It would be interesting to compare the results for benchmarks without the print functions, i.e.:

  • remove fmt.Println(line) from the Go code
  • remove print(row) from the Python code
  • remove System.out.println(line) from the Java code
@ALTree (Member, Author) commented Sep 6, 2016

The print functions are only called once (the condition id == 42 is true only for the last record); I doubt they make any difference.

@siadat (Contributor) commented Sep 6, 2016
commented Sep 6, 2016

@ALTree Absolutely, my bad. I just read the CSV generation code.

@gopherbot commented Oct 4, 2016

CL https://golang.org/cl/30357 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 5, 2016

encoding/csv: update and add CSV reading benchmarks
Benchmarks broken off from https://golang.org/cl/24723 and modified to
allocate less in the places we're not trying to measure.

Updates #16791

Change-Id: I508e4cfeac60322d56f1d71ff1912f6a6f183a63
Reviewed-on: https://go-review.googlesource.com/30357
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot commented Oct 5, 2016

CL https://golang.org/cl/24723 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 5, 2016

encoding/csv: avoid allocations when reading records
This commit changes parseRecord to allocate a single string per record,
instead of per field, by using indexes into the raw record.

Benchstat (done with f69991c)

name                          old time/op    new time/op    delta
Read-8                          3.17µs ± 0%    2.78µs ± 1%  -12.35%  (p=0.016 n=4+5)
ReadWithFieldsPerRecord-8       3.18µs ± 1%    2.79µs ± 1%  -12.23%  (p=0.008 n=5+5)
ReadWithoutFieldsPerRecord-8    4.59µs ± 0%    2.77µs ± 0%  -39.58%  (p=0.016 n=4+5)
ReadLargeFields-8               57.0µs ± 0%    55.7µs ± 0%   -2.18%  (p=0.008 n=5+5)

name                          old alloc/op   new alloc/op   delta
Read-8                            660B ± 0%      664B ± 0%   +0.61%  (p=0.008 n=5+5)
ReadWithFieldsPerRecord-8         660B ± 0%      664B ± 0%   +0.61%  (p=0.008 n=5+5)
ReadWithoutFieldsPerRecord-8    1.14kB ± 0%    0.66kB ± 0%  -41.75%  (p=0.008 n=5+5)
ReadLargeFields-8               3.86kB ± 0%    3.94kB ± 0%   +1.86%  (p=0.008 n=5+5)

name                          old allocs/op  new allocs/op  delta
Read-8                            30.0 ± 0%      18.0 ± 0%  -40.00%  (p=0.008 n=5+5)
ReadWithFieldsPerRecord-8         30.0 ± 0%      18.0 ± 0%  -40.00%  (p=0.008 n=5+5)
ReadWithoutFieldsPerRecord-8      50.0 ± 0%      18.0 ± 0%  -64.00%  (p=0.008 n=5+5)
ReadLargeFields-8                 66.0 ± 0%      24.0 ± 0%  -63.64%  (p=0.008 n=5+5)

For a simple application that I wrote, which reads in a CSV file (via
ReadAll) and outputs the number of rows read (15857625 rows), this change
reduces the total time on my notebook from ~58 seconds to ~48 seconds.

This reduces time and allocations (bytes) each by ~6% for a real world
CSV file at work (~230000 rows, 13 columns).

Updates #16791

Change-Id: Ia07177c94624e55cdd3064a7d2751fb69322d3e4
Reviewed-on: https://go-review.googlesource.com/24723
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@dpinela

Contributor

commented Feb 3, 2017

I've made a separate implementation of parseRecord for the case where it's not necessary to parse non-ASCII characters (specifically if the separator and comment rune are ASCII and leading space trimming is disabled), using strings.Split when possible and a simple state machine when not: https://go-review.googlesource.com/36270

@gopherbot

commented Feb 3, 2017

CL https://golang.org/cl/36270 mentions this issue.

@aktau

commented Mar 5, 2017

Is there any separate issue about no longer interpreting the input as UTF-8? While the performance improvements would be nice (CSV reading and writing is the main slowdown in my command-line program), the functionality improvement is the important part.

The problem is that in some situations I encounter, the columns are just byte strings, and they may contain byte sequences that are not valid UTF-8. Go, when reading (and probably also writing) them, mangles it:

Input: <some-unprintable-byte-sequence-that-doesn't-contain-quotes-nor-commas>,aktau
As hex: 0941 b41c 2c61 6b74 6175 0a

Note that the 3rd byte 0xb4 in the first column is above the ASCII pass-through range of UTF-8 (>= 0x80), as detailed here: https://research.swtch.com/utf8. When passed through Go's CSV reader, the following is returned:

Input (hex):      0941 b41c
Go mangled (hex): 0941 efbf bd1c

I didn't understand what was happening at first, until I looked at Go's CSV reading/writing source and saw that it used the *Rune functions, even though none of the special characters in (traditional) CSV are multi-byte.

@ALTree

Member Author

commented Mar 5, 2017

@aktau this looks like a different problem. Please open another issue. Thanks!

@aktau

commented Mar 5, 2017

@ALTree done: #19410. The reason I posted here was because some of the proposed solutions for speeding up encoding/csv purport to get rid of the *Rune functions. This would also solve my problem. But you're absolutely right, it's a separate issue. Sorry for the hijack!

@nussjustin

Contributor

commented Apr 26, 2017

#19721 added a new field, ReuseRecord, to the Reader for cases where the returned slice is only needed until the next call to Read. This doesn't help in all cases, but it can still speed up many uses of Read.

@Redundancy

commented May 22, 2017

Edit: not relevant, my bad.

@ALTree

Member Author

commented May 22, 2017

One hour for a 50MB file? No way. Also, why does the profile show 99.8% of the time spent in cgocall? There must be something wrong with your code; please ask for a code review in one of the Go forums or on the mailing list.

@Redundancy

commented May 22, 2017

Ah, you're right, for some reason the code didn't have proper termination checking. Not sure when that got broken - apologies, ignore!

@gopherbot

commented Oct 20, 2017

Change https://golang.org/cl/72150 mentions this issue: encoding/csv: simplify and optimize Reader
