encoding/csv: BOM presents in fields read with Reader #9588

tp · 2015-01-14T10:55:19Z

I was checking the header fields of an external CSV file and noticed that the file's BOM is part of the first field when reading with csv.Reader.

package main

import (
    "bytes"
    "encoding/csv"
    "fmt"
)

func main() {
    csvData := []byte("\uFEFFa,b")

    r := bytes.NewReader(csvData)

    csvR := csv.NewReader(r)

    header, err := csvR.Read()

    if err != nil {
        fmt.Println(err.Error())
        return
    }

    fmt.Printf("%q", header[0]) // prints "\ufeffa" where I expected "a"
}

snippet on playground

Related: Since U+FEFF is called a "[...] space", I expected strings.TrimSpace to remove it, which it did not. (That would have been my preferred workaround, since it would remove both the BOM and any spaces around fields.) I would guess this is also the reason why csvR.TrimLeadingSpace = true does not remove the BOM.


ianlancetaylor · 2015-01-14T15:52:48Z

The BOM is a bizarre idea in general, and it makes absolutely no sense when using UTF-8. It's not appropriate for encoding/csv to do anything special with a BOM. If you have to deal with it, deal with it before passing your reader to encoding/csv. If you have a file that is not UTF-8, you will need to use a translating reader anyhow, as encoding/csv, like all Go code, expects UTF-8.
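One way to handle the BOM before the reader reaches encoding/csv might look like the sketch below. It assumes the input is already UTF-8; skipBOM and readHeader are hypothetical helper names, not part of the standard library.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
)

// skipBOM is a hypothetical helper. It wraps r in a buffered reader and
// discards a leading UTF-8 byte order mark (EF BB BF) if one is present.
func skipBOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	// Peek does not consume input. If fewer than 3 bytes are available it
	// returns an error, and the input is passed through untouched.
	if b, err := br.Peek(3); err == nil && bytes.Equal(b, []byte{0xEF, 0xBB, 0xBF}) {
		_, _ = br.Discard(3)
	}
	return br
}

// readHeader reads the first CSV record after stripping any leading BOM.
func readHeader(data []byte) ([]string, error) {
	return csv.NewReader(skipBOM(bytes.NewReader(data))).Read()
}

func main() {
	header, err := readHeader([]byte("\uFEFFa,b"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%q\n", header[0]) // prints "a"
}
```

Because the wrapper only peeks, input without a BOM is passed to csv.Reader unchanged.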

While it's true that U+FEFF is a space, the UTF-8 representation of U+FEFF is not the literal bytes FEFF.

mikioh changed the title ~~BOM presents in fields read with csv.Reader~~ encoding/csv: BOM presents in fields read with Reader Jan 14, 2015

ianlancetaylor closed this as completed Jan 14, 2015

golang locked and limited conversation to collaborators Jun 25, 2016

gopherbot added the FrozenDueToAge label Jun 25, 2016
