
pkg/scanner: Scanner.Pos() column/offset off by 1 on illegal UTF-8 encoding #2138

@gopherbot

by ijknacho:

What steps will reproduce the problem?
Given:
  package main

  import (
    "bytes"
    "fmt"
    "io"
    "scanner" // import path at r59; Go 1 moved this package to "text/scanner"
  )

  func runScanner(r io.Reader) {
    var s scanner.Scanner
    s.Init(r)
    s.Error = func(s *scanner.Scanner, text string) {
      pos := s.Pos()
      fmt.Printf("  pos = %v, text = %v\n", pos, text)
    }
    _ = s.Scan()
  }

  func main() {
    // NUL at column 1
    buf := bytes.NewBufferString("")
    buf.WriteByte(0x0)
    fmt.Printf("scan of %q:\n", buf.Bytes())
    runScanner(buf)

    // NUL at column 4 
    buf = bytes.NewBufferString("abc")
    buf.WriteByte(0x0)
    fmt.Printf("scan of %q:\n", buf.Bytes())
    runScanner(buf)

    // invalid UTF-8 byte at column 1
    buf = bytes.NewBufferString("")
    buf.WriteByte(0x80)
    fmt.Printf("scan of %q:\n", buf.Bytes())
    runScanner(buf)

    // BUG:
    // invalid UTF-8 byte at column 4, but the scanner reports column 3
    buf = bytes.NewBufferString("abc")
    buf.WriteByte(0x80)
    fmt.Printf("scan of %q:\n", buf.Bytes())
    runScanner(buf)
  }

What is the expected output?
scan of "abc\x80":
  pos = 1:4, text = illegal UTF-8 encoding
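
Why column 4: scanner columns are 1-based and count characters per line, and "abc" occupies columns 1 to 3, so the 0x80 byte at byte offset 3 sits at column 4. The sketch below works through that arithmetic, assuming an invalid byte counts as one character (utf8.DecodeRune returns utf8.RuneError with width 1 for it). columnOf is a hypothetical helper for illustration, not part of the scanner package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// columnOf returns the 1-based column of the byte at offset off in a
// single-line input. Each decoded rune advances the column by one; an
// invalid byte decodes as utf8.RuneError with width 1, so it also counts
// as exactly one column. (Hypothetical helper, for illustration only.)
func columnOf(src []byte, off int) int {
	col := 1
	for i := 0; i < off && i < len(src); {
		_, width := utf8.DecodeRune(src[i:])
		i += width
		col++
	}
	return col
}

func main() {
	src := []byte("abc\x80")
	// The 0x80 byte is at byte offset 3 (0-based).
	fmt.Printf("scan of %q: expected pos = 1:%d\n", src, columnOf(src, 3))
}
```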

What do you see instead?
scan of "abc\x80":
  pos = 1:3, text = illegal UTF-8 encoding
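
To re-check the report on a Go 1 toolchain, where the package is imported as "text/scanner", the same four inputs can be scanned and the position captured from the first Error callback. This is a sketch under that assumption, and errorPos is a hypothetical helper. If the off-by-one is fixed in the toolchain at hand, "abc\x80" should report line 1, column 4, byte offset 3, matching "abc\x00". The error text may be worded differently from the r59 message.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// errorPos scans one token from src and returns the position reported
// by s.Pos() inside the first Error callback, together with its message.
// (Hypothetical helper, for illustration only.)
func errorPos(src string) (scanner.Position, string) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	var pos scanner.Position
	var msg string
	seen := false
	s.Error = func(s *scanner.Scanner, text string) {
		if !seen { // keep only the first reported error
			pos, msg, seen = s.Pos(), text, true
		}
	}
	s.Scan()
	return pos, msg
}

func main() {
	for _, src := range []string{"\x00", "abc\x00", "\x80", "abc\x80"} {
		pos, msg := errorPos(src)
		fmt.Printf("scan of %q: line %d, column %d, offset %d: %s\n",
			src, pos.Line, pos.Column, pos.Offset, msg)
	}
}
```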


Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
Linux version 2.6.32-131.6.1.el6.x86_64

Which revision are you using?  (hg identify)
d5785050f61d (release-branch.r59) release.r59/release
