go-yaml should handle invalid integers as strings #157

Open
mbruzek opened this Issue Feb 9, 2016 · 8 comments

Comments

Projects
None yet
9 participants

mbruzek commented Feb 9, 2016

Starting a numeric sequence with zero indicates an octal number. So 01234567 == 342391 (base 10). Numeric sequences that start with zero and contain numbers higher than 7 are invalid integers and should be parsed as strings even when they do not have quotes around them.

key: 01182252

01182252 is not a valid octal integer and should be treated as a string. go-yaml attempts to parse the numeric sequence as an integer and this is incorrect.

Several yaml validators and other language yaml packages handle this case correctly such as the Python ruamel package and on line parsers http://yaml-online-parser.appspot.com/

package main

import (
    "fmt"
    "gopkg.in/v2/yaml"
)

var a = make(map[string]interface{})

func main() {
    data := []byte(`key: 01234567`)
    err := yaml.Unmarshal(data, a)
    if err != nil {
        fmt.Printf("%v\n", err)
    }
    fmt.Printf("%v\n", a["key"])

    // This value is surrounded by quotes.
    data = []byte(`key: "01182252"`)
    err = yaml.Unmarshal(data, a)
    if err != nil {
        fmt.Printf("%v\n", err)
    }
    fmt.Printf("%v\n", a["key"])

    // This value is not a valid octal integer therefore a string.
    data = []byte(`key: 01182252`)
    err = yaml.Unmarshal(data, a)
    if err != nil {
        fmt.Printf("%v\n", err)
    }
    fmt.Printf("%v\n", a["key"])
}

jbiel commented Aug 5, 2016

This is an issue for me.

qix commented Oct 6, 2016

I'm seeing this issue in an unlucky git commit hash that contained only digits 0787904.

Python serializes it to an unquoted string, which I don't like but is technically okay I believe.

print yaml.safe_dump({'test': '0787904'}, default_flow_style=False)
test: 0787904

vinzenz pushed a commit to vinzenz/yaml that referenced this issue Nov 3, 2016

vinzenz pushed a commit to vinzenz/yaml that referenced this issue Nov 3, 2016

CpuID commented Nov 18, 2016

This would be awesome to have resolved, docker-compose.yml ports are recommended to be defined as strings instead of integers.

dsegan commented Jan 20, 2017

FWIW, the spec seems to confirm this is wrong: http://yaml.org/spec/1.2/spec.html#id2803828

Integers are 0 | -? [1-9] [0-9]*

Contributor

niemeyer commented Jan 25, 2017

Merged #171.

@niemeyer niemeyer closed this Jan 25, 2017

PR #171 doesn't actually fix this issue.

Contributor

jameinel commented Feb 14, 2017

The regex in #171 uses: var yamlStyleFloat = regexp.MustCompile(^[-+]?[0-9]*\.?[0-9]+([eE][-+][0-9]+)?$)
The linked spec says for integers: http://yaml.org/spec/1.2/spec.html#id2803828
0 | -? [1-9] [0-9]*

Which would be "0" or negative or must not start with a 0 character.

IOW, Octal integers are disallowed. As is '-0'. The YAML spec also doesn't allow a leading "+" sign.

For floats it allows:
-? [1-9] ( . [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?

Which again doesn't allow a leading +, only in the exponent section, and it also doesn't allow a leading 0.

Contributor

rogpeppe commented Jan 9, 2018

FWIW, the spec seems to confirm this is wrong: http://yaml.org/spec/1.2/spec.html#id2803828

Integers are 0 | -? [1-9] [0-9]*

That's not actually the case. That shows the canonical form of an integer. The actual regexp is defined in section 10.3.2 and is [-+]? [0-9]+.

The actual bug here is that go-yaml is interpreting the number as octal (YAML uses a 0o prefix for octal numbers, which go-yaml doesn't understand) not decimal.

However, it seems that many existing parsers do indeed treat "01234" as a string, not an integer, despite the spec, so it may be wise to do that anyway.

@rogpeppe rogpeppe reopened this Jan 9, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment