text/template: Handling of nbsp (U+00A0) character #71722

Closed
weakcamel opened this issue Feb 13, 2025 · 5 comments

@weakcamel

Go version

go version go1.24.0 darwin/amd64

Output of go env in your module/workspace:

AR='ar'
CC='cc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='c++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/Users/waldekm/Library/Caches/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/Users/waldekm/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/65/z8mpjq_j3svgcq3cf9d_6svr0000gp/T/go-build199605776=/tmp/go-build -gno-record-gcc-switches -fno-common'
GOHOSTARCH='amd64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMOD='/dev/null'
GOMODCACHE='/Users/waldekm/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/waldekm/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/Cellar/go/1.24.0/libexec'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/waldekm/Library/Application Support/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/Cellar/go/1.24.0/libexec/pkg/tool/darwin_amd64'
GOVCS=''
GOVERSION='go1.24.0'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

I tried to compile the following example:

package main

import "text/template"
import "os"

func main() {

        type Item struct {
                Labels string
        }

        sweaters := Item{""}

        tmpl, err := template.New("test").Parse( "Disk space on one or more of filesystems on the host is low (under 20%). Path: {{ .Labels }}.")
        if err != nil { panic(err) }

        err = tmpl.Execute(os.Stdout, sweaters)
        if err != nil { panic(err) }
}

What did you see happen?

$ go run test.go
panic: template: test:1: bad character U+00A0

goroutine 1 [running]:
main.main()
	/Users/waldekm/gotempl/test.go:15 +0x172
exit status 2

What did you expect to see?

I expected the text to be parsed, i.e.

Disk space on one or more of filesystems on the host is low (under 20%). Path: .

I had a look at https://pkg.go.dev/text/template and it reads

The input text for a template is UTF-8-encoded text in any format. "Actions"--data evaluations or control structures--are delimited by "{{" and "}}"; all text outside actions is copied to the output unchanged.

I'm quite inexperienced with Go so please forgive my possible ignorance; to me this means an nbsp (non-breaking space) character should be handled by text/template just fine?

@seankhliao
Member

U+00A0 is only allowed within text, not within actions.
Closing as working as intended.
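
The distinction between text and actions can be checked with a minimal sketch. It assumes the U+00A0 in the report sat inside the action, directly after the field name, which is consistent with the "bad character" message shown above. `parseErr` is a hypothetical helper, and the `\u00a0` escapes only make the otherwise invisible character explicit.

```go
package main

import (
	"fmt"
	"text/template"
)

// parseErr is a hypothetical helper: it parses src as a template
// named "test" and returns the parse error, if any.
func parseErr(src string) error {
	_, err := template.New("test").Parse(src)
	return err
}

func main() {
	// U+00A0 in plain text, outside any action: accepted.
	fmt.Println(parseErr("Path:\u00a0{{ .Labels }}."))

	// U+00A0 inside the action, right after the field name: rejected.
	fmt.Println(parseErr("Path: {{ .Labels\u00a0}}."))
}
```

The first call should report no error, while the second should report a parse error that names U+00A0.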

Unlike many projects, the Go project does not use GitHub Issues for general discussion or asking questions. GitHub Issues are used for tracking bugs and proposals only.

For questions please refer to https://github.com/golang/go/wiki/Questions

@seankhliao closed this as not planned Feb 13, 2025
@weakcamel
Author

weakcamel commented Feb 14, 2025

@seankhliao Thanks for the reply. Could you please point me to the relevant section of documentation that states so? If there isn't one then it's either a bug in code or bug in documentation and this issue should not be closed.

I've also noticed another tiny inconsistency within the documentation:

https://pkg.go.dev/text/template

For this trimming, the definition of white space characters is the same as in Go: space, horizontal tab, carriage return, and newline.

whereas the isSpace function states clearly that U+00A0 (NBSP) is indeed considered a white space:

https://pkg.go.dev/unicode#IsSpace

IsSpace reports whether the rune is a space character as defined by Unicode's White Space property; in the Latin-1 space this is
'\t', '\n', '\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).
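
The trimming sentence quoted above can be observed directly. The sketch below assumes the trim markers `{{- ` and ` -}}` strip only the four characters that sentence lists; `render` is a hypothetical helper and the field name `X` is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render is a hypothetical helper: it executes src against a struct
// whose only field, X, holds the string "X".
func render(src string) string {
	var sb strings.Builder
	t := template.Must(template.New("trim").Parse(src))
	if err := t.Execute(&sb, struct{ X string }{"X"}); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	// Ordinary spaces on both sides of the action are trimmed away.
	fmt.Printf("%q\n", render("a   {{- .X -}}   b"))

	// Trimming stops at U+00A0, so the NBSP and everything beyond it
	// (seen from the action) stays in the output.
	fmt.Printf("%q\n", render("a \u00a0 {{- .X -}} \u00a0 b"))
}
```

In the second template only the single space adjacent to each trim marker is removed; the NBSP is treated as ordinary text.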

@ianlancetaylor
Contributor

As you've seen, template actions use the same space characters as the Go language: https://go.dev/ref/spec#Tokens.

The unicode package uses the Unicode definitions.

The Go language spec is clear as to what characters it supports. It's OK that the Go language and Unicode do not have identical definitions for space characters.

@weakcamel
Author

@ianlancetaylor Thanks for your comment!

It's OK that the Go language and Unicode do not have identical definitions for space characters.

I would agree with that if not for the statement a few paragraphs above: https://go.dev/ref/spec#Source_code_representation

Source code is Unicode text encoded in UTF-8.

It's not logical to me to be at the same time compliant with UTF-8 (a standard which defines what characters and character classes there are) and disagree on what the white space character class is.

@ianlancetaylor
Contributor

Using the UTF-8 encoding does not require anything about the meaning of characters. We also don't use https://www.unicode.org/reports/tr31/ for identifiers.

If you like, you can open a language change proposal for the language to start accepting all Unicode whitespace characters. See https://go.dev/s/proposal. If you want to pursue this, please look into what other comparable languages like C++, Rust, Python do.
