Closed as not planned
Description
Go version
go version 1.23.5
Output of go env in your module/workspace:
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/reginaldo/.cache/go-build'
GOENV='/home/reginaldo/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/reginaldo/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/reginaldo/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/snap/go/10818'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/snap/go/10818/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.5'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/reginaldo/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2302776111=/tmp/go-build -gno-record-gcc-switches'
What did you do?
I'm using http.DetectContentType to get the content type of an HTML file that starts with these lines:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
...
I call http.DetectContentType and pass it the first 512 bytes of the file.
Here is a Go Playground link that reproduces the issue: https://go.dev/play/p/cM_Wy5pEiYT
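For readers who cannot open the playground, here is a minimal sketch of the reproduction. It is not necessarily identical to the linked snippet: the `sniff` helper and the two constants are names made up for this illustration, and the second constant (a space instead of a line feed after `html`) is added only for comparison.

```go
package main

import (
	"fmt"
	"net/http"
)

// doctypeWithNewline mirrors the start of the file from this report:
// a line feed directly follows "<!DOCTYPE html".
const doctypeWithNewline = "<!DOCTYPE html\n" +
	"PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n" +
	"\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"

// doctypeWithSpace is the same declaration with a space after "html"
// (an assumption added here for comparison, not taken from the report).
const doctypeWithSpace = "<!DOCTYPE html " +
	"PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n" +
	"\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"

// sniff is a hypothetical helper that wraps http.DetectContentType.
// DetectContentType itself considers at most the first 512 bytes.
func sniff(s string) string {
	return http.DetectContentType([]byte(s))
}

func main() {
	fmt.Println(sniff(doctypeWithNewline)) // text/plain; charset=utf-8
	fmt.Println(sniff(doctypeWithSpace))   // text/html; charset=utf-8
}
```

The only difference between the two inputs is the byte that follows `<!DOCTYPE html`.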
What did you see happen?
Because <!DOCTYPE html is followed by a \n character, the content type returned by the function is text/plain; charset=utf-8.
This appears to happen because the isTT function does not treat a line feed as a tag-terminating byte, so it returns false when the file content is matched against the HTML signature string:
// isTT reports whether the provided byte is a tag-terminating byte (0xTT)
// as defined in https://mimesniff.spec.whatwg.org/#terminology.
func isTT(b byte) bool {
	switch b {
	case ' ', '>':
		return true
	}
	return false
}
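To make the requested behavior concrete, here is a rough sketch of a check that would accept the file above. Note that `isTTLenient` and `hasHTMLDoctype` are hypothetical names invented for this illustration; neither exists in net/http, and the mimesniff terminology linked in the comment above lists only 0x20 (space) and 0x3E (">") as tag-terminating bytes. The sketch is also simplified in that it does not skip leading whitespace.

```go
package main

import (
	"fmt"
	"strings"
)

// isTTLenient is a hypothetical variant of isTT that, in addition to
// ' ' and '>', accepts tab, line feed, form feed, and carriage return.
// It is NOT part of net/http.
func isTTLenient(b byte) bool {
	switch b {
	case ' ', '>', '\t', '\n', '\x0c', '\r':
		return true
	}
	return false
}

// hasHTMLDoctype is a hypothetical helper: it reports whether s starts with
// "<!DOCTYPE HTML" (compared case-insensitively) immediately followed by a
// byte that isTTLenient accepts.
func hasHTMLDoctype(s string) bool {
	const sig = "<!DOCTYPE HTML"
	if len(s) <= len(sig) {
		return false // no byte left to act as the terminator
	}
	return strings.EqualFold(s[:len(sig)], sig) && isTTLenient(s[len(sig)])
}

func main() {
	fmt.Println(hasHTMLDoctype("<!DOCTYPE html\nPUBLIC ...")) // true
	fmt.Println(hasHTMLDoctype("<!DOCTYPE htmlx"))            // false
}
```

A caller could run such a check before falling back to http.DetectContentType, which leaves the standard library's behavior unchanged.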
What did you expect to see?
The correct content type returned should be text/xml; charset=utf-8.