Skip to content

net/http: DetectContentType not working for text/html with line feed character #71584

Closed as not planned
@russinholi

Description

@russinholi

Go version

go version 1.23.5

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/reginaldo/.cache/go-build'
GOENV='/home/reginaldo/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/reginaldo/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/reginaldo/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/snap/go/10818'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/snap/go/10818/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.5'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/reginaldo/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2302776111=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I'm using the http.DetectContentType to get the content type of a html file that starts with these lines:

<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
...

So I call http.DetectContentType passing the first 512 bytes of the file.

I shared this go.play link to help to reproduce the issue: https://go.dev/play/p/cM_Wy5pEiYT

What did you see happen?

As after <!DOCTYPE html we have a \n character, the content type returned by the function is text/plain; charset=utf-8.
Apparently because the isTT function is not considering line-feed characters as a tag-termination byte and is returning false when matching the file content with the html signature string:

// isTT reports whether the provided byte is a tag-terminating byte (0xTT)
// as defined in https://mimesniff.spec.whatwg.org/#terminology.
func isTT(b byte) bool {
	switch b {
	case ' ', '>':
		return true
	}
	return false
}

What did you expect to see?

The correct content type returned should be text/xml; charset=utf-8 .

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugReportIssues describing a possible bug in the Go implementation.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions