Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race #36

Closed
papa-stiflera opened this issue Oct 20, 2017 · 9 comments
Closed

Data race #36

papa-stiflera opened this issue Oct 20, 2017 · 9 comments

Comments

@papa-stiflera
Copy link

Example code:

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		fmt.Println(link)
		c.Visit(e.Request.AbsoluteURL(link))
	})

	c.Visit("https://en.wikipedia.org/")
}

Execution log:

==================
WARNING: DATA RACE
Write at 0x00c420083950 by main goroutine:
  runtime.mapassign_faststr()
      /usr/local/go/src/runtime/hashmap_fast.go:598 +0x0
  net/textproto.MIMEHeader.Set()
      /usr/local/go/src/net/textproto/header.go:22 +0x60
  net/http.Header.Set()
      /usr/local/go/src/net/http/header.go:31 +0x60
  net/http.(*Request).AddCookie()
      /usr/local/go/src/net/http/request.go:385 +0x37c
  net/http.(*Client).send()
      /usr/local/go/src/net/http/client.go:170 +0x115
  net/http.(*Client).Do()
      /usr/local/go/src/net/http/client.go:602 +0x513
  crawler/vendor/github.com/asciimoo/colly.(*httpBackend).Do()
      /home/skruglov/Projects/go/src/crawler/vendor/github.com/asciimoo/colly/http_backend.go:154 +0x105
  crawler/vendor/github.com/asciimoo/colly.(*httpBackend).Cache()
      /home/skruglov/Projects/go/src/crawler/vendor/github.com/asciimoo/colly/http_backend.go:110 +0x9e
  crawler/vendor/github.com/asciimoo/colly.(*Collector).scrape()
      /home/skruglov/Projects/go/src/crawler/vendor/github.com/asciimoo/colly/colly.go:226 +0x461
  crawler/vendor/github.com/asciimoo/colly.(*Collector).Visit()
      /home/skruglov/Projects/go/src/crawler/vendor/github.com/asciimoo/colly/colly.go:157 +0x9b
  main.main()
      /home/skruglov/Projects/go/src/crawler/main.go:19 +0xe4

Previous read at 0x00c420083950 by goroutine 24:
  runtime.mapaccess1_faststr()
      /usr/local/go/src/runtime/hashmap_fast.go:208 +0x0
  net/http.http2isConnectionCloseRequest()
      /usr/local/go/src/net/http/h2_bundle.go:8652 +0xae
  net/http.(*http2clientConnReadLoop).endStreamError()
      /usr/local/go/src/net/http/h2_bundle.go:8288 +0xe2
  net/http.(*http2clientConnReadLoop).endStream()
      /usr/local/go/src/net/http/h2_bundle.go:8277 +0x54
  net/http.(*http2clientConnReadLoop).processData()
      /usr/local/go/src/net/http/h2_bundle.go:8267 +0x1ce
  net/http.(*http2clientConnReadLoop).run()
      /usr/local/go/src/net/http/h2_bundle.go:7896 +0x737
  net/http.(*http2ClientConn).readLoop()
      /usr/local/go/src/net/http/h2_bundle.go:7788 +0x11c

Goroutine 24 (running) created at:
  net/http.(*http2Transport).newClientConn()
      /usr/local/go/src/net/http/h2_bundle.go:7053 +0xe1a
  net/http.(*http2Transport).NewClientConn()
      /usr/local/go/src/net/http/h2_bundle.go:6991 +0x55
  net/http.(*http2addConnCall).run()
      /usr/local/go/src/net/http/h2_bundle.go:835 +0x55
==================

#mw-head
#p-search
/wiki/Wikipedia
...
@asciimoo
Copy link
Member

I can't reproduce the bug, the above code runs without errors. What is your environment (os/go version/colly version)?

@papa-stiflera
Copy link
Author

$ go version
go version go1.9.1 linux/amd64

colly version: d7069d1 (master)

@asciimoo
Copy link
Member

hmm.. I can reproduce if I add the -race flag to go run command, but I don't really understand why this happens.

@kataras
Copy link

kataras commented Oct 20, 2017

@papa-stiflera you didn't paste the whole code you're using in order to be able to re-produce this. At the end of your logs: /home/skruglov/Projects/go/src/crawler/main.go:19 +0xe4 but the code you gave us is not too long( maybe of the import statements? I don't know but...) so I assume your data race is somewhere else and has nothing to do with the net/http package or the colly one. Please give us the necessary information to help you, thank you!

My output (win 10 x64, go 1.9.1);

colly_issue_36_output

@asciimoo
Copy link
Member

asciimoo commented Oct 20, 2017

@kataras thanks for the debugging. My first assumption was the same, then I tried the above code, and the bug is reproducible - even with the below snippet:

package main

import (
    "github.com/asciimoo/colly"
)

func main() {
    colly.NewCollector().Visit("https://en.wikipedia.org/")
}

The strange thing is that if I change the URL to a service which doesn't support HTTP/2 (e.g. to https://httpbin.org/) the race disappears.

UPDATE:
It's pretty sure that the bug is somehow connected to the HTTP/2 support;
This command doesn't fail:
GODEBUG='http2client=0' go run -race t/test_36.go
and this fails:
GODEBUG='http2client=1' go run -race t/test_36.go

@kataras
Copy link

kataras commented Oct 21, 2017

@asciimoo It's funny because the data racer couldn't find that with the first try, I had to re-run the program more than 4 times to view the race log...

Update:

I managed to "fix" that by locking when client.Do, see below and test it by yourself, if that works just put locks there;

colly_changes

@asciimoo
Copy link
Member

@kataras unfortunately your suggested solution is not applicable, because it forbids parallelism in httpBackend and the error doesn't disappear for me if I run GODEBUG='http2client=1' go run -race xy.go.

@kataras
Copy link

kataras commented Oct 21, 2017

I know @asciimoo ...I suggested it as a temporary solution, I don't know the whole code base so I can't help any further for now, but if it's a net/http issue then you have to fill an issue there :/

@asciimoo
Copy link
Member

The bug is reproducible without colly.
The following code demonstrates the problem:

package main

import (
    "net/http"
    "net/http/cookiejar"
)

func main() {
    jar, _ := cookiejar.New(nil)
    client := &http.Client{Jar: jar}
    client.CheckRedirect = func(req *http.Request, via []*http.Request) error {
        lastRequest := via[len(via)-1]
        req.Header = lastRequest.Header
        return nil
    }
    client.Get("https://en.wikipedia.org/")
}

Seems, the bug only appears if the client has a cookie jar and a custom redirect handler which writes to http.Request.Header using HTTP/2 protocol.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants