
Runtime Error: invalid memory address or nil pointer dereference #35

Closed
AwolDes opened this issue Oct 18, 2017 · 2 comments

Comments


AwolDes commented Oct 18, 2017

Hey mate!

I'm loving colly so far. I'm new to the Go programming language and I've just been messing around with your scraping library and found a weird bug.

I was just testing out scraping my website, and then allowing the scraper to scrape Medium. I end up with this error:
(screenshot of the panic output: runtime error: invalid memory address or nil pointer dereference)
(I'm using Go v 1.9 on Linux x86).

This is the code:

package main

import (
    "fmt"

    "github.com/asciimoo/colly"
)

func main() {
    scraper := colly.NewCollector()
    scraper.AllowedDomains = []string{"onslow.io", "medium.com"}

    scraper.OnHTML("a[href]", func(element *colly.HTMLElement) {
        link := element.Attr("href")
        // Print link
        fmt.Printf("Link found: %q -> %s\n", element.Text, link)
        // Visit link found on page.
        // Only links within AllowedDomains are visited.
        go scraper.Visit(element.Request.AbsoluteURL(link))
    })

    scraper.OnError(func(response *colly.Response, err error) {
        fmt.Println("Request URL:", response.Request.URL, "failed with response:", response, "\nError:", err)
    })

    scraper.OnRequest(func(request *colly.Request) {
        fmt.Println("Visiting", request.URL.String())
    })

    scraper.Visit("http://onslow.io")
    scraper.Wait()
}

From what I've gathered, it may have to do with the goroutines not synchronizing properly?

If you have any other ideas on the cause of this, it'd be great to hear them!

Cheers

@asciimoo
Member

@AwolDes thanks for the detailed bug report and the nice words. Hopefully 2665d14 fixes the bug.
A minor piece of advice for the above code: the concurrency is unlimited, which is not advised (you could DoS the targets or get "Too Many Requests" errors). Use LimitRules to control the allowed parallelism per domain, or spawn a fixed number of goroutines.
Limit example:

scraper.Limit(&colly.LimitRule{DomainGlob: "onslow.io", Parallelism: 2})
scraper.Limit(&colly.LimitRule{DomainGlob: "medium.com", Parallelism: 5})

@AwolDes
Author

AwolDes commented Oct 19, 2017

Thanks for the quick fix @asciimoo! And thank you for pointing out colly's Limit() function. I wasn't aware of it; I'll be sure to use it.
