
Getting final request when there is a page redirect #74

Closed
native-human opened this issue Dec 29, 2017 · 2 comments
@native-human

When a page redirects, colly follows the redirect automatically. In the OnHTML callback, however, the Request object I get appears to be the original request, not the one made after the redirect. I want to follow all the links on the HTML page, so I use the Request object to build absolute URLs. With a redirect this gives wrong results, because the Request object still carries the pre-redirect URL. The example below illustrates the problem:

package main

import (
	"fmt"
	"net/http"
	"time"

	"github.com/gocolly/colly"
)

func main() {
	// Start a local test server: "/" redirects to "/r/",
	// and "/r/" serves a page containing one relative link.
	go func() {
		http.Handle("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			http.Redirect(w, r, "/r/", http.StatusSeeOther)
		}))
		http.Handle("/r/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintf(w, `<a href="test">test</a>`)
		}))
		http.ListenAndServe("127.0.0.1:9999", nil)
	}()
	// Give the server a moment to start listening.
	time.Sleep(500 * time.Millisecond)
	c := colly.NewCollector()
	c.AllowedDomains = []string{"127.0.0.1:9999"}
	// Print the absolute URL that colly computes for every link found.
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		fmt.Println(e.Request.AbsoluteURL(e.Attr("href")))
	})
	c.Visit("http://127.0.0.1:9999/")
	c.Wait()
	// Keep the process, and with it the test server, running.
	time.Sleep(1000 * time.Hour)
}

The example prints "http://127.0.0.1:9999/test". However, when I open "http://127.0.0.1:9999/" in Firefox and click the link, I end up at "http://127.0.0.1:9999/r/test".
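
The difference between the two results comes down to which base URL the relative link "test" is resolved against. Here is a minimal sketch using only the standard library's net/url package; the helper name resolve is hypothetical and is not part of colly:

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve is a hypothetical helper (not a colly API): it resolves href
// against base the way a browser resolves a relative link against the
// URL of the page it is displaying.
func resolve(base, href string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	h, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(h).String(), nil
}

func main() {
	bases := []string{
		"http://127.0.0.1:9999/",   // URL that was originally requested
		"http://127.0.0.1:9999/r/", // URL reached after the redirect
	}
	for _, base := range bases {
		abs, err := resolve(base, "test")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s + test -> %s\n", base, abs)
	}
}
```

Resolving against the original URL yields ".../test", while resolving against the post-redirect URL yields ".../r/test", which is what the browser does.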

Is there a better way to mimic the behavior of the browser in this case?

@asciimoo
Member

@native-human thanks for the detailed report. Hopefully 37c1a91 fixes it, could you confirm?
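
For reference, and independent of what the commit above changes inside colly, Go's net/http exposes the post-redirect URL through resp.Request.URL, since the Request field of a response is the last request the client made. This is a rough sketch with a throwaway httptest server; the names finalPath and newTestServer are hypothetical:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newTestServer is a hypothetical stand-in for the server in the report:
// "/" redirects to "/r/", and "/r/" serves a page with one relative link.
func newTestServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/r/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `<a href="test">test</a>`)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/r/", http.StatusSeeOther)
	})
	return httptest.NewServer(mux)
}

// finalPath is a hypothetical helper: it fetches rawURL with the default
// client, which follows redirects, and returns the path of the last
// request that was actually sent.
func finalPath(rawURL string) (string, error) {
	resp, err := http.Get(rawURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Request.URL.Path, nil
}

func main() {
	srv := newTestServer()
	defer srv.Close()

	p, err := finalPath(srv.URL + "/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("final path:", p)
}
```

Relative links found in the response body should be resolved against that final URL, not the one originally passed to the client.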

@native-human
Author

Thanks, works great now for me!
