How to listen for EventAuthRequired events correctly #654

Closed
PIGfaces opened this issue Jul 13, 2022 · 15 comments
Labels: needs info (The description is not enough to tackle the problem), question (Questions related to rod)

Comments

@PIGfaces

PIGfaces commented Jul 13, 2022

Rod Version: v0.107.3

The code to demonstrate your question

Hi guys, sorry for my English (thanks to Google Translate).
In my opinion rod is much nicer than chromedp, but I have a problem.

I want to listen for FetchAuthRequired events, but code like the following blocks the page, and I could not find a relevant example.
How do I use EachEvent correctly for the FetchAuthRequired event?

My failing code:

package example_test

import (
	"testing"
	"time"

	"github.com/go-rod/rod"
	"github.com/go-rod/rod/lib/launcher"
	"github.com/go-rod/rod/lib/proto"
	"github.com/go-rod/rod/lib/utils"
)

func TestXxx(t *testing.T) {
	l := launcher.New().Headless(false).Devtools(true).MustLaunch()
	browser := rod.New().ControlURL(l).Trace(true).Logger(utils.LoggerQuiet).MustConnect().MustIgnoreCertErrors(true)

	page := browser.MustPage()

	go page.EachEvent(func(e *proto.FetchAuthRequired) {
		fakeAuth := &proto.FetchAuthChallengeResponse{
			Response: proto.FetchAuthChallengeResponseResponseProvideCredentials,
			Username: "user",
			Password: "passwd123",
		}
		// continue the request with the auth response
		proto.FetchContinueWithAuth{
			RequestID:             e.RequestID,
			AuthChallengeResponse: fakeAuth,
		}.Call(page)
	})()

	page.Timeout(time.Second * 20).
		MustNavigate("https://mdn.dev").
		MustWaitLoad().
		Timeout(time.Second * 5).
		MustElement("#root > div > main > div > div:nth-child(1) > section:nth-child(1) > h2").
		MustText()
}

I also tried this example, but it produced a blank page too:

rod/examples_test.go

Lines 350 to 369 in 8eb4347

func Example_customize_browser_launch() {
	url := launcher.New().
		Proxy("127.0.0.1:8080").     // set flag "--proxy-server=127.0.0.1:8080"
		Delete("use-mock-keychain"). // delete flag "--use-mock-keychain"
		MustLaunch()

	browser := rod.New().ControlURL(url).MustConnect()
	defer browser.MustClose()

	// So that we don't have to self issue certs for MITM
	browser.MustIgnoreCertErrors(true)

	// Adding authentication to the proxy, for the next auth request.
	// We use CLI tool "mitmproxy --proxyauth user:pass" as an example.
	go browser.MustHandleAuth("user", "pass")()

	// mitmproxy needs a cert config to support https. We use http here instead,
	// for example
	fmt.Println(browser.MustPage("https://mdn.dev/").MustElement("title").MustText())
}

What you got

My code: a blank page and a timeout panic.

[Screenshot: result of running my code]

rod/examples_test.go: also a blank page.

[Screenshot: result of running the example code]

What you expected to see

The page loads normally, just as it does with chromedp's ListenTarget:

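func TestAuth(t *testing.T) {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	// fetch is github.com/chromedp/cdproto/fetch
	chromedp.ListenTarget(ctx, func(ev interface{}) {
		switch ev.(type) {
		case *fetch.EventAuthRequired:
			// continueAuth
		}
	})
}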

What have you tried to solve the question

I tried to understand the source code of HandleAuth() and compared the chromedp code with the rod code.

PIGfaces added the question label Jul 13, 2022
@ysmood
Collaborator

ysmood commented Jul 13, 2022

@PIGfaces
Author

PIGfaces commented Jul 13, 2022

https://go-rod.github.io/#/network?id=proxy

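The http address in this example from the docs does work. But if I replace api.ipify.org with mdn.dev, I still get a blank page. As soon as I listen for the FetchAuthRequired event the page stays blank, even when the listener runs under the go keyword.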

@PIGfaces
Author

How do I subscribe to the FetchAuthRequired event correctly? I want something like a switch-case, where the handling code only runs when the event type matches.

@ysmood
Collaborator

ysmood commented Jul 14, 2022

That already is a switch-case. Jumping to the function's documentation in your IDE is enough: https://go-rod.github.io/#/events/README?id=handle-multiple-events

@PIGfaces
Author

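But listening for the FetchAuthRequired event blocks page loading. Couldn't you reproduce it? Try replacing the example URL in the docs with another one, such as http://mdn.dev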

@ysmood
Collaborator

ysmood commented Jul 15, 2022

That's a problem with your proxy: http://mdn.dev forcibly redirects to https.

@PIGfaces
Author

PIGfaces commented Jul 19, 2022

How I tested:

  • Removed the proxy
  • Tested both https and http
    In both cases the page is still blank

@ysmood
Collaborator

ysmood commented Jul 19, 2022

Which site? Does even the unit test fail for you? It passes on my side.

rod/hijack_test.go

Lines 348 to 382 in c78f6bd

func TestHandleAuth(t *testing.T) {
	g := setup(t)

	s := g.Serve()

	// mock the server
	s.Mux.HandleFunc("/a", func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		if !ok {
			w.Header().Add("WWW-Authenticate", `Basic realm="web"`)
			w.WriteHeader(401)
			return
		}

		g.Eq("a", u)
		g.Eq("b", p)
		g.HandleHTTP(".html", `<p>ok</p>`)(w, r)
	})
	s.Route("/err", ".html", "err page")

	go g.browser.MustHandleAuth("a", "b")()

	page := g.newPage(s.URL("/a"))
	page.MustElementR("p", "ok")

	wait := g.browser.HandleAuth("a", "b")
	var page2 *rod.Page
	wait2 := utils.All(func() {
		page2, _ = g.browser.Page(proto.TargetCreateTarget{URL: s.URL("/err")})
	})
	g.mc.stubErr(1, proto.FetchContinueRequest{})
	g.Err(wait())
	wait2()
	page2.MustClose()
}

@PIGfaces
Author

I also tried the following unit tests, and in both cases the html element loads normally:

  • The original test
  • The test with lines 355-360 commented out (ignoring the MustHandleAuth error)

The test that still fails with a timeout:

Replace the following lines

rod/hijack_test.go

Lines 370 to 371 in c78f6bd

page := g.newPage(s.URL("/a"))
page.MustElementR("p", "ok")

with

page := g.newPage("https://mdn.dev")
el := page.MustElement("html")

The html element loads normally once I comment out:

go g.browser.MustHandleAuth("a", "b")()

Question & confusion: Can't I listen for Auth events on normal pages? If not, how should a dynamic crawler deal with some pages requiring auth and others not?

@ysmood
Collaborator

ysmood commented Jul 21, 2022

Question & confusion: Can't I listen for Auth events on normal pages? If not, how should a dynamic crawler deal with some pages requiring auth and others not?

Yes, you can. Just use a for loop and handle the auth event whenever it appears.

page := g.newPage("https://mdn.dev")

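Didn't I already say the problem is https? You only changed http to https. Why not change the proxy settings too? Both need to change, not just one. The proxy in my test is designed for http and doesn't handle https. This is also not a problem rod should solve. For proxy issues, please consult a library or tutorial that focuses on proxies.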

Tested both https and http
In both cases the page is still blank

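You say http has the same problem. Can you give me a site that only serves http? Or mock an http-only one yourself. http://mdn.dev won't work, because it forcibly redirects to the https page.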

ysmood added the needs info label Jul 21, 2022
@PIGfaces
Author

Thank you for answering so patiently. Sorry, there are still some things I don't understand.

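Didn't I already say the problem is https? You only changed http to https. Why not change the proxy settings too? Both need to change, not just one. The proxy in my test is designed for http and doesn't handle https. This is also not a problem rod should solve. For proxy issues, please consult a library or tutorial that focuses on proxies.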

New question: Do I have to set a proxy to listen for auth events correctly? For http://mdn.dev I still get a blank page after removing the proxy.

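You say http has the same problem. Can you give me a site that only serves http? Or mock an http-only one yourself. http://mdn.dev/ won't work, because it forcibly redirects to the https page.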

http://www.b520.cc/

@ysmood
Collaborator

ysmood commented Jul 21, 2022

New question: Do I have to set a proxy to listen for auth events correctly? For http://mdn.dev/ I still get a blank page after removing the proxy.

No, the two are unrelated.

http://www.b520.cc/

This site doesn't need auth to open, does it? I don't understand why you gave me this address.

@PIGfaces
Author

This site doesn't need auth to open, does it? I don't understand why you gave me this address.

https://github.com/Qianlitp/crawlergo/blob/9d6f751f05c19d66ceaf8f00a4185c03c6ccfa2b/pkg/engine/intercept_request.go#L85-L96

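It's the original question of this issue. In a dynamic crawler I don't know which pages need Auth handling, so I want every page to listen for Auth events. But listening for this event on a normal page blocks page loading. I wrote my code based on how crawlergo uses chromedp.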

@ysmood
Collaborator

ysmood commented Jul 21, 2022

Got it. I looked at the code, and the cause is that you left out this line when copying it:

https://github.com/Qianlitp/crawlergo/blob/5bd29ce7ab68a6961b7403382de025b68adb2c47/pkg/engine/intercept_request.go#L34

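If you handle Fetch events, you must also handle every paused request. Chrome's documentation mentions this. Here's an example that should make it clear. Pay attention to proto.FetchContinueRequest: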

package example_test

import (
	"testing"

	"github.com/go-rod/rod"
	"github.com/go-rod/rod/lib/proto"
	"github.com/ysmood/gop"
)

func TestLab(t *testing.T) {
	page := rod.New().MustConnect().MustPage()

	go page.EachEvent(func(e *proto.FetchAuthRequired) {
		// Answer auth challenges with credentials.
		fakeAuth := &proto.FetchAuthChallengeResponse{
			Response: proto.FetchAuthChallengeResponseResponseProvideCredentials,
			Username: "user",
			Password: "passwd123",
		}
		_ = proto.FetchContinueWithAuth{
			RequestID:             e.RequestID,
			AuthChallengeResponse: fakeAuth,
		}.Call(page)
	}, func(e *proto.FetchRequestPaused) {
		// Resume every paused request; otherwise the page never finishes loading.
		_ = proto.FetchContinueRequest{
			RequestID: e.RequestID,
		}.Call(page)
	})()

	txt := page.MustNavigate("https://mdn.dev").
		MustElement("#root > div > main > div > div:nth-child(1) > section:nth-child(1) > h2").
		MustText()

	gop.P(txt)
}

@PIGfaces
Author

Thank you so much!
Thank you, truly grateful!
.......
You're my hero!

My fault for not explaining it clearly at the start, haha.
