Different MaxDepth on AllowedDomains and others ? #773

Closed
Tazeg opened this issue Jul 1, 2023 · 3 comments

Tazeg commented Jul 1, 2023

How can I set colly.MaxDepth(1) for domains that are not in AllowedDomains while keeping unlimited depth for the domains that are?
My aim is to crawl every page of example.com, and if the site links to a page on example2.com, fetch that one page but go no further on example2.com.

Collaborator

WGH- commented Aug 20, 2023

I believe you'll have to check Request.Depth manually before calling Request.Visit.

WGH- added the question label on Aug 20, 2023

Author

Tazeg commented Sep 20, 2023

e.Request.Depth is always 1.

For example, if https://example.com contains <a href="https://domain1.com/page.html"> and <a href="https://domain2.com/page.html">, I need to visit those two pages, but I don't want to visit any other pages on domain1 or domain2.

I don't understand how to do this.

Collaborator

WGH- commented Oct 16, 2023

Please ask this on Stack Overflow.

As a hint, MaxDepth and AllowedDomains won't help you here, as they're not flexible enough. I think you'll have to put your own logic in an OnHTML handler and call Visit conditionally.
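
One way to read that hint, as a minimal sketch rather than an official colly recipe: decide whether to follow links based on the host of the page that is currently being parsed. Links found on the home domain are always followed (including links to other domains), while links found on any other domain are never followed, so external pages are fetched once and the crawl stops there. The helper name `followLinksFrom` and the `homeHost` parameter are made up for illustration. The sketch also assumes AllowedDomains is left unset on the collector, since otherwise requests to domain1.com and domain2.com would be rejected before the handler logic matters.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// followLinksFrom is a hypothetical helper (not part of colly). It reports
// whether links found on pageURL should be visited: only pages hosted on
// homeHost are allowed to spawn further requests.
func followLinksFrom(pageURL, homeHost string) bool {
	u, err := url.Parse(pageURL)
	if err != nil {
		return false // unparsable page URL: do not follow anything from it
	}
	return strings.EqualFold(u.Hostname(), homeHost)
}

func main() {
	pages := []string{
		"https://example.com/",
		"https://example.com/about",
		"https://domain1.com/page.html",
		"https://domain2.com/page.html",
	}
	for _, page := range pages {
		fmt.Printf("%-32s follow links: %v\n", page, followLinksFrom(page, "example.com"))
	}

	// Assumed wiring inside a colly program (needs the colly import, so it is
	// shown as a comment to keep this file standard-library only):
	//
	//   c.OnHTML("a[href]", func(e *colly.HTMLElement) {
	//       if followLinksFrom(e.Request.URL.String(), "example.com") {
	//           e.Request.Visit(e.Attr("href"))
	//       }
	//   })
}
```

Calling `e.Request.Visit` rather than `c.Visit` inside the handler also matters if you want to inspect Request.Depth: as far as I can tell, the collector-level Visit starts every request at depth 1, which might explain the "always 1" observation above.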

WGH- closed this as not planned on Oct 16, 2023