Katana does not exclude .js and .css links from crawling when they carry query parameters #987

Open
CatDrinkCoffee opened this issue Aug 10, 2024 · 3 comments
Labels
Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@CatDrinkCoffee

Normally the crawler does not request JS and CSS pages. However, when I used the -sb parameter to watch the browser during crawling, I found the following problem in Katana.

For example, plain .js and .css links are never visited during a crawl, but a link with parameters, such as .js?ver=1.1, is visited. This produces a huge number of crawler requests, because many pages today append a parameter value to their JS links. I think this is a defect and hope it can be fixed. Thank you.

The picture below is what I captured when the crawler chose to visit this js (this js link is with parameters)

[Image: 1723303751361]
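The behaviour reported above is consistent with an extension check that looks at the whole URL string rather than only its path. The following is a minimal Go sketch of that difference, not Katana's actual code: the names `skippedExtensions`, `naiveHasSkippedExtension`, and `hasSkippedExtension`, as well as the `example.com` URLs, are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// skippedExtensions is a hypothetical deny-list used only for this sketch.
var skippedExtensions = map[string]bool{".js": true, ".css": true}

// naiveHasSkippedExtension takes the extension from the raw URL string,
// so anything after "?" becomes part of the "extension" and the check misses.
func naiveHasSkippedExtension(rawURL string) bool {
	return skippedExtensions[strings.ToLower(path.Ext(rawURL))]
}

// hasSkippedExtension parses the URL first and inspects only the path
// component, so the query string cannot hide the extension.
func hasSkippedExtension(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false // unparsable URLs are not filtered in this sketch
	}
	return skippedExtensions[strings.ToLower(path.Ext(u.Path))]
}

func main() {
	link := "https://example.com/app.js?ver=1.1"
	fmt.Println("naive check skips link:", naiveHasSkippedExtension(link))
	fmt.Println("path-based check skips link:", hasSkippedExtension(link))
}
```

With the parameterised link, the naive check does not recognise the `.js` extension while the path-based check does, which mirrors the difference between `.js` and `.js?ver=1.1` described in the report.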

@CatDrinkCoffee CatDrinkCoffee added the Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Aug 10, 2024
@zrquan

zrquan commented Aug 15, 2024

Have you tried the -igq flag?

@CatDrinkCoffee
Author

> Have you tried the -igq flag?

There is no -igq parameter in the documentation. In any case, this is a defect, isn't it? I shouldn't need another parameter to work around it. Judging from how the program runs, it is designed not to crawl certain kinds of links, but in some special cases that design fails.

@zrquan

zrquan commented Sep 2, 2024

> There is no -igq parameter in the document

Sorry, it should be the -iqp parameter.
