Error page classifier #1245

Merged
merged 16 commits into from Jul 19, 2023

Conversation

dogancanbakir
Member

This PR adds an error page classifier and error page filtering support. Closes #1201.

w/o filtering:

$ go run httpx.go -l list.txt

    __    __  __       _  __
   / /_  / /_/ /_____ | |/ /
  / __ \/ __/ __/ __ \|   /
 / / / / /_/ /_/ /_/ /   |
/_/ /_/\__/\__/ .___/_/|_|
             /_/

                projectdiscovery.io

[INF] Current httpx version v1.3.2 (latest)
https://projectdiscovery.io/notarealpage
https://projectdiscovery.io
https://scanme.sh

w/ filtering:

$ go run httpx.go -l list.txt -fep

    __    __  __       _  __
   / /_  / /_/ /_____ | |/ /
  / __ \/ __/ __/ __ \|   /
 / / / / /_/ /_/ /_/ /   |
/_/ /_/\__/\__/ .___/_/|_|
             /_/

                projectdiscovery.io

[INF] Current httpx version v1.3.2 (latest)
https://projectdiscovery.io
https://scanme.sh

$ cat list.txt
https://projectdiscovery.io
https://projectdiscovery.io/notarealpage
http://scanme.sh
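The PR description does not say how pages are classified, though the review thread mentions a Bayesian classifier. As a rough illustration only, the Go sketch below shows a multinomial naive Bayes text classifier with add-one smoothing, plus a filter that drops pages labelled as errors. That is roughly the effect `-fep` has on the output above. The names `NaiveBayes`, `Page` and `filterErrorPages`, the class labels, and the tiny training set are all hypothetical and are not taken from httpx or projectdiscovery/utils.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Hypothetical class labels; httpx may name its classes differently.
const (
	ClassError    = "error"
	ClassNonError = "nonerror"
)

// NaiveBayes is a minimal multinomial naive Bayes text classifier.
type NaiveBayes struct {
	wordCounts map[string]map[string]int // class -> word -> occurrences
	totalWords map[string]int            // class -> total words seen
	docCounts  map[string]int            // class -> training documents
	vocab      map[string]struct{}       // every distinct word seen
	totalDocs  int
}

func NewNaiveBayes() *NaiveBayes {
	return &NaiveBayes{
		wordCounts: map[string]map[string]int{},
		totalWords: map[string]int{},
		docCounts:  map[string]int{},
		vocab:      map[string]struct{}{},
	}
}

// tokenize lowercases text and splits it on whitespace.
func tokenize(text string) []string { return strings.Fields(strings.ToLower(text)) }

// Train records one labelled document.
func (nb *NaiveBayes) Train(class, text string) {
	if nb.wordCounts[class] == nil {
		nb.wordCounts[class] = map[string]int{}
	}
	for _, w := range tokenize(text) {
		nb.wordCounts[class][w]++
		nb.totalWords[class]++
		nb.vocab[w] = struct{}{}
	}
	nb.docCounts[class]++
	nb.totalDocs++
}

// Classify returns the class with the highest log posterior, using
// add-one (Laplace) smoothing for words unseen in a class.
func (nb *NaiveBayes) Classify(text string) string {
	best, bestScore := "", math.Inf(-1)
	v := float64(len(nb.vocab))
	for class, docs := range nb.docCounts {
		score := math.Log(float64(docs) / float64(nb.totalDocs))
		for _, w := range tokenize(text) {
			n := float64(nb.wordCounts[class][w])
			score += math.Log((n + 1) / (float64(nb.totalWords[class]) + v))
		}
		if score > bestScore {
			best, bestScore = class, score
		}
	}
	return best
}

// Page is a hypothetical stand-in for an httpx result.
type Page struct{ URL, Body string }

// filterErrorPages keeps only pages not classified as error pages.
func filterErrorPages(nb *NaiveBayes, pages []Page) []Page {
	var kept []Page
	for _, p := range pages {
		if nb.Classify(p.Body) != ClassError {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	nb := NewNaiveBayes()
	nb.Train(ClassError, "404 page not found")
	nb.Train(ClassError, "the page you requested could not be found")
	nb.Train(ClassNonError, "welcome to our home page")
	nb.Train(ClassNonError, "security tools for the community")

	pages := []Page{
		{"https://projectdiscovery.io/notarealpage", "sorry page not found"},
		{"https://projectdiscovery.io", "welcome to projectdiscovery security tools"},
	}
	for _, p := range filterErrorPages(nb, pages) {
		fmt.Println(p.URL)
	}
}
```

With this toy training set, the program prints only https://projectdiscovery.io, because the body "sorry page not found" scores higher under the error class.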

Mzack9999 added the Type: Enhancement label on Jun 28, 2023
Member

Mzack9999 left a comment

lgtm. Commented out the training logic to reduce the size of the embedded data. We will need to move it to the utils repository and automate periodic training via a GH action.

Member

ehsandeep left a comment

@dogancanbakir This is an interesting feature. Can you add a short document explaining what this feature means for the user, how it works, and when it is best to use it?

Also, when the -fep option is used, we could write information about what is being filtered to filtered_error_page.json for further review/testing.

Member

Mzack9999 left a comment

  • Merge conflict
  • Let's check if we can move the agnostic Bayesian classifier implementation to utils/mlutils and keep only the loading with weights here (with the training code commented out)
  • Optional: unless we are going to implement a periodic scrape + train GH action, for now I think statically stored weights from our latest training suffice
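The split proposed here could look roughly like the Go sketch below. A generic classifier would live in utils/mlutils, and httpx would ship statically stored weights. At startup, the model is decoded from precomputed statistics instead of being trained. The `Model` type, the JSON layout, and the numbers in `weights` are made up for illustration. In a real build, the weights file could be shipped with `//go:embed`, but that is an assumption the thread does not confirm.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"strings"
)

// Model holds pre-trained naive Bayes statistics. The JSON layout is a
// hypothetical format, not the one used by projectdiscovery/utils.
type Model struct {
	Priors     map[string]float64            `json:"priors"`      // class -> P(class)
	WordCounts map[string]map[string]float64 `json:"word_counts"` // class -> word -> count
	TotalWords map[string]float64            `json:"total_words"` // class -> total count
	VocabSize  float64                       `json:"vocab_size"`
}

// LoadModel decodes stored weights; no training happens at startup.
func LoadModel(data []byte) (*Model, error) {
	var m Model
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

// Classify scores text against each class with add-one smoothing and
// returns the class with the highest log posterior.
func (m *Model) Classify(text string) string {
	best, bestScore := "", math.Inf(-1)
	for class, prior := range m.Priors {
		score := math.Log(prior)
		for _, w := range strings.Fields(strings.ToLower(text)) {
			score += math.Log((m.WordCounts[class][w] + 1) / (m.TotalWords[class] + m.VocabSize))
		}
		if score > bestScore {
			best, bestScore = class, score
		}
	}
	return best
}

// weights stands in for a stored weights file; the numbers are made up.
const weights = `{
  "priors": {"error": 0.5, "nonerror": 0.5},
  "word_counts": {
    "error": {"not": 2, "found": 2, "404": 1},
    "nonerror": {"welcome": 1, "home": 1}
  },
  "total_words": {"error": 5, "nonerror": 2},
  "vocab_size": 5
}`

func main() {
	m, err := LoadModel([]byte(weights))
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Classify("404 not found"))
	fmt.Println(m.Classify("welcome home"))
}
```

Running it prints error for "404 not found" and nonerror for "welcome home". Retraining would only mean regenerating the weights file, not changing the loading code.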

dogancanbakir
Member Author

@Mzack9999,

> Let's check if we can move the agnostic Bayesian classifier implementation to utils/mlutils and keep only the loading with weights here (with the training code commented out)

I have submitted a PR (projectdiscovery/utils#208) which, once merged, can be utilized in this implementation.

> Optional: unless we are going to implement a periodic scrape + train GH action, for now I think statically stored weights from our latest training suffice

I think we can do the latter; using stored weights from our latest training is fine.

Member

Mzack9999 left a comment

Let's clean up the code and reuse projectdiscovery/utils#208 (now merged).

Labels
Type: Enhancement
Development

Successfully merging this pull request may close these issues.

Add onboard hash / classic ML classifier for standard error-pages/home-pages
3 participants