Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: surface the duration and error metrics for each source when enumerating domains #727

Conversation

owenrumney
Copy link
Contributor

@owenrumney owenrumney commented Dec 2, 2022

  • Add a duration field to the subscraping.Result struct
  • Populate the TimeTaken field before passing to the result channel

I've tried to be as light touch as possible - if this option isn't desirable I thought about adding a map[string]map[string]time.Duration to the Agent and storing the sources time taken for a given domain against the agent and surface that way.

If that would be more desirable, happy to rework the PR for that

Resolves #726

[INF] Found 7 subdomains for 'owenrumney.co.uk' in 30 seconds 357 milliseconds
[INF] Printing source statistics for 'owenrumney.co.uk'

 Source               Duration      Results     Errors
────────────────────────────────────────────────────────
 alienvault           776ms              47          0
 anubis               123ms               0          1
 commoncrawl          30.354s             0          0
 crtsh                186ms              86          0
 digitorus            255ms              93          0
 dnsdb                0s                  0          0
 dnsdumpster          2.975s              5          0
 hackertarget         727ms               0          0
 rapiddns             24.069s             1          0
 riddler              123ms               0          0
 sitedossier          8.469s              0          1
 virustotal           0s                  0          0
 waybackarchive       577ms             343          0

 The following sources were skipped...

 bevigil
 binaryedge
 bufferover
 c99
 censys
 certspotter
 chaos
 chinaz
 dnsrepo
 fofa
 fullhunt
 github
 hunter
 intelx
 passivetotal
 quake
 robtex
 securitytrails
 shodan
 threatbook
 whoisxmlapi
 zoomeye
 zoomeyeapi

@owenrumney owenrumney marked this pull request as ready for review December 2, 2022 09:59
@ehsandeep ehsandeep changed the base branch from master to dev December 2, 2022 10:08
@owenrumney
Copy link
Contributor Author

It looks like the issues breaking the build were already there? Possibly already being tracked by #678

@vzamanillo
Copy link
Contributor

I'm not sure about this change, modifying the source result properties from the outside is pretty weird, I think this should be done per source, each source is the entity responsible for doing this.

@owenrumney
Copy link
Contributor Author

Let me rework it and move the duration into Run. I'm inclined to agree, it doesn't feel great updating state of the source externally

@owenrumney owenrumney force-pushed the surface-enumeration-duration-in-result branch from 7d49486 to e0a5e1a Compare December 12, 2022 10:11
@owenrumney
Copy link
Contributor Author

@vzamanillo I've reworked to move the time keeping into the source. Exposing as a time.Duration to make it easier to work with as a metric, for logging purposes the .String() method will make sure its printed in a sensible granularity.

I also stepped over commoncrawl test for without auth - that seemed to be consistently failing and when I looked at the actually payload, it's downloading 5mb of JSON for the test.

@tarunKoyalwar
Copy link
Member

@owenrumney , If I run the program now time taken by each source is printed before all subdomains . Printing time taken by each source at end of program will be much better and visible .

I suggest to implement (a *Agent) TimeTaken method of Agent or any other func name .
using sources from Agent Struct we can call TimeTaken method of each source and print them . with some formatting maybe like a table and sort them according to name or time taken etc .

and we can call this new function from runner package after gologger.Info().Msgf("Found %d subdomains .

I hope this makes sense . Feel free to comment if you have any questions or suggestions.

@owenrumney
Copy link
Contributor Author

@tarunKoyalwar - Thanks for the feedback, I've restructured the timer with the deferred function but also added counts for the results and errors.

In the debug flag group, I've added --source-statistics flag to say you want to output the source stats at the end.

This shows -
image

I wanted to get your thoughts on the intent then if the principle is acceptable I'll update the PR

@tarunKoyalwar
Copy link
Member

@owenrumney , this looks awesome are you using any 3rd party library for this ??

@tarunKoyalwar
Copy link
Member

also -stats flag should be enough instead of --source-statistics . we already have -stats flag in nuclei .

Owen Rumney added 2 commits December 12, 2022 20:12
- Add a duration field to the `subscraping.Result` struct
- Populate the `TimeTaken` field before passing to the result channel

Resolves projectdiscovery#726
- sources responsible for their own time keeping
- step over commoncrawl in the no auth test, it is consistently failing as a timeout
@owenrumney owenrumney force-pushed the surface-enumeration-duration-in-result branch from e0a5e1a to ef04847 Compare December 12, 2022 20:12
@owenrumney
Copy link
Contributor Author

@tarunKoyalwar - no 3rd library just a simple stats.go file

I'm more than happy to keep chipping away at this if there are any changes you want made.

I've changed the flag to --stats as suggested

- add stats flag to the debug group
- print the time taken, number of results and errors for sources that run
- print stats for each run of the agent
@owenrumney owenrumney force-pushed the surface-enumeration-duration-in-result branch from ef04847 to f55404f Compare December 12, 2022 20:17
@tarunKoyalwar
Copy link
Member

@owenrumney , everything seems good . can you use gologger.Print().Msgf instead of fmt.Print this will keep printing uniform across the package .

@owenrumney
Copy link
Contributor Author

@tarunKoyalwar - switched to gologger.Print(), didn't know that worked without the label 🎉

Fixed linting issues

Copy link
Member

@tarunKoyalwar tarunKoyalwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@tarunKoyalwar
Copy link
Member

./subfinder -d hackerone.com -stats        

               __    _____           __         
   _______  __/ /_  / __(_)___  ____/ /__  _____
  / ___/ / / / __ \/ /_/ / __ \/ __  / _ \/ ___/
 (__  ) /_/ / /_/ / __/ / / / / /_/ /  __/ /    
/____/\__,_/_.___/_/ /_/_/ /_/\__,_/\___/_/ v2.5.5

		projectdiscovery.io

Use with caution. You are responsible for your actions
Developers assume no liability and are not responsible for any misuse or damage.
By using subfinder, you also agree to the terms of the APIs used.

[INF] Loading provider config from '/Users/tarun/.config/subfinder/provider-config.yaml'
[INF] Enumerating subdomains for 'hackerone.com'
b.ns.hackerone.com
api.hackerone.com
go.hackerone.com
mta-sts.managed.hackerone.com
mta-sts.forwarding.hackerone.com
www.hackerone.com
design.hackerone.com
support.hackerone.com
mta-sts.hackerone.com
info.hackerone.com
events.hackerone.com
gslink.hackerone.com
hackerone.com
a.ns.hackerone.com
links.hackerone.com
docs.hackerone.com
3d.hackerone.com
resources.hackerone.com
[INF] Found 18 subdomains for 'hackerone.com' in 2 seconds 897 milliseconds
[INF] Printing source statistics for 'hackerone.com'

 Source               Duration      Results     Errors
────────────────────────────────────────────────────────
 alienvault           797ms              91          0
 anubis               116ms              13          0
 crtsh                730ms             437          0
 digitorus            791ms             228          0
 dnsdumpster          2.898s             18          0
 hackertarget         1.884s              4          0
 riddler              480ms              12          0
 virustotal           0s                  0          0


 The following sources were included but skipped...

 bevigil
 bufferover
 c99
 censys
 certspotter
 chaos
 chinaz
 dnsrepo
 fofa
 fullhunt
 hunter
 intelx
 passivetotal
 quake
 robtex
 securitytrails
 shodan
 whoisxmlapi


@owenrumney owenrumney changed the title feat: surface the duration for each source when enumerating domains feat: surface the duration and error metrics for each source when enumerating domains Dec 13, 2022
Copy link
Member

@ehsandeep ehsandeep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@owenrumney this is nice, thank you for improving the UX by adding this option.

I also noticed some numbers are odd (censys), and we can handle it separately..

./subfinder -d hackerone.com -stats

               __    _____           __         
   _______  __/ /_  / __(_)___  ____/ /__  _____
  / ___/ / / / __ \/ /_/ / __ \/ __  / _ \/ ___/
 (__  ) /_/ / /_/ / __/ / / / / /_/ /  __/ /    
/____/\__,_/_.___/_/ /_/_/ /_/\__,_/\___/_/ v2.5.5

		projectdiscovery.io

Use with caution. You are responsible for your actions
Developers assume no liability and are not responsible for any misuse or damage.
By using subfinder, you also agree to the terms of the APIs used.

[INF] Loading provider config from '/Users/geekboy/.config/subfinder/provider-config.yaml'
[INF] Enumerating subdomains for 'hackerone.com'
o1.email.hackerone.com
3d.hackerone.com
zendesk1.hackerone.com
www.hackerone.com
docs.hackerone.com
mta-sts.forwarding.hackerone.com
b.ns.hackerone.com
info.hackerone.com
events.hackerone.com
zendesk3.hackerone.com
api.hackerone.com
o3.email.hackerone.com
hackerone.com
zendesk2.hackerone.com
email.hackerone.com
zendesk4.hackerone.com
support.hackerone.com
gslink.hackerone.com
o2.email.hackerone.com
gsemail.hackerone.com
mta-sts.hackerone.com
mta-sts.managed.hackerone.com
a.ns.hackerone.com
links.hackerone.com
design.hackerone.com
resources.hackerone.com
ns.hackerone.com
go.hackerone.com
managed.hackerone.com
[INF] Found 29 subdomains for 'hackerone.com' in 9 seconds 584 milliseconds
[INF] Printing source statistics for 'hackerone.com'

 Source               Duration      Results     Errors
────────────────────────────────────────────────────────
 alienvault           850ms              91          0
 anubis               350ms              13          0
 bevigil              899ms               0          0
 c99                  562ms              14          0
 censys               9.514s          50615          0
 chaos                386ms              21          0
 crtsh                698ms             438          0
 digitorus            699ms             228          0
 dnsdumpster          2.74s              18          0
 dnsrepo              901ms              49          0
 fullhunt             892ms              16          0
 hackertarget         1.753s              4          0
 quake                9.584s             25          0
 riddler              463ms              12          0
 shodan               302ms               6          0
 virustotal           0s                  0          0

@ehsandeep ehsandeep merged commit 8f499e9 into projectdiscovery:dev Dec 15, 2022
@owenrumney owenrumney deleted the surface-enumeration-duration-in-result branch December 19, 2022 12:13
@vzamanillo
Copy link
Contributor

vzamanillo commented Dec 19, 2022

@ehsandeep Noticed the Gitlab source was not migrated, can we reopen this?

@tarunKoyalwar
Copy link
Member

@vzamanillo , there is a work in progress PR with lot of code refactoring and other changes (including gitlab source and default rate limit) . It will be available by tomorrow eve

@vzamanillo
Copy link
Contributor

Oki oki, dind't know, thanks.

@owenrumney
Copy link
Contributor Author

Oh sorry did I miss updating one?

@tarunKoyalwar
Copy link
Member

@owenrumney , no worries I updated it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Surface the source duration in the emueration result
4 participants