support multi -extract-regex flag and json output #529

Merged
merged 14 commits into from
Jun 13, 2022


@M09Ic M09Ic commented Feb 25, 2022

feature1: multiple -er flags

example :

echo https://www.baidu.com |.\httpx.exe -er head -er ip -silent

https://www.baidu.com [head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head,head] [ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip,ip]
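The behavior shown above (one list of matches per -er pattern) can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the helper name extractAll and the choice to key results by the pattern text are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// extractAll is a hypothetical helper: it runs every pattern against body
// and groups the matches under the pattern's own text.
func extractAll(patterns []string, body string) (map[string][]string, error) {
	extracts := make(map[string][]string, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		// -1 requests every non-overlapping match, not only the first one.
		extracts[p] = re.FindAllString(body, -1)
	}
	return extracts, nil
}

func main() {
	body := "<head></head> ip ip ip"
	extracts, err := extractAll([]string{"head", "ip"}, body)
	if err != nil {
		panic(err)
	}
	fmt.Println(extracts["head"], extracts["ip"])
}
```

Each pattern keeps its own result list, which is why the sample output prints one bracketed group per flag.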

feature2: json output

example:

echo https://www.baidu.com |.\httpx.exe -er head -er ip -silent -json

{"timestamp":"2022-02-26T02:48:09.7169724+08:00","scheme":"https","port":"443","path":"/","body-sha256":"b90a16e9a86a118c71046ca27860e4023773fba3220fb742166d222ec664d6fa","header-sha256":"4ab7f2a738c21d43ba8a701e2efc94595e4b8a8b84e89f5168013af6bf2f90c7","url":"https://www.baidu.com:443","input":"https://www.baidu.com","title":"百度一下","webserver":"apache","content-type":"text/html","method":"GET","host":"45.113.192.101","content-length":206085,"status-code":200,"response-time":"2.4769607s","failed":false,"extracts":{"head":["head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head","head"],"ip":["ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip","ip"]},"lines":3,"words":3996}
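As a rough sketch of how the "extracts" object in the JSON line above could be produced, the following marshals a map of pattern to matches with encoding/json. The result struct is a hypothetical, trimmed-down stand-in that only reuses the "url" and "extracts" field names visible in the sample; it is not httpx's real result type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result is a hypothetical, reduced record; the real output has many more fields.
type result struct {
	URL      string              `json:"url"`
	Extracts map[string][]string `json:"extracts,omitempty"`
}

// encode serializes one record as a single JSON line.
func encode(r result) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	line, err := encode(result{
		URL: "https://www.baidu.com:443",
		Extracts: map[string][]string{
			"head": {"head", "head"},
			"ip":   {"ip"},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

encoding/json writes map keys in sorted order, so the key order inside "extracts" is stable between runs.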


M09Ic commented Feb 25, 2022

Sorry, I'm not familiar with git, so this PR accidentally includes some commits from my previous PR.
Please help me remove the commits related to #517.

@ehsandeep ehsandeep requested a review from Mzack9999 March 1, 2022 09:26

@Mzack9999 Mzack9999 left a comment


lgtm - A couple of suggestions:

  • Remove duplicates from the extraction list + eventually replace them with a repetition counter:
    {"head":["head","head"],"test":["test","test"]} would become {"head":["head":2],"test":["test":2]}
  • Add support for named extraction groups in regexes
    ^ @ehsandeep


M09Ic commented Mar 3, 2022

lgtm - A couple of suggestions:

  • Remove duplicates from the extraction list + eventually replace them with a repetition counter:
    {"head":["head","head"],"test":["test","test"]} would become {"head":["head":2],"test":["test":2]}
  • Add support for named extraction groups in regexes
    ^ @ehsandeep

I think repetition counters would add one more level of nesting to the output, whereas the results can simply be de-duplicated without counters. So maybe we can just de-duplicate.

Named extraction groups seem to require a new flag. Alternatively, we could add extraction regexp presets, for example automatically mapping ip to ((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})(\.((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})){3}. We could add many useful presets, such as js, ip, phone, mail, url...
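A preset table along the lines proposed above might look like the sketch below. The names presets and extractPreset are hypothetical, and only the ip pattern is taken from this comment (with the stray spaces removed); the other presets mentioned are left out because their patterns are not given here.

```go
package main

import (
	"fmt"
	"regexp"
)

// presets maps a short preset name to a compiled pattern.
// Only "ip" is filled in; it is the pattern quoted in the comment above.
var presets = map[string]*regexp.Regexp{
	"ip": regexp.MustCompile(`((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})(\.((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})){3}`),
}

// extractPreset looks up a preset by name and returns all of its matches in body.
func extractPreset(name, body string) ([]string, error) {
	re, ok := presets[name]
	if !ok {
		return nil, fmt.Errorf("unknown preset %q", name)
	}
	return re.FindAllString(body, -1), nil
}

func main() {
	matches, err := extractPreset("ip", "origin server: 172.105.190.180")
	if err != nil {
		panic(err)
	}
	fmt.Println(matches)
}
```

Note that the pattern is unanchored, so it can also match inside longer digit runs; a stricter preset would probably add word boundaries.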


M09Ic commented Mar 11, 2022

I tried using FindAllStringSubmatch instead of FindAllString to capture all named extraction groups in a regex, but it makes the results complicated and confusing.
If we need several extraction groups, I think we can pass multiple -er flags instead of putting multiple groups in a single regular expression.
For example, use -er aa(group1)bbgroup2cc -er aagroup1bb(group2)cc instead of -er aa(group1)bb(group2)cc
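To illustrate why the submatch variant makes the result harder to present, here is a small comparison of the two standard library calls. The pattern and input are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// pairPattern is an illustrative pattern with two capture groups.
var pairPattern = regexp.MustCompile(`aa(x+)bb(y+)cc`)

// fullMatches returns one string per match: a flat list.
func fullMatches(body string) []string {
	return pairPattern.FindAllString(body, -1)
}

// groupMatches returns one slice per match: index 0 is the full match,
// indexes 1..n are the capture groups, so the result is nested.
func groupMatches(body string) [][]string {
	return pairPattern.FindAllStringSubmatch(body, -1)
}

func main() {
	body := "aaxxbbyycc aaxbbycc"
	fmt.Println(fullMatches(body))
	fmt.Println(groupMatches(body))
}
```

With FindAllString every pattern yields a flat []string, which maps directly onto the "extracts" lists; FindAllStringSubmatch yields a [][]string, so each pattern would need a second level in the output.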


M09Ic commented May 17, 2022

New feature: support for regexp presets (dd5657e)

input:
echo https://hackerone.com/reports/1536299 |.\httpx.exe -ep ip -json -silent

output:
{"timestamp":"2022-05-17T11:10:02.3879915+08:00","scheme":"https","port":"443","path":"/reports/1536299","url":"https://hackerone.com:443/reports/1536299","input":"https://hackerone.com/reports/1536299","title":"HackerOne","webserver":"cloudflare","content-type":"text/html","method":"GET","host":"104.16.100.52","content-length":7832,"status-code":200,"csp":{"domains":["www.google-analytics.com","api.amplitude.com","cover-photos.hackerone-user-content.com","hackerone-us-west-2-production-attachments.s3.us-west-2.amazonaws.com","https://errors.hackerone.net/api/30/security/?sentry_key=374aea95847f4040a69f9c8d49a3a59d\u0026sentry_environment=production","www.youtube-nocookie.com","a5s.hackerone-ext-content.com","*.browser-intake-datadoghq.com","hackathon-photos.hackerone-user-content.com","profile-photos.hackerone-user-content.com","cdn.amplitude.com","b5s.hackerone-ext-content.com","errors.hackerone.net"]},"response-time":"2.3855601s","failed":false,"hashes":{"body-md5":"6356eedcda71d4eb9e79eda99f6c2a40","body-mmh3":"217403385","body-sha256":"6e6643740d4de0e30910aea7798b63804a2b67d2ead37bfbbd3225e392855591","body-simhash":"9814074894922261300","header-md5":"f38f3cf4a19e52d65ffc87695a6ec730","header-mmh3":"-29793505","header-sha256":"3c009ee19b2660ae0ca0bd3c3c17f38485d95a41eff4e0e12b4128db9c94efce","header-simhash":"15615867893775578413"},"extracts":{"ip":["172.105.190.180","172.105.190.180","172.105.190.180"]},"lines":83,"words":384}


M09Ic commented May 17, 2022

Deduplicate results (c56abcb)

input:
echo https://hackerone.com/reports/1536299 |.\httpx.exe -ep ip -json -silent

new output:
..."extracts":{"ip":["172.105.190.180"]}...
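The de-duplication step can be sketched as an order-preserving filter over the match list. This is only an illustration of the idea; the helper name dedupe is hypothetical and the actual commit may implement it differently.

```go
package main

import "fmt"

// dedupe returns items with repeated values removed, keeping the first
// occurrence of each value in its original position.
func dedupe(items []string) []string {
	seen := make(map[string]struct{}, len(items))
	out := make([]string, 0, len(items))
	for _, it := range items {
		if _, ok := seen[it]; ok {
			continue // already emitted
		}
		seen[it] = struct{}{}
		out = append(out, it)
	}
	return out
}

func main() {
	matches := []string{"172.105.190.180", "172.105.190.180", "172.105.190.180"}
	fmt.Println(dedupe(matches))
}
```

Keeping first-seen order means the output stays deterministic, which a plain map-to-slice conversion would not guarantee.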

@ehsandeep ehsandeep requested a review from Mzack9999 May 29, 2022 15:16
@Mzack9999 Mzack9999 requested a review from ehsandeep June 13, 2022 11:35
@ehsandeep ehsandeep merged commit cd18edc into projectdiscovery:dev Jun 13, 2022