
Option to save redirection value instead of request #90

Open
noraj opened this issue Oct 21, 2018 · 9 comments
Labels
bug Something isn't working medium

Comments

@noraj

noraj commented Oct 21, 2018

I ran python3 photon.py --url http://x.x.x.x --level 1 --only-url and got a list of 103 internal URLs.

All the URLs follow the same pattern: http://x.x.x.x/?r=[redirection_token].

This list alone is not very useful; what is interesting is the redirection target (for example, the value of the Location header in an HTTP 302 or 303 response).

There should be an option to store the redirection target instead of the raw URL when the server answers with a redirection status code.

This could be implemented with something like the following pseudo-code:

check_http_code_status(code):
  switch(code):
    case 200:
      store(request)
    case 301, 302, 303:
      store(answer.location)
    case 404:
      do_nothing
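A minimal Python sketch of the pseudo-code above, written as a pure function so it can be tried without a network. The name value_to_store and its parameters are hypothetical, not Photon's API. It also treats 307 and 308 as redirects, which the pseudo-code does not list, and resolves a relative Location value against the requested URL.

```python
from typing import Optional
from urllib.parse import urljoin

# Hypothetical helper, not part of Photon. 307/308 are added on top of the
# 301/302/303 codes listed in the pseudo-code.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def value_to_store(url: str, status: int, location: Optional[str] = None) -> Optional[str]:
    """Return what the crawler should record for one response, or None."""
    if status == 200:
        # A normal page: keep the requested URL itself.
        return url
    if status in REDIRECT_CODES and location:
        # A redirect: keep the Location target, resolved against the request
        # URL in case the server sent a relative path.
        return urljoin(url, location)
    # 404 and every other status: store nothing.
    return None


print(value_to_store("http://x.x.x.x/?r=abc", 302, "https://example.com/page"))  # https://example.com/page
print(value_to_store("http://x.x.x.x/?r=abc", 303, "/local/page"))  # http://x.x.x.x/local/page
print(value_to_store("http://x.x.x.x/missing", 404))  # None
```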
s0md3v added the bug and medium labels Oct 22, 2018
@s0md3v
Owner

s0md3v commented Oct 22, 2018

Hi @noraj ,

Thanks for reporting the issue. Can you please check whether this PR fixes it?

Photon should now store the redirecting URLs in redirects.txt in the following format:

https://example.com/redirect_from==>https://example.com/redirect_to
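A detail worth noting about this format: "==>" is a safer separator than a plain colon, because every http:// URL already contains one, whereas ">" is not allowed unencoded in a well-formed URL (RFC 3986). A small sketch of writing and reading entries in this format; format_redirect and parse_redirects are hypothetical helper names:

```python
from typing import Iterable, List, Tuple

# Separator used in redirects.txt lines: "<redirect_from>==><redirect_to>".
SEPARATOR = "==>"


def format_redirect(source: str, target: str) -> str:
    """Build one redirects.txt line."""
    return f"{source}{SEPARATOR}{target}"


def parse_redirects(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split each line on the first separator into a (from, to) pair."""
    pairs = []
    for line in lines:
        line = line.strip()
        if SEPARATOR not in line:
            continue  # skip blank or malformed lines
        source, target = line.split(SEPARATOR, 1)
        pairs.append((source, target))
    return pairs


line = format_redirect("https://example.com/redirect_from", "https://example.com/redirect_to")
print(line)  # https://example.com/redirect_from==>https://example.com/redirect_to
print(parse_redirects([line, ""]))  # [('https://example.com/redirect_from', 'https://example.com/redirect_to')]
```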

@s0md3v
Owner

s0md3v commented Oct 23, 2018

@noraj ???

@noraj
Author

noraj commented Oct 23, 2018

@s0md3v Yes, I'm answering; I'm writing a long post and need to check what I say before asserting it.

I cloned a fresh copy, ran git checkout redirect, then ran python photon.py --url http://x.x.x.x/ --level 1 --only-url, but I get exactly the same result as before, with no https://example.com/redirect_from==>https://example.com/redirect_to entries.

I think this is because the root page http://x.x.x.x/ returns a 200, and with --level 1 the other links are scraped but never requested, so we never enter the if code[0] == '3': branch.

Photon/photon.py

Lines 219 to 222 in 0a5de25

if code != '404':
    if code[0] == '3':
        redirects.add(url + ':' + response.url)
    return response.text

So we are forced to use python photon.py --url http://x.x.x.x/ --level 2 --only-url, but then, instead of the 103 internal URLs from the root page, I get more than 700 URLs from all the sub-pages, and the scan takes much longer (103 remote pages are requested instead of just one).

That is why I suggested a redirect option: request the collected internal URLs to see whether each one answers with a page or a redirection, and, if it is a redirection, store its target.
In other words: keep the current behavior and add a new option, --whatevername, that treats the scraped internal URLs as potential redirections, requests them, and stores the redirection value in addition to the raw internal URL.
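The proposed option boils down to one extra pass over the scraped URLs. A minimal sketch, assuming a hypothetical fetch(url) callable that sends a request without following redirects and returns the status code plus headers with lowercase keys; find_redirects is an illustrative name, not an existing Photon function:

```python
from typing import Callable, Dict, Iterable, Tuple
from urllib.parse import urljoin

# Hypothetical: fetch(url) -> (status_code, headers with lowercase keys).
# It must NOT follow redirects, otherwise the 3xx status is never seen.
Fetch = Callable[[str], Tuple[int, Dict[str, str]]]


def find_redirects(urls: Iterable[str], fetch: Fetch) -> Dict[str, str]:
    """Request each scraped URL once and map redirecting ones to their target."""
    redirects: Dict[str, str] = {}
    for url in urls:
        status, headers = fetch(url)
        location = headers.get("location")
        if 300 <= status < 400 and location:
            # Resolve relative Location values against the requested URL.
            redirects[url] = urljoin(url, location)
    return redirects


# Usage with a fake fetch, so the example runs offline.
fake_responses = {
    "http://x.x.x.x/?r=aaa": (302, {"location": "https://example.com/a"}),
    "http://x.x.x.x/?r=bbb": (303, {"location": "/b"}),
    "http://x.x.x.x/about": (200, {}),
}
print(find_redirects(fake_responses, lambda u: fake_responses[u]))
```

The raw URL list is left as it is; the returned mapping is the extra information the option would write out alongside it.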

Also, with --level 2, about 30 URLs ended up in failed.txt, but all of them are valid. For example:

$ curl -vvv http://x.x.x.x/\?s\=_____ba8da76e357a______
*   Trying x.x.x.x...
* TCP_NODELAY set
* Connected to x.x.x.x (x.x.x.x) port 80 (#0)
> GET /?s=_____ba8da76e357a______ HTTP/1.1
> Host: x.x.x.x
> User-Agent: curl/7.61.1
> Accept: */*
> 
< HTTP/1.1 303 See Other
< Date: Tue, 23 Oct 2018 18:47:37 GMT
< Server: localhost
< Content-Type: text/html
< Location: https://googleprojectzero.blogspot.com/xxxxxxxxxxx.html
< Content-Length: 0
< 
* Connection #0 to host x.x.x.x left intact
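The same check as this curl command can be done in Python with the standard library's http.client, which never follows redirects on its own, so the 303 status and its Location header stay visible. A sketch; check_redirect is a hypothetical name:

```python
from http.client import HTTPConnection, HTTPSConnection
from typing import Dict, Tuple
from urllib.parse import urlsplit


def check_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Dict[str, str]]:
    """GET url once without following redirects; return (status, headers)."""
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    # netloc may include a port ("host:8080"); HTTPConnection parses it.
    conn = conn_cls(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("GET", target)
        response = conn.getresponse()
        # Lowercase header names so lookups do not depend on the server's casing.
        headers = {name.lower(): value for name, value in response.getheaders()}
        return response.status, headers
    finally:
        conn.close()
```

Called on one of the URLs listed in failed.txt, it should return 303 together with the Location value seen in the curl output, which makes it easy to tell a redirect apart from a genuine failure.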

So I don't know why they are marked as failed.

But even with --level 2, no redirection values are stored; I even checked with grep -ri '==>' ./.

@noraj
Author

noraj commented Oct 23, 2018

PS: maybe check that the Python requests library handles 303 redirects.

@s0md3v
Owner

s0md3v commented Oct 24, 2018

Hi @noraj ,

Just to let you know: the issue has been acknowledged and I am working on it.
I will add a new switch, --verify which will solve redirection and 404 issues by verifying all the URLs added on each level before crawling further.

Thanks for the verbose explanation of the issue, it really helped.

PS: Would it be possible for you to share the website you are testing against?
You can DM me on Twitter.

@0xInfection
Contributor

I guess adding the parameter allow_redirects=False at L239, and then checking the status code, will fix this.

@s0md3v
Copy link
Owner

s0md3v commented Nov 9, 2018

@0xInfection We want to follow redirects.
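Following redirects and recording them do not conflict. By default requests.get follows redirects (including 303, for which it switches the follow-up request to GET), so response.status_code and response.url describe the final page; that is why a check like code[0] == '3' on the final response never fires. The intermediate 3xx responses are still kept, oldest first, in response.history. A minimal sketch, with redirect_chain as a hypothetical helper name; it only reads .url and .history, so it works on a requests.Response or on any stand-in object:

```python
from typing import List, Tuple


def redirect_chain(response) -> List[Tuple[str, str]]:
    """Return (redirect_from, redirect_to) pairs for every hop that was followed.

    response is expected to be a requests.Response obtained with redirects
    enabled (the default for requests.get); only .url and .history are read.
    """
    hops = list(response.history) + [response]
    # Each response in history is a 3xx whose Location led to the next hop's URL.
    return [(prev.url, nxt.url) for prev, nxt in zip(hops, hops[1:])]


# Hypothetical wiring into the crawler's redirects set, using the '==>' format:
# response = requests.get(url)
# for source, target in redirect_chain(response):
#     redirects.add(source + '==>' + target)
```

This keeps redirects enabled for crawling while still recording every redirecting URL.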

@s0md3v
Copy link
Owner

s0md3v commented Nov 9, 2018

Don't worry guys, I will fix it once I have free time.
