Permalink
Browse files

Fix https bugs in snoopy

  • Loading branch information...
1 parent 64631b1 commit 05e37af6e94f6816a6efd19dc60fcb423f7265b7 @Zleep-Dogg Zleep-Dogg committed Jun 16, 2013
Showing with 8 additions and 1 deletion.
  1. +8 −1 lib/Snoopy.class.php
View
@@ -202,6 +202,7 @@ function fetch($URI)
if (!is_executable($this->curl_path))
return false;
$this->host = $URI_PARTS["host"];
+ $this->port=NULL; // don't force port unless non-default - otherwise we get weird -H "Host: www.site.com:80" parameter to curl which REPLACES host in returned data...
@Zleep-Dogg

Zleep-Dogg Jun 18, 2013

Contributor

This is a fix for something that actually happens inside the _httpsrequest function:

  • If there is a port defined (default=80), then _httpsrequest will append the port to the Host-parameter
  • No matter the port in the host parameter, curl will actually connect to the port defined in the url
  • If a website uses the header, then it will show the wrong port - here is the output from a command line example:
   ~$ /usr/bin/curl -s -k -H "Host: www.googleapis.com" https://www.googleapis.com/books/v1/volumes\?q=isbn:8774994506 | grep selfLink
      "selfLink": "https://www.googleapis.com/books/v1/volumes/e6QycAAACAAJ",
   ~$ /usr/bin/curl -s -k -H "Host: www.googleapis.com:7913" https://www.googleapis.com/books/v1/volumes\?q=isbn:8774994506 | grep selfLink
      "selfLink": "https://www.googleapis.com:7913/books/v1/volumes/e6QycAAACAAJ",
if(!empty($URI_PARTS["port"]))
$this->port = $URI_PARTS["port"];
if($this->_isproxy)
@@ -1006,7 +1007,7 @@ function _httpsrequest($url,$URI,$http_method,$content_type="",$body="")
$headerfile = tempnam($temp_dir, "sno");
- exec($this->curl_path." -k -D \"$headerfile\"".$cmdline_params." \"".escapeshellcmd($URI)."\"",$results,$return);
+ exec($this->curl_path." -k -D \"$headerfile\"".$cmdline_params." ".escapeshellcmd($URI),$results,$return);
@Zleep-Dogg

Zleep-Dogg Jun 18, 2013

Contributor

If a url includes '?' (which a lot of queries do) it would be "double escaped" in bash. as an example
$URI='https://www.googleapis.com/books/v1/volumes?q=isbn:8774994506'
would end up as
"https://www.googleapis.com/books/v1/volumes\?q=isbn:8774994506"
in the argument, meaning curl would try to fetch the url with backslash.

The correct thing (on the commandline) is to do either quoting or escaping - not both. And since the url ought to be urlencoded already (and thus not contain spaces), I opted for removing the quotes - I'm not sure, but escapeshellcmd might actually do it right if the quotes were inside...

if($return)
{
@@ -1046,7 +1047,13 @@ function _httpsrequest($url,$URI,$http_method,$content_type="",$body="")
}
if(preg_match("|^HTTP/|",$result_headers[$currentHeader]))
+ {
+ if(preg_match("|^HTTP/[^\s]*\s(.*?)\s|",$result_headers[$currentHeader], $status))
+ {
+ $this->status= $status[1];
@Zleep-Dogg

Zleep-Dogg Jun 18, 2013

Contributor

_httpsrequest did not update status like _httprequest does (fix is simple copy-paste).
In opendb this results in https not working at all since the status codes are checked in lib/OpenDbSnoopy.class.php#L129

+ }
$this->response_code = $result_headers[$currentHeader];
+ }
$this->headers[] = $result_headers[$currentHeader];
}

2 comments on commit 05e37af

Owner

pellcorp replied Jun 18, 2013

so this is when using curl right?

what about native?

we should see if we can merge this back up stream as well as i normally just get latest code from their periodically

Contributor

Zleep-Dogg replied Jun 18, 2013

That is correct... I'll just add a couple of more descriptive comments on the lines...

And yes, we obviously should try to move it upstream - I just wanted to fix it here first to get my site-plugin to work...

Please sign in to comment.