Improve `httpcompression` #162

Closed
alrra wants to merge 1 commit

3 participants

@alrra

Improvements:

  • correctly detects SDCH compression according to the specification
  • correctly overwrites the user agent (using the -A option instead of a -H header)
  • limits the connection and operation time (useful to prevent the function from hanging for hours on slow networks or when links go down)
  • shows errors
  • shows redirects

Tested on: Mac OS X v.10.8.2 and Ubuntu 12.04 LTS
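
For context, the core of the new request is roughly the following curl invocation; this is a simplified sketch (the actual script below additionally parses each redirect's headers and handles SDCH dictionaries):

```bash
# Simplified sketch of the underlying request:
#   -A                        overwrite the user agent (instead of a -H header)
#   -H "Accept-Encoding: …"   advertise the supported encodings, including sdch
#   --connect-timeout / -m    limit the connection and total operation time
#   -sS                       hide the progress meter but still show errors
#   -L -D - -o /dev/null      follow redirects, keep the headers, drop the body
curl --connect-timeout 15 -m 30 \
     -A "Mozilla/5.0 Gecko" \
     -H "Accept-Encoding: gzip, deflate, sdch" \
     -sS -L -D - -o /dev/null \
     "http://www.adobe.com/"
```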


Examples:

Input: adobe.com

  Output [old]:
    adobe.com is not using any encoding

  Output [new]:
    adobe.com
     ↳ http://www.adobe.com/ [gzip]

Input: "http://www.google.com/s?q=alrra&output=search"

  Output [old]:
    http://www.google.com/s?q=alrra&output=search is encoded using gzip

  Output [new]:
    http://www.google.com/s?q=alrra&output=search [sdch,gzip]

Input: firefox.com

  Output [old]:
    firefox.com is encoded using gzip
    Content-Encoding: gzip
    Content-Encoding: gzip
    Content-Encoding: gzip

  Output [new]:
    firefox.com
     ↳ http://www.firefox.com/
       ↳ http://www.mozilla.org/firefox/
         ↳ http://www.mozilla.org/en-US/firefox/new/ [gzip]

Input: google.com/+

  Output [old]:
    google.com/+ is encoded using gzip

  Output [new]:
    google.com/+
     ↳ http://www.google.com/+
       ↳ https://plus.google.com/ [gzip]
         ↳ https://accounts.google.com/ServiceLogin?service=oz&continue=https://plus.google.com/?gpsrc%3Dgplp0&hl=ro [gzip]

Input: sadsada.dsa

  Output [old]:
    sadsada.dsa is not using any encoding

  Output [new]:
    curl: (6) Could not resolve host: sadsada.dsa; nodename nor servname provided, or not known
@porada

Great piece of code. Haven’t you considered releasing this as a separate tool?

@mathiasbynens

Incredibly nice work, @alrra. I too feel like this belongs to its own file now, e.g. ~/bin/httpcompression. What do you think?

@alrra

> I too feel like this belongs to its own file now, e.g. ~/bin/httpcompression. What do you think?

@mathiasbynens Done, changed the pull request. Also, feel free to make any additional/required changes.

> Haven’t you considered releasing this as a separate tool?

@porada Are you referring to the same thing as @mathiasbynens?
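
For anyone who wants to try this locally, a minimal usage sketch, assuming the script ends up as ~/bin/httpcompression, is marked executable, and ~/bin is on your PATH:

```bash
# One-time setup (paths assumed; adjust to your own dotfiles layout):
chmod +x ~/bin/httpcompression

# Then call it with a host name or a full URL, e.g.:
httpcompression adobe.com
httpcompression "http://www.google.com/s?q=alrra&output=search"
```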

@alrra Improve `httpcompression`
- correctly detect `SDCH` compression, see:
  http://www.blogs.zeenor.com/wp-content/uploads/2011/01/Shared_Dictionary_Compression_over_HTTP.pdf
- correctly overwrite the user agent (use the -A, not the -H option)
- limit the connection and operation time (useful for preventing curl
  from hanging for hours due to slow networks or links going down)
- show errors
- show redirects
cf1022f
@mathiasbynens

Finally merged; thanks!

@jtyost2 referenced this pull request from a commit
Commit has since been removed from the repository and is no longer available.
@arekwiertlewski referenced this pull request from a commit
Commit has since been removed from the repository and is no longer available.
@mrkd referenced this pull request from a commit in mrkd/dotfiles
@alrra .functions: Improve `httpcompression` and move it to its own file
Closes #162.
cefafc4
@thorsten referenced this pull request from a commit in thorsten/dotfiles
@alrra .functions: Improve `httpcompression` and move it to its own file
Closes #162.
11e056e
@dmcass referenced this pull request from a commit in dmcass/windows-dotfiles
@alrra .functions: Improve `httpcompression` and move it to its own file
Closes #162.
5ee928b
Showing with 128 additions and 7 deletions.
  1. +1 −7 .functions
  2. +127 −0 bin/httpcompression
8 .functions
@@ -79,12 +79,6 @@ function gz() {
printf "gzip: %d bytes (%2.2f%%)\n" "$gzipsize" "$ratio"
}
-# Test if HTTP compression (RFC 2616 + SDCH) is enabled for a given URL.
-# Send a fake UA string for sites that sniff it instead of using the Accept-Encoding header. (Looking at you, ajax.googleapis.com!)
-function httpcompression() {
- encoding="$(curl -LIs -H 'User-Agent: Mozilla/5 Gecko' -H 'Accept-Encoding: gzip,deflate,compress,sdch' "$1" | grep '^Content-Encoding:')" && echo "$1 is encoded using ${encoding#* }" || echo "$1 is not using any encoding"
-}
-
# Syntax-highlight JSON strings or files
# Usage: `json '{"foo":42}'` or `echo '{"foo":42}' | json`
function json() {
@@ -162,4 +156,4 @@ function unquarantine() {
for attribute in com.apple.metadata:kMDItemDownloadedDate com.apple.metadata:kMDItemWhereFroms com.apple.quarantine; do
xattr -r -d "$attribute" "$@"
done
-}
+}
127 bin/httpcompression
@@ -0,0 +1,127 @@
+#!/bin/bash
+
+# Test if HTTP compression (RFC 2616 + SDCH) is enabled for a given URL
+
+declare -r hUA="Mozilla/5.0 Gecko"
+declare -r hAE="Accept-Encoding: gzip, deflate, sdch"
+declare -r maxConTime=15
+declare -r maxTime=30
+
+declare availDicts="" dict="" dictClientID="" dicts="" headers="" i="" \
+ indent="" url="" encoding="" urlHeaders=""
+
+headers="$( curl --connect-timeout $maxConTime \
+ -A "$hUA" `# Send a fake UA string for sites
+ # that sniff it instead of using
+ # the Accept-Encoding header` \
+ -D - `# Get response headers` \
+ -H "$hAE" \
+ -L `# If the page was moved to a different
+ # location, redo the request` \
+ -m $maxTime \
+ -s `# Don\'t show the progress meter` \
+ -S `# Show error messages` \
+ -o /dev/null `# Ignore content` \
+ "$1" )" \
+&& ( \
+
+ url="$1"
+
+ # Iterate over the headers of all redirects
+ while [ -n "$headers" ]; do
+
+ # Get headers for the "current" URL
+ urlHeaders="$( printf "%s" "$headers" |
+ sed -n '1,/^HTTP/p' )"
+
+ # Remove the headers for the "current" URL
+ headers="${headers/"$urlHeaders"/}"
+
+ # ----------------------------------------------------------------------
+ # | SDCH |
+ # ----------------------------------------------------------------------
+
+ # SDCH Specification:
+ # - www.blogs.zeenor.com/wp-content/uploads/2011/01/Shared_Dictionary_Compression_over_HTTP.pdf
+
+ # Check if the server advertised any dictionaries
+ dicts="$( printf "%s" "$urlHeaders" |
+ grep -i 'Get-Dictionary:' |
+ cut -d':' -f2 |
+ sed s/,/\ /g )"
+
+ if [ -n "$dicts" ]; then
+
+ availDicts=""
+ dict=""
+
+ for i in $dicts; do
+
+ # Check if the dictionary location is specified as a path,
+ # and if so, construct its URL from the host name of the
+ # referrer URL
+ [[ "$i" != http* ]] \
+ && dict="$(printf "%s" "$url" |
+ sed -En 's/([^/]*\/\/)?([^/]*)\/?.*/\1\2/p')"
+
+ dict="$dict$i"
+
+ # Request the dictionaries from the server and
+ # construct the `Avail-Dictionary` header value
+ #
+ # [ The user agent identifier for a dictionary is defined
+ # as the URL-safe base64 encoding (as described in RFC
+ # 3548, section 4 [RFC3548]) of the first 48 bits (bits
+ # 0..47) of the dictionary's SHA-256 digest ]
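+ # ( since 48 bits = 6 bytes, and 6 bytes encode to exactly
+ # 8 base64 characters, taking the first 8 characters of the
+ # base64-encoded digest below is equivalent to encoding its
+ # first 48 bits )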
+ #
+ dictClientID="$( curl --connect-timeout $maxConTime \
+ -A "$hUA" -LsS -m $maxTime "$dict" |
+ openssl dgst -sha256 -binary |
+ openssl base64 |
+ cut -c 1-8 |
+ sed -e 's/\+/-/' -e 's/\//_/' )"
+
+ [ -n "$availDicts" ] && availDicts="$availDicts,$dictClientID" \
+ || availDicts="$dictClientID"
+
+ done
+
+ # Redo the request (advertising the available dictionaries)
+ # and replace the old resulted headers with the new ones
+ urlHeaders="$( curl --connect-timeout $maxConTime \
+ -A "$hUA" -D - -H "$hAE" \
+ -H "Avail-Dictionary: $availDicts" \
+ -m $maxTime -o /dev/null -sS "$1" )"
+ fi
+
+ # ----------------------------------------------------------------------
+
+ # Get the content encoding header values
+ encoding="$( printf "%s" "$urlHeaders" |
+ grep -i 'Content-Encoding:' |
+ cut -d' ' -f2 |
+ tr "\r" "," |
+ tr -d "\n" |
+ sed 's/,$//' )"
+
+ [ -n "$encoding" ] && encoding="[$encoding]"
+
+ # Print the output for the "current" URL
+ if [ "$url" != "$1" ]; then
+ printf "%s\n" "$indent$url $encoding"
+ indent=" "$indent
+ else
+ printf "\n%s\n" " $1 $encoding"
+ indent=""
+ fi
+
+ # Get the next URL value
+ url="$( printf "%s" "$urlHeaders" |
+ grep -i 'Location' |
+ sed -e 's/[Ll]ocation://' |
+ tr -d '\r' )"
+
+ done
+ printf "\n"
+
+) || printf ""