added the crawl-response function
heyZeus committed Mar 8, 2009
1 parent f14b187 commit 7101a86
Showing 3 changed files with 46 additions and 50 deletions.
49 changes: 21 additions & 28 deletions README.textile
@@ -5,45 +5,38 @@ commons-client Java library.
h3. Usage

<pre><code>
; Prints the HTML of the clojure.org home page
(println (scrape "http://www.clojure.org"))

; Prints the HTML of the clojure.org/api page
(println (scrape "http://www.clojure.org" "/api"))

; scrape also accepts a client, a method and a body
; in the body of scrape, you can examine various things like the status code
; The crawl macro also accepts a client, a method and a body.
; In the body of crawl, you can examine various things like
; the status code, the response, etc.
(let [clj-ws (client "http://www.clojure.org")
home (method "/")]
(println (scrape clj-ws home
(println (.getStatusCode home)))))

(crawl clj-ws home
(println (.getStatusCode home))
(println (response-str home))))

; If you don't care about the HTML from the query you should just call
; send-method. In this example you are posting the login form and need
; to make sure a cookie is set to validate the login was successful.
; If you need to login to a website you can do that too and
; verify any cookies are set to validate the login
(let [site (client "http://www.example.com")
login (method "/accounts/login" :post {:login "doctor_no" :password "clojurerox"})]
(send-method site login)
(if (assert-cookie-names site "username")
(println "yeah, I'm in")
(println "I can't remember my password again!")))
login (method "/accounts/login" :post {:login "doctor_no" :password "clojurerox"})]
(crawl site login
(if (assert-cookie-names site "username" "logged-time")
(println "yeah, I'm in")
(println "I can't remember my password again!"))))

; the crawl-response function is for quick and dirty things and
; just returns the response as a string from the server
(println (crawl-response "http://www.clojure.org"))

; prints the HTML of the clojure.org/api page
(println (crawl-response "http://www.clojure.org" "/api"))

; You can also pass in a body to the send-method macro to do something
; like check the response status code. Note you can't check the response
; code outside of the send-method call since all associated resources have
; been released.
(let [clj-ws (client "http://www.clojure.org")
home (method "/")]
(send-method clj-ws home
(println (.getStatusCode home))))
</code></pre>


I've only implemented some basic functionality to make the commons-client
more "in line" with functional programming. There are lots of things that
could be added to this Clojure wrapper. I've just scratched the surface.
more in line with functional programming. There are lots of things that
could be added to this Clojure script. I've just scratched the surface.

Since I'm calling commons-client under the covers, I'm using the underlying
naming conventions. The client function returns a
42 changes: 21 additions & 21 deletions clj_web_crawler.clj
@@ -5,17 +5,6 @@
TraceMethod HeadMethod PutMethod))
(:use [clojure.contrib.duck-streams :only (slurp*)]))

(defmacro send-method
"Sends a request to the given method and client. The response from the server is
stored in the method and any cookies are stored in the client. The response and
any resources associated with the request are cleared from the method after this
macro is called."
[client method & body]
`(try
(.executeMethod ~client ~method)
~@body
(finally (.releaseConnection ~method))))

(defn response-str
"Returns the response from the method as a string."
[method]
@@ -46,23 +35,34 @@
([path type] (method path type nil))
([path] (method path nil nil)))

(defmacro crawl
"Returns the HTML as a string. It will free up any resources associated
with the method. If the resulting page is a redirect the redirect page
will be returned. Also the optional body will be run against the
redirected page."
([server method & body]
`(send-method ~server ~method
(do ~@body)))
([server] (crawl server (method "/"))))

(defn client
"Creates an HttpClient for the given server."
[host]
(let [c (HttpClient.)]
(.. c (getHostConfiguration) (setHost (URI. host true)))
c))

(defmacro crawl
"Sends an HTTP request to the server. Pass in a body to examine
the status code, response, etc. All resources associated with
the method will be freed up at the end of the macro."
([#^org.apache.commons.httpclient.HttpClient server
#^org.apache.commons.httpclient.HttpMethodBase method & body]
`(try
(.executeMethod ~server ~method)
~@body
(finally (.releaseConnection ~method))))
([server] (crawl server (method "/"))))

(defn crawl-response
"Returns the response as a string. Sends a GET request to the server."
([#^String server #^String http-method]
(let [c (client server)
m (method http-method)]
(crawl c m
(response-str m))))
([#^String server] (crawl-response server "/")))

(defn cookies
"Convenience function to get the cookies from the client."
[client]
5 changes: 4 additions & 1 deletion test/main.clj
@@ -36,11 +36,14 @@
(wc/crawl clj-ws api
(is (.contains (wc/response-str api) "API")))))

(deftest crawl-response
(is (.contains (wc/crawl-response "http://www.clojure.org") clj-home-page-text))
(is (.contains (wc/crawl-response "http://www.clojure.org" "/") clj-home-page-text)))

; This test depends on a website that I don't have any control over.
; It is fragile, but better than no test.
(deftest cookie-names
(wc/send-method clj-ws home)
(wc/crawl clj-ws home)
;(println "Cookies from clojure.org ")
;(wc/print-cookies clj-ws)
(is (wc/assert-cookie-names clj-ws "test" "master"))