scrape now accepts string args

1 parent 975679b commit e2b58cfeec5611aba6c73fcae77d36ad4a20e77f @heyZeus committed Mar 7, 2009
Showing with 28 additions and 42 deletions.
  1. +11 −2 README.textile
  2. +13 −39 clj_web_crawler.clj
  3. +4 −1 test/main.clj
README.textile
@@ -5,10 +5,18 @@ commons-client Java library.
h3. Usage
<pre><code>
-Prints the HTML of the clojure.org website
+; Prints the HTML of the clojure.org home page
+(println (scrape "http://www.clojure.org"))
+
+; Prints the HTML of the clojure.org/api page
+(println (scrape "http://www.clojure.org" "/api"))
+
+; scrape also accepts a client, a method and a body
+; in the body of scrape, you can examine various things like the status code
(let [clj-ws (client "http://www.clojure.org")
home (method "/")]
- (println (scrape clj-ws home)))
+ (println (scrape clj-ws home
+ (println (.getStatusCode home)))))
; If you don't care about the HTML from the query you should just call
@@ -32,6 +40,7 @@ Prints the HTML of the clojure.org website
(println (.getStatusCode home))))
</code></pre>
+
I've only implemented some basic functionality to make the commons-client
more "in line" with functional programming. There are lots of things that
could be added to this Clojure wrapper. I've just scratched the surface.
clj_web_crawler.clj
@@ -28,18 +28,21 @@
with the method. If the resulting page is a redirect the redirect page
will be returned. Also the optional body will be run against the
redirected page."
- [client method & body]
- `(send-method ~client ~method
- (let [location# (redirect-location ~method)]
- (if location#
- (do
- (let [redirect-method# (method location#)]
- (send-method ~client redirect-method#
- ~@body
- (response-str redirect-method#))))
+ ([server http-method & body]
+ `(let [s# (if (= String (class ~server)) (client ~server) ~server)
+ m# (if (= String (class ~http-method)) (method ~http-method) ~http-method)]
+ (send-method s# m#
+ (let [location# (redirect-location m#)]
+ (if location#
+ (do
+ (let [redirect-method# (method location#)]
+ (send-method s# redirect-method#
+ ~@body
+ (response-str redirect-method#))))
(do
~@body
- (response-str ~method))))))
+ (response-str m#)))))))
+ ([server] (scrape server "/")))
(defn client
"Creates a HttpClient for the given server."
@@ -102,32 +105,3 @@
(if-let [location (and header (.getValue header))]
location))))
-(comment
-
-; Prints the HTML of the clojure.org website
-(let [server (client "http://www.clojure.org")
- home (method "/")]
- (println (scrape server home)))
-
-; If you don't care about the HTML from the query you should just call
-; send-method. In this example you are posting the login form and need
-; to make sure a cookie is set to validate the login was successful.
-(let [server (client "http://www.example.com")
- login (method "/accounts/login" :post {:login "mr_cool" :password "clojurerox"})]
- (send-method server login)
- (if (assert-cookie-names server "username")
- (println "yeah, I'm in")
- (println "i can't remember my password again!")))
-
-; You can also pass in a body to the send-method macro to do something
-; like check the response status code. Note you can't check the response
-; code outside of the send-method call since all associated resources are
-; released at that point.
-(let [server (client "http://www.clojure.org")
- login (method "/")]
- (send-method server login
- (println (.getStatusCode login))))
-
-)
-
-
test/main.clj
@@ -27,7 +27,10 @@
(deftest scrape
(let [html (wc/scrape clj-ws home)]
- (is (= (.contains html clj-home-page-text)))))
+ (is (.contains html clj-home-page-text)))
+ (is (.contains (wc/scrape "http://www.clojure.org") clj-home-page-text))
+ (is (.contains (wc/scrape "http://www.clojure.org" "/api") "API")))
+
; this test depends on a website that i don't have any control over,
; this test is fragile, but better than no test
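
The core of this commit is that scrape now coerces string arguments into a client and a method, and defaults the path to "/" when only a server is given. A minimal sketch of that coercion follows, written as plain functions rather than a macro. The names fake-client, fake-method, coerce and request-url are hypothetical stand-ins invented for illustration; they return plain maps so the sketch runs without HttpClient and they are not part of clj_web_crawler.

```clojure
;; Hypothetical stand-ins for the library's client and method constructors.
;; They return plain maps so the sketch runs without any HTTP dependency.
(defn fake-client [server] {:server server})
(defn fake-method [path] {:path path})

(defn coerce
  "Returns (make x) when x is a string, otherwise x unchanged."
  [make x]
  (if (string? x) (make x) x))

(defn request-url
  "Builds the URL that would be requested; the path defaults to \"/\"."
  ([server] (request-url server "/"))
  ([server http-method]
   (let [c (coerce fake-client server)
         m (coerce fake-method http-method)]
     (str (:server c) (:path m)))))

;; Strings and pre-built values can be mixed freely.
(println (request-url "http://www.clojure.org"))
(println (request-url (fake-client "http://www.clojure.org") "/api"))
```

One difference worth noting: in the committed macro each argument form appears twice in the expansion (once inside the class check and once in the chosen branch), so an argument expression with side effects would be evaluated twice. The function-based sketch evaluates each argument once.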
