Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
A wrapper around Apache commons-client for the Clojure programming language.
Clojure
tree: 3970f3136b

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
test
MIT-license.txt
README.textile
clj_web_crawler.clj

README.textile

clj-web-crawler allows you to easily crawl and scrape the web from the comfort of
your own computer. This Clojure script is a wrapper around the Apache
commons-client Java library.

Usage


Prints the HTML of the clojure.org website
(let [clj-ws (client "http://www.clojure.org")
      home  (method "/")] 
   (println (scrape clj-ws home)))  


; If you don't care about the HTML from the query you should just call
; send-method. In this example you are posting the login form and need 
; to make sure a cookie is set to validate the login was successful.
(let [site (client "http://www.example.com")
      login  (method "/accounts/login" :post {:login "doctor_no" :password "clojurerox"})] 
  (send-method site login) 
  (if (assert-cookie-names site "username")  
    (println "yeah, I'm in")
    (println "I can't remember my password again!")))


; You can also pass in a body to the send-method macro to do something
; like check the response status code. Note you can't check the response
; code outside of the send-method call since all associated resources have 
; been released.
(let [clj-ws (client "http://www.clojure.org")
      home  (method "/")]
  (send-method clj-ws home 
    (println (.getStatusCode home))))

I’ve only implemented some basic functionality to make the commons-client
more “in line” with functional programming. There are lots of things that
could be added to this Clojure wrapper. I’ve just scratched the surface.

Since I’m calling commons-client under the covers I’ve using the underlying
naming conventions. The client function returns a
org.apache.commons.httpclient.HttpClient and the method function returns
an implementation of org.apache.commons.httpclient.HttpMethodBase. Please
see the commons-client API docs for reference.

Requirements

- clojure-contrib
- Apache commons-client 3.1 and its dependencies (only tested with the 3.1 version)

Any comments, suggestions, improvements are welcome!

Something went wrong with that request. Please try again.