
WWW::Mechanize examples


require 'rubygems'
require 'mechanize'

a = WWW::Mechanize.new { |agent|
  agent.user_agent_alias = 'Mac Safari'
}

a.get('http://google.com/') do |page|
  search_result = page.form_with(:name => 'f') do |search|
    search.q = 'Hello world'
  end.submit

  search_result.links.each do |link|
    puts link.text
  end
end


a = WWW::Mechanize.new
a.get('http://rubyforge.org/') do |page|
  # Click the login link
  login_page = a.click(page.link_with(:text => /Log In/))

  # Submit the login form
  my_page = login_page.form_with(:action => '/account/login.php') do |f|
    f.form_loginname  = ARGV[0]
    f.form_pw         = ARGV[1]
  end.submit

  my_page.links.each do |link|
    text = link.text.strip
    next unless text.length > 0
    puts text
  end
end

File Upload

Upload a file to flickr.

a = WWW::Mechanize.new { |agent|
  # Flickr refreshes after login
  agent.follow_meta_refresh = true
}

a.get('http://flickr.com/') do |home_page|
  signin_page = a.click(home_page.link_with(:text => /Sign In/))

  my_page = signin_page.form_with(:name => 'login_form') do |form|
    form.login  = ARGV[0]
    form.passwd = ARGV[1]
  end.submit

  # Click the upload link
  upload_page = a.click(my_page.link_with(:text => /Upload/))

  # We want the basic upload page.
  upload_page = a.click(upload_page.link_with(:text => /basic Uploader/))

  # Upload the file
  upload_page.form_with(:method => 'POST') do |upload_form|
    upload_form.file_uploads.first.file_name = ARGV[2]
  end.submit
end

Pluggable Parsers

Let's say you want HTML pages to be parsed automatically with Rubyful Soup. This example shows you how:

require 'rubygems'
require 'mechanize'
require 'rubyful_soup'

class SoupParser < WWW::Mechanize::Page
  attr_reader :soup
  def initialize(uri = nil, response = nil, body = nil, code = nil)
    @soup = BeautifulSoup.new(body)
    super(uri, response, body, code)
  end
end

agent = WWW::Mechanize.new
agent.pluggable_parser.html = SoupParser

Now all HTML pages will be parsed with the SoupParser class, which automatically gives you a 'soup' method returning the Beautiful Soup object for that page.
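
The pluggable-parser mechanism is essentially a lookup table from content type to page class. Here is a self-contained pure-Ruby sketch of that idea (the class and method names are illustrative, not Mechanize's actual API, and the toy UpcasePage stands in for a real page class like SoupParser):

```ruby
# Sketch of a pluggable-parser registry: map a content type to the
# class used to wrap the response body. Illustrative names only.
class PluggableParser
  def initialize
    @parsers = Hash.new(String)  # unknown types fall back to the raw body class
  end

  def register(content_type, klass)
    @parsers[content_type] = klass
  end

  def parser_for(content_type)
    @parsers[content_type]
  end
end

# A toy page class, analogous to SoupParser above: it post-processes
# the body when constructed.
class UpcasePage
  attr_reader :body
  def initialize(body)
    @body = body.upcase
  end
end

registry = PluggableParser.new
registry.register('text/html', UpcasePage)

page = registry.parser_for('text/html').new('<html>hi</html>')
puts page.body  # => "<HTML>HI</HTML>"
```

Mechanize does the same thing internally: `agent.pluggable_parser.html = SoupParser` registers your class for the HTML content type, and every HTML response is then constructed through it.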

Using a proxy

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
agent.set_proxy('localhost', '8000')
page = agent.get(ARGV[0])
puts page.body

The transact method

transact runs the given block and then resets the page history. That is, after the block has been executed, you're back at the original page; there's no need to count how many times to call the back method at the end of a loop (while accounting for possible exceptions).

This example also demonstrates subclassing Mechanize.

require 'mechanize'

class TestMech < WWW::Mechanize
  def process
    get 'http://rubyforge.org/'
    search_form = page.forms.first
    search_form.words = 'WWW'
    submit search_form

    page.links_with(:href => %r{/projects/}).each do |link|
      next if link.href =~ %r{/projects/support/}

      puts 'Loading %-30s %s' % [link.href, link.text]
      begin
        transact do
          click link
          # Do stuff, maybe click more links.
        end
        # Now we're back at the original page.
      rescue => e
        $stderr.puts "#{e.class}: #{e.message}"
      end
    end
  end
end

TestMech.new.process
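
The history-reset behaviour of transact can be pictured as saving and restoring the depth of a page stack. The sketch below illustrates the idea with a plain array; it is not Mechanize's actual implementation:

```ruby
# Illustrative sketch of transact: remember the history depth, run the
# block, and on the way out (even if the block raised) discard every
# page visited inside the block.
class History
  attr_reader :pages

  def initialize
    @pages = []
  end

  def push(page)
    @pages << page
  end

  def transact
    depth = @pages.length
    begin
      yield
    ensure
      @pages.slice!(depth..-1)  # drop pages visited inside the block
    end
  end
end

h = History.new
h.push(:home)
h.transact do
  h.push(:search)
  h.push(:result)
end
p h.pages  # => [:home]
```

Because the restore happens in an ensure clause, the history is reset even when the block raises, which is exactly why the rescue in the loop above doesn't need any manual back calls.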

Client Certificate Authentication (Mutual Auth)

In most cases a client certificate is created as an additional layer of security for certain websites. The specific case this was initially tested on was automating the download of archived images from a bank's (Wachovia) lockbox system. Once the certificate is installed into your browser, you will have to export it and split the certificate and private key into separate files. Exported files are usually in .p12 format (IE 7 & Firefox 2.0), which stands for PKCS #12. You can convert them from p12 to pem format by using the following commands:

openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
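
Before wiring the converted files into Mechanize, it's worth checking that the certificate and private key actually belong together, using Ruby's standard openssl library. This sketch generates a throwaway self-signed pair so it runs standalone; in practice you would load your converted files with File.read('example.cer') and File.read('example.key'):

```ruby
require 'openssl'

# Build a throwaway self-signed certificate/key pair for illustration.
key  = OpenSSL::PKey::RSA.new(2048)
name = OpenSSL::X509::Name.parse('/CN=example')

cert = OpenSSL::X509::Certificate.new
cert.version    = 2
cert.serial     = 1
cert.subject    = name
cert.issuer     = name
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after  = Time.now + 3600
cert.sign(key, OpenSSL::Digest.new('SHA256'))

# check_private_key returns true only when the key matches the cert --
# a quick sanity check that the p12 split above went correctly.
puts cert.check_private_key(key)
```

If check_private_key returns false on your converted files, the -nocerts/-nokeys split likely went wrong and Mechanize's TLS handshake will fail.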

require 'rubygems'
require 'mechanize'

# create Mechanize instance
agent = WWW::Mechanize.new

# set the path of the certificate file
agent.cert = 'example.cer'

# set the path of the private key file
agent.key = 'example.key'

# get the login form & fill it out with the username/password
login_form = agent.get("https://example.com/login").form('Login')
login_form.Userid = 'TestUser'
login_form.Password = 'TestPassword'

# submit login form
agent.submit(login_form, login_form.buttons.first)