
Clean up RDoc files

drbrain committed Apr 2, 2011
1 parent 1cc57e1 commit 3f88eb71d47f948d4998384bbe301bcd68f3415a
Showing with 80 additions and 38 deletions.
  1. +5 −3 EXAMPLES.rdoc
  2. +71 −31 GUIDE.rdoc
  3. +4 −4 README.rdoc
EXAMPLES.rdoc
@@ -47,6 +47,7 @@ is the same as { ... }.submit.
end
== File Upload
+
Upload a file to flickr.
require 'rubygems'
@@ -157,9 +158,10 @@ In most cases a client certificate is created as an additional layer of security
for certain websites. The specific case that this was initially tested on was
for automating the download of archived images from a bank's (Wachovia) lockbox
system. Once the certificate is installed into your browser you will have to
-export it and split the certificate and private key into separate files. Exported
-files are usually in .p12 format (IE 7 & Firefox 2.0) which stands for PKCS #12.
-You can convert them from p12 to pem format by using the following commands:
+export it and split the certificate and private key into separate files.
+Exported files are usually in .p12 format (IE 7 & Firefox 2.0) which stands for
+PKCS #12. You can convert them from p12 to pem format by using the following
+commands:
openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
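The same split can be sketched in Ruby with the standard openssl library instead of the command-line tool. This is an illustration only and not part of the commit: the helper name split_p12, the passphrase, and the throwaway self-signed certificate used to build a demo bundle are all assumptions.

```ruby
require 'openssl'

# Hypothetical helper: split a PKCS #12 bundle (DER-encoded string) into
# separate PEM strings for the certificate and the private key.
def split_p12(der, passphrase)
  p12 = OpenSSL::PKCS12.new(der, passphrase)
  { :cert => p12.certificate.to_pem, :key => p12.key.to_pem }
end

# Build a throwaway self-signed certificate so the sketch runs without a
# real browser export.
key  = OpenSSL::PKey::RSA.new(2048)
name = OpenSSL::X509::Name.parse('/CN=example')

cert = OpenSSL::X509::Certificate.new
cert.version    = 2
cert.serial     = 1
cert.subject    = name
cert.issuer     = name
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after  = Time.now + 3600
cert.sign(key, OpenSSL::Digest.new('SHA256'))

# Bundle key and certificate the way a browser export would, then split.
der  = OpenSSL::PKCS12.create('secret', 'example', key, cert).to_der
pems = split_p12(der, 'secret')
```

In a real run, the DER string would come from reading the exported .p12 file in binary mode, and the two PEM strings would be written to the .cer and .key files.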
GUIDE.rdoc
@@ -1,20 +1,26 @@
= Getting Started With Mechanize
+
This guide is meant to get you started using Mechanize. By the end of this
guide, you should be able to fetch pages, click links, fill out and submit
forms, scrape data, and many other hopefully useful things. This guide
really just scratches the surface of what is available, but should be enough
information to get you really going!
== Let's Fetch a Page!
+
First things first. Make sure that you've required mechanize and that you
instantiate a new mechanize object:
- require 'rubygems'
- require 'mechanize'
- agent = Mechanize.new
+ require 'rubygems'
+ require 'mechanize'
+
+ agent = Mechanize.new
+
Now we'll use the agent we've created to fetch a page. Let's fetch google
with our mechanize agent:
- page = agent.get('http://google.com/')
+
+ page = agent.get('http://google.com/')
+
What just happened? We told mechanize to go pick up google's main page.
Mechanize stored any cookies that were set, and followed any redirects that
google may have sent. The agent gave us back a page that we can use to
@@ -23,101 +29,135 @@ scrape data, find links to click, or find forms to fill out.
Next, let's try finding some links to click.
== Finding Links
+
Mechanize returns a page object whenever you get a page, post, or submit a
form. When a page is fetched, the agent will parse the page and put a list
of links on the page object.
Now that we've fetched google's homepage, let's try listing all of the links:
- page.links.each do |link|
- puts link.text
- end
+
+ page.links.each do |link|
+ puts link.text
+ end
+
We can list the links, but Mechanize gives a few shortcuts to help us find a
link to click on. Let's say we wanted to click the link whose text is 'News'.
Normally, we would have to do this:
- page = agent.page.links.find { |l| l.text == 'News' }.click
+
+ page = agent.page.links.find { |l| l.text == 'News' }.click
+
But Mechanize gives us a shortcut. Instead we can say this:
- page = agent.page.link_with(:text => 'News').click
+
+ page = agent.page.link_with(:text => 'News').click
+
That shortcut says "find the first link with the text 'News'". You're probably
thinking "there could be multiple links with that text!", and you would be
correct! If you use the plural form, you can access the list.
If you wanted to click on the second news link, you could do this:
- agent.page.links_with(:text => 'News')[1].click
+
+ agent.page.links_with(:text => 'News')[1].click
+
We can even find a link with a certain href like so:
- page.link_with(:href => '/something')
+
+ page.link_with(:href => '/something')
+
Or chain them together to find a link with certain text and certain href:
- page.link_with(:text => 'News', :href => '/something')
+
+ page.link_with(:text => 'News', :href => '/something')
These shortcuts that mechanize provides are available on any list that you
can fetch like frames, iframes, or forms. Now that we know how to find and
click links, let's try something more complicated like filling out a form.
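As a mental model of the singular and plural finders, here is a plain-Ruby sketch that runs without Mechanize or network access. Link, links_with and link_with below are stand-ins defined for illustration, not Mechanize's implementation. The sketch assumes the plural form returns every match, the singular form returns the first one, and each criterion is compared with ===.

```ruby
# Stand-in for a link object; only the two attributes used in this guide.
Link = Struct.new(:text, :href)

# Plural finder: every link for which all criteria match.
def links_with(links, criteria)
  links.select do |link|
    criteria.all? { |attr, expected| expected === link.public_send(attr) }
  end
end

# Singular finder: the first match, or nil when nothing matches.
def link_with(links, criteria)
  links_with(links, criteria).first
end

links = [
  Link.new('News',   '/news'),
  Link.new('Images', '/images'),
  Link.new('News',   '/something')
]

first_news  = link_with(links, :text => 'News')
second_news = links_with(links, :text => 'News')[1]
combined    = link_with(links, :text => 'News', :href => '/something')
```

Because the singular finder returns nil when nothing matches, calling click on its result directly raises NoMethodError in that case.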
== Filling Out Forms
+
Let's continue with our google example. Here's the code we have so far:
- require 'rubygems'
- require 'mechanize'
+ require 'rubygems'
+ require 'mechanize'
+
+ agent = Mechanize.new
+ page = agent.get('http://google.com/')
- agent = Mechanize.new
- page = agent.get('http://google.com/')
If we pretty print the page, we can see that there is one form named 'f',
that has a couple buttons and a few fields:
- pp page
+
+ pp page
+
Now that we know the name of the form, let's fetch it off the page:
+
google_form = page.form('f')
+
Mechanize lets you access form input fields in a few different ways, but the
most convenient is that you can access input fields as accessors on the
object. So let's set the form field named 'q' on the form to 'ruby mechanize':
- google_form.q = 'ruby mechanize'
+
+ google_form.q = 'ruby mechanize'
+
To make sure that we set the value, let's pretty print the form, and you should
see a line similar to this:
- #<Mechanize::Field:0x1403488 @name="q", @value="ruby mechanize">
+
+ #<Mechanize::Field:0x1403488 @name="q", @value="ruby mechanize">
+
If you saw that the value of 'q' changed, you're on the right track! Now we
can submit the form and 'press' the submit button and print the results:
- page = agent.submit(google_form, google_form.buttons.first)
- pp page
+
+ page = agent.submit(google_form, google_form.buttons.first)
+ pp page
+
What we just did was equivalent to putting text in the search field and
clicking the 'Google Search' button. If we had submitted the form without
a button, it would be like typing in the text field and hitting the return
key.
Let's take a look at the code all together:
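The difference between the two kinds of submit can be sketched with the standard library alone. This is a simplified model, not Mechanize's code: form_query is a hypothetical helper, and the button name 'btnG' is a made-up placeholder. It assumes the usual HTML form rule that a pressed button contributes its own name/value pair to the submitted data.

```ruby
require 'uri'

# Hypothetical helper: encode form fields, plus the pressed button if any.
def form_query(fields, button = nil)
  pairs = fields.to_a
  pairs << button if button
  URI.encode_www_form(pairs)
end

fields = { 'q' => 'ruby mechanize' }

# Like clicking the search button: the button's pair is sent too.
with_button = form_query(fields, ['btnG', 'Google Search'])

# Like pressing return in the text field: only the fields are sent.
without_button = form_query(fields)
```

Servers that branch on which button was pressed only see that information in the first case.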
- require 'rubygems'
- require 'mechanize'
- agent = Mechanize.new
- page = agent.get('http://google.com/')
- google_form = page.form('f')
- google_form.q = 'ruby mechanize'
- page = agent.submit(google_form)
- pp page
+ require 'rubygems'
+ require 'mechanize'
+
+ agent = Mechanize.new
+ page = agent.get('http://google.com/')
+ google_form = page.form('f')
+ google_form.q = 'ruby mechanize'
+ page = agent.submit(google_form)
+ pp page
Before we go on to screen scraping, let's take a look at forms a little more
in depth. Unless you want to skip ahead!
== Advanced Form Techniques
+
In this section, I want to touch on using the different types of input fields
possible with a form. Password and textarea fields can be treated just like
text input fields. Select fields are very similar to text fields, but they
have many options associated with them. If you select one option, mechanize
will deselect the other options (unless it is a multi select!).
For example, let's select an option on a list:
- form.field_with(:name => 'list').options[0].select
+
+ form.field_with(:name => 'list').options[0].select
Now let's take a look at checkboxes and radio buttons. To select a checkbox,
just check it like this:
- form.checkbox_with(:name => 'box').check
+
+ form.checkbox_with(:name => 'box').check
+
Radio buttons are very similar to checkboxes, but they know how to uncheck
other radio buttons of the same name. Just check a radio button like you
would a checkbox:
+
form.radiobuttons_with(:name => 'box')[1].check
+
Mechanize also makes file uploads easy! Just find the file upload field, and
tell it what file name you want to upload:
+
form.file_uploads.first.file_name = "somefile.jpg"
== Scraping Data
+
Mechanize uses nokogiri[http://nokogiri.org/] to parse
html. What does this mean for you? You can treat a mechanize page like
a nokogiri object. After you have used Mechanize to navigate to the page
that you need to scrape, then scrape it using nokogiri methods:
+
agent.get('http://someurl.com/').search(".//p[@class='posted']")
README.rdoc
@@ -5,7 +5,7 @@
== DESCRIPTION
-The Mechanize library is used for automating interaction with websites.
+The Mechanize library is used for automating interaction with websites.
Mechanize automatically stores and sends cookies, follows redirects,
can follow links, and submit forms. Form fields can be populated and
submitted. Mechanize also keeps track of the sites that you have visited as
@@ -28,12 +28,12 @@ The bug tracker is available here:
== Examples
-If you are just starting, check out the GUIDE.
+If you are just starting, check out the GUIDE.
Also, check out the EXAMPLES file.
== Authors
-Copyright (c) 2005 by Michael Neumann (mneumann@ntecs.de)
+Copyright (c) 2005 by Michael Neumann (mneumann@ntecs.de)
Copyright (c) 2006-2010:
@@ -53,7 +53,7 @@ perl Mechanize which is available here[http://search.cpan.org/~petdance/WWW-Mech
Thank you to Michael Neumann for starting the Ruby version. Thanks to everyone
who's helped out in various ways. Finally, thank you to the people using this
library!
-
+
== License
This library is distributed under the GPL. Please see the LICENSE file.
