
adding the guide

git-svn-id: svn+ssh://rubyforge.org/var/svn/mechanize/trunk@302 f1cf478b-080f-0410-abad-959bfeec9ea8
aaronp committed on Sep 6, 2006 · commit 1c07f21f61310bd54b2248c6c76e09abfac12c73 · 1 parent 36eb903
Showing with 151 additions and 19 deletions.
  1. +125 −0 GUIDE
  2. +3 −2 README
  3. +4 −2 Rakefile
  4. +3 −3 lib/mechanize/form_elements.rb
  5. +4 −0 lib/mechanize/page.rb
  6. +8 −8 test/tc_checkboxes.rb
  7. +4 −4 test/tc_radiobutton.rb
GUIDE
@@ -0,0 +1,125 @@
+= Getting Started With WWW::Mechanize
+This guide is meant to get you started using Mechanize. By the end of this
+guide, you should be able to fetch pages, click links, fill out and submit
+forms, scrape data, and do many other hopefully useful things. This guide
+only scratches the surface of what is available, but it should be enough
+to get you going!
+
+== Let's Fetch a Page!
+First things first. Make sure that you've required mechanize and that you
+instantiate a new Mechanize object:
+ require 'rubygems'
+ require 'mechanize'
+
+ agent = WWW::Mechanize.new
+Now let's use the agent we've created to fetch a page. Let's fetch Google
+with our Mechanize agent:
+ page = agent.get('http://google.com/')
+What just happened? We told Mechanize to go pick up Google's main page.
+Mechanize stored any cookies that were set, and followed any redirects that
+Google may have sent. The agent gave us back a page that we can use to
+scrape data, find links to click, or find forms to fill out.
+
+Next, let's try finding some links to click.
+
+== Finding Links
+Mechanize returns a page object whenever you get a page, post, or submit a
+form. When a page is fetched, the agent will parse the page and put a list
+of links on the page object.
+
+Now that we've fetched Google's home page, let's try listing all of the links:
+ page.links.each do |link|
+   puts link.text
+ end
+We can list the links, but Mechanize gives us a few shortcuts to help us find
+a link to click on. Let's say we wanted to click the link whose text is
+'News'. Normally, we would have to do this:
+ page = agent.click page.links.find { |l| l.name == 'News' }
+But Mechanize gives us a shortcut. Instead we can say this:
+ page = agent.click page.links.name('News')
+That shortcut says "find all links with the name 'News'". You're probably
+thinking "there could be multiple links with that text!", and you would be
+correct! If you pass a list of links to the "click" method, Mechanize will
+click on the first one. If you wanted to click on the second news link, you
+could do this:
+ agent.click page.links.name('News')[1]
+We can even find a link with a certain href like so:
+ page.links.href('/something')
+Or chain them together to find a link with certain text and certain href:
+ page.links.name('News').href('/something')
+
+These shortcuts are available on any list that Mechanize gives you, such as
+frames, iframes, or forms. Now that we know how to find and click links,
+let's try something more complicated, like filling out a form.
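The shortcut filters operate on plain lists, so the idea can be sketched without a network connection. The code below is a minimal, hypothetical model and not how Mechanize itself implements `page.links`. `Link` is a two-field Struct, and `FilterList` is an Array subclass with explicit `name` and `href` filters. It only illustrates why both `page.links.name('News')[1]` and `page.links.name('News').href('/something')` work: each filter returns another indexable list.

```ruby
# Hypothetical stand-in for a link; Mechanize's own Link class differs.
Link = Struct.new(:name, :href)

# A minimal filterable list: every filter returns a new FilterList,
# so filters can be chained and the result can still be indexed.
class FilterList < Array
  # Keep elements whose +attr+ equals +value+, or matches it if it is a Regexp.
  def where(attr, value)
    matches = select do |item|
      actual = item.public_send(attr)
      value.is_a?(Regexp) ? !(actual.to_s =~ value).nil? : actual == value
    end
    FilterList.new(matches)
  end

  def name(value)
    where(:name, value)
  end

  def href(value)
    where(:href, value)
  end
end

links = FilterList.new([
  Link.new('News', '/news'),
  Link.new('News', '/something'),
  Link.new('Images', '/images')
])

puts links.name('News').length                    # prints 2
puts links.name('News')[1].href                   # prints /something
puts links.name('News').href('/something').length # prints 1
```

Returning a new list from every filter, instead of mutating the receiver, is what makes chaining safe. The original list is never narrowed by a lookup.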
+
+== Filling Out Forms
+Let's continue with our Google example. Here's the code we have so far:
+ require 'rubygems'
+ require 'mechanize'
+
+ agent = WWW::Mechanize.new
+ page = agent.get('http://google.com/')
+If we pretty-print the page, we can see that there is one form, named 'f',
+that has a couple of buttons and a few fields:
+ require 'pp'
+ pp page
+Now that we know the name of the form, let's fetch it from the page:
+ google_form = page.form('f')
+Mechanize lets you access form input fields in a few different ways, but the
+most convenient is to treat each input field as an accessor on the form
+object. So let's set the form field named 'q' to 'ruby mechanize':
+ google_form.q = 'ruby mechanize'
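Field accessors like `google_form.q = ...` are the kind of thing Ruby's `method_missing` makes possible. Below is a minimal sketch that assumes a hypothetical `SketchForm` storing field values in a hash. Mechanize's real Form class may resolve accessors differently, and the `lang` field is made up for the example.

```ruby
# Hypothetical form object: field values are stored in a hash keyed by name.
class SketchForm
  def initialize(fields)
    @fields = fields
  end

  # form.q reads the 'q' field; form.q = 'x' writes it.
  # Unknown names fall through to the normal NoMethodError.
  def method_missing(meth, *args, &block)
    key = meth.to_s
    if key.end_with?('=') && @fields.key?(key.chomp('='))
      @fields[key.chomp('=')] = args.first
    elsif @fields.key?(key)
      @fields[key]
    else
      super
    end
  end

  # Keep respond_to? consistent with the dynamic accessors.
  def respond_to_missing?(meth, include_private = false)
    @fields.key?(meth.to_s.chomp('=')) || super
  end

  # Hash-style access for names that are not valid Ruby method names.
  def [](name)
    @fields[name]
  end
end

form = SketchForm.new({ 'q' => '', 'lang' => 'en' }) # 'lang' is a made-up field
form.q = 'ruby mechanize'
puts form.q        # prints ruby mechanize
puts form['lang']  # prints en
```

The `[]` fallback matters in practice because HTML field names such as `user[email]` cannot be written as Ruby method calls.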
+To make sure that the value was set, let's pretty-print the form with pp. You
+should see a line similar to this:
+ #<WWW::Mechanize::Field:0x1403488 @name="q", @value="ruby mechanize">
+If you see that the value of 'q' changed, you're on the right track! Now we
+can submit the form, 'press' the submit button, and print the results:
+ page = agent.submit(google_form, google_form.buttons.first)
+ pp page
+What we just did was equivalent to typing text into the search field and
+clicking the 'Google Search' button. Submitting the form without a button
+would be like typing in the text field and pressing the Return key.
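The practical difference between passing a button to `agent.submit` and leaving it out is, roughly, whether the clicked button's name and value are sent along with the other fields. Here is a minimal sketch using Ruby's standard `URI.encode_www_form`. It assumes a hypothetical `build_query` helper and a made-up button name `btnG`; Mechanize builds its own query internally.

```ruby
require 'uri'

# Hypothetical helper: build the query string for a GET form submission.
# +fields+ is an array of [name, value] pairs. +button+ is the [name, value]
# pair of the clicked submit button, or nil if no button was used.
def build_query(fields, button = nil)
  pairs = fields.dup       # don't modify the caller's array
  pairs << button if button
  URI.encode_www_form(pairs)
end

fields = [['q', 'ruby mechanize']]
button = ['btnG', 'Google Search'] # made-up button name and value

puts build_query(fields)         # prints q=ruby+mechanize
puts build_query(fields, button) # prints q=ruby+mechanize&btnG=Google+Search
```

Some servers check which button was pressed, so leaving the button out can change the response even when every text field is identical.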
+
+Let's take a look at the code all together:
+ require 'rubygems'
+ require 'mechanize'
+ require 'pp'
+
+ agent = WWW::Mechanize.new
+ page = agent.get('http://google.com/')
+ google_form = page.form('f')
+ google_form.q = 'ruby mechanize'
+ page = agent.submit(google_form)
+ pp page
+
+Before we go on to screen scraping, let's take a look at forms in a little
+more depth, unless you want to skip ahead!
+
+== Advanced Form Techniques
+In this section, I want to touch on the different types of input fields a
+form can have. Password and textarea fields can be treated just like text
+input fields. Select fields are very similar to text fields, but they have
+many options associated with them. If you select one option, Mechanize will
+deselect the other options (unless it is a multi select!).
+
+For example, let's select an option from a list:
+ form.fields.name('list').options[0].select
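The single versus multi select behavior described above can be modeled in a few lines. The classes here (`SketchSelectList`, `SketchOption`) are hypothetical and mirror only the rule stated in the text: selecting an option clears its siblings unless the list allows multiple selections.

```ruby
# Hypothetical option; it knows its list so it can clear its siblings.
class SketchOption
  attr_reader :value
  attr_accessor :selected

  def initialize(list, value)
    @list = list
    @value = value
    @selected = false
  end

  # Select this option, deselecting the others unless the list is multi select.
  def select
    @list.options.each { |o| o.selected = false } unless @list.multiple
    @selected = true
  end
end

# Hypothetical select list holding a set of options.
class SketchSelectList
  attr_reader :options, :multiple

  def initialize(values, multiple: false)
    @multiple = multiple
    @options = values.map { |v| SketchOption.new(self, v) }
  end

  def selected_values
    @options.select(&:selected).map(&:value)
  end
end

single = SketchSelectList.new(%w[red green blue])
single.options[0].select
single.options[2].select
p single.selected_values # prints ["blue"]

multi = SketchSelectList.new(%w[red green blue], multiple: true)
multi.options[0].select
multi.options[2].select
p multi.selected_values  # prints ["red", "blue"]
```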
+
+Now let's take a look at checkboxes and radio buttons. To select a checkbox,
+just check it like this:
+ form.checkboxes.name('box').check
+Radio buttons are very similar to checkboxes, but they know how to uncheck
+other radio buttons of the same name. Just check a radio button like you
+would a checkbox:
+ form.radiobuttons.name('box')[1].check
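Radio buttons get their "uncheck the others" behavior from a peer lookup, like the `check` and `uncheck_peers` methods this commit renames in lib/mechanize/form_elements.rb. The sketch below reproduces that logic with hypothetical `RadioForm` and `RadioButton` classes, grouping buttons by name. It is not Mechanize's actual form code.

```ruby
# Hypothetical radio button modeled on check/uncheck/uncheck_peers.
class RadioButton
  attr_reader :name, :value, :checked

  def initialize(name, value, form)
    @name = name
    @value = value
    @checked = false
    @form = form
  end

  # Check this button and uncheck every other button in the same group.
  def check
    uncheck_peers
    @checked = true
  end

  def uncheck
    @checked = false
  end

  private

  # Peers share this button's name but have a different value.
  def uncheck_peers
    @form.radiobuttons.each do |b|
      next if b.name != name || b.value == value
      b.uncheck
    end
  end
end

# Hypothetical form that just collects radio buttons.
class RadioForm
  attr_reader :radiobuttons

  def initialize
    @radiobuttons = []
  end

  def add_radio(name, value)
    button = RadioButton.new(name, value, self)
    @radiobuttons << button
    button
  end
end

form  = RadioForm.new
red   = form.add_radio('color', 'red')
green = form.add_radio('color', 'green')
size  = form.add_radio('size', 'large')

size.check
red.check
green.check
p [red.checked, green.checked, size.checked] # prints [false, true, true]
```

Checking `green` unchecks `red` because both are named `color`. The `size` button is left alone because it belongs to a different group.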
+Mechanize also makes file uploads easy! Just find the file upload field and
+tell it the name of the file you want to upload:
+ form.file_uploads.first.file_name = "somefile.jpg"
+
+== Scraping Data
+Mechanize uses hpricot[http://code.whytheluckystiff.net/hpricot/] to parse
+HTML. What does this mean for you? You can treat a Mechanize page like
+an Hpricot object. After you have used Mechanize to navigate to the page
+that you need to scrape, scrape it using Hpricot methods:
+ agent.get('http://someurl.com/').search("//p[@class='posted']")
+For more information on this powerful scraper, take a look at
+HpricotBasics[http://code.whytheluckystiff.net/hpricot/wiki/HpricotBasics].
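Treating a page "like an Hpricot object" comes down to delegation. In this commit, `Page#search` and the new `Page#at` simply pass their arguments to the parsed document held in `@root`. Below is a minimal sketch of that pattern using Ruby's standard `Forwardable`. A hypothetical `FakeDocument` stands in for Hpricot, and its class-name lookup is a toy, not XPath or CSS.

```ruby
require 'forwardable'

# Hypothetical stand-in for a parsed document (a real page wraps Hpricot).
class FakeDocument
  def initialize(paragraphs)
    @paragraphs = paragraphs # array of [css_class, text] pairs
  end

  # Toy query: every paragraph text whose class equals +css_class+.
  def search(css_class)
    @paragraphs.select { |klass, _| klass == css_class }.map { |_, text| text }
  end

  # Only the first match, or nil when nothing matches.
  def at(css_class)
    search(css_class).first
  end
end

# Hypothetical page that forwards scraping calls to its document.
class SketchPage
  extend Forwardable
  def_delegators :@root, :search, :at

  def initialize(root)
    @root = root
  end
end

page = SketchPage.new(FakeDocument.new([
  ['posted', 'First post'],
  ['body', 'Hello'],
  ['posted', 'Second post']
]))

p page.search('posted') # prints ["First post", "Second post"]
p page.at('posted')     # prints "First post"
```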
README
@@ -7,14 +7,15 @@ submitted. A history of URL's is maintained and can be queried.
== Dependencies
* ruby 1.8.2
+* hpricot[http://code.whytheluckystiff.net/hpricot/]
Note that the files in the net-overrides/ directory are taken from Ruby 1.9.0.
-* ruby-web 1.1.0 (http://rubyforge.org/projects/ruby-web/)
== Examples
-See the EXAMPLES[link://files/EXAMPLES.html] file
+If you are just starting, check out the GUIDE[link://files/GUIDE.html].
+Also, check out the EXAMPLES[link://files/EXAMPLES.html] file.
== Authors
Rakefile
@@ -25,7 +25,8 @@ spec = Gem::Specification.new do |s|
s.files = Dir.glob("{bin,test,lib,doc}/**/*").delete_if {|item| item.include?(".svn") }
s.require_path = "lib"
s.has_rdoc = true
- s.extra_rdoc_files = ["README", "EXAMPLES", "CHANGELOG", "LICENSE", "NOTES"]
+ s.extra_rdoc_files = ["README", "EXAMPLES", "CHANGELOG", "LICENSE", "NOTES",
+ "GUIDE"]
s.rdoc_options << "--main" << 'README' << "--title" << "'WWW::Mechanize RDoc'"
s.rubyforge_project = PKG_NAME
s.add_dependency('hpricot')
@@ -40,7 +41,8 @@ end
Rake::RDocTask.new do |p|
p.main = "README"
p.rdoc_dir = "doc"
- p.rdoc_files.include("README", "CHANGELOG", "LICENSE", "EXAMPLES", "NOTES", "lib/**/*.rb")
+ p.rdoc_files.include("README", "CHANGELOG", "LICENSE", "EXAMPLES", "NOTES",
+ "GUIDE", "lib/**/*.rb")
p.options << "--main" << 'README' << "--title" << "WWW::Mechanize RDoc"
end
lib/mechanize/form_elements.rb
@@ -71,12 +71,12 @@ def initialize(name, value, checked, form)
super(name, value)
end
- def tick
+ def check
uncheck_peers
@checked = true
end
- def untick
+ def uncheck
@checked = false
end
@@ -88,7 +88,7 @@ def click
def uncheck_peers
@form.radiobuttons.name(name).each do |b|
next if b.value == value
- b.untick
+ b.uncheck
end
end
end
lib/mechanize/page.rb
@@ -44,6 +44,10 @@ def search(*args)
@root.search(*args)
end
+ def at(*args)
+ @root.at(*args)
+ end
+
alias :/ :search
def watch_for_set=(obj)
test/tc_checkboxes.rb
@@ -15,7 +15,7 @@ def setup
def test_select_one
form = @page.forms.first
- form.checkboxes.name('green').tick
+ form.checkboxes.name('green').check
assert_equal(true, form.checkboxes.name('green').checked)
assert_equal(false, form.checkboxes.name('red').checked)
assert_equal(false, form.checkboxes.name('blue').checked)
@@ -26,7 +26,7 @@ def test_select_one
def test_select_all
form = @page.forms.first
form.checkboxes.each do |b|
- b.tick
+ b.check
end
form.checkboxes.each do |b|
assert_equal(true, b.checked)
@@ -36,29 +36,29 @@ def test_select_all
def test_select_none
form = @page.forms.first
form.checkboxes.each do |b|
- b.untick
+ b.uncheck
end
form.checkboxes.each do |b|
assert_equal(false, b.checked)
end
end
- def test_tick_one
+ def test_check_one
form = @page.forms.first
assert_equal(2, form.checkboxes.name('green').length)
- form.checkboxes.name('green')[1].tick
+ form.checkboxes.name('green')[1].check
assert_equal(false, form.checkboxes.name('green')[0].checked)
assert_equal(true, form.checkboxes.name('green')[1].checked)
page = @agent.submit(form)
assert_equal(1, page.links.length)
assert_equal('green:on', page.links.first.text)
end
- def test_tick_two
+ def test_check_two
form = @page.forms.first
assert_equal(2, form.checkboxes.name('green').length)
- form.checkboxes.name('green')[0].tick
- form.checkboxes.name('green')[1].tick
+ form.checkboxes.name('green')[0].check
+ form.checkboxes.name('green')[1].check
assert_equal(true, form.checkboxes.name('green')[0].checked)
assert_equal(true, form.checkboxes.name('green')[1].checked)
page = @agent.submit(form)
test/tc_radiobutton.rb
@@ -16,7 +16,7 @@ def setup
def test_select_one
form = @page.forms.first
button = form.radiobuttons.name('color')
- form.radiobuttons.name('color').value('green').tick
+ form.radiobuttons.name('color').value('green').check
assert_equal(true, button.value('green').checked)
assert_equal(false, button.value('red').checked)
assert_equal(false, button.value('blue').checked)
@@ -28,9 +28,9 @@ def test_select_all
form = @page.forms.first
button = form.radiobuttons.name('color')
form.radiobuttons.name('color').each do |b|
- b.tick
+ b.check
end
- form.radiobuttons.name('color').value('green').tick
+ form.radiobuttons.name('color').value('green').check
assert_equal(true, button.value('green').checked)
assert_equal(false, button.value('red').checked)
assert_equal(false, button.value('blue').checked)
@@ -42,7 +42,7 @@ def test_unselect_all
form = @page.forms.first
button = form.radiobuttons.name('color')
form.radiobuttons.name('color').each do |b|
- b.untick
+ b.uncheck
end
assert_equal(false, button.value('green').checked)
assert_equal(false, button.value('red').checked)
