Document that Mechanize::Page#search accepts an XPath or CSS expression. Fixes #199
commit 30eb16188f0d7535c872009b32e65139fe1dab64 1 parent 05be267
Eric Hodel drbrain authored
4 CHANGELOG.rdoc
@@ -22,7 +22,9 @@
22 22 filename.
23 23 * Added Mechanize::DirectorySaver which saves responses in a single
24 24 directory. Issue #187 by yoshie902a.
25   - * Added Link#noreferrer?.
  25 + * Added Mechanize::Page::Link#noreferrer?
  26 + * The documentation for Mechanize::Page#search and #at now show that both
  27 + XPath and CSS expressions are allowed. Issue #199 by Shane Becker.
26 28
27 29 * Bug fixes
28 30 * Fixed handling of a HEAD request with Accept-Encoding: gzip. Issue #198
41 EXAMPLES.rdoc
@@ -1,8 +1,9 @@
1 1 = Mechanize examples
2 2
3 3 Note: Several examples show methods chained to the end of do/end blocks.
4   -Do...end is the same as curly braces ({...}). For example, do ... end.submit
5   -is the same as { ... }.submit.
  4 +<code>do...end</code> is the same as curly braces (<code>{...}</code>). For
  5 +example, <code>do ... end.submit</code> is the same as <code>{ ...
  6 +}.submit</code>.
6 7
7 8 == Google
8 9
@@ -81,7 +82,8 @@ Upload a file to flickr.
81 82 end
82 83
83 84 == Pluggable Parsers
84   -Lets say you want html pages to automatically be parsed with Rubyful Soup.
  85 +
  86 +Lets say you want HTML pages to automatically be parsed with Rubyful Soup.
85 87 This example shows you how:
86 88
87 89 require 'rubygems'
@@ -115,10 +117,10 @@ Beautiful Soup for that page.
115 117
116 118 == The transact method
117 119
118   -transact runs the given block and then resets the page history. I.e. after the
119   -block has been executed, you're back at the original page; no need count how
120   -many times to call the back method at the end of a loop (while accounting for
121   -possible exceptions).
  120 +Mechanize#transact runs the given block and then resets the page history. I.e.
  121 +after the block has been executed, you're back at the original page; no need
  122 +count how many times to call the back method at the end of a loop (while
  123 +accounting for possible exceptions).
122 124
123 125 This example also demonstrates subclassing Mechanize.
124 126
@@ -154,17 +156,12 @@ This example also demonstrates subclassing Mechanize.
154 156
155 157 == Client Certificate Authentication (Mutual Auth)
156 158
157   -In most cases a client certificate is created as an additional layer of security
158   -for certain websites. The specific case that this was initially tested on was
159   -for automating the download of archived images from a banks (Wachovia) lockbox
160   -system. Once the certificate is installed into your browser you will have to
161   -export it and split the certificate and private key into separate files.
162   -Exported files are usually in .p12 format (IE 7 & Firefox 2.0) which stands for
163   -PKCS #12. You can convert them from p12 to pem format by using the following
164   -commands:
165   -
166   -openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
167   -openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
  159 +In most cases a client certificate is created as an additional layer of
  160 +security for certain websites. The specific case that this was initially
  161 +tested on was for automating the download of archived images from a banks
  162 +(Wachovia) lockbox system. Once the certificate is installed into your
  163 +browser you will have to export it and split the certificate and private key
  164 +into separate files.
168 165
169 166 require 'rubygems'
170 167 require 'mechanize'
@@ -185,3 +182,11 @@ openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
185 182
186 183 # submit login form
187 184 agent.submit(login_form, login_form.buttons.first)
  185 +
  186 +Exported files are usually in .p12 format (IE 7 & Firefox 2.0) which stands
  187 +for PKCS #12. You can convert them from p12 to pem format by using the
  188 +following commands:
  189 +
  190 + openssl pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
  191 + openssl pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
  192 +
15 GUIDE.rdoc
@@ -130,7 +130,7 @@ In this section, I want to touch on using the different types in input fields
130 130 possible with a form. Password and textarea fields can be treated just like
131 131 text input fields. Select fields are very similar to text fields, but they
132 132 have many options associated with them. If you select one option, mechanize
133   -will deselect the other options (unless it is a multi select!).
  133 +will de-select the other options (unless it is a multi select!).
134 134
135 135 For example, lets select an option on a list:
136 136
@@ -154,10 +154,15 @@ tell it what file name you want to upload:
154 154
155 155 == Scraping Data
156 156
157   -Mechanize uses nokogiri[http://nokogiri.org/] to parse
158   -html. What does this mean for you? You can treat a mechanize page like
159   -an nokogiri object. After you have used Mechanize to navigate to the page
160   -that you need to scrape, then scrape it using nokogiri methods:
  157 +Mechanize uses nokogiri[http://nokogiri.org/] to parse HTML. What does this
  158 +mean for you? You can treat a mechanize page like an nokogiri object. After
  159 +you have used Mechanize to navigate to the page that you need to scrape, then
  160 +scrape it using nokogiri methods:
  161 +
  162 + agent.get('http://someurl.com/').search("p.posted")
  163 +
  164 +The expression given to Mechanize::Page#search may be a CSS expression or an
  165 +XPath expression:
161 166
162 167 agent.get('http://someurl.com/').search(".//p[@class='posted']")
163 168
21 lib/mechanize/page.rb
@@ -186,9 +186,26 @@ def content_type
186 186 @meta_content_type || response['content-type']
187 187 end
188 188
189   - # Search through the page like HPricot
  189 + ##
  190 + # :method: search
  191 + #
  192 + # Search for +paths+ in the page using Nokogiri's #search. The +paths+ can
  193 + # be XPath or CSS and an optional Hash of namespaces may be appended.
  194 + #
  195 + # See Nokogiri::XML::Node#search for further details.
  196 +
190 197 def_delegator :parser, :search, :search
191   - def_delegator :parser, :/, :/
  198 +
  199 + alias / search
  200 +
  201 + ##
  202 + # :method: at
  203 + #
  204 + # Search through the page for +path+ under +namespace+ using Nokogiri's #at.
  205 + # The +path+ may be either a CSS or XPath expression.
  206 + #
  207 + # See also Nokogiri::XML::Node#at
  208 +
192 209 def_delegator :parser, :at, :at
193 210
194 211 ##
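
For readers unfamiliar with the delegation pattern touched in lib/mechanize/page.rb, here is a minimal sketch of how def_delegator combined with alias behaves. FakeParser and FakePage are hypothetical stand-ins, not Mechanize or Nokogiri classes; the real Mechanize::Page delegates to a Nokogiri document, which is what actually interprets the CSS or XPath expression.

```ruby
require 'forwardable'

# Hypothetical stand-in for the parser object that a page wraps.
# It only echoes its arguments; it does not parse any HTML.
class FakeParser
  def search(*paths)
    "searched #{paths.join(', ')}"
  end

  def at(path)
    "first match for #{path}"
  end
end

# Hypothetical page class mirroring the delegation lines in the diff.
class FakePage
  extend Forwardable

  attr_reader :parser

  def initialize(parser)
    @parser = parser
  end

  # Forward #search to the parser, then expose it under the / operator.
  def_delegator :parser, :search, :search
  alias / search

  # Forward #at to the parser.
  def_delegator :parser, :at, :at
end

page = FakePage.new(FakeParser.new)

css_result   = page.search('p.posted')          # CSS-style expression
xpath_result = page / ".//p[@class='posted']"   # XPath-style expression via the alias
first        = page.at('p.posted')
```

Because the alias is created after def_delegator has defined search, the / operator and search share one implementation and one documentation entry, which seems to be the point of replacing the second def_delegator call.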
