Permalink
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
61 lines (43 sloc) 2.08 KB

Searching the response body

String search

With the doc.text_search method, you can find out if the response body contains a certain string or not:

>>> g = Grab('<h1>test</h1>')
>>> g.doc.text_search(u'tes')
True

If you prefer to raise an exception if string was not found, then use the doc.text_assert method:

>>> g = Grab('<h1>test</h1>')
>>> g.doc.text_assert(u'tez')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lorien/web/grab/grab/document.py", line 109, in text_assert
    raise DataNotFound(u'Substring not found: %s' % anchor)
grab.error.DataNotFound: Substring not found: tez

By default, all text search methods operate with unicode; i.e., you should pass unicode arguments to these methods and these methods will search inside document's body converted to unicode. There is an option to work with raw bytes, just pass byte=True to any method:

>>> g = Grab('<h1>test</h1>')
>>> g.doc.text_search(b'tez', byte=True)

Regexp Search

You can search for a regular expression with doc.rex_search method that accepts compiled regexp object or just a text of regular expression:

>>> g = Grab('<h1>test</h1>')
>>> g.doc.rex_search('<.+?>').group(0)
u'<h1>'

Method doc.rex_text returns you text contents of .group(1) of the found match object:

>>> g = Grab('<h1>test</h1>')
>>> g.doc.rex_text('<.+?>(.+)<')
u'test'

Method doc.rex_assert raises DataNotFound exception if no match is found:

>>> g = Grab('<h1>test</h1>')
>>> g.doc.rex_assert('\w{10}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lorien/web/grab/grab/document.py", line 189, in rex_assert
    self.rex_search(rex, byte=byte)
  File "/home/lorien/web/grab/grab/document.py", line 180, in rex_search
    raise DataNotFound('Could not find regexp: %s' % regexp)
grab.error.DataNotFound: Could not find regexp: <_sre.SRE_Pattern object at 0x7fa40e97d1f8>