Commit

add xpath
paulbradshaw committed Jan 16, 2017
1 parent 24230ec commit 9193c15
Showing 1 changed file with 6 additions and 0 deletions.
scraper.py: 6 changes (6 additions, 0 deletions)
@@ -12,3 +12,9 @@
 xmldata = scraperwiki.pdftoxml(pdfdata)
 print "After converting to xml it has %d bytes" % len(xmldata)
 root = lxml.etree.fromstring(xmldata)
+
+# this line uses an XPath-style path to find all <text> tags
+lines = root.findall('.//text')
+print lines
+for line in lines:
+    print line.text
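The added lines walk the parsed tree and print every `<text>` element. Below is a minimal sketch of the same step in Python 3, using the standard library's `xml.etree.ElementTree` as a stand-in for `lxml.etree` (both offer `fromstring` and `findall`, and `'.//text'` is the same ElementPath expression, a limited subset of XPath; lxml's `.xpath()` method supports full XPath 1.0). The sample XML is hypothetical, shaped like the `<pdf2xml>`/`<page>`/`<text>` output that `pdftoxml` is assumed to produce, and `extract_text_lines` is an illustrative name, not part of the original scraper. It also shows one pitfall: when a `<text>` element wraps its content in a child such as `<b>`, `line.text` is `None`, so the sketch joins `itertext()` instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, assumed to resemble pdftoxml output:
# <page> elements containing positioned <text> elements.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<pdf2xml>
  <page number="1" position="absolute" top="0" left="0" height="1188" width="918">
    <text top="100" left="50" width="200" height="15" font="0"><b>Annual report</b></text>
    <text top="130" left="50" width="120" height="12" font="1">Total spend</text>
    <text top="130" left="300" width="60" height="12" font="1">1,234</text>
  </page>
</pdf2xml>
"""


def extract_text_lines(xmldata):
    """Return the text of every <text> element, in document order."""
    root = ET.fromstring(xmldata)
    # './/text' matches <text> elements at any depth below root,
    # the same path expression used in the commit.
    lines = root.findall('.//text')
    # ''.join(itertext()) also collects text nested in <b>/<i> children,
    # where line.text alone would be None.
    return [''.join(line.itertext()) for line in lines]


if __name__ == "__main__":
    for text in extract_text_lines(SAMPLE_XML):
        print(text)
```

Run as a script, this prints `Annual report`, `Total spend` and `1,234` on separate lines, whereas `print line.text` in the commit would print `None` for the bold heading.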
