trying to run the example sample code. #6

Closed
moon13 opened this issue Jun 7, 2013 · 7 comments

moon13 commented Jun 7, 2013


I just installed pdfquery on my machine, and I'm trying to run the example code:

```python
import pdfquery

pdf = pdfquery.PDFQuery("examples/sample.pdf")
pdf.load()
label = pdf.pq(':contains("Your first name and initial")')
left_corner = float(label.attr('x0'))
bottom_corner = float(label.attr('y0'))
name = pdf.pq(':in_bbox("%s, %s, %s, %s")' % (left_corner, bottom_corner-30, left_corner+150, bottom_corner)).text()
print name
```

The problem is that I get this error:

```
Traceback (most recent call last):
  File "testePdfQuery.py", line 1, in <module>
    import pdfquery
  File "/home/ubuntu/Downloads/pdfquery-0.1.3/pdfquery/__init__.py", line 1, in <module>
    from .pdfquery import PDFQuery
  File "/home/ubuntu/Downloads/pdfquery-0.1.3/pdfquery/pdfquery.py", line 23, in <module>
    cssselect.Function._xpath_in_bbox = _xpath_in_bbox
AttributeError: 'module' object has no attribute 'Function'
```

Any ideas how I can fix this and run the example? Thanks in advance.
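For readers unfamiliar with the sample: the last query asks for whatever text sits in a box 150 points wide and 30 points tall directly below the label. PDF coordinates grow upward, so "below" means a smaller y0. The sketch below imitates that lookup in plain Python, assuming `:in_bbox` matches elements that lie entirely inside the box (check pdfquery's docs for the exact rule). `TextBox`, `in_bbox`, and the coordinates are hypothetical and made up for illustration; the real library runs the query against pdfminer's layout tree.

```python
from collections import namedtuple

# Hypothetical stand-in for a layout element: (x0, y0) is the bottom-left
# corner and (x1, y1) the top-right corner, in PDF points.
TextBox = namedtuple("TextBox", ["text", "x0", "y0", "x1", "y1"])


def in_bbox(elements, x0, y0, x1, y1):
    """Return elements whose box lies entirely inside (x0, y0, x1, y1).

    Assumption: full containment is required; partial overlaps are excluded.
    """
    return [
        e for e in elements
        if e.x0 >= x0 and e.y0 >= y0 and e.x1 <= x1 and e.y1 <= y1
    ]


# Made-up coordinates, shaped like a form field under its label.
label = TextBox("Your first name and initial", 36.0, 700.0, 160.0, 708.0)
elements = [
    label,
    TextBox("John Q", 40.0, 680.0, 90.0, 690.0),   # just below the label
    TextBox("Smith", 300.0, 680.0, 340.0, 690.0),  # too far to the right
]

# Same box arithmetic as the sample: 150 wide, 30 tall, under the label.
left_corner, bottom_corner = label.x0, label.y0
box = (left_corner, bottom_corner - 30, left_corner + 150, bottom_corner)
name = " ".join(e.text for e in in_bbox(elements, *box))
print(name)  # John Q
```

The label itself is excluded because its top edge (y1 = 708) pokes above the box.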

jcushman (Owner) commented Jun 7, 2013

Sorry, this is a known issue caused by incompatibility with recent versions of lxml. I just merged the pull request that catches the error (#5), so try reinstalling pdfquery from GitHub trunk and see if that fixes it.

This is a temporary fix: it means you can't use xpath_in_bbox (the `:in_bbox` selector) as described in the docs. It would be good to have a solution that keeps this functionality with the new lxml. This pull request (#3) might do it, but I haven't had time to play with it.

moon13 (Author) commented Jun 7, 2013

Hi, thanks for answering. How can I uninstall the version I installed? :) Sorry, I'm a Python newbie.

Never mind, I got it with `pip uninstall`. I'll try getting the code from GitHub trunk and let you know whether I can run the example. Thanks.

jcushman (Owner) commented Jun 7, 2013

Welcome! First, if you haven't yet, you'll want to get pip, the Python package manager, working. See http://www.pip-installer.org. Then (if you have git installed) you should be able to do something like:

```sh
pip uninstall pdfquery
pip install -e git+https://github.com/jcushman/pdfquery.git#egg=pdfquery
```

This uninstalls the package and installs it from source. As far as I know, all this is doing behind the scenes is adding and removing files in your site-packages directory (http://stackoverflow.com/questions/122327/how-do-i-find-the-location-of-my-python-site-packages-directory), so in theory you could also do that directly.
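To see which directories the running interpreter treats as site-packages (the question the Stack Overflow link answers), the standard library can report them directly. This is a minimal sketch; `candidate_site_dirs` is a hypothetical helper. On some Debian/Ubuntu builds the `sysconfig` answer may not match the dist-packages directories pip actually uses, so treat the output as a list of candidates.

```python
import site
import sysconfig


def candidate_site_dirs():
    """List directories this interpreter treats as site-packages."""
    dirs = []
    # site.getsitepackages() may be missing inside older virtualenvs.
    if hasattr(site, "getsitepackages"):
        dirs.extend(site.getsitepackages())
    # "purelib" is where pure-Python packages are installed by default.
    purelib = sysconfig.get_paths()["purelib"]
    if purelib not in dirs:
        dirs.append(purelib)
    return dirs


for d in candidate_site_dirs():
    print(d)
```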

(As you dig into Python, you might want to take a look at virtualenv, which lets you keep a separate set of packages for each project you work on instead of having them all jammed into site-packages. No need to complexify it too much at this point, though.)


moon13 (Author) commented Jun 7, 2013

Jcushman, I followed your instructions and installed pdfquery from the source code on GitHub. Now when I try to run the sample code, I get this:

```
Traceback (most recent call last):
  File "testePdfQuery.py", line 3, in <module>
    pdf = pdfquery.PDFQuery("examples/sample.pdf")
NameError: name 'pdfquery' is not defined
```

I get this error even though my file has the `import pdfquery` line.

`print sys.path` gives me this:

```
['/home/ubuntu/Downloads', '/usr/local/lib/python2.7/dist-packages/pyquery-1.2.4-py2.7.egg', '/usr/local/lib/python2.7/dist-packages/cssselect-0.8-py2.7.egg', '/usr/local/lib/python2.7/dist-packages/roman-2.0.0-py2.7.egg', '/usr/local/lib/python2.7/dist-packages', '/home/ubuntu/src/pdfquery', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PILcompat', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/python2.7/dist-packages/ubuntu-sso-client', '/usr/lib/python2.7/dist-packages/ubuntuone-client', '/usr/lib/python2.7/dist-packages/ubuntuone-control-panel', '/usr/lib/python2.7/dist-packages/ubuntuone-storage-protocol']
```
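When an import behaves strangely, it can help to ask Python which file `import pdfquery` would actually pick up. The listing above contains both the script's own directory (/home/ubuntu/Downloads, which is searched first) and /home/ubuntu/src/pdfquery. A minimal sketch using `importlib.util.find_spec`, which requires Python 3.4+; on Python 2.7, `imp.find_module` plays a similar role. `where_would_import` is a hypothetical helper name.

```python
import importlib.util


def where_would_import(name):
    """Return the file Python would load top-level module `name` from.

    Returns None if the module cannot be found on sys.path. For a
    top-level name, find_spec locates the module without executing it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For regular modules this is the .py file; for packages, __init__.py.
    return spec.origin


print(where_would_import("json"))      # e.g. .../json/__init__.py
print(where_would_import("pdfquery"))  # None if pdfquery is not importable
```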

jcushman (Owner) commented Jun 7, 2013

Hmm, that one's tough to diagnose from here. I think pip would probably put the files in /usr/lib/python2.7/dist-packages, so you would end up with /usr/lib/python2.7/dist-packages/pdfquery/pdfquery.py and it would import from there. If it installed as a .egg (often just a zip file), you might try unzipping it.


jcushman (Owner) commented Jun 7, 2013

Er, /usr/local/lib/python2.7/dist-packages, I meant.


moon13 (Author) commented Jun 7, 2013

Right, I checked how pip installed it: it's installed as `pdfquery.egg-link`. I've tried to unzip it, but with no success :(

```
root@ubuntu-DQ77PRO:/usr/local/lib/python2.7/dist-packages# unzip pdfquery.egg-link
Archive: pdfquery.egg-link
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of pdfquery.egg-link or
pdfquery.egg-link.zip, and cannot find pdfquery.egg-link.ZIP, period.
```
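An `.egg-link` file is not an archive, which is why `unzip` rejects it. With `pip install -e`, the code stays in its checkout (probably /home/ubuntu/src/pdfquery here, which also appears in the `sys.path` listing), and site-packages only gets a small text file pointing to it. A sketch for reading that pointer, assuming the usual setuptools layout where the first line is the source directory; `read_egg_link` is a hypothetical helper name.

```python
def read_egg_link(path):
    """Return the source directory that an .egg-link file points to.

    Returns None if the file has no non-empty lines.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                # First non-empty line: the directory added to sys.path.
                # A second line, if present, is a relative path back to
                # the directory containing setup.py (often just ".").
                return line
    return None
```

Running `read_egg_link("/usr/local/lib/python2.7/dist-packages/pdfquery.egg-link")` should print the checkout directory; editing or inspecting the code there is what changes the installed package.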
