Extruct - 0.13.0 is not compatible with the latest rdflib #181

Closed
kalravparsana opened this issue Sep 9, 2021 · 5 comments

Comments

@kalravparsana

This error is raised when importing extruct:

import extruct
  File "/usr/local/lib/python3.9/dist-packages/extruct/__init__.py", line 4, in <module>
    from extruct.rdfa import RDFaExtractor
  File "/usr/local/lib/python3.9/dist-packages/extruct/rdfa.py", line 12, in <module>
    from rdflib.plugins.parsers.pyRdfa import pyRdfa as PyRdfa, Options, logger as pyrdfa_logger
ModuleNotFoundError: No module named 'rdflib.plugins.parsers.pyRdfa'
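The traceback shows extruct's rdfa.py importing its RDFa parser from rdflib.plugins.parsers.pyRdfa, a module that the installed rdflib does not provide. A minimal sketch for checking which candidate import path is available in an environment, assuming the two module names below (the old rdflib plugin path, and a top-level pyRdfa module, assumed here to be what the pyrdfa3 package installs); first_available is a hypothetical helper, not part of extruct:

```python
import importlib.util
from typing import Iterable, Optional


def first_available(module_names: Iterable[str]) -> Optional[str]:
    """Return the first module name that can be located, or None.

    find_spec() imports the parent packages of a dotted name, so a missing
    parent raises ModuleNotFoundError instead of returning None.
    """
    for name in module_names:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except ModuleNotFoundError:
            # A parent package (e.g. rdflib itself) is not installed.
            continue
    return None


# Assumed candidate locations of the RDFa parser, checked in this order.
RDFA_CANDIDATES = ["rdflib.plugins.parsers.pyRdfa", "pyRdfa"]

if __name__ == "__main__":
    print(first_available(RDFA_CANDIDATES))
```

A result of None means neither location is importable, which is the situation the traceback above reports for the first path.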
@lopuhin
Member

lopuhin commented Sep 9, 2021

@kalravparsana which versions of the libraries are you using? I don't get a failure with rdflib==6.0.0, extruct==0.13.0, and pyrdfa3==3.5.3.
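Answering "which versions are you using?" only needs the installed distribution metadata. A minimal sketch using the standard library's importlib.metadata (Python 3.8+), with the package list taken from the three names in the comment above; report_versions is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def report_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if absent)."""
    result: Dict[str, Optional[str]] = {}
    for name in dists:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in report_versions(["rdflib", "extruct", "pyrdfa3"]).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```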

@kalravparsana
Author

@lopuhin Thanks for checking.

This is the output of pip freeze:

awslambdaric==1.2.2
beautifulsoup4==4.10.0
boto3==1.18.38
botocore==1.21.38
certifi==2019.11.28
chardet==3.0.4
click==8.0.1
cssselect==1.1.0
dbus-python==1.2.16
distro-info===0.23ubuntu1
extruct==0.3.0
feedfinder2==0.0.4
feedparser==6.0.8
filelock==3.0.12
html5lib==1.1
idna==2.8
isodate==0.6.0
jieba3k==0.35.1
jmespath==0.10.0
joblib==1.0.1
langdetect==1.0.9
loggers==0.1.4
lxml==4.6.3
newspaper3k==0.2.8
nltk==3.6.2
Pillow==8.3.2
PyGObject==3.36.0
pyparsing==2.4.7
pyRdfa3==3.5.3
python-apt==2.0.0+ubuntu0.20.4.6
python-dateutil==2.8.2
PyYAML==5.4.1
rdflib==6.0.0
regex==2021.8.28
requests==2.22.0
requests-file==1.5.1
requests-unixsocket==0.2.0
s3transfer==0.5.0
sgmllib3k==1.0.0
simplejson==3.17.2
six==1.14.0
soupsieve==2.2.1
textdistance==4.2.1
tinysegmenter==0.3
tldextract==3.1.2
tqdm==4.62.2
unattended-upgrades==0.1
urllib3==1.25.8
webencodings==0.5.1
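This freeze output shows extruct resolved to 0.3.0 alongside rdflib 6.0.0. One line (distro-info) uses === (arbitrary equality) rather than ==. A small sketch for turning such output into a lookup table, assuming plain name==version or name===version lines; parse_freeze is a hypothetical helper, and names are normalized per PEP 503 so that pyRdfa3 and pyrdfa3 compare equal:

```python
import re
from typing import Dict

# "name==version" or "name===version"; other forms (e.g. "-e ..." or
# "name @ url") are skipped by this sketch.
_PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)===?(\S+)$")


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of -, _ and . become '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_freeze(text: str) -> Dict[str, str]:
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        match = _PIN.match(raw.strip())
        if match:
            pins[normalize(match.group(1))] = match.group(2)
    return pins


sample = """extruct==0.3.0
distro-info===0.23ubuntu1
pyRdfa3==3.5.3
rdflib==6.0.0
"""
pins = parse_freeze(sample)
print(pins["extruct"], pins["rdflib"])  # 0.3.0 6.0.0
```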

And this is my requirements.txt file:

boto3
requests
loggers
extruct
langdetect
textdistance
newspaper3k
feedparser
python-dateutil
rdflib
pyrdfa3
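None of these requirements carries a version pin, so pip is free to choose any extruct release it can resolve; here it settled on extruct 0.3.0. A minimal sketch that lists unpinned entries, assuming simple one-name-per-line files (no options, environment markers, or URLs); unpinned is a hypothetical helper that treats only an exact == pin as pinned:

```python
import re
from typing import List

_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def unpinned(requirements: str) -> List[str]:
    """Return requirement names that lack an exact '==' pin."""
    names: List[str] = []
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = _NAME.match(line)
        # Ranges such as ">=6" are reported too, since they are not exact pins.
        if match and not match.group(2).startswith("=="):
            names.append(match.group(1))
    return names


print(unpinned("boto3\nextruct==0.13.0\nrdflib\n"))  # ['boto3', 'rdflib']
```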

As you can see, pip picks extruct 0.3.0 by default, which is much older. If I pin it to 0.13.0, I get this error:
#19 7.597 error in rdflib-jsonld setup command: use_2to3 is invalid.
#19 7.597 ----------------------------------------
#19 7.597 WARNING: Discarding https://files.pythonhosted.org/packages/a7/60/267b54976f779d0c5b22448525495524c069285586dc22f21bfb29c25cf6/rdflib-jsonld-0.2.tar.gz#sha256=aed044b9c9eb7b136446e169e88c9626b53991066696a533482051c0ccf84375 (from https://pypi.org/simple/rdflib-jsonld/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
#19 7.598 ERROR: Could not find a version that satisfies the requirement rdflib-jsonld (from extruct) (from versions: 0.2, 0.3, 0.4.0, 0.5.0)
#19 7.598 ERROR: No matching distribution found for rdflib-jsonld
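The first line of this log comes from setuptools while running the sdist's setup.py: setuptools 58.0.0 (released in early September 2021) removed support for the use_2to3 option, which the rdflib-jsonld setup.py evidently still passed. A small sketch for checking whether a given setuptools version still accepts that option, assuming the version string starts with its major number; accepts_use_2to3 is a hypothetical helper:

```python
import re
from importlib.metadata import PackageNotFoundError, version

# setuptools 58.0.0 removed support for the use_2to3 setup() option.
USE_2TO3_REMOVED_IN_MAJOR = 58


def setuptools_major(version_string: str) -> int:
    """Leading integer of a version string, e.g. '57.5.0' -> 57."""
    match = re.match(r"\d+", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return int(match.group())


def accepts_use_2to3(setuptools_version: str) -> bool:
    # The cutoff is a major release, so only the major number matters.
    return setuptools_major(setuptools_version) < USE_2TO3_REMOVED_IN_MAJOR


if __name__ == "__main__":
    try:
        installed = version("setuptools")
    except PackageNotFoundError:
        print("setuptools is not installed")
    else:
        print(installed, "accepts use_2to3:", accepts_use_2to3(installed))
```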

For extra context, we are installing inside Docker, and our base image is ubuntu:latest.

Here is the Dockerfile:

FROM ubuntu:latest

ENV DEBIAN_FRONTEND noninteractive

# install basic packages
RUN apt-get update
RUN apt-get install software-properties-common curl unzip gcc git -y

# install python
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update
RUN apt install python3.9 python3.9-distutils python3.9-dev -y

# install aws
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
RUN unzip awscliv2.zip
RUN ./aws/install

# install pip
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py
RUN pip3.9 install boto3
RUN pip3.9 install awslambdaric


COPY dir/requirements.txt /home/requirements.txt
RUN pip3.9 install -r /home/requirements.txt

@lopuhin
Member

lopuhin commented Sep 10, 2021

Aha, I see. This must be an issue with the newer pip dependency resolver, which does not allow 0.13.0 to be installed. Downgrading pip to a version from mid-2020 (or disabling the new resolver) may help, but we should reproduce and fix it on our side as well.
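The resolver behaviour described here can be illustrated with a toy model: take the newest candidate whose dependencies can all be installed, otherwise fall back to an older one. This is a deliberately simplified sketch, not pip's actual algorithm; the candidate list and the assumption that only the newer extruct release needs rdflib-jsonld are hypothetical:

```python
from typing import Dict, List, Optional, Set

# Hypothetical data: newest-first candidate versions and their requirements.
CANDIDATES: Dict[str, List[str]] = {"extruct": ["0.13.0", "0.3.0"]}
REQUIRES: Dict[str, List[str]] = {
    "extruct 0.13.0": ["rdflib-jsonld"],  # assumed for illustration
    "extruct 0.3.0": [],
}


def pick(project: str, unbuildable: Set[str]) -> Optional[str]:
    """Return the newest version of `project` whose requirements are all installable."""
    for candidate in CANDIDATES[project]:
        deps = REQUIRES[f"{project} {candidate}"]
        if all(dep not in unbuildable for dep in deps):
            return candidate
        # A requirement cannot be built: backtrack to the next older candidate.
    return None


print(pick("extruct", unbuildable={"rdflib-jsonld"}))  # 0.3.0
print(pick("extruct", unbuildable=set()))              # 0.13.0
```

In this model, silently ending up with an old release is a side effect of one dependency failing to build, rather than of the newer release being rejected outright.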

@kalravparsana
Author

@lopuhin This seems to have been solved by recent changes in https://github.com/RDFLib/rdflib-jsonld.
Thanks for your support anyway.

@lopuhin
Member

lopuhin commented Sep 13, 2021

Nice, thank you @kalravparsana 👍
