Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARQLWrapper > 1.7.1 seems to break upstream rdflib test #70

Closed
joernhees opened this issue Nov 22, 2015 · 12 comments
Closed

SPARQLWrapper > 1.7.1 seems to break upstream rdflib test #70

joernhees opened this issue Nov 22, 2015 · 12 comments
Assignees
Labels
Milestone

Comments

@joernhees
Copy link
Member

i noticed that some PRs on upstream rdflib broke in seemingly unrelated code... (see RDFLib/rdflib#550 )

the error reported is this:

======================================================================
ERROR: testNamedGraphUpdate (test.test_sparqlupdatestore.TestSparql11)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/RDFLib/rdflib/test/test_sparqlupdatestore.py", line 237, in testNamedGraphUpdate
    for v in g.objects(michel, says):
  File "/home/travis/build/RDFLib/rdflib/rdflib/graph.py", line 634, in objects
    for s, p, o in self.triples((subject, predicate, None)):
  File "/home/travis/build/RDFLib/rdflib/rdflib/graph.py", line 424, in triples
    for (s, p, o), cg in self.__store.triples((s, p, o), context=self):
  File "/home/travis/build/RDFLib/rdflib/rdflib/plugins/stores/sparqlstore.py", line 414, in triples
    doc = ElementTree.parse(SPARQLWrapper.query(self).response)
  File "/opt/python/2.7.9/lib/python2.7/xml/etree/ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
  File "/opt/python/2.7.9/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
    parser.feed(data)
  File "/opt/python/2.7.9/lib/python2.7/xml/etree/ElementTree.py", line 1640, in feed
    self._parser.Parse(data, 0)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 328-329: ordinal not in range(128)

as the PRs locally passed all tests and i still had SPARQLWrapper 1.6.4, i started bisecting... it seems that when depending on SPARQLWrapper > 1.7.1 that one test suddenly breaks.

Any ideas which of the changes in 1.7.1...1.7.2 could be the cause?

@joernhees joernhees added the bug label Nov 22, 2015
@joernhees
Copy link
Member Author

notice that i don't have any intention to merge RDFLib/rdflib#550 as that would cut SPARQLWrapper to 1.7.1 instead of current 1.7.5, RDFLib/rdflib#550 and this issue is just to figure out where to fix this best (and quick)

@uholzer
Copy link

uholzer commented Nov 22, 2015

I think I found the problem.

Let's look at rdflib/plugins/stores/sparqlstore.py from rdflib, line 414:

doc = ElementTree.parse(SPARQLWrapper.query(self).response)

This line exptects SPARQLWrapper.query(self).response to be a file-like object in binary mode, i.e. yielding str, not unicode when calling read() on it in Python 2. (As it is the body of an HTTP response, this is what I would expect too.) Adding the line

print repr(SPARQLWrapper.query(self).response.read())

right in front of it and running the unit tests again gives a unicode string as output:

u'<?xml version="1.0"?>\n<sparql xmlns="http://www.w3.org/2005/sparql-results#">\n  <head>\n    <variable name="o"/>\n  </head>\n  <results>\n    <result>\n      <binding name="o">\n        <literal>1: adfk { \' \\" " { </literal>\n      </binding>\n    </result>\n    <result>\n      <binding name="o">\n        <literal>2: adfk } &lt;foo&gt; #\xe9\xef \\</literal>\n      </binding>\n    </result>\n    <result>\n      <binding name="o">\n        <literal>3: adfk { " \\\' \' { </literal>\n      </binding>\n    </result>\n    <result>\n      <binding name="o">\n        <literal>4: adfk } &lt;foo&gt; #\xe9\xef \\</literal>\n      </binding>\n    </result>\n    <result>\n      <binding name="o">\n        <literal>5: adfk { \' \\" " { </literal>\n      </binding>\n    </result>\n    <result>\n      <binding name="o">\n        <literal>6: adfk } &lt;foo&gt; #\xe9\xef \\</literal>\n      </binding>\n    </result>\n    <result>\n      <binding name="o">\n        <literal>7: ad adsfj &#x0A; { &#x0A; sadfj</literal>\n      </binding>\n    </result>\n    <result>\n      <binding name="o">\n        <literal>8: adfk { " \\\' \' { </literal>\n      </binding>\n    </result>\n    <result>\n      <binding name="o">\n        <literal>9: adfk } &lt;foo&gt; #\xe9\xef \\</literal>\n      </binding>\n    </result>\n    <result>\n      <binding name="o">\n        <literal>10: ad adsfj &#x0A; { &#x0A; sadfj</literal>\n      </binding>\n    </result>\n  </results>\n</sparql>\n'

Clearly, ElementTree then tries to decode this as UTF-8, which causes Python to implicitely encode it as ASCII beforehand. Replacing line 414 with

doc = ElementTree.fromstring(SPARQLWrapper.query(self).response.read().encode('UTF-8'))

fixes the unit test. Of course, I wouldn't recomend this as workaround. I think that SPARQLWrapper should return a file-like object in binary mode. Could it be that the new keepalive package introduced in commit 609961a is at fault?

@uholzer
Copy link

uholzer commented Nov 22, 2015

It turns out it is. keepalive.py provides a HTTPResponse which derives from httplib.HTTPResponse. It overrides the read function like this:

    def read(self, amt=None):
        ...

        s = self._rbuf + self._raw_read(amt).decode('UTF-8', 'ignore')
        self._rbuf = ''
        return s

@uholzer
Copy link

uholzer commented Nov 22, 2015

Reported this as an Issue against keepalive: wikier/keepalive#2

@joernhees
Copy link
Member Author

@uholzer thanks for looking into this... can we just remove the .decode(...) stuff? @wikier ?

@wikier
Copy link
Member

wikier commented Nov 23, 2015

@joernhees I think what's causing troubles to @uholzer.... we'll discuss it in wikier/keepalive#2 and try to get a fix out asap. I'll keep you posted, guys.

@wikier
Copy link
Member

wikier commented Nov 24, 2015

@joernhees wikier/keepalive#2 provides a candidate patch. Besides manual testing, I'd welcome a unit test on our side testing the reported issue; is that possible?

@joernhees
Copy link
Member Author

see https://travis-ci.org/RDFLib/rdflib/builds/92929930 (was fixed on py2 but some stuff still fails on py3)

@joernhees
Copy link
Member Author

a unit test inside SPARQLWrapper for queries/data with unicode chars would propbably be a good idea...

@wikier
Copy link
Member

wikier commented Nov 25, 2015

@joernhees see wikier/keepalive#2, fixed in wikier/keepalive@b1d2c32.

Definitely a unit test it's necessary. If no one contributed it, by the end of the week I'll to find time to take care of it before pushing out a new 1.7.6 release depending on keepalive>=0.5.

@wikier
Copy link
Member

wikier commented Dec 17, 2015

any idea how this could be tested?

@wikier
Copy link
Member

wikier commented Dec 18, 2015

OK, let's push this out for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants