foliavalidator: processdir raises UnboundLocalError instead of returning false #40

Closed
Duchadian opened this issue Nov 21, 2017 · 7 comments

@Duchadian

When validating an entire directory, folialint reports a failure while foliavalidator raises an error (see output below).
Should the behaviour be made consistent (i.e. should foliavalidator return false instead of crashing)?

folialint
(lamachine)tvoets@applejack:/vol/tensusers/proycon/clin_spellingstaak/annotated_docs$ folialint --nooutput nnota/*.xml
nnota/page1341.tested.tagged.folia.xml failed:
XML error: Unresolvable id page1341.text.div.1.p.1.s.7.w.24in WordReference

foliavalidator
(lamachine)tvoets@applejack:/vol/tensusers/proycon/clin_spellingstaak/annotated_docs$ foliavalidator nnota/
Searching in nnota/
Traceback (most recent call last):
  File "/vol/customopt/lamachine/bin/foliavalidator", line 11, in <module>
    sys.exit(main())
  File "/vol/customopt/lamachine/lib/python3.4/site-packages/foliatools/foliavalidator.py", line 145, in main
    r = processdir(x,schema,quick,settings.deep, settings.stricttextvalidation,settings.debug)
  File "/vol/customopt/lamachine/lib/python3.4/site-packages/foliatools/foliavalidator.py", line 87, in processdir
    if not r: success = False
UnboundLocalError: local variable 'r' referenced before assignment
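
For readers unfamiliar with this error, here is a minimal sketch of the kind of pattern that could produce it. This is not the actual foliavalidator source: `validate`, `processdir_buggy` and `processdir_fixed` are hypothetical names, and the only line taken from the traceback is `if not r: success = False`. The assumption is that `r` is assigned only for some directory entries but tested for all of them.

```python
def validate(name):
    # Hypothetical stand-in for the per-file validation; True means valid.
    return not name.endswith(".bad.xml")


def processdir_buggy(entries):
    success = True
    for name in entries:
        if name.endswith(".xml"):
            r = validate(name)  # r is bound only inside this branch
        if not r:  # if the first entry is not an XML file, r was never assigned
            success = False
    return success


def processdir_fixed(entries):
    success = True
    for name in entries:
        if name.endswith(".xml"):
            # Test the result in the same branch that assigns it.
            if not validate(name):
                success = False
    return success
```

With the check moved into the branch that produces the result, a directory containing an invalid file yields `False` rather than an exception, which would match what folialint reports.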

@kosloot
Collaborator

kosloot commented Nov 21, 2017

I think that folialint (and libfolia) is correct in rejecting forward word references.
We never supported this, imho. But maybe it is not explicitly stated in the documentation.
Forward references would make life very hard for programs using SAX, XMLReadline, etc.
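
To illustrate the streaming concern, here is a rough sketch of a single-pass check. It assumes simplified, un-namespaced markup with a plain `id` attribute on `w` and `wref` elements, not real FoLiA, and the function name `unresolved_refs_streaming` is made up for this example. A streaming reader only knows the ids it has already passed, so a reference to a later word cannot be resolved without buffering or a second pass.

```python
import io
import xml.etree.ElementTree as ET


def unresolved_refs_streaming(xml_text):
    """Single pass: a reference resolves only if its target was already seen."""
    seen = set()
    unresolved = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("start",)):
        if elem.tag == "w":
            seen.add(elem.get("id"))
        elif elem.tag == "wref":
            if elem.get("id") not in seen:
                unresolved.append(elem.get("id"))
    return unresolved


# Backward reference: the word comes first, so the reference resolves.
backward = '<s><w id="w1"/><ref><wref id="w1"/></ref></s>'
# Forward reference: the word only appears after the reference.
forward = '<s><ref><wref id="w2"/></ref><w id="w2"/></s>'
```

A DOM-style parser that loads the whole document first would resolve both cases, which may explain why the two tools can disagree on the same file.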

@proycon
Owner

proycon commented Nov 21, 2017

The foliavalidator error is a problem in directory processing (working on a fix). When tested alone, the file does validate:

$ foliavalidator page1341.tested.tagged.folia.xml                                                                                      
Validated successfully: page1341.tested.tagged.folia.xml

@kosloot
Collaborator

kosloot commented Nov 21, 2017

Nevertheless, I think forward references are WRONG

@Duchadian
Author

Duchadian commented Nov 21, 2017

The other issue:

file: /vol/tensusers/proycon/clin_spellingstaak/annotated_docs/pkampschreur/page1161.tested.tagged.folia.xml

foliavalidator
(lamachine.dev)tvoets@applejack:/vol/tensusers/proycon/clin_spellingstaak/annotated_docs$ foliavalidator pkampschreur/page1161.tested.tagged.folia.xml
VALIDATION ERROR on full parse by library (stage 2/2), in pkampschreur/page1161.tested.tagged.folia.xml
ParseError: FoLiA exception in handling of <s> @ line 1038: [TypeError] Can't convert 'NoneType' object to str implicitly

Traceback (most recent call last):
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 2586, in parsexml
    e = doc.parsexml(subnode, Class)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 7273, in parsexml
    return Class.parsexml(node,self)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 2664, in parsexml
    instance = Class(doc, *args, **kwargs)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 5686, in __init__
    super(Sentence,self).__init__(doc, *args, **kwargs)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 3158, in __init__
    super(AbstractStructureElement,self).__init__(doc, *args, **kwargs)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 686, in __init__
    self.append(child)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 3169, in append
    e = super(AbstractStructureElement,self).append(child, *args, **kwargs)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 1590, in append
    if dopostappend: child.postappend()
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 3177, in postappend
    self.doc.textvalidationerrors += int(not self.textvalidation())
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 850, in textvalidation
    strictnormtext = self.text(cls,retaintokenisation=False,strict=True, normalize_spaces=True)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 900, in text
    return self.textcontent(cls, correctionhandling).text(normalize_spaces=normalize_spaces)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 3457, in text
    return super(TextContent,self).text(normalize_spaces=normalize_spaces) #AbstractElement will handle it now, merely overridden to get rid of parameters that dont make sense in this context
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 908, in text
    if s: s += e.TEXTDELIMITER #for AbstractMarkup, will usually be ""
TypeError: Can't convert 'NoneType' object to str implicitly

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/FoLiA_tools-1.5.0.59-py3.4.egg/foliatools/foliavalidator.py", line 55, in validate
    document = folia.Document(file=filename, deepvalidation=deep,textvalidation=True,verbose=True, debug=debug)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 6398, in __init__
    self.load(self.filename)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 6435, in load
    self.parsexml(self.tree.getroot())
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 7265, in parsexml
    e = self.parsexml(subnode)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 7273, in parsexml
    return Class.parsexml(node,self)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 2586, in parsexml
    e = doc.parsexml(subnode, Class)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 7273, in parsexml
    return Class.parsexml(node,self)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 2586, in parsexml
    e = doc.parsexml(subnode, Class)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 7273, in parsexml
    return Class.parsexml(node,self)
  File "/vol/customopt/lamachine.dev/lib/python3.4/site-packages/PyNLPl-1.2.4-py3.4.egg/pynlpl/formats/folia.py", line 2591, in parsexml
    raise ParseError("FoLiA exception in handling of <" + subnode.tag[len(NSFOLIA)+2:] + "> @ line " + str(subnode.sourceline) + ": [" + e.__class__.__name__ + "] " + str(e), cause=e)
pynlpl.formats.folia.ParseError: FoLiA exception in handling of <s> @ line 1038: [TypeError] Can't convert 'NoneType' object to str implicitly

folialint

(lamachine.dev)tvoets@applejack:/vol/tensusers/proycon/clin_spellingstaak/annotated_docs$ folialint pkampschreur/page1161.tested.tagged.folia.xml
pkampschreur/page1161.tested.tagged.folia.xml failed:
XML error: attempt to add an empty <t> to word: page1161.text.div.3.list.1.item.4.s.2.w.8
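
The Python traceback above ends in a string concatenation (`s += e.TEXTDELIMITER`) whose right-hand operand was apparently `None`. Here is a minimal sketch of that failure mode and one possible guard. `join_text` and its `(text, delimiter)` pairs are hypothetical and do not reflect how pynlpl actually assembles text or how the bug was fixed.

```python
def join_text(parts):
    """Concatenate (text, delimiter) pairs; a delimiter of None is skipped."""
    s = ""
    for text, delimiter in parts:
        # Unguarded, `s += delimiter` raises TypeError when delimiter is None.
        if s and delimiter is not None:
            s += delimiter
        s += text
    return s


def join_text_unguarded(parts):
    s = ""
    for text, delimiter in parts:
        if s:
            s += delimiter  # TypeError if delimiter is None
        s += text
    return s
```

Whether skipping a missing delimiter or rejecting the element outright (as folialint does for the empty `<t>`) is the right behaviour is a design decision for the library; the sketch only shows where the exception comes from.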

proycon added a commit to proycon/pynlpl that referenced this issue Nov 21, 2017
@proycon
Owner

proycon commented Nov 21, 2017

bug reproduces (python lib), test fails indeed

@kosloot
Collaborator

kosloot commented Nov 21, 2017

bug is already fixed in libfolia

@proycon
Owner

proycon commented Nov 21, 2017

closing this then

@proycon proycon closed this as completed Nov 21, 2017
proycon added a commit to proycon/foliapy that referenced this issue Sep 6, 2018