more robustness wrt sentence segmenter output
kermitt2 committed Mar 26, 2022
1 parent d8e2afd commit 79ebdd7
Showing 1 changed file with 5 additions and 2 deletions.
@@ -1502,8 +1502,11 @@ public void segmentIntoSentences(Element curParagraph, List<LayoutToken> curPara

 if (refPos >= pos+posInSentence && refPos <= pos+sentenceLength) {
     Node valueNode = mapRefNodes.get(new Integer(refPos));
-    if (pos+posInSentence < refPos)
-        sentenceElement.appendChild(text.substring(pos+posInSentence, refPos));
+    if (pos+posInSentence < refPos) {
+        String local_text_chunk = text.substring(pos+posInSentence, refPos);
+        local_text_chunk = XmlBuilderUtils.stripNonValidXMLCharacters(local_text_chunk);
+        sentenceElement.appendChild(local_text_chunk);
+    }
     valueNode.detach();
     sentenceElement.appendChild(valueNode);
     refIndex = j;
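
The change above sanitizes each text chunk before appending it to the sentence element, presumably because a substring cut at sentence-segmenter offsets can contain characters that an XML element will not accept. The body of `XmlBuilderUtils.stripNonValidXMLCharacters` is not shown in this commit, so the following is only a minimal sketch of what such a filter might look like, assuming it keeps exactly the code points allowed by the XML 1.0 `Char` production. The class and method names (`XmlCharFilter`, `isValidXmlChar`, `stripInvalidXmlChars`) are hypothetical and are not part of the project.

```java
// Hypothetical stand-in for a "strip non-valid XML characters" helper.
// Not the project's implementation; names and behavior are assumptions.
final class XmlCharFilter {

    private XmlCharFilter() {}

    // True if the code point is allowed by the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the input with every code point outside the XML 1.0 Char
    // ranges removed. Null and empty inputs are returned unchanged.
    static String stripInvalidXmlChars(String in) {
        if (in == null || in.isEmpty()) {
            return in;
        }
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            // codePointAt joins valid surrogate pairs; an unpaired surrogate
            // comes back as its own value and is rejected by the range check.
            int cp = in.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Iterating by code point rather than by `char` keeps supplementary characters intact while still dropping unpaired surrogates, which is one plausible reason to prefer it over a simple per-`char` loop.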