Sentence splitting #67

mfilipov · 2015-06-08T14:02:16Z

Python 3.4.3, latest spaCy (0.85), with all data redownloaded (python -m spacy.en.download all).

It looks like there's something wrong with sentence splitting. Example:

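import spacy.en

# Load the English pipeline and parse the example sentence.
pipeline = spacy.en.English()
tokens = pipeline("The Germans have their Faust; but Faust is a tragedy with a cosmic philosophic theme.")

# Collect the index of every token labelled as a dependency root.
root_inds = [ind for ind, token in enumerate(tokens) if token.dep_ == "ROOT"]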

root_inds has two elements, so the text is split into two sentences: "The Germans have their Faust; but" and "Faust is a tragedy with a cosmic philosophic theme."
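The ROOT-counting check above can be reproduced without spaCy or its data. This is only a rough sketch: the helper names are made up for illustration, "x" is a placeholder for any non-ROOT label, and the ROOT positions are assumed (on "have" and "is") to mimic the two-root result described here. They are not actual parser output.

```python
def root_indices(dep_labels):
    """Return the positions whose dependency label is ROOT."""
    return [i for i, dep in enumerate(dep_labels) if dep == "ROOT"]


def count_sentences(dep_labels):
    """Assume one ROOT per sentence, so the ROOT count is the sentence count."""
    return len(root_indices(dep_labels))


words = ["The", "Germans", "have", "their", "Faust", ";", "but",
         "Faust", "is", "a", "tragedy", "with", "a", "cosmic",
         "philosophic", "theme", "."]

# Hypothetical labels mimicking the reported parse: two ROOTs (assumed positions).
reported = ["x"] * len(words)
reported[2] = "ROOT"  # "have"
reported[8] = "ROOT"  # "is"

# Hypothetical labels for the expected parse: one ROOT for the whole sentence.
expected = ["x"] * len(words)
expected[2] = "ROOT"  # "have"

print(count_sentences(reported), count_sentences(expected))
```

A count above one for a single-sentence input is the symptom being reported.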

I guess this is related to the ROOT bug #57.

honnibal · 2015-06-11T23:03:52Z

Thanks — working on this.

honnibal · 2015-06-24T03:38:39Z

Just released version 0.86.

Your example now parses correctly, and accuracy is up in aggregate. Further improvements to sentence boundary detection accuracy should be forthcoming.

Please keep reporting prominent failures as they occur.

lock · 2018-05-09T17:32:06Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal closed this as completed Jun 26, 2015

lock bot locked as resolved and limited conversation to collaborators May 9, 2018
