Sentence splitting #67

Closed
mfilipov opened this issue Jun 8, 2015 · 3 comments

Comments

mfilipov commented Jun 8, 2015

Python version: 3.4.3, latest spaCy (0.85), redownloaded all data (python -m spacy.en.download all).

It looks like there's something wrong with sentence splitting. Example:

import spacy.en

pipeline = spacy.en.English()
tokens = pipeline("The Germans have their Faust; but Faust is a tragedy with a cosmic philosophic theme.")
# Indices of all tokens the parser labels as ROOT (one per predicted sentence).
root_inds = [ind for ind, token in enumerate(tokens) if token.dep_ == "ROOT"]

root_inds has two elements, although the input is a single sentence; the corresponding sentences are "The Germans have their Faust; but" and "Faust is a tragedy with a cosmic philosophic theme.".

I guess this is related to the ROOT bug #57.
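
For readers who want to see how two ROOT tokens translate into the two reported fragments, here is a minimal sketch that needs no spaCy install. The `words` and `heads` lists are a hand-written, hypothetical parse that mirrors the split described above; they are not actual spaCy output. The helpers `find_root` and `group_by_root` are illustrative names, and the sketch assumes the convention that a ROOT token is its own head.

```python
# Hypothetical, hand-written parse for illustration only; NOT real spaCy output.
words = ["The", "Germans", "have", "their", "Faust", ";", "but",
         "Faust", "is", "a", "tragedy", "with", "a", "cosmic",
         "philosophic", "theme", "."]
# heads[i] is the index of token i's head; a ROOT token points to itself (assumed convention).
heads = [1, 2, 2, 4, 2, 2, 2,
         8, 8, 10, 8, 10, 15, 15,
         15, 11, 8]


def find_root(heads, i):
    """Follow head pointers from token i until reaching a token that is its own head."""
    while heads[i] != i:
        i = heads[i]
    return i


def group_by_root(words, heads):
    """Map each root index to the words attached to it, preserving token order."""
    groups = {}
    for i, word in enumerate(words):
        groups.setdefault(find_root(heads, i), []).append(word)
    return groups


# Same idea as the root_inds check in the report, applied to the toy parse.
root_inds = [i for i, h in enumerate(heads) if h == i]
groups = group_by_root(words, heads)

print(root_inds)
for root in root_inds:
    print(" ".join(groups[root]))
```

With a correct parse of this input, one would expect a single root and therefore a single group covering all tokens.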

@honnibal
Member

Thanks — working on this.

@honnibal
Member

Just released version 0.86.

Your example now parses correctly, and accuracy is up in aggregate. Further improvements to sentence boundary detection accuracy should be forthcoming.

Please keep reporting prominent failures as they occur.

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018