Spacy integration example is broken #96

Open
iibrahimli opened this issue Apr 25, 2021 · 0 comments · May be fixed by #97

Describe the bug
The example script fails with the latest versions of spacy and pysbd (listed below): adding the pysbd component to a spaCy pipeline raises an exception. In addition, sentences are not split correctly when extra spaces are present before or after them.

To Reproduce
Steps to reproduce the behavior:
Run the examples/pysbd_as_spacy_component.py script.
This raises the exception shown under Additional context. Once the pipeline addition is fixed and the script runs further, the resulting sentence segmentation does not match the output of pysbd used without spacy.

Expected behavior
The script should run without raising an exception, and its output should match the segmentation produced by pysbd on its own.

Example:
Input text - "Hello world. My name is Mr. Smith. I work for the U.S. Government and I live in the U.S. I live in New York."

Expected output - ["Hello world.", "My name is Mr. Smith.", "I work for the U.S. Government and I live in the U.S.", "I live in New York."]

Actual output - ["Hello world. My name is Mr. Smith. I work for the U.S. Government and I live in the U.S.", "I live in New York."]

Additional context
Versions tested:
spacy==3.0.6
pysbd==0.3.4

The error thrown when trying to add pipe to the spacy model:

ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got <function pysbd_sentence_boundaries at 0x7f0498c62158> (name: 'None').
- If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead.
- If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`.
- If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline.
iibrahimli linked a pull request on Apr 25, 2021 that will close this issue.