Example not working with Spacy version 3.1 and 3.0.6 #101

Open
Atul997 opened this issue Sep 16, 2021 · 3 comments

Atul997 commented Sep 16, 2021

I have installed the current spaCy version (3.1) and am running the example with some modifications, but it keeps throwing this error:
ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe('sentencizer'). Alternatively, add the dependency parser or sentence recognizer, or set sentence boundaries by setting doc[i].is_sent_start.
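For context, E030 is raised when `doc.sents` is iterated on a `Doc` where no component has set sentence boundaries. This is always the case for a bare `spacy.blank('en')` pipeline. Below is a minimal sketch, assuming spaCy 3.x, that reproduces E030 and then tries the built-in `sentencizer`, which is the first remedy the message suggests:

```python
import spacy

text = "My name is Jonas E. Smith. Please turn to p.55."
nlp = spacy.blank("en")

# A blank pipeline has no parser, senter or sentencizer, so no token has its
# sentence-start flag set and iterating doc.sents raises ValueError [E030].
doc = nlp(text)
try:
    list(doc.sents)
    raised_e030 = False
except ValueError as err:
    raised_e030 = "E030" in str(err)

# The rule-based "sentencizer" splits after punctuation tokens such as
# ".", "!" and "?"; the text must be re-processed after adding it.
nlp.add_pipe("sentencizer")
doc = nlp(text)
sentences = [sent.text for sent in doc.sents]
print(raised_e030, sentences)
```

The `sentencizer` only looks at punctuation tokens, so it may split after abbreviations like "E.". That is the usual reason for using pysbd instead. The exact splits depend on the tokenizer, so no output is shown here.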

Below is the code I am using:
```python
import spacy
import pysbd
from spacy.language import Language

@Language.component("sbd")
def pysbd_sentence_boundaries(doc):
    seg = pysbd.Segmenter(language="en", clean=False, char_span=True)
    sents_char_spans = seg.segment(doc.text)
    char_spans = [doc.char_span(sent_span.start, sent_span.end, alignment_mode='contract')
                  for sent_span in sents_char_spans]
    start_token_ids = [span[0].idx for span in char_spans if span is not None]
    for token in doc:
        token.is_sent_start = True if token.idx in start_token_ids else False
    return doc

if __name__ == "__main__":
    text = "My name is Jonas E. Smith. Please turn to p.55."
    nlp = spacy.blank('en')
    doc = nlp(text)
    # add as a spacy pipeline
    nlp.add_pipe('sbd')
    print('sent_id', 'sentence', sep='\t|\t')
    for sent_id, sent in enumerate(doc.sents, start=1):
        print(sent_id, sent.text, sep='\t|\t')
```

@gserapio

Facing the same issue

@alexhamiltonRN

@gserapio - You might find this example from medspacy helpful https://github.com/medspacy/medspacy/blob/master/medspacy/sentence_splitting.py


gserapio commented Jul 4, 2022

Thanks, @alexhamiltonRN!
