Sure – if that's what you want, you can implement a custom sentence segmentation strategy using the SentenceSegmenter hook. This isn't documented well at the moment, but I'll put this on my list for examples to add to the docs 😊
The SentenceSegmenter is a pipeline component that can be initialised with a strategy argument (the function used to compute the boundaries). For example:
import spacy
from spacy.pipeline import SentenceSegmenter

def split_on_newlines(doc):
    # compute your sentence boundaries here
    # and yield Span objects
    yield doc[0:len(doc)]  # placeholder only: treats the whole doc as one sentence

nlp = spacy.load('en')
sbd=SentenceSegmenter(nlp.vocab, strategy=split_on_newlines)
nlp.add_pipe(sbd, first=True)
Setting first=True will add it before all other pipeline components and also before the parser (which normally sets the sentence boundaries). The parser should respect already set boundaries and ideally, you might also see better predictions – however, if you find that the POS tags and dependency labels are less accurate this way, you can also add it last in the pipeline by setting last=True instead.
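As a rough sketch of what the strategy function could look like for the newline case: the version below walks over the tokens and starts a new sentence at the first non-whitespace token after a newline token. It assumes the tokenizer emits newlines as separate whitespace tokens and relies only on `token.text`, `token.is_space`, `token.i` and slicing the `doc`. It is one possible implementation, not necessarily how the library would do it.

```python
def split_on_newlines(doc):
    """Yield one span per line of the doc (sketch, see assumptions above)."""
    start = 0
    seen_newline = False
    for word in doc:
        if seen_newline and not word.is_space:
            # First real token after a newline: close the previous sentence.
            yield doc[start:word.i]
            start = word.i
            seen_newline = False
        elif "\n" in word.text:
            # Remember the newline; the boundary is set at the next real token.
            seen_newline = True
    if start < len(doc):
        # Whatever is left after the last newline forms the final sentence.
        yield doc[start:len(doc)]
```

This function can be passed as the strategy argument in the snippet above. Note that the newline token itself stays attached to the end of the sentence it terminates.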
Is there a way to treat each newline in a document as a sentence boundary? I did not find this in the docs.
For example:
The result is this:
I went to the store and then got beck.
he said.
well, I thats should be ok.
where the desired result is:
I went to the store and then got beck. he said.
well, thats should be ok.
Thanks