segment sentences on newlines #2299

ghost · 2018-05-05T23:49:49Z

Is there a way to treat each newline in a document as a sentence boundary? I did not find this in the docs.
For example:

import spacy

nlp = spacy.load('en')
text = "I went to the store and then got beck. he said.\n well, thats should be ok."

doc = nlp(text)
for sent in doc.sents:
    print(sent)

The result is:
I went to the store and then got beck.
he said.
well, thats should be ok.

The desired result is:
I went to the store and then got beck. he said.
well, thats should be ok.

Thanks

ines · 2018-05-07T16:32:01Z

Sure – if that's what you want, you can implement a custom sentence segmentation strategy using the SentenceSegmenter hook. This isn't documented well at the moment, but I'll put this on my list for examples to add to the docs 😊

The SentenceSegmenter is a pipeline component that can be initialised with a strategy argument (the function used to compute the boundaries). For example:

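import spacy
from spacy.pipeline import SentenceSegmenter

def split_on_newlines(doc):
    # yield one Span per newline-delimited line of the Doc
    start = 0
    seen_newline = False
    for word in doc:
        if seen_newline and not word.is_space:
            yield doc[start:word.i]
            start = word.i
            seen_newline = False
        elif '\n' in word.text:
            seen_newline = True
    if start < len(doc):
        yield doc[start:len(doc)]

nlp = spacy.load('en')
sbd = SentenceSegmenter(nlp.vocab, strategy=split_on_newlines)
nlp.add_pipe(sbd, first=True)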

Setting first=True adds the component before all other pipeline components, including the parser (which normally sets the sentence boundaries). The parser should respect boundaries that are already set, and ideally you might even see better predictions. However, if you find that the POS tags and dependency labels are less accurate this way, you can add it last in the pipeline by setting last=True instead.
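If it helps to try the boundary logic without loading a model, here is a minimal sketch of what a newline-based strategy could compute. It is only an illustration under some assumptions: `newline_spans` is a hypothetical helper, the input is a hand-tokenized list of strings standing in for a Doc (the real tokenizer may split the text differently), and it yields (start, end) index pairs where a real strategy would yield `doc[start:end]`.

```python
def newline_spans(tokens):
    """Yield (start, end) token-index pairs, one per newline-delimited line.

    `tokens` is a plain list of strings standing in for a spaCy Doc
    (an assumption for illustration only).
    """
    start = 0
    seen_newline = False
    for i, tok in enumerate(tokens):
        if seen_newline and not tok.isspace():
            # first non-whitespace token after a newline opens a new sentence
            yield (start, i)
            start = i
            seen_newline = False
        elif "\n" in tok:
            seen_newline = True
    if start < len(tokens):
        # whatever is left after the last boundary is the final sentence
        yield (start, len(tokens))


# Hand-tokenized version of the text from the question (assumed tokenization).
tokens = ["I", "went", "to", "the", "store", "and", "then", "got", "beck", ".",
          "he", "said", ".", "\n",
          "well", ",", "thats", "should", "be", "ok", "."]

spans = list(newline_spans(tokens))
sentences = [" ".join(t for t in tokens[s:e] if not t.isspace())
             for s, e in spans]
for sentence in sentences:
    print(sentence)
```

With this input the sketch produces two spans, so "he said." stays in the first sentence and only the newline starts a new one.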

Edit: https://spacy.io/usage/linguistic-features#sbd-custom

lock · 2018-06-10T05:55:27Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ines added the usage General spaCy usage label May 7, 2018

ines closed this as completed May 7, 2018

lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018
