In [1]:
# Update spaCy to the latest version
# !pip install -U pip setuptools wheel
# !pip install -U spacy

# Install language model
# !python -m spacy download nb_core_news_lg
# !python -m spacy download en_core_web_sm
# !pip install -U spacy-lookups-data

In [2]:
# Swedish language data (tokenizer rules, stop words); not used directly by the cells below
import spacy.lang.sv

In [3]:
import spacy
spacy.__version__

'3.0.5'

## English - model

In [7]:
text1 = '''Definitely encourage you to continue making big bets in 2018. The new project seems like a great opportunity for us to invest in an area where the org needs better tooling. It says alot when you made the internal team swap to address this bet. It's easy to think "lets do it", but committing to it by moving parts that existing stakeholders were previously happy with (I hope) is a big bet in itself. I've developed the opinion that with our current team size and the kind of requests I've seen come down the pipeline, if every stakeholder is perfectly happy, it's likely we are not really taking those big bets.'''
text2 = "Continue helping us push back on smaller (lower impact) requests to keep time, not only for big bets, but also tech debt, better documentation, internal improvements and tooling. There is delicate balance needed to keep helping our partners in the short term, while working for the long term objectives of XYZ. So far, you've been a big help with this."

#### Dependency parse based


In [8]:
# Full pipeline: sentence boundaries come from the dependency parser
nlp = spacy.load("en_core_web_sm")
doc = nlp(text1)
for sent in doc.sents:
    print(sent.text)

Definitely encourage you to continue making big bets in 2018.
The new project seems like a great opportunity for us to invest in an area where the org needs better tooling.
It says alot when you made the internal team swap to address this bet.
It's easy to think "lets do it", but committing to it by moving parts that existing stakeholders were previously happy with (I hope) is a big bet in itself.
I've developed the opinion that with our current team size and the kind of requests I've seen come down the pipeline, if every stakeholder is perfectly happy, it's likely we are not really taking those big bets.


#### Statistical sentence segmenter

In [10]:
# Drop the parser and enable the statistical sentence recognizer ("senter") instead
nlp = spacy.load("en_core_web_sm", exclude=["parser"])
nlp.enable_pipe("senter")
doc = nlp(text1)
for sent in doc.sents:
    print(sent.text)

Definitely encourage you to continue making big bets in 2018.
The new project seems like a great opportunity for us to invest in an area where the org needs better tooling.
It says alot when you made the internal team swap to address this bet.
It's easy to think "lets do it", but committing to it by moving parts that existing stakeholders were previously happy with (I hope) is a big bet in itself.
I've developed the opinion that with our current team size and the kind of requests I've seen come down the pipeline, if every stakeholder is perfectly happy, it's likely we are not really taking those big bets.


#### Rule-based pipeline component


In [9]:
from spacy.lang.en import English

nlp = English()  # just the language with no pipeline
nlp.add_pipe("sentencizer")
doc = nlp(text1)
for sent in doc.sents:
    print(sent.text)

Definitely encourage you to continue making big bets in 2018.
The new project seems like a great opportunity for us to invest in an area where the org needs better tooling.
It says alot when you made the internal team swap to address this bet.
It's easy to think "lets do it", but committing to it by moving parts that existing stakeholders were previously happy with (I hope) is a big bet in itself.
I've developed the opinion that with our current team size and the kind of requests I've seen come down the pipeline, if every stakeholder is perfectly happy, it's likely we are not really taking those big bets.
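
The rule-based approach above needs no trained model, so it can be tried on any short string. The following is a minimal sketch, assuming only that spaCy is installed; the `sample` text and the `sentences` variable are illustrative and not part of the original notebook.

```python
from spacy.lang.en import English

# Blank English pipeline: tokenizer only, no trained components.
nlp = English()
# The sentencizer marks sentence starts after sentence-final punctuation.
nlp.add_pipe("sentencizer")

# Hypothetical sample text, chosen so each sentence ends with a different mark.
sample = "This is one sentence. Is this another one? Yes it is!"
doc = nlp(sample)

# Span.text drops trailing whitespace, so each entry is a clean sentence string.
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```

Because it only looks at punctuation, this component is fast, but it may split differently from the parser or `senter` on text with unusual punctuation.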


## Norwegian Bokmål - model

In [4]:
nlp = spacy.load("nb_core_news_lg")
# Note: the sample text below is Swedish, although the loaded pipeline is Norwegian Bokmål
doc = nlp("En ny jätteundersökning avslöjar vilka bilar som hamnar på verkstaden oftast. Och hur kostsamma de kan bli att äga. Det är smått skrämmande läsning för vissa biltillverkare. Här är listan du INTE kan vara utan om du ska köpa ny eller begagnad bil.")
# ent.label is the integer hash of the label; ent.label_ gives the readable string
print([(ent.text, ent.label) for ent in doc.ents])

[]


In [5]:
doc = nlp("Paolo Roberto berättar varför han köpte sex.")
# ent.label is the integer hash of the label; ent.label_ gives the readable string
print([(ent.text, ent.label) for ent in doc.ents])

[('Paolo Roberto', 4317129024397789502)]
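
The large integer in the output is the hash that spaCy stores for the entity label, not the label text. The sketch below shows two ways to recover the string form. It builds the entity by hand on a blank English pipeline so that it runs without a trained model; the sentence and the `"PER"` label are assumptions for illustration, not output of `nb_core_news_lg`.

```python
import spacy
from spacy.tokens import Span

# Blank pipeline: tokenizer only, so no model download is needed.
nlp = spacy.blank("en")
doc = nlp("Paolo Roberto lives in Stockholm.")

# Manually mark the first two tokens as an entity with an assumed "PER" label.
doc.ents = [Span(doc, 0, 2, label="PER")]
ent = doc.ents[0]

label_id = ent.label                     # integer hash of the label
label_text = ent.label_                  # readable label string
resolved = nlp.vocab.strings[label_id]   # look the hash up in the StringStore

print(ent.text, label_id, label_text, resolved)
```

In the notebook cells above, replacing `ent.label` with `ent.label_` would print the label name directly.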
