# Processing texts using Stanza

See the learning materials associated with this exercise <a href="https://applied-language-technology.mooc.fi/html/notebooks/part_iii/01_multilingual_nlp.html" target="blank_">here</a>.

For instructions on how to use TestMyCode (TMC) to test your code and submit it to the server, see <a href="https://applied-language-technology.mooc.fi/html/tmc.html" target="blank_">here</a>.

Remember to save this Notebook before testing your code. Press <kbd>Control</kbd>+<kbd>s</kbd> or select the *File* menu and click *Save*.

**The maximum number of points for this exercise is 25.**

## 1. Import the Stanza library (1 point)

Import the Stanza natural language processing library (`stanza`) into Python.

In [2]:
# Write your answer below this line
import stanza


[TMC] The Stanza library was imported successfully! 1 point.

## 2. Initialise a Stanza Pipeline object (5 points)

Use the Stanza *Pipeline* class to create a Stanza natural language processing pipeline.

Load the default language model for Finnish (`fi`) into the *Pipeline* object. The language model has been downloaded on your server.

Assign the resulting *Pipeline* object under the variable `nlp`.

In [3]:
# Write your answer below this line
nlp = stanza.Pipeline(lang='fi')


[TMC] The variable "nlp" was defined successfully! 1 point.

[TMC] The variable "nlp" contains a Stanza Pipeline object! 3 points.

[TMC] The variable "nlp" contains a Pipeline for Finnish! 1 point.

## 3. Feed text to the language model for processing (3 points)

The variable `text` contains a few sentences in Finnish.

Feed the string object stored under the variable `text` to the language model under `nlp`.

Store the resulting Stanza *Document* object under the variable `my_doc`.

In [5]:
# Create a string object with some text in Finnish
text = "Tässä on yksi esimerkki. Ja tässä on toinen."

# Write your answer below this line
my_doc = nlp(text)


[TMC] The variable "my_doc" was defined successfully! 1 point.

[TMC] The variable "my_doc" contains a Stanza Document object! 2 points.

## 4. Get the first sentence in the Stanza *Document* object (3 points)

Use the `sentences` attribute to retrieve the first sentence in the Stanza *Document* object under `my_doc`.

Assign the first sentence under the variable `my_sent`.

In [22]:
# Write your answer below this line
my_sent = my_doc.sentences[0]

[TMC] The variable "my_sent" was defined successfully! 1 point.

[TMC] The variable "my_sent" contains a Stanza Sentence object! 1 point.

[TMC] The variable "my_sent" contains the expected objects! 1 point.

In [27]:
my_sent

[
  {
    "id": 1,
    "text": "Tässä",
    "lemma": "tämä",
    "upos": "PRON",
    "xpos": "Pron",
    "feats": "Case=Ine|Number=Sing|PronType=Dem",
    "head": 0,
    "deprel": "root",
    "start_char": 0,
    "end_char": 5,
    "ner": "O"
  },
  {
    "id": 2,
    "text": "on",
    "lemma": "olla",
    "upos": "AUX",
    "xpos": "V",
    "feats": "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act",
    "head": 1,
    "deprel": "cop",
    "start_char": 6,
    "end_char": 8,
    "ner": "O"
  },
  {
    "id": 3,
    "text": "yksi",
    "lemma": "yksi",
    "upos": "NUM",
    "xpos": "Num",
    "feats": "Case=Nom|NumType=Card|Number=Sing",
    "head": 4,
    "deprel": "nummod",
    "start_char": 9,
    "end_char": 13,
    "ner": "O"
  },
  {
    "id": 4,
    "text": "esimerkki",
    "lemma": "esimerkki",
    "upos": "NOUN",
    "xpos": "N",
    "feats": "Case=Nom|Number=Sing",
    "head": 1,
    "deprel": "nsubj:cop",
    "start_char": 14,
    "end_char": 23,
    "ner": "

In [24]:
type(my_sent)

stanza.models.common.doc.Sentence
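
Each `feats` value in the output above packs several morphological features into one string, with `|` between features and `=` between a feature name and its value. The following is a minimal sketch of unpacking such a string into a dictionary. The helper name `parse_feats` is hypothetical and not part of Stanza, and the input string is copied by hand from the first token above.

```python
def parse_feats(feats):
    """Split a 'Name=Value|Name=Value' feature string into a dictionary."""
    # Tokens such as punctuation may have no features at all
    if not feats:
        return {}
    # Split into 'Name=Value' pairs, then split each pair at the first '='
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Feature string copied from the token "Tässä" in the output above
features = parse_feats("Case=Ine|Number=Sing|PronType=Dem")
print(features["Case"])
```

Returning an empty dictionary for a missing value keeps later lookups uniform, because not every token carries a `feats` entry.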

## 5. Convert the Stanza *Sentence* object into a Python dictionary (2 points)

Convert the linguistic annotations stored in the Stanza *Sentence* object under the variable `my_sent` into Python dictionaries using the `to_dict()` method.

Assign the resulting list of dictionaries under the variable `my_list`.

In [30]:
# Write your answer below this line
my_list = my_sent.to_dict()

[TMC] The variable "my_list" was defined successfully! 1 point.

[TMC] The variable "my_list" contains the expected objects! 1 point.

In [30]:
my_list

[{'id': 1,
  'text': 'Tässä',
  'lemma': 'tämä',
  'upos': 'PRON',
  'xpos': 'Pron',
  'feats': 'Case=Ine|Number=Sing|PronType=Dem',
  'head': 0,
  'deprel': 'root',
  'start_char': 0,
  'end_char': 5,
  'ner': 'O'},
 {'id': 2,
  'text': 'on',
  'lemma': 'olla',
  'upos': 'AUX',
  'xpos': 'V',
  'feats': 'Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act',
  'head': 1,
  'deprel': 'cop',
  'start_char': 6,
  'end_char': 8,
  'ner': 'O'},
 {'id': 3,
  'text': 'yksi',
  'lemma': 'yksi',
  'upos': 'NUM',
  'xpos': 'Num',
  'feats': 'Case=Nom|NumType=Card|Number=Sing',
  'head': 4,
  'deprel': 'nummod',
  'start_char': 9,
  'end_char': 13,
  'ner': 'O'},
 {'id': 4,
  'text': 'esimerkki',
  'lemma': 'esimerkki',
  'upos': 'NOUN',
  'xpos': 'N',
  'feats': 'Case=Nom|Number=Sing',
  'head': 1,
  'deprel': 'nsubj:cop',
  'start_char': 14,
  'end_char': 23,
  'ner': 'O'},
 {'id': 5,
  'text': '.',
  'lemma': '.',
  'upos': 'PUNCT',
  'xpos': 'Punct',
  'head': 1,
  'deprel': 'punc
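
In dictionaries like the ones above, `head` holds the `id` of the token's syntactic head, and a `head` of 0 marks the root of the sentence. The sketch below resolves each head to its text. It assumes a hand-copied, trimmed-down version of the first four entries (stored under the hypothetical name `tokens`) so that it runs without Stanza; the helper name `describe_heads` is likewise made up for illustration.

```python
# Hand-copied subset of the first four dictionaries shown above
tokens = [
    {"id": 1, "text": "Tässä", "head": 0, "deprel": "root"},
    {"id": 2, "text": "on", "head": 1, "deprel": "cop"},
    {"id": 3, "text": "yksi", "head": 4, "deprel": "nummod"},
    {"id": 4, "text": "esimerkki", "head": 1, "deprel": "nsubj:cop"},
]


def describe_heads(tokens):
    """Return (word, relation, head word) tuples for a list of token dictionaries."""
    # Map each token id to its text so that head ids can be looked up
    by_id = {token["id"]: token["text"] for token in tokens}
    relations = []
    for token in tokens:
        # A head of 0 means the token is the root and has no head word
        head_word = "ROOT" if token["head"] == 0 else by_id[token["head"]]
        relations.append((token["text"], token["deprel"], head_word))
    return relations


for word, deprel, head_word in describe_heads(tokens):
    print(f"{word} --{deprel}--> {head_word}")
```

Building the `id`-to-text lookup first avoids relying on list positions, since token ids start from 1 while Python list indices start from 0.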

## 6. Collect lemmas from the dictionaries under `my_list` (4 points)

Collect the lemmas for each token from the dictionaries in the list `my_list`.

Use the `append()` method of a Python list to append the lemmas into the list named `lemmas`.

Tip: You need to define a `for` loop to retrieve each lemma.

In [36]:
# Create a placeholder list for the lemmas
lemmas = []

# Write your answer below this line
for token in my_list:
    lemmas.append(token['lemma'])

[TMC] The variable "lemmas" exists!

[TMC] The variable "lemmas" contains a list!

[TMC] The variable "lemmas" contains the expected values! 5 points.

In [36]:
lemmas

['tämä', 'olla', 'yksi', 'esimerkki', '.']
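
The same collection step can also be written as a list comprehension, although the exercise itself asks for `append()`. The sketch below uses a hand-written stand-in for `my_list` that keeps only the `text` and `lemma` keys, so it runs without Stanza. The names `sample_list` and `sample_lemmas` are hypothetical and chosen to avoid overwriting the variables used in the exercise.

```python
# Hand-written stand-in for my_list, reduced to the keys needed here
sample_list = [
    {"text": "Tässä", "lemma": "tämä"},
    {"text": "on", "lemma": "olla"},
    {"text": "yksi", "lemma": "yksi"},
    {"text": "esimerkki", "lemma": "esimerkki"},
    {"text": ".", "lemma": "."},
]

# Build the list of lemmas in a single expression
sample_lemmas = [token["lemma"] for token in sample_list]
print(sample_lemmas)
```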

## 7. Import the spacy-stanza library (1 point)

Import the spacy-stanza library (`spacy_stanza`) into Python.

In [38]:
# Write your answer below this line
import spacy_stanza

[TMC] The spacy-stanza library was imported successfully! 1 point.

## 8. Load a Stanza language model into spaCy (3 points)

Use the spacy-stanza library to load a Stanza language model for Finnish into spaCy.

Store the resulting *Language* object under the variable `spacy_fi`.

In [39]:
# Write your answer below this line
spacy_fi = spacy_stanza.load_pipeline(name='fi')


[TMC] The variable "spacy_fi" was defined successfully! 1 point.

[TMC] The Stanza language model for Finnish was loaded successfully into spaCy! 2 points.

## 9. Feed text to the spaCy language model (3 points)

Provide the text stored under the variable `text` as input to the language model under `spacy_fi`.

Store the resulting *Doc* object under the variable `doc_fi`.

In [40]:
# Write your answer below this line
doc_fi = spacy_fi(text)

[TMC] The variable "doc_fi" was defined successfully! 1 point.

[TMC] The variable "doc_fi" contains a spaCy Doc object! 1 point.

[TMC] The variable "doc_fi" contains the expected value! 1 point.