# Universal Dependencies

See the learning materials associated with this exercise <a href="https://applied-language-technology.mooc.fi/html/notebooks/part_iii/02_universal_dependencies.html" target="blank_">here</a>.

For instructions on how to use TestMyCode (TMC) to test your code and submit it to the server, see <a href="https://applied-language-technology.mooc.fi/html/tmc.html" target="blank_">here</a>.

Remember to save this Notebook before testing your code. Press <kbd>Control</kbd>+<kbd>s</kbd> or select the *File* menu and click *Save*.

**The maximum number of points for this exercise is 25.**

## 1. Import the spaCy and spacy-stanza libraries (3 points)

Import the spaCy (`spacy`) and spacy-stanza (`spacy_stanza`) natural language processing libraries into Python.

In [1]:
# Write your answer below this line
import spacy
import spacy_stanza



## 2. Load a Stanza language model for Finnish into spaCy using spacy-stanza (3 points)

Use the spacy-stanza library to load a Stanza language model for Finnish into spaCy.

The language model has been pre-installed on your server.

Store the resulting *Language* object under the variable `pipe_fi`.

In [2]:
# Write your answer below this line
pipe_fi = spacy_stanza.load_pipeline(name='fi')


2023-09-04 11:18:10 INFO: Loading these models for language: fi (Finnish):
| Processor | Package |
-----------------------
| tokenize  | tdt     |
| mwt       | tdt     |
| pos       | tdt     |
| lemma     | tdt     |
| depparse  | tdt     |
| ner       | turku   |

2023-09-04 11:18:10 INFO: Use device: cpu
2023-09-04 11:18:10 INFO: Loading: tokenize
2023-09-04 11:18:10 INFO: Loading: mwt
2023-09-04 11:18:10 INFO: Loading: pos
2023-09-04 11:18:11 INFO: Loading: lemma
2023-09-04 11:18:11 INFO: Loading: depparse
2023-09-04 11:18:12 INFO: Loading: ner
2023-09-04 11:18:13 INFO: Done loading processors!


## 3. Process the string stored under the variable `example` using the language model (3 points)

Provide the text stored under the variable `example` as input to the spaCy *Language* object under `pipe_fi`.

Store the resulting *Doc* object under the variable `example_doc`.

In [3]:
# Create an example string
example = 'Papua-Uuteen-Guineaan kuuluva Bougainvillen autonominen alue pyrkii itsenäistymään vuoteen 2027 mennessä.'

# Write your answer below this line
example_doc = pipe_fi(example)
example_doc

Papua-Uuteen-Guineaan kuuluva Bougainvillen autonominen alue pyrkii itsenäistymään vuoteen 2027 mennessä.

## 4. Examine dependencies (6 points)

Run the cell below to import the displacy module from spaCy, and to render the syntactic dependencies in the *Doc* object `example_doc`.

In [4]:
# Import the displacy module
from spacy import displacy

# Render the parse tree for the example
displacy.render(example_doc, style='dep')

### 4.1 Loop over the *Doc* object `example_doc` (2 points)

Loop over the *Doc* object under the variable `example_doc` and print out each *Token*, its index in the *Doc* object, and its morphological features.

In [5]:
# Write your answer below this line
for token in example_doc:
    
    # Print the token, its index in the Doc and its morphological features
    print(token, token.i, token.morph)

Papua-Uuteen-Guineaan 0 Case=Ill|Number=Sing
kuuluva 1 Case=Nom|Degree=Pos|Number=Sing|PartForm=Pres|VerbForm=Part|Voice=Act
Bougainvillen 2 Case=Gen|Number=Sing
autonominen 3 Case=Nom|Degree=Pos|Derivation=Inen|Number=Sing
alue 4 Case=Nom|Number=Sing
pyrkii 5 Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act
itsenäistymään 6 Case=Ill|InfForm=3|Number=Sing|VerbForm=Inf|Voice=Act
vuoteen 7 Case=Ill|Number=Sing
2027 8 NumType=Card
mennessä 9 AdpType=Post
. 10 
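The morphological features above use the Universal Dependencies FEATS format: `Feature=Value` pairs separated by vertical bars, with an empty string when a token has no features (as for the full stop at index 10). Below is a minimal pure-Python sketch that reads such a string into a dictionary. The helper name `parse_feats` is hypothetical. In spaCy v3 the same information should also be available directly, for example through `token.morph.to_dict()` or `token.morph.get('Case')`.

```python
def parse_feats(feats):
    """Split a UD FEATS string such as 'Case=Ill|Number=Sing' into a dict.

    Tokens without features (e.g. punctuation) have an empty string,
    which is mapped to an empty dict.
    """
    if not feats:
        return {}
    # Each pair has the form Feature=Value; split only on the first '='.
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Features of "itsenäistymään" (index 6), copied from the output above.
feats = parse_feats("Case=Ill|InfForm=3|Number=Sing|VerbForm=Inf|Voice=Act")
print(feats["Case"], feats["VerbForm"])  # Ill Inf
```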


### 4.2 Examine the dependencies and answer the following questions (4 points)

Examine the visualisation above and answer the following questions.

- How many dependents does the *Token* at index 4 have? Assign the number under the variable `deps`.
- What is the index of the *Token* that acts as the **head** of the dependency relation `nsubj`? Assign the index under the variable `head`.
- What is the index of the *Token* that acts as the **root** of the parse tree? Assign the index under the variable `root`.
- What are the morphological features of the *Token* at index 9? Assign these features as a string under the variable `morph`.

Insert your answers in the cell below.

Note that you do **not** have to retrieve the answers programmatically – simply examine the visualisation above!

In [6]:
# Write your answer below this line
deps = 3
head = 5
root = 5
morph = 'AdpType=Post'
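The exercise does not require retrieving these answers programmatically, but the same quantities can be computed from a list of head indices. The sketch below is an illustration under stated assumptions: `heads` is a hand-written parse of the example sentence, chosen to be consistent with the results shown in this notebook (the attachment of token 7 to token 6 in particular is a guess), and `labels` assigns only the relation needed here. It is not Stanza's actual output. With spaCy itself, the corresponding checks would use `len(list(example_doc[4].children))`, `token.head.i` and `token.dep_`.

```python
# Hypothetical hand-written parse of the example sentence (not Stanza's output).
# heads[i] is the index of token i's syntactic head; following spaCy's
# convention, the root token is its own head.
heads = [1, 4, 4, 4, 5, 5, 5, 6, 7, 7, 5]
# Assumed: token 4 ("alue") is the dependent of the 'nsubj' relation.
labels = {4: "nsubj"}


def children(i):
    """Indices of tokens whose head is token i (the root is not its own child)."""
    return [j for j, h in enumerate(heads) if h == i and j != i]


deps = len(children(4))
# The head of 'nsubj' is the head of the token carrying that label.
head = next(heads[j] for j, lab in labels.items() if lab == "nsubj")
# The root is the only token that is its own head.
root = next(i for i, h in enumerate(heads) if h == i)
print(deps, head, root)  # 3 5 5
```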

## 5. Get the dependents of a *Token* using spaCy (3 points)

Use the attributes of a *Token* object to retrieve the *Tokens* that depend on the *Token* at index 7 in the *Doc* object `example_doc`.

Cast the output into a list and store the list under the variable `deps_7`.

In [7]:
# Write your answer below this line

deps_7 = list(example_doc[7].children)
deps_7

[2027, mennessä]

## 6. Get dependents positioned left and right of a *Token* (3 points)

Use the attributes of a *Token* object to retrieve the syntactic dependents positioned left and right of the *Token* at index 1.

Cast the results into lists, and store them under the variables `left` and `right`, respectively.

In [9]:
# Write your answer below this line
left = list(example_doc[1].lefts)
right = list(example_doc[1].rights)
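Sections 5 and 6 rely on three related *Token* attributes: `children` yields every token whose head is the given token, while `lefts` and `rights` restrict these to the dependents positioned before and after it. The pure-Python sketch below mimics that behaviour on a list of head indices. The parse in `heads` is a hand-written assumption for the example sentence, not the model's output, and the helper functions are hypothetical.

```python
# Hypothetical hand-written parse of the example sentence (not Stanza's output):
# heads[i] is the index of token i's head; the root (token 5) is its own head.
heads = [1, 4, 4, 4, 5, 5, 5, 6, 7, 7, 5]


def children(i):
    """All dependents of token i, in document order."""
    return [j for j, h in enumerate(heads) if h == i and j != i]


def lefts(i):
    """Dependents of token i that precede it in the sentence."""
    return [j for j in children(i) if j < i]


def rights(i):
    """Dependents of token i that follow it in the sentence."""
    return [j for j in children(i) if j > i]


print(children(7))          # [8, 9]
print(lefts(1), rights(1))  # [0] [] under the assumed parse
```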

## 7. Get the subtree for the head of the dependency relation `acl` (4 points)

Use the attributes of a *Token* object to retrieve the subtree for the head of the dependency relation `acl` in the *Doc* object `example_doc`.

Cast the result into a list and assign the result under the variable `stree`.

In [40]:
# Write your answer below this line


for token in example_doc:

    # Check if the Token has the dependency relation 'acl', which
    # marks a clausal modifier of a noun
    if token.dep_ == 'acl':

        # Get the head of the 'acl' relation, i.e. the noun being modified
        head_token = token.head

        # The subtree attribute returns a generator over the head and all
        # of its dependents, so cast the result into a list
        stree = list(head_token.subtree)

In [41]:
stree

[Papua-Uuteen-Guineaan, kuuluva, Bougainvillen, autonominen, alue]
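`Token.subtree` yields a token together with all of its direct and indirect dependents, in document order. The sketch below reproduces this with an explicit stack over a list of head indices. Here `heads` is a hand-written parse of the example sentence (an assumption, not Stanza's output), `labels` marks only the `acl` relation on an assumed token, and `subtree` is a hypothetical helper.

```python
# Hypothetical hand-written parse of the example sentence (not Stanza's output):
# heads[i] is the index of token i's head; the root (token 5) is its own head.
heads = [1, 4, 4, 4, 5, 5, 5, 6, 7, 7, 5]
# Assumed: token 1 ("kuuluva") is the dependent of the 'acl' relation.
labels = {1: "acl"}


def subtree(i):
    """Return token i and all of its descendants, sorted into document order."""
    collected = []
    stack = [i]
    while stack:
        node = stack.pop()
        collected.append(node)
        # Push the node's children; the check j != node skips the root's self-loop.
        stack.extend(j for j, h in enumerate(heads) if h == node and j != node)
    return sorted(collected)


# Find the dependent of 'acl', move to its head and collect the head's subtree.
acl_dependent = next(j for j, lab in labels.items() if lab == "acl")
stree = subtree(heads[acl_dependent])
print(stree)  # [0, 1, 2, 3, 4]
```

Because this assumed parse is projective, every subtree is a contiguous stretch of the sentence; spaCy exposes the boundaries of that stretch directly as `Token.left_edge` and `Token.right_edge`.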