## Part 2. Dependency Grammar

In this section you will draw syntax trees that visualize the dependency structure of words in sentences.

First, install the NLTK package:

In [None]:
# Install NLTK into the environment of the running kernel, then import it
import sys
!{sys.executable} -m pip install nltk
import nltk

Then run this piece of code from the NLTK book:

In [None]:
# Define the grammar
groucho_dep_grammar = nltk.DependencyGrammar.fromstring("""
  'shot' -> 'I' | 'elephant' | 'in'
  'elephant' -> 'an' | 'in'
  'in' -> 'pajamas'
  'pajamas' -> 'my'
  """)

# Parse a sentence using a dependency parser for the given grammar
pdp = nltk.ProjectiveDependencyParser(groucho_dep_grammar)
sent = 'I shot an elephant in my pajamas'.split()
trees = pdp.parse(sent)
for tree in trees:
    print(tree.pformat(parens='[]'))

As the NLTK book notes, this grammar captures only bare dependency information (which word depends on which head) without specifying the type of each dependency.

There are two alternative parses for the sentence. Copy and paste each printed string into the syntax tree generator at http://mshang.ca/syntree/ (one tree at a time); you should see these pictures:

![image.png](attachment:image.png)



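Your task is now to write your own dependency grammar that covers the following four Finnish sentences from the [Universal Dependencies](http://universaldependencies.org/) treebank collection. The data is in the [CoNLL-U format](http://universaldependencies.org/format.html), and you do not need all of the information in it. You only need the dependency information: the word form (column 2) and the index of its head (column 7, where 0 marks the root of the sentence). The relation label (column 8) is not needed, because the NLTK grammar stores bare dependencies only.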

<pre>
# sentence-text: Mitä minä puhun?
1       Mitä    mikä    PRON    Pron    Case=Par|Number=Sing|PronType=Int       3       dobj    _       _
2       minä    minä    PRON    Pron    Case=Nom|Number=Sing|Person=1|PronType=Prs      3       nsubj   _       _
3       puhun   puhua   VERB    V       Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 0       root    _       SpaceAfter=No
4       ?       ?       PUNCT   Punct   _       3       punct   _       _

# sentence-text: Mitä minä ihan oikeasti puhun?
1       Mitä    mikä    PRON    Pron    Case=Par|Number=Sing|PronType=Int       5       dobj    _       _
2       minä    minä    PRON    Pron    Case=Nom|Number=Sing|Person=1|PronType=Prs      5       nsubj   _       _
3       ihan    ihan    ADV     Adv     _       4       advmod  _       _
4       oikeasti        oikeasti        ADV     Adv     _       5       advmod  _       _
5       puhun   puhua   VERB    V       Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 0       root    _       SpaceAfter=No
6       ?       ?       PUNCT   Punct   _       5       punct   _       _

# sentence-text: Millainen minä olen?
1       Millainen       millainen       ADJ     A       Case=Nom|Degree=Pos|Number=Sing 0       root    _       _
2       minä    minä    PRON    Pron    Case=Nom|Number=Sing|Person=1|PronType=Prs      1       nsubj:cop       _       _
3       olen    olla    VERB    V       Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 1       cop     _       SpaceAfter=No
4       ?       ?       PUNCT   Punct   _       1       punct   _       _

# sentence-text: Millainen lapsi minä olen?
1       Millainen       millainen       ADJ     A       Case=Nom|Degree=Pos|Number=Sing 2       amod    _       _
2       lapsi   lapsi   NOUN    N       Case=Nom|Number=Sing    0       root    _       _
3       minä    minä    PRON    Pron    Case=Nom|Number=Sing|Person=1|PronType=Prs      2       nsubj:cop       _       _
4       olen    olla    VERB    V       Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 2       cop     _       SpaceAfter=No
5       ?       ?       PUNCT   Punct   _       2       punct   _       _
</pre>
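
If you would rather not read the head indices by eye, the following is a minimal sketch of how the head-dependent pairs could be picked out of CoNLL-U lines. The helper name `head_dependent_pairs` is hypothetical (it is not part of NLTK), and the sketch assumes whitespace-separated columns with plain integer IDs, as in the data above. It is shown for the first sentence only.

```python
def head_dependent_pairs(conllu_text):
    """Return (head_form, dependent_form, relation) triples for one sentence.

    Hypothetical helper, not part of NLTK. Assumes plain integer token IDs
    and whitespace-separated columns, as in the data shown above.
    """
    rows = []
    for line in conllu_text.strip().splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# sentence-text: ..."
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Column 1: token ID, column 2: word form, column 7: head ID, column 8: relation
        rows.append((int(fields[0]), fields[1], int(fields[6]), fields[7]))

    forms = {idx: form for idx, form, _, _ in rows}
    pairs = []
    for idx, form, head, rel in rows:
        if head == 0:
            continue  # the root has no head word, so it yields no pair
        pairs.append((forms[head], form, rel))
    return pairs


example = """
# sentence-text: Mitä minä puhun?
1 Mitä mikä PRON Pron Case=Par|Number=Sing|PronType=Int 3 dobj _ _
2 minä minä PRON Pron Case=Nom|Number=Sing|Person=1|PronType=Prs 3 nsubj _ _
3 puhun puhua VERB V Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 0 root _ SpaceAfter=No
4 ? ? PUNCT Punct _ 3 punct _ _
"""

for head, dep, rel in head_dependent_pairs(example):
    print("'{}' -> '{}'   ({})".format(head, dep, rel))
```

Each printed pair corresponds to one head-dependent relation that your grammar needs to license. You still have to combine the pairs from all four sentences into productions yourself.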

Enter your grammar in the code cell below:

In [None]:
finnish_dep_grammar = nltk.DependencyGrammar.fromstring("""
  'Write' -> 'your' | 'own' | 'grammar' | 'here' | '!'
  """)

Then parse the following Finnish sentences with your grammar. Every sentence should get at least one parse. You do not have to modify or fully understand the code in the cell below:

In [None]:
sents = [
         'Millainen minä olen ?',
         'minä puhun oikeasti',
         'ihan oikeasti puhun',
         'Millainen lapsi minä olen ?',
         'minä olen Millainen lapsi ?',
        ]

pdp = nltk.ProjectiveDependencyParser(finnish_dep_grammar)
for i, sent in enumerate(sents):
    print("Sentence #{:d}: {:s}".format(i, sent))
    trees = pdp.parse(sent.split())
    for tree in trees:
        print(tree.pformat(parens='[]'))
    print()
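
To check the requirement that every sentence gets at least one parse, you could count the parses instead of reading the printed output. The sketch below is only an illustration: `count_parses` and `report_unparsed` are hypothetical helper names, and the code assumes that the parser object has a `parse(tokens)` method that returns an iterable of trees, as `pdp` does in the cell above.

```python
def count_parses(parser, sentences):
    """Return a list of (sentence, number_of_parses) pairs.

    Hypothetical helper. Assumes parser.parse(tokens) returns an
    iterable of trees (possibly a generator).
    """
    counts = []
    for sent in sentences:
        # Consume the iterable once to count the trees it yields
        n = sum(1 for _ in parser.parse(sent.split()))
        counts.append((sent, n))
    return counts


def report_unparsed(parser, sentences):
    """Return the sentences for which the parser produced no parse."""
    return [sent for sent, n in count_parses(parser, sentences) if n == 0]
```

In the notebook, `report_unparsed(pdp, sents)` should return an empty list once your grammar covers all five test sentences. A sentence with a count above 1 in `count_parses(pdp, sents)` has alternative parses.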

Do the parses make sense to you? What about the alternative parses for some of the sentences?

Copy-paste some of the parses into the syntax tree generator and view the trees as pictures.

When you are done here, you can continue to Part 3.