# Syntax Trees

One way to describe the structure of a sentence is with a syntax tree.

## Representing Syntactic Structure

In this example, we'll work with a sentence that uses the same word three times.

As the name of an animal, `buffalo` is a `Noun`. But `buffalo` is also a verb meaning "to intimidate".

Since buffalo can intimidate other buffalo, we can form the sentence:

**"Buffalo buffalo buffalo"**

<img src = 'buffalo.png' width = 300/>

In the structure above, the `Verb Phrase` has internal structure of its own: it consists of a `Verb` followed by a `Noun Phrase`.

<img src = 'buffalo2.png' width = 300/>

We want to represent this structure in a Python program. We'll use two classes:

1. A `Tree` represents a whole phrase (e.g. a verb phrase). It has:
    * `tag`, which tells us what kind of phrase it is. We'll use short abbreviations:
        * `S` for sentence
        * `NP` for noun phrase
        * `VP` for verb phrase
    * `branches`, a list of one or more `Tree` or `Leaf` components

2. A `Leaf` represents a single word. It has:
    * `tag`, which tells us the word's syntactic category (e.g. `N` for noun or `V` for verb)
    * `word`, the word itself
    
<img src = 'class.png' width = 500>

In the figure above, the parts enclosed in green are `Tree` objects, while the parts enclosed in red are `Leaf` objects.

What's the equivalent Python code?

We can create the `Leaf` objects as follows:

```python
beasts = Leaf('N', 'buffalo')      # This is buffalo as a noun
intimidate = Leaf('V', 'buffalo')  # This is buffalo as a verb
S, NP, VP = "S", "NP", "VP"
```

Then we build the following `Tree`:

```python
Tree(S, [Tree(NP, [beasts]),
         Tree(VP, [intimidate,
                   Tree(NP, [beasts])])])
```
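One detail worth noticing: `beasts` appears twice in this expression, so both noun phrases refer to the *same* `Leaf` object rather than to two copies. The sketch below checks this with `is`. It repeats minimal versions of the `Tree` and `Leaf` classes (same fields, without the argument checks of the full versions in this section) so that it runs on its own.

```python
# Minimal stand-ins for Tree and Leaf: same fields, no argument checks.
class Tree:
    def __init__(self, tag, branches):
        self.tag = tag
        self.branches = branches

class Leaf:
    def __init__(self, tag, word):
        self.tag = tag
        self.word = word

beasts = Leaf('N', 'buffalo')
intimidate = Leaf('V', 'buffalo')

s = Tree('S', [Tree('NP', [beasts]),
               Tree('VP', [intimidate,
                           Tree('NP', [beasts])])])

subject = s.branches[0].branches[0]          # the N under the first NP
obj = s.branches[1].branches[1].branches[0]  # the N under the NP inside VP
print(subject is obj)                        # True: one Leaf object, referenced twice
```

Sharing is harmless here because nothing modifies a `Leaf` after it is built, but changing `beasts.word` would change the word in both noun phrases at once.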

The Python code needed to support this representation is fairly straightforward:

```python
class Tree:
    """A phrase: a tag such as 'S', 'NP', or 'VP', plus a list of branches."""
    def __init__(self, tag, branches):
        assert len(branches) >= 1  # a phrase must have at least one branch
        for b in branches:
            assert isinstance(b, (Tree, Leaf))  # every branch must be a Tree or a Leaf
        self.tag = tag
        self.branches = branches

class Leaf:
    """A single word together with its syntactic category, such as 'N' or 'V'."""
    def __init__(self, tag, word):
        self.tag = tag
        self.word = word

beasts = Leaf('N', 'buffalo')      # This is buffalo as a noun
intimidate = Leaf('V', 'buffalo')  # This is buffalo as a verb
S, NP, VP = "S", "NP", "VP"

s = Tree(S, [Tree(NP, [beasts]),
             Tree(VP, [intimidate,
                       Tree(NP, [beasts])
                      ])
            ])
```
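Because a `Tree`'s branches can themselves be `Tree`s, functions that process these trees are naturally recursive: a `Leaf` is the base case, and a `Tree` combines the results from its branches. As an illustration, here is a hypothetical helper, `tagged_words` (not part of the original code), that lists each word with its tag from left to right. The class definitions are repeated so the block runs on its own.

```python
class Tree:
    def __init__(self, tag, branches):
        assert len(branches) >= 1
        for b in branches:
            assert isinstance(b, (Tree, Leaf))
        self.tag = tag
        self.branches = branches

class Leaf:
    def __init__(self, tag, word):
        self.tag = tag
        self.word = word

def tagged_words(t):
    """Return a list of (tag, word) pairs for the leaves of t, left to right."""
    if isinstance(t, Leaf):
        return [(t.tag, t.word)]       # base case: a single word
    pairs = []
    for b in t.branches:               # recursive case: concatenate branch results
        pairs.extend(tagged_words(b))
    return pairs

beasts = Leaf('N', 'buffalo')
intimidate = Leaf('V', 'buffalo')
s = Tree('S', [Tree('NP', [beasts]),
               Tree('VP', [intimidate, Tree('NP', [beasts])])])

print(tagged_words(s))
# [('N', 'buffalo'), ('V', 'buffalo'), ('N', 'buffalo')]
```

The words alone are all `buffalo`; only the tags (`N`, `V`, `N`) record which role each occurrence plays.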

Let's inspect the components of the tree `s`:

```python
>>> s.tag
'S'
>>> s.branches
[<__main__.Tree at 0x10d9c8400>, <__main__.Tree at 0x10d9cbe80>]
>>> s.branches[0]
<__main__.Tree at 0x10d9c8400>
>>> s.branches[0].tag
'NP'
>>> s.branches[1]
<__main__.Tree at 0x10d9cbe80>
>>> s.branches[1].tag
'VP'
```
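The default display (`<__main__.Tree at 0x...>`) only shows where each object lives in memory. A common alternative, sketched here as an addition to the original classes, is to define `__repr__` so that each object displays as the Python expression that would rebuild it.

```python
class Tree:
    def __init__(self, tag, branches):
        assert len(branches) >= 1
        for b in branches:
            assert isinstance(b, (Tree, Leaf))
        self.tag = tag
        self.branches = branches

    def __repr__(self):
        # !r uses repr() on the tag and on the list, which in turn uses
        # repr() on each branch, so nested trees display recursively.
        return 'Tree({!r}, {!r})'.format(self.tag, self.branches)

class Leaf:
    def __init__(self, tag, word):
        self.tag = tag
        self.word = word

    def __repr__(self):
        return 'Leaf({!r}, {!r})'.format(self.tag, self.word)

beasts = Leaf('N', 'buffalo')
intimidate = Leaf('V', 'buffalo')
s = Tree('S', [Tree('NP', [beasts]),
               Tree('VP', [intimidate, Tree('NP', [beasts])])])

print(repr(s.branches[0]))  # Tree('NP', [Leaf('N', 'buffalo')])
```

This one-line form becomes hard to read for larger trees, which is one reason an indented printer is still useful.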

Exploring a tree one attribute at a time like this is tedious. Instead, we can define a function `print_tree` that displays the whole tree as an indented S-expression:

```python
# Pretty-print Trees as indented S-expressions.

# Display a Leaf as "(tag word)", e.g. "(N buffalo)".
Leaf.__str__ = lambda leaf: '({tag} {word})'.format(**leaf.__dict__)

def print_tree(t, indent=0, end='\n'):
    """Print Tree or Leaf t with indentation.

    >>> np = Tree('NP', [Leaf('N', 'buffalo')])
    >>> t = Tree('S', [np, Tree('VP', [Leaf('V', 'buffalo'), np])])
    >>> print_tree(t)
    (S (NP (N buffalo))
       (VP (V buffalo)
           (NP (N buffalo))))
    """
    if isinstance(t, Leaf):
        print(t, end='')
    else:
        prefix = '(' + t.tag + ' '
        indent += len(prefix)  # later branches line up under the first one
        print(prefix, end='')
        print_tree(t.branches[0], indent, '')
        for b in t.branches[1:]:
            print('\n' + ' '*indent, end='')  # start a new line at the current indent
            print_tree(b, indent, '')
        print(')', end=end)
```

```python
print_tree(s)
```

```
(S (NP (N buffalo))
   (VP (V buffalo)
       (NP (N buffalo))))
```
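Because `print_tree` prints as it goes, its output can't be stored or compared directly. A variant that builds and returns the same text as a string is sketched below. `tree_str` is a hypothetical name, not part of the original code, and the block repeats minimal class definitions so it runs on its own.

```python
class Tree:
    def __init__(self, tag, branches):
        self.tag = tag
        self.branches = branches

class Leaf:
    def __init__(self, tag, word):
        self.tag = tag
        self.word = word

def tree_str(t, indent=0):
    """Return the text print_tree would print for t, without the final newline."""
    if isinstance(t, Leaf):
        return '({} {})'.format(t.tag, t.word)
    prefix = '(' + t.tag + ' '
    indent += len(prefix)                 # same indentation rule as print_tree
    parts = [tree_str(t.branches[0], indent)]
    for b in t.branches[1:]:
        parts.append('\n' + ' ' * indent + tree_str(b, indent))
    return prefix + ''.join(parts) + ')'

beasts = Leaf('N', 'buffalo')
intimidate = Leaf('V', 'buffalo')
s = Tree('S', [Tree('NP', [beasts]),
               Tree('VP', [intimidate, Tree('NP', [beasts])])])

print(tree_str(s))  # prints the same three lines as print_tree(s)
```

Returning a string keeps formatting separate from output, so the result can be written to a file, compared in a test, or printed later.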
