# Week 06: Collocation Extraction
In Assignment 5, we found all skip-grams and their frequencies in <u>*wiki1G.txt*</u>. This week, we want to use the result of assignment 5 to extract collocations of [AKL verbs](https://uclouvain.be/en/research-institutes/ilc/cecl/academic-keyword-list.html). We will use [Smadja’s algorithm](https://aclanthology.org/J93-1007.pdf) to do it. Here are some basic terms need to be explain. 

We take "*dpend*" as an example:

<img src="https://imgur.com/cPyd7Gr.jpg" >

In this case, we want to find the collocations of "depend". Then, "depend" is called **base word** and marked as $W$. As for "on", "the", "for"..., they are called **collocate** and marked as $W_{i}$ where **i** represents their serial number. $P_{j}$ means the frequency of $W$ and $W_{i}$ with distance j. And **Freq** is the sum of frequencies of all distances.

There are three conditions to filter the skipgram to find collocations. We will go through three conditions below.

Considering that some students did not complete Assignment 5, in order to avoid them being unable to do assignment 6, we provide you with a file of calculated skipgram with frequencies, called **AKL_skipgram.tsv**. It only keeps the skipgrams with any AKL verb.

## Read Data
<font color="red">**[ TODO ]**</font> Please read <u>*AKL_skipgram.tsv*</u> and store it in the way you like.

In [1]:
import os
import statistics
import math

In [2]:
#### here are some hyperparameter
k0 = 1
k1 = 1
U0 = 10
base_word = "depend"

In [3]:
base_collocate = {}

with open(os.path.join('data', 'AKL_skipgram.tsv')) as f:
    
    for line in f:
        
        tokens = line.split()
        
        if not tokens[0] in base_collocate:
            collocate_freq = {}
            collocate_freq[tokens[1]] = tokens[2:]
            base_collocate[tokens[0]] = collocate_freq
        else:
            base_collocate[tokens[0]][tokens[1]] = tokens[2:]
    

In [4]:
base_collocate['depend']['all']

['69', '9', '14', '16', '10', '11', '0', '3', '2', '2', '2']

## C1 Condition
C1 helps eliminate the collocates that are not frequent enough. This condition specifies that the frequency of appearance of $W_{i}$ in the neighborhood of $W$ must be at least one standard deviation above the average.

The formula is here:

$$strength = \frac{freq - \bar{f}}{\sigma} \geq k_{0} = 1$$

where $freq$ is the frequency of certain collocate, (e.g., 2573 for "on") and 

$\bar{f}$ is the average frequencies of all collocates and 

${\sigma}$ is the standard deviation of frequencies of all collocates.

<font color="red">**[ TODO ]**</font> Please follow the condition to filter the skipgrams of "depend" and keep some which pass the condition.

The ouput sholud have `collocate` with its `strength`.

In [5]:
def C1_filter(base_word: str, base_collocate: dict):
    
    sigma = statistics.stdev([ int(f[0]) for f in (base_collocate[base_word][c] for c in base_collocate[base_word]) ])
    avg = statistics.mean([ int(f[0]) for f in (base_collocate[base_word][c] for c in base_collocate[base_word]) ])
    
    filtered_collocate = []
    
    for c in base_collocate[base_word]:
        
        #根據公式實作
        strength = ((float(base_collocate[base_word][c][0]) - avg) / sigma)
        
        if strength >= 1:
            filtered_collocate.append([c, strength])
            
    return filtered_collocate

filtered_by_C1 = C1_filter('depend', base_collocate)
for l in filtered_by_C1:
    print('{} {{strength:{:.3f}}}'.format(l[0], l[1]))

a {strength:6.380}
all {strength:1.150}
also {strength:1.132}
an {strength:1.367}
and {strength:15.181}
are {strength:1.962}
as {strength:2.395}
but {strength:1.529}
by {strength:1.042}
can {strength:1.421}
do {strength:1.655}
does {strength:5.298}
for {strength:4.685}
formula {strength:1.565}
in {strength:5.875}
is {strength:2.611}
it {strength:2.287}
its {strength:1.818}
may {strength:2.864}
not {strength:8.436}
of {strength:23.459}
on {strength:46.308}
only {strength:1.295}
or {strength:2.485}
other {strength:1.655}
properties {strength:1.042}
s {strength:2.160}
some {strength:1.187}
such {strength:1.439}
that {strength:7.246}
the {strength:44.703}
their {strength:2.828}
these {strength:1.944}
they {strength:2.233}
this {strength:1.908}
to {strength:8.418}
type {strength:1.295}
upon {strength:4.902}
which {strength:4.379}
will {strength:3.783}
would {strength:1.601}


<font color="green">Expected output: </font> (The order isn't important.)

> a {'strength': 6.381}   
> all {'strength': 1.151}   
> also {'strength': 1.133}   
> an {'strength': 1.367}   
> and {'strength': 15.183}   
> are {'strength': 1.962}   
> as {'strength': 2.395}   
> but {'strength': 1.529}   
> by {'strength': 1.042}   
> can {'strength': 1.421}   
> do {'strength': 1.656}   
> does {'strength': 5.299}   
> for {'strength': 4.686}   
> formula {'strength': 1.565}   
> in {'strength': 5.876}   
> is {'strength': 2.611}   
> it {'strength': 2.287}   
> its {'strength': 1.818}   
> may {'strength': 2.864}   
> not {'strength': 8.437}   
> of {'strength': 23.461}   
> on {'strength': 46.313}   
> only {'strength': 1.295}   
> or {'strength': 2.485}   
> other {'strength': 1.656}   
> properties {'strength': 1.042}   
> s {'strength': 2.161}   
> some {'strength': 1.187}   
> such {'strength': 1.439}   
> that {'strength': 7.247}   
> the {'strength': 44.707}   
> their {'strength': 2.828}   
> these {'strength': 1.944}   
> they {'strength': 2.233}   
> this {'strength': 1.908}   
> to {'strength': 8.419}   
> type {'strength': 1.295}   
> upon {'strength': 4.902}   
> which {'strength': 4.379}   
> will {'strength': 3.784}   
> would {'strength': 1.601}   

## C2 Condition
C2 requires that the histogram of the 10 relative frequencies of appearance of $W_i$ within five words of $W$ (or $p^j_i$s) have at least one spike. If the histogram is flat, it will be rejected by this condition.

The formula is here:

$$spread = \frac{\Sigma^{10}_{j=1}(p^j_i - \bar{p_i})^2}{10} \geq U_{0} = 10$$

where $p^j_i$ is the frequency of certain collocate with a distance of *j*, (e.g., 16 for "on" when its distance is -5) and 

$\bar{p_i}$ is the average frequencies of "on" with any distance 

<font color="red">**[ TODO ]**</font> Please follow C2 to filter the result of C1 and keep some which pass C2.

The ouput sholud have `collocate` with `strength` and `spread`.

In [6]:
def C2_filter(base_word: str, filtered_by_C1: list, base_collocate: dict):
    
    collocate_spread = []
    
    #根據公式實作
    for fc in filtered_by_C1:
        avg = statistics.mean([int(f) for f in base_collocate[base_word][fc[0]][1:]])
        spread = sum(pow((int(f) - avg), 2) for f in base_collocate[base_word][fc[0]][1:]) / 10
        collocate_spread.append([fc[0], fc[1], spread])
        
    return collocate_spread

In [7]:
filtered_by_C2 = C2_filter(base_word, filtered_by_C1, base_collocate)
for l in filtered_by_C2:
    print('{} {{strength:{:.3f}, spread:{:.2f}}}'.format(l[0], l[1], l[2]))

a {strength:6.380, spread:777.29}
all {strength:1.150, spread:29.89}
also {strength:1.132, spread:208.96}
an {strength:1.367, spread:56.29}
and {strength:15.181, spread:2170.41}
are {strength:1.962, spread:98.84}
as {strength:2.395, spread:104.96}
but {strength:1.529, spread:24.40}
by {strength:1.042, spread:26.21}
can {strength:1.421, spread:208.24}
do {strength:1.655, spread:410.21}
does {strength:5.298, spread:6477.09}
for {strength:4.685, spread:376.65}
formula {strength:1.565, spread:46.16}
in {strength:5.875, spread:396.09}
is {strength:2.611, spread:148.20}
it {strength:2.287, spread:112.76}
its {strength:1.818, spread:94.24}
may {strength:2.864, spread:1352.24}
not {strength:8.436, spread:12938.41}
of {strength:23.459, spread:20132.64}
on {strength:46.308, spread:420371.01}
only {strength:1.295, spread:134.01}
or {strength:2.485, spread:85.61}
other {strength:1.655, spread:31.61}
properties {strength:1.042, spread:30.21}
s {strength:2.160, spread:125.85}
some {strength:1.187, s

<font color="green">Expected output: </font> (The order isn't important.)

> a {'strength': 6.381, 'spread': 777.29}   
> all {'strength': 1.151, 'spread': 29.89}   
> also {'strength': 1.133, 'spread': 208.96}   
> an {'strength': 1.367, 'spread': 56.29}   
> and {'strength': 15.183, 'spread': 2170.41}   
> are {'strength': 1.962, 'spread': 98.84}   
> as {'strength': 2.395, 'spread': 104.96}   
> but {'strength': 1.529, 'spread': 24.4}   
> by {'strength': 1.042, 'spread': 26.21}   
> can {'strength': 1.421, 'spread': 208.24}   
> do {'strength': 1.656, 'spread': 410.21}   
> does {'strength': 5.299, 'spread': 6477.09}   
> for {'strength': 4.686, 'spread': 376.65}   
> formula {'strength': 1.565, 'spread': 46.16}   
> in {'strength': 5.876, 'spread': 396.09}   
> is {'strength': 2.611, 'spread': 148.2}   
> it {'strength': 2.287, 'spread': 112.76}   
> its {'strength': 1.818, 'spread': 94.24}   
> may {'strength': 2.864, 'spread': 1352.24}   
> not {'strength': 8.437, 'spread': 12938.41}   
> of {'strength': 23.461, 'spread': 20132.64}   
> on {'strength': 46.313, 'spread': 420371.01}   
> only {'strength': 1.295, 'spread': 134.01}   
> or {'strength': 2.485, 'spread': 85.61}   
> other {'strength': 1.656, 'spread': 31.61}   
> properties {'strength': 1.042, 'spread': 30.21}   
> s {'strength': 2.161, 'spread': 125.85}   
> some {'strength': 1.187, 'spread': 15.29}   
> such {'strength': 1.439, 'spread': 27.45}   
> that {'strength': 7.247, 'spread': 1492.61}   
> the {'strength': 44.707, 'spread': 98586.04}   
> their {'strength': 2.828, 'spread': 209.56}   
> these {'strength': 1.944, 'spread': 180.01}   
> they {'strength': 2.233, 'spread': 316.09}   
> this {'strength': 1.908, 'spread': 71.09}   
> to {'strength': 8.419, 'spread': 3941.16}   
> type {'strength': 1.295, 'spread': 213.41}   
> upon {'strength': 4.902, 'spread': 4984.01}   
> which {'strength': 4.379, 'spread': 346.16}   
> will {'strength': 3.784, 'spread': 2250.05}   
> would {'strength': 1.601, 'spread': 412.44}   

## C3 Condition
C3 keeps the interesting collocates by pulling out the peaks of the $p^j_i$ distributions.

Formula:

$$p^j_i \geq \bar{p_i} + (k_1 \times \sqrt{U_{i}})$$

where $U_i$ is *spread* in C2 and

$k_1$ is equal to 1 

<font color="red">**[ TODO ]**</font> Please follow the condition to filter the result of last step and keep some which pass C3.

The ouput sholud have `base word, collocate, distance, strength, spread, peak, count`.

In [8]:
def C3_filter(base_word, filtered_by_C2, base_collocate):
    
    collocate_peak = []
    
    for fc in filtered_by_C2:
        
        #根據公式實作
        avg = statistics.mean([int(f) for f in base_collocate[base_word][fc[0]][1:]])
        peak = avg + 1 * pow(fc[2], 0.5)
        
        for i in range(1, len(base_collocate[base_word][fc[0]])):
            if int(base_collocate[base_word][fc[0]][i]) > peak:
                
                if i < 6:
                    distance = i - 6
                else:
                    distance = i - 5
                
                collocate_peak.append([base_word,
                                       fc[0],
                                       distance,
                                       fc[1],
                                       fc[2],
                                       peak,
                                       base_collocate[base_word][fc[0]][i]])
                
    return collocate_peak

In [9]:
filtered_by_C3 = C3_filter(base_word, filtered_by_C2, base_collocate)
for l in filtered_by_C3:
    print('({}, {}, {}) {{ strength:{:.2f}, spread:{:.2f}, peak:{:.2f}, count:{} }}'.format(l[0], l[1], l[2], l[3], l[4], l[5], l[6]))

(depend, a, 2) { strength:6.38, spread:777.29, peak:63.78, count:94 }
(depend, all, -4) { strength:1.15, spread:29.89, peak:12.37, count:14 }
(depend, all, -3) { strength:1.15, spread:29.89, peak:12.37, count:16 }
(depend, also, -1) { strength:1.13, spread:208.96, peak:21.26, count:50 }
(depend, an, 2) { strength:1.37, spread:56.29, peak:15.60, count:24 }
(depend, an, 5) { strength:1.37, spread:56.29, peak:15.60, count:19 }
(depend, and, 4) { strength:15.18, spread:2170.41, peak:131.29, count:149 }
(depend, are, -5) { strength:1.96, spread:98.84, peak:21.34, count:27 }
(depend, are, -4) { strength:1.96, spread:98.84, peak:21.34, count:22 }
(depend, as, 4) { strength:2.39, spread:104.96, peak:24.04, count:30 }
(depend, as, 5) { strength:2.39, spread:104.96, peak:24.04, count:28 }
(depend, but, -2) { strength:1.53, spread:24.40, peak:13.94, count:14 }
(depend, but, 5) { strength:1.53, spread:24.40, peak:13.94, count:15 }
(depend, by, -5) { strength:1.04, spread:26.21, peak:11.42, count:1

<font color="green">Expected output: </font> (The order isn't important.)

> ('depend', 'a', 2) {'strength': 6.381, 'spread': 777.29, 'peak': 63.78, 'count': 94}   
> ('depend', 'all', -4) {'strength': 1.151, 'spread': 29.89, 'peak': 12.367, 'count': 14}   
> ('depend', 'all', -3) {'strength': 1.151, 'spread': 29.89, 'peak': 12.367, 'count': 16}   
> ('depend', 'also', -1) {'strength': 1.133, 'spread': 208.96, 'peak': 21.255, 'count': 50}   
> ('depend', 'an', 2) {'strength': 1.367, 'spread': 56.29, 'peak': 15.603, 'count': 24}   
> ('depend', 'an', 5) {'strength': 1.367, 'spread': 56.29, 'peak': 15.603, 'count': 19}   
> ('depend', 'and', 4) {'strength': 15.183, 'spread': 2170.41, 'peak': 131.288, 'count': 149}   
> ('depend', 'are', -5) {'strength': 1.962, 'spread': 98.84, 'peak': 21.342, 'count': 27}   
> ('depend', 'are', -4) {'strength': 1.962, 'spread': 98.84, 'peak': 21.342, 'count': 22}   
> ('depend', 'as', 4) {'strength': 2.395, 'spread': 104.96, 'peak': 24.045, 'count': 30}   
> ('depend', 'as', 5) {'strength': 2.395, 'spread': 104.96, 'peak': 24.045, 'count': 28}   
> ('depend', 'but', -2) {'strength': 1.529, 'spread': 24.4, 'peak': 13.94, 'count': 14}   
> ('depend', 'but', 5) {'strength': 1.529, 'spread': 24.4, 'peak': 13.94, 'count': 15}   
> ('depend', 'by', -5) {'strength': 1.042, 'spread': 26.21, 'peak': 11.42, 'count': 13}   
> ('depend', 'by', -4) {'strength': 1.042, 'spread': 26.21, 'peak': 11.42, 'count': 12}   
> ('depend', 'by', 4) {'strength': 1.042, 'spread': 26.21, 'peak': 11.42, 'count': 13}   
> ('depend', 'can', -1) {'strength': 1.421, 'spread': 208.24, 'peak': 22.831, 'count': 49}   
> ('depend', 'do', -2) {'strength': 1.656, 'spread': 410.21, 'peak': 29.954, 'count': 70}   
> ('depend', 'does', -2) {'strength': 5.299, 'spread': 6477.09, 'peak': 110.38, 'count': 271}   
> ('depend', 'for', 4) {'strength': 4.686, 'spread': 376.65, 'peak': 45.907, 'count': 69}   
> ('depend', 'formula', -4) {'strength': 1.565, 'spread': 46.16, 'peak': 15.994, 'count': 19}   
> ('depend', 'formula', 2) {'strength': 1.565, 'spread': 46.16, 'peak': 15.994, 'count': 17}   
> ('depend', 'formula', 5) {'strength': 1.565, 'spread': 46.16, 'peak': 15.994, 'count': 19}   
> ('depend', 'in', -5) {'strength': 5.876, 'spread': 396.09, 'peak': 53.002, 'count': 55}   
> ('depend', 'in', 4) {'strength': 5.876, 'spread': 396.09, 'peak': 53.002, 'count': 62}   
> ('depend', 'is', -5) {'strength': 2.611, 'spread': 148.2, 'peak': 27.174, 'count': 37}   
> ('depend', 'is', 5) {'strength': 2.611, 'spread': 148.2, 'peak': 27.174, 'count': 29}   
> ('depend', 'it', -3) {'strength': 2.287, 'spread': 112.76, 'peak': 23.819, 'count': 39}   
> ('depend', 'it', -2) {'strength': 2.287, 'spread': 112.76, 'peak': 23.819, 'count': 24}   
> ('depend', 'its', 2) {'strength': 1.818, 'spread': 94.24, 'peak': 20.308, 'count': 36}   
> ('depend', 'may', -1) {'strength': 2.864, 'spread': 1352.24, 'peak': 53.173, 'count': 126}   
> ('depend', 'not', -1) {'strength': 8.437, 'spread': 12938.41, 'peak': 161.047, 'count': 388}   
> ('depend', 'of', 4) {'strength': 23.461, 'spread': 20132.64, 'peak': 272.49, 'count': 495}   
> ('depend', 'on', 1) {'strength': 46.313, 'spread': 420371.01, 'peak': 905.66, 'count': 2195}   
> ('depend', 'only', 1) {'strength': 1.295, 'spread': 134.01, 'peak': 19.276, 'count': 40}   
> ('depend', 'or', 4) {'strength': 2.485, 'spread': 85.61, 'peak': 23.553, 'count': 29}   
> ('depend', 'or', 5) {'strength': 2.485, 'spread': 85.61, 'peak': 23.553, 'count': 25}   
> ('depend', 'other', 3) {'strength': 1.656, 'spread': 31.61, 'peak': 15.322, 'count': 19}   
> ('depend', 'other', 5) {'strength': 1.656, 'spread': 31.61, 'peak': 15.322, 'count': 17}   
> ('depend', 'properties', -4) {'strength': 1.042, 'spread': 30.21, 'peak': 11.796, 'count': 12}   
> ('depend', 'properties', -1) {'strength': 1.042, 'spread': 30.21, 'peak': 11.796, 'count': 15}   
> ('depend', 'properties', 3) {'strength': 1.042, 'spread': 30.21, 'peak': 11.796, 'count': 15}   
> ('depend', 's', 4) {'strength': 2.161, 'spread': 125.85, 'peak': 23.718, 'count': 41}   
> ('depend', 'some', -3) {'strength': 1.187, 'spread': 15.29, 'peak': 11.01, 'count': 13}   
> ('depend', 'some', 2) {'strength': 1.187, 'spread': 15.29, 'peak': 11.01, 'count': 14}   
> ('depend', 'such', 4) {'strength': 1.439, 'spread': 27.45, 'peak': 13.739, 'count': 17}   
> ('depend', 'that', -3) {'strength': 7.247, 'spread': 1492.61, 'peak': 79.334, 'count': 84}   
> ('depend', 'that', -1) {'strength': 7.247, 'spread': 1492.61, 'peak': 79.334, 'count': 132}   
> ('depend', 'the', 2) {'strength': 44.707, 'spread': 98586.04, 'peak': 562.384, 'count': 1140}   
> ('depend', 'their', 2) {'strength': 2.828, 'spread': 209.56, 'peak': 30.676, 'count': 52}   
> ('depend', 'these', -2) {'strength': 1.944, 'spread': 180.01, 'peak': 24.717, 'count': 48}   
> ('depend', 'they', -1) {'strength': 2.233, 'spread': 316.09, 'peak': 30.679, 'count': 63}   
> ('depend', 'this', -4) {'strength': 1.908, 'spread': 71.09, 'peak': 19.531, 'count': 28}   
> ('depend', 'this', -2) {'strength': 1.908, 'spread': 71.09, 'peak': 19.531, 'count': 22}   
> ('depend', 'to', -1) {'strength': 8.419, 'spread': 3941.16, 'peak': 109.979, 'count': 228}   
> ('depend', 'type', 3) {'strength': 1.295, 'spread': 213.41, 'peak': 22.309, 'count': 50}   
> ('depend', 'upon', 1) {'strength': 4.902, 'spread': 4984.01, 'peak': 98.298, 'count': 239}   
> ('depend', 'which', -1) {'strength': 4.379, 'spread': 346.16, 'peak': 43.405, 'count': 66}   
> ('depend', 'will', -1) {'strength': 3.784, 'spread': 2250.05, 'peak': 68.935, 'count': 159}   
> ('depend', 'would', -1) {'strength': 1.601, 'spread': 412.44, 'peak': 29.709, 'count': 70}   

## Strongest Collocation
There are too many collocations to check your result easily. Hence, we want you use the rules below to find out one strongest collocation for "depend".

Rule:
1. find the collocate with maximum **`strength`** value
2. find the collocate with maximum **`count`** value

If there're more than two collocations sharing same maximum `strength` value, please use rule 2 to find one as the answer. Otherwise, you can ignore Rule 2.

<font color="red">**[ TODO ]**</font> Please find out the strongest collocation for "depend" by the rules.

The ouput format sholud be `(base word, collocate, distance)`.

In [10]:
def find_strongest_collocation(base_word, filtered_by_C3):
    
    max_index = 0
    max_strength = filtered_by_C3[max_index][3]
    max_count = filtered_by_C3[max_index][6]
    
    for i in range(0, len(filtered_by_C3)):
        if filtered_by_C3[i][3] > max_strength:
            max_index = i
            max_strength = filtered_by_C3[i][3]
            max_count = filtered_by_C3[i][6]
        elif filtered_by_C3[i][3] == max_strength:
            if filtered_by_C3[i][6] > max_count:
                max_index = i
                max_strength = filtered_by_C3[i][3]
                max_count = filtered_by_C3[i][6]
    
    return [base_word, filtered_by_C3[max_index][1], filtered_by_C3[max_index][2]]

In [11]:
strongest_collocation = find_strongest_collocation(base_word, filtered_by_C3)
print('({}, {}, {})'.format(strongest_collocation[0], strongest_collocation[1], strongest_collocation[2]))

(depend, on, 1)


<font color="green">Expected output: </font>

> ('depend', 'on', 1)

## Find Helpful AKL Collocation
Only one example cannot express how amazing what we just did, so here are some other AKL verbs selected for you to experience. 

<font color="red">**[ TODO ]**</font> Please finish **combination** function to combine last four functions together and use it to find out strongest collocations for **AKL_verbs**. 

The ouput format sholud be `(base word, collocate, distance)`.

In [12]:
AKL_verbs = ['argue', 'can', 'consist', 'contrast', 'favour', 'lack', 'may', 
            'neglect', 'participate', 'present', 'rely', 'suggest']

In [13]:
def combination(base_word: str, base_collocate: dict):
    filtered_by_C1 = C1_filter(base_word, base_collocate)
    filtered_by_C2 = C2_filter(base_word, filtered_by_C1, base_collocate)
    filtered_by_C3 = C3_filter(base_word, filtered_by_C2, base_collocate)
    strongest_collocation = find_strongest_collocation(base_word, filtered_by_C3)
    print('({}, {}, {})'.format(strongest_collocation[0], strongest_collocation[1], strongest_collocation[2]))

In [14]:
for v in AKL_verbs:
    combination(v, base_collocate)

(argue, that, 1)
(can, be, 1)
(consist, of, 1)
(contrast, in, -1)
(favour, of, 1)
(lack, of, 1)
(may, be, 1)
(neglect, of, 1)
(participate, in, 1)
(present, with, -3)
(rely, on, 1)
(suggest, that, 1)


<font color="green">Expected output: </font>

> ('argue', 'that', 1)   
> ('can', 'be', 1)   
> ('consist', 'of', 1)   
> ('contrast', 'in', -1)   
> ('favour', 'of', 1)   
> ('lack', 'of', 1)   
> ('may', 'be', 1)   
> ('neglect', 'of', 1)   
> ('participate', 'in', 1)   
> ('present', 'with', -3)   
> ('rely', 'on', 1)   
> ('suggest', 'that', 1)  

## TA's Notes

If you complete the Assignment, please use [this link](https://docs.google.com/spreadsheets/d/1QGeYl5dsD9sFO9SYg4DIKk-xr-yGjRDOOLKZqCLDv2E/edit#gid=206119035) to reserve demo time.  
The score is only given after TAs review your implementation, so <u>**make sure you make a appointment with a TA before you miss the deadline**</u> .  <br>After demo, please upload your assignment to eeclass. You just need to hand in this ipynb file and rename it as XXXXXXXXX(Your student ID).ipynb.
<br>Note that **late submission will not be allowed**.  

## Reference
[Frank Smadja, Retrieving Collocations from Texts: Xtract, Computational Linguistics, Volume 19, 1993](https://aclanthology.org/J93-1007.pdf)