# 1. Language Modeling

In this part, let's generate text using a trigram language model.

Go to https://drive.google.com/drive/folders/1pR0koayRSgXfTD72HZUHN14uec0SrnXy?usp=sharing and click add shortcut to drive. This will add the data required for this problem set to your Google drive.

<img src="https://drive.google.com/uc?id=1LqHisiziX8Ri94Xs6Cv8mhx6vivFM3kS" alt="Drawing" height="300"/>


Run the below code snippet. It will generate a URL which generates an authorization code.* Enter it below to give Colab access to your Google drive. 

*Copy function may not work. If so, manually copy the authorization code.

In [1]:
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)

Mounted at /content/drive/


When you run the `ls` command below, you should see these folders.




In [2]:
!ls "/content/drive/My Drive/nl2ds"

semantic-parser  tweets


Import packages.

In [3]:
import json
import string
import nltk
import gensim
import numpy as np
from math import log
from collections import Counter
from nltk.data import find
from sklearn.linear_model import LogisticRegression

nltk.download('word2vec_sample')

[nltk_data] Downloading package word2vec_sample to /root/nltk_data...
[nltk_data]   Package word2vec_sample is already up-to-date!


True

Let's load the trigrams first. You can change the below code as you see fit.

In [4]:
bigram_prefix_to_trigram = {}
bigram_prefix_to_trigram_weights = {}

lines = open("/content/drive/My Drive/nl2ds/tweets/covid-tweets-2020-08-10-2020-08-21.trigrams.txt").readlines()
# lines = open("nl2ds/tweets/covid-tweets-2020-08-10-2020-08-21.trigrams.txt").readlines()
for line in lines:
    word1, word2, word3, count = line.strip().split()
    if (word1, word2) not in bigram_prefix_to_trigram:
        bigram_prefix_to_trigram[(word1, word2)] = []
        bigram_prefix_to_trigram_weights[(word1, word2)] = []
    bigram_prefix_to_trigram[(word1, word2)].append(word3)
    bigram_prefix_to_trigram_weights[(word1, word2)].append(int(count))

# freeup memory
lines = None

## Problem 1.1: Retrieve top next words and their probability given a bigram prefix.

For the following prefixes **word1=middle, word2=of, and n=10**, the output is:



```
a 0.807981220657277
the 0.06948356807511737
pandemic 0.023943661971830985
this 0.016901408450704224
an 0.0107981220657277
...
...
...
```



In [5]:
def top_next_word(word1, word2, n=10):
    # write your code here
    all_word3 = bigram_prefix_to_trigram[(word1, word2)]
    all_word3_weights = bigram_prefix_to_trigram_weights[(word1, word2)]
    
    sum_weights = sum(all_word3_weights)
    
    next_words = []
    probs = []
    
    for i in range(n):
        if i < len(all_word3):
            next_words.append(all_word3[i])
            probs.append(all_word3_weights[i]/sum_weights)
        
    return next_words, probs

next_words, probs = top_next_word("middle", "of", 10)
for word, prob in zip(next_words, probs):
    print(word, prob)

a 0.807981220657277
the 0.06948356807511737
pandemic 0.023943661971830985
this 0.016901408450704224
an 0.0107981220657277
covid 0.009389671361502348
nowhere 0.008450704225352112
it 0.004694835680751174
lockdown 0.002347417840375587
summer 0.002347417840375587


## Problem 1.2: Sampling n words

Sample next n words given a bigram prefix. Use the probablity distribution defined by the frequency counts. Functions like **numpy.random.choice** will be useful here. Sample without repitition, otherwise all your samples will contain the most frequent trigram.


For the following prefixes **word1=middle, word2=of, and n=10**, the output could be as follows (our outputs may differ): 

```
a 0.807981220657277
pandemic 0.023943661971830985
nowhere 0.008450704225352112
the 0.06948356807511737
...
...
...
...
...
```



In [6]:
def sample_next_word(word1, word2, n=10):
    # write your code here
    all_word3 = bigram_prefix_to_trigram[(word1, word2)]
    all_word3_weights = bigram_prefix_to_trigram_weights[(word1, word2)]
    
    sum_weights = sum(all_word3_weights)
        
    next_words = []
    probs = []
    
    if len(all_word3) >= n:
        indices = np.random.choice(len(all_word3), n, replace=False)
    else:
        indices = np.random.choice(len(all_word3), len(all_word3), replace=False)
    
    for i in indices:
        next_words.append(all_word3[i])
        probs.append(all_word3_weights[i]/sum_weights)
        
    return next_words, probs


next_words, probs = sample_next_word("middle", "of", 10)
for word, prob in zip(next_words, probs):
    print(word, prob)

planning 0.00046948356807511736
night 0.00046948356807511736
may 0.0009389671361502347
stage 0.0018779342723004694
their 0.00046948356807511736
tour 0.00046948356807511736
highway 0.00046948356807511736
#covid19 0.0018779342723004694
armageddon 0.00046948356807511736
writing 0.00046948356807511736


## Problem 1.3: Generate sentences starting with a prefix

Generates n-sentences starting with a given sentence prefix. Use [beam search](https://en.wikipedia.org/wiki/Beam_search) to generate multiple sentences. Depending on which method you use to generate next word, you will get different outputs. When you generate <EOS> in a path, stop exploring that path. If you are not careful with your implementation, you may end up in an infinite loop.

If you use the method `word_generator=top_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> trump`, your output is as follows:
```
<BOS1> <BOS2> trump eyes new unproven coronavirus treatment URL <EOS> 0.00021893147502903603
<BOS1> <BOS2> trump eyes new unproven coronavirus cure URL <EOS> 0.0001719607222046247
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by mypillow ceo over unproven therapeutic URL <EOS> 9.773272077557522e-05
...
...
...
```


If you use the method `word_generator=top_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> biden`, your output is as follows:
```
<BOS1> <BOS2> biden calls for a 30 bonus URL #cashgem #cashappfriday #stayathome <EOS> 0.0002495268686322749
<BOS1> <BOS2> biden says all u.s. governors should mandate masks <EOS> 1.6894510541025754e-05
<BOS1> <BOS2> biden says all u.s. governors question cost of a pandemic <EOS> 8.777606198953028e-07
...
...
...
```


If you use the method `word_generator=sample_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> trump`, your output may look as follows (since this is sampling, our outputs will difer):

```
<BOS1> <BOS2> trump signs executive orders URL <EOS> 7.150992253427233e-05
<BOS1> <BOS2> trump signs executive actions URL <EOS> 7.117242889600614e-05
<BOS1> <BOS2> trump news president attacked over it <EOS> 1.0546494007903964e-05
<BOS1> <BOS2> trump news president attacked over executive orders URL <EOS> 1.0126405114118984e-05
```

If you use the method `word_generator=sample_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> biden`, your output may look as follows:

```
<BOS1> <BOS2> biden harris 2020 <EOS> 0.0015758924114719264
<BOS1> <BOS2> biden harris 2020 URL <EOS> 0.0006443960952032196
<BOS1> <BOS2> biden calls for evictions ban so marylander 's do it URL <EOS> 4.105215709355001e-07
<BOS1> <BOS2> biden calls for evictions ban so marylander 's do our best to stay home <EOS> 1.3158806336098573e-09
...
...
...
...
...
```

Hope you see that sampling gives different outputs compared to deterministically picking the top n-words.


In [7]:
# assume n=10 for sampler
def generate_sentences(prefix, sampler, beam=10):
    # write your code
    n = 10

    # len(sentences) = beam = 10
    # len(all_probs) = beam = 10
    sentences = []
    all_probs = []
    
    trigram = prefix.split()
    word1 = trigram[0]
    word2 = trigram[1]
    word3 = trigram[2]
    
    # add first three words to sentences
    for i in range(beam):
        sentences.append(f'{word1} {word2} {word3}')
        all_probs.append(1)
        
    if word3 == '<EOS>':
        return sentences, all_probs
    
    else:
        # list of n=10 words and list of n=10 probabilities
        next_words_23, probs_23 = sampler(word2, word3, n)

        # beam*n = 10*10 = 100
        sentences_100 = []
        all_probs_100 = []

        # count for populating 100 list
        count = 0
        # get word4
        for i in range(len(next_words_23)):
            word4 = next_words_23[i]
            prob4 = probs_23[i]

            # add to 10 list
            sentences[i] = sentences[i] + ' ' + word4
            all_probs[i] = all_probs[i]*prob4
            
            if word4 != '<EOS>':               
                next_words_34, probs_34 = sampler(word3, word4, n)

                # get word5
                for j in range(len(next_words_34)):
                    word5 = next_words_34[j]
                    prob5 = probs_34[j]

                    # populate 100 list
                    sentences_100.append(sentences[i])
                    all_probs_100.append(all_probs[i])

                    # add to 100 list
                    sentences_100[count] = sentences_100[count] + ' ' + word5
                    all_probs_100[count] = all_probs_100[count]*prob5

                    count += 1

        # top 10 sorted
        top_10 = [(x,y) for y,x in sorted(zip(all_probs,sentences), key=lambda pair: pair[0], reverse=True)]
        top_10 = [(x,y) for (x,y) in top_10 if x[-5:] == '<EOS>']

        # top 100 sorted
        top_100 = [(x,y) for y,x in sorted(zip(all_probs_100,sentences_100), key=lambda pair: pair[0], reverse=True)]

        while (len(top_10) < beam):
            for i in range(beam-len(top_10)):
                if i < len(top_100):
                    top_10.append(top_100[i])

            sentences = [x for (x,y) in top_10]
            all_probs = [y for (x,y) in top_10]

            sentences_100 = []
            all_probs_100 = []

            count = 0
            # if not <EOS>, then find next word
            for i in range(len(sentences)):
                x = sentences[i]
                if x[-5:] != '<EOS>':
                    ngram = x.split()
                    word1 = ngram[-2]
                    word2 = ngram[-1]

                    next_words_12, probs_12 = sampler(word1, word2, n)

                    # get word3
                    for j in range(len(next_words_12)):
                        word3 = next_words_12[j]
                        prob3 = probs_12[j]

                        sentences_100.append(sentences[i])
                        all_probs_100.append(all_probs[i])

                        sentences_100[count] = sentences_100[count] + ' ' + word3
                        all_probs_100[count] = all_probs_100[count]*prob3

                        count += 1

            # top 10 sorted
            top_10 = [(x,y) for y,x in sorted(zip(all_probs,sentences), key=lambda pair: pair[0], reverse=True)]
            top_10 = [(x,y) for (x,y) in top_10 if x[-5:] == '<EOS>']

            # top 100 sorted
            top_100 = [(x,y) for y,x in sorted(zip(all_probs_100,sentences_100), key=lambda pair: pair[0], reverse=True)]

        sentences = [x for (x,y) in top_10]
        all_probs = [y for (x,y) in top_10]

        return sentences, all_probs

In [8]:
sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> trump", beam=10, sampler=top_next_word)
for sent, prob in zip(sentences, probs):
    print(sent, prob)
print("--------------------------------------------------")

sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> biden", beam=10, sampler=top_next_word)
for sent, prob in zip(sentences, probs):
    print(sent, prob)
print("--------------------------------------------------")

sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> trump", beam=10, sampler=sample_next_word)
for sent, prob in zip(sentences, probs):
    print(sent, prob)
print("--------------------------------------------------")

sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> biden", beam=10, sampler=sample_next_word)
for sent, prob in zip(sentences, probs):
    print(sent, prob)
print("--------------------------------------------------")

<BOS1> <BOS2> trump eyes new unproven coronavirus treatment URL <EOS> 0.00021893147502903603
<BOS1> <BOS2> trump eyes new unproven coronavirus cure URL <EOS> 0.0001719607222046247
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by mypillow ceo over unproven therapeutic URL <EOS> 9.773272077557522e-05
<BOS1> <BOS2> trump eyes new unproven coronavirus therapeutic mypillow creator over unproven therapeutic URL <EOS> 8.212549111137046e-05
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by ben carson and mypillow founder URL <EOS> 1.2095697936835552e-05
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by mypillow ceo over unproven therapeutic URL via @USER <EOS> 7.432226908194607e-06
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by mypillow ceo over unproven and dangerous <EOS> 5.61685494684627e-06
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by mypillow ceo over unproven and dangerous covid-19 treatment URL <EOS> 5.235550241426875e-06
<

# 2. Semantic Parsing

In this part, you are going to build your own virtual assistant! We will be developing two modules: an intent classifier and a slot filler.

In [9]:
!ls "/content/drive/My Drive/nl2ds/semantic-parser"
parser_files = "/content/drive/My Drive/nl2ds/semantic-parser"
# parser_files = "nl2ds/semantic-parser"

test_answers.txt  test_questions.txt  train_questions_answers.txt


In [10]:
train_data = []
for line in open(f'{parser_files}/train_questions_answers.txt'):
    train_data.append(json.loads(line))

# print a few examples
for i in range(5):
    print(train_data[i])
    print("-"*100)

{'question': 'Add an album to my Sylvia Plath playlist.', 'intent': 'AddToPlaylist', 'slots': {'music_item': 'album', 'playlist_owner': 'my', 'playlist': 'Sylvia Plath'}}
----------------------------------------------------------------------------------------------------
{'question': 'add Diarios de Bicicleta to my la la playlist', 'intent': 'AddToPlaylist', 'slots': {'playlist': 'Diarios de Bicicleta', 'playlist_owner': 'my', 'entity_name': 'la la'}}
----------------------------------------------------------------------------------------------------
{'question': 'book a table at a restaurant in Lucerne Valley that serves chicken nugget', 'intent': 'BookRestaurant', 'slots': {'restaurant_type': 'restaurant', 'city': 'Lucerne Valley', 'served_dish': 'chicken nugget'}}
----------------------------------------------------------------------------------------------------
{'question': 'add iemand als jij to my playlist named In The Name Of Blues', 'intent': 'AddToPlaylist', 'slots': {'entity

In [11]:
test_questions = []
for line in open(f'{parser_files}/test_questions.txt'):
    test_questions.append(json.loads(line))

test_answers = []
for line in open(f'{parser_files}/test_answers.txt'):
    test_answers.append(json.loads(line))

# print a few examples
for i in range(5):
    print(test_questions[i])
    print(test_answers[i])
    print("-"*100)

Add an artist to Jukebox Boogie Rhythm & Blues
{'intent': 'AddToPlaylist', 'slots': {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}}
----------------------------------------------------------------------------------------------------
Will it be rainy at Sunrise in Ramey Saudi Arabia?
{'intent': 'GetWeather', 'slots': {'condition_description': 'rainy', 'timeRange': 'Sunrise', 'city': 'Ramey', 'country': 'Saudi Arabia'}}
----------------------------------------------------------------------------------------------------
Weather in two hours  in Uzbekistan
{'intent': 'GetWeather', 'slots': {'timeRange': 'in two hours', 'country': 'Uzbekistan'}}
----------------------------------------------------------------------------------------------------
Will there be a cloud in VI in 14 minutes ?
{'intent': 'GetWeather', 'slots': {'condition_description': 'cloud', 'state': 'VI', 'timeRange': 'in 14 minutes'}}
--------------------------------------------------------------------

## Problem 2.1: Keyword-based intent classifier

In this part, you will build a keyword-based intent classifier. For each intent, come up with a list of keywords that are important for that intent, and then classify a given question into an intent. If an input question matches multiple intents, pick the best one. If it does not match any keyword, return None.

Caution: You are allowed to look at training questions and answers to come up with a set of keywords, but it is a bad practice to look at test answers. 

In [12]:
# list of all intents
intents = set()
for example in train_data:
    intents.add(example['intent'])
print(intents)

{'BookRestaurant', 'AddToPlaylist', 'GetWeather'}


In [13]:
# 1. Get top 50 words for each intent by frequency
# 2. Manually choose 12 words as keywords
# 3. Predict intent based on number of matches to keyword lists

def get_keywords(train_data, intents, n=50):
    keywords = {intent:{} for intent in intents}
    
    for data in train_data:
        intent = data['intent']
        if intent in keywords:
            question = data['question'].lower()
            question_copy = question
            
            # replace punctuation with space
            for c in question_copy:
                if c in string.punctuation:
                    question = question.replace(c, ' ')
                    
            # split question by space
            question = question.split()
                    
            for word in question:
                if word.isalpha():
                    if word in keywords[intent]:
                        keywords[intent][word] += 1
                    else:
                        keywords[intent][word] = 1
                    
    n_keywords = {intent:[] for intent in intents}
    
    for intent in keywords:
        sorted_intent = dict(sorted(keywords[intent].items(), key=lambda item: item[1], reverse=True))
        
        sorted_keys = list(sorted_intent.keys())
        n_keywords[intent] = sorted_keys[:n]
        
    return n_keywords

# classify a given question into an intent
def predict_intent_using_keywords(question):
    # fill in your code here
    
    # for each intent, come up with a list of keywords that are improtant for that intent
    # look at training questions and answers to come up with a set of keywords
    keywords_p = ['add', 'playlist', 'tune', 'song', 'music', 'track', 'album', 'artist', 'metal', 'rock', 'indie', 'pop']
    keywords_w = ['weather', 'forecast', 'current', 'hot', 'warm', 'cold', 'chill', 'freezing', 'area', 'what', 'tell', 'park']
    keywords_r = ['book', 'restaurant', 'table', 'reservation', 'serve', 'spot', 'rated', 'food', 'bar', 'eat', 'two', 'make']

    n_keywords = {'AddToPlaylist': keywords_p, 'GetWeather': keywords_w, 'BookRestaurant': keywords_r}
    
    count_p = 0
    count_w = 0
    count_r = 0

    for k in keywords_p:
        if k in question:
            count_p += 1
            
    for k in keywords_w:
        if k in question:
            count_w += 1
            
    for k in keywords_r:
        if k in question:
            count_r += 1
    
    all_counts = {'AddToPlaylist': count_p, 'GetWeather': count_w, 'BookRestaurant': count_r}    
    sorted_counts = dict(sorted(all_counts.items(), key=lambda item: item[1], reverse=True))
        
    sorted_keys = list(sorted_counts.keys())
    
    return sorted_keys[0]

In [14]:
'''Gives intent wise accuracy of your model'''
def evaluate_intent_accuracy(prediction_function_name):
    correct = Counter()
    total = Counter()
    for i in range(len(test_questions)):
        q = test_questions[i]
        gold_intent = test_answers[i]['intent']
        if prediction_function_name(q) == gold_intent:
            correct[gold_intent] += 1
        total[gold_intent] += 1
    for intent in intents:
        print(intent, correct[intent]/total[intent], total[intent])
    
# Evaluating the intent classifier. 
# In our implementation, a simple keyword based classifier has achieved an accuracy of greater than 65 for each intent
evaluate_intent_accuracy(predict_intent_using_keywords)

BookRestaurant 0.97 100
AddToPlaylist 1.0 100
GetWeather 0.78 100


## Problem 2.2: Statistical intent classifier

Now, let's build a statistical intent classifier. Instead of making use of keywords like what you did above, you will first extract features from a given input question. In order to build a feature representation for a given sentence, make use of word2vec embeddings of each word and take an average to represent the sentence. Then train a logistic regression. Feel free to use any libraries you like.

In [15]:
word2vec_sample = str(find('models/word2vec_sample/pruned.word2vec.txt'))
word2vec_model = gensim.models.KeyedVectors.load_word2vec_format(word2vec_sample, binary=False)

In [16]:
# extract features from a given input question
# to build a feature representation for a given sentence, make use of word2vec embeddings of each word and
# take an average to represent the sentence
def build_sentence_features(question):    
    question = question.lower()
    question_copy = question

    # replace punctuation with space
    for c in question_copy:
        if c in string.punctuation:
            question = question.replace(c, ' ')

    # split question by space
    question = question.split()
    
    count = 0
    for word in question:
        if word in word2vec_model:
            word_embedding = word2vec_model[word]
            word_embedding = np.array([word_embedding])
        # word embedding filled with zeros
        else:
            word_embedding = np.zeros((1,300))
            
        if count == 0:
            all_embeddings = word_embedding
            count += 1
        else:
            all_embeddings = np.concatenate((all_embeddings, word_embedding), axis=0)
            
    average = np.mean(all_embeddings, axis=0)
            
    return average
                
def get_X(train_data):
    count = 0
    
    for data in train_data:
        question = data['question']
        average = build_sentence_features(question)
        average = np.array([average])
        
        if count == 0:
            X = average
            count += 1
        else:
            X = np.concatenate((X, average), axis=0)
            
    return X

def get_y(train_data):
    count = 0
    
    for data in train_data:
        intent = data['intent']
        
        if count == 0:
            y = np.array([intent])
            count += 1
        else:
            y = np.append(y, [intent])
            
    return y
            
X_train = get_X(train_data)
y_train = get_y(train_data)

In [17]:
'''Trains a logistic regression model on the entire training data. For an input question (x), the model learns to predict an intent (Y).'''
def train_logistic_regression_intent_classifier():
    # fill in your code here
    lr_model = LogisticRegression()
    lr_model.fit(X_train, y_train)
    
    return lr_model

lr_model = train_logistic_regression_intent_classifier()

In [18]:
'''For an input question, the model predicts an intent'''
def predict_intent_using_logistic_regression(question):
    # fill in your code here
    X_test = np.array([build_sentence_features(question)])
    C = lr_model.predict(X_test)
    
    return C

In [19]:
# evaluate the intent classifier
# your intent classifier performance will be close to 100 if you have done a good job
evaluate_intent_accuracy(predict_intent_using_logistic_regression)

BookRestaurant 1.0 100
AddToPlaylist 1.0 100
GetWeather 1.0 100


## Problem 2.3: Slot filling

Build a slot filling model. We will just work with `AddToPlaylist` intent. Ignore other intents.

Hint: No need to rely on machine learning here. You can use ideas like maximum string matching to identify which slots are active and what thier values are. This problem's solution is intentionally left underspecified.

In [20]:
# let's stick to one target intent
target_intent = "AddToPlaylist"

# this intent has the following slots
target_intent_slot_names = set()
for sample in train_data:
    if sample['intent'] == target_intent:
        for slot_name in sample['slots']:
            target_intent_slot_names.add(slot_name)
print(target_intent_slot_names)

# extract all the relevant questions of this target intent from the test examples
target_intent_questions = [] 
for i, question in enumerate(test_questions):
    if test_answers[i]['intent'] == target_intent:
        target_intent_questions.append(question)
print(len(target_intent_questions))

# extract all the relevant questions of this target intent from the train examples
target_intent_data = [] 
for sample in train_data:
    if sample['intent'] == target_intent:
        target_intent_data.append(sample)
print(len(target_intent_data))

{'playlist', 'music_item', 'playlist_owner', 'artist', 'entity_name'}
100
1942


In [21]:
def initialize_slots():
    slots = {}
    for slot_name in target_intent_slot_names:
        slots[slot_name] = None

    for data in target_intent_data:
        for slot_name in data['slots']:
            if slots[slot_name] == None:
                slots[slot_name] = set()
            
            cleaned_value = data['slots'][slot_name].lower()
            slots[slot_name].add(cleaned_value)
            
    for slot_name in slots:
        sorted_values = sorted(slots[slot_name])
        slots[slot_name] = sorted_values

    return slots

def predict_slot_values(question):
    slots = initialize_slots()
    predicted_slots = {}
    for slot_name in target_intent_slot_names:
        # fill in your code to idenfity the slot value
        predicted_slots[slot_name] = None

        for unique_value in slots[slot_name]:
            if unique_value in question.lower():
                predicted_slots[slot_name] = unique_value
            
    return predicted_slots

def evaluate_slot_prediction_recall(slot_prediction_function):
    correct = Counter()
    total = Counter()
    # predict slots for each question
    for i, question in enumerate(target_intent_questions):
        i = test_questions.index(question) 
        gold_slots = test_answers[i]['slots']
        predicted_slots = slot_prediction_function(question)
        for name in target_intent_slot_names:
            if name in gold_slots:
                total[name] += 1.0
                if (predicted_slots.get(name, None) != None) and (predicted_slots.get(name).lower() == gold_slots.get(name).lower()): # This line is updated after the assignment release
                    correct[name] += 1.0
                # else:
                    # print(predicted_slots.get(name), '|', gold_slots.get(name))
    for name in target_intent_slot_names:
        print(f"{name}: {correct[name] / total[name]}")


# Our reference implementation got these numbers. You can ask others on Slack what they got.
# music_item 1.0
# playlist 0.67
# artist  0.021739130434782608
# playlist_owner 0.9444444444444444
# entity_name 0.05555555555555555
print("Slot accuracy for your slot prediction model")
evaluate_slot_prediction_recall(predict_slot_values)


Slot accuracy for your slot prediction model
playlist: 0.81
music_item: 1.0
playlist_owner: 0.9444444444444444
artist: 0.13043478260869565
entity_name: 0.05555555555555555


In [22]:
def tp_fp_tn_fn(slot_prediction_function):
    tp_list = []
    fp_list = []
    tn_list = []
    fn_list = []

    # predict slots for each question
    for i, question in enumerate(target_intent_questions):
        i = test_questions.index(question) 
        gold_slots = test_answers[i]['slots']
        predicted_slots = slot_prediction_function(question)
        for name in target_intent_slot_names:
            if name in gold_slots:
                if (predicted_slots.get(name, None) != None) and (predicted_slots.get(name).lower() == gold_slots.get(name).lower()): # This line is updated after the assignment release
                    tp = {}
                    tp['Question'] = question
                    tp['Ground truth slots'] = gold_slots
                    tp['Predicted'] = {name: predicted_slots.get(name, None)}
                    tp_list.append(tp)
                    # break
                elif (predicted_slots.get(name, None) == None):
                    fn = {}
                    fn['Question'] = question
                    fn['Ground truth slots'] = gold_slots
                    fn['Predicted'] = {name: predicted_slots.get(name, None)}
                    fn_list.append(fn)
                    # break
            else:
                if (predicted_slots.get(name, None) == None):
                    tn = {}
                    tn['Question'] = question
                    tn['Ground truth slots'] = gold_slots
                    tn['Predicted'] = {name: predicted_slots.get(name, None)}
                    tn_list.append(tn)
                    # break
                elif (predicted_slots.get(name, None) != None):
                    fp = {}
                    fp['Question'] = question
                    fp['Ground truth slots'] = gold_slots
                    fp['Predicted'] = {name: predicted_slots.get(name, None)}
                    fp_list.append(fp)
                    # break

    return tp_list, fp_list, tn_list, fn_list

tp_list, fp_list, tn_list, fn_list = tp_fp_tn_fn(predict_slot_values)

In [23]:
# find a true positive prediction for each slot
# fill in your code below along with printing your prediction and gold answer
for i in range(3):
    print('Question:', tp_list[i]['Question'])
    print('Ground truth slots:', tp_list[i]['Ground truth slots'])
    print('Predicted:', tp_list[i]['Predicted'])
    print('--------------------------------------------------')

Question: Add an artist to Jukebox Boogie Rhythm & Blues
Ground truth slots: {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}
Predicted: {'playlist': 'jukebox boogie rhythm & blues'}
--------------------------------------------------
Question: Add an artist to Jukebox Boogie Rhythm & Blues
Ground truth slots: {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}
Predicted: {'music_item': 'artist'}
--------------------------------------------------
Question: add nuba to my Metal Party playlist
Ground truth slots: {'entity_name': 'nuba', 'playlist_owner': 'my', 'playlist': 'Metal Party'}
Predicted: {'playlist_owner': 'my'}
--------------------------------------------------


In [24]:
# find a false positive prediction for each slot
# fill in your code below along with print statement
for i in range(3):
    print('Question:', fp_list[i]['Question'])
    print('Ground truth slots:', fp_list[i]['Ground truth slots'])
    print('Predicted:', fp_list[i]['Predicted'])
    print('--------------------------------------------------')

Question: Can you put this song from Yutaka Ozaki onto my this is miles davis playlist?
Ground truth slots: {'music_item': 'song', 'artist': 'Yutaka Ozaki', 'playlist_owner': 'my', 'playlist': 'this is miles davis'}
Predicted: {'entity_name': 'om'}
--------------------------------------------------
Question: Add porter wagoner to the The Sleep Machine Waterscapes playlist.
Ground truth slots: {'artist': 'porter wagoner', 'playlist': 'The Sleep Machine Waterscapes'}
Predicted: {'entity_name': 'go'}
--------------------------------------------------
Question: Add the chris clark tune to my women of the blues playlist.
Ground truth slots: {'artist': 'chris clark', 'music_item': 'tune', 'playlist_owner': 'my', 'playlist': 'women of the blues'}
Predicted: {'entity_name': 'om'}
--------------------------------------------------


In [25]:
# find a true negative prediction for each slot
# fill in your code below along with a print statement
for i in range(3):
    print('Question:', tn_list[i]['Question'])
    print('Ground truth slots:', tn_list[i]['Ground truth slots'])
    print('Predicted:', tn_list[i]['Predicted'])
    print('--------------------------------------------------')

Question: Add an artist to Jukebox Boogie Rhythm & Blues
Ground truth slots: {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}
Predicted: {'playlist_owner': None}
--------------------------------------------------
Question: Add an artist to Jukebox Boogie Rhythm & Blues
Ground truth slots: {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}
Predicted: {'artist': None}
--------------------------------------------------
Question: Add an artist to Jukebox Boogie Rhythm & Blues
Ground truth slots: {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}
Predicted: {'entity_name': None}
--------------------------------------------------


In [26]:
# find a false negative prediction for each slot
# fill in your code below along with a print statement
for i in range(3):
    print('Question:', fn_list[i]['Question'])
    print('Ground truth slots:', fn_list[i]['Ground truth slots'])
    print('Predicted:', fn_list[i]['Predicted'])
    print('--------------------------------------------------')

Question: add nuba to my Metal Party playlist
Ground truth slots: {'entity_name': 'nuba', 'playlist_owner': 'my', 'playlist': 'Metal Party'}
Predicted: {'entity_name': None}
--------------------------------------------------
Question: Add the album to the The Sweet Suite playlist.
Ground truth slots: {'music_item': 'album', 'playlist': 'The Sweet Suite'}
Predicted: {'playlist': None}
--------------------------------------------------
Question: Can you put this song from Yutaka Ozaki onto my this is miles davis playlist?
Ground truth slots: {'music_item': 'song', 'artist': 'Yutaka Ozaki', 'playlist_owner': 'my', 'playlist': 'this is miles davis'}
Predicted: {'playlist': None}
--------------------------------------------------
