# Intent Recognition with Sequential Models and Word2Vec
The goal of this notebook is to classify the intent of a sentence. For demonstration, we use the ATIS (Airline Travel Information System) dataset.
The steps are:
- Reading the dataset (from IOB files) and understanding the labels
- Encoding the intent labels
- Loading the word2vec model and embedding the words
- Creating our sequential model (Bi-RNN) with PyTorch
- Splitting the data and training our model
- Testing the model

## Reading the dataset and understanding the labels

In [8]:
import random

import pandas as pd

# fetch_data is a helper from this project's utils module
from utils import fetch_data

sents, labels, intents = fetch_data('data2/atis.train.w-intent.iob')

def display(n):
    """Print the intent of sentence n and return its words and labels as a DataFrame."""
    print("INTENT : ", intents[n])
    sense = [{"word": word, "label": label} for word, label in zip(sents[n], labels[n])]
    return pd.DataFrame(sense)

print("Number of sentences : {}".format(len(sents)))
print("Number of unique intents : {}".format(len(set(intents))))

Number of sentences : 4978
Number of unique intents : 22


In [18]:
# sents - List of sentences where each sentence is a list of words
# intents - List of labelled intents
# randint includes both endpoints, so the upper bound must be len(sents) - 1
display(random.randint(0, len(sents) - 1))

('INTENT : ', 'atis_flight')


| | label | word |
|---|---|---|
| 0 | O | i |
| 1 | O | want |
| 2 | O | to |
| 3 | O | fly |
| 4 | B-fromloc.city_name | dallas |
| 5 | O | to |
| 6 | B-toloc.city_name | san |
| 7 | I-toloc.city_name | francisco |
| 8 | O | on |
| 9 | B-depart_date.day_name | monday |
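To make the label scheme concrete, here is a minimal sketch of how the IOB tags in the example sentence group into slots: a `B-` tag opens a slot, a matching `I-` tag continues it, and `O` marks words outside any slot. The function name `extract_slots` is hypothetical and is not part of the notebook's `utils` module.

```python
def extract_slots(words, tags):
    """Group IOB-tagged words into (slot_name, phrase) pairs."""
    slots = []
    current_name, current_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new slot; close any open one first.
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_name == tag[2:]:
            # An I- tag continues the slot opened by the preceding B- tag.
            current_words.append(word)
        else:
            # "O" (or a stray I- tag) ends any open slot.
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = None, []
    if current_name is not None:
        slots.append((current_name, " ".join(current_words)))
    return slots


# The example sentence and labels shown in the output of display()
words = ["i", "want", "to", "fly", "dallas", "to", "san", "francisco", "on", "monday"]
tags = ["O", "O", "O", "O", "B-fromloc.city_name", "O",
        "B-toloc.city_name", "I-toloc.city_name", "O", "B-depart_date.day_name"]

slots = extract_slots(words, tags)
print(slots)
```

Note that `san` and `francisco` end up in a single `toloc.city_name` slot. Intent classification in this notebook uses only the sentence-level intent, so these word-level labels are shown here for understanding rather than used for training.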


## Encoding the intent labels

In [29]:
from sklearn import preprocessing

# Map each distinct intent string to an integer class id
intent_encoder = preprocessing.LabelEncoder()
enc_intents = intent_encoder.fit_transform(intents)
pd.DataFrame({"Intents": intents[:5], "Encoded Intents": enc_intents[:5]})

| | Encoded Intents | Intents |
|---|---|---|
| 0 | 12 | atis_flight |
| 1 | 12 | atis_flight |
| 2 | 15 | atis_flight_time |
| 3 | 3 | atis_airfare |
| 4 | 3 | atis_airfare |
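`LabelEncoder` assigns each label its position in the sorted list of distinct labels, which is why `atis_airfare` receives a smaller id than `atis_flight`. The pure-Python sketch below illustrates that mapping on a toy list. It is not scikit-learn's implementation, and `fit_label_mapping` is a hypothetical helper name.

```python
def fit_label_mapping(labels):
    """Map each distinct label to its index in sorted order."""
    classes = sorted(set(labels))
    to_index = {label: idx for idx, label in enumerate(classes)}
    return classes, to_index


# Toy data: only three distinct intents, so the ids differ from the full dataset
toy_intents = ["atis_flight", "atis_flight", "atis_flight_time",
               "atis_airfare", "atis_airfare"]

classes, to_index = fit_label_mapping(toy_intents)
encoded = [to_index[intent] for intent in toy_intents]
decoded = [classes[idx] for idx in encoded]  # reverse lookup by position

print(classes)
print(encoded)
```

With the fitted encoder from the cell above, the same reverse lookup is available through `intent_encoder.inverse_transform`, and the sorted class list through `intent_encoder.classes_`.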


## Loading the word2vec model and embedding the words

In [36]:
from gensim.models import KeyedVectors
import pandas as pd
import os

# Path to the pretrained Google News vectors; adjust for your machine
MODEL_PATH = '/home/b/Downloads/GoogleNews-vectors-negative300.bin.gz'

if not os.path.exists(MODEL_PATH):
    raise ValueError("SKIP: You need to download the Google News model")

# limit caps how many word vectors are read from the file
w2v_model = KeyedVectors.load_word2vec_format(MODEL_PATH, binary=True, limit=2500000)

In [39]:
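def embed_sentence(sent):
    # Raises KeyError if any word is missing from the word2vec vocabulary
    return [w2v_model[word] for word in sent]

enc_sents = []    # sentences in which every word has a vector
exceptions = []   # sentences with at least one out-of-vocabulary word
for s in sents:
    try:
        enc_sents.append(embed_sentence(s))
    except KeyError:
        exceptions.append(s)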

In [42]:
print(len(enc_sents))
print(len(exceptions))

546
4432
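
The counts above show that only 546 of the 4978 sentences were embedded: a single out-of-vocabulary word raises `KeyError` and discards the whole sentence. One possible alternative, which is not what this notebook does, is to skip only the unknown words. The sketch below assumes the model supports `word in model` and `model[word]`, as gensim's `KeyedVectors` does. A plain dict with made-up vectors stands in for `w2v_model`, and `embed_known_words` is a hypothetical name.

```python
def embed_known_words(sent, model):
    """Return vectors for in-vocabulary words plus the list of skipped words."""
    vectors, skipped = [], []
    for word in sent:
        if word in model:
            vectors.append(model[word])
        else:
            skipped.append(word)
    return vectors, skipped


# Toy stand-in for w2v_model: the words and 2-d vectors are invented for illustration
toy_model = {"fly": [0.1, 0.2], "dallas": [0.3, 0.4], "monday": [0.5, 0.6]}

vectors, skipped = embed_known_words(["fly", "to", "dallas", "on", "monday"], toy_model)
print(len(vectors), skipped)
```

Skipping words keeps far more training data, at the cost of sentences whose embedded length no longer matches their word count. A sentence with no known words at all yields an empty list and would still need to be filtered out.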
