# Intent parsing

 author: Steeve Laquitaine

 TABLE OF CONTENTS

 * Packages
 * Paths
 * Parameters
 * Load data
 * Filtering
   * by query complexity
   * by grammatical mood
   * by syntactical similarity
 * Parsing


 Observations:

   * So far, the best parameters are:

       SEED            = " VB NP"
       THRES_NUM_SENT  = 1
       NUM_SENT        = 1
       THRES_SIM_SCORE = 1
       FILT_MOOD       = ("ask",)

# PACKAGES

In [1]:
import os
from collections import defaultdict

import pandas as pd
import spacy


In [2]:
proj_path = "/Users/steeve_laquitaine/desktop/CodeHub/intent/"
os.chdir(proj_path)
from intent.src.intent.nodes import features, parsing, preprocess, retrieval, similarity
from intent.src.tests import test_run


# PATHS

In [3]:
cfg_path = (
    proj_path + "intent/data/02_intermediate/cfg_25_02_2021_18_16_42.xlsx"
)
sim_path = proj_path + "intent/data/02_intermediate/sim_matrix.xlsx"

# PARAMETERS

In [4]:
SEED = " VB NP"  # seed syntax that queries are compared against
THRES_NUM_SENT = 1  # keep queries with at most one sentence
NUM_SENT = 1  # keep queries with exactly one sentence (used by the commented-out filter_n_sent_eq)
THRES_SIM_SCORE = 1  # keep queries whose syntax similarity to SEED is at least this score
FILT_MOOD = ("ask",)  # keep questions; options: "state", "wish-or-excl", "ask"

# LOAD DATA

In [5]:
cfg = pd.read_excel(cfg_path)
sim_matx = pd.read_excel(sim_path)
# test
test_run.test_len_similarity_matx(cfg, sim_matx)


# FILTERING

## by query complexity

In [6]:
cfg_cx = preprocess.filter_by_sent_count(cfg, THRES_NUM_SENT, verbose=True)
# cfg_cx = preprocess.filter_n_sent_eq(cfg, NUM_SENT, verbose=True)

There are 100 original queries.
88 after filtering < 1 sentence queries.
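The notebook does not show how `preprocess.filter_by_sent_count` splits a query into sentences. A minimal sketch of the idea, assuming a naive regex splitter (a sentence ends with `.`, `!` or `?`) and the "at most `THRES_NUM_SENT` sentences" rule from the parameter comments; both function names are hypothetical, not the project's code:

```python
import re


def count_sentences(text: str) -> int:
    # Hypothetical naive splitter: a sentence ends with one or more of . ! ?
    # followed by whitespace or the end of the text.
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def filter_by_sent_count(queries: list, max_sent: int) -> list:
    # Keep only queries with at most `max_sent` sentences.
    return [q for q in queries if count_sentences(q) <= max_sent]


queries = ["How do I track the card you sent me?", "I lost my card. Can you help?"]
print(filter_by_sent_count(queries, 1))
```

A regex splitter miscounts abbreviations such as "e.g."; iterating over `doc.sents` from a spaCy pipeline would be more robust, at the cost of loading a model.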


## by grammatical mood

In [7]:
cfg_mood = preprocess.filter_in_only_mood(cfg_cx, FILT_MOOD)
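Mood filtering amounts to keeping the rows whose mood label belongs to `FILT_MOOD`. A minimal sketch, assuming each query is a dict with a hypothetical `"mood"` key (the output table below stores the label in a `mood_1` column); this is an illustration, not `preprocess.filter_in_only_mood` itself:

```python
from __future__ import annotations

from typing import Iterable


def filter_in_only_mood(rows: list[dict], moods: Iterable[str], key: str = "mood") -> list[dict]:
    # Hypothetical sketch: keep rows whose mood label is one of `moods`.
    allowed = set(moods)
    return [row for row in rows if row.get(key) in allowed]


rows = [
    {"text": "Can I track my card?", "mood": "ask"},
    {"text": "Track my card.", "mood": "state"},
]
print(filter_in_only_mood(rows, ("ask",)))
```

The trailing comma in `FILT_MOOD = ("ask",)` matters: without it, `("ask")` is just the string `"ask"`, which `set()` turns into `{"a", "s", "k"}`, so no row would match.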

In [8]:
tag = parsing.chunk_cfg(cfg_mood["cfg"])
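The notebook does not show what `parsing.chunk_cfg` returns. One plausible reading, given that the `cfg` column holds productions such as `VP -> VB NP` and that `SEED` is `" VB NP"` with a leading space, is that it keeps the right-hand side of each production. A hypothetical sketch under that assumption:

```python
def chunk_cfg(cfgs):
    # Assumption: keep the right-hand side of each production, leading space
    # included, so "VP -> VB NP" becomes " VB NP" (the same form as SEED).
    return [cfg.split("->", 1)[1] if "->" in cfg else cfg for cfg in cfgs]


print(chunk_cfg(["VP -> VB NP", "VP -> VB PP"]))
```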

## by syntactical similarity

In [9]:
posting_list = retrieval.create_posting_list(tag)
sim_ranked = similarity.rank_nearest_to_seed(sim_matx, seed=SEED, verbose=True)
ranked = similarity.print_ranked_VPs(cfg_mood, posting_list, sim_ranked)
filtered = similarity.filter_by_similarity(ranked, THRES_SIM_SCORE)
# test [TODO]
test_run.test_rank_nearest_to_seed(sim_matx, seed=SEED)
test_run.test_posting_list(posting_list, sim_matx, seed=SEED)
test_run.test_get_posting_index(cfg_mood, posting_list, sim_ranked)

0 duplicated syntaxes were dropped.
9 querie(s) is(are) left after filtering.
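The cell above chains three steps: a posting list (an inverted index from each syntax to the queries that have it), a ranking of syntaxes by similarity to the seed, and a cut at `THRES_SIM_SCORE`. A minimal sketch of that pipeline, assuming the similarity matrix is a hypothetical dict of dicts (`sim[a][b]` is the similarity between syntaxes `a` and `b`) rather than the Excel-loaded DataFrame; the functions mirror the notebook's call names for readability but are not the code in `intent.src.intent.nodes`:

```python
from __future__ import annotations

from collections import defaultdict


def create_posting_list(tags: list[str]) -> dict[str, list[int]]:
    # Inverted index: each syntax tag -> positions of the queries that have it.
    posting = defaultdict(list)
    for i, tag in enumerate(tags):
        posting[tag].append(i)
    return dict(posting)


def rank_nearest_to_seed(sim: dict[str, dict[str, float]], seed: str) -> list[tuple[str, float]]:
    # Sort every syntax by its similarity to the seed, most similar first.
    return sorted(sim[seed].items(), key=lambda kv: kv[1], reverse=True)


def filter_by_similarity(tags: list[str], sim: dict[str, dict[str, float]],
                         seed: str, thres: float) -> list[tuple[int, float]]:
    # Return (query position, score) for queries whose syntax scores >= thres.
    posting = create_posting_list(tags)
    kept = []
    for syntax, score in rank_nearest_to_seed(sim, seed):
        if score < thres:
            break  # the list is sorted, so every later syntax is below thres too
        kept.extend((i, score) for i in posting.get(syntax, []))
    return kept


tags = [" VB NP", " VB PP", " VB NP"]
sim = {" VB NP": {" VB NP": 1.0, " VB PP": 0.5}, " VB PP": {" VB NP": 0.5, " VB PP": 1.0}}
print(filter_by_similarity(tags, sim, " VB NP", 1))
```

With a threshold of 1 and similarities capped at 1.0, only queries whose syntax is identical to the seed survive, which is consistent with every kept row in the output table having `score` 1.0.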


# PARSING

 * Apply dependency parsing to each query
 * Collect each intent's action (the ROOT verb) and object (the direct object, dobj)
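The two steps above reduce to reading dependency labels off the parsed tokens. A minimal sketch, assuming tokens expose `.text` and `.dep_` as spaCy `Token` objects do; `Tok` is a hypothetical stand-in so the example runs without a model, and the output keys follow the `intent`/`intendeed` keys printed below. This is an illustration, not `parsing.parse_intent`:

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Tok(NamedTuple):
    # Hypothetical stand-in for a spaCy Token: only the attributes used below.
    text: str
    dep_: str


def extract_intent(tokens: Iterable) -> dict[str, list[str]]:
    toks = list(tokens)  # materialize so generators can be scanned twice
    # Action: the ROOT of the dependency tree.
    result = {"intent": [t.text for t in toks if t.dep_ == "ROOT"]}
    # Object: direct objects; the key is omitted when there are none.
    objects = [t.text for t in toks if t.dep_ == "dobj"]
    if objects:
        result["intendeed"] = objects
    return result


print(extract_intent([Tok("track", "ROOT"), Tok("the", "det"), Tok("card", "dobj")]))
```

With spaCy installed, `extract_intent(spacy.load("en_core_web_sm")("Can I track my card?"))` should work the same way, because a `Doc` iterates over `Token` objects with `.text` and `.dep_`; the labels you get depend on the model. Omitting the object key when no `dobj` is found would explain entries like `{'intent': ['number']}` in the output below.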

In [10]:
intents = parsing.parse_intent(filtered)
intents

[{'intent': ['track'], 'intendeed': ['card', 'me']},
 {'intent': ['track'], 'intendeed': ['card']},
 {'intent': ['check'], 'intendeed': ['delivery']},
 {'intent': ['get'], 'intendeed': ['tracking']},
 {'intent': ['track'], 'intendeed': ['card']},
 {'intent': ['track'], 'intendeed': ['card', 'me']},
 {'intent': ['recieve'], 'intendeed': ['card']},
 {'intent': ['have'], 'intendeed': ['info']},
 {'intent': ['number']}]

# OUTPUT

In [12]:
filtered["intent"] = intents
out = cfg_mood.merge(filtered, left_index=True, right_index=True)
out


| Unnamed: 0.1 | level_0 | Unnamed: 0 | index | VP_x | annots | text | cfg | mood_0 | mood_1 | VP_y | score | intent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 26 | track the card you sent me | yes | How do I track the card you sent me? | VP -> VB NP | 0 | ask | track the card you sent me | 1.0 | {'intent': ['track'], 'intendeed': ['card', 'm... |
| 1 | 2 | 2 | 63 | track the card you sent to me | yes | Can I track the card you sent to me? | VP -> VB NP | 2 | ask | track the card you sent to me | 1.0 | {'intent': ['track'], 'intendeed': ['card']} |
| 7 | 9 | 9 | 54 | check the delivery of the card you sent | yes | How can I periodically check the delivery of t... | VP -> VB NP | 9 | ask | check the delivery of the card you sent | 1.0 | {'intent': ['check'], 'intendeed': ['delivery']} |
| 9 | 11 | 11 | 94 | get tracking on the card | yes | Could I get tracking on the card? | VP -> VB NP | 11 | ask | get tracking on the card | 1.0 | {'intent': ['get'], 'intendeed': ['tracking']} |
| 10 | 12 | 12 | 83 | track the card that was just sent to me | yes | Can I track the card that was just sent to me? | VP -> VB NP | 12 | ask | track the card that was just sent to me | 1.0 | {'intent': ['track'], 'intendeed': ['card']} |
| 29 | 37 | 37 | 138 | track the card that you sent me in the mail | yes | Can I track the card that you sent me in the m... | VP -> VB NP | 34 | ask | track the card that you sent me in the mail | 1.0 | {'intent': ['track'], 'intendeed': ['card', 'm... |
| 30 | 38 | 38 | 27 | recieve my new card | no | When will I recieve my new card? | VP -> VB NP | 35 | ask | recieve my new card | 1.0 | {'intent': ['recieve'], 'intendeed': ['card']} |
| 59 | 81 | 81 | 6 | have info about the card on delivery | yes | Do you have info about the card on delivery? | VP -> VB NP | 73 | ask | have info about the card on delivery | 1.0 | {'intent': ['have'], 'intendeed': ['info']} |
| 61 | 85 | 85 | 100 | share card tracking number | yes | can you share card tracking number? | VP -> VB NP | 75 | ask | share card tracking number | 1.0 | {'intent': ['number']} |
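`DataFrame.merge(..., left_index=True, right_index=True)` performs an inner join by default (`how="inner"`) and appends the default suffixes `_x` and `_y` to column names present in both frames. That is why only the filtered queries remain and why the table has both `VP_x` and `VP_y`. A stdlib sketch of that behaviour on hypothetical index-to-row dicts (an illustration of the semantics, not pandas' implementation):

```python
from __future__ import annotations


def merge_on_index(left: dict[int, dict], right: dict[int, dict],
                   suffixes: tuple[str, str] = ("_x", "_y")) -> dict[int, dict]:
    merged = {}
    # Inner join: only index labels present on both sides survive.
    for idx in sorted(left.keys() & right.keys()):
        lrow, rrow = left[idx], right[idx]
        clash = lrow.keys() & rrow.keys()  # columns that exist in both rows
        row = {}
        for col, val in lrow.items():
            row[col + suffixes[0] if col in clash else col] = val
        for col, val in rrow.items():
            row[col + suffixes[1] if col in clash else col] = val
        merged[idx] = row
    return merged


left = {0: {"VP": "track the card", "mood": "ask"}, 5: {"VP": "lost my card", "mood": "state"}}
right = {0: {"VP": "track the card", "score": 1.0}}
print(merge_on_index(left, right))
```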
