# INTENT PARSING

* **Purpose**:
  * Test intent parsing with AllenNLP

 * TABLE OF CONTENTS
 * SETUP
   * paths
 * PARAMETERS
 * PARSING
   * Allennlp VP parsing
   * Parsing performance
   * Focus on the class well parsed
 * ANNOTATION
   * Annotate well-formed intent VPs vs. not
 * CFG FEATURES

 To read
 * Berkeley Neural Parser w/ spaCy:
   https://spacy.io/universe/project/self-attentive-parser
   https://www.analyticsvidhya.com/blog/2020/07/part-of-speechpos-tagging-dependency-parsing-and-constituency-parsing-in-nlp/

# SETUP

In [1]:
import os
from datetime import datetime
from time import time

import numpy as np
import pandas as pd
import yaml
from nltk.tree import ParentedTree
from pigeon import annotate

proj_path = "/Users/steeve_laquitaine/desktop/CodeHub/intent/"
os.chdir(proj_path)
# in root
from intent.src.intent.nodes import annotation, mood, parsing, preprocess
from intent.src.tests import test_run

# dataframe display
pd.set_option("display.max_colwidth", 100)
pd.set_option("display.max_rows", 1000)

# pd.set_option('display.notebook_repr_html', True)

# to display df w/ nbconvert to pdf
# def _repr_latex_(self):
    # return "\centering{%s}" % self.to_latex()
# pd.DataFrame._repr_latex_ = _repr_latex_  # monkey patch pandas DataFrame

## paths

In [2]:
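# load the data catalog and the parameters
# (safe_load: yaml.load without an explicit Loader fails on PyYAML >= 6)
with open(proj_path+"intent/conf/base/catalog.yml") as file:
    catalog = yaml.safe_load(file)
with open(proj_path+"intent/conf/base/parameters.yml") as file:
    prms = yaml.safe_load(file)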
tr_data_path = proj_path + "intent/data/01_raw/banking77/train.csv"
test_data_path = proj_path + "intent/data/01_raw/banking77/test.csv"

In [3]:
# read queries data
tr_data = pd.read_csv(tr_data_path)

In [4]:
sample = preprocess.sample(tr_data)

In [5]:
sample.head(5)

Unnamed: 0,index,text,category
0,26,How do I track the card you sent me?,card_arrival
1,135,I was expecting my new card by now.,card_arrival
2,63,Can I track the card you sent to me?,card_arrival
3,105,What is the expected delivery date of my card?,card_arrival
4,24,where is my new card?,card_arrival


# PARSING

## ALLENLP VP PARSING

In [6]:
al_prdctor = parsing.init_allen_parser()

(Instantiation) took 33.68 secs


In [7]:
tic = time()
out = al_prdctor.predict(sentence=sample['text'].iloc[0])
parsed_txt = out["trees"]
print(f"(Inference) took {round(time()-tic,2)} secs")
print(f"Parsed sample:\n{parsed_txt}")

Your label namespace was 'pos'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary.  See documentation for `non_padded_namespaces` parameter in Vocabulary.
(Inference) took 0.36 secs
Parsed sample:
(SBARQ (WHADVP (WRB How)) (SQ (VBP do) (NP (PRP I)) (VP (VB track) (NP (NP (DT the) (NN card)) (SBAR (S (NP (PRP you)) (VP (VBD sent) (NP (PRP me)))))))) (. ?))


In [8]:
tree = ParentedTree.fromstring(parsed_txt)
test_run.test_extract_VP(al_prdctor)
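
The implementation of the VP extraction helper is not shown in this notebook. As a minimal sketch of the idea, assuming the target is the first VP met in a pre-order walk of the tree, the bracketed output above can be parsed and searched with plain Python. The names `parse_bracketed`, `leaves` and `first_vp` are hypothetical and are not part of the project or of nltk.

```python
def parse_bracketed(text):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def first_vp(node):
    """Return the first VP node in pre-order, or None if there is none."""
    if isinstance(node, str):
        return None
    label, children = node
    if label == "VP":
        return node
    for child in children:
        found = first_vp(child)
        if found is not None:
            return found
    return None


# tree string copied from the parser output above
parsed_txt = (
    "(SBARQ (WHADVP (WRB How)) (SQ (VBP do) (NP (PRP I)) "
    "(VP (VB track) (NP (NP (DT the) (NN card)) "
    "(SBAR (S (NP (PRP you)) (VP (VBD sent) (NP (PRP me)))))))) (. ?))"
)
vp = first_vp(parse_bracketed(parsed_txt))
print(" ".join(leaves(vp)))  # track the card you sent me
```

Taking the first VP rather than the deepest one keeps the embedded clause ("you sent me") attached to the verb phrase, which matches the VP strings shown later in the annotation table.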

In [9]:
# TODO: speed up (currently about 1 hour per 10K queries)
VP_info = parsing.extract_all_VPs(sample, al_prdctor)
test_run.test_extract_all_VPs(VP_info, sample, prms)

Time to completion: 34.08
Time to completion: 34.18
Time to completion: 32.7
Time to completion: 31.84
Time to completion: 29.47
Time to completion: 31.35
Time to completion: 32.14
Time to completion: 34.58
Time to completion: 33.97
Time to completion: 34.18
Time to completion: 34.34
33.56


In [10]:
VPs = parsing.make_VPs_readable(VP_info)

In [11]:
VP_info = parsing.get_CFGs(VP_info)
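
The `cfg` strings used later (e.g., `VP -> VB NP`) look like the top-level production of each VP. A minimal sketch of how such a string could be derived, assuming the VP is held as nested `(label, children)` tuples; `top_production` is a hypothetical helper, not the project's `get_CFGs`.

```python
def top_production(node):
    """Return the top-level rule of a (label, children) node, e.g. 'VP -> VB NP'."""
    label, children = node
    # keep only the labels of the immediate constituents, skip bare words
    rhs = [child[0] for child in children if not isinstance(child, str)]
    return f"{label} -> {' '.join(rhs)}"


# VP of "How do I track the card you sent me?", written out by hand
vp = ("VP", [
    ("VB", ["track"]),
    ("NP", [
        ("NP", [("DT", ["the"]), ("NN", ["card"])]),
        ("SBAR", [("S", [
            ("NP", [("PRP", ["you"])]),
            ("VP", [("VBD", ["sent"]), ("NP", [("PRP", ["me"])])]),
        ])]),
    ]),
])

# queries without a VP get no rule
cfg = top_production(vp) if vp else None
print(cfg)  # VP -> VB NP
```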

In [12]:
sample["VP"] = np.asarray(VPs)
sample["cfg"] = np.asarray([VP["cfg"] if len(VP) > 0 else None for VP in VP_info])

 Write parsed data

In [13]:
sample.to_excel(catalog['parsed'])

In [14]:
# verb_p[0].pretty_print()

## PARSING PERFORMANCE

 * **Parser works in 62% of the cases for "card_arrival" and never for other classes**

   * see 2a_eda_parsing.py
   * We will analyse why later.
   * We now focus on the one well-parsed class: "card_arrival".

## FOCUS ON THE CLASS WELL PARSED

 moods = mood.classify_mood(data["text"])
 moods

### ANNOTATE

 1. Annotate well-formed intent VPs vs. not

In [15]:
# can't be made into a function because of Pigeon "annotate"
if prms['annotation'] == 'do':
    annots = annotate(sample["VP"], options=["yes", "no"])
else:
    annots, myfile, myext = annotation.get_annotation(catalog, prms, sample)

In [16]:
annots_df = annotation.index_annots(prms, sample, annots)

In [17]:
annotation.write_annotation(catalog, prms, annots_df, myfile, myext)



Unnamed: 0,index,VP,annots
0,26,track the card you sent me,yes
1,135,was expecting my new card by now,no
2,63,track the card you sent to me,yes
3,105,is the expected delivery date of my card,no
4,24,is my new card,no
5,7,do if I still have not received my new card,no
6,44,tell me why I have n't received my new card,yes
7,101,have not received my card,no
8,112,have n't received my card in the mail,no
9,54,check the delivery of the card you sent,yes


In [18]:
# use .loc to avoid chained assignment, which may silently write to a copy
annots_df.loc[annots_df['VP'].isnull(), 'annots'] = np.nan
annots_df['text'] = sample['text']
annots_df['cfg'] = sample['cfg']

In [19]:
parsing.write_cfg(annots_df)

 **Fig. Queries are sorted by annotation result below.**

In [20]:
sorted_annots = annots_df.sort_values(by='annots', ascending=False)

In [21]:
sorted_annots

Unnamed: 0,index,VP,annots,text,cfg
0,26,track the card you sent me,yes,How do I track the card you sent me?,VP -> VB NP
17,111,tell me where my card is ? I ordered it 2 weeks ago,yes,Can you please tell me where my card is? I ordered it 2 weeks ago!,VP -> VB NP SBAR . NP VP
75,15,be able to track the card that was sent to me,yes,Will I be able to track the card that was sent to me?,VP -> VB ADJP
71,13,to track the delivery of my card,yes,Is there a way to track the delivery of my card?,VP -> TO VP
78,3,track my card while it is in the process of delivery,yes,Can I track my card while it is in the process of delivery?,VP -> VB NP SBAR
70,48,track when my card will be delivered,yes,Can I track when my card will be delivered?,VP -> VB SBAR
31,126,get my new card,yes,How much longer until I get my new card?,VP -> VBP NP
80,90,to track the new card you sent me,yes,Is there a way to track the new card you sent me?,VP -> TO VP
81,6,have info about the card on delivery,yes,Do you have info about the card on delivery?,VP -> VB NP
43,59,want to find out what happened to my new card,yes,I want to find out what happened to my new card?,VP -> VBP S


 **Fig. Only 28% of intents are well formed.**

In [22]:
n_total = len(sorted_annots)
n_null = sorted_annots['annots'].isnull().sum()
n_yes = sorted_annots['annots'].eq('yes').sum()
n_no = sorted_annots['annots'].eq('no').sum()
stats = pd.DataFrame({
    'annots': ['null', 'yes', 'no','Total'], 
    'count': [n_null, n_yes, n_no, n_total],
    '%': [n_null/n_total*100, n_yes/n_total*100, n_no/n_total*100, 100]
    })
stats

Unnamed: 0,annots,count,%
0,null,0,0.0
1,yes,28,28.0
2,no,72,72.0
3,Total,100,100.0



 2. Can we detect intent VPs automatically in task-oriented queries?
   2.1. How do "intent" VPs differ from "non-intent" VPs?
       2.1.1 Candidate hypotheses:
           - sentence mood: declarative vs. interrogative syntax
           - tense: present vs. past ?
           - lexical: some verbs and not others
           - dependency structure: direct object vs. indirect ?
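
The sentence-mood hypothesis can be probed directly from the constituency parse. As a rough sketch, assuming the Penn Treebank root labels are reliable: `SBARQ` (wh-question) and `SQ` (yes/no question) indicate interrogative syntax, `S` and `SINV` declarative syntax. `mood_from_tree` is a hypothetical helper and is not how `mood.classify_mood` is necessarily implemented.

```python
import re

# assumption: the root label of the parse is enough to tell the mood apart
MOOD_BY_ROOT = {
    "SBARQ": "interrogative",  # direct wh-question
    "SQ": "interrogative",     # yes/no question
    "S": "declarative",
    "SINV": "declarative",     # inverted declarative
}


def mood_from_tree(tree_str):
    """Map the root label of a bracketed parse to a sentence mood."""
    match = re.match(r"\(\s*([^\s()]+)", tree_str.strip())
    if match is None:
        return "unknown"
    return MOOD_BY_ROOT.get(match.group(1), "unknown")


question = "(SBARQ (WHADVP (WRB How)) (SQ (VBP do) (NP (PRP I)) (VP (VB track))) (. ?))"
print(mood_from_tree(question))  # interrogative
```

Imperatives are usually parsed with an `S` root as well, so this mapping cannot separate them from declaratives without also checking for a missing subject.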

 Observations:
   1.1 Grammar features of the VPs that we labelled as intents:
       - intent -> VB_present + NP
       - intent -> VB_present + VB_present + NP
       - intent -> VB_infinitive + NP
       - intent -> VB_present + clause

   1.2 Grammar features of VPs that we did not label as intents:
       1.2.1. Implicit intents, expressed at the level of semantics/pragmatics:
       - failed intent -> gerund VB | auxiliary VB | past tense VB | interrogative phrase

       1.2.2. Grammar features of intents that we are not exploiting:
       - intent -> need | want + VB_infinitive
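
The observations above can be turned into a first rule-based filter over the `cfg` strings. This is only a heuristic sketch under stated assumptions: the tag sets below are our reading of the observations (plus `ADJP`, which appears in one VP annotated "yes"), and `looks_like_intent` is a hypothetical name. It has not been validated against the annotations.

```python
# assumption: a base/present-tense verb or the infinitive marker heads an intent VP
INTENT_HEADS = {"VB", "VBP", "TO"}
# assumption: it is followed by a noun phrase, a clause, or an embedded VP
INTENT_COMPLEMENTS = {"NP", "S", "SBAR", "VP", "ADJP"}


def looks_like_intent(cfg):
    """Return True if a rule such as 'VP -> VB NP' matches the intent patterns."""
    if not cfg or "->" not in cfg:
        return False
    rhs = cfg.split("->", 1)[1].split()
    if len(rhs) < 2:
        return False
    return rhs[0] in INTENT_HEADS and rhs[1] in INTENT_COMPLEMENTS


for rule in ["VP -> VB NP", "VP -> TO VP", "VP -> VBP S", "VP -> VBD VP"]:
    print(rule, looks_like_intent(rule))
```

A past-tense or gerund head (`VBD`, `VBG`) is rejected, in line with the "failed intent" pattern. The rule will still over-accept, since a production such as `VP -> VB NP` says nothing about the verb itself, which is where the lexical hypothesis would come in.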

### Parse the intent and entities of well-formed intents (slot analysis)

 We formalized an intent as follows:

   e.g.: "track my card for me"

   intent -> VB + NP
       entity: PP
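
 To make the formalization concrete, here is a minimal sketch that splits a VP into an intent (verb + NP) and entity slots (PPs). The parse of "track my card for me" is written by hand as an assumption about what the parser would return, and `split_slots` is a hypothetical helper.

```python
def leaves(node):
    """Return the words under a (label, children) node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def split_slots(vp):
    """Split the immediate constituents of a VP into intent words and PP entities."""
    intent_parts, entities = [], []
    for child in vp[1]:
        if isinstance(child, str):
            continue
        words = " ".join(leaves(child))
        if child[0] == "PP":
            entities.append(words)      # entity: PP
        else:
            intent_parts.append(words)  # intent: VB + NP
    return {"intent": " ".join(intent_parts), "entities": entities}


# hand-written parse of "track my card for me" (assumed, not parser output)
vp = ("VP", [
    ("VB", ["track"]),
    ("NP", [("PRP$", ["my"]), ("NN", ["card"])]),
    ("PP", [("IN", ["for"]), ("NP", [("PRP", ["me"])])]),
])
print(split_slots(vp))  # {'intent': 'track my card', 'entities': ['for me']}
```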

 Parse-tree label terminology
   - SBAR: subordinate clause (e.g., after ..)

# CFG FEATURES

In [23]:
sorted_annots.head()

Unnamed: 0,index,VP,annots,text,cfg
0,26,track the card you sent me,yes,How do I track the card you sent me?,VP -> VB NP
17,111,tell me where my card is ? I ordered it 2 weeks ago,yes,Can you please tell me where my card is? I ordered it 2 weeks ago!,VP -> VB NP SBAR . NP VP
75,15,be able to track the card that was sent to me,yes,Will I be able to track the card that was sent to me?,VP -> VB ADJP
71,13,to track the delivery of my card,yes,Is there a way to track the delivery of my card?,VP -> TO VP
78,3,track my card while it is in the process of delivery,yes,Can I track my card while it is in the process of delivery?,VP -> VB NP SBAR


In [24]:
# jupyter nbconvert --no-input --to=pdf 2_Intent_parsing.ipynb
