# Explore VP parsing results
 author: Steeve Laquitaine
 * Summary:
   * Take a sample of N queries from each intent class and parse VPs
   * calculate parsing performances:
       * parsed / or not
       * intent / or not

 INPUT: queries stored in .csv
 OUTPUT:

In [1]:
import os
from time import time

import pandas as pd

proj_path = "/Users/steeve_laquitaine/desktop/CodeHub/intent/"
os.chdir(proj_path)

from intent.src.intent.nodes import parsing

to_df = pd.DataFrame

# PARAMETERS

In [2]:
prm = dict()
prm["sample"] = 100
prm["mood"] = ["declarative"]
prm["intent_class"] = [
    "card_arrival",
    "card_linking",
    "exchange_rate",
    "card_payment_wrong_exchange_rate",
    "extra_charge_on_statement",
    "pending_cash_withdrawal",
    "fiat_currency_support",
    "card_delivery_estimate",
    "automatic_top_up",
    "card_not_working",
    "exchange_via_app",
]  # take a sample of 11 intent classes


In [3]:
# set data path
data_path = proj_path + "intent/data/01_raw/banking77/train.csv"
os.chdir(proj_path)

In [4]:
# read data
data = pd.read_csv(data_path)
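
The summary mentions taking a sample of N queries from each intent class. A minimal sketch of that step is shown below, assuming the data has `text` and `category` columns (as in the output table further down). The helper `sample_per_class` and the toy data are hypothetical and are not part of the `parsing` module, whose actual sampling logic is not shown in this notebook.

```python
import pandas as pd


def sample_per_class(data, classes, n, seed=0):
    """Keep only the requested classes and draw up to n rows from each."""
    subset = data[data["category"].isin(classes)]
    sampled = [
        # min() guards against classes with fewer than n queries
        group.sample(n=min(n, len(group)), random_state=seed)
        for _, group in subset.groupby("category", sort=False)
    ]
    return pd.concat(sampled).reset_index(drop=True)


# Toy data standing in for the banking77 training set (hypothetical rows).
toy = pd.DataFrame(
    {
        "text": [f"query {i}" for i in range(9)],
        "category": ["card_arrival"] * 4
        + ["exchange_rate"] * 3
        + ["top_up_failed"] * 2,
    }
)

sampled = sample_per_class(toy, ["card_arrival", "exchange_rate"], n=2)
print(sampled["category"].value_counts().to_dict())
```

With `prm["sample"]` and `prm["intent_class"]` from the parameters cell, the call would look like `sample_per_class(data, prm["intent_class"], prm["sample"])`.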

# PARSING

In [5]:
# instantiate parser
tic = time()
predictor = parsing.init_allen_parser()
print(f"(run_parsing_pipe)(Instantiation) took {round(time()-tic,2)} secs")

(Instantiation) took 37.64 secs
(run_parsing_pipe)(Instantiation) took 37.64 secs


In [6]:
# parse (40 sec / 100 samples =~ 1h for dataset)
parsed_data = parsing.run_parsing_pipe(data, predictor, prm, verbose=True)
parsed_data

Your label namespace was 'pos'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary.  See documentation for `non_padded_namespaces` parameter in Vocabulary.
(run_parsing_pipe)(Inference) took 0.29 secs
Parsed sample:
(S (NP (PRP I)) (VP (VBP am) (ADVP (RB still)) (VP (VBG waiting) (PP (IN on) (NP (PRP$ my) (NN card))))) (. ?))

Time to completion: 395.3
Time to completion: 533.7
Time to completion: 558.12
Time to completion: 561.7
Time to completion: 579.62
Time to completion: 553.86
Time to completion: 535.48
Time to completion: 536.26
Time to completion: 524.95
Time to completion: 520.12
Time to completion: 510.02
579.88
(run_parsing_pipe) took 580.36 secs



Unnamed: 0,text,category,VP
0,I am still waiting on my card?,card_arrival,am still waiting on my card
1,What can I do if my card still hasn't arrived ...,card_arrival,do if my card still has n't arrived after 2 weeks
2,I have been waiting over a week. Is the card s...,card_arrival,have been waiting over a week . Is the card st...
3,Can I track my card while it is in the process...,card_arrival,track my card while it is in the process of de...
4,"How do I know if I will get my card, or if it ...",card_arrival,"know if I will get my card , or if it is lost"
...,...,...,...
1470,Can this app exchange American and English cur...,exchange_via_app,exchange American and English currency
1471,My plans may change so I may need to change fr...,exchange_via_app,may change so I may need to change from GBP to...
1472,Does the app allow for exchanges between curre...,exchange_via_app,allow for exchanges between currencies
1473,"I may no longer need GBP, but instead I will n...",exchange_via_app,may no longer need GBP
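
The `VP` column above appears to hold the words under the first verb phrase of the bracketed constituency parse. The sketch below illustrates one way to get from the parsed sample to that string using only the standard library. It is an assumption about the idea, not the code inside `parsing.run_parsing_pipe`; `parse_tree`, `leaves`, and `first_vp` are hypothetical helpers.

```python
def parse_tree(s):
    """Parse a bracketed constituency string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())  # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [word for child in children for word in leaves(child)]


def first_vp(node):
    """Return the text of the first VP found in a depth-first, left-to-right walk."""
    if isinstance(node, str):
        return None
    label, children = node
    if label == "VP":
        return " ".join(leaves(node))
    for child in children:
        found = first_vp(child)
        if found is not None:
            return found
    return None


sample = (
    "(S (NP (PRP I)) (VP (VBP am) (ADVP (RB still)) (VP (VBG waiting) "
    "(PP (IN on) (NP (PRP$ my) (NN card))))) (. ?))"
)
vp = first_vp(parse_tree(sample))
print(vp)  # am still waiting on my card
```

A tree without any VP yields `None`, which corresponds to a null entry in the `VP` column.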


# PARSING PERFORMANCE
 Parsed / or not

 **Fig**. Performance across intent classes

In [7]:
def get_perf(parsed_data):
    return round(
        100 - parsed_data["VP"].isnull().sum() / len(parsed_data["VP"]) * 100,
        2,
    )


perf_parse_or_not = get_perf(parsed_data)
print(f'The "parse/or not" performances are : {perf_parse_or_not} %')

The "parse/or not" performances are : 100.0 %


 **Fig**. Performance per intent class
 * Every intent class in the table below had 100% performance (a VP was parsed for every sampled query)

In [8]:
df = parsed_data.groupby(by=["category"]).apply(lambda x: get_perf(x))
to_df(df, columns=["Performance (%)"])

category,Performance (%)
automatic_top_up,100.0
card_arrival,100.0
card_delivery_estimate,100.0
card_linking,100.0
card_not_working,100.0
card_payment_wrong_exchange_rate,100.0
exchange_rate,100.0
exchange_via_app,100.0
extra_charge_on_statement,100.0
fiat_currency_support,100.0
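
Because every class scores 100% here, the metric is easier to check on a case that contains a failure. The sketch below uses made-up rows (not results from this notebook) with one missing VP, and computes the same "parsed or not" percentage overall and per class. It restricts the groupby to the `VP` column, which is an alternative to applying `get_perf` to each whole group.

```python
import pandas as pd

# Hypothetical rows: one query in card_arrival has no parsed VP.
toy = pd.DataFrame(
    {
        "category": [
            "card_arrival",
            "card_arrival",
            "exchange_rate",
            "exchange_rate",
        ],
        "VP": [
            "am still waiting on my card",
            None,
            "exchange currency",
            "check the rate",
        ],
    }
)

# Overall: share of rows with a non-null VP, in percent.
overall = round(toy["VP"].notna().mean() * 100, 2)

# Per class: the same share, computed on the VP column of each group.
per_class = (
    toy.groupby("category")["VP"]
    .apply(lambda vp: round(vp.notna().mean() * 100, 2))
    .rename("Performance (%)")
)

print(overall)
print(per_class.to_dict())
```

`notna().mean() * 100` is equivalent to `100 - isnull().sum() / len(...) * 100` used in `get_perf`.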


In [9]:
print("Done")


Done
