This notebook shows how to use the QTMM2012c+ table to extract the information needed to define rules. It also shows how to print the instances of the transition (TA) that make up a rule.

In [1]:
import numpy as np
import pandas as pd 
import json
import csv
import copy
import random

In [2]:
loc_prop = ['L1', 'L1Reported Speech', 'L2', 'L2Reported Speech', 'L3', 'L3Reported Speech', 'L4', 
            'L4Reported Speech', 'L5', 'L6', 'L7', 'PL1Reported Speech', 'PL2Reported Speech', 
            'PL3Reported Speech', 'PL4Reported Speech', 'P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7',]

argument_flow = ['L1Reported Speech to L2Reported Speech', 'L1Reported Speech to P2', 'L1Reported Speech to P3', 
                 'L2Reported Speech to L1Reported Speech', 'L2Reported Speech to PL1Reported Speech',
                 'L2Reported Speech to P1', 'L2Reported Speech to P2', 'L3Reported Speech to P1', 
                 'L4Reported Speech to P1', 'NOT L1Reported Speech to L1Reported Speech', 
                 'NOT PL1Reported Speech to P1', 'NOT PL2Reported Speech to PL2Reported Speech', 'NOT P1 to P1',
                 'PL1Reported Speech to PL2Reported Speech', 'PL1Reported Speech to P2', 
                 'PL1Reported Speech to P3', 'PL1Reported Speech to P4', 
                 'PL2Reported Speech to NOT PL2Reported Speech',
                 'PL2Reported Speech to PL1Reported Speech', 'PL2Reported Speech to P1',    
                 'PL2Reported Speech to P3', 'PL3Reported Speech to P1',
                 'P1 to L2Reported Speech', 'P1 to PL2Reported Speech', 'P1 to P2', 'P1 to P3', 'P1 to P4',
                 'P1 to P5', 'P2 to L1Reported Speech', 'P2 to Out of TA', 'P2 to PL1Reported Speech', 
                 'P2 to P1', 'P2 to P3', 'P2 to P4', 'P2 to P5', 'P3 to PL1Reported Speech', 
                 'P3 to P1', 'P3 to P2', 'P3 to P4', 'P3 to P5', 'P4 to PL1Reported Speech',  'P4 to P1', 
                 'P4 to P2', 'P4 to P3', 'P4 to P5', 'P5 to PL1Reported Speech', 'P5 to P1', 'P6 to P1', 
                 'P6 to P5', 'P7 to P1', 'P7 to P5', 'Rephrase P1 to P1', 'NOT L1Reported Speech',
                 'NOT PL1Reported Speech', 'NOT PL2Reported Speech', 'NOT P1',  'Out of TA',
                 'To PL1Reported Speech', 'To PL2Reported Speech',  'To PL3Reported Speech', 'To P1', 'To P2',
                 'Rephrase P1', 'TA is nonanchoring']

In [3]:
path2read = 'QTMM2012c_plus.csv'  ## placeholder: replace with the path where the file QTMM2012c_plus.csv is stored
QTMM2012c_plus = pd.read_csv(path2read)
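
The functions below index the table by column name, so a misspelled or missing column only surfaces later as a `KeyError`. A minimal sketch of an early check follows; `missing_columns` is a hypothetical helper (it is not part of the notebook) and the toy dataframe only stands in for the real table.

```python
import pandas as pd

def missing_columns(table, required):
    """Return the names in `required` that are not columns of `table`, in order."""
    return [name for name in required if name not in table.columns]

# Toy stand-in for QTMM2012c_plus; the real table has many more columns.
toy = pd.DataFrame({'ID': [1, 2],
                    'L1': ['A : hello', 'N_A'],
                    'P1': ['hello', 'N_A']})

missing = missing_columns(toy, ['ID', 'L1', 'L2', 'P1'])
print(missing)
```

On the real data the same call could be made with `QTMM2012c_plus` and the concatenation `['ID'] + loc_prop + argument_flow`; an empty list would mean every expected column is present.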

In [4]:
## This function removes all the 0 and 'N_A' values from the TA

def CleanTA(Table):
    """
    This function takes as input a pandas dataframe and, for each row, stores all the information 
    except the values that are either 0 or 'N_A'.
    
    The function returns a list that stores all the cleaned TAs. Each element in the list is a 
    python dictionary.
        
    Parameters:
                Table: pandas dataframe
                
    Return:     
                TAs: list
                
    """
    TAs = []  
    
    ## store the TAs ID in the table and use it to extract the TA one by one
    row_ID = list(Table['ID'])
    for rowID in row_ID:
        ta = {}
        TA = Table.query('ID == @rowID')
        
        ## save the TA general information
        TAID = list(TA['ID'])[0]
        TAIDmap = list(TA['ID map'])[0]
        Dataset = list(TA['Dataset'])[0]
        Reported_Speech = list(TA['Reported Speech'])[0] 
        Speakers = list(TA['Speakers'])[0] 
        TADegree = list(TA['Degree'])[0]
        TAICTA = list(TA['ICTA'])[0]
        TAPropRel = list(TA['PropRel'])[0]
        
        ## store the information about the locutions (no reported speech)
        for label in loc_prop:
            sentence = list(TA[label])[0]
            if sentence != 'N_A' and label[0] == 'L' and len(label) == 2:
                sent = list(TA[label])[0]
                sentidflow = list(TA[label + ' ID dialogue flow'])[0]
                sentIC = list(TA[label + ' IC'])[0]
                sentrole = list(TA[label + ' Role'])[0]
                sentstance = list(TA[label + ' Stance'])[0]
                sentdataset = list(TA['Dataset'])[0]
                
                ta[label] = [sent, sentidflow, sentIC, sentrole, sentstance, sentdataset]
            
            ## store the information about the locutions that are reported speech
            elif sentence != 'N_A' and label[0] == 'L' and len(label) > 2:
                sent = list(TA[label])[0]
                sentIC = list(TA[label + ' IC'])[0]
                sentrole = list(TA[label + ' Role'])[0]
                sentstance = list(TA[label + ' Stance'])[0]
                sentdataset = list(TA['Dataset'])[0]
                
                ta[label] = [sent, sentIC, sentrole, sentstance, sentdataset]
                
            ## store the information about the propositions
            elif sentence != 'N_A' and label[0] == 'P':
                sent = list(TA[label])[0]
                
                ta[label] = [sent]
                
        ## save the information about the argumentation flow
        argumentation_flow = []
        for argflow in argument_flow:
            arg_flow = list(TA[argflow])[0]
            if arg_flow == 1:
                argumentation_flow.append(argflow)
        
        ## store the TA general information and the information about argumentation flow
        ta['ID'] = TAID
        ta['ID map'] = TAIDmap
        ta['Dataset'] = Dataset 
        ta['Reported Speech'] = Reported_Speech 
        ta['Speakers'] = Speakers
        ta['Degree'] = TADegree
        ta['ICTA'] = TAICTA
        ta['PropRel'] = TAPropRel
        ta['ArgFlow'] = argumentation_flow
        
        ## store the TA in the output list
        TAs.append(ta)
        
    return(TAs)
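
The core idea of `CleanTA` (keep only the cells of a row that carry information) can be sketched on a toy dataframe. This is a simplified illustration, not a replacement: unlike `CleanTA`, it does not group each locution with its metadata, and it would also drop general fields whose value is 0. The column names and sentences are made up for the example.

```python
import pandas as pd

# Toy table: the second row has no L2 and no 'P2 to P1' flow.
toy = pd.DataFrame({
    'ID': [1, 2],
    'L1': ['A : is it fair?', 'B : it is fair'],
    'L2': ['A : it is fair', 'N_A'],
    'P2 to P1': [1, 0],
})

# One dictionary per row, keeping only cells that are neither 'N_A' nor 0.
records = toy.to_dict(orient='records')
cleaned = [{label: value for label, value in record.items()
            if value != 'N_A' and value != 0}
           for record in records]

for ta in cleaned:
    print(ta)
```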

In [5]:
def Find_ArgFlow(Table):
    """
    This function takes as input a pandas dataframe and collects all the argumentation flow 
    labels whose value is 1 in at least one row.
    
    The function returns a list that stores all the argumentation flow labels used in the input dataframe.
        
    Parameters:
                Table: pandas dataframe
                
    Return:     
                Argumentation_Flow: list
                
    """
    row_ID = list(Table['ID'])
    Argumentation_Flow = []
    for rowID in row_ID:
        TA = Table.query('ID == @rowID')
        for argflow in argument_flow:
            arg_flow = list(TA[argflow])[0]
            if arg_flow == 1:
                if argflow not in Argumentation_Flow:
                    Argumentation_Flow.append(argflow)
    return(Argumentation_Flow)
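
The same set of labels can be obtained without looping over rows, by asking each flow column whether it contains a 1 anywhere. The sketch below uses a toy dataframe with three made-up flow columns; note that it returns the labels in column order, whereas `Find_ArgFlow` returns them in order of first appearance across rows.

```python
import pandas as pd

toy = pd.DataFrame({
    'ID': [1, 2, 3],
    'P2 to P1': [1, 0, 1],
    'To P1': [0, 0, 0],
    'TA is nonanchoring': [0, 1, 0],
})
flow_columns = ['P2 to P1', 'To P1', 'TA is nonanchoring']

# Keep a label if at least one row has the value 1 in that column.
used = [col for col in flow_columns if (toy[col] == 1).any()]
print(used)
```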

In [6]:
def Find_Locutions(Table):
    """
    This function takes as input a pandas dataframe and collects the labels of all the locutions 
    that are different from 'N_A' in at least one row.
    
    The function returns a list that stores all the locution labels used in the input dataframe.
        
    Parameters:
                Table: pandas dataframe
                
    Return:     
                Locutions: list
                
    """
    row_ID = list(Table['ID'])
    Locutions = []
    for rowID in row_ID:
        TA = Table.query('ID == @rowID')
        for label in loc_prop:
            if label[0] == 'L':
                loc = list(TA[label])[0]
                if loc != 'N_A':
                    if label not in Locutions:
                        Locutions.append(label)
    return(Locutions)

In [7]:
def Add_IC(Locutions):
    """
    This function takes as input a list of locution labels and appends ' IC' to each of them. 
    
    The function returns a list that stores the 'locution IC' labels.
        
    Parameters:
                Locutions: list
                
    Return:     
                Locutions_IC: list
                
    """
    Locutions_IC = []
    for locution in Locutions:
        Locutions_IC.append(locution + ' IC')
    return(Locutions_IC)    

In [8]:
def Find_Infromation_For_Rules(Table):
    """
    This function takes as input a pandas dataframe and collects the information for defining rules from
    the input dataframe.
    
    The function returns a pandas dataframe that stores the information for defining rules from
    the input dataframe.
        
    Parameters:
                Table: pandas dataframe
                
    Return:     
                Rules: pandas dataframe
                
    """    
    Rule = Table.reset_index()
    Locutions = Find_Locutions(Table)
    Locutions.remove('L1')
    argumentation_flow = Find_ArgFlow(Table)
    IC = Add_IC(Locutions)
    list1 = ['ICTA', 'PropRel', 'Degree'] + IC + argumentation_flow + ['L1 IC']
    list2 = ['ICTA', 'PropRel', 'Degree'] + IC + argumentation_flow 
    Rules = Table[list1].groupby(list2).agg(['count'])
    return(Rules)
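
The grouping step is the heart of this function: every distinct combination of the grouping columns becomes one candidate rule, and the `L1 IC` column is only used to count how many TAs fall into it. A minimal sketch on toy data (three made-up rows, far fewer columns than the real table):

```python
import pandas as pd

toy = pd.DataFrame({
    'ICTA': ['Arguing', 'Restating', 'Restating'],
    'L2 IC': ['Asserting', 'Pure Questioning', 'Pure Questioning'],
    'P2 to P1': [1, 0, 0],
    'L1 IC': ['Pure Questioning'] * 3,
})
group_keys = ['ICTA', 'L2 IC', 'P2 to P1']

# Same pattern as in Find_Infromation_For_Rules: group on the keys,
# count the non-missing 'L1 IC' cells in each group.
counts = toy[group_keys + ['L1 IC']].groupby(group_keys).agg(['count'])
print(counts)
```

The result has the grouping columns as a row MultiIndex and a single two-level column `('L1 IC', 'count')`, which is why the printed table shows stacked header rows.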

In [9]:
def Rule_Instances(Table):
    """
    This function takes as input a pandas dataframe and stores the cleaned TA for each row that 
    makes up the dataframe.   
    
    The function returns a list with the cleaned TA. 
        
    Parameters:
                Table: pandas dataframe
                
    Return:     
                Instances: list
                
    """     
    Instances = []
    row_ID = list(Table['ID'])
    for ID in row_ID: 
        TA_tofind = QTMM2012c_plus.query('`ID` == @ID ')
        cleaned_TA = CleanTA(TA_tofind)
        Instances.append(cleaned_TA[0])
    return(Instances)

# Example: rule extraction

Let's suppose we are interested in defining a rule for the following case:
    
1) Same Speakers (Speakers == "Same")

2) L1 IC is Pure Questioning (L1 IC == "Pure Questioning")

3) the TA Degree is 2 (Degree == "2")

4) we only take into account TAs without reported speech (Reported Speech == 0)

To do so we need:

A) Query QTMM2012c_plus to extract the TAs that satisfy the points 1, 2, 3 and 4.    

B) Use the function Find_Infromation_For_Rules.

C) Print the output of the function Find_Infromation_For_Rules for writing the rule.

In [10]:
#Step A
Rule1 = QTMM2012c_plus.query('`Speakers` == "Same" and `L1 IC` == "Pure Questioning" and `Degree` == "2" and `Reported Speech` == 0  ')

# Step B
Rules =  Find_Infromation_For_Rules(Rule1)

# Step C
Rules

ICTA,PropRel,Degree,L2 IC,P2 to P1,TA is nonanchoring,To P1,L1 IC count
Arguing,Inference,2,Asserting,1,0,0,1
Default Illocuting,Rephrase,2,Asserting,1,0,0,3
Default Illocuting,Rephrase,2,Assertive Questioning,1,0,0,2
Default Illocuting,Rephrase,2,Pure Questioning,1,0,0,3
Restating,N_A,2,Pure Questioning,0,0,1,2
Restating,Rephrase,2,Pure Questioning,1,0,0,1
TA is non-anchoring,N_A,2,Asserting,0,1,0,1
TA is non-anchoring,N_A,2,Assertive Questioning,0,1,0,1
TA is non-anchoring,N_A,2,Pure Questioning,0,1,0,11
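
When writing a rule it can help to have the counts as an ordinary flat table, sorted so that the most frequent combination comes first. The sketch below is one possible way to do that on toy rows (made up for the example, not taken from QTMM2012c+). It uses `size()`, which counts rows per group; this matches the `count` aggregation only when `L1 IC` has no missing values. The column name `n_TAs` is an arbitrary choice.

```python
import pandas as pd

toy = pd.DataFrame({
    'ICTA': ['Default Illocuting'] * 3 + ['Restating'] * 2 + ['Arguing'],
    'L2 IC': ['Asserting'] * 3 + ['Pure Questioning'] * 2 + ['Asserting'],
})

# Count the rows in each group and turn the group keys back into columns.
flat = toy.groupby(['ICTA', 'L2 IC']).size().reset_index(name='n_TAs')

# Most frequent combination first.
flat = flat.sort_values('n_TAs', ascending=False).reset_index(drop=True)
print(flat)
```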


# Example: once a rule is extracted, see the TA instances that make it up

Once the output of the function Find_Infromation_For_Rules has been printed, the instances that make up a rule can be printed as well. For example, let's suppose we want to see the instances where:

1) L2 IC is "Asserting"

2) ICTA is "Default Illocuting"

To do this we need:

A) Query the dataframe Rule1 defined above (step A in cell In [10]) to extract the TAs that satisfy points 1 and 2.

B) Use the function Rule_Instances.

C) Print each instance in the output of Rule_Instances.

In [11]:
see_instances = Rule1.query(' `L2 IC` == "Asserting" and `ICTA` == "Default Illocuting" ')

In [12]:
instances = Rule_Instances(see_instances)
for instance in instances:
    print(instance)
    print()
    

{'L1': ['Claire : what you thought about bankruptcy', 83, 'Pure Questioning', 'Panellist', 'Pro', 'Morality of Money'], 'L2': ['Claire : There are times I think that maybe it’s a good idea, in a free society, to free people if they do get into terrible debt', 84, 'Asserting', 'Panellist', 'Pro', 'Morality of Money'], 'P1': ['bankruptcy is xxx'], 'P2': ['it’s a good idea, in a free society, to free people if they do get into terrible debt'], 'ID': 639, 'ID map': 6134, 'Dataset': 'Morality of Money', 'Reported Speech': 0, 'Speakers': 'Same', 'Degree': 2, 'ICTA': 'Default Illocuting', 'PropRel': 'Rephrase', 'ArgFlow': ['P2 to P1']}

{'L1': ['MP : But do you think always, that poverty is the cause of multiple disadvantage? Or do you think that poverty is the result of multiple disadvantage?', 153, 'Pure Questioning', 'Panellist', 'Pro', 'Problem Families'], 'L2': ['MP : It seems to me what the Government is getting at, is that if child doesn’t go to school, if it’s truanting, it gets into 