# Rule-based approach to detecting actionable items
<br>
<br>

* I will use token matching and part-of-speech (POS) tags to match sentences against actionable-item patterns.
<br>
- I will use the spaCy POS tagger and Matcher:<br> https://spacy.io/usage/rule-based-matching
<br>spaCy installation guide: https://spacy.io/usage
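
As a rough illustration of what tag-based matching with the spaCy `Matcher` can look like, here is a minimal sketch. It assumes spaCy v3 (where `Doc` accepts a `tags` argument and `Matcher.add` takes a list of patterns) and uses hand-assigned tags on a blank pipeline, so no trained model is needed. The rule name `MODAL_PRONOUN` and the example sentence are made up for this sketch.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

# Blank English pipeline: tokenizer and vocab only (assumption: spaCy v3 is installed)
nlp = spacy.blank("en")

# Hand-tagged example sentence, so the sketch does not depend on a trained tagger
words = ["Could", "you", "send", "the", "file", "?"]
tags = ["MD", "PRP", "VB", "DT", "NN", "."]
doc = Doc(nlp.vocab, words=words, tags=tags)

matcher = Matcher(nlp.vocab)
# Illustrative rule: a modal immediately followed by a personal pronoun
matcher.add("MODAL_PRONOUN", [[{"TAG": "MD"}, {"TAG": "PRP"}]])

# Collect (rule name, matched text) pairs
matches = [(nlp.vocab.strings[match_id], doc[start:end].text)
           for match_id, start, end in matcher(doc)]
print(matches)
```

With a trained model the tags would come from the tagger instead of being written by hand.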

## Linguistic rule writing
<br>

Reference: https://towardsdatascience.com/linguistic-rule-writing-for-nlp-ml-64d9af824ee8



In [1]:
import pandas as pd
import spacy
from tqdm import tqdm
# load the small English spaCy model
nlp = spacy.load("en_core_web_sm")

- Before starting, let's check the pipeline components of spaCy.

In [2]:

nlp.pipeline
# only the tagger and parser are needed

[('tagger', <spacy.pipeline.pipes.Tagger at 0x11395f01248>),
 ('parser', <spacy.pipeline.pipes.DependencyParser at 0x11395ef9888>),
 ('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x11395ef9e28>)]

In [3]:
# NER is not needed, so disable it
nlp = spacy.load("en_core_web_sm", disable=['ner'])
nlp.pipeline


[('tagger', <spacy.pipeline.pipes.Tagger at 0x1139f8ada08>),
 ('parser', <spacy.pipeline.pipes.DependencyParser at 0x1139f8af4c8>)]

In [4]:
nlp.pipe_names

['tagger', 'parser']

## Load the sentence data

In [5]:
pd.options.display.max_colwidth = 1500
sentences_df = pd.read_csv("sentence_file.csv")
sentences_df

Unnamed: 0,sentence
0,Here is our forecast
1,Traveling to have a business meeting takes the fun out of the trip.
2,Especially if you have to prepare a presentation.
3,I would suggest holding the business plan meetings here then take a trip without any formal business meetings.
4,I would even try and get some honest opinions on whether a trip is even desired or necessary.
...,...
6627370,"Any review, use, distribution or disclosure by others is strictly prohibited."
6627371,"If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender or reply to Enron Corp. at enron.messaging.administration@enron.com and delete all copies of the message."
6627372,"This e-mail (and any attachments hereto) are not intended to be an offer (or an acceptance) and do not create or evidence a binding and enforceable contract between Enron Corp. (or any of its affiliates) and the intended recipient or any other party, and may not be relied on by anyone as the basis of a contract by estoppel or otherwise."
6627373,Thank you.


### Let me pass an example sentence and see how it works

In [7]:
example = "I would suggest holding the business plan meetings here then take a trip without any formal business meetings."

doc = nlp(example)
for ele in doc:
    print((ele, ele.pos_, ele.tag_, ele.dep_, ele.head))

(I, 'PRON', 'PRP', 'nsubj', suggest)
(would, 'VERB', 'MD', 'aux', suggest)
(suggest, 'VERB', 'VB', 'ROOT', suggest)
(holding, 'VERB', 'VBG', 'xcomp', suggest)
(the, 'DET', 'DT', 'det', meetings)
(business, 'NOUN', 'NN', 'compound', plan)
(plan, 'NOUN', 'NN', 'compound', meetings)
(meetings, 'NOUN', 'NNS', 'dobj', holding)
(here, 'ADV', 'RB', 'advmod', holding)
(then, 'ADV', 'RB', 'advmod', holding)
(take, 'VERB', 'VB', 'dep', holding)
(a, 'DET', 'DT', 'det', trip)
(trip, 'NOUN', 'NN', 'dobj', take)
(without, 'ADP', 'IN', 'prep', take)
(any, 'DET', 'DT', 'det', meetings)
(formal, 'ADJ', 'JJ', 'amod', meetings)
(business, 'NOUN', 'NN', 'compound', meetings)
(meetings, 'NOUN', 'NNS', 'pobj', without)
(., 'PUNCT', '.', 'punct', suggest)


In [18]:
doc[0].tag_

'PRP'

In [None]:
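def gen_tags(sentences):
    """Return the parsed Doc and the list of fine-grained POS tags for each sentence."""
    doc_lst = []
    tags = []
    # nlp.pipe processes the texts in batches instead of one nlp() call per sentence
    for doc in tqdm(nlp.pipe(sentences), total=len(sentences)):
        doc_lst.append(doc)
        # keep one tag list per sentence so each sentence can be checked on its own
        tags.append([token.tag_ for token in doc])
    return doc_lst, tags


lst_doc, lst_tags = gen_tags(sentences_df['sentence'])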

 13%|████████▉                                                           | 870916/6627375 [1:27:10<8:37:40, 185.33it/s]

## Generalization
<br>
Some business action terms are added to the list alongside general action phrases.

In [16]:
# phrase matching

def action_item_match(sent):
    """Return 1 if the sentence contains any of the action phrases, else 0."""
    sent = sent.lower()

    # general action phrases plus some business action terms
    action_pattern = ["let's", "let us", "we have to", "we have", 
                      "i have to", "please", "we need", "we need to",
                      "can you", "request", "define", "formulate",
                      "drill down", "call", "could you", "would you",
                      "you should", "we expect", "you have", "we've had",
                      "i expect"]

    # 1 as soon as any phrase occurs in the lower-cased sentence
    return int(any(ptrn in sent for ptrn in action_pattern))

0
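
The phrase list above is checked with Python's substring test, so a short entry such as "call" also fires inside a longer word such as "recall". A minimal sketch of a word-boundary variant follows. The names `ACTION_PHRASES`, `ACTION_RE` and `has_action_phrase` are hypothetical, and the list is abridged for brevity.

```python
import re

# Abridged copy of the phrase list above; all names here are illustrative only
ACTION_PHRASES = ["let's", "let us", "please", "we need to",
                  "can you", "could you", "call"]

# \b anchors each phrase at word boundaries, so "call" does not match inside "recall"
ACTION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in ACTION_PHRASES) + r")\b",
    re.IGNORECASE,
)

def has_action_phrase(sentence):
    """Return 1 if the sentence contains one of the phrases as whole words, else 0."""
    return 1 if ACTION_RE.search(sentence) else 0

print(has_action_phrase("Please call me tomorrow."))    # 1: whole-word hit
print(has_action_phrase("I recall the meeting well."))  # 0: "call" only occurs inside "recall"
```

Whether the stricter matching is preferable depends on the data; it trades some recall for fewer accidental hits.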

### Rules from the observation.

- Sentence starts with a modal (MD) like *could, will...*
- Sentence starts with a verb in base form (VB) like *call, run...*
- Sentence has an adverb (RB) immediately followed by a base verb (VB), for example *kindly send*
- Sentence has an interjection (UH) followed by a base verb (VB) or a present-tense verb (VBP), for example *ummm you may call*
- Sentence has a personal pronoun (PRP) like *I, he...* followed by a base verb (VB) or a present-tense verb (VBP)
- Sentence has a noun (NN) followed by a base verb (VB) or a modal (MD), like *Ravi could*
- Sentence has a modal (MD) together with an adverb or personal pronoun, like *the task you should do*
- Sentence has a determiner (DT) followed by a base verb (VB)
<br>
<br>
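
To make the rules above concrete, here is a small pure-Python sketch that checks them against one sentence's tag list. It joins the tags into a string of angle-bracketed tags and searches it with regular expressions. The rule names and the `matched_rules` helper are made up for illustration, and the interjection rule is a simplified paraphrase, so treat this as one possible reading of the rules rather than the notebook's implementation.

```python
import re

# Each rule is a regular expression over a string such as "<PRP><MD><VB>".
# Rule names are illustrative and do not come from any library.
RULES = {
    "adverb_then_base_verb": r"<RB><VB>",
    "interjection_then_verb": r"<UH>(?:<,>)*<(?:VB|VBP)>",
    "pronoun_then_verb": r"<PRP><(?:VB|VBP)>",
    "noun_then_verb_or_modal": r"(?:<NN[^<>]?>)+(?:<,>)*<(?:VB|MD)>",
    "determiner_then_base_verb": r"<DT>(?:<,>)*<VB>",
}

def matched_rules(tags):
    """Return the names of the rules that fire for one sentence's tag list."""
    hits = []
    # sentence-initial base verb or modal
    if tags and tags[0] in ("VB", "MD"):
        hits.append("starts_with_base_verb_or_modal")
    tag_string = "".join(f"<{tag}>" for tag in tags)
    hits.extend(name for name, pattern in RULES.items()
                if re.search(pattern, tag_string))
    return hits

# Tags of the example sentence tagged earlier ("I would suggest holding ...")
example_tags = ["PRP", "MD", "VB", "VBG", "DT", "NN", "NN", "NNS", "RB", "RB",
                "VB", "DT", "NN", "IN", "DT", "JJ", "NN", "NNS", "."]
print(matched_rules(example_tags))
# ['adverb_then_base_verb']  (from "then take")

# Hand-written tags for a request such as "could you send the file"
print(matched_rules(["MD", "PRP", "VB", "DT", "NN"]))
# ['starts_with_base_verb_or_modal', 'pronoun_then_verb']
```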

Reference: https://pythonprogramming.net/chunking-nltk-tutorial/?completed=/part-of-speech-tagging-nltk-tutorial/

<br>

https://linguistics.stackexchange.com/questions/11083/detecting-actions-within-text
<br>


In [None]:
# The rules above written as NLTK chunk tag patterns

#{<RB><VB>}     # take prints

#{<UH><,>*<VB>} # hmm collect the file 

#{<UH><,><VBP>} 

#{<PRP><VB|VBP>} 

#{<NN.?>+<,>*<VB|MD>} 

#{<DT><,>*<VB>}

In [None]:
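from nltk.chunk import RegexpParser

def tag_pattern(tags):
    # one rule per line; every rule produces a chunk labelled "VB-Phrase"
    patterns = r"""
    VB-Phrase: {<DT><,>*<VB>}
               {<RB><VB>}
               {<UH><,>*<VB>}
               {<UH><,><VBP>}
               {<PRP><VB|VBP>}
               {<NN.?>+<,>*<VB|MD>}
    """

    regex_parse = RegexpParser(patterns)
    # RegexpParser expects (word, tag) pairs; only the tags are available here,
    # so each tag is also used as its own placeholder word
    return regex_parse.parse([(tag, tag) for tag in tags])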

In [None]:

from nltk.tree import Tree

def action_item_rule(doc, tags):
    # an empty sentence cannot be an action item
    if not tags:
        return 0

    # sentence starts with a base verb or a modal, like 'could you send me the file by eod'
    if tags[0] in ("VB", "MD"):
        return 1

    # check if the first chunk of the sentence matches one of the tag patterns
    chunk = tag_pattern(tags)
    if isinstance(chunk[0], Tree) and chunk[0].label() == "VB-Phrase":
        return 1

    # otherwise fall back to the action phrase list (e.g. 'please')
    return action_item_match(doc.text)


## Classifying the sentences.

In [None]:
label = []

# apply the rules to each sentence, using its own Doc and its own tag list
for doc, tags in zip(lst_doc, lst_tags):
    label.append(action_item_rule(doc, tags))
    


In [None]:
# keep the sentence column as a DataFrame (not a Series) so a label column can be added
data_label_df = sentences_df[['sentence']].copy()

data_label_df['label'] = label


In [None]:
data_label_df.to_csv("rule_gen_label.csv", index = False)