# Rule-based approach to detecting actionable items
<br>
<br>

* I will use token matching and part-of-speech (POS) tags to match sentences against actionable-item patterns.
<br>
- I will use the spaCy POS tagger and matcher:<br> https://spacy.io/usage/rule-based-matching
<br>spaCy installation guide: https://spacy.io/usage

## Linguistic rule writing
<br>

Reference: https://towardsdatascience.com/linguistic-rule-writing-for-nlp-ml-64d9af824ee8



In [17]:
import pandas as pd
import spacy
from tqdm import tqdm
import re
# I am using the small English spaCy model
nlp = spacy.load("en_core_web_sm")

- Before starting, let's check the pipeline components of spaCy.

In [2]:

nlp.pipeline
# only the tagger and parser are needed

[('tagger', <spacy.pipeline.pipes.Tagger at 0x240011b7cc8>),
 ('parser', <spacy.pipeline.pipes.DependencyParser at 0x240011b8648>),
 ('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x240011b8be8>)]

In [3]:
# I don't need NER, so I disable it
nlp = spacy.load("en_core_web_sm", disable=['ner'])
nlp.pipeline


[('tagger', <spacy.pipeline.pipes.Tagger at 0x2400aba0188>),
 ('parser', <spacy.pipeline.pipes.DependencyParser at 0x2400ab9c288>)]

In [4]:
nlp.pipe_names

['tagger', 'parser']

## Load the sentence data

In [83]:
pd.options.display.max_colwidth = 1500
sentences_df = pd.read_csv("sentence_file.csv")
sentences_df

Unnamed: 0,sentence
0,Here is our forecast
1,Traveling to have a business meeting takes the fun out of the trip.
2,Especially if you have to prepare a presentation.
3,I would suggest holding the business plan meetings here then take a trip without any formal business meetings.
4,I would even try and get some honest opinions on whether a trip is even desired or necessary.
...,...
6627370,"Any review, use, distribution or disclosure by others is strictly prohibited."
6627371,"If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender or reply to Enron Corp. at enron.messaging.administration@enron.com and delete all copies of the message."
6627372,"This e-mail (and any attachments hereto) are not intended to be an offer (or an acceptance) and do not create or evidence a binding and enforceable contract between Enron Corp. (or any of its affiliates) and the intended recipient or any other party, and may not be relied on by anyone as the basis of a contract by estoppel or otherwise."
6627373,Thank you.


### Let me pass a string and see how it works

In [84]:
example = "I would suggest holding the business plan meetings here then take a trip without any formal business meetings."

doc = nlp(example)
for ele in doc:
    print((ele, ele.pos_, ele.tag_, ele.dep_, ele.head))

(I, 'PRON', 'PRP', 'nsubj', suggest)
(would, 'VERB', 'MD', 'aux', suggest)
(suggest, 'VERB', 'VB', 'ROOT', suggest)
(holding, 'VERB', 'VBG', 'xcomp', suggest)
(the, 'DET', 'DT', 'det', meetings)
(business, 'NOUN', 'NN', 'compound', plan)
(plan, 'NOUN', 'NN', 'compound', meetings)
(meetings, 'NOUN', 'NNS', 'dobj', holding)
(here, 'ADV', 'RB', 'advmod', holding)
(then, 'ADV', 'RB', 'advmod', holding)
(take, 'VERB', 'VB', 'dep', holding)
(a, 'DET', 'DT', 'det', trip)
(trip, 'NOUN', 'NN', 'dobj', take)
(without, 'ADP', 'IN', 'prep', take)
(any, 'DET', 'DT', 'det', meetings)
(formal, 'ADJ', 'JJ', 'amod', meetings)
(business, 'NOUN', 'NN', 'compound', meetings)
(meetings, 'NOUN', 'NNS', 'pobj', without)
(., 'PUNCT', '.', 'punct', suggest)


In [85]:
doc[0].tag_

'PRP'
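
The fine-grained `tag_` values are Penn Treebank-style tags. As a quick reference for the tags that the rules in this notebook rely on, here is a minimal lookup sketch. The `PTB_TAGS` dict and the `describe_tags` helper are hypothetical names introduced only for illustration; spaCy's own `spacy.explain(tag)` serves the same purpose.

```python
# Descriptions of the Penn Treebank-style tags used by the rules in this notebook.
PTB_TAGS = {
    "MD": "modal (could, will, should)",
    "VB": "verb, base form (call, run)",
    "VBP": "verb, non-3rd person singular present",
    "VBZ": "verb, 3rd person singular present",
    "VBG": "verb, gerund or present participle",
    "RB": "adverb",
    "UH": "interjection",
    "PRP": "personal pronoun (I, he, you)",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "DT": "determiner (the, a, any)",
}

def describe_tags(tags):
    """Pair each tag with its description; unknown tags get '?'."""
    return [(tag, PTB_TAGS.get(tag, "?")) for tag in tags]

# First three tags of the example sentence above: "I would suggest ..."
for tag, meaning in describe_tags(["PRP", "MD", "VB"]):
    print(f"{tag:4} {meaning}")
```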

> The number of sentences is large (over 6.6 million), so classifying all of them would take a long time.
I therefore limit the number of sentences.

In [11]:
NUMBER_OF_SENTENCES = 50000

def gen_tags(sent_df):
    """Run spaCy on each sentence; return the Doc objects and their fine-grained POS tags."""
    tags_lst = []
    doc_lst = []
    for sentence in tqdm(sent_df):
        doc = nlp(sentence)
        doc_lst.append(doc)
        # token.tag_ is the fine-grained (Penn Treebank-style) tag, e.g. 'VB'
        tags_lst.append([token.tag_ for token in doc])
    return doc_lst, tags_lst


lst_doc, lst_tags = gen_tags(sentences_df['sentence'][0:NUMBER_OF_SENTENCES])

In [35]:
lst_tags

[['RB', 'VBZ', 'PRP$', 'NN'],
 ['VBG',
  'TO',
  'VB',
  'DT',
  'NN',
  'NN',
  'VBZ',
  'DT',
  'NN',
  'IN',
  'IN',
  'DT',
  'NN',
  '.'],
 ['RB', 'IN', 'PRP', 'VBP', 'TO', 'VB', 'DT', 'NN', '.'],
 ['PRP',
  'MD',
  'VB',
  'VBG',
  'DT',
  'NN',
  'NN',
  'NNS',
  'RB',
  'RB',
  'VB',
  'DT',
  'NN',
  'IN',
  'DT',
  'JJ',
  'NN',
  'NNS',
  '.'],
 ['PRP',
  'MD',
  'RB',
  'VB',
  'CC',
  'VB',
  'DT',
  'JJ',
  'NNS',
  'IN',
  'IN',
  'DT',
  'NN',
  'VBZ',
  'RB',
  'VBN',
  'CC',
  'JJ',
  '.'],
 ['RB',
  'RB',
  'IN',
  'DT',
  'NN',
  'NNS',
  ',',
  'PRP',
  'VBP',
  'PRP',
  'MD',
  'VB',
  'RBR',
  'JJ',
  'TO',
  'VB',
  'CC',
  'VB',
  'NNS',
  'IN',
  'DT',
  'JJ',
  'NNS',
  'IN',
  'WP',
  'VBZ',
  'VBG',
  'CC',
  'WP',
  'VBZ',
  'RB',
  '.'],
 ['RB',
  'RB',
  'DT',
  'NN',
  'VBZ',
  'CC',
  'DT',
  'NNS',
  'VBP',
  'JJ',
  'RB',
  'VBG',
  'IN',
  'PRP$',
  'NN',
  '.'],
 ['DT',
  'NNS',
  'MD',
  'VB',
  'JJR',
  'IN',
  'VBN',
  'IN',
  'DT',
  'JJ',
  'N

## Generalization
<br>
Some business action terms are added alongside common action phrases.

In [9]:
# pattern matching

def action_item_match(sent):
    """Return 1 if the lower-cased sentence contains any action phrase, else 0."""
    sent = sent.lower()

    action_pattern = ["let's", "let us","let me know","let us know", "we have to", "we have", 
                      "i have to", "please", "we need", "we need to",
                      "can you", "request", "define", "formulate",
                      "drill down", "call", "check","schedule", "could you", "would you",
                      "you should", "we expect", "you have", "we've had",
                      "i expect"]

    # substring test: a phrase counts wherever it occurs in the sentence
    return int(any(ptrn in sent for ptrn in action_pattern))
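
Because `action_item_match` uses a plain substring test, a short term such as "call" also fires inside a longer word like "recall". If that is not wanted, one possible variant matches whole words only. This is a sketch under assumptions: `phrase_match` and the shortened `PHRASES` list are hypothetical names introduced here, and switching to it would change the labels produced later in the notebook.

```python
import re

# Shortened phrase list for illustration; the full list is in action_item_match.
PHRASES = ["let's", "please", "we need to", "can you", "call", "check", "schedule"]

# One alternation with word boundaries on both sides, so "call" does not
# match inside "recall".
_PHRASE_RE = re.compile(r"\b(?:" + "|".join(re.escape(p) for p in PHRASES) + r")\b")

def phrase_match(sent):
    """Return 1 if any action phrase occurs as whole words, else 0."""
    return 1 if _PHRASE_RE.search(sent.lower()) else 0

print(phrase_match("Please send the report."))   # whole-word hit on "please"
print(phrase_match("I recall the meeting."))     # "call" occurs only inside "recall"
print("call" in "i recall the meeting.")         # the plain substring test still fires
```

The trade-off is that inflected forms such as "calls" or "scheduled" no longer match unless they are added to the list.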

### Rules from the observation.

- Sentence starts with a modal, like *could, will...*
<br>(MD)
- Sentence starts with a verb in base form, like *call, run...*
<br>(VB)
- Sentence has an adverb (RB) immediately followed by a base verb (VB), like *immediately send*
- Sentence has an interjection (UH) followed by a base verb (VB) or a present-tense verb (VBP), like *ummm you may call*
- Sentence has a personal pronoun (PRP) like *I, he...* followed by a base verb or a present-tense verb
- Sentence has a noun (NN) followed by a base verb or a modal, like *Ravi could*
- Sentence has a modal next to a personal pronoun, like *the task you should do*
- Sentence has a determiner (DT) followed by a base verb
<br>
<br>

Reference: https://pythonprogramming.net/chunking-nltk-tutorial/?completed=/part-of-speech-tagging-nltk-tutorial/

<br>

https://linguistics.stackexchange.com/questions/11083/detecting-actions-within-text
<br>


In [8]:
# The above rules as regexes over the space-joined tag sequence

#(RB VB)     # e.g. immediately take printouts of the report

#(UH.*VB) # e.g. hmm collect the file

#(UH.*VBP)

#(PRP VB|VBP)

#(NN.*VB|MD)

#(DT.*VB)

In [10]:

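def action_item_rule(docs, tags1):
    """Return 1 if the sentence looks like an action item, else 0.

    docs is the spaCy Doc for the sentence; tags1 is its list of fine-grained POS tags.
    """
    a = 0

    # sentence starts with a base verb or a modal, like 'could you send me the file by eod'
    # (the `tags1 and` guard avoids an IndexError on an empty sentence)
    if tags1 and tags1[0] in ("VB", "MD"):
        a = 1
    else:
        # the patterns are unanchored searches over the joined tags,
        # so "VB" also matches the start of "VBZ", "VBD", "VBG", ...
        tags1 = ' '.join(tags1)
        if re.search("DT.*VB", tags1):
            a = 1
        elif re.search("RB VB", tags1):
            a = 1
        elif re.search("UH.*VB", tags1):
            a = 1
        elif re.search("UH.*VBP", tags1):  # already covered by "UH.*VB" above
            a = 1
        elif re.search("PRP VB|VBP", tags1):  # '|' splits the whole pattern: "PRP VB" or "VBP"
            a = 1
        elif re.search("NN.*VB|MD", tags1):  # likewise: "NN.*VB" or "MD" anywhere
            a = 1

        # otherwise, check whether the sentence contains a phrase such as 'please' or 'define'
        else:
            a = action_item_match(docs.text)

    return a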
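
In a regular expression, `|` binds more loosely than concatenation, so `"PRP VB|VBP"` reads as "`PRP VB` or `VBP`" and `"NN.*VB|MD"` as "`NN.*VB` or `MD`". The patterns are also unanchored, so `VB` matches the beginning of `VBZ` or `VBD`. If the intent is closer to the written rule list, one possible stricter formulation groups the alternatives and adds word boundaries. This is a sketch, not the method used for the results in this notebook: `STRICT_RULES` and `strict_tag_match` are hypothetical names, the two "sentence starts with" rules are left out, and using these patterns would change the labels and the accuracy figure.

```python
import re

# Hypothetical stricter versions of the tag patterns: alternatives are grouped
# and \b keeps "VB" from matching the prefix of "VBZ", "VBD", etc.
STRICT_RULES = {
    "adverb + base verb": r"\bRB VB\b",
    "interjection ... verb": r"\bUH\b.*\b(?:VB|VBP)\b",
    "pronoun + verb": r"\bPRP (?:VB|VBP)\b",
    "noun ... verb or modal": r"\bNNP?S?\b.*\b(?:VB|MD)\b",
    "determiner ... base verb": r"\bDT\b.*\bVB\b",
}

def strict_tag_match(tags):
    """Return the names of the rules that match a list of POS tags."""
    tag_str = " ".join(tags)
    return [name for name, pat in STRICT_RULES.items() if re.search(pat, tag_str)]

# Illustrative tag sequence (pronoun, 3rd-person verb, adjective), e.g. "He is late"
tags = ["PRP", "VBZ", "JJ"]
print(bool(re.search("PRP VB|VBP", " ".join(tags))))  # loose pattern: "PRP VB" matches inside "PRP VBZ"
print(strict_tag_match(tags))                         # stricter patterns: no rule fires
```

Returning the rule names rather than a bare 0/1 makes it easier to see which rule is responsible for a given label when inspecting false positives.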


## Classifying the sentences.

In [90]:
label = []

for i in range(NUMBER_OF_SENTENCES):

    l = action_item_rule(lst_doc[i], lst_tags[i])
    label.append(l)
    

In [101]:
data_labeled_df = sentences_df.head(NUMBER_OF_SENTENCES).copy()

In [102]:
data_labeled_df['label'] = label

In [105]:
data_labeled_df.to_csv("rule_gen_label.csv", index = False)

In [104]:
data_labeled_df

Unnamed: 0,sentence,label
0,Here is our forecast,0
1,Traveling to have a business meeting takes the fun out of the trip.,0
2,Especially if you have to prepare a presentation.,1
3,I would suggest holding the business plan meetings here then take a trip without any formal business meetings.,0
4,I would even try and get some honest opinions on whether a trip is even desired or necessary.,0
...,...,...
49995,"For power sellers in the West, the current price caps often require utilities to sell their excess power at a loss, Enron, Sierra Pacific and other marketers said in statements to the commission.",0
49996,They said the price they sometimes pay for power is higher than the cost of supply from the most-expensive producer in California.,0
49997,"``The determination of the proxy price should not be limited to California generators,'' Enron said in a government filing.",0
49998,It should be ``determined by the least-efficient generator in the entire West.'',0


## Test on pre-tagged data

In [7]:
test_ = "actions.csv"
actions_df = pd.read_csv(test_, names =['action_sentence'])
actions_df

Unnamed: 0,action_sentence
0,Activate all who work with Transmission or hav...
1,Add more to your score by stopping in and pick...
2,Add O'neal Winfee and George Smith to the atte...
3,"Additionally, send me the payment schedule for..."
4,Adjust our purchase amount from each party bas...
...,...
1245,Write me note about what is going on and what ...
1246,Write verification plans specifications and do...
1247,you have to expand on the maintenance tools.
1248,You have to resolve Enron's ongoing concerns a...


In [12]:
test_NUMBER_OF_SENTENCES= 1250
        
test_doc, test_tags = gen_tags(actions_df['action_sentence'][0:test_NUMBER_OF_SENTENCES])        

100%|██████████| 1250/1250 [00:06<00:00, 201.01it/s]


In [18]:
test_label = []

for j in range(test_NUMBER_OF_SENTENCES):

    l_test = action_item_rule(test_doc[j], test_tags[j])
    test_label.append(l_test)

In [19]:
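# accuracy on the pre-tagged data: every sentence here is an action item,
# so this is the fraction of sentences labelled 1
test_label.count(1) / test_NUMBER_OF_SENTENCES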

0.9952
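
Since every row of `actions.csv` is an action item, the figure above is the share of true action items that the rules recover, that is, recall. It does not show how many non-action sentences would also be labelled 1. Once a set with both classes is available, precision and recall could be computed along the following lines. This is a minimal sketch: `precision_recall` is a hypothetical helper and the toy labels are made up for illustration, not results from this notebook.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = action item)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels, made up for illustration only (not taken from the data above).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```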