part1.py
- Read the CSV file into a dataframe (read the message data only).
- Extract only the mail content, since we don't need any other mail details.
- The mail content contains raw data other than plain sentences; filter this raw data out.
- Using the NLTK sentence tokenizer, split the email paragraphs into sentences. Preprocessing: (a) remove sentences that contain links or noise symbols (like ~~~, ----, *****, >>, ==); (b) remove sentences with fewer than 3 or more than 25 words. Before preprocessing the sentence count was 6,627,371; after preprocessing it is 3,672,744.
- Save the preprocessed sentences to a file (a sketch of this pipeline follows the list).
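
A minimal sketch of the part1.py pipeline under stated assumptions: the file name emails.csv, the column name message, the output file name, and the blank-line header/body split are illustrative guesses, not the project's actual choices.

```python
# Sketch of part1.py. File/column names and the header/body split are
# assumptions; requires nltk.download("punkt") for the sentence tokenizer.
import re
import pandas as pd
from nltk.tokenize import sent_tokenize

df = pd.read_csv("emails.csv", usecols=["message"])  # read message data only

def extract_body(raw):
    # Assumption: headers and body are separated by the first blank line.
    parts = raw.split("\n\n", 1)
    return parts[1] if len(parts) > 1 else parts[0]

# Links and noise symbols that mark a sentence as raw data, not prose.
noise = re.compile(r"https?://|www\.|~~~|----|\*{5}|>>|==")

sentences = []
for mail in df["message"]:
    for sent in sent_tokenize(extract_body(mail)):
        words = sent.split()
        # Keep only plain sentences of 3 to 25 words with no noise markers.
        if noise.search(sent) or not 3 <= len(words) <= 25:
            continue
        sentences.append(" ".join(words))

with open("preprocessed_sentences.txt", "w") as out:
    out.write("\n".join(sentences))
```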
Rules for actionable sentences (a rule-matcher sketch follows the examples below):
- Sentence starts with a VB (go, do, make).
- Sentence starts with a VB-Phrase.
Examples of verb phrases:
- VB-Phrase: {<RB><VB>} (carefully drive)
- VB-Phrase: {<UH><,><VB>} (Bah! go get) some work
- VB-Phrase: {<UH><,><VBP>} (Great! have fun)
- VB-Phrase: {<NN.?>+<,><VB>} (Virat, please mail) me the docs
- VB-Phrase: {<RB>+<,>*<VB>} (Just carefully listen)
- VB-Phrase: {<PRP><VB>} (you stop) this
- Sentence starts with "please".
- Sentence contains "please".
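
A sketch of a matcher for these rules using nltk.RegexpParser. The chunk grammar below is reconstructed from the examples above (the POS tags were garbled in the source), so the project's exact patterns may differ.

```python
# Rule-based actionability check; the chunk grammar is a reconstruction.
# Requires nltk.download("punkt") and nltk.download("averaged_perceptron_tagger").
import nltk

grammar = r"""
VB-Phrase: {<RB><VB>}
VB-Phrase: {<UH><,><VB>}
VB-Phrase: {<UH><,><VBP>}
VB-Phrase: {<NN.?>+<,><VB>}
VB-Phrase: {<RB>+<,>*<VB>}
VB-Phrase: {<PRP><VB>}
"""
chunker = nltk.RegexpParser(grammar)

def is_actionable(sentence):
    words = nltk.word_tokenize(sentence)
    if not words:
        return False
    # Rules 3 and 4: sentence starts with / contains "please".
    if "please" in (w.lower() for w in words):
        return True
    tagged = nltk.pos_tag(words)
    # Rule 1: sentence starts with a base-form verb (VB).
    if tagged[0][1] == "VB":
        return True
    # Rule 2: sentence starts with a VB-Phrase chunk.
    first = chunker.parse(tagged)[0]
    return isinstance(first, nltk.Tree) and first.label() == "VB-Phrase"

print(is_actionable("Please mail me the docs"))  # True
```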
part2.py
- Read the data from the file created by part1.py.
- Classify sentences into true (actionable) and false (non-actionable) classes according to the rules described above.
- For the time being, only 50,000 sentences per class have been classified.
- The classified sentences are saved into two separate files, one for the true (actionable) and one for the false (non-actionable) class (see the sketch below).
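
A sketch of part2.py, reusing the is_actionable() matcher sketched above; the input/output file names and the exact handling of the 50,000-per-class cap are assumptions.

```python
# Label sentences with the rules and write each class to its own file.
actionable, non_actionable = [], []

with open("preprocessed_sentences.txt") as f:
    for line in f:
        sentence = line.strip()
        if not sentence:
            continue
        bucket = actionable if is_actionable(sentence) else non_actionable
        if len(bucket) < 50000:  # for the time being, cap each class
            bucket.append(sentence)
        if len(actionable) == 50000 and len(non_actionable) == 50000:
            break

with open("actionable.txt", "w") as out:
    out.write("\n".join(actionable))
with open("non_actionable.txt", "w") as out:
    out.write("\n".join(non_actionable))
```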
part3.py
Following is the flow for objective 2 (a model sketch follows the list):
- Text data
- Embedding
- Deep network (GRU)
- Fully connected layer
- Output layer (sigmoid)
- Final output
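
A Keras sketch consistent with this flow and with the parameter counts in the summary below: 1,673,150 embedding parameters over 50 dimensions imply a vocabulary of 33,463; 7,968 GRU parameters match 32 units over 50-dimensional inputs (older Keras GRU parameterization); the remaining 33 parameters correspond to a Dense(1) sigmoid layer. The optimizer, loss, and training-call details are assumptions.

```python
# Embedding -> GRU -> fully connected sigmoid output, sized to match the
# summary below; hyperparameters other than the layer sizes are assumed.
from keras.models import Sequential
from keras.layers import Embedding, GRU, Dense

vocab_size = 33463    # inferred: 1,673,150 embedding params / 50 dims
max_len = 30          # tokens per padded sentence
embedding_dim = 50

model = Sequential()
model.add(Embedding(vocab_size, embedding_dim, input_length=max_len))
model.add(GRU(32))                          # deep network (GRU)
model.add(Dense(1, activation="sigmoid"))   # fully connected output layer

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["acc"])
model.summary()

# x_train: (98486, 30) padded token ids; y_train: (98486,) 0/1 labels
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=25, verbose=2)
```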
Summary of the built model:

```
Layer (type)                 Output Shape        Param #
embedding_3 (Embedding)      (None, 30, 50)      1673150
gru_3 (GRU)                  (None, 32)          7968
dense_3 (Dense)              (None, 1)           33
Total params: 1,681,151
Trainable params: 1,681,151
Non-trainable params: 0
```
Training log:

```
(98486, 30) (98486,)
Train on 98486 samples, validate on 1520 samples
Epoch 1/25
 - 182s - loss: 0.3617 - acc: 0.7993 - val_loss: 0.7880 - val_acc: 0.7750
Epoch 2/25
 - 170s - loss: 0.0851 - acc: 0.9762 - val_loss: 0.9968 - val_acc: 0.7743
```