<a href="https://colab.research.google.com/github/rmallela26/EmailQuestionAnswering/blob/main/Email_Question_Answering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Futuristic email search

Instead of searching for a particular email, search directly for what you want to know. Simply ask a question (e.g. "What is Bob's number?") rather than hunting for the email that contains the answer.

In [52]:
!pip install transformers
!pip install datasets

In [109]:
import transformers
from transformers import pipeline

question_answer = pipeline("question-answering", model="deepset/minilm-uncased-squad2")

In [110]:
emails = [
    """Charlie
\nHi Alice,

    This is Charlie, we had a zoom meeting on Saturday. When will I be recieving a follow up for my interview?

    Thanks,
    Charlie """,


    """
      Jared

      Hi Jared,

Could we do it over Zoom or FaceTime at 9 AM on Friday?

Thank you,
Bob """,

    """
      Jared

      My number is 999-999-9999. I will call you around 9.
    """,

    """
      Final transcripts

      Dear seniors and parents,

Congratulations again on completing your journey! We are so happy for all of you, and we look forward to seeing you all during the final festivities.

Your final transcript will be sent to your college on Friday, June 7th once grades are released. We will send the transcript to the college that you committed to according to Scoir. If you were admitted off the waitlist somewhere recently, please update Scoir. We also understand that there might be some waitlist movement after you graduate. In this scenario, if you decide to attend elsewhere, please inform your college counselor directly.

Please give the colleges some time to process the transcripts. If you have questions regarding whether or not the transcript has been received, please contact the college's admission office.

Additionally, all final transcripts will be sent to the NCAA Eligibility Center for registered students.

Please remember to send official test scores from the College Board or ACT should your college require them. You can check your college portal to see if they are required.

Let us know if you have any questions and best of luck moving forward.

The College Counseling Team

""",

    """
    AP ENG LIT

This email is confirming that you are scheduled to take the AP English Literature and Composition exam on Wednesday, May 8 from 8:30 a.m. to 11:50 a.m. in Wallace Hall 803B (D. Lin). Please arrive at your testing location by 8:15 a.m., the exam will begin promptly at 8:30 a.m.. You should bring several no. 2 pencils and a blue or black pen. You will not be allowed to have any bags, backpacks, cell phones, electronics, water, or snacks inside the testing room.

Please follow the posted signs on exam day to direct you to the testing room.

Review the following campus maps for your reference:

""",


]

#Find the keywords in a question

###Based on NER and dependency parsing

If the question contains a named entity, searching on it will probably find the target email. If there are no entities, we can fall back on dependency parsing: the nominal subject, direct object, or prepositional object is probably a term we can search on to find the target email.

###Based on neural net

If we had labeled data (questions paired with their keywords), we could train a neural net to identify the keyword in a question.

  What did Bob say? -> 'Bob'

  What is the deadline for my project -> 'deadline'
  
  Who is ASB president -> 'ASB president'
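
As a rough sketch of what such training data could look like, the pairs above can be converted into per-token labels for a token-classification model. The `bio_labels` helper, the `B-KEY`/`I-KEY`/`O` tag names, and the whitespace tokenization are all assumptions made for illustration; nothing like this exists in the notebook yet.

```python
def bio_labels(question, keyword):
    """Label each whitespace token as B-KEY, I-KEY, or O (hypothetical tag set)."""
    tokens = question.rstrip("?").split()
    key_tokens = keyword.split()
    labels = ["O"] * len(tokens)
    if not key_tokens:
        return list(zip(tokens, labels))

    # Find the first place where the keyword tokens appear contiguously
    n = len(key_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == key_tokens:
            labels[start] = "B-KEY"
            for offset in range(1, n):
                labels[start + offset] = "I-KEY"
            break
    return list(zip(tokens, labels))


# The three question/keyword pairs listed above
examples = [
    ("What did Bob say?", "Bob"),
    ("What is the deadline for my project", "deadline"),
    ("Who is ASB president", "ASB president"),
]

for question, kw in examples:
    print(bio_labels(question, kw))
```

A real dataset would need many more examples and the tokenizer of whichever model is being fine-tuned, so this only illustrates the labeling scheme.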

In [111]:
import spacy

nlp = spacy.load("en_core_web_sm")

questions = [
    "What did Charlie say",
    "When is my meeting with Jared",
    "What is Jared's number",
    "when will they send my transcripts out",
    "what time is the LIT exam at"
]

def keyword(doc):
    """Return the most likely search keyword in a parsed question, or "" if none is found."""
    # Prefer a named entity: searching on a name usually finds the target email
    for ent in doc.ents:
        if ent.label_ in {"PERSON", "ORG", "GPE", "EVENT"}:
            return ent.text

    # If no named entities are found, fall back to dependency parsing:
    # collect nominal subjects, direct objects, and prepositional objects
    tokens = [token for token in doc if token.dep_ in {"dobj", "pobj", "nsubj"}]

    if len(tokens) > 1:
        # Drop pronouns (e.g. "they"), which make poor search terms
        tokens = [token for token in tokens if token.pos_ != "PRON"]

        if len(tokens) > 1:
            # Pick the last object, whether it is a dobj or a pobj
            for token in reversed(tokens):
                if token.dep_ != "nsubj":
                    return token.text
            # Only subjects remain: the question needs to be more specific
            return ""

    if tokens:
        return tokens[0].text

    # Otherwise return an empty string to signal that the question needs to be more specific
    return ""

for question in questions:
    print(keyword(nlp(question)))

Charlie
Jared
Jared
transcripts
LIT


#Answer the question

First we find the relevant emails with a direct (exact substring) match on the keyword. Then, among the five most recent matching emails, we take the answer with the highest score.
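
The retrieval step can be sketched on its own, without the model. The helper name `find_candidate_emails` and its `limit` parameter are hypothetical, and the sketch assumes the email list is ordered most recent first, so that the first five matches are also the five most recent ones.

```python
from itertools import islice


def find_candidate_emails(emails, keyword, limit=5):
    """Return up to `limit` emails containing `keyword`, in list order.

    Assumes `emails` is ordered most recent first. The match is a
    case-sensitive substring test, the same as `keyword in email`.
    """
    matches = (email for email in emails if keyword in email)
    return list(islice(matches, limit))


# Toy inbox, newest first (made-up data for illustration)
inbox = [
    "Bob: could we meet on Friday?",
    "Alice: the report is attached",
    "Bob: my number is 999-999-9999",
]

print(find_candidate_emails(inbox, "Bob", limit=2))
```

Because the match is case-sensitive, a keyword such as "bob" would not find these emails; lowercasing both sides is one possible refinement.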

If the confidence score of the best answer is too low, we report that the question is not specific enough.
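
The selection rule can be isolated in a small function that works on plain dictionaries shaped like the question-answering pipeline's results, which carry a `score` and an `answer` key. The name `pick_answer` and the `min_score` parameter are assumptions for this sketch; the 0.05 default mirrors the threshold used in the cell below.

```python
def pick_answer(results, min_score=0.05):
    """Return the highest-scoring answer text, or None if nothing is confident enough.

    `results` is a list of dicts with "score" and "answer" keys.
    """
    if not results:
        return None
    best = max(results, key=lambda result: result["score"])
    if best["score"] < min_score:
        return None
    return best["answer"]


# Made-up scores for illustration; real scores come from the model
candidates = [
    {"score": 0.02, "answer": "on Saturday"},
    {"score": 0.61, "answer": "around 9"},
]

answer = pick_answer(candidates)
print(answer if answer is not None else "Question needs to be more specific")
```

Returning `None` instead of printing keeps the decision separate from how the result is shown to the user.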

In [112]:
for question in questions:
    search_word = keyword(nlp(question))
    if search_word == "":
        print("Question needs to be more specific")
        continue

    # Find emails containing the keyword (case-sensitive substring match).
    # `emails` is assumed to be ordered most recent first, so stopping after
    # five matches keeps only the five most recent relevant emails
    # (questions are likely to refer to more recent emails).
    answers = []
    for email in emails:
        if search_word in email:
            answers.append(question_answer(question=question, context=email))
            if len(answers) == 5:
                break

    if len(answers) == 0:
        print("Question needs to be more specific")
        continue

    # Keep the answer the model is most confident in
    answer = max(answers, key=lambda x: x['score'])

    if answer['score'] < 0.05:
        print("Question needs to be more specific")
    else:
        print(answer['answer'])

we had a zoom meeting on Saturday
around 9.
999-999-9999
Your final transcript will be sent to your college on Friday, June 7th
8:30 a.m. to 11:50 a.m.
