## Inference

<li>For inference, we first load the best model weights.</li>
<li>We then clean the text of the test email.</li>
<li>Cleaning consists of lowercasing the text, removing punctuation, and removing stopwords.</li>
<li>Finally, we extract the named entities together with their product-class labels.</li>

In [11]:
import os
import re
import spacy
import gdown
import numpy as np
import pandas as pd
from tqdm import tqdm
import warnings

warnings.filterwarnings('ignore')


In [12]:
def remove_stopwords(text):
    """
    Remove stopwords from the input text.

    Relies on the global `nlp` pipeline, which must be loaded before
    this function is called.

    Args:
        text (str): The input text from which stopwords will be removed.

    Returns:
        str: The cleaned text with stopwords removed.
    """
    doc = nlp(text)
    cleaned_text = ' '.join(token.text for token in doc if not token.is_stop)
    return cleaned_text

In [13]:
def clean_text(text):
    """
    Clean and preprocess the input text.

    This function tokenizes the input text, converts tokens to lowercase,
    removes punctuation, and ensures that the text only contains letters,
    digits, and whitespace.

    Args:
        text (str): The input text to be cleaned.

    Returns:
        str: The cleaned and preprocessed text.
    """
    doc = nlp(text)
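    # Lowercase every token and skip the ones spaCy flags as punctuation.
    cleaned_text = ' '.join(token.text.lower() for token in doc if not token.is_punct)
    # Drop any character that is not a letter, digit, or whitespace.
    cleaned_text = re.sub(r'[^a-zA-Z0-9\s]', '', cleaned_text)
    # Collapse runs of whitespace into single spaces.
    cleaned_text = re.sub(r'\s+', ' ', cleaned_text).strip()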
    return cleaned_text

## Loading the best NER model for inference

In [14]:
# The model directory is resolved relative to the parent of the current working directory.
model_path = "models/model-best"
full_model_path = os.path.join(os.path.dirname(os.getcwd()), model_path)
nlp = spacy.load(full_model_path)

### Defining the Test Email

In [27]:
email = """Subject: Issue with Salesforce CRM Compatibility

Dear Salesforce Support Team,

I trust this email finds you well. I'm a dedicated user of Salesforce CRM, and I've recently encountered an issue regarding its compatibility. When accessing Salesforce through Google Chrome on my Windows 10 machine, the interface doesn't display correctly. This is causing inconvenience in my daily work.

I'm reaching out for assistance in resolving this compatibility issue. Salesforce plays a crucial role in our operations, and I need it to work seamlessly. Your prompt help would be greatly appreciated.

Best regards,
[Your Name]
"""


### Performing Text Cleaning

<li>Text cleaning removes punctuation and lowercases every letter.</li>
<li>We also remove stopwords, which are very common words (such as "the" or "is") that carry little meaning on their own.</li>
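The two steps above can be sketched without spaCy. This is only an illustration: `simple_clean` and the hand-picked `STOPWORDS` set are hypothetical names, and the notebook itself relies on spaCy's tokenizer and `token.is_stop` rather than on a regex and a fixed word list.

```python
import re

# Hypothetical, hand-picked stopword set for illustration only;
# the notebook itself uses spaCy's token.is_stop.
STOPWORDS = {"i", "a", "an", "the", "this", "is", "of", "with",
             "and", "you", "to", "in", "on", "my"}


def simple_clean(text, stopwords=STOPWORDS):
    """Lowercase, drop non-alphanumeric characters, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)  # remove punctuation and symbols
    tokens = text.split()                     # split() also collapses whitespace
    return " ".join(tok for tok in tokens if tok not in stopwords)


print(simple_clean("Subject: Issue with Salesforce CRM Compatibility"))
# subject issue salesforce crm compatibility
```

A fixed word list is much cruder than spaCy's stopword handling, so the result will generally differ from what `clean_text` followed by `remove_stopwords` produces.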

In [28]:
clean_email = clean_text(email)
clean_email = remove_stopwords(clean_email)
print(clean_email)

subject issue salesforce crm compatibility dear salesforce support team trust email finds m dedicated user salesforce crm ve recently encountered issue compatibility accessing salesforce google chrome windows 10 machine interface nt display correctly causing inconvenience daily work m reaching assistance resolving compatibility issue salesforce plays crucial role operations need work seamlessly prompt help greatly appreciated best regards
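The stray tokens `m`, `ve`, and `nt` in the output above are most likely leftovers of the contractions "I'm", "I've", and "doesn't": `clean_text` strips the apostrophes before `remove_stopwords` runs, so the fragments no longer match any stopword. A minimal sketch of an optional post-processing step follows; `drop_contraction_fragments` and `CONTRACTION_FRAGMENTS` are hypothetical names and are not part of the original notebook.

```python
# Fragments observed in the cleaned email above (hypothetical helper set).
CONTRACTION_FRAGMENTS = {"m", "ve", "nt"}


def drop_contraction_fragments(text, fragments=CONTRACTION_FRAGMENTS):
    """Remove whitespace-separated tokens that are contraction leftovers."""
    return " ".join(tok for tok in text.split() if tok not in fragments)


print(drop_contraction_fragments("trust email finds m dedicated user"))
# trust email finds dedicated user
```

If the model was trained on text cleaned in exactly the same way, these fragments were also present during training, so it is worth checking whether removing them changes the predictions before adopting this step.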


In [29]:
doc = nlp(clean_email)

### Identifying the named entities and their labels

In [30]:
for ent in doc.ents:
    print(ent.text, ent.label_)


salesforce CRM
salesforce CRM
salesforce crm CRM
salesforce CRM
salesforce CRM


In [31]:
# Keep the first entity text seen for each label.
prediction_result = {}
for ent in doc.ents:
    if ent.label_ not in prediction_result:
        prediction_result[ent.label_] = ent.text
print(prediction_result)

{'CRM': 'salesforce'}
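The dictionary above keeps a single text per label. If every distinct mention is of interest, the entities can instead be grouped by label. The sketch below uses plain `(text, label)` pairs copied from the printed entities, so it runs without the trained model; `group_entities` is a hypothetical helper name.

```python
from collections import defaultdict

# (text, label) pairs copied from the printed entities above.
entities = [
    ("salesforce", "CRM"),
    ("salesforce", "CRM"),
    ("salesforce crm", "CRM"),
    ("salesforce", "CRM"),
    ("salesforce", "CRM"),
]


def group_entities(pairs):
    """Map each label to its distinct entity texts, in first-seen order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


print(group_entities(entities))
# {'CRM': ['salesforce', 'salesforce crm']}
```

With a real spaCy `Doc`, the same function could be called as `group_entities((ent.text, ent.label_) for ent in doc.ents)`.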


In [43]:
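list(prediction_result.keys())[0]    # predicted product class; not displayed, because a cell shows only its last expression
list(prediction_result.values())[0]  # predicted product name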

'salesforce'

In [32]:
from spacy import displacy

# One highlight color per entity label.
colors = {"CRM": "#F67DE3", "ERP": "#7DF6D9", "APPOINTMENT BOOKING": "#a6e22d"}
options = {"colors": colors}
displacy.render(doc, style="ent", options=options)

