Created on March 1st 2021 by Patrick Rotzetter
Last update on Feb 28th 2021

protzetter@bluewin.ch
https://www.linkedin.com/in/rotzetter/

**Short experiment for account opening**




In [1]:
# Import libraries
import spacy
from spacy.pipeline import EntityRuler
from spacy.matcher import Matcher, PhraseMatcher
from spacy.symbols import nsubj, VERB, dobj, NOUN, root, xcomp
from spacy import displacy
from pathlib import Path
import random

In [2]:
# validate libraries and models are well installed
!python -m spacy validate

✔ Loaded compatibility table

ℹ spaCy installation:
/opt/anaconda3/envs/spacy30/lib/python3.8/site-packages/spacy

NAME              SPACY            VERSION
en_core_web_lg    >=3.0.0,<3.1.0   3.0.0   ✔
en_core_web_sm    >=3.0.0,<3.1.0   3.0.0   ✔
en_core_web_trf   >=3.0.0,<3.1.0   3.0.0   ✔



In [3]:
# check python and spacy version
from platform import python_version
print(python_version())
!pip show spacy

3.8.5
Name: spacy
Version: 3.0.1
Summary: Industrial-strength Natural Language Processing (NLP) in Python
Home-page: https://spacy.io
Author: Explosion
Author-email: contact@explosion.ai
License: MIT
Location: /opt/anaconda3/envs/spacy30/lib/python3.8/site-packages
Requires: catalogue, jinja2, spacy-legacy, numpy, preshed, pathy, blis, typer, requests, wasabi, srsly, tqdm, murmurhash, cymem, pydantic, packaging, thinc, setuptools
Required-by: texthero, en-core-web-trf, en-core-web-sm, en-core-web-lg


In [4]:
# load spacy transformer Roberta based model
from spacy.lang.en import English
import en_core_web_trf
nlp = en_core_web_trf.load()

In [5]:
# read input message for our banking example

with open('banking dialog.txt') as f:
    text = f.read().replace('\n', ' ')

print(text)


Hello,  I am interested to open a trading account for trading purpose with an initial investment of 100k USD  Thanks a lot  Patrick


In [6]:
# process the message through the standard spaCy pipeline

doc=nlp(text)


In [7]:
# print the text and label of each detected entity
for ent in doc.ents:
    print(ent.text, ent.label_)


100k USD MONEY
Patrick PERSON


In [8]:
# Let us visualize the result directly in the text
displacy.render(doc, style='ent', minify=True)

In this case, the model has detected a person ('Patrick') and a money entity ('100k USD'). This is not enough to start our dialog, since neither the account type nor the investment is recognized. Let us see how to improve this.

In [9]:
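# define domain-specific entity patterns (case-insensitive token matches)
patterns = [{"label": "ACCOUNT", "pattern": [{"lower": "trading"}, {"lower": "account"}]},
            {"label": "INVEST", "pattern": [{"lower": "investment"}]}]

# entity ruler settings
config = {
   "phrase_matcher_attr": None,
   "validate": True,         # validate the patterns when they are added
   "overwrite_ents": True,   # ruler matches replace overlapping model entities
   "ent_id_sep": "||",
}
# add the ruler at the end of the pipeline, after the model's NER component
ruler = nlp.add_pipe('entity_ruler', config=config)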


In [10]:
ruler.add_patterns(patterns)
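
The patterns above match on the lowercase form of each token, so they do not depend on capitalization. A minimal sketch of the same idea on a blank English pipeline (no trained model, so only the ruler produces entities) is shown here; the sample sentence and the `nlp_blank` and `demo_ruler` names are assumptions made for illustration and are not part of the notebook.

```python
import spacy

# Blank pipeline: tokenizer only, no statistical NER (assumption for the demo)
nlp_blank = spacy.blank("en")
demo_ruler = nlp_blank.add_pipe("entity_ruler")
demo_ruler.add_patterns([
    {"label": "ACCOUNT", "pattern": [{"LOWER": "trading"}, {"LOWER": "account"}]},
    {"label": "INVEST", "pattern": [{"LOWER": "investment"}]},
])

# Hypothetical sentence with mixed capitalization
sample = nlp_blank("I would like a Trading Account with an initial Investment of 100k USD")
found = [(ent.text, ent.label_) for ent in sample.ents]
print(found)
```

Testing patterns on a blank pipeline like this is quick, because the transformer model does not need to be loaded.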

In [11]:
# process the message again with the added entities
doc = nlp(text)
for ent in doc.ents:
    # print the entity text and its label
    print(ent.text, ent.label_)



trading account ACCOUNT
investment INVEST
100k USD MONEY
Patrick PERSON


In [12]:
displacy.render(doc, style='ent', minify=True)
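
To start a dialog, the detected entities could be collected into named slots and checked against what the dialog still needs. The sketch below is only one possible approach: the entity list is copied from the output printed above, while `REQUIRED_LABELS`, `fill_slots`, and the extra `DATE` requirement are hypothetical and only there to show how a missing slot would surface.

```python
# (text, label) pairs as printed by the notebook after adding the entity ruler
entities = [
    ("trading account", "ACCOUNT"),
    ("investment", "INVEST"),
    ("100k USD", "MONEY"),
    ("Patrick", "PERSON"),
]

# Hypothetical set of labels a dialog might require before proceeding
REQUIRED_LABELS = {"ACCOUNT", "MONEY", "PERSON", "DATE"}

def fill_slots(entities, required):
    """Map each label to its first mention and list the required labels not found."""
    slots = {}
    for text, label in entities:
        slots.setdefault(label, text)  # keep the first mention of each label
    missing = sorted(required - slots.keys())
    return slots, missing

slots, missing = fill_slots(entities, REQUIRED_LABELS)
print(slots)
print(missing)
```

Any label reported in `missing` would be something the dialog has to ask the customer for.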

spaCy provides the tagging we need to find action verbs. We want to know, for example, whether the customer wants to order something or is just interested in some information. Let us iterate through all tokens in the text and collect every head that is tagged as a verb, which picks up the verb of an open clausal complement such as 'to open'. For all possible part-of-speech and dependency tags, refer to https://spacy.io/api/annotation#pos-tagging

In [13]:
# Identify action verbs
verbs = set()
for token in doc:
    # a token's head is the word it attaches to in the dependency tree
    if token.head.pos == VERB:
        verbs.add(token.head.text)
print(verbs)

{'open'}


In [14]:
# visualize the dependency graph
displacy.render(doc, style="dep", minify=True, jupyter=True)

Let us find possible items in the text using the dependency tag ‘dobj’ for direct objects of a verb.

In [15]:
# collect direct objects whose head is a verb
items = set()
for possible_object in doc:
    if possible_object.dep == dobj and possible_object.head.pos == VERB:
        items.add(possible_object)
print(items)

{account}
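
The verb and its direct object can also be combined into a single (action, item) pair. The following is a minimal sketch under stated assumptions: the parse is written by hand for part of the message so that the example runs without a trained model, and the `verb_object_pairs` helper is hypothetical. With the notebook's `doc`, the annotations would come from the pipeline instead.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-annotated parse of part of the message (an assumption for illustration;
# a trained pipeline would normally supply these annotations).
words = ["I", "am", "interested", "to", "open", "a", "trading", "account"]
pos = ["PRON", "AUX", "ADJ", "PART", "VERB", "DET", "NOUN", "NOUN"]
heads = [1, 1, 1, 4, 2, 7, 7, 4]  # absolute index of each token's head
deps = ["nsubj", "ROOT", "acomp", "aux", "xcomp", "det", "compound", "dobj"]

sample_doc = Doc(Vocab(), words=words, pos=pos, heads=heads, deps=deps)

def verb_object_pairs(doc):
    """Return (verb, object phrase) pairs for every direct object of a verb."""
    pairs = []
    for token in doc:
        if token.dep_ == "dobj" and token.head.pos_ == "VERB":
            # include compound modifiers to the left, e.g. "trading" in "trading account"
            modifiers = [t.text for t in token.lefts if t.dep_ == "compound"]
            pairs.append((token.head.text, " ".join(modifiers + [token.text])))
    return pairs

pairs = verb_object_pairs(sample_doc)
print(pairs)
```

Including the compound modifier keeps "trading account" together, which lines up with the ACCOUNT entity defined earlier.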





Let us see if we can use spaCy to identify intents as well, by comparing the message with a few reference sentences.

In [16]:
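# compare the message with a reference intent using vector similarity
# (the large model is loaded here because it ships with static word vectors)
import en_core_web_lg
nlp = en_core_web_lg.load()
openaccount = nlp("I want to open an account")
doc = nlp(text)
print(openaccount.similarity(doc))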


0.8871653332473544


In [17]:
# compare with the opposite intent
closeaccount=nlp("I want to close my account")
print(closeaccount.similarity(doc))

0.8630848472207009


In [18]:
# compare with an unrelated sentence
testweather=nlp("The weather is beautiful today")
print(testweather.similarity(doc))

0.7164046145215007


In [19]:
# compare with a sentence that shares wording but not the banking intent
testwindow = nlp("I want to close the window")
print(testwindow.similarity(doc))

0.8178371270666539
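
The four scores can be ranked to see how well similarity separates the intents. The sketch below uses the values printed above, rounded to four decimals; the ranking logic itself is an illustrative assumption and not part of the original experiment.

```python
# Similarity scores copied from the notebook outputs above (rounded)
scores = {
    "I want to open an account": 0.8872,
    "I want to close my account": 0.8631,
    "The weather is beautiful today": 0.7164,
    "I want to close the window": 0.8178,
}

# Rank the reference sentences from most to least similar
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best, runner_up = ranked[0], ranked[1]

# Gap between the best match and the next candidate
margin = best[1] - runner_up[1]
print(best[0], round(margin, 4))
```

The correct intent ranks first, but the opposite intent is only about 0.02 behind, and a sentence about closing a window still scores above 0.8. This suggests that similarity on its own is likely a weak signal for telling such intents apart.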
