# Vulnerable Customers

Four key drivers of vulnerability:
1. Health – disabilities or illnesses that affect the ability to carry out day-to-day tasks
2. Life Events – major life events such as bereavement, job loss or relationship breakdown
3. Resilience – low ability to withstand financial or emotional shocks
4. Capability – low knowledge of financial matters or low confidence in managing money (financial capability), and low capability in other relevant areas such as literacy or digital skills

These drivers can be normalised into distinct topics, e.g.:
1. Death
2. Redundancy
3. Furlough
4. etc.

Potential approaches:
1. Bag-of-Words: needs enough training data for us to arrive at sensible features. This would essentially be a goal-seeking exercise, because sensible features need to include synonyms of the topic at hand.
2. Similarity measure: use a WordNet-based similarity measure to monitor a stream of text for words close in meaning to these topics, i.e. synonyms.
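
To illustrate why approach 1 depends on synonyms being present in the features, here is a minimal sketch of a bag-of-words keyword count. The helper names and the single-keyword feature set are hypothetical, chosen only for illustration:

```python
from collections import Counter
import re

def bag_of_words(text):
    """Lower-case the text and count each alphabetic token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

# Hypothetical feature set for the "redundancy" topic (an assumption).
topic_keywords = {"redundancy"}

def keyword_hits(text):
    """Number of tokens in the text that appear in the feature set."""
    bag = bag_of_words(text)
    return sum(bag[word] for word in topic_keywords)

print(keyword_hits("I took the redundancy payment"))  # counts the keyword
print(keyword_hits("I was laid off last month"))      # same topic, no keyword present
```

The second sentence describes the same life event but scores zero, because "laid off" is not among the features. Covering such cases would mean learning or hand-listing the synonyms.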

Let's start with the simpler option: approach 2, as this requires no training data.
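
For intuition about what a WordNet-based measure computes, the sketch below implements Wu-Palmer similarity, 2 * depth(lcs) / (depth(a) + depth(b)), on a tiny invented hypernym tree. The tree and function names are assumptions for illustration only. They are not WordNet data and not the code used in this notebook.

```python
# Toy hypernym tree (child -> parent); invented for illustration, not WordNet data.
TOY_HYPERNYMS = {
    "event": None,
    "job_loss": "event",
    "redundancy": "job_loss",
    "dismissal": "job_loss",
    "bereavement": "event",
}

def path_to_root(node):
    """Return the chain [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = TOY_HYPERNYMS[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_of_a = set(path_to_root(a))
    # Walking up from b, the first shared ancestor is the deepest common one.
    lcs = next(n for n in path_to_root(b) if n in ancestors_of_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("redundancy", "dismissal"))    # siblings under job_loss
print(wu_palmer("redundancy", "bereavement"))  # related only via the root
```

Concepts that share a deep common ancestor score closer to 1, while concepts that meet only at the root score low. That is what makes a threshold on the score usable as a match rule.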

# NLTK semantic similarity

In [1]:
# import functions to be used, this will also import Pandas and NumPy
import synonym as syn

In [2]:
# functions available in the synonym module
dir(syn)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'multi_topic_scorer',
 'np',
 'pd',
 'prep_phrase',
 'stopwords',
 'topic_scorer',
 'word_scorer',
 'word_tokenize',
 'wordnet']

## Usage example

Use word_scorer on a single word to uncover its different usages (WordNet senses). This helps with assigning the correct sense in the topic_dictionary later, e.g.:

In [4]:
syn.word_scorer('disabled')

[('disabled.n.01',
  'people collectively who are crippled or otherwise physically handicapped'),
 ('disable.v.01', 'make unable to perform a certain action'),
 ('disable.v.02', 'injure permanently'),
 ('disabled.s.01',
  'incapable of functioning as a consequence of injury or illness')]

Use this to set up the topic_dictionary:

In [5]:
topic_dictionary = {'disability': 'disabled.n.01',
                    'death': 'die.v.02',
                    'health problems': 'ill.a.01',
                    'being a carer': 'care.v.02',
                    'living alone': 'alone.s.01',
                    'job loss (fired)': 'discharged.s.01',
                    'job loss (redundancy)': 'redundancy.n.02',
                    'job loss (furlough)': 'furlough.v.01'}
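
The values above follow WordNet's synset naming pattern lemma.pos.nn, where pos is n (noun), v (verb), a (adjective), s (adjective satellite) or r (adverb). As a small sanity check, a helper along these lines could catch typos in the dictionary before scoring. parse_synset_name is a hypothetical helper, not part of the synonym module, and it checks only the format, not whether the synset exists in WordNet.

```python
# WordNet part-of-speech tags as they appear in synset names.
POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective",
             "s": "adjective satellite", "r": "adverb"}

def parse_synset_name(name):
    """Split 'lemma.pos.nn' into (lemma, part-of-speech name, sense number)."""
    lemma, pos, sense = name.rsplit(".", 2)
    if pos not in POS_NAMES or not sense.isdigit():
        raise ValueError(f"not a well-formed synset name: {name!r}")
    return lemma, POS_NAMES[pos], int(sense)

topic_dictionary = {'disability': 'disabled.n.01',
                    'death': 'die.v.02',
                    'health problems': 'ill.a.01',
                    'being a carer': 'care.v.02',
                    'living alone': 'alone.s.01',
                    'job loss (fired)': 'discharged.s.01',
                    'job loss (redundancy)': 'redundancy.n.02',
                    'job loss (furlough)': 'furlough.v.01'}

# Raises ValueError on the first malformed entry, otherwise prints each parse.
for topic, synset_name in topic_dictionary.items():
    print(topic, "->", parse_synset_name(synset_name))
```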

The phrase we want to analyse:

In [6]:
phrase = 'I worked in Adult Learning for a county council and my job was paid for by government funding. When the funds ran out, halfway through the financial year, there was suddenly no money to pay my wages. I was called to a meeting with four others (whose jobs were also dependant on the funding) and the news was broken to us. There would be a 12 week consultation period before the redundancy became final. We could choose to take the redundancy payment and go, or we could take another job in a different part of the council. If we refused the job offered to us, we could lose the redundancy settlement.'

Score the phrase for its similarity to each of the topics.
The multi_topic_scorer iterates through each word in the phrase and compares it to each topic to see if there's a match.
A match is defined as the Wu-Palmer similarity (wup_similarity) between that word and the topic being greater than sim_thresh.
We can also set return_hits=True to return the words in the phrase that matched each topic.

In [3]:
syn.multi_topic_scorer(phrase, topic_dictionary, sim_thresh=0.7, return_hits=True)

{'disability': (0, []),
 'death': (0, []),
 'health problems': (0, []),
 'being a carer': (0, []),
 'living alone': (0, []),
 'job loss (fired)': (0, []),
 'job loss (redundancy)': (3, ['redundancy', 'redundancy', 'redundancy']),
 'job loss (furlough)': (0, [])}
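
The matching rule described above can be sketched in a few lines. This is a simplified, hypothetical reimplementation and not the code of syn.multi_topic_scorer: it takes the similarity function as a parameter, uses a regex instead of a proper tokeniser, and does no stopword removal. The exact_match stand-in is also an assumption, used so that the example runs without WordNet.

```python
import re

def score_topics(phrase, topics, similarity, sim_thresh=0.7):
    """For each topic, count the words whose similarity to it exceeds sim_thresh."""
    words = re.findall(r"[a-z']+", phrase.lower())
    results = {}
    for topic, target in topics.items():
        hits = [w for w in words if similarity(w, target) > sim_thresh]
        results[topic] = (len(hits), hits)
    return results

def exact_match(word, target):
    """Stand-in similarity: 1.0 for identical strings, else 0.0."""
    return 1.0 if word == target else 0.0

demo = score_topics(
    "The redundancy became final. Take the redundancy payment.",
    {"job loss (redundancy)": "redundancy", "death": "die"},
    exact_match,
)
print(demo)
```

Passing the similarity function in keeps the counting logic separate from the choice of measure, so a WordNet-based function could be swapped in for exact_match.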

We can also use word_scorer to check some of these results: if a particular word was identified as similar to a topic, we can run through all the usages of that word and see which one met the sim_thresh criterion.
In this case we can check which usage of redundancy matched with redundancy, although this is a rather redundant use of the function!

In [8]:
syn.word_scorer('redundancy', 'redundancy.n.02', with_similarity_score=True)

[('redundancy.n.01',
  'repetition of messages to reduce the probability of errors in transmission',
  0.25),
 ('redundancy.n.02', 'the attribute of being superfluous and unneeded', 1.0),
 ('redundancy.n.03',
  '(electronics) a system design that duplicates components to provide alternatives in case one component fails',
  0.2222222222222222),
 ('redundancy.n.04', 'repetition of an act needlessly', 0.2222222222222222)]
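
To read this output programmatically, the (name, definition, score) tuples can be filtered with plain Python. The sketch below copies the output above into a list and applies the same 0.7 threshold used earlier. The variable names are illustrative.

```python
# Output of word_scorer above, copied as (synset name, definition, similarity).
senses = [
    ('redundancy.n.01',
     'repetition of messages to reduce the probability of errors in transmission',
     0.25),
    ('redundancy.n.02', 'the attribute of being superfluous and unneeded', 1.0),
    ('redundancy.n.03',
     '(electronics) a system design that duplicates components to provide alternatives in case one component fails',
     0.2222222222222222),
    ('redundancy.n.04', 'repetition of an act needlessly', 0.2222222222222222),
]

sim_thresh = 0.7

# Senses that would count as a match, and the single best-scoring sense.
matching = [name for name, _, score in senses if score > sim_thresh]
best_name, best_definition, best_score = max(senses, key=lambda s: s[2])

print(matching)
print(best_name, best_score)
```

Only redundancy.n.02 clears the threshold here, since it is the topic synset itself. The other senses score at most 0.25.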