**Classification of Customer Complaints Using TensorFlow - Text Classification with Word Embeddings**

In this post, I show how to classify the text of consumer complaints into the following categories: Debt collection, Consumer Loan, Mortgage, Credit card, Credit reporting, Student loan, Bank account or service, Payday loan, Money transfers, Other financial service, Prepaid card.

This kind of model is useful for a customer service department that wants to classify the complaints it receives. Sorting the complaints into buckets helps the department provide customized solutions to the customers in each group.

The model can also be expanded into a system that recommends solutions automatically as new complaints come in. In the past, this kind of work was done manually by multiple employees and took a long time, which delayed the response to each complaint.

Machine learning and AI are well suited to this kind of problem. Imagine that you can classify new complaints with 95% accuracy and route them to the right team for resolution. That is a win and a time saver for any business. Your customers will be happier because the right expert will be the one talking to them about their complaint. This translates into a lower churn rate, which means more revenue.

I trained a text classifier on 66,806 complaints that customers submitted to the Consumer Financial Protection Bureau (CFPB) about the services they received from US financial institutions. The dataset is available on Kaggle at https://www.kaggle.com/cfpb/us-consumer-finance-complaints.

I used the universal-sentence-encoder-large/3 module from TensorFlow Hub to leverage the power of transfer learning, which, according to Wikipedia, is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. Google and other teams have published models on TensorFlow Hub that took them about 62,000 GPU hours to train, and they are free for us to use.

The dataset is not especially large, but this kind of machine learning process still needs a fair amount of compute, so I chose Google Colab, which offers a free GPU for training. I have a previous blog post on downloading Kaggle datasets into Google Colab on my [website](https://opokualbert.com/post.html); you may want to check it out if you would like to download this dataset and follow along with this demo.

I will walk through the steps, and at the end we will classify new complaints and see how the model performs.

In [0]:
!pip install -U -q kaggle
!mkdir -p ~/.kaggle

In [0]:
from google.colab import files
files.upload()

In [0]:
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

In [0]:
!kaggle datasets download -d cfpb/us-consumer-finance-complaints

In [0]:
!ls

In [0]:
import pandas as pd

In [0]:
from zipfile import ZipFile

zip_file = ZipFile('/content/us-consumer-finance-complaints.zip')

In [0]:

# Load only the label column and the free-text complaint column
fields = ['product', 'consumer_complaint_narrative']
data = pd.read_csv(zip_file.open('consumer_complaints.csv'), usecols=fields)
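
For readers curious about what reading selected columns straight out of a zip archive involves, here is a minimal sketch that uses only the standard library. The helper name `read_columns`, the extra `state` column, and the two sample rows are made up for illustration; pandas does the same job in a single call with `usecols`.

```python
import csv
import io
import zipfile


def read_columns(zip_bytes, member, columns):
    """Return the chosen columns of a CSV file stored inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            # Keep only the requested columns from every row.
            return [{name: row[name] for name in columns} for row in reader]


# Build a tiny in-memory archive (made-up rows) to stand in for the Kaggle download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "consumer_complaints.csv",
        "product,state,consumer_complaint_narrative\n"
        "Mortgage,CA,\n"
        "Debt collection,NY,they keep calling me\n",
    )

rows = read_columns(
    buffer.getvalue(),
    "consumer_complaints.csv",
    ["product", "consumer_complaint_narrative"],
)
```

The first sample row has an empty narrative on purpose: many rows in the real file have no narrative, which is why they are filtered out a few cells later.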

In [10]:
data.head()

Unnamed: 0,product,consumer_complaint_narrative
0,Mortgage,
1,Mortgage,
2,Credit reporting,
3,Student loan,
4,Debt collection,


In [11]:
import os
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import json
import pickle
import urllib

from sklearn.preprocessing import LabelBinarizer

print(tf.__version__)


1.10.1


In [12]:
data = data[pd.notnull(data['consumer_complaint_narrative'])]
data.head()

Unnamed: 0,product,consumer_complaint_narrative
190126,Debt collection,XXXX has claimed I owe them {$27.00} for XXXX ...
190135,Consumer Loan,Due to inconsistencies in the amount owed that...
190155,Mortgage,In XX/XX/XXXX my wages that I earned at my job...
190207,Mortgage,I have an open and current mortgage with Chase...
190208,Mortgage,XXXX was submitted XX/XX/XXXX. At the time I s...


In [13]:
pd.set_option('max_colwidth', 1000)
data['consumer_complaint_narrative'] = data['consumer_complaint_narrative'].str.lower()
data.head()

Unnamed: 0,product,consumer_complaint_narrative
190126,Debt collection,xxxx has claimed i owe them {$27.00} for xxxx years despite the proof of payment i sent them : canceled check and their ownpaid invoice for {$27.00}! \nthey continue to insist i owe them and collection agencies are after me. \nhow can i stop this harassment for a bill i already paid four years ago? \n
190135,Consumer Loan,"due to inconsistencies in the amount owed that i was told by m & t bank and the amount that was reported to the credit reporting agencies, i was advised to write a good will letter in order to address the issue and request the negative entry be removed from my credit report all together. i had a vehicle that was stolen and it was declared a total loss by insurance company. the insurance company and the gap insurancw companypaid the outstanding balance of the loan, but i was told by m & t bank that there was still a balance due on the loan. in good faith, without having received any proof as to why there was still a balance, i made a partial payment towards the remaining debt. i then sent the goodwill letter still offering to pay the remainder of the debt, but in exchange for the removal of the negative entry on my credit report. at one point, in xxxx 2015, per my credit monitoring agency, it showed a delinquent balance of {$0.00}, but when i checked my credit report again on xxxx x..."
190155,Mortgage,"in xx/xx/xxxx my wages that i earned at my job decreased by almost half, by xx/xx/xxxx i knew i was in trouble with my home loan. i began contacting wfb whom my home loan is with, for assitance and options. \nin early xx/xx/xxxx i began the loan modification process with wells fargo bank. i was told that they would not assist me with anything financial on my home loan until i fell 90 days behind, though at the time i started to inquire for assistance from wfb i was only a few weeks behind. so, i began working with a program called xxxx. they approved me for a variety of assistence and reached out to wells fargo bank to determine what they could assist with. wells fargo then turned down the assistance from xxxx and finally offered to do a loan modification for me. the outcome was totally unknow about what i would be offered in the end by wfb for assistance. wells fargo lost my paperwork twice during this process, so it took 2 months from the time i started to the time my paperwork b..."
190207,Mortgage,"i have an open and current mortgage with chase bank # xxxx. chase is reporting the loan payments to xxxx but xxxx is surpressing the information and reporting the loan as discharged in bk. this mortgage was reaffirmed in a chapter xxxx bk discharged dated xxxx/xxxx/2013. chase keeps referring to bk law for chapter xxxx and we keep providing documentation for chapter xxxx, and the account should be open and current with all the payments \n"
190208,Mortgage,"xxxx was submitted xx/xx/xxxx. at the time i submitted this complaint, i had dealt with rushmore mortgage directly endeavoring to get them to stop the continuous daily calls i was receiving trying to collect on a mortgage for which i was not responsible due to bankruptcy. they denied having knowledge of the bankruptcy, even though i had spoken with them about it repeatedly and had written them repeatedly referencing the bankruptcy requesting them to cease the pursuit, they continued to do so. when they were unable to trick me into paying, force me into paying in retaliation they placed reported to my credit bureaus a past due mortgage amount that had been discharged in federal court. on xx/xx/xxxx rushmore responded the referenced complaint indicating that they would remove the reporting from my bureau, yet it is still there now in xx/xx/xxxx. i would like them to remove it immediately and send me a letter indicating that it should not have been there in the first place and they ar..."


In [14]:
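# Strip the CFPB redaction markers (xxxx, xx/xx/xxxx) and the braces around dollar amounts.
# Caveat: replacing every 'x' also removes the letter from ordinary words,
# e.g. "exchange" becomes "echange" in the output below.
data['consumer_complaint_narrative'] = data['consumer_complaint_narrative'].str.replace('x', '')
data['consumer_complaint_narrative'] = data['consumer_complaint_narrative'].str.replace('{', '')
data['consumer_complaint_narrative'] = data['consumer_complaint_narrative'].str.replace('}', '')
data['consumer_complaint_narrative'] = data['consumer_complaint_narrative'].str.replace('/', '')
data.head()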

Unnamed: 0,product,consumer_complaint_narrative
190126,Debt collection,has claimed i owe them $27.00 for years despite the proof of payment i sent them : canceled check and their ownpaid invoice for $27.00! \nthey continue to insist i owe them and collection agencies are after me. \nhow can i stop this harassment for a bill i already paid four years ago? \n
190135,Consumer Loan,"due to inconsistencies in the amount owed that i was told by m & t bank and the amount that was reported to the credit reporting agencies, i was advised to write a good will letter in order to address the issue and request the negative entry be removed from my credit report all together. i had a vehicle that was stolen and it was declared a total loss by insurance company. the insurance company and the gap insurancw companypaid the outstanding balance of the loan, but i was told by m & t bank that there was still a balance due on the loan. in good faith, without having received any proof as to why there was still a balance, i made a partial payment towards the remaining debt. i then sent the goodwill letter still offering to pay the remainder of the debt, but in echange for the removal of the negative entry on my credit report. at one point, in 2015, per my credit monitoring agency, it showed a delinquent balance of $0.00, but when i checked my credit report again on 2015, there..."
190155,Mortgage,"in my wages that i earned at my job decreased by almost half, by i knew i was in trouble with my home loan. i began contacting wfb whom my home loan is with, for assitance and options. \nin early i began the loan modification process with wells fargo bank. i was told that they would not assist me with anything financial on my home loan until i fell 90 days behind, though at the time i started to inquire for assistance from wfb i was only a few weeks behind. so, i began working with a program called . they approved me for a variety of assistence and reached out to wells fargo bank to determine what they could assist with. wells fargo then turned down the assistance from and finally offered to do a loan modification for me. the outcome was totally unknow about what i would be offered in the end by wfb for assistance. wells fargo lost my paperwork twice during this process, so it took 2 months from the time i started to the time my paperwork began to be processed for some kind of ..."
190207,Mortgage,"i have an open and current mortgage with chase bank # . chase is reporting the loan payments to but is surpressing the information and reporting the loan as discharged in bk. this mortgage was reaffirmed in a chapter bk discharged dated 2013. chase keeps referring to bk law for chapter and we keep providing documentation for chapter , and the account should be open and current with all the payments \n"
190208,Mortgage,"was submitted . at the time i submitted this complaint, i had dealt with rushmore mortgage directly endeavoring to get them to stop the continuous daily calls i was receiving trying to collect on a mortgage for which i was not responsible due to bankruptcy. they denied having knowledge of the bankruptcy, even though i had spoken with them about it repeatedly and had written them repeatedly referencing the bankruptcy requesting them to cease the pursuit, they continued to do so. when they were unable to trick me into paying, force me into paying in retaliation they placed reported to my credit bureaus a past due mortgage amount that had been discharged in federal court. on rushmore responded the referenced complaint indicating that they would remove the reporting from my bureau, yet it is still there now in . i would like them to remove it immediately and send me a letter indicating that it should not have been there in the first place and they are going to remove it from all my b..."
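
The cleaning cell removes every `x`, so ordinary words lose letters too ("exchange" shows up as "echange" in the output). A gentler alternative, sketched here under the assumption that redactions always appear as runs of two or more `x` characters, optionally joined by slashes, is to remove only those runs. The function name `clean_narrative` is hypothetical, and unlike the cell above this sketch does not strip other slashes.

```python
import re

# Redaction markers such as "xxxx" or "xx/xx/xxxx", matched as whole tokens.
_REDACTED = re.compile(r"\bx{2,}(?:/x{2,})*\b")
_BRACES = re.compile(r"[{}]")
_SPACES = re.compile(r"\s+")


def clean_narrative(text):
    """Lowercase a complaint and drop redaction markers and braces."""
    text = text.lower()
    text = _REDACTED.sub(" ", text)   # remove xxxx-style tokens only
    text = _BRACES.sub("", text)      # {$27.00} -> $27.00
    return _SPACES.sub(" ", text).strip()


sample = "XXXX owes {$27.00} in exchange on XX/XX/XXXX."
cleaned = clean_narrative(sample)
```

With pandas, the same function could be applied to the column with `data['consumer_complaint_narrative'].apply(clean_narrative)`.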


In [15]:
data['product'].unique()

array(['Debt collection', 'Consumer Loan', 'Mortgage', 'Credit card',
       'Credit reporting', 'Student loan', 'Bank account or service',
       'Payday loan', 'Money transfers', 'Other financial service',
       'Prepaid card'], dtype=object)

In [16]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 66806 entries, 190126 to 553096
Data columns (total 2 columns):
product                         66806 non-null object
consumer_complaint_narrative    66806 non-null object
dtypes: object(2)
memory usage: 1.5+ MB


In [17]:
data.dropna(inplace=True)
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 66806 entries, 190126 to 553096
Data columns (total 2 columns):
product                         66806 non-null object
consumer_complaint_narrative    66806 non-null object
dtypes: object(2)
memory usage: 1.5+ MB


In [0]:
data_comp=data[['consumer_complaint_narrative']]
data_prod=data[['product']]

In [0]:
# Hold out the last 0.1% of rows (in file order, not shuffled) as unseen test complaints
train_size = int(len(data_comp) * .999)

train_descriptions = data_comp[:train_size].astype('str')
train_prod = data_prod[:train_size]

test_descriptions = data_comp[train_size:].astype('str')
test_prod = data_prod[train_size:]

In [22]:
print(train_descriptions.shape)
print(test_descriptions.shape)

(66739, 1)
(67, 1)


In [0]:
# Split the remaining rows 80/20 into training and validation sets (again in file order)
train_size = int(len(train_descriptions) * .8)

train_desc = train_descriptions[:train_size]
train_pr = train_prod[:train_size]

val_desc = train_descriptions[train_size:]
val_pr = train_prod[train_size:]

In [24]:
print(train_desc.shape)
print(val_desc.shape)

(53391, 1)
(13348, 1)


In [25]:
print(train_pr.shape)
print(val_pr.shape)

(53391, 1)
(13348, 1)
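
The splits above slice the rows in file order. If the complaints in the file are ordered in some meaningful way (by date, for example), a shuffled split may give more representative sets. Here is a minimal sketch of a shuffled three-way split that reproduces the same set sizes; the function name, its parameters, and the seed are my own choices, not part of the notebook.

```python
import random


def three_way_split(n_rows, dev_frac=0.999, train_frac=0.8, seed=42):
    """Return shuffled row positions for the train, validation and test sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # fixed seed keeps the split reproducible
    n_dev = int(n_rows * dev_frac)         # rows used for training plus validation
    n_train = int(n_dev * train_frac)      # rows used for training only
    return indices[:n_train], indices[n_train:n_dev], indices[n_dev:]


train_idx, val_idx, test_idx = three_way_split(66806)
print(len(train_idx), len(val_idx), len(test_idx))
```

The positions can then be used with `data.iloc[train_idx]`, `data.iloc[val_idx]` and `data.iloc[test_idx]`.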


In [26]:
print(train_desc.info())
print(val_desc.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 53391 entries, 190126 to 503087
Data columns (total 1 columns):
consumer_complaint_narrative    53391 non-null object
dtypes: object(1)
memory usage: 834.2+ KB
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 13348 entries, 503088 to 552770
Data columns (total 1 columns):
consumer_complaint_narrative    13348 non-null object
dtypes: object(1)
memory usage: 208.6+ KB
None


In [27]:
from sklearn import preprocessing
encoder = preprocessing.LabelBinarizer()
# Learn the set of product labels from the training data only
encoder.fit(train_pr)
train_encoded = encoder.transform(train_pr)
val_encoded = encoder.transform(val_pr)
num_classes = len(encoder.classes_)

# Print all possible products and the label for the first complaint in our training dataset
print(encoder.classes_)
print(train_encoded[0])

['Bank account or service' 'Consumer Loan' 'Credit card'
 'Credit reporting' 'Debt collection' 'Money transfers' 'Mortgage'
 'Other financial service' 'Payday loan' 'Prepaid card' 'Student loan']
[0 0 0 0 1 0 0 0 0 0 0]
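
To make the encoding concrete, here is a small plain-Python sketch of what the binarizer does for this kind of label: the classes are sorted alphabetically and each label becomes a vector with a single 1 at the position of its class. The helper names are hypothetical; in the notebook, scikit-learn's `LabelBinarizer` does this work.

```python
def fit_classes(labels):
    """Return the distinct labels in sorted order."""
    return sorted(set(labels))


def one_hot(label, classes):
    """Encode one label as a 0/1 vector aligned with `classes`."""
    return [1 if label == name else 0 for name in classes]


products = [
    'Debt collection', 'Consumer Loan', 'Mortgage', 'Credit card',
    'Credit reporting', 'Student loan', 'Bank account or service',
    'Payday loan', 'Money transfers', 'Other financial service',
    'Prepaid card',
]
classes = fit_classes(products)
first_label = one_hot('Debt collection', classes)
print(first_label)  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

'Debt collection' is the fifth class in sorted order, which is why the first training complaint is encoded with a 1 in the fifth position.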


In [28]:
# Use the pretrained Universal Sentence Encoder as a frozen (non-trainable) feature column
description_embeddings = hub.text_embedding_column(
    "descriptions",
    module_spec="https://tfhub.dev/google/universal-sentence-encoder-large/3",
    trainable=False)

INFO:tensorflow:Using /tmp/tfhub_modules to cache modules.
INFO:tensorflow:Downloading TF-Hub Module 'https://tfhub.dev/google/universal-sentence-encoder-large/3'.
INFO:tensorflow:Downloaded TF-Hub Module 'https://tfhub.dev/google/universal-sentence-encoder-large/3'.


In [0]:
multi_label_head = tf.contrib.estimator.multi_label_head(
    num_classes,
    loss_reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE
)
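
One thing worth knowing about a multi-label head is that it scores every class independently with a sigmoid, so the predicted probabilities for one complaint do not have to add up to 100%. A single-label (softmax) head would force them to. The sketch below illustrates the difference with made-up logits; it is plain Python, not the TensorFlow implementation.

```python
import math


def sigmoid(logits):
    """Independent per-class probabilities, as in a multi-label head."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax(logits):
    """Mutually exclusive class probabilities that sum to 1."""
    shift = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, -1.0, 0.5]                    # made-up scores for three classes
independent = sigmoid(logits)
exclusive = softmax(logits)
print(sum(independent), sum(exclusive))
```

Both functions rank the classes in the same order, so picking the top class gives the same answer; only the interpretation of the percentage changes. Since every complaint here has exactly one product, a single-label head would also be a reasonable choice.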

In [30]:
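# Feed the raw complaint text under the same key used by the embedding column
features = {
  "descriptions": np.array(train_desc).astype(str)
}
labels = np.array(train_encoded).astype(np.int32)
# Shuffled batches of 100 complaints, 10 passes over the training data
train_input_fn = tf.estimator.inputs.numpy_input_fn(features, labels, shuffle=True, batch_size=100, num_epochs=10)
# Small feed-forward network on top of the sentence embeddings
estimator = tf.contrib.estimator.DNNEstimator(
    head=multi_label_head,
    hidden_units=[64,10],
    feature_columns=[description_embeddings])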

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpds8zij34', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f529ff9ca20>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [31]:
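%%timeit
# Note: %%timeit reruns the cell several times, and each run continues training
# from the latest checkpoint; use %%time instead to train exactly once.
estimator.train(input_fn=train_input_fn)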

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpds8zij34/model.ckpt.
INFO:tensorflow:loss = 0.69291115, step = 0
INFO:tensorflow:global_step/sec: 2.6558
INFO:tensorflow:loss = 0.27220497, step = 100 (37.659 sec)
INFO:tensorflow:global_step/sec: 2.67555
INFO:tensorflow:loss = 0.2514919, step = 200 (37.374 sec)
INFO:tensorflow:global_step/sec: 2.6698
INFO:tensorflow:loss = 0.24499698, step = 300 (37.458 sec)
INFO:tensorflow:global_step/sec: 2.6522
INFO:tensorflow:loss = 0.2499402, step = 400 (37.703 sec)
INFO:tensorflow:global_step/sec: 2.67419
INFO:tensorflow:loss = 0.26085
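
The number of training steps can be checked with a little arithmetic. The sketch below assumes the input function emits a final partial batch, so one run of the training cell takes the number of examples times the number of epochs, divided by the batch size and rounded up. The evaluation logs later restore a checkpoint at step 21360, which is consistent with `%%timeit` having executed the training cell four times.

```python
import math


def steps_per_run(n_examples, batch_size, num_epochs):
    """Number of batches produced by one call to train()."""
    return math.ceil(n_examples * num_epochs / batch_size)


steps = steps_per_run(53391, batch_size=100, num_epochs=10)
runs = 21360 / steps
print(steps, runs)  # 5340 4.0
```

In other words, the model evaluated below has seen the training data roughly 40 times rather than 10.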

In [40]:
%%timeit
# Evaluate on the training data itself, as a baseline for the validation metrics
train_input_fn_1 = tf.estimator.inputs.numpy_input_fn({"descriptions": np.array(train_desc).astype(str)}, train_encoded.astype(np.int32), shuffle=False)
estimator.evaluate(input_fn=train_input_fn_1)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-09-09-15:06:06
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpds8zij34/model.ckpt-21360
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-09-09-15:09:31
INFO:tensorflow:Saving dict for global step 21360: auc = 0.9603334, auc_precision_recall = 0.8081871, average_loss = 0.12238744, global_step = 21360, loss = 0.12228265
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 21360: /tmp/tmpds8zij34/model.ckpt-21360
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created beca

In [33]:
# Define our eval input_fn and run eval on the validation set
eval_input_fn = tf.estimator.inputs.numpy_input_fn({"descriptions": np.array(val_desc).astype(str)}, val_encoded.astype(np.int32), shuffle=False)
estimator.evaluate(input_fn=eval_input_fn)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-09-09-14:11:31
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpds8zij34/model.ckpt-21360
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-09-09-14:12:25
INFO:tensorflow:Saving dict for global step 21360: auc = 0.94820327, auc_precision_recall = 0.7767216, average_loss = 0.13540688, global_step = 21360, loss = 0.1355119
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 21360: /tmp/tmpds8zij34/model.ckpt-21360


{'auc': 0.94820327,
 'auc_precision_recall': 0.7767216,
 'average_loss': 0.13540688,
 'global_step': 21360,
 'loss': 0.1355119}

In [0]:
# Predict on the 67 held-out complaints; shuffle=False keeps them in their original order
predict_input_fn = tf.estimator.inputs.numpy_input_fn({"descriptions": np.array(test_descriptions).astype(str)}, shuffle=False)

results = estimator.predict(predict_input_fn)

In [39]:
# Display predictions
for product in results:
  top = product['probabilities'].argsort()[-1:]
  for prod in top:
    text_prod = encoder.classes_[prod]
    print(text_prod + ': ' + str(round(product['probabilities'][prod] * 100, 2)) + '%')
  print('')

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpds8zij34/model.ckpt-21360
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Debt collection: 40.48%
Debt collection: 70.87%
Debt collection: 68.23%
Credit reporting: 86.93%
Debt collection: 85.09%
Debt collection: 80.59%
Credit reporting: 54.16%
Mortgage: 80.75%
Mortgage: 99.57%
Debt collection: 57.54%
Debt collection: 94.4%
Credit reporting: 96.67%
Credit card: 80.22%
Debt collection: 81.88%
Debt collection: 95.83%
Mortgage: 22.04%
Credit reporting: 98.03%
Mortgage: 64.78%
Credit card: 23.41%
Bank account or service: 60.4%
Student loan: 49.41%
Mortgage: 12.8%
Mortgage: 89.66%
Credit reporting: 68.4%
Credit card: 23.63%
D
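
Some of the top predictions above come with low confidence (12.8% for one Mortgage prediction, for instance). If the model were used to route complaints to teams, as described in the introduction, one simple option is to send low-confidence predictions to a person instead. The sketch below shows the idea; the function name, the 50% threshold, and the "manual review" queue are assumptions for illustration, not part of the notebook.

```python
def route(label, probability, threshold=0.5):
    """Send confident predictions to the product team, the rest to a person."""
    if probability >= threshold:
        return label
    return "manual review"


# A few (label, probability) pairs taken from the output above.
predictions = [
    ("Debt collection", 0.4048),
    ("Mortgage", 0.9957),
    ("Mortgage", 0.128),
]
queues = [route(label, prob) for label, prob in predictions]
print(queues)  # ['manual review', 'Mortgage', 'manual review']
```

The right threshold depends on how costly a misrouted complaint is compared with a manual review, so it is best tuned on the validation set.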

In [36]:
test_prod

Unnamed: 0,product
552772,Credit card
552773,Debt collection
552775,Debt collection
552779,Credit reporting
552792,Debt collection
552794,Credit reporting
552798,Debt collection
552802,Mortgage
552807,Mortgage
552808,Debt collection
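
To see how the model performed on the rows shown here, we can line up the first ten predictions with the first ten actual products. This is a quick sketch that copies both lists by hand from the outputs above and assumes the predictions come back in the same order as the rows of `test_prod` (the prediction input function uses `shuffle=False`).

```python
predicted = [
    "Debt collection", "Debt collection", "Debt collection", "Credit reporting",
    "Debt collection", "Debt collection", "Credit reporting", "Mortgage",
    "Mortgage", "Debt collection",
]
actual = [
    "Credit card", "Debt collection", "Debt collection", "Credit reporting",
    "Debt collection", "Credit reporting", "Debt collection", "Mortgage",
    "Mortgage", "Debt collection",
]

# Count the positions where the predicted product equals the actual product.
matches = sum(p == a for p, a in zip(predicted, actual))
accuracy = matches / len(actual)
print(f"{matches} of {len(actual)} correct ({accuracy:.0%})")  # 7 of 10 correct (70%)
```

Ten rows are far too few to draw conclusions from, but the misses are informative: two of the three involve a mix-up between Debt collection and Credit reporting.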
