### We will use Google's T5 model to summarize complaints from a popular public dataset published by the Consumer Financial Protection Bureau

In [1]:
#Install the following packages. I used Google Colab because installing some of these on Mac M1 machines can be frustrating.
!pip install transformers==2.8.0
!pip install torch==1.4.0
!pip install pip --upgrade
!pip install pyopenssl --upgrade
!pip install openai streamlit

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/

In [2]:
#Import Torch and transformers
import warnings
warnings.filterwarnings('ignore')
import os
import csv
import pandas as pd
import torch
import json 
from transformers import T5Tokenizer, T5ForConditionalGeneration, T5Config


In [3]:
#Mount Google Drive to Import Cleaned Dataset
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
complaints_pathname='/content/drive/MyDrive/cleaned_complained.csv'
df_all_complaints=pd.read_csv(complaints_pathname)
df_all_narr=df_all_complaints.dropna(subset=['Consumer complaint narrative'])
df_all_narr=df_all_narr[['Product','Sub-product','Issue','Sub-issue','Consumer complaint narrative']]

In [5]:
#Set pandas display options to see the entirety of each complaint
pd.set_option('display.max_colwidth',240)
pd.set_option('display.max_rows', 99999)
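
The options above change the pandas display settings for the rest of the session. As a minimal sketch, assuming a temporary change is enough, `pd.option_context` applies the same settings only inside a `with` block and restores the previous values afterwards. The toy DataFrame is hypothetical and only stands in for the complaints data.

```python
import pandas as pd

# Hypothetical toy frame standing in for the complaints data.
toy = pd.DataFrame({"Consumer complaint narrative": ["x" * 300]})

# Remember the current setting so we can confirm it is restored.
default_width = pd.get_option("display.max_colwidth")

# The options apply only inside the with-block.
with pd.option_context("display.max_colwidth", 240, "display.max_rows", 99999):
    inside_width = pd.get_option("display.max_colwidth")
    rendered = repr(toy)  # rendered using the temporary settings

# Outside the block the previous value is back in effect.
restored_width = pd.get_option("display.max_colwidth")
```

This keeps wide columns from leaking into later cells that only need the default display.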

In [6]:
df_part_narr=df_all_narr[df_all_narr.index.isin([408,659,789,856858,856702,950006,865088,681842,536367,285894])]
df_part_narr

| Unnamed: 0 | Product | Sub-product | Issue | Sub-issue | Consumer complaint narrative |
|---|---|---|---|---|---|
| 408 | Mortgage | VA mortgage | Loan modification,collection,foreclosure | | Quicken loans contacted my father while he was XXXX and was not legally supposed to sign any documents. The documents were hidden from me and I found them when my father passed away. I advised quicken loans that he passed away and they ... |
| 659 | Debt collection | Other (i.e. phone, health club, etc.) | Cont'd attempts collect debt not owed | Debt resulted from identity theft | I received an alert from a company called Credence who has placed for collection an account on my credit report on behalf of a company called XXXX in the amount of {$150.00}. I received this alert on both my XXXX and XXXX reports. \n\nX... |
| 789 | Bank account or service | Checking account | Account opening, closing, or management | | Bank of America fails to show a valid current contract, Bank of America 's response shows no tangible evidence of applicable facts requiring my obligation or performance, I never agreed to a lifetime of obligation to these CRIMINALS, PA... |
| 285894 | Credit reporting, credit repair services, or other personal consumer reports | Credit reporting | Incorrect information on your report | Account status incorrect | XXXX XXXX is reporting incorrect information and will not : 1 ) contact me ( they are trying to contact my WIFE! ) 2 ) remove the erroneous marks 3 ) fix the reporting 4 ) delete the negative reporting 5 ) DO anything!\n\nThis very frus... |
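
The list passed to `isin` holds ten index labels, yet the output shows four rows. A likely explanation (an assumption, since the full dataset is not shown here) is that the other labels are absent from `df_all_narr`, for example because those rows had no narrative and were dropped by `dropna`. The sketch below uses a hypothetical toy frame to show that `index.isin` skips missing labels silently, and how to list them.

```python
import pandas as pd

# Hypothetical toy frame; the index labels play the role of complaint row numbers.
toy = pd.DataFrame(
    {"Consumer complaint narrative": ["a", "b", "c"]},
    index=[408, 659, 789],
)

wanted = [408, 789, 999999]  # 999999 is not in the index

# isin builds a boolean mask, so labels that do not exist are ignored without an error.
subset = toy[toy.index.isin(wanted)]

# Report which requested labels were not found.
missing = sorted(set(wanted) - set(toy.index))

print(subset.index.tolist())
print(missing)
```

Checking `missing` before summarizing makes it clear how many complaints will actually be processed.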


### T5 Model

In [7]:
#Function takes three parameters, in this order: the text (a string or a list of strings), the max length, and the min length of the summary output
def T5_summarize(text_ps,required_max_len,required_min_len):
    model = T5ForConditionalGeneration.from_pretrained('t5-small')
    tokenizer = T5Tokenizer.from_pretrained('t5-small')
    device = torch.device('cpu')
    summarized_text = list()
    text_ps_list = list()
    if isinstance(text_ps, str):
        text_ps_list.append(text_ps)
    elif isinstance(text_ps, list):
        text_ps_list = text_ps
    else:
        text_ps_list = []
    for p in text_ps_list:
        text = p
        preprocess_text = text.strip().replace("\n","")
        t5_prepared_Text = "summarize: "+preprocess_text
        tokenized_text = tokenizer.encode(t5_prepared_Text, return_tensors="pt").to(device)
        # summarize
        summary_ids = model.generate(tokenized_text,
                                        num_beams=8,
                                        no_repeat_ngram_size=4,
                                        min_length=required_min_len,
                                        max_length=required_max_len,
                                        early_stopping=True)

        output = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
        summarized_text.append(output)
    return summarized_text
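
The first half of `T5_summarize` normalizes its input (a string or a list of strings) and prepends the `summarize: ` task prefix. A minimal sketch of that step as a standalone helper follows; the name `prepare_t5_inputs` is hypothetical and not part of the notebook. One deliberate variation, labeled here as an assumption about what is desirable: runs of whitespace, including newlines, are collapsed to a single space rather than removed, so that words on either side of a line break are not fused together.

```python
def prepare_t5_inputs(text_ps):
    """Return a list of 'summarize: ' prompts from a string or a list of strings.

    Hypothetical helper: mirrors the input handling of T5_summarize, except
    that whitespace runs are collapsed to one space instead of being deleted.
    """
    if isinstance(text_ps, str):
        texts = [text_ps]
    elif isinstance(text_ps, list):
        texts = text_ps
    else:
        texts = []  # same fallback as the original: unsupported types yield nothing

    # split() with no arguments drops leading/trailing whitespace and splits on
    # any whitespace run, so joining with " " normalizes newlines and spaces.
    return ["summarize: " + " ".join(t.split()) for t in texts]


prompts = prepare_t5_inputs(["first line\n\nsecond line ", " another complaint\n"])
print(prompts)
```

Keeping this step separate makes it possible to inspect exactly what text reaches the tokenizer without loading the model.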

In [8]:
#Run T5_summarize on the 'Consumer complaint narrative' column (column index 4); it returns the summaries as a list
#As this can be resource intensive, we summarize only the first 4 complaints and keep each summary between 30 and 100 tokens (min_length and max_length count tokens, not words)
sum_list = T5_summarize (df_part_narr.iloc[:,4].tolist(),100,30)
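
Because `T5_summarize` appends one summary per input in order, the returned list lines up with the rows of `df_part_narr`. A minimal sketch of pairing each summary with its index label, plus a rough word-count ratio as a sanity check, is shown below. The data and the helper name `pair_summaries` are hypothetical stand-ins, not the real narratives or model output.

```python
# Hypothetical stand-ins for df_part_narr's index, narratives, and the T5 output.
index_labels = [408, 659]
narratives = [
    "Quicken loans contacted my father and the documents were hidden from me.",
    "I received an alert from a company that placed an account for collection.",
]
summaries = [
    "quicken loans contacted my father.",
    "a company placed an account for collection.",
]


def pair_summaries(labels, originals, outputs):
    """Map each index label to its summary and a summary/original word ratio."""
    if not (len(labels) == len(originals) == len(outputs)):
        raise ValueError("labels, originals and outputs must have equal length")
    paired = {}
    for label, original, summary in zip(labels, originals, outputs):
        # Guard against empty originals to avoid division by zero.
        ratio = len(summary.split()) / max(len(original.split()), 1)
        paired[label] = {"summary": summary, "word_ratio": round(ratio, 2)}
    return paired


paired = pair_summaries(index_labels, narratives, summaries)
for label, info in paired.items():
    print(label, info["word_ratio"], info["summary"])
```

With the real data, the same call would take `df_part_narr.index.tolist()`, the narrative column as a list, and `sum_list`.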



## Output of Summarized Text

In [9]:
#First 4 complaints summarized
sum_list

['quicken loans contacted my father while he was XXXX and was not legally supposed to sign any documents. the documents were hidden from me and I found them when my father passed away. I advised quicken loans that he passed away and they told me I could modify or refinance my fathers house. they used comps that were not comperable to my fathers home.',
 'a company has placed for collection an account on my credit report on behalf of a company called XXXX in the amount of $150.00. I received a collection letter from the same company for this very same account. they failed to find any information on this account pertaining to my social security number or any other personal identifying information.',
 'parties to a contract should be competent, being of the age of consent, of sound mind, not disqualified from contracting by any law to which s/he is subject. a flaw in capacity may be due to minority, lunacy, idiocy, drunkenness or kind.',
 'XXXX is reporting incorrect information and will 

In [10]:
pd.set_option('display.max_colwidth',1500)
pd.set_option('display.max_rows', 99999)

## First complaint prior to summarization

In [11]:
# Original complaint before summarization
text = df_part_narr['Consumer complaint narrative'][0:1]

In [12]:
text

408    Quicken loans contacted my father while he was XXXX and was not legally supposed to sign any documents. The documents were hidden from me and I found them when my father passed away. I advised quicken loans that he passed away and they told me I could modify or refinance my fathers house. I told them the house was not worth what the amount of the loan was. I had the house appraised in XX/XX/2016 and the house appraised for {$230000.00}. When quicken loans sent someone to appraise my fathers house, they used comps that were not comperable to my fathers home. They used comps not near my dads home and homes that were for sale and did n't sell yet. I opted to do a short sale on the house, quicken loans approved the short sale as well as the va. I have had nothing but headaches since my father passed with quicken loans. They did. A loan with a man that was XXXX, XXXX, XXXX and when I asked them how they could do that, their answer was if he can sign his name he can have a mortgage. Q