# POC for complaint summarization

Source code for a quick proof of concept (POC) exploring the PaLM 2 LLM on Vertex AI for summarizing consumer complaints.

In [1]:
!pip install huggingface-hub
!pip install langchain
!pip install SentencePiece
!pip install transformers

Collecting huggingface-hub
  Downloading huggingface_hub-0.26.2-py3-none-any.whl (447 kB)
Installing collected packages: huggingface-hub
Successfully installed huggingface-hub-0.26.2
Collecting SentencePiece
  Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: SentencePiece
Successfully installed SentencePiece-0.2.0
Collecting transformers
  Downloading transformers-4.46.1-py3-none-any.whl.metadata (44 kB)

In [3]:
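########### IMPORT THE VERTEX AI LLM, PROMPT TEMPLATE, SUMMARIZE CHAIN AND TEXT SPLITTER FROM LANGCHAIN ################
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate
# Deprecated import path; newer LangChain versions provide VertexAI in the
# separate langchain-google-vertexai package (from langchain_google_vertexai import VertexAI)
from langchain.llms import VertexAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
import pandas as pd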


In [4]:
# Define the PaLM 2 text model (text-bison@001) on Vertex AI
llm = VertexAI(model_name='text-bison@001',batch_size=100)




In [5]:
#set prompt template
prompt_template ="""
summarize the given text by high lighting most important information

{text}

Summary:
    """

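# Define the prompt; {text} receives a single chunk in the map step and the joined chunk summaries in the combine step
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

# Define a map_reduce summarize chain: the same prompt summarizes each chunk (map) and then merges the chunk summaries (combine)
chain = load_summarize_chain(llm, map_prompt=prompt, combine_prompt=prompt, verbose=True, chain_type="map_reduce")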
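The map_reduce chain type first applies the map prompt to every chunk separately and then applies the combine prompt to the joined chunk summaries. The dependency-free sketch below illustrates that flow under stated assumptions: `fake_llm` and `map_reduce_summarize` are hypothetical names, the stub "model" simply keeps the first sentence of its input so the code runs without Vertex AI, and LangChain's extra step of collapsing intermediate summaries that exceed a token limit is not reproduced.

```python
# Minimal, offline sketch of map_reduce summarization (not LangChain's code).
# `fake_llm` is a hypothetical stand-in for the Vertex AI model.

PROMPT_TEMPLATE = """
Summarize the given text by highlighting the most important information.

{text}

Summary:
"""


def fake_llm(prompt: str) -> str:
    """Hypothetical LLM stub: return the first sentence of the text block."""
    # Take the text between the instruction line and the "Summary:" marker.
    body = prompt.split("\n\n", 1)[1].rsplit("Summary:", 1)[0].strip()
    return body.split(". ")[0].rstrip(".") + "."


def map_reduce_summarize(chunks: list[str], llm=fake_llm) -> str:
    # Map step: summarize each chunk independently.
    partial = [llm(PROMPT_TEMPLATE.format(text=chunk)) for chunk in chunks]
    # Reduce (combine) step: join the partial summaries and summarize them once
    # more; the notebook uses the same prompt for both steps.
    combined = "\n".join(partial)
    return llm(PROMPT_TEMPLATE.format(text=combined))


if __name__ == "__main__":
    chunks = [
        "The account was closed in 2015. No balance was owed.",
        "A collector reported the debt anyway. Disputes were rejected.",
    ]
    print(map_reduce_summarize(chunks))
```

Because the map calls are independent of each other, they can run in parallel; only the final combine call has to wait for all chunk summaries.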

In [3]:
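# Read the data (here from a local copy of the complaints CSV)
df_complaints = pd.read_csv('../data/complaints.csv', low_memory=False)
desc_col = 'Consumer complaint narrative'
# Keep only rows that have a narrative; .copy() makes this an independent DataFrame,
# which avoids pandas' SettingWithCopyWarning when the summary column is added later
df_complaints = df_complaints[~df_complaints[desc_col].isna()].copy()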

In [7]:

# Define the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,     # maximum chunk length as measured by length_function (characters here)
    chunk_overlap=40,    # characters shared between consecutive chunks
    length_function=len,
)
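RecursiveCharacterTextSplitter tries to break text on natural boundaries (by default paragraphs, then lines, then words) while keeping each chunk within `chunk_size` and repeating up to `chunk_overlap` characters between neighbouring chunks. To illustrate just the size and overlap parameters, the sketch below uses a plain sliding character window instead; `sliding_window_chunks` is a hypothetical helper, not LangChain's algorithm.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 40) -> list[str]:
    """Hypothetical helper: split `text` into chunks of at most `chunk_size`
    characters, each repeating the last `chunk_overlap` characters of the
    previous chunk. Unlike LangChain's splitter, it may cut words in half."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this chunk reached the end of the text
            break
    return chunks


if __name__ == "__main__":
    narrative = "x" * 2500
    print([len(c) for c in sliding_window_chunks(narrative)])
```

With the notebook's settings, this sketch turns a 2,500-character narrative into chunks of 1000, 1000 and 580 characters; LangChain's splitter will usually pick slightly different boundaries because it prefers to split on whitespace.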

In [17]:

def split_doc(text_splitter, doc):
    """
    Split an input document into chunks using LangChain.
    Args:
        text_splitter: a LangChain text splitter
        doc: the document text as a string
    Output:
        texts: a list of LangChain Document objects, one per chunk
    """
    texts = text_splitter.create_documents([doc])

    return texts

def summarize_docs(llm_chain, docs):
    """
    Summarize chunked documents.
    Args:
        llm_chain: a LangChain summarize chain
        docs: a list containing one list of chunk Documents per complaint
    Output:
        summaries: a list of summary strings, one per complaint, in input order
    """
    # Run the chain on every complaint's chunk list in a single batch call
    summary = llm_chain.batch(docs)

    summaries = []
    # Extract the summary text from each chain result
    for summarized_doc in summary:
        summaries.append(summarized_doc['output_text'])

    return summaries
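`summarize_docs` only needs an object whose `batch` method takes a list of inputs and returns one dict with an `'output_text'` key per input, in the same order. That makes it possible to test the DataFrame plumbing offline with a stub. `FakeSummarizeChain` below is hypothetical: it uses plain strings instead of LangChain Document objects and "summarizes" by keeping the first five words of a complaint's first chunk.

```python
class FakeSummarizeChain:
    """Hypothetical offline stand-in for the map_reduce summarize chain."""

    def batch(self, inputs):
        # One result dict per input (one input = one complaint's list of chunks),
        # keeping the first five words of the first chunk as the "summary".
        return [{"output_text": " ".join(chunks[0].split()[:5])} for chunks in inputs]


def summarize_docs(llm_chain, docs):
    """Condensed version of the notebook's summarize_docs."""
    return [result["output_text"] for result in llm_chain.batch(docs)]


if __name__ == "__main__":
    complaints = [
        ["Account closed with no balance owed, yet a collector reported it", "Second chunk."],
        ["Charged twice for one purchase and refund denied"],
    ]
    print(summarize_docs(FakeSummarizeChain(), complaints))
    # ['Account closed with no balance', 'Charged twice for one purchase']
```

Since the results come back in input order, the returned list can be assigned directly as a new DataFrame column, one summary per row.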

In [34]:
# Split every complaint into chunks
splitted_texts = df_complaints[desc_col].apply(lambda doc: split_doc(text_splitter, doc))

# Summarize the split complaints; one summary per row, in row order
df_complaints['complaint_summary'] = summarize_docs(chain, list(splitted_texts))



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:

summarize the given text by high lighting most important information

Zions Debt Holdings has posted a collection action on my credit report with respect to an alleged debt owed in connection with an account I previously had with years ago ( Account Number ) That account was cancelled in ( in accordance with the terms of the agreement with ) and at the time of cancellation no amounts were owed. I supplied all of this information ( and more ) to Zions, including a copy of both the agreement I had with showing my right to cancel ) and a copy of the cancellation letter I sent to I have also disputed this account with times already. However, in each case Zions continued to allege that the subject collection action was valid without suppling any supporting information whatsoever ; either to me or to I

Summary:


> Finished chain.

