## 1. Summarization with LangChain

Summarizing long documents is a common LLM use case. The issue that most often arises is the model's token limit (its maximum context window length). With LangChain, this can be worked around by chunking the text and summarizing it recursively.
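To make the idea concrete, here is a minimal, library-free sketch of chunking and recursive summarization. It is an illustration only, not how LangChain implements it: `toy_summarize` is a hypothetical stand-in for an LLM call (it simply keeps the first few words), and tokens are approximated by whitespace-separated words.

```python
def chunk_words(text, max_tokens):
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def toy_summarize(text, keep=5):
    """Hypothetical stand-in for an LLM call: keeps only the first `keep` words."""
    return " ".join(text.split()[:keep])


def recursive_summarize(text, max_tokens, summarize=toy_summarize):
    """Summarize chunk by chunk, then repeat on the joined summaries until they fit."""
    while len(text.split()) > max_tokens:
        chunks = chunk_words(text, max_tokens)
        shorter = " ".join(summarize(chunk) for chunk in chunks)
        if len(shorter.split()) >= len(text.split()):
            break  # the summarizer is not shrinking the text; stop rather than loop forever
        text = shorter
    return summarize(text)


long_text = " ".join(f"w{i}" for i in range(100))
print(recursive_summarize(long_text, max_tokens=10))
```

Each pass only ever hands the summarizer a chunk that fits the budget, which is the property that lets a real model stay under its context window.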

In [1]:
# First, let's install the dependencies (skip this if they are already installed)
!pip3 install transformers chromadb langchain python-dotenv ibm-watson-machine-learning beautifulsoup4








In [2]:
# First import the dependencies we need:
import os
from dotenv import load_dotenv

from langchain.document_loaders import WebBaseLoader
from langchain.chains.summarize import load_summarize_chain

from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.foundation_models.extensions.langchain import WatsonxLLM

print("Done.")


Done.


In [3]:
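# Get our API key, project ID and URL from .env
# (the variable names below are assumptions; match them to your own .env file)
load_dotenv()

project_id = os.getenv("PROJECT_ID", "")
api_key = os.getenv("API_KEY", "")
ibm_cloud_url = os.getenv("IBM_CLOUD_URL", "https://us-south.ml.cloud.ibm.com")

# Empty strings count as missing, so test truthiness rather than `is None`
if not (api_key and ibm_cloud_url and project_id):
    raise Exception("One or more environment variables are missing!")

creds = {
    "url": ibm_cloud_url,
    "apikey": api_key
}

print("Done.")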


Done.


Here we can take a [stuff](https://python.langchain.com/docs/modules/chains/document/stuff) or [map reduce](https://python.langchain.com/docs/modules/chains/document/map_reduce) approach to summarizing documents. We'll start with the simpler "stuff" approach, which passes all of the document text to the model in a single prompt. Feel free to change the document URL and inference parameters to optimize the output.
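Conceptually, "stuff" concatenates every document into one prompt and makes a single model call. The sketch below is a rough illustration of that idea, not LangChain's actual implementation; the template text, the `build_stuff_prompt` helper, and the separator are assumptions made for this example.

```python
# Hypothetical prompt template with a single {text} placeholder
STUFF_TEMPLATE = 'Write a concise summary of the following:\n"{text}"\nCONCISE SUMMARY:'


def build_stuff_prompt(documents, template=STUFF_TEMPLATE, separator="\n\n"):
    """Join all document texts and place them into a single prompt string."""
    return template.format(text=separator.join(documents))


prompt = build_stuff_prompt(["First article text.", "Second article text."])
print(prompt)
```

Because the prompt grows with every document added, this approach is simple but only works while the combined text stays within the model's context window.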

In [4]:
# Initialize llm and document loader:
print("Loading web document...")
# Try out some other documents as well
loader = WebBaseLoader("https://www.ibm.com/blog/what-can-ai-and-generative-ai-do-for-governments/")
doc = loader.load()
print("Done.")

# You might need to tweak some of the runtime parameters to optimize the results.
print("Initializing flan-ul2-20B model...")
params = {
    GenParams.DECODING_METHOD: "sample",
    GenParams.TEMPERATURE: 0.15,
    GenParams.TOP_P: 1,
    GenParams.TOP_K: 20,
    GenParams.REPETITION_PENALTY: 1.0,
    GenParams.MIN_NEW_TOKENS: 20,
    GenParams.MAX_NEW_TOKENS: 205
}

flan_model = Model(
    model_id="google/flan-ul2",
    params=params,
    credentials=creds,
    project_id=project_id
).to_langchain()

# chain_type can be "stuff" or "map_reduce"
chain = load_summarize_chain(flan_model, chain_type="stuff")

print("Running summarization task...\n")

res = chain.run(doc)

print(res)
print("\nDone.")


Loading web document...
Done.
Initializing flan-ul2-20B model...
Running summarization task...





Few technologies have taken the world by storm the way artificial intelligence (AI) has over the past few years. AI and its many use cases have become a topic of public discussion no longer relegated to tech experts. AI—generative AI, in particular—has tremendous potential to transform society as we know it for good, boost productivity and unlock trillions in economic value in the coming years. AI’s value is not limited to advances in the private sector. When implemented in a responsible way—where the technology is fully governed, privacy is protected, and decision-making is transparent and explainable—AI in government has the power to usher in a new era of government services. Such services can empower citizens and help restore trust in public entities by improving workforce efficiency and reducing operational costs in the public sector. On the backend, AI tools likewise have the potential to supercharge digital modernization in by, for example, automating the migration of legacy soft

We can also combine several of the features we've seen previously, including prompt templates and chains. In the following block we load the document into a template and run a "stuffed document chain". Note that we can stuff a list of documents as well. 

In [5]:
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains.combine_documents.stuff import StuffDocumentsChain

# Define prompt
prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

# Define LLM chain
print("Initializing chain...")
llm_chain = LLMChain(llm=flan_model, prompt=prompt)

# Define StuffDocumentsChain
print("Stuff chain with documents...")
stuff_chain = StuffDocumentsChain(
    llm_chain=llm_chain, document_variable_name="text"
)

print("Running summarization on stuffed document chain...\n")
res = stuff_chain.run(doc)

print(res)

print("\nDone.")


Initializing chain...
Stuff chain with documents...
Running summarization on stuffed document chain...

Few technologies have taken the world by storm the way artificial intelligence (AI) has over the past few years. AI and its many use cases have become a topic of public discussion no longer relegated to tech experts. AI—generative AI, in particular—has tremendous potential to transform society as we know it for good, boost productivity and unlock trillions in economic value in the coming years. AI’s value is not limited to advances in the private sector. When implemented in a responsible way—where the technology is fully governed, privacy is protected, and decision-making is transparent and explainable—AI in government has the power to usher in a new era of government services. Such services can empower citizens and help restore trust in public entities by improving workforce efficiency and reducing operational costs in the public sector. On the backend, AI tools likewise have the po

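Note that the output above should closely match the previous block when using the same inference parameters and document URL, since the prompt is essentially the same. Because the decoding method is "sample", small differences between runs are possible. Now we will use the same stuff chain to see how it behaves with multiple documents.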

In [11]:
# Add a new article
print("Loading 2nd article...")
loader_2 = WebBaseLoader('https://www.govexec.com/technology/2023/07/what-will-federal-government-do-generative-ai/388595/')
doc_2 = loader_2.load() # Returns list
print("Done.")

# Combine docs
docs = doc + doc_2

print("Running summarization on stuffed document chain.\n")
try:
    res = stuff_chain.run(docs)
    print(res)
except Exception as e:
    print(e)

print("\nDone.")


Loading 2nd article...
Done.
Running summarization on stuffed document chain.



Failure during generate. (POST https://us-south.ml.cloud.ibm.com/ml/v1-beta/generation/text?version=2024-01-12)
Status code: 400, body: {"errors":[{"code":"invalid_input_argument","message":"Invalid input argument for Model 'google/flan-ul2': the number of input tokens 5309 cannot exceed the total tokens limit 4096 for this model","more_info":"https://cloud.ibm.com/apidocs/watsonx-ai"}],"trace":"187fad7e645a95be0f3b544f045d4434","status_code":400}



Done.


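Executing the above code should fail with an error such as `the number of input tokens 5309 cannot exceed the total tokens limit 4096 for this model` (the exact token count depends on the articles at the time you run it), meaning that the combined documents exceed the model's input token limit. This brings us to the next topic, "map reduce", which solves this problem by summarizing each chunk separately and then combining the summaries.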
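The map-reduce idea can be sketched without any library. In this hedged illustration, each document is summarized on its own (map), the summaries are collapsed in groups while they are still too large, and a final call combines them (reduce). `fake_llm` is a hypothetical stand-in for a model call and `count_tokens` is a crude word count, so the numbers will not match a real tokenizer.

```python
def count_tokens(text):
    """Crude token estimate: whitespace-separated words (a real tokenizer will differ)."""
    return len(text.split())


def fake_llm(prompt_text, max_words=8):
    """Hypothetical stand-in for a model call: returns the first max_words words."""
    return " ".join(prompt_text.split()[:max_words])


def map_reduce_summarize(documents, token_max, llm=fake_llm):
    # Map: summarize every document on its own, so no single call sees all the text.
    summaries = [llm(doc) for doc in documents]

    # Collapse: while the summaries together exceed the budget, summarize them in groups.
    while count_tokens(" ".join(summaries)) > token_max and len(summaries) > 1:
        groups, current = [], []
        for summary in summaries:
            if current and count_tokens(" ".join(current + [summary])) > token_max:
                groups.append(current)
                current = []
            current.append(summary)
        groups.append(current)
        if len(groups) == len(summaries):
            break  # grouping no longer helps; avoid looping forever
        summaries = [llm(" ".join(group)) for group in groups]

    # Reduce: one final call over the combined summaries.
    return llm(" ".join(summaries))


docs_demo = [" ".join(f"d{j}w{i}" for i in range(50)) for j in range(4)]
print(map_reduce_summarize(docs_demo, token_max=20))
```

The trade-off is more model calls in exchange for never sending more than one chunk, or one group of summaries, in a single request.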

In [7]:
from transformers import AutoTokenizer
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import ReduceDocumentsChain, MapReduceDocumentsChain

from time import perf_counter

# Add a 3rd document
print("Loading 3rd document...")
loader_3 = WebBaseLoader("https://www.thomsonreuters.com/en-us/posts/government/ai-use-government-agencies/")
doc_3 = loader_3.load()
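# Build the list from the individual documents so re-running this cell does not append doc_3 twice
docs = doc + doc_2 + doc_3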

# Map
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes 
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
print("Init map chain...")
map_chain = LLMChain(llm=flan_model, prompt=map_prompt)

# Reduce
reduce_template = """The following is a set of summaries:
{doc_summaries}
Take these and distill it into a final, consolidated summary of the main themes. 
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)
print("Init reduce chain...")
reduce_chain = LLMChain(llm=flan_model, prompt=reduce_prompt)

# Takes a list of documents, combines them into a single string, and passes this to an LLMChain
print("Stuff documents using reduce chain...")
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="doc_summaries"
)

# Combines and iteratively reduces the mapped documents
reduce_documents_chain = ReduceDocumentsChain(
    # This is the final chain that is called.
    combine_documents_chain=combine_documents_chain,
    # If documents exceed context for `StuffDocumentsChain`
    collapse_documents_chain=combine_documents_chain,
    # The maximum number of tokens to group documents into.
    token_max=4000
)

# Combining documents by mapping a chain over them, then combining results
map_reduce_chain = MapReduceDocumentsChain(
    # Map chain
    llm_chain=map_chain,
    # Reduce chain
    reduce_documents_chain=reduce_documents_chain,
    # The variable name in the llm_chain to put the documents in
    document_variable_name="docs",
    # Return the results of the map steps in the output
    return_intermediate_steps=True,
    verbose=True
)

# Note that here we are using a pretrained tokenizer from Hugging Face, specifically for the flan-ul2 model.
# You might want to play around with different tokenizers and text splitters to see how the results change.
print("Init chunk splitter...")
try:
    tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2") # Hugging face tokenizer for flan-ul2
    text_splitter = CharacterTextSplitter.from_huggingface_tokenizer(
        tokenizer=tokenizer
    )
    split_docs = text_splitter.split_documents(docs)
    print(f"Using {len(split_docs)} chunks: ")
except Exception as ex:
    print(ex)

print("Run map-reduce chain. This should take ~15-30 seconds...")
try:
    t1_start = perf_counter()
    results = map_reduce_chain(split_docs)
    steps = results["intermediate_steps"]
    output = results["output_text"]
    t1_stop = perf_counter()
    print("Elapsed time:", round((t1_stop - t1_start), 2), "seconds.\n") 

    print("Results from each chunk: \n")
    for idx, step in enumerate(steps):
        print(f"{idx + 1}. {step}\n")
    
    print("\n\nFinal output:\n")
    print(output)

    print("\nDone.")
except Exception as e:
    print(e)


Loading 3rd document...
Init map chain...
Init reduce chain...
Stuff documents using reduce chain...
Init chunk splitter...



Using 3 chunks: 
Run map-reduce chain. This should take ~15-30 seconds...


> Entering new MapReduceDocumentsChain chain...

> Finished chain.
Elapsed time: 11.96 seconds.

Results from each chunk: 

1. No, I don't think so. I'm not interested in this topic. I'm interested in other topics.

2. Federal employees are going to see AI tools show up in cloud-based productivity suites sooner rather than later, but it's not clear yet how the trending tech will impact public-facing digital services.

3. How government agencies come to use generative AI and other innovative technologies in their operations will largely depend upon how the regulatory scheme unfolds



Final output:

No, I'm not interested in this topic. I'm interested in other topics. AI tools are going to show up in cloud-based productivity suites sooner rather than later, but it's not clear yet how the trending tech will impact public-facing digital services.

Done.


As you can see, LangChain, together with a tokenizer for the model, can quickly divide a larger amount of text into chunks and recursively summarize it into a concise sentence or two. You might want to try different documents, tweak the model's runtime parameters, or try a different model altogether to see how things behave. One of the most important things to note is that the way the input is chunked and tokenized matters a lot: poor map results lead to a lower-quality final summary. In the run above, for example, the unhelpful result for chunk 1 carried straight through into the final output.
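One chunking choice worth experimenting with is overlap between neighbouring chunks, so that text cut at a boundary still appears with some context in the next chunk. The following is a minimal sketch under simplifying assumptions: `chunk_with_overlap` is a hypothetical helper, and it counts whitespace-separated words rather than model tokens.

```python
def chunk_with_overlap(text, chunk_size, overlap):
    """Split text into word windows of chunk_size that share `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_with_overlap(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Smaller chunks with some overlap mean more map calls, but each call gets a more focused piece of text, which is one way to explore how chunking affects the final summary.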