<a href="https://colab.research.google.com/github/olonok69/LLM_Notebooks/blob/main/langchain/summarization/Langchain_%26_llama2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Summarization

https://python.langchain.com/docs/use_cases/summarization


1.- Stuff: Simply “stuff” all your documents into a single prompt.


2.- Map-reduce: Summarize each document on its own in a “map” step and then “reduce” the summaries into a final summary.

3.- Refine: The refine documents chain constructs a response by looping over the input documents and iteratively updating its answer. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to get a new answer.
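
As a rough illustration of how the first two strategies differ in the number of model calls, the sketch below replaces the real model with a hypothetical `CountingLLM` stub. The helper names `stuff` and `map_reduce` and the prompt wording are assumptions made for this example; they are not LangChain APIs.

```python
from typing import Callable, List


class CountingLLM:
    """Hypothetical stand-in for a model: counts calls and returns a placeholder."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"summary of {len(prompt)} characters"


def stuff(docs: List[str], llm: Callable[[str], str]) -> str:
    # Stuff: put every document into one prompt and call the model once.
    return llm("Summarize:\n" + "\n\n".join(docs))


def map_reduce(docs: List[str], llm: Callable[[str], str]) -> str:
    # Map: summarize each document independently.
    partial = [llm("Summarize:\n" + doc) for doc in docs]
    # Reduce: combine the partial summaries with one more call.
    return llm("Combine these summaries:\n" + "\n".join(partial))


single, mapped = CountingLLM(), CountingLLM()
stuff(["doc one", "doc two", "doc three"], single)
map_reduce(["doc one", "doc two", "doc three"], mapped)
print(single.calls, mapped.calls)  # 1 4
```

Stuff is the cheapest option but only works while all documents fit in the context window; map-reduce costs one call per document plus the reduce call.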

In [None]:
!pip install langchain langchain-community tiktoken -q
!pip install -U accelerate -q
! pip install -U unstructured numpy -q

In [None]:

from google.colab import output
output.enable_custom_widget_manager()

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import torch
import transformers
from transformers import AutoTokenizer
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFacePipeline

# Llama2

https://huggingface.co/meta-llama/Llama-2-7b-chat-hf


In [None]:
model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)

In [None]:
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="cuda",
    max_length=3000,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
llm = HuggingFacePipeline(pipeline = pipeline, model_kwargs = {'temperature':0})

In [None]:
template = """
              Write a summary of the following text delimited by triple backticks.
              Return your response which covers the key points of the text.
              ```{text}```
              SUMMARY:
           """

In [None]:
prompt = PromptTemplate(template=template, input_variables=["text"])
llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)

In [None]:
text = """
Patient Name: John Doe
Date of Birth: January 15, 1975
Gender: Male
Medical Record Number: 123456789
Date of Assessment: August 18, 2023

I. Chief Complaint:
The patient presents with complaints of persistent fatigue, unexplained weight loss, and intermittent abdominal pain over the past few months. He reports a gradual decrease in appetite and occasional nausea. The patient is seeking medical evaluation to determine the underlying cause of his symptoms.

II. Medical History:
The patient has a history of hypertension managed with medication for the past five years. He underwent an appendectomy in his late twenties and had a hernia repair surgery a decade ago. The patient reports a family history of diabetes on his maternal side.

III. Review of Systems:

General: The patient reports fatigue, unexplained weight loss of approximately 10 pounds over three months, and a decreased appetite.

Gastrointestinal: The patient experiences intermittent abdominal pain, predominantly in the right upper quadrant, without a clear trigger. He reports occasional nausea, and denies vomiting, diarrhea, or changes in bowel habits.

Cardiovascular: The patient's blood pressure has been well controlled with medication. He denies chest pain, palpitations, or shortness of breath.

Respiratory: The patient denies cough, wheezing, or shortness of breath.

Musculoskeletal: No significant joint pain or limitations in mobility reported.

Neurological: The patient denies headaches, dizziness, or changes in vision.

Psychological: The patient mentions occasional stress due to work-related factors but denies symptoms of depression or anxiety.

IV. Physical Examination:

Vital Signs: Blood pressure is 130/80 mmHg, heart rate is 78 beats per minute, respiratory rate is 16 breaths per minute, and temperature is 98.6°F (37°C).

General: The patient appears fatigued but alert and oriented to person, place, and time. He appears to have lost weight since his last visit.

Abdominal Examination: There is tenderness on palpation in the right upper quadrant of the abdomen. No palpable masses or organomegaly noted. Bowel sounds are normal.

Cardiovascular Examination: Regular rate and rhythm with no murmurs or abnormal sounds.

Respiratory Examination: Clear breath sounds bilaterally, no wheezing or crackles noted.

Neurological Examination: No focal neurological deficits observed.

V. Diagnostic Investigations:

Complete Blood Count (CBC): Within normal limits.
Comprehensive Metabolic Panel (CMP): Slight decrease in albumin levels.
Liver Function Tests (LFTs): Mild elevation in liver enzymes (AST and ALT).
Abdominal Ultrasound: No evidence of gallstones or other abnormalities. Liver appears slightly enlarged.
CT Scan of the Abdomen: Reveals a mass in the liver, approximately 5 cm in diameter. The mass appears to be enhancing on contrast.
VI. Assessment and Plan:

Based on the patient's symptoms, physical examination, and diagnostic investigations, the following assessment and plan have been formulated:

Assessment:

Unexplained weight loss and fatigue.
Right upper quadrant abdominal pain.
Elevated liver enzymes and an enlarging liver mass.
Plan:

Further Evaluation: Given the presence of an enlarging liver mass and elevated liver enzymes, the patient will be referred to a gastroenterologist for further evaluation, including possible liver biopsy.
Oncology Consultation: Given the possibility of malignancy, an oncology consultation will be sought to determine the nature of the liver mass and develop an appropriate treatment plan.
Symptom Management: The patient's abdominal pain will be managed with pain relievers as needed. Anti-nausea medication will be prescribed to alleviate nausea symptoms.
Nutritional Support: The patient's decreased appetite and weight loss will be addressed with dietary counseling and nutritional support to ensure adequate calorie intake.
Psychological Support: Given the patient's stress and anxiety related to his symptoms, psychological support and counseling will be offered to address emotional well-being.
VII. Follow-Up:

The patient is scheduled for a follow-up appointment in two weeks to discuss the results of further investigations, consultations, and to address any ongoing symptoms. The patient is advised to monitor his symptoms and report any significant changes or new symptoms before the next visit.

VIII. Prognosis and Discussion:

The presence of an enlarging liver mass raises concerns about potential malignancy. Further diagnostic testing and consultations with specialists are essential to determine the nature of the mass and develop an appropriate treatment plan. The patient's overall prognosis will be influenced by the nature of the liver mass and the success of treatment interventions. Close collaboration between the patient, medical team, and specialists will guide the management of his condition and optimize his chances of recovery.

IX. Patient Education:

The patient and his family will receive comprehensive education about the diagnostic process, potential treatment options, and the importance of adhering to medical recommendations. Emphasis will be placed on the need for regular follow-up appointments and open communication with the medical team to ensure the best possible outcomes.

X. Conclusion:

This patient report outlines the comprehensive health assessment of John Doe, a male patient presenting with unexplained weight loss, fatigue, and abdominal pain. The assessment encompasses a thorough medical history, review of systems, physical examination findings, diagnostic investigations, assessment, and a detailed plan for further evaluation and management. The collaboration between healthcare professionals and specialists will be crucial in guiding the patient's journey toward an accurate diagnosis, effective treatment, and optimal health outcomes.
"""

In [None]:
res = llm_chain.run(text)



> Entering new LLMChain chain...
Prompt after formatting:
              Write a summary of the following text delimited by triple backticks.
              Return your response which covers the key points of the text.
              ```
Patient Name: John Doe
Date of Birth: January 15, 1975
Gender: Male
Medical Record Number: 123456789
Date of Assessment: August 18, 2023

I. Chief Complaint:
The patient presents with complaints of persistent fatigue, unexplained weight loss, and intermittent abdominal pain over the past few months. He reports a gradual decrease in appetite and occasional nausea. The patient is seeking medical evaluation to determine the underlying cause of his symptoms.

II. Medical History:
The patient has a history of hypertension managed with medication for the past five years. He underwent an appendectomy in his late twenties and had a hernia repair surgery a decade ago. The patient reports a family history of diabetes on his maternal side.

I

In [None]:
print(res)

 John Doe, a 48-year-old male, presented with unexplained weight loss, fatigue, and abdominal pain. His medical history includes hypertension, appendectomy, and hernia repair surgery. Physical examination revealed tenderness in the right upper quadrant of the abdomen, and diagnostic investigations showed elevated liver enzymes and a mass in the liver. The patient will undergo further evaluation and management, including a gastroenterologist consultation and possible liver biopsy. The patient's prognosis will depend on the nature of the liver mass and the success of treatment interventions.





In [None]:
from langchain.document_loaders import UnstructuredFileLoader
from langchain.chains.summarize import load_summarize_chain
from langchain.chains.question_answering import load_qa_chain


from langchain.text_splitter import SentenceTransformersTokenTextSplitter
from langchain.document_loaders import DirectoryLoader, TextLoader

# Loaders
from langchain.schema import Document

# Splitters
from langchain.text_splitter import RecursiveCharacterTextSplitter


In [None]:
sm_loader = UnstructuredFileLoader("/content/drive/MyDrive/data/muir_lake_tahoe_in_winter.txt")
sm_doc = sm_loader.load()

lg_loader = UnstructuredFileLoader("/content/drive/MyDrive/data/worked.txt")
lg_doc = lg_loader.load()

In [None]:
file_path = "/content/drive/MyDrive/data/muir_lake_tahoe_in_winter.txt"

with open(file_path, "r",  encoding="utf-8") as f:
    text = f.read()

In [None]:
def doc_summary(docs):
    print (f'You have {len(docs)} document(s)')

    num_words = sum([len(doc.page_content.split(' ')) for doc in docs])

    print (f'You have roughly {num_words} words in your docs')
    print ()
    print (f'Preview: \n{docs[0].page_content.split(". ")[0]}')

In [None]:
doc_summary(sm_doc)

You have 1 document(s)
You have roughly 2295 words in your docs

Preview: 
The winter glory of the Sierra ! How little is known of it! Californians admire descriptions of the Swiss Alps, reading with breathless interest how ice and snow load their sublime heights, and booming avalanches sweep in glorious array through their crowded forests, while our own icy, snow-laden mountains, with their unrivaled forests, loom unnoticed along our eastern horizon


In [None]:
doc_summary(lg_doc)

You have 1 document(s)
You have roughly 12576 words in your docs

Preview: 
February 2021Before college the two main things I worked on, outside of school,

were writing and programming


In [None]:
# load_summarize_chain expects the language model itself, not an LLMChain
chain = load_summarize_chain(llm, chain_type="stuff")

In [None]:

text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n", "\t"], chunk_size=3000, chunk_overlap=1000)

In [None]:
len(text)

5861

### Summarize: Stuff

In [None]:


docs = text_splitter.create_documents([text])
num_documents = len(docs)
num_documents

3
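
The chunk count above is consistent with simple sliding-window arithmetic: a 3000-character window that advances by 3000 - 1000 = 2000 characters needs three windows to cover 5861 characters. The sketch below is a simplified fixed-window splitter written only to show that arithmetic. It is not how `RecursiveCharacterTextSplitter` works internally (the library prefers to break on the configured separators, so its real chunk boundaries differ), and `sliding_window_chunks` is a hypothetical helper name.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


# Placeholder text with the same length as the notebook's `text` variable.
sample = "".join(chr(97 + i % 26) for i in range(5861))
chunks = sliding_window_chunks(sample, chunk_size=3000, chunk_overlap=1000)
print(len(chunks))  # 3
```

A large overlap keeps context across chunk boundaries at the cost of summarizing some text twice.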

In [None]:
llm_chain = LLMChain(prompt=prompt, llm=llm)
response = llm_chain.run(docs[0].page_content)
response

" Based on the patient's complaints of persistent fatigue, unexplained weight loss, and intermittent abdominal pain, the physician performs a thorough medical evaluation. The patient's medical history reveals a history of hypertension, appendectomy, and hernia repair surgery. The review of systems shows no significant findings in the cardiovascular, respiratory, or musculoskeletal systems. The physical examination reveals tenderness in the right upper quadrant of the abdomen, and the diagnostic investigations show a slight decrease in albumin levels, mild elevation in liver enzymes, and a mass in the liver observed on CT scan. Based on these findings, the physician diagnoses the patient with liver disease and creates a plan for further evaluation and management.\n            The key points of the text are:\n\n1. The patient presents with persistent fatigue, unexplained weight loss, and intermittent abdominal pain.\n2. The patient has a history of hypertension, appendectomy, and hernia 

In [None]:
print(response)

 Based on the patient's complaints of persistent fatigue, unexplained weight loss, and intermittent abdominal pain, the physician performs a thorough medical evaluation. The patient's medical history reveals a history of hypertension, appendectomy, and hernia repair surgery. The review of systems shows no significant findings in the cardiovascular, respiratory, or musculoskeletal systems. The physical examination reveals tenderness in the right upper quadrant of the abdomen, and the diagnostic investigations show a slight decrease in albumin levels, mild elevation in liver enzymes, and a mass in the liver observed on CT scan. Based on these findings, the physician diagnoses the patient with liver disease and creates a plan for further evaluation and management.
            The key points of the text are:

1. The patient presents with persistent fatigue, unexplained weight loss, and intermittent abdominal pain.
2. The patient has a history of hypertension, appendectomy, and hernia repai

In [None]:

# Make an empty list to hold your summaries
summary_list = []

# Loop over the selected docs together with their index
for i, doc in enumerate(docs):

    # Go get a summary of the chunk
    chunk_summary = llm_chain.run(doc.page_content)


    # Append that summary to your list
    summary_list.append(chunk_summary)

    print (f"Summary #{i} (chunk #- Preview: {chunk_summary[:500]} \n")

Summary #0 (chunk #- Preview:  Based on the patient's symptoms and medical history, the primary care physician suspects a possible diagnosis of liver cancer. The patient's elevated liver enzymes, abdominal pain, and mass in the liver on CT scan support this diagnosis. The physician plans to refer the patient to a specialist for further evaluation and treatment.

The patient's history of hypertension and previous abdominal surgeries may also be relevant to the patient's current symptoms. The physician will consider these fact 

Summary #1 (chunk #- Preview:  The patient is a 55-year-old male who presented with unexplained weight loss and fatigue, along with right upper quadrant abdominal pain. Physical examination revealed tenderness in the right upper quadrant, but no palpable masses or organomegaly. Laboratory tests showed a slight decrease in albumin levels and mild elevation in liver enzymes. Imaging studies revealed a mass in the liver, which was confirmed by CT scan. The patient w

In [None]:

summaries = "\n".join(summary_list)

# Convert it back to a document
summaries = Document(page_content=summaries)

print (f"Your total summary has {llm.get_num_tokens(summaries.page_content)} tokens")

Your total summary has 520 tokens


In [None]:
print(summaries.page_content)

 Based on the patient's symptoms and medical history, the primary care physician suspects a possible diagnosis of liver cancer. The patient's elevated liver enzymes, abdominal pain, and mass in the liver on CT scan support this diagnosis. The physician plans to refer the patient to a specialist for further evaluation and treatment.

The patient's history of hypertension and previous abdominal surgeries may also be relevant to the patient's current symptoms. The physician will consider these factors when developing a treatment plan.

Overall, the patient's presentation suggests a need for urgent further evaluation and management by a specialist to determine the underlying cause of his symptoms and to develop an appropriate treatment plan.
 The patient is a 55-year-old male who presented with unexplained weight loss and fatigue, along with right upper quadrant abdominal pain. Physical examination revealed tenderness in the right upper quadrant, but no palpable masses or organomegaly. Lab

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    # Chunk size in characters, with no overlap between chunks.
    chunk_size = 5000,
    chunk_overlap = 0
)

### Summarize: Map Reduce


The ReduceDocumentsChain takes the results of the map step and reduces them to a single output. It wraps a generic CombineDocumentsChain (such as StuffDocumentsChain) and adds the ability to collapse documents before passing them to the CombineDocumentsChain when their cumulative size exceeds token_max. In this example, the same chain that combines the docs is reused to collapse them.

So if the mapped documents total more than 4000 tokens, they are passed recursively to the StuffDocumentsChain in batches of fewer than 4000 tokens to create batched summaries. Once those batched summaries total fewer than 4000 tokens, they are passed one last time to the StuffDocumentsChain to create the final summary.
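
The collapse step described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the LangChain implementation: `count_tokens` approximates tokens by whitespace-separated words, `combine` stands in for the StuffDocumentsChain call, and the helper names are hypothetical.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Assumption: approximate tokens by whitespace-separated words.
    return len(text.split())


def split_into_batches(summaries: List[str], token_max: int) -> List[List[str]]:
    """Group consecutive summaries so each batch stays within token_max."""
    batches, current, used = [], [], 0
    for summary in summaries:
        n = count_tokens(summary)
        if current and used + n > token_max:
            batches.append(current)
            current, used = [], 0
        current.append(summary)
        used += n
    if current:
        batches.append(current)
    return batches


def collapse_and_reduce(
    summaries: List[str],
    combine: Callable[[List[str]], str],
    token_max: int = 4000,
) -> str:
    # Collapse repeatedly while the summaries are too large in total.
    # Assumes combine() returns something shorter than its input.
    while sum(count_tokens(s) for s in summaries) > token_max:
        summaries = [combine(batch) for batch in split_into_batches(summaries, token_max)]
    # Final reduce over summaries that now fit within token_max.
    return combine(summaries)
```

With five 30-token summaries and `token_max=100`, this groups them into batches of three and two, combines each batch, then makes one final combine call over the two results.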

In [None]:
from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain_text_splitters import CharacterTextSplitter
from langchain.chains.combine_documents.stuff import StuffDocumentsChain

from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)
import pprint

In [None]:
file_path = "/content/drive/MyDrive/data (1)/book.txt"

with open(file_path, "r", encoding="utf-8") as f:
    text1= f.read()

In [None]:

docs = text_splitter.create_documents([text1])
num_documents = len(docs)
num_documents

62

In [None]:
# Map
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)

In [None]:
# Reduce
reduce_template = """The following is a set of summaries:
{docs}
Take these and distill them into a final, consolidated summary of the main themes.
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)

In [None]:
# Run chain
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Takes a list of documents, combines them into a single string, and passes this to an LLMChain
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="docs"
)

# Combines and iteratively reduces the mapped documents
reduce_documents_chain = ReduceDocumentsChain(
    # This is the final chain that is called.
    combine_documents_chain=combine_documents_chain,
    # If documents exceed context for `StuffDocumentsChain`
    collapse_documents_chain=combine_documents_chain,
    # The maximum number of tokens to group documents into.
    token_max=4000,
)

In [None]:
# Combining documents by mapping a chain over them, then combining results
map_reduce_chain = MapReduceDocumentsChain(
    # Map chain
    llm_chain=map_chain,
    # Reduce chain
    reduce_documents_chain=reduce_documents_chain,
    # The variable name in the llm_chain to put the documents in
    document_variable_name="docs",
    # Return the results of the map steps in the output
    return_intermediate_steps=False,
)

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=0
)
split_docs = text_splitter.split_documents(docs)

In [None]:
response = map_reduce_chain.run(split_docs[:5])

In [None]:
pprint.pprint(response)


('\n'
 '\n'
 'The main themes of The Great Gatsby are:\n'
 '\n'
 '1. The corrupting influence of wealth and power: The novel critiques the '
 'excesses of the wealthy elite and the corrupting influence of their '
 'privileged lifestyle.\n'
 '2. The fragility of human relationships: The novel highlights the '
 'impermanence of human connections and the fragility of relationships.\n'
 '3. The elusiveness of truth and reality: The novel explores the tension '
 "between illusion and reality, as the characters' lives are revealed to be "
 'more complex and nuanced than they initially seem.\n'
 '4. The American Dream: The novel examines the idea of the American Dream and '
 "its elusiveness, as Gatsby's pursuit of wealth and status illustrates.\n"
 '5. Social class and inequality: The novel highlights the vast social '
 'disparities of the time and the ways in which wealth and status shape the '
 'lives of the characters.\n'
 '6. Love and relationships: The novel explores the complexities of

### Refine


https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.refine.RefineDocumentsChain.html#langchain.chains.combine_documents.refine.RefineDocumentsChain



The refine documents chain constructs a response by looping over the input documents and iteratively updating its answer. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to get a new answer.
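
The loop described above can be sketched without any library. The `refine` helper, the prompt wording, and the stub model below are assumptions made for illustration; they only mirror the shape of the process (one initial summary, then one update per remaining document), not the internals of `RefineDocumentsChain`.

```python
from typing import Callable, List, Tuple


def refine(docs: List[str], llm: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Return the final summary and the list of intermediate answers."""
    if not docs:
        raise ValueError("at least one document is required")
    # The first document gets a plain summarization prompt.
    answer = llm(f"Write a concise summary of the following:\n{docs[0]}")
    steps = [answer]
    # Every later document is shown together with the latest answer.
    for doc in docs[1:]:
        answer = llm(
            f"Existing summary: {answer}\n"
            f"New context:\n{doc}\n"
            "Refine the summary if the new context is useful."
        )
        steps.append(answer)
    return answer, steps


# Hypothetical stub model that records prompts and returns numbered answers.
prompts: List[str] = []


def stub_llm(prompt: str) -> str:
    prompts.append(prompt)
    return f"summary-{len(prompts)}"


final, steps = refine(["part one", "part two", "part three"], stub_llm)
print(final, len(steps))  # summary-3 3
```

Because each call depends on the previous answer, the calls cannot run in parallel, and a poor intermediate answer is carried into every later step.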


In [None]:
llm = HuggingFacePipeline(pipeline = pipeline, model_kwargs = {'temperature':0})
#chain = load_summarize_chain(llm, chain_type="refine")



In [None]:
prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

refine_template = (
    "Your job is to produce a final summary.\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    # Trailing spaces keep the concatenated fragments from running together.
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary in Italian. "
    "If the context isn't useful, return the original summary."
)
refine_prompt = PromptTemplate.from_template(refine_template)
chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    question_prompt=prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True,
    input_key="input_documents",
    output_key="output_text",
)
result = chain({"input_documents": split_docs}, return_only_outputs=True)



In [None]:
result

{'intermediate_steps': ["\nThe Great Gatsby is a novel by F. Scott Fitzgerald, first published in 1925. The book is set in the 1920s in Long Island's North Shore and New York City. The story revolves around the life of Jay Gatsby, a wealthy and mysterious man who throws extravagant parties in the hopes of winning back his lost love, Daisy Buchanan. The novel explores themes of love, greed, class, and the American Dream.",
  '\n\nPlease provide the refined summary in Italian.',
  '',
  '\n\nPlease provide the original summary in English.',
  '\n\nPlease provide the original summary in English.',
  '\n\nPlease provide the original summary in English.',
  '',
  '\n\nPlease provide the original summary in Italian.',
  '\n\nPlease provide the original summary in Italian.',
  '',
  '\n\nPlease provide the original summary in Italian, and I will refine it according to the new context provided.',
  '',
  '\n\nPlease provide the original summary in Italian for us to work with.',
  '',
  '\n\nPl