<a href="https://colab.research.google.com/github/shu65/langchain_examples/blob/main/Make_Mind_Map_with_LangChain_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install openai langchain python-dotenv transformers pypdf tiktoken

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
!pip list | grep langchain

langchain                     0.0.141


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
!ls /content/drive/MyDrive/colab_env/lang_chain_env

/content/drive/MyDrive/colab_env/lang_chain_env


In [None]:
from dotenv import load_dotenv

load_dotenv(dotenv_path="/content/drive/MyDrive/colab_env/lang_chain_env")

True

In [None]:
# download paper

!wget --user-agent TryToStopMeFromUsingWgetNow https://arxiv.org/pdf/2205.14135.pdf -O reserch_paper.pdf

--2023-04-17 11:57:07--  https://arxiv.org/pdf/2205.14135.pdf
Resolving arxiv.org (arxiv.org)... 128.84.21.199
Connecting to arxiv.org (arxiv.org)|128.84.21.199|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2630825 (2.5M) [application/pdf]
Saving to: ‘reserch_paper.pdf’


2023-04-17 11:57:08 (4.81 MB/s) - ‘reserch_paper.pdf’ saved [2630825/2630825]



In [None]:
from langchain.document_loaders.pdf import PyPDFLoader

file_path = "./reserch_paper.pdf"
loader = PyPDFLoader(file_path)
pages = loader.load()
print(pages[0].page_content)

FlashAttention : Fast and Memory-Eﬃcient Exact Attention
with IO-Awareness
Tri Daoy, Daniel Y. Fuy, Stefano Ermony, Atri Rudraz, and Christopher Réy
yDepartment of Computer Science, Stanford University
zDepartment of Computer Science and Engineering, University at Buﬀalo, SUNY
{trid,danfu}@cs.stanford.edu ,ermon@stanford.edu ,atri@buffalo.edu ,
chrismre@cs.stanford.edu
June 24, 2022
Abstract
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity
of self-attention are quadratic in sequence length. Approximate attention methods have attempted
to address this problem by trading oﬀ model quality to reduce the compute complexity, but often do
not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-
aware—accounting for reads and writes between levels of GPU memory. We propose FlashAttention ,
an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes
between GPU high 

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

pages = pages[:11]
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="p50k_base",
    chunk_size=1000,
    chunk_overlap=100,
)
sub_pages = text_splitter.split_documents(pages)
len(sub_pages)

21
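
The splitter above measures chunk length in tokens (via the `p50k_base` encoding) rather than in characters. As a rough illustration of what `chunk_size` and `chunk_overlap` mean, the sketch below slides a fixed window over a list of token ids. It is a simplification under stated assumptions: `RecursiveCharacterTextSplitter` splits on separators and merges the pieces, so it does not cut at exact token boundaries, and the helper name `sliding_token_chunks` is hypothetical.

```python
from typing import List, Sequence


def sliding_token_chunks(
    tokens: Sequence[int], chunk_size: int, chunk_overlap: int
) -> List[List[int]]:
    """Cut `tokens` into windows of at most `chunk_size` tokens.

    Consecutive windows share `chunk_overlap` tokens. Illustrative
    stand-in only, not LangChain's actual splitting logic.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start : start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Toy example: 25 fake token ids, windows of 10 with 2 tokens shared.
toy_chunks = sliding_token_chunks(list(range(25)), chunk_size=10, chunk_overlap=2)
for chunk in toy_chunks:
    print(chunk[0], "...", chunk[-1], f"({len(chunk)} tokens)")
```

In the toy case the windows start at token 0, 8, and 16, so each window repeats the last two tokens of the previous one.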

In [None]:
from typing import List, Optional, Tuple

import langchain
from langchain import OpenAI
from langchain.chains.llm import LLMChain
from langchain.docstore.document import Document
from langchain.llms.fake import FakeListLLM
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import BaseLanguageModel

In [None]:
MAKE_MIND_MAP_PROMPT_TEMPLATE_STR = """I would like to create a mind map using the Xmind tool about the following text.
Can you provide me with some text in Markdown format that is compatible with Xmind?
Please include a Central Topic with Main Topics, and put any additional information into Subtopics that will help create an effective mind map.

The mind map format is as follows:

# Central Topic Title

## Main Topic Title 1

### Subtopic Title 1

- Subtopic Title 1

  - Subtopic Title 1
  - Subtopic Title 2

### Subtopic Title 2

- Subtopic Title 1

  - Subtopic Title 1
  - Subtopic Title 2

## Main Topic Title 2

- Subtopic Title 1

  - Subtopic Title 1
  - Subtopic Title 2


Text:


{text}


Mind map in Markdown format:"""
MAKE_MIND_MAP_PROMPT_TEMPLATE = PromptTemplate(
    template=MAKE_MIND_MAP_PROMPT_TEMPLATE_STR, input_variables=["text"]
)

class MindMapMaker(object):
    def __init__(
        self,
        llm: BaseLanguageModel,
        verbose: Optional[bool] = None,
    ) -> None:
        self.llm_chain = LLMChain(
            llm=llm, prompt=MAKE_MIND_MAP_PROMPT_TEMPLATE, verbose=verbose
        )
        assert len(self.llm_chain.prompt.input_variables) == 1

    def __call__(self, docs: List[Document]) -> List[Document]:
        # Run the prompt once per document; the template has a single input variable.
        input_key = self.llm_chain.prompt.input_variables[0]
        results = self.llm_chain.apply([{input_key: d.page_content} for d in docs])
        question_result_key = self.llm_chain.output_key
        ret = [
            Document(page_content=r[question_result_key], metadata=docs[i].metadata)
            for i, r in enumerate(results)
        ]
        return ret
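
`MindMapMaker` is the "map" step: it fills the prompt template once per chunk and wraps each completion in a new document that keeps the chunk's metadata. The sketch below shows the same pattern without LangChain or an API key, which can be handy for checking the data flow. The names `Doc`, `map_docs`, `echo_llm`, and the shortened `TEMPLATE` are hypothetical stand-ins, not part of the notebook or of LangChain.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for langchain's Document."""

    page_content: str
    metadata: Dict = field(default_factory=dict)


# Shortened stand-in for the real prompt template.
TEMPLATE = "Text:\n\n{text}\n\nMind map in Markdown format:"


def map_docs(llm: Callable[[str], str], docs: List[Doc]) -> List[Doc]:
    """Call `llm` once per document and carry the metadata over unchanged."""
    return [
        Doc(page_content=llm(TEMPLATE.format(text=d.page_content)), metadata=d.metadata)
        for d in docs
    ]


def echo_llm(prompt: str) -> str:
    """Fake model: turn the first line of the input text into a heading."""
    first_line = prompt.split("\n\n")[1].splitlines()[0]
    return f"# {first_line}"


docs = [
    Doc("first page\nmore text", {"page": 0}),
    Doc("second page", {"page": 1}),
]
maps = map_docs(echo_llm, docs)
for m in maps:
    print(m.metadata, m.page_content)
```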

In [None]:
MERGE_PROMPT_TEMPLATE_STR = """Compact the following mind maps into one mind map.
The format should be Xmind-compatible Markdown format.
The mind map format is as follows:

# Central Topic Title

## Main Topic Title 1

### Subtopic Title 1

- Subtopic Title 1

  - Subtopic Title 1
  - Subtopic Title 2

### Subtopic Title 2

- Subtopic Title 1

  - Subtopic Title 1
  - Subtopic Title 2

## Main Topic Title 2

- Subtopic Title 1

  - Subtopic Title 1
  - Subtopic Title 2


{text}


Output mind map in Markdown format:"""
MERGE_PROMPT_TEMPLATE = PromptTemplate(
    template=MERGE_PROMPT_TEMPLATE_STR, input_variables=["text"]
)

class MindMapMerger(object):
    def __init__(
        self,
        llm: BaseLanguageModel,
        input_token_max: int = 2000,
        return_intermediate_steps: bool = True,
        verbose: Optional[bool] = None,
    ) -> None:
        self.llm_chain = LLMChain(
            llm=llm, prompt=MERGE_PROMPT_TEMPLATE, verbose=verbose
        )
        self.input_token_max = input_token_max
        self.return_intermediate_steps = return_intermediate_steps
        assert len(self.llm_chain.prompt.input_variables) == 1

    def _get_input_text(self, docs: List[Document]) -> str:
        doc_strings_values = []
        for i, doc in enumerate(docs):
            doc_strings_values.append(f"Input mind map {i}")
            doc_strings_values.append(doc.page_content.strip())
        return "\n\n".join(doc_strings_values)

    def _get_output_metadata(self, docs: List[Document]) -> dict:
        ret = {}
        for i, doc in enumerate(docs):
            ret[i] = doc.metadata
        return ret

    def __call__(self, docs: List[Document]) -> Tuple[Document, dict]:
        input_docs = docs
        if self.return_intermediate_steps:
            extra_return_dict = {"inputs": input_docs}
        else:
            extra_return_dict = {}

        length_func = self.llm_chain.llm.get_num_tokens
        num_tokens_per_doc = [
            length_func(input_docs[i].page_content) for i in range(len(input_docs))
        ]
        merge_step = 0
        while len(input_docs) > 1:
            print(f"merge_step: {merge_step}, number of mind maps: {len(input_docs)}")
            current_token = 0
            merge_inputs_list = []
            merge_inputs = []
            for doc_i in range(len(input_docs)):
                if (current_token + num_tokens_per_doc[doc_i]) >= self.input_token_max:
                    if len(merge_inputs) <= 1:
                        raise ValueError(
                            "Cannot merge two mind maps because their total length is too long. "
                            f"One mind map length is {current_token}. "
                            f"The other mind map length is {num_tokens_per_doc[doc_i]}. "
                            f"token_max is {self.input_token_max}."
                        )
                    merge_inputs_list.append(merge_inputs)
                    merge_inputs = []
                    current_token = 0
                merge_inputs.append(input_docs[doc_i])
                current_token += num_tokens_per_doc[doc_i]

            if len(merge_inputs) > 0:
                merge_inputs_list.append(merge_inputs)

            new_input_docs = []
            for merge_inputs in merge_inputs_list:
                if len(merge_inputs) > 1:
                    input_text = self._get_input_text(merge_inputs)
                    result = self.llm_chain.run(
                        **{self.llm_chain.prompt.input_variables[0]: input_text}
                    )
                    metadata = self._get_output_metadata(merge_inputs)
                    new_doc = Document(page_content=result, metadata=metadata)
                    new_input_docs.append(new_doc)
                else:
                    new_input_docs.append(merge_inputs[0])

            if self.return_intermediate_steps:
                extra_return_dict.update({f"merge_step{merge_step}": new_input_docs})
            input_docs = new_input_docs
            num_tokens_per_doc = [
                length_func(input_docs[i].page_content) for i in range(len(input_docs))
            ]
            merge_step += 1

        return input_docs[0], extra_return_dict
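
The core of `MindMapMerger.__call__` is a greedy grouping pass: consecutive mind maps are packed into one merge request until adding the next one would reach `input_token_max`. The sketch below isolates that pass so it can be tried on plain token counts. The helper name `group_by_token_budget` is hypothetical; the logic is intended to mirror the loop above, including the error raised when fewer than two maps fit into the budget.

```python
from typing import List


def group_by_token_budget(token_counts: List[int], token_max: int) -> List[List[int]]:
    """Greedily pack consecutive items into groups below `token_max` tokens.

    Returns lists of indices into `token_counts`. A group is closed as soon
    as adding the next item would reach `token_max`.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for i, n in enumerate(token_counts):
        if current_tokens + n >= token_max:
            if len(current) <= 1:
                # Not even two neighbouring items fit into one request.
                raise ValueError(
                    f"Cannot group item {i} ({n} tokens) with {current_tokens} "
                    f"tokens already collected; token_max is {token_max}."
                )
            groups.append(current)
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += n
    if current:
        groups.append(current)  # a trailing group may hold a single item
    return groups


groups = group_by_token_budget([300, 400, 350, 200, 500, 100], token_max=1200)
print(groups)
```

Each group with more than one member becomes a single merge call, and a trailing single-member group is carried into the next round unchanged, so the number of mind maps shrinks every round until one remains.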

In [None]:
def extract_page_content(value):
    if isinstance(value, Document):
        return value.page_content
    elif isinstance(value, dict):
        ret = {k: extract_page_content(v) for k, v in value.items()}
        return ret
    elif isinstance(value, list):
        ret = [extract_page_content(v) for v in value]
        return ret
    else:
        raise RuntimeError(f"{type(value)} is not supported")

In [None]:
mind_map_maker_llm = OpenAI(temperature=0, max_tokens=256)
mind_map_merger_llm = OpenAI(temperature=0, max_tokens=512)

mind_map_maker = MindMapMaker(llm=mind_map_maker_llm)
mind_map_merger = MindMapMerger(llm=mind_map_merger_llm, input_token_max=1200)
mind_maps = mind_map_maker(sub_pages)

merged_mind_map, history = mind_map_merger(mind_maps)

merge_step: 0, number of mind maps: 21


merge_step: 1, number of mind maps: 4
merge_step: 2, number of mind maps: 2


In [None]:
print("Mind Map:")
print(merged_mind_map.page_content)

Mind Map:


# FlashAttention : Fast and Memory-Efficient Exact Attention with IO-Awareness

## Introduction

### FLOP Reduction

- Sparse-Approximation
- Low-Rank Approximation
- Combinations
- IO-Awareness
  - Reads and Writes
  - Fast and Slow Memory
  - GPU On-Chip SRAM and HBM

## Proposed Method

### FlashAttention

- Tiling
- Reduce Memory Reads/Writes
- Analyze IO Complexity
- Bandwidth & Memory Size
- Attention on GPT-2
  - FlashAttention PyTorch
    - Time (ms)
    - Matmul
    - Mask
    - Softmax
    - Dropout
    - Matmul
    - Fused Kernel
    - Q: N x d
    - V: N X d
    - KT: d x N
    - QKT: N x N
    - sm(Q KT)V: N x d
- Outer Loop
  - Copy Block to SRAM
  - Copy
    - Outer Loop
    - Inner Loop
  - Compute Block on SRAM
  - Output to HBM
- Inner Loop
  - Outer Loop
  - GPU
    - SRAM
    - GPU
    - HBM
    - Main Memory (CPU DRAM)
  - SRAM : 19 TB/s (20 MB)
  - HBM: 1.5 TB/s (40 GB)
  - DRAM : 12.8 GB/s (>1 TB)
- Fewer HBM Accesses
  - Fig. 2
  - Lower Bound
- Memo

In [None]:
import json

with open("mind_map_merge_history.json", "w") as f:
    json.dump(extract_page_content(history), f, indent=2)
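
The saved history maps `"inputs"` and each `"merge_step{n}"` key to a list of mind map strings, so it can be reloaded later to see how many maps survived each round. The sketch below uses a small made-up history and a temporary file rather than the real output; the mind map strings and the file name `toy_history.json` are placeholders.

```python
import json
import os
import tempfile

# Made-up history with the same key layout as the merger's output.
toy_history = {
    "inputs": ["# A", "# B", "# C"],
    "merge_step0": ["# AB", "# C"],
    "merge_step1": ["# ABC"],
}

path = os.path.join(tempfile.mkdtemp(), "toy_history.json")
with open(path, "w") as f:
    json.dump(toy_history, f, indent=2)

with open(path) as f:
    loaded = json.load(f)

# Number of mind maps present at each stage of the merge.
step_sizes = {key: len(maps) for key, maps in loaded.items()}
print(step_sizes)
```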

In [None]:
for i, mind_map in enumerate(mind_maps):
    print(f"sub page {i}")
    print(mind_map.page_content)
    print("-" * 10)
    print()

sub page 0


# FlashAttention : Fast and Memory-Efficient Exact Attention with IO-Awareness

## Introduction

### FLOP Reduction

- Sparse-Approximation
- Low-Rank Approximation
- Combinations

### IO-Awareness

- Reads and Writes
- Fast and Slow Memory
- GPU On-Chip SRAM and HBM

## Proposed Method

### FlashAttention

- Tiling
- Reduce Memory Reads/Writes
- Analyze IO Complexity

### Block-Sparse FlashAttention

- Faster than Existing Approximate Attention
- End-to-End Wall-Clock Speedup
- Longer Context in Transformers
- Higher Quality Models
- New Capabilities
----------

sub page 1


# FlashAttention Memory Hierarchy

## Bandwidth & Memory Size

### Attention on GPT-2

- FlashAttention PyTorch
  - Time (ms)
  - Matmul
  - Mask
  - Softmax
  - Dropout
  - Matmul
  - Fused Kernel
  - Q: N x d
  - V: N X d
  - KT: d x N
  - QKT: N x N
  - sm(Q KT)V: N x d

### Outer Loop

- Copy Block to SRAM
- Copy
  - Outer Loop
  - Inner Loop
- Compute Block on SRAM
- Output to HBM

### Inner Loop