# [🦜️ LangChain + Pinecone + Llama2 🦙 A RAG-Based LLM That Reads Your Own .pdf to Answer Questions](https://medium.com/@gary.tsai.advantest/%EF%B8%8F-langchain-pinecone-llama2-%E5%9F%BA%E6%96%BC-rag-%E7%9A%84-llm-%E8%AE%80%E5%8F%96%E8%87%AA%E5%B7%B1%E7%9A%84-pdf-%E4%BE%86%E5%9B%9E%E7%AD%94%E5%95%8F%E9%A1%8C-bf62244feb91)

* 💻 Build a retrieval-augmented generation (RAG) LLM application from scratch.
* 🚀 The end-to-end pipeline (load, chunk, embed, index)

## Motivation

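Tech giants have released a growing number of open-source large language models (LLMs) in recent years. Many people, perhaps inspired by news coverage, now want to use these models for tasks more serious than casual conversation, such as data analysis or programming. However, LLMs are limited by the amount and diversity of their training data, so they may not meet the needs of specific tasks. For example, if I ask who will win Taiwan's 2024 presidential election, an LLM whose knowledge cutoff is September 2021 cannot give the correct answer.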

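To see this limitation, look at Hugging Face's Open LLM Leaderboard: hardly any pre-trained (base) models appear among the top 100. Even a model as large as Falcon 180B, released this September with 180 billion parameters, may still fall short on specific tasks.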

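We therefore need to augment LLMs with custom data so they can adapt to specific tasks or perform better. There are currently two mainstream strategies: fine-tuning and retrieval-augmented generation (RAG). Fine-tuning modifies the LLM's internal parameters to specialize it. RAG instead extends the LLM's capabilities by bringing in external knowledge at inference time (for example, the contents of a PDF file).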


## RAG Architecture Diagram
![RAG Architecture Diagram](https://miro.medium.com/v2/resize:fit:720/format:webp/0*iIKckIYOfy-G8r9s.png)

RAG is a dynamic approach. It converts external data into vector embeddings, stores them in a vector database (such as Pinecone or ChromaDB), and indexes them.

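When a question comes in, RAG searches the vector database with a similarity measure such as cosine similarity or Euclidean distance and retrieves the top k most relevant contexts. It passes these results to the LLM, which then answers the user based on the provided content.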
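To make the retrieval step concrete, here is a minimal pure-Python sketch of top-k search over toy vectors. It supports both cosine similarity and Euclidean distance. The vectors, chunk labels, and function names are illustrative assumptions. A real vector database such as Pinecone uses approximate nearest-neighbour indexes rather than this brute-force scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two vectors (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: list[float], vectors: list[list[float]], k: int, metric: str = "cosine") -> list[int]:
    """Return the indices of the k stored vectors closest to the query."""
    if metric == "cosine":
        # Higher similarity is better, so sort in descending order.
        ranked = sorted(range(len(vectors)), key=lambda i: cosine_similarity(query, vectors[i]), reverse=True)
    elif metric == "euclidean":
        # Smaller distance is better, so sort in ascending order.
        ranked = sorted(range(len(vectors)), key=lambda i: euclidean_distance(query, vectors[i]))
    else:
        raise ValueError(f"unknown metric: {metric}")
    return ranked[:k]


# Toy 3-dimensional "embeddings" for four chunks (real ones have e.g. 384 dimensions).
chunks = ["gaussian product", "gaussian convolution", "pdf normalisation", "unrelated text"]
vectors = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.2, 0.9, 0.1], [0.0, 0.1, 0.9]]
query = [1.0, 0.2, 0.0]

for i in top_k(query, vectors, k=2):
    print(chunks[i])  # prints "gaussian product", then "gaussian convolution"
```

In this toy data both metrics return the same two chunks. With unit-length embeddings, ranking by cosine similarity and by Euclidean distance always agree.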


## 🦜️ LangChain
![🦜️ LangChain](https://miro.medium.com/v2/resize:fit:720/format:webp/1*gWDMAyh0VhMS-y7k4z5kRw.png)

LangChain provides the basic components for loading (1, Load), splitting (2, Chunks), embedding (3, Embeddings), similarity search (6, Similarity Search), and querying (7, Query) the data.

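LangChain splits text into chunks mainly to work within the LLM's context-length limit while still drawing on more text than fits into one prompt. It divides a large dataset into many chunks, each small enough to fit into the context window. At query time, retrieval selects only the chunks relevant to the question, from anywhere in the dataset. This makes large datasets usable for question answering and generation, which is especially valuable in applications that must process a lot of text.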

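1. Context-length limit: an LLM's context length is the amount of text the model can take into account at once. This limit bounds how long a text sequence the model can read and generate. For ChatGPT, for example, the limit is expressed as a number of tokens.
2. The need for more context: some applications need far more context, for example when working with large custom datasets such as company documents or transaction records. These datasets can be very large, and answering a question may require drawing on the whole dataset. Because of the context-length limit, processing them with an LLM requires extra strategies such as chunking and retrieval.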
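Here is a rough sketch of why chunking matters. Only as many chunks as fit into the context window, after reserving room for the question and the answer, can be sent to the model. The function name is hypothetical. It assumes one token per whitespace-separated word, which is a simplification, since real tokenizers usually produce more tokens than words.

```python
def pack_chunks(chunks: list[str], context_limit: int, reserved: int) -> list[str]:
    """Keep ranked chunks, in order, until the context budget is used up.

    Tokens are approximated by whitespace-separated words (an assumption);
    a real system would count tokens with the model's own tokenizer.
    """
    budget = context_limit - reserved  # leave room for the question and the answer
    selected = []
    used = 0
    for chunk in chunks:
        cost = len(chunk.split())
        if used + cost > budget:
            break  # the next chunk would overflow the context window
        selected.append(chunk)
        used += cost
    return selected


ranked_chunks = ["word " * 40] * 5  # five retrieved chunks of about 40 tokens each
print(len(pack_chunks(ranked_chunks, context_limit=128, reserved=32)))  # 2
whole_document = "word " * 1000
print(len(pack_chunks([whole_document], context_limit=128, reserved=32)))  # 0: the unsplit document never fits
```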

### Step 1: Install All the Required Packages

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*GU3uxNA5git9BgUVb5YxWw.png)

In [2]:
import site

# `%pip` works in both Google Colab and local Jupyter, so no Colab/non-Colab branch is needed.
# pinecone-client is pinned below 3 because Step 7 uses the v2-only `pinecone.init()` API.
%pip install --quiet langchain pypdf unstructured sentence_transformers "pinecone-client<3" llama-cpp-python huggingface_hub
site.main()  # add site-packages directories to sys.path; if imports still fail, restart the kernel

Note: you may need to restart the kernel to use updated packages.


### Step 2: Import All the Required Libraries

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*4T9TWHJBJre8rzEWi7vdUQ.png)

In [3]:
from langchain.document_loaders import PyPDFLoader, OnlinePDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone
from sentence_transformers import SentenceTransformer
from langchain.chains.question_answering import load_qa_chain

import pinecone
import os

### Step 3: Load the Data

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*iqvC_UxNTUr7vDhBFk3oxA.png)

In [4]:
# Replace this with the path to your own PDF.
pdf_file = (
    "/Users/morris/Downloads/The-product-and-convolution-of-guassian-distributions.pdf"
)
loader = PyPDFLoader(pdf_file)
data = loader.load()  # one Document per PDF page

### Step 4: Split the Text into Chunks

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*79hQbXWBdSOy7jmiZspd6w.png)

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
docs = text_splitter.split_documents(data)
len(docs)

55
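The splitter produced 55 chunks of at most 500 characters. Below is a simplified sketch of the recursive idea, assuming LangChain's default separator order of paragraphs, then lines, then words, then characters. It tries coarse separators first and falls back to finer ones only when a piece is still longer than chunk_size. It also merges small neighbouring pieces greedily. This illustration uses no overlap, and the function name is hypothetical. It is not LangChain's actual implementation, whose merging and separator handling differ in detail.

```python
def recursive_split(
    text: str,
    chunk_size: int,
    separators: tuple[str, ...] = ("\n\n", "\n", " ", ""),
) -> list[str]:
    """Split text into chunks of at most chunk_size characters (simplified, no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard-cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    if sep not in text:
        # This separator does not occur, so try the next, finer one.
        return recursive_split(text, chunk_size, finer)
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        if len(piece) > chunk_size:
            # Flush the merged text so far, then split the oversized piece more finely.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging small pieces
        else:
            chunks.append(current)  # current chunk is full: start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb\n\nccc ddd eee", chunk_size=8))  # ['aaa bbb', 'ccc ddd', 'eee']
```

Splitting on paragraph and line boundaries first keeps related sentences together. Falling back to hard character cuts only as a last resort is what guarantees every chunk respects chunk_size.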

### Step 5: Set Up the Environment
Fill in your Pinecone API key (and your Hugging Face token).
![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*J4fTZJ-03sQMijYw4bE4qg.png)

In [6]:
# Placeholders: never hard-code real keys in a notebook. Set environment variables or paste your own values.
os.environ.setdefault("HUGGINGFACEHUB_API_TOKEN", "YOUR_HF_TOKEN")
PINECONE_API_ENV = os.environ.get("PINECONE_API_ENV", "gcp-starter")
PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY", "YOUR_PINECONE_API_KEY")
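Optionally, a small helper can fail early with a clear message instead of silently continuing with a missing, default, or placeholder key. The name require_env is hypothetical, a sketch rather than part of LangChain or the Pinecone client.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable, raising a clear error if it is missing or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return value


# Usage (hypothetical): PINECONE_API_KEY = require_env("PINECONE_API_KEY")
```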

### Step 6: Download the Embeddings

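Specify the embedding model you want to use.
![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*50ZumTLlJc57XrNNZEy4jw.png)

As its Hugging Face model page shows, this model produces 384-dimensional vectors.
![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*ZEBo4Y-OOZDb0TibS2uQSg.png)

The vectors stored in Pinecone must match the embedding size, so the index also needs a dimension of 384. Once everything is connected, the Indexes page in Pinecone shows a dimension of 384.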
![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*ik4i341LL9rUQkBrAoy64A.png)

In [8]:
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

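Here is roughly what HuggingFaceEmbeddings computes for this model. According to its model card, all-MiniLM-L6-v2 mean-pools the transformer's token outputs (ignoring padding) and L2-normalizes the result into one 384-dimensional sentence vector. The sketch below reproduces only those two post-processing steps, on toy 4-dimensional numbers. The transformer itself is not shown, and the function names are hypothetical.

```python
import math


def mean_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose mask is 1, so padding tokens are ignored."""
    dim = len(token_vectors[0])
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length, so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Toy example: 3 tokens with 4-dimensional outputs (the real model outputs 384 per token).
tokens = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 2.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]  # the last token is padding
sentence_vec = l2_normalize(mean_pool(tokens, mask))
print(len(sentence_vec))  # 4 here; 384 for all-MiniLM-L6-v2
```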

### Step 7: Initialize Pinecone

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*_ExS3LUuKWOXmgQ7Dnrh6Q.png)

In [9]:
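# initialize pinecone (pinecone.init is the pinecone-client v2 API; v3+ replaced it with `Pinecone(api_key=...)`)
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV,  # next to api key in console
)
index_name = "langchainpinecone"  # put in the name of your pinecone index here

# The index must exist, and its dimension must match the embedding model (384 for all-MiniLM-L6-v2).
if index_name not in pinecone.list_indexes():
    pinecone.create_index(name=index_name, dimension=384, metric="cosine")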

### Step 8: Create Embeddings for Each Text Chunk

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*6mDXiUZtZpHrtSLLrW_OQQ.png)

In [11]:
docsearch = Pinecone.from_texts(
    [t.page_content for t in docs], embedding, index_name=index_name
)
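Conceptually, `from_texts` embeds the chunks and upserts them into the index, each with an ID and the original text stored as metadata. The sketch below shows that record-building step. The batch size of 32, the `"text"` metadata key, and UUID IDs mirror what LangChain's Pinecone wrapper appears to do, but treat them as assumptions. `fake_embed` is a hypothetical stand-in for the real embedding model.

```python
import uuid
from typing import Callable, Iterator


def build_upsert_batches(
    texts: list[str],
    embed: Callable[[list[str]], list[list[float]]],
    batch_size: int = 32,
) -> Iterator[list[tuple[str, list[float], dict]]]:
    """Yield batches of (id, vector, metadata) records ready for a vector-database upsert."""
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors = embed(batch)  # one embedding call per batch, not per text
        yield [
            (str(uuid.uuid4()), vec, {"text": text})  # keep the raw text so search results can return it
            for text, vec in zip(batch, vectors)
        ]


def fake_embed(batch: list[str]) -> list[list[float]]:
    """Hypothetical stand-in embedding: a 2-d vector derived from the text length."""
    return [[float(len(t)), 1.0] for t in batch]


batches = list(build_upsert_batches([f"chunk {i}" for i in range(70)], fake_embed))
print([len(b) for b in batches])  # [32, 32, 6]
```

Storing the chunk text next to its vector is what lets `similarity_search` return readable `Document` objects instead of bare vectors.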

### Step 9: Similarity Search

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*WHWhbtwCFgzElgH1_r6Iqw.png)

In [25]:
query = "What is the type for functions resulting from the product and the convolution of Gaussian probability density functions?"
docs = docsearch.similarity_search(query)
docs

[Document(page_content='Products and Convolutions of Gaussian Probability\nDensity Functions\nP. A. Bromiley\nImaging Sciences Research Group, Institute of Population He alth,\nSchool of Medicine, University of Manchester,\nManchester, M13 9PT, UK\npaul.bromiley@manchester.ac.uk\nAbstract\nIt is well known that the product and the convolution of Gauss ian probability density functions (PDFs)\nare also Gaussian functions. This document provides proofs of this for several cases; the product of'),
 Document(page_content='−∞f(u)du][∫∞\n−∞g(t)dt]\nTherefore, the preservation of the normalisation when conv olving PDFs i.e. the fact that the convolution is also a\nPDF, normalised such that the area under the function is equa l to unity, is a special case rather than being true\nin general.\n5 Summary\nIt is well known that the product and the convolution of a pair of Gaussian PDFs are also Gaussian. In the case\nof the product of two univariate Gaussian PDFs N(µf,σf) andN(µg,σg), the result i

### Step 10: Query the Docs to get the Answer Back (Llama 2 Model)

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*qo5Ks8CfjymEJVu4MxnwZg.png)

In [20]:
# Rebuild llama-cpp-python with GPU offloading. The llama.cpp CMake flag is LLAMA_CUBLAS
# (for NVIDIA GPUs such as Colab's T4); "-DCMAKE_CUBLAS=on" is not an option llama.cpp uses.
# On an Apple Silicon Mac, use CMAKE_ARGS="-DLLAMA_METAL=on" instead.
# `!` works in both Colab and local Jupyter, whereas a `%%sh` cell magic is only valid on a cell's first line.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install ydata-profiling==4.6.2 numba==0.58.1 llama-cpp-python==0.2.20 --force-reinstall --upgrade --no-cache-dir --verbose


### Step 11: Import All the Required Libraries

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*b5Drov958AKpt3dQJP5kiQ.png)

In [22]:
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from huggingface_hub import hf_hub_download
from langchain.chains.question_answering import load_qa_chain

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
# Verbose is required to pass to the callback manager

### Step 12: Quantized Models from the Hugging Face Community

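Hugging Face hosts community-quantized models, such as those published by TheBloke. These let us run a model efficiently on a T4 GPU (since we are using Google Colab). In this example, we use a quantized build of Llama-2-13B-chat from TheBloke.

Quantization trades precision for lower resource usage. Weights and activations are represented with low-precision data types, such as 8-bit integers (int8) instead of the usual 32-bit floating point (float32). This reduces the compute and memory cost of running inference.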

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*CuVf4k23aHTQVqMI2Lw9Ig.png)
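Here is a minimal sketch of symmetric (absmax) int8 quantization, plus the memory arithmetic for a 13B-parameter model. This simplified per-tensor scheme is only an illustration, and the function names are hypothetical. The GGML/GGUF formats used by llama.cpp (such as q5_1 or Q5_K_M) instead quantize weights in small blocks, each with its own scale, at 4 to 8 bits, so real numbers differ.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 for an all-zero tensor
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Approximate the original float weights from the int8 codes."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 0.999, -1.27]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)  # each value is within scale / 2 of the original

# Memory for a 13B-parameter model's weights alone (ignoring runtime overhead):
params = 13e9
print(f"float32: {params * 4 / 1e9:.0f} GB, int8: {params * 1 / 1e9:.0f} GB")  # float32: 52 GB, int8: 13 GB
```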


In [31]:
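# llama-cpp-python 0.2.x loads only GGUF files; the older GGML .bin files
# (TheBloke/Llama-2-13B-chat-GGML) no longer load, so use the GGUF conversion instead.
# https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGUF"
model_basename = "llama-2-13b-chat.Q5_K_M.gguf"  # 5-bit quantized weights in GGUF format
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

n_gpu_layers = 4  # Change this value based on your model and your GPU VRAM pool.
n_batch = 16  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

# Load the model
llm = LlamaCpp(
    model_path=model_path,
    max_tokens=256,  # upper bound on the length of the generated answer
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    n_ctx=2048,  # must hold the retrieved chunks + the question + the answer
    verbose=True,  # Verbose is required to pass to the callback manager
)
chain = load_qa_chain(llm, chain_type="stuff")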


When a question comes in, RAG uses similarity search to retrieve the top 3 most relevant contexts from the index.

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*7cisvY--C_rUTB-3oGNk4Q.png)
![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*WAly5vipoUytSRZMa22MrA.png)



In [None]:
query = "What is the type for functions resulting from the product and the convolution of Gaussian probability density functions?"
docs = docsearch.similarity_search(query, k=3)  # top 3 chunks (the default k is 4)
docs

The results returned by the vector database are passed to the LLM, which then answers the user based on the provided content.
![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*imHEsq0QF-W37cdcHVGP7w.png)
![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*vZQjPTkluS0mM52-OJdcvA.png)

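By combining LangChain, Pinecone, and Llama 2, a RAG-based LLM can efficiently extract information from your own PDF files and answer questions about them based on their content. Once this works, it can become a significant application for you or your company.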

In [None]:
chain.run(input_documents=docs, question=query)
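The `stuff` chain type places all retrieved chunks into a single prompt ahead of the question, so the model's context window (`n_ctx`) has to be large enough to hold them all. Here is a minimal sketch of that prompt assembly. The template wording is loosely modeled on LangChain's default question-answering prompt, so treat the exact text as an assumption. The function name is hypothetical, and the two snippets are taken from the retrieved PDF text shown earlier.

```python
def stuff_prompt(docs: list[str], question: str) -> str:
    """Build one prompt that contains every retrieved chunk (the "stuff" strategy)."""
    context = "\n\n".join(docs)  # all chunks go into the prompt verbatim
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


retrieved = [
    "It is well known that the product and the convolution of Gaussian probability density functions (PDFs) are also Gaussian functions.",
    "It is well known that the product and the convolution of a pair of Gaussian PDFs are also Gaussian.",
]
prompt = stuff_prompt(retrieved, "What type of function results from the product of Gaussian PDFs?")
print(prompt)
```

Because the prompt grows with every chunk, "stuff" suits a handful of short chunks. With many or long chunks, the prompt can exceed `n_ctx`, and fewer chunks (a smaller `k`) or a larger context window is needed.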