[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Fret-KoXuw3Y0dev5IzysbRaRsSs0m0r?usp=sharing)

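(Disclaimer: the content below was collected from around the web and then modified. Very little of it is my own original work; I am mostly just passing it along.)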

This functionality is still under active development upstream, so expect that even the examples may contain bugs.

## Use case

Source code analysis is one of the most popular LLM applications (e.g., [GitHub Co-Pilot](https://github.com/features/copilot), [Code Interpreter](https://chat.openai.com/auth/login?next=%2F%3Fmodel%3Dgpt-4-code-interpreter), [Codium](https://www.codium.ai/), and [Codeium](https://codeium.com/about)). Typical use cases include:

- Q&A over the code base to understand how it works
- Using LLMs to suggest refactors or improvements
- Using LLMs to document the code
![Image description](https://python.langchain.com/assets/images/code_understanding-cd1bda63c69e227203a1d5a7e8133887.png)
## Overview



The pipeline for Q&A over code follows [the steps we take for document question answering](/docs/extras/use_cases/question_answering), with some differences:

In particular, we can employ a [divide-and-conquer splitting strategy](https://python.langchain.com/docs/integrations/document_loaders/source_code) that does the following:

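* Keeps each top-level function and class in the code in its own document
* Puts the remaining code into a separate document
* Retains metadata about where each split comes from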
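A minimal sketch of this divide-and-conquer split, using only Python's standard `ast` module (this is an illustration, not LangChain's implementation). Each top-level function or class, including its decorators, becomes one record. Everything else is collected into a single remainder record, and every record keeps `source` metadata. The function name `split_top_level` and the `kind`/`name` metadata keys are hypothetical. The sketch needs Python 3.8+ for `end_lineno`.

```python
import ast


def split_top_level(source: str, path: str = "example.py") -> list:
    """Return one record per top-level function/class plus one for the remaining code."""
    tree = ast.parse(source)
    lines = source.splitlines()
    docs = []
    covered = set()  # 0-based indices of lines that belong to a top-level definition

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above `def`/`class`, so start at the earliest of them
            first = min([d.lineno for d in node.decorator_list] + [node.lineno])
            start, end = first - 1, node.end_lineno  # slice bounds into `lines`
            covered.update(range(start, end))
            docs.append({
                "page_content": "\n".join(lines[start:end]),
                "metadata": {"source": path, "kind": "definition", "name": node.name},
            })

    # Imports, constants and script code that were not covered above form one document
    remainder = [line for i, line in enumerate(lines) if i not in covered]
    docs.append({
        "page_content": "\n".join(remainder),
        "metadata": {"source": path, "kind": "remainder"},
    })
    return docs
```

For a file containing one function, one class and a constant, this yields three records: the function, the class, and the imports plus the constant.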

## Quickstart

In [1]:
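!pip install openai tiktoken chromadb langchain python-dotenv
!pip install gitpython
# Versions installed when this notebook was run: openai 0.28.1, langchain 0.0.306, chromadb 0.4.13
import os
import dotenv

# Set the OPENAI_API_KEY env var here, or put it in a .env file that is loaded below
os.environ["OPENAI_API_KEY"] = "<your OpenAI API key>"
dotenv.load_dotenv()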


False
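The `False` above is the return value of `dotenv.load_dotenv()`. Most likely it means that no non-empty `.env` file was found, so the key set directly in the cell is the one being used. The order of the two steps also matters: by default, python-dotenv does not overwrite variables that already exist unless `override=True` is passed. A key in `.env` would therefore be ignored in this cell. The stdlib sketch below reproduces only that non-override behaviour for illustration. `load_env_file` is a hypothetical helper, not python-dotenv's parser, and it ignores its escaping and interpolation rules.

```python
import os


def load_env_file(path: str, override: bool = False) -> bool:
    """Simplified .env loader: KEY=VALUE lines, '#' comments, optional surrounding quotes.

    Existing environment variables are left untouched unless override=True.
    Returns True if at least one KEY=VALUE entry was found.
    """
    if not os.path.isfile(path):
        return False
    found = False
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments and malformed lines
            key, value = (part.strip() for part in line.split("=", 1))
            value = value.strip("\"'")  # drop surrounding quotes
            found = True
            if override or key not in os.environ:
                os.environ[key] = value
    return found
```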

We will follow [the structure of this notebook](https://github.com/cristobalcl/LearningLangChain/blob/master/notebooks/04%20-%20QA%20with%20code.ipynb) and employ [context-aware code splitting](https://python.langchain.com/docs/integrations/document_loaders/source_code).

### Loading

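We will load all the Python files of the project with the loaders in `langchain.document_loaders` (the cells below use `GenericLoader` together with `LanguageParser`).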

The following script iterates over the files in the LangChain repository and loads every `.py` file (a.k.a. document):

In [2]:
from git import Repo
from langchain.text_splitter import Language
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser

In [3]:
# Clone the LangChain repo (adjust repo_path to your environment; /content is Colab's working directory)
repo_path = "/content/test_repo"
repo = Repo.clone_from("https://github.com/langchain-ai/langchain", to_path=repo_path)
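Conceptually, `GenericLoader.from_filesystem(..., glob="**/*", suffixes=[".py"])` walks the directory tree and keeps only files whose suffix matches. Below is a minimal stdlib sketch of that walk. The function name `load_python_files` and the use of a path relative to the root as `source` are illustrative assumptions, not LangChain's behaviour.

```python
from pathlib import Path


def load_python_files(root, suffixes=(".py",)):
    """Walk `root` recursively (like glob="**/*") and keep files whose suffix matches."""
    root_path = Path(root)
    docs = []
    for path in sorted(root_path.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            docs.append({
                # errors="ignore" skips undecodable bytes instead of failing on odd files
                "page_content": path.read_text(encoding="utf-8", errors="ignore"),
                "metadata": {"source": str(path.relative_to(root_path))},
            })
    return docs
```

For example, `load_python_files(repo_path + "/libs/langchain/langchain")` would return one record per `.py` file under that package.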


We load the Python code using [`LanguageParser`](https://python.langchain.com/docs/integrations/document_loaders/source_code), which will:

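* Keep each top-level function and class together in a single document
* Put the remaining code into a separate document
* Retain metadata about where each split comes from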
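The `parser_threshold=500` argument in the next cell acts as a size gate. As I understand its docstring, it is the minimum number of lines needed to activate parsing: smaller files stay as one document, and only larger ones are segmented. The sketch below shows that gate around a deliberately naive, hypothetical segmenter (`naive_segment`), which starts a new piece at every top-level `def`/`class` line. Both function names are hypothetical, and the exact boundary comparison (`<=`) is an assumption.

```python
def naive_segment(code: str, source: str) -> list:
    """Hypothetical segmenter: start a new piece at every top-level def/class line."""
    pieces, current = [], []
    for line in code.splitlines():
        if line.startswith(("def ", "async def ", "class ")) and current:
            pieces.append(current)
            current = []
        current.append(line)
    if current:
        pieces.append(current)
    return [{"page_content": "\n".join(p), "metadata": {"source": source}} for p in pieces]


def parse_if_large(code: str, source: str, parser_threshold: int = 500) -> list:
    """Keep small files whole; only segment files longer than the threshold."""
    if len(code.splitlines()) <= parser_threshold:
        return [{"page_content": code, "metadata": {"source": source}}]
    return naive_segment(code, source)
```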

In [4]:
# Load
loader = GenericLoader.from_filesystem(
    repo_path+"/libs/langchain/langchain",
    glob="**/*",
    suffixes=[".py"],
    parser=LanguageParser(language=Language.PYTHON, parser_threshold=500)
)
documents = loader.load()
len(documents)

1546

### Splitting

Split the `Document`s into chunks for embedding and vector storage.

We can use `RecursiveCharacterTextSplitter` with `language` specified.

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
python_splitter = RecursiveCharacterTextSplitter.from_language(language=Language.PYTHON,
                                                               chunk_size=2000,
                                                               chunk_overlap=200)
texts = python_splitter.split_documents(documents)
len(texts)

4695
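`RecursiveCharacterTextSplitter.from_language(language=Language.PYTHON, ...)` first tries coarse, code-aware separators (class and function boundaries). It falls back to finer ones (blank lines, newlines, spaces, single characters) only for pieces that are still longer than `chunk_size`. Below is a simplified stdlib sketch of that idea. The separator list is my assumption of the Python defaults, lengths are counted in characters, and `chunk_overlap` is omitted. The real splitter also differs in details such as whitespace handling and how pieces are merged.

```python
# Assumed Python separators, coarsest first; "" means "split into single characters"
PYTHON_SEPARATORS = ["\nclass ", "\ndef ", "\n\tdef ", "\n\n", "\n", " ", ""]


def recursive_split(text: str, chunk_size: int, separators=PYTHON_SEPARATORS) -> list:
    """Split `text` into chunks of at most `chunk_size` characters, preferring coarse separators."""
    # First separator that occurs in the text; "" always applies
    sep = next(s for s in separators if s == "" or s in text)
    finer = separators[separators.index(sep) + 1:]
    if sep:
        parts = text.split(sep)
        # Re-attach the separator so `class`/`def` stay with their body (keeps the split lossless)
        pieces = [parts[0]] + [sep + p for p in parts[1:]]
    else:
        pieces = list(text)

    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Too big on its own: flush what we have, then recurse with finer separators
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
        elif len(current) + len(piece) <= chunk_size:
            current += piece  # greedily merge small neighbouring pieces
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

In this sketch, joining the chunks reproduces the input exactly and every chunk fits in `chunk_size`. The notebook's `chunk_overlap=200` additionally lets consecutive chunks share some context.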

### RetrievalQA

We need to store the documents in a way that lets us semantically search their content.

The most common approach is to embed the contents of each document, then store the embeddings and the documents in a vector store.
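To make "embed, then store and search" concrete, here is a toy in-memory vector store. It uses a bag-of-words count vector as a stand-in for a real embedding model (the notebook uses `OpenAIEmbeddings` with Chroma) and ranks documents by cosine similarity. `embed`, `cosine` and `ToyVectorStore` are hypothetical names for this sketch only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real setup would call an embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    def __init__(self):
        self.items = []  # (embedding, document) pairs stored side by side

    def add(self, document: str) -> None:
        self.items.append((embed(document), document))

    def similarity_search(self, query: str, k: int = 4) -> list:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]
```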

When setting up the vector store retriever:

* We test [max marginal relevance](/docs/extras/use_cases/question_answering) for retrieval
* The retriever returns 8 documents
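Maximal marginal relevance (MMR) picks results greedily. At each step it chooses the candidate that maximizes `lambda_mult * sim(query, doc) - (1 - lambda_mult) * max(sim(doc, already_picked))`, which penalizes near-duplicate chunks. Below is a minimal stdlib sketch on plain vectors. It is my own illustration, not LangChain's code. LangChain's MMR search also accepts `fetch_k` and `lambda_mult` in `search_kwargs`; here `mmr` simply ranks every candidate it is given.

```python
import math


def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query_vec, doc_vecs, k=8, lambda_mult=0.5):
    """Return indices of up to k docs: relevant to the query, but not redundant with each other."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cos(query_vec, doc_vecs[i])
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)  # ties go to the earlier candidate
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two identical documents and one different document, plain similarity (`lambda_mult=1.0`) returns the duplicate pair. The default `lambda_mult=0.5` swaps the second duplicate for the different document.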


#### Go deeper

- [Browse over 40 vector store integrations](https://integrations.langchain.com/)
- [More on vector stores](/docs/modules/data_connection/vectorstores/)
- [Browse over 30 text embedding integrations](https://integrations.langchain.com/)
- [More on embedding models](/docs/modules/data_connection/text_embedding/)

In [6]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
db = Chroma.from_documents(texts, OpenAIEmbeddings(disallowed_special=()))
retriever = db.as_retriever(
    search_type="mmr", # Also test "similarity"
    search_kwargs={"k": 8},
)

### Chat

Test chat over the retrieved code, just as we do for [chatbots](/docs/extras/use_cases/chatbots).

#### Go deeper

- [Browse over 55 LLM and chat model integrations here](https://integrations.langchain.com/)
- [See further documentation on LLMs and chat models here](/docs/modules/model_io/models/)
- Use local LLMs: the popularity of [PrivateGPT](https://github.com/imartinez/privateGPT) and [GPT4All](https://github.com/nomic-ai/gpt4all) underscores the importance of running LLMs locally.

In [7]:
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationSummaryMemory
from langchain.chains import ConversationalRetrievalChain
llm = ChatOpenAI(model_name="gpt-4")
memory = ConversationSummaryMemory(llm=llm,memory_key="chat_history",return_messages=True)
qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)
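When the conversation already has history, `ConversationalRetrievalChain` first asks the LLM to rewrite the follow-up into a standalone question. It then retrieves with that question and answers from the retrieved documents. With `ConversationSummaryMemory`, the history it sees should be a running summary rather than the raw transcript. The sketch below traces one turn of that flow with plain callables. `llm`, `retrieve` and `answer` are hypothetical stand-ins, and `CONDENSE_TEMPLATE` is a paraphrase, not the library's exact prompt.

```python
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow up question, "
    "rephrase the follow up question to be a standalone question.\n\n"
    "Chat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:"
)


def conversational_turn(question, chat_history, llm, retrieve, answer):
    """One turn of condense -> retrieve -> answer; chat_history is a list of (role, text)."""
    if chat_history:
        history = "\n".join(f"{role}: {text}" for role, text in chat_history)
        standalone = llm(CONDENSE_TEMPLATE.format(chat_history=history, question=question))
    else:
        standalone = question  # first turn: nothing to resolve "it"/"that" against
    docs = retrieve(standalone)
    reply = answer(docs, standalone)
    chat_history.extend([("Human", question), ("AI", reply)])
    return reply
```

Condensing first matters for follow-ups such as "Which classes derive from it?". Retrieval on that raw text would have no idea what "it" refers to.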

In [8]:
question = "How can I initialize a ReAct agent?"
result = qa(question)
result['answer']

'The steps to initialize a ReAct agent are as follows:\n\n1. Import the necessary modules and classes. This includes the `OpenAI` from `langchain.llms`, `initialize_agent` from `langchain.agents`, `ZapierToolkit` from `langchain.agents.agent_toolkits`, and `ZapierNLAWrapper` from `langchain.utilities.zapier`.\n\n2. Create an instance of `OpenAI` with the desired temperature setting. This instance is used as the language model.\n\n3. Create an instance of `ZapierNLAWrapper`. This is used to create a toolkit for Zapier, an online automation tool.\n\n4. Create an instance of `ZapierToolkit` by calling `from_zapier_nla_wrapper` and passing in the `ZapierNLAWrapper` instance.\n\n5. Use the `initialize_agent` function to create the ReAct agent. This function takes in the tools from the `ZapierToolkit`, the `OpenAI` instance, and the type of agent to create (`AgentType.ZERO_SHOT_REACT_DESCRIPTION` in this case).\n\n6. Once the agent is created, you can run it using the `run` method and pass i

In [9]:
questions = [
    "What is the class hierarchy?",
    "What classes are derived from the Chain class?",
    "What one improvement do you propose in code in relation to the class herarchy for the Chain class?",
]

for question in questions:
    result = qa(question)
    print(f"-> **Question**: {question} \n")
    print(f"**Answer**: {result['answer']} \n")

-> **Question**: What is the class hierarchy? 

**Answer**: The class hierarchy for initializing a ReAct agent is as follows:

- BaseSingleActionAgent
    - LLMSingleActionAgent
    - OpenAIFunctionsAgent
    - XMLAgent
    - Agent
        - <name>Agent (For example: ZeroShotAgent, ChatAgent, ReActDocstoreAgent)
- BaseMultiActionAgent
    - OpenAIMultiFunctionsAgent

In this hierarchy, ReActDocstoreAgent is a subclass of the Agent class, which itself is a subclass of several classes including the BaseSingleActionAgent. 

-> **Question**: What classes are derived from the Chain class? 

**Answer**: The classes that are derived from the Chain class are:

1. BaseConversationalRetrievalChain
2. ConstitutionalChain
 

-> **Question**: What one improvement do you propose in code in relation to the class herarchy for the Chain class? 

**Answer**: Based on the provided code, one improvement could be to include more explicit comments or docstrings for each class in the hierarchy. This would ma




We can look at the [LangSmith trace](https://smith.langchain.com/public/2b23045f-4e49-4d2d-8980-dec85259af36/r) to see what is happening under the hood:

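* In particular, the code is well structured and kept together in the retrieval output
* The retrieved code and chat history are passed to the LLM for answer distillation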

![Image description](https://python.langchain.com/assets/images/code_retrieval-238439ab9f5edfe8cbdbc6fcfbc97179.png)


### Open source LLMs

We can use [Code LLaMA](https://about.fb.com/news/2023/08/code-llama-ai-for-coding/) via LlamaCpp or the [Ollama integration](https://ollama.ai/blog/run-code-llama-locally).

Note: be sure to upgrade `llama-cpp-python` in order to use the new `gguf` [file format](https://github.com/abetlen/llama-cpp-python/pull/633).

```
# Apple Silicon (Metal) build; other platforms need different (or no) CMAKE_ARGS
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
```

[Browse the latest Code Llama models](https://huggingface.co/TheBloke/CodeLlama-13B-Instruct-GGUF/tree/main)

In [10]:
!pip install llama-cpp-python


In [11]:
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.memory import ConversationSummaryMemory
from langchain.chains import ConversationalRetrievalChain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

In [12]:
# 下載code llama模型到 colab
import requests


url = "https://huggingface.co/TheBloke/CodeLlama-13B-Instruct-GGUF/resolve/e94db8d144152f0b5e153dcb0ac0a266f1588fc3/codellama-13b-instruct.Q4_K_M.gguf"


response = requests.get(url, stream=True)
response.raise_for_status()

with open("codellama-13b-instruct.Q4_K_M.gguf", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)

print("模型已成功下載！")

模型已成功下載！
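Before loading a multi-gigabyte download, it can be worth checking that the file really is GGUF. A failed or redirected download may have saved an HTML error page instead. GGUF files start with the 4-byte magic `GGUF`, followed by a little-endian `uint32` format version, and this sketch reads only those 8 bytes. `read_gguf_version` is a hypothetical helper, not part of llama-cpp-python.

```python
import struct


def read_gguf_version(path: str) -> int:
    """Return the GGUF format version, or raise ValueError if the file is not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)  # 4-byte magic + uint32 version
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path!r} is not a GGUF file (starts with {header[:4]!r})")
    (version,) = struct.unpack("<I", header[4:8])  # little-endian unsigned 32-bit
    return version
```

Usage: `read_gguf_version("codellama-13b-instruct.Q4_K_M.gguf")` should return a small version number, or raise if the download went wrong.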


In [13]:

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llm = LlamaCpp(
    model_path="/content/codellama-13b-instruct.Q4_K_M.gguf",
    n_ctx=5000,
    n_gpu_layers=1,
    n_batch=512,
    f16_kv=True,  # MUST set to True, otherwise you will run into problem after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
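`n_ctx=5000` has to hold the prompt template, the stuffed retrieved chunks and the generated answer. Here is a back-of-the-envelope check for the worst case of 8 full-size chunks. It assumes roughly 4 characters per token, a common rule of thumb for English (code often tokenizes into more tokens). It also assumes a hypothetical 600 characters of template plus question overhead. `estimate_tokens` and `remaining_context` are illustrative helpers, not a real tokenizer.

```python
import math


def estimate_tokens(n_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token count from a character count (heuristic, not a real tokenizer)."""
    return math.ceil(n_chars / chars_per_token)


def remaining_context(n_ctx: int, k: int, chunk_size: int,
                      overhead_chars: int = 600, chars_per_token: float = 4.0) -> int:
    """Tokens left for the answer after stuffing k full-size chunks plus template/question."""
    prompt_chars = k * chunk_size + overhead_chars  # worst case: every chunk is full size
    return n_ctx - estimate_tokens(prompt_chars, chars_per_token)


print(remaining_context(n_ctx=5000, k=8, chunk_size=2000))  # 16,600 chars -> 4,150 tokens -> 850
```

At 3 characters per token, the same prompt would already overflow `n_ctx`. Lowering `k` or `chunk_size`, or raising `n_ctx`, may then be needed with local models.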


In [14]:
llm("Question: In bash, how do I list all the text files in the current directory that have been modified in the last month? Answer:")

 To get a list of all the text files in the current directory that have been modified in the last month.
You can use this bash command: find . -type f \( -iname '*.txt' \) \-mtime +30 -print This is because in order to perform this operation in bash, you need to use several different subcommands and options together when invoking the command find.
Here is an explanation of each of the main components that make up the bash command that is used to perform this particular operation in bash: . : This is a period symbol that is used as part of the name or path of the file that you want to perform this operation on.
Here is an example of how this symbol might be used: .txt : This is a slash symbol that is used as part of the name or path of the file that you want to perform this operation on.
Here is an example of how this symbol might be used: dir/ : This is a string literal that contains some text characters.
Here is an example of what this particular type of string literal is intended to 

" To get a list of all the text files in the current directory that have been modified in the last month.\nYou can use this bash command: find . -type f \\( -iname '*.txt' \\) \\-mtime +30 -print This is because in order to perform this operation in bash, you need to use several different subcommands and options together when invoking the command find.\nHere is an explanation of each of the main components that make up the bash command that is used to perform this particular operation in bash: . : This is a period symbol that is used as part of the name or path of the file that you want to perform this operation on.\nHere is an example of how this symbol might be used: .txt : This is a slash symbol that is used as part of the name or path of the file that you want to perform this operation on.\nHere is an example of how this symbol might be used: dir/ : This is a string literal that contains some text characters.\nHere is an example of what this particular type of string literal is int

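The generated command is worth double-checking. For `find`, `-mtime +30` matches files modified more than 30 days ago. "Modified in the last month" needs `-mtime -30` instead, plus `-maxdepth 1` (supported by GNU and BSD `find`) to stay in the current directory. To keep to the notebook's language, here is a small Python sketch of the intended behaviour. `recent_text_files` is a hypothetical helper, and "last month" is approximated as 30 days.

```python
import time
from pathlib import Path


def recent_text_files(directory=".", days=30, now=None):
    """Names of *.txt files directly in `directory` modified within the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    return sorted(
        p.name for p in Path(directory).iterdir()
        # suffix.lower() mirrors find's case-insensitive -iname '*.txt'
        if p.is_file() and p.suffix.lower() == ".txt" and p.stat().st_mtime >= cutoff
    )
```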
In [15]:
from langchain.chains.question_answering import load_qa_chain

# Prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)
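With `chain_type="stuff"` (used further below), all retrieved documents are concatenated into the `{context}` slot of a single prompt. Here is a minimal sketch of that assembly. It uses an abbreviated version of the template above and assumes a blank line (`"\n\n"`) between documents. `stuff_prompt` and `SHORT_TEMPLATE` are hypothetical names. Braces inside retrieved code are safe, because they are substituted as values rather than parsed as placeholders.

```python
SHORT_TEMPLATE = "{context}\nQuestion: {question}\nHelpful Answer:"  # abbreviated for the sketch


def stuff_prompt(docs, question, template=SHORT_TEMPLATE, separator="\n\n"):
    """'Stuff' strategy: join every retrieved document into one context block, then fill the template."""
    context = separator.join(docs)
    # str.format only parses `template`; braces inside `context` are inserted verbatim
    return template.format(context=context, question=question)
```

The trade-off: stuffing is a single LLM call and keeps related chunks side by side, but the whole prompt must fit in the model's context window.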

Since the following feature is still in closed beta and I have not received a LangSmith API key yet, I did not run this part.
-----------------------------------------------------
We can also use the LangChain Prompt Hub to store and fetch prompts.

This will work with your [LangSmith API key](https://docs.smith.langchain.com/).

Let's try the default RAG prompt [here](https://smith.langchain.com/hub/rlm/rag-prompt).

In [17]:
!pip install langchainhub



In [18]:
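import os
from langchain import hub

# Set these in Python: shell `export` lines are not valid inside a Python cell
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "<your-api-key>"
QA_CHAIN_PROMPT = hub.pull("rlm/rag-prompt-default")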

HTTPError: ignored

In [None]:
# Docs
question = "How can I initialize a ReAct agent?"
docs = retriever.get_relevant_documents(question)

# Chain
chain = load_qa_chain(llm, chain_type="stuff", prompt=QA_CHAIN_PROMPT)

# Run
chain({"input_documents": docs, "question": question}, return_only_outputs=True)

Llama.generate: prefix-match hit


Here's the trace [RAG](https://smith.langchain.com/public/f21c4bcd-88da-4681-8b22-a0bb0e31a0d3/r), showing the retrieved docs.