## [LangChain GitHub](https://github.com/hwchase17/langchain)

In [None]:
%%bash
pip install langchain

---
* [Agents](https://python.langchain.com/en/latest/modules/agents/tools/getting_started.html)
  * [Hierarchical Planning Agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/openapi.html)
  * [Python Agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/python.html)

* [Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback](https://arxiv.org/abs/2302.12813)
* [Task-driven Autonomous Agent Utilizing GPT-4, Pinecone, and LangChain for Diverse Applications](https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/)

* [Elastic Versatile Agent with LangChain](https://github.com/corca-ai/EVAL)

* https://github.com/pHaeusler/micro-agent
* https://github.com/SamPink/dev-gpt

* https://github.com/yoheinakajima/babyagi

---
* [Chatbots](https://python.langchain.com/en/latest/use_cases/chatbots.html)
  * [Memory](https://python.langchain.com/en/latest/modules/memory/how_to_guides.html)
    * [ConversationBufferMemory](https://python.langchain.com/en/latest/modules/memory/types/buffer.html)
    * [ConversationBufferWindowMemory](https://python.langchain.com/en/latest/modules/memory/types/buffer_window.html)
  * [Conversation Agent](https://python.langchain.com/en/latest/modules/agents/agents/examples/conversational_agent.html)
  * [GPT with Google Search](https://github.com/andylokandy/gpt-4-search)

---
* [Chatbot for LangChain](https://blog.langchain.dev/langchain-chat/)
* [Code of Chatbot for LangChain](https://github.com/hwchase17/chat-langchain)
  * [Key Code of Chatbot for LangChain](https://github.com/hwchase17/chat-langchain/blob/master/query_data.py)

---
[ReadTheDocs Documentation Loader](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/readthedocs_documentation.html)

In [None]:
%%bash
wget -r -A .html -P rtdocs https://langchain.readthedocs.io/en/latest/

In [None]:
from langchain.document_loaders import ReadTheDocsLoader
ReadTheDocsLoader('rtdocs', features='lxml').load()

[Document(page_content='.rst\n.pdf\nWelcome to LangChain\n Contents \nGetting Started\nModules\nUse Cases\nReference Docs\nLangChain Ecosystem\nAdditional Resources\nWelcome to LangChain#\nLangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an API, but will also:\nBe data-aware: connect a language model to other sources of data\nBe agentic: allow a language model to interact with its environment\nThe LangChain framework is designed with the above principles in mind.\nThis is the Python specific portion of the documentation. For a purely conceptual guide to LangChain, see here. For the JavaScript documentation, see here.\nGetting Started#\nCheckout the below guide for a walkthrough of how to get started using LangChain to create an Language Model application.\nGetting Started Documentation\nModules#\nThere are several main modules that LangChain p

In [None]:
%%bash
rm -rf rtdocs

---
* [ChatGPT Chatbot for Internal Use](https://zenn.dev/tatsui/articles/langchain-chatbot)
  * https://github.com/tatsu-i/chatbot-sample

TODO:
* Localize the prompt templates into Chinese
* Model revelation, self-prompting, and self-awareness?

### Introduction

Have you ever encountered a problem when using ChatGPT to search for the latest information? The current language model of ChatGPT (gpt-3.5-turbo-0301) was trained on data up until September 2021, so it may not be able to answer questions about the latest information accurately.

In this article, we will explain how to create a chatbot that can use chain of thought to respond, by teaching ChatGPT new knowledge.

### Preparing and importing training data

First, clone a repository to use as training data.

Next, load the repository files as text using the following code. `OpenAIEmbeddings` needs an OpenAI API key, which it reads from the `OPENAI_API_KEY` environment variable by default.

In [None]:
import os
import pickle
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores.faiss import FAISS

def get_docs(dir_name):
    # (1) Import a series of documents.
    loader = DirectoryLoader(dir_name, loader_cls=TextLoader, silent_errors=True)
    raw_documents = loader.load()
    # (2) Split them into small chunks.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=200,
    )
    return text_splitter.split_documents(raw_documents)

def ingest_docs(dir_name):
    documents = get_docs(dir_name)
    # (3) Create embeddings for each document (using text-embedding-ada-002).
    embeddings = OpenAIEmbeddings()
    return FAISS.from_documents(documents, embeddings)

vectorstore = ingest_docs('_posts/ultimate-facts')
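
To make the `chunk_size=800` and `chunk_overlap=200` settings above concrete, here is a minimal sketch of overlapping chunking. It is a simplified stand-in, not the actual `RecursiveCharacterTextSplitter` algorithm (which also tries to break on separators such as paragraphs and newlines first); the function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: each window starts
    (chunk_size - chunk_overlap) characters after the previous one.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 200  # 2,000 characters
chunks = chunk_text(sample)
print(len(chunks), [len(c) for c in chunks])
```

The last 200 characters of each chunk are repeated at the start of the next one, so a sentence that straddles a boundary still appears whole in at least one chunk.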

---

In [None]:
import os
from EdgeGPT import Chatbot as Bing, ConversationStyle

bing = Bing(cookiePath = os.path.expanduser('~/.config/EdgeGPT/cookies.json'))

async def ask(prompt):
    res = (await bing.ask(
        prompt = prompt,
        conversation_style = ConversationStyle.balanced,
    ))['item']['messages'][1]

    print(res['text'])
    print('\n---\n')
    print(res['adaptiveCards'][0]['body'][0]['text'])

In [None]:
await ask('''
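What is text-embedding-ada-002?
''')

text-embedding-ada-002 is a new embedding model from OpenAI. It replaces five separate models for text search, text similarity, and code search, and outperforms our previous most capable model, Davinci, at most tasks, while being priced 99.8% lower[^1^]. You can get an embedding by sending a text string to the embeddings API endpoint along with a choice of embedding model ID (e.g., text-embedding-ada-002)[^2^].

---

[1]: https://openai.com/blog/new-and-improved-embedding-model/ "New and improved embedding model - openai.com"
[2]: https://platform.openai.com/docs/guides/embeddings "Embeddings - OpenAI API"

text-embedding-ada-002 is a new embedding model from OpenAI. It replaces five separate models for text search, text similarity, and code search, and outperforms our previous most capable model, Davinci, at most tasks, while being priced 99.8% lower[^1^][1]. You can get an embedding by sending a text string to the embeddings API endpoint along with a choice of embedding model ID (e.g., text-embedding-ada-002)[^2^][2].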



---

### Creating a chatbot

Now, we will create a simple chatbot using the LLM chain.

```
/usr/local/anaconda3/envs/biobot/lib/python3.10/site-packages/langchain/chains/conversational_retrieval/base.py:191: UserWarning: `ChatVectorDBChain` is deprecated - please use `from langchain.chains import ConversationalRetrievalChain`
```

* [LangChain's Retrieval Plugin](https://blog.langchain.dev/retrieval/)
* [Chat Over Documents with Chat History](https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html)

---
LangChain GitHub

* [LangChain Document Loaders](https://github.com/hwchase17/langchain/tree/master/langchain/document_loaders)
  * [Loader that uses Selenium to load URLs](https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/url_selenium.py)
  * [Loader that uses Playwright to load URLs](https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/url_playwright.py)
  * [Loads files from a Git repository](https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/git.py)
  * [Loader that loads .ipynb notebook files](https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/notebook.py)
  * [Load HTML files of URLs](https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/url.py)
    => [from unstructured.partition.auto import partition](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/partition/auto.py)
  * [Read tweets of user twitter handle](https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/twitter.py)
  * [Load data from Google Drive](https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/googledrive.py)
  * [Load PDF files](https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/pdf.py)
    * todo: https://arxiv.org/pdf/2304.03442.pdf
  * [Load Python files as text files](https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/python.py)
  * [Load EPub files as unstructured files](https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/epub.py)

In [None]:
from langchain.chains.llm import LLMChain
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains.chat_vector_db.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT
from langchain.chains.question_answering import load_qa_chain
from langchain.vectorstores.base import VectorStore
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

# Callback function to stream answers to stdout.
manager = CallbackManager([StreamingStdOutCallbackHandler()])

streaming_llm = ChatOpenAI(streaming=True, callback_manager=manager, verbose=True, temperature=0)
question_gen_llm = ChatOpenAI(temperature=0, verbose=True, callback_manager=manager)
# Prompt to generate independent questions by incorporating chat history and a new question.
question_generator = LLMChain(llm=question_gen_llm, prompt=CONDENSE_QUESTION_PROMPT)
# Pass in documents and a standalone prompt to answer questions.
doc_chain = load_qa_chain(streaming_llm, chain_type='stuff', prompt=QA_PROMPT)
# Combine both chains: retrieve relevant chunks from the vector store, then answer with them as context.
qa = ConversationalRetrievalChain(retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, question_generator=question_generator)

The prompt given to ChatGPT's API is created in the following steps.
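
Based on the chain built above, the flow is roughly: (1) condense the chat history and the new question into one standalone question, (2) retrieve the chunks most related to it, and (3) "stuff" those chunks into a single question-answering prompt. The sketch below mimics that flow in plain Python. All function names and the prompt wording are hypothetical stand-ins, not LangChain's actual `CONDENSE_QUESTION_PROMPT` or `QA_PROMPT`, and word overlap replaces the embedding search.

```python
import re


def condense_question(chat_history, question):
    """Step 1: fold the chat history into one standalone question.
    (In the real chain an LLM does this; here it is a plain template.)"""
    if not chat_history:
        return question
    history = " ".join(f"Human: {q} Assistant: {a}" for q, a in chat_history)
    return f"{question} (given the conversation: {history})"


def retrieve(standalone_question, documents, k=2):
    """Step 2: pick the k most relevant chunks.
    (Stand-in for the vector store: ranks by number of shared words.)"""
    words = set(re.findall(r"\w+", standalone_question.lower()))
    ranked = sorted(
        documents,
        key=lambda d: len(words & set(re.findall(r"\w+", d.lower()))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(context_chunks, standalone_question):
    """Step 3: 'stuff' the retrieved chunks into a single QA prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {standalone_question}\nAnswer:"
    )


docs = [
    "Remix is a full stack web framework.",
    "FAISS stores embedding vectors.",
    "Remix leaves the Model up to you.",
]
standalone = condense_question([], "What is Remix?")
prompt = build_prompt(retrieve(standalone, docs), standalone)
print(prompt)
```

Only the final prompt string is sent to the answering model, which is why the answers depend so heavily on what the retrieval step returns.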

In [None]:
question = 'What makes Remix different from existing frameworks? Please list in bullet points in English.'
qa({'question': question, 'chat_history': []})

The given context does not provide information about Remix or any existing frameworks, so it is not possible to answer this question.

{'question': 'What makes Remix different from existing frameworks? Please list in bullet points in English.',
 'chat_history': [],
 'answer': 'The given context does not provide information about Remix or any existing frameworks, so it is not possible to answer this question.'}

In [None]:
question = 'What do you know?'
qa({'question': question, 'chat_history': []})

I don't know. The context provided is a collection of excerpts from different sources and it is not clear what specific information is being referred to.

{'question': 'What do you know?',
 'chat_history': [],
 'answer': "I don't know. The context provided is a collection of excerpts from different sources and it is not clear what specific information is being referred to."}

In [None]:
question = 'What do you know?'
qa({'question': question, 'chat_history': []})

I am an AI language model and I only have access to the information provided in the context above. From the context, it discusses topics such as the nature of science, psychology, philosophy, and religion. It also touches on concepts such as truth, reality, consciousness, and perception. However, it is difficult to provide a specific answer to your question because it is very broad. Please provide more context or a specific question for me to answer.

{'question': 'What do you know?',
 'chat_history': [],
 'answer': 'I am an AI language model and I only have access to the information provided in the context above. From the context, it discusses topics such as the nature of science, psychology, philosophy, and religion. It also touches on concepts such as truth, reality, consciousness, and perception. However, it is difficult to provide a specific answer to your question because it is very broad. Please provide more context or a specific question for me to answer.'}

### Ask a question

```
How is Remix different from existing frameworks?
Please list in bullet points in English.
```

The following bullet points will be output as an answer summarizing the context.

In [None]:
# Get the chunks most related to the question from the vector store
for context in vectorstore.similarity_search(question):
    print(f'{context}\n')

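page_content='> We have just seen the method by which natural science grasps qualities: the method of forming quantitative concepts. The question we must ask is whether this method can also be applied to the subjective qualities of consciousness. According to what we said earlier, for this method to be applicable there must be spatial changes that are connected with these qualities in a fully determinate and unique way. If this is indeed the case, the problem can be solved by the method of spatio-temporal coincidences, and **measurement** is therefore possible. But this method of coincidences is essentially a matter of physical observation, and in introspection there is no such thing as physical observation. It follows at once that psychology can never attain the ideal of knowledge along the path of introspection. It must therefore use the methods of physical observation as far as possible to achieve its aims. But is this possible? Are there spatial changes that depend on the qualities of consciousness, just as, for example, in optics the width of interference bands depends on color, or in electricity the deflection of a magnet depends on the strength of the magnetic field?\n> We now know that in fact a precisely defined, one-to-one coordination must be acknowledged between subjective qualities and the inferred objective world. A large body of empirical material tells us that we can find, or at least must assume, “physical” processes uniquely connected with all experiences. There is no quality of consciousness that cannot be affected by forces acting on the body. Indeed, we can even eliminate consciousness entirely by a simple physical means, such as inhaling a gas. Our actions are connected with our experiences of willing, hallucinations with bodily exhaustion, attacks of depression with digestive disorders. To study such interconnections, the theory of mind must abandon the purely introspective method and become **physiological** psychology. Only this discipline can, in theory, attain complete knowledge of the mental. With the aid of such a psychology, we can coordinate concepts with the given subjective qualities, just as we can coordinate concepts with the inferred objective qualities. In this way, subjective qualities become knowable just as objective qualities are.' metadata={'source': '_posts/ultimate-facts/Neuroscience.md'}

page_content='Truth, reality; God-made reality, man-made reality; reality, imagination; memory, construction.\nIf philosophy is more like “truth”, then the various sciences are more like “reality”. If physics is more like “truth”, then chemistry is more like “reality”. If chemistry is more like “truth”, then biology and physiology are more like “reality”. If physiology is more like “truth”, then brain science and neuroscience are more like “reality”.\nIf the natural sciences are more like “God-made reality”, then engineering is more like “man-made reality”. If physiology is more like “God-made reality”, then medicine and pharmacy are more like “man-made reality”.\n\n---\n\n> I am merely a carbon-based creature; a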

```
Remix's job is to cross the center of the stack and then get out of your way. We avoid as many "Remixisms" as possible and instead make it easier to use the standard APIs the web already has.

This one is more for us. We've been educators for the 5 years before Remix. Our tagline is Build Better Websites. We also think of it with a little extra on the end: Build Better Websites, Sometimes with Remix. If you get good at Remix, you will accidentally get good at web development in general.

Remix's APIs make it convenient to use the fundamental Browser/HTTP/JavaScript, but those technologies are not hidden from you.

Additionally, if Remix doesn't have an adapter for your server already, you can look at the source of one of the adapters and build your own.

## Server Framework
If you're familiar with server-side MVC web frameworks like Rails and Laravel, Remix is the View and Controller, but it leaves the Model up to you. There are a lot of great databases, ORMs, mailers, etc. in the JavaScript ecosystem to fill that space. Remix also has helpers around the Fetch API for cookie and session management.

Instead of having a split between View and Controller, Remix Route modules take on both responsibilities.

Most server-side frameworks are "model focused". A controller manages multiple URLs for a single model.

## Welcome to Remix!
We are happy you're here!

Remix is a full stack web framework that lets you focus on the user interface and work back through web fundamentals to deliver a fast, slick, and resilient user experience that deploys to any Node.js server and even non-Node.js environments at the edge like Cloudflare Workers.

Want to know more? Read the Technical Explanation of Remix

This repository contains the Remix source code. This repo is a work in progress, so we appreciate your patience as we figure things out.

## Documentation
For documentation about Remix, please
```
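
The retrieval step that produces context like the above can be sketched as a nearest-neighbor search over embedding vectors. This is a minimal illustration using cosine similarity on made-up three-dimensional vectors. Real `text-embedding-ada-002` vectors are much longer, the distance metric the FAISS index actually uses may differ, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the texts of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index: (chunk text, made-up embedding vector).
index = [
    ("Remix route modules", [0.9, 0.1, 0.0]),
    ("Physiological psychology", [0.0, 0.2, 0.9]),
    ("Remix server adapters", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], index))
```

A question embedded near the "Remix" vectors pulls back the Remix chunks, while a question embedded elsewhere would pull back unrelated chunks.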

### Final Answers:

* Remix aims to make it easy to use standard APIs.
* You can learn about web development in general with Remix.
* Remix is a framework that plays the role of both the View and Controller.
* Remix leaves the model to the user.
* Remix provides helpers for the Fetch API.
* Remix can be deployed to Node.js servers as well as non-Node.js environments such as Cloudflare Workers.

### Notes on API usage:

As of March 10, 2023, ChatGPT on the web is still opt-out (your conversations may be used for retraining unless you request otherwise), whereas the ChatGPT API is opt-in (your data is not used for retraining unless you request it), so API data is not used for model improvement by default. (It is stored for 30 days for legal monitoring purposes.) Because calls through the API reduce the risk of leakage to third parties other than OpenAI and are not used for retraining, the barrier to using confidential information with ChatGPT seems to have been lowered.

The API (gpt-3.5-turbo) is relatively inexpensive at 0.002 USD per 1,000 tokens, and the embedding model (text-embedding-ada-002) costs 0.0004 USD per 1,000 tokens. However, creating embeddings for a large number of files can cost more than expected. If the number of tokens is not known in advance, it is a good idea to estimate the price first and then decide whether to run the job, as follows:

In [None]:
import tiktoken

encoding = tiktoken.encoding_for_model('text-embedding-ada-002')
text = ''
for doc in get_docs('_posts'):
    text += doc.page_content.replace(' ', ' ')
token_count = len(encoding.encode(text, allowed_special='all'))
# 0.0004 USD per 1,000 tokens for text-embedding-ada-002 (the price quoted above).
print(f'Estimated price: {token_count / 1000 * 0.0004:.4f} USD')

Estimated price: 0.3965 USD
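
To turn the estimate into the "decide whether to execute" step described above, a small guard like the following could be used. This is a sketch under assumptions: the helper names and the budget value are hypothetical, and the price is the 0.0004 USD per 1,000 tokens quoted in this article, which may no longer be current.

```python
def estimate_cost_usd(token_count: int, price_per_1k_tokens: float) -> float:
    """Estimated embedding cost for a given number of tokens."""
    return token_count / 1000 * price_per_1k_tokens


def within_budget(token_count: int, price_per_1k_tokens: float, budget_usd: float) -> bool:
    """True if the estimated cost does not exceed the budget."""
    return estimate_cost_usd(token_count, price_per_1k_tokens) <= budget_usd


cost = estimate_cost_usd(1_000_000, 0.0004)
print(f"{cost:.2f} USD")
print(within_budget(1_000_000, 0.0004, budget_usd=0.25))
```

Calling `within_budget` before `ingest_docs` lets the notebook skip the embedding step when the corpus turns out larger than expected.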


### Summary:

Because ChatGPT is trained on past data, it cannot answer questions about the latest information or about information that is not publicly available on the internet. Here, by mixing context related to the question into the prompt, we were able to answer questions about recent data and locally saved files.

If this article has been even a little helpful to you, I would be delighted. If you have any questions or comments, please feel free to contact me.