# 1. Preparation

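* Install Python 3.8 or later; at the time of writing, the latest version is 3.11:
  https://www.python.org/downloads/release/python-3111/
* This example uses JupyterLab as the development environment, so JupyterLab must be installed on your machine:
  https://jupyter.org/

* Register an OpenAI account and set the `OPENAI_API_KEY` environment variable.
* The Redis service requires redis-stack. The command below maps Redis to host port 10001 and the bundled RedisInsight UI to host port 13333: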
  ```bash
  docker run -d -p 13333:8001 -p 10001:6379 redis/redis-stack:latest
  ```
* Then install the required Python packages:
  ```bash
  pip install openai
  pip install langchain
  pip install requests
  pip install lxml
  pip install redis
  ```
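Before running the notebook, it can help to confirm that the API key is set and the Redis port is reachable. The following is a minimal sketch using only the standard library. The helper names `missing_env_vars` and `port_is_open` are made up for this illustration, and `10001` is the host port from the `docker run` command above.

```python
import os
import socket


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Missing variables:", missing_env_vars(["OPENAI_API_KEY"]))
    # 10001 is the host port mapped to Redis by the docker command above
    print("Redis reachable:", port_is_open("localhost", 10001))
```

An open port only shows that something is listening; it does not prove that the redis-stack search module is loaded.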

In [3]:
import requests
import urllib.request
import json
from lxml import etree
import time
import random
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.redis import Redis
from langchain.document_loaders import TextLoader
from langchain.docstore.document import Document
import redis


# 2. Crawl the data

This example omits everything related to the crawler.

In [5]:
# Crawler omitted.
# Fake data; in practice this information would be collected by the crawler.
notes=[{
    'title':'abc',
    'source':'https://www.baidu.com',
    'content':'defg',
    'uid':'123456',
    'uname':'haha哥'
}]
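
Since every embedding call costs an API request, it may be worth checking the crawled records before storing them. The sketch below assumes each note is a dict with the five keys used in the fake data above. `REQUIRED_KEYS` and `invalid_notes` are hypothetical names, not part of any library.

```python
REQUIRED_KEYS = ("title", "source", "content", "uid", "uname")


def invalid_notes(notes):
    """Return (index, missing_keys) for every note lacking a required key."""
    problems = []
    for i, note in enumerate(notes):
        missing = [key for key in REQUIRED_KEYS if key not in note]
        if missing:
            problems.append((i, missing))
    return problems


sample = [
    {"title": "abc", "source": "https://www.baidu.com", "content": "defg",
     "uid": "123456", "uname": "haha"},
    {"title": "no body"},  # deliberately incomplete record
]
print(invalid_notes(sample))
```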

# 3. Store the data in Redis as embedding vectors

In [6]:
redis_url = 'redis://localhost:10001'  # host port mapped to the container's 6379
embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment
docs = []
for note in notes:
    title = note['title']
    # Embed the title together with the body so that both are searchable
    content = str(title) + '\n\n' + str(note['content'])
    meta = {
        'source': note['source'],
        'title': title,
        'uid': note['uid'],
        'uname': note['uname'],
    }
    docs.append(Document(page_content=content, metadata=meta))

# Embed all documents and write them to the 'link' index in a single call
rds = Redis.from_documents(docs, embeddings, redis_url=redis_url, index_name='link')

print('Inserted docs:', len(docs))


Inserted docs: 1


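# 4. Try a similarity search
Note that running the example above as-is will not reproduce the results below, because these documents were loaded into the database beforehand. You can write your own data into the database by whatever method you prefer.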
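
For intuition, a similarity search embeds the query and returns the stored documents whose vectors lie closest to it. The sketch below ranks by cosine similarity over made-up 3-dimensional vectors. Real OpenAI embeddings have far more dimensions, and inside Redis the ranking is done by the index rather than by Python code like this. All names and numbers here are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k document ids most similar to query_vec, best first."""
    ranked = sorted(
        store,
        key=lambda doc_id: cosine_similarity(query_vec, store[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" keyed by document id (values are invented)
store = {
    "tibet_route": [0.9, 0.1, 0.0],
    "beach_guide": [0.0, 0.2, 0.9],
    "road_trip": [0.7, 0.6, 0.1],
}
query_vec = [1.0, 0.2, 0.0]
print(top_k(query_vec, store))
```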

In [7]:
query = '自驾318'  # "self-driving along Highway 318"
rds.similarity_search(query)

[Document(page_content='自驾318去西藏的路线图收好\n\n自驾318去西藏的路线图收好，当然，也可以跟阿杰旅行的车队，省心省力就能自驾西藏，有车无车都可以加入\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0', metadata={'source': 'https://www.xiaohongshu.com/explore/644a401c00000000130058ed', 'note_id': '644a401c00000000130058ed', 'title': '自驾318去西藏的路线图收好', 'type': 'video', 'user_id': '62244dc60000000010004d5f', 'user_nick_name': '阿杰旅行丨自驾攻略'}),
 Document(page_content='自驾318川藏线路线图\n\n自驾318川藏线路线图，走吧，跟着阿杰旅行的车队自驾西藏，不用自己做攻略，直接带上你的西藏梦和我出发吧，有车无车都可以加入\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0', metadata={'source': 'https://www.xiaohongshu.com/explore/6448eb2700000000130151d0', 'note_id': '6448eb2700000000130151d0', 'title': '自驾318川藏线路线图', 'type': 'video', 'user_id': '62244dc60000000010004d5f', 'user_nick_name': '阿杰旅行丨自驾攻略'}),
 Document(page_content='自驾318去西藏的路线图，收藏好咯\n\n自驾318去西藏的路线图，收藏好咯，去西藏用的着。当然，也可以跟阿杰旅行的车队，省心省力就能自驾西藏，成都集合，每周出发\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0', metadata={'
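
The raw list of `Document` objects is hard to scan. A small helper can print just the title, source, and a short snippet of each hit. This sketch assumes only that each result exposes `page_content` and `metadata` attributes, as the output above shows. `summarize_hits` is a hypothetical name, and the stand-in objects below replace real search results so the snippet runs without Redis.

```python
from types import SimpleNamespace


def summarize_hits(docs, width=40):
    """Return one 'title | source | snippet' line per document."""
    lines = []
    for doc in docs:
        title = doc.metadata.get("title", "")
        source = doc.metadata.get("source", "")
        # Collapse whitespace, including the \xa0 padding seen in the output
        snippet = " ".join(doc.page_content.split())[:width]
        lines.append(f"{title} | {source} | {snippet}")
    return lines


# Stand-in for rds.similarity_search(query); values are illustrative
hits = [
    SimpleNamespace(
        page_content="abc\n\ndefg\xa0\xa0\xa0",
        metadata={"title": "abc", "source": "https://www.baidu.com"},
    )
]
for line in summarize_hits(hits):
    print(line)
```

With a live index, `summarize_hits(rds.similarity_search(query))` would be the equivalent call.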

# 5. Next, use LangChain's vectorstore agent to build a question-answering bot on the travel data stored above

In [10]:
from langchain import OpenAI
from langchain.agents.agent_toolkits import (
    create_vectorstore_agent,
    VectorStoreToolkit,
    VectorStoreInfo,
)

# temperature=0 keeps the answers as deterministic as possible
llm = OpenAI(temperature=0)
# Reconnect to the index populated in step 3
rds = Redis.from_existing_index(embeddings, redis_url=redis_url, index_name='link')
vectorstore_info = VectorStoreInfo(
    name="hotest_travel_advice",
    description="the hotest travel advice at present",
    vectorstore=rds
)
toolkit = VectorStoreToolkit(vectorstore_info=vectorstore_info)
agent_executor = create_vectorstore_agent(
    llm=llm,
    toolkit=toolkit,
    verbose=True,
    # prefix='You are an agent designed to answer questions about sets of documents.\nYou have access to tools for interacting with the documents, and the inputs to the tools are questions.\nYou will always provide sources for your questions, in which case you should use the appropriate tool to do so.\nIf the question does not seem relevant to any of the tools provided, just return "I don\'t know" as the answer.\n'
)


# 6. Inspect the prompt

In [11]:
print(agent_executor.agent.llm_chain.prompt)

input_variables=['input', 'agent_scratchpad'] output_parser=None partial_variables={} template='You are an agent designed to answer questions about sets of documents.\nYou have access to tools for interacting with the documents, and the inputs to the tools are questions.\nYou will always provide sources for your questions, in which case you should use the appropriate tool to do so.\nIf the question does not seem relevant to any of the tools provided, just return "I don\'t know" as the answer.\n\n\nhotest_travel_advice: Useful for when you need to answer questions about hotest_travel_advice. Whenever you need information about the hotest travel advice at present you should ALWAYS use this. Input should be a fully formed question.\nhotest_travel_advice_with_sources: Useful for when you need to answer questions about hotest_travel_advice and the sources used to construct the answer. Whenever you need information about the hotest travel advice at present you should ALWAYS use this.  Input 

# 7. Ask a question

In [12]:
resp = agent_executor.run("年轻应该干什么？")  # "What should you do while you are young?"
print(resp)



> Entering new AgentExecutor chain...
 I need to find the hottest travel advice
Action: hotest_travel_advice
Action Input: 年轻应该干什么？

Observation: 年轻应该疯狂，一起自驾西藏，不负青春，不留遗憾，这才是我们要的旅行。
Thought: I now know the final answer
Final Answer: 年轻应该疯狂，一起自驾西藏，不负青春，不留遗憾，这才是我们要的旅行。

> Finished chain.
年轻应该疯狂，一起自驾西藏，不负青春，不留遗憾，这才是我们要的旅行。
