# LangChain: Q&A over Documents

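This notebook uses LangChain to answer questions over your own documents. A typical use case is a tool that lets you query a product catalog for items of interest; here the documents are the rows of a CSV export of tweets (`Tweets.csv`).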

In [9]:
#pip install --upgrade langchain

In [1]:
import os
import openai

openai.api_key = os.environ['OPENAI_API_KEY']

import warnings
warnings.filterwarnings('ignore')

In [2]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

In [3]:
file = 'Tweets.csv'
loader = CSVLoader(file_path=file)

In [4]:
from langchain.indexes import VectorstoreIndexCreator

In [14]:
#pip install docarray

In [5]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

ValidationError: 1 validation error for OpenAIEmbeddings
__root__
  Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY` which contains it, or pass  `openai_api_key` as a named parameter. (type=value_error)
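
The error above says the embeddings class could not find `OPENAI_API_KEY`. One possible cause is that the variable is missing or empty in the kernel's environment. A minimal sketch of an early check follows; `require_env` is a hypothetical helper, not part of LangChain or the `openai` package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it before creating the index or embeddings"
        )
    return value


# Usage in the notebook (commented out so the sketch runs without a real key):
# api_key = require_env("OPENAI_API_KEY")
```

Failing at the top of the notebook makes the problem visible before any index is built.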

In [16]:
query = "What is Open AI usages?"

In [17]:
response = index.query(query)

In [18]:
display(Markdown(response))

 Open AI is a research organization focused on developing artificial intelligence technologies. They are best known for their work on generative models such as GPT-3 and their work on reinforcement learning.

In [19]:
loader = CSVLoader(file_path=file)

In [20]:
docs = loader.load()

In [21]:
docs[0]

Document(page_content='\ufeffTweet: CHAT GPT几个相对实用的场景\nProperty: https://twitter.com/vista8\nCreated: June 9, 2023 12:38 PM\nLink: \nTags: Tool\nTweet Link: https://twitter.com/vista8/status/1658508138050629632\nType: Tweet', metadata={'source': 'Tweets.csv', 'row': 0})
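
The `page_content` above is one CSV row rendered as `column: value` lines, and the leading `\ufeff` is a byte order mark (BOM) carried over from the file. The sketch below reproduces that layout with the standard library only. It is an approximation of what the loader appears to do, not LangChain's implementation, and the two-column sample is a made-up stand-in for `Tweets.csv`.

```python
import csv
import io
from typing import List


def rows_to_page_contents(raw_bytes: bytes, encoding: str = "utf-8-sig") -> List[str]:
    """Render each CSV row as newline-joined "column: value" lines."""
    # Decoding with utf-8-sig drops a leading BOM if one is present.
    text = raw_bytes.decode(encoding)
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return ["\n".join(f"{key}: {value}" for key, value in row.items()) for row in reader]


# Hypothetical sample that starts with a BOM, like the file in this notebook.
sample = "\ufeffTweet,Type\nhello world,Tweet\n".encode("utf-8")

contents = rows_to_page_contents(sample)
print(contents[0])
```

Decoding the same bytes as plain `utf-8` keeps the BOM in the first column name, which is the effect visible in the output above.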

In [22]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [23]:
embed = embeddings.embed_query("Hi my name is Harrison")

In [24]:
print(len(embed))

1536


In [25]:
print(embed[:5])

[-0.021913960576057434, 0.006774206645786762, -0.018190348520874977, -0.039148248732089996, -0.014089343138039112]


In [26]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [27]:
query = "What is the VR usage with LLM?"

In [28]:
docs = db.similarity_search(query)

In [29]:
len(docs)

4
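
The four results are consistent with a default of `k=4` for `similarity_search`, which is an assumption here rather than something the notebook states. Conceptually, the search embeds the query, scores every stored vector against it, and keeps the best `k`. The sketch below shows that idea with cosine similarity on toy 2-D vectors. The metric and the helper names are assumptions for illustration, not the vector store's actual code.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 4,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for the 1536-dimensional embeddings.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [0.5, 0.1]]
hits = top_k([1.0, 0.0], toy_docs, k=4)
print([index for index, _ in hits])  # [0, 4, 2, 1]
```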

In [32]:
docs[0]

Document(page_content='\ufeffTweet: 苹果官方介绍并没有把 #AppleVisionPro  定义为XR眼镜，操作系统也不是XrOS，苹果的定义是进入空间计算时代，连接数字世界和物理世界的桥梁。似乎苹果有着更大的野心和规划。\nProperty: https://twitter.com/xiaohuggg\nCreated: June 6, 2023 1:00 PM\nLink: \nTags: \nTweet Link: https://twitter.com/xiaohuggg/status/1665896981259231232\nType: Tweet', metadata={'source': 'Tweets.csv', 'row': 11})

In [33]:
retriever = db.as_retriever()

In [34]:
llm = ChatOpenAI(temperature=0.0)


In [36]:
qdocs = "".join(doc.page_content for doc in docs)
qdocs

'\ufeffTweet: 终于，苹果最近发布的AR/VR职位描述里，总算泄露出来了他们要把LLM放进AR/VR设备里的想法：“Conversational and Generative AI”\nProperty: https://twitter.com/fi56622380\nCreated: June 7, 2023 7:49 PM\nLink: \nTags: \nTweet Link: https://twitter.com/fi56622380/status/1665151979595898880\nType: Thread\ufeffTweet: 微调LLM+向量记忆索引+text to video+control net+deepfake换脸+声音克隆+LLM微调情绪标签，最后再加个高清VR。\nProperty: https://twitter.com/goldengrape\nCreated: June 7, 2023 6:37 AM\nLink: \nTags: Digitlife\nTweet Link: https://twitter.com/goldengrape/status/1666134189488640000\nType: Tweet\ufeffTweet: 我把 AI 加入自己的工作流已经有段时间，现在基本是逢人就安利，建议尽快把 AI 介入开发流程。开一个 thread，分享下目前的一些经验、使用的工具和服务。截图这个基本就是我目前工作时的桌面状态。一般就是左边小窗开一个 ChatGPT ，右边VSCode + GitHub Copilot 插件。接下来展开讲讲。\nProperty: https://twitter.com/luoleiorg\nCreated: June 1, 2023 5:44 PM\nLink: \nTags: \nTweet Link: https://twitter.com/luoleiorg/status/1662145013151858689\nType: Thread\ufeffTweet: 苹果官方介绍并没有把 #AppleVisionPro  定义为XR眼镜，操作系统也不是XrOS，苹果的定义是进入空间计算时代，连接数字世界和物理世界的桥梁。似乎苹果有着更大的野心和规划。\nPropert

In [38]:
response = llm.call_as_llm(f"{qdocs} Question: What is the VR usage with LLM?") 


In [39]:
display(Markdown(response))

Answer: The recent job description from Apple suggests that they plan to incorporate "Conversational and Generative AI" into their AR/VR devices, which could potentially involve using LLM technology. Additionally, another tweet mentions using LLM in combination with other technologies for text-to-video, deepfake, and emotion tagging in a VR context.
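
The manual steps above amount to "stuffing" every retrieved document into a single prompt, followed by the question. Because the notebook joins with an empty string, adjacent documents run together (`Type: Thread\ufeffTweet: ...`). A minimal sketch of the same idea with an explicit separator is shown below. The template wording and `build_stuff_prompt` are hypothetical, not the prompt LangChain uses.

```python
from typing import List


def build_stuff_prompt(contexts: List[str], question: str, separator: str = "\n\n") -> str:
    """Concatenate all context snippets into one prompt, then append the question."""
    context = separator.join(snippet.strip() for snippet in contexts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder snippets standing in for the retrieved tweets.
prompt = build_stuff_prompt(
    ["Tweet: first example", "Tweet: second example"],
    "What is the VR usage with LLM?",
)
print(prompt)
```

This approach is simple and uses one model call, but the whole prompt must fit in the model's context window, so it suits a small number of short documents.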

In [40]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)

In [44]:
query = "What is the VR usage with LLM?"

In [45]:
response = qa_stuff.run(query)



> Entering new RetrievalQA chain...

> Finished chain.


In [46]:
display(Markdown(response))

According to the first tweet, Apple's recent job description for AR/VR includes the idea of putting LLM (likely referring to the language model GPT-3) into their AR/VR devices with the goal of "Conversational and Generative AI." However, the tweet does not provide specific details on how LLM will be used in VR.

In [47]:
response = index.query(query, llm=llm)
response

In [None]:
indexDefault = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

response = indexDefault.query(query)  # query the index built in this cell
response

In [50]:
indexEmb = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])

response = indexEmb.query(query)  # query the index built with explicit embeddings
response

' The tweets suggest that Apple is planning to incorporate LLM into their AR/VR devices, and that AI can be used in development workflows to create high-definition VR experiences.'