In [None]:
pip list

In [None]:
import sys

In [None]:
print(sys.version)

# Get Started

## Introduction

**LangChain** is a framework for developing applications powered by language models. It enables applications that:

+ **Are context-aware**: connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its response in, etc.)
+ **Reason**: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)

The main value props of LangChain are:

1. **Components**: abstractions for working with language models
2. **Off-the-shelf chains**: a structured assembly of components for accomplishing specific higher-level tasks

### Installation

#### Environment Setup

In [None]:
pip install langchain

In [None]:
pip install typing_extensions==4.7.1 --upgrade

In [None]:
from langchain.llms import OpenAI

In [None]:
# placeholder: supply your own key; never commit a real key to a notebook
llm = OpenAI(openai_api_key='<YOUR_OPENAI_API_KEY>')

#### Building an Application

LangChain provides modules for building language model applications. Modules can be combined for more complex use cases. The most common and important chain that LangChain helps create contains three things:

+ **LLM**: the core reasoning engine is the language model.
+ **Prompt Templates**: instructions to the language model.
+ **Output Parsers**: translate the raw response from the LLM for use downstream

#### LLMs

+ **LLMs**: a language model that takes a string as input and returns a string
+ **ChatModels**: a language model that takes a list of messages as input and returns a message

A `ChatMessage` has 2 required components:

+ `content`: content of the message
+ `role`: role of the entity the `ChatMessage` is coming from

LangChain has several objects for distinguishing roles:

+ `HumanMessage`: a `ChatMessage` from a human
+ `AIMessage`: a `ChatMessage` from an AI
+ `SystemMessage`: a `ChatMessage` from the system
+ `FunctionMessage`: a `ChatMessage` from a function

LangChain has a standard interface with two methods:

+ `predict`: takes a string and returns a string
+ `predict_messages`: takes a list of messages and returns a message
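
The difference between the two methods can be sketched without calling any API. The `Message` dataclass and `EchoModel` class below are hypothetical stand-ins written for illustration; they are not LangChain classes and only mimic the shape of the interface described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical stand-in for a chat message: a role plus content."""
    role: str
    content: str


class EchoModel:
    """Toy model (not a LangChain class) exposing the two methods described above."""

    def predict(self, text: str) -> str:
        # String in, string out.
        return f"echo: {text}"

    def predict_messages(self, messages: List[Message]) -> Message:
        # List of messages in, one message out; reply to the last message.
        return Message(role="ai", content=self.predict(messages[-1].content))


model = EchoModel()
reply = model.predict_messages([Message(role="human", content="hi!")])
print(model.predict("hi!"))
print(reply)
```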

##### Import an LLM and a ChatModel

In [None]:
pip install --upgrade openai

In [None]:
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

In [None]:
llm = OpenAI()

In [None]:
llm.predict("hi!")

In [None]:
chat_model = ChatOpenAI()

In [None]:
chat_model.predict("hi!")

In [None]:
text = "What would be a good company name for a speculative quantitative trader's fund?"

In [None]:
llm.predict(text)

In [None]:
chat_model.predict(text)

Let's use the `predict_messages` method to run over a list of messages.

In [None]:
from langchain.schema import HumanMessage

In [None]:
messages = [HumanMessage(content=text)]

In [None]:
llm.predict_messages(messages)

In [None]:
chat_model.predict_messages(messages)

#### Prompt Templates

Most LLM applications do not pass user input directly into an LLM. Usually, LLM applications will add the user input into a larger piece of text called a prompt template that provides additional context on the specific task at hand.

In [None]:
from langchain.prompts import PromptTemplate

In [None]:
prompt = PromptTemplate.from_template("What is a good name for a company that makes {product}?")

In [None]:
prompt.format(product='financial returns uncorrelated with the broader market')

For more explanation on prompt template functionality, see the LangChain [section on prompts](https://python.langchain.com/docs/modules/model_io/prompts).

PromptTemplates can also be used to produce a list of messages. Most often, a ChatPromptTemplate is a list of ChatMessageTemplates. Each ChatMessageTemplate contains instructions for how to format a ChatMessage (i.e., its role and content).
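
As a rough mental model, formatting a chat prompt amounts to filling in each (role, template) pair. The sketch below uses plain `str.format`; `format_chat_messages` is a hypothetical helper, not LangChain's implementation.

```python
from typing import Dict, List, Tuple


def format_chat_messages(
    templates: List[Tuple[str, str]], **values: str
) -> List[Dict[str, str]]:
    """Fill every (role, template) pair with the supplied values."""
    # str.format ignores keyword arguments a template does not use.
    return [
        {"role": role, "content": template.format(**values)}
        for role, template in templates
    ]


chat_templates = [
    ("system", "You are a helpful assistant that translates {input_language} to {output_language}."),
    ("human", "{text}"),
]

formatted = format_chat_messages(
    chat_templates,
    input_language="English",
    output_language="Spanish",
    text="I love programming.",
)
for message in formatted:
    print(message)
```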

In [6]:
from langchain.prompts.chat import ChatPromptTemplate

In [None]:
template = "You are a helpful assistant that translates {input_language} to {output_language}."
human_template = "{text}"

In [None]:
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", template),
    ("human", human_template),
])

In [None]:
chat_prompt.format_messages(input_language="English", 
                            output_language="Spanish", 
                            text="I love programming."
                           )

#### Output Parsers

OutputParsers convert the raw output of an LLM into a format that can be used downstream. There are a few main types:

+ Convert text from LLM -> structured information (e.g., JSON)
+ Convert a ChatMessage into a string
+ Convert the extra information returned from a call besides the message (like OpenAI function invocation) into a string

For more information, see the [section on output parsers](https://python.langchain.com/docs/modules/model_io/output_parsers)

Let's convert a comma-separated string into a list.

In [None]:
from langchain.schema import BaseOutputParser

In [None]:
class CommaSeperatedListOutputParser(BaseOutputParser):
    """ Parse the output of an LLM call to a comma-separated list. """

    def parse(self, text: str):
        """ Parse the output of an LLM call. """
        return text.strip().split(", ")

In [None]:
CommaSeperatedListOutputParser().parse("hi, bye")
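
The parser above splits on the exact string `", "`, so output such as `"hi,bye"` or `"hi ,  bye"` would not be split cleanly. A minimal sketch of a more tolerant variant in plain Python follows; `parse_comma_separated` is a hypothetical helper, not part of LangChain.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Split on commas, trim whitespace, and drop empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


print(parse_comma_separated("hi, bye"))
print(parse_comma_separated("hi,bye ,  ok,"))
```

The same logic could be placed inside the `parse` method of a `BaseOutputParser` subclass.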

### PromptTemplate + LLM + OutputParser

We can combine all 3 components into one chain. The chain will take input variables, pass those to a prompt template to create a prompt, pass the prompt to a language model, then pass the output through an output parser.

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import ChatPromptTemplate
from langchain.schema import BaseOutputParser

In [None]:
class CommaSeperatedListOutputParser(BaseOutputParser):
    """ Parse the output of an LLM call to a comma-separated list. """

    def parse(self, text: str):
        """ Parse the output of an LLM call. """
        return text.strip().split(", ")

In [None]:
template = """You are a helpful assistant who generates comma separated lists. A user will pass in a category, and you should generate 5 objects in that category in a comma separated list.
ONLY return a comma separated list, and nothing more."""
human_template = "{text}"

In [None]:
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", template),
    ("human", human_template),
])

In [None]:
chain = chat_prompt | ChatOpenAI() | CommaSeperatedListOutputParser()

In [None]:
chain.invoke({"text": "colors"})

The `|` syntax joins these components together: the output of each component becomes the input of the next. This syntax is called the LangChain Expression Language. To learn more, check out the documentation [here](https://python.langchain.com/docs/expression_language).
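
To see how a pipe operator can chain steps, here is a minimal sketch using Python's `__or__` hook. The `Step` class and the three toy steps are hypothetical and only imitate the idea; LangChain's own runnables do considerably more.

```python
from typing import Any, Callable


class Step:
    """Toy pipeline step (hypothetical): wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Stand-ins for prompt, model, and output parser; no API call is made.
make_prompt = Step(lambda inputs: f"List 5 {inputs['text']}")
fake_model = Step(lambda prompt: "red, blue, green, yellow, purple")
parse_list = Step(lambda text: text.strip().split(", "))

toy_chain = make_prompt | fake_model | parse_list
toy_result = toy_chain.invoke({"text": "colors"})
print(toy_result)
```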

# LangChain Expression Language

## Cookbook for Retrieval Augmented Generation (RAG)

In [1]:
pip install langchain openai faiss-cpu tiktoken


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m23.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
from operator import itemgetter
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.vectorstores import FAISS

In [6]:
vectorstore = FAISS.from_texts(["andrew works at a consulting firm"],
                               embedding=OpenAIEmbeddings()
                              )

In [7]:
retriever = vectorstore.as_retriever()

In [8]:
template = """Answer the question based only on the following context: {context}

Question: {question}
"""

In [9]:
prompt = ChatPromptTemplate.from_template(template)

In [10]:
model = ChatOpenAI()

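The retriever looks up the documents most relevant to the question and supplies them as the prompt's `context`; `RunnablePassthrough()` forwards the question unchanged.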

In [11]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

In [12]:
chain.invoke("where did andrew work?")

'Andrew worked at a consulting firm.'
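
What the retriever contributes to the chain above can be sketched with a toy bag-of-words similarity search. This is an illustration in plain Python under simplifying assumptions: `embed`, `cosine`, and `retrieve` are hypothetical helpers, and the word counts stand in for the OpenAI embeddings and FAISS index used in the notebook.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True)
    return ranked[:k]


corpus = [
    "andrew works at a consulting firm",
    "the office is closed on sundays",
]
context = retrieve("where did andrew work?", corpus)
print(context)
```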

### Conversational Retrieval Chain

We can add in conversation history. This means adding in `chat_message_history`.

In [13]:
from langchain.schema.runnable import RunnableMap
from langchain.schema import format_document
from langchain.prompts.prompt import PromptTemplate

In [14]:
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

In [15]:
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

In [16]:
template = """Answer the question based only on the following context: 
{context}

Question: {question}
"""

In [17]:
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)

In [18]:
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")

In [19]:
def _combine_documents(docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)

In [20]:
from typing import Tuple, List

In [21]:
def _format_chat_history(chat_history: List[Tuple]) -> str:
    buffer = ""
    for dialogue_turn in chat_history:
        human = "Human: " + dialogue_turn[0]
        ai = "Assistant: " + dialogue_turn[1]
        buffer += "\n" + "\n".join([human, ai])
    return buffer
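
The helper above indexes each turn with `[0]` and `[1]`, so it expects the history as a list of (human, ai) pairs. A self-contained sketch of the same logic shows the string it produces; `format_chat_history` here is a local copy made for illustration.

```python
from typing import List, Tuple


def format_chat_history(chat_history: List[Tuple[str, str]]) -> str:
    """Render (human, ai) pairs as alternating 'Human:' / 'Assistant:' lines."""
    buffer = ""
    for human_text, ai_text in chat_history:
        buffer += "\n" + "\n".join(["Human: " + human_text, "Assistant: " + ai_text])
    return buffer


history = [("who wrote this notebook?", "Andrew")]
formatted_history = format_chat_history(history)
print(formatted_history)
```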

In [22]:
_inputs = RunnableMap(
    standalone_question = RunnablePassthrough.assign(
        chat_history = lambda x: _format_chat_history(x['chat_history'])
    ) | CONDENSE_QUESTION_PROMPT | ChatOpenAI(temperature=0) | StrOutputParser(),
)

In [23]:
import pprint

In [24]:
pprint.pprint(_inputs)

{
  standalone_question: RunnableAssign(mapper={
                         chat_history: RunnableLambda(lambda x: _format_chat_history(x['chat_history']))
                       })
                       | PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:')
                       | ChatOpenAI(client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, temperature=0.0, openai_api_key='<REDACTED>', openai_api_base='', openai_organization='', openai_proxy='')
                       | StrOutputParser()
}


In [25]:
_context = {
    "context": itemgetter("standalone_question") | retriever | _combine_documents,
    "question": lambda x: x["standalone_question"]
}

In [26]:
conversational_qa_chain = _inputs | _context | ANSWER_PROMPT | ChatOpenAI()

In [27]:
conversational_qa_chain.invoke({
    "question": "where did andrew work?",
    "chat_history": [],
})

AIMessage(content='Andrew was employed at a consulting firm.')

In [28]:
conversational_qa_chain.invoke({
    "question": "where did he work?",
    # each turn is a (human, ai) tuple, as `_format_chat_history` expects
    "chat_history": [("who wrote this notebook?", "Andrew")],
})

AIMessage(content='Andrew worked at a consulting firm.')

## Cookbook for Adding Memory

Add memory to an arbitrary chain using LangChain's memory classes.

In [30]:
from operator import itemgetter
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

In [31]:
model = ChatOpenAI()

In [32]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful chatbot"),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

In [33]:
memory = ConversationBufferMemory(return_messages=True)

In [34]:
memory.load_memory_variables({})

{'history': []}

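In [38]:
# the assigned key must be `history` to match MessagesPlaceholder(variable_name="history");
# assigning it to `memory` instead raises KeyError: 'history' when the prompt is formatted
chain = RunnablePassthrough.assign(
    history=RunnableLambda(memory.load_memory_variables) | itemgetter('history')
) | prompt | model

In [39]:
inputs = {"input": "hi im bob"}

In [52]:
response = chain.invoke(inputs)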

In [None]:
memory.save_context(inputs, {"output": response.content})

In [None]:
memory.load_memory_variables({})

In [None]:
inputs = {"input": "whats my name"}

In [None]:
response = chain.invoke(inputs)

In [None]:
response

#### With Memory and Returning Source Documents

Memory has to be managed outside the chain. To return the retrieved documents, we need to pass them through every step of the chain.

In [34]:
from operator import itemgetter
from langchain.memory import ConversationBufferMemory
from langchain.schema.runnable.base import RunnableLambda

In [30]:
memory = ConversationBufferMemory(return_messages=True, output_key="answer", input_key="question")

Potential [solution](https://python.langchain.com/docs/modules/memory/adding_memory_chain_multiple_inputs) to the error when instantiating chat history in memory.

In [31]:
# memory.load_memory_variables({}) | itemgetter("history")

TypeError: unsupported operand type(s) for |: 'dict' and 'operator.itemgetter'

In [35]:
loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)

In [36]:
standalone_question = {
    "standalone_question": {
        "question": lambda x: x["question"],
        "chat_history": lambda x: _format_chat_history(x['chat_history'])
    } | CONDENSE_QUESTION_PROMPT | ChatOpenAI(temperature=0) | StrOutputParser(),
}

In [37]:
# retrieve the documents
retrieved_documents = {
    "docs": itemgetter("standalone_question") | retriever,
    "question": lambda x: x["standalone_question"]
}

In [38]:
# construct the inputs for the final prompt
final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question")
}

In [39]:
# return the answers
answer = {
    "answer": final_inputs | ANSWER_PROMPT | ChatOpenAI(),
    "docs": itemgetter("docs"),
}

In [40]:
# put it all together!
final_chain = loaded_memory | standalone_question | retrieved_documents | answer

In [41]:
inputs = {"question": "where did harrison work?"}

In [42]:
result = final_chain.invoke(inputs)

In [43]:
result

{'answer': AIMessage(content="There is no information provided about Harrison's employment."),
 'docs': [Document(page_content='andrew works at a consulting firm')]}

# Modules

## Retrievers

### Interface with application-specific data

This tutorial uses [Chroma](https://python.langchain.com/docs/ecosystem/integrations/chroma.html) as the vector store to index and search embeddings, so we first need to install `chromadb`. See the [getting started exercise](https://python.langchain.com/docs/modules/data_connection/retrievers#get-started) for question answering over documents.

In [None]:
pip install chromadb

## Memory

### [Getting Started](https://python.langchain.com/docs/modules/memory/)

Take a look at how to use `ConversationBufferMemory` in chains. `ConversationBufferMemory` is a "simple" form of memory that keeps a list of chat messages in a buffer and passes those into a prompt template.

A more complex memory system might return a succinct summary of the past K messages. A more sophisticated system might extract entities from stored messages and only return information about entities referenced in the current run.
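
The two simplest ideas can be sketched in a few lines of plain Python. `BufferMemory` below is a hypothetical toy class, not LangChain's `ConversationBufferMemory`; its `load_last` method keeps the last K messages verbatim rather than summarizing them, which is a simplification of the summary approach mentioned above.

```python
from typing import Dict, List


class BufferMemory:
    """Toy conversation buffer (hypothetical; for illustration only)."""

    def __init__(self) -> None:
        self.messages: List[Dict[str, str]] = []

    def add(self, role: str, content: str) -> None:
        # Store every message in arrival order.
        self.messages.append({"role": role, "content": content})

    def load(self) -> str:
        # Full history as one string, ready to drop into a prompt.
        return self._render(self.messages)

    def load_last(self, k: int) -> str:
        # Only the k most recent messages (empty string when k <= 0).
        recent = self.messages[-k:] if k > 0 else []
        return self._render(recent)

    @staticmethod
    def _render(messages: List[Dict[str, str]]) -> str:
        return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


toy_memory = BufferMemory()
toy_memory.add("Human", "hi!")
toy_memory.add("AI", "what's up?")
print(toy_memory.load())
print(toy_memory.load_last(1))
```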

In [None]:
pip install langchain==0.0.320

In [7]:
pip list | grep "langchain"

langchain                     0.0.320

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m23.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [1]:
from langchain.memory import ConversationBufferMemory

In [2]:
memory = ConversationBufferMemory()

In [3]:
memory.chat_memory.add_user_message("hi!")

In [4]:
memory.chat_memory.add_ai_message("what's up?")

#### Variables returned from memory

In [5]:
memory.load_memory_variables({})

{'history': "Human: hi!\nAI: what's up?"}

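By default the history is returned as a single string under the key `history` (use `memory_key` to change the key name). To get the history back as a list of messages instead, set `return_messages=True`: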

In [6]:
memory = ConversationBufferMemory(return_messages=True)

In [7]:
memory.chat_memory.add_user_message("hi!")

In [8]:
memory.chat_memory.add_ai_message("what's up?")

In [9]:
memory.load_memory_variables({})

{'history': [HumanMessage(content='hi!'), AIMessage(content="what's up?")]}

Often, chains take in or return multiple input/output keys. Which keys get saved to memory is controllable with the `input_key` and `output_key` parameters on the memory types.

#### End to end example

##### Using an LLM

In [10]:
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

In [14]:
llm = OpenAI(temperature=0)

In [16]:
# notice  "chat_history" is present in the prompt template
template = """You are a nice chatbot having a conversation with a human.

Previous conversation:
{chat_history}

New human question: {question}
Response:"""

In [17]:
prompt = PromptTemplate.from_template(template)

In [20]:
# notice we need to align the `memory_key`
memory = ConversationBufferMemory(memory_key="chat_history")
conversation = LLMChain(
    llm=llm,
    prompt=prompt,
    # verbose=True,
    memory=memory
)

In [21]:
# notice we pass in the `question` variables - `chat_history` gets populated by memory
conversation({"question": "hi"})

{'question': 'hi',
 'chat_history': '',
 'text': ' Hi there! How can I help you?'}

##### Using a ChatModel

In [22]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

In [23]:
llm = ChatOpenAI()

In [25]:
prompt = ChatPromptTemplate(
    messages=[
        SystemMessagePromptTemplate.from_template(
            "You are a nice chatbot having a conversation with a human."
        ),
        # `variable_name` here is what must align with memory
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{question}")
    ]
)

In [26]:
# notice we `return_messages=True` to fit into the MessagesPlaceholder
# notice `"chat_history"` aligns with the MessagesPlaceholder name
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

In [28]:
conversation = LLMChain(
    llm=llm,
    prompt=prompt,
    # verbose=True,
    memory=memory
)

In [29]:
# notice we pass in the `question` variable - `chat_history` gets populated by memory
conversation({"question": "hi"})

{'question': 'hi',
 'chat_history': [HumanMessage(content='hi'),
  AIMessage(content='Hello! How can I assist you today?')],
 'text': 'Hello! How can I assist you today?'}

### Memory in the Multi-input Chain

See the [documentation](https://python.langchain.com/docs/modules/memory/adding_memory_chain_multiple_inputs) on adding memory to a chain that has multiple inputs.

## Chains

### Foundational LLMs

# Retrieval-Augmented Generation (RAG) Use Cases

## Quickstart

See documentation [here](https://python.langchain.com/docs/use_cases/question_answering/#overview)

## Using a Retriever 

This [example](https://python.langchain.com/docs/use_cases/question_answering/vector_db_qa) demonstrates question answering over an index.

In [None]:
pip install soupsieve

In [None]:
pip install unstructured

In [None]:
from langchain.chains import RetrievalQA
from langchain.document_loaders import UnstructuredWordDocumentLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

In [None]:
loader = UnstructuredWordDocumentLoader(
        "/Users/andkelly/Library/CloudStorage/Dropbox/Applications Folder/Resumes/2023_Kelly_Andrew_Acorns_Resume.doc", 
        mode="elements", 
        strategy="fast",
    )

In [None]:
from platform import python_version

In [None]:
print(python_version())

In [None]:
pip install python-docx

In [None]:
documents = loader.load()

In [None]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

In [None]:
texts = text_splitter.split_documents(documents)
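
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a naive fixed-width splitter. This is a simplified sketch: `chunk_text` is a hypothetical helper that cuts purely by character position, whereas LangChain's `CharacterTextSplitter` also takes a separator into account.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of chunk_size characters, each sharing chunk_overlap characters with the previous one."""
    if chunk_size <= 0 or chunk_overlap < 0 or chunk_overlap >= chunk_size:
        raise ValueError("need chunk_size > chunk_overlap >= 0")
    step = chunk_size - chunk_overlap
    return [text[start : start + chunk_size] for start in range(0, len(text), step)]


print(chunk_text("abcdefghij", chunk_size=4))
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```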

In [None]:
import langchain

In [15]:
from langchain.vectorstores import utils

In [None]:
texts = langchain.vectorstores.utils.filter_complex_metadata(texts)

In [None]:
embeddings = OpenAIEmbeddings()

In [None]:
docsearch = Chroma.from_documents(texts, embeddings)

In [None]:
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=docsearch.as_retriever())

In [None]:
query = "give me a 2 sentence elevator pitch for Andrew"

In [None]:
qa.run(query)

### File Directory

Load every matching file under a directory into the vector store. See [documentation](https://python.langchain.com/docs/modules/data_connection/document_loaders/file_directory).

In [None]:
pip install grpcio==1.58.0

In [None]:
pip install "unstructured[pdf]"

In [1]:
from langchain.document_loaders import DirectoryLoader

In [2]:
loader = DirectoryLoader('/Users/andkelly/Library/CloudStorage/Dropbox/Applications Folder/Resumes', glob="**/*.pdf")

In [3]:
docs = loader.load()

In [4]:
len(docs)

6

In [5]:
from langchain.text_splitter import CharacterTextSplitter

In [6]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

In [8]:
texts = text_splitter.split_documents(docs)

In [9]:
from langchain.embeddings.openai import OpenAIEmbeddings

In [4]:
import os

In [5]:
# placeholder: supply your own key; never commit a real key to a notebook
os.environ['OPENAI_API_KEY'] = '<YOUR_OPENAI_API_KEY>'

In [13]:
embeddings = OpenAIEmbeddings()

In [14]:
from langchain.vectorstores import Chroma

In [15]:
docsearch = Chroma.from_documents(texts, embeddings)

In [18]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

In [19]:
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=docsearch.as_retriever())

In [20]:
query = "give me a 2 sentence elevator pitch for Andrew"

In [21]:
qa.run(query)

" Andrew Arthur Kelly is a Finance professional who graduated with a Master's of Science in International Business from the University of Florida's Hough Graduate School of Business. He also holds a Bachelor of Science in Business Administration from the University of Florida's Warrington College of Business and was a Cum Laude Honors Graduate with a GPA of 3.52."

### Vector Store Retriever Options

### Return Source Documents

Return the source documents used to answer the question by specifying an optional parameter when constructing the chain.

In [10]:
import os

In [None]:
print(os.environ)

In [11]:
# placeholder: supply your own key; never commit a real key to a notebook
os.environ['OPENAI_API_KEY'] = '<YOUR_OPENAI_API_KEY>'

In [None]:
# `search_type` and `search_kwargs` are the keyword names `as_retriever` expects
qa = RetrievalQA.from_chain_type(llm=OpenAI(),
                                 chain_type="stuff",
                                 retriever=docsearch.as_retriever(search_type="mmr",
                                                                  search_kwargs={
                                                                      'fetch_k': 30
                                                                  }),
                                 return_source_documents=True
                                 )
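
The `mmr` search type stands for maximal marginal relevance: fetch a pool of candidates, then pick results that are relevant to the query but not redundant with each other. Below is a minimal sketch of that selection rule on toy 2-D vectors. The `mmr` function, the `relevance_weight` parameter, and the vectors are all assumptions made for illustration; this is not the code Chroma or LangChain runs.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    relevance_weight: float = 0.5,
) -> List[int]:
    """Greedily pick k candidate indices, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(index: int) -> float:
            relevance = cosine(query, candidates[index])
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[index], candidates[j]) for j in selected),
                default=0.0,
            )
            return relevance_weight * relevance - (1 - relevance_weight) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vector = [1.0, 1.0]
doc_vectors = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]  # docs 0 and 1 are near-duplicates

by_similarity = sorted(
    range(len(doc_vectors)),
    key=lambda i: cosine(query_vector, doc_vectors[i]),
    reverse=True,
)[:2]
by_mmr = mmr(query_vector, doc_vectors, k=2)
print(by_similarity, by_mmr)
```

In this toy example plain similarity returns the two near-duplicate vectors, while the MMR rule swaps the second one for the more distinct candidate.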

In [None]:
query = "what did andrew design and develop?"

In [None]:
result = qa({"query": query})

In [None]:
result["result"]

In [None]:
test_doc = result["source_documents"][0]

In [None]:
test_doc = test_doc.to_json()

In [None]:
test_doc

In [None]:
test_doc.get('kwargs').get('metadata').get('filename')

#### Easter egg

[documentation](https://api.python.langchain.com/en/latest/api_reference.html)

## Remembering Chat History

The `ConversationalRetrievalChain` builds on the `RetrievalQA` chain by adding a chat history component.

First, combine the chat history (either passed in explicitly or retrieved from memory) and the question into a standalone question. Next, look up relevant documents from the retriever. Finally, pass those documents and the question to a question-answering chain to return a response.
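
The three steps can be sketched as a plain function that takes each stage as a callable. Everything below is hypothetical scaffolding with stub stages, meant only to show the order of operations; it makes no model or vector store calls, and skipping the condense step on an empty history is a choice of this sketch.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]


def conversational_answer(
    question: str,
    chat_history: History,
    condense: Callable[[History, str], str],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[List[str], str], str],
) -> str:
    # Step 1: rewrite the follow-up as a standalone question (skipped with no history).
    standalone = condense(chat_history, question) if chat_history else question
    # Step 2: look up documents relevant to the standalone question.
    docs = retrieve(standalone)
    # Step 3: answer from the retrieved documents.
    return answer(docs, standalone)


# Stub stages standing in for the LLM and the retriever.
def stub_condense(history: History, question: str) -> str:
    return question.replace(" he ", " Bjarke Ingels ")


def stub_retrieve(question: str) -> List[str]:
    return ["Bjarke Ingels designed the VM Houses."] if "Bjarke Ingels" in question else []


def stub_answer(docs: List[str], question: str) -> str:
    return docs[0] if docs else "I don't know."


history = [("Who is Bjarke Ingels?", "An architect from Denmark.")]
follow_up = "What buildings is he famous for?"
with_history = conversational_answer(follow_up, history, stub_condense, stub_retrieve, stub_answer)
without_history = conversational_answer(follow_up, [], stub_condense, stub_retrieve, stub_answer)
print(with_history)
print(without_history)
```

Without the history, the pronoun in the follow-up never gets resolved and the stub retriever finds nothing, which is why the condense step matters.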

Let's create a retriever from document embeddings in a vector store.

In [1]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain

Load in documents. This step can be swapped out with a loader for any type of data.

In [3]:
from langchain.document_loaders import UnstructuredWordDocumentLoader

In [4]:
loader = UnstructuredWordDocumentLoader(
        "/Users/andkelly/Library/CloudStorage/Dropbox/Essays/Bjarke Ingels.docx", 
        mode="elements", 
        strategy="fast",
    )

If we had multiple loaders (i.e., multiple data source formats), we could combine their documents with the cell below.

In [None]:
# loaders = [....]
# docs = []
# for loader in loaders:
#     docs.extend(loader.load())

In [5]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

In [7]:
documents = loader.load()

In [8]:
documents = text_splitter.split_documents(documents)

In [20]:
from langchain.vectorstores import utils

In [21]:
documents = utils.filter_complex_metadata(documents)

In [12]:
embeddings = OpenAIEmbeddings()

In [22]:
vectorstore = Chroma.from_documents(documents, embeddings)

Create a memory object to track the inputs/outputs to hold a conversation.

In [23]:
from langchain.memory import ConversationBufferMemory

In [24]:
memory = ConversationBufferMemory(memory_key = "chat_history",
                                 return_messages = True
                                 )

In [25]:
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), 
                                           vectorstore.as_retriever(), 
                                           memory=memory
                                          )

In [26]:
query = "Who is Bjarke Ingels?"

In [27]:
result = qa({"question": query})

In [28]:
result["answer"]

' Bjarke Ingels is an architect from Denmark who is striving to design and engineer structures that are socially, economically, and environmentally perfect. He believes that architecture is the human manipulation of the surface of the planet to make sure that if fits the way we want to live. He is also interested in creating a social infrastructure for cities.'

In [29]:
query = "What buildings is he famous for?"

In [30]:
result = qa({"question": query})

In [31]:
result["answer"]

' Bjarke Ingels has designed the Vancouver House, the Titling Building in Huaxi, five open-air swimming pools at Islands Brygge Harbour Bath in the Copenhagen Harbour, the Maritime Youth House, a sailing club and a youth house at Sundby Harbour in Copenhagen, and the VM Houses in Ørestad, Copenhagen.'

For more examples, revisit the [documentation](https://python.langchain.com/docs/use_cases/question_answering/chat_vector_db#pass-in-chat-history)

## Citing Retrieval Sources

Use OpenAI's function-calling ability to extract citations from text.

In [None]:
from langchain.chains import create_citation_fuzzy_match_chain
from langchain.chat_models import ChatOpenAI

In [None]:
question = "What did the author do during college?"
context = """
My name is Andrew Kelly, and I grew up in Sarasota Florida but I was born in Michigan.
I went to a public highschool but in university I studied finance. 
As part of a summer internship, I worked at many companies including Citi, University of Florida.
I also Chaired Student Government Productions at the University of Florida.
"""

In [None]:
from openai import Model

In [None]:
models = Model.list()

In [None]:
for model in models["data"]:
    print(model["id"])

In [None]:
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")

In [None]:
chain = create_citation_fuzzy_match_chain(llm)

In [None]:
result = chain.run(question=question, context=context)

In [None]:
print(result)

In [None]:
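def highlight(text, span):
    # clamp the left edge so a span near the start of the text does not
    # produce a negative index (which would slice from the end instead)
    start = max(span[0] - 20, 0)
    return (
        "..."
        + text[start : span[0]]
        + "*"
        + "\033[91m"
        + text[span[0] : span[1]]
        + "\033[0m"
        + "*"
        + text[span[1] : span[1] + 20]
        + "..."
    )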

In [None]:
for fact in result.answer:
    print("Statement:", fact.fact)
    for span in fact.get_spans(context):
        print("Citation:", highlight(context, span))
    print()

# Integrations

## Components

### Retriever for Wikipedia