# Setup
[![Open in Colab](colab-badge.svg)](https://colab.research.google.com/github/pprados/langchain-references/blob/wip/langchain_reference.ipynb)


In [1]:
%pip install -qU pip

In [2]:
# Document loading, retrieval methods and text splitting
%pip install -qU wikipedia

%pip install -qU langchain-references
%pip install -qU langchain-community
%pip install -qU langchain-text-splitters

# Local vector store via Chroma
%pip install -qU langchain-chroma

# Inference and embeddings
%pip install -qU langchain-openai

Note: you may need to restart the kernel to use updated packages.


In [3]:
import langchain_references

langchain_references.__version__

'0.0.0'

# Document loading, retrieval methods and text splitting
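Retrieve Wikipedia articles about mathematics with `WikipediaRetriever`. Each article is truncated to its first 2,000 characters (`doc_content_chars_max`), so no further splitting is done in this notebook.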

In [4]:
import os

# Identify this notebook in outgoing HTTP requests
os.environ["USER_AGENT"] = "langchain-references"

from langchain_community.retrievers import WikipediaRetriever

# Retrieve the 10 best-matching Wikipedia pages, each truncated to 2,000 characters
documents = WikipediaRetriever(
    top_k_results=10,
    doc_content_chars_max=2000,
).invoke("mathematic")
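`doc_content_chars_max` simply truncates each article. To cut long texts into several chunks instead, which is the job of `RecursiveCharacterTextSplitter` from `langchain-text-splitters`, the idea can be sketched in plain Python. This is a simplified illustration, not the library's algorithm: the hypothetical `split_text` tries the coarsest separator first, falls back to finer separators only for pieces that are still too long, drops the separators at chunk boundaries and uses no overlap.

```python
def split_text(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    """Hypothetical recursive splitter: every returned chunk has at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text]
    sep, *finer = separators
    if sep == "":
        # No separator left: hard-cut into fixed-size slices
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too long on its own: split it with the finer separators
            chunks.extend(split_text(piece, chunk_size, tuple(finer)))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(split_text("para one\n\npara two", chunk_size=10))  # ['para one', 'para two']
print(split_text("aaa bbb ccc", chunk_size=7))  # ['aaa bbb', 'ccc']
```

The real splitter additionally supports chunk overlap and can keep separators; the sketch leaves both out to keep the recursion visible.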


In [5]:
[(doc.metadata["title"],doc.metadata["source"]) for doc in documents]

[('Mathematics', 'https://en.wikipedia.org/wiki/Mathematics'),
 ('History of mathematics',
  'https://en.wikipedia.org/wiki/History_of_mathematics'),
 ('Mathematical Reviews',
  'https://en.wikipedia.org/wiki/Mathematical_Reviews'),
 ('List of mathematics competitions',
  'https://en.wikipedia.org/wiki/List_of_mathematics_competitions'),
 ('Mathematical game', 'https://en.wikipedia.org/wiki/Mathematical_game'),
 ('Applied mathematics', 'https://en.wikipedia.org/wiki/Applied_mathematics'),
 ('Mathematical sciences',
  'https://en.wikipedia.org/wiki/Mathematical_sciences'),
 ('Mathematical logic', 'https://en.wikipedia.org/wiki/Mathematical_logic'),
 ('List of mathematics awards',
  'https://en.wikipedia.org/wiki/List_of_mathematics_awards'),
 ('Philosophy of mathematics',
  'https://en.wikipedia.org/wiki/Philosophy_of_mathematics')]

In [6]:
import os
from getpass import getpass

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass()

In [7]:
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

embeddings = OpenAIEmbeddings()
model = ChatOpenAI(model="gpt-4o-mini")

Embed the documents with `OpenAIEmbeddings` and index them in a Chroma vector store (in memory, since no `persist_directory` is given).

In [8]:
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(documents=documents, embedding=embeddings)
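Conceptually, the vector store keeps one embedding vector per document and answers a query by ranking the documents by vector similarity. The sketch below illustrates this with hypothetical three-dimensional vectors and cosine similarity; real OpenAI embeddings have many more dimensions, and the distance function Chroma actually uses depends on the collection configuration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, index, k):
    """Return the k titles whose vectors are most similar to query_vec."""
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [title for title, _ in ranked[:k]]


# Hypothetical toy "embeddings", one per document title
toy_index = {
    "Mathematical game": [0.9, 0.1, 0.0],
    "List of mathematics competitions": [0.7, 0.6, 0.1],
    "Philosophy of mathematics": [0.0, 0.2, 0.9],
}
print(top_k([0.8, 0.5, 0.0], toy_index, k=2))
# ['List of mathematics competitions', 'Mathematical game']
```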

Combine the documents into a single string, tagging each one with a small, unique numeric id that the LLM can cite.

In [9]:
def format_docs(docs):
    """Concatenate documents, wrapping each one in a numbered <document> tag."""
    # Ids start at 1 so that the LLM can reference each document
    return "\n".join(
        f"<document id={i}>\n{doc.page_content}\n</document>\n"
        for i, doc in enumerate(docs, start=1)
    )
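A quick way to see what `format_docs` produces, using a hypothetical `FakeDoc` dataclass as a stand-in for LangChain's `Document` (only the `page_content` attribute is needed). The function is repeated here so the cell runs on its own.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def format_docs(docs):
    """Same behavior as format_docs above: number each document from 1."""
    return "\n".join(
        f"<document id={i}>\n{doc.page_content}\n</document>\n"
        for i, doc in enumerate(docs, start=1)
    )


print(format_docs([FakeDoc("Alpha"), FakeDoc("Beta")]))
```

The ids are simply positions in the list, starting at 1, so `id=2` in an answer refers to the second document passed in.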


# Manage references with langchain-references

Create a prompt with `{format_references}`, `{documents}` and `{question}` placeholders.

In [10]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

RAG_TEMPLATE = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved documents to answer the question. 
If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

{format_references}
  
<documents>
{documents}
</documents>

Answer the following question:

{question}"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)
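`ChatPromptTemplate.from_template` uses f-string-style placeholders by default, so the variables a template expects can be listed with the standard library's `string.Formatter` (on the real prompt, `rag_prompt.input_variables` gives the same information). A sketch on a shortened, hypothetical version of the template, `SHORT_TEMPLATE`:

```python
from string import Formatter

# Shortened, hypothetical version of RAG_TEMPLATE with the same placeholders
SHORT_TEMPLATE = """{format_references}

<documents>
{documents}
</documents>

Answer the following question:

{question}"""

# Formatter().parse yields (literal_text, field_name, format_spec, conversion) tuples
placeholders = sorted({field for _, field, _, _ in Formatter().parse(SHORT_TEMPLATE) if field})
print(placeholders)  # ['documents', 'format_references', 'question']

# Filling the placeholders works like str.format
print(SHORT_TEMPLATE.format(
    format_references="Cite documents as [NUMBER](id=ID_NUMBER).",
    documents="<document id=1>\nMathematics is the study of numbers.\n</document>",
    question="What is mathematics?",
))
```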

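Build the `context` step: it passes the input through, replaces `documents` with the string built by `format_docs`, and adds the citation instructions from `FORMAT_REFERENCES` (printed below).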

In [11]:
from langchain_references import FORMAT_REFERENCES

context = RunnablePassthrough.assign(
    documents=lambda input: format_docs(input["documents"]),
    format_references=lambda _: FORMAT_REFERENCES,
)
print(FORMAT_REFERENCES)

When referencing the documents, add a citation right after. Use "[NUMBER](id=ID_NUMBER)" for the citation (e.g. "The Space Needle is in Seattle [1](id=55)[2](id=12).").
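`RunnablePassthrough.assign` passes its input dict through and adds (or overwrites) keys computed from that input. The hypothetical `toy_assign` below is a simplified, synchronous model of that behavior, not LangChain's implementation; note how `documents` goes in as a list and comes out as a string.

```python
def toy_assign(**computations):
    """Simplified model of RunnablePassthrough.assign (hypothetical helper)."""
    def run(inputs):
        # Every computation sees the original input; results are merged over it
        computed = {key: fn(inputs) for key, fn in computations.items()}
        return {**inputs, **computed}
    return run


toy_context = toy_assign(
    documents=lambda x: "\n".join(x["documents"]),
    format_references=lambda _: "Cite documents as [NUMBER](id=ID_NUMBER).",
)
print(toy_context({"documents": ["doc A", "doc B"], "question": "Why?"}))
```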


Compose `context`, `rag_prompt` and `model` into a chain.

In [12]:
from langchain_core.output_parsers import StrOutputParser

# Base chain, without `manage_references()`: format the inputs, fill the prompt, call the model
chain = context | rag_prompt | model
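The `|` operator chains runnables so that each step's output becomes the next step's input. A toy model of that composition, with a hypothetical `Step` class and placeholder functions standing in for the context, the prompt and the model (this is not LangChain's `Runnable` class):

```python
class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function, composes with |."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # (a | b).invoke(x) == b.invoke(a.invoke(x))
        return Step(lambda value: other.invoke(self.invoke(value)))


toy_context = Step(lambda x: {**x, "documents": " / ".join(x["documents"])})
toy_prompt = Step(lambda x: f"Docs: {x['documents']}\nQ: {x['question']}")
toy_model = Step(str.upper)  # placeholder "model": just upper-cases the prompt

toy_chain = toy_context | toy_prompt | toy_model
print(toy_chain.invoke({"documents": ["a", "b"], "question": "why?"}))
```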

Select documents similar to the question.

In [13]:
question = "What is the difference kind of games and competition of mathematics?"

docs = vectorstore.similarity_search(question,k=6)
[(d.metadata["title"],d.metadata["source"]) for d in docs]

[('Mathematical game', 'https://en.wikipedia.org/wiki/Mathematical_game'),
 ('List of mathematics competitions',
  'https://en.wikipedia.org/wiki/List_of_mathematics_competitions'),
 ('Mathematics', 'https://en.wikipedia.org/wiki/Mathematics'),
 ('Philosophy of mathematics',
  'https://en.wikipedia.org/wiki/Philosophy_of_mathematics'),
 ('History of mathematics',
  'https://en.wikipedia.org/wiki/History_of_mathematics'),
 ('List of mathematics awards',
  'https://en.wikipedia.org/wiki/List_of_mathematics_awards')]

Invoke the chain with the documents and the question, still without `manage_references()`. The answer contains raw citations such as `[1](id=1)`, where the id is the number that `format_docs` gave the document.

In [15]:
answer = (chain | StrOutputParser()).invoke({"documents": docs, "question": question})
print(answer)

Mathematical games are structured activities with rules defined by mathematical principles, focusing on strategy and often involving simple procedures, like chess or tic-tac-toe [1](id=1). In contrast, mathematics competitions, such as the International Mathematical Olympiad, are events where participants solve mathematical problems or puzzles, often requiring a higher level of mathematical knowledge and skills [2](id=2). While games are typically recreational, competitions are formalized events aimed at testing and showcasing mathematical abilities.
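These raw citations follow the `[NUMBER](id=ID_NUMBER)` pattern requested by `FORMAT_REFERENCES`, so they can be located with a regular expression. A sketch that lists which document ids an answer cites, on a hypothetical sample answer:

```python
import re

# Matches citations such as "[1](id=1)"; group 1 is the displayed number, group 2 the document id
CITATION = re.compile(r"\[(\d+)\]\(id=(\d+)\)")

sample_answer = "Games are recreational [1](id=1). Competitions test skills [2](id=2)[1](id=1)."
cited_ids = sorted({int(doc_id) for _, doc_id in CITATION.findall(sample_answer)})
print(cited_ids)  # [1, 2]
```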


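Now wrap the chain with `manage_references()` and invoke it again. The `[N](id=ID)` citations are replaced by superscript links to the source URLs, and a numbered list of the cited documents is appended.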

In [17]:
from langchain_references import manage_references

answer = (manage_references(chain) | StrOutputParser()).invoke(
    {"documents": docs, "question": question})
print(answer)

Mathematical games are structured activities defined by clear rules and strategies, often engaging players in fundamental arithmetic concepts without requiring deep mathematical expertise, while competitions, such as mathematics olympiads, involve participants completing math tests that may include multiple-choice questions or proofs <sup>[[1](https://en.wikipedia.org/wiki/Mathematical_game)]</sup><sup>[[2](https://en.wikipedia.org/wiki/List_of_mathematics_competitions)]</sup> The former emphasizes recreational and educational aspects, whereas the latter focuses on assessment and competition among individuals or teams in solving mathematical problems <sup>[[1](https://en.wikipedia.org/wiki/Mathematical_game)]</sup><sup>[[2](https://en.wikipedia.org/wiki/List_of_mathematics_competitions)]</sup>

- **1** [Mathematical game](https://en.wikipedia.org/wiki/Mathematical_game)
- **2** [List of mathematics competitions](https://en.wikipedia.org/wiki/List_of_mathematics_competitions)
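The rewrite performed by `manage_references()` can be approximated as follows. The hypothetical `link_references` is only a sketch that reproduces the output format shown above; it is not how `langchain-references` is implemented. It assumes `id=N` refers to the N-th document (as produced by `format_docs`), numbers references by order of first citation, and silently drops citations to ids that do not exist.

```python
import re

# "[NUMBER](id=ID_NUMBER)" citation emitted by the LLM; group 1 is the document id
CITATION = re.compile(r"\[\d+\]\(id=(\d+)\)")


def link_references(text, sources):
    """Hypothetical helper: sources is a list of (title, url) pairs, id=1 -> sources[0]."""
    cited = []  # document ids, in order of first citation

    def to_link(match):
        doc_id = int(match.group(1))
        if not 1 <= doc_id <= len(sources):
            return ""  # unknown id (e.g. hallucinated): drop the citation
        if doc_id not in cited:
            cited.append(doc_id)
        number = cited.index(doc_id) + 1
        return f"<sup>[[{number}]({sources[doc_id - 1][1]})]</sup>"

    body = CITATION.sub(to_link, text)
    refs = "\n".join(
        f"- **{n}** [{sources[i - 1][0]}]({sources[i - 1][1]})"
        for n, i in enumerate(cited, start=1)
    )
    return f"{body}\n\n{refs}" if refs else body


sources = [
    ("Mathematical game", "https://en.wikipedia.org/wiki/Mathematical_game"),
    ("List of mathematics competitions",
     "https://en.wikipedia.org/wiki/List_of_mathematics_competitions"),
]
print(link_references("Games [1](id=1) differ from contests [2](id=2).", sources))
```

Dropping unknown ids is a design choice of this sketch: an LLM may cite an id that was never in the context, and a link to a non-existent source would be misleading.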



In [18]:
from IPython.display import display, Markdown

display(Markdown(answer))

Mathematical games are structured activities defined by clear rules and strategies, often engaging players in fundamental arithmetic concepts without requiring deep mathematical expertise, while competitions, such as mathematics olympiads, involve participants completing math tests that may include multiple-choice questions or proofs <sup>[[1](https://en.wikipedia.org/wiki/Mathematical_game)]</sup><sup>[[2](https://en.wikipedia.org/wiki/List_of_mathematics_competitions)]</sup> The former emphasizes recreational and educational aspects, whereas the latter focuses on assessment and competition among individuals or teams in solving mathematical problems <sup>[[1](https://en.wikipedia.org/wiki/Mathematical_game)]</sup><sup>[[2](https://en.wikipedia.org/wiki/List_of_mathematics_competitions)]</sup>

- **1** [Mathematical game](https://en.wikipedia.org/wiki/Mathematical_game)
- **2** [List of mathematics competitions](https://en.wikipedia.org/wiki/List_of_mathematics_competitions)
