# Setup

In [None]:
!python -m pip -q install --upgrade pip
!pip install -e .

In [None]:
# Document loading, retrieval methods and text splitting
%pip install -qU wikipedia

%pip install -qU langchain-references
%pip install -qU langchain-community
%pip install -qU langchain-text-splitters

# Local vector store via Chroma
%pip install -qU langchain-chroma

# Inference and embeddings
%pip install -qU langchain-openai

In [None]:
import langchain_references

langchain_references.__version__

# Document loading, retrieval methods and text splitting
Retrieve documents from Wikipedia. Each document is truncated to 2,000 characters (`doc_content_chars_max`), so this notebook does not split them further.

In [95]:
import os

os.environ["USER_AGENT"] = "langchain-references"

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader

from langchain_community.retrievers import WikipediaRetriever

documents = WikipediaRetriever(
    top_k_results=10, 
    doc_content_chars_max=2000
).invoke("mathematic")


In [96]:
[(doc.metadata["title"], doc.metadata["source"]) for doc in documents]

[('Mathematics', 'https://en.wikipedia.org/wiki/Mathematics'),
 ('History of mathematics',
  'https://en.wikipedia.org/wiki/History_of_mathematics'),
 ('Mathematical Reviews',
  'https://en.wikipedia.org/wiki/Mathematical_Reviews'),
 ('List of mathematics competitions',
  'https://en.wikipedia.org/wiki/List_of_mathematics_competitions'),
 ('Mathematical game', 'https://en.wikipedia.org/wiki/Mathematical_game'),
 ('Applied mathematics', 'https://en.wikipedia.org/wiki/Applied_mathematics'),
 ('Mathematical sciences',
  'https://en.wikipedia.org/wiki/Mathematical_sciences'),
 ('Mathematical logic', 'https://en.wikipedia.org/wiki/Mathematical_logic'),
 ('List of mathematics awards',
  'https://en.wikipedia.org/wiki/List_of_mathematics_awards'),
 ('Philosophy of mathematics',
  'https://en.wikipedia.org/wiki/Philosophy_of_mathematics')]

In [97]:
import os
from getpass import getpass

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass()

In [98]:
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

embeddings = OpenAIEmbeddings()
model = ChatOpenAI(model="gpt-4o-mini")

Load the documents into a vector store.

In [99]:
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(documents=documents,
                                    embedding=embeddings,
                                    )

Combine the documents into a single string, giving each one a small unique numeric ID.

In [100]:
def format_docs(docs):
    # Wrap each document in a tag with a 1-based id so that the LLM can cite it.
    return "\n".join(
        f"<document id={i + 1}>\n{doc.page_content}\n</document>\n"
        for i, doc in enumerate(docs)
    )
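
To make the format concrete, here is a minimal, self-contained sketch of what `format_docs` produces. `FakeDoc` is a hypothetical stand-in for a LangChain document (only the attributes used here), and the sample texts are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document (assumption, not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    # Same idea as the cell above: wrap each document and give it a 1-based id.
    return "\n".join(
        f"<document id={i + 1}>\n{doc.page_content}\n</document>\n"
        for i, doc in enumerate(docs)
    )


# Invented sample texts, not real Wikipedia content.
sample = [FakeDoc("Chess is a board game."), FakeDoc("An olympiad is a competition.")]
formatted = format_docs(sample)
print(formatted)
```

The ids are positions in the list passed to `format_docs`, so they are only meaningful together with that same list of documents.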


# Manage references with langchain-references

Create a prompt with `{format_references}`, `{documents}` and `{question}` placeholders.

In [101]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

RAG_TEMPLATE = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved documents to answer the question. 
If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

{format_references}
  
<documents>
{documents}
</documents>

Answer the following question:

{question}"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

Create a context step that fills in the `documents` and `format_references` prompt variables.

In [102]:
from langchain_references import FORMAT_REFERENCES

context = RunnablePassthrough.assign(
    documents=lambda input: format_docs(input["documents"]),
    format_references=lambda _: FORMAT_REFERENCES,
)
print(FORMAT_REFERENCES)

When referencing the documents, add a citation right after. Use "[NUMBER](id=ID_NUMBER)" for the citation (e.g. "The Space Needle is in Seattle [1](id=55)[2](id=12).").
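
The citation format in the instruction above is easy to recognize with a regular expression. The following is a minimal sketch for illustration only; `CITATION` and `extract_citations` are hypothetical names, and this is not necessarily how langchain-references parses the answer.

```python
import re

# Pattern for the "[NUMBER](id=ID_NUMBER)" citation format (illustrative assumption).
CITATION = re.compile(r"\[(\d+)\]\(id=(\d+)\)")


def extract_citations(text):
    """Return (number, document_id) pairs in order of appearance."""
    return [(int(number), int(doc_id)) for number, doc_id in CITATION.findall(text)]


example = "The Space Needle is in Seattle [1](id=55)[2](id=12)."
citations = extract_citations(example)
print(citations)  # [(1, 55), (2, 12)]
```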


Create a chain with the context, rag_prompt and model.

In [103]:
from langchain_core.output_parsers import StrOutputParser

# Build the base chain; `manage_references()` is not applied yet
chain = (
        context
        | rag_prompt
        | model
)

Select documents similar to the question.

In [104]:
question = "What is the difference kind of games and competition of mathematics?"

docs = vectorstore.similarity_search(question, k=6)
[(d.metadata["title"], d.metadata["source"]) for d in docs]

[('Mathematical game', 'https://en.wikipedia.org/wiki/Mathematical_game'),
 ('Mathematical game', 'https://en.wikipedia.org/wiki/Mathematical_game'),
 ('List of mathematics competitions',
  'https://en.wikipedia.org/wiki/List_of_mathematics_competitions'),
 ('List of mathematics competitions',
  'https://en.wikipedia.org/wiki/List_of_mathematics_competitions'),
 ('List of mathematics competitions',
  'https://en.wikipedia.org/wiki/List_of_mathematics_competitions'),
 ('List of mathematics competitions',
  'https://en.wikipedia.org/wiki/List_of_mathematics_competitions')]
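
The result above lists the same article several times. One plausible cause (an assumption, not confirmed by this notebook) is that the cell building the vector store was run more than once, adding the same documents again. If that is unwanted, exact duplicates can be dropped before formatting. A minimal sketch, where `drop_exact_duplicates` is a hypothetical helper and the sample documents are invented stand-ins:

```python
from types import SimpleNamespace


def drop_exact_duplicates(docs):
    """Keep the first copy of each (source, page_content) pair, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.metadata.get("source"), doc.page_content)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Hypothetical stand-ins for retrieved documents (not real Wikipedia content).
game = {
    "title": "Mathematical game",
    "source": "https://en.wikipedia.org/wiki/Mathematical_game",
}
sample_docs = [
    SimpleNamespace(page_content="text A", metadata=game),
    SimpleNamespace(page_content="text A", metadata=game),
    SimpleNamespace(page_content="text B", metadata=game),
]
unique_docs = drop_exact_duplicates(sample_docs)
print(len(unique_docs))  # 2
```

The key includes `page_content` so that different chunks of the same article are kept; only identical copies are removed.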

Invoke the chain with the documents and question, but without `manage_references()`.

In [108]:
answer = (chain | StrOutputParser()).invoke({"documents": docs, "question": question})
print(answer)

Mathematical games do not require deep mathematical knowledge to play and focus on recreational aspects, while mathematical competitions or olympiads involve participants completing tests that may require specific mathematical expertise, often including detailed solutions or proofs [1](id=1)[3](id=5). Games tend to be more casual and accessible, whereas competitions are structured and competitive [2](id=2).


Invoke the chain with the same documents and question, this time wrapped in `manage_references()`.

In [109]:
from langchain_references import manage_references

answer = (manage_references(chain) | StrOutputParser()).invoke(
    {"documents": docs, "question": question})
print(answer)

Mathematical games do not require deep mathematical knowledge to participate, focusing instead on enjoyment and engagement, while mathematical puzzles demand specific expertise to solve <sup>[1](https://en.wikipedia.org/wiki/Mathematical_game)</sup><sup>[1](https://en.wikipedia.org/wiki/Mathematical_game)</sup> In contrast, mathematics competitions, such as mathematical olympiads, involve structured tests where participants must solve problems, often requiring detailed solutions or proofs <sup>[2](https://en.wikipedia.org/wiki/List_of_mathematics_competitions)</sup><sup>[2](https://en.wikipedia.org/wiki/List_of_mathematics_competitions)</sup> Thus, the primary difference lies in the level of expertise required and the format of participation.

- **1** [Mathematical game](https://en.wikipedia.org/wiki/Mathematical_game)
- **2** [List of mathematics competitions](https://en.wikipedia.org/wiki/List_of_mathematics_competitions)
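
To illustrate the kind of transformation visible in the output above, here is a simplified sketch that maps `[n](id=k)` markers to numbered links, assigns one number per source URL, and appends a reference list. It is an approximation written for this explanation, not the library's actual implementation; `FakeDoc` and `rewrite_citations` are hypothetical names and the sample answer is invented.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document (assumption)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


CITATION = re.compile(r"\[\d+\]\(id=(\d+)\)")


def rewrite_citations(text, docs):
    """Replace [n](id=k) markers with numbered links and append a reference list."""
    numbers = {}  # source URL -> reference number, in order of first use
    titles = {}   # source URL -> title shown in the reference list

    def replace(match):
        doc_id = int(match.group(1))
        if not 1 <= doc_id <= len(docs):
            return ""  # drop citations that point at no known document
        meta = docs[doc_id - 1].metadata
        source = meta["source"]
        if source not in numbers:
            numbers[source] = len(numbers) + 1
            titles[source] = meta.get("title", source)
        return f"<sup>[{numbers[source]}]({source})</sup>"

    body = CITATION.sub(replace, text)
    refs = [f"- **{n}** [{titles[src]}]({src})" for src, n in numbers.items()]
    return body + "\n\n" + "\n".join(refs)


game_url = "https://en.wikipedia.org/wiki/Mathematical_game"
comp_url = "https://en.wikipedia.org/wiki/List_of_mathematics_competitions"
sample_docs = [
    FakeDoc("...", {"title": "Mathematical game", "source": game_url}),
    FakeDoc("...", {"title": "Mathematical game", "source": game_url}),
    FakeDoc("...", {"title": "List of mathematics competitions", "source": comp_url}),
]
raw = "Games are casual [1](id=1)[2](id=2). Competitions are structured [3](id=3)."
rewritten = rewrite_citations(raw, sample_docs)
print(rewritten)
```

In this sketch, two documents that share a source URL receive the same reference number, which mirrors the repeated `[1]` and `[2]` links in the answer above.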



In [None]:
from IPython.display import display, Markdown

display(Markdown(answer))