# Per User Retrieval

When building a retrieval app, you often have to build it with multiple users in mind. This means that you may be storing data for many different users, and they should not be able to see each other's data. You therefore need to be able to configure your retrieval chain so that it only retrieves information belonging to the user making the request.

This generally involves three steps.

- Step 1: Make sure the retriever you are using supports multiple users. Each vectorstore and retriever may have its own mechanism for this, and it may go by different names (namespaces, multi-tenancy, etc.). For vectorstores, this is generally exposed as a keyword argument that is passed in during `similarity_search`.

- Step 2: Add that parameter as a configurable field for the chain. This lets you easily call the chain and configure any relevant flags at runtime.

- Step 3: Call the chain with that configurable field.
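
The idea behind these steps can be sketched without any external service. The toy store below is purely illustrative: `ToyNamespacedStore`, its `add` and `search` methods, and the word-overlap scoring are hypothetical stand-ins, not part of LangChain or Pinecone. It only shows that a search scoped to one namespace never returns records stored under another one.

```python
from collections import defaultdict


class ToyNamespacedStore:
    """Hypothetical in-memory store; real vectorstores use embeddings instead."""

    def __init__(self):
        # One independent list of texts per namespace (i.e. per user).
        self._data = defaultdict(list)

    def add(self, texts, namespace):
        self._data[namespace].extend(texts)

    def search(self, query, namespace, k=2):
        # Score by naive word overlap; only the requested namespace is scanned.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(text.lower().split())), text)
            for text in self._data.get(namespace, [])
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0]


store = ToyNamespacedStore()
store.add(["attention replaces recurrence in the transformer"], namespace="user 1")
store.add(["llama 2 was pretrained on public data"], namespace="user 2")

# The same question, asked on behalf of two different users.
hits_user_1 = store.search("how was llama 2 pretrained", namespace="user 1")
hits_user_2 = store.search("how was llama 2 pretrained", namespace="user 2")
print(hits_user_1)
print(hits_user_2)
```

The namespace plays the role of the keyword argument from Step 1; in the rest of this notebook the same role is played by Pinecone's `namespace` parameter.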

<a href="https://colab.research.google.com/github/edumunozsala/langchain-rag-techniques/blob/main/per-user-retrieval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip -q install langchain openai tiktoken pinecone-client

In [None]:
!pip show langchain

Name: langchain
Version: 0.0.305
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.10/dist-packages
Requires: aiohttp, anyio, async-timeout, dataclasses-json, jsonpatch, langsmith, numexpr, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 


### Load the API Key

In [1]:
from dotenv import load_dotenv

# Load the environment variables
load_dotenv()

True

### Load the PDF document

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Load a PDF file and extract its text into LangChain documents
from langchain.document_loaders import PyPDFLoader
from langchain.schema import Document

import os
import re 

Helper functions to load the PDF file, split it and create the Documents

In [3]:

def split_text_documents(text, source="Not provided", chunk_size=1000, chunk_overlap=0):
    """
    Split a text into smaller chunks and wrap each chunk in a Document.

    Args:
    text (str): The text to split.
    source (str): Name stored in the metadata of each document.
    chunk_size (int): Maximum size of each chunk, in characters.
    chunk_overlap (int): Number of characters shared between consecutive chunks.
    Returns:
    list: The list of Documents, one per chunk.
    """
    # Create a text splitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap, separators=[" ", ",", "\n"]
    )
    # Split the text
    texts = text_splitter.split_text(text)
    # Create a list of documents, recording the source and the chunk index
    docs = [Document(page_content=t, metadata={"source":source, "chunk":i}) for i,t in enumerate(texts)]

    return docs

def load_pdf_from_url(path, files):
    """
    Load the first PDF document in `files` from the directory in the parameter path.

    Args:
    path: directory where the file or files are located
    files: list of file names (only the first one is loaded)

    Returns:
    list: the loaded pages as Documents, or None if no path or file is given
    """

    # Check that a directory and at least one file name were provided
    if path == '' or not files:
        print('Error: file not found')
        return None

    # Create a PDF loader for the first file in the list
    reader = PyPDFLoader(os.path.join(path, files[0]))

    # Load the PDF; PyPDFLoader returns one Document per page
    return reader.load()

def extract_text_from_pdf(path, file):
    """
    Extract and return the text of a PDF document located in the directory in the parameter path.

    Args:
    path: directory where the file is located
    file: file name

    Returns:
    str: the text in the PDF file
    """
    files = [file]
    pages = load_pdf_from_url(path, files)

    # Join all pages into one single text
    text = ""
    for page in pages:
        # Remove hyphens followed by whitespace (words broken across lines)
        raw_page = re.sub(r'-\s+', '', page.page_content)
        # Collapse runs of whitespace into single spaces
        text += " ".join(raw_page.split())
        text += "\n"

    return text

def load_pdf_from_file(path, file, chunk_size, chunk_overlap):
    """
    Load the PDF document file from the directory in the parameter path. Returns a list of LangChain Documents.

    Args:
    path: directory where the file is located
    file: file name
    chunk_size: maximum size of each chunk, in characters
    chunk_overlap: number of characters shared between consecutive chunks

    Returns:
    list: list of documents, or None if no path or file is given
    """

    # Check that a directory and a file name were provided
    if path == '' or file == '':
        print('Error: file not found')
        return None

    # Read and clean the text in the PDF file
    text = extract_text_from_pdf(path, file)
    # Split the text and create a list of documents
    documents = split_text_documents(text, source=file, chunk_size=chunk_size, chunk_overlap=chunk_overlap)

    # Print the number of chunks created
    print(len(documents))

    return documents
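
To make the cleaning step in `extract_text_from_pdf` concrete, here is a small standalone sketch of the same two operations applied to a made-up string. The sample text and the `clean_page` helper name are illustrative and not part of the notebook: the regular expression joins words hyphenated across line breaks, and `split`/`join` collapses the remaining whitespace.

```python
import re


def clean_page(raw_text):
    # Hypothetical helper mirroring the per-page cleaning in the notebook.
    # 1) Drop a hyphen followed by whitespace (words broken across lines).
    dehyphenated = re.sub(r"-\s+", "", raw_text)
    # 2) Collapse runs of whitespace (including newlines) into single spaces.
    return " ".join(dehyphenated.split())


sample = "Self-atten-\ntion relates   different\npositions of a sequence."
cleaned = clean_page(sample)
print(cleaned)
```

Note that the pattern is deliberately simple: it also removes a legitimate hyphen when it happens to be followed by whitespace (for example in "pre- and post-processing"), which is a trade-off worth keeping in mind for other documents.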


Read the PDF document of user 1

In [4]:
path="data"
file="Attention is all you need.pdf"

docs1= load_pdf_from_file(path, file, 1000, 50)

42


In [5]:
print(len(docs1))

42


Read the PDF document of user 2

In [6]:
path="data"
file="llama-2.pdf"

docs2= load_pdf_from_file(path, file, 1000, 50)
print(len(docs2))

266
266


### Create the Vector index

In [7]:
import pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone



In [11]:
# initialize pinecone
pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),  # find at app.pinecone.io
    environment=os.getenv("PINECONE_ENVIRONMENT_REGION"),  # next to api key in console
)

index_name = "langchain-demo"

# First, check if our index already exists. If it doesn't, we create it
if index_name not in pinecone.list_indexes():
    # we create a new index
    pinecone.create_index(name=index_name, metric="cosine", dimension=1536)


Add the documents of both users to the index, each under its own namespace

In [12]:
# The OpenAI embedding model `text-embedding-ada-002` uses 1536 dimensions
embeddings = OpenAIEmbeddings()
# Add the documents of user 1 to the index under the namespace "user 1"
docsearch = Pinecone.from_documents(docs1, embeddings, index_name=index_name, namespace="user 1")
# Add the documents of user 2 to the same index under the namespace "user 2"
docsearch = Pinecone.from_documents(docs2, embeddings, index_name=index_name, namespace="user 2")


## Get the index and Create the Retriever

In [14]:
# Get the index
index = pinecone.Index("langchain-demo")
# Set the embeddings to use; they must match the ones used to add the docs
embeddings = OpenAIEmbeddings()
# Create the vectorstore
vectorstore = Pinecone(index, embeddings, "text")

Now we can create a retriever for user 1 and test it.

In [15]:
# This will only get documents from the namespace of user 1
vectorstore.as_retriever(search_kwargs={"namespace": "user 1"}).get_relevant_documents(
    "How are transformers related to convolutional neural networks?"
)

[Document(page_content='neural networks as basic building block, computing hidden representations in parallel for all input and output positions. In these models, the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet. This makes it more difﬁcult to learn dependencies between distant positions [ 12]. In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section 3.2. Self-attention, sometimes called intra-attention is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual ent

And we can repeat the operation for user 2

In [16]:
# This will only get documents from the namespace of user 2
vectorstore.as_retriever(search_kwargs={"namespace": "user 2"}).get_relevant_documents(
    "What datasets have been used to train Llama-2?"
)

[Document(page_content='Useinanyotherway that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2. Hardware and Software (Section 2.2) Training Factors We usedcustomtraininglibraries, Meta’sResearchSuperCluster, andproductionclustersforpretraining. Fine-tuning,annotation,andevaluationwerealso performed on third-party cloud compute. Carbon Footprint Pretraining utilized a cumulative 3.3M GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 539 tCO 2eq, 100% of which were offset by Meta’s sustainability program. Training Data (Sections 2.1 and 3) Overview Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as wellasoveronemillionnewhuman-annotatedexamples. Neitherthepretraining nor the fine-tuning datasets include Meta user data. Data Freshness The pretraining data has a cutoff of September 2022, but some tu

## Build the RAG Chain

In [17]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import (
    ConfigurableField,
    RunnableBinding,
    RunnableLambda,
    RunnablePassthrough,
)

Define the Prompt template

In [25]:
# Create the Template 
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
# Create the model for LLM Chain
model = ChatOpenAI(temperature=0)


Create the retriever. Here we mark the retriever as having a configurable field. All vectorstore retrievers have `search_kwargs` as a field. This is just a dictionary with vectorstore-specific fields.

In [26]:
# Create the retriever
retriever = vectorstore.as_retriever()
configurable_retriever = retriever.configurable_fields(
    search_kwargs=ConfigurableField(
        id="search_kwargs",
        name="Search Kwargs",
        description="The search kwargs to use",
    )
)

We can now create the chain using our configurable retriever

In [27]:
chain = (
    {"context": configurable_retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

Invoke the chain to answer a question from user 1

In [28]:
chain.invoke(
    "How are transformers related to convolutional neural networks?",
    config={"configurable": {"search_kwargs": {"namespace": "user 1"}}},
)

'Transformers are related to convolutional neural networks in that they both use convolutional neural networks as a basic building block. However, transformers differ from convolutional neural networks in that they rely entirely on an attention mechanism to draw global dependencies between input and output, while convolutional neural networks use convolutional operations to relate signals from different positions.'

And now for user 2

In [29]:
chain.invoke(
    "What datasets have been used to train Llama-2?",
    config={"configurable": {"search_kwargs": {"namespace": "user 2"}}},
)

'The datasets used to train Llama-2 include 2 trillion tokens of data from publicly available sources for pretraining, publicly available instruction datasets for fine-tuning, and over one million new human-annotated examples for fine-tuning.'

Now we ask the question of user 2 while using the namespace of user 1.

In [30]:
chain.invoke(
    "What datasets have been used to train Llama-2?",
    config={"configurable": {"search_kwargs": {"namespace": "user 1"}}},
)

'Based on the given context, there is no information about the datasets used to train Llama-2.'

As expected, user 1 has no access to the data of user 2, so the chain cannot answer the question.
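
In an application, the per-request config would typically be built from the identity of the caller rather than typed by hand. Below is a minimal sketch, assuming namespaces follow the `user N` naming used in this notebook. `namespace_config`, `ask`, and `FakeChain` are hypothetical names introduced here, and `FakeChain` only stands in for the real chain so that the example runs without API keys.

```python
def namespace_config(user_id):
    # Build the runtime config in the same shape passed to chain.invoke above.
    # Assumption: each user's namespace is named "user <id>".
    return {"configurable": {"search_kwargs": {"namespace": f"user {user_id}"}}}


class FakeChain:
    # Hypothetical stand-in for the real chain; it just echoes the namespace.
    def invoke(self, question, config=None):
        namespace = config["configurable"]["search_kwargs"]["namespace"]
        return f"[{namespace}] {question}"


def ask(chain, question, user_id):
    # Scope every call to the caller's own namespace.
    return chain.invoke(question, config=namespace_config(user_id))


answer = ask(FakeChain(), "What datasets were used?", user_id=2)
print(answer)
```

With the chain built earlier in place of `FakeChain()`, `ask(chain, question, user_id=1)` would pass the same config as the `chain.invoke(...)` calls shown for user 1.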