In [4]:
%load_ext dotenv
%dotenv

## Step 1: Load

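We first need to load the source documents. We can use DocumentLoaders for this: objects that load data from a source and return it as a list of Documents. A Document is an object with page_content (str) and metadata (dict) attributes. Our source is a CSV file of design-pattern articles, so we use CSVLoader, which creates one Document per row.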

In [1]:
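from langchain.document_loaders.csv_loader import CSVLoader

# One Document per CSV row: 'page_title' is stored in metadata (not in page_content),
# and metadata['source'] is taken from the 'content' column.
loader = CSVLoader(file_path='../data/design_patterns.csv', source_column='content', metadata_columns=['page_title'])
docs = loader.load()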

print(docs[0].page_content)

content: Design patterns are typical solutions to commonly occurring problems in software design. They are like pre-made blueprints that you can customize to solve a recurring design problem in your code.

You can’t just find a pattern and copy it into your program, the way you can with off-the-shelf functions or libraries. The pattern is not a specific piece of code, but a general concept for solving a particular problem. You can follow the pattern details and implement a solution that suits the realities of your own program.

Patterns are often confused with algorithms, because both concepts describe typical solutions to some known problems. While an algorithm always defines a clear set of actions that can achieve some goal, a pattern is a more high-level description of a solution. The code of the same pattern applied to two different programs may be different.

An analogy to an algorithm is a cooking recipe: both have clear steps to achieve a goal. On the other hand, a pattern is mo
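To make the row-to-Document mapping concrete, here is a minimal sketch in plain Python. The `Document` dataclass and the `load_csv_rows` helper are hypothetical stand-ins that only roughly mirror what CSVLoader appears to do (non-metadata columns become `column: value` lines, and `metadata['source']` comes from `source_column`); the one-row CSV is made up. In the cell above, `source_column='content'`, so `metadata['source']` holds the full article text, as the `split_docs[0]` output in Step 2 shows; the sketch points it at `page_title` instead.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document; not the real class."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_csv_rows(csv_text, file_path="inline.csv", source_column=None, metadata_columns=()):
    """Hypothetical helper that roughly mirrors CSVLoader: one Document per row."""
    documents = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # Non-metadata columns become "column: value" lines in page_content
        content = "\n".join(
            f"{key.strip()}: {value.strip()}"
            for key, value in row.items()
            if key not in metadata_columns
        )
        # metadata['source'] comes from source_column if given, else the file path
        source = row[source_column] if source_column else file_path
        metadata = {"source": source, "row": i}
        for column in metadata_columns:
            metadata[column] = row[column]
        documents.append(Document(page_content=content, metadata=metadata))
    return documents


sample_csv = 'page_title,content\nSingleton,"Singleton ensures a class has only one instance."\n'
sample_docs = load_csv_rows(sample_csv, source_column="page_title", metadata_columns=["page_title"])
print(sample_docs[0].page_content)  # content: Singleton ensures a class has only one instance.
print(sample_docs[0].metadata)      # {'source': 'Singleton', 'row': 0, 'page_title': 'Singleton'}
```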

## Step 2: Split

Loaded documents can be too long to fit in the context window of many models. Even for models that could fit a full document in their context window, models empirically struggle to find the relevant context in very long prompts.

So we'll split each Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant parts of the articles at run time.

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

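text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # target maximum chunk length (characters, by default)
    chunk_overlap=200,  # characters shared between consecutive chunks
    add_start_index=True,  # store each chunk's character offset in metadata['start_index']
)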
split_docs = text_splitter.split_documents(docs)

split_docs[0]

Document(page_content='content: Design patterns are typical solutions to commonly occurring problems in software design. They are like pre-made blueprints that you can customize to solve a recurring design problem in your code.\n\nYou can’t just find a pattern and copy it into your program, the way you can with off-the-shelf functions or libraries. The pattern is not a specific piece of code, but a general concept for solving a particular problem. You can follow the pattern details and implement a solution that suits the realities of your own program.\n\nPatterns are often confused with algorithms, because both concepts describe typical solutions to some known problems. While an algorithm always defines a clear set of actions that can achieve some goal, a pattern is a more high-level description of a solution. The code of the same pattern applied to two different programs may be different.', metadata={'source': 'Design patterns are typical solutions to commonly occurring problems in so
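To see what `chunk_size`, `chunk_overlap`, and `add_start_index` mean, here is a simplified fixed-window splitter. It is only a sketch: RecursiveCharacterTextSplitter instead tries a list of separators (by default paragraph breaks, then newlines, then spaces) and merges the pieces up to `chunk_size`, so its chunk boundaries differ. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Fixed-size character windows; consecutive chunks share chunk_overlap chars.

    Simplified sketch: ignores paragraph and sentence boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        # start_index plays the role of add_start_index=True
        chunks.append({"text": text[start:start + chunk_size], "start_index": start})
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample_text = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample_text, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

With `chunk_size=10` and `chunk_overlap=3`, each window starts 7 characters after the previous one, so the last 3 characters of one chunk reappear at the start of the next. The overlap reduces the chance that a sentence relevant to a query is cut in half with neither half retrievable on its own.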

## Step 3: Embed and store

Now that we've got text chunks in memory, we need to store and index them so that we can search them later in our RAG app. The most common way to do this is to embed the contents of each document split and upload those embeddings to a vector store.

In [5]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

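embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment (e.g. set via the .env loaded above)
# Embed every chunk and index the vectors in an in-memory Chroma collection
db = Chroma.from_documents(split_docs, embedding=embeddings)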
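Conceptually, a vector store keeps one vector per chunk and answers a query by embedding it with the same model and returning the nearest stored vectors. The sketch below is a hypothetical `ToyVectorStore` with a brute-force search and a toy bag-of-words `embed` function standing in for OpenAIEmbeddings; it is not how Chroma is implemented. The default `k=4` matches LangChain's `similarity_search` default.

```python
import math
from collections import Counter


def embed(text, vocab):
    """Toy bag-of-words embedding (stand-in for OpenAIEmbeddings), L2-normalized."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs, brute-force search."""

    def __init__(self, texts, vocab):
        self.vocab = vocab
        self.rows = [(embed(t, vocab), t) for t in texts]

    def similarity_search_by_vector(self, query_vec, k=4):
        # Vectors are unit length, so the dot product is the cosine similarity
        scored = [
            (sum(q * d for q, d in zip(query_vec, vec)), text)
            for vec, text in self.rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    def similarity_search(self, query, k=4):
        # Embed the query with the same model, then search by vector
        return self.similarity_search_by_vector(embed(query, self.vocab), k)


vocab = ["abstract", "factory", "builder", "singleton", "families", "steps"]
store = ToyVectorStore(
    [
        "abstract factory creates families of related objects",
        "builder constructs objects in steps",
        "singleton has one instance",
    ],
    vocab,
)
print(store.similarity_search("what is abstract factory", k=1))
```

The split between `similarity_search` and `similarity_search_by_vector` mirrors the two calls used in the cells below: searching by text is just embedding the text and then searching by vector.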

In [6]:
query = "What's abstract factory pattern?"
docs = db.similarity_search(query)
docs[0].page_content

'content: Abstract Factory is a creational design pattern that lets you produce families of related objects without specifying their concrete classes.\n\nAbstract Factory pattern\n Problem\nImagine that you’re creating a furniture shop simulator. Your code consists of classes that represent:\n\nA family of related products, say: Chair + Sofa + CoffeeTable.\n\nSeveral variants of this family. For example, products Chair + Sofa + CoffeeTable are available in these variants: Modern, Victorian, ArtDeco.\n\nProduct families and their variants.\nProduct families and their variants.\n\nYou need a way to create individual furniture objects so that they match other objects of the same family. Customers get quite mad when they receive non-matching furniture.\n\n\nA Modern-style sofa doesn’t match Victorian-style chairs.'

In [7]:
embedding_vector = embeddings.embed_query(query)
embedding_vector

[-0.027123517018005243,
 -0.007989284208316762,
 -0.014224655260120537,
 -0.017870585528624007,
 -0.016655276680886226,
 -0.002409905082386039,
 -0.026709207014217813,
 -0.009107922708658876,
 -0.013050776295174461,
 -0.02559056944519823,
 0.012429310358170784,
 0.016461930398159703,
 0.0022683491799473754,
 0.0011574795393282623,
 0.0014587176746265172,
 0.017842966227644558,
 0.02891886285127753,
 -0.008555508749393948,
 0.0226213455783024,
 0.015467586202805363,
 -0.021792723708082475,
 0.02894648215225698,
 -0.001309393378126118,
 -0.0364316908346355,
 -0.01811917227595449,
 0.01319578414457429,
 0.020342637763503934,
 -0.022566103251053377,
 0.014417999680201996,
 -0.0014768437722168122,
 0.03336579196373135,
 -0.008161914152663634,
 -0.011041370634763598,
 -0.016613844004126927,
 -0.003922138179458467,
 0.007423060598562107,
 0.002478956827294155,
 -0.028255966100159614,
 0.039856655519433,
 0.007505922226790582,
 0.04797713978930493,
 -0.005748556191993898,
 0.002347758628384050

In [8]:
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)

content: Abstract Factory is a creational design pattern that lets you produce families of related objects without specifying their concrete classes.

Abstract Factory pattern
 Problem
Imagine that you’re creating a furniture shop simulator. Your code consists of classes that represent:

A family of related products, say: Chair + Sofa + CoffeeTable.

Several variants of this family. For example, products Chair + Sofa + CoffeeTable are available in these variants: Modern, Victorian, ArtDeco.

Product families and their variants.
Product families and their variants.

You need a way to create individual furniture objects so that they match other objects of the same family. Customers get quite mad when they receive non-matching furniture.


A Modern-style sofa doesn’t match Victorian-style chairs.
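The embedding printed earlier is a plain list of floats. OpenAI documents that its embeddings are normalized to length 1, and for unit vectors the squared Euclidean distance and cosine similarity satisfy ‖a − b‖² = 2 − 2·cos(a, b). Ranking by smallest L2 distance (Chroma's default metric, to our understanding) therefore gives the same order as ranking by largest cosine similarity. The sketch below checks this on random unit vectors; the dimension and data are arbitrary assumptions.

```python
import math
import random


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # a and b are assumed to be unit length


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


random.seed(0)
dim = 8  # toy dimension; real embeddings are much longer
query_vec = normalize([random.gauss(0, 1) for _ in range(dim)])
corpus = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(20)]

by_l2 = sorted(range(len(corpus)), key=lambda i: squared_l2(query_vec, corpus[i]))
by_cos = sorted(range(len(corpus)), key=lambda i: -cosine(query_vec, corpus[i]))
print(by_l2[:3], by_cos[:3])

# Identity for unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
for doc_vec in corpus:
    assert math.isclose(squared_l2(query_vec, doc_vec), 2 - 2 * cosine(query_vec, doc_vec), abs_tol=1e-9)
```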


## Step 4: Retrieve

Let's create a simple application that lets the user ask a question, searches for documents relevant to that question, passes the retrieved documents and the original question to a model, and returns an answer.



In [9]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Context: {context}
Question: {question}
Helpful Answer:"""

QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-1106")

question = "What's abstract factory pattern?"
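PromptTemplate here uses Python-style `{placeholder}` substitution, so filling it with `str.format` shows what the model receives. The sketch uses a hypothetical `format_docs` helper that joins chunk texts with blank lines (for real Documents you would join `doc.page_content`). In the chains below, the retriever's output, a list of Document objects, goes straight into `{context}`; as far as we can tell it is then rendered with Python's default string conversion, metadata included. Piping the retriever into such a helper (`vector_store_retriever | format_docs`) is a common way to pass only the text.

```python
demo_template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context: {context}
Question: {question}
Helpful Answer:"""


def format_docs(texts):
    """Hypothetical helper: join chunk texts with a blank line between them."""
    return "\n\n".join(texts)


chunks = [
    "Abstract Factory is a creational design pattern that lets you produce "
    "families of related objects without specifying their concrete classes.",
    "Builder focuses on constructing complex objects step by step.",
]
prompt = demo_template.format(context=format_docs(chunks), question="What's abstract factory pattern?")
print(prompt)
```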

### Vector store-backed retriever

A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a thin wrapper around the vector store's search methods, such as similarity search and MMR (maximal marginal relevance). Below, search_kwargs={"k": 3} makes it return the three most similar chunks.
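MMR trades relevance for diversity: it greedily picks the candidate that maximizes λ·sim(query, d) − (1 − λ)·max sim(d, already picked), so a near-duplicate of an already chosen chunk is penalized. A minimal sketch with made-up 2-D vectors follows; in LangChain you would request this with `db.as_retriever(search_type="mmr", search_kwargs={...})`, where `lambda_mult` defaults to 0.5. The function names here are our own.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedy maximal marginal relevance; returns indices into doc_vecs."""
    selected = []
    candidates = list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        # Similarity to the closest already-selected document (0 if none yet)
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


example_query = [1.0, 0.5]
example_docs = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0]]  # docs 0 and 1 are near-duplicates
by_relevance = sorted(range(len(example_docs)), key=lambda i: cosine(example_query, example_docs[i]), reverse=True)
print(by_relevance[:2], mmr(example_query, example_docs, k=2))  # [1, 0] [1, 2]
```

Plain similarity returns the two near-duplicates; MMR keeps the most relevant one and swaps the duplicate for a different document.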

In [11]:
vector_store_retriever = db.as_retriever(search_kwargs={"k": 3})
docs = vector_store_retriever.get_relevant_documents(query=question)
docs[1].page_content

'Builder focuses on constructing complex objects step by step. Abstract Factory specializes in creating families of related objects. Abstract Factory returns the product immediately, whereas Builder lets you run some additional construction steps before fetching the product.\n\nAbstract Factory classes are often based on a set of Factory Methods, but you can also use Prototype to compose the methods on these classes.\n\nAbstract Factory can serve as an alternative to Facade when you only want to hide the way the subsystem objects are created from the client code.\n\nYou can use Abstract Factory along with Bridge. This pairing is useful when some abstractions defined by Bridge can only work with specific implementations. In this case, Abstract Factory can encapsulate these relations and hide the complexity from the client code.\n\nAbstract Factories, Builders and Prototypes can all be implemented as Singletons.'

In [12]:
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

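vs_retriever_chain = (
    # Both dict branches receive the question: the passthrough keeps it as-is,
    # and the retriever turns it into a list of matching chunks for {context}
    {'question': RunnablePassthrough(), 'context': vector_store_retriever}
    | QA_CHAIN_PROMPT
    | llm
    | StrOutputParser()  # extract the reply text from the chat message
)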

vs_retriever_chain.invoke(question)

'The abstract factory pattern is a creational design pattern that allows you to produce families of related objects without specifying their concrete classes. It involves creating interfaces for distinct product types, implementing concrete products based on these interfaces, and using concrete factories to create specific product variants. The client code works with both factories and products via their respective abstract interfaces, allowing for flexibility and future extensibility without depending on concrete classes.'
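The `|` composition above can be puzzling at first: the leading dict is run as a set of parallel branches on the same input, and its output dict then feeds the prompt. The toy `Step` class below is a hypothetical plain-Python re-creation of that data flow, not LangChain's implementation; `fake_retriever` and `fake_llm` stand in for the real retriever and ChatOpenAI. It works because `dict | Step` falls back to `Step.__ror__`.

```python
class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function, supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda x: nxt.fn(self.fn(x)))

    def __ror__(self, other):
        # Handles `{...} | step`: every dict value runs on the same input
        if isinstance(other, dict):
            branches = {k: v if isinstance(v, Step) else Step(v) for k, v in other.items()}
            return Step(lambda x: {k: b.fn(x) for k, b in branches.items()}) | self
        return Step(other) | self

    def invoke(self, x):
        return self.fn(x)


passthrough = Step(lambda x: x)
fake_retriever = Step(lambda q: ["Abstract Factory lets you produce families of related objects."])
toy_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda p: f"  (echo) {p}  ")  # echoes the prompt instead of calling a model
parser = Step(str.strip)

toy_chain = {"question": passthrough, "context": fake_retriever} | toy_prompt | fake_llm | parser
print(toy_chain.invoke("What's abstract factory pattern?"))
```

Note that the retrieved list reaches the prompt in its string form (`['...']`), which is why a formatting step between the retriever and the prompt is often added.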

### MultiQuery Retriever

Retrieval may produce different results with subtle changes in query wording, or if the embeddings do not capture the semantics of the data well. Prompt engineering or tuning is sometimes used to address these problems manually, but it can be tedious.

The MultiQueryRetriever automates this by using an LLM to generate multiple queries from different perspectives for a given user input query. It then retrieves documents for each query and returns the unique union of the results, which can surface relevant documents that a single query would miss.
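The retrieve-and-merge step can be sketched in plain Python. The `toy_retrieve` keyword lookup and the chunk ids are hypothetical stand-ins for the vector store retriever and its Documents, and the fixed `variants` list stands in for the LLM-generated queries (modeled on those logged further down).

```python
def toy_retrieve(query):
    """Hypothetical retriever: returns chunk ids whose keyword appears in the query."""
    index = {
        "explain": ["af-intro", "af-problem"],
        "define": ["af-intro"],
        "overview": ["af-intro", "af-relations"],
    }
    hits = []
    for keyword, chunk_ids in index.items():
        if keyword in query.lower():
            hits.extend(chunk_ids)
    return hits


def multi_query_retrieve(queries, retrieve):
    """Run `retrieve` for every query and return the unique union in first-seen order."""
    seen = set()
    merged = []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


variants = [
    "Can you explain the concept of the abstract factory pattern?",
    "How would you define the abstract factory pattern?",
    "Could you provide an overview of the abstract factory pattern?",
]
print(multi_query_retrieve(variants, toy_retrieve))  # ['af-intro', 'af-problem', 'af-relations']
```

No single variant retrieves all three chunks here; the union does.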

In [13]:
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [14]:
from langchain.retrievers.multi_query import MultiQueryRetriever

retriever_llm = ChatOpenAI(temperature=0)
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=db.as_retriever(), llm=retriever_llm
)

In [15]:
# Assign the result; otherwise docs still holds the previous retriever's results
docs = multi_query_retriever.get_relevant_documents(query=question)
docs[0].page_content

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you explain the concept of the abstract factory pattern?', '2. How would you define the abstract factory pattern?', '3. Could you provide an overview of the abstract factory pattern?']


'content: Abstract Factory is a creational design pattern that lets you produce families of related objects without specifying their concrete classes.\n\nAbstract Factory pattern\n Problem\nImagine that you’re creating a furniture shop simulator. Your code consists of classes that represent:\n\nA family of related products, say: Chair + Sofa + CoffeeTable.\n\nSeveral variants of this family. For example, products Chair + Sofa + CoffeeTable are available in these variants: Modern, Victorian, ArtDeco.\n\nProduct families and their variants.\nProduct families and their variants.\n\nYou need a way to create individual furniture objects so that they match other objects of the same family. Customers get quite mad when they receive non-matching furniture.\n\n\nA Modern-style sofa doesn’t match Victorian-style chairs.'

In [16]:
multi_query_retriever_chain = (
    {'question': RunnablePassthrough(), 'context': multi_query_retriever}
    | QA_CHAIN_PROMPT
    | llm
    | StrOutputParser()
)

multi_query_retriever_chain.invoke(question)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you explain the concept of the abstract factory pattern?', '2. How would you define the abstract factory pattern?', '3. Could you provide an overview of the abstract factory pattern?']


'The Abstract Factory pattern is a creational design pattern that provides an interface for creating families of related objects without specifying their concrete classes. It allows you to produce different types and representations of an object using the same construction code. The pattern organizes object construction into a set of steps and lets you construct complex objects step by step, using only those steps that you really need. The pattern also ensures that the products you’re getting from a factory are compatible with each other and avoids tight coupling between concrete products and client code.'