# Exploring DDD model inference with RAG and OpenAI

In this notebook we explore the API and building blocks which are part of a LLM using RAG.

In [174]:
from langchain_openai import OpenAI
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.vectorstores import VectorStoreRetriever
from langchain.chains import RetrievalQA
import os


Load OpenAI API KEY in the environment. Make sure you have a `.env` file one directory higher, which contains your OpenAI API KEY. 

In [175]:
from dotenv import load_dotenv

In [176]:
load_dotenv("../.env")

True

## Loading text covering the theoritcal knowledge on DDD modeling 

In [177]:
loader = TextLoader("../examples/ddd-theory1.txt")

In [178]:
documents = loader.load()

In [201]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 150,
    length_function = len,
)

In [202]:
c_docs = text_splitter.split_documents(documents)

In [203]:
c_docs

[Document(metadata={'source': '../examples/ddd-theory1.txt'}, page_content='Domain-Driven Design: Simple Explanation\nIn this article, we will cover:\n\nStrategic Design\nTactical Design\nAggregate\nRepositories and Services\nWhen you are trying to build complex software it is important that everyone is on the same page. Even though most of us prefer to work alone, at home, with an endless supply of coffee, good software just isn’t built that way.'),
 Document(metadata={'source': '../examples/ddd-theory1.txt'}, page_content='The software itself should represent the business, and it should be clear from the code how the business functions. Software development is difficult enough without the business and engineering using different names for the same thing.\n\nThis is where Domain Driven Design (DDD) comes in, which was made popular by Eric Evans in his 2003 book Domain Driven Design: Tackling Complexity in the Heart of Software.'),
 Document(metadata={'source': '../examples/ddd-theory1

In [204]:
len(c_docs)

31

In [205]:
c_docs[0]

Document(metadata={'source': '../examples/ddd-theory1.txt'}, page_content='Domain-Driven Design: Simple Explanation\nIn this article, we will cover:\n\nStrategic Design\nTactical Design\nAggregate\nRepositories and Services\nWhen you are trying to build complex software it is important that everyone is on the same page. Even though most of us prefer to work alone, at home, with an endless supply of coffee, good software just isn’t built that way.')

## Embedding with FAISS

#### Embedding
The chunked documents will be embedded with OpenAIEmbeddings, which basically means that each chunk will be "laid down" in a high dimensional vector space whose "coordinates" represents its location in this space.
#### Vectorstore
FAISS implements a vectorstore and different similarity search algorithms for in-memory storing of vectors, for persisting vectors to disk and for performing similarity searches (semantic searching).
A popular search method: cosine similarity search where the distance between vectors are calculated. Vectors closely located to each other in space are more similar versus vectors more distantly located are less similar (related) to each other. When a sentence/word/text is embedded, that is "projected into a vector space location", it will be laid down near vectors (text representations) which are similar to it. The closest an embedded text (vector) is to another vector in the space the smaller the distance is (based on cosine method). The closest (nearest to 0), the most similar.
FAISS holds all projected chunks as vectors in a vectorstore and if the store is persisted, it can be used later for retrieval of vectors in combination with the proper embedding model.


In [206]:
# create an OpenAI Embedding model
embedding = OpenAIEmbeddings()

In [207]:
# get a vector store from the given document chunks (from one whole document)
vst = FAISS.from_documents(c_docs, embedding)

Let us do a similarity search based on the following sentence (query):

In [208]:
query = 'What is an value object'

In [209]:
# similarity search without score returned
answer = vst.similarity_search(query)

In [210]:
print(answer[1].page_content)

Even if you don’t end up using Domain Driven Design, value objects can be a great way to write cleaner code in your applications.

Entity or Value Object? #
When modelling your objects it can be difficult to decide whether something should be an entity or a value object.


In [211]:
# similarity search with score returned
answer_with_score = vst.similarity_search_with_score(query)

In [212]:
print(answer_with_score[0])

(Document(id='073a93ae-80bb-4dd7-b75b-57f3205b7833', metadata={'source': '../examples/ddd-theory1.txt'}, page_content='Unlike Entities, Value Objects should be immutable. You can’t update them, if you need a different value then you just create a new one.\n\nWe generally do this by only allowing values to be entered in the constructor and then not providing any setter methods.\n\nThe key thing to understand here is they are an object. You could just as easily create a string to store the email address but by creating a Value Object you are explicitly saying that this is an important part of your domain.'), 0.30747503)


## Querying -in memory: Questions and Answers with OpenAI LLM

We will ask the model (OpenAI LLM) about certain things on DDD augmented with DDD domain specific information/knowledge/vocabulary (RAG).

In [213]:
# acquire a retriever object which is a mechanism which knows how to retrieve data from the FAISS store
retriever = vst.as_retriever()

In [214]:
# create a RAG QA mechanism which knows how to construct a prompt with some user's question about the domain by including additional related information (context) to some question asked.
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)

In [215]:
# The user's question about DDD

# retr_query= 'What is the most important DDD concept?'
# retr_query= 'Wat zijn de verschillende DDD modeleer niveaus?'
# retr_query= 'Wie heeft DDD bedacht?'
retr_query= 'Leg uit de relatie tussen entity, aggregate en value object. Geef voorbeelden voor ieder concept.'

In [216]:
# invoke by passing the question to qa object
retr_answer = qa.invoke(retr_query)

In [217]:
print(retr_answer)

{'query': 'Leg uit de relatie tussen entity, aggregate en value object. Geef voorbeelden voor ieder concept.', 'result': ' Een entity is een object met een unieke identiteit en verantwoordelijkheden binnen de domeinlogica. Een aggregate is een groep van meerdere entities en value objects die samenwerken om een specifieke taak of doel te bereiken. Een value object is een object dat geen unieke identiteit heeft en voornamelijk gebruikt wordt om data te dragen en te manipuleren.\n\nEen voorbeeld van een entity in een real estate applicatie zou een woning kunnen zijn. Deze heeft een unieke identiteit en verantwoordelijkheden binnen het domein, zoals het hebben van een adres en het kunnen worden gekocht of verkocht.\n\nEen voorbeeld van een aggregate in dezelfde applicatie zou een vastgoedportefeuille kunnen zijn. Dit is een groep van meerdere woningen en andere entiteiten en value objects die samenwerken om een bepaald doel te bereiken, zoals het beheren en verhuren van vastgoed.\n\nEen vo

## Querying -from disk:  Questions and Answers with OpenAI LLM

Before we start querying from a persistent vector store, let us persist what we have added to the vector store so far. At this point we have no persistent vector store yet. To create it on local filesystem we will need to save it first:

In [218]:
# save the current state to filesystem under the name 'ddd_concepts' (kinda filename of the store on filesystem).
vst.save_local('ddd_concepts')

At this point we have a vector store persisted in our local filesystem, which contains the vector representation of our chunked documents (all pertaining to one document, that is the ddd_concepts.txt file).
Let us construct a QA object with RAG support.

In [219]:
# load the vector representation of our chunked documents based on the same embedding which we used earlier for the chunk document's projection into the vector space.
vst_persisted = FAISS.load_local('ddd_concepts', embedding, allow_dangerous_deserialization=True)

In [223]:
# create the qa object using OpenAI LLM model and the retriever of the persisted FAISS vector store
retr_qa = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0.6), chain_type="stuff", retriever=vst_persisted.as_retriever())

In [224]:
# invoke the same question we used in the "in-memory" situation above
retr_answer = retr_qa.invoke(retr_query)

In [225]:
print(retr_answer)

{'query': 'Leg uit de relatie tussen entity, aggregate en value object. Geef voorbeelden voor ieder concept.', 'result': "\nEntities, aggregates, and value objects are all important concepts in Domain-Driven Design (DDD) and they have a close relationship with each other.\n\nAn entity is a core concept in DDD and represents a distinct and identifiable object in the domain. Entities have a unique identity and can be referenced from other parts of the application. An example of an entity could be a customer or a product. These are objects that have a unique identity and can be referenced by other objects.\n\nAggregates, on the other hand, are a group of entities and value objects that are treated as a single unit. They are responsible for enforcing business invariants and are also transactional boundaries. An example of an aggregate could be a customer's order, which is made up of the customer entity, the products they have ordered (value objects), and other details such as the order pri