# Strategies to tackle context window limitations
Every model has a context window: the maximum amount of text, measured in **tokens** rather than words, that it can process at one time. This becomes a problem when the text we want to process does not fit in that window. The usual remedy is to break the text into chunks that do fit. In this notebook we demonstrate a number of techniques to tackle this issue.

In [1]:
text = "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely."

### Fixed size chunking

The simplest approach is to cut the text into fixed-size chunks that fit in the context window, without regard for where the cuts fall. As the example below shows, this is not ideal: words get chopped in half, and an LLM that sees these chunks separately cannot reliably recover their meaning.

In [2]:
chunk_size = 20
[text[i: i + chunk_size] for i in range(0, len(text), chunk_size)]

['The dominant sequenc',
 'e transduction model',
 's are based on compl',
 'ex recurrent or conv',
 'olutional neural net',
 'works in an encoder-',
 'decoder configuratio',
 'n. The best performi',
 'ng models also conne',
 'ct the encoder and d',
 'ecoder through an at',
 'tention mechanism. W',
 'e propose a new simp',
 'le network architect',
 'ure, the Transformer',
 ', based solely on at',
 'tention mechanisms, ',
 'dispensing with recu',
 'rrence and convoluti',
 'ons entirely.']
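
The chopped words in the output above can be avoided while still capping the chunk size. The following is a minimal sketch, assuming the standard-library `textwrap` module is an acceptable stand-in for a dedicated text splitter. The names `sample` and `word_chunks` are introduced here for illustration, and `sample` repeats the first sentence of `text` so the snippet runs on its own.

```python
import textwrap

# First sentence of `text`, repeated here so the snippet is self-contained.
sample = (
    "The dominant sequence transduction models are based on complex recurrent "
    "or convolutional neural networks in an encoder-decoder configuration."
)

chunk_size = 20
# Break only at whitespace: never inside a word and never at a hyphen.
word_chunks = textwrap.wrap(
    sample, width=chunk_size, break_long_words=False, break_on_hyphens=False
)
print(word_chunks)
```

With `break_long_words=False`, a single word longer than `chunk_size` would be kept whole and exceed the limit, so the cap only holds when every word fits.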

### Splitting based on special characters

We can improve on this by splitting on special characters such as "." or "\n". This keeps most semantically related text in the same chunk. There can still be cases where the information needed to understand one sentence sits in the previous sentence, and splitting on "." separates the two.

In [3]:
text.split(".")

['The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration',
 ' The best performing models also connect the encoder and decoder through an attention mechanism',
 ' We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely',
 '']
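
Note that `text.split(".")` drops the periods, keeps the leading spaces, and leaves an empty string at the end. A hedged alternative, assuming a simple regular expression is good enough for the text at hand, is to split on the whitespace that follows sentence-ending punctuation. `sample` and `sentence_chunks` are illustrative names, and `sample` repeats the last two sentences of `text` so the snippet runs on its own.

```python
import re

# Last two sentences of `text`, repeated here so the snippet is self-contained.
sample = (
    "The best performing models also connect the encoder and decoder "
    "through an attention mechanism. We propose a new simple network "
    "architecture, the Transformer, based solely on attention mechanisms, "
    "dispensing with recurrence and convolutions entirely."
)

# Split on whitespace preceded by ., ! or ? so each sentence keeps its punctuation.
sentence_chunks = [s for s in re.split(r"(?<=[.!?])\s+", sample) if s]
print(sentence_chunks)
```

This pattern is naive: it would also split after abbreviations such as "e.g." when they are followed by a space, so real documents may need a proper sentence tokenizer.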

### Overlap between chunks

To limit this loss of context, we can add some overlap between chunks, so that each chunk carries a little text from the previous chunk and a little from the next. Think of it as reading the last two lines of the previous paragraph and the first two lines of the next one alongside the paragraph you are currently reading.

In [14]:
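sentences = text.split(".")
overlap, lookahead = 15, 5  # characters borrowed from the previous / next sentence
# Guard i == 0 explicitly: sentences[-1] would otherwise wrap around to the last element.
[(sentences[i-1][-overlap:] if i > 0 else "") + sentences[i] + sentences[i+1][:lookahead] for i in range(len(sentences) - 1)]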

['The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration The ',
 'r configuration The best performing models also connect the encoder and decoder through an attention mechanism We p',
 'ntion mechanism We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely']
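
Overlap is not limited to sentence splits; it can also be applied to fixed-size chunks. The following is a minimal sketch, assuming character-based windows. `chunk_with_overlap` is a helper name introduced here for illustration, not a library function, and `sample` is a fragment of `text` repeated so the snippet runs on its own.

```python
def chunk_with_overlap(text, chunk_size, overlap):
    """Split text into chunk_size-character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    # Stop once the previous window already reaches the end of the text,
    # so no trailing chunk consists purely of overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample = "The best performing models also connect the encoder and decoder"
overlapping_chunks = chunk_with_overlap(sample, chunk_size=20, overlap=5)
print(overlapping_chunks)
```

Each chunk starts with the last 5 characters of the one before it. A larger overlap preserves more context at the cost of more chunks to process.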

### Embeddings, Vectors, RAG

If there are too many chunks to process all of them every time a query is made, we can use Retrieval-Augmented Generation (RAG). This usually pairs the LLM with a vector database: a similarity search finds the chunks relevant to the query, and only those are passed to the LLM.

In [None]:
!pip install sentence-transformers chromadb

In [15]:
from sentence_transformers import SentenceTransformer

We can use an embedding model to turn each piece of text, whether a word or a sentence, into a vector that represents its meaning. These vectors are then stored in a vector database for later retrieval.

In [None]:
model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [s for s in text.split(".") if s.strip()]  # drop the empty string left after the final "."

text_embeddings = model.encode(sentences)
print(text_embeddings.shape)
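
Once every chunk is a vector, retrieval comes down to finding the vector closest to the query. The sketch below illustrates that idea only. It is an assumption-laden toy: it uses lowercase word counts as a stand-in for real embeddings so that it runs without downloading a model, and `bow_vector` and `cosine_similarity` are helper names introduced here, not part of `sentence-transformers` or ChromaDB.

```python
import math
from collections import Counter


def bow_vector(sentence):
    """Toy stand-in for an embedding: lowercase word counts (NOT a real embedding)."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


chunks = [
    "The dominant sequence transduction models are based on complex recurrent "
    "or convolutional neural networks in an encoder-decoder configuration",
    "The best performing models also connect the encoder and decoder "
    "through an attention mechanism",
    "We propose a new simple network architecture, the Transformer, based solely "
    "on attention mechanisms, dispensing with recurrence and convolutions entirely",
]
query = "What do the best performing models do?"

# Pick the chunk whose vector is most similar to the query vector.
query_vec = bow_vector(query)
best_chunk = max(chunks, key=lambda c: cosine_similarity(query_vec, bow_vector(c)))
print(best_chunk)
```

Word counts only match on shared words, whereas a real embedding model can also match on meaning, which is why the notebook uses one.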

ChromaDB is an open-source vector database that our RAG application can query before sending the request to the LLM. By default, ChromaDB embeds documents with the same sentence-embedding model we used above, "all-MiniLM-L6-v2".

In [None]:
import chromadb
client = chromadb.Client()
collection = client.create_collection(name="MySentenceStore")
collection.add(documents=sentences, ids=[str(i) for i in range(len(sentences))])

Before we send our query to the LLM, we retrieve the most relevant chunk from our vector database. In this example, `n_results=1` returns the single sentence most relevant to our question.

In [18]:
query_results = collection.query(query_texts=["What do the best performing models do?"], n_results=1)

In [19]:
print(query_results["documents"])

[[' The best performing models also connect the encoder and decoder through an attention mechanism']]
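
The retrieved chunk can then be placed in the prompt that goes to the LLM. The following is a minimal sketch of that last step, assuming a plain-text prompt. The template wording and the names `question`, `retrieved_chunks`, `context`, and `prompt` are illustrative choices, the `query_results` dictionary is hard-coded to mirror the output printed above, and the actual LLM call is left out.

```python
# Same shape as the `query_results["documents"]` output printed above.
query_results = {
    "documents": [[" The best performing models also connect the encoder "
                   "and decoder through an attention mechanism"]]
}
question = "What do the best performing models do?"

# There is one inner list per query text; we sent a single query, so take index 0.
retrieved_chunks = [doc.strip() for doc in query_results["documents"][0]]
context = "\n".join(retrieved_chunks)

# Hypothetical prompt template: adapt the wording to the LLM being used.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
print(prompt)
```

Only the retrieved chunk travels with the question, so the prompt stays well inside the context window however large the original text is.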
