Either run this notebook in the conda environment specified by environment.yml,
or run the commands below to install the necessary dependencies.

In [None]:
!sudo apt-get install --quiet -y poppler-utils
!pip install -q colpali-engine transformers qwen-vl-utils accelerate flash-attn matplotlib numpy pillow scikit-learn torch pdf2image requests
!pip install pyzotero langchain_community html2text unsloth qdrant_client stamina

In [1]:
import os
import torch
import time
import numpy as np
from tqdm import tqdm

from pyzotero import zotero

from zoterorag.datamodel import Document
from zoterorag.rag import RAG  

from dotenv import load_dotenv

load_dotenv()


USER_AGENT environment variable not set, consider setting it to identify your requests.

Please restructure your imports with 'import unsloth' at the top of your file.
  from unsloth import FastVisionModel


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


True

In [2]:
# Choose the Zotero library you want to use to build your RAG database.
library_id = "17717148"
library_type = "user"

zot = zotero.Zotero(library_id, library_type, os.getenv("ZOTERO_API_KEY"))
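`os.getenv` returns `None` when `ZOTERO_API_KEY` is missing from the environment or the `.env` file. pyzotero will likely still construct a client in that case, so the problem tends to surface later as an authorization error. A minimal sketch of an earlier check, assuming a hypothetical helper `require_env` that is not part of pyzotero or zoterorag:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value


# Demo with a placeholder variable so the sketch runs without real credentials.
os.environ["EXAMPLE_API_KEY"] = "placeholder"
print(require_env("EXAMPLE_API_KEY"))
```

In the cell above, `require_env("ZOTERO_API_KEY")` could stand in for `os.getenv("ZOTERO_API_KEY")`.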

In [None]:
qdrant_local_url = ":memory:"
qdrant_collection = "zotero_library"
retrieval_model = "nomic-ai/nomic-embed-multimodal-3b"

rag = RAG(qdrant_local_url, qdrant_collection, retrieval_model, device="cuda")

  self.qdrant_client = QdrantClient(url=qdrant_url)


config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/4.07G [00:00<?, ?B/s]

RuntimeError: CUDA driver error: out of memory
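The out-of-memory error above means the GPU could not hold the model weights at load time. One way to fail more gracefully is to check free GPU memory before choosing a device. The sketch below is an assumption-laden illustration: `pick_device` is a hypothetical helper, the 8 GiB default threshold is an arbitrary placeholder rather than a measured requirement of the retrieval model, and whether `RAG` accepts `device="cpu"` is not confirmed here.

```python
def pick_device(min_free_gib: float = 8.0) -> str:
    """Return "cuda" only if a GPU is visible and has enough free memory, else "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to use.
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    try:
        # mem_get_info reports (free, total) memory in bytes for the current device.
        free_bytes, _total_bytes = torch.cuda.mem_get_info()
    except RuntimeError:
        # The driver may fail to report memory when the GPU is already exhausted.
        return "cpu"
    return "cuda" if free_bytes / 2**30 >= min_free_gib else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The result could then be passed as `device=device` when constructing `RAG`. Restarting the kernel or stopping other processes that hold GPU memory is the other common remedy.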

In [None]:

# List every top-level item in the library with its type, key, and title.
items = zot.everything(zot.top())
for item in items:
    title = item.get("title") or item["data"].get("title", "")
    print(f"Item: {item['data']['itemType']} | Key: {item['data']['key']}", title)

In [None]:
from pathlib import Path

# Indexing process 
rag.create_collection()
# Create temp_folder for downloading PDFs
Path("temp_folder").mkdir(exist_ok=True)

items = zot.everything(zot.top())
text = []
images = []
for item in items:
    dat = Document.load(item['data']['url'], item, zot)
    text.extend(dat.get_text())
    images.extend(dat.get_images())


In [None]:
rag.index_dataset(text, type="text", batch_size=2)
rag.index_dataset(images, type="image", batch_size=2)

# Workaround: the text data currently has to be indexed a second time (cause unknown).
rag.index_dataset(text, type="text", batch_size=2)
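The `batch_size` argument presumably limits how many items are embedded at once, which is the usual lever for GPU memory use during indexing. How `index_dataset` batches internally is not shown in this notebook; the sketch below only illustrates the general idea with a hypothetical `batched` helper and placeholder data.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Placeholder chunks standing in for the real text data.
chunks = ["chunk 0", "chunk 1", "chunk 2", "chunk 3", "chunk 4"]
for batch in batched(chunks, batch_size=2):
    print(batch)
```

With five items and `batch_size=2`, the last batch holds a single item. A smaller batch size lowers peak memory at the cost of more passes.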

In [22]:
query_text = "What did Aman say about long context extension?"
search_result = rag.search(query_text, top_k=10)

Qdrant search completed in 0.0204 seconds


In [None]:
from zoterorag.show_thumbnails import show_thumbnails

# Show thumbnails of up to three retrieved page images, if any were returned.
images = rag.get_images(search_result)
if images:
    show_thumbnails(images[:3], thumb_size=(500, 500))

In [28]:
response, text_query = rag.generate(query_text, search_result, top_k_text=3)

Text query has length 3146.


In [27]:
print([item.payload['url'] for item in search_result.points])

['https://aman.ai/primers/ai/context-length-extension/', 'https://aman.ai/primers/ai/context-length-extension/', 'https://aman.ai/primers/ai/context-length-extension/', 'https://aman.ai/primers/ai/context-length-extension/', 'https://aman.ai/primers/ai/LLM/', 'https://aman.ai/primers/ai/LLM/', 'https://aman.ai/primers/ai/context-length-extension/', 'https://aman.ai/primers/ai/context-length-extension/', 'https://aman.ai/primers/ai/context-length-extension/', 'https://aman.ai/primers/ai/LLM/']
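Several of the ten hits come from the same page, because each hit is a chunk rather than a whole document. A short sketch of how the hits could be tallied per source, using the URL list printed above as literal data (the variable names are illustrative):

```python
from collections import Counter

CONTEXT = "https://aman.ai/primers/ai/context-length-extension/"
LLM = "https://aman.ai/primers/ai/LLM/"

# The ten URLs from the search result above, in rank order.
urls = [CONTEXT, CONTEXT, CONTEXT, CONTEXT, LLM, LLM, CONTEXT, CONTEXT, CONTEXT, LLM]

# Count how many retrieved chunks each source page contributed.
hits_per_url = Counter(urls)
for url, count in hits_per_url.most_common():
    print(count, url)
```

Here seven of the ten chunks come from the context-length-extension page and three from the LLM primer. With a live result, the same tally could be built from `item.payload['url']` for each point in `search_result.points`.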


In [29]:
from IPython.display import display, Markdown

Markdown(response[0])

Based on the provided information, Aman, through his articles on 'NLP • LLM Context Length Extension', discusses various methods for extending the context length of large language models. These methods include techniques like position interpolation, which help extend the context window of models like Llama 2, as well as methods such as LongLoRA and LongQLoRA that focus on efficiently and effectively extending the context lengths of large language models.

In his writings, Aman highlights the advantages of extended context length for language models, stating that increased context length enables models to provide more tailored and efficient interactions without needing model recalibration. This leads to enhancements in accuracy, fluency, and creativity due to on-the-fly learning capabilities provided by in-memory processing.

However, there isn't a specific mention of what Aman said about "long context extension" beyond these general points, so based on the available information, we understand that his works emphasize improving the scalability and capability of large language models through longer context windows.

In [30]:
Markdown(text_query)

Here is the text query:

        What did Aman say about long context extension?

        Here is text context 1: 
The following is chunk 6  from the web page  https://aman.ai/primers/ai/context-length-extension/ with title 'Aman's AI Journal • NLP • LLM Context Length Extension'. It was created by the author(s) Aman Chadha.
## Citation

    
    
    @article{Chadha2020DistilledContextLengthExtension,
      title   = {LLM Context Length Extension},
      author  = {Chadha, Aman and Jain, Vinija},
      journal = {Distilled AI},
      year    = {2020},
      note    = {\url{https://aman.ai}}
    }
    

  * [ ](https://github.com/amanchadha) | [ ](https://citations.amanchadha.com/) |  [ ](https://twitter.com/i_amanchadha) | [ ](mailto:hi@aman.ai) | 

[www.amanchadha.com](https://www.amanchadha.com/)
Here is text context 2: 
The following is chunk 0  from the web page  https://aman.ai/primers/ai/context-length-extension/ with title 'Aman's AI Journal • NLP • LLM Context Length Extension'. It was created by the author(s) Aman Chadha.
[Distilled AI](../) [Back to aman.ai](https://aman.ai)

# NLP • LLM Context Length Extension

  * Overview
  * Advantages of Extended Context Length
  * Background: Interpolation and how it increases context length
    * Extending Context Window of Large Language Models via Position Interpolation
    * Deep Dive into how Llama 2’s context window increased
  * Background: NTK, NTK-Aware, and Dynamic NTK
    * NTK (Neural Tangent Kernel)
    * NTK-Aware Method
    * Dynamic NTK Method
  * Related Papers
    * Extending Context Window of Large Language Models via Positional Interpolation
    * YaRN: Efficient Context Window Extension of Large Language Models
    * LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
    * LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models
    * MemGPT: Towards LLMs as Operating Systems
    * LM-Infinite: Simple On-The-Fly Length Generalization for Large Language Models
    * LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
    * In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
  * Citation
Here is text context 3: 
The following is chunk 2  from the web page  https://aman.ai/primers/ai/context-length-extension/ with title 'Aman's AI Journal • NLP • LLM Context Length Extension'. It was created by the author(s) Aman Chadha.
## Advantages of Extended Context Length

  * An LLM with an expanded context length can offer more tailored and efficient interactions by processing user-specific data without the need for model recalibration. This on-the-fly learning approach, leveraging in-memory processing, has the potential to enhance accuracy, fluency, and creativity.
  * **Analogy for Context:** Similar to how computer RAM retains the operational context of software applications, an extended context length allows an LLM to maintain and process a broader scope of user data.
  * In this article, we aim to present a detailed examination of methods focused on increasing the context length, emphasizing their practical implications and benefits.
