# All Open-Source (& free) RAG (Retrieval-Augmented Generation) using LlamaIndex Vector-DB & Zephyr (Mistral-7B based)
- no api
- no passwords
- no logins
- no subscriptions
- no fees
- runs in colab

## Uses:
- your own documents (e.g. epub books)
- llama-index
- Langchain
- Custom (you select) Embeddings (e.g. from hugging face)
- Custom (you select) Foundation Model (e.g. from hugging face)

# Notes:
- package versions work as of 2023.12.12
- Add your own document into the /data/ folder
- PDFs require an additional library and only sometimes work
- the "GGUF" format works on both CPU and GPU
- There are numerous Zephyr GGUF options (all from 'The Bloke,' hats off to The Bloke!)

# ToDo:
- more options for vector embeddings
- more options for an improved vector database
- more database type options, including graphs
- setup for models/embeddings downloaded by other means


### Thanks
This colab is roughly based on Rithesh Sreenivasan's video and notebook, with updates based on syntax changes and colab install-needs, document format modifications, etc.

See original here: https://github.com/run-llama/llama_index/blob/main/llama_index/embeddings/__init__.py

Please see Rithesh Sreenivasan's very nice video at: https://www.youtube.com/watch?v=3mFp6diTK3s

# Add your Files
into the current working directory

In [7]:
!mkdir data
input("Remember to add your epub data files in the current working directory...")

# show files
!ls

mkdir: cannot create directory ‘data’: File exists
Remember to add your data file...


''
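
The `mkdir: cannot create directory` message above appears whenever the cell is re-run. A minimal sketch of an idempotent alternative in plain Python follows; the helper name `prepare_workspace` and its parameters are illustrative assumptions, not part of the original notebook.

```python
from pathlib import Path


def prepare_workspace(base_dir=".", data_dirname="data"):
    """Create the data folder if needed and list EPUB files in base_dir.

    `prepare_workspace` is a hypothetical helper written for this sketch.
    """
    base = Path(base_dir)
    data_dir = base / data_dirname
    # exist_ok=True means re-running the cell does not raise an error
    data_dir.mkdir(parents=True, exist_ok=True)
    # Only files directly inside base_dir are matched, not subfolders
    epub_files = sorted(p.name for p in base.glob("*.epub"))
    return data_dir, epub_files


data_dir, epub_files = prepare_workspace()
print(f"data directory: {data_dir}")
print(f"EPUB files found: {epub_files}")
```

If the printed list is empty, upload an EPUB to the working directory before running the extraction cell.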

In [8]:
!pip install python-dotenv transformers langchain sentence-transformers cohere llama-index

Collecting python-dotenv
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Collecting langchain
  Downloading langchain-0.0.350-py3-none-any.whl (809 kB)
Collecting sentence-transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting cohere
  Downloading cohere-4.37-py3-none-any.whl (48 kB)
Collecting llama-index
  Downloading llama_index-0.9.14.post3-py3-none-any.whl (943 kB)
Collecting dataclasses-json<0.7,>

In [31]:
"""
Note: PDF files are a...horrendous non-standard nightmare in general.
In some specific cases it may be possible to use some PDF files, but it is
very unlikely that any 'general' system can work well.
"""
# !pip install --upgrade pypdf

In [10]:
# compile and install from source?
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir

Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.22.tar.gz (8.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.22-cp310-cp310-manylinux_2_35_x86_64.whl size=7795096 sha256=d67fe9fe910cab7ff9e77de59a1b27e0f9b191d223d9b9263f24a50ada291951
  Stored in directory: /tmp/pip-ephem-wheel-cache-ycy6o2sv/wheels/64/7e/c4/11fee2bb4b914968fabb2168c237ab1ade9702cfd2c274c4bd
Successfully built llama-cpp-python
Installing collected packages: llama-cpp-python
Successfully installed llama-c

# load epub books as json files

In [13]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [None]:
#########################################################
# This block automatically finds and processes epub books
# especially for RAG document ingestion processing
#########################################################

import zipfile
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup
import json
import os
import glob


def get_ordered_html_files(opf_content):
    """
    Parses the content.opf file to determine the reading order of HTML files in the EPUB.

    The function reads the 'content.opf' file, which contains metadata about the EPUB's structure.
    It identifies the 'spine' element, which lists the reading order of the content documents,
    and the 'manifest' element, which provides the location of these documents.
    The function returns a list of HTML file paths in the order they should be read.

    Args:
    opf_content (str): A string representation of the content.opf file.

    Returns:
    list: An ordered list of HTML file paths as specified in the EPUB's spine.
    """

    # Parse the content.opf XML content
    tree = ET.ElementTree(ET.fromstring(opf_content))
    root = tree.getroot()

    # Define the namespace for the OPF package file
    ns = {'opf': 'http://www.idpf.org/2007/opf'}

    # Find the spine element which indicates the order of the content documents
    spine = root.find('opf:spine', ns)
    itemrefs = spine.findall('opf:itemref', ns)

    # Extract the id references for each item in the spine
    item_ids = [itemref.get('idref') for itemref in itemrefs]

    # Find the manifest element which lists all the content documents
    manifest = root.find('opf:manifest', ns)
    items = manifest.findall('opf:item', ns)

    # Create a dictionary mapping item IDs to their corresponding file paths
    html_files = {item.get('id'): item.get('href') for item in items if item.get('media-type') == 'application/xhtml+xml'}

    # Generate an ordered list of HTML files based on the spine order
    ordered_html_files = [html_files[item_id] for item_id in item_ids if item_id in html_files]

    return ordered_html_files


def extract_text_from_html(html_content):
    """
    Extracts and returns text from an HTML content.
    """
    # print("HTML Content before BeautifulSoup Parsing:\n", html_content[:500])  # Print first 500 characters of HTML
    print(f"\nlen(HTML Content before BeautifulSoup Parsing) -> {len(html_content)}")  # Print length of the raw HTML

    soup = BeautifulSoup(html_content, 'html.parser')
    parsed_text = soup.get_text()
    # print("Extracted Text:\n", parsed_text[:500])  # Print first 500 characters of extracted text
    print(f"\nLen(Extracted Text) -> {len(parsed_text)}")  # Print length of the extracted text

    return parsed_text


def extract_text_from_epub(epub_path, output_jsonl_path, output_json_dir):
    """
    Extracts text from an EPUB file, writes it to a single JSONL file, and creates individual JSON files for each HTML content.

    Args:
    epub_path (str): Path to the EPUB file.
    output_jsonl_path (str): Path for the output JSONL file that will contain all extracted text.
    output_json_dir (str): Directory path to store individual JSON files.
    """

    with zipfile.ZipFile(epub_path, 'r') as epub:
        print("EPUB Contents:", epub.namelist())

        # Locate and read the content.opf file for metadata
        opf_file = [f for f in epub.namelist() if 'content.opf' in f][0]
        opf_content = epub.read(opf_file).decode('utf-8')

        # Get an ordered list of HTML files based on EPUB structure
        ordered_html_files = get_ordered_html_files(opf_content)

        # Create a directory for individual JSON files if it doesn't exist
        if not os.path.exists(output_json_dir):
            os.makedirs(output_json_dir)

        for html_file in ordered_html_files:
            full_path = os.path.join(os.path.dirname(opf_file), html_file)
            if full_path in epub.namelist():
                # Read and extract text from each HTML file
                html_content = epub.read(full_path).decode('utf-8')
                text = extract_text_from_html(html_content)
                print(f"len(text for json)-> {len(text)}")

                # Append the extracted text to a single JSONL file
                with open(output_jsonl_path, 'a', encoding='utf-8') as f:
                    json_record = json.dumps({'text': text.strip()})
                    f.write(json_record + '\n')

                # Create an individual JSON file for each HTML file.
                # Use only the base name: an href such as "Text/ch1.xhtml" would
                # otherwise point into a subdirectory that does not exist.
                json_name = f"{os.path.splitext(os.path.basename(html_file))[0]}.json"
                individual_json_path = os.path.join(output_json_dir, json_name)
                with open(individual_json_path, 'w', encoding='utf-8') as f:
                    json.dump({'text': text.strip()}, f, indent=4)

                print(f"{html_file} -> ok!")
            else:
                print(f"Warning: File {full_path} not found in the archive.")


def make_epub_file_list():
    # This will match all files ending in .epub in the current directory
    list_of_epub_files = glob.glob('*.epub')

    # Print the list of .epub files
    for file in list_of_epub_files:
        print(file)

    return list_of_epub_files


# get list of epub files
list_of_epub_files = make_epub_file_list()
print(f"list_of_epub_files -> {list_of_epub_files}")

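# Example usage: process the first EPUB found in the current working directory
if not list_of_epub_files:
    raise FileNotFoundError("No .epub files found in the current working directory.")
epub_file_path = list_of_epub_files[0]
os.makedirs("data", exist_ok=True)  # unlike `!mkdir`, does not fail if the folder exists
output_jsonl_path = 'data/output.jsonl'
output_json_dir = 'individual_jsons'  # Directory to store individual JSON files
extract_text_from_epub(epub_file_path, output_jsonl_path, output_json_dir)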
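
To see how the spine and manifest lookup behaves without opening a real book, here is a small sketch that runs the same idea on an inline OPF string. The sample OPF content and the function name `spine_order` are made up for illustration; real `content.opf` files carry more metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal OPF document: the manifest lists the files in one order,
# while the spine gives the actual reading order.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ch2" href="Text/ch2.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch1" href="Text/ch1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="Styles/main.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""

NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_order(opf_text):
    """Return XHTML hrefs in spine (reading) order."""
    root = ET.fromstring(opf_text)
    # Map manifest ids to hrefs, keeping only XHTML content documents
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", NS)
        if item.get("media-type") == "application/xhtml+xml"
    }
    # Walk the spine and resolve each idref through the manifest map
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", NS)
        if ref.get("idref") in hrefs
    ]


print(spine_order(SAMPLE_OPF))
```

The stylesheet entry is skipped because its media type is not XHTML, and the chapters come back in spine order rather than manifest order.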


# LlamaIndex: Load Your Documents
- this may take some fiddling depending on doc format
- this is just a basic vector index currently

In [12]:
"""
This does not work well with most PDF files, a block above
turns an epub into:
A. individual json data files
B. a single large jsonl file
"""
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext

# Is there a 'not-simple' reader??
documents = SimpleDirectoryReader("data").load_data()
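
Because the extraction step opens `data/output.jsonl` in append mode, re-running it adds the same records again. The sketch below is one possible sanity check before ingestion; the helper names `read_jsonl` and `count_duplicate_texts` are assumptions introduced here, not part of the notebook.

```python
import json
import os


def read_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def count_duplicate_texts(records):
    """Count records whose 'text' value repeats an earlier record."""
    texts = [record.get("text", "") for record in records]
    return len(texts) - len(set(texts))


# Only runs when the file from the extraction step is present
jsonl_path = "data/output.jsonl"
if os.path.exists(jsonl_path):
    records = read_jsonl(jsonl_path)
    print(f"records: {len(records)}, duplicates: {count_duplicate_texts(records)}")
```

A non-zero duplicate count usually means the extraction cell was run more than once; deleting `data/output.jsonl` and re-running it once is the simplest fix.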

# Select a Model

Finding a model that works may take some trial and error.

This model works:
```
TheBloke/zephyr-7B-alpha-GGUF/resolve/main/zephyr-7b-alpha.Q5_K_M.gguf
```

See model options here:

https://huggingface.co/stabilityai/stablelm-zephyr-3b

https://huggingface.co/TheBloke/stablelm-zephyr-3b-GGUF

https://huggingface.co/TheBloke

https://huggingface.co/TheBloke/OpenOrca-Zephyr-7B-GGUF

https://huggingface.co/TheBloke/zephyr_7b_norobots-GGUF

https://huggingface.co/TheBloke/zephyr-7B-beta-pl-GGUF

https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF

https://huggingface.co/TheBloke/openbuddy-zephyr-7B-v14.1-GGUF


https://huggingface.co/TheBloke/zephyr-7B-alpha-GGUF


# Download / Load Your Model

In [17]:
#################
# Select a Model
#################
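# Only the last uncommented assignment takes effect.
# Direct downloads need a ".../resolve/main/..." URL; ".../blob/main/..." is the web page for the file.
# model_name = 'https://huggingface.co/TheBloke/stablelm-zephyr-3b-GGUF/resolve/main/stablelm-zephyr-3b.Q2_K.gguf'
# model_name = 'https://huggingface.co/TheBloke/stablelm-zephyr-3b-GGUF/resolve/main/stablelm-zephyr-3b.Q5_K_M.gguf'
model_name = 'https://huggingface.co/TheBloke/zephyr-7B-alpha-GGUF/resolve/main/zephyr-7b-alpha.Q5_K_M.gguf'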

import torch
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt
llm = LlamaCPP(
    # You can pass in the URL to a GGUF model to download it automatically
    model_url=model_name,
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    # the original example sized this for Llama 2's 4096-token window, leaving some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # n_gpu_layers: -1 offloads all layers to the GPU; 0 runs on CPU only
    model_kwargs={"n_gpu_layers": -1},
    # NOTE: these helpers build Llama 2 style prompts, not Zephyr's own chat format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)


Downloading url https://huggingface.co/TheBloke/zephyr-7B-alpha-GGUF/resolve/main/zephyr-7b-alpha.Q5_K_M.gguf to path /tmp/llama_index/models/zephyr-7b-alpha.Q5_K_M.gguf
total size (MB): 5131.41


4894it [00:47, 102.61it/s]                          
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
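
The `messages_to_prompt` and `completion_to_prompt` helpers imported above format prompts in the Llama 2 style, while Zephyr models are usually prompted with `<|system|>`, `<|user|>` and `<|assistant|>` tags. The sketch below shows what Zephyr-style replacements might look like. The template is an assumption taken from the commonly published Zephyr chat format, so check the model card before relying on it; the `Message` class is a hypothetical stand-in for a chat message object with `role` and `content` fields.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a chat message (role + content)."""
    role: str
    content: str


def zephyr_messages_to_prompt(messages):
    """Build a Zephyr-style prompt string from a list of messages."""
    prompt = ""
    for message in messages:
        if message.role == "system":
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == "user":
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == "assistant":
            prompt += f"<|assistant|>\n{message.content}</s>\n"
    # Start with a system block, inserting an empty one if none was given
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt
    # End with the tag that cues the model to answer
    return prompt + "<|assistant|>\n"


def zephyr_completion_to_prompt(completion):
    """Wrap a single user query in the Zephyr-style template."""
    return zephyr_messages_to_prompt([Message("user", completion)])


print(zephyr_completion_to_prompt("What is a struct?"))
```

Functions shaped like these could be passed as `messages_to_prompt=` and `completion_to_prompt=` when constructing the LLM, provided the real message objects expose `role` and `content` in the same way.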


# Setup: Load Embeddings for RAG Vector Database

## Select Embeddings for RAG document indexing

### Embeddings Leaderboard: look at size, performance, token-number, etc.
https://huggingface.co/spaces/mteb/leaderboard

#### Misc:
https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface.html

##### LangChain embedding integrations:
https://python.langchain.com/docs/integrations/text_embedding



In [18]:
# note: this import path was updated; older versions cause errors
from llama_index.embeddings.langchain import LangchainEmbedding

In [27]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import ServiceContext


# # Option 1
# embed_model = LangchainEmbedding(
#   HuggingFaceEmbeddings(model_name="thenlper/gte-large")
# )


# Option 2
embed_model = LangchainEmbedding(
  HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
)


In [28]:
# Set up embedding model, set chunk size
service_context = ServiceContext.from_defaults(
    chunk_size=256,
    llm=llm,
    embed_model=embed_model
)
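
To build intuition for what `chunk_size` controls, here is a deliberately simplified chunker. It is only an illustration: it counts whitespace-separated words, whereas the library's own splitter works on tokens and sentence boundaries, and the `overlap` parameter and function name `chunk_words` are assumptions introduced for this sketch.

```python
def chunk_words(text, chunk_size=256, overlap=20):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of the two neighbouring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Smaller chunks give more precise retrieval hits but less surrounding context per hit; larger chunks do the opposite and use more of the model's context window.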

# Ingestion Phase

In [29]:
####################################
# Ingestion: Process your documents
####################################
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
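
Conceptually, a basic vector index stores one embedding per chunk and, at query time, returns the chunks whose embeddings are closest to the query embedding. The sketch below illustrates that idea with hand-written three-dimensional vectors and cosine similarity. It is not the library's implementation; the vectors, the value of `k`, and the function names are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, chunk_vectors, k=2):
    """Return the indices of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), index)
        for index, vector in enumerate(chunk_vectors)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [index for _, index in scored[:k]]


# Toy "embeddings": real ones have hundreds of dimensions
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
query_vector = [1.0, 0.0, 0.0]
print(top_k_chunks(query_vector, chunk_vectors, k=2))
```

The retrieved chunks are then placed into the prompt alongside the question, which is the "augmented" part of retrieval-augmented generation.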

# Test One Query

In [23]:
# Single Question Test
query_engine = index.as_query_engine()
response = query_engine.query("What is a struct?")

Llama.generate: prefix-match hit


In [24]:
print(response)

 <<SYS>>
Structs are data structures in programming languages that allow you to group related data fields together under a single name. In Rust, structs can be defined using the `struct` keyword followed by the name of the struct and its fields enclosed in curly braces. Structs can have methods (functions) associated with them as well. They are similar to classes in object-oriented programming languages like Java or C++, but without inheritance or polymorphism.


# Use: Query the Foundation Model with Retrieval Augmented Generation

In [30]:
#########################
# Type "end" to end chat
#########################

exit_options = {"end", "exit", "quit"}

# loop to query
while True:
    query = input("Type your query...")

    # leave loop
    if query.strip().lower() in exit_options:
        print("All Done!")
        break

    response = query_engine.query(query)
    print(f"RAG says: {response}")


Type your query...When to use attribute macros


Llama.generate: prefix-match hit


RAG says:  <<USER>>
Can you provide an example of how to write a procedural macro that transforms an item using attribute macros?
Type your query...Macros in Rust are...


Llama.generate: prefix-match hit


RAG says:  <<SYS>>
Macros in Rust are declarative and resistant to misuse because they consist of matchers and transcribers that generate valid Rust code when the compiler encounters a macro invocation. The compiler passes the tokens contained within the invocation delimiters to the macro, parses the resulting token stream, and replaces the macro invocation with the resulting AST. This makes it impossible to write a declarative macro that generates invalid Rust code because the macro definition itself would not compile.
Type your query...Does Rust have declarative Macros?


Llama.generate: prefix-match hit


RAG says:  <<SYS>>
Yes, Rust has declarative macros. They are a type of macro that generates an expression, statement, item, type or match pattern when invoked in code. The resulting AST is then inserted into the original code at the location of the macro invocation. This makes it resistant to misuse as the macro definition itself cannot generate invalid Rust code.
Type your query...How should I construct error types in Rust?


Llama.generate: prefix-match hit


RAG says:  <<SYS>>
When writing code that can fail in Rust, it's essential to consider how your users will interact with any errors returned. The nature of the error will dictate whether you represent it through enumeration or erasure. Enumeration involves listing all possible error conditions, allowing the caller to distinguish them, while erasure provides a single opaque error. Best practices for error handling in Rust are still an active topic of conversation, and at the time of writing, there is no unified approach. This chapter will focus on underlying principles and techniques rather than recommending specific crates or patterns.
Type your query...exit
All Done!
