<a href="https://colab.research.google.com/github/frank-morales2020/Cloud_curious/blob/master/Copy_of_Rag_Fusion_Pipeline_PostgreSQL_Embedding_Mistral.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG Fusion Query Pipeline

This notebook shows how to implement RAG Fusion using the LlamaIndex Query Pipeline syntax.

## Required Dependencies

In [None]:
#added by Frank Morales(FM) 11/01/2024
%pip install openai --root-user-action=ignore -q
!pip install llama_index phoenix pyvis network -q
!pip install llama_hub -q
%pip install colab-env --upgrade --quiet --root-user-action=ignore
#!pip install typing_extensions

!pip install langchain --quiet
!pip install accelerate --quiet
!pip install transformers --quiet
!pip install bitsandbytes --quiet

## Setup / Load Data

We download the Paul Graham essay and save it as pg_essay.txt.

In [18]:
import colab_env
import openai
import os
openai.api_key = os.getenv("OPENAI_API_KEY")
!wget "https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1" -O pg_essay.txt
#!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt' -O pg_essay.txt

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
--2024-05-09 09:32:28--  https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18, 2620:100:6016:18::a27d:112
Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/dl/f6bmb19xdg0xedm/paul_graham_essay.txt [following]
--2024-05-09 09:32:28--  https://www.dropbox.com/s/dl/f6bmb19xdg0xedm/paul_graham_essay.txt
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc04617dec439d553e225cca5a1e.dl.dropboxusercontent.com/cd/0/get/CSjiRqqmIWFC1Dq0yfUX5hqXGU2cRZ03mz5Q0nQwG_yClU3QCaRcSgESgcCpbUSAVY-cnj11y1KzjxEIJt4VgvbQnHJ3sMLhc61po0qmY9CvJy6Rg5gM4Z0vVwvXA9H2MkF25J8x6mKm5SllxnP8r4PQ/file?dl=1# [following]
--2024-05-09 09:32:28--  https://uc0461

# POSTGRESQL

In [None]:
#ADDED By FM 11/01/2024

# install PSQL WITH DEV Libraries AND PGVECTOR
!apt install -y postgresql postgresql-contrib &>log
!service postgresql restart
!sudo apt install -y postgresql-server-dev-all

In [None]:
print()
# PostGRES SQL Settings
%cd /content/
!sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres'"

print('START: PG embedding COMPILATION')
%cd /content/
!git clone https://github.com/neondatabase/pg_embedding.git
%cd /content/pg_embedding
!make
!make install # may need sudo
print('END: PG embedding COMPILATION')
print()

#!sudo -u postgres psql -c "DROP EXTENSION embedding"
# IF NOT EXISTS / IF EXISTS keep this cell from erroring when it is re-run
!sudo -u postgres psql -c "CREATE EXTENSION IF NOT EXISTS embedding"
!sudo -u postgres psql -c "DROP TABLE IF EXISTS documents"
!sudo -u postgres psql -c "CREATE TABLE documents(id integer PRIMARY KEY, embedding real[])"

In [None]:
!pip install llama-index

In [8]:
import llama_index
llama_index

<module 'llama_index' (<_frozen_importlib_external._NamespaceLoader object at 0x7d474fe961a0>)>

In [19]:
#%pip install openai --root-user-action=ignore

openai.api_key = os.getenv("OPENAI_API_KEY")
#print(os.getenv("OPENAI_API_KEY"))

from llama_index.core import SimpleDirectoryReader
reader = SimpleDirectoryReader(input_files=["/content/pg_essay.txt"])
docs = reader.load_data()

In [10]:
#ADDED By FM 11/01/2024

from typing import List, Tuple
from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGEmbedding

loader = TextLoader("/content/pg_essay.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs0 = text_splitter.split_documents(documents)

collection_name0 = "pg_essay"
print(f'# of Document Pages {len(documents)}')
print(f'# of Document Chunks: {len(docs0)}')



# of Document Pages 1
# of Document Chunks: 100
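
The chunk count above comes from splitting the essay into pieces of roughly 1000 characters with no overlap. As a rough illustration only (this is not LangChain's implementation, and `split_into_chunks` is a hypothetical helper), a greedy splitter that packs paragraphs into chunks might look like this:

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole, so a chunk can exceed the limit.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # the piece still fits (or it starts a new chunk)
            current = candidate
        else:
            # the piece would overflow: close the current chunk and start another
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# toy input: ten paragraphs of a little over 300 characters each
sample = "\n\n".join(f"Paragraph {i}: " + "x" * 300 for i in range(10))
chunks = split_into_chunks(sample, chunk_size=1000)
print(len(chunks), [len(c) for c in chunks])
```

With `chunk_overlap=0`, as in the cell above, no text is shared between neighbouring chunks, so joining the chunks with the separator gives back the input.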


## Setup Llama Pack

Next, we download the LlamaPack. All of the code is in the downloaded directory; we encourage you to take a look to see the QueryPipeline syntax!

In [11]:
#!pip install llama_index
import llama_index
llama_index.core.__version__
#llama_index.core.
from llama_index.core.query_pipeline import QueryPipeline
import llama_index.core.query_pipeline as query_pipeline
llama_index.core

<module 'llama_index.core' from '/usr/local/lib/python3.10/dist-packages/llama_index/core/__init__.py'>

In [12]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

In [13]:
from llama_index.core.query_pipeline import QueryPipeline
from llama_index.core import PromptTemplate

# try chaining basic prompts
prompt_str = "Please generate related movies to {movie_name}"
prompt_tmpl = PromptTemplate(prompt_str)
llm = OpenAI(model="gpt-3.5-turbo")

p = QueryPipeline(chain=[prompt_tmpl, llm], verbose=True)

In [14]:
output = p.run(movie_name="The Departed")

> Running module dae74501-257f-4cd7-9eec-0454791d067b with input:
movie_name: The Departed

> Running module 4379b712-221d-4ea6-9464-5deba594f2ac with input:
messages: Please generate related movies to The Departed

In [15]:
print(str(output))

assistant: 1. Infernal Affairs (2002) - The original Hong Kong film that inspired The Departed
2. The Town (2010) - A crime thriller directed by Ben Affleck
3. Goodfellas (1990) - A classic mobster film directed by Martin Scorsese
4. The Godfather (1972) - A legendary crime film directed by Francis Ford Coppola
5. Mystic River (2003) - A drama directed by Clint Eastwood
6. Casino (1995) - A crime film directed by Martin Scorsese
7. American Gangster (2007) - A crime drama directed by Ridley Scott
8. The Irishman (2019) - A crime epic directed by Martin Scorsese
9. Heat (1995) - A crime thriller directed by Michael Mann
10. The Departed (2006) - A crime thriller directed by Martin Scorsese
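
The verbose trace shows the two modules running in sequence: the prompt template fills in `movie_name`, and its output becomes the LLM's input. A minimal sketch of that chaining idea in plain Python, with no LlamaIndex involved (`run_chain`, `format_prompt`, and `fake_llm` are hypothetical stand-ins, not library functions):

```python
from typing import Any, Callable, List


def run_chain(steps: List[Callable[[Any], Any]], first_input: Any) -> Any:
    """Feed the output of each step into the next one, in order."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


def format_prompt(movie_name: str) -> str:
    # plays the role of the prompt template
    return f"Please generate related movies to {movie_name}"


def fake_llm(prompt: str) -> str:
    # stand-in for a real model call; it only echoes the prompt it received
    return f"assistant: (response to '{prompt}')"


output = run_chain([format_prompt, fake_llm], "The Departed")
print(output)
```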


In [None]:
# Option 1: Use `download_llama_pack`
# from llama_index.llama_pack import download_llama_pack

# RAGFusionPipelinePack = download_llama_pack(
#     "RAGFusionPipelinePack",
#     "./rag_fusion_pipeline_pack",
#     # leave the below line commented out if using the notebook on main
#     # llama_hub_url="https://raw.githubusercontent.com/run-llama/llama-hub/jerry/add_query_pipeline_pack/llama_hub"
# )

# Option 2: Import from llama_hub package
from llama_hub.llama_packs.query.rag_fusion_pipeline.base import RAGFusionPipelinePack
from llama_index.llms.openai import OpenAI

In [21]:
from llama_index.core.llama_pack.base import BaseLlamaPack

# Mistral - MODEL

In [22]:
from typing import Dict, Any, List, Optional
from llama_index.core.llama_pack.base import BaseLlamaPack
from llama_index.core.llms.llm import LLM
from llama_index.llms.openai import OpenAI
from llama_index.core import Document, VectorStoreIndex, ServiceContext
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.schema import NodeWithScore
from llama_index.core.node_parser import SentenceSplitter

In [24]:
!pip install -i https://pypi.org/simple/ bitsandbytes -q

In [20]:
#ADDED By FM 11/01/2024
#%pip install colab-env --upgrade --quiet --root-user-action=ignore
#!pip install accelerate

import torch
from textwrap import fill
from IPython.display import Markdown, display

import colab_env
import os

access_token = os.getenv("HF_TOKEN")

from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
    )

from langchain import PromptTemplate
from langchain import HuggingFacePipeline

from langchain.vectorstores import Chroma
from langchain.schema import AIMessage, HumanMessage
from langchain.memory import ConversationBufferMemory
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredMarkdownLoader, UnstructuredURLLoader
from langchain.chains import LLMChain, SimpleSequentialChain, RetrievalQA, ConversationalRetrievalChain
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline
import warnings
warnings.filterwarnings('ignore')

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.1"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto",
    quantization_config=quantization_config
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

#from transformers import AutoTokenizer, MistralForCausalLM

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
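
Loading the model in 4-bit is mainly a memory saving. A back-of-the-envelope sketch, assuming roughly 7 billion parameters (a round figure for illustration, not an exact count). `weight_memory_gb` is a hypothetical helper, and the estimate ignores quantization constants, activations, and the KV cache:

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # assumed round figure, not an exact parameter count

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)
print(f"fp16 weights: ~{fp16_gb:.1f} GB, 4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the fp16 footprint.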

In [21]:
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
generation_config.max_new_tokens = 1024
generation_config.temperature = 0.8
generation_config.top_p = 0.95
generation_config.do_sample = True
generation_config.repetition_penalty = 1.15

# use a distinct name so the imported `pipeline` function is not shadowed
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    generation_config=generation_config,
    pad_token_id=tokenizer.eos_token_id,
)

In [22]:
llm = HuggingFacePipeline(pipeline=text_pipeline)

In [23]:
import warnings
warnings.filterwarnings('ignore')

query = "the capital city of canada?"
result = llm(query)

display(Markdown(f"<b>{query}</b>"))
display(Markdown(f"<p>{result}</p>"))

<b>the capital city of canada?</b>

<p>the capital city of canada?

 Ottawa is a city in eastern Canada. It is the capital city, but it is not the largest city in Canada. Montreal and Vancouver are much larger cities with more people.</p>
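
The answer above is sampled rather than deterministic, because the generation config enables `do_sample` with `temperature = 0.8` and `top_p = 0.95`. The sketch below illustrates what those two settings do to a next-token distribution. It is a simplified illustration with made-up logits, not the Transformers implementation, and `top_p_filter` is a hypothetical helper:

```python
import math
from typing import Dict


def top_p_filter(logits: Dict[str, float], temperature: float = 0.8, top_p: float = 0.95) -> Dict[str, float]:
    """Return the renormalized distribution over the smallest token set whose mass reaches top_p."""
    # temperature scaling followed by a numerically stable softmax
    scaled = {tok: value / temperature for tok, value in logits.items()}
    max_scaled = max(scaled.values())
    exps = {tok: math.exp(value - max_scaled) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # keep the most likely tokens until their cumulative probability reaches top_p
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break

    # renormalize so the kept probabilities sum to 1
    kept_total = sum(kept.values())
    return {tok: p / kept_total for tok, p in kept.items()}


toy_logits = {"Ottawa": 3.0, "Toronto": 2.0, "Montreal": 1.5, "banana": -3.0}  # made-up values
nucleus = top_p_filter(toy_logits)
print(nucleus)
```

A lower temperature sharpens the distribution, and a lower `top_p` cuts more of the unlikely tail before sampling.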

# EMBEDDING

In [None]:
# 20x faster than pgvector: introducing pg_embedding extension for vector search in Postgres and LangChain
# https://neon.tech/blog/pg-embedding-extension-for-vector-search

#ADDED By FM 11/01/2024

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGEmbedding

# https://supabase.com/blog/fewer-dimensions-are-better-pgvector
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')

collection_name='Paul Graham Essay'
connection_string = os.getenv("DATABASE_URL")

db = PGEmbedding.from_documents(
    embedding=embeddings,
    documents=docs0,
    collection_name=collection_name,
    connection_string=connection_string,
)

#db.create_hnsw_index(dims = 1536, m = 8, ef_construction = 16, ef_search = 16)

In [30]:
#ADDED By FM 11/01/2024
query='What did the author do growing up?'
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)

print()
print(query)
print()

for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)


What did the author do growing up?

--------------------------------------------------------------------------------
Score:  0.59927064
What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright f
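
The search above embeds the query and returns the stored chunks whose vectors are closest to it. A brute-force sketch of that idea, assuming Euclidean (L2) distance where lower means closer (the metric `PGEmbedding` actually reports is not confirmed by this notebook). The vectors are toy 2-D values, and `l2_distance` and `nearest_chunks` are hypothetical helpers:

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(
    query_vec: Sequence[float],
    chunk_vecs: List[Sequence[float]],
    k: int = 4,
) -> List[Tuple[int, float]]:
    """Return (chunk index, distance) pairs for the k closest chunks, closest first."""
    scored = [(i, l2_distance(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


toy_chunks = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]]  # made-up embeddings
toy_query = [0.8, 0.6]
top = nearest_chunks(toy_query, toy_chunks, k=2)
print(top)
```

An index such as HNSW (see the commented-out `create_hnsw_index` call) avoids this full scan by searching approximately.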

# RAG FUSION PIPELINE

In [31]:
!pip install llama-index-llms-langchain -q

In [63]:
%cd /content/
from llama_index.core.llama_pack import download_llama_pack

# download and install dependencies
RAGFusionPipelinePack = download_llama_pack(
  "RAGFusionPipelinePack", "./rag_fusion_pipeline_pack"
)

/content


In [33]:
!rm -rf /content/rag_fusion_pipeline_pack

In [70]:
def get_doc_scores_list(search_results_dict):
    """Flatten {query: results} (or a list of (query, results) pairs) into one list of Documents."""
    if isinstance(search_results_dict, dict):
        pairs = search_results_dict.items()
    else:
        pairs = search_results_dict

    all_docs = []
    for query_description, results in pairs:
        # Wrap a single result in a list if it's not already one
        if not isinstance(results, list):
            results = [results]

        # Keep only Document objects
        results = [doc for doc in results if isinstance(doc, Document)]

        # Add a page_content attribute to Document objects that lack one
        for doc in results:
            if not hasattr(doc, 'page_content'):
                doc.page_content = str(doc)

        all_docs.extend(results)

    return all_docs

In [71]:
class HashableDocument:
    def __init__(self, document: Document, score: float):
        self.document = document
        self.score = score

    def __hash__(self):
        # Hash on content only, so objects that compare equal also hash equal
        return hash(self.document.page_content)

    def __eq__(self, other):
        # Check for equality based on the document's content
        return isinstance(other, HashableDocument) and \
               self.document.page_content == other.document.page_content

In [103]:
"""RAG Fusion Pipeline."""

from typing import Any, Dict, List, Optional

from llama_index.core import Document, ServiceContext, VectorStoreIndex
from llama_index.core.llama_pack.base import BaseLlamaPack
from llama_index.core.llms.llm import LLM
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_pipeline.components.argpacks import ArgPackComponent
from llama_index.core.query_pipeline.components.function import FnComponent
from llama_index.core.query_pipeline.components.input import InputComponent
from llama_index.core.query_pipeline.query import QueryPipeline
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.schema import NodeWithScore
from llama_index.llms.openai import OpenAI

DEFAULT_CHUNK_SIZES = [128, 256, 512, 1024]


def reciprocal_rank_fusion(
    results: List[NodeWithScore],
) -> List[NodeWithScore]:
    """Fuse retrieved nodes by how often each text was retrieved.

    NOTE: this is a simplified variant. It expects a flat list of nodes, so
    per-retriever ranks are not available and the 1 / (rank + k) term from
    the original paper (which uses k=60) is not applied:
    https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf
    """
    fused_scores = {}
    text_to_node = {}

    # count how many times each text was retrieved (Frank Morales 09/05/2024)
    for node_with_score in results:
        if not isinstance(node_with_score, NodeWithScore):
            raise TypeError("node_with_score must be a NodeWithScore object.")
        text = node_with_score.node.get_content()
        text_to_node[text] = node_with_score
        if text not in fused_scores:
            fused_scores[text] = 0.0
        fused_scores[text] += 1.0

    # sort results
    reranked_results = dict(
        sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    )

    # adjust node scores
    reranked_nodes: List[NodeWithScore] = []
    for text, score in reranked_results.items():
        reranked_nodes.append(text_to_node[text])
        reranked_nodes[-1].score = score

    return reranked_nodes


class RAGFusionPipelinePack(BaseLlamaPack):
    """RAG Fusion pipeline.

    Create a bunch of vector indexes of different chunk sizes.

    """

    def __init__(
        self,
        documents: List[Document],
        llm: Optional[LLM] = None,
        chunk_sizes: Optional[List[int]] = None,
    ) -> None:
        """Init params."""
        self.documents = documents
        self.chunk_sizes = chunk_sizes or DEFAULT_CHUNK_SIZES

        # construct index
        self.llm = llm or OpenAI(model="gpt-3.5-turbo")

        self.query_engines = []
        self.retrievers = {}
        for chunk_size in self.chunk_sizes:
            splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=0)
            nodes = splitter.get_nodes_from_documents(documents)

            service_context = ServiceContext.from_defaults(llm=self.llm)
            vector_index = VectorStoreIndex(nodes, service_context=service_context)
            self.query_engines.append(vector_index.as_query_engine())

            self.retrievers[str(chunk_size)] = vector_index.as_retriever()

        # define rerank component
        rerank_component = FnComponent(fn=reciprocal_rank_fusion)

        # construct query pipeline
        p = QueryPipeline()
        module_dict = {
            **self.retrievers,
            "input": InputComponent(),
            "summarizer": TreeSummarize(),
            # NOTE: Join args
            "join": ArgPackComponent(),
            "reranker": rerank_component,
        }
        p.add_modules(module_dict)
        # add links from input to retriever (id'ed by chunk_size)
        for chunk_size in self.chunk_sizes:
            p.add_link("input", str(chunk_size))
            p.add_link(str(chunk_size), "join", dest_key=str(chunk_size))
        p.add_link("join", "reranker")
        p.add_link("input", "summarizer", dest_key="query_str")
        p.add_link("reranker", "summarizer", dest_key="nodes")

        self.query_pipeline = p

    def get_modules(self) -> Dict[str, Any]:
        """Get modules."""
        return {
            "llm": self.llm,
            "retrievers": self.retrievers,
            "query_engines": self.query_engines,
            "query_pipeline": self.query_pipeline,
        }

    def run(self, *args: Any, **kwargs: Any) -> Any:
        """Run the pipeline."""
        return self.query_pipeline.run(*args, **kwargs)
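
For reference, the rank-based formula from the paper cited in `reciprocal_rank_fusion` scores each document as the sum, over all result lists, of 1 / (k + rank). A minimal sketch, assuming each retriever returns its own ranked list, that ranks start at 1, and that plain strings stand in for `NodeWithScore` objects (`rrf_scores` is a hypothetical name, not part of the pack):

```python
from typing import Dict, List


def rrf_scores(ranked_lists: List[List[str]], k: float = 60.0) -> Dict[str, float]:
    """Score each document as the sum of 1 / (k + rank) over every list it appears in."""
    scores: Dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# toy results from three retrievers (for example, three chunk sizes)
toy_results = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "e"],
]
fused = rrf_scores(toy_results)
print(list(fused))
```

Documents that rank well in several lists rise to the top, while `k` damps the influence of any single high rank.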


In [100]:
pack = RAGFusionPipelinePack(docs, llm)
#from langchain.docstore.document import Document
#/usr/local/lib/python3.10/dist-packages/langchain_core/documents/base.py
query0="What did the author do growing up?"
response0 = pack.run(query=query0)

#('score', 0.8230329654285323

In [101]:
print(response0)

The author, growing up, worked on writing short stories and programming. They wrote simple games, a program to predict rocket heights, and a word processor on a TRS-80 computer. Additionally, they tried programming on an IBM 1401 in 9th grade using an early version of Fortran.


# Examples of Queries

In [102]:
#modify By FM 11/01/2024

#response = pack.run(query="What did the author do growing up?")
query0="What did the author do growing up?"
query='I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering.'
query1 = "Who is the President of the USA?"
query2 = "Who won the baseball World Series in 2020? and Who Lost"
query3 = 'Anything about FORTRAN'
query4 = 'Anything about LIPS'
query5 = 'Anything about Python'


response0 = pack.run(query=query0)
response1 = pack.run(query=query1)
response2 = pack.run(query=query2)
response4 = pack.run(query=query4)

print()
print(query0)
print(str(response0))
print()

print()
print(query1)
print(str(response1))
print()

print()
print(query2)
print(str(response2))
print()

print()
print(query4)
print(str(response4))
print()


What did the author do growing up?
The author, growing up, worked on writing short stories and programming. They wrote simple games, a program to predict rocket heights, and a word processor on a TRS-80 computer. Additionally, they took philosophy courses in college before switching to studying AI due to their interest sparked by a novel and a PBS documentary featuring intelligent computers.


Who is the President of the USA?
I am unable to provide real-time information or updates on current events.


Who won the baseball World Series in 2020? and Who Lost
The Los Angeles Dodgers won the baseball World Series in 2020, and the Tampa Bay Rays were the team that lost.


Anything about LIPS
LISP, a programming language, is discussed in the provided context. It is highlighted that LISP was initially created as a formal model of computation, distinct from a typical programming language. The core of LISP involves writing an interpreter within itself, which was a unique feature. LISP's elegan

In [None]:
#response.source_nodes