<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/Rag_Fusion_Langchain_Llamaindex_PostgreSQL_Mistral.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG Fusion Query Pipeline

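This notebook shows how to implement RAG Fusion using the LlamaIndex Query Pipeline syntax. Here, fusion means retrieving from several vector indexes built with different chunk sizes (128, 256, 512 and 1024 tokens by default) and merging their rankings with reciprocal rank fusion before the retrieved text is summarized. The pipeline pack is built once with a 4-bit quantized Mistral-7B-Instruct model loaded from the Hugging Face Hub and once with Mistral's API (`open-mixtral-8x22b`). Along the way, the essay is also stored in PostgreSQL through LangChain's PGEmbedding vector store.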

In [1]:
!nvidia-smi

Sun Jun  2 03:31:15 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   43C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

Required Dependencies

In [None]:
#added by Frank Morales(FM) 11/01/2024
%pip install openai  --root-user-action=ignore -q
!pip install llama_index phoenix pyvis network -q
!pip install llama_hub -q
%pip install colab-env --upgrade --quiet --root-user-action=ignore
!pip install accelerate -q
#!pip install typing_extensions

!pip install langchain --quiet
!pip install transformers --quiet
!pip install bitsandbytes --quiet

### llama index components
!pip install llama-index-llms-langchain -q
%pip install llama-index-llms-fireworks -q

## Mistral API components
!pip install mistralai --quiet
!pip install -U langchain-core langchain-mistralai -q

## Setup / Load Data

We download Paul Graham's essay and save it as `pg_essay.txt`.

In [None]:
import colab_env
import openai
import os
openai.api_key = os.getenv("OPENAI_API_KEY")
!wget "https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1" -O pg_essay.txt

Load the essay with LlamaIndex's `SimpleDirectoryReader`:

In [6]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_files=["/content/pg_essay.txt"])
docs = reader.load_data()

# POSTGRESQL

Installing and configuring PostgreSQL 14 on Ubuntu (reference guide):

https://www.atlantic.net/dedicated-server-hosting/how-to-install-and-configure-postgres-14-on-ubuntu/

In [None]:
#ADDED By FM 01/06/2024
!apt-get update -y
!apt-get install postgresql-14 -y

!service postgresql restart
# the PostgreSQL server headers are needed to compile pg_embedding below
!sudo apt-get install -y postgresql-server-dev-all

#apt-get -y install postgresql

In [8]:
print()
# PostgreSQL settings: set a password for the postgres user
%cd /content/
!sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres'"

print('START: PG embedding COMPILATION')
%cd /content/
!git clone https://github.com/neondatabase/pg_embedding.git
%cd /content/pg_embedding
!make
!make install # may need sudo
print('END: PG embedding COMPILATION')
print()

#!sudo -u postgres psql -c "DROP EXTENSION embedding"
!sudo -u postgres psql -c "CREATE EXTENSION embedding"
#!sudo -u postgres psql -c "DROP TABLE documents"
!sudo -u postgres psql -c "CREATE TABLE documents(id integer PRIMARY KEY, embedding real[])"


/content
ALTER ROLE
START: PG embedding COMPILATION
/content
Cloning into 'pg_embedding'...
remote: Enumerating objects: 553, done.
remote: Counting objects: 100% (183/183), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 553 (delta 140), reused 135 (delta 106), pack-reused 370
Receiving objects: 100% (553/553), 270.29 KiB | 16.89 MiB/s, done.
Resolving deltas: 100% (317/317), done.
/content/pg_embedding
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -Ofast -fPIC -I. -I./ -I/usr/include/postgresql/14/server -I/usr/include/postgresql/internal  -Wdate-time -D

# LangChain

In [None]:
!pip install -U langchain-community -q

In [10]:
#ADDED By FM 11/01/2024

from typing import List, Tuple
from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGEmbedding

loader = TextLoader("/content/pg_essay.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs0 = text_splitter.split_documents(documents)

collection_name0 = "pg_essay"
print(f'# of Document Pages {len(documents)}')
print(f'# of Document Chunks: {len(docs0)}')



# of Document Pages 1
# of Document Chunks: 100
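`CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)` splits the text on a single separator (a blank line, `"\n\n"`, by default) and then merges consecutive pieces into chunks of up to `chunk_size` characters. This is how the one essay above becomes 100 chunks. The function below is a hypothetical, simplified sketch of that separator-then-merge idea (`split_and_merge` is not a LangChain API); it ignores overlap and the other options of the real splitter.

```python
from typing import List


def split_and_merge(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Hypothetical, simplified separator-then-merge splitter (not a LangChain API)."""
    # 1) split on the separator and drop empty pieces
    pieces = [p.strip() for p in text.split(separator) if p.strip()]

    # 2) greedily pack consecutive pieces into chunks of at most chunk_size characters
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for piece in pieces:
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
            extra = len(piece)
        # a single piece longer than chunk_size still becomes its own chunk
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


text = "aaaa\n\nbbbb\n\ncccccccccc\n\ndd"
for chunk in split_and_merge(text, chunk_size=10):
    print(repr(chunk))
```

Small paragraphs are packed together until the next one would overflow the limit, which is why chunk lengths vary rather than all being exactly `chunk_size`.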


# LlamaIndex

## Set Up the Llama Pack



In [11]:
#!pip install llama_index
import llama_index
print('LLAMA INDEX VERSION: %s'%llama_index.core.__version__)
#llama_index.core.
from llama_index.core.query_pipeline import QueryPipeline
import llama_index.core.query_pipeline as query_pipeline
#llama_index.core

LLAMA INDEX VERSION: 0.10.42


In [12]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

In [13]:
from llama_index.core.query_pipeline import QueryPipeline
from llama_index.core import PromptTemplate

# try chaining basic prompts
prompt_str = "Please generate related movies to {movie_name}"
prompt_tmpl = PromptTemplate(prompt_str)
llm = OpenAI(model="gpt-3.5-turbo")

p = QueryPipeline(chain=[prompt_tmpl, llm], verbose=True)

In [14]:
output = p.run(movie_name="The Departed")

> Running module 78d61d99-6e93-4285-aa82-2b925f1175a3 with input:
movie_name: The Departed

> Running module d406439c-0f61-4923-95bf-e4d503a16e91 with input:
messages: Please generate related movies to The Departed


In [15]:
print(str(output))

assistant: 1. Infernal Affairs (2002) - The original Hong Kong film that inspired The Departed
2. The Town (2010) - A crime thriller directed by Ben Affleck
3. Mystic River (2003) - A crime drama directed by Clint Eastwood
4. Goodfellas (1990) - A classic crime film directed by Martin Scorsese
5. The Irishman (2019) - Another crime film directed by Martin Scorsese, starring Robert De Niro and Al Pacino
6. Heat (1995) - A crime thriller directed by Michael Mann, starring Al Pacino and Robert De Niro
7. The Departed (2006) - A Hong Kong crime thriller directed by Andrew Lau and Alan Mak
8. The Godfather (1972) - A classic crime film directed by Francis Ford Coppola
9. Casino (1995) - A crime film directed by Martin Scorsese, starring Robert De Niro and Joe Pesci
10. American Gangster (2007) - A crime film directed by Ridley Scott, starring Denzel Washington and Russell Crowe.
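`QueryPipeline(chain=[prompt_tmpl, llm])` runs its modules in order and feeds each module's output into the next: the template is filled with `movie_name`, and the resulting prompt is sent to the LLM, as the verbose log above shows. The dependency-free sketch below mimics that sequential behaviour with plain callables; `run_chain`, `format_prompt` and `fake_llm` are hypothetical stand-ins, not LlamaIndex APIs.

```python
from typing import Any, Callable, Dict, List


def run_chain(modules: List[Callable[[Any], Any]], **kwargs: Any) -> Any:
    """Hypothetical sequential chain: the first module receives the keyword
    inputs as a dict, every later module receives the previous output."""
    output: Any = kwargs
    for module in modules:
        output = module(output)
    return output


def format_prompt(inputs: Dict[str, Any]) -> str:
    # plays the role of PromptTemplate: fill the template from the inputs
    return "Please generate related movies to {movie_name}".format(**inputs)


def fake_llm(prompt: str) -> str:
    # stand-in for the real LLM call; it only echoes the prompt it received
    return f"assistant: (answer to '{prompt}')"


print(run_chain([format_prompt, fake_llm], movie_name="The Departed"))
```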


In [16]:
# Option 1: Use `download_llama_pack`
# from llama_index.core.llama_pack import download_llama_pack

# RAGFusionPipelinePack = download_llama_pack(
#     "RAGFusionPipelinePack",
#     "./rag_fusion_pipeline_pack",
#     # leave the below line commented out if using the notebook on main
#     # llama_hub_url="https://raw.githubusercontent.com/run-llama/llama-hub/jerry/add_query_pipeline_pack/llama_hub"
# )

# Option 2: Import from llama_hub package
#from llama_hub.llama_packs.query.rag_fusion_pipeline.base import RAGFusionPipelinePack
#from llama_index.llms.openai import OpenAI

# Option 3 (used in this notebook): define RAGFusionPipelinePack inline, see the RAG FUSION PIPELINE section

# Mistral - MODEL - Hugging Face Hub

In [17]:
from typing import Dict, Any, List, Optional
from llama_index.core.llama_pack.base import BaseLlamaPack
from llama_index.core.llms.llm import LLM
from llama_index.llms.openai import OpenAI
from llama_index.core import Document, VectorStoreIndex, ServiceContext
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.schema import NodeWithScore
from llama_index.core.node_parser import SentenceSplitter

In [None]:
#ADDED By FM 11/01/2024
#%pip install colab-env --upgrade --quiet --root-user-action=ignore
#!pip install accelerate

import torch
from textwrap import fill
from IPython.display import Markdown, display

import colab_env
import os

access_token = os.getenv("HF_TOKEN")

from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
    )

from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFacePipeline

from langchain.vectorstores import Chroma
from langchain.schema import AIMessage, HumanMessage
from langchain.memory import ConversationBufferMemory
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredMarkdownLoader, UnstructuredURLLoader
from langchain.chains import LLMChain, SimpleSequentialChain, RetrievalQA, ConversationalRetrievalChain
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline
import warnings
warnings.filterwarnings('ignore')

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.1"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto",
    quantization_config=quantization_config
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

#from transformers import AutoTokenizer, MistralForCausalLM

In [19]:
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
generation_config.max_new_tokens = 1024
generation_config.temperature = 0.8
generation_config.top_p = 0.95
generation_config.do_sample = True
generation_config.repetition_penalty = 1.15

# a distinct name keeps transformers.pipeline usable if this cell is re-run
text_generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    generation_config=generation_config,
    pad_token_id=tokenizer.eos_token_id,
)

In [20]:
llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

In [21]:
import warnings
warnings.filterwarnings('ignore')

query = "the capital city of canada?"
result = llm.invoke(query)  # calling llm(query) directly is deprecated in LangChain

#display(Markdown(f"<b>{query}</b>"))
#display(Markdown(f"<p>{result}</p>"))
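The generation settings defined earlier (`do_sample=True`, `temperature=0.8`, `top_p=0.95`) control how each next token is drawn. As a rough illustration, the sketch below applies temperature scaling and then nucleus (top-p) filtering to a toy list of logits. It is a simplified, hypothetical re-implementation (the helper names are made up), not the `transformers` code, which works on batched tensors; `repetition_penalty` is left out.

```python
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    # dividing by temperature < 1 sharpens the distribution, > 1 flattens it
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_indices(probs: List[float], top_p: float) -> List[int]:
    """Smallest set of most likely token indices whose total probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_next_token(logits: List[float], temperature: float = 0.8, top_p: float = 0.95,
                      rng: Optional[random.Random] = None) -> int:
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    kept = nucleus_indices(probs, top_p)
    # draw from the kept tokens, renormalised over their combined probability
    r = rng.random() * sum(probs[i] for i in kept)
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]


logits = [math.log(p) for p in [0.6, 0.3, 0.08, 0.02]]
print(nucleus_indices(softmax_with_temperature(logits, 1.0), top_p=0.95))
print(sample_next_token(logits, rng=random.Random(0)))
```

With these toy probabilities the least likely token never survives the top-p cut, so it can never be sampled, while the remaining three keep a chance proportional to their probability.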

# EMBEDDING with OPENAI and LangChain

In [22]:
# 20x faster than pgvector: introducing pg_embedding extension for vector search in Postgres and LangChain
# https://neon.tech/blog/pg-embedding-extension-for-vector-search

#ADDED By FM 11/01/2024

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGEmbedding

# https://supabase.com/blog/fewer-dimensions-are-better-pgvector
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')

collection_name='Paul Graham Essay'
connection_string = os.getenv("DATABASE_URL")

db = PGEmbedding.from_documents(
    embedding=embeddings,
    documents=docs0,
    collection_name=collection_name,
    connection_string=connection_string,
)

#db.create_hnsw_index(dims = 1536, m = 8, ef_construction = 16, ef_search = 16)
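The cell above reads `connection_string` from the `DATABASE_URL` environment variable, presumably supplied through `colab_env`. If it is not set, one way to point `PGEmbedding` at the local server installed earlier is sketched below; run it before the `PGEmbedding.from_documents` cell. Only the `postgres`/`postgres` credentials come from the `ALTER USER` command above. The host, the default port 5432, the `postgres` database, the `postgresql+psycopg2://` scheme and the `build_database_url` helper are illustrative assumptions.

```python
import os
from urllib.parse import quote_plus, urlparse


def build_database_url(user: str = "postgres", password: str = "postgres",
                       host: str = "localhost", port: int = 5432,
                       database: str = "postgres") -> str:
    """Hypothetical helper: assemble a SQLAlchemy-style PostgreSQL URL."""
    # percent-encode the password so characters such as '@' or '/' cannot break the URL
    return f"postgresql+psycopg2://{user}:{quote_plus(password)}@{host}:{port}/{database}"


# keep any DATABASE_URL that colab_env or the shell already provided
os.environ.setdefault("DATABASE_URL", build_database_url())

parsed = urlparse(build_database_url())
print(parsed.scheme, parsed.hostname, parsed.port, parsed.path)
```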

In [23]:
#ADDED By FM 11/01/2024
query='What did the author do growing up?'
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)

print()
print(query)
print()

for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)


What did the author do growing up?

--------------------------------------------------------------------------------
Score:  0.5991553
What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fl
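If the scores printed above are Euclidean (L2) distances, as returned by pg_embedding's `<->` operator (an assumption about how `PGEmbedding` computes them), then lower means more similar. OpenAI documents that `text-embedding-ada-002` embeddings are normalized to length 1, and for unit vectors the squared distance and the cosine similarity are tied by $\lVert a-b\rVert^2 = 2 - 2\cos(a,b)$. The sketch checks that identity on toy vectors and converts the top score above under the same assumption.

```python
import math
from typing import List


def normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def l2_distance(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_from_l2(distance: float) -> float:
    # valid only for unit-length vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
    return 1.0 - distance ** 2 / 2.0


a = normalize([1.0, 2.0, 2.0])
b = normalize([2.0, 1.0, 2.0])
cos_direct = sum(x * y for x, y in zip(a, b))
print(round(cos_direct, 4), round(cosine_from_l2(l2_distance(a, b)), 4))

# the top score printed above, read as an L2 distance between unit vectors
print(round(cosine_from_l2(0.5991553), 3))
```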

# RAG FUSION PIPELINE

In [24]:
"""RAG Fusion Pipeline."""

from typing import Any, Dict, List, Optional

from llama_index.core import Document, ServiceContext, VectorStoreIndex
from llama_index.core.llama_pack.base import BaseLlamaPack
from llama_index.core.llms.llm import LLM
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_pipeline.components.argpacks import ArgPackComponent
from llama_index.core.query_pipeline.components.function import FnComponent
from llama_index.core.query_pipeline.components.input import InputComponent
from llama_index.core.query_pipeline.query import QueryPipeline
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.schema import NodeWithScore
from llama_index.llms.openai import OpenAI

DEFAULT_CHUNK_SIZES = [128, 256, 512, 1024]


def reciprocal_rank_fusion(
    results: List[Any],
) -> List[NodeWithScore]:
    """Apply reciprocal rank fusion.

    The original paper uses k=60 for best results:
    https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf

    `results` is meant to hold one ranked list per retriever. With
    llama-index 0.10.42 the `join` ArgPackComponent appears to deliver the
    retriever outputs already concatenated into one flat list; iterating a
    single NodeWithScore then yields pydantic (field, value) tuples, which is
    what raised "AttributeError: 'tuple' object has no attribute 'score'" in
    the original nested loop. A flat list is therefore treated as one ranked list.
    """
    k = 60.0  # `k` is a parameter used to control the impact of outlier rankings.
    fused_scores: Dict[str, float] = {}
    text_to_node: Dict[str, NodeWithScore] = {}

    # accept either a flat list (current ArgPackComponent output) or nested lists
    if results and isinstance(results[0], NodeWithScore):
        results = [results]

    # compute reciprocal rank scores (flat-list fix by Frank Morales 09/05/2024)
    for nodes_with_scores in results:
        for node_with_score in nodes_with_scores:
            if not isinstance(node_with_score, NodeWithScore):
                raise TypeError("node_with_score must be a NodeWithScore object.")
        # rank each list by retrieval score, best first, starting at rank 1
        ranked = sorted(nodes_with_scores, key=lambda x: x.score or 0.0, reverse=True)
        for rank, node_with_score in enumerate(ranked, start=1):
            text = node_with_score.node.get_content()
            text_to_node[text] = node_with_score
            fused_scores[text] = fused_scores.get(text, 0.0) + 1.0 / (rank + k)

    # sort by fused score and write that score back onto each node
    reranked_nodes: List[NodeWithScore] = []
    for text, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True):
        node_with_score = text_to_node[text]
        node_with_score.score = score
        reranked_nodes.append(node_with_score)

    return reranked_nodes


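class RAGFusionPipelinePack(BaseLlamaPack):
    """RAG Fusion pipeline.

    Builds one vector index per chunk size, retrieves from all of them for the
    same query, fuses the results with reciprocal rank fusion and summarizes
    the fused nodes with TreeSummarize.
    """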

    def __init__(
        self,
        documents: List[Document],
        llm: Optional[LLM] = None,
        chunk_sizes: Optional[List[int]] = None,
    ) -> None:
        """Init params."""
        self.documents = documents
        self.chunk_sizes = chunk_sizes or DEFAULT_CHUNK_SIZES

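        # construct index
        # resolve_llm wraps LangChain models (HuggingFacePipeline, ChatMistralAI)
        # in LlamaIndex's LangChainLLM adapter (llama-index-llms-langchain)
        from llama_index.core.llms.utils import resolve_llm

        self.llm = resolve_llm(llm) if llm is not None else OpenAI(model="gpt-3.5-turbo")

        self.query_engines = []
        self.retrievers = {}
        for chunk_size in self.chunk_sizes:
            splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=0)
            nodes = splitter.get_nodes_from_documents(documents)

            service_context = ServiceContext.from_defaults(llm=self.llm)
            vector_index = VectorStoreIndex(nodes, service_context=service_context)
            self.query_engines.append(vector_index.as_query_engine())

            self.retrievers[str(chunk_size)] = vector_index.as_retriever()

        # define rerank component
        rerank_component = FnComponent(fn=reciprocal_rank_fusion)

        # construct query pipeline
        p = QueryPipeline()
        module_dict = {
            **self.retrievers,
            "input": InputComponent(),
            # pass the pack's LLM explicitly: TreeSummarize() without arguments
            # falls back to the global default LLM (OpenAI), not the LLM given here
            "summarizer": TreeSummarize(llm=self.llm),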
            # NOTE: Join args
            "join": ArgPackComponent(),
            "reranker": rerank_component,
        }
        p.add_modules(module_dict)
        # add links from input to retriever (id'ed by chunk_size)
        for chunk_size in self.chunk_sizes:
            p.add_link("input", str(chunk_size))
            p.add_link(str(chunk_size), "join", dest_key=str(chunk_size))
        p.add_link("join", "reranker")
        p.add_link("input", "summarizer", dest_key="query_str")
        p.add_link("reranker", "summarizer", dest_key="nodes")

        self.query_pipeline = p

    def get_modules(self) -> Dict[str, Any]:
        """Get modules."""
        return {
            "llm": self.llm,
            "retrievers": self.retrievers,
            "query_engines": self.query_engines,
            "query_pipeline": self.query_pipeline,
        }

    def run(self, *args: Any, **kwargs: Any) -> Any:
        """Run the pipeline."""
        return self.query_pipeline.run(*args, **kwargs)
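Reciprocal rank fusion gives each document the score $\sum_{r \in R} \frac{1}{k + r(d)}$, summed over the ranked lists $R$ in which document $d$ appears at rank $r(d)$, with $k = 60$ as in the paper cited in the docstring. The dependency-free sketch below applies that formula to plain document IDs instead of `NodeWithScore` objects, with ranks starting at 1; `rrf` and the toy lists are illustrative, not part of LlamaIndex.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rrf(ranked_lists: List[List[str]], k: float = 60.0) -> List[Tuple[str, float]]:
    """Fuse several best-first rankings of document IDs with reciprocal rank fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # highest fused score first; sorted() is stable, so ties keep first-seen order
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# hypothetical top hits from three retrievers (e.g. chunk sizes 128, 256 and 512)
fused = rrf([["a", "b", "c"], ["b", "a", "d"], ["b", "c"]])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

`b`, ranked first by two of the three lists, ends up on top, while `d`, seen once at rank 3, comes last. The notebook's `reciprocal_rank_fusion` applies the same $1/(k + \text{rank})$ scoring to `NodeWithScore` objects, keyed by node text.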


https://docs.llamaindex.ai/en/stable/examples/llm/fireworks/

In [25]:
pack = RAGFusionPipelinePack(docs, llm)
query0="What did the author do growing up?"
response0 = pack.run(query=query0)

In [26]:
print(response0)

The author, growing up, worked on writing short stories and programming. They wrote simple games, a program to predict rocket heights, and a word processor. They started programming on a TRS-80 computer and later transitioned to microcomputers, which allowed for more interactive programming experiences compared to the earlier punch card systems. Additionally, the author initially planned to study philosophy in college but switched to AI due to their interest sparked by a novel and a documentary showcasing intelligent computers.


# MISTRAL - MODEL - WITH API

In [27]:
import mistralai
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage
import os
import colab_env
import json

api_key = os.environ["MISTRAL_API_KEY"]
client = MistralClient(api_key=api_key)

In [28]:
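# Mistral API model names, kept for reference (mistral-embed is an embedding model, not a chat model)
#open-mistral-7b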
#mistral-tiny-2312
#mistral-tiny
#open-mixtral-8x7b
#open-mixtral-8x22b
#open-mixtral-8x22b-2404
#mistral-small-2312

#mistral-small
#mistral-small-2402
#mistral-small-latest
#mistral-medium-latest
#mistral-medium-2312
#mistral-medium
#mistral-large-latest
#mistral-large-2402
#codestral-2405
#codestral-latest
#mistral-embed

In [29]:
from langchain_core.messages import HumanMessage
from langchain_mistralai.chat_models import ChatMistralAI

# If api_key is not passed, default behavior is to use the `MISTRAL_API_KEY` environment variable.
llm_mistral = ChatMistralAI(api_key=api_key,model_name='open-mixtral-8x22b')

messages = [HumanMessage(content="knock knock")]
response=llm_mistral.invoke(messages)
print(response.content)

Who's there? Lettuce. Lettuce who? Lettuce in, it's too cold out here!


In [30]:
pack_mistral = RAGFusionPipelinePack(docs, llm_mistral)
query0="What did the author do growing up?"
response0 = pack_mistral.run(query=query0)

In [31]:
print(response0)

The author, growing up, worked on writing short stories and programming. They wrote simple games, a program to predict rocket heights, and a word processor. They started programming on a TRS-80 computer and later switched to studying AI in college due to their interest sparked by a novel and a PBS documentary.


# Examples of Queries

In [32]:
#modified by FM 11/01/2024

#response = pack.run(query="What did the author do growing up?")
query0="What did the author do growing up?"
query='I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering.'
query1 = "Who is the President of the USA?"
query2 = "Who is the best poet of CANADA?"
#query2 = "Who won the baseball World Series in 2023? and Who Lost"
query3 = 'Anything about FORTRAN'
query4 = 'Anything about LIPS'
query5 = 'Anything about Python'


response0 = pack_mistral.run(query=query0)
response1 = pack_mistral.run(query=query1)
response2 = pack_mistral.run(query=query2)
response4 = pack_mistral.run(query=query4)

print()
print(query0)
print(str(response0))
print()

print()
print(query1)
print(str(response1))
print()

print()
print(query2)
print(str(response2))
print()

print()
print(query4)
print(str(response4))
print()


What did the author do growing up?
The author, growing up, worked on writing short stories and programming. They wrote simple games, a program to predict rocket heights, and a word processor on a TRS-80 computer. Additionally, they took philosophy courses in college before switching to AI due to their interest sparked by a novel and a PBS documentary.


Who is the President of the USA?
I am unable to provide an answer to that question based on the context information provided.


Who is the best poet of CANADA?
I cannot provide an answer to the query as the information provided does not mention or relate to any specific poet from Canada.


Anything about LIPS
LISP, a programming language, is discussed in the provided context. It is highlighted that LISP was initially designed as a formal model of computation, distinct from traditional programming languages. The core concept of LISP involves defining a language by writing an interpreter in itself. This unique approach gave LISP a power 