<a href="https://colab.research.google.com/github/frank-morales2020/MITDevOps/blob/master/Copy_of_rag_fusion_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG Fusion Query Pipeline

<a href="https://colab.research.google.com/github/run-llama/llama-hub/blob/main/llama_hub/llama_packs/query/rag_fusion_pipeline/rag_fusion_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook shows how to implement RAG Fusion using the LlamaIndex Query Pipeline syntax.

## Setup / Load Data

We download Paul Graham's essay and save it as `pg_essay.txt`.

In [1]:
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O pg_essay.txt

--2024-01-11 20:18:07--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘pg_essay.txt’


2024-01-11 20:18:07 (48.8 MB/s) - ‘pg_essay.txt’ saved [75042/75042]



In [None]:
#added by Frank Morales(FM) 11/01/2024
!pip install openai  --root-user-action=ignore
!pip install llama_index arize-phoenix pyvis networkx
!pip install llama_hub
!pip install colab-env --upgrade --quiet --root-user-action=ignore
!pip install accelerate

from llama_index import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_files=["pg_essay.txt"])
docs = reader.load_data()

### [Optional] Setup Tracing

We can also set up tracing through Arize Phoenix to inspect the outputs. The cell below is commented out; uncomment it to enable tracing.

In [3]:
#import phoenix as px

#px.launch_app()
#import llama_index

#llama_index.set_global_handler("arize_phoenix")

## Setup Llama Pack

Next we download the LlamaPack. All the code is in the downloaded directory - we encourage you to take a look to see the QueryPipeline syntax!

In [4]:
# Option 1: Use `download_llama_pack`
# from llama_index.llama_pack import download_llama_pack

# RAGFusionPipelinePack = download_llama_pack(
#     "RAGFusionPipelinePack",
#     "./rag_fusion_pipeline_pack",
#     # leave the below line commented out if using the notebook on main
#     # llama_hub_url="https://raw.githubusercontent.com/run-llama/llama-hub/jerry/add_query_pipeline_pack/llama_hub"
# )

# Option 2: Import from llama_hub package
from llama_hub.llama_packs.query.rag_fusion_pipeline.base import RAGFusionPipelinePack
from llama_index.llms import OpenAI

In [None]:
!pip install langchain --quiet
!pip install accelerate --quiet
!pip install transformers --quiet
!pip install bitsandbytes --quiet

In [None]:
#ADDED By FM 11/01/2024
#%pip install colab-env --upgrade --quiet --root-user-action=ignore
#!pip install accelerate

import torch
from textwrap import fill
from IPython.display import Markdown, display

import colab_env
import os

access_token = os.getenv("HF_TOKEN")

from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
    )

from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFacePipeline

from langchain.vectorstores import Chroma
from langchain.schema import AIMessage, HumanMessage
from langchain.memory import ConversationBufferMemory
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredMarkdownLoader, UnstructuredURLLoader
from langchain.chains import LLMChain, SimpleSequentialChain, RetrievalQA, ConversationalRetrievalChain
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline
import warnings
warnings.filterwarnings('ignore')

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.1"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto",
    quantization_config=quantization_config
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

#from transformers import AutoTokenizer, MistralForCausalLM

In [7]:
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
generation_config.max_new_tokens = 1024
generation_config.temperature = 0.8
generation_config.top_p = 0.95
generation_config.do_sample = True
generation_config.repetition_penalty = 1.15

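# Use a distinct name so the imported `pipeline` factory is not shadowed
# (re-running this cell would otherwise fail).
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    generation_config=generation_config,
    pad_token_id=tokenizer.eos_token_id,
)

In [8]:
llm = HuggingFacePipeline(pipeline=text_pipeline)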

In [23]:
import warnings
warnings.filterwarnings('ignore')

# Alternative query (not run): "How has AWS evolved?"
query = "Who won the baseball World Series in 2020? and Who Lost"
result = llm(query)

display(Markdown(f"<b>{query}</b>"))
display(Markdown(f"<p>{result}</p>"))

<b>Who won the baseball World Series in 2020? and Who Lost</b>

<p> the MLB World Series?
Answer: The Los Angeles Dodgers won the 2020 baseball World Series, while the Tampa Bay Rays lost it.</p>

In [15]:
import colab_env
import openai
import os
openai.api_key = os.getenv("OPENAI_API_KEY")
#pack = RAGFusionPipelinePack(docs, llm=OpenAI(model="gpt-3.5-turbo"))

### MISTRAL
pack = RAGFusionPipelinePack(docs, llm)

## Inspecting the Code

If we look at how the pack is set up (in your downloaded directory), we see the following code, which uses the QueryPipeline syntax.

`retrievers` is a dictionary mapping a chunk size to retrievers (chunk sizes: 128, 256, 512, 1024).

```python
# construct query pipeline
p = QueryPipeline()
module_dict = {
    **self.retrievers,
    "input": InputComponent(),
    "summarizer": TreeSummarize(),
    # NOTE: Join args
    "join": ArgPackComponent(),
    "reranker": rerank_component,
}
p.add_modules(module_dict)
# add links from input to retriever (id'ed by chunk_size)
for chunk_size in self.chunk_sizes:
    p.add_link("input", str(chunk_size))
    p.add_link(str(chunk_size), "join", dest_key=str(chunk_size))
p.add_link("join", "reranker")
p.add_link("input", "summarizer", dest_key="query_str")
p.add_link("reranker", "summarizer", dest_key="nodes")
```
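
The `reranker` module receives the joined results of every retriever and has to merge them into a single ranking. In RAG Fusion this merge is commonly done with reciprocal rank fusion (RRF). The following is a minimal sketch in plain Python, not the pack's actual `rerank_component`: the function name, the toy document ids, and the constant `k=60` are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60.0):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) per list, with rank starting at 1.
    Hypothetical helper for illustration; the pack's reranker may differ.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for position, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + position + 1)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Toy results keyed by chunk size, mirroring the keys passed to "join"
results_by_chunk_size = {
    "128": ["a", "b", "c"],
    "256": ["b", "a", "d"],
    "512": ["b", "c", "a"],
}

fused = reciprocal_rank_fusion(results_by_chunk_size.values())
print(fused)
```

Documents that rank well across several chunk sizes rise to the top, even if no single retriever placed them first.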

We visualize the DAG below.

In [17]:
from pyvis.network import Network

#"cdn_resources is not in ['in_line','remote','local']."
#net = Network(notebook=True, cdn_resources="in_line", directed=True)
#net.from_nx(pack.query_pipeline.dag)
#net.show("rag_dag.html")
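
With the pyvis rendering commented out, the shape of the DAG can still be read off the `add_link` calls shown earlier. The sketch below lists the edges in plain Python, assuming the four chunk sizes mentioned above; `build_edges` is a hypothetical helper and does not touch the pack or `pack.query_pipeline`.

```python
def build_edges(chunk_sizes):
    """Return (source, destination) pairs mirroring the pipeline's links."""
    edges = []
    for chunk_size in chunk_sizes:
        name = str(chunk_size)  # retrievers are keyed by chunk size
        edges.append(("input", name))  # the query fans out to each retriever
        edges.append((name, "join"))  # each retriever feeds the join step
    edges.append(("join", "reranker"))
    edges.append(("input", "summarizer"))  # the query string
    edges.append(("reranker", "summarizer"))  # the reranked nodes
    return edges


edges = build_edges([128, 256, 512, 1024])
for source, destination in edges:
    print(f"{source} -> {destination}")
```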

In [18]:
# Modified by FM 11/01/2024

# response = pack.run(query="What did the author do growing up?")
query0 = "What did the author do growing up?"
# Arithmetic prompt: defined here but not run below
query = 'I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering.'
query1 = "Who is the President of the USA?"
query2 = "Who won the baseball World Series in 2020? and Who Lost"
query3 = 'Anything about FORTRAN'  # not run below
query4 = 'Anything about LIPS'  # "LIPS" is a typo for "Lisp", kept as originally run
query5 = 'Anything about Python'  # not run below

response0 = pack.run(query=query0)
response1 = pack.run(query=query1)
response2 = pack.run(query=query2)
response4 = pack.run(query=query4)

print()
print(query0)
print(str(response0))
print()

print()
print(query1)
print(str(response1))
print()

print()
print(query2)
print(str(response2))
print()

print()
print(query4)
print(str(response4))
print()


What did the author do growing up?
The author wrote short stories and tried programming on an IBM 1401 computer during their school days. They later got a microcomputer and started programming on it, writing simple games and a word processor. They also expressed an interest in studying philosophy in college but ended up switching to AI.


Who is the President of the USA?
I'm sorry, but I cannot answer that question based on the given context information.


Who won the baseball World Series in 2020? and Who Lost
I'm sorry, but I cannot answer that query based on the given context information.


Anything about LIPS
Lisp is a programming language that was originally intended as a formal model of computation, an alternative to the Turing machine. It was invented by John McCarthy and later became a programming language in the ordinary sense. Lisp has a core that is defined by writing an interpreter in itself. McCarthy's original Lisp interpreter could only interpret Lisp expressions and la

In [14]:
# response0.source_nodes