<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/GPT4_Rag_Fusion_LlamaIndex_Pipeline_PostgreSQL_Embedding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG Fusion Query Pipeline

<a href="https://colab.research.google.com/github/run-llama/llama-hub/blob/main/llama_hub/llama_packs/query/rag_fusion_pipeline/rag_fusion_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook shows how to implement RAG Fusion using the LlamaIndex Query Pipeline syntax.
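
RAG Fusion generally works by generating several rewrites of the user query, retrieving a ranked list of chunks for each rewrite, and merging those lists with reciprocal rank fusion (RRF). The sketch below illustrates only the fusion step in plain Python. It is not the pack's actual code: the function name `reciprocal_rank_fusion`, the constant `k=60`, and the toy document IDs are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(
    ranked_lists: List[List[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (assumed here).
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy rankings, one per generated query (hypothetical document IDs).
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

A document that ranks well across several query rewrites rises to the top even if no single rewrite ranks it first every time, which is the property the fused retrieval relies on.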

Required Dependencies

In [None]:
#added by Frank Morales(FM) 22/02/2024
%pip install openai  --root-user-action=ignore
!pip install llama_index phoenix pyvis network
!pip install llama_hub
%pip install colab-env --upgrade --quiet --root-user-action=ignore
#!pip install typing_extensions

!pip install langchain --quiet
!pip install accelerate --quiet
!pip install transformers --quiet
!pip install bitsandbytes --quiet

## Setup / Load Data

We download the Paul Graham essay and save it as pg_essay.txt.

In [2]:
import colab_env
import openai
import os
openai.api_key = os.getenv("OPENAI_API_KEY")

!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O pg_essay.txt

Mounted at /content/gdrive
--2024-02-22 14:31:44--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘pg_essay.txt’


2024-02-22 14:31:45 (1.50 MB/s) - ‘pg_essay.txt’ saved [75042/75042]



# POSTGRESQL

In [None]:
#ADDED By FM 11/01/2024

# Install PostgreSQL with the development libraries needed to build pg_embedding
!apt install -y postgresql postgresql-contrib &>log
!service postgresql restart
!sudo apt install -y postgresql-server-dev-all

In [None]:
print()
# PostgreSQL settings
%cd /content/
!sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres'"

print('START: PG embedding COMPILATION')
%cd /content/
!git clone https://github.com/neondatabase/pg_embedding.git
%cd /content/pg_embedding
!make
!make install # may need sudo
print('END: PG embedding COMPILATION')
print()

#!sudo -u postgres psql -c "DROP EXTENSION embedding"
!sudo -u postgres psql -c "CREATE EXTENSION IF NOT EXISTS embedding"
!sudo -u postgres psql -c "DROP TABLE IF EXISTS documents"
!sudo -u postgres psql -c "CREATE TABLE documents(id integer PRIMARY KEY, embedding real[])"

In [5]:
#!pip install llama-index
import llama_index.core.readers as readers
import os
import openai

reader = readers.SimpleDirectoryReader(input_files=["/content/pg_essay.txt"])
docs = reader.load_data()

openai.api_key = os.getenv("OPENAI_API_KEY")

In [6]:
#%pip install openai --root-user-action=ignore
import openai
#%pip install llama_index
import colab_env
import os

openai.api_key = os.getenv("OPENAI_API_KEY")
#print(os.getenv("OPENAI_API_KEY"))


In [7]:
#ADDED By FM 22/02/2024

from typing import List, Tuple
from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGEmbedding

loader = TextLoader("/content/pg_essay.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs0 = text_splitter.split_documents(documents)

collection_name0 = "pg_essay"
print(f'# of Document Pages {len(documents)}')
print(f'# of Document Chunks: {len(docs0)}')



# of Document Pages 1
# of Document Chunks: 100
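
The single loaded document becomes many chunks because the splitter breaks the text on a separator and then packs the pieces into chunks of roughly `chunk_size` characters. The sketch below approximates that idea in plain Python for the zero-overlap case. It is an illustration under stated assumptions, not LangChain's implementation: the helper name `split_on_paragraphs`, the blank-line separator, and the synthetic sample text are all hypothetical.

```python
from typing import List


def split_on_paragraphs(
    text: str, chunk_size: int = 1000, separator: str = "\n\n"
) -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut mid-paragraph.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        # Cost of appending this piece, including a separator if the chunk is non-empty.
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            # The current chunk is full: emit it and start a new one with this piece.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks


# Synthetic text: ten 52-character paragraphs separated by blank lines.
sample = "\n\n".join(f"Paragraph {i} " + "x" * 40 for i in range(10))
chunks = split_on_paragraphs(sample, chunk_size=120)
print(len(chunks), [len(c) for c in chunks])
```

With `chunk_overlap=0`, as in the cell above, no text is shared between neighbouring chunks, so joining the chunks with the separator reproduces the cleaned input.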


## Setup Llama Pack

Next we download the LlamaPack. All the code is in the downloaded directory - we encourage you to take a look to see the QueryPipeline syntax!

In [None]:
!git clone https://github.com/run-llama/llama_index.git

In [None]:


%cd /content/llama_index/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai
#%pip install -e .

# Only the OpenAI LLM wrapper is needed from this module.
from llama_index.llms.openai.base import OpenAI

llm = OpenAI(model="gpt-4")


# RAG FUSION PIPELINE

In [10]:
import llama_index.core.readers as readers

# Reload the essay so this cell can run on its own.
reader = readers.SimpleDirectoryReader(input_files=["/content/pg_essay.txt"])
docs = reader.load_data()

from llama_index.core.llama_pack import download_llama_pack

# download and install dependencies
RAGFusionPipelinePack = download_llama_pack(
    "RAGFusionPipelinePack", "./rag_fusion_pipeline_pack"
)


#Please provide a valid OpenAI model name in: gpt-4, gpt-4-32k, gpt-4-1106-preview,
#gpt-4-vision-preview, gpt-4-0613, gpt-4-32k-0613, gpt-4-0314, gpt-4-32k-0314,
#gpt-3.5-turbo, gpt-3.5-turbo-16k, gpt-3.5-turbo-1106, gpt-3.5-turbo-0613,
#gpt-3.5-turbo-16k-0613, gpt-3.5-turbo-0301, text-davinci-003, text-davinci-002,
#gpt-3.5-turbo-instruct, text-ada-001, text-babbage-001, text-curie-001,
#ada, babbage, curie, davinci, gpt-35-turbo-16k, gpt-35-turbo, gpt-35-turbo-1106,
#gpt-35-turbo-0613, gpt-35-turbo-16k-0613

#### OPENAI MODELS ########
#pack = RAGFusionPipelinePack(docs, llm=OpenAI(model="gpt-3.5-turbo")) ### ORIGINAL
#pack = RAGFusionPipelinePack(docs, llm=OpenAI(model="gpt-4-1106-preview"))
#pack = RAGFusionPipelinePack(docs, llm=OpenAI(model="gpt-4-vision-preview"))
pack = RAGFusionPipelinePack(docs, llm=OpenAI(model="gpt-4"))

query0="What did the author do growing up?"
response0 = pack.run(query=query0)
print(response0)

The author, growing up, worked on writing short stories and programming. They started writing short stories before college, focusing on characters with strong feelings rather than intricate plots. In terms of programming, they began by writing programs on an IBM 1401 using an early version of Fortran. Later on, they transitioned to working with microcomputers, specifically a TRS-80, where they wrote simple games and a word processor.


# GPT-4 - MODEL

In [11]:
def gpt_reponse(query):
    """Send one user query to GPT-4 and return the raw chat completion.

    Relies on the global `client`, which is created in the next cell,
    so run that cell before calling this function.
    """
    response = client.chat.completions.create(
        model="gpt-4",
        #model="gpt-3.5-turbo"
        #response_format={ "type": "json_object" },
        messages=[
            #{"role": "system", "content": "You are a helpful assistant designed to output JSON."},
            {"role": "system", "content": "You are a helpful assistant designed to output text."},
            {"role": "user", "content": query}
        ]
    )

    return response

In [12]:
import warnings
warnings.filterwarnings('ignore')

import colab_env
import openai
import os
openai.api_key = os.getenv("OPENAI_API_KEY")


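# Alias the client class so it does not shadow the LlamaIndex `OpenAI` LLM imported earlier.
from openai import OpenAI as OpenAIClient
client = OpenAIClient()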

response=gpt_reponse("Who won the world series in 2020?")
print(response.choices[0].message.content)

The Los Angeles Dodgers won the World Series in 2020.


In [13]:
response=gpt_reponse("How AWS has evolved?")
print(response.choices[0].message.content)


Amazon Web Services, or AWS, has evolved significantly since its inception in 2006. Here are the key points in its evolution:

1. Initially launched as a business unit offering cloud storage (S3) and computation power (EC2), AWS was a pioneer in introducing a new model of business computing.

2. Over the next several years, AWS gradually expanded its services to cover almost all aspects of computing and data management. This includes launching databases (DynamoDB and RDS), content delivery (CloudFront), and networking (VPC).

3. AWS further differentiated itself by launching platform as a service (PaaS) offerings like Elastic Beanstalk for application deployment and Lambda for serverless computing.

4. AWS also evolved to offer Artificial Intelligence and Machine Learning services with the introduction of Amazon SageMaker, Recognition, Polly, and Lex.

5. In an effort to penetrate every layer of business technology, AWS moved into IoT services, launched Amazon Connect for call center a

# EMBEDDING

In [14]:
# 20x faster than pgvector: introducing pg_embedding extension for vector search in Postgres and LangChain
# https://neon.tech/blog/pg-embedding-extension-for-vector-search

#ADDED By FM 22/02/2024

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGEmbedding

# https://supabase.com/blog/fewer-dimensions-are-better-pgvector
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')

collection_name='Paul Graham Essay'
connection_string = os.getenv("DATABASE_URL")

db = PGEmbedding.from_documents(
    embedding=embeddings,
    documents=docs0,
    collection_name=collection_name,
    connection_string=connection_string,
)

#db.create_hnsw_index(dims = 1536, m = 8, ef_construction = 16, ef_search = 16)

In [15]:
#ADDED By FM 22/02/2024
query='What did the author do growing up?'
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)

print()
print(query)
print()

for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)


What did the author do growing up?

--------------------------------------------------------------------------------
Score:  0.5986952
What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fl
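
The `Score` printed for each chunk is best read as a distance, not a similarity. This assumes that `PGEmbedding` ranks results by Euclidean (L2) distance, so smaller values mean closer matches. If the embeddings are also unit-length, L2 distance and cosine similarity are tied together by `cos = 1 - d**2 / 2`. The sketch below checks that identity on toy 3-dimensional vectors; the vectors and helper names are made up for illustration and nothing here queries the database.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for a query embedding and a chunk embedding (hypothetical values).
query_vec = normalize([0.2, 0.9, 0.4])
chunk_vec = normalize([0.1, 0.8, 0.6])

dist = l2_distance(query_vec, chunk_vec)
cos = cosine_similarity(query_vec, chunk_vec)
print(f"L2 distance:       {dist:.4f}")
print(f"cosine similarity: {cos:.4f}")
print(f"1 - dist**2 / 2:   {1 - dist ** 2 / 2:.4f}")
```

Under those two assumptions, the first score above (0.5986952) would correspond to a cosine similarity of roughly 0.82.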

# Examples of Queries

In [16]:
#Modified by FM 22/02/2024

#response = pack.run(query="What did the author do growing up?")
query0="What did the author do growing up?"
query='I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering.'
query1 = "Who is the President of the USA?"
query2 = "Who won the baseball World Series in 2020? and Who Lost"
query3 = 'Anything about FORTRAN'
query4 = 'Anything about LIPS'
query5 = 'Anything about Python'

# Run the selected queries through the RAG Fusion pack, then print each query with its answer.
queries = [query0, query1, query2, query4]
responses = [pack.run(query=q) for q in queries]
response0, response1, response2, response4 = responses

for q, r in zip(queries, responses):
    print()
    print(q)
    print(str(r))
    print()


What did the author do growing up?
The author, growing up, worked on writing short stories and programming. They wrote short stories that lacked plot but focused on characters with strong feelings. In terms of programming, they started by writing programs on an IBM 1401 using an early version of Fortran. Later on, they transitioned to working with microcomputers, specifically a TRS-80, where they wrote simple games, a rocket prediction program, and a word processor.


Who is the President of the USA?
I am unable to provide the current President of the USA as it is not mentioned in the provided context information.


Who won the baseball World Series in 2020? and Who Lost
I cannot provide the answer to who won the baseball World Series in 2020 and who lost based on the context information provided.


Anything about LIPS
LISP, a programming language, is discussed in the provided context. The language is noted for its association with AI and its unique characteristics. It was initially i

In [17]:
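# Inspect the retrieved chunks behind a pack answer (`response` now holds the GPT-4 chat completion, which has no source nodes).
#response0.source_nodes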