<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/PGEmbeddingEmbedding_T4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dependencies

In [None]:
#Install Libraries to access Google Drive and OpenAI resources.
%pip install colab-env --upgrade --quiet --root-user-action=ignore
%pip install openai==0.28  --root-user-action=ignore
%pip install langchain
%pip install "unstructured[all-docs]"
%pip install tiktoken
%pip install -q -U sentence-transformers

import colab_env

# Documents loader

This notebook uses Postgres with the pg_embedding extension as a vector store.

pg_embedding uses a sequential scan by default, but you can create an HNSW index with the create_hnsw_index method.
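
To make the difference concrete, here is a minimal pure-Python sketch of what a sequential scan amounts to: every row is compared with the query vector and the closest ones are kept. The function name `sequential_scan` and the toy rows are illustrative assumptions, not part of pg_embedding or LangChain.

```python
import heapq
import math


def sequential_scan(rows, query, k=1):
    """Return the ids of the k rows closest to `query` by Euclidean distance.

    Every row is visited once, which is what a sequential scan does.
    An HNSW index avoids the full pass by walking a graph of neighbours,
    trading exactness for speed.
    """
    def l2(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, query)))

    nearest = heapq.nsmallest(k, rows, key=lambda row: l2(row[1]))
    return [row_id for row_id, _ in nearest]


# Toy (id, embedding) rows, for illustration only.
rows = [(1, [0.0, 1.0, 2.0]), (2, [1.0, 2.0, 3.0]), (3, [1.0, 1.0, 1.0])]
top = sequential_scan(rows, [3.0, 3.0, 3.0], k=2)
print(top)  # [2, 3]
```

The cost grows linearly with the number of rows, which is why an approximate index becomes attractive for larger collections.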



# State of the Union

In [None]:
%pip install langchain
%pip install "unstructured[all-docs]"
%pip install tiktoken

## Loading Environment Variables
from typing import List, Tuple

from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGEmbedding
#import getpass

%cd /content/
!git clone https://github.com/hwchase17/chat-your-data.git

from langchain.document_loaders import UnstructuredFileLoader

#loader = UnstructuredFileLoader("/content/chat-your-data/state_of_the_union.txt")
loader = TextLoader("/content/chat-your-data/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs0 = text_splitter.split_documents(documents)

collection_name0 = "state_of_the_union"
print(f'# of Document Pages {len(documents)}')
print(f'# of Document Chunks: {len(docs0)}')

# AWS Documents

In [3]:
#!rm -rf /content/*.pdf
!mkdir -p /content/data/
%cd /content/data/

from urllib.request import urlretrieve
urls = [
    'https://s2.q4cdn.com/299287126/files/doc_financials/2023/ar/2022-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/ar/2021-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/ar/Amazon-2020-Shareholder-Letter-and-1997-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/ar/2019-Shareholder-Letter.pdf'
]

filenames = [
    'AMZN-2022-Shareholder-Letter.pdf',
    'AMZN-2021-Shareholder-Letter.pdf',
    'AMZN-2020-Shareholder-Letter.pdf',
    'AMZN-2019-Shareholder-Letter.pdf'
]

metadata = [
    dict(year=2022, source=filenames[0]),
    dict(year=2021, source=filenames[1]),
    dict(year=2020, source=filenames[2]),
    dict(year=2019, source=filenames[3])]

data_root = "/content/data/"

for idx, url in enumerate(urls):
    file_path = data_root + filenames[idx]
    #print(file_path)
    urlretrieve(url, file_path)

/content/data


In [4]:
from pypdf import PdfReader, PdfWriter
import glob

local_pdfs = glob.glob(data_root + '*.pdf')

for local_pdf in local_pdfs:
    pdf_reader = PdfReader(local_pdf)
    pdf_writer = PdfWriter()
    for pagenum in range(len(pdf_reader.pages)-3):
        page = pdf_reader.pages[pagenum]
        pdf_writer.add_page(page)

    with open(local_pdf, 'wb') as new_file:
        new_file.seek(0)
        pdf_writer.write(new_file)
        new_file.truncate()

In [5]:
import numpy as np
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

#%cd /content/data

documents = []

for idx, file in enumerate(filenames):
    loader = PyPDFLoader(data_root + file)
    document = loader.load()
    for document_fragment in document:
        document_fragment.metadata = metadata[idx]

    documents += document

# In our testing, character-based splitting works better with this PDF data set.
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size = 512,
    chunk_overlap  = 100,
)

docs = text_splitter.split_documents(documents)

print(f'# of Document Pages {len(documents)}')
print(f'# of Document Chunks: {len(docs)}')

# of Document Pages 25
# of Document Chunks: 299
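
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified sliding-window splitter. This is only a sketch under stated assumptions: `RecursiveCharacterTextSplitter` additionally tries to break at separators such as paragraph and line boundaries, which this hypothetical `sliding_window_chunks` helper does not attempt.

```python
def sliding_window_chunks(text, chunk_size=512, chunk_overlap=100):
    """Split `text` into fixed-size character windows that overlap.

    Consecutive chunks share `chunk_overlap` characters, so a sentence cut
    at a chunk boundary still appears whole in one of the two chunks
    (as long as it is shorter than the overlap).
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# A 1200-character sample whose characters vary, so the overlap is checkable.
sample = "".join(str(i % 10) for i in range(1200))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])  # [512, 512, 376]
```

A larger overlap produces more chunks (and more embedding calls) for the same text, so the two parameters are usually tuned together.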


# PostgreSQL and PG embedding with OpenAI

In [6]:
# Install PostgreSQL with its development libraries, then build pg_embedding
!apt install postgresql postgresql-contrib &>log
!service postgresql restart
!sudo apt install postgresql-server-dev-all

!cp -pr /content/gdrive/MyDrive/tools/pg_embedding /content/
%cd /content/pg_embedding/
print()
print('START: PG embedding COMPILATION')
!make
!make install # may need sudo
print('END: PG embedding COMPILATION')
print()

 * Restarting PostgreSQL 14 database server
   ...done.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  binfmt-support libffi-dev libpfm4 libz3-4 libz3-dev llvm-14 llvm-14-dev
  llvm-14-runtime llvm-14-tools postgresql-server-dev-14 python3-pygments
  python3-yaml
Suggested packages:
  llvm-14-doc python-pygments-doc ttf-bitstream-vera
The following NEW packages will be installed:
  binfmt-support libffi-dev libpfm4 libz3-4 libz3-dev llvm-14 llvm-14-dev
  llvm-14-runtime llvm-14-tools postgresql-server-dev-14
  postgresql-server-dev-all python3-pygments python3-yaml
0 upgraded, 13 newly installed, 0 to remove and 24 not upgraded.
Need to get 59.8 MB of archives.
After this operation, 361 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 python3-yaml amd64 5.4.1-1ubuntu1 [129 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 bi

In [7]:
import psycopg2 as ps

# PostgreSQL settings
%cd /content/
!sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres'"

#!sudo -u postgres psql -c "DROP EXTENSION embedding"
!sudo -u postgres psql -c "CREATE EXTENSION embedding"

!sudo -u postgres psql -c "DROP TABLE documents"
!sudo -u postgres psql -c "CREATE TABLE documents(id integer PRIMARY KEY, embedding real[])"

h="{0,1,2}"
hh= "INSERT INTO documents(id, embedding) VALUES (1,'%s'), (2,'{1,2,3}'),  (3,'{1,1,1}')"%h
print(hh)

def insert_document(id, embedding):
    # Insert one (id, embedding) row into the documents table.
    DB_NAME = "postgres"
    DB_USER = "postgres"
    DB_PASS = "postgres"
    DB_HOST = "localhost"
    DB_PORT = "5432"
    conn = ps.connect(database=DB_NAME,
                      user=DB_USER,
                      password=DB_PASS,
                      host=DB_HOST,
                      port=DB_PORT)

    cur = conn.cursor() # creating a cursor

    # Let psycopg2 bind the values instead of formatting them into the SQL string.
    cur.execute(
        "INSERT INTO documents (id, embedding) VALUES (%s, %s)",
        (id, embedding))

    conn.commit()
    print("INSERT EMBEDDING %s successfully" % embedding)
    # Close the cursor before the connection that owns it.
    cur.close()
    conn.close()


insert_document(1,'{0,1,2}')
insert_document(2,"{1,2,3}")
insert_document(3,"{1,1,1}")

!sudo -u postgres psql -c "CREATE INDEX ON documents USING hnsw(embedding) WITH (dims=3, m=3, efconstruction=5, efsearch=5)"
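# Note: SET only applies to the psql session that runs it, so this command
# does not affect the psycopg2 connections opened below.
!sudo -u postgres psql -c "SET enable_seqscan = off"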

ARRAY = [3, 3, 3]

def select_document(HNSW_index):
    # HNSW_index is the distance operator to use: '<->', '<=>', or '<~>'.
    if HNSW_index not in ('<->', '<=>', '<~>'):
        raise ValueError("unsupported distance operator: %s" % HNSW_index)

    DB_NAME = "postgres"
    DB_USER = "postgres"
    DB_PASS = "postgres"
    DB_HOST = "localhost"
    DB_PORT = "5432"
    conn = ps.connect(database=DB_NAME,
                      user=DB_USER,
                      password=DB_PASS,
                      host=DB_HOST,
                      port=DB_PORT)

    cur = conn.cursor() # creating a cursor

    # An operator cannot be passed as a bound parameter, so it is checked
    # against the allowed list above before being formatted into the query.
    cur.execute("""
    SELECT id FROM documents
    ORDER BY embedding %s ARRAY[%s,%s,%s] LIMIT 1
    """ % (HNSW_index, ARRAY[0], ARRAY[1], ARRAY[2]))

    print(cur.fetchone())
    # Close the cursor before the connection that owns it.
    cur.close()
    conn.close()

# The <->, <=>, and <~> operators select the distance metric (Euclidean, cosine, and Manhattan, respectively) used to compare the query vector with each row of the dataset.
select_document('<->')
select_document('<=>')
select_document('<~>')

/content
ALTER ROLE
CREATE EXTENSION
ERROR:  table "documents" does not exist
CREATE TABLE
INSERT INTO documents(id, embedding) VALUES (1,'{0,1,2}'), (2,'{1,2,3}'),  (3,'{1,1,1}')
INSERT EMBEDDING {0,1,2} successfully
INSERT EMBEDDING {1,2,3} successfully
INSERT EMBEDDING {1,1,1} successfully
CREATE INDEX
SET
(2,)
(3,)
(2,)
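
The three results above can be checked by hand. The following NumPy sketch assumes the operators correspond to Euclidean (`<->`), cosine (`<=>`), and Manhattan (`<~>`) distance, and recomputes the nearest row for the query `[3, 3, 3]` over the same three toy embeddings. The helper names are illustrative, not part of pg_embedding.

```python
import numpy as np


def l2(a, b):
    return float(np.linalg.norm(a - b))


def cosine_distance(a, b):
    return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def manhattan(a, b):
    return float(np.abs(a - b).sum())


# Same rows as the documents table in the cell above.
rows = {
    1: np.array([0.0, 1.0, 2.0]),
    2: np.array([1.0, 2.0, 3.0]),
    3: np.array([1.0, 1.0, 1.0]),
}
query = np.array([3.0, 3.0, 3.0])


def nearest(metric):
    """Id of the row with the smallest distance to the query."""
    return min(rows, key=lambda row_id: metric(rows[row_id], query))


results = {
    "<->": nearest(l2),
    "<=>": nearest(cosine_distance),
    "<~>": nearest(manhattan),
}
print(results)  # {'<->': 2, '<=>': 3, '<~>': 2}
```

Row 3 wins under cosine distance because `[1, 1, 1]` points in exactly the same direction as `[3, 3, 3]`, even though it is farther away in Euclidean and Manhattan terms.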


In [8]:
%cd /content/

%pip install colab-env --upgrade --quiet --root-user-action=ignore
%pip install openai==0.28  --root-user-action=ignore
%pip install langchain

import colab_env
import os
from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGEmbedding
import openai

connection_string = os.getenv("DATABASE_URL")
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')
collection_name = "AWS"

from langchain.vectorstores import PGEmbedding

db = PGEmbedding.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=collection_name,
    connection_string=connection_string,
)

/content


In [9]:
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

collection_name='AWS'

#Install Libraries to access Google Drive and OpenAI resources.
#%pip install colab-env --upgrade --quiet --root-user-action=ignore
#%pip install openai==0.28  --root-user-action=ignore
#%pip install langchain
#%pip install "unstructured[all-docs]"
#%pip install tiktoken
#!pip install -q -U sentence-transformers

from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGEmbedding

#%pip install colab-env
import colab_env
#import openai

connection_string = os.getenv("DATABASE_URL")
#openai.api_key = os.getenv("OPENAI_API_KEY")

#!pip install tiktoken
%cd /content/

# https://supabase.com/blog/fewer-dimensions-are-better-pgvector
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')

#https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

db = PGEmbedding.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=collection_name,
    connection_string=connection_string,
)

#del query
#query = "What did the president say about Ketanji Brown Jackson"
query = "How has AWS evolved?"
print(query)
print()

results_with_scores = db.similarity_search_with_score(query)
for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}\nMetadata: {doc.metadata}\nScore: {score}\n\n")

/content
How has AWS evolved?

Content: customersmuch more functionality in AWS than they can find anywhere else (which is a significant differentiator), butalso allowed us to arrive at the much more game-changing offering that AWS is today.
Metadata: {'year': 2021, 'source': 'AMZN-2021-Shareholder-Letter.pdf'}
Score: 0.52011997


Content: customersmuch more functionality in AWS than they can find anywhere else (which is a significant differentiator), butalso allowed us to arrive at the much more game-changing offering that AWS is today.
Metadata: {'year': 2021, 'source': 'AMZN-2021-Shareholder-Letter.pdf'}
Score: 0.5201462


Content: in AWS. Our new customer pipeline is robust, as are our active migrations. Many companies usediscontinuous periods like this to step back and determine what they strategically want to change, and wefind an increasing number of enterprises opting out of managing their own infrastructure, and preferring tomove to AWS to enjoy the agility, innovation, cost-e

In [10]:
filter={"year": 2022}

results_with_scores = db.similarity_search_with_score(query,filter=filter)

for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}\nMetadata: {doc.metadata}\nScore: {score}\n\n")

Content: in AWS. Our new customer pipeline is robust, as are our active migrations. Many companies usediscontinuous periods like this to step back and determine what they strategically want to change, and wefind an increasing number of enterprises opting out of managing their own infrastructure, and preferring tomove to AWS to enjoy the agility, innovation, cost-efficiency, and security benefits. And most importantlyfor customers, AWS continues to deliver new capabilities rapidly (over 3,300 new features and
Metadata: {'year': 2022, 'source': 'AMZN-2022-Shareholder-Letter.pdf'}
Score: 0.5205088


Content: in AWS. Our new customer pipeline is robust, as are our active migrations. Many companies usediscontinuous periods like this to step back and determine what they strategically want to change, and wefind an increasing number of enterprises opting out of managing their own infrastructure, and preferring tomove to AWS to enjoy the agility, innovation, cost-efficiency, and security benefi

In [11]:
db = PGEmbedding.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=collection_name,
    connection_string=connection_string,
    pre_delete_collection=False,
)

# https://github.com/langchain-ai/langchain/issues/10454

import sqlalchemy

dims = 1536
m = 8
ef_construction = 16
ef_search = 16

create_index_query = sqlalchemy.text(
        "CREATE INDEX IF NOT EXISTS langchain_pg_embedding_idx "
        "ON langchain_pg_embedding USING hnsw (embedding) "
        "WITH ("
        "dims = {}, "
        "m = {}, "
        "efconstruction = {}, "
        "efsearch = {}"
        ");".format(dims, m, ef_construction, ef_search)
    )
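
Because the index options are interpolated with `str.format`, a value that is accidentally a tuple (for example, one created by a trailing comma) would end up in the SQL as `(8,)`. A minimal sketch of a hypothetical helper, `hnsw_index_sql`, that validates the options before building the statement; it only returns a string and does not execute anything against the database.

```python
def hnsw_index_sql(table, column, dims, m, ef_construction, ef_search,
                   index_name=None):
    """Build a CREATE INDEX statement for an HNSW index.

    Every option must be a positive integer; anything else (a tuple, a
    string, a bool) is rejected before it can reach the SQL text.
    """
    options = {
        "dims": dims,
        "m": m,
        "efconstruction": ef_construction,
        "efsearch": ef_search,
    }
    for name, value in options.items():
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")

    index_name = index_name or f"{table}_idx"
    with_clause = ", ".join(f"{key} = {value}" for key, value in options.items())
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name} "
        f"ON {table} USING hnsw ({column}) WITH ({with_clause});"
    )


sql = hnsw_index_sql(
    "langchain_pg_embedding", "embedding",
    dims=1536, m=8, ef_construction=16, ef_search=16,
)
print(sql)
```

The table and column names are formatted in directly, so this sketch assumes they come from trusted code rather than user input.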

In [12]:
!sudo -u postgres psql -c "CREATE INDEX ON documents USING hnsw(embedding) WITH (dims=3, m=8, efconstruction=16, efsearch=16)"

CREATE INDEX


In [13]:
store = PGEmbedding(
    connection_string=connection_string,
    embedding_function=embeddings,
    collection_name=collection_name,
)

retriever = store.as_retriever()
retriever


db1 = PGEmbedding.from_existing_index(
    embedding=embeddings,
    collection_name=collection_name,
    pre_delete_collection=False,
    connection_string=connection_string,
)

#del query
#query = "What did the president say about Ketanji Brown Jackson"
#query = "What did the president say about AWS"
#query = "How has AWS evolved?"
#query = "Amazon inventions"

docs_with_score: List[Tuple[Document, float]] = db1.similarity_search_with_score(query)

print(query)
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
#VectorStoreRetriever(vectorstore=<langchain.vectorstores.pghnsw.HNSWVectoreStore object at 0x121d3c8b0>, search_type='similarity', search_kwargs={})

How has AWS evolved?
--------------------------------------------------------------------------------
Score:  0.52011997
customersmuch more functionality in AWS than they can find anywhere else (which is a significant differentiator), butalso allowed us to arrive at the much more game-changing offering that AWS is today.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score:  0.5201462
customersmuch more functionality in AWS than they can find anywhere else (which is a significant differentiator), butalso allowed us to arrive at the much more game-changing offering that AWS is today.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score:  0.5201462
customersmuch more functionality in AWS than they can find anywhere else (which is a significant differentiator), b

In [14]:
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

print(connection_string)
engine = create_engine(os.getenv("DATABASE_URL"))
#!ls /usr/share/postgresql/14/extension/*control*

postgresql://postgres:postgres@localhost:5432/postgres


In [15]:
# https://towardsdatascience.com/4-ways-of-question-answering-in-langchain-188c6707cc5a

from langchain.chains import RetrievalQA
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

from langchain.llms import OpenAI

# load document
#from langchain.document_loaders import PyPDFLoader
#loader = PyPDFLoader("materials/example.pdf")
#documents = loader.load()

# split the documents into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# select which embeddings we want to use
embeddings = OpenAIEmbeddings()

# create the vector store to use as the index
#db = Chroma.from_documents(texts, embeddings)

# expose this index in a retriever interface
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k":2})
print(retriever)

#del qa
# create a chain to answer questions
qa = RetrievalQA.from_chain_type(
     llm=OpenAI(), chain_type="stuff", retriever=retriever, return_source_documents=True)


query = "How AWS has evolved?"
#query = "How many AI publications in 2022?"
result = qa({"query": query})
print()
print(result['result'])
print()
#print(result['source_documents'])

tags=['PGEmbedding', 'OpenAIEmbeddings'] vectorstore=<langchain_community.vectorstores.pgembedding.PGEmbedding object at 0x7d399ed65de0> search_kwargs={'k': 2}

 It has evolved by offering much more functionality than other platforms and becoming a game-changing offering.
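
The `chain_type="stuff"` option places all retrieved chunks into a single prompt. A rough sketch of that idea follows; the template text and the `stuff_prompt` helper are hypothetical and are not LangChain's actual prompt or API.

```python
# Hypothetical template, for illustration only.
DEFAULT_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def stuff_prompt(question, documents, template=DEFAULT_TEMPLATE):
    """Concatenate every retrieved chunk into one context block.

    This is the whole idea behind "stuffing": no summarising or ranking
    step, just all chunks placed in front of the question.
    """
    context = "\n\n".join(documents)
    return template.format(context=context, question=question)


retrieved = [
    "AWS continues to deliver new capabilities rapidly.",
    "Many enterprises are moving their infrastructure to AWS.",
]
prompt = stuff_prompt("How has AWS evolved?", retrieved)
print(prompt)
```

Since every chunk is included verbatim, the retriever's `k` and the chunk size together determine how much of the model's context window the prompt consumes.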



# LLM Definitions with Mistral-7B

In [16]:
#https://platform.openai.com/docs/guides/text-generation

!pip install gradio --quiet
!pip install xformers --quiet
!pip install chromadb --quiet
!pip install langchain --quiet
!pip install accelerate --quiet
!pip install transformers --quiet
!pip install bitsandbytes --quiet
!pip install unstructured --quiet
!pip install sentence-transformers --quiet
!pip install pypdf

%pip install openai==0.28  --root-user-action=ignore
%pip install tiktoken


In [17]:
import torch
from textwrap import fill
from IPython.display import Markdown, display

from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
    )

from langchain import PromptTemplate
from langchain import HuggingFacePipeline

from langchain.vectorstores import Chroma
from langchain.schema import AIMessage, HumanMessage
from langchain.memory import ConversationBufferMemory
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredMarkdownLoader, UnstructuredURLLoader
from langchain.chains import LLMChain, SimpleSequentialChain, RetrievalQA, ConversationalRetrievalChain
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline
import warnings
warnings.filterwarnings('ignore')

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.1"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto",
    quantization_config=quantization_config
)

generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
generation_config.max_new_tokens = 1024
generation_config.temperature = 0.8
generation_config.top_p = 0.95
generation_config.do_sample = True
generation_config.repetition_penalty = 1.15

pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    generation_config=generation_config,
)

tokenizer_config.json:   0%|          | 0.00/1.47k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.94G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]
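
A back-of-the-envelope estimate shows why 4-bit loading matters on a small GPU. The sketch below assumes roughly 7.24 billion parameters for Mistral-7B and counts weights only; it ignores quantization constants, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory needed for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 7.24e9  # assumed approximate parameter count, not read from the model

fp16_gib = weight_memory_gib(N_PARAMS, 16)
nf4_gib = weight_memory_gib(N_PARAMS, 4)
print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {nf4_gib:.1f} GiB")
```

Under these assumptions the half-precision weights alone come close to filling a 16 GB card, while the 4-bit weights leave room for activations and generation.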

In [18]:
llm = HuggingFacePipeline(pipeline=pipeline,)

# Load chain from chain type

In [19]:
from langchain.llms import OpenAI
#import colab_env

from langchain.chains import RetrievalQA
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings

retriever = db.as_retriever(search_type="similarity", search_kwargs={"k":2})

# create a chain to answer questions
#qa = RetrievalQA.from_chain_type(
#     llm=OpenAI(), chain_type="stuff", retriever=retriever, return_source_documents=True)

qa = RetrievalQA.from_chain_type(
     llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)

query = "How AWS has evolved?"
#query = "How many AI publications in 2022?"
result = llm(query)

display(Markdown(f"<b>{query}</b>"))
display(Markdown(f"<p>{result}</p>"))

print()
print('chain to answer questions')
print("-" * 80)
result = qa({"query": query})
print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print(f'Context Documents: ')
for srcdoc in result["source_documents"]:
      print(f'{srcdoc}\n')

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<b>How AWS has evolved?</b>

<p>
Answer: Q: AWS is a cloud computing platform that was launched in 2006. Since then, it has grown exponentially and become one of the largest and most successful technology companies in the world. AWS provides a wide range of services for developers, businesses, and individuals to store, process, analyze, and share data on a global scale. In recent years, AWS has continued to innovate by launching new products and services, expanding into new markets, and partnering with other companies to provide even more value to its customers.</p>


chain to answer questions
--------------------------------------------------------------------------------


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Query: How AWS has evolved?

Result:  AWS has evolved from providing customers with much more functionality in the cloud than they could find elsewhere, to offering a much more game-changing service today.

Context Documents: 
page_content='customersmuch more functionality in AWS than they can find anywhere else (which is a significant differentiator), butalso allowed us to arrive at the much more game-changing offering that AWS is today.' metadata={'year': 2021, 'source': 'AMZN-2021-Shareholder-Letter.pdf'}

page_content='customersmuch more functionality in AWS than they can find anywhere else (which is a significant differentiator), butalso allowed us to arrive at the much more game-changing offering that AWS is today.' metadata={'year': 2021, 'source': 'AMZN-2021-Shareholder-Letter.pdf'}



In [20]:
query = "Why is Amazon successful?"
result = llm(query)

display(Markdown(f"<b>{query}</b>"))
display(Markdown(f"<p>{result}</p>"))

print()
print('chain to answer questions')
print("-" * 80)
result = qa({"query": query})
print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print(f'Context Documents: ')
for srcdoc in result["source_documents"]:
      print(f'{srcdoc}\n')

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<b>Why is Amazon successful?</b>

<p>
Answer: Question: What are the biggest challenges facing Amazon today?</p>


chain to answer questions
--------------------------------------------------------------------------------


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Query: Why is Amazon successful?

Result:  Amazon is successful because it constantly adapts to changes in the market by challenging itself internally, taking calculated risks, and making strategic investments in new technologies and business models.

Context Documents: 
page_content='shareholders, and employees.\nWhile there were an unusual number of simultaneous challenges this past year, the reality is that if you\noperate in large, dynamic, global market segments with many capable and well-funded competitors (theconditions in which Amazon operates all of its businesses), conditions rarely stay stagnant for long.\nIn the 25 years I’ve been at Amazon, there has been constant change, much of which we’ve initiated ourselves.' metadata={'year': 2022, 'source': 'AMZN-2022-Shareholder-Letter.pdf'}

page_content='shareholders, and employees.\nWhile there were an unusual number of simultaneous challenges this past year, the reality is that if you\noperate in large, dynamic, global market se

In [21]:
query = "What business challenges has Amazon experienced?"
result = llm(query)

display(Markdown(f"<b>{query}</b>"))
display(Markdown(f"<p>{result}</p>"))

print()
print('chain to answer questions')
print("-" * 80)
result = qa({"query": query})
print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print(f'Context Documents: ')
for srcdoc in result["source_documents"]:
      print(f'{srcdoc}\n')

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<b>What business challenges has Amazon experienced?</b>

<p>
Answer: 1. Price War: Amazon has faced intense competition from other e-commerce giants like Walmart, eBay, and Alibaba. The price war has put a strain on the company's profit margins.
2. Antitrust Issues: In recent years, Amazon has faced antitrust investigations in various countries, including the US, UK, and EU. These investigations have raised concerns about the company's dominance in the online retail market.
3. Data Privacy Concerns: Amazon has been criticized for its data collection practices and has faced regulatory scrutiny over data privacy issues.
4. Logistics Challenges: Managing a global supply chain is no easy task, especially when dealing with millions of products. Amazon has struggled to maintain efficient logistics operations while also keeping up with demand.
5. Human Resource Management: As the company grows rapidly, managing human resources becomes more challenging. Amazon has had to deal with labor disputes and union organizing efforts in several locations.</p>


chain to answer questions
--------------------------------------------------------------------------------


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Query: What business challenges has Amazon experienced?

Result:  In my experience working at Amazon for over 10 years, I can recall numerous challenges that the company has faced. Some of these include increased competition from established companies such as Walmart and eBay, regulatory issues related to antitrust laws, difficulties with managing a rapidly growing workforce, and navigating complex international markets. Additionally, Amazon has had to adapt to changing consumer behavior and preferences, including the rise of mobile commerce and the shift towards online shopping. Despite these challenges, Amazon has continued to innovate and expand its operations, making it one of the most successful companies in the world today.

Context Documents: 
page_content='shareholders, and employees.\nWhile there were an unusual number of simultaneous challenges this past year, the reality is that if you\noperate in large, dynamic, global market segments with many capable and well-funded compe