In [15]:
'''
Steps to follow:
1. Prompt the LLM to create multiple similar queries (q).
2. For each of the q queries, get the k most similar docs from the data.
3. Take the union of the results: a set of at most k x q docs.
4. Retrieve the top k from that set using a BERT cross-encoder.
5. Re-order the docs so the highest ranked sit at the beginning and end and the lowest ranked in the middle, to mitigate the lost-in-the-middle problem.
6. Use these top k docs as the RAG context.
'''
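The steps listed in this cell can be sketched as a single function. This is a minimal illustration, not the notebook's implementation: `build_context` and its parameters `generate_queries`, `search`, and `score` are hypothetical names standing in for the LLM call, the vector-store lookup, and the cross-encoder.

```python
from typing import Callable, List


def build_context(
    query: str,
    generate_queries: Callable[[str], List[str]],  # hypothetical: wraps the LLM call
    search: Callable[[str], List[str]],            # hypothetical: wraps the vector store
    score: Callable[[str, str], float],            # hypothetical: wraps the cross-encoder
    k: int,
) -> List[str]:
    """Multi-query retrieval, re-ranking, top-k cut, and lost-in-the-middle re-ordering."""
    # Steps 1-3: union of the documents returned for every generated query
    candidates: List[str] = []
    for q in generate_queries(query):
        for doc in search(q):
            if doc not in candidates:
                candidates.append(doc)
    # Step 4: keep the k candidates that score highest against the original query
    ranked = sorted(candidates, key=lambda doc: score(doc, query), reverse=True)[:k]
    # Step 5: best documents at both ends, weakest in the middle
    return ranked[0::2] + ranked[1::2][::-1]
```

Passing the three stages in as callables keeps the sketch runnable without API keys; in the notebook they would correspond to the query-generation chain, `vector.similarity_search`, and the cross-encoder model.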

In [2]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [4]:
embeddings = OpenAIEmbeddings()

loader = PyPDFLoader("/Users/sarthak/Documents/llm_projects/openai_simple_operations/serverless-core.pdf")
docs = loader.load()

In [5]:
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)

Create multiple similar queries by prompting the LLM

In [6]:
vector = FAISS.from_documents(documents, embeddings)
llm = ChatOpenAI()
prompt_multiquery = ChatPromptTemplate.from_template("""Create multiple queries similar to the input query, given the number of queries to create (n):
query : {query}
n: {n}""")

In [7]:
retriever = vector.as_retriever()
#setup_and_retrieval = RunnableParallel({"n":RunnablePassthrough(),"input_statement":RunnablePassthrough()})
output_parser = StrOutputParser()
chain = prompt_multiquery | llm | output_parser



In [8]:
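# Split the LLM reply into one query per line, dropping blank lines so empty strings are never searched
multiple_queries = [q.strip() for q in chain.invoke({"query":"can you design a image classification system using all the above aws services, mention the steps required for it.","n":5}).split('\n') if q.strip()]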
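Chat models often return the generated queries as a numbered or bulleted list, so splitting on newlines alone can leave markers such as "1." in each query. The sketch below is one possible clean-up; `parse_queries` is a hypothetical helper, and the reply format it handles is an assumption about the model's output.

```python
import re
from typing import List


def parse_queries(llm_output: str) -> List[str]:
    """Turn a newline-separated LLM reply into a clean list of query strings."""
    queries = []
    for line in llm_output.splitlines():
        # Remove a leading list marker such as "1.", "2)", "-", "*" or "•"
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned:  # skip blank lines
            queries.append(cleaned)
    return queries
```

It could be applied to the string returned by `chain.invoke(...)` in place of the plain `split('\n')`.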

In [9]:
multiquery_ss_results = []
for query in multiple_queries:
    # Retrieve the most similar chunks for each generated query
    query_results = vector.similarity_search(query)
    multiquery_ss_results += [result.page_content for result in query_results]
# Deduplicate chunks returned by more than one query (set() does not preserve order)
multiquery_ss_results = list(set(multiquery_ss_results))
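Because `set` does not keep insertion order, the deduplicated list can come back in a different order on each run. If a stable order is wanted, one option is `dict.fromkeys`, sketched here with a hypothetical helper name `unique_in_order`.

```python
def unique_in_order(texts):
    """Remove duplicate strings while keeping the first occurrence of each."""
    # dict keys keep insertion order (Python 3.7+), unlike set members
    return list(dict.fromkeys(texts))
```

Since the results are re-sorted by cross-encoder score in the next step, this mainly helps when inspecting the intermediate output.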
    

In [10]:
multiquery_ss_results

['Serverless Developer Guide\nPicking up serverless prerequisites\nRevised: 2023-03-13\nBefore you dive in to code, there are some key concepts you need to understand:\n•Amazon Web Services Account\n•Programming Language\n•Development Environment\n•Cloud Infrastructure\n•Security Model\nReview the serverless learning path in the following diagram.\nTopics are shown in orange pill boxes. Large topics may be broken down into several sub-topics in a \nblue bubble. Icons represent related services or tools. Essential topics are noted with a green check \nbox. Important, but not essential, items are noted with a red heart. When a high level orange topic \nis marked as essential, that means all of the sub-topics are essential too.\nThis map is a starting point. Not everyone will agree which topics are essential or important, so \ncheck with your team. Find out which topics are essential or important for your journey.\n9',
 'Serverless Developer Guide\n•AWS Lambda for compute processing tasks

Re-rank the retrieved documents using a BERT cross-encoder

In [16]:
from sentence_transformers.cross_encoder import CrossEncoder
model = CrossEncoder("cross-encoder/stsb-distilroberta-base")
input_query = "can you design a image classification system using all the above aws services, mention the steps required for it."
# Score each retrieved chunk against the original query with the cross-encoder
score_query_pair = [[q, model.predict([q, input_query])] for q in multiquery_ss_results]

In [20]:
score_query_pair.sort(key=lambda x : x[1],reverse=True)
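Step 4 of the plan keeps only the top k documents, while the cell above sorts the full list. A minimal sketch of that cut follows; `top_k` and the value of `k` are hypothetical and not part of the notebook.

```python
def top_k(scored_pairs, k):
    """Return the k highest-scoring [text, score] pairs, best first."""
    return sorted(scored_pairs, key=lambda pair: pair[1], reverse=True)[:k]
```

With the notebook's variables this would be called as `top_k(score_query_pair, k)` before the re-ordering step.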

In [21]:
score_query_pair

[['Serverless Developer Guide\n•Image identiﬁcation  — In the previous photo sharing application concept, imagine you want to \nprovide automatic categorization of images for your users. Images will be queued for processing \nby Amazon Rekognition. After analysis, faces are detected and your app can use similarity scores \nto group photos by family members. Objects, scenes, activities, landmarks, and dominant colors \nare detected and labels are applied to improve categorization and search.\nServices you’ll likely use:\n•AWS Lambda for compute processing tasks\n•AWS Step Functions for managing and orchestrating microservice workﬂows\n•Amazon Simple Notiﬁcation Service - for message delivery from publishers to subscribers, \nplus fan out  which is when a message published to a topic is replicated and pushed to multiple \nendpoints for parallel asynchronous processing\n•Amazon Simple Queue Service - for creating secure, durable, and available queues for \nasynchronous processing\n•Amazon

Limp (lost-in-the-middle problem) re-ranking

In [22]:
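def limp_rerank(sorted_list):
    """Reorder a best-first list so the top items sit at both ends and the weakest in the middle."""
    # Ranks 1, 3, 5, ... fill the beginning, best first
    beginning_list = sorted_list[0::2]
    # Ranks 2, 4, 6, ... are reversed so that rank 2 ends up last
    end_list = sorted_list[1::2][::-1]
    return beginning_list + end_list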
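A small worked example makes the re-ordering easier to see. The function is repeated so the snippet runs on its own, and the `rank1` to `rank5` labels are placeholder data.

```python
def limp_rerank(sorted_list):
    beginning_list = [sorted_list[i] for i in range(0, len(sorted_list), 2)]
    end_list = [sorted_list[i] for i in range(1, len(sorted_list), 2)][::-1]
    return beginning_list + end_list


# Placeholder input, already sorted from most to least relevant
ranked = ["rank1", "rank2", "rank3", "rank4", "rank5"]
reordered = limp_rerank(ranked)
print(reordered)  # ['rank1', 'rank3', 'rank5', 'rank4', 'rank2']
```

The best item stays first, the second best moves to the end, and the lowest-ranked items land in the middle of the list.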

In [25]:
query_reranked = limp_rerank(score_query_pair)

In [27]:
query_reranked = [q[0] for q in query_reranked]

In [28]:
query_reranked

['Serverless Developer Guide\n•Image identiﬁcation  — In the previous photo sharing application concept, imagine you want to \nprovide automatic categorization of images for your users. Images will be queued for processing \nby Amazon Rekognition. After analysis, faces are detected and your app can use similarity scores \nto group photos by family members. Objects, scenes, activities, landmarks, and dominant colors \nare detected and labels are applied to improve categorization and search.\nServices you’ll likely use:\n•AWS Lambda for compute processing tasks\n•AWS Step Functions for managing and orchestrating microservice workﬂows\n•Amazon Simple Notiﬁcation Service - for message delivery from publishers to subscribers, \nplus fan out  which is when a message published to a topic is replicated and pushed to multiple \nendpoints for parallel asynchronous processing\n•Amazon Simple Queue Service - for creating secure, durable, and available queues for \nasynchronous processing\n•Amazon 

In [29]:
context = ('\n').join(query_reranked)

In [31]:
llm = ChatOpenAI()
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

In [33]:
output_parser = StrOutputParser()
chain = prompt | llm | output_parser
output = chain.invoke({"input":"can you design a image classification system using all the above aws services, mention the steps required for it.","context":context})

In [34]:
print(output)

To design an image classification system using the mentioned AWS services, follow these steps:

1. Set up an Amazon Web Services account if you don't already have one.
2. Use AWS Lambda for compute processing tasks to create functions that will process the images for classification.
3. Utilize AWS Step Functions for managing and orchestrating microservice workflows to control the flow of image processing tasks.
4. Configure Amazon Simple Notification Service for message delivery from publishers to subscribers, including fan-out for parallel asynchronous processing.
5. Use Amazon Simple Queue Service for creating secure, durable, and available queues for asynchronous processing of images.
6. Store and retrieve data and files using Amazon DynamoDB and Amazon S3.
7. Configure Amazon Rekognition to analyze the images and detect faces, objects, scenes, activities, landmarks, and dominant colors.
8. Group photos by family members using similarity scores provided by Amazon Rekognition.
9. App