## What is Pinecone?
Pinecone is a managed vector database that lets you efficiently store and search through high-dimensional vector embeddings. It's particularly useful for applications like semantic search, recommendation systems, and conversational AI.
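
To make "searching through vector embeddings" concrete, here is a minimal sketch of brute-force nearest-neighbour search over a tiny in-memory store. It is not how Pinecone works internally (managed vector databases typically rely on approximate indexes to scale); the names `cosine_similarity`, `top_k`, and `store`, and the toy 3-dimensional vectors, are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, store, k=2):
    """Return the k (id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
store = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], store, k=2)
print(results)
```

The query vector points in the same direction as `doc-a`, so that entry ranks first, followed by the nearly parallel `doc-b`.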

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
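# load_dotenv() has already copied the key from .env into os.environ; fail early if it is missing
assert os.getenv('GOOGLE_API_KEY'), "GOOGLE_API_KEY is not set; add it to your .env file"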

In [3]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embedding_model = GoogleGenerativeAIEmbeddings(model='models/embedding-001')
embedding_model

GoogleGenerativeAIEmbeddings(client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x00000251F7E6C9E0>, model='models/embedding-001', task_type=None, google_api_key=None, credentials=None, client_options=None, transport=None, request_options=None)

In [5]:
vectors = embedding_model.embed_query("DataSciLearn")
len(vectors)

768

In [6]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter


In [7]:
loader = PyPDFLoader("RAG.pdf")
data = loader.load()

In [8]:
data

[Document(metadata={'source': 'RAG.pdf', 'page': 0}, page_content='Adaptive-RAG: Learning to Adapt Retrieval-Augmented\nLarge Language Models through Question Complexity\nSoyeong Jeong1Jinheon Baek2Sukmin Cho1Sung Ju Hwang1,2Jong C. Park1*\nSchool of Computing1Graduate School of AI2\nKorea Advanced Institute of Science and Technology1,2\n{starsuzi,jinheon.baek,nelllpic,sjhwang82,jongpark}@kaist.ac.kr\nAbstract\nRetrieval-Augmented Large Language Models\n(LLMs), which incorporate the non-parametric\nknowledge from external knowledge bases into\nLLMs, have emerged as a promising approach\nto enhancing response accuracy in several tasks,\nsuch as Question-Answering (QA). However,\neven though there are various approaches deal-\ning with queries of different complexities, they\neither handle simple queries with unnecessary\ncomputational overhead or fail to adequately\naddress complex multi-step queries; yet, not\nall user requests fall into only one of the sim-\nple or complex categories.

In [12]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=30)
text_chunks = text_splitter.split_documents(data)
text_chunks

[Document(metadata={'source': 'RAG.pdf', 'page': 0}, page_content='Adaptive-RAG: Learning to Adapt Retrieval-Augmented\nLarge Language Models through Question Complexity\nSoyeong Jeong1Jinheon Baek2Sukmin Cho1Sung Ju Hwang1,2Jong C. Park1*\nSchool of Computing1Graduate School of AI2\nKorea Advanced Institute of Science and Technology1,2\n{starsuzi,jinheon.baek,nelllpic,sjhwang82,jongpark}@kaist.ac.kr\nAbstract\nRetrieval-Augmented Large Language Models\n(LLMs), which incorporate the non-parametric\nknowledge from external knowledge bases into'),
 Document(metadata={'source': 'RAG.pdf', 'page': 0}, page_content='LLMs, have emerged as a promising approach\nto enhancing response accuracy in several tasks,\nsuch as Question-Answering (QA). However,\neven though there are various approaches deal-\ning with queries of different complexities, they\neither handle simple queries with unnecessary\ncomputational overhead or fail to adequately\naddress complex multi-step queries; yet, not\nall use

In [13]:
text_chunks[1]

Document(metadata={'source': 'RAG.pdf', 'page': 0}, page_content='LLMs, have emerged as a promising approach\nto enhancing response accuracy in several tasks,\nsuch as Question-Answering (QA). However,\neven though there are various approaches deal-\ning with queries of different complexities, they\neither handle simple queries with unnecessary\ncomputational overhead or fail to adequately\naddress complex multi-step queries; yet, not\nall user requests fall into only one of the sim-\nple or complex categories. In this work, we')
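
The effect of `chunk_size=500` and `chunk_overlap=30` can be illustrated with a simplified fixed-size splitter. This is only a sketch under stated assumptions: `chunk_text` is a hypothetical helper, and unlike `RecursiveCharacterTextSplitter` it cuts at exact character offsets instead of preferring separators such as newlines and spaces, which is why the real chunks above end at line breaks and can be shorter than 500 characters.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=30):
    """Split text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# Synthetic 1200-character input so the overlap is easy to check.
sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
print(chunks[0][-30:] == chunks[1][:30])
```

The overlap means a sentence that straddles a boundary appears in both neighbouring chunks, so a retriever can still match it.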

In [15]:
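# load_dotenv() has already copied the key from .env into os.environ; fail early if it is missing
assert os.getenv('PINECONE_API_KEY'), "PINECONE_API_KEY is not set; add it to your .env file"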

In [45]:
index_name = "test"
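
The notebook does not show how the `test` index was created. As far as I know, `PineconeVectorStore.from_documents` expects the index to exist already, with a dimension matching the embedding length (768 here). A hedged sketch of creating it with the Pinecone Python client follows; `ensure_index` is a hypothetical helper, and the cosine metric, cloud, and region in the commented usage are assumptions, not values taken from the notebook.

```python
def ensure_index(pc, name, dimension, spec, metric="cosine"):
    """Create the index if it is missing; return True if it was created.

    `pc` is expected to behave like a pinecone.Pinecone client.
    """
    if name in pc.list_indexes().names():
        return False
    pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    return True

# Usage against a real account (assumed cloud and region; adjust to your project):
# import os
# from pinecone import Pinecone, ServerlessSpec
# pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
# ensure_index(pc, "test", 768, ServerlessSpec(cloud="aws", region="us-east-1"))
```

Passing the client in as an argument keeps the helper easy to exercise without network access.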

In [46]:
from langchain_pinecone import PineconeVectorStore
vectorstore_from_docs = PineconeVectorStore.from_documents(
    text_chunks,
    index_name=index_name,
    embedding=embedding_model,
)


In [47]:
query = "What is single-step approach"
vectorstore_from_docs.similarity_search(query, k=2)

[Document(metadata={'page': 6.0, 'source': 'RAG.pdf'}, page_content='sample and annotate 400 queries from 6 datasets\nbased on its inductive bias (single-hop for one-step\napproach and multi-hop for multi-step). In addition,\nwe use predicted outcomes of three different strate-\ngies over 400 queries sampled from each dataset.\nNote that those queries used for classifier training\ndo not overlap with the testing queries for QA.\n5 Experimental Results and Analyses\nIn this section, we show the overall experimental\nresults and offer in-depth analyses of our method.'),
 Document(metadata={'page': 1.0, 'source': 'RAG.pdf'}, page_content='a query, this single-step approach retrieves relevant documents and then generates an answer. However, it may not be sufficient\nfor complex queries that require multi-step reasoning. (B) This multi-step approach iteratively retrieves documents and generates\nintermediate answers, which is powerful yet largely inefficient for the simple query since it re

In [39]:
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model='gemini-1.5-flash')
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore_from_docs.as_retriever(),
)



In [48]:
query = "What is single-step approach"

In [49]:
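# .run() is deprecated in recent LangChain releases; invoke() returns a dict whose "result" key holds the answer
qa_chain.invoke({"query": query})["result"]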

"The single-step approach is a method for answering queries that involves retrieving relevant documents and then generating an answer in a single step. This approach is suitable for simple queries that don't require complex reasoning. \n"
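
The `"stuff"` chain type places all retrieved chunks into a single prompt and makes one LLM call. A minimal sketch of that idea is below. `build_stuff_prompt` and the prompt wording are hypothetical, not LangChain's actual template, and the two context strings are abridged from the retrieved chunk shown earlier.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate every retrieved chunk into one context block
    and append the question, producing a single prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Abridged stand-ins for the documents a retriever would return.
retrieved = [
    "a query, this single-step approach retrieves relevant documents and then generates an answer.",
    "This multi-step approach iteratively retrieves documents and generates intermediate answers.",
]
prompt = build_stuff_prompt("What is single-step approach", retrieved)
print(prompt)
```

Because everything goes into one prompt, this strategy is simple, but it only works while the combined chunks fit in the model's context window.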