# Pinecone Embeddings Retriever

Here we will develop a retriever that performs a similarity search between our query and the vectors stored in our vector database.

Pinecone stores all of our embeddings in the cloud. To find content that adds context to the answer for our query, we search this database for documents whose embeddings are similar to the query's embedding.

A similarity search finds the vectors closest to the query vector in the n-dimensional embedding space, where "closest" is measured by a similarity metric such as cosine similarity.
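To make "closest" concrete, the sketch below scores a query vector against a few stored vectors with cosine similarity (the metric used for this index) and returns the top-k ids. It is an exact, brute-force search in plain Python over made-up 3-dimensional vectors; Pinecone performs the search at scale with its own indexing, so treat this as a model of the idea rather than of how Pinecone works internally.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], vectors: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored vector and keep the k highest scores."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (real all-mpnet-base-v2 embeddings have 768 dimensions)
stored = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.7, 0.7, 0.0],
    "c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]
print(top_k(query, stored, k=2))  # "a" ranks first, then "b"
```

Because every stored vector is scored, the cost grows linearly with the number of vectors, which is the reason to hand this work to a dedicated vector database once the collection grows.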

First, we need to initialize a connection to Pinecone.

In [10]:
from typing import List, Iterator
import pandas as pd
import numpy as np
import os
import wget
from ast import literal_eval


In [22]:
import dotenv
dotenv.load_dotenv()  # load variables from a local .env file before reading them
api_key = os.getenv("PINECONE_API_KEY")

We have a Pinecone index that already contains our embeddings. We need to create a connection to that index in order to query it.

In [25]:
# Initialize connection
import dotenv
dotenv.load_dotenv()

from pinecone import Pinecone

# configure client
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

In [26]:
from pinecone import ServerlessSpec

cloud = os.environ.get('PINECONE_CLOUD') or 'aws'
region = os.environ.get('PINECONE_REGION') or 'us-east-1'

spec = ServerlessSpec(cloud=cloud, region=region)
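The `or` fallback above matters when a variable is set but empty: `os.environ.get(name, default)` only falls back when the variable is missing entirely, while `os.environ.get(name) or default` also falls back on an empty string. A small standalone demonstration, using a hypothetical variable name `DEMO_PINECONE_REGION` so it does not touch the real configuration:

```python
import os

# Hypothetical variable name, used only for this demonstration
name = "DEMO_PINECONE_REGION"
os.environ[name] = ""  # set, but empty

with_default_arg = os.environ.get(name, "us-east-1")   # "" because the variable exists
with_or_fallback = os.environ.get(name) or "us-east-1"  # "us-east-1" because "" is falsy

print(repr(with_default_arg), repr(with_or_fallback))

del os.environ[name]  # clean up the demonstration variable
```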

In [27]:
index_name = "rag-retriever-v1"

In [28]:
# check whether the index already exists
if index_name not in pc.list_indexes().names():
    # if it does not exist, create it (768 dimensions to match all-mpnet-base-v2)
    pc.create_index(
        index_name,
        dimension=768,
        metric="cosine",
        spec=spec,
    )
# connect to the index
index = pc.Index(index_name)
# view index stats
index.describe_index_stats()

{'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 127}},
 'total_vector_count': 127}

We now have a connection to our index `rag-retriever-v1`.
It already holds 127 vectors, so we can embed our query locally and search the index for the vectors most similar to it.
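A query only works if the query embedding has the same length as the vectors in the index (768 here, which is also the output size of `all-mpnet-base-v2`). The sketch below checks that before querying. It assumes the stats are available as a plain dict shaped like the `describe_index_stats()` output printed above; `check_query_dimension` is a hypothetical helper, not part of the Pinecone client.

```python
from typing import Any, Dict, List


def check_query_dimension(index_stats: Dict[str, Any], query_vector: List[float]) -> None:
    """Hypothetical helper: raise if the query vector cannot be compared with the index."""
    expected = index_stats["dimension"]
    if len(query_vector) != expected:
        raise ValueError(
            f"query vector has {len(query_vector)} dimensions, index expects {expected}"
        )


# Stats copied from the describe_index_stats() output above
stats = {
    "dimension": 768,
    "index_fullness": 0.0,
    "namespaces": {"": {"vector_count": 127}},
    "total_vector_count": 127,
}

check_query_dimension(stats, [0.0] * 768)  # matching length: passes silently

try:
    check_query_dimension(stats, [0.0] * 384)  # e.g. a model with a smaller output size
except ValueError as err:
    print(err)
```

Failing early with a clear message is easier to debug than an error returned from the remote query call.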

In [37]:
from sentence_transformers import SentenceTransformer

# Load the embedding model on the CPU; all-mpnet-base-v2 produces 768-dimensional
# vectors, matching the dimension of the index
embedding_model = SentenceTransformer(model_name_or_path="all-mpnet-base-v2",
                                      device="cpu")  # choose the device to load the model to

# Note: this embeds the query using local compute. (Still to investigate: the
# benefits, if any, of computing the embeddings in the cloud instead.)

query = "this is a single sentence"
query_embedding = embedding_model.encode(query)

# Pinecone expects a plain list of floats, not a NumPy array
xq = query_embedding.tolist()

# retrieve the 2 most similar vectors from Pinecone
res = index.query(vector=xq, top_k=2)

In [38]:
res

{'matches': [{'id': '122', 'score': 0.257955194, 'values': []},
             {'id': '126', 'score': 0.189704508, 'values': []}],
 'namespace': '',
 'usage': {'read_units': 5}}
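The response only carries ids and scores; `values` is empty because `index.query` does not return the stored vectors unless `include_values=True` is passed. To turn matches into context for an answer, the ids have to be mapped back to their source text, either from metadata stored alongside the vectors (returned with `include_metadata=True`) or from a local lookup. The sketch below takes the second route, with a hypothetical `documents` dict, a hypothetical `build_context` helper and a hypothetical `min_score` cutoff, working on a plain dict shaped like the response above.

```python
from typing import Any, Dict, List

# Response copied from the output above
response = {
    "matches": [
        {"id": "122", "score": 0.257955194, "values": []},
        {"id": "126", "score": 0.189704508, "values": []},
    ],
    "namespace": "",
    "usage": {"read_units": 5},
}

# Hypothetical id -> text lookup; in practice this would come from wherever the
# original chunks were kept before they were embedded
documents = {
    "122": "placeholder text for chunk 122",
    "126": "placeholder text for chunk 126",
}


def build_context(resp: Dict[str, Any], docs: Dict[str, str], min_score: float = 0.0) -> List[str]:
    """Return the text of each known match scoring at least min_score, best match first."""
    matches = sorted(resp["matches"], key=lambda m: m["score"], reverse=True)
    return [docs[m["id"]] for m in matches if m["score"] >= min_score and m["id"] in docs]


print(build_context(response, documents))                 # both chunks
print(build_context(response, documents, min_score=0.2))  # only chunk 122
```

A score cutoff keeps weak matches out of the context; the right value depends on the embedding model and the data, so 0.2 here is only illustrative.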