[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/search/semantic-search/semantic-search-fast.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/search/semantic-search/semantic-search-fast.ipynb)

# Semantic Search (Fast)

In this walkthrough we will see how to use Pinecone for semantic search. To begin, we must install the required libraries:

In [1]:
!pip install -qU pinecone-client[grpc] pinecone-datasets sentence-transformers

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m177.2/177.2 kB[0m [31m1.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m86.0/86.0 kB[0m [31m2.2 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m283.7/283.7 kB[0m [31m20.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m60.0/60.0 kB[0m [31m1.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m7.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.1/1.1 MB[0m [31m20.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m34.9/34.9 MB[0m [31m16.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m16.4/16.4 MB[0m [31m81.8 MB/s[0m e

## Data Preprocessing

The dataset preparation process requires a few steps:

1. We download the Quora dataset from Hugging Face Datasets.

2. The text content of the dataset is embedded into vectors.

3. We reformat into a `(id, vector, metadata)` structure to be added to Pinecone.
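To make the target structure concrete, here is a minimal sketch of steps 2 and 3, assuming any `embed` callable that maps a string to a list of floats. The `toy_embed` function is a hypothetical, hash-based stand-in so the example runs offline; in the real pipeline you would pass something like `lambda t: model.encode(t).tolist()` with the `all-MiniLM-L6-v2` model used later. The helper name `to_records` is also hypothetical.

```python
import hashlib
import math
from typing import Callable, Dict, List, Tuple

# (id, vector, metadata), the shape Pinecone expects for each record
Record = Tuple[str, List[float], Dict[str, str]]

def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for a real encoder: hashes each token into one of
    `dim` buckets (md5 keeps it deterministic across runs), then L2-normalizes.
    The result has no semantic meaning; it only has the right shape."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def to_records(texts: List[str], embed: Callable[[str], List[float]]) -> List[Record]:
    """Build (id, vector, metadata) tuples; ids are 1-based string positions."""
    return [(str(i), embed(t), {"text": t}) for i, t in enumerate(texts, start=1)]

questions = [
    "Which is the largest city in the world?",
    "What is the most populated city in the world?",
]
records = to_records(questions, toy_embed)
for rec_id, vector, metadata in records:
    print(rec_id, len(vector), metadata["text"])
```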

In this notebook we will skip these three steps, as they can be very time-consuming, and jump straight in with the prebuilt dataset from *Pinecone Datasets*. If you'd rather see how it's all done, please refer to [this notebook](https://github.com/pinecone-io/examples/blob/master/search/semantic-search/semantic-search.ipynb).

Let's go ahead and download the dataset.

In [2]:
from pinecone_datasets import load_dataset

dataset = load_dataset('quora_all-MiniLM-L6-bm25')
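# drop sparse_values (not needed for dense semantic search) and the original metadata column
dataset.documents.drop(['sparse_values', 'metadata'], axis=1, inplace=True)
# the 'blob' column holds the question text ({'text': ...}), so use it as our metadata
dataset.documents.rename(columns={'blob': 'metadata'}, inplace=True)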
dataset.head()

|   | id | values | metadata |
|---|----|--------|----------|
| 0 | 1 | [0.06814987, -0.039664183, -0.06096721, 0.0074... | {'text': ' What is the step by step guide to i... |
| 1 | 2 | [0.08983771, -0.03493085, -0.057357617, 0.0222... | {'text': ' What is the step by step guide to i... |
| 2 | 3 | [-0.046798065, 0.1551149, -0.03920019, 0.04878... | {'text': ' What is the story of Kohinoor (Koh-... |
| 3 | 4 | [-0.077349104, 0.14786911, -0.0128817065, -0.0... | {'text': ' What would happen if the Indian gov... |
| 4 | 5 | [-0.028324936, 0.037209604, -0.00040033547, 0.... | {'text': ' How can I increase the speed of my ... |


## Creating an Index

Now that the data is ready, we can set up an index to store it.

We begin by initializing our connection to Pinecone. To do this we need a [free API key](https://app.pinecone.io).

In [3]:
import os
import pinecone

# get api key from app.pinecone.io
PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'
# find your environment next to the api key in pinecone console
PINECONE_ENV = os.environ.get('PINECONE_ENVIRONMENT') or 'PINECONE_ENVIRONMENT'

pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_ENV
)



Now we create a new index called `semantic-search-fast`. The index `dimension` and `metric` must match the embeddings produced by the `MiniLM-L6` model: we read the dimension (384) from the first vector in the dataset and use the `cosine` metric.
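Why do these two parameters matter? Similarity is computed between vectors of the same length, and the `cosine` metric compares their directions. A minimal pure-Python sketch of cosine similarity (not Pinecone's implementation) shows both points: mismatched dimensions cannot be compared, and for L2-normalized vectors cosine similarity reduces to a plain dot product. The latter is relevant because the `all-MiniLM-L6-v2` model loaded later in this notebook ends with a `Normalize()` layer.

```python
import math
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        # an index built with the wrong `dimension` would hit this mismatch
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def normalize(v: List[float]) -> List[float]:
    """Scale v to unit length (L2 norm of 1)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a, b = [3.0, 4.0, 0.0], [4.0, 3.0, 0.0]
print(cosine(a, b))  # 0.96

# after normalization, the plain dot product gives the same score
na, nb = normalize(a), normalize(b)
print(sum(x * y for x, y in zip(na, nb)))
```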

In [4]:
index_name = 'semantic-search-fast'

# only create index if it doesn't exist
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        dimension=len(dataset.documents.iloc[0]['values']),
        metric='cosine'
    )

# now connect to the index
index = pinecone.GRPCIndex(index_name)

Upsert the data:

In [6]:
index.upsert_from_dataframe(dataset.documents)


upserted_count: 522931
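`upsert_from_dataframe` does not send all 522,931 rows in one request; it splits the DataFrame into batches. A minimal sketch of that batching arithmetic, assuming a batch size of 500 (which we believe is the client's default, but check the version you have installed). The `batched` helper is hypothetical and only illustrates the chunking, not the client's network code.

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def batched(rows: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

n_rows = 522931
batch_size = 500  # assumption: the client's default batch size
print(math.ceil(n_rows / batch_size))  # 1046 upsert requests

# the last batch holds whatever is left over
sizes: List[int] = [len(b) for b in batched(list(range(1203)), 500)]
print(sizes)  # [500, 500, 203]
```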

## Making Queries

Now that our index is populated we can begin making queries. We are performing a semantic search for *similar questions*, so we should embed and search with another question. Let's begin.
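Conceptually, a query embeds the question and returns the `top_k` stored vectors with the highest similarity scores. The sketch below does this with exact, brute-force cosine scoring over a tiny in-memory dictionary; it is an illustration of the idea only, not how Pinecone searches at scale. The names `top_k` and `store` are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: List[float], vectors: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best (exact search)."""
    scored = ((vid, _cosine(query, vec)) for vid, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

store = {
    "a": [1.0, 0.0],
    "b": [0.8, 0.6],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
print(top_k([1.0, 0.0], store, k=2))  # 'a' first, then 'b'
```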

In [11]:
from sentence_transformers import SentenceTransformer
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = SentenceTransformer('all-MiniLM-L6-v2', device=device)
model

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)

In [12]:
query = "which city has the highest population in the world?"

# create the query vector
xq = model.encode(query).tolist()

# now query
xc = index.query(xq, top_k=3, include_metadata=True)
xc

{'matches': [{'id': '210880',
              'metadata': {'text': ' Which is the most populated city in the '
                                   'world.?'},
              'score': 0.88284826,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': '197238',
              'metadata': {'text': ' What is the most populated city in the '
                                   'world?'},
              'score': 0.8788713,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': '239582',
              'metadata': {'text': ' Which is the largest city in the world?'},
              'score': 0.81348896,
              'sparse_values': {'indices': [], 'values': []},
              'values': []}],
 'namespace': ''}

In the returned response `xc`, we can see the questions most relevant to our query. We can reformat this response to make it easier to read:
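The reformatting amounts to pulling `score` and `metadata['text']` out of each match. A minimal sketch of that as a reusable helper, exercised on a hand-copied subset of the response above. The name `format_matches` is hypothetical (not part of the Pinecone client), and it also strips the leading space the Quora texts carry.

```python
from typing import Any, Dict, List

def format_matches(response: Dict[str, Any], digits: int = 2) -> List[str]:
    """Turn a query response into 'score: text' lines, in the order returned."""
    return [
        f"{round(m['score'], digits)}: {m['metadata']['text'].strip()}"
        for m in response["matches"]
    ]

sample = {
    "matches": [
        {"id": "210880", "metadata": {"text": " Which is the most populated city in the world.?"}, "score": 0.88284826},
        {"id": "239582", "metadata": {"text": " Which is the largest city in the world?"}, "score": 0.81348896},
    ]
}
for line in format_matches(sample):
    print(line)
```

With the live response, the same call would be `print("\n".join(format_matches(xc)))`.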

In [13]:
for result in xc['matches']:
    print(f"{round(result['score'], 2)}: {result['metadata']['text']}")

0.88:  Which is the most populated city in the world.?
0.88:  What is the most populated city in the world?
0.81:  Which is the largest city in the world?


These are clearly very relevant results: each of these questions either has exactly the same meaning as our query or is closely related to it. We can make this harder by using more complicated language, but as long as the "meaning" behind our query remains the same, we should see similar results.

In [14]:
query = "which urban locations have the highest concentration of homo sapiens?"

# create the query vector
xq = model.encode(query).tolist()

# now query
xc = index.query(xq, top_k=3, include_metadata=True)
for result in xc['matches']:
    print(f"{round(result['score'], 2)}: {result['metadata']['text']}")

0.62:  Which is the most populated city in the world.?
0.62:  What is the most populated city in the world?
0.61:  What are the world's most advanced cities?


Here our query used *very different* language, with terms entirely different from those in the returned documents. We replaced **"city"** with **"urban location"** and **"populated"** with **"concentration of homo sapiens"**.

Despite these very different terms and the *lack* of term overlap between the query and the returned documents, we still get highly relevant results. This is the power of *semantic search*.
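We can check the "no term overlap" claim directly. The sketch below tokenizes the query and the three returned questions and intersects their vocabularies after removing a tiny, illustrative stopword list (`STOPWORDS` and `content_terms` are hypothetical helpers, not a standard tokenizer). Without stopword removal, the only shared words with the top result are "which" and "the"; with it, nothing is shared at all, so a purely keyword-based search would have little to match on.

```python
import re
from typing import Set

# small illustrative stopword list, not a standard one
STOPWORDS = {"the", "of", "in", "is", "which", "what", "have", "are", "a"}

def content_terms(text: str) -> Set[str]:
    """Lowercase alphabetic tokens minus the stopwords above."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

example_query = "which urban locations have the highest concentration of homo sapiens?"
docs = [
    "Which is the most populated city in the world.?",
    "What is the most populated city in the world?",
    "What are the world's most advanced cities?",
]
for doc in docs:
    shared = content_terms(example_query) & content_terms(doc)
    print(sorted(shared), "|", doc)  # empty list for every document
```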

Feel free to ask more questions above. When you're done, delete the index to save resources:

In [15]:
pinecone.delete_index(index_name)

---