[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/semantic-search.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/semantic-search.ipynb)

# Semantic Search  using FAISS - Facebook AI Similarity Search

In this walkthrough we will see how to use FAISS for similarity search. To begin we must install the required prerequisite libraries:

In [1]:
!pip install -qU faiss-cpu

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m27.5/27.5 MB[0m [31m36.7 MB/s[0m eta [36m0:00:00[0m
[?25h

In [2]:
import os
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

[link text](https://)
## Import the Document into FAISS

In [3]:
!pip install -qU langchain langchain_community langchain_openai

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/2.5 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━[0m[90m╺[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.5/2.5 MB[0m [31m13.3 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m2.5/2.5 MB[0m [31m39.2 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m28.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m54.2/54.2 kB[0m [31m3.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.2/1.2 MB[0m [31m51.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m49.6/49.6 kB[0m [31m3.5 MB/s[0m eta [36m0:00:00[0m
[?25h

In [5]:
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS


In [6]:
# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader('/content/sotu_obama.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=500,separator='\n',chunk_overlap=50)
doc_chunks = text_splitter.split_documents(raw_documents)



In [7]:
len(doc_chunks)

76

In [8]:
for chunk in doc_chunks:
  print(chunk)

page_content='Mr. Speaker, Mr. Vice President, Members of Congress, my fellow Americans:
Tonight marks the eighth year that I’ve come here to report on the State of the Union. And for this final one, I’m going to try to make it a little shorter. (Applause.) I know some of you are antsy to get back to Iowa. (Laughter.) I've been there. I'll be shaking hands afterwards if you want some tips. (Laughter.)' metadata={'source': '/content/sotu_obama.txt'}
page_content='And I understand that because it’s an election season, expectations for what we will achieve this year are low. But, Mr. Speaker, I appreciate the constructive approach that you and the other leaders took at the end of last year to pass a budget and make tax cuts permanent for working families. So I hope we can work together this year on some bipartisan priorities like criminal justice reform -- (applause) -- and helping people who are battling prescription drug abuse and heroin abuse. (Applause.) So, who knows, we might surpri

In [9]:
db = FAISS.from_documents(doc_chunks, OpenAIEmbeddings())

In [10]:
len(doc_chunks)

76

# Similarity Search

In [11]:
import pprint

In [14]:
query = "Summarize was obama said about north korea"
docs = db.similarity_search(query, k=2)
#pprint.pprint(docs[0].page_content)
#print(docs[1].page_content)
docs

[Document(id='5a06e77f-8df0-4562-8dff-b8bbddfabf17', metadata={'source': '/content/sotu_obama.txt'}, page_content="That's American strength. That's American leadership. And that kind of leadership depends on the power of our example. That’s why I will keep working to shut down the prison at Guantanamo. (Applause.) It is expensive, it is unnecessary, and it only serves as a recruitment brochure for our enemies. (Applause.) There’s a better way. (Applause.)"),
 Document(id='94a99f97-65c4-4ea7-89ca-afe1e5ad918f', metadata={'source': '/content/sotu_obama.txt'}, page_content='It is not easy. Our brand of democracy is hard. But I can promise that a little over a year from now, when I no longer hold this office, I will be right there with you as a citizen, inspired by those voices of fairness and vision, of grit and good humor and kindness that helped America travel so far. Voices that help us see ourselves not, first and foremost, as black or white, or Asian or Latino, not as gay or straight

[link text](https://)
# Similarity search by vector

In [15]:
embedding_vector = OpenAIEmbeddings().embed_query(query)
print(embedding_vector)
docs = db.similarity_search_by_vector(embedding_vector, k=3)
docs

[-0.03763212636113167, -0.01612219400703907, 0.009633660316467285, -0.012867670506238937, -0.007370536215603352, 0.001234975061379373, -0.04053111374378204, 0.006700487341731787, -0.02250816859304905, -0.004187804646790028, 0.03858933970332146, 0.016313636675477028, 0.012601018883287907, -0.0094353798776865, 0.0010332766687497497, -0.01223180815577507, 0.029782982543110847, -0.047915320843458176, 0.006874836981296539, -0.020415974780917168, 0.003408360294997692, -0.008081608451902866, -0.003090428886935115, -0.023410683497786522, 0.0009016600088216364, -0.011719015426933765, 0.004680085461586714, -0.029618889093399048, -0.0005328767583705485, -0.011760039255023003, 0.026501111686229706, -0.015766657888889313, -0.0030562428291887045, 0.002890439936891198, -0.029153957962989807, -0.004273270256817341, -0.011971993371844292, -0.02164667658507824, 0.012484786100685596, -0.025762690231204033, 0.0030579520389437675, -0.017599035054445267, 0.02201588824391365, -0.016245262697339058, -0.007958

[Document(id='5a06e77f-8df0-4562-8dff-b8bbddfabf17', metadata={'source': '/content/sotu_obama.txt'}, page_content="That's American strength. That's American leadership. And that kind of leadership depends on the power of our example. That’s why I will keep working to shut down the prison at Guantanamo. (Applause.) It is expensive, it is unnecessary, and it only serves as a recruitment brochure for our enemies. (Applause.) There’s a better way. (Applause.)"),
 Document(id='94a99f97-65c4-4ea7-89ca-afe1e5ad918f', metadata={'source': '/content/sotu_obama.txt'}, page_content='It is not easy. Our brand of democracy is hard. But I can promise that a little over a year from now, when I no longer hold this office, I will be right there with you as a citizen, inspired by those voices of fairness and vision, of grit and good humor and kindness that helped America travel so far. Voices that help us see ourselves not, first and foremost, as black or white, or Asian or Latino, not as gay or straight