# Semantic Search - Wikipedia Question-Answer-Retrieval

This example demonstrates the setup for Question-Answer-Retrieval.

You can input a query or a question. The script then uses semantic search
to find relevant passages in Simple English Wikipedia (as it is smaller and fits better in RAM).

As the model, we use `nq-distilbert-base-v1`.

It was trained on the Natural Questions dataset, which pairs real questions from Google Search with annotated Wikipedia passages that contain the answer. For the passages, we encode the Wikipedia article title together with the individual text passage.
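Semantic search scores every corpus embedding against the query embedding and keeps the best `top_k`; `util.semantic_search` does this with cosine similarity by default. The pure-Python sketch below only illustrates that idea on toy 3-dimensional vectors. It is not the library's implementation (which works on batched tensors), `cosine` and `top_k_search` are hypothetical names, and it assumes all vectors are non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_search(query_emb, corpus_embs, top_k=5):
    """Return the top_k most similar corpus entries, best match first.

    Each hit is a dict with 'corpus_id' and 'score', the same shape as one
    query's result list from util.semantic_search.
    """
    scored = [
        {"corpus_id": i, "score": cosine(query_emb, emb)}
        for i, emb in enumerate(corpus_embs)
    ]
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real model embeddings (assumption for illustration).
corpus = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]
for hit in top_k_search(query, corpus, top_k=2):
    print(hit["corpus_id"], round(hit["score"], 3))
```

Because every score is computed independently, the corpus embeddings can be computed once and reused for any number of queries; only the query has to be encoded at search time.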

In [43]:
!pip install -U sentence-transformers



In [44]:


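import gzip
import json
import os

import torch
from sentence_transformers import SentenceTransformer, util

if not torch.cuda.is_available():
    print("Warning: No GPU found. Please add a GPU to your notebook.")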


# We use the bi-encoder to encode all passages so that we can use them for semantic search
model_name = 'nq-distilbert-base-v1'
bi_encoder = SentenceTransformer(model_name)
top_k = 5  # Number of passages we want to retrieve with the bi-encoder

# As the dataset, we use Simple English Wikipedia. Compared to the full English Wikipedia, it has only
# about 170k articles. We split these articles into paragraphs and encode them with the bi-encoder.

wikipedia_filepath = 'data/simplewiki-2020-11-01.jsonl.gz'

if not os.path.exists(wikipedia_filepath):
    util.http_get('http://sbert.net/datasets/simplewiki-2020-11-01.jsonl.gz', wikipedia_filepath)

passages = []
with gzip.open(wikipedia_filepath, 'rt', encoding='utf8') as fIn:
    for line in fIn:
        data = json.loads(line.strip())
        for paragraph in data['paragraphs']:
            # We encode the passages as [title, text]
            passages.append([data['title'], paragraph])

# If you like, you can also limit the number of passages you want to use
print("Passages:", len(passages))

# To speed things up, pre-computed embeddings are downloaded.
# The provided file encoded the passages with the model 'nq-distilbert-base-v1'
if model_name == 'nq-distilbert-base-v1':
    embeddings_filepath = 'simplewiki-2020-11-01-nq-distilbert-base-v1.pt'
    if not os.path.exists(embeddings_filepath):
        util.http_get('http://sbert.net/datasets/simplewiki-2020-11-01-nq-distilbert-base-v1.pt', embeddings_filepath)

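    # Load onto the CPU first so this also works without a GPU; it is moved to the GPU below if one exists
    corpus_embeddings = torch.load(embeddings_filepath, map_location='cpu')
    corpus_embeddings = corpus_embeddings.float()  # Convert the stored embeddings to 32-bit floats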
    if torch.cuda.is_available():
        corpus_embeddings = corpus_embeddings.to('cuda')
else:  # Here, we compute the corpus_embeddings from scratch (which can take a while depending on the GPU)
    corpus_embeddings = bi_encoder.encode(passages, convert_to_tensor=True, show_progress_bar=True)


Passages: 509663


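The comment in the cell above notes that the number of passages can be limited. A minimal sketch of one way to do that is a loader with a cap, assuming (as the loading loop above does) that each line of the `.jsonl.gz` file is a JSON object with a `title` string and a `paragraphs` list. `load_passages` and `max_passages` are hypothetical names, not part of sentence-transformers.

```python
import gzip
import json


def load_passages(path, max_passages=None):
    """Read a gzipped JSONL dump into [title, paragraph] pairs.

    Assumes each line holds a JSON object with 'title' and 'paragraphs'
    keys. If max_passages is given, stop once that many pairs are collected.
    """
    passages = []
    with gzip.open(path, "rt", encoding="utf8") as f_in:
        for line in f_in:
            data = json.loads(line.strip())
            for paragraph in data["paragraphs"]:
                if max_passages is not None and len(passages) >= max_passages:
                    return passages
                # Keep the [title, text] pair format the bi-encoder expects
                passages.append([data["title"], paragraph])
    return passages
```

Usage would look like `passages = load_passages(wikipedia_filepath, max_passages=100_000)`. If you also use the pre-computed embeddings, `corpus_embeddings` would need to be sliced to `len(passages)`, which only lines up if that file was encoded in the same passage order (an assumption, not something stated here).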

In [45]:
import json
from sentence_transformers import SentenceTransformer, CrossEncoder, util
import time
import gzip
import os
import torch
!pip install datasets

from datasets import load_dataset
ds = load_dataset("Coder-Dragon/wikipedia-movies", split='train[:1000]')



In [46]:
!pip install -U huggingface_hub  # also installs the huggingface-cli command



In [47]:
!huggingface-cli login



    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) Y
Token is valid (permission: write).
[1m[31mCannot authenticate through 

In [50]:
import time
from sentence_transformers import SentenceTransformer, util
from datasets import load_dataset

# Load the pre-trained bi-encoder model
bi_encoder = SentenceTransformer('sentence-transformers/bert-base-nli-stsb-mean-tokens')

# Load the dataset
ds = load_dataset("Coder-Dragon/wikipedia-movies", split='train[:1000]')

# Extract passages from the 'Plot' column of the dataset
passages = ds['Plot']

# Compute embeddings for the passages using the bi-encoder model
# Compute embeddings for all passages in one batched call (returns a single 2-D tensor)
corpus_embeddings = bi_encoder.encode(passages, convert_to_tensor=True)

def search(query, top_k=5):
    # Encode the query using the bi-encoder and find potentially relevant passages
    start_time = time.time()
    question_embedding = bi_encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(question_embedding, corpus_embeddings, top_k=top_k)
    hits = hits[0]  # Get the hits for the first query
    end_time = time.time()

    # Output top-k hits
    print("Input question:", query)
    print("Results (after {:.3f} seconds):".format(end_time - start_time))
    for hit in hits:
        print("\t{:.3f}\t{}".format(hit['score'], passages[hit['corpus_id']]))

search(query="What is the capital of France?")



Input question: What is the capital of France?
Results (after 0.013 seconds):
	0.250	Paul L. Hoefler heads a 1928 expedition to Africa.
	0.220	The main charter Karl Breitman played by Henry B. Walthall, thinks he is a descendant of Napoleon and tries bring back to France the French monarchy. As part of his plot he courts Hedda Gobert played by Rosemary Theby as she owns some Napoleon's papers. After winning Hedda haert he takes the documents from she. He travels to America to visit Admiral Killigrew played by Hardee Kirkland. He hopes the stolen papers will lead him to Napoleon wealth. He finds a treasure map in the Admiral's home and then travels to Corsica. Before finding the Napoleon wealth, he comes across someone that mocks him. He challenges them to a duel. In the duel he is mortally wounded. He dies at his love side, Hedda.[3]
	0.210	When an underworld figure inherits a fortune, he goes straight and endeavors to become a respectable businessman. But on a trip to Paris, he encoun

In [71]:
import time
from sentence_transformers import SentenceTransformer, util
from datasets import load_dataset

# Load the pre-trained bi-encoder model
bi_encoder = SentenceTransformer('sentence-transformers/bert-base-nli-stsb-mean-tokens')

# Load the dataset
ds = load_dataset("Coder-Dragon/wikipedia-movies", split='train[:1000]')

# Extract passages from the 'Title' and 'Plot' columns of the dataset
titles = ds['Title']
plots = ds['Plot']

# Combine 'Title' and 'Plot' passages into a single text
combined_passages = [title + ' ' + plot for title, plot in zip(titles, plots)]

# Compute embeddings for the combined passages using the bi-encoder model
# Compute embeddings for all combined passages in one batched call (returns a single 2-D tensor)
combined_embeddings = bi_encoder.encode(combined_passages, convert_to_tensor=True)

def search(query, top_k=5):
    # Encode the query using the bi-encoder and find potentially relevant passages
    start_time = time.time()
    question_embedding = bi_encoder.encode(query, convert_to_tensor=True)

    # Perform semantic search on the combined embeddings
    hits = util.semantic_search(question_embedding, combined_embeddings, top_k=top_k)
    hits = hits[0]  # Get the hits for the first query
    end_time = time.time()

    # Output top-k hits
    print("Input question:", query)
    print("Combined Results (after {:.3f} seconds):".format(end_time - start_time))
    for hit in hits:
        print("\t{:.3f}\t{}".format(hit['score'], combined_passages[hit['corpus_id']]))

search(query="What is the capital of France?")


Input question: What is the capital of France?
Combined Results (after 0.013 seconds):
	0.210	A Splendid Hazard The main charter Karl Breitman played by Henry B. Walthall, thinks he is a descendant of Napoleon and tries bring back to France the French monarchy. As part of his plot he courts Hedda Gobert played by Rosemary Theby as she owns some Napoleon's papers. After winning Hedda haert he takes the documents from she. He travels to America to visit Admiral Killigrew played by Hardee Kirkland. He hopes the stolen papers will lead him to Napoleon wealth. He finds a treasure map in the Admiral's home and then travels to Corsica. Before finding the Napoleon wealth, he comes across someone that mocks him. He challenges them to a duel. In the duel he is mortally wounded. He dies at his love side, Hedda.[3]
	0.199	A Woman of Paris Marie St. Clair and her beau, aspiring artist Jean Millet, plan to leave their small French village for Paris, where they will marry. On the night before their s

In [72]:
search(query = "What is the capital of the France?")

Input question: What is the capital of the France?
Combined Results (after 0.021 seconds):
	0.239	A Splendid Hazard The main charter Karl Breitman played by Henry B. Walthall, thinks he is a descendant of Napoleon and tries bring back to France the French monarchy. As part of his plot he courts Hedda Gobert played by Rosemary Theby as she owns some Napoleon's papers. After winning Hedda haert he takes the documents from she. He travels to America to visit Admiral Killigrew played by Hardee Kirkland. He hopes the stolen papers will lead him to Napoleon wealth. He finds a treasure map in the Admiral's home and then travels to Corsica. Before finding the Napoleon wealth, he comes across someone that mocks him. He challenges them to a duel. In the duel he is mortally wounded. He dies at his love side, Hedda.[3]
	0.196	When Knighthood Was in Flower Mary Tudor, Queen of France (Marion Davies), the younger sister of King Henry VIII (Lyn Harding), falls in love with commoner Charles Brandon, 1

In [73]:
search(query = "What is the best orchestra in the world?")

Input question: What is the best orchestra in the world?
Combined Results (after 0.018 seconds):
	0.415	The Midnight Girl Lugosi plays, according to an intertitle, "Nicholas Harmon, the immensely wealthy patron of music" who "loved his weaknesses — and his favorite weakness was Nina," his mistress, an opera singer whose voice is faltering. His stepson Don, an orchestra conductor, rejects the attentions of a society girl. Don becomes estranged from his stepfather in an argument, and leaves to succeed on his own. He helps the career of Anna, a newly arrived singer from Russia who becomes a nightclub star, the "Midnight Girl". Harmon sees her perform, and is entranced. He invites her to his apartment, where his attempts to seduce her become forceful. Anna fires at gun at him, but hits instead Nina, who has been hiding behind a curtain. Harmon realizes how much he loves Nina, and cradles her in his arms. At the end of the story, Don has married Anna, who is now a leading opera singer, and 

In [74]:
search(query = "Number countries Europe")

Input question: Number countries Europe
Combined Results (after 0.017 seconds):
	0.203	Africa Speaks! Paul L. Hoefler heads a 1928 expedition to Africa.
	0.201	His Hour Gritzko (John Gilbert) is a Russian nobleman and Tamara (Aileen Pringle) is the object of his desire.
	0.198	David Copperfield "David Copperfield consists of three reels and as three separate films, released in three consecutive weeks, with three different titles: The Early Life of David Copperfield, Little Em'ly and David Copperfield and The Loves of David Copperfield.[4]
	0.165	High Society Blues A new country family comes to live among established wealthy neighbors.
	0.075	In the Land of the Head Hunters The following plot synopsis was published in conjunction with a 1915 showing of the film at Carnegie Hall:


In [75]:
search(query = "When did the cold war end?")

Input question: When did the cold war end?
Combined Results (after 0.017 seconds):
	0.284	A Lady Surrenders A man is left by his wife and assuming her to be gone forever, he remarries. Complications ensue when his original wife returns home.[2][3]
	0.229	London After Midnight In a cultured and peaceful home on the outskirts of London,[9] the head of the household, Sir Roger Balfour, is found dead from what initially appears to be a self-inflicted bullet wound, despite the insistence of Balfour's friend and neighbour, Sir James Hamlin, that his old friend would never have taken his own life. Nonetheless, Balfour's death is officially declared a suicide by Inspector Edward C. Burke of Scotland Yard.[9][10][11]
Five years later, with the case still unresolved, a sinister-looking man with pointed teeth wearing a Beaver-skin top hat, arrives at the household accompanied by a cadaverous-looking woman in a long gown; the arrival of these two individuals prompts Sir James Hamlin, the friend a

In [76]:
search(query = "How long do cats live?")

Input question: How long do cats live?
Combined Results (after 0.018 seconds):
	0.203	Dusk to Dawn An Indian maid and American girl (both played by Florence Vidor) share a single soul which shifts between them each day when they are awake.[3]
	0.194	The Cat and the Canary In a decaying mansion overlooking the Hudson River, millionaire Cyrus West approaches death. His greedy family descends upon him like "cats around a canary", causing him to become insane. West orders that his last will and testament remain locked in a safe and go unread until the 20th anniversary of his death. As the appointed time arrives, West's lawyer, Roger Crosby (Tully Marshall), discovers that a second will mysteriously appeared in the safe. The second will may only be opened if the terms of the first will are not fulfilled. The caretaker of the West mansion, Mammy Pleasant (Martha Mattox), blames the manifestation of the second will on the ghost of Cyrus West, a notion that the astonished Crosby quickly reject

In [77]:
search(query = "How many people live in Toronto?")

Input question: How many people live in Toronto?
Combined Results (after 0.017 seconds):
	0.132	Nanook of the North The documentary follows the lives of an Inuk, Nanook, and his family as they travel, search for food, and trade in the Ungava Peninsula of northern Quebec, Canada. Nanook; his wife, Nyla; and their family are introduced as fearless heroes who endure rigors no other race could survive. The audience sees Nanook, often with his family, hunt a walrus, build an igloo, go about his day, and perform other tasks.
	0.108	River's End In remote northern Canada, Sergeant Conniston (Bickford) seeks to capture escaped convicted murderer Keith (also played by Bickford). He is accompanied by O'Toole (J. Farrell MacDonald), a guide who is constantly drunk. When he finally catches his quarry, he is shocked to find that they look exactly alike.
On their way back to the RCMP post, however, their sled overturns. Keith takes Conniston's gun and sled and leaves the policeman and his guide to d

In [78]:
search(query = "Oldest US president")

Input question: Oldest US president
Combined Results (after 0.021 seconds):
	0.299	Brewster's Millions As summarized in a film publication,[4] Monte Brewster's (Arbuckle) two grandfathers, one rich and the other a self-made man, squabble as to the way the infant should be raised. The mother steps in and decides to raise the child her way, which results in Monte being a clerk in a steamship office at the age of 21. At this point the grandfathers get together again, with one grandfather giving him $1 million, and the other offering $4 million provided that at the end of one year Monte spends the $1 million given by the other grandfather. Other conditions include that he be absolutely "broke" at the end of one year, that he not marry for five years, and not to tell any one of the arrangement. Young Brewster tries everything he can to get rid of the money, but everything he does and the wildest chances he takes result in more money for him. He hires three men to help him spend the money, b

In [79]:
search(query = "Coldest place earth")

Input question: Coldest place earth
Combined Results (after 0.015 seconds):
	0.226	The Frozen North The film opens near the "last stop on the subway", a terminal in Alaska, which appears to be emerging from deep snow in the middle of nowhere. A tough-looking cowboy (Buster Keaton) emerges. He arrives at a small settlement, finding people gambling in a saloon. He tries to rob them by scaring them with the cutout of a poster of a man holding a gun, which he places at the window, as if he has an accomplice. He tells the gamblers to raise their hands in the air. Frightened, they hand over their cash, but soon they find out the truth when a drunk man falls over the cutout. Keaton is thrown out through the window.
Next, he mistakenly enters a house thinking that it is his own house. Inside, he sees a man and a woman kissing. Thinking the woman is his wife, he gets red-hot angry and shoots the couple, later to realize his mistake. He goes to his own house, where he finds his wife (Sybil Seel

In [80]:
search(query = "When was Barack Obama born?")

Input question: When was Barack Obama born?
Combined Results (after 0.018 seconds):
	0.273	Africa Speaks! Paul L. Hoefler heads a 1928 expedition to Africa.
	0.172	Lights of New York When bootleggers Jake Jackson (Walter Percival) and Dan Dickson (Jere Delaney), who have been hiding out in a small upstate New York town, learn that they finally can return to New York, they try to convince a young kid named Eddie Morgan (Cullen Landis) and his friend, a local barber named Gene (Eugene Palette) to come with them.
With a promise from Jackson and Dickson that they will help the young men establish a barbershop in the city, Eddie asks his mother, Mrs. Morgan (Mary Carr), who owns the town's Morgan Hotel, to loan them $5,000 of her savings. Eddie and Gene set up the barbershop in New York but soon learn that it is merely a front for a speakeasy.
Frustrated and yearning for a return to the quiet life, Gene and Eddie vow to go home as soon as they earn enough to pay back Mrs. Morgan. Eddie is

In [81]:
search(query = "Paris eiffel tower")

Input question: Paris eiffel tower
Combined Results (after 0.017 seconds):
	0.240	Charlie Chan Carries On Charlie Chan tries to solve the murder of a wealthy American found dead in a London hotel room. Settings include London, Nice, France, San Remo, Honolulu and Hong Kong.
	0.221	The Valiant The credits (accompanied by organ music endemic to silent films), segue into title card: "A city street-----where laughter and tragedy rub elbows." A crowded block lined with tenement buildings, on Manhattan's Lower East Side, comes into view, followed by a look into the hallway of one of those buildings, then a shot is heard, a door to one of the apartments opens and a man holding a gun (Paul Muni) backs out, closes the door, puts the gun in his pocket, then walks down flights of stairs and into the busy street. While he passes along sidewalks teeming with human activity, an Irish American policeman berates a driver for parking in front of a hydrant, but when the driver removes his scarf, reveali

In [82]:
search(query = "Which US president was killed?")

Input question: Which US president was killed?
Combined Results (after 0.018 seconds):
	0.410	Charlie Chan Carries On Charlie Chan tries to solve the murder of a wealthy American found dead in a London hotel room. Settings include London, Nice, France, San Remo, Honolulu and Hong Kong.
	0.404	The Benson Murder Case A ruthless, crooked stockbroker is murdered at his luxurious country estate, and detective Philo Vance just happens to be there; he decides to find out who killed him.[3]
	0.355	The Martyred Presidents The film, just over a minute long, is composed of two shots. In the first, a girl sits at the base of an altar or tomb, her face hidden from the camera. At the center of the altar, a viewing portal displays the portraits of three U.S. Presidents—Abraham Lincoln, James A. Garfield, and William McKinley—each victims of assassination.
In the second shot, which runs just over eight seconds long, an assassin kneels feet of Lady Justice.
	0.313	Trent's Last Case A leading financier 

In [83]:
search(query="When is Chinese New Year")

Input question: When is Chinese New Year
Combined Results (after 0.016 seconds):
	0.199	Across to Singapore In 1857, Joel Shore (Ramon Novarro), the carefree youngest son of a seafaring family, has a flirtatious friendship with Priscilla Crowninshield (Joan Crawford), and he eventually falls in love with her. However, unbeknownst to him, Priscilla has been betrothed to Joel's much older brother, Mark (Ernest Torrence). The wedding is announced in church as a surprise, and Joel and Priscilla are both shocked, with Priscilla refusing to kiss her new husband after the ceremony.
Mark, a ship's captain, sails to Singapore, accompanied by Joel and their other brothers. Priscilla tells Joel she had no idea about the marriage and tries to kiss him, but Joel is hurt and rebuffs Priscilla's advances before he leaves. At the same time, Mark, mad about Priscilla spurning him, drinks heavily during the voyage and begins to see hallucinations of Priscilla. He senses that Priscilla loves someone els

In [84]:
search(query="what is the name of manchester united stadium")

Input question: what is the name of manchester united stadium
Combined Results (after 0.020 seconds):
	0.298	Alexander Hamilton The story depicts Hamilton's (George Arliss) efforts to pass the "Assumption Bill", which required the federal government to assume the debts incurred by the 13 states during the American Revolutionary War, and his agreement to a compromise by passage of the Residence Bill establishing the national capital.[2]
	0.230	In the Land of the Head Hunters The following plot synopsis was published in conjunction with a 1915 showing of the film at Carnegie Hall:
	0.195	David Copperfield "David Copperfield consists of three reels and as three separate films, released in three consecutive weeks, with three different titles: The Early Life of David Copperfield, Little Em'ly and David Copperfield and The Loves of David Copperfield.[4]
	0.188	Charlie Chan Carries On Charlie Chan tries to solve the murder of a wealthy American found dead in a London hotel room. Settings incl

In [85]:
search(query="who wrote cant get you out of my head lyrics")

Input question: who wrote cant get you out of my head lyrics
Combined Results (after 0.026 seconds):
	0.286	Show Girl in Hollywood When the film begins, a musical show before closed down before it has had a chance to even open. Jimmie Doyle (Jack Mulhall), who wrote the musical intends to rewrite it while his girlfriend, Dixie Dugan (Alice White), fed up at wasting her time for a show that never even opened, is intent on finding a new career. While at a nightclub, Dixie does a musical number and catches the eye of Frank Buelow (John Miljan), a Hollywood director. Buelow persuades Dixie to go to Hollywood, where he will have a part waiting for her in his upcoming films.
Dixie takes the next train to California. When she arrives, she is disappointed to find that Buelow has been fired from the studio and that there is no part for her. Dixie meets Donny Harris (Blanche Sweet), a former star who is now out of work because she is considered "as old as the hills" at the age of 32.[5] Soon af

In [86]:
search(query="where does the story the great gatsby take place")

Input question: where does the story the great gatsby take place
Combined Results (after 0.017 seconds):
	0.324	The Great Gatsby An adaptation of F. Scott Fitzgerald's Long Island-set novel, where Midwesterner Nick Carraway is lured into the lavish world of his neighbor, Jay Gatsby. Soon enough, however, Carraway will see through the cracks of Gatsby's nouveau riche existence, where obsession, madness, and tragedy await.
	0.287	In the Land of the Head Hunters The following plot synopsis was published in conjunction with a 1915 showing of the film at Carnegie Hall:
	0.279	Wizard of Oz A toymaker (Semon) reads L. Frank Baum's book to his granddaughter. The Land of Oz is ruled by Prime Minister Kruel (Josef Swickard), aided by Ambassador Wikked (Otto Lederer), Lady Vishuss (Virginia Pearson), and the Wizard (Charles Murray), a "medicine-show hokum hustler". When the discontented people, led by Prince Kynd (Bryant Washburn), demand the return of the princess, who disappeared while a baby m

In [87]:
search(query="who turned out to be the mother on how i met your mother")

Input question: who turned out to be the mother on how i met your mother
Combined Results (after 0.022 seconds):
	0.357	A Lady Surrenders A man is left by his wife and assuming her to be gone forever, he remarries. Complications ensue when his original wife returns home.[2][3]
	0.339	Rebecca of Sunnybrook Farm As described in a film magazine,[4] Rebecca Randall (Pickford) is taken into the home of her aunt Hannah (Eddy), a strict New England woman. Rebecca meets Adam Ladd (O'Brien), a young man of the village, and they become great friends. One day Rebecca promises to marry Adam when she becomes of age. Unable to withstand her pranks any longer, her aunt sends her away to a boarding school. She graduates a beautiful young lady. Shortly thereafter, Adam demands a fulfillment of her promise.
	0.331	Stage Struck Jennie Hagan (Swanson) is a waitress who dreams of becoming a star. When a real theatrical diva (Astor) arrives in town, Jennie schemes to get a part on the stage.
	0.318	Sentimen

In [88]:
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('nq-distilbert-base-v1')

query_embedding = model.encode('How many people live in London?')

# The passages are encoded as [title, text]
passage_embedding = model.encode([['London', 'London has 9,787,426 inhabitants at the 2011 census.']])

# util.cos_sim is the current name of the older util.pytorch_cos_sim
print("Similarity:", util.cos_sim(query_embedding, passage_embedding))

Similarity: tensor([[0.6503]])
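Each entry of the printed tensor is a cosine similarity between the query embedding $\mathbf{q}$ and one passage embedding $\mathbf{p}$, which is what `util.pytorch_cos_sim` (an older name for `util.cos_sim`) computes for every query/passage pair. With one query and $n$ passages the result has shape $(1, n)$, and because the score is normalized by both vector lengths it lies between $-1$ and $1$, so unrelated passages can receive negative scores:

```latex
\operatorname{cos\_sim}(\mathbf{q}, \mathbf{p})
  = \frac{\mathbf{q} \cdot \mathbf{p}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{p} \rVert}
  \in [-1, 1]
```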


In [89]:
query_embedding = model.encode('who turned out to be the mother on how i met your mother')

# The passages are encoded as [title, text]
passage_embedding = model.encode([['The Mother (How I Met Your Mother)', 'The Mother (How I Met Your Mother) Tracy McConnell (colloquial: "The Mother") is the title character from the CBS television sitcom "How I Met Your Mother". The show, narrated by Future Ted (Bob Saget), tells the story of how Ted Mosby (Josh Radnor) met The Mother. Tracy McConnell appears in eight episodes, from "Lucky Penny" to "The Time Travelers", as an unseen character; she was first seen fully in "Something New" and was promoted to a main character in season 9. The Mother is played by Cristin Milioti. The story of how Ted met The Mother is the framing device'],
                                  ['Make It Easy on Me', 'and Pete Waterman on her 1993 album "Good \'N\' Ready", on which a remixed version of the song is included. "Make It Easy On Me", a mid-tempo R&B jam, received good reviews (especially for signalling a different, more soulful and mature sound atypical of the producers\' Europop fare), but failed to make an impact on the charts, barely making the UK top 100 peaking at #99, and peaking at #52 on the "Billboard" R&B charts. The pop group Steps covered the song on their 1999 album "Steptacular". It was sung as a solo by Lisa Scott-Lee. Make It Easy on']])

print("Similarity:", util.cos_sim(query_embedding, passage_embedding))

Similarity: tensor([[ 0.7562, -0.0835]])


In [90]:
query_embedding = model.encode('where does the story the great gatsby take place')
passage_embedding = model.encode([['The Great Gatsby',
 'The Great Gatsby The Great Gatsby is a 1925 novel written by American author F. Scott Fitzgerald that follows a cast of characters living in the fictional towns of West Egg and East Egg on prosperous Long Island in the summer of 1922. The story primarily concerns the young and mysterious millionaire Jay Gatsby and his quixotic passion and obsession with the beautiful former debutante Daisy Buchanan. Considered to be Fitzgerald\'s magnum opus, "The Great Gatsby" explores themes of decadence, idealism, resistance to change, social upheaval, and excess, creating a portrait of the Roaring Twenties that has been described as'],
 ['The Producers (1967 film)', '2005 (to coincide with the remake released that year). In 2011, MGM licensed the title to Shout! Factory to release a DVD and Blu-ray combo pack with new HD transfers and bonus materials. StudioCanal (worldwide rights holder to all of the Embassy Pictures library) released several R2 DVD editions and Blu-ray B releases using a transfer slightly different from the North Ameri can DVD and BDs. The Producers (1967 film) The Producers is a 1967 American satirical comedy film written and directed by Mel Brooks and starring Zero Mostel, Gene Wilder, Dick Shawn, and Kenneth Mars. The film was Brooks\'s directorial']
])

print("Similarity:", util.cos_sim(query_embedding, passage_embedding))


Similarity: tensor([[ 0.8294, -0.2055]])


In [91]:
search(query="Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions")


Input question: Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions
Combined Results (after 0.020 seconds):
	0.454	Nanook of the North The documentary follows the lives of an Inuk, Nanook, and his family as they travel, search for food, and trade in the Ungava Peninsula of northern Quebec, Canada. Nanook; his wife, Nyla; and their family are introduced as fearless heroes who endure rigors no other race could survive. The audience sees Nanook, often with his family, hunt a walrus, build an igloo, go about his day, and perform other tasks.
	0.291	The Homesteader The Homesteader involves six principal characters, the leading one being Jean Baptiste (Charles Lucas), a homesteader far off in the Dakotas, living where he alone is black. To this wilderness arrives Jack Stewart, a Scotsman, with his motherless daughter, Agnes (Iris Hall). In Agnes, Baptiste meets the girl of his dreams. Agnes, however, does not know that she is not white. Peculiar fate threw 

In [92]:
search(query="Western romance")


Input question: Western romance
Combined Results (after 0.019 seconds):
	0.421	The River Allen John Spender (Charles Farrell) is a virile outdoorsman and Rosalee (Mary Duncan) is his high society sweetheart.[3]
	0.358	Don't Bet on Women On a whim, Herbert Blake proposes a wager with Roger Fallon that he won't be able to get a kiss during the coming 48 hours from the next woman who happens to walk into the room. Fallon takes the bet, whereupon the woman who turns up is Herbert's wife.
	0.343	The Deciding Kiss As described in a film magazine,[3] Eleanor Hamlin (Roberts), who has been living with an old and impoverished couple, is adopted by two couples, Mr. and Mrs. Sears and Beulah Page (Greenwood) and Peter Bolling (Unterkircher), young people who have read of cooperative parenting and wish to try out the theory. It works very well until Jimmy Sears (Cooley) loses control of himself under the spell of his adopted daughter's kisses. This passes, however, but then Peter falls in love wit

In [93]:
search(query="Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo.")


Input question: Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo.
Combined Results (after 0.019 seconds):
	0.576	Little Lord Fauntleroy In a shabby New York City side street in the mid-1880s, young Cedric Errol lives with his mother (known only as Mrs. Errol or "Dearest") in genteel poverty after the death of his father, Captain Cedric Errol. One day, they are visited by an English lawyer named Havisham with a message from young Cedric's grandfather, the Earl of Dorincourt, an unruly millionaire who despises the United States and was very disappointed when his youngest son married an American woman. With the deaths of his father's elder brothers, Cedric has now inherited the title Lord Fauntleroy and is the heir to the earldom and a vast estate. Cedric's grandfather wants him to live in England and be educated as an English aristocrat. He offers his son's widow a house and guaranteed i

In [94]:
search(query="Comedy film, office disguises, boss's daughter, elopement.")


Input question: Comedy film, office disguises, boss's daughter, elopement.
Combined Results (after 0.020 seconds):
	0.632	Mabel's Blunder Mabel's Blunder tells the tale of a young woman who is secretly engaged to the boss's son.[1] The young man's sister comes to visit at their office, and a jealous Mabel, not knowing who the visiting woman is, dresses up as a (male) chauffeur to spy on them.
	0.589	The Saturday Night Kid Set in May 1929, the film focuses on two sisters - Mayme (Clara Bow) and Janie (Jean Arthur) - as they share an apartment in New York City. In daytime, they work as salesgirls at the Ginsberg's department store, and at night they vie for the attention of their colleague Bill (James Hall) and fight over Janie's selfish and reckless behavior, such as stealing Mayme's clothes and hitchhiking to work with strangers. Bill prefers Mayme over Janie and constantly shows his affection for her. This upsets Janie, who schemes to break up the couple.
One day at work, Bill is pro

In [95]:
search(query="Lost film, Cleopatra charms Caesar, plots world rule, treasures from mummy, revels with Antony, tragic end with serpent in Alexandria.")


Input question: Lost film, Cleopatra charms Caesar, plots world rule, treasures from mummy, revels with Antony, tragic end with serpent in Alexandria.
Combined Results (after 0.028 seconds):
	0.593	Cleopatra Because the film has been lost, the following summary is reconstructed from a description in a contemporary film magazine.
Cleopatra (Bara), the Siren of Egypt, by a clever ruse reaches Caesar (Leiber) and he falls victim to her charms. They plan to rule the world together, but then Caesar falls. Cleopatra's life is desired by the church, as the wanton woman's rule has become intolerable. Pharon (Roscoe), a high priest, is given a sacred dagger to take her life. He gives her his love instead and, when she is in need of some money, leads her to the tomb of his ancestors, where she tears the treasure from the breast of the mummy. With this wealth she goes to Rome to meet Antony (Hall). He leaves the affairs of state and travels to Alexandria with her, where they revel. Antony is rec

In [96]:
search(query="Denis Gage Deane-Tanner")


Input question: Denis Gage Deane-Tanner
Combined Results (after 0.019 seconds):
	0.320	His Hour Gritzko (John Gilbert) is a Russian nobleman and Tamara (Aileen Pringle) is the object of his desire.
	0.272	Captain Alvarez A melodrama about an American who becomes a revolutionary leader battling evil government spies in Argentina. William Desmond Taylor portrays the title role, and Denis Gage Deane-Tanner, Taylor's younger brother, is thought to have played the small role of a blacksmith.
	0.249	Mr. Fix-It As described in a film magazine,[3] because of his ability to fix things Dick Remington (Fairbanks) becomes known as "Mr. Fix-It" and enters the aristocratic home of the Burroughs as their nephew. Before long he has melted the stone hearts of three aunts and one uncle and won the heart of Mary McCullough (Hawley) in addition to setting aright the affairs of pretty Georgiana Burroughs (MacDonald) and Olive Van Tassell (Landis).
	0.248	The Black Arrow: A Tale of the Two Roses The novel i