<a href="https://colab.research.google.com/github/rafajak/gpt3_examples/blob/master/GPT3_search_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### The following is a simple example of OpenAI's Search API in action

In [106]:
!pip install openai
!pip install transformers

import pandas as pd 
import json 
from google.colab import files
import openai
import requests



In [None]:
# To keep your secret API key secure, create an api_key.json file containing the key and upload it with the call below.
# (It should look like this: {"api_key": "heresmysecretkey"})

upload = files.upload()
with open("api_key.json", "r") as f:
    openai.api_key = json.load(f)["api_key"]

In [107]:
documents=["Shostakovich", "Gershwin", "Mussorgsky", "Chopin"]
query="Russian 20th century composer"

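# Score every document against the query with the davinci engine.
response = openai.Engine("davinci").search(
    documents=documents,
    query=query,
)

# Pair each score with its document (assumes results come back in input order).
response_df = pd.concat(
    [pd.DataFrame(response["data"]), pd.Series(documents, name="documents")],
    axis=1,
)
response_df = response_df[["documents", "score"]]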

print(f"Semantic search results for '{query}':")
response_df.sort_values(by="score",ascending=False)

Semantic search results for 'Russian 20th century composer':


|   | documents | score |
|---|---|---|
| 0 | Shostakovich | 226.203 |
| 2 | Mussorgsky | 188.573 |
| 3 | Chopin | 103.092 |
| 1 | Gershwin | 92.714 |


A higher score indicates higher semantic similarity between a document and the query. In the example above, the model correctly assigns the highest score to Shostakovich, the only composer in the list who was born in Russia in the 20th century. Well done, davinci!
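The DataFrame above pairs scores with documents by position. A minimal sketch of pairing them by index instead, assuming each entry in `response["data"]` carries a `document` field holding the index of the scored document. The `mock_response` below is a hand-written stand-in that reuses the scores from the table; it is not a live API call, and `rank_documents` is a hypothetical helper.

```python
documents = ["Shostakovich", "Gershwin", "Mussorgsky", "Chopin"]

# Hand-written stand-in for the API response (assumed shape; scores copied from the run above).
mock_response = {
    "data": [
        {"document": 0, "score": 226.203},
        {"document": 1, "score": 92.714},
        {"document": 2, "score": 188.573},
        {"document": 3, "score": 103.092},
    ]
}


def rank_documents(documents, response):
    """Return (document, score) pairs sorted from most to least similar."""
    pairs = [(documents[r["document"]], r["score"]) for r in response["data"]]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked = rank_documents(documents, mock_response)
for name, score in ranked:
    print(f"{name}: {score}")
```

Looking documents up by their reported index keeps the pairing correct even if the results were ever returned in a different order than the input list.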

# Q&A:

```(Source: Nik from OpenAI) ```

---
1.   How is the score calculated?
---


For a set of documents {d1, d2, …, dn} and a query q, the score of document dj is given by

s[j] = 100.0 * (log(p(q|dj)) - log(p(q|""))) / ntokens(q).

This can be interpreted as “100 x the average log-prob per query token given the document as context, referenced to an empty-document baseline”.

In the extremal case, log(p(q|dj)) might be 0 because the document perfectly predicts the query, and log(p(q|"")) might be maximally uninformative, i.e., equal to the expected per-token log-prob of the uniform distribution over tokens. So while this is not a hard bound, we should typically expect -log(p(q|"")) / ntokens(q) <= log(n_vocab) ~= 4.7, and most of the time the score should be somewhere between 0 and 470.

The full definition of log(p(q=[q1,q2,...,qN]|d)) is log(p(q1|d)) + log(p(q2|d,q1)) + ... + log(p(qN|d,q1,q2,...,q{N-1})), i.e., we evaluate the sum of the log-probabilities of each query token given one of the documents as the initial context. This sum has exactly ntokens(q) terms, so dividing by ntokens(q) simply normalizes it to an average log-probability per query token.
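The formula can be sketched in a few lines of Python. This is an illustration rather than OpenAI's implementation: `search_score` is a hypothetical helper, and the per-token log-probs are made-up numbers. The extremal case uses a base-10 logarithm and a vocabulary of 50257 tokens. Both are assumptions, chosen because they reproduce the 4.7 figure quoted above; the source does not state the base.

```python
import math


def search_score(logprobs_with_doc, logprobs_empty):
    """100 x (sum of log-probs given the document - sum given "") / number of query tokens."""
    if len(logprobs_with_doc) != len(logprobs_empty):
        raise ValueError("both lists must have one entry per query token")
    n_tokens = len(logprobs_with_doc)
    return 100.0 * (sum(logprobs_with_doc) - sum(logprobs_empty)) / n_tokens


# Hypothetical per-token log-probs for a 3-token query.
with_doc = [-0.5, -1.0, -0.3]  # log p(q_i | d, q_1..q_{i-1})
empty = [-3.0, -2.5, -2.9]     # log p(q_i | "", q_1..q_{i-1})
score = search_score(with_doc, empty)

# Extremal case: the document predicts the query perfectly (log-prob 0 per token)
# and the empty context is uniform over the vocabulary.
n_vocab = 50257  # assumed vocabulary size
uniform = [-math.log10(n_vocab)] * 3  # assumed base-10 logarithm
upper = search_score([0.0] * 3, uniform)

print(round(score, 1), round(upper, 1))
```

Under these assumptions the extremal score comes out at about 470, matching the upper end of the typical range given above.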

<br>



---
2.   What's the interpretation of scores below 0? 
---

A negative score means that the model found the query more likely to occur without the document as context (i.e., given "") than with it.
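One possible use of this interpretation, sketched with made-up scores: documents whose score is not positive give the model no help in predicting the query, so they can be separated from the rest. The scores, the document names, and the choice of zero as a cut-off are all illustrative assumptions, and `split_by_sign` is a hypothetical helper.

```python
# Hypothetical scores; a negative value means the query was more likely with no document at all.
scores = {"doc_a": 152.4, "doc_b": -12.7, "doc_c": 3.1, "doc_d": -0.4}


def split_by_sign(scores):
    """Separate documents that make the query more likely from those that do not."""
    helpful = {name: s for name, s in scores.items() if s > 0}
    unhelpful = {name: s for name, s in scores.items() if s <= 0}
    return helpful, unhelpful


helpful, unhelpful = split_by_sign(scores)
print("query more likely with the document:", sorted(helpful))
print("query not more likely with the document:", sorted(unhelpful))
```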