[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/openai/openai-ml-qa/01-making-queries.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/openai/openai-ml-qa/01-making-queries.ipynb)

# Making Queries

In this notebook we will learn how to retrieve contexts relevant to our queries from Pinecone and pass them to a generative OpenAI model, which generates an answer backed by real data sources. The required installs for this notebook are:

In [1]:
!pip install -qU \
  openai==0.27.7 \
  pinecone-client==2.2.1

---

🚨 _Note: the above `pip install` is formatted for Jupyter notebooks. If running elsewhere you may need to drop the `!`._

---

## Initializing Everything

We will start by initializing everything we will be using. Those components are:

* Pinecone vector DB for retrieval (we must also connect to the previously built `openai-ml-qa` index)

* OpenAI `text-embedding-ada-002` embedding model for embedding queries

* OpenAI `text-davinci-003` generation model for generating answers
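Before wiring up the real services, it may help to see how the three components fit together. The sketch below is a hypothetical skeleton, not the notebook's code: `answer_question` and the `fake_*` stand-ins are illustrative names, and in this notebook the three callables would wrap `openai.Embedding.create`, `index.query` and `openai.Completion.create`.

```python
from typing import Callable, List

# Hypothetical signatures for the three components; in this notebook they
# would wrap openai.Embedding.create, index.query and openai.Completion.create.
Embedder = Callable[[str], List[float]]
Retriever = Callable[[List[float], int], List[str]]
Generator = Callable[[str], str]


def answer_question(query: str, embed: Embedder, retrieve: Retriever,
                    generate: Generator, top_k: int = 3) -> str:
    """Embed the query, retrieve top_k contexts, and generate an answer from them."""
    xq = embed(query)                # 1. query -> vector
    contexts = retrieve(xq, top_k)   # 2. vector -> most similar contexts
    prompt = (                       # 3. contexts + question -> prompt
        "Answer the question based on the context below.\n\n"
        "Context:\n" + "\n\n---\n\n".join(contexts)
        + f"\n\nQuestion: {query}\nAnswer:"
    )
    return generate(prompt)          # 4. prompt -> answer


# Offline stand-ins so the flow can be run without API keys.
def fake_embed(text: str) -> List[float]:
    return [float(len(text))]


def fake_retrieve(vector: List[float], k: int) -> List[str]:
    return ["ctx A", "ctx B", "ctx C"][:k]


def fake_generate(prompt: str) -> str:
    return prompt.splitlines()[-2]  # echo the "Question: ..." line


print(answer_question("What is X?", fake_embed, fake_retrieve,
                      fake_generate, top_k=2))
# prints: Question: What is X?
```

Passing the components in as callables keeps the retrieve-then-generate flow testable offline; swapping the stand-ins for the real API calls does not change the flow.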

We first initialize the vector DB. For that we need our [free Pinecone API key](https://app.pinecone.io).

In [2]:
import os
import pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
api_key = os.getenv("PINECONE_API_KEY") or "PINECONE_API_KEY"
# find your environment in the console next to your api key
env = os.getenv("PINECONE_ENVIRONMENT") or "PINECONE_ENVIRONMENT"

# pinecone-client 2.x API: module-level init with API key and environment
pinecone.init(api_key=api_key, environment=env)
pinecone.whoami()

WhoAmIResponse(username='c78f2bd', user_label='default', projectname='9a4fbb6')

In [3]:
index_name = 'openai-ml-qa'

In [4]:
# connect to index
index = pinecone.Index(index_name)
# view index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 5458}},
 'total_vector_count': 5458}
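The stats above are worth a quick sanity check: the index dimension must match the size of the query embeddings (1536 for `text-embedding-ada-002`), and the index should not be empty. The helper below is a hypothetical sketch (`check_index` is not part of the Pinecone client) that works on the printed stats copied into a plain dict.

```python
EMBED_DIM = 1536  # output size of text-embedding-ada-002


def check_index(stats: dict, expected_dim: int = EMBED_DIM) -> int:
    """Raise if the index can't serve our queries; otherwise return its vector count."""
    if stats["dimension"] != expected_dim:
        raise ValueError(
            f"index dimension {stats['dimension']} != embedding size {expected_dim}"
        )
    if stats["total_vector_count"] == 0:
        raise ValueError("index is empty; populate it before querying")
    return stats["total_vector_count"]


# the values printed by describe_index_stats() above
stats = {
    "dimension": 1536,
    "index_fullness": 0.0,
    "namespaces": {"": {"vector_count": 5458}},
    "total_vector_count": 5458,
}
print(check_index(stats))  # 5458
```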

Next, we initialize the OpenAI models (or _"engines"_). For this we need an [OpenAI API key](https://platform.openai.com/).

In [5]:
import openai

# get API key from top-right dropdown on OpenAI website
openai.api_key = os.getenv("OPENAI_API_KEY") or "OPENAI_API_KEY"

openai.Engine.list()  # check we have authenticated

<OpenAIObject list at 0x7fba8339c1d0> JSON: {
  "data": [
    {
      "created": null,
      "id": "whisper-1",
      "object": "engine",
      "owner": "openai-internal",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "babbage",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "davinci",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "text-davinci-edit-001",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "text-davinci-003",
      "object": "engine",
      "owner": "openai-internal",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "babbage-code-search-code",
      "object": "engine",
      "owner": "ope
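Beyond confirming authentication, the same listing can tell us whether the two models this notebook calls are available to our key. A minimal sketch, assuming the `OpenAIObject` returned by `openai.Engine.list()` in the 0.27 client can be indexed like a dict (so it could be passed straight to `missing_engines`); the helper name is hypothetical and the toy listing below only reuses IDs from the output above.

```python
from typing import Iterable, Set

# Engines this notebook calls later on.
REQUIRED_ENGINES = {"text-embedding-ada-002", "text-davinci-003"}


def missing_engines(engine_list: dict,
                    required: Iterable[str] = REQUIRED_ENGINES) -> Set[str]:
    """Return the required engine IDs that do not appear in an engine listing."""
    available = {engine["id"] for engine in engine_list["data"]}
    return set(required) - available


# Toy listing shaped like the output above (IDs copied from it).
listing = {"data": [{"id": "whisper-1"}, {"id": "davinci"},
                    {"id": "text-davinci-003"}]}
print(missing_engines(listing))  # {'text-embedding-ada-002'}
```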

We will use the embedding model `text-embedding-ada-002` like so:

In [6]:
embed_model = "text-embedding-ada-002"

query = 'What are the differences between PyTorch and TensorFlow?'

res = openai.Embedding.create(
    input=[query],
    engine=embed_model
)

Then we use the returned query vector to query Pinecone:

In [7]:
xq = res['data'][0]['embedding']

res = index.query(vector=xq, top_k=3, include_metadata=True)
res

{'matches': [{'id': '3626',
              'metadata': {'category': 'General Discussion',
                           'context': 'I think this post might help you:\n'
                                      '  \n'
                                      '      \n'
                                      '\n'
                                      '      Medium – 8 Jun 21\n'
                                      '  \n'
                                      '\n'
                                      '  \n'
                                      '    \n'
                                      '\n'
                                      'Pytorch vs Tensorflow 2021 7\n'
                                      '\n'
                                      '  Tensorflow/Keras & Pytorch are by far '
                                      'the 2 most popular major machine '
                                      'learning libraries. Tensorflow is '
                                      'maintained and released by
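Conceptually, the query ranks the stored vectors by their similarity to `xq` and returns the `top_k` best. The brute-force version below only illustrates that ranking: Pinecone uses an approximate index rather than scoring every vector, and we assume here that the index uses the cosine metric (the notebook does not show it). OpenAI states that its embeddings are normalized to length 1, in which case cosine similarity and dot product give the same ranking.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_top_k(xq: List[float], vectors: Dict[str, List[float]],
                      top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every stored vector against xq; return the top_k (id, score) pairs."""
    scored = [(vid, cosine(xq, v)) for vid, v in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# toy 2-d "index" standing in for the 1536-d vectors in Pinecone
toy = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print([vid for vid, _ in brute_force_top_k([1.0, 0.1], toy, top_k=2)])  # ['a', 'b']
```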

Some of the returned contexts are relevant and some are not. Now we rely on the generative model `text-davinci-003` to generate our answer.
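One optional way to keep clearly irrelevant matches out of the prompt is to drop any match whose similarity score falls below a cut-off, since each Pinecone match carries a `score` alongside its `metadata`. This is a sketch, not part of the original pipeline: `filter_contexts` is a hypothetical helper, and the 0.8 threshold is an arbitrary placeholder that would need tuning for this index and metric.

```python
from typing import List


def filter_contexts(matches: List[dict], min_score: float = 0.8) -> List[str]:
    """Keep the contexts of matches scoring at least min_score (a hypothetical cut-off)."""
    return [
        m["metadata"]["context"]
        for m in matches
        if m["score"] >= min_score
    ]


# toy matches shaped like res['matches']; scores are made up
matches = [
    {"id": "1", "score": 0.91, "metadata": {"context": "relevant"}},
    {"id": "2", "score": 0.62, "metadata": {"context": "off-topic"}},
]
print(filter_contexts(matches))  # ['relevant']
```

A fixed threshold can filter out every match for an unusual query, so in practice one might fall back to the single best match when the filtered list is empty.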

In [8]:
limit = 3750  # maximum number of characters (not tokens) of context in the prompt

contexts = [
    x['metadata']['context'] for x in res['matches']
]

# build our prompt with the retrieved contexts included
prompt_start = (
    "Answer the question based on the context below.\n\n"+
    "Context:\n"
)
prompt_end = (
    f"\n\nQuestion: {query}\nAnswer:"
)
# fallback used if no contexts were retrieved
prompt = prompt_start + prompt_end
# append contexts until hitting limit
for i in range(1, len(contexts)+1):
    if len("\n\n---\n\n".join(contexts[:i])) >= limit:
        prompt = (
            prompt_start +
            "\n\n---\n\n".join(contexts[:i-1]) +
            prompt_end
        )
        break
    elif i == len(contexts):
        prompt = (
            prompt_start +
            "\n\n---\n\n".join(contexts) +
            prompt_end
        )

# now query text-davinci-003
res = openai.Completion.create(
    engine='text-davinci-003',
    prompt=prompt,
    temperature=0,
    max_tokens=400,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None
)
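The "append contexts until hitting the limit" logic is easier to reuse as a function. The sketch below is a hypothetical refactor (`build_prompt` is not from the original notebook) that keeps the same rule: include leading contexts only while their joined length stays below `limit` characters.

```python
from typing import List

SEP = "\n\n---\n\n"


def build_prompt(query: str, contexts: List[str], limit: int = 3750,
                 instruction: str = "Answer the question based on the context below.") -> str:
    """Join as many leading contexts as fit under `limit` characters into one prompt."""
    kept: List[str] = []
    for ctx in contexts:
        # stop as soon as adding this context would reach the character limit
        if len(SEP.join(kept + [ctx])) >= limit:
            break
        kept.append(ctx)
    return (
        f"{instruction}\n\nContext:\n"
        + SEP.join(kept)
        + f"\n\nQuestion: {query}\nAnswer:"
    )


print(build_prompt("Q?", ["aaaa", "bbbb", "cccc"], limit=15))
```

With `limit=15` only `"aaaa"` fits, because joining the first two contexts with the separator already takes 15 characters. The exhaustive-summary prompt used later in the notebook corresponds to passing a different `instruction` string.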

We check the generated response like so:

In [9]:
res['choices'][0]['text'].strip()

'If you ask me for my personal opinion I find Tensorflow more convenient in the industry (prototyping, deployment and scalability is easier) and PyTorch more handy in research (its more pythonic and it is easier to implement complex stuff).'

What we get here is essentially an extract from the top result. We can ask for more information by modifying the prompt.

In [10]:
query = 'What are the differences between PyTorch and TensorFlow?'

# create query vector
res = openai.Embedding.create(
    input=[query],
    engine=embed_model
)
xq = res['data'][0]['embedding']

# get relevant contexts
res = index.query(vector=xq, top_k=10, include_metadata=True)
contexts = [
    x['metadata']['context'] for x in res['matches']
]

# build our prompt with the retrieved contexts included
prompt_start = (
    "Give an exhaustive summary and answer based on the question using the contexts below.\n\n"+
    "Context:\n"
)
prompt_end = (
    f"\n\nQuestion: {query}\nAnswer:"
)
# fallback used if no contexts were retrieved
prompt = prompt_start + prompt_end
# append contexts until hitting limit
for i in range(1, len(contexts)+1):
    if len("\n\n---\n\n".join(contexts[:i])) >= limit:
        prompt = (
            prompt_start +
            "\n\n---\n\n".join(contexts[:i-1]) +
            prompt_end
        )
        break
    elif i == len(contexts):
        prompt = (
            prompt_start +
            "\n\n---\n\n".join(contexts) +
            prompt_end
        )

# now query text-davinci-003
res = openai.Completion.create(
    engine='text-davinci-003',
    prompt=prompt,
    temperature=0,
    max_tokens=400,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None
)
res['choices'][0]['text'].strip()

'PyTorch and TensorFlow are two of the most popular major machine learning libraries. TensorFlow is maintained and released by Google while PyTorch is maintained and released by Facebook. TensorFlow is more convenient in the industry for prototyping, deployment, and scalability, while PyTorch is more handy in research as it is more pythonic and easier to implement complex stuff. TensorFlow.js has several unique advantages over Python equivalent as it can run on the client side too, not just the server side (via Node) and on the server side it can potentially run faster than Python due to the JIT compiler of JS. Other than that the APIs etc are similar and Python is older so is more mature in terms of development.'

The advantages of TensorFlow.js could have been framed better, and the answer could have stated explicitly that PyTorch has no equivalent. Even so, the answer is good: it gives a useful summary of, and answer to, our question using information pulled from multiple sources.
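Because the answer blends several sources, it can be hard to tell which claim came from which context. One possible extension, not used in this notebook, is to tag each context with its match ID so the prompt can ask the model to cite its sources. `contexts_with_sources` and the `[source ...]` tag format are hypothetical choices, and the toy matches below are made up.

```python
from typing import List

SEP = "\n\n---\n\n"


def contexts_with_sources(matches: List[dict]) -> str:
    """Prefix each retrieved context with its match ID so the model can cite it."""
    return SEP.join(
        f"[source {m['id']}]\n{m['metadata']['context']}" for m in matches
    )


# toy matches shaped like res['matches']
matches = [
    {"id": "doc-1", "metadata": {"context": "First retrieved passage."}},
    {"id": "doc-2", "metadata": {"context": "Second retrieved passage."}},
]
print(contexts_with_sources(matches))
```

The resulting string would replace the plain `"\n\n---\n\n".join(contexts)` in the prompt, together with an instruction such as "cite the source IDs you used", whose exact wording would need experimentation.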

Once you're finished with the index, delete it to save resources:

In [11]:
pinecone.delete_index(index_name)

---