## Virtual Environment Configuration
### Install Python 3.10 on Ubuntu
Follow this guide: https://computingforgeeks.com/how-to-install-python-on-ubuntu-linux-system/

    sudo apt update
    sudo apt install software-properties-common -y
    sudo add-apt-repository ppa:deadsnakes/ppa -y
    sudo apt update
    sudo apt install python3.10 -y

### Create Python 3.10 virtualenv in ~/py310

    cd ~
    pip3 install virtualenv
    virtualenv --python=/usr/bin/python3.10 py310
    source py310/bin/activate
    pip list # Show packages
    pip install --upgrade pip

### Add py310 virtualenv to Jupyter

    pip install ipykernel
    python -m ipykernel install --user --name py310

Now select the py310 kernel when running this notebook.

In [2]:
import sys
print(sys.version)

3.10.11 (main, Apr  5 2023, 14:15:30) [GCC 7.5.0]
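
Beyond reading the version string by eye, the same check can be made programmatically. The following is a minimal sketch; the helper names `check_interpreter` and `in_virtualenv` are hypothetical and not part of the original notebook.

```python
import sys


def check_interpreter(min_version=(3, 10)):
    """Return True when the running interpreter is at least min_version."""
    return sys.version_info[:2] >= tuple(min_version)


def in_virtualenv():
    """Return True when the interpreter appears to run inside a virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.prefix differ from sys.base_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


print("Python >= 3.10:", check_interpreter())
print("Inside a virtualenv:", in_virtualenv())
print("Interpreter path:", sys.executable)
```

If the py310 kernel is selected, the interpreter path should point into ~/py310/bin.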


# Retrieval-Augmented Generation with Pinecone & ChatGPT
More information: https://www.pinecone.io/learn/openai-gen-qa/

GitHub repository: https://github.com/pinecone-io/examples/tree/master/generation/generative-qa/openai/gen-qa-openai

## Install dependencies
   
    pip install -qU openai pinecone-client datasets tqdm

## Set OPENAI_API_KEY before running Jupyter

Create an API key at https://platform.openai.com/account/api-keys and export it in the shell that launches Jupyter:

    export OPENAI_API_KEY="secret key from site"
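
If the variable is not exported, a placeholder key only fails later, at the first API call. A minimal sketch of a fail-fast alternative follows; `load_api_key` is a hypothetical helper, not part of the openai package or the original notebook.

```python
import os


def load_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise if it is unset."""
    # env defaults to the real process environment; a dict can be passed for testing
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before starting Jupyter."
        )
    return key
```

In the notebook, `openai.api_key = load_api_key()` could stand in for the `os.getenv(...)` fallback, so a missing key is reported immediately.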

In [1]:
import os
import openai

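# read the API key from the environment (create one via the top-right dropdown on the OpenAI website);
# the fallback string is only a placeholder and will not authenticate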
openai.api_key = os.getenv("OPENAI_API_KEY") or "OPENAI_API_KEY"

openai.Engine.list()  # check we have authenticated

<OpenAIObject list at 0x7fc72e5a5670> JSON: {
  "data": [
    {
      "created": null,
      "id": "whisper-1",
      "object": "engine",
      "owner": "openai-internal",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "babbage",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "davinci",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "text-davinci-edit-001",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "text-davinci-003",
      "object": "engine",
      "owner": "openai-internal",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "babbage-code-search-code",
      "object": "engine",
      "owner": "ope

## Test query with ChatGPT 3.5
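Now run a query against the gpt-3.5-turbo model. With ChatCompletion.create ( https://platform.openai.com/docs/api-reference/chat/create ) you pass the conversation as a list of messages. The API itself is stateless, so the chatbot's "memory" comes from resending the earlier messages on each call. (The limit is 4096 tokens, shared by the messages and the reply.)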
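
Because the history is resent on every call, it eventually has to be trimmed to stay under the token limit. The sketch below shows one way to do that. It assumes roughly 4 characters per token, which is only a crude estimate (a real tokenizer would be more accurate), and the helpers `estimate_tokens` and `trim_history` are hypothetical names.

```python
def estimate_tokens(text):
    """Very rough token estimate; assumes about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_history(messages, max_tokens=4096, reserve=500):
    """Drop the oldest non-system messages until the estimate fits the budget.

    reserve is the number of tokens kept free for the model's reply.
    """
    budget = max_tokens - reserve
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(estimate_tokens(m["content"]) for m in msgs)

    # Always keep at least the most recent message
    while len(rest) > 1 and total(system + rest) > budget:
        rest.pop(0)
    return system + rest


history = [
    {"role": "system", "content": "You are a useful agent."},
    {"role": "user", "content": "What is zero trust?"},
]
print(trim_history(history))
```

The trimmed list can then be passed as the `messages` argument of `openai.ChatCompletion.create`.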

In [11]:
query = (
    "You are a useful agent. If you don't know the answer, please say so. "
    "Answer the following question: "
    "What are the main components of the NIST Zero Trust Architecture?"
)

# now query GPT 3.5 WITHOUT context
res = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[
        {"role": "user", "content": query}
    ]
)

print(res['choices'][0]['message']['content'])

The main components of the NIST Zero Trust Architecture are as follows: 

1. Identity and Access Management (IAM)
2. Network Infrastructure Security (NIS)
3. Cyber Defense Technologies (CDT)
4. Data Security and Privacy (DSP)
5. Data and Analytics Visibility and Automation (DAVA)


## Correcting an answer by providing more context

The answer is not accurate (the NIST ZTA is defined in NIST SP 800-207 https://csrc.nist.gov/publications/detail/sp/800-207/final ).

Can we create context (provide additional information) in the chat that will allow the chatbot to answer the question? The general approach is "retrieval-augmented generation": we store documents in a vector database and pull the relevant text into a context that we provide to the chatbot.
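
To make the idea concrete before introducing a real vector database, here is a toy, in-memory sketch of the retrieve-then-prompt flow. It is not the Pinecone or OpenAI embedding API: the bag-of-words `embed` function stands in for a real embedding model, the list `docs` stands in for the vector database, and the passages are placeholders invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding': lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, documents, top_k=2):
    """Prepend the retrieved passages to the question as context."""
    context = "\n---\n".join(retrieve(query, documents, top_k))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Placeholder passages invented for illustration; not quotations from any source
docs = [
    "Placeholder passage about zero trust architecture components.",
    "Placeholder passage about baking sourdough bread.",
    "Placeholder passage about bicycle maintenance.",
]
prompt = build_prompt(
    "What are the components of zero trust architecture?", docs, top_k=1
)
print(prompt)
```

The resulting prompt would be sent as the user message in place of the bare query. A production setup replaces `embed` with a learned embedding model and `retrieve` with a vector database query.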

### Initiate the vector database