# Azure OpenAI Service - Q&A with semantic answering Quickstart app

This notebook walks you through building a simple Q&A demo application in the following steps:

1. Data preparation - you will need to adapt this code so that it works with your data
1. Embedding creation - this will mostly work out of the box
1. Prompt creation - this will mostly work out of the box, but you may want to adapt the prompt to your use case
1. App creation - this will mostly work out of the box, but you can make changes if needed

First, create a file called `.env` in this folder and add the following content, replacing the placeholders with your own values:

```
OPENAI_API_KEY=xxxxxx
OPENAI_API_BASE=https://xxxxxxx.openai.azure.com/
```
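
Before running the cells below, it can help to confirm that both values were actually picked up. The following is a minimal sketch, assuming the two variable names from the `.env` file above; the helper `missing_settings` is hypothetical and not part of the quickstart:

```python
import os

# Names taken from the .env example above
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "OPENAI_API_BASE")

def missing_settings(env=None, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]

# Example with an explicit mapping instead of the real environment
example_env = {"OPENAI_API_KEY": "xxxxxx", "OPENAI_API_BASE": ""}
print(missing_settings(example_env))  # ['OPENAI_API_BASE']
```

Calling `missing_settings()` without arguments after `load_dotenv(...)` checks the real environment instead of the example mapping.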

Then, let's install all dependencies:

In [23]:
!pip install -r requirements.txt

Defaulting to user installation because normal site-packages is not writeable


In [31]:
import os
import json
import tiktoken
import openai
import numpy as np
import pickle
from dotenv import load_dotenv
from openai.embeddings_utils import cosine_similarity
from tenacity import retry, wait_random_exponential, stop_after_attempt

# Load environment variables
load_dotenv('../.env')

# Configure Azure OpenAI Service API
openai.api_type = "azure"
openai.api_version = os.getenv('OPENAI_API_VERSION', "2022-12-01")
OPENAI_API_BASE = os.getenv('OPENAI_API_BASE')
openai.api_base = OPENAI_API_BASE
openai.api_key = os.getenv("OPENAI_API_KEY")

# Define embedding model and encoding
EMBEDDING_MODEL = os.getenv('OPENAI_EMBEDDING_MODEL', 'text-embedding-ada-002')
EMBEDDING_ENCODING = os.getenv('OPENAI_EMBEDDING_ENCODING', 'cl100k_base')
# Default of 8000 is an assumption, chosen to match the value printed by this cell
EMBEDDING_CHUNK_SIZE = int(os.getenv('OPENAI_EMBEDDING_CHUNK_SIZE', '8000'))
COMPLETION_MODEL = os.getenv('OPENAI_COMPLETION_MODEL', 'text-davinci-003')

# initialize tiktoken for encoding text
encoding = tiktoken.get_encoding(EMBEDDING_ENCODING)

params_gathered = dict(
    EMBEDDING_MODEL=EMBEDDING_MODEL,
    EMBEDDING_ENCODING=EMBEDDING_ENCODING,
    EMBEDDING_CHUNK_SIZE=EMBEDDING_CHUNK_SIZE,
    COMPLETION_MODEL=COMPLETION_MODEL,
    OPENAI_OPENAI_API_API_VERSION=openai.api_version ,
    OPENAI_API_BASE=OPENAI_API_BASE
)
for key, val in params_gathered.items():
    print(key, val)

EMBEDDING_MODEL gpt-35-turbo-16k
EMBEDDING_ENCODING cl100k_base
EMBEDDING_CHUNK_SIZE 8000
COMPLETION_MODEL gpt-35-turbo
OPENAI_OPENAI_API_API_VERSION 2022-12-01
OPENAI_API_BASE https://neuronvisionws1.openai.azure.com/


## Data preparation

Adapt this code to read in your data. The output should be a Python list of dicts, each containing the keys `filename` and `content`.

In [25]:
# list all files in the data directory
data_dir = os.path.join(os.getcwd(), "../data/qna/")
files = os.listdir(data_dir)

# read content from each file and append it to documents
documents = []
for file in files:
    with open(os.path.join(data_dir, file), "r") as f:
        # read the content from the txt file
        content = f.read()
        documents.append({
            "filename": file,
            "content": content,
        })

# print some stats about the documents
print(f"Loaded {len(documents)} documents")
for doc in documents:
    num_tokens = len(encoding.encode(doc['content']))
    print(f"Filename: {doc['filename']} Content: {doc['content'][:80]}... \n---> Tokens: {num_tokens}\n")

Loaded 3 documents
Filename: overview_translator.txt Content: 
# What is Azure Cognitive Services Translator?

Translator Service is a cloud-b... 
---> Tokens: 745

Filename: overview_openai.txt Content: 
# What is Azure OpenAI?

The Azure OpenAI service provides REST API access to O... 
---> Tokens: 1912

Filename: overview_clu.txt Content: 
# What is conversational language understanding?

Conversational language under... 
---> Tokens: 1344
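
The token counts matter because embedding models accept only a limited number of tokens per request. The three sample files are well below the `EMBEDDING_CHUNK_SIZE` printed earlier, but longer documents would need to be split first. The following is a minimal sketch, assuming a hypothetical helper `chunk_tokens` that is not part of the quickstart; it works on any token list, such as the output of `encoding.encode(...)`:

```python
def chunk_tokens(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# Stand-in token ids; in the notebook these would come from encoding.encode(text)
tokens = list(range(10))
chunks = chunk_tokens(tokens, 4)
print([len(chunk) for chunk in chunks])  # [4, 4, 2]
```

Each chunk could then be turned back into text with `encoding.decode(chunk)` and embedded separately. Splitting on fixed token boundaries can cut sentences in half, so this is only a starting point.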



Let's create the function to embed a single document. First, a quick test call against the embedding deployment:

In [32]:
try_me = openai.Embedding.create(input='embed this text', engine=EMBEDDING_MODEL)


InvalidRequestError: The embeddings operation does not work with the specified model, gpt-35-turbo-16k. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.
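
This error follows from the configuration printed earlier: `EMBEDDING_MODEL` resolved to `gpt-35-turbo-16k`, a chat model, instead of the default `text-embedding-ada-002`. Setting `OPENAI_EMBEDDING_MODEL` to an embedding deployment (or removing it so that the default applies) should resolve it. A cheap sanity check can catch this before any request is sent. The sketch below is a naming heuristic only, and the helper `looks_like_embedding_model` is hypothetical; Azure deployment names are free-form, so a correctly configured deployment may not pass it:

```python
def looks_like_embedding_model(name):
    """Heuristic only: True if the model or deployment name mentions 'embedding'."""
    return "embedding" in name.lower()

for candidate in ("text-embedding-ada-002", "gpt-35-turbo-16k"):
    print(candidate, looks_like_embedding_model(candidate))
```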

In [None]:
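# Retry with random exponential backoff, giving up after 6 attempts
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def get_embedding(text):
    # remove newlines and double spaces
    text = text.replace("\n", " ").replace("  ", " ")
    # request the embedding from the deployed model and return the vector
    response = openai.Embedding.create(input=text, engine=EMBEDDING_MODEL)
    return response["data"][0]["embedding"]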

In [29]:
# Create embeddings for all docs
for doc in documents:
    doc['embedding'] = get_embedding(doc['content'])
    print(f"Created embedding for {doc['filename']}")
    
# Save documents to disk
with open("documents.pkl", "wb") as f:
    pickle.dump(documents, f)

RetryError: RetryError[<Future at 0x7feeab58cdc0 state=finished raised InvalidRequestError>]
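
Once every document carries an `embedding`, answering a question comes down to embedding the question, ranking the documents by cosine similarity, and placing the best match into a prompt. The following is a minimal sketch of that idea under stated assumptions: the notebook imports `cosine_similarity` from `openai.embeddings_utils`, but it is reimplemented here in plain Python so the example runs on its own, and `rank_documents`, `build_prompt`, the prompt wording, and the toy 2-dimensional vectors are all hypothetical:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_embedding, documents):
    """Return documents sorted from most to least similar to the query."""
    return sorted(
        documents,
        key=lambda doc: cosine_similarity(query_embedding, doc["embedding"]),
        reverse=True,
    )

def build_prompt(question, context):
    """Hypothetical prompt template: answer the question from the context only."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Toy 2-dimensional embeddings stand in for real embedding vectors
documents = [
    {"filename": "a.txt", "content": "About translation.", "embedding": [0.0, 1.0]},
    {"filename": "b.txt", "content": "About OpenAI.", "embedding": [0.9, 0.1]},
    {"filename": "c.txt", "content": "About CLU.", "embedding": [0.5, 0.5]},
]
query_embedding = [1.0, 0.0]

ranked = rank_documents(query_embedding, documents)
print([doc["filename"] for doc in ranked])  # ['b.txt', 'c.txt', 'a.txt']
print(build_prompt("What is Azure OpenAI?", ranked[0]["content"]))
```

In the notebook, `query_embedding` would come from `get_embedding(question)`, and the resulting prompt would be sent to the deployment named by `COMPLETION_MODEL`.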

Lastly, run the app:

```
streamlit run app.py
```