# Power your products with ChatGPT and your own data using Notion Reader API

This walkthrough shows how to build starter Q&A and Chatbot applications using the ChatGPT API and your own data.

It is laid out in these sections:
- **Setup:** 
    - Initialize variables and source the data
- **Lay the foundations:**
    - Load Notion documents into LlamaIndex for vectorizing
    - Save documents in Redis
- **Make it a product:**
    - Add a retrieval step where users provide queries and we return the most relevant entries
    - Summarise search results with GPT-3
    - Test out this basic Q&A app in Streamlit
- **Build your moat:**
    - Create an Assistant class to manage context and interact with our bot
    - Use the Chatbot to answer questions using semantic search context
    - Test out this basic Chatbot app in Streamlit


In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import sys
!{sys.executable} -m pip install -r ./../requirements.txt

## Setup

First we'll set up our libraries and environment variables.
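
The cells below import names such as `COMPLETIONS_MODEL`, `CHAT_MODEL`, `INDEX_NAME`, `PREFIX`, `OPENAI_API_KEY` and `NOTION_API_KEY` from a local `config` module that is not shown in this notebook. As a rough sketch only, such a module could read its values from environment variables. The `load_settings` helper and every default value here (including `text-davinci-003`, `notion-index` and `doc`) are assumptions for illustration, not the project's actual configuration.

```python
import os


def load_settings(env=os.environ):
    """Read settings from a mapping of environment variables.

    Hypothetical stand-in for the local config module; all defaults are
    placeholder assumptions.
    """
    return {
        "OPENAI_API_KEY": env.get("OPENAI_API_KEY", ""),
        "NOTION_API_KEY": env.get("NOTION_API_KEY", ""),
        "COMPLETIONS_MODEL": env.get("COMPLETIONS_MODEL", "text-davinci-003"),
        "CHAT_MODEL": env.get("CHAT_MODEL", "gpt-3.5-turbo"),
        "INDEX_NAME": env.get("INDEX_NAME", "notion-index"),
        "PREFIX": env.get("PREFIX", "doc"),
    }


# Pass an explicit mapping here so the example does not depend on your shell
settings = load_settings({"CHAT_MODEL": "gpt-3.5-turbo"})

# Flag secrets that were not provided instead of failing later at request time
missing = [key for key in ("OPENAI_API_KEY", "NOTION_API_KEY") if not settings[key]]
print("Missing secrets:", missing)
```

Reading secrets from the environment keeps API keys out of the notebook itself, which matters if the notebook is shared or committed.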

In [None]:
import openai
import pandas as pd

from database import get_redis_connection

# Import our default model names from the local config module
from config import COMPLETIONS_MODEL, CHAT_MODEL

# Ignore unclosed SSL socket warnings - optional in case you get these errors
import warnings

warnings.filterwarnings(action="ignore", message="unclosed", category=ResourceWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning) 

In [None]:
pd.set_option('display.max_colwidth', 0)

## Laying the foundations

### Storage

We're going to use Redis as our database for both document contents and the vector embeddings. You will need the full Redis Stack to use RediSearch, the module that enables semantic search. More detail is in the [docs for Redis Stack](https://redis.io/docs/stack/get-started/install/docker/).

To set this up locally, you will need to install Docker and then run the following command: ```docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest```.

The code used here draws heavily on [this repo](https://github.com/RedisAI/vecsim-demo).

After setting up the Docker instance of Redis Stack, you can follow the below instructions to initiate a Redis connection and create a Hierarchical Navigable Small World (HNSW) index for semantic search.
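
To make "semantic search" concrete, here is a minimal sketch of what a vector index computes: it ranks stored embeddings by their similarity to a query embedding. This version is an exact brute-force scan in plain Python, whereas an HNSW index approximates the same ranking with a graph structure so it scales to large collections. The three-dimensional vectors and the document names are toy values invented for illustration; real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" keyed by a made-up document id
toy_vectors = {
    "chicago_office": [0.9, 0.1, 0.0],
    "denver_office": [0.1, 0.9, 0.0],
    "strategy_2023": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.2, 0.0], toy_vectors, k=2)
print([doc_id for doc_id, _ in results])
```

In this notebook the equivalent work is delegated to Redis through LlamaIndex, so nothing like this needs to be written by hand.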

In [None]:
# Check that Redis is set up and running
from database import get_redis_connection

redis_client = get_redis_connection()

redis_client.ping()

In [None]:
# Optional step to drop the index if it already exists
from config import INDEX_NAME

# redis_client.ft(INDEX_NAME).dropindex()

### Ingestion

We'll load our Notion pages (or a local directory of exported files) into documents.

In [None]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [None]:
from llama_index import NotionPageReader, SimpleDirectoryReader
from config import NOTION_API_KEY
import os

# integration_token = os.getenv("NOTION_INTEGRATION_TOKEN")
# integration_token = NOTION_API_KEY

# page_ids = ["76d816d82434423d8fbec83a3979d245", #AI Knowledge Base
#             "40b801917ab04deb8f5759d9d3e2da59", #The Chicago Office
#             "64c61657f90b48f786e8b55098f26e3a", #Denver Lightning Talks
#             "642768dbfd6041e699a24b2863fab5b2", #Denver IRL Agenda
#             "9f45258c6cab4592badeec6f1060e5df", #Pairing
#             "c4e0d82f59d444c486066205919e2088", #Why we do what we do
#             "5b25eda9934745fa8e9dd4bbf131eabc", #Chicago IRL Logistics
#             "4747650f9fd74f9b9d43e817963d6759", #Denver IRL Logistics
#             "53f3ff1456ea4c298e70281902080d9f", #Pairing Interview
#             "f621f275876945dcab7a298d21fb95c4", #Denver Family Friend Day
#             "2bcdac9fedd14404ba85a47323f5a1ad", #Tech Lead
#             "87d80ebf66c64422a21570b5c2d0b0bc", #Daily Company Stand up
#             "a255954246c44ab1a6178f54f1095b41", #Pair Retros
#             "0050c37a71464a73ab77e01a5ab5a76d", #Project Rotations
#             "6977e3c091384b1abe1f1692fce9995f", #2023 Strategy
#             "ff2478fce704496d85462d8c8656f074", #The Denver Office
#             "d30c87af7d60458aaf7fb412d422a691", #2023 Chicago IRL
#             "690037a4c2a64688b43303bc4d2a65d0", #Product Design Lunch
#             "5ade11d0a39342129580364d35106eee", #Ski Weekend
#             # "50901371968b4dbb9423186c7cd392dc", #Software Development - Issues?
#             "dc8864d18f7c4c1595c2e278d36743c4", #So you want to be a TPIer?
#             # "447552ca7fec4e2fa568cb918d166c42", #Anchors - Issues?
#             "94f952deb00740929e9b526f93609c46", #Denver activities and meals
#             "8ea1eccd201842c28f9cd709b95754a1"] #Chicago IRL Agenda
#
# documents = NotionPageReader(integration_token=integration_token).load_data(page_ids=page_ids)

documents = SimpleDirectoryReader(input_dir="/Users/skainec/Documents/Workspace/aiknowledgehubdemo/data").load_data()
# index = VectorStoreIndex.from_documents(data)

In [None]:
from transformers import sanitize_text

for document in documents:
    document.text = sanitize_text(document.text)
    print(document.text)

In [None]:
documents[0].text
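
The `sanitize_text` helper used above is imported from a local `transformers` module whose source is not shown here. As a hedged illustration of the kind of cleanup such a step might perform before embedding, the sketch below normalizes Unicode, drops control characters and collapses excess whitespace. The function name `sanitize_text_sketch` and each rule in it are assumptions, not the project's actual implementation.

```python
import re
import unicodedata


def sanitize_text_sketch(text):
    """Hypothetical cleanup pass for text pulled from a document reader."""
    # Normalize compatibility forms, e.g. non-breaking spaces become plain spaces
    text = unicodedata.normalize("NFKC", text)
    # Drop control characters other than newline and tab
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Remove a single space on either side of each newline
    text = re.sub(r" ?\n ?", "\n", text)
    # Keep at most one blank line between paragraphs
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


sample = "Pairing\u00a0Interview  \n\n\n\nStep   one\x00"
cleaned = sanitize_text_sketch(sample)
print(repr(cleaned))
```

Cleaning text before indexing keeps stray markup and whitespace from being embedded alongside the content you actually want to search.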

In [None]:
from llama_index import GPTVectorStoreIndex
from llama_index.vector_stores import RedisVectorStore
from config import OPENAI_API_KEY, INDEX_NAME, PREFIX
from llama_index.storage.storage_context import StorageContext
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

vector_store = RedisVectorStore(
    index_name=INDEX_NAME,
    index_prefix=PREFIX,
    redis_url="redis://localhost:6379",
    overwrite=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)

vector_store.persist(persist_path="")

In [None]:
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine()
response = query_engine.query("What is the address of the Chicago office?")

In [None]:
response

## Build your moat

The Q&A was useful, but the interaction it supports is limited: if the user asks a sub-optimal question, the system does nothing to prompt them for more information or to steer the conversation down the right path.

For the next step we'll make a Chatbot using the Chat Completions endpoint, which will:
- Be given instructions on how it should act and what the goals of its users are
- Be supplied some required information that it needs to collect
- Go back and forth with the user until it has populated that information
- Say a trigger word that will kick off semantic search and summarisation of the response

For more details on the Chat Completions endpoint and how to interact with it, please check out the docs [here](https://platform.openai.com/docs/guides/chat).

### Framework

This section outlines a basic framework for working with the API and storing context of previous conversation "turns". Once this is established, we'll extend it to use our retrieval endpoint.
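
Before looking at the real classes, it may help to see the bookkeeping on its own. The sketch below keeps a list of message dicts, appends each user turn, sends the whole list to the model, and appends the reply. The `fake_model` function is a stand-in invented for this example so the cell runs without an API key. It is not the real endpoint, and `ask` is likewise a hypothetical helper.

```python
def fake_model(messages):
    """Stand-in for the chat endpoint: reports how many messages it was sent."""
    return {"role": "assistant", "content": f"I received {len(messages)} message(s)."}


# The history starts with a system message that sets the assistant's behaviour
history = [{"role": "system", "content": "You are a helpful assistant."}]


def ask(user_text, model=fake_model):
    """Record the user turn, send the full history, and record the reply."""
    history.append({"role": "user", "content": user_text})
    reply = model(history)
    history.append(reply)
    return reply["content"]


first = ask("What can you do to help me")
second = ask("Tell me more about option 2")

print(first)
print(second)
print([message["role"] for message in history])
```

Because the full history is sent on every call, the model can resolve a follow-up such as "option 2" against its own earlier reply.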

In [None]:
# Reconnect to the index already stored in Redis so we can query it without re-ingesting the documents
from llama_index import StorageContext, load_index_from_storage

question = 'How can you help me'

new_vector_store = RedisVectorStore(
    index_name=INDEX_NAME,
    index_prefix=PREFIX,
    redis_url="redis://localhost:6379"
)

new_storage_context = StorageContext.from_defaults(vector_store=new_vector_store)
new_index = GPTVectorStoreIndex([], storage_context=new_storage_context)

query_engine = new_index.as_query_engine()

In [None]:
response = query_engine.query("What is the 2023 strategy?")
response.response

In [None]:
response

In [None]:
from termcolor import colored

# A basic class to create a message as a dict for chat
class Message:
    
    
    def __init__(self,role,content):
        
        self.role = role
        self.content = content
        
    def message(self):
        
        return {"role": self.role,"content": self.content}
        
# Our assistant class we'll use to converse with the bot
class Assistant:
    
    def __init__(self):
        self.conversation_history = []

    def _get_assistant_response(self, prompt):
        
        try:
            completion = openai.ChatCompletion.create(
              model="gpt-3.5-turbo",
              messages=prompt
            )
            
            response_message = Message(completion['choices'][0]['message']['role'],completion['choices'][0]['message']['content'])
            return response_message.message()
            
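        except Exception as e:
            
            # Return a message dict so callers can always read ['role'] and ['content']
            return Message('assistant', f'Request failed with exception {e}').message()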

    def ask_assistant(self, next_user_prompt, colorize_assistant_replies=True):
        self.conversation_history.extend(next_user_prompt)
        assistant_response = self._get_assistant_response(self.conversation_history)
        self.conversation_history.append(assistant_response)
        return assistant_response
            
        
    def pretty_print_conversation_history(self, colorize_assistant_replies=True):
        for entry in self.conversation_history:
            if entry['role'] == 'system':
                pass
            else:
                prefix = entry['role']
                content = entry['content']
                output = colored(prefix +':\n' + content, 'green') if colorize_assistant_replies and entry['role'] == 'assistant' else prefix +':\n' + content
                print(output)

In [None]:
# Initiate our Assistant class
conversation = Assistant()

# Create a list to hold our messages and insert both a system message to guide behaviour and our first user question
messages = []
system_message = Message('system','You are a helpful business assistant who has innovative ideas')
user_message = Message('user','What can you do to help me')
messages.append(system_message.message())
messages.append(user_message.message())
messages

In [None]:
# Get back a response from the Chatbot to our question
response_message = conversation.ask_assistant(messages)
print(response_message['content'])

In [None]:
next_question = 'Tell me more about option 2'

# Initiate a fresh messages list and insert our next question
messages = []
user_message = Message('user',next_question)
messages.append(user_message.message())
response_message = conversation.ask_assistant(messages)
print(response_message['content'])

In [None]:
# Print out a log of our conversation so far

conversation.pretty_print_conversation_history()

### Knowledge retrieval

Now we'll extend the class to call a downstream service when the Chatbot says a trigger phrase.

The main changes are:
- The system message is more comprehensive, giving criteria for the Chatbot to advance the conversation
- Adding an explicit trigger phrase for it to use when it has the info it needs
- Extending the class with a function ```_get_search_results``` which sources Redis results
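
The control flow behind these changes can be sketched without any external services. In the example below, `fake_chat` and `fake_search` are stand-ins invented for illustration: the first says the trigger phrase until it sees retrieved context, and the second returns a placeholder string instead of querying Redis. The sketch also searches on the raw user text, whereas the class below first asks a completions model to extract the question and role.

```python
TRIGGER = "let me check on that for you"
CONTEXT_PREFIX = "Answer the user's question using this content:"


def fake_chat(messages):
    """Stand-in for the chat endpoint (an assumption, not the real API)."""
    has_context = any(
        m["role"] == "system" and m["content"].startswith(CONTEXT_PREFIX)
        for m in messages
    )
    if has_context:
        return {"role": "assistant", "content": "Here is what I found."}
    return {"role": "assistant", "content": "Let me check on that for you..."}


def fake_search(query):
    """Stand-in for the Redis lookup; returns a placeholder string."""
    return f"(search result for: {query})"


def ask_with_retrieval(history, user_text, chat=fake_chat, search=fake_search):
    history.append({"role": "user", "content": user_text})
    reply = chat(history)
    # Only call the search service when the model says the trigger phrase
    if TRIGGER in reply["content"].lower():
        context = search(user_text)
        # insert(-1, ...) places the context just before the latest user message
        history.insert(-1, {"role": "system", "content": f"{CONTEXT_PREFIX} {context}"})
        reply = chat(history)
    history.append(reply)
    return reply


history = []
reply = ask_with_retrieval(history, "I am a designer.")
print(reply["content"])
print([message["role"] for message in history])
```

Note that the first reply containing the trigger phrase is discarded, so the user only sees the answer generated after the search results were added.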

In [None]:
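from config import OPENAI_API_KEY
# get_redis_results is not defined anywhere in this notebook; we assume it lives in the
# local database module alongside get_redis_connection
from database import get_redis_results
openai.api_key = OPENAI_API_KEY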

# Updated system prompt requiring the question and the user's role to be captured
system_prompt = '''
You are a helpful virtual assistant for the employees of Focused Labs. Focused Labs is a boutique Software Consulting firm that specializes in enterprise application development and digital transformation. Employees will ask you questions about the inner workings of the company. Questions could range in areas such as process, procedure, policy, and culture. Employees have different roles. The roles are either Developer, Designer, or Product Manager. The question is about how the company of Focused Labs operates. For each question, you need to capture their role.
If they haven't provided their role, ask them for it.
Once you have their role, say "let me check on that for you...".

Example 1:

User: I'd like to know how many IRLs Focused Labs has hosted

Assistant: Certainly, what is your role at the Company?

User: I am a designer.

Assistant: let me check on that for you...
'''

# New Assistant class to add a vector database call to its responses
class RetrievalAssistant:
    
    def __init__(self):
        self.conversation_history = []  

    def _get_assistant_response(self, prompt):
        
        try:
            completion = openai.ChatCompletion.create(
              model=CHAT_MODEL,
              messages=prompt,
              temperature=0.1
            )
            
            response_message = Message(completion['choices'][0]['message']['role'],completion['choices'][0]['message']['content'])
            return response_message.message()
            
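        except Exception as e:
            
            # Return a message dict so callers can always read ['role'] and ['content']
            return Message('assistant', f'Request failed with exception {e}').message()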
    
    # The function to retrieve Redis search results
    def _get_search_results(self,prompt):
        latest_question = prompt
        search_content = get_redis_results(redis_client,latest_question,INDEX_NAME)['result'][0]
        return search_content
        

    def ask_assistant(self, next_user_prompt):
        self.conversation_history.extend(next_user_prompt)
        assistant_response = self._get_assistant_response(self.conversation_history)
        
        # Answer normally unless the reply contains the trigger phrase "let me check on that for you..."
        if 'let me check on that for you...' in assistant_response['content'].lower():
            question_extract = openai.Completion.create(model=COMPLETIONS_MODEL,prompt=f"Extract the employee's latest question and their role from this conversation: {self.conversation_history}. Extract it as a sentence stating their question and their role")
            search_result = self._get_search_results(question_extract['choices'][0]['text'])
            
            # We insert an extra system prompt here to give fresh context to the Chatbot on how to use the Redis results
            # In this instance we add it to the conversation history, but in production it may be better to hide
            self.conversation_history.insert(-1,{"role": 'system',"content": f"Answer the user's question using this content: {search_result}. If you cannot answer the question, say 'Sorry, I don't know the answer to this one. You should call Austin Vance at (970) 306-8100' and he will be happy to provide an answer. He is easiest to reach between the hours of 2am and 4am MDT"})
            #[self.conversation_history.append(x) for x in next_user_prompt]
            
            assistant_response = self._get_assistant_response(self.conversation_history)
            print(next_user_prompt)
            print(assistant_response)
            self.conversation_history.append(assistant_response)
            return assistant_response
        else:
            self.conversation_history.append(assistant_response)
            return assistant_response
            
        
    def pretty_print_conversation_history(self, colorize_assistant_replies=True):
        for entry in self.conversation_history:
            if entry['role'] == 'system':
                pass
            else:
                prefix = entry['role']
                content = entry['content']
                output = colored(prefix +':\n' + content, 'green') if colorize_assistant_replies and entry['role'] == 'assistant' else prefix +':\n' + content
                #prefix = entry['role']
                print(output)

In [None]:
conversation = RetrievalAssistant()
messages = []
system_message = Message('system',system_prompt)
user_message = Message('user','What is a Focused Labs IRL?')
messages.append(system_message.message())
messages.append(user_message.message())
response_message = conversation.ask_assistant(messages)
response_message

In [None]:
messages = []
user_message = Message('user','I am a designer!')
messages.append(user_message.message())
response_message = conversation.ask_assistant(messages)
#response_message

In [None]:
conversation.pretty_print_conversation_history()

### Chatbot

Now we'll put all this into action with a real (basic) Chatbot.

In the directory containing this app, execute ```streamlit run chat.py```. This will open up a Streamlit app in your browser where you can ask questions of your embedded data. 

__Example Questions__:
- What is the address of the Chicago office?
- What is the 2023 strategy?
- What is a Focused Labs IRL?

### Consolidation

Over the course of this notebook you have:
- Laid the foundations of your product by embedding your knowledge base
- Created a Q&A application to serve basic use cases
- Extended this to be an interactive Chatbot

These are the foundational building blocks of any Q&A or Chat application using the OpenAI APIs. They are your starting point, and we look forward to seeing what you build with them!