# Introduction

Prerequisites for using this notebook:

1. Pandas
2. [OpenAI](https://pypi.org/project/openai/)
3. [Tiktoken](https://pypi.org/project/tiktoken/)
4. [Dotenv](https://pypi.org/project/python-dotenv/)

Features

1. Load the [Functions](#chatgpt-functions)
2. [One Shot Question](#one-shot-question)
3. [List All Conversation History](#get-entire-chat-history) to retrieve a conversation ID
4. [Search Conversation History](#search-history)
5. [Continue Conversation by ID](#continue-chat-by-id)
6. [Summarize Conversation by ID](#summarize-conversation-by-id)
7. [Create Prompt Archive](#create-steering-sources-csv)
8. Get [Random Prompt](#random-steering-prompt)
9. [Check Token Size](#prompt-size-check) for a prompt string

TODO

- [x] Saved Persona List
- [x] Random Prompt
- [ ] New System Prompt
- [ ] Generate detailed description for DALL-E
- [x] Check Token size for History
- [ ] Update Continue Conversation to populate token size
- [ ] Warning when token size approaches limit
- [ ] Add Summary to Chat History ID
- [ ] Test Prompting for Sentiment

# ChatGPT

## Key Handling

In [8]:
import openai
import os
from dotenv import load_dotenv

load_dotenv()
openai.api_key = os.getenv('CHATGPTAPI')
openai.organization = os.getenv('OPENAPI_ORG_ID')

## Create Empty Log Data Frame

In [77]:
import pandas as pd

columns = ['conversation_id', 'message', 'response','raw_message','raw_response','conversation_message','date']
chat_df = pd.DataFrame(columns=columns)
chat_df.to_pickle('chat_log.pkl')

## ChatGPT Functions

Prerequisites:
- OpenAI library
  - OpenAI ChatGPT API Key
  - OpenAI Organization ID
- dotenv library
- Pandas
- A pandas data frame to use as the log

The pandas data frame log should have the following columns:

`'conversation_id', 'message', 'response','raw_message','raw_response','conversation_message','date'`
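
As a rough illustration of that schema, the sketch below builds a single log entry as a plain dictionary, without pandas or an API call. The helper name `make_log_entry` and the stubbed response are assumptions for illustration only; the field layout mirrors the columns listed above, and the default ID is a timestamp string.

```python
from datetime import datetime
import time

LOG_COLUMNS = ['conversation_id', 'message', 'response', 'raw_message',
               'raw_response', 'conversation_message', 'date']

def make_log_entry(messages, response, conv_id=None):
    """Hypothetical helper: build one log row as a dict keyed by LOG_COLUMNS."""
    if conv_id is None:
        # Default ID is a timestamp string such as '20230309082116'
        conv_id = datetime.now().strftime("%Y%m%d%H%M%S")
    reply = response['choices'][0]['message']['content']
    return {
        'conversation_id': conv_id,
        'message': messages[-1]['content'],   # the newest prompt is the last item
        'response': reply,
        'raw_message': messages,
        'raw_response': response,
        'conversation_message': messages + [{"role": "assistant", "content": reply}],
        'date': time.time(),
    }

# Stubbed response shaped like a ChatCompletion result (not real API output)
stub_response = {'choices': [{'message': {'role': 'assistant', 'content': 'Hello!'}}]}
entry = make_log_entry([{"role": "user", "content": "Hi"}], stub_response)
print(sorted(entry) == sorted(LOG_COLUMNS))
```

Each dictionary of this shape corresponds to one row of the log data frame.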

In [4]:
import openai
import os
from dotenv import load_dotenv
from datetime import datetime
import time
import pandas as pd
import tiktoken

load_dotenv()
openai.api_key = os.getenv('CHATGPTAPI')
openai.organization = os.getenv('OPENAPI_ORG_ID')
MODEL = "gpt-3.5-turbo"

# Holds the foundational instructions for the AI. The role should be "system"
steering_prompt = [
        {"role": "system", "content": "You are a helpful assistant that provides direct clear responses with short explanations."}
    ]
STEERING_PROMPT = steering_prompt

def get_response(message_dict: dict) -> dict:
    """
    Function to call the ChatCompletion (ChatGPT) endpoint from OpenAI. Accepts a dict of messages.
    
    Stream is always set to false so we wait for a complete response instead of paging results.
    """
    response = openai.ChatCompletion.create(
        model=MODEL,
        messages=message_dict,
        temperature=0,
        stream=False
    )
    return response.to_dict()

# Get the most recent row for a chat_id value
def get_last_conversation(chat_id:str, chat_history:pd.DataFrame) -> list:
    """
    Function to test if a data frame log contains a specific chat ID. If it does it returns the most recent conversation list
    """
    verified_records = chat_history[chat_history['conversation_id'] == chat_id]
    if verified_records.empty:
        return None
    else:
        return verified_records.sort_values(by='date').iloc[-1]['conversation_message']

# Append the new query to the chat instruction list
def format_chat_message(message_history: list, new_prompt: str) -> list:
    """
    Return a copy of the message history with the new prompt appended. The new prompt always uses the `user` role.

    A new list is returned so the caller's list (for example the shared steering prompt) is not modified between cells.
    """
    return message_history + [{"role": "user", "content": new_prompt}]

    
def append_chat_df(log_dataframe:pd.DataFrame,
                   messages:list,
                   response:dict,
                   conv_id:str =None) -> pd.DataFrame:
    """
    Appends chat messages to a dataframe for the purpose of logging chat sessions.
    
    Parameters:
    log_dataframe: Pandas data frame that holds the log of messages
    messages: List that contains all prompts, including steering prompt
    response: Dictionary holding the OpenAI response
    conv_id: Conversation ID used to aggregate chat into a thread. The default value is a timestamp string.
    
    Returns:
    Data frame containing all previous log entries plus new entry from `messages` and `response`
    """
    if conv_id is None:
        conv_id = datetime.now().strftime("%Y%m%d%H%M%S")
    # The newest prompt will be the last item in the list.
    
    new_prompt = messages[-1]['content']
    # The response will be the message.content value from choices
    new_response = response['choices'][0]['message']['content']
    # make a copy of the messages so we can append to it
    conversation = messages.copy()
    conversation.append({"role": "assistant", "content": new_response})
    # Create a new data frame to hold just the new log entry. Then concatenate it with the passed data frame.
    new_row = pd.DataFrame({'conversation_id':[conv_id], 'message': [new_prompt], 'response': [new_response],
                            'conversation_message': [conversation],
                            'raw_message':[messages], 'raw_response':[response], 
                            'date':[time.time()]})
    log_dataframe = pd.concat([log_dataframe, new_row], ignore_index=True)
    return log_dataframe

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

def num_tokens_from_messages(messages:list, model="gpt-3.5-turbo-0301") ->int:
    """Returns the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    if model == "gpt-3.5-turbo-0301":  # note: future models may deviate from this
        num_tokens = 0
        for message in messages:
            num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
            for key, value in message.items():
                num_tokens += len(encoding.encode(value))
                if key == "name":  # if there's a name, the role is omitted
                    num_tokens += -1  # role is always required and always 1 token
        num_tokens += 2  # every reply is primed with <im_start>assistant
        return num_tokens
    else:
        raise NotImplementedError(f"""num_tokens_from_messages() is not presently implemented for model {model}.
See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.""")

## -> One Shot Question

Ask a standalone question and get a succinct response. Markdown is rendered in the response.

In [None]:
from IPython.display import Markdown

df_chat = pd.read_pickle('chat_log.pkl')
new_prompt = r'''Describe how I can search ChEMBL using RDKit to find molecules that are schedule 1 substances

'''

chat_prompt = format_chat_message(steering_prompt, new_prompt)

# Print out the new question that we just appended
print(chat_prompt[-1]['content'])

# Call the API with the new chat prompt
new_chat = get_response(chat_prompt)

# Append to the log; no ID is passed, so a new conversation ID is generated
df_chat = append_chat_df(df_chat,chat_prompt,new_chat)

chat_id = df_chat.sort_values(by='date').iloc[-1]['conversation_id']
print(f'Chat ID = {chat_id}')
# Print the formatted response
display(Markdown(new_chat['choices'][0]['message']['content']))
#print(new_chat['choices'][0]['message']['content'])

# Persist our work
df_chat.to_pickle('chat_log.pkl')
df_chat.to_excel('df_chat.xlsx')



## New System Prompt

In [None]:
from IPython.display import Markdown

df_chat = pd.read_pickle('chat_log.pkl')
new_prompt = r'''How are controlled substances harmonized when transporting between countries?
'''
override_steering_prompt = r'''You are a PhD organic chemist. You work in the pharmaceutical industry as a research scientist. You have expertise in cheminformatics.
'''

# Build a separate message list from the override so the default steering prompt is left untouched
custom_steering_prompt = [{"role": "system", "content": override_steering_prompt.strip()}]

chat_prompt = format_chat_message(custom_steering_prompt, new_prompt)

# Print out the new question that we just appended
print(chat_prompt[-1]['content'])

# Call the API with the new chat prompt
new_chat = get_response(chat_prompt)

# Append to the log; no ID is passed, so a new conversation ID is generated
df_chat = append_chat_df(df_chat,chat_prompt,new_chat)

chat_id = df_chat.sort_values(by='date').iloc[-1]['conversation_id']
print(f'Chat ID = {chat_id}')
# Print the formatted response
display(Markdown(new_chat['choices'][0]['message']['content']))
#print(new_chat['choices'][0]['message']['content'])

# Persist our work
df_chat.to_pickle('chat_log.pkl')
df_chat.to_excel('df_chat.xlsx')



## Continue Conversation

Have a multi-question chat with ChatGPT using a `conversation_id` from the chat log.

### Test Case

Use a previous conversation as a test case for multi-response chat. `historic_chat` gives ChatGPT the context it needs to provide consistent follow-up responses. It also lets a new prompt refer to earlier responses without quoting them.
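
The way context is carried forward can be sketched without calling the API: each turn is a dict with a `role` and `content`, and a follow-up question is added as a new `user` turn. The helper `extend_history` below is a hypothetical name, and returning a new list so the stored history stays unchanged is a design choice of this sketch.

```python
def extend_history(history, new_prompt):
    """Return a new message list with new_prompt added as a user turn."""
    return history + [{"role": "user", "content": new_prompt}]

# Shortened stand-in for a stored conversation
history = [
    {"role": "user", "content": "Suggest places near Sorrento town center"},
    {"role": "assistant", "content": "Piazza Tasso and Marina Grande are close by."},
]
messages = extend_history(history, "Please provide walking directions to the Marina")

# The full list, old turns plus the new question, is what gets sent to the model
for turn in messages:
    print(f'{turn["role"]}: {turn["content"]}')
```

Because the whole list is sent on every call, the model can resolve "the Marina" from the earlier assistant turn.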

In [159]:

historic_chat = [{"role": "user", "content": r'I want you to act as a travel guide. I will write you my location and you will suggest a place to visit near my location. In some cases, I will also give you the type of places I will visit. You will also suggest me places of similar type that are close to my first location. My first suggestion request is "I am in Sorrento Italy and I want to visit pretty places to relax."'},
 {"role": "assistant", "content": r'Yes, there are several wineries near Positano that you can visit to taste local wines and learn about the winemaking process. Here are a few options:\n\nCantine Marisa Cuomo: This winery is located in the town of Furore, just a short drive from Positano. It\'s known for its unique wines, which are made using grapes grown on steep terraces overlooking the sea. The winery also offers tours and tastings, as well as a restaurant with stunning views of the Amalfi Coast.\nMontevetrano Winery: This winery is located in the town of San Cipriano Picentino, about an hour\'s drive from Positano. It\'s known for its high-quality wines, which are made using a blend of traditional Italian and French winemaking techniques. The winery also offers tours and tastings, as well as a beautiful outdoor terrace where you can enjoy a glass of wine while taking in the views of the vineyards.\nLe Vigne di Raito: This winery is located in the town of Raito, just a short drive from Positano. It\'s known for its organic wines, which are made using traditional winemaking techniques and local grape varieties. The winery also offers tours and tastings, as well as a beautiful terrace where you can relax and enjoy the views of the Mediterranean Sea.\nAll of these wineries offer a unique and memorable experience for wine lovers, and they\'re all located within a reasonable driving distance from Positano.\n'},
 {"role": "user", "content": r'Suggest places within walking distance of Sorrento town center'},
 {"role": "assistant", "content": r'Certainly! Here are some suggestions for places within walking distance of Sorrento town center:\n\nPiazza Tasso: This is the main square in Sorrento and is located right in the town center. Here you can find plenty of cafes, restaurants, and shops, as well as beautiful architecture and fountains. It\'s a great place to relax and people-watch.\nMarina Grande: This small fishing village is located just a short walk from Sorrento town center and is known for its colorful boats, charming restaurants, and beautiful sea views. Take a stroll along the harbor or enjoy a seafood meal at one of the local restaurants.\nVilla Comunale: This public park is located on a cliff overlooking the sea, just a short walk from Sorrento town center. Here you can enjoy stunning views of the Gulf of Naples, as well as beautiful gardens and a small outdoor theater.\nMuseo Correale di Terranova: This museum is located in a beautiful villa just a short walk from Sorrento town center and features a collection of art and decorative objects from the 17th to the 19th centuries. It\'s a great place to explore if you\'re interested in history and art.\nSorrento Cathedral: This beautiful cathedral is located in the heart of Sorrento town center and is known for its stunning frescoes and baroque architecture. It\'s a great place to visit if you\'re interested in art and architecture.\n'},
 {"role": "user", "content": r'Ok, I\'d like to visit the Marina from Sorrento center. Please provide walking directions'},
 {"role": "assistant", "content": r'Sure! Walking from Sorrento town center to Marina Grande is quite easy and will take you approximately 10-15 minutes. Here are the walking directions:\n\nHead south on Corso Italia, the main street in Sorrento town center, towards Piazza Tasso.\nOnce you reach Piazza Tasso, continue straight ahead on Via Luigi de Maio. You will pass by a few shops and restaurants on your left and right.\nKeep walking straight until you reach the end of Via Luigi de Maio, and then turn left onto Via Marina Grande. This street will lead you directly to the marina.\nFollow Via Marina Grande all the way down to the seafront, where you will find the small fishing village of Marina Grande.\nAlong the way, you will pass by some beautiful views of the Gulf of Naples and the Sorrento coastline. Once you arrive at Marina Grande, you can explore the harbor, take a stroll along the beach, or enjoy a meal at one of the many seafood restaurants.\n'}]

In [None]:
from IPython.display import Markdown
import html
df_chat = pd.read_pickle('chat_log.pkl')
new_prompt = 'Is there anywhere I can get gelato on the way to the Marina?'
chat_prompt = format_chat_message(historic_chat, new_prompt)

# Call the API with the new chat prompt
new_chat = get_response(chat_prompt)

# Append to the log; no ID is passed, so a new conversation ID is generated
df_chat = append_chat_df(df_chat,chat_prompt,new_chat)

chat_id = df_chat.sort_values(by='date').iloc[-1]['conversation_id']
print(f'Chat ID = {chat_id}')
# Print the response
display(Markdown(html.unescape(new_chat['choices'][0]['message']['content'])))

# Persist our work
df_chat.to_pickle('chat_log.pkl')
df_chat.to_excel('df_chat.xlsx')

### -> Continue Chat by ID

In [None]:
from IPython.display import Markdown
import pandas as pd

# Load the data frame log
df_chat = pd.read_pickle('chat_log.pkl')
new_prompt = """I don't think that's correct. The `filter` function accepts parameter assignments, not a dictionary.
"""

# Enter a conversation ID to instruct ChatGPT about the conversation history.
conversation_id = '20230309082116'

last_convo = get_last_conversation(conversation_id,df_chat)
if last_convo is None:
    # Unknown ID: start a new conversation from the steering prompt
    chat_prompt = format_chat_message(steering_prompt, new_prompt)
    conversation_id = None
else:
    chat_prompt = format_chat_message(last_convo, new_prompt)

# Print out the new question that we just appended
print(chat_prompt[-1]['content'])

# Call the API with the new chat prompt
new_chat = get_response(chat_prompt)

# Append with specified conversation ID
df_chat = append_chat_df(df_chat,chat_prompt,new_chat, conversation_id)

# Print the response
display(Markdown(new_chat['choices'][0]['message']['content'].replace('\\n', '\n')))

# Persist our work
df_chat.to_pickle('chat_log.pkl')
df_chat.to_excel('df_chat.xlsx')

# Search and View Results

## Get Entire Chat History

In [None]:
# Get List of Questions Asked
from IPython.display import Markdown
import pandas as pd
df_chat = pd.read_pickle('chat_log.pkl')

def display_questions(chat_history:pd.DataFrame):
    for index, row in chat_history.iterrows():
        display(Markdown(f'ID: {row["conversation_id"]} --> {row["message"]}'.replace('\\n', '\n')))


display_questions(df_chat)
    

## Search History

In [None]:
# Search History for String
import pandas as pd
from IPython.display import Markdown

df_chat = pd.read_pickle('chat_log.pkl')

def search_questions(search_phrase:str, chat_history:pd.DataFrame):
    # na=False treats missing messages as non-matches instead of raising on the mask
    search_mask = chat_history['message'].str.contains(search_phrase, case=False, na=False)
    matches = chat_history[search_mask]
    for index, row in matches.iterrows():
        display(Markdown(f'ID: {row["conversation_id"]} --> {row["message"]}'.replace('\\n', '\n')))

my_search = 'pandas'
search_questions(my_search, df_chat)


## Get Conversation by ID

In [None]:
from IPython.display import Markdown
import pandas as pd
df_chat = pd.read_pickle('chat_log.pkl')
chat_history_id = '20230305074049'

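last_convo = get_last_conversation(chat_history_id,df_chat)
if last_convo is None:
    # get_last_conversation returns None when the ID is not in the log
    print(f'No conversation found for ID {chat_history_id}')
else:
    for response in last_convo:
        display(Markdown(f'{response["role"]}: {response["content"]}'.replace('\\n', '\n')))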


## Summarize Conversation by ID

In [None]:
from IPython.display import Markdown
import pandas as pd
df_chat = pd.read_pickle('chat_log.pkl')
# 20230308100018
# 20230305185226
chat_history_id = '20230305185226'
summarize_prompt = 'Given the above conversation, please provide a short bullet list summary of the discussion with each bullet point no longer than 10 words. Provide the very first user question asked as the introduction to the bullet list'
last_convo = get_last_conversation(chat_history_id,df_chat)
chat_prompt = format_chat_message(last_convo, summarize_prompt)
new_chat = get_response(chat_prompt)
display(Markdown(new_chat['choices'][0]['message']['content'].replace('\\n', '\n')))



# Persona and Steering

## Create Steering Sources CSV

Run this section only once. After the file is created, new entries can be added manually or through a data frame and then saved back to CSV. The CSV file remains the permanent source.
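
As a sketch of the "add entries and save again" step, the example below appends one prompt to a CSV with the same columns. It uses the standard-library `csv` module instead of a data frame to stay dependency-free, writes to a temporary directory rather than to `steering_prompts.csv`, and the example row is invented for illustration.

```python
import csv
import os
import tempfile

COLUMNS = ['description', 'role', 'prompt', 'category', 'tags', 'actor', 'source']

def append_prompt(csv_path, entry):
    """Append one steering prompt, writing the header if the file is new."""
    is_new = not os.path.exists(csv_path)
    with open(csv_path, 'a', newline='', encoding='utf-8') as handle:
        # restval='' leaves columns missing from entry as empty strings
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval='')
        if is_new:
            writer.writeheader()
        writer.writerow(entry)

# Demo file in a temporary directory (hypothetical file name)
csv_path = os.path.join(tempfile.mkdtemp(), 'steering_prompts_demo.csv')
append_prompt(csv_path, {'actor': 'Travel Guide', 'role': 'system',
                         'prompt': 'I want you to act as a travel guide.',
                         'source': 'manual'})

with open(csv_path, newline='', encoding='utf-8') as handle:
    rows = list(csv.DictReader(handle))
print(rows[0]['actor'], '|', rows[0]['category'] == '')
```

Columns that are not supplied, such as `category` and `tags`, are written as empty strings and can be filled in later by hand.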

In [46]:
## Create an empty DF
import pandas as pd

columns = ['description', 'role', 'prompt', 'category', 'tags', 'actor', 'source']
chat_steering_df = pd.DataFrame(columns=columns)
chat_steering_df.to_pickle('steering_prompts.pkl')

In [None]:
import pandas as pd
awesome_prompt_url = 'https://raw.githubusercontent.com/f/awesome-chatgpt-prompts/main/prompts.csv'

awesome_df = pd.read_csv(awesome_prompt_url, usecols=[0, 1], on_bad_lines='warn')
awesome_df.columns=['actor', 'prompt']
chat_steering_df = pd.concat([chat_steering_df, awesome_df], ignore_index=True)
chat_steering_df.loc[:, 'source'] = 'https://raw.githubusercontent.com/f/awesome-chatgpt-prompts/main/prompts.csv'
# Save as CSV so we can manually edit and add items as needed.
# index=False avoids an extra unnamed index column when the file is read back.
chat_steering_df.to_csv('steering_prompts.csv', index=False)

chat_steering_df

## Read Prompt CSV to Data Frame

In [76]:
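chat_steering_df = pd.read_csv("steering_prompts.csv")
# Fill missing values before casting so empty cells become '' rather than the string 'nan'
chat_steering_df = chat_steering_df.fillna('').astype('str')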

## Search Steering Prompts

In [None]:
#chat_steering_df = pd.read_csv("steering_prompts.csv")
search_field = 'prompt'
search_word = 'excel'
exclude_category = 'template' #Template prompts have fields to fill in with additional information.
chat_steering_df = chat_steering_df.fillna('') #Fill with empty strings so we can conveniently search. Better to update and save the DF but this works too.
result_df = chat_steering_df[chat_steering_df[search_field].str.contains(search_word) & ~chat_steering_df['category'].str.contains(exclude_category)]
result_df

## Random Steering Prompt

In [None]:
from IPython.display import Markdown
exclude_term = 'template'
not_template_df = chat_steering_df.query('category != @exclude_term')

random_row = not_template_df.sample(n=1)
text = random_row["prompt"].values[0]
display(Markdown(f'{text}'))

# Prompt Size Check

In [None]:
import tiktoken
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")


conversation_id = '20230308100018'

last_convo = get_last_conversation(conversation_id,df_chat)
print(num_tokens_from_messages(last_convo))
print(encoding)
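
The count printed above could feed a simple guard that warns before a conversation outgrows the model's context window. This is only a sketch: the function name, the 4096-token limit, and the 80% threshold are assumptions and should be checked against the limit documented for the model in use.

```python
import warnings

def check_token_budget(num_tokens, limit=4096, threshold=0.8):
    """Return the fraction of the limit used; warn once it reaches threshold.

    limit and threshold are illustrative defaults, not values read from the API.
    """
    used = num_tokens / limit
    if used >= threshold:
        warnings.warn(
            f"Conversation uses {num_tokens} of {limit} tokens ({used:.0%}); "
            "consider summarizing or starting a new conversation."
        )
    return used

print(check_token_budget(1000))   # well under the threshold, no warning
print(check_token_budget(3500))   # crosses the threshold and emits a UserWarning
```

In the notebook, the argument would be the result of `num_tokens_from_messages(last_convo)`.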


# Scratch

## Interface

In [None]:
from IPython.display import Javascript
Javascript("Jupyter.notebook.execute_cells([2])")

<IPython.core.display.Javascript object>

In [None]:
from IPython.display import display, HTML
from IPython.core.getipython import get_ipython

def execute_cell(cell_index):
    shell = get_ipython()
    cells = shell.user_ns['get_ipython']().user_ns['In']
    result = shell.run_cell(cells[cell_index])
    return result
execute_cell(2)

In [None]:
from IPython.display import Javascript

# Replace this placeholder with the ID of the cell you want to execute
cell_id = 'abc1234567'

# Construct the JavaScript code to execute the cell
js_code = f"""
var output = null;
var callback = function (msg) {{
    if (msg.msg_type === "execute_result") {{
        output = msg.content.data["text/plain"];
    }}
}};
Jupyter.notebook.execute_cells_by_id(['{cell_id}'], {{'output': callback}});
output;
"""

# Execute the JavaScript code
results = Javascript(js_code)
display(results)


<IPython.core.display.Javascript object>

In [None]:
print(get_ipython().parent_header['msg_id'])
print("Hello")


## Read Streaming Response

In [None]:
import openai



MODEL = "gpt-3.5-turbo"

chat = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that provides direct clear responses."},
        {"role": "user", "content": "How large is your model?"}
    ],
    temperature=0,
    stream=True
)
#print(response['choices'][0]['message']['content'])
for response in chat:
    print(response)
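
When `stream=True`, the reply arrives as a series of chunks rather than one message. A minimal sketch of stitching those chunks back together follows; it assumes each chunk carries a `choices[0]['delta']` dict whose optional `content` key holds the next text fragment, and it uses hand-written sample chunks instead of a live API call. The helper name `collect_stream` is hypothetical.

```python
def collect_stream(chunks):
    """Join the text fragments from an iterable of streamed chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk['choices'][0].get('delta', {})
        # Chunks without a 'content' key (role-only or empty delta) add nothing
        parts.append(delta.get('content', ''))
    return ''.join(parts)

# Hand-written stand-ins for streamed chunks (not real API output)
sample_chunks = [
    {'choices': [{'delta': {'role': 'assistant'}, 'finish_reason': None}]},
    {'choices': [{'delta': {'content': 'Hello'}, 'finish_reason': None}]},
    {'choices': [{'delta': {'content': ', world'}, 'finish_reason': None}]},
    {'choices': [{'delta': {}, 'finish_reason': 'stop'}]},
]
print(collect_stream(sample_chunks))
```

With a live stream, the `chat` object from the cell above would take the place of `sample_chunks`.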


## Access Response Value

In [None]:
import json

json_block = json.loads(str(df_chat['raw_response'][0]['choices'][0]))
print(str(df_chat['raw_response'][0]['choices'][0]['message']['content']))
print(df_chat['raw_message'][0][-1]['content'])
print(new_chat['choices'][0]['message']['content'])

## Check Types and Docs

Pretty handy in split view

In [18]:
print(type(results))
print(dir(results))

<class 'IPython.core.display.Javascript'>
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_check_data', '_data_and_metadata', '_read_flags', '_repr_javascript_', '_show_mem_addr', 'css', 'data', 'filename', 'lib', 'metadata', 'reload', 'url']


In [None]:

print(df_chat['messages'][0])
# Append the response from this query as a new element.
df_chat['messages'][0].append(str(df_chat['response'][0]['choices'][0]))
print(df_chat['messages'][0])

#json_message = json.loads(str(df_chat['response'][1]['choices'][0]))
#print(json_block['message']['content'])

# GPT

Not functional and not a priority

In [None]:
# Define a function to get the chatbot's response
def get_response(prompt):
    response = openai.Completion.create(
        model=MODEL,
        prompt=prompt,
        max_tokens=1024,
        n=1,
        stop=None,
        temperature=0.5,
    )
    return response.choices[0].text.strip()

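# Start the chat interface
# NOTE: pyperclip is a third-party package, and openai.Completion needs a
# completion-capable model, so MODEL = "gpt-3.5-turbo" will not work here.
import pyperclip

# 'transcript.txt' is a placeholder file name; the with block closes the file on exit
with open('transcript.txt', 'a') as transcript_file:
    while True:
        user_input = input("You: ")
        # Check for exit before spending an API call
        if user_input.lower() == "exit":
            break
        transcript_file.write("You: " + user_input + "\n")
        prompt = "Me: " + user_input + "\nYou:"
        response = get_response(prompt)
        print("Me:", response)
        pyperclip.copy(response) # copy the response to clipboard
        transcript_file.write("Me: " + response + "\n")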