FEATURE: Semantic Search #33

Merged
merged 29 commits into from
Aug 28, 2023
b97022a
wip
merefield Aug 20, 2023
05dee28
add semantic search function, post event embedding maintenance
merefield Aug 21, 2023
47722b9
move narrative strings to localisation file, rubocop
merefield Aug 21, 2023
a580570
move prompt text from agent to localisation
merefield Aug 21, 2023
688579c
improve wikipedia output to include link to source page
merefield Aug 21, 2023
9842864
experimental improvement to forum search
merefield Aug 21, 2023
6a6045a
Dont embed posts hidden to basic users
merefield Aug 22, 2023
0bec60d
add missing only option to embeddings rake task, remove load of rake …
merefield Aug 23, 2023
f35012b
uprate vector search accuracy
merefield Aug 24, 2023
2c8b18a
uprate vector search accuracy
merefield Aug 24, 2023
4a80d93
improve data in post search response
merefield Aug 24, 2023
8136537
merge in main
merefield Aug 24, 2023
ad1a83c
improve bot class inheritance
merefield Aug 24, 2023
9af3b1c
rubocop
merefield Aug 24, 2023
46872a8
walk back name change
merefield Aug 24, 2023
3470fef
expand github ci workflow to install pg_embeddings
merefield Aug 24, 2023
f06311e
fix workflow command
merefield Aug 24, 2023
79db1b6
fix yml indenting for workflow
merefield Aug 24, 2023
4dd96de
add postgres repo to apt sources for github workflow
merefield Aug 24, 2023
070cb94
fix locale key
merefield Aug 24, 2023
a2a45e8
rename embeddings table to better reflect granularity
merefield Aug 26, 2023
d0e66ad
move constants to plugin.rb
merefield Aug 26, 2023
72ed6b9
remove explicit repo source
merefield Aug 26, 2023
55d8306
rename post embedding process
merefield Aug 26, 2023
a210a2d
streamline client setup code
merefield Aug 26, 2023
5bdc646
fix workflow by adding correct pg_config path for current ver
merefield Aug 27, 2023
8b6b404
fix search query for new table name
merefield Aug 27, 2023
76a6e91
Add long prompt capable function calling models
merefield Aug 28, 2023
281ded4
Add optional post count parameter to forum search queries
merefield Aug 28, 2023
9 changes: 9 additions & 0 deletions .github/workflows/plugin-tests.yml
@@ -59,6 +59,15 @@ jobs:
sudo -E -u postgres script/start_test_db.rb
sudo -u postgres psql -c "CREATE ROLE $PGUSER LOGIN SUPERUSER PASSWORD '$PGPASSWORD';"

- name: Install pg_embeddings
run: |
sudo apt-get update
sudo apt-get -y install postgresql-server-dev-13
git clone https://github.com/neondatabase/pg_embedding.git
cd pg_embedding
make PG_CONFIG=/usr/lib/postgresql/13/bin/pg_config
make PG_CONFIG=/usr/lib/postgresql/13/bin/pg_config install

- name: Bundler cache
uses: actions/cache@v3
with:
18 changes: 18 additions & 0 deletions app/jobs/regular/chatbot_post_embedding_delete_job.rb
@@ -0,0 +1,18 @@
# frozen_string_literal: true

# Job is triggered when a Post is destroyed.
class ::Jobs::ChatbotPostEmbeddingDeleteJob < Jobs::Base
  sidekiq_options retry: false

  def execute(opts)
    begin
      post_id = opts[:id]

      ::DiscourseChatbot.progress_debug_message("101. Deleting a Post Embedding for Post id: #{post_id}")

      # Safe navigation: a post that was never embedded has no row to delete.
      ::DiscourseChatbot::PostEmbedding.find_by(post_id: post_id)&.destroy!
    rescue => e
      # Retries are disabled for this job, so a failure is only logged.
      Rails.logger.error("OpenAIBot Post Embedding: There was a problem deleting the embedding, and the job will not be retried: #{e}")
    end
  end
end
20 changes: 20 additions & 0 deletions app/jobs/regular/chatbot_post_embedding_job.rb
@@ -0,0 +1,20 @@
# frozen_string_literal: true

# Job is triggered on an update to a Post.
class ::Jobs::ChatbotPostEmbeddingJob < Jobs::Base
  sidekiq_options retry: 5, dead: false

  def execute(opts)
    begin
      post_id = opts[:id]

      ::DiscourseChatbot.progress_debug_message("100. Creating/updating a Post Embedding for Post id: #{post_id}")

      post_embedding = ::DiscourseChatbot::PostEmbeddingProcess.new

      post_embedding.upsert_embedding(post_id)
    rescue => e
      Rails.logger.error("OpenAIBot Post Embedding: There was a problem, but will retry til limit: #{e}")
      # Re-raise so the failure reaches Sidekiq and the retry: 5 policy applies.
      raise
    end
  end
end
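Sidekiq schedules a retry only when a job raises, so whether the `rescue` in a job re-raises decides whether `retry: 5` has any effect. The following is a minimal plain-Ruby sketch of that behaviour, not Sidekiq itself: `run_with_retries` and the `flaky` job body are hypothetical stand-ins introduced only for illustration.

```ruby
# Hypothetical stand-in for a retrying job runner; this is not Sidekiq's API.
def run_with_retries(max_retries)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    retry if attempts <= max_retries
  end
  attempts
end

# A job body that fails on the first two attempts and then succeeds.
flaky = ->(attempt) { raise "embedding API unavailable" if attempt < 3 }

# Swallowing the error: the runner sees success after a single attempt.
swallowed = run_with_retries(5) do |attempt|
  begin
    flaky.call(attempt)
  rescue StandardError => e
    warn "logged: #{e.message}"
  end
end

# Logging and re-raising: the runner retries until the body succeeds.
reraised = run_with_retries(5) do |attempt|
  begin
    flaky.call(attempt)
  rescue StandardError => e
    warn "logged: #{e.message}"
    raise
  end
end

puts "swallowed: #{swallowed} attempt(s), re-raised: #{reraised} attempt(s)"
```

In the swallowed case the failure is logged once and the work is never redone, which is usually not what a retry policy is meant to achieve.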
7 changes: 7 additions & 0 deletions app/models/embedding.rb
@@ -0,0 +1,7 @@
# frozen_string_literal: true

class ::DiscourseChatbot::PostEmbedding < ActiveRecord::Base
self.table_name = 'chatbot_post_embeddings'

validates :post_id, presence: true, uniqueness: true
end
89 changes: 89 additions & 0 deletions config/locales/server.en.yml
@@ -47,6 +47,95 @@ en:
title: "The subject of this conversation is %{topic_title}"
first_post: "The first thing someone said was %{username} who said %{raw}"
post: "%{username} said %{raw}"
function:
calculator:
description: |
Useful for getting the result of a math expression. It is a general purpose calculator. It works with Ruby expressions.

You can also retrieve the current date from it and use the core Ruby Time class to calculate dates.

The input to this tool should be a valid mathematical expression that could be executed by the base Ruby programming language with no extensions.

Be certain to prefix any functions with 'Math.'

Usage:
Action Input: 1 + 1
Action Input: 3 * 2 / 4
Action Input: 9 - 7
Action Input: Time.now - 2 * 24 * 60 * 60
Action Input: Math.cbrt(13) + Math.cbrt(12)
Action Input: Math.sqrt(8)
Action Input: (4.1 + 2.3) / (2.0 - 5.6) * 3
parameters:
input: the mathematical expression you need to process and get the answer to. Make sure it is Ruby compatible.
error: "'%{parameter}' is an invalid mathematical expression. If you are trying to calculate dates, make sure you use the Ruby Time class."
forum_search:
description: |
Search the local forum for information that may help you answer the question. Especially useful when the forum specialises in the subject matter of the query.
Searching the local forum is preferable to searching google or the internet and should be considered higher priority. It is quicker and cheaper.

Input should be a search query. You can optionally also specify the number of posts you wish returned from your query.

Outputs text from the Post and a URL link to it that you can provide to the user. When presenting the URL in your reply, do not embed it in an anchor, just write the plain link.
parameters:
query: "search query for looking up information on the forum"
number_of_posts: "specify the number of posts you want returned from your query"
answer_summary: "The top %{number_of_posts} posts on the forum related to this query are, best match first:\n\n"
answer: "Number %{rank}: the post is at this web address: %{url}, it was written by '%{username}' on %{date} and the text is '%{raw}'.\n\n"
error: "'%{query}': my search for this on the forum failed."
google_search:
description: |
A wrapper around Google Search.

Useful for when you need to answer questions about current events.
Always one of the first options when you need to find information on the internet.

Input should be a search query.
parameters:
query: "search query for looking up information on the internet"
error: "%{query}: my search for this on the internet failed."
news:
description: |
A wrapper around the News API.

Useful for when you need to answer questions about current events or current affairs in the news.

Input should be a search query and a date from which to search news. If the request is about today, the search should use today's date.
parameters:
query: "query string for searching current news and events"
start_date: "start date from which to search for news in format YYYY-MM-DD"
answer: "The latest news about this is: "
error: "ERROR: Had trouble retrieving the news!"
stock_data:
description: |
An API for MarketStack stock data. You need to call it using the stock ticker. You can optionally also provide a specific date.
parameters:
ticker: "ticker for share or stock query"
date: "date for data in format YYYY-MM-DD"
answer: "Ticker %{ticker} had a day close of %{close} on %{date}, with a high of %{high} and a low of %{low}"
error: "ERROR: Had trouble retrieving information from Market Stack for stock market information!"
wikipedia:
description: |
A wrapper around Wikipedia.

Useful for when you need to answer general questions about
people, places, companies, facts, historical events, or other subjects.

Input should be a search query
parameters:
query: "query string for wikipedia search"
answer: "The relevant wikipedia page has the following summary: '%{summary}' and the article can be found at this url link: %{url}"
error: "ERROR: Had trouble retrieving information from Wikipedia!"
agent:
handle_function_call:
answer: "The answer is %{result}."
call_function:
error: "There was something wrong with your function arguments"
final_thought_answer:
opener: "To answer the question I will use these step by step instructions.\n\n"
thought_declaration: "I will use the %{function_name} function to calculate the answer with arguments %{arguments}.\n\n"
final_thought: "%{thoughts} Based on the above, I will now answer the question. This message will only be seen by me, so answer on the assumption that the user has not seen this message."

errors:
general: "Sorry, I'm not well right now. Let's talk some other time. Meanwhile, please ask the admin to check the logs, thank you!"
retries: "I've tried working out a response for you several times, but ultimately failed. Please contact the admin if this persists, thank you!"
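The locale strings above use `%{name}` placeholders, which `I18n.t` fills from keyword arguments. As a rough sketch of how the `forum_search` templates might be assembled into one reply, the example below uses Ruby's built-in `format`, which accepts the same `%{name}` syntax, so it runs without the i18n gem. The `build_answer` helper and the sample posts are hypothetical; the plugin's actual assembly code is not part of this excerpt.

```ruby
# Templates copied from the forum_search locale entries.
ANSWER_SUMMARY = "The top %{number_of_posts} posts on the forum related to this query are, best match first:\n\n"
ANSWER = "Number %{rank}: the post is at this web address: %{url}, it was written by '%{username}' on %{date} and the text is '%{raw}'.\n\n"

# Hypothetical search results; real ones would come from the embeddings search.
posts = [
  { url: "https://forum.example.com/t/1/1", username: "alice", date: "2023-08-20", raw: "First post" },
  { url: "https://forum.example.com/t/2/3", username: "bob", date: "2023-08-21", raw: "Second post" },
]

# Hypothetical helper: summary line first, then one ranked entry per post.
def build_answer(posts)
  response = format(ANSWER_SUMMARY, number_of_posts: posts.length)
  posts.each_with_index do |post, index|
    response += format(ANSWER, post.merge(rank: index + 1))
  end
  response
end

puts build_answer(posts)
```

A missing placeholder value raises `KeyError` with `format`, so every key named in a template has to be supplied.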
6 changes: 4 additions & 2 deletions config/settings.yml
@@ -66,11 +66,13 @@ plugins:
default: gpt-3.5-turbo
choices:
- gpt-3.5-turbo
-- gpt-3.5-turbo-16k
- gpt-3.5-turbo-0613
+- gpt-3.5-turbo-16k
+- gpt-3.5-turbo-16k-0613
- gpt-4
-- gpt-4-32k
- gpt-4-0613
+- gpt-4-32k
+- gpt-4-32k-0613
chatbot_reply_job_time_delay:
client: false
default: 3
18 changes: 18 additions & 0 deletions db/migrate/20230820010101_enable_embedding_extension.rb
@@ -0,0 +1,18 @@
# frozen_string_literal: true

class EnableEmbeddingExtension < ActiveRecord::Migration[7.0]
def change
begin
enable_extension :embedding
rescue Exception => e
if DB.query_single("SELECT 1 FROM pg_available_extensions WHERE name = 'embedding';").empty?
STDERR.puts "----------------------------DISCOURSE CHATBOT ERROR----------------------------------"
STDERR.puts " Discourse Chatbot now requires the embedding extension on the PostgreSQL database."
STDERR.puts " Run a `./launcher rebuild app` to fix it on a standard install."
STDERR.puts " Alternatively, you can remove Discourse Chatbot to rebuild."
STDERR.puts "----------------------------DISCOURSE CHATBOT ERROR----------------------------------"
end
raise e
end
end
end
11 changes: 11 additions & 0 deletions db/migrate/20230820010103_create_chatbot_embeddings_table.rb
@@ -0,0 +1,11 @@
# frozen_string_literal: true

class CreateChatbotEmbeddingsTable < ActiveRecord::Migration[7.0]
def change
create_table :chatbot_embeddings do |t|
t.integer :post_id, null: false, index: { unique: true }, foreign_key: true
t.column :embedding, "real[]", null: false
t.timestamps
end
end
end
16 changes: 16 additions & 0 deletions db/migrate/20230820010105_create_chatbot_embeddings_index.rb
@@ -0,0 +1,16 @@
# frozen_string_literal: true

class CreateChatbotEmbeddingsIndex < ActiveRecord::Migration[7.0]
def up
execute <<-SQL
CREATE INDEX hnsw_index_on_chatbot_embeddings ON chatbot_embeddings USING hnsw(embedding)
WITH (dims=1536, m=64, efconstruction=64, efsearch=64);
SQL
end

def down
execute <<-SQL
DROP INDEX hnsw_index_on_chatbot_embeddings;
SQL
end
end
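The HNSW index above speeds up nearest-neighbour lookups over the `embedding` column by returning approximate matches. For comparison, here is a minimal sketch of the exact search it approximates, in plain Ruby. The distance metric is an assumption (Euclidean distance is used here; the query the plugin actually runs is not shown in this diff), and `l2_distance`, `nearest_posts` and the toy vectors are hypothetical names and data.

```ruby
# Euclidean (L2) distance between two equal-length vectors.
def l2_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Exact nearest-neighbour search: score every stored embedding, keep the k closest.
def nearest_posts(embeddings, query, k)
  embeddings
    .map { |post_id, vector| [post_id, l2_distance(vector, query)] }
    .sort_by { |_, distance| distance }
    .first(k)
    .map(&:first)
end

# Toy 3-dimensional vectors keyed by post id; the index above expects 1536 dimensions.
embeddings = {
  1 => [0.0, 0.0, 1.0],
  2 => [0.9, 0.1, 0.0],
  3 => [1.0, 0.0, 0.0],
}

p nearest_posts(embeddings, [1.0, 0.0, 0.0], 2)
```

The exact search visits every row, so its cost grows linearly with the number of posts. An approximate index trades a little recall for much faster queries.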
13 changes: 13 additions & 0 deletions db/migrate/20230826010101_rename_chatbot_embeddings_table.rb
@@ -0,0 +1,13 @@

# frozen_string_literal: true

class RenameChatbotEmbeddingsTable < ActiveRecord::Migration[7.0]
def change
begin
Migration::SafeMigrate.disable!
rename_table :chatbot_embeddings, :chatbot_post_embeddings
ensure
Migration::SafeMigrate.enable!
end
end
end
7 changes: 7 additions & 0 deletions db/migrate/20230826010103_rename_chatbot_embeddings_index.rb
@@ -0,0 +1,7 @@
# frozen_string_literal: true

class RenameChatbotEmbeddingsIndex < ActiveRecord::Migration[7.0]
def change
rename_index :chatbot_post_embeddings, 'hnsw_index_on_chatbot_embeddings', 'hnsw_index_on_chatbot_post_embeddings'
end
end
22 changes: 12 additions & 10 deletions lib/discourse_chatbot/bots/open_ai_agent.rb
@@ -3,19 +3,19 @@

module ::DiscourseChatbot

-class OpenAIAgent < Bot
+class OpenAIAgent < OpenAIBotBase

def initialize
super

@model_name = SiteSetting.chatbot_open_ai_model_custom ? SiteSetting.chatbot_open_ai_model_custom_name : SiteSetting.chatbot_open_ai_model

calculator_function = ::DiscourseChatbot::CalculatorFunction.new
wikipedia_function = ::DiscourseChatbot::WikipediaFunction.new
news_function = ::DiscourseChatbot::NewsFunction.new
google_search_function = ::DiscourseChatbot::GoogleSearchFunction.new
forum_search_function = ::DiscourseChatbot::ForumSearchFunction.new
stock_data_function = ::DiscourseChatbot::StockDataFunction.new
-functions = [calculator_function, wikipedia_function]

+functions = [calculator_function, wikipedia_function, forum_search_function]

functions << news_function if !SiteSetting.chatbot_news_api_token.blank?
functions << google_search_function if !SiteSetting.chatbot_serp_api_key.blank?
@@ -106,7 +106,7 @@ def handle_function_call(res)
func_name = first_message["function_call"]["name"]
args_str = first_message["function_call"]["arguments"]
result = call_function(func_name, args_str)
-res_msg = { 'role' => 'assistant', 'content' => "The answer is #{result}." }
+res_msg = { 'role' => 'assistant', 'content' => I18n.t("chatbot.prompt.agent.handle_function_call.answer", result: result) }
@internal_thoughts << res_msg
end

@@ -121,24 +121,26 @@ def call_function(func_name, args_str)
func = @func_mapping[func_name]
res = func.process(args)
res
-rescue
-"There was something wrong with your function arguments"
+rescue
+I18n.t("chatbot.prompt.agent.call_function.error")
end
end

def final_thought_answer
-thoughts = "To answer the question I will use these step by step instructions.\n\n"
+thoughts = I18n.t("chatbot.prompt.agent.final_thought_answer.opener")
@internal_thoughts.each do |thought|
if thought.key?('function_call')
-thoughts += "I will use the #{thought['function_call']['name']} function to calculate the answer with arguments #{thought['function_call']['arguments']}.\n\n"
+thoughts += I18n.t("chatbot.prompt.agent.final_thought_answer.thought_declaration", function_name: thought['function_call']['name'], arguments: thought['function_call']['arguments'])
else
thoughts += "#{thought['content']}\n\n"
end
end

final_thought = {
'role' => 'assistant',
-'content' => "#{thoughts} Based on the above, I will now answer the question, this message will only be seen by me so answer with the assumption with that the user has not seen this message."
+'content' => I18n.t("chatbot.prompt.agent.final_thought_answer.final_thought", thoughts: thoughts)
}

final_thought
end
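The agent looks a function up by name in `@func_mapping`, calls `process` on it, and falls back to an error string if anything goes wrong. A minimal, self-contained sketch of that dispatch pattern follows. `EchoFunction`, the standalone `call_function` signature, and the use of `JSON.parse` for the argument string are assumptions made for illustration, since the argument-parsing line falls outside this hunk.

```ruby
require "json"

# Hypothetical stand-in for one of the plugin's function classes.
class EchoFunction
  def name
    "echo"
  end

  def process(args)
    "you said #{args['text']}"
  end
end

ERROR_MESSAGE = "There was something wrong with your function arguments"

# Map each function's name to the object that handles it.
functions = [EchoFunction.new]
func_mapping = functions.to_h { |func| [func.name, func] }

# Parse the model-supplied argument string and dispatch to the named function.
# Any failure (bad JSON, unknown function, error inside process) yields the error string.
def call_function(func_mapping, func_name, args_str)
  args = JSON.parse(args_str)
  func_mapping.fetch(func_name).process(args)
rescue StandardError
  ERROR_MESSAGE
end

puts call_function(func_mapping, "echo", '{"text": "hello"}')
puts call_function(func_mapping, "echo", "not json")
```

Returning the error as an ordinary string lets the model see the failure as part of the conversation and try again with different arguments.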

6 changes: 2 additions & 4 deletions lib/discourse_chatbot/bots/open_ai_bot.rb
@@ -3,7 +3,7 @@

module ::DiscourseChatbot

-class OpenAIBot < Bot
+class OpenAIBot < OpenAIBotBase

def initialize
super
@@ -13,11 +13,9 @@ def get_response(prompt)
system_message = { "role": "system", "content": I18n.t("chatbot.prompt.system.basic") }
prompt.unshift(system_message)

-model_name = SiteSetting.chatbot_open_ai_model_custom ? SiteSetting.chatbot_open_ai_model_custom_name : SiteSetting.chatbot_open_ai_model
-
response = @client.chat(
parameters: {
-model: model_name,
+model: @model_name,
messages: prompt,
max_tokens: SiteSetting.chatbot_max_response_tokens,
temperature: SiteSetting.chatbot_request_temperature / 100.0,
31 changes: 31 additions & 0 deletions lib/discourse_chatbot/bots/open_ai_bot_base.rb
@@ -0,0 +1,31 @@
# frozen_string_literal: true
require "openai"

module ::DiscourseChatbot

class OpenAIBotBase < Bot
def initialize
::OpenAI.configure do |config|
config.access_token = SiteSetting.chatbot_open_ai_token
end
if !SiteSetting.chatbot_open_ai_model_custom_url.blank?
::OpenAI.configure do |config|
config.uri_base = SiteSetting.chatbot_open_ai_model_custom_url
end
end
if SiteSetting.chatbot_open_ai_model_custom_api_type == "azure"
::OpenAI.configure do |config|
config.api_type = :azure
config.api_version = SiteSetting.chatbot_open_ai_model_custom_api_version
end
end
@client = ::OpenAI::Client.new
@model_name = SiteSetting.chatbot_open_ai_model_custom ? SiteSetting.chatbot_open_ai_model_custom_name : SiteSetting.chatbot_open_ai_model
end

def get_response(prompt)
raise "Overwrite me!"
end

end
end
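The base class centralises client setup and model selection, and leaves `get_response` for subclasses to implement. The sketch below shows that shape in isolation, under stated assumptions: `Settings` is a hypothetical stand-in for Discourse's `SiteSetting`, and `BotBase` and `EchoBot` are illustrative names that do not exist in the plugin.

```ruby
# Hypothetical stand-in for SiteSetting, holding only the values used here.
Settings = Struct.new(:model, :custom, :custom_name, keyword_init: true)

class BotBase
  attr_reader :model_name

  def initialize(settings)
    # Same ternary as the base class: prefer the custom model name when enabled.
    @model_name = settings.custom ? settings.custom_name : settings.model
  end

  def get_response(_prompt)
    raise NotImplementedError, "subclasses must implement get_response"
  end
end

# A subclass only supplies get_response; model selection is inherited.
class EchoBot < BotBase
  def get_response(prompt)
    "[#{model_name}] #{prompt}"
  end
end

default_bot = EchoBot.new(Settings.new(model: "gpt-3.5-turbo", custom: false, custom_name: ""))
custom_bot = EchoBot.new(Settings.new(model: "gpt-3.5-turbo", custom: true, custom_name: "my-deployment"))

puts default_bot.get_response("hi")
puts custom_bot.get_response("hi")
```

Keeping the selection in one place means a subclass cannot drift out of sync with the settings, which is the duplication this part of the change removes from `OpenAIBot`.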