FEATURE: Semantic Search #33

Merged: 29 commits merged into main on Aug 28, 2023
merefield (Owner) commented Aug 24, 2023

Summary

  • FEATURE: add a function that lets the LLM search forum Posts and incorporate the results in its responses
    • creates a new embeddings table dedicated to the chatbot
    • adds the rake task chatbot:refresh_embeddings to create embeddings
    • maintains embeddings for new and updated posts on a delta basis
  • FEATURE: moves all remaining prompt strings (inc. agent internal thoughts and functions) to localisations so they can be edited from Admin -> Customize -> Text or PR'd for additional languages.
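At its core, semantic search is a nearest-neighbour lookup: embed the user's query, then rank the stored post embeddings by similarity to it. The sketch below shows only that idea, in plain Python with in-memory dicts. It assumes cosine similarity as the metric (a common choice; the plugin's actual distance metric may differ), and `top_k_posts` is a hypothetical name. The plugin itself performs this search inside Postgres via the pg_embedding extension.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_posts(query_embedding: List[float],
                post_embeddings: Dict[int, List[float]],
                k: int = 3) -> List[Tuple[int, float]]:
    """Hypothetical helper: return (post_id, score) for the k most similar posts, best first."""
    scored = [(post_id, cosine_similarity(query_embedding, emb))
              for post_id, emb in post_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; real ones have hundreds or thousands of dimensions.
posts = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}
print(top_k_posts([1.0, 0.1], posts, k=2))
```

The top-ranked posts are what would then be handed back to the LLM to incorporate into its answer.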

Required changes to app.yml

These changes will require careful uninstalling if you wish to remove the bot. See the main README for removal instructions.

This update brings forum search, which requires embeddings, and parts of the change are breaking, so listen up!

I use the Postgres extension pg_embedding. It promises vector searches at 20x the speed of pgvector, but it requires a bespoke build.

The following must now be added to app.yml in the after_code: section, before the plugins are cloned.

(NB: you may be able to omit the first three commands if your server can already see the postgresql-server-dev-x package.)

    - exec:
        cd: $home
        cmd:
          - sudo apt-get -y install wget ca-certificates
    - exec:
        cd: $home
        cmd:
          - wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    - exec:
        cd: $home
        cmd:
          - sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
    - exec:
        cd: $home
        cmd:
          - apt-get update
    - exec:
        cd: $home
        cmd:
          - apt-get -y install postgresql-server-dev-${PG_MAJOR}
    - exec:
        cd: $home/tmp
        cmd:
          - git clone https://github.com/neondatabase/pg_embedding.git
    - exec:
        cd: $home/tmp/pg_embedding
        cmd:
          - make PG_CONFIG=/usr/lib/postgresql/${PG_MAJOR}/bin/pg_config
    - exec:
        cd: $home/tmp/pg_embedding
        cmd:
          - make PG_CONFIG=/usr/lib/postgresql/${PG_MAJOR}/bin/pg_config install
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "create extension if not exists embedding;"'

These steps build and install the pg_embedding extension and enable it (under the extension name embedding) in the discourse database.

Creating the Embeddings

Once the container is rebuilt, you need to create embeddings for all posts so the bot can find forum information.

Enter the container:

./launcher enter app

and run the following rake command:

rake chatbot:refresh_embeddings[1]

At present this task runs twice for an unknown reason (sorry! feel free to PR a fix), but the [1] argument ensures that the second pass only adds missing embeddings (i.e. none, immediately after the first run).
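The effect of the [1] argument can be pictured as a "missing only" switch. This is a minimal sketch assuming posts arrive as (id, text) pairs and embeddings are kept in a dict; `refresh_embeddings` and `embed` are hypothetical stand-ins, not the rake task's actual code. It illustrates why a second pass straight after the first makes no new embedding calls.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical data shapes: a post is (post_id, raw_text); the store maps post_id -> vector.
Post = Tuple[int, str]
Embedding = List[float]


def refresh_embeddings(posts: Iterable[Post],
                       store: Dict[int, Embedding],
                       embed: Callable[[str], Embedding],
                       missing_only: bool = False) -> int:
    """(Re)compute embeddings; with missing_only=True, skip posts already in the store.

    Returns the number of embedding calls made (a proxy for API cost).
    """
    calls = 0
    for post_id, text in posts:
        if missing_only and post_id in store:
            continue  # already embedded on an earlier pass
        store[post_id] = embed(text)
        calls += 1
    return calls


# Fake embedder for illustration only: a 1-D "vector" holding the text length.
fake_embed = lambda text: [float(len(text))]
store: Dict[int, Embedding] = {}
posts = [(1, "hello"), (2, "forum")]
print(refresh_embeddings(posts, store, fake_embed, missing_only=True))  # first pass
print(refresh_embeddings(posts, store, fake_embed, missing_only=True))  # duplicate pass
```

Without the flag, every pass would re-embed every post, so an accidental second run would double the cost.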

Compared to bot interactions, embeddings are not expensive to create, but do watch your usage on your OpenAI dashboard in any case.
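Before running the task on a large forum, a back-of-the-envelope estimate can help. This helper is an assumption-laden sketch, not part of the plugin: it uses OpenAI's rough guide of about four characters per token for English text, and the price per 1,000 tokens is a parameter you must take from your own OpenAI pricing page (no value is assumed here).

```python
import math
from typing import Iterable


def estimate_embedding_cost(texts: Iterable[str],
                            price_per_1k_tokens: float,
                            chars_per_token: float = 4.0) -> float:
    """Ballpark cost of embedding `texts`.

    Token counts are approximated as characters / chars_per_token; the real
    count depends on the tokenizer, so treat the result as an estimate only.
    """
    total_chars = sum(len(t) for t in texts)
    est_tokens = math.ceil(total_chars / chars_per_token)
    return est_tokens / 1000 * price_per_1k_tokens


# Usage: pass your posts' raw text and the current price from your provider.
sample_posts = ["How do I rebuild the container?", "Try ./launcher rebuild app"]
print(estimate_embedding_cost(sample_posts, price_per_1k_tokens=1.0))
```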

NB: Embeddings are only created for Posts, and only for those Posts to which a Trust Level 1 user would have access. This seemed like a reasonable compromise. It will not create embeddings for posts in content accessible only to Trust Level 2+ users.
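The Trust Level 1 rule amounts to a visibility filter applied before anything is embedded. Below is a simplified sketch under stated assumptions: read access is modelled as a set of group names per category, the names mirror Discourse's automatic groups, and private messages are treated as never embeddable. The `Post` class and `posts_to_embed` are hypothetical; the plugin relies on Discourse's own permission checks, which this does not reproduce.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Assumption: groups a Trust Level 1 user would belong to in this simplified model.
TL1_GROUPS: FrozenSet[str] = frozenset({"everyone", "trust_level_0", "trust_level_1"})


@dataclass
class Post:
    post_id: int
    category_read_groups: FrozenSet[str]  # groups allowed to read the post's category
    is_private_message: bool = False


def embeddable(post: Post) -> bool:
    """True if a Trust Level 1 user could read the post (simplified rule)."""
    if post.is_private_message:
        return False  # assumption: PMs are never embedded
    return bool(post.category_read_groups & TL1_GROUPS)


def posts_to_embed(posts: List[Post]) -> List[int]:
    """Ids of the posts that pass the Trust Level 1 visibility filter."""
    return [p.post_id for p in posts if embeddable(p)]


candidates = [
    Post(1, frozenset({"everyone"})),
    Post(2, frozenset({"trust_level_2"})),  # TL2+ only: skipped
    Post(3, frozenset({"trust_level_1", "staff"})),
]
print(posts_to_embed(candidates))
```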

Model considerations

To use the bot in agent mode, you must set chatbot_open_ai_model to one of the 0613 model variants; otherwise the agent will not function correctly.
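If you manage settings with scripts, a small pre-flight check can catch a mismatch early. This is a hypothetical helper, not part of the plugin, and it encodes only the naming rule stated above: the model name must be a 0613 variant, which here is assumed to mean a name ending in "-0613".

```python
def supports_agent_mode(model: str) -> bool:
    """Hypothetical check: agent mode needs a 0613 model variant (name ends in '-0613')."""
    return model.endswith("-0613")


# Usage: validate a candidate value for chatbot_open_ai_model before applying it.
for name in ("gpt-4-0613", "gpt-3.5-turbo-16k-0613", "gpt-3.5-turbo"):
    print(name, supports_agent_mode(name))
```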

merefield changed the title from "FEATURE: Semantic search" to "FEATURE: Semantic Search" on Aug 24, 2023
merefield merged commit 262a0a4 into main on Aug 28, 2023 (3 checks passed)
hifihedgehog commented Sep 7, 2023

One caution to others! If you apply the changes described in "Required changes to app.yml" to your container .yml file, you still need them even after you stop using this plugin and remove it from the plugin list; otherwise Postgres will complain with the error "ERROR -- : PG::UndefinedFile: ERROR: could not access file "$libdir/embedding": No such file or directory." I think I now understand what the "breaking change" is, even though it was not stated explicitly. ;)
@merefield, what is the sysadmin-y way of reversing course if possible?

merefield (Owner, Author) commented Sep 7, 2023

Hi @hifihedgehog, apologies for the inconvenience caused.

I have just added uninstall instructions to the README and signposted those in the PR comment above. They were already on Meta.

Let me know if you need any more help and please confirm that solves your issue.

I will turn that script into a rake task very soon.
