nestordemeure/letMeNERSCthatForYou

Let Me NERSC that For You

This is a custom-made documentation chatbot based on the NERSC documentation.

Goals

This bot is not made to replace the documentation but rather to improve information discoverability. Our goals are to:

  • answer questions about NERSC with up-to-date, sourced answers,
  • run fully open-source technologies on-premise, giving us control, security, and privacy,
  • serve this model to NERSC users in production with acceptable performance,
  • make it fairly easy to switch the underlying models (embeddings and LLM), so that we can keep up with a rapidly evolving technology.

Installation

  • clone the repo,
  • use the environment.yml file to install the dependencies with conda,
  • clone the NERSC doc repository into a folder.

Usage

Basic use

These scripts are meant to be run locally, mainly by developers of the project:

  • chatbot.py is a basic local question-answering loop,
  • chatbot_dev.py is a more feature-rich version of the local loop, making it easy to run test questions and switch models around,
  • update_database.py updates the vector database (for a given LLM, sentence embedder, and vector database; see footnote 1),
  • token_counter.py measures the size of questions and answers for a given tokenizer.

On NERSC supercomputers, you might want to run module load python cudatoolkit cudnn pytorch before using these scripts.
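
To illustrate what measuring "the size of questions and answers for a given tokenizer" can look like, here is a minimal sketch. It is not the code of token_counter.py: the function names are hypothetical, and simple_tokenize is a regex stand-in for whichever real tokenizer the chosen model uses.

```python
import re
from typing import Callable, Dict, Iterable, List


def simple_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (an assumption): words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def count_tokens(
    texts: Iterable[str],
    tokenize: Callable[[str], List[str]] = simple_tokenize,
) -> Dict[str, float]:
    """Return basic statistics on the number of tokens per text."""
    counts = [len(tokenize(text)) for text in texts]
    if not counts:
        return {"n": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "n": len(counts),
        "min": min(counts),
        "max": max(counts),
        "mean": sum(counts) / len(counts),
    }


if __name__ == "__main__":
    questions = ["How do I submit a job?", "What is scron?"]
    print(count_tokens(questions))
```

Passing the tokenizer as a callable keeps the counting logic independent of the model, which matches the goal of making models easy to swap.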

Superfacility API use

These scripts are meant to be used with the superfacility API:

  • api_client.py is a demonstration client, calling the chatbot via the superfacility API,
  • api_consumer.py is a worker, answering questions asked to the superfacility API in a loop.
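
The client/worker split described above can be sketched as follows. This is only an illustration of the pattern: in-memory queues stand in for the superfacility API, and answer_pending and answer_fn are hypothetical names, not functions from this repository.

```python
import queue
from typing import Callable


def answer_pending(
    questions: queue.Queue,
    answers: queue.Queue,
    answer_fn: Callable[[str], str],
) -> int:
    """Answer every question currently waiting and return how many were handled.

    The two queues are stand-ins (an assumption) for the requests that the
    real client and worker exchange through the superfacility API.
    """
    handled = 0
    while True:
        try:
            question = questions.get_nowait()
        except queue.Empty:
            return handled
        # Post the answer alongside the question it belongs to.
        answers.put((question, answer_fn(question)))
        handled += 1


if __name__ == "__main__":
    pending, done = queue.Queue(), queue.Queue()
    pending.put("What is scron?")  # the client side: ask a question
    answer_pending(pending, done, lambda q: "(answer to: " + q + ")")
    print(done.get_nowait())
```

A worker would typically call such a function repeatedly, sleeping between calls when no question is pending.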

TODO

In no particular order:

  • move this code to the NERSC GitHub,

  • refresh prompt (moving chunk information within the prompt)

  • update Whoosh to Whoosh Reloaded

  • use the URL as additional keywords for keyword search (see the line starting with file_name = file_path.replace(".", " ").replace)

  • move away from Whoosh (keyword search) to remove the need for a search.initialize method?

  • look into OrGroup when parsing the query

  • filter out question words (when, how, etc.) from the query before keyword search
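
Two of the keyword-search items above (filtering question words out of the query, and turning a file path or URL into extra keywords) could look like the sketch below. It is a rough illustration under stated assumptions: the word list, the separators, and the function names are all hypothetical and are not taken from the repository.

```python
import re

# Hypothetical list of question words to drop before keyword search.
QUESTION_WORDS = {"when", "how", "what", "where", "why", "who", "which"}


def filter_query(query: str) -> str:
    """Lowercase the query and remove question words."""
    words = re.findall(r"\w+", query.lower())
    return " ".join(word for word in words if word not in QUESTION_WORDS)


def path_keywords(file_path: str) -> str:
    """Turn a path into space-separated keywords.

    The separators replaced here (dot, slash, underscore, dash) are an
    assumption; the existing code may handle a different set.
    """
    return re.sub(r"[./_\-]+", " ", file_path).strip()


if __name__ == "__main__":
    print(filter_query("How do I use scron?"))
    print(path_keywords("jobs/workflow/scrontab.md"))
```

A fixed word list is the simplest option; a keyword search library's own stop-word handling may be a better fit if it covers these words.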

Developers

  • Nestor Demeure: leading the effort and writing the glue code
  • Ermal Rrapaj: finetuning and testing home-made models
  • Gabor Torok: writing the superfacility API integration and web front-end
  • Andrew Naylor: scaling the service to production throughputs

Footnotes

  1. This script is run once every day (as a scron job).