
Stampy!

Stampy NLP provides semantic search and other NLP microservices for aisafety.info and stampy.ai, a database of questions and answers about AGI safety. Contributions will be welcome (once I get the messy things cleaned up), and the code is released under the MIT License.

The demo is hosted at nlp.stampy.ai, with a direct link at stampy-nlp-t6p37v2uia-uw.a.run.app. If you're interested in learning more about Natural Language Processing (NLP) and Transformers, the HuggingFace course provides an excellent introduction.

Main Services Overview

[Diagram: Stampy NLP Overview]

Our NLP services offer 4 features which depend on 2 key components:

  1. Three NLP models from HuggingFace's SentenceTransformers library, which provides pre-trained models optimized for different types of semantic search. Each generates sentence embeddings -- 768-dimension vectors, numerical representations that capture the meaning of the text. Think of an embedding as a 768-element array of floats. In general, we use Python + PyTorch since that gives us the most flexibility to use a variety of models by default.
  • Retriever model (multi-qa-mpnet) for identifying paraphrased questions.
  • allenai-specter for searching titles & abstracts of scientific publications.
  • Reader model (electra-base-squad2) finds the start & end index of the answer, given a question and a context paragraph containing the answer.
  2. Pinecone, a fully managed, high-performance database for vector search applications. Each data element contains the 768-dimension vector, a unique id (i.e. the Coda id for our FAQ) and some metadata (original text, url, other relevant information). The sketch after this list shows how the two components fit together.
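As a rough illustration, the minimal sketch below encodes a query with the retriever model and searches a Pinecone namespace. It assumes the older pinecone-client API (pinecone.init); the index name and printed fields are hypothetical, while the namespace and environment come from this README.

# Sketch: encode a query and search Pinecone for the nearest FAQ titles.
# The index name "stampy-nlp" is hypothetical.
import pinecone
from sentence_transformers import SentenceTransformer

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
index = pinecone.Index("stampy-nlp")

model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")
embedding = model.encode("What is AI safety?").tolist()  # 768 floats

# Nearest entries in the faq-titles namespace, scored by similarity
results = index.query(vector=embedding, top_k=10,
                      namespace="faq-titles", include_metadata=True)
for match in results["matches"]:
    print(match["score"], match["metadata"])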

Semantic Search

Encodes a given query string, sends the vector embedding to search Pinecone for the nearest entries in the faq-titles namespace, then returns the payload as JSON, sorted by a score between 0 and 1 indicating the similarity of the match.

Sample API usage:

https://nlp.stampy.ai/api/search?query=What+AI+safety%3F
  • query (required) is the sentence or phrase to be encoded and matched against the nearest entries.
  • top (optional) indicates the number of entries returned. If the value is not specified, the default is to return the 10 nearest entries.
  • showLive=0 (optional) returns only entries whose status is NOT "Live on site". The default is showLive=1, which returns only entries whose status is "Live on site".
  • status=all (optional) returns all entries, including those that have not yet been canonically answered. Specify status multiple times to match more than one value.
  • getContent=true (optional) returns the content of answers along with each entry. The default is getContent=false, which returns only the question titles without answers.

Sample usages:

showLive=1 returns entries where status == "Live on site"

https://stampy-nlp-t6p37v2uia-uw.a.run.app/api/search?query=What+AI+safety%3F&showLive=1

showLive=0 returns entries where status != "Live on site"

https://stampy-nlp-t6p37v2uia-uw.a.run.app/api/search?query=What+AI+safety%3F&showLive=0

status=all returns all questions regardless of status

https://stampy-nlp-t6p37v2uia-uw.a.run.app/api/search?query=Something+random%3F&status=all

status=value returns entries with status matching whatever value is specified. Multiple values may be listed separately; the example below returns entries with status == "Not started" as well as entries with status == "In progress".

https://stampy-nlp-t6p37v2uia-uw.a.run.app/api/search?query=Something+random%3F&status=Not%20started&status=In%20progress

getContent=true returns the content of answers along with each entry.

https://nlp.stampy.ai/api/search?query=Something+random%3F&getContent=true
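The same queries can be made programmatically with any HTTP client. A minimal sketch using Python's requests library is below; the field names in the JSON response ("score", "title") are assumptions for illustration.

# Sketch: call the semantic search API and print the top matches.
# Field names in the JSON payload are assumptions for illustration.
import requests

resp = requests.get(
    "https://nlp.stampy.ai/api/search",
    params={"query": "What is AI safety?", "top": 5},
)
resp.raise_for_status()
for entry in resp.json():
    print(f'{entry["score"]:.3f}  {entry["title"]}')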

Duplicates

Displays a table with the top pairs of most similar questions in Coda, based on the last time paraphrase_mining was called.
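A paraphrase mining step like SentenceTransformers' util.paraphrase_mining could produce such pairs; a minimal sketch with hypothetical question titles standing in for the Coda data:

# Sketch: mine the most similar pairs among question titles using
# sentence_transformers.util.paraphrase_mining.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")
titles = [  # hypothetical question titles
    "What is AI safety?",
    "Why does AI safety matter?",
    "What is AGI?",
]
# Returns [score, i, j] triples sorted by decreasing similarity score
pairs = util.paraphrase_mining(model, titles)
for score, i, j in pairs:
    print(f"{score:.3f}  {titles[i]!r} <-> {titles[j]!r}")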

Literature Abstracts

Encodes a given query string, sends the vector embedding to search Pinecone for the nearest entries in the paper-abstracts namespace, then returns the payload as JSON, sorted by a score between 0 and 1 indicating the similarity of the match. In an effort to minimize the number of huge models in our app container, this service still uses the external HuggingFace API, so it's still a bit slow.

Sample API usage:

https://nlp.stampy.ai/api/literature?query=What+AI+safety%3F

Extract QA

Encodes a given query string, then sends the vector embedding to search Pinecone for the 10 nearest entries in the extracted-chunks namespace. For each entry, a HuggingFace pipeline task extracts the answer from the content; the payload is returned as JSON, sorted by a score between 0 and 1 indicating the confidence that the answer matches the query question. Since this runs 10+ inferences, it can be rather slow.

Sample API usage:

https://nlp.stampy.ai/api/extract?query=What+AI+safety%3F
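The per-entry extraction is a HuggingFace question-answering pipeline with the reader model. A minimal sketch, assuming the deepset/electra-base-squad2 checkpoint and a hypothetical context paragraph:

# Sketch: extract an answer span from a context chunk with the reader model.
# The checkpoint id and context text are assumptions for illustration.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/electra-base-squad2")
result = qa(
    question="What is AI safety?",
    context=("AI safety is a research field that aims to ensure advanced "
             "AI systems behave as intended and do not cause harm."),
)
# Returns the answer text, its start/end indices in the context,
# and a confidence score between 0 and 1.
print(result["answer"], result["start"], result["end"], result["score"])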

Setup Environment

Run the setup script

./setup.sh

If this is your first run, it will:

  • Download the appropriate models from Huggingface
  • Write the appropriate API keys/tokens to .env
  • Create a virtualenv
  • Install all requirements

Subsequent runs will skip steps that have already been done, though only by checking whether the appropriate files exist. API tokens for Coda, Pinecone and OpenAI are required; the script will ask you for them.

Coda

The Stampy Coda table is https://coda.io/d/_dfau7sl2hmG

Pinecone

When creating a Pinecone project, make sure that the environment is set to us-west1-gcp

Duplicates generation

There is an /api/encode-faq-titles endpoint that will generate a duplicates file and save it to Cloud Storage. To prevent misuse, the endpoint is password protected. The password is provided via the AUTH_PASSWORD env variable, which is used only for that endpoint; if it is not set, the endpoint will simply return 401s.

Remote models

The models used are hosted separately and are provided via the following env variables:

QA_MODEL_URL=https://qa-model-t6p37v2uia-uw.a.run.app
RETRIEVER_MODEL_URL=https://retriever-model-t6p37v2uia-uw.a.run.app
LIT_SEARCH_MODEL_URL=https://lit-search-model-t6p37v2uia-uw.a.run.app

To help with local development you can set up the above model servers via docker-compose:

docker-compose up

This should work, but slowly. If you want faster results, consider either manually running the model that you're using (check the model_server folder for details), or provide a cloud server with the model.

Local models

Sentence transformer models can be run locally by providing the path to them, e.g.:

RETRIEVER_MODEL_URL=multi-qa-mpnet-base-cos-v1
LIT_SEARCH_MODEL_URL=allenai-specter

For this to work, the local-model dependencies must first be installed via pip install -e '.[local_model]'.

Deployment

Install Google Cloud SDK

Linux

echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
sudo apt-get update && sudo apt-get install google-cloud-cli
gcloud init
gcloud auth login --no-launch-browser

MacOS

brew install --cask google-cloud-sdk
gcloud init

Setup Docker

  1. Install Docker
  2. Authenticate Docker to Google Cloud: gcloud auth configure-docker us-west1-docker.pkg.dev

One thing worth remembering here is that Google Cloud Run containers expect a Linux x64 image. The deployment scripts should generate appropriate images, but keep this in mind if your deployments refuse to work and you're not on a Linux x64 system.

Deploy to Google Cloud Run

./deploy.sh <service name>

If no service name is provided, the script will deploy to stampy-nlp. Before actually doing anything, the script will ask you to confirm that everything is correct.