
GoldenRetriever - Information retrieval using fine-tuned semantic similarity

GoldenRetriever is part of the HotDoc NLP project, which provides a series of open-source AI tools for natural language processing. HotDoc NLP is part of the AI Makerspace program. Please visit the demo page where you will be able to query a sample knowledge base.

GoldenRetriever is a framework for an information retrieval engine (QnA, knowledge base query, etc.) that works in 4 steps:

  • Step 1: The knowledge base has to be separated into "documents" or clauses. Each clause is an indexed unit of information e.g. a clause, a sentence, or a paragraph.
  • Step 2: The clauses (and the query) are encoded with the same encoder (InferSent, Google USE [1], or Google USE-QA [2]).
  • Step 3: A similarity score is calculated (cosine distance, arccos distance, dot product, or nearest neighbors).
  • Step 4: Clauses with the highest score (or nearest neighbors) are returned as the retrieved documents.

GoldenRetriever currently optimizes this framework for retrieving clauses from a contract or a set of terms and conditions, given a natural language query.
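The four steps above can be sketched in a few lines. This is a minimal illustration using scikit-learn's TfidfVectorizer as a stand-in encoder (the framework itself uses InferSent or Google USE; the clauses and query here are invented for the example):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Step 1: the knowledge base separated into clauses (indexed units).
clauses = [
    "The policy covers accidental damage to the vehicle.",
    "Premiums are payable monthly in advance.",
    "Claims must be filed within 30 days of the incident.",
]

# Step 2: encode clauses and query with the same encoder
# (TF-IDF here as a stand-in for InferSent / Google USE).
encoder = TfidfVectorizer().fit(clauses)
clause_vecs = encoder.transform(clauses).toarray()
query_vec = encoder.transform(["When must claims be filed?"]).toarray()

# Step 3: cosine similarity between the query and every clause.
norms = np.linalg.norm(clause_vecs, axis=1) * np.linalg.norm(query_vec)
scores = (clause_vecs @ query_vec.T).ravel() / norms

# Step 4: return the top-k highest-scoring clauses.
k = 1
top_k = np.argsort(scores)[::-1][:k]
print([clauses[i] for i in top_k])
```

Swapping the TF-IDF stand-in for a sentence encoder changes only Step 2; the scoring and retrieval steps stay the same.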

There is potential for fine-tuning following Yang et al.'s (2018) paper on learning textual similarity from conversations.

A fully connected layer is inserted after the clauses are encoded to maximize the dot product between the transformed clauses and the encoded query.
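As a rough sketch of that transformation: the fully connected layer is a learned matrix applied to the encoded clause before the dot-product score (the dimension and near-identity initialization below are illustrative assumptions, not the repository's actual values):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # assumed encoder output dimension (e.g. Google USE)

# Trainable fully connected layer, initialized near the identity so
# fine-tuning starts from the untransformed encoder space.
W = rng.standard_normal((d, d)) * 0.01 + np.eye(d)

clause_vec = rng.standard_normal(d)  # encoded clause
query_vec = rng.standard_normal(d)   # encoded query

# The score that training maximizes for correct query-clause pairs:
score = query_vec @ (W @ clause_vec)
```

During fine-tuning only W is updated; the underlying encoder weights stay frozen.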

In the transfer learning use-case, the Google-USEQA model is further fine-tuned using a triplet-cosine-loss function. This helps to push correct question-knowledge pairs closer together while maintaining a marginal angle between question-wrong-knowledge pairs. This method can be used to overfit towards any fixed FAQ dataset without losing the semantic similarity capabilities of the sentence encoder.
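A triplet cosine loss of this kind can be sketched as follows (the margin value and the toy 2-d vectors are illustrative, not taken from the repository):

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_cosine_loss(query, pos_clause, neg_clause, margin=0.3):
    # Pull the correct question-knowledge pair together while keeping
    # a margin between the question and the wrong clause.
    return max(0.0, margin - cos(query, pos_clause) + cos(query, neg_clause))

q = np.array([1.0, 0.0])
pos = np.array([1.0, 0.1])  # nearly aligned with the query: low loss
neg = np.array([0.0, 1.0])  # orthogonal to the query

print(triplet_cosine_loss(q, pos, neg))  # → 0.0 (pair already separated)
```

When the positive and negative clauses are swapped, the loss becomes positive, which is what drives the encoder to reorder them during fine-tuning.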


This model is implemented as a Flask app.

Run the app with Python to launch a web interface from which you can query some pre-set documents.

To run the Flask app using Docker:

  1. Clone this repository.
  2. Build the container image: docker build -t goldenretriever .
  3. Run the container: docker run -p 5000:5000 goldenretriever
  4. Access the web interface on your browser by navigating to http://localhost:5000.


For comparison, we apply 3 sentence encoding models to the InsuranceQA corpus. Each test case consists of a question and 100 candidate answers, of which one or more are correct.

The model evaluation metric is accuracy@k, where k is the number of clauses the model returns for a given query. A score of 1 indicates that the k returned clauses contain a correct answer to the query; a score of 0 indicates that none of them do.
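The metric can be computed per test case as a simple membership check over the top-k ranked candidates (the candidate ids below are invented for illustration):

```python
def accuracy_at_k(ranked, correct, k):
    """1 if any of the top-k ranked candidate ids is a correct answer, else 0."""
    return int(any(r in correct for r in ranked[:k]))

# One InsuranceQA-style test case: candidate answer ids in ranked order,
# and the set of correct answer ids (values are illustrative).
ranked = [17, 4, 42, 8, 99]
correct = {42}

print([accuracy_at_k(ranked, correct, k) for k in range(1, 6)])  # → [0, 0, 1, 1, 1]
```

Averaging this 0/1 score over all test cases gives the acc@k figures reported in the table below.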

Model            acc@1   acc@2   acc@3   acc@4   acc@5
InferSent        0.083   0.134   0.1814  0.226   0.268
Google USE       0.251   0.346   0.427   0.481   0.534
Google USE-QA    0.387   0.519   0.590   0.648   0.698
TFIDF baseline   0.2457  0.3492  0.4127  0.4611  0.4989


  • [1] Google Universal Sentence Encoder
  • [2] Google Universal Sentence Encoder for Question-Answer Retrieval