Pre-trained BERT models can be used for more than just question/answer tasks. See
https://www.ironmanjohn.com/home/question-and-answer-for-long-passages-using-bert for more details. They can also be used to determine how similar two sentences are to each other.
In this repo I demonstrate how to find these similarities using a measure known as cosine similarity.
I do some very simple testing using three sentences that I have tokenized manually.
If using a larger corpus, you will definitely want to tokenize the sentences automatically, using something like nltk.tokenize.
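To illustrate what automatic sentence tokenization does, here is a minimal sketch. Note this is a naive regex-based stand-in for demonstration only, not nltk's actual sent_tokenize (which is trained on real text and handles abbreviations and edge cases far better):

```python
import re

def naive_sent_tokenize(text: str) -> list[str]:
    # Naive stand-in for nltk.tokenize.sent_tokenize: split on
    # whitespace that follows sentence-ending punctuation.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

sentences = naive_sent_tokenize(
    "BERT was developed by Google. One drawback is passage length. "
    "I attended a conference in Denver."
)
```

With nltk installed, the equivalent call would be `nltk.tokenize.sent_tokenize(text)` after downloading the `punkt` tokenizer data.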
The first two sentences (0 and 1) come from the same blog entry, while the third (2) comes from a separate blog entry.
The similarity between sentences 0 and 1 should therefore be higher than the similarity of either one with sentence 2.
The sentences are:

0. BERT was developed by Google, and Nvidia has created an optimized version that uses TensorRT
1. One drawback of BERT is that only short passages can be queried
2. I attended a conference in Denver
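The comparison itself boils down to computing the cosine similarity between sentence embedding vectors. Here is a minimal sketch; the three-dimensional vectors are toy stand-ins I made up for illustration (real BERT sentence embeddings would be, e.g., 768-dimensional vectors pooled from the token embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two embedding vectors:
    # 1.0 = identical direction, 0.0 = orthogonal, -1.0 = opposite.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for the three sentence embeddings (hypothetical values).
embeddings = {
    0: np.array([0.9, 0.1, 0.2]),  # sentence 0 (first blog entry)
    1: np.array([0.8, 0.2, 0.1]),  # sentence 1 (same blog entry)
    2: np.array([0.1, 0.9, 0.7]),  # sentence 2 (separate blog entry)
}

sim_01 = cosine_similarity(embeddings[0], embeddings[1])
sim_02 = cosine_similarity(embeddings[0], embeddings[2])
```

With embeddings that reflect the sentences' meanings, `sim_01` should come out higher than `sim_02`, matching the expectation above.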
For a more detailed explanation, see my full blog post at ironmanjohn.com and on Medium at medium.com.