/
semantic_search_with_bert_tutorial.txt
45 lines (34 loc) · 1.43 KB
/
semantic_search_with_bert_tutorial.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
This tutorial is for semantic search with BERT pre trained trasformer model. it provides
some tensor similarity functions to perform experiment.
Setup
=====
You need to have Spacy and PyTorch installed. You can refer to spacy document for installing transformer
https://spacy.io/universe/project/spacy-transformers
I installed from this URL directly. It's a big download, about 400 MB
https://github.com/explosion/spacy-models/releases/download/en_trf_bertbaseuncased_lg-2.3.0/en_trf_bertbaseuncased_lg-2.3.0.tar.gz
Executing search
================
./ssearch <sim_algo> [doc_dir_path]
where
sim_algo is the similarity algorithm to be used. Choices are as follows
ds: doc avrage similarity
tsma: token max similarity
tsavm: token average of max similarity
tsmav: token max of average similarity
tsa: token average similarity
tsme: token median similarity
ssa: sentence average similarity
ssme: sentence median similarity
ssma: sentence max similarity
doc_dir_path is an optional argument. If provided all files in the directory are used,
otherwise har coded text for documents are used
If enters a console loop with the following choices. Use up and down arrow keys to make choices
1. enter query
2. enter query file path
3. enter matching technique
4. find match
5. quit
Corpus
======
Only a smal corpus has been used. All text is hard coded in the python script. You have the option
of specifying a directory path for files to be used.