This is a toolkit for calculating semantic similarity with an LSTM or BERT, used in the implementation of TensorGCN from the paper:
Liu X, You X, Zhang X, et al. Tensor graph convolutional networks for text classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2020, 34(05): 8409-8416.
nltk
pytorch
transformers >= 4.11.3
We strongly recommend using the BERT version; it makes your life a little easier!
# DO NOT tokenize your sentences! Pass raw strings: [[sentence0], [sentence1], [sentence2], ...]
total_set, valid_set = get_bert_embedding(sentences)
import pickle

with open('bert_semantic.pkl', 'wb') as f:
    pickle.dump(valid_set, f)
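Once the embeddings are pickled, a pairwise score between any two sentence vectors can be computed with plain cosine similarity. This is a minimal sketch, not part of the toolkit itself; the exact structure `get_bert_embedding` returns is not documented here, so with real data you would pass two rows of `valid_set` instead of the toy vectors below.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy check on 2-d vectors; identical directions give similarity 1.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
```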
You will need to train your LSTM model first with train_model(train_data, train_label, test_data, test_label), so split your data into two parts beforehand!
# DO NOT tokenize your sentences! Pass raw strings: [[sentence0], [sentence1], [sentence2], ...]
train_model(train_data, train_label, test_data, test_label)
get_similarity(train_data, train_label, test_data, test_label)
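Since train_model expects separate train and test splits, one simple way to produce them is a shuffled split. This is a sketch only: the helper name `split_data`, the 90/10 ratio, and the fixed seed are our own choices, not part of the toolkit.

```python
import random

def split_data(sentences, labels, test_ratio=0.1, seed=42):
    # Shuffle paired (sentence, label) examples, then cut into train/test.
    paired = list(zip(sentences, labels))
    random.Random(seed).shuffle(paired)
    cut = int(len(paired) * (1 - test_ratio))
    train, test = paired[:cut], paired[cut:]
    train_data, train_label = (list(col) for col in zip(*train))
    test_data, test_label = (list(col) for col in zip(*test))
    return train_data, train_label, test_data, test_label

# Hypothetical usage with the toolkit's functions:
# train_data, train_label, test_data, test_label = split_data(sentences, labels)
# train_model(train_data, train_label, test_data, test_label)
# get_similarity(train_data, train_label, test_data, test_label)
```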