To run the code, use Python 2.7.
Two separate methods are implemented on the dataset quora_duplicate_questions.tsv:
- 2_tf_idf_vec.py : Implements the TF-IDF method
- 3_word2vec_train.py : Implements the Siamese Neural Network Method
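As a rough illustration of the TF-IDF idea, the sketch below scores a pair of questions by the cosine similarity of their TF-IDF vectors. It is not taken from 2_tf_idf_vec.py: the function names, the regex tokenizer, and the smoothed IDF formula are all assumptions chosen to keep the example standard-library only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs (the real script may differ).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(documents):
    # Count in how many documents each term appears.
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    # Smoothed inverse document frequency (an assumed variant).
    return dict(
        (term, math.log((1.0 + n_docs) / (1.0 + df)) + 1.0)
        for term, df in doc_freq.items()
    )


def tfidf_vector(text, idf):
    # Sparse vector as a dict: term -> term count * idf weight.
    counts = Counter(tokenize(text))
    return dict((t, c * idf[t]) for t, c in counts.items() if t in idf)


def cosine_similarity(a, b):
    # Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus for illustration only; not drawn from the dataset.
corpus = [
    "How do I learn Python?",
    "What is the best way to learn Python?",
    "How do I cook rice?",
]
idf = build_idf(corpus)
vectors = [tfidf_vector(q, idf) for q in corpus]
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[1], vectors[2])
```

In this toy corpus the first two questions share terms and get a positive score, while the last two share none and score 0.0. A duplicate classifier would then compare such scores against a threshold or feed them to a model; which of these the script does is not stated here.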
The results of running each method are listed in the corresponding LOGFILE.
To run the code, you'll need the Quora duplicate questions dataset and the word2vec binfile. Download them from here:
http://qim.ec.quoracdn.net/quora_duplicate_questions.tsv
https://drive.google.com/drive/folders/0B57P1TaPZK85ZWs0QXdLejBrOUU?usp=sharing
Place quora_duplicate_questions.tsv inside the final code folder, place the complete data folder (with the binfile inside it) in the final code folder as well, then run the program.
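To confirm the dataset is in place before running either script, a minimal loader such as the one below can be used. It is a sketch, not code from this repository: the function name load_question_pairs is hypothetical, and it assumes the TSV has a header row with the columns question1, question2 and is_duplicate.

```python
import csv
import sys


def load_question_pairs(path):
    # Hypothetical helper: returns a list of (question1, question2, label) tuples.
    # The csv module wants a binary file on Python 2 and a text file on Python 3.
    if sys.version_info[0] >= 3:
        handle = open(path, newline="", encoding="utf-8")
    else:
        handle = open(path, "rb")
    pairs = []
    with handle:
        # Assumes a tab-separated file whose first row names the columns.
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            pairs.append(
                (row["question1"], row["question2"], int(row["is_duplicate"]))
            )
    return pairs
```

Calling load_question_pairs("quora_duplicate_questions.tsv") from the final code folder should return one tuple per question pair. An IOError (or FileNotFoundError on Python 3) suggests the file is not where the scripts expect it.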
Dependencies: Keras, Word2Vec, spaCy, Gensim