This project aims to build a neural network model to predict whether two sentences are paraphrases.

luciegaba/paraphrase-identification

Paraphrase identification

About

This project aims to build a neural network model that predicts whether two questions are paraphrases, using deep learning.

Contents

This project contains:

  • A folder for notebooks ("Project" contains our model results)
  • A reporting folder containing the project report
  • A "scripts" folder with a data_utils module containing the ETL and embedding pipelines, plus two scripts for the two tested models (SiameseLSTM and BERTFineTuner)
  • Env files (explained below)
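
To illustrate the idea behind a Siamese model such as SiameseLSTM, here is a minimal sketch in plain Python. It is not the code from the scripts folder: the `encode` function is a toy bag-of-words stand-in for an LSTM encoder, and the names `encode`, `siamese_similarity`, `is_paraphrase`, the exponential L1 similarity, and the 0.5 threshold are all assumptions chosen for illustration. The only point it shows is that both sentences pass through the same encoder before their representations are compared.

```python
import math
from collections import Counter


def encode(sentence, vocabulary):
    """Toy shared encoder: word counts over a fixed vocabulary.

    Hypothetical stand-in for an LSTM encoder, not the repository's code.
    """
    counts = Counter(sentence.lower().split())
    return [float(counts[word]) for word in vocabulary]


def siamese_similarity(sentence_a, sentence_b, vocabulary):
    """Encode both sentences with the SAME encoder, then compare them.

    The L1 distance between the two vectors is mapped to (0, 1]
    with exp(-distance), so identical encodings score 1.0.
    """
    vec_a = encode(sentence_a, vocabulary)
    vec_b = encode(sentence_b, vocabulary)
    l1_distance = sum(abs(a - b) for a, b in zip(vec_a, vec_b))
    return math.exp(-l1_distance)


def is_paraphrase(sentence_a, sentence_b, vocabulary, threshold=0.5):
    """Label the pair as a paraphrase when the similarity reaches the threshold."""
    return siamese_similarity(sentence_a, sentence_b, vocabulary) >= threshold


if __name__ == "__main__":
    vocab = ["how", "do", "i", "learn", "python", "cook", "rice"]
    print(is_paraphrase("How do I learn Python", "how do i learn python", vocab))
    print(is_paraphrase("how do i learn python", "how do i cook rice", vocab))
```

In a trained model the encoder weights are learned, so sentences with different wording but the same meaning can still end up close together, which a word-count encoder cannot do.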

Installation

To use this project, run the following commands:

git clone https://github.com/luciegaba/paraphrase-identification
cd paraphrase-identification

If you run the BERT fine-tuning code in Colab, install the dependencies with pip instead:

pip install -r requirements.txt

If you use a conda virtual environment, create and activate it:

conda env create -f environment.yml
conda activate paraphrase-identification

Results

In this project, we mainly focused on developing a model from scratch to challenge ourselves, and built a Siamese LSTM model for this purpose. Its performance was not very good, which we attribute to low-quality data and a possibly poorly calibrated model. We also built a "challenger" model based on Transformers, called "ParaBERT", by fine-tuning BERT. The fine-tuned BERT model can be found here. See our report for more details about the project.
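
For readers unfamiliar with how BERT handles sentence pairs, the sketch below shows the standard input layout: `[CLS] A [SEP] B [SEP]`, with segment ids marking which sentence each token belongs to and an attention mask marking padding. It is a simplified illustration, not ParaBERT's actual preprocessing: the function name `build_pair_input` and the `max_length` default are hypothetical, and tokens are passed in as ready-made lists, whereas a real BERT tokenizer splits text into WordPiece tokens and maps them to ids.

```python
def build_pair_input(tokens_a, tokens_b, max_length=16):
    """Pack two tokenized sentences as [CLS] A [SEP] B [SEP], padded to max_length.

    Hypothetical helper for illustration. Returns (tokens, segment_ids, attention_mask).
    """
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)

    if len(tokens) > max_length:
        # A real pipeline would truncate; this sketch simply refuses.
        raise ValueError("pair is longer than max_length")

    # Real tokens get mask 1, padding gets mask 0.
    attention_mask = [1] * len(tokens)
    padding = max_length - len(tokens)
    tokens = tokens + ["[PAD]"] * padding
    segment_ids = segment_ids + [0] * padding
    attention_mask = attention_mask + [0] * padding
    return tokens, segment_ids, attention_mask


if __name__ == "__main__":
    toks, segs, mask = build_pair_input(
        ["how", "are", "you"], ["how", "do", "you", "do"], max_length=12
    )
    print(toks)
    print(segs)
    print(mask)
```

During fine-tuning, a classification layer on top of the `[CLS]` position is trained to output the paraphrase / not-paraphrase label for the pair.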
