Semantic Textual Similarity (STS):

Dataset:

Semantic Textual Similarity 2012-2017 Dataset
The benchmark comprises 8628 sentence pairs. Each sentence pair is accompanied by a similarity score from 0 to 5, where 0 is the least similar and 5 is the most similar.
The train-dev-test split is as follows:

	train	dev	test	total
news	3299	500	500	4299
caption	2000	625	625	3250
forum	450	375	254	1079
total	5749	1500	1379	8628
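
The scores lie on a 0-5 scale, whereas the exponential of a negative distance used in the methodology lies in (0, 1]. A minimal sketch of one way to bring the two onto the same scale is shown here; the function names and the simple division by 5 are assumptions for illustration, not necessarily what utils.py does.

```python
def normalize_score(score, max_score=5.0):
    """Map a gold similarity score in [0, max_score] to [0, 1]."""
    if not 0.0 <= score <= max_score:
        raise ValueError(f"score {score} is outside [0, {max_score}]")
    return score / max_score


def denormalize_score(value, max_score=5.0):
    """Map a value in [0, 1] back to the original 0-5 scale."""
    return value * max_score


if __name__ == "__main__":
    # Round trip for a mid-range gold score.
    scaled = normalize_score(2.5)
    print(scaled, denormalize_score(scaled))
```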

Methodology:

I have designed a system similar to the one proposed in "Siamese Recurrent Architectures for Learning Sentence Similarity".
Instead of a plain LSTM, I have used a BiLSTM followed by a dense layer.
The loss is computed from the exponential of the negative Manhattan distance between the two sentence representations generated by the Siamese network.
The results could be further improved by adding the non-parametric log-linear classifier as a post-processing step, as described in the base paper.
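
The similarity term described above can be sketched in NumPy as follows. This is an illustration only: the function names are hypothetical, the sentence representations are stand-in arrays rather than BiLSTM outputs, and pairing the similarity with a mean squared error against targets scaled to [0, 1] is an assumption; the loss in siamese_model.py may differ.

```python
import numpy as np


def manhattan_similarity(h_left, h_right):
    """Return exp(-||h_left - h_right||_1) along the last axis.

    The result lies in (0, 1]; identical representations give 1.
    """
    h_left = np.asarray(h_left, dtype=float)
    h_right = np.asarray(h_right, dtype=float)
    l1_distance = np.abs(h_left - h_right).sum(axis=-1)
    return np.exp(-l1_distance)


def mse_loss(predicted, target):
    """Mean squared error between predicted and target similarities (assumed loss)."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((predicted - target) ** 2))


if __name__ == "__main__":
    # Two stand-in sentence representations per side (batch of 2, dimension 3).
    left = np.array([[0.1, 0.2, 0.3], [1.0, 0.0, -1.0]])
    right = np.array([[0.1, 0.2, 0.3], [0.0, 0.5, 0.0]])
    similarity = manhattan_similarity(left, right)
    print(similarity)
    print(mse_loss(similarity, [1.0, 0.2]))
```

Because the similarity is bounded by 1, gold scores on the 0-5 scale would need to be rescaled before being compared with it.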

Files:

utils.py contains all the helper functions.
siamese_model.py contains the model architecture.
main.ipynb contains the implementation.

atharvajk98/Semantic-Textual-Similarity
