Bilingual Word Embeddings with Bucketed CNN for Parallel Sentence Extraction

A TensorFlow implementation of our paper "Bilingual Word Embeddings with Bucketed CNN for Parallel Sentence Extraction", published at ACL 2017 (Student track).

Two sentences are said to be aligned, or semantically similar, if they convey the same meaning in both languages. Our code uses bilingual word embeddings to capture the semantic relatedness of two words across languages. A similarity matrix is constructed between the words of the two sentences and then dynamically pooled to a fixed dimension 'dim' for classification. Because a single fixed-size representation does not work well for sentence pairs of all lengths, we split the data into buckets according to sentence-pair size and train a separate CNN on each bucket.
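The pipeline described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the code in src/. It assumes cosine similarity as the word-level relatedness score. It treats dynamic pooling as max pooling over a dim x dim grid of roughly equal regions, with short sides repeated first. It assigns each pair to a bucket by the length of its longer sentence. The helper names (cosine_similarity_matrix, dynamic_pool, bucket_id) and the bucket boundaries (10, 20, 30) are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(src_emb, tgt_emb, eps=1e-8):
    """Return an (m, n) matrix of cosine similarities between m source and n target word vectors."""
    src = src_emb / (np.linalg.norm(src_emb, axis=1, keepdims=True) + eps)
    tgt = tgt_emb / (np.linalg.norm(tgt_emb, axis=1, keepdims=True) + eps)
    return src.dot(tgt.T)


def dynamic_pool(sim, dim):
    """Max-pool a variable-size (m, n) similarity matrix down to a fixed (dim, dim) grid.

    Assumption (hypothetical choice): if a side is shorter than dim, its rows or
    columns are repeated first so that every pooling region is non-empty.
    """
    m, n = sim.shape
    if m < dim:
        sim = np.repeat(sim, int(np.ceil(dim / float(m))), axis=0)
    if n < dim:
        sim = np.repeat(sim, int(np.ceil(dim / float(n))), axis=1)
    # np.array_split gives dim contiguous index groups whose sizes differ by at most one.
    rows = np.array_split(np.arange(sim.shape[0]), dim)
    cols = np.array_split(np.arange(sim.shape[1]), dim)
    pooled = np.empty((dim, dim), dtype=sim.dtype)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            # Keep the strongest word-pair similarity inside each region.
            pooled[i, j] = sim[np.ix_(r, c)].max()
    return pooled


def bucket_id(src_len, tgt_len, boundaries=(10, 20, 30)):
    """Return the first bucket whose (hypothetical) upper bound fits the longer sentence."""
    longest = max(src_len, tgt_len)
    for idx, bound in enumerate(boundaries):
        if longest <= bound:
            return idx
    return len(boundaries)  # overflow bucket for the longest pairs


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    src = rng.randn(7, 50)   # 7 source words with 50-d embeddings (random stand-ins)
    tgt = rng.randn(12, 50)  # 12 target words
    features = dynamic_pool(cosine_similarity_matrix(src, tgt), dim=5)
    print(features.shape, bucket_id(7, 12))
```

Each bucket's pooled dim x dim matrices would then feed that bucket's own CNN. Max pooling keeps the strongest cross-lingual word matches in each region regardless of sentence length, which is why a fixed grid can still summarize pairs of different sizes.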

Pre-requisites

Python 2.7
TensorFlow
NumPy
scikit-learn
Matplotlib

Usage

To replicate the results from our paper, use the testing command below.

$ python main.py

The results are appended to the end of the corresponding files in the results folder. To retrain the model, uncomment the lines indicated in main.py.

Attribution / Thanks

Bilingual Word Representations with Monolingual Quality in Mind
BUCC 2017 dataset

License

MIT

groverjeenu/Bilingual-Word-Embeddings-with-Bucketed-CNN-for-Parallel-Sentence-Extraction
