A simple Flask app that wraps the alignment algorithm from this paper and this code. It uses Flask as the web framework, gunicorn as the production server, and d3.js for visualization. See the online version here.
If you are interested in word-aligned corpora, check out our demo that lets you browse more than 1000 languages: ParCourE.
Install requirements and SimAlign.
If CIS=False in app/utils.py, it is easier to test locally (no BERT models are loaded into memory and SimAlign is not required). For deployment, set CIS=True.
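As a rough illustration of how such a flag can keep local testing lightweight, here is a minimal sketch. It is not the actual content of app/utils.py: the names DummyAligner and load_aligner are hypothetical, and the dummy simply links tokens at the same position. Only the SentenceAligner constructor and get_word_aligns come from the SimAlign library.

```python
# Hypothetical sketch; the real app/utils.py may be organised differently.
CIS = False  # False: lightweight local testing, True: load the real models


class DummyAligner:
    """Stand-in aligner that links tokens at equal positions (illustration only)."""

    def get_word_aligns(self, src_tokens, trg_tokens):
        n = min(len(src_tokens), len(trg_tokens))
        return {"dummy": [(i, i) for i in range(n)]}


def load_aligner(use_models=CIS):
    """Return a real SimAlign aligner only when use_models is True."""
    if not use_models:
        return DummyAligner()
    # Imported lazily so that SimAlign is not needed for local testing.
    from simalign import SentenceAligner
    return SentenceAligner(model="bert", token_type="bpe", matching_methods="mai")


aligner = load_aligner()
alignments = aligner.get_word_aligns(
    ["This", "is", "a", "test"], ["Das", "ist", "ein", "Test"]
)
print(alignments)
```

Because the import of simalign happens inside the function, the module can be loaded on a machine where SimAlign and the BERT weights are not installed.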
Create local secrets (do not commit real secrets to a repository) like this:
export FLASK_SECRET_KEY="neverguessing"
export CAPTCHA_SITE_KEY='createonline'
export CAPTCHA_SECRET_KEY='createonline'
You need to create the captcha keys online, or you can set them to something meaningless (the captcha will then not work, which is fine for testing the app).
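On the Python side, these variables can be read from the environment at startup. The following is only a sketch of one way to do that: the function name load_secrets and the placeholder fallback are assumptions, not code taken from this app.

```python
import os


def load_secrets(env=os.environ):
    """Collect the app secrets from a mapping of environment variables."""
    secret_key = env.get("FLASK_SECRET_KEY")
    if not secret_key:
        # Fail early: a missing session key should never go unnoticed.
        raise RuntimeError("FLASK_SECRET_KEY is not set")
    return {
        "SECRET_KEY": secret_key,
        # Meaningless fallbacks: the captcha will not work, which is fine for testing.
        "CAPTCHA_SITE_KEY": env.get("CAPTCHA_SITE_KEY", "placeholder"),
        "CAPTCHA_SECRET_KEY": env.get("CAPTCHA_SECRET_KEY", "placeholder"),
    }


# Example with an explicit mapping instead of the real environment.
settings = load_secrets({"FLASK_SECRET_KEY": "neverguessing"})
print(sorted(settings))
```

The returned dictionary could then be passed to app.config.update(...) in a Flask app.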
Then set
export FLASK_APP=align.py
and run
flask run
For the actual deployment run:
gunicorn --config gunicorn_config.py demo:app
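The command above expects a gunicorn_config.py file. Its actual contents are not shown here, so the following is only a guess at what a minimal configuration might look like. The setting names bind, workers, and timeout are standard gunicorn settings, while the values are assumptions chosen for illustration.

```python
# gunicorn_config.py (hypothetical contents, values are assumptions)
import multiprocessing

# Address and port the server listens on.
bind = "127.0.0.1:8000"

# Each worker process loads its own copy of the models,
# so the worker count is kept small to limit memory use.
workers = min(2, multiprocessing.cpu_count())

# Seconds before an unresponsive worker is restarted;
# generous because loading models can be slow.
timeout = 120
```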
If you use the code, please cite:
@inproceedings{jalili-sabet-etal-2020-simalign,
title = "{S}im{A}lign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings",
author = {Jalili Sabet, Masoud and
Dufter, Philipp and
Yvon, Fran{\c{c}}ois and
Sch{\"u}tze, Hinrich},
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.findings-emnlp.147",
pages = "1627--1643",
}
Feedback and contributions are more than welcome! Just reach out to @pdufter.
Copyright (C) 2020, Philipp Dufter
A full copy of the license can be found in LICENSE.
- add embedding similarities to edges
- update simalign library