About

Yalign is a tool for extracting parallel sentences from comparable corpora.

Statistical Machine Translation relies on parallel corpora (eg.. europarl) for training translation models. However these corpora are limited and take time to create. Yalign is designed to automate this process by finding sentences that are close translation matches from comparable corpora. This opens up avenues for harvesting parallel corpora from sources like translated documents and the web.

Installation

Yalign requires that you install scikit-learn.

After that you can install Yalign from PyPi via pip:

sudo pip install yalign

Usage

Firstly we need to download and unpack the english to spanish model.

wget https://raw.githubusercontent.com/machinalis/yalign/develop/data/models/0.1/en-es.tar.gz
tar -xvzf en-es.tar.gz

Now we can use the yalign-align script along with the english to spanish model to align two web pages.

yalign-align en-es http://en.wikipedia.org/wiki/Antiparticle http://es.wikipedia.org/wiki/Antipart%C3%ADcula

Yalign is not limited to any one language pair. By creating your own models you can align any two languages. For more details on how to use yalign and on yalign's implementation please read the docs.

The Yalign Team:

Yalign is a Machinalis project. You can view our other open source contributions here.

Andrew Vine
Gonzalo García Berrotarán
Rafael Carrascosa
Elías Andrawos
Laura Alonso Alemany

Name		Name	Last commit message	Last commit date
Latest commit History 238 Commits
data		data
docs		docs
scripts		scripts
tests		tests
yalign		yalign
.gitignore		.gitignore
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.rst		README.rst
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

docs

docs

scripts

scripts

tests

tests

yalign

yalign

.gitignore

.gitignore

LICENSE

LICENSE

MANIFEST.in

MANIFEST.in

README.rst

README.rst

requirements.txt

requirements.txt

setup.py

setup.py

Repository files navigation

About

Installation

Usage

About

Releases

Packages

Contributors 4

Languages

License

machinalis/yalign

Folders and files

Latest commit

History

Repository files navigation

About

Installation

Usage

About

Resources

License

Stars

Watchers

Forks

Languages