Bicleaner is a parallel corpus classifier/cleaner that aims to detect noisy sentence pairs in a parallel corpus.
Tool to fix bitexts and tag near-duplicates for removal
Corset is a web-based data selection portal that helps you get relevant data from massive amounts of parallel data.
Tool for manual evaluation of parallel sentences.
Forked from loomchild/segment
Program used to split text into segments
Targeted language identifier, based on FastText and Hunspell.