@paracrawl @bitextor @macocu

  1. Bicleaner is a parallel corpus classifier/cleaner that aims to detect noisy sentence pairs in a parallel corpus.

    Python 123 19

  2. Tool to fix bitexts and tag near-duplicates for removal

    Python 18 2

  3. Corset is a web-based data selection portal that helps you get relevant data from massive amounts of parallel data.

    Python 13 3

  4. Tool for manual evaluation of parallel sentences.

    PHP 11 4

  5. segment Public

    Forked from loomchild/segment

    Program used to split text into segments

    Java 2 1

  6. fastspell Public

    Targeted language identifier, based on FastText and Hunspell.

    Python 8 1

321 contributions in the last year

Activity overview
Contributed to bitextor/bicleaner-hardrules, mbanon/benchmarks, bitextor/bifixer and 8 other repositories

Contribution activity

December 2022

Created 1 commit in 1 repository