@macocu

MaCoCu

MaCoCu focuses on collecting monolingual and parallel data from the Internet, especially for under-resourced languages and DSI-specific data.

Popular repositories

  1. LanguageModels (Public)

    Tools for training LMs

    Python 5 1

  2. MaCoCu-crawler (Public)

    Python 5

  3. prevert (Public)

    Iterator for the prevert format

    Python 2

  4. American-British-variety-classifier (Public)

    Jupyter Notebook 1

  5. BCMS-variant-classifier (Public)

    A classification tool for discriminating between the Bosnian, Croatian, Montenegrin, and Serbian language varieties

    1

  6. Manual-Checking-Web-Corpora-Guidelines (Public)

    Forked from TajaKuzman/GINCO-Genre-Annotation-Guidelines

    The Guidelines for Manual Checking of Web Corpora

    JavaScript

Repositories

Showing 10 of 10 repositories
