essential-data/nlp-sk-interesting-links

nlp-sk-interesting-links

Interesting links to Slovak NLP tools, utils, corpora and resources. Feel free to add more interesting links via pull request.

Corpora

Corpus name Date Text Sents Tokens PoS Deps NER Sentiment Errors Parallel Download
Súdne rozhodnutia latest x x
Zákony latest x x
European Parliament Proceedings Parallel Corpus 1996-2011 x x x
SK WIKI latest x x
WEB Crawl w2c 2011 x x
TildeMODEL 2018 x x x x x
DGT (JRC) 2019 x x x x x
JW300 (Jehovah’s Witnesses) 2019 x x x x x
EMEA ? x x x x x
EUbookshop ? x x x x x
OpenSubtitles 2018 x x x x x
ECB ? x x x x x
QCRI Educational Domain Corpus ? x x x x x
GNOME ? x x x x x
JRC-acquis 2019 x x x x x
SkTenTen 2011-12-16 x x x x
Leipzig Corpora Collection - News Crawl 2011-2016 x x x
Leipzig Corpora Collection - Web Crawl 2014-2016 x x x
Korpus.sk - Multiple 2003-2019 x x x x
MULTEXT-East "1984" 2010-05-14 x x x x x
English-Slovak Parallel Corpus - Multiple 2012-05-15 x x x x x
Czech-Slovak Parallel Corpus - Multiple 2012-05-15 x x x x x
Slovak Web Discussion Corpus 2014 x x x x x x x
Deltacorpus 1.1 2016-06-20 x x x x x
R-mak 2017 x x x M
Slovak UD treebank 2020 x x x M M M x
Slovak Dependency treebank 2016-11-07 x x x M M M x
Slovak Categorized News Corpus 2016 x x x x x x
Sentigrade sentiment dataset 2017 x x M x
BSNLP 2017 2017 x x x
Korpus slovenských právnych predpisov 2021 x x x x x x x
CHIBY 2019 x x x x x

Positive/Negative words

Graph

Wordnet

Speech

Diacritics restoration

Annotated Translation ranking/errors

Stopwords

Tools and implementations

Sentiment analysis

PoS Tagging

Tokenizer and segmentation

Text reconstruction, spelling, text quality

Lemmatizer / Stemmer

Wordlists / Dictionaries

Dependency parsing

Synonyms

Vectors and analogies

NER

Linked / Open Data

Topic extraction

Negation detection

Annotation

Models

ELMo

Word2Vec

Bert

ULMFiT

Lemma

PoS

Dependency parsing

UDPipe

Hugging Face

Other

Publications
