ANNotation Infrastructure using Finna: an automatic subject indexing tool using Finna as corpus
conf                         Apr 21, 2017  support for GACS/AGRIS corpus
corpus                       Apr 6, 2017   sync with scripts: ignore .raw files
web                          May 9, 2018   add info about new version
.gitignore                   Feb 17, 2017  ignore pyc files
LICENSE                      Feb 17, 2017  initial code and license (CC0)
Makefile                     Apr 21, 2017  support for GACS/AGRIS corpus
README.md                    Feb 17, 2017  corpus folder and README updates
autoindex.cgi                May 4, 2017   fix adjustment of boost via cgi
autoindex.py                 Sep 18, 2017  robustness fix (avoid div by zero in some weird special case)
autoindex_sk.py              Sep 18, 2017  remove the 10 article limit that was used for testing
autoindex_txtdir.py          Sep 25, 2017  robustness fix: don't break on Unicode decode errors
autoindex_wikipedia.py       Apr 6, 2017   script to automatically index Finnish Wikipedia articles
conceptboost.py              Oct 11, 2017  robustness quickfix (don't break if concept doesn't exist)
create_corpus_gacs_agris.py  Apr 21, 2017  support for GACS/AGRIS corpus
create_corpus_yso_finna.py   Apr 6, 2017   more robust error handling, hopefully...
extract_words.py             Apr 15, 2017  simplify by limiting projects to a single language (part of #1)
filter_corpus.py             Apr 21, 2017  stop filtering after 1000 lines so we don't waste time on huge input
generate-corpus.sh           Mar 29, 2017  script to run the corpus generation as a batch job
load_corpus.py               Oct 11, 2017  set boost to a float value so changing it by a factor will work later on
ocr.cgi                      Apr 6, 2017   provide a language selector in the mobile UI
optimize.py                  Mar 31, 2017  two-level caching in parameter optimizer
projects.py                  Apr 15, 2017  Add missing module projects (doh). Fixes #1

README.md

annif

ANNotation Infrastructure using Finna: an automatic subject indexing tool using Finna as corpus

Dependencies

  • Elasticsearch installed on localhost

  • Python modules (a virtualenv is recommended; PyPy can be used for extra speed):

    pip install requests
    pip install SPARQLWrapper
    pip install langdetect
    pip install elasticsearch
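
A quick way to confirm that the dependencies listed above are in place is a small check script. The sketch below is not part of annif; the function names `missing_modules` and `elasticsearch_reachable` are hypothetical, and the URL `http://localhost:9200` assumes Elasticsearch is listening on its default HTTP port. An HTTP response on that port only shows that some server answered, so treat a positive result as a hint rather than proof.

```python
import http.client
import importlib.util
import urllib.error
import urllib.request

# Import names of the modules installed with pip above.
REQUIRED_MODULES = ["requests", "SPARQLWrapper", "langdetect", "elasticsearch"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def elasticsearch_reachable(url="http://localhost:9200", timeout=2.0):
    """Return True if something answers HTTP requests at `url`.

    The default URL is an assumption (Elasticsearch's default HTTP port).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status (e.g. auth required).
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, malformed response, ...
        return False


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All required Python modules were found.")
    print("Elasticsearch reachable on localhost:", elasticsearch_reachable())
```

Run it inside the same virtualenv (or PyPy environment) that the annif scripts will use, since module availability depends on the active interpreter.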