Information retrieval system for documents in Romanian.
- word stemming
- ignores stop words
- case-independent
- parsed query
- highlight terms in fragments
- explain scoring for each term
- abstract score boosting
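
To make the first few features concrete, here is a minimal pure-Python sketch of how case-independent matching, stop-word removal, stemming, and term highlighting might fit together. It is not the project's code, which relies on PyLucene. The stop-word list, the suffix list, and the function names `stem`, `normalize`, and `highlight` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real Romanian list is much longer.
STOP_WORDS = {"de", "la", "cu", "pe", "un", "o"}

# Toy suffixes for illustration only; this is not a real Romanian stemmer.
SUFFIXES = ("ului", "ilor", "elor", "ul", "le", "a", "e", "i")


def stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, tokenize, drop stop words, and stem what remains."""
    tokens = re.findall(r"\w+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def highlight(fragment, query):
    """Wrap each word of fragment whose stem matches a query term in <b> tags."""
    wanted = set(normalize(query))

    def mark(match):
        word = match.group(0)
        return f"<b>{word}</b>" if stem(word.lower()) in wanted else word

    return re.sub(r"\w+", mark, fragment)


if __name__ == "__main__":
    print(normalize("Documentul de pe DOCUMENTE"))
    print(highlight("Un document despre documente.", "Documentul"))
```

Because the query and the indexed text go through the same `normalize` step, "Documentul" and "documente" reduce to the same term, which is what lets a query in one inflected form match and highlight another.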
- install PyLucene
- pip install textract pdfminer flask
- npm install jquery semantic, then move the files to interface/lib/semantic.css and interface/lib/jquery.js
./indexer.py
./server.py
Pug, CoffeeScript, and Sass for the interface.
PyLucene samples: http://svn.apache.org/viewvc/lucene/pylucene/trunk/samples/
If there is only one symbol with diacritics in the file, it will not be rendered correctly. Adding at least one more (not necessarily the same) will cause all symbols with diacritics to be rendered correctly.