Novasearch is a nice and simple solr web front end based on bottle.
First you should store all pdfs in dataset/pdf
, e.g. you have
dataset/pdf/path/test.pdf
so you should store in:
dataset/meta/path/test.json
the corresponding metafile, that has the following structure:
{
"abstract": "abstract",
"title": "pdf title"
}
The meta file will be used during search result presentation.
Solr will be used as backend, so a simple call of prepare.sh
will install solr in the dataset directory.
After you downloaded solr, you can easily start the index process,
so change to dataset directory and run index_build.sh
:
cd dataset
./index_build.sh
After you indexed all your pdfs and all meta files are stored, you can start novasearch via
./novasearch.py
It is necessary that solr is running (you can manually start it via ./start_solr.sh
).
The novasearch web service is now available via http://localhost:6090
,
you can change port, binding, dataset, download user/password settings
by customizing config.json
, an example configuration is already prepared.
Bibtex entries will be queried using dblp via stored title attribute in meta, because dblp bibtex entries are clean and homogeneous.
Web-css is done by bootstrap and some javascript magic using jQuery.
- add a document store (so that file system access is not needed anymore)
- use solr meta fields for storing meta information OR use a document store for meta infos
- cache dblp requests local OR extract all infos from dblp dump