Solr is tough to install, and it presents a real obstacle to deploying The State Decoded. In the intervening couple of years since I decided to use Solr, Elasticsearch has improved a great deal, and is now my personal default search software. (In fact, it's become my default data storage mechanism, too.) Elasticsearch is provided as DEB and RPM files, with proper init scripts etc., so installing it on Ubuntu, Debian, Red Hat, Fedora, and CentOS is trivial. We should consider moving away from Solr and to Elasticsearch, post-v1.0.
The catch with Elasticsearch is that it only supports JSON, and no other formats. However, that's not necessarily a problem. We can iterate through all laws, via the API, to generate JSON for each law and feed those records to Elasticsearch. We could even iterate through all structural units and index those, too, something that we don't currently do (because there's no XML for structural units).
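A rough sketch of the "generate JSON and feed it to Elasticsearch" step, assuming the laws have already been fetched from the API as a list of dicts. The index name `laws` and the field names (`section_number`, `catch_line`, `text`) are hypothetical placeholders, not the actual State Decoded schema. It only builds the newline-delimited body that Elasticsearch's `_bulk` endpoint expects; sending it over HTTP is left out.

```python
import json


def to_bulk_payload(laws, index="laws"):
    """Build a newline-delimited JSON body for Elasticsearch's _bulk endpoint.

    Each law becomes two lines: an action line naming the index and document
    ID, followed by the document itself. The body must end with a newline.
    """
    lines = []
    for law in laws:
        # Hypothetical: use the section number as the document ID.
        action = {"index": {"_index": index, "_id": law["section_number"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(law))
    return "\n".join(lines) + "\n"


# Hypothetical records, standing in for what the API would return.
sample_laws = [
    {"section_number": "1-1", "catch_line": "Definitions", "text": "As used in this title..."},
    {"section_number": "1-2", "catch_line": "Scope", "text": "This title applies to..."},
]

payload = to_bulk_payload(sample_laws)
```

The same function would work for structural units, given records with some unique identifier to use as `_id`.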
By storing JSON in Elasticsearch, rather than merely indexing it, we could even use Elasticsearch to serve up responses to many API requests. That would make a caching layer (e.g., Varnish, Memcached) unnecessary, and make it trivial for the site to consume its own API.
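To illustrate what serving API responses from stored documents might look like: Elasticsearch's get-by-ID response wraps the stored document in a `_source` field and reports whether it exists in a `found` field. A minimal sketch, assuming the response has already been fetched and parsed into a dict (the function name and the sample data are hypothetical):

```python
def law_from_get_response(es_response):
    """Return the stored law document from a get-by-ID response, or None.

    Elasticsearch sets "found" to false when no document has that ID;
    when it is true, the original JSON is under "_source".
    """
    if not es_response.get("found"):
        return None
    return es_response["_source"]


# Hypothetical parsed responses, shaped like Elasticsearch's get API output.
hit = {
    "_index": "laws",
    "_id": "1-1",
    "found": True,
    "_source": {"section_number": "1-1", "catch_line": "Definitions"},
}
miss = {"_index": "laws", "_id": "9-9", "found": False}

law = law_from_get_response(hit)
missing = law_from_get_response(miss)
```

Under this approach, the API handler would do little more than pass `_source` through to the client.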
I don't think it would be particularly onerous to convert our schema (schema.xml) from Solr's format to Elasticsearch's. I have very limited experience with Solr's format, but I've done a bunch of work with Elasticsearch's, and I've found it to be quite straightforward.
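As a sketch of how mechanical the conversion could be, the following reads `<field>` elements out of a Solr `schema.xml` and emits an Elasticsearch mapping. The sample schema is made up for illustration (it is not the project's real `schema.xml`), and the type translation table is an assumption: it targets the `text`/`keyword` types of recent Elasticsearch versions and falls back to `text` for anything unrecognized.

```python
import xml.etree.ElementTree as ET

# Assumed translation from Solr field types to Elasticsearch field types.
SOLR_TO_ES = {
    "string": "keyword",
    "text_general": "text",
    "int": "integer",
    "date": "date",
}

# Hypothetical schema, for illustration only.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="catch_line" type="text_general" indexed="true" stored="true"/>
    <field name="text" type="text_general" indexed="true" stored="true"/>
    <field name="repealed_year" type="int" indexed="true" stored="true"/>
  </fields>
</schema>
"""


def solr_fields_to_mapping(schema_xml):
    """Convert the <field> elements of a Solr schema into an ES mapping dict."""
    root = ET.fromstring(schema_xml)
    properties = {}
    for field in root.iter("field"):
        # Unknown Solr types fall back to full-text "text" (an assumption).
        es_type = SOLR_TO_ES.get(field.get("type"), "text")
        properties[field.get("name")] = {"type": es_type}
    return {"mappings": {"properties": properties}}


mapping = solr_fields_to_mapping(SAMPLE_SCHEMA)
```

Analyzers, copy fields, and dynamic fields are ignored here and would need to be translated by hand.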
I'm much more familiar with Solr, but I have the impression that they both have roughly the same features. What matters to me most when selecting a search backend is what query syntax it might force upon the users. Users of legal research tools are the worst. They want it all. They want a natural language search that will "just get" what they were looking for and they also want a super-powerful terms and connectors search capability. (And we're not talking about just boolean AND/NOT/OR; they like things like w/in 30, w/in para, wildcards, fuzzy, ranges, term appears x times, etc.) I know how Solr fares on this front, but am less familiar with what Elasticsearch's query syntax will look like. There's this page: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html and I think I understand it, and if so, it looks like it may use the same syntax Solr does, but it would be nice to have an item-by-item side-by-side comparison.
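As a starting point for that comparison: the `query_string` query on the linked page accepts Lucene-style syntax, so several of the items above have direct forms. The sketch below wraps a few of them in a request body. The field names (`text`, `year`) and the search terms are hypothetical, and it does not attempt "w/in para" or "term appears x times".

```python
import json

# Lucene-style query strings accepted by Elasticsearch's query_string query.
EXAMPLES = {
    "boolean": "negligence AND (landlord OR tenant) NOT commercial",
    "proximity (roughly w/in 30)": '"landlord negligence"~30',
    "wildcard": "neglig*",
    "fuzzy": "tenent~",
    "range": "year:[1990 TO 2000]",
}


def query_string_body(query, default_field="text"):
    """Wrap a raw query string in a query_string search request body."""
    return {
        "query": {
            "query_string": {
                "query": query,
                # Hypothetical field searched when the query names none.
                "default_field": default_field,
            }
        }
    }


if __name__ == "__main__":
    for label, query in EXAMPLES.items():
        print(label, json.dumps(query_string_body(query)))
```

Each body would be sent to the index's `_search` endpoint. The user-facing syntax stays whatever we choose to pass through as the `query` value.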