Explore replacing Solr with Elasticsearch #552
Comments
The catch with Elasticsearch is that it only supports JSON; no other formats. However, that's not necessarily a problem. We can iterate through all laws, via the API, to generate JSON for each law and feed those records to Elasticsearch. We could even iterate through all structural units and index those, too, something that we don't currently do (because there's no XML for structural units). By storing JSON in Elasticsearch, rather than merely indexing it, we could even use Elasticsearch to serve up responses to many API requests. That would make a caching layer (e.g., Varnish, Memcached) unnecessary, and make it trivial for the site to consume its own API. |
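A minimal sketch of the "generate JSON for each law and feed those records to Elasticsearch" step, assuming the law records have already been fetched from the API. It only builds the newline-delimited body that Elasticsearch's bulk endpoint accepts; it does not send anything. The field names (`id`, `section`, `catch_line`, `text`), the index name `laws`, and the helper `build_bulk_payload` are hypothetical, chosen for illustration rather than taken from the project's schema.

```python
import json


def build_bulk_payload(laws, index="laws"):
    """Serialize law records into a newline-delimited bulk request body.

    `laws` is a list of dicts; each needs an "id" key (an assumed field name).
    """
    lines = []
    for law in laws:
        # Action line: index this document under the law's own identifier.
        lines.append(json.dumps({"index": {"_index": index, "_id": law["id"]}}))
        # Source line: the whole record, stored so it can be returned as-is later.
        lines.append(json.dumps(law))
    if not lines:
        return ""
    # A bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"


# Hypothetical records standing in for what the API would return.
laws = [
    {"id": "1-1", "section": "1-1", "catch_line": "Definitions", "text": "As used in this title..."},
    {"id": "1-2", "section": "1-2", "catch_line": "Applicability", "text": "This title applies to..."},
]

payload = build_bulk_payload(laws)
```

The resulting string would be sent as the body of a `POST` to the `_bulk` endpoint with the `Content-Type: application/x-ndjson` header. Structural units could be serialized the same way into a second, separately named index.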
I don't think it would be particularly onerous to convert our schema ( |
I'm much more familiar with Solr, but I have the impression that they both have roughly the same features. What matters to me most when selecting a search backend is what query syntax it might force upon the users. Users of legal research tools are the worst. They want it all. They want a natural language search that will "just get" what they were looking for and they also want a super-powerful terms and connectors search capability. (And we're not talking about just boolean AND/NOT/OR; they like things like w/in 30, w/in para, wildcards, fuzzy, ranges, term appears x times, etc.) I know how Solr fares on this front, but am less familiar with what Elasticsearch's query syntax will look like. There's this page: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html and I think I understand it, and if so, it looks like it may use the same syntax Solr does, but it would be nice to have an item-by-item side-by-side comparison. |
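As a starting point for that side-by-side comparison, here is a rough sketch of how several of the operators mentioned above might be written in Lucene-style query syntax and wrapped in an Elasticsearch `query_string` request body. The field names `text` and `year` and the helper `query_string_body` are assumptions for illustration. The proximity example only approximates "w/in 30", since the number is a positional slop rather than a strict word distance, and as far as I know there is no direct equivalent here for "w/in para" or "term appears x times".

```python
def query_string_body(expression, default_field="text"):
    """Wrap a Lucene-style query expression in a query_string request body."""
    return {
        "query": {
            "query_string": {
                "query": expression,
                "default_field": default_field,  # assumed field name
            }
        }
    }


# Hypothetical searches a legal researcher might run.
examples = {
    "boolean": "easement AND NOT utility",
    "proximity": '"eminent domain"~30',   # phrase terms within a slop of 30
    "wildcard": "negligen*",              # negligent, negligence, ...
    "fuzzy": "tresspass~",                # tolerates the misspelling
    "range": "year:[2000 TO 2010]",       # inclusive range on an assumed field
}

bodies = {name: query_string_body(expr) for name, expr in examples.items()}
```

Each value in `bodies` is a dict that could be serialized to JSON and sent to an index's `_search` endpoint. The same expressions should be worth trying against Solr's standard query parser to see where the two diverge.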
They're both just interfaces to Lucene, so they should use the same syntax. |
Oh, hey, this seems handy. |
If anyone wants to write an interface here for a new search engine, there are now two examples of how the search wrapper expects to interface.
|
👍 |
Solr is tough to install, and it presents a real obstacle to deploying The State Decoded. In the intervening couple of years since I decided to use Solr, Elasticsearch has improved a great deal, and is now my personal default search software. (In fact, it's become my default data storage mechanism, too.) Elasticsearch is provided as DEB and RPM files, with proper init scripts etc., so installing it on Ubuntu, Debian, Red Hat, Fedora, and CentOS is trivial. We should consider moving away from Solr and to Elasticsearch, post-v1.0.