Feedback on elasticsearch-model #4
Hi Jérémy, feedback here in GitHub issues is absolutely perfect: easy to track, review, and contribute to.
Yes, indeed. That's the main use case for the -model gem: your primary database is "something else", and you want to search it with Elasticsearch.

Now, the first important point to make is that a "persistence" module for this gem is planned. If you're familiar with …

Everything depends on how much you want your PORO class to be involved with Elasticsearch. Should it be responsible for persistence as well? In the simplest case, you can simply wrap each result in a class of your own. With such a wrapper in place, you can of course selectively expose only what you need.
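Such a wrapper can be as small as this sketch; the `Article` class and its attributes are hypothetical, invented for illustration:

```ruby
# A plain Ruby object wrapping a single Elasticsearch hit.
# `Article` and its attributes are hypothetical examples.
class Article
  attr_reader :id, :title, :tags

  def initialize(hit)
    source = hit['_source'] || {}
    @id    = hit['_id']
    @title = source['title']
    @tags  = source['tags'] || []
  end
end

hit     = { '_id' => '1', '_source' => { 'title' => 'Test', 'tags' => ['a', 'b'] } }
article = Article.new(hit)
article.title # => "Test"
```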
An example of source filtering in Elasticsearch 1.0:

```ruby
require 'elasticsearch'

client = Elasticsearch::Client.new host: 'localhost:9250'

client.indices.delete index: 'articles'

client.index index: 'articles', type: 'a', id: 1, refresh: true,
             body: { title: "Test", tags: ["a", "b", "c"] }

response = client.search index: 'articles', body: {
  _source: {
    include: 'title'
  }
}

puts response['hits']['hits'].map { |r| r['_source'] }
# => {"title"=>"Test"}
```

Many thanks for the feedback and the questions! Post more if you have them.
Thanks for your reply. I like this kind of discussion because it challenges my mental model.

Regarding persistence, I'd like to have my models mostly ignorant of Elasticsearch. Right now I don't need the indexing nor the mapping/settings parts. That said, if I needed them, it might be comfortable to have them at hand.

Regarding the mapping, I've already done something similar. Right now I'm using Virtus, but it is slow as hell when I need to create thousands of objects. I don't really like …

I think that the recent change in ES to restrict the use of `fields` is a good one. Also, why ES wraps all field values in an Array is a complete mystery to me. I've read the issue about this, but it doesn't make sense to me (don't bother explaining this to me, I've just updated my client code accordingly).
Hi @karmi, I've just found an article by @ahawkins about the Repository pattern: http://hawkins.io/2013/10/implementing_the_repository_pattern/

I find it quite relevant to what I have in mind regarding a clear separation between models and persistence: Elasticsearch as a backend, elasticsearch-ruby as a transport/API adapter, and custom, clean models on the other side.
In this case, I think you actually don't need much supporting "integration", since a plain class is enough:

```ruby
class MyModel
  include MyStuff # MyStuff is assumed to provide index_name, document_type, etc.

  def to_hash
    { foo: "bar" }
  end
end

m = MyModel.new
client.index index: m.index_name, type: m.document_type, body: m.to_hash
```

The repository "support" would be quite small here, since most of the other stuff would be tied to your model implementation: how it handles naming (`index_name`, …), serialization, and so on.
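A sketch of how small that repository "support" could be; `ArticleRepository` and the in-memory stub client are made up for illustration, and a real `Elasticsearch::Client` would be injected in the stub's place:

```ruby
# A minimal repository sketch: any object responding to `index` and `get`
# (such as an Elasticsearch::Client) can be injected as the client.
class ArticleRepository
  def initialize(client, index: 'articles', type: 'article')
    @client, @index, @type = client, index, type
  end

  def save(model)
    @client.index(index: @index, type: @type, id: model.id, body: model.to_hash)
  end

  def find(id)
    @client.get(index: @index, type: @type, id: id)['_source']
  end
end

# An in-memory stub standing in for the real client:
FakeClient = Struct.new(:store) do
  def index(index:, type:, id:, body:)
    store[id] = body
  end

  def get(index:, type:, id:)
    { '_source' => store[id] }
  end
end

# A model conforming to the simple contract (id, to_hash):
Model = Struct.new(:id, :title) do
  def to_hash
    { 'title' => title }
  end
end

repo = ArticleRepository.new(FakeClient.new({}))
repo.save(Model.new(1, 'Test'))
repo.find(1) # => {"title"=>"Test"}
```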
Yeah, I'm familiar with that. There's actually quite a lot of different implementations of the pattern in Ruby on GitHub, in various states of maturity and with various interfaces. One of the most complete ones seems to be braintree/curator. I'll think about it a bit more, and play with some sketches for the interface.
I will, anyway :) Basically, any field or property or attribute (or whatever you call it) of your document is multi-valued in Lucene by default. Elasticsearch "leaks" this fact here, and in my opinion rightly so. Source filtering is much more intuitive and should be preferred anyway; only people who know what they're doing should work with fields directly.
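Since `fields` values always come back wrapped in an Array, a small normalizer can unwrap the single-valued ones on the client side; this helper is just a sketch, not part of any gem:

```ruby
# Unwrap single-element arrays in a `fields` hash, keeping genuinely
# multi-valued fields as arrays.
def unwrap_fields(fields)
  fields.each_with_object({}) do |(name, values), out|
    out[name] = values.length == 1 ? values.first : values
  end
end

unwrap_fields('title' => ['Test'], 'tags' => ['a', 'b', 'c'])
# => {"title"=>"Test", "tags"=>["a", "b", "c"]}
```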
Just for the record, I did a bit of research (the Domain-Driven Design book by Eric Evans, repository vs. active record patterns, etc.) and I'm quite sure I'll try to support both the data mapper and active record patterns in the persistence module.

For inspiration, I quite like the approach Braintree took in their Curator gem.
I've been studying the Curator source code lately and it is indeed really interesting. There is something that bothers me, though: there is nothing in it about building queries, besides using an index to fetch a record. With Elasticsearch, the index looks like the bucket in Riak or the collection in Mongo, and the id is obviously present too, but beyond that there are many more concepts involved. The type and routing are not part of the query itself but are really important. Returned values are not monolithic: _source, fields, facets, … come back in different parts of the response. I can quite easily see how I could improve my implementation by using Curator's basic parts (Configuration, DataStore, Repository, …), but the (missing) QueryBuilder is the most sensitive piece to me.
I'm quite sure I won't design and implement it as an "adapter" to Curator, Ruby Object Mapper, or do something heavy like that. The way I see it, the repository would be responsible for storage and retrieval related behaviour, which in Elasticsearch's world also means setting up mappings, defining the index/type, etc. The model would have to conform to a simple contract (having a `to_hash` method, …).

Some kind of "sugar" module could "include" the repository in a model, so it would provide familiar active record semantics (…).

You correctly highlight the special meaning queries have in Elasticsearch. There are plans for a …
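The "sugar" module idea could look roughly like this; `Persistable`, `Note` and the spy repository are all hypothetical names, sketched here only to show the shape of the contract:

```ruby
# A "sugar" module mixing a repository-backed `save` into a model class.
module Persistable
  module ClassMethods
    attr_accessor :repository
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  def save
    self.class.repository.save(self)
  end
end

class Note
  include Persistable
  attr_accessor :body

  def to_hash
    { 'body' => body }
  end
end

# A spy repository that just records what was saved:
SpyRepository = Struct.new(:saved) do
  def save(model)
    saved << model.to_hash
  end
end

Note.repository = SpyRepository.new([])
note = Note.new
note.body = 'hello'
note.save
Note.repository.saved # => [{"body"=>"hello"}]
```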
I'm not sure I understand how you would do something like …
👍 For the persistence of models, my ideas are not that clear yet. For the moment, I mainly focus on querying many indexes with complex queries.
I have to admit, I've never been a fan of …
Another example, depending on … All this can be done using a DSL, but I've found it quite hard.

And oftentimes, DSLs aim at easing the simple/common things but fail at making the complex things possible.
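For comparison, a sketch of doing it without a DSL: plain methods composing plain Hashes, with all parameter names invented for the example:

```ruby
# Build a search body conditionally from plain Ruby data structures;
# `text` and `tags` are hypothetical parameters.
def search_body(text: nil, tags: nil, from: 0, size: 10)
  musts = []
  musts << { match: { title: text } } if text
  musts << { terms: { tags: tags } }  if tags && !tags.empty?

  query = musts.empty? ? { match_all: {} } : { bool: { must: musts } }
  { query: query, from: from, size: size }
end

search_body(text: 'ruby')
# => {:query=>{:bool=>{:must=>[{:match=>{:title=>"ruby"}}]}}, :from=>0, :size=>10}
```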
What I'm saying is that my first goal is to come up with a standalone repository pattern implementation in the …
Even when it takes ~150 lines, like in this complex example? :) Have a look at this sketch of using self-calling lambdas for dynamically building the Hash; I think the convenience factor of a "DSL" is immediately visible there.
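The linked sketch itself is not reproduced in the thread, but the self-calling lambda idea looks roughly like this reconstruction (my illustration, not karmi's original code):

```ruby
# A lambda defined and called in place builds a sub-hash dynamically,
# keeping the conditional logic inline in the Hash literal.
tags = ['a', 'b']

body = {
  query: lambda do
    if tags && !tags.empty?
      { bool: { must: [{ terms: { tags: tags } }] } }
    else
      { match_all: {} }
    end
  end.call,
  size: 25
}

body[:query] # => {:bool=>{:must=>[{:terms=>{:tags=>["a", "b"]}}]}}
```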
Even after a few minutes of fiddling with the sketch you've linked, I'm not sure I understand 100% how it works and what I can do with it. It is definitely shorter than your previous example. I guess I should take some time to rethink my query building: I coded it on the go, while designing my queries and how I wanted to fetch my data. With the whole picture in mind, I might come up with something more obvious/simple than what I've done lately.
That said, this self-calling lambda trick is really neat.
I'd like to come back to a point I referred to earlier: instantiating objects with the "records" gathered from the index. As I've said, I've been using Virtus. It's a very nice project with many useful features: you can give it a hash of attributes and each of them becomes a separate method on the object, and you can coerce the values. The problem is that it's very slow when you coerce many fields. I've updated my Gist with many benchmarks. It shows that the fastest approach is no coercion at all (…).

In my very particular case, I need simple value objects, with very little behavior but with important coercions. That's why a naive …
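A naive coercing value object, as an alternative to Virtus, could be as simple as this sketch (class and attribute names invented for the example):

```ruby
# A hand-rolled value object with explicit coercions instead of Virtus.
class Location
  attr_reader :lat, :lon, :label

  def initialize(attributes = {})
    @lat   = Float(attributes['lat'])
    @lon   = Float(attributes['lon'])
    @label = String(attributes['label'])
  end
end

location = Location.new('lat' => '48.85', 'lon' => '2.35', 'label' => 'Paris')
location.lat # => 48.85
```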
I'm quite familiar with Virtus. I'm considering using it as the basis for the active record layer of the "persistence" module, because it's the most advanced and feature-complete implementation of ActiveModel (incl. casting and coercion). As discussed, this won't prevent people using other model implementations, or plain old Ruby objects with the repository layer.
I've seen the benchmark, but wasn't quite sure about the concern: is it that creating thousands of Virtus instances is slow, or that creating a handful of Virtus instances with many attributes is slow? Usually, you're creating only a limited number (10, 25, 50) of instances based on search results, and even when that isn't the case (…).
I've seen big slowdowns with Virtus when creating tens or hundreds of objects with 5 to 20 attributes. Since this morning, I've fiddled with a simpler implementation against two simple specs.

With the same attributes and coercions, between the Virtus implementation and the simplest one that satisfies my specs, I get the results you can see in this benchmark: https://gist.github.com/jlecour/9dd8022c542c9d353ed7#file-z2_benchmark_virtus-rake
@jlecour I can see the difference in the benchmarks, though I still don't understand why it's such a big concern. Again, in regular use you instantiate a handful (10, 25, 50) of objects, e.g. for displaying them. Can you clarify this bit?
As you said, the regular use case is not an issue. But when I have to instantiate (often, in my situation) hundreds or even thousands of objects, it becomes one. I have an example with an API where I return a few hundred main elements in a JSON response, and each of these main elements has 10-20 sub-elements that come from a different index.
The working version of the repository pattern implementation was added in #71.
Hi @karmi,

I wanted to give you some feedback about elasticsearch-model but didn't find a better place to post it.

This gem is really interesting, especially the "model" part of it.
I really like the decoupling between the class that holds the data, the one that performs the request, and the one that represents the response and the results.
Still, there is something that bothers me: there is a big assumption that the "model" is stored in a primary datastore other than Elasticsearch, accessible through ActiveRecord, Mongoid or something else.
I am working on an app where a lot of data is indexed in ES from an external data source. In the "main" app, I just want to query ES, fetch results and populate collections of results with data found in ES.
The objects I'm dealing with are very simple Ruby objects (POROs) to which I can add some behavior, …
I guess I could use the various parts of Elasticsearch::Model with only what I want (excluding all the Record stuff), but it seems a little tedious and brittle.

Maybe it would be enough to have an example that includes just what's needed to get those plain Ruby objects coming from ES.
Another question: in Result, how do you deal with script_fields (a calculated distance, for example) when the rest of the _source is also present? I guess a basic merge would do the job.
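The "basic merge" mentioned in the question could look like this sketch, which folds `script_fields` values (returned by Elasticsearch as single-element arrays) into the `_source` hash; the helper name is invented:

```ruby
# Merge script_fields (e.g. a calculated distance) into the _source hash.
def merge_script_fields(hit)
  source = hit['_source'] || {}
  fields = hit['fields']  || {}

  fields.each_with_object(source.dup) do |(name, values), out|
    out[name] = values.is_a?(Array) && values.length == 1 ? values.first : values
  end
end

hit = {
  '_source' => { 'title' => 'Test' },
  'fields'  => { 'distance' => [12.4] }
}
merge_script_fields(hit)
# => {"title"=>"Test", "distance"=>12.4}
```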