Add support for multitype search and eager loading #131
Hi Dylan, sorry for such a delay with my answer. I have several remarks regarding the feature, and a couple of questions. So, questions first: It doesn't seem so common to return instances of multiple classes from a single search? How do you handle incompatible model interfaces, etc? I can imagine that it's more common when using single table inheritance or such. Note that, usually, it really makes sense to let Tire return the
Now, regarding the implementation. I think I won't surprise anyone by describing the implementation as very complex, very hard to follow, and solving an apparent edge case. Please don't take this as a stupid gate-keeper guarding the codebase or something like that. I'm just trying to keep the Tire codebase approachable, and not cluttered, for the longer term. Anything which adds "dead code", something which is hard to change, refactor, or extract, is a huge liability. It certainly does not help that the
Now, speaking practically. I have tried to reconstruct your case, and this is what I came up with, inspired by your code: https://gist.github.com/1312996. It seems to me it's doing what it should. If I were you, I would put such code into a Ruby module, include it in the application, and use the low-level Tire DSL/API, not the ActiveModel integration directly. That way, you'd have tight control over the logic, and you could fine-tune the code structure or performance without hanging in thin air, waiting for upstream to take your changes, maintaining your fork, etc. Based on my experience with e.g. CouchDB gems, I'd very much prefer such an approach over constantly "fixing" an external library... I'll gladly hear any opinions and feedback on this... |
Searching across multiple types to me didn't seem like an edge case. It allows a search field to be present as part of the layout (i.e. available on any page) without having to navigate to a section of the site before searching. This could be used to search for tv shows and movies, articles and comments, blog posts and other pages, etc.
I do agree that it is best to avoid loading data from the database, if possible, and just use the results directly to display the data. However, I am dealing with an existing codebase that is currently using Sphinx, and expects to be working with models. Displaying these results is also non-trivial, because I am developing for a platform which provides the ability for custom search results pages to be uploaded as liquid templates. Obviously this isn't the typical use case, but Tire already seemed to have the ability to load models from the database, so I thought the code would be useful upstream.
I appreciate your feedback, and understand if you don't want to maintain the code for eager loading multi-type searches.
Also take note of the commit that allows the search to gracefully handle missing records. It would be a bad user experience to have a search fail just because one of the results can't be found. Of course, without multi-type search support, the code wouldn't matter to me, since I will need to do the loading externally.
I am already using the Tire.search, not the ActiveModel integration directly. The :load parameter isn't handled in Tire::Model::Search, it is accessible from Tire.search. I could move the functionality to a wrapper, although the fact that Tire::Search::Search changes Configuration.wrapper would need to be fixed. Likely I will move the code to my own module as you mentioned, handle eager loading there, and not call Tire search functions directly from other code. |
Hi,
indeed. That's a very common and valid scenario. In this scenario, and precisely in this scenario, I'd work with the shallow Hash/Ostruct-like instances of
I can imagine. Thinking Sphinx, while an unbelievably polished and well-interfaced library, instilled some preconceptions in people's minds about how search should work within Rails, and in Ruby in general. With ElasticSearch, many of these preconceptions are not justified, and in many cases they are downright false. When we have the ability to return content at will from the engine, it is just wrong to load data from the database. I have yet to hear a compelling argument otherwise.
If that is the case, I think this is another, quite strong reason to exploit ElasticSearch's ability to return “full” content for you. If we're talking about Shopify, it's on par with other features where ElasticSearch makes sense for you: easy multitenancy (account-based indices, configurable aliases with filters/routing, etc), distribution, powerful facets. You may also be the first to actually make use of the
Yeah, I did notice it, and I did also notice you're using a
That's true. I should look into that and guard against missing records. The chance of such failure seems to me quite small, however, for most use cases. In a system with high throughput, where records are destroyed quickly, it would be a problem.
I think it's the other way round? But yes, |
It was just meant to be a transitional step. A Tire::Results::Item obviously doesn't have the same interface as the model it represents.
I am already using account-based indices, plan to look into using aliases for non-disruptive reindexing, and there are plans to exploit the extra search features elasticsearch has.
Perhaps one that will create ActiveRecord objects with the data from elasticsearch that simply act as a cache, but can still load data through relationships with other tables. I haven't looked into how necessary this would be for our use case though.
Well, I am doing my own eager loading now, so that's alright.
Check your codebase, because the option is passed through to |
Yeah, it does not. However, for the type of listing I imagine one would end up with, the free-form
Actually, a custom |
Dylan, what should we do about this issue? |
Missing records are tolerated since a record may be deleted then searched for before the elasticsearch index is refreshed. Also, the record may be deleted just after the search before the eager loading request is processed.
* Only one database query per type is used for eager loading.
* The time complexity of preserving the search result order in eager loading is now linear rather than quadratic.
Took advantage of some ActiveSupport methods, since ActiveSupport is already a dependency of ActiveModel.
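A minimal sketch of how the linear-time ordering could work, assuming the loaded records are indexed by id once and the hits are then walked in search order. `Hit`, `Record`, and `order_like_hits` are hypothetical names for illustration and are not part of Tire or of this patch.

```ruby
# Hypothetical stand-ins: Hit mimics a search hit, Record a row loaded from the database.
Hit    = Struct.new(:type, :id)
Record = Struct.new(:id, :title)

# Restore search-result order in linear time: build an id => record index once,
# then look each hit up in it. Hits whose record no longer exists are skipped.
def order_like_hits(hits, records)
  by_id = records.each_with_object({}) { |record, index| index[record.id.to_s] = record }
  hits.map { |hit| by_id[hit.id.to_s] }.compact
end

hits    = [Hit.new('article', '3'), Hit.new('article', '1'), Hit.new('article', '2')]
records = [Record.new(1, 'One'), Record.new(3, 'Three')] # id 2 was deleted in the meantime

ordered = order_like_hits(hits, records)
puts ordered.map(&:title).inspect # prints ["Three", "One"]
```

A quadratic version would call something like `records.find { ... }` once per hit; the hash index replaces that scan with a constant-time lookup.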
I added a commit to try to simplify the eager loading code. Is it still too complicated? As for the use of |
Hi Dylan, I ended up letting this feature pass, for the time being. The whole After that, a feature like yours should be easily supported, and everybody could decide if it's something she would like to share in upstream, either in core or in contrib. Also, the whole approach of loading records from the database seems suspicious to me (as already stated). Normally, there should be no need to do that, and certainly not in a use case like "I want to display links to various stuff when people perform a search". I understand the approach of working with the "real models" all the time is more convenient, and people are used to it by using libraries such as ThinkingSphinx, but I think ElasticSearch deserves more care and experimentation... |
+1 to the important need to provide site wide searching ability out of the box. I have no idea how to implement this though ;-) |
ES/Tire make it easy to do multi-model searches, just do:

Tire.search ['articles', 'comments', 'whatever'] do
  query { string 'whatever' }
end

You just won't get the "real" models, but instances of |
Maybe also mention the possibility of multi-model searches in the Readme. I had to use Google to find my way here (or I just have tomatoes on my eyes...) :) |
@luxflux Yeah, I should put that into Readme. |
Hi @karmi, I think you should put something together in the Readme for these two things
|
Is there a way to specify what to eagerly load based on the AR object that is returned? I am searching over multiple different models and would like to eagerly load different things based on the actual model. |
Although searching across multiple types could previously be done by using a comma-separated string for the type, the eager loading would still assume only one type was searched for. This patch fixes this by grouping the results by type, then performing one query per type to eager load the actual records.
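The grouping step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: `SearchHit`, `Row`, `FakeTable`, and `eager_load` are hypothetical names, and `FakeTable#find_all` stands in for a single `WHERE id IN (...)` database query.

```ruby
# Hypothetical stand-ins for search hits and database rows.
SearchHit = Struct.new(:type, :id)
Row       = Struct.new(:type, :id, :name)

# In-memory "table" that counts how many queries it receives.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # Stands in for one "WHERE id IN (...)" query.
  def find_all(ids)
    @query_count += 1
    @rows.select { |row| ids.include?(row.id) }
  end
end

# Group hits by type, issue one query per type, then rebuild the original hit order.
# Hits without a matching row (e.g. deleted records) are dropped instead of raising.
def eager_load(hits, tables)
  loaded = {}
  hits.group_by(&:type).each do |type, type_hits|
    tables.fetch(type).find_all(type_hits.map(&:id)).each do |row|
      loaded[[type, row.id]] = row
    end
  end
  hits.map { |hit| loaded[[hit.type, hit.id]] }.compact
end

tables = {
  'article' => FakeTable.new([Row.new('article', 1, 'Intro'), Row.new('article', 2, 'Setup')]),
  'comment' => FakeTable.new([Row.new('comment', 1, 'Nice post')])
}
hits = [SearchHit.new('comment', 1), SearchHit.new('article', 2),
        SearchHit.new('article', 1), SearchHit.new('comment', 9)]

results = eager_load(hits, tables)
puts results.map(&:name).inspect             # prints ["Nice post", "Setup", "Intro"]
puts tables.values.map(&:query_count).inspect # prints [1, 1]
```

Keying the lookup on the `[type, id]` pair matters because ids are only unique within a type: here article 1 and comment 1 are different records.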