Add support for multitype search and eager loading #131

Closed
wants to merge 3 commits into
from

Conversation

Projects
None yet
6 participants
@dylanahsmith

Although searching across multiple types could be previously done with using a comma separated string for the type, the eager loading would still assume only one type was searched for. This patch fixes this by grouping the results by type, then performing one query per type to eager load the actual records.

@karmi

This comment has been minimized.

Show comment Hide comment
@karmi

karmi Oct 25, 2011

Owner

Hi Dylan,

sorry for such a delay with my answer. I have several remarks regarding the feature, and couple of questions. So, questions first:

It seems not so common to return instances of multiple classes from a single search? How do you handle incompatible model interfaces, etc? I can imagine that it's more common when using single table inheritance or such.

Note, that usually, it really makes sense to leave Tire return the Item instances, and work with those, especially if you're only displaying the data to the user. I'd be very curious to hear arguments against such approach, because that's and advice I keep giving to users.

Now, regarding the implementation. I think I won't surprise by describing the implementation as very complex, very hard to follow, and solving an apparent edge-case. Please, don't take this as a stupid gate-keeper guarding the codebase or something like that. I'm just trying to keep the Tire codebase approachable, and not cluttered, for the longer term. Anything which adds "dead code", something which is hard to change, refactor, extract is a huge liability.

It certainly does not help that the Collection#results method is such a beast, and quite convoluted already. It should be refactored, extracted into smaller methods, into a module or class, so anybody trying to change the behaviour -- as you're doing -- would have to overload just one or two methods (and decided if s/he would like to keep them private, or publish them in karmi/tire-contrib, etc.)

Now, speaking practically. I have tried to reconstruct your case, and this is what I came up with, inspired by your code: https://gist.github.com/1312996.

It seems to me it's doing what it should. If I were you, I would put such code into a Ruby module, including it into the application, using the low-level Tire DSL/API, and not the ActiveModel integration directly. That way, you'd have tight control on the logic, you could fine tune the code structure or performance without hanging in thin air, waiting for the upstream to take your changes, maintaining your fork, etc. Based on my experience with eg. CouchDB gems, I'd very much prefer such approach over constantly "fixing" an external library...

I'll gladly hear any opinions and feedback on this...

Owner

karmi commented Oct 25, 2011

Hi Dylan,

sorry for such a delay with my answer. I have several remarks regarding the feature, and couple of questions. So, questions first:

It seems not so common to return instances of multiple classes from a single search? How do you handle incompatible model interfaces, etc? I can imagine that it's more common when using single table inheritance or such.

Note, that usually, it really makes sense to leave Tire return the Item instances, and work with those, especially if you're only displaying the data to the user. I'd be very curious to hear arguments against such approach, because that's and advice I keep giving to users.

Now, regarding the implementation. I think I won't surprise by describing the implementation as very complex, very hard to follow, and solving an apparent edge-case. Please, don't take this as a stupid gate-keeper guarding the codebase or something like that. I'm just trying to keep the Tire codebase approachable, and not cluttered, for the longer term. Anything which adds "dead code", something which is hard to change, refactor, extract is a huge liability.

It certainly does not help that the Collection#results method is such a beast, and quite convoluted already. It should be refactored, extracted into smaller methods, into a module or class, so anybody trying to change the behaviour -- as you're doing -- would have to overload just one or two methods (and decided if s/he would like to keep them private, or publish them in karmi/tire-contrib, etc.)

Now, speaking practically. I have tried to reconstruct your case, and this is what I came up with, inspired by your code: https://gist.github.com/1312996.

It seems to me it's doing what it should. If I were you, I would put such code into a Ruby module, including it into the application, using the low-level Tire DSL/API, and not the ActiveModel integration directly. That way, you'd have tight control on the logic, you could fine tune the code structure or performance without hanging in thin air, waiting for the upstream to take your changes, maintaining your fork, etc. Based on my experience with eg. CouchDB gems, I'd very much prefer such approach over constantly "fixing" an external library...

I'll gladly hear any opinions and feedback on this...

@dylanahsmith

This comment has been minimized.

Show comment Hide comment
@dylanahsmith

dylanahsmith Oct 25, 2011

Searching across multiple types to me didn't seem like an edge case. It allows a search field to be present as part of the layout (i.e. available on any page) without having to navigate to a section of the site before searching. This could be used to search for tv shows and movies, articles and comments, blog posts and other pages, etc.

I do agree that it is best to avoid loading data from the database, if possible, and just use the results directly to display the data. However, I am dealing with an existing codebase that is currently using Sphinx, and expects to be working with models. Displaying these results is also non-trivial, because I am developing for a platform which provides the ability for custom search results pages to be uploaded as liquid templates. Obviously this isn't the typical use case, but Tire already seemed to have the ability to load models from the database, so I thought the code would be useful upstream.

I appreciate your feedback, and understand if you don't want to maintain the code for eager loading multi-type searches.

Also take note of the commit that allows the search to gracefully handle missing records. It would be a bad user experience to have a search fail just because one of the results can't be found. Of course, without multi-type search support, the code wouldn't matter to me, since I will need to do the loading externally.

I am already using the Tire.search, not the ActiveModel integration directly. The :load parameter isn't handled in Tire::Model::Search, it is accessible from Tire.search. I could move the functionality to a wrapper, although the fact that Tire::Search::Search changes Configuration.wrapper would need to be fixed. Likely I will move the code to my own module as you mentioned, handle eager loading there, and not call tire search functions directly from other code.

Searching across multiple types to me didn't seem like an edge case. It allows a search field to be present as part of the layout (i.e. available on any page) without having to navigate to a section of the site before searching. This could be used to search for tv shows and movies, articles and comments, blog posts and other pages, etc.

I do agree that it is best to avoid loading data from the database, if possible, and just use the results directly to display the data. However, I am dealing with an existing codebase that is currently using Sphinx, and expects to be working with models. Displaying these results is also non-trivial, because I am developing for a platform which provides the ability for custom search results pages to be uploaded as liquid templates. Obviously this isn't the typical use case, but Tire already seemed to have the ability to load models from the database, so I thought the code would be useful upstream.

I appreciate your feedback, and understand if you don't want to maintain the code for eager loading multi-type searches.

Also take note of the commit that allows the search to gracefully handle missing records. It would be a bad user experience to have a search fail just because one of the results can't be found. Of course, without multi-type search support, the code wouldn't matter to me, since I will need to do the loading externally.

I am already using the Tire.search, not the ActiveModel integration directly. The :load parameter isn't handled in Tire::Model::Search, it is accessible from Tire.search. I could move the functionality to a wrapper, although the fact that Tire::Search::Search changes Configuration.wrapper would need to be fixed. Likely I will move the code to my own module as you mentioned, handle eager loading there, and not call tire search functions directly from other code.

@karmi

This comment has been minimized.

Show comment Hide comment
@karmi

karmi Oct 26, 2011

Owner

Hi,

Searching across multiple types to me didn't seem like an edge case. It allows a search field to be present as part of the layout (...) This could be used to search for tv shows and movies, articles and comments, blog posts and other pages, etc.

indeed. That's a very common and valid scenario. In this scenario, and precisely in this scenario, I'd work work with the shallow Hash/Ostruct-like instances of Item, for displaying the content snippets and links to results. (Tire tries hard to make eg. URL helpers work.)

I do agree that it is best to avoid loading data from the database (...) existing codebase that is currently using Sphinx (...)

I can imagine. Thinking Sphinx, while unbelievably polished and well interfaced library, instilled some pre-conceptions into people's mind how search should work within Rails, and in Ruby in general. With ElasticSearch, many of these pre-conceptions are not justified, and in many cases they are downright false.

When we have the ability to return content at will from the engine, it is just wrong to load data from the database. I have yet to hear a compelling argument otherwise.

Displaying these results is also non-trivial, because I am developing for a platform which provides the ability for custom search results pages to be uploaded as liquid templates. Obviously this isn't the typical use case, but Tire already seemed to have the ability to load models from the database, so I thought the code would be useful upstream.

If that is the case, I think this is another, quite strong reason to exploit ElasticSearch's ability to return “full” content for you. If we're talking about Shopify, it's on par with other features where ElasticSearch makes sense for you: easy multitenancy (account-based indices, configurable aliases with filters/routing, etc), distribution, powerful facets. You may also be the first to actually make use of the :wrapper option, since you could create your own, “enhanced” Item implementation.

Also take note of the commit that allows the search to gracefully handle missing records.

Yeah, I did notice it, and I did also notice you're using a where() method :) That may fly with ActiveRecord, but half of people using ElasticSearch/Tire are not using ActiveRecord-based models (me included), and moreover, I personally don't care about ActiveRecord a tiny little bit. If I'd developed Tire solely for my purposes, there would be no ActiveRecord support.

It would be a bad user experience to have a search fail just because one of the results can't be found.

That's true. I should look into that and guard against missing records. The chance of such failure seems to me quite small, however, for most use cases. In a system with high throughput, where records are destroyed quickly, it would be a problem.

I am already using the Tire.search, not the ActiveModel integration directly. The :load parameter isn't handled in Tire::Model::Search, it is accessible from Tire.search.

I think it's the other way round? But yes, Tire.search does not expose the :load option, since it does not have any notion of a “model”. I really think it's much more flexible and robust to implement specific use cases on top of the Tire API/DSL. That said, I am quite mad at the Tire::Results::Collection#results method. It is a mess and does not allow easy overload, monkeypatching.

Owner

karmi commented Oct 26, 2011

Hi,

Searching across multiple types to me didn't seem like an edge case. It allows a search field to be present as part of the layout (...) This could be used to search for tv shows and movies, articles and comments, blog posts and other pages, etc.

indeed. That's a very common and valid scenario. In this scenario, and precisely in this scenario, I'd work work with the shallow Hash/Ostruct-like instances of Item, for displaying the content snippets and links to results. (Tire tries hard to make eg. URL helpers work.)

I do agree that it is best to avoid loading data from the database (...) existing codebase that is currently using Sphinx (...)

I can imagine. Thinking Sphinx, while unbelievably polished and well interfaced library, instilled some pre-conceptions into people's mind how search should work within Rails, and in Ruby in general. With ElasticSearch, many of these pre-conceptions are not justified, and in many cases they are downright false.

When we have the ability to return content at will from the engine, it is just wrong to load data from the database. I have yet to hear a compelling argument otherwise.

Displaying these results is also non-trivial, because I am developing for a platform which provides the ability for custom search results pages to be uploaded as liquid templates. Obviously this isn't the typical use case, but Tire already seemed to have the ability to load models from the database, so I thought the code would be useful upstream.

If that is the case, I think this is another, quite strong reason to exploit ElasticSearch's ability to return “full” content for you. If we're talking about Shopify, it's on par with other features where ElasticSearch makes sense for you: easy multitenancy (account-based indices, configurable aliases with filters/routing, etc), distribution, powerful facets. You may also be the first to actually make use of the :wrapper option, since you could create your own, “enhanced” Item implementation.

Also take note of the commit that allows the search to gracefully handle missing records.

Yeah, I did notice it, and I did also notice you're using a where() method :) That may fly with ActiveRecord, but half of people using ElasticSearch/Tire are not using ActiveRecord-based models (me included), and moreover, I personally don't care about ActiveRecord a tiny little bit. If I'd developed Tire solely for my purposes, there would be no ActiveRecord support.

It would be a bad user experience to have a search fail just because one of the results can't be found.

That's true. I should look into that and guard against missing records. The chance of such failure seems to me quite small, however, for most use cases. In a system with high throughput, where records are destroyed quickly, it would be a problem.

I am already using the Tire.search, not the ActiveModel integration directly. The :load parameter isn't handled in Tire::Model::Search, it is accessible from Tire.search.

I think it's the other way round? But yes, Tire.search does not expose the :load option, since it does not have any notion of a “model”. I really think it's much more flexible and robust to implement specific use cases on top of the Tire API/DSL. That said, I am quite mad at the Tire::Results::Collection#results method. It is a mess and does not allow easy overload, monkeypatching.

@dylanahsmith

This comment has been minimized.

Show comment Hide comment
@dylanahsmith

dylanahsmith Oct 26, 2011

When we have the ability to return content at will from the engine, it is just wrong to load data from the database. I have yet to hear a compelling argument otherwise.

It was just meant to be a transitional step. A Tire::Results::Item obviously doesn't have the same interface as the model it represents.

If we're talking about Shopify, it's on par with other features where ElasticSearch makes sense for you: easy multitenancy (account-based indices, configurable aliases with filters/routing, etc), distribution, powerful facets.

I am already using account-based indices, plan to look into using aliases for non-disruptive reindexing, and there are plans to exploit the extra search features elasticsearch has.

You may also be the first to actually make use of the :wrapper option, since you could create your own, “enhanced” Item implementation.

Perhaps one that will create an ActiveRecord objects with the data from elasticsearch that simply act as a cache, but can still load data through relationships with other tables. I haven't looked into how necessary this would be for our use case though.

That may fly with ActiveRecord, but half of people using ElasticSearch/Tire are not using ActiveRecord-based models (me included), and moreover, I personally don't care about ActiveRecord a tiny little bit.

Well, I am doing my own eager loading now, so that's alright.

But yes, Tire.search does not expose the :load option

Check your codebase, because the option is passed through to Tire::Results::Collection. The ActiveModel integration obviously didn't make semantic sense for multitype search (e.g. I don't want to do Product.search when I also want to search for other models).

When we have the ability to return content at will from the engine, it is just wrong to load data from the database. I have yet to hear a compelling argument otherwise.

It was just meant to be a transitional step. A Tire::Results::Item obviously doesn't have the same interface as the model it represents.

If we're talking about Shopify, it's on par with other features where ElasticSearch makes sense for you: easy multitenancy (account-based indices, configurable aliases with filters/routing, etc), distribution, powerful facets.

I am already using account-based indices, plan to look into using aliases for non-disruptive reindexing, and there are plans to exploit the extra search features elasticsearch has.

You may also be the first to actually make use of the :wrapper option, since you could create your own, “enhanced” Item implementation.

Perhaps one that will create an ActiveRecord objects with the data from elasticsearch that simply act as a cache, but can still load data through relationships with other tables. I haven't looked into how necessary this would be for our use case though.

That may fly with ActiveRecord, but half of people using ElasticSearch/Tire are not using ActiveRecord-based models (me included), and moreover, I personally don't care about ActiveRecord a tiny little bit.

Well, I am doing my own eager loading now, so that's alright.

But yes, Tire.search does not expose the :load option

Check your codebase, because the option is passed through to Tire::Results::Collection. The ActiveModel integration obviously didn't make semantic sense for multitype search (e.g. I don't want to do Product.search when I also want to search for other models).

@karmi

This comment has been minimized.

Show comment Hide comment
@karmi

karmi Oct 28, 2011

Owner

 When we have the ability to return content at will from the engine, it is just wrong to load data from the database.

It was just meant to be a transitional step. A Tire::Results::Item obviously doesn't have the same interface as the model it represents.

Yeah, it does not. However, for the type of listing I imagine one would end up with, the free-form Item instances would still be the best. You could eg. display a title (of movie, TV show, article, ...), specific fields where applicable (author of article), and a link to the result, via the link_to helper. In a "results" listing like that, there's generally no need to have the full model accesible?

Perhaps one that (...) can still load data through relationships with other tables.

Actually, a custom Tire::Collection class could be able to do that. If Tire would expose that as an option, then it would be trivial to implement logic like yours cleanly. That's why I was saying that the whole results loading, parsing, etc. stinks in the current codebase, and should be refactored...

Owner

karmi commented Oct 28, 2011

 When we have the ability to return content at will from the engine, it is just wrong to load data from the database.

It was just meant to be a transitional step. A Tire::Results::Item obviously doesn't have the same interface as the model it represents.

Yeah, it does not. However, for the type of listing I imagine one would end up with, the free-form Item instances would still be the best. You could eg. display a title (of movie, TV show, article, ...), specific fields where applicable (author of article), and a link to the result, via the link_to helper. In a "results" listing like that, there's generally no need to have the full model accesible?

Perhaps one that (...) can still load data through relationships with other tables.

Actually, a custom Tire::Collection class could be able to do that. If Tire would expose that as an option, then it would be trivial to implement logic like yours cleanly. That's why I was saying that the whole results loading, parsing, etc. stinks in the current codebase, and should be refactored...

@karmi

This comment has been minimized.

Show comment Hide comment
@karmi

karmi Nov 4, 2011

Owner

Dylan, what should we do about this issue?

Owner

karmi commented Nov 4, 2011

Dylan, what should we do about this issue?

dylanahsmith added some commits Oct 21, 2011

Allow searches to span multiple types.
* Only uses one database query per type is used for eager loading.
* The time complexity of preserving the search result order
  in eager loading is now linear rather than quadratic.
Tolerate missing records when eager loading search results.
Missing records are tolerated since a record may be deleted then
searched for before the elasticsearch index is refreshed. Also,
the record may be deleted just after the search before the eager
loading request is processed.
Attempt to simplify the eager loading code.
Took advantage of some ActiveSupport methods, since ActiveSupport is
already a dependancy of ActiveModel.
@dylanahsmith

This comment has been minimized.

Show comment Hide comment
@dylanahsmith

dylanahsmith Nov 4, 2011

I added a commit to try to simplify the eager loading code. Is it still too complicated?

As for the use of where() instead of find(), should I revert that change for this pull request? I would like the code to tolerate missing records, but I couldn't find any implementation or documented interface for find in ActiveModel. What documentation do you use to make sure Tire is compatible with other ActiveModel based implementations?

I added a commit to try to simplify the eager loading code. Is it still too complicated?

As for the use of where() instead of find(), should I revert that change for this pull request? I would like the code to tolerate missing records, but I couldn't find any implementation or documented interface for find in ActiveModel. What documentation do you use to make sure Tire is compatible with other ActiveModel based implementations?

@karmi

This comment has been minimized.

Show comment Hide comment
@karmi

karmi Nov 6, 2011

Owner

Hi Dylan, I ended up let this feature pass, for the time being. The whole Tire::Results::Collection.results method is complicated as it is, and this change would made it even more opaque and fragile. This method must be refactored, logic extracted to smaller methods, etc., before we can attempt for anything like this.

After that, a feature like yours should be easily supported, and everybody could decide if it's something she would like to share in upstream, either in core or in contrib.

Also, the whole approach of loading records from database seems suspicious to me (as already stated). Normally, there should be no need to do that, and certainly not in a use case like "I want to display links to various stuff when people perfrom search".

I understand the approach of working with the "real models" all the time is more convenient, and people are used to it by using libraries such as ThinkingSphinx, but I think ElasticSearch deserves more care and experimentation...

Owner

karmi commented Nov 6, 2011

Hi Dylan, I ended up let this feature pass, for the time being. The whole Tire::Results::Collection.results method is complicated as it is, and this change would made it even more opaque and fragile. This method must be refactored, logic extracted to smaller methods, etc., before we can attempt for anything like this.

After that, a feature like yours should be easily supported, and everybody could decide if it's something she would like to share in upstream, either in core or in contrib.

Also, the whole approach of loading records from database seems suspicious to me (as already stated). Normally, there should be no need to do that, and certainly not in a use case like "I want to display links to various stuff when people perfrom search".

I understand the approach of working with the "real models" all the time is more convenient, and people are used to it by using libraries such as ThinkingSphinx, but I think ElasticSearch deserves more care and experimentation...

@karmi karmi closed this Nov 6, 2011

@allesklar

This comment has been minimized.

Show comment Hide comment
@allesklar

allesklar Dec 15, 2011

+1 to the important need to provide site wide searching ability out of the box.

I have no idea how to implement this though ;-)

+1 to the important need to provide site wide searching ability out of the box.

I have no idea how to implement this though ;-)

@karmi

This comment has been minimized.

Show comment Hide comment
@karmi

karmi Dec 16, 2011

Owner

ES/Tire make it easy to do multi-model searches, just do:

Tire.search ['articles', 'comments', 'whatever'] do
  query { string 'whatever' }
end

You just won't get the "real" models, but instances of Tire::Results::Item -- you have to take responsibility from there, and either work around that, or retrieve your models by yourself.

Owner

karmi commented Dec 16, 2011

ES/Tire make it easy to do multi-model searches, just do:

Tire.search ['articles', 'comments', 'whatever'] do
  query { string 'whatever' }
end

You just won't get the "real" models, but instances of Tire::Results::Item -- you have to take responsibility from there, and either work around that, or retrieve your models by yourself.

@karmi karmi reopened this Mar 22, 2012

@karmi karmi closed this in d5c08fb Mar 22, 2012

@luxflux

This comment has been minimized.

Show comment Hide comment
@luxflux

luxflux Apr 3, 2012

Maybe also mention the possibility for multi-model searches in the Readme. I had to use google to find here (or I just have tomatoes on my eyes...) :)

luxflux commented Apr 3, 2012

Maybe also mention the possibility for multi-model searches in the Readme. I had to use google to find here (or I just have tomatoes on my eyes...) :)

@karmi

This comment has been minimized.

Show comment Hide comment
@karmi

karmi Apr 4, 2012

Owner

@luxflux Yeah, I should put that into Readme.

Owner

karmi commented Apr 4, 2012

@luxflux Yeah, I should put that into Readme.

@magedmakled

This comment has been minimized.

Show comment Hide comment
@magedmakled

magedmakled Jul 15, 2012

Hi @karmi, I think you should something together in the readme for these two things

  1. multi-model search like it is mentioned in that post.
  2. how to implement association (has_many, belongs_to) in one of the models

Hi @karmi, I think you should something together in the readme for these two things

  1. multi-model search like it is mentioned in that post.
  2. how to implement association (has_many, belongs_to) in one of the models
@HiroProt

This comment has been minimized.

Show comment Hide comment
@HiroProt

HiroProt Feb 14, 2013

Is there a way to specify what to eagerly load based on the AR object that is returned? I am searching over multiple different models and would like to eagerly load different things based on the actual model.

Is there a way to specify what to eagerly load based on the AR object that is returned? I am searching over multiple different models and would like to eagerly load different things based on the actual model.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment