Feature Request: class method to force delta index a set of items #29

Open
mipearson opened this Issue Dec 27, 2012 · 5 comments


@mipearson

Not sure if this is already implemented - couldn't find it. Also not sure if it should go here or in the main TS project.

Anyways. I have a piece of code that looks like this now:

  def self.reindex relation
    relation.update_all(:delta => true)
    # We sometimes don't have an active sphinx index - eg tests.
    sphinx_indexes.first.try(:delta_object).try(:index, self)
  end

There are two situations in which I'll call this: either I've done a mass-column update on a 'displayed' flag on that model, or an object associated with the model has changed, and that association has many children.

Calling 'save' on each individual instance will take too long. Also, this happens frequently enough that forcing a full reindex each time isn't feasible.

I'd be happy to put in a PR for this if you can give me a bit of guidance on how you'd like it implemented.

@pat
Owner
pat commented Dec 28, 2012

Hi Michael

Appreciate the offer for help - though this sounds a little like suspended deltas. However, if you feel it differs, there are a few things to note:

  • A patch is certainly welcome.
  • I generally want to keep new features to the 3.x releases - which is currently in the edge branch.
  • While you only need to fire the indexing portion once, it is perhaps useful to mark all core records as deleted, so only the delta records are considered for searches. This is where things get a little tricky, but you could get all primary key values through a single SQL query to generate the underlying SphinxQL query.
  • This would mean the code for flagging as deleted should be shifted out of ThinkingSphinx::Deltas::DefaultDelta#delete and into something a little more reusable (and then that gets used from within the original delete method).
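As a rough illustration of the "single SphinxQL query" idea above, the flag-as-deleted step could be collapsed into one statement per index. This is only a sketch: the method name is hypothetical, and it assumes the conventional `sphinx_deleted` attribute plus a list of document ids gathered via a single SQL query.

```ruby
# Hypothetical helper: build one SphinxQL statement that flags a whole
# batch of core records as deleted. The name `flag_as_deleted_statement`
# is illustrative, not part of the Thinking Sphinx API.
def flag_as_deleted_statement(index_name, document_ids)
  "UPDATE #{index_name} SET sphinx_deleted = 1 " \
    "WHERE id IN (#{document_ids.join(', ')})"
end

statement = flag_as_deleted_statement('article_core', [11, 14, 17])
# The resulting statement would then be sent to Sphinx over its
# MySQL-protocol connection.
```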

Now, all this would mean you'll probably want to start using the 3.x branch, which is not a simple change - the README in that branch covers the differences, and is definitely worth a read.

Let me know what you think :)

Pat

@mipearson

Hi Pat,

It's different to suspended deltas in that I'm specifically avoiding my model's (lengthy) save logic, not just the creation of n*1000 DelayedJob instances.

I didn't think about what's going on in the core index. Guh.

What sort of timeline is 3.x on for release? Right now my "patches I need to tidy up" list is kinda long, so I'd like to wait until 3.x drops, and we know that we can migrate to it, before I work on code for BikeExchange that will only be of benefit in 3.x :)

@pat
Owner
pat commented Dec 29, 2012

With a little bit of luck, I'm hoping to have 3.0.0 out within the next week :) I pushed a release candidate about a week ago.


@mipearson

Awesome.

Here's a new re-index method that's a little cleaner & handles deletion from the core:

  def self.reindex relation
    # Flag every record in the relation as a delta in one UPDATE,
    # skipping per-instance save callbacks.
    relation.update_all(:delta => true)
    sphinx_document_ids = relation.map(&:sphinx_document_id)

    # Rebuild the delta index once for the whole batch.
    Delayed::Job.enqueue(ThinkingSphinx::Deltas::DeltaJob.new(delta_index_names))

    # Flag each record as deleted in the core index, so searches only
    # see the fresh delta copies.
    sphinx_document_ids.each do |document_id|
      Delayed::Job.enqueue(
        ThinkingSphinx::Deltas::FlagAsDeletedJob.new(core_index_names, document_id)
      )
    end
  end
@pat
Owner
pat commented Dec 31, 2012

I'm hoping TS v3 will be out in the next couple of days, all going to plan.

And I think with this patch, we could get it working so it's a single SQL query to get the sphinx_document_id values, and have the deletion job handle multiple ids at once, with a single statement sent to Sphinx. Keep it nice and concise. Will perhaps need a patch to the SphinxQL section of Riddle though.
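A deletion job along those lines might look like the following. This is purely a sketch of the batching idea, not the released API: the class name, and the notion of a `statements` method producing one SphinxQL string per index, are assumptions.

```ruby
# Sketch of a deletion job that accepts many document ids at once and
# produces a single SphinxQL statement per index, rather than enqueuing
# one job per document id. Class and method names are hypothetical.
class BatchedFlagAsDeletedJob
  def initialize(index_names, document_ids)
    @index_names  = index_names
    @document_ids = document_ids
  end

  # One UPDATE statement per index covering every id in the batch.
  def statements
    @index_names.map do |index|
      "UPDATE #{index} SET sphinx_deleted = 1 " \
        "WHERE id IN (#{@document_ids.join(', ')})"
    end
  end
end

job = BatchedFlagAsDeletedJob.new(['article_core'], [3, 5, 8])
# Each statement in job.statements would be sent to Sphinx in a single
# round trip, instead of one per document.
```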
