Better performance for pagination #130

Closed
dnagir opened this Issue · 6 comments

3 participants

@dnagir
Owner

Reasoning

The current pagination looks pretty inefficient. It loops through all the items to get to the page every time, be it 50 items or 5,000,000.

So it will start killing the application as we page further and further toward the end.

We also need (not sure if that's available already):

  • skip(N) - a way to efficiently skip the first N items
  • limit(N) - efficiently limit the result set to N items.
  • sort (probably already available) - an efficient way to sort.

UPDATE: Ruby already has Enumerable#drop to "skip" and Enumerable#first to "limit". So we may just stick to the standard Enumerable interface and optimise those methods where possible.
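
A minimal sketch of what that could look like, assuming a plain-Ruby wrapper (all names here are illustrative, not the gem's API): #drop and #first keep the Enumerable contract but record an offset/limit that a DB-backed source could translate into native skip/limit instead of iterating.

class LazyItems
  include Enumerable

  def initialize(source)
    @source, @offset, @max = source, 0, nil
  end

  # "skip": remember the offset; a DB-backed source would push this
  # down as a native skip instead of walking over the skipped items.
  def drop(n)
    @offset += n
    self
  end

  # "limit": stop enumerating after n items.
  def first(n)
    @max = n
    to_a
  end

  def each
    yielded = 0
    @source.each_with_index do |item, i|
      next if i < @offset                 # fall back to plain iteration here
      break if @max && yielded >= @max
      yield item
      yielded += 1
    end
  end
end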

Pagination and/or "infinite scrolling" can then be easily achieved by combining those methods:

items.sort(:name).skip(25000).limit(10), which is supposed to be efficient :)

Then we could just add a convenience method paginate where it makes sense. The implementation will look something like this:

def paginate(options = {})
  page     = options[:page] || 1
  per_page = options[:per_page] || 20
  # offset is (page - 1) * per_page: page 3 with 20 per page skips 40 items
  self.skip((page - 1) * per_page).limit(per_page)
end
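
For example (the page number and page size below are just illustrative):

# Page 3 with 20 per page skips the first 40 items and returns at most 20.
items.sort(:name).paginate(:page => 3, :per_page => 20)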

Ins

I tried to get rid of all the wp_query methods to see where they are used, and got failing specs in:

  • Neo4j::HasN::Mapping
  • Neo4j::Traversal::Traverser
  • Model itself (presumably based on <Model>.all)
  • Neo4j::Index::LuceneQuery
  • Neo4j::Rails::Relationships::NodesDSL
  • Neo4j::Rails::Relationships::RelsDSL

We'll need to implement a common interface for all of those (something like the current WillPaginate integration) that would leverage the underlying DB pagination support; a rough sketch follows.
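
As a very rough sketch (module and method names are made up, not the gem's actual API), the shared piece could be a mixin that prefers native skip/limit and falls back to plain Enumerable:

module Paginated
  def paginate(options = {})
    page     = options[:page] || 1
    per_page = options[:per_page] || 20
    offset   = (page - 1) * per_page
    if respond_to?(:skip) && respond_to?(:limit)
      skip(offset).limit(per_page)      # native DB-side pagination
    else
      drop(offset).first(per_page)      # plain Enumerable fallback
    end
  end
end

Each of the classes listed above would then just include Paginated and, where the underlying store supports it, expose an efficient skip/limit.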

A good start could be figuring out how "Creating a paged traverser" is implemented and taking that approach.

@dnagir
Owner

@andreasronge just continuing the discussion from #129 here.

Silly me, you're right about the reverse order. And I'm pretty sure Cypher supports sorting.

Could you please explain why the "supernode" pattern is bad design (maybe point me to some reading)?
The kvitter app, for example, uses User.has_n :tweeted.

The number of tweets can easily grow to hundreds of thousands just in a year.
Is this a bad design? How should this be changed?

@andreasronge

There have been performance problems in Neo4j when one node has millions of relationships (not sure it's 100% solved).
Also, the only way to find nodes is by traversing and visiting all of those millions of relationships, which is not good.
Instead, it's usually better to organize the node space more like a graph, e.g. group the millions of relationships into chunks by date range (i.e. an in-graph index). So User.has_n :tweeted will not scale well if you need to search all tweets. Depending on the query, you should instead organize it as an in-graph index, or simply use User.has_list :tweeted, or build your own linked list (I think Neo4j.rb should support linked lists, or maybe a different gem should).
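
Just to illustrate the idea (the node API calls below are approximate, not a recipe from the gem's docs): instead of hanging every tweet directly off the user, group them under per-month nodes so a query only has to traverse one month's relationships.

Neo4j::Transaction.run do
  # find (or create) the in-graph index node for the tweet's month
  month = user.outgoing(:tweeted_in).find { |n| n[:month] == '2012-03' }
  unless month
    month = Neo4j::Node.new(:month => '2012-03')
    user.outgoing(:tweeted_in) << month
  end
  month.outgoing(:tweeted) << tweet
end

Reading "tweets from March 2012" then only traverses that month's relationships instead of all of the user's tweets.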

@dnagir
Owner

I didn't know about has_list (I thought it was just an alias for has_n and didn't look at it :) )
BTW, the TimelineIndex seems to be non-thread-safe.

@andreasronge

It used to be non-thread-safe, but I think it is thread-safe now. It now uses Lucene instead of an in-graph index (a linked list).

@dnagir
Owner

I think we can move this out of scope of 2.0?

@andreasronge

Done, moved out of 2.0.0
