New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HydratingPaginatorAdapter is slow with big data #7

Closed
sasquatch-mc opened this Issue Jul 12, 2013 · 11 comments

Comments

Projects
None yet
3 participants
@sasquatch-mc

sasquatch-mc commented Jul 12, 2013

My team recently needed to implement similar paginator adapter for the old Zend_Paginator and at first we've started with creating almost the same adapter as HydratingPaginatorAdapter. The problem is that if you have a lot of data in a collection (let's say about 1 milion records) using the skip() cursor method is performance killer. This approach is good if you have small amount of data, but I personaly do not recommend it if you need to handle big data.

The best solution so far (at least for us) for this problem is described here http://stackoverflow.com/questions/9703319/mongodb-ranged-pagination - the first answer.

@manchuck

This comment has been minimized.

Show comment
Hide comment
@manchuck

manchuck Mar 8, 2014

Contributor

The solution presented in stackoverflow would work only when you are using MongoId

Contributor

manchuck commented Mar 8, 2014

The solution presented in stackoverflow would work only when you are using MongoId

@manchuck

This comment has been minimized.

Show comment
Hide comment
@manchuck

manchuck Mar 10, 2014

Contributor

After some testing. I was able to get this working with all types of _id's. I created a ranged paginator adapter which requires passing in the current Id from the request. checkout #13 for more information

Contributor

manchuck commented Mar 10, 2014

After some testing. I was able to get this working with all types of _id's. I created a ranged paginator adapter which requires passing in the current Id from the request. checkout #13 for more information

@sasquatch-mc

This comment has been minimized.

Show comment
Hide comment
@sasquatch-mc

sasquatch-mc Mar 10, 2014

@manchuck Hello. I've read the code from your pull request and do you think that this will work with going back in pages? For example - I'm on page 4 and I want to go back to page 2. Also the example will work for going to the next page, but if you want to skip 2 pages ahead? Our solution includes additional params (like step from the page) which solves that, but I didn't get the time to submit it.

sasquatch-mc commented Mar 10, 2014

@manchuck Hello. I've read the code from your pull request and do you think that this will work with going back in pages? For example - I'm on page 4 and I want to go back to page 2. Also the example will work for going to the next page, but if you want to skip 2 pages ahead? Our solution includes additional params (like step from the page) which solves that, but I didn't get the time to submit it.

@manchuck

This comment has been minimized.

Show comment
Hide comment
@manchuck

manchuck Mar 12, 2014

Contributor

@sasquatch-mc After looking through the Zend\Paginator package, the paginator only cares about the current page. Finding out what the next or previous pages, would be a responsibility of the pagination. Zend\Paginator would have to be updated to create the correct pages (see Zend\Paginator\Paginator::_createPages). We would also need to have new view helpers to append the Id's to the end of the URL's that are created. It appears that the adapter itself just has to worry about getting the items it needs to return back to the Paginator.

Contributor

manchuck commented Mar 12, 2014

@sasquatch-mc After looking through the Zend\Paginator package, the paginator only cares about the current page. Finding out what the next or previous pages, would be a responsibility of the pagination. Zend\Paginator would have to be updated to create the correct pages (see Zend\Paginator\Paginator::_createPages). We would also need to have new view helpers to append the Id's to the end of the URL's that are created. It appears that the adapter itself just has to worry about getting the items it needs to return back to the Paginator.

@weierophinney

This comment has been minimized.

Show comment
Hide comment
@weierophinney

weierophinney Mar 12, 2014

Contributor

@manchuck and @sasquatch-mc -- Chuck is correct -- the adapters are passed the current page offset and the number of items to return per page. If the adapter is able to determine the cursor offset to start on based on the page and items per page, then everything's set; you wouldn't need to change anything else in the paginator at all.

If not, however... that's another story. I'm not pretending to understand how ranged pagination is supposed to work at this point.

Contributor

weierophinney commented Mar 12, 2014

@manchuck and @sasquatch-mc -- Chuck is correct -- the adapters are passed the current page offset and the number of items to return per page. If the adapter is able to determine the cursor offset to start on based on the page and items per page, then everything's set; you wouldn't need to change anything else in the paginator at all.

If not, however... that's another story. I'm not pretending to understand how ranged pagination is supposed to work at this point.

@sasquatch-mc

This comment has been minimized.

Show comment
Hide comment
@sasquatch-mc

sasquatch-mc Mar 12, 2014

@manchuck Hi. I think you misunderstood the issue here. With your current code you can't go back in pages. For example I'm on page 4 and I want to go to page 3 or 2. You are using $min, but is it always $min? I think not. The code you have written only goes forward, but not backwards. And actually it has issue with going forward also - you can't skip page. We've dealt with this problem half an year ago.

Right now me and my teammate are rewriting our adapter (we don't really have a lot of time to spear around this) in a fork of this repository, but here is a summary of our ideas, just to include them in the discussion:

Going forward in pages

If we are on page 1 and we want to go page 2 is simple as in your pull request - we just add limit and we set a starting ID.

However, if we want to go from page 1 to page 3 we need a little extra. The starting ID will be the same as in the first scenario, but we don't need all results from the previous pages, we want only the results on page 3.
In short - we need to skip some results. Actually skip is not bad, if you use it wisely - in our case there is nothing bad to use with small amount of rows.

We've introduced a new parameters that we've named simply a "step". The step contains the number of pages to jump, from which the skip will be calculated. So if you are currently on page 1 and you want to go to page 2, the step will be equal to 1. For page 3 the step will 2 and so on. The step is calculated from the point of view of the current page, so if we are on page 56, the step for page 57 will be 1.

When we get the step parameter we can calculate the skip value. If we have 5 items per page and we go from page 1 to page 3 the skip will be 10. We skip 10 items and our result is the next five (e.g from record 10 to 15).

However, there are some exceptions. Since our main concern is performance, we can't go from page 1 to page 1000. The step will be 999, which will generate a big skip. So we just don't have that option in our projects and we limit the number of pages from which you can choose to 10 or 20. Also we have validation which limits the step size. Keep in mind that Mongo is not MySQL and we can't do all the things we did with MySQL and other RDBS.

Going back in pages

In the first section I've introduced the step parameter. We use it as a positive value for going forward, and similar to that we going to use it in the same fashion, but with negative values for going back. E.g. going from page 5 to page 3 will have step = -2. Since we are going in a reverse direction, we need to change some things with our query as well.

First of all we need to sort it in reverse. Why is that?

If we just select the first 5 records which have ID lower then the given one, Mongo is going to select them from the beginning of the collection.
Example: We select with limit 5 records lower then ID 5320c9c36abfb2376d2b8228. Let's assume that this record is somewhere in the middle of our collection. If we just use $lt and limit to 5, we are going to get the first 5 rows from the first page.

Solution: We need to reverse the selection by using sort({_id: -1}). This will reverse the order of the records and mongo is going to "walk" in reverse way. This is going to introduce one more issue, but more on that later.

So in our second use case for going backwards we'll need to use $lt. The skip will be used only with it's absolute value for the calculation of the skip.

The bad thing about reverse order

We have one very frustrating problem caused by sort(). Since we must use it to go back in page, it still returns the correct results, but in reversed order.
If we let's say "Title 1", "Title 2", "Title 3", "Title 4", the usage of sort() is going to reverse them. Which is really painful.

The way we go around this is to convert the MongoCursor into array and use array_reverse (yeah, yeah I know what a sin this is, but we don't have a better solution)

Please if somebody has better ideas or suggestions to simplify all of this - we are listening. Soon we are going to make a pull request with our version of this thing.

@manchuck About the helper - yeah we are building one too.

Hope this all makes sense.

sasquatch-mc commented Mar 12, 2014

@manchuck Hi. I think you misunderstood the issue here. With your current code you can't go back in pages. For example I'm on page 4 and I want to go to page 3 or 2. You are using $min, but is it always $min? I think not. The code you have written only goes forward, but not backwards. And actually it has issue with going forward also - you can't skip page. We've dealt with this problem half an year ago.

Right now me and my teammate are rewriting our adapter (we don't really have a lot of time to spear around this) in a fork of this repository, but here is a summary of our ideas, just to include them in the discussion:

Going forward in pages

If we are on page 1 and we want to go page 2 is simple as in your pull request - we just add limit and we set a starting ID.

However, if we want to go from page 1 to page 3 we need a little extra. The starting ID will be the same as in the first scenario, but we don't need all results from the previous pages, we want only the results on page 3.
In short - we need to skip some results. Actually skip is not bad, if you use it wisely - in our case there is nothing bad to use with small amount of rows.

We've introduced a new parameters that we've named simply a "step". The step contains the number of pages to jump, from which the skip will be calculated. So if you are currently on page 1 and you want to go to page 2, the step will be equal to 1. For page 3 the step will 2 and so on. The step is calculated from the point of view of the current page, so if we are on page 56, the step for page 57 will be 1.

When we get the step parameter we can calculate the skip value. If we have 5 items per page and we go from page 1 to page 3 the skip will be 10. We skip 10 items and our result is the next five (e.g from record 10 to 15).

However, there are some exceptions. Since our main concern is performance, we can't go from page 1 to page 1000. The step will be 999, which will generate a big skip. So we just don't have that option in our projects and we limit the number of pages from which you can choose to 10 or 20. Also we have validation which limits the step size. Keep in mind that Mongo is not MySQL and we can't do all the things we did with MySQL and other RDBS.

Going back in pages

In the first section I've introduced the step parameter. We use it as a positive value for going forward, and similar to that we going to use it in the same fashion, but with negative values for going back. E.g. going from page 5 to page 3 will have step = -2. Since we are going in a reverse direction, we need to change some things with our query as well.

First of all we need to sort it in reverse. Why is that?

If we just select the first 5 records which have ID lower then the given one, Mongo is going to select them from the beginning of the collection.
Example: We select with limit 5 records lower then ID 5320c9c36abfb2376d2b8228. Let's assume that this record is somewhere in the middle of our collection. If we just use $lt and limit to 5, we are going to get the first 5 rows from the first page.

Solution: We need to reverse the selection by using sort({_id: -1}). This will reverse the order of the records and mongo is going to "walk" in reverse way. This is going to introduce one more issue, but more on that later.

So in our second use case for going backwards we'll need to use $lt. The skip will be used only with it's absolute value for the calculation of the skip.

The bad thing about reverse order

We have one very frustrating problem caused by sort(). Since we must use it to go back in page, it still returns the correct results, but in reversed order.
If we let's say "Title 1", "Title 2", "Title 3", "Title 4", the usage of sort() is going to reverse them. Which is really painful.

The way we go around this is to convert the MongoCursor into array and use array_reverse (yeah, yeah I know what a sin this is, but we don't have a better solution)

Please if somebody has better ideas or suggestions to simplify all of this - we are listening. Soon we are going to make a pull request with our version of this thing.

@manchuck About the helper - yeah we are building one too.

Hope this all makes sense.

@manchuck

This comment has been minimized.

Show comment
Hide comment
@manchuck

manchuck Mar 13, 2014

Contributor

I thought on the plane ride how we might be able to resolve this. After looking thought the DB adapters, none of them are using ranged based queries so the same problem might exist with those. The Idea I have is to make a Zend\Paginator\RangedAdapterInterface which will have the function, getIdForPage($page); The Adapter can then build the correct starting ID for those pages catered to that table. I have not looked though the views that much, however, I have not seen restrictions for the views to be numerical. It should work that if we use the page as the current id. This would require the zend paginator to be changed which would be a greater discussion. Once I get that going, ill reference the PR for it

Contributor

manchuck commented Mar 13, 2014

I thought on the plane ride how we might be able to resolve this. After looking thought the DB adapters, none of them are using ranged based queries so the same problem might exist with those. The Idea I have is to make a Zend\Paginator\RangedAdapterInterface which will have the function, getIdForPage($page); The Adapter can then build the correct starting ID for those pages catered to that table. I have not looked though the views that much, however, I have not seen restrictions for the views to be numerical. It should work that if we use the page as the current id. This would require the zend paginator to be changed which would be a greater discussion. Once I get that going, ill reference the PR for it

@sasquatch-mc

This comment has been minimized.

Show comment
Hide comment
@sasquatch-mc

sasquatch-mc Mar 13, 2014

@manchuck Do you think that this will be performance wise decision? If we display 20 pages in the pagination control this means 20 queries to the database just to get the appropriate IDs. And also how exactly do you plan to get to the id of the first row of each of those pages by page number? Isn't that including skip with again big numbers? Add to that the number of queries, and if you are advanced in your browsing each of those queries will take too long to load.

Our solution is faster and it doesn't include changing the whole Zend\Paginator and it includes less extra parameters.

In your decision the displaying of the results is ok, but finding from where to start is a bottleneck, which I don't see how is going to be solved.

Cheers.

sasquatch-mc commented Mar 13, 2014

@manchuck Do you think that this will be performance wise decision? If we display 20 pages in the pagination control this means 20 queries to the database just to get the appropriate IDs. And also how exactly do you plan to get to the id of the first row of each of those pages by page number? Isn't that including skip with again big numbers? Add to that the number of queries, and if you are advanced in your browsing each of those queries will take too long to load.

Our solution is faster and it doesn't include changing the whole Zend\Paginator and it includes less extra parameters.

In your decision the displaying of the results is ok, but finding from where to start is a bottleneck, which I don't see how is going to be solved.

Cheers.

@manchuck

This comment has been minimized.

Show comment
Hide comment
@manchuck

manchuck Mar 17, 2014

Contributor

@sasquatch-mc I the solution you came up with, will work will when you are just using the Mongo Id for the PK. In order to solve the issue for a generic manner, will require some work. The project I'm working on now, uses product skus for the _id. Since the SKU's can be strings, I cannot reasonably take the starting index and add or subtract from those. Also, some collections might use compound keys. My idea would involve having a new Adapter interface that the paginator will call to create the starting Id's for the page. The interface will just have the signature


use Zend\Paginator\Adapter;

interface RangedAdapter
{
    public function getIdForPage(<int> $page, <mixed> $id);
}

All the paginator will just pass in the ids from the pages and will let the adapter decide how to make the next id. I am looking though all the code for the pagniator to see if the page, prev, next, etc. needs an int passed back. If not, then the "page" parameter can just be used as the starting id. That should resolve most of the use cases except for compound index's.

Mongo is smart in its fetching of data, where it will not fetch to the end of the collection when a mongo cursor is created (IIRC). I am also not familiar with the fetching procedures of all RDBMS, so I am not sure if that is the case with them all. If that were not the case, it would mean that the where clause would have to use BETWEEN instead of GREATER THAN.

Right now I am set up the ranged adapter and making the changes to Pagniator to reflect. I should have it done by this weekend (I have a midterm this week so I am going crazy over that). After that we can see how the community will respond to the changes and go from there.

Hope that clears up what I was thinking

Contributor

manchuck commented Mar 17, 2014

@sasquatch-mc I the solution you came up with, will work will when you are just using the Mongo Id for the PK. In order to solve the issue for a generic manner, will require some work. The project I'm working on now, uses product skus for the _id. Since the SKU's can be strings, I cannot reasonably take the starting index and add or subtract from those. Also, some collections might use compound keys. My idea would involve having a new Adapter interface that the paginator will call to create the starting Id's for the page. The interface will just have the signature


use Zend\Paginator\Adapter;

interface RangedAdapter
{
    public function getIdForPage(<int> $page, <mixed> $id);
}

All the paginator will just pass in the ids from the pages and will let the adapter decide how to make the next id. I am looking though all the code for the pagniator to see if the page, prev, next, etc. needs an int passed back. If not, then the "page" parameter can just be used as the starting id. That should resolve most of the use cases except for compound index's.

Mongo is smart in its fetching of data, where it will not fetch to the end of the collection when a mongo cursor is created (IIRC). I am also not familiar with the fetching procedures of all RDBMS, so I am not sure if that is the case with them all. If that were not the case, it would mean that the where clause would have to use BETWEEN instead of GREATER THAN.

Right now I am set up the ranged adapter and making the changes to Pagniator to reflect. I should have it done by this weekend (I have a midterm this week so I am going crazy over that). After that we can see how the community will respond to the changes and go from there.

Hope that clears up what I was thinking

@manchuck

This comment has been minimized.

Show comment
Hide comment
@manchuck

manchuck Apr 9, 2014

Contributor

@sasquatch-mc you were correct about my solution not working properly. After trying to implement something to handle this in the framework, I found no solution that is generic enough to fit inside zend framework. Believe me I tried but consider the following:

Imagine we have the following ordered set (Im using ints to keep the math simple):

1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20

If I only want 2 records per page and page 5, the math is simple:

$startIndex = $perPage * $page
// 10 = 2 * 5;

Now consider the following when records are removed:

1   2   4   5   7   8   9   10  11  12  13  14  15  16  17  18  19  20

The starting index will still be calculated as 10 but in reality, the starting index should be 12. I tried to come up with ways around this but when the set is not normally distributed, the math will not work out.

I took a look at what Google does to handle this. I made a call to get a list of products from Google Shopping and noticed that there are only links for self and next. Next has something like: https://content.googleapis.com/content/v1/********items/products/schema?start-token=3a11ffc3cf8a1f3&max-results=10. It looks like what Google will only allow you to move forward in the feed rather then transverse backwards.

I know that is not the solution to your problem but perhaps this might help:

Upon some more digging into how Google handles this, some other feeds they have will allow you to set start-index to a number. What you could do is cache what the position is for each id in your adapter in the getItems. You can even go smart and find the nearest id and start walking from that point.

Hopefully that will help but I am stumped as to a solution for this ATM 😳 😕

Contributor

manchuck commented Apr 9, 2014

@sasquatch-mc you were correct about my solution not working properly. After trying to implement something to handle this in the framework, I found no solution that is generic enough to fit inside zend framework. Believe me I tried but consider the following:

Imagine we have the following ordered set (Im using ints to keep the math simple):

1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20

If I only want 2 records per page and page 5, the math is simple:

$startIndex = $perPage * $page
// 10 = 2 * 5;

Now consider the following when records are removed:

1   2   4   5   7   8   9   10  11  12  13  14  15  16  17  18  19  20

The starting index will still be calculated as 10 but in reality, the starting index should be 12. I tried to come up with ways around this but when the set is not normally distributed, the math will not work out.

I took a look at what Google does to handle this. I made a call to get a list of products from Google Shopping and noticed that there are only links for self and next. Next has something like: https://content.googleapis.com/content/v1/********items/products/schema?start-token=3a11ffc3cf8a1f3&max-results=10. It looks like what Google will only allow you to move forward in the feed rather then transverse backwards.

I know that is not the solution to your problem but perhaps this might help:

Upon some more digging into how Google handles this, some other feeds they have will allow you to set start-index to a number. What you could do is cache what the position is for each id in your adapter in the getItems. You can even go smart and find the nearest id and start walking from that point.

Hopefully that will help but I am stumped as to a solution for this ATM 😳 😕

@weierophinney

This comment has been minimized.

Show comment
Hide comment
@weierophinney

weierophinney Jul 17, 2014

Contributor

Closing. From the discussion, it appears that the current solution is fine as a generic solution, and if users have specialized needs (performance, specialized sorting, etc.), they can create a custom adapter and/or view for use with their collection.

Contributor

weierophinney commented Jul 17, 2014

Closing. From the discussion, it appears that the current solution is fine as a generic solution, and if users have specialized needs (performance, specialized sorting, etc.), they can create a custom adapter and/or view for use with their collection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment