Add item-per-item iteration so that users don't need to manage "pages" of results #86
This pull request is motivated by the CPAN Pull Request Challenge, http://cpan-prc.org/.
Unfortunately, I am not a regular user of this module, so I'd welcome pointers to typical workflows or relevant reverse dependencies which I could add as tests.
Hello Fayland, you write:
Hello, thanks for the patch. Overall it looks very good, even though I didn't test it yet.
Fine! I'll complete the other submodules next week.
Just wondering: why do we need close_*?
I added these out of old habit - it depends on how the module is used. For command line scripts based on Net::GitHub you don't need close_*. But if you want to embed it into a web server, then the "open" queries would hang around forever. Not only would the result sets pile up in memory, they would also prevent a user from seeing the first issue if they didn't walk through the complete queue of issues the day before. I'll add that to the documentation.

An alternative would be to go object oriented and create result objects (similar to what Pithub::Result does). In that case, the query results would be dropped automatically when the result object goes out of scope. I could do this if you prefer it. It needs some refactoring, though, because each result-set object would need to be able to fetch new pages from the API.

By the way: if you call close_issue(...) and then next_issue() again, you don't consume GitHub API calls. The code will fetch the first page, which should still be in the cache.

-- Cheers, haj
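For illustration, a command line script might use the new iteration methods like this (a sketch; the repository name and the environment variable holding the token are made up for the example):

```perl
use strict;
use warnings;
use Net::GitHub::V3;

my $gh = Net::GitHub::V3->new( access_token => $ENV{GITHUB_TOKEN} );

# Hypothetical repository, for illustration only
$gh->set_default_user_repo( 'fayland', 'perl-net-github' );
my $issue = $gh->issue;

# Iterate item by item; pages are fetched behind the scenes
while ( my $i = $issue->next_issue ) {
    print "$i->{number}: $i->{title}\n";
}

# In a persistent process (e.g. a web server), release the open
# query so the next iteration starts from the first page again.
# A one-shot script can simply exit and skip this.
$issue->close_issue;
```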
A section on whether, when, and why the close_xxx methods may be needed has been added to lib/Net/GitHub/V3.pm, next to the existing section about pagination.
What is still open is pagination of search results (Search.pm), which has easy and tricky parts. It is easy to derive the next_ and close_ methods from what is already there. The JSON result comes as a hashref instead of an arrayref, so we could iterate over the list elements of the 'items' key with a few extra lines. However, the hash response is there for a reason: in addition to the items, it contains the total number of hits and a flag telling whether there was a timeout on the server side. I consider this information vital: iterating over millions of hits is a very bad idea, and interpreting incomplete results is a very bad idea as well. In either case, the caller needs to adjust their query.
So, a possible approach would be to die if the server reports incomplete results, and also to die if the total number of hits exceeds a defined threshold (but which threshold is good enough? It should probably be configurable on creation of the Search object...).
A second approach would store the two values in the Search object after the first next_repository (or next_issue, or ...). The user would need to call next_ once before they can evaluate the values and maybe abort the iteration.
As a third approach, we could add total_xxx and is_complete_xxx methods to each of issues, repositories, code and users.
And, of course, we could leave searching as it is. The page-per-page iteration works just fine, and the two values are returned in the page response so that the caller can (must!) check them.
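With the current page-per-page interface, the check a caller can (and should) perform might look roughly like this (a sketch; the query string and the threshold of 1000 are made up for the example):

```perl
use strict;
use warnings;
use Net::GitHub::V3;

my $gh     = Net::GitHub::V3->new( access_token => $ENV{GITHUB_TOKEN} );
my $search = $gh->search;

# One page of search results: a hashref, not an arrayref
my $result = $search->repositories( { q => 'net-github language:perl' } );

# These extra keys are the reason for the hashref response
die "Server reported incomplete results, refine the query\n"
    if $result->{incomplete_results};
die "Too many hits ($result->{total_count}), refine the query\n"
    if $result->{total_count} > 1000;    # threshold made up

for my $repo ( @{ $result->{items} } ) {
    print "$repo->{full_name}\n";
}
```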
Do you have any preferences? Maybe we should follow up in another pull request?