missing post count in new api #590

Closed
ghostrigger opened this issue Feb 23, 2013 · 16 comments

Comments

@ghostrigger
Contributor

The post count is missing from the new posts.xml (example: http://danbooru.donmai.us/posts.xml?tags=touhou+bunny_ears), whereas the old post/index.xml (example: http://danbooru.donmai.us/post/index.xml?&tags=touhou+bunny_ears) had one.

Somehow this reminds me of #207, but the counts seem to be very accurate now. One advantage of a post count is that it gives an estimate of how many pages to fetch. Without it, are we left with an iterative solution that keeps fetching until there are no more elements?
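With a total count, the page estimate is just a ceiling division of the count by the per-page limit. A minimal Python sketch; pages_needed is a hypothetical helper name, and 1359 is simply the count quoted later in this thread:

```python
def pages_needed(total_count: int, limit: int) -> int:
    """Pages required to fetch total_count posts at limit posts per page (ceiling division)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (total_count + limit - 1) // limit

print(pages_needed(1359, 100))  # 14
```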

@ROMaster2

While /tags can give the count for a single tag, searches with more than one tag would need posts.xml to report the count.
Otherwise, the only workarounds are to raise the page limit high enough to see the last page, or to fetch every page until you hit empty ones.
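The fetch-every-page approach can be sketched as below, assuming a hypothetical fetch_page(page, limit) callable that wraps one posts.xml or posts.json request. It also stops on a short page, on the assumption that every page except the last comes back with exactly limit entries, which saves the final empty request:

```python
from typing import Callable, List

def fetch_all(fetch_page: Callable[[int, int], List[dict]], limit: int = 100) -> List[dict]:
    """Fetch pages 1, 2, 3, ... until a page comes back empty or short.

    fetch_page(page, limit) is a hypothetical stand-in for one API request.
    """
    results: List[dict] = []
    page = 1
    while True:
        batch = fetch_page(page, limit)
        if not batch:
            break  # an empty page means we have gone past the last result
        results.extend(batch)
        if len(batch) < limit:
            break  # a short page has to be the last one
        page += 1
    return results

# In-memory stand-in for the API: 250 posts, newest first.
posts = [{"id": i} for i in range(250, 0, -1)]

def fake_fetch(page: int, limit: int) -> List[dict]:
    start = (page - 1) * limit
    return posts[start:start + limit]

print(len(fetch_all(fake_fetch)))  # 250, after 3 requests
```

At roughly one second per request, the cost grows linearly with the number of pages, which is the slowness raised further down the thread.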

@RaisingK
Collaborator

RaisingK commented Mar 2, 2013

Binary search over the page number combined with limit=100 seems like the only solution.
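A sketch of that binary search, assuming a hypothetical fetch_page(page, limit) callable and a page cap (max_page) of 1000; the cap is an assumption matching the 100,000-result remark (1000 pages of 100). The last non-empty page is found in O(log max_page) requests:

```python
from typing import Callable, Sequence

def count_by_binary_search(fetch_page: Callable[[int, int], Sequence],
                           limit: int = 100, max_page: int = 1000) -> int:
    """Count results by binary-searching for the last non-empty page.

    fetch_page(page, limit) is a hypothetical wrapper around one API request.
    Results beyond max_page * limit cannot be requested and are not counted.
    """
    lo, hi = 0, max_page  # pages 1..lo are known to be full; the last page is in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        batch = fetch_page(mid, limit)
        if len(batch) == 0:
            hi = mid - 1  # mid is past the end
        elif len(batch) < limit:
            return (mid - 1) * limit + len(batch)  # a short page must be the last one
        else:
            lo = mid  # mid is full, so the end is at mid or later
    return lo * limit  # every page up to lo was full

posts = list(range(250))

def fake_fetch(page: int, limit: int) -> Sequence[int]:
    return posts[(page - 1) * limit : page * limit]

print(count_by_binary_search(fake_fetch))  # 250
```

Because nothing past page max_page can be requested, the result saturates at max_page * limit, which is exactly the limitation raised in the next comment.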

@ROMaster2

But if there are more than 100,000 results, you won't be able to tell the exact number.

@r888888888
Collaborator

There's no easy way for me to implement this without copy-pasting a lot of code. I could stuff the total page count in the post elements but that's inelegant. Is there any reason why the iterative approach doesn't work?

@spillerrec

In practice, iterating over the posts is simply too slow. From my location a request takes about 1 second to complete, so even 5-10 requests become far too slow. (Of course you could fire off requests in parallel, but that is not elegant, and asynchronous programming is probably too tricky for beginners.)
I would rather have a total post count on every post than have to make 10 requests to find it.

I currently use the post count in two ways: first, to display it to the user, and second, as an aid to controlling my partial cache. Neither justifies the extra requests: they would make the UI less responsive, and the whole point of the cache was to make fewer requests.

@r888888888
Collaborator

One thing I can do is pass the count in an HTTP response header, as X-Total-Pages or some equivalent.
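On the client side, such a header could be read as in the sketch below. X-Total-Pages is only the name floated in this comment, so treat it as hypothetical; the lookup is case-insensitive because HTTP header names are:

```python
from typing import Mapping, Optional

def total_pages_from_headers(headers: Mapping[str, str],
                             name: str = "X-Total-Pages") -> Optional[int]:
    """Return the page count from a (hypothetical) X-Total-Pages header, or None."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:  # HTTP header names are case-insensitive
            try:
                return int(value.strip())
            except ValueError:
                return None  # header present but not a number
    return None

# With urllib, resp.headers (an http.client.HTTPMessage) also provides .items():
# from urllib.request import urlopen
# with urlopen(url) as resp:
#     pages = total_pages_from_headers(resp.headers)
print(total_pages_from_headers({"x-total-pages": "14"}))  # 14
```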

@ghostrigger
Contributor Author

In my view, the total post count should be declared only once, included in the server's reply as an XML element or JSON attribute. Declaring it in every post entry would require continuous updating to stay accurate, which could even slow things down. By the way, is API access still throttled and limited to 1 request per second per user?

@spillerrec

I would also prefer to have it directly in the XML and JSON data.
If retrieving the post count is an expensive operation, I think it would be better to add a parameter like "show_post_count" that defaults to false. That way it is only included when people are actually interested in it. (I fail to see why it would be an expensive operation, though, when there are no issues including it in the HTML version.)

I think the throttling was removed after a short period, though I haven't tested it. I no longer need to provide any user information, so at most it would be enforced per IP.

@Lightforger
Contributor

This might have been what RaisingK meant, but by requesting the raw HTML page (&limit=200) and extracting the number from the paginator, you can get a fairly accurate result in a single call.
Very few two-tag searches return more than 200,000 results.
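A sketch of the paginator trick, assuming the paginator's links carry a page=N query parameter and that the highest one is the last page (the real markup may differ). From the last page number alone, the count is only known to within one page, so the function returns a (low, high) range:

```python
import re
from typing import Optional, Tuple

# Matches page=N in paginator links; \b keeps it from matching e.g. "per_page=".
PAGE_PARAM = re.compile(r"\bpage=(\d+)")

def estimate_from_paginator(html: str, limit: int = 200) -> Optional[Tuple[int, int]]:
    """Return (low, high) bounds on the result count, or None if no page links are found.

    Assumes the highest page=N link in the paginator points at the last page.
    """
    pages = [int(n) for n in PAGE_PARAM.findall(html)]
    if not pages:
        return None  # no paginator links: everything fits on the page itself
    last = max(pages)
    # Every page before the last is full; the last holds between 1 and limit posts.
    return (last - 1) * limit + 1, last * limit

html = '<a href="/posts?page=2&amp;tags=a+b">2</a> <a href="/posts?page=7&amp;tags=a+b">7</a>'
print(estimate_from_paginator(html))  # (1201, 1400)
```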

@pipian

pipian commented Apr 15, 2013

Since we already have a rough count from ::Post.fast_count() attached to the paginated @posts array thanks to the implementation of total_count in the paginator module, wouldn't it be sufficient to just add a reimplementation of Array#to_xml in the paginator module to shim in an extra "total-count" attribute on the root "posts" element? (Extra bonus: you'd get a "total-count" attribute on all other paginated XML files)
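To illustrate the response shape pipian describes (this is not the Rails implementation, which would live in the paginator module), here is a Python sketch with xml.etree.ElementTree that puts a total-count attribute on the root posts element and reads it back. The per-post child elements are a simplified assumption:

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Mapping, Optional

def posts_to_xml(posts: Iterable[Mapping[str, object]], total_count: int) -> str:
    """Serialize posts under a root <posts> element carrying a total-count attribute."""
    root = ET.Element("posts", {"total-count": str(total_count)})
    for post in posts:
        post_el = ET.SubElement(root, "post")
        for key, value in post.items():
            ET.SubElement(post_el, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

def read_total_count(xml_text: str) -> Optional[int]:
    """Client side: read total-count from the root element, if present."""
    value = ET.fromstring(xml_text).get("total-count")
    return int(value) if value is not None else None

xml_text = posts_to_xml([{"id": 1}, {"id": 2}], total_count=1359)
print(xml_text)
# <posts total-count="1359"><post><id>1</id></post><post><id>2</id></post></posts>
print(read_total_count(xml_text))  # 1359
```

Clients that ignore unknown attributes would keep working, which is part of the appeal of putting the count on the root element rather than on every post.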

@RaisingK
Collaborator

My algorithm is basically:

Using page=MAX_PAGE, limit=MAX_LIMIT, and the id metatag, query the API until a query with no results is returned
If one of the queries returns more than 0 but less than MAX_LIMIT results, return current total
for( page = 2 ^ ceil( log2(MAX_PAGE) - 1 ); page > 0; page /= 2 )
{
    Get next query with page=page, limit=MAX_LIMIT, and id metatag
    If more than 0 results but less than MAX_LIMIT, return current total
    If more than 0 results, update the value fed to the id metatag
}

Searching for status:any bad_id (~156,700 posts):

  • MAX_LIMIT=25: 20 queries, 31 s
  • MAX_LIMIT=50: 16 queries, 38 s
  • MAX_LIMIT=100: 13 queries, 48 s
  • MAX_LIMIT=200: 11 queries, 139 s
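A runnable sketch of RaisingK's algorithm against an in-memory stand-in for the API. Assumptions: posts come back newest first (descending id), the id metatag acts as an exclusive upper bound (id:<N), and MAX_PAGE is 1000; query and make_fake_query are hypothetical names. Each full probe lets the id filter skip past everything counted so far, so the remaining pages are consumed like the binary digits of the remainder:

```python
from typing import Callable, List, Optional

# query(page, limit, below) -> ids on that page, newest first, restricted to id < below
# (below=None means no id filter). Hypothetical stand-in for one API call.
Query = Callable[[int, int, Optional[int]], List[int]]

def count_with_id_paging(query: Query, max_page: int = 1000, max_limit: int = 100) -> int:
    total = 0
    below: Optional[int] = None  # value for the hypothetical "id:<below" metatag
    # Phase 1: while the deepest allowed page is full, count a whole
    # max_page * max_limit block and move the id filter past it.
    while True:
        batch = query(max_page, max_limit, below)
        if not batch:
            break
        total += (max_page - 1) * max_limit + len(batch)
        if len(batch) < max_limit:
            return total
        below = batch[-1]  # lowest id so far; keep counting strictly below it
    # Phase 2: probe pages 2^(ceil(log2(max_page)) - 1), ..., 2, 1 (512 first for 1000).
    page = 1 << max(0, (max_page - 1).bit_length() - 1)
    while page >= 1:
        batch = query(page, max_limit, below)
        if batch:
            total += (page - 1) * max_limit + len(batch)
            if len(batch) < max_limit:
                return total  # a short page ends the search
            below = batch[-1]  # full page: skip past it
        page //= 2
    return total

def make_fake_query(n_posts: int) -> Query:
    """In-memory stand-in: posts with ids 1..n_posts, returned highest id first."""
    ids = list(range(n_posts, 0, -1))

    def query(page: int, limit: int, below: Optional[int]) -> List[int]:
        pool = ids if below is None else [i for i in ids if i < below]
        start = (page - 1) * limit
        return pool[start:start + limit]

    return query

print(count_with_id_paging(make_fake_query(156_700)))  # 156700
```

Unlike a plain binary search over pages, the id filter lets this count past the max_page * max_limit ceiling.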

ToksT added a commit that referenced this issue May 13, 2013
@ToksT
Contributor

ToksT commented May 13, 2013

I added it to the XML for posts.

@ToksT
Contributor

ToksT commented Jun 2, 2013

Idea: Don't return result count with the results themselves; instead create a separate action that returns just the result count. Ideally this would work for all pages (not just posts) and with both xml and json formats.

e.g. /posts/count.json?tags=3girls+1boy => {"total_count": 1189}

That might be neater and easier to implement than the other ideas so far.

@ROMaster2

I can't tell if it no longer works or I'm doing something wrong. I think I remember it working a month ago...

@ToksT
Contributor

ToksT commented Aug 20, 2013

It still works for me.

What do you see if you visit this link directly? http://danbooru.donmai.us/counts/posts.json?tags=3girls+1boy

Should be:

{
    "counts": {
        "posts": 1359
    }
}
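A minimal client for that endpoint, assuming the URL and response shape shown above (the API may have changed since this thread). counts_url and parse_post_count are hypothetical helper names, and the actual HTTP fetch is left as a comment:

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://danbooru.donmai.us"

def counts_url(tags: str) -> str:
    """Build the /counts/posts.json URL for a space-separated tag search."""
    # urlencode encodes spaces as '+', giving tags=3girls+1boy.
    return f"{BASE_URL}/counts/posts.json?{urlencode({'tags': tags})}"

def parse_post_count(body: str) -> int:
    """Pull N out of a {"counts": {"posts": N}} response body."""
    return int(json.loads(body)["counts"]["posts"])

print(counts_url("3girls 1boy"))
# http://danbooru.donmai.us/counts/posts.json?tags=3girls+1boy
print(parse_post_count('{"counts": {"posts": 1359}}'))  # 1359

# Fetching is left to the caller, e.g.:
# from urllib.request import urlopen
# with urlopen(counts_url("3girls 1boy")) as resp:
#     print(parse_post_count(resp.read().decode("utf-8")))
```

One request per search gives the count directly, with no pagination probing.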

@ROMaster2

I was doing it wrong, thanks.


8 participants