Paginate \ count result #1

tomron · 2018-07-04T12:18:17Z

Hi,
Is there a way to get the count of results matching a query and/or to paginate the results, e.g. read the first 500, then 500-1000, etc.?

gijswobben · 2018-07-04T13:44:50Z

Hi,
The library already applies batching to its requests (250 articles at a time), so there is no separate batching / pagination method.

If you really need it you can do something like this:

from pymed import PubMed
pubmed = PubMed()

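# Any PubMed query string works here; this one is only an example
query = "Occupational Health[Title]"

# Use the low level API to retrieve the article IDs that are related to the query
article_ids = pubmed._getArticleIds(query=query, max_results=9999999)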

# This is an opportunity to show the number of results
print("The total number of results matching the query is", len(article_ids))

# Use the low level API to retrieve the articles
# NOTE: pubmed._getArticles() already expects a list of article IDs (which will be processed in a single
# call to PubMed). In this sample I insert the article IDs one by one, but please
# don't do this in your own code!
articles = [list(pubmed._getArticles(article_ids=[article_id]))[0] for article_id in article_ids]

# The preferred way is to make batches and give those batches to pubmed._getArticles() (which is
# what the library does...) like this:
from pymed.helpers import batches
batched_articles = [pubmed._getArticles(article_ids=batch) for batch in batches(article_ids, 250)]
for batch in batched_articles:
    for article in batch:
        # Do something here
        print(article.title)

Each batch in the last example is a generator, so the request for the next batch is not made until you're done with the current one.
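
For the "first 500, then 500-1000" case from the question, the list of IDs can be cut into pages before any articles are fetched. The following is only a minimal sketch, assuming the IDs are held in an ordinary Python list; `paginate` is a hypothetical helper and not part of pymed, and the IDs below are stand-ins for illustration.

```python
from typing import Iterator, List, Sequence


def paginate(ids: Sequence[str], page_size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive pages of at most page_size IDs.

    Hypothetical helper for illustration; it is not part of pymed.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(ids), page_size):
        yield list(ids[start:start + page_size])


# Stand-in IDs; in practice this would be the article_ids list from above.
example_ids = [str(n) for n in range(1200)]
pages = list(paginate(example_ids, page_size=500))
print([len(page) for page in pages])  # prints [500, 500, 200]
```

Each page could then be handed to pubmed._getArticles(article_ids=page). Keep in mind that the underscore-prefixed methods are internal and may change between releases.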

I'll try to add some easier helper methods in the next release.

I hope that helps?

tomron · 2018-07-04T13:52:35Z

Thanks, I think it is a fair enough solution for now, but I would like to have advanced options such as a count without retrieving all the IDs, queries based on specific fields, etc.

gijswobben · 2018-07-04T14:04:19Z

I'll take care of the count method ;)

As for the querying... It's possible to enter any PubMed query (also for specific fields). Try for example something like:

((tomron[Author]) AND ("2018/01/01"[Date - Create] : "3000"[Date - Create])) AND PubMed[Title]

(This will get you all articles published after the first of January 2018 until now, written by you, with "PubMed" in the title.)

Tip: Use the "advanced" query builder on the PubMed website and copy the query to your code for deeper analysis of the articles.
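
If queries like the one above need to be assembled in code rather than copied from the website, plain string formatting is enough. This is a rough sketch only: `field`, `created_since` and `all_of` are hypothetical helpers, not pymed functions, and the result groups the clauses with a single AND chain, which should be logically equivalent to the nested form shown above.

```python
def field(term: str, tag: str) -> str:
    """Format one term with a PubMed field tag, e.g. tomron[Author]."""
    return f"{term}[{tag}]"


def created_since(start: str, end: str = "3000") -> str:
    """Build a creation-date range clause in the same form as the query above."""
    return f'("{start}"[Date - Create] : "{end}"[Date - Create])'


def all_of(*clauses: str) -> str:
    """Join clauses with AND and wrap the result in parentheses."""
    return "(" + " AND ".join(clauses) + ")"


# Hypothetical helpers combined into one query string.
query = all_of(
    field("tomron", "Author"),
    created_since("2018/01/01"),
    field("PubMed", "Title"),
)
print(query)
```

The resulting string can be passed as the query argument in the same way as a hand-written one.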

tomron · 2018-07-04T14:08:26Z

Super, thanks

gijswobben · 2018-07-04T14:26:52Z

Update: I've added a new method for counting the total number of matching articles (without retrieving any). It's now available in pymed version 0.8.1.

pip install pymed==0.8.1

from pymed import PubMed
pm = PubMed()
number_of_articles = pm.getTotalResultsCount(query="Occupational Health[Title]")
print("Number of articles with Occupational Health in the title is", number_of_articles)
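
With the total known up front, page boundaries can be planned before anything is downloaded. A small sketch under that assumption: `page_ranges` is a hypothetical helper, not part of pymed, and the total of 1200 is a made-up number standing in for the value returned by getTotalResultsCount.

```python
import math
from typing import List, Tuple


def page_ranges(total: int, page_size: int = 500) -> List[Tuple[int, int]]:
    """Return (start, end) offsets, end exclusive, that cover `total` results.

    Hypothetical helper for illustration; it is not part of pymed.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    n_pages = math.ceil(total / page_size)
    return [
        (page * page_size, min((page + 1) * page_size, total))
        for page in range(n_pages)
    ]


# 1200 is an invented total used only to show the shape of the output.
print(page_ranges(1200))  # prints [(0, 500), (500, 1000), (1000, 1200)]
```

Each (start, end) pair can be used to slice a list of article IDs, e.g. article_ids[start:end].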

gijswobben closed this as completed Jul 4, 2018

hemantbadhe mentioned this issue May 14, 2020

Pagination on response #32

Open
