UnexpectedEmptyPageError and associated errors #31

Closed
robisen1 opened this issue Jun 29, 2023 · 5 comments
Labels
question Further information is requested

Comments

@robisen1

Please excuse me if I do this incorrectly. I am a noob. I am using Python 3.11 on Windows 11 and Ubuntu 22.04.2. I have run into an error like this on arXiv as well as medRxiv:

arxiv.arxiv.UnexpectedEmptyPageError: Page of results was unexpectedly empty (http://export.arxiv.org/api/query?search_query=%28all%3Apyschological+flow+state%29&id_list=&sortBy=relevance&sortOrder=descending&start=29500&max_results=100)

This seems to be an issue in the original code and was patched here: lukasschwab/arxiv.py#43

I did not see that patch, so I took a similar path myself. My code checks whether a URL is malformed or returns an empty page, handles it, and logs it. If it runs into a URL that is not responding or hangs, it waits a user-defined amount of time and moves on. It can also write smaller JSONL files, which is useful for various reasons. I was also going to implement querying by date. Right now everything is hardcoded in variables, but I am thinking of making the options available from the command line or a config file. I am also thinking about multi-threading, throttling calls to the service, and/or a back-off algorithm. I don't know what I am supposed to do: do I provide my fixes here, if they are needed, and how? Or do I go to the arxiv team? I also think these issues lurk in other libraries, but I have not done any extensive testing. Thank you, I appreciate your time and paperscraper.
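The wait-and-move-on and back-off ideas described above could look roughly like the sketch below. This is not the code from this comment or from paperscraper; `call_with_backoff`, `fetch`, and every parameter name are hypothetical, and the delays are placeholder values.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def call_with_backoff(
    fetch: Callable[[], T],
    max_tries: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch` until it succeeds or `max_tries` is exhausted.

    The wait doubles after every failure (1s, 2s, 4s, ...) and is capped
    at `max_delay`. Returns None if every attempt failed, so the caller
    can log the failure and move on instead of crashing.
    """
    for attempt in range(max_tries):
        try:
            return fetch()
        except Exception as exc:  # assumption: narrow this to the errors you expect
            if attempt == max_tries - 1:
                logger.warning("giving up after %d tries: %s", max_tries, exc)
                return None
            delay = min(base_delay * 2 ** attempt, max_delay)
            logger.info("attempt %d failed (%s), retrying in %.1fs", attempt + 1, exc, delay)
            sleep(delay)
    return None
```

Passing `sleep` as a parameter is only a convenience for testing without real waits. The same wrapper also gives a crude form of throttling, since a failing service is queried less and less often.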

@jannisborn
Owner

Hi @robisen1,
thanks for the interest and opening this issue.
Which version of paperscraper and arxiv are you using respectively?

The root of this problem lies in the arxiv API, which is accessed through the arxiv package, so it's not directly related to this package. It's therefore a bit unclear what you expect the paperscraper team to do here. Please specify. If you have local changes that fix some of these problems, feel free to open a PR.

@jannisborn
Owner

Closing this here, but please comment below if you expect anything to happen from our side.

@robisen1
Author

robisen1 commented Jul 10, 2023 via email

@copperwiring

Hi, I have the same issue:

raise UnexpectedEmptyPageError(url, try_index, feed)

Could the code handle this somewhere, so that when this error is raised it just skips the page and continues with the next fetch?
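A skip-and-continue loop of the kind asked about here might be sketched as follows. This is an illustration only, not paperscraper or arxiv code: `iter_pages_skipping_failures` and `fetch_page` are hypothetical names, and `fetch_page(start, page_size)` stands in for whatever call actually retrieves one page of results.

```python
import logging
from typing import Callable, Iterator, List

logger = logging.getLogger(__name__)


def iter_pages_skipping_failures(
    fetch_page: Callable[[int, int], List[dict]],
    total: int,
    page_size: int = 100,
    max_consecutive_failures: int = 3,
) -> Iterator[dict]:
    """Yield results page by page, skipping pages whose fetch raises.

    A failed page is logged and skipped. After `max_consecutive_failures`
    failed pages in a row the loop stops, on the assumption that the
    service is down or the result set has ended.
    """
    failures = 0
    for start in range(0, total, page_size):
        try:
            page = fetch_page(start, page_size)
        except Exception as exc:  # assumption: in practice, catch only the
            # specific error, e.g. the UnexpectedEmptyPageError shown above
            failures += 1
            logger.warning("skipping page at start=%d: %s", start, exc)
            if failures >= max_consecutive_failures:
                break
            continue
        failures = 0  # a successful page resets the counter
        yield from page
```

Note that skipping a page silently drops its results, so logging the skipped offsets matters if the missing pages should be retried later.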

@jannisborn
Owner

Hi,
First, can you please post your full error log, ideally with the query that produced it?
Which version are you using?

Since this issue comes from the arxiv package, have you opened an issue there? The fix made in lukasschwab/arxiv.py#43 has been available through paperscraper for years. Does this error happen regularly or just occasionally?

@jannisborn jannisborn reopened this Aug 16, 2024
@jannisborn jannisborn added the question Further information is requested label Aug 19, 2024