UnexpectedEmptyPageError and associated errors #31

robisen1 · 2023-06-29T05:58:48Z

Please excuse me if I do this incorrectly; I am a noob. I am using Python 3.11 on Windows 11 and Ubuntu 22.04.2. I have run into an error like this on arXiv as well as medRxiv:

arxiv.arxiv.UnexpectedEmptyPageError: Page of results was unexpectedly empty (http://export.arxiv.org/api/query?search_query=%28all%3Apyschological+flow+state%29&id_list=&sortBy=relevance&sortOrder=descending&start=29500&max_results=100)

This seems to be an issue in the original code, and it was patched in lukasschwab/arxiv.py#43.

I did not see that patch, and I took a similar path. My code checks whether a URL is malformed or returns an empty page, then handles it and logs it. If it runs into a URL that does not respond or hangs, it waits a user-defined amount of time and moves on. It can also write smaller JSONL files, which is useful for various reasons. I was also going to implement querying by date. Right now the options are all hardcoded variables, but I was thinking of making them settable from the command line or a config file. I am also thinking about multi-threading, throttling calls to the service, and/or a back-off algorithm.

I don't know what I am supposed to do. Should I provide my fixes, if they are needed, and if so how? Or should I go to the arxiv team? I also think these issues lurk in other libraries, but I have not done any extensive testing. Thank you, I appreciate your time and paperscraper.
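The "wait and move on" and back-off ideas described above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the code from this comment and not part of paperscraper or arxiv: `EmptyPageError` is a hypothetical stand-in for `arxiv.UnexpectedEmptyPageError`, and `fetch_with_backoff`, `fetch`, `max_retries`, `base_delay` and `sleep` are names invented for illustration.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class EmptyPageError(Exception):
    """Hypothetical stand-in for arxiv.UnexpectedEmptyPageError."""


def fetch_with_backoff(
    fetch: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying with exponentially growing delays on EmptyPageError."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except EmptyPageError as err:
            if attempt == max_retries:
                raise  # out of retries: let the caller decide what to do
            delay = base_delay * 2 ** attempt  # base_delay, 2x, 4x, ...
            logger.warning(
                "Attempt %d failed (%s); retrying in %.1fs", attempt + 1, err, delay
            )
            sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter is injectable only so the delays can be checked without actually waiting; in real use the default `time.sleep` would apply.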

jannisborn · 2023-07-10T10:46:09Z

Hi @robisen1,
thanks for your interest and for opening this issue.
Which versions of paperscraper and arxiv are you using?

The root of this problem lies in the arxiv API, which is used by the arxiv package, so it is not directly related to this package. It is therefore a bit unclear what you expect the paperscraper team to do here; please specify. If you have local changes that fix some of these problems, feel free to open a PR.

jannisborn · 2023-07-10T20:34:15Z

Closing this here, but please comment below if you expect anything to happen from our side.

robisen1 · 2023-07-10T20:53:40Z

Sorry, I am on a plane. I will try to comment in the next two days. Thank you.


copperwiring · 2024-08-16T09:22:39Z

Hi, I have the same issue:

raise UnexpectedEmptyPageError(url, try_index, feed)

Could the code catch this error somewhere, so that when it is raised it simply skips that page and continues with the next fetch?
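The skip-and-continue behaviour asked for here could be sketched as below. This is an illustration under assumptions, not how paperscraper or the arxiv package is known to page internally: it assumes the caller controls paging itself through a hypothetical `fetch_page(start, page_size)` callable, and `EmptyPageError` is again a hypothetical stand-in for `arxiv.UnexpectedEmptyPageError`.

```python
import logging
from typing import Callable, Iterator, List

logger = logging.getLogger(__name__)


class EmptyPageError(Exception):
    """Hypothetical stand-in for arxiv.UnexpectedEmptyPageError."""


def iter_results(
    fetch_page: Callable[[int, int], List[dict]],
    total: int,
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield results page by page, logging and skipping pages that raise."""
    for start in range(0, total, page_size):
        try:
            page = fetch_page(start, page_size)
        except EmptyPageError as err:
            # Record the failed offset so it can be re-queried later, then move on.
            logger.warning("Skipping page at start=%d: %s", start, err)
            continue
        yield from page
```

Note that skipping silently drops whatever results were on the failed page, so logging the offset (as done here) matters if completeness is important.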

jannisborn · 2024-08-16T09:29:43Z

Hi,
First, can you please post your full error log, ideally with the query that produced it?
Which version do you use?

Since this is an issue in the arxiv package, have you opened an issue there? The fix from lukasschwab/arxiv.py#43 has been available through paperscraper for years. Does this error happen regularly or only occasionally?

jannisborn closed this as completed Jul 10, 2023

jannisborn reopened this Aug 16, 2024

jannisborn added the "question" label (Further information is requested) Aug 19, 2024

jannisborn closed this as completed Aug 19, 2024
