Missing 'title' attribute causes "AttributeError: object has no attribute 'title'" error #71

Closed
lance10t opened this issue Jun 19, 2021 · 4 comments · Fixed by #72
Labels: api (Issues that correspond to arXiv API behavior rather than behavior introduced by this wrapper.); bug (Deviations from documented behavior.)

Comments

@lance10t

Description

In some edge cases, the entry returned by arXiv does not contain a valid 'title' tag (e.g. https://arxiv.org/abs/2104.12255v1). This causes an error in arxiv.py line 116:

Traceback (most recent call last):
  File ".\retrieve_arxiv.py", line 21, in <module>
    for result in big_slow_client.get(unrestricted_search):
  File "E:\Dropbox\Coding\arXiv\arxiv\arxiv.py", line 547, in get
    yield Result._from_feed_entry(entry)
  File "E:\Dropbox\Coding\arXiv\arxiv\arxiv.py", line 116, in _from_feed_entry
    title=re.sub(r'\s+', ' ', entry.title),
  File "C:\Users\gerry\Miniconda3\envs\arxiv\lib\site-packages\feedparser\util.py", line 158, in __getattr__
    raise AttributeError("object has no attribute '%s'" % key)
AttributeError: object has no attribute 'title'

Steps to reproduce


from arxiv import arxiv

# This ID is the entry whose feed result comes back without a title.
search = arxiv.Search(
    id_list=['2104.12255v1'],
    sort_by=arxiv.SortCriterion.LastUpdatedDate
)

for result in search.get():
    print(result.entry_id)
    print(dir(result))  # list the attributes present on the result
    print()

Expected behavior

A missing title attribute should be checked for and imputed, since it is a key field.

Versions

  • python version:
    Python 3.8
  • arxiv.py version:
    arxiv.py == 1.2.0

Additional context

A workaround patch on my local copy worked:
Lines 541 onwards:
# Yield query results until page is exhausted.
for entry in feed.entries:
    # BUG: Fixes a bug where sometimes the entry does not return with a title in the feed
    # E.g. https://arxiv.org/abs/2104.12255v1
    if not hasattr(entry, 'title'):
        entry['title'] = ''
    yield Result._from_feed_entry(entry)

@lance10t lance10t added the bug Deviations from documented behavior. label Jun 19, 2021
@lukasschwab lukasschwab added the api Issues that correspond to arXiv API behavior rather than behavior introduced by this wrapper. label Jun 20, 2021
@lukasschwab
Owner

lukasschwab commented Jun 20, 2021

Huh! What an odd edge case. Adding the api tag as a reminder to mention this to the arXiv API folks: this field should either be present and empty, or the API docs should indicate the field is optional.

Thinking about work-arounds.

Missing title attribute should be checked and imputed

What seems more sensible as a default?

  • None
  • ""

I lean towards None; a missing title seems distinct from a result that specifies an empty title.
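To make the trade-off concrete, here is a minimal sketch of how the two defaults would behave in the whitespace-normalizing step shown in the traceback. The helper name normalize_title is hypothetical and not part of arxiv.py; the point is only that a None default needs an explicit guard, because re.sub does not accept None.

```python
import re
from typing import Optional


def normalize_title(raw: Optional[str]) -> Optional[str]:
    """Collapse whitespace runs in a title (hypothetical helper).

    None is passed through unchanged to mean "the feed had no title",
    which keeps it distinguishable from an explicitly empty title "".
    """
    if raw is None:
        # re.sub(r'\s+', ' ', None) would raise TypeError, so guard first.
        return None
    return re.sub(r"\s+", " ", raw)


print(repr(normalize_title("A  wrapped\n title")))
print(repr(normalize_title("")))
print(repr(normalize_title(None)))
```

With "" as the default no guard is needed, but callers can no longer tell a missing title from an empty one.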

@lukasschwab
Owner

Sent a message to the Google Group. It may be safe to assume that any entry without a title element has precisely the title "0", but I'd like to have confirmation before I bake that assumption into a release.

@lance10t
Author

Thanks for the quick message back. Yeah, the longer term solution you mentioned makes sense.

I agree with you: None would make more sense, and hopefully the regression tests can check that it doesn't introduce faults in other areas (I have not really read the entire code base, just the parts that were affecting my runs).

@lukasschwab
Owner

Got confirmation that this is an API bug. I'll write a patch that defaults the title to "0", because that's the only value we know to cause this bug; as far as I can tell, when an Atom result is missing its title attribute, that indicates the title is "0".

There's a chance we can't draw that conclusion. If we observe other titles causing this issue (e.g. "False", "null", "", and so on), we should default to None and deal with the downstream effects.

I'll add an accompanying test case.
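For readers following along, a rough sketch of what such a default and test could look like. This is not the actual patch from #72. FakeEntry is a hypothetical stand-in that only mimics the behavior visible in the traceback (attribute access that raises AttributeError for a missing key), and title_from_entry and MISSING_TITLE_DEFAULT are illustrative names, not arxiv.py API.

```python
import re


class FakeEntry(dict):
    """Hypothetical stand-in for a feed entry.

    Attribute access reads dict keys and raises AttributeError when the
    key is absent, like the error in the traceback above.
    """

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError("object has no attribute '%s'" % key)


# Assumption from this thread: a missing title means the title is "0".
MISSING_TITLE_DEFAULT = "0"


def title_from_entry(entry):
    """Return the whitespace-normalized title, defaulting when absent."""
    raw = getattr(entry, "title", MISSING_TITLE_DEFAULT)
    return re.sub(r"\s+", " ", raw)


def test_missing_title_defaults_to_zero():
    # Entry with no title key at all, as in the reported edge case.
    missing = FakeEntry(id="https://arxiv.org/abs/2104.12255v1")
    assert title_from_entry(missing) == "0"
    # Entries that do carry a title are only whitespace-normalized.
    present = FakeEntry(title="A  normal\ntitle")
    assert title_from_entry(present) == "A normal title"


test_missing_title_defaults_to_zero()
```

Using getattr with a default leaves the entry itself unmodified, unlike the workaround that writes entry['title']; switching to None later would only require changing the constant and guarding the re.sub call.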

lukasschwab added a commit that referenced this issue Jun 23, 2021
+ Remove .get usage in new unit test
+ Remove .get reference in error docstrings