KeyError: 'sha1' #19

Closed
he7d3r opened this issue Dec 22, 2014 · 5 comments

Comments

@he7d3r
Contributor

he7d3r commented Dec 22, 2014

from mw.api import Session
from mw.lib import reverts
session = Session("https://pt.wikipedia.org/w/api.php")
reverts.api.check_rev(session, {'revid':40679746,'pageid':69874})
reverts.api.check_rev(session, {'revid': 40679734, 'pageid': 24386})

works fine, however

reverts.api.check_rev(session, {'revid':40679748,'pageid':40677})

causes the following error:

Traceback (most recent call last):
  File "demonstrate_scorer_on_rc.py", line 79, in <module>
    reverted = 0 if reverts.api.check_rev(session, rev) is None else 1
  File "<mypath>/lib/python3.4/site-packages/mw/lib/reverts/api.py", line 40, in check_rev
    return check(session, rev_id, page_id=page_id, **kwargs)
  File "<mypath>/lib/python3.4/site-packages/mw/lib/reverts/api.py", line 115, in check
    for revert in detect(checksum_revisions, radius=radius):
  File "<mypath>/lib/python3.4/site-packages/mw/lib/reverts/functions.py", line 40, in detect
    for checksum, revision in checksum_revisions:
  File "<mypath>/lib/python3.4/site-packages/mw/lib/reverts/api.py", line 110, in <genexpr>
    ((rev['sha1'], rev) for rev in past_revs),
KeyError: 'sha1'
@halfak
Member

halfak commented Dec 23, 2014

Now that's a trick. The API is not returning a sha1, but we'd like to be able to detect a revert. This means that we might just give up on detecting a revert for this revision. We might also try to go to the API to look for content so we can generate our own sha1.
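
The two options above could be sketched roughly as follows. This is only an illustration, not the fix that was later committed: `checksum_pairs` and `fetch_text` are hypothetical names, and hashing the UTF-8 text with a hex SHA-1 digest is an assumption about how the API derives its `sha1` field.

```python
import hashlib


def checksum_pairs(revisions, fetch_text=None):
    """Yield (checksum, revision) pairs, tolerating revisions without 'sha1'.

    `fetch_text` is a hypothetical callable that returns the text of a
    revision (or None if it cannot be retrieved).
    """
    for rev in revisions:
        sha1 = rev.get('sha1')  # .get() avoids the KeyError
        if sha1 is None and fetch_text is not None:
            # Option 2: try to rebuild the checksum from the content.
            text = fetch_text(rev)
            if text is not None:
                # Assumption: hex SHA-1 of the UTF-8 encoded text.
                sha1 = hashlib.sha1(text.encode('utf-8')).hexdigest()
        if sha1 is None:
            # Option 1: give up on this revision and move on.
            continue
        yield sha1, rev
```

Note that skipping a revision means it can never be reported as reverted or as a revert target, so this trades completeness for robustness.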

@he7d3r
Contributor Author

he7d3r commented Dec 23, 2014

@he7d3r
Contributor Author

he7d3r commented Jan 25, 2015

I noticed the same problem with other revisions for which there is a sha1:

@he7d3r
Contributor Author

he7d3r commented Jan 25, 2015

The page 40677 contains the revision 40327912, which has an attribute "sha1hidden": "" (and a sha1 which seems to be visible to sysops only):
https://pt.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids|timestamp|user|size|sha1&rvlimit=20&pageids=40677
However, it is far from the revision we are interested in. The error does not happen if I run

reverts.api.check_rev(session, {'revid':40679748,'pageid':40677}, radius=7)

but it happens when I increase the radius to 8:

reverts.api.check_rev(session, {'revid':40679748,'pageid':40677}, radius=8)

Using a large radius should not cause a failure, because 40679748 was reverted by the revision right after it in the history (40686393), and a radius of 2 is enough to detect that (BTW: shouldn't a radius of 1 be enough too?). In other words, we have something like this:
A[sha1hidden] --> B --> C --> D --> E --> F --> G --> H --> I[reverted] --> H
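
To make the diagram concrete, here is a simplified identity-revert detector run on that history. It is a sketch written for this example, not the code in `mw.lib.reverts`: `detect_reverts` and the single-letter checksums are made up, and it assumes the detector compares each revision against the `radius` revisions before it.

```python
from collections import deque


def detect_reverts(checksum_revisions, radius=15):
    """Yield (reverting, reverted, reverted_to) for identity reverts.

    A revision counts as a revert if its checksum matches one of the
    `radius` + 1 most recent revisions and at least one revision lies
    in between.
    """
    window = deque(maxlen=radius + 1)  # recent (checksum, revision) pairs
    for checksum, rev in checksum_revisions:
        items = list(window)
        # Search from newest to oldest for a matching checksum.
        for i in range(len(items) - 1, -1, -1):
            if items[i][0] == checksum:
                reverted = [r for _, r in items[i + 1:]]
                if reverted:
                    yield rev, reverted, items[i][1]
                break
        window.append((checksum, rev))


# A..I with distinct checksums, followed by a revision restoring H's content.
history = [(c, c.upper()) for c in "abcdefghi"] + [("h", "H'")]

for reverting, reverted, reverted_to in detect_reverts(history, radius=1):
    print(reverting, "reverted", reverted, "back to", reverted_to)

# The 8 revisions before I reach back to A; the 7 before it stop at B.
i_index = 8
print([name for _, name in history[i_index - 8:i_index]])
print([name for _, name in history[i_index - 7:i_index]])
```

Under these assumptions a radius of 1 already finds the revert of I, while a lookback of 8 revisions from I is the first one that includes A, which matches the radius=7 versus radius=8 behaviour described above.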

@halfak
Member

halfak commented Feb 3, 2015

Fixed in a293667

And updated in v0.4.11

@halfak closed this as completed Feb 3, 2015