In this notebook, we're going to build a basic vandalism detection system. We'll connect to the stream of recent changes on Wikimedia wikis, keep only the edits to English Wikipedia, and use the ORES 'damaging' model to flag edits that look like they need review. See the documentation on [Wikimedia Event Streams](https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams) for more information about these events and how to connect to the stream.
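
Each message on the `recentchange` stream is a JSON document describing one change on some Wikimedia wiki, so the first job is to keep English Wikipedia edits and pull out the new revision ID. Here is a minimal sketch of that filtering step. It assumes events carry the `wiki` field and a `revision` object with a `new` key, as the scoring loop in this notebook relies on. The function name `enwiki_rev_id` and the sample events are hypothetical and trimmed to just those fields.

```python
import json
from typing import Optional


def enwiki_rev_id(raw_data: str) -> Optional[int]:
    """Return the new revision ID for an English Wikipedia change, else None.

    Assumes the event layout used in this notebook: a top-level 'wiki'
    field and, for edits, a 'revision' object with a 'new' key.
    """
    try:
        change = json.loads(raw_data)
    except ValueError:
        # Not valid JSON (for example, an empty message): ignore it.
        return None
    if not isinstance(change, dict):
        return None
    if change.get('wiki') != 'enwiki' or 'revision' not in change:
        # A change on another wiki, or one that carries no revision.
        return None
    return change['revision']['new']


# Hypothetical sample events, reduced to the fields used above.
sample_edit = json.dumps({'wiki': 'enwiki', 'revision': {'old': 1, 'new': 2}})
sample_other = json.dumps({'wiki': 'dewiki', 'revision': {'old': 3, 'new': 4}})
print(enwiki_rev_id(sample_edit))   # 2
print(enwiki_rev_id(sample_other))  # None
```

Keeping the parsing and filtering in a pure function like this makes it easy to check against saved events without opening a live connection.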

First, we'll define a function that takes a "revision ID" (which identifies a new edit to an article), looks up an ORES prediction for it, and returns the probability that the edit is damaging. Then, we'll connect to the stream of edits and score each one as it comes in, printing every scored edit and marking the ones whose probability crosses a threshold (0.5 here) for review.
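
The review decision itself is a single comparison against the threshold. A minimal sketch, assuming the 0.5 cutoff used in the scoring loop below and a "no score" label for edits ORES could not score; `label_edit` and `THRESHOLD` are hypothetical names.

```python
from typing import Optional

# Hypothetical cutoff, matching the 0.5 used in this notebook's scoring loop.
THRESHOLD = 0.5


def label_edit(proba: Optional[float], threshold: float = THRESHOLD) -> str:
    """Map a 'damaging' probability to a review label."""
    if proba is None:
        # No score came back, so there is nothing to compare.
        return "no score"
    return "check" if proba > threshold else "OK"


for p in (0.04, 0.49, 0.73, None):
    print(p, label_edit(p))
```

Lowering the threshold sends more edits to review, catching more damage at the cost of more false alarms; raising it does the opposite.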

In [1]:
!pip install oresapi sseclient



In [2]:
import json
from sseclient import SSEClient as EventSource
import oresapi



# Define a function for getting damage probability

In [3]:
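ores_session = oresapi.Session(
    "https://ores.wikimedia.org",
    user_agent="ahalfaker@wikimedia.org -- ORES usage demo")

def get_damage_proba(rev_id):
    # Score a single English Wikipedia revision with the 'damaging' model.
    score_doc = list(ores_session.score('enwiki', ['damaging'], rev_id))[0]

    try:
        # Probability of the 'true' class, i.e. that the edit is damaging.
        return score_doc['damaging']['score']['probability']['true']
    except (KeyError, TypeError):
        # No score in the response (ORES could not score this revision).
        return None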



In [4]:
get_damage_proba(12345678)

0.12206179165017522
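
`get_damage_proba` reads the probability from a nested score document: the model name, then `score`, then `probability`, then the `true` class. A sketch of a more defensive version of that lookup follows. It is checked against a hand-built document with the same nesting, which is illustrative rather than a real ORES response; the shape of an error document is an assumption here, and the code simply returns None whenever the path is missing. `damaging_true_proba` is a hypothetical name.

```python
from collections.abc import Mapping
from typing import Any, Optional


def damaging_true_proba(score_doc: Any) -> Optional[float]:
    """Follow score_doc['damaging']['score']['probability']['true'].

    Returns None if any level is missing or is not a mapping, which is
    how this sketch treats an error response.
    """
    node = score_doc
    for key in ('damaging', 'score', 'probability', 'true'):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


# Illustrative document with the same nesting the notebook reads.
example = {'damaging': {'score': {'prediction': False,
                                  'probability': {'false': 0.88, 'true': 0.12}}}}
print(damaging_true_proba(example))                                 # 0.12
print(damaging_true_proba({'damaging': {'error': {'type': 'X'}}}))  # None
```

Walking the keys explicitly avoids a blanket `except`, so unrelated bugs (a typo in a variable name, say) still raise instead of being silently turned into None.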

# Connect to the stream and start scoring

In [5]:
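url = 'https://stream.wikimedia.org/v2/stream/recentchange'
for event in EventSource(url):
    if event.event == 'message':
        try:
            change = json.loads(event.data)
        except ValueError:
            # Skip messages whose data is not valid JSON.
            continue
        # Keep only changes to English Wikipedia that carry a revision.
        if 'revision' in change and change.get('wiki') == 'enwiki':
            rev_id = change['revision']['new']
            proba = get_damage_proba(rev_id)
            diff_url = "https://en.wikipedia.org/wiki/Special:Diff/{0}".format(rev_id)
            if proba is None:
                # No score from ORES; round() would raise a TypeError on None.
                print(diff_url, "no score")
            else:
                print(diff_url, round(proba, 2), "check" if proba > 0.5 else "OK")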

https://en.wikipedia.org/wiki/Special:Diff/937044201 0.04 OK
https://en.wikipedia.org/wiki/Special:Diff/937044205 0.01 OK
https://en.wikipedia.org/wiki/Special:Diff/937044204 0.73 check
https://en.wikipedia.org/wiki/Special:Diff/937044207 0.01 OK
https://en.wikipedia.org/wiki/Special:Diff/937044208 0.1 OK
https://en.wikipedia.org/wiki/Special:Diff/937044206 0.03 OK
https://en.wikipedia.org/wiki/Special:Diff/937044209 0.49 OK
https://en.wikipedia.org/wiki/Special:Diff/937044210 0.66 check
https://en.wikipedia.org/wiki/Special:Diff/937044212 0.54 check
https://en.wikipedia.org/wiki/Special:Diff/937044211 0.01 OK
https://en.wikipedia.org/wiki/Special:Diff/937044213 0.03 OK
https://en.wikipedia.org/wiki/Special:Diff/937044214 0.33 OK
https://en.wikipedia.org/wiki/Special:Diff/937044215 0.05 OK
https://en.wikipedia.org/wiki/Special:Diff/937044216 0.01 OK
https://en.wikipedia.org/wiki/Special:Diff/937044217 0.04 OK
https://en.wikipedia.org/wiki/Special:Diff/937044218 0.15 OK
https://en.wikip

KeyboardInterrupt: 
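
The streaming loop runs until it is stopped by hand, which is why its output ends in a `KeyboardInterrupt`. One way to make a run finite is to stop after a fixed number of edits and keep only the ones above the threshold. A minimal sketch under stated assumptions: `review_queue`, `max_edits` and `DIFF_URL` are hypothetical names, the scorer is passed in as a function, and the demo uses made-up revision IDs and probabilities instead of the live stream so it runs offline.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, Tuple

DIFF_URL = "https://en.wikipedia.org/wiki/Special:Diff/{0}"


def review_queue(rev_ids: Iterable[int],
                 score: Callable[[int], Optional[float]],
                 threshold: float = 0.5,
                 max_edits: int = 100) -> Iterator[Tuple[str, float]]:
    """Score at most max_edits revisions; yield (diff URL, proba) above threshold."""
    for rev_id in islice(rev_ids, max_edits):
        proba = score(rev_id)
        # Unscored edits (None) are skipped rather than flagged.
        if proba is not None and proba > threshold:
            yield DIFF_URL.format(rev_id), proba


# Stand-in scorer with made-up probabilities.
fake_scores = {101: 0.04, 102: 0.73, 103: None, 104: 0.66}
for url, proba in review_queue([101, 102, 103, 104], fake_scores.get, max_edits=3):
    print(url, round(proba, 2))
```

For a live run, `rev_ids` could be a generator that yields `change['revision']['new']` for each English Wikipedia edit on the stream, with `get_damage_proba` passed as `score`; `islice` then stops reading the stream once `max_edits` edits have been scored.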