

Welcome!

This collection of notebooks lets you analyze the evolution of the revision history of a Wikipedia article (with live data). Among other things, it allows you to inspect **article- or editor-specific conflicts & productivity.** The magic happens mostly in notebook 2.

Each notebook can be **explored like a web app, without interacting with the code behind it**. But you can also, if you choose to, click on "Show solution" (green button) after each block and work with the code directly.
 
---

**Please choose a small or medium-sized article for your first exploration to reduce computation times. ALL DATA IS FETCHED LIVE, so please be patient.** Filtering is done locally, so it will be quicker. To start, you can simply go with our default choice. Very large articles may lead to memory overload, depending on which Binder platform you are using. You can free up memory by shutting down previously running notebooks.

When loaded for the first time, all cells **should** run automatically and you should not see any code. If that is not the case, please just reload the tab in your browser. After choosing a new article, please rerun the cells/modules you want to use.

In [2]:
from IPython.display import display, Javascript, HTML, clear_output
from ipywidgets import widgets, Output, interact, Layout
from ipywidgets.widgets import Dropdown
from datetime import datetime

## SOME EXTENSIONS ##
#%load_ext autoreload
%reload_ext autoreload
%autoreload 2
if 'the_page' not in locals():
    import pickle
    print("Loading default data...")
    # Use a context manager so the file handle is closed after loading
    with open("data/the_page.p", 'rb') as f:
        the_page = pickle.load(f)

display(Javascript('IPython.notebook.execute_cells_below()'))

display(Javascript('Jupyter.notebook.get_cells()'))


In [3]:
%%capture
%store -r the_page

### <span style="color:green"> Modules Imported </span>

In [4]:
## Modules Imported ##

# Display
from IPython.display import display, Markdown as md, clear_output
from datetime import date
import urllib

# APIs
from wikiwho_wrapper import WikiWho
from external.wikipedia import WikipediaDV, WikipediaAPI
from external.wikimedia import WikiMediaDV, WikiMediaAPI
from external.xtools import XtoolsAPI, XtoolsDV

# Data Processing
import pickle
import pandas as pd

# Visualization tools
import qgrid
import matplotlib.pyplot as plt

# Page views timeline
from visualization.views_listener import ViewsListener

# Change actions timeline
from visualization.actions_listener import ActionsListener

# Conflicts visualization
from visualization.conflicts_listener import ConflictsListener, ConflictsActionListener
from visualization.calculator_listener import ConflictCalculatorListener

# Word cloud visualization
from visualization.wordcloud_listener import WCListener, WCActionsListener
from visualization.wordclouder import WordClouder

# Wikipedia talk pages visualization
from visualization.talks_listener import TalksListener
from visualization.topics_listener import TopicsListener

# Tokens ownership visualization
from visualization.owned_listener import OwnedListener

# Templates visualization
from visualization.templates_listener import ProtectListener, TemplateListener

# Metrics management
from metrics.conflict import ConflictManager
from metrics.token import TokensManager

# For language selection
from utils.lngselection import abbreviation, lng_listener

---

# A. Selecting a Wikipedia article

Let's start: The default example is the article "The Camp of the Saints" (a novel). But you can enter/search an article of your choice and explore it as well. 

In [5]:
# the method that listens to the click event
def on_button_clicked(b):
    global wikipedia_dv
    global the_page
    # use the out widget so the output is overwritten when two or more
    # searches are performed
    with out:
        try:            
            # query wikipedia
            wikipedia_dv = WikipediaDV(WikipediaAPI(lng=abbreviation(languageSelection.value)))
            search_result = wikipedia_dv.search_page(searchTerm.value)
            the_page = wikipedia_dv.get_page(search_result)
            %store the_page
            clear_output()
            display(md(f"The page that was found: **{the_page['title']}**"))
            display(md(f"Page id: **{the_page['page_id']}**"))
            url = f"{wikipedia_dv.api.base}action=query&titles={urllib.parse.quote_plus(the_page['title'])}&format=json"
            display(md(f"Metadata can be found in:"))
            print(url)
            #display(Javascript('Jupyter.notebook.execute_cells([8])'))

        except Exception:
            clear_output()
            display(md(f'The page title *"{searchTerm.value}"* was not found'))
            #display(Javascript('Jupyter.notebook.execute_cells([8])'))

# Language selection.
languageSelection = Dropdown(options=['English', 'Deutsch', 'Español', 'Türkçe', 'Euskara'], value='English', description='Language:')

# by default display the last search
try:
    searchTerm = widgets.Text(the_page['title'], description='Page title:')
except Exception:
    searchTerm = widgets.Text("The Camp of the Saints", description='Page title:')

# Update selected language
initial_select = widgets.interactive(lng_listener, lng=languageSelection, search_term=searchTerm)
display(initial_select.children[0])
display(initial_select.children[1])

# create and display the button    
button = widgets.Button(description="Search")
example = md("e.g. *The Camp of the Saints*")
display(example, button)

# the output widget is used to remove the output after the search field
out = Output()
display(out)

# set the event
button.on_click(on_button_clicked)

# trigger the event with the default value
on_button_clicked(button)

Dropdown(description='Language:', options=('English', 'Deutsch', 'Español', 'Türkçe', 'Euskara'), value='Engli…

Text(value='The Camp of the Saints', description='Page title:')

e.g. *The Camp of the Saints*

Button(description='Search', style=ButtonStyle())

Output()

<span style="color: #626262"> Try it yourself! This is what happens when you click the 'Search' button: </span>

In [6]:
%%script false --no-raise-error

### IMPORTANT NOTE: COMMENT OUT THE ABOVE LINE TO EXECUTE THE CELL ###

### --------------------------------------------------------------------- ###
### TRY YOURSELF! THIS IS WHAT WILL HAPPEN WHEN YOU CLICK 'Search' BUTTON ###
### --------------------------------------------------------------------- ###

## This is the default data and used for initialization ##
the_page = pickle.load(open("data/the_page.p",'rb'))  # global
title_default = the_page['title']
print('The pre-filled value for the title:', title_default)

## The search term you entered ##

# This value comes from the search box above (searchTerm), which is built with
# e.g. searchTerm = widgets.Text(the_page['title'], description='Page title:'). You can
# learn more at https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20List.html
search_language = languageSelection.value
language_for_api = abbreviation(search_language)

search_value = searchTerm.value
print('The language you use now:', search_language)
print('The value you entered in the search box:', search_value)

# of course you could also update the value here, like in the "Page title:" box above.
#search_value = 'Matrix Completion' 

## Query Wikipedia through the WikipediaAPI wrapper, for more details see:    ##
## https://github.com/gesiscss/wikiwho_demo/blob/master/external/wikipedia.py ##
## https://github.com/gesiscss/wikiwho_demo/blob/master/external/api.py       ##
wikipedia_dv = WikipediaDV(WikipediaAPI(lng=language_for_api)) # create an instance
result_after_search = wikipedia_dv.search_page(search_value)

print("The page that was found:", result_after_search)

## Get page id through get_page() method ##
the_page = wikipedia_dv.get_page(result_after_search)  # global
page_id = the_page['page_id']
print('Page id:', page_id)

## Metadata ##
url = f"{wikipedia_dv.api.base}action=query&titles={urllib.parse.quote_plus(result_after_search)}&format=json"
print('Metadata can be found in:', url)
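
The metadata URL printed above is a plain MediaWiki `action=query` request. The following is a minimal sketch of how such a URL can be assembled with the standard library only. The `API_BASE` value and the helper name `build_query_url` are assumptions made for illustration; the notebook itself reads the base from `wikipedia_dv.api.base`.

```python
from urllib.parse import quote_plus

# Assumed endpoint for English Wikipedia (the notebook uses wikipedia_dv.api.base).
API_BASE = "https://en.wikipedia.org/w/api.php?"


def build_query_url(title, base=API_BASE):
    """Return a MediaWiki query URL for `title` (hypothetical helper)."""
    # quote_plus encodes spaces as '+' and escapes reserved characters such as '&'.
    return f"{base}action=query&titles={quote_plus(title)}&format=json"


print(build_query_url("The Camp of the Saints"))
```

Escaping the title matters for articles such as "AT&T", where an unescaped `&` would otherwise be read as a parameter separator.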

---

# B. General Statistics

Now, let's get live data for some general statistics (through the Xtools API (1)).

In [7]:
def xtools_on_click(b):
    with out_xtools:
        clear_output()
        xtools_api = XtoolsAPI(lng=abbreviation(languageSelection.value))
        xtools_dv = XtoolsDV(xtools_api)
        page_info = xtools_dv.get_page_info(the_page['title'])
        page_info['assessment'] = page_info['assessment']['value'] if type(page_info['assessment']) != bool else page_info['assessment']

        page_info = page_info.to_frame('value').rename(index={
            'project': 'Project name',
            'page': 'Page name',
            'watchers': 'Watchers (2)',
            'pageviews': f"Page Views (per {page_info['pageviews_offset']} days)",
            'revisions': 'Revisions',
            'editors': 'Editors',
            'author': 'Creator of the page',
            'created_at': 'Creation Date',
            'created_rev_id': 'Creation revision id',
            'modified_at': 'Last modified',
            'last_edit_id': 'Last revision id',
            'assessment': 'Content Assessment (3)',
        }).drop(index = ['pageviews_offset', 'author_editcount', 'secs_since_last_edit','elapsed_time'])
        
        display(md(f"***Page: {the_page['title']} ({abbreviation(languageSelection.value).upper()})***"))
        url = f"{xtools_dv.api.base}page/articleinfo/{xtools_dv.api.project}/" + urllib.parse.quote(the_page['title'])
        display(md(f"Metadata can be found in"))
        print(url)
        display(page_info)
        #display(Javascript('Jupyter.notebook.execute_cells([14])'))           
        #display(Javascript('Jupyter.notebook.execute_cells([18])'))
        

# create and display the button    
button = widgets.Button(description="Get Page Info")
display(button)

# the output widget is used to remove the output after the search field
out_xtools = Output()
display(out_xtools)

# set the event
button.on_click(xtools_on_click)

# trigger the event with the default value
xtools_on_click(button)

Button(description='Get Page Info', style=ButtonStyle())

Output()

<sup>**(1)** *A community-built service for article statistics at xtools.wmflabs.org* **(2)** *Users that added this page to their watchlist.* **(3)** *See [Wikipedia Content Assessment](https://en.wikipedia.org/wiki/Wikipedia:Content_assessment)*</sup>


<span style="color: #626262"> Try it yourself! This is what happens when you click the 'Get Page Info' button: </span>

In [8]:
%%script false --no-raise-error

### IMPORTANT NOTE: COMMENT OUT THE ABOVE LINE TO EXECUTE THE CELL ###

### ----------------------------------------------------------------------------- ###
### TRY YOURSELF! THIS IS WHAT WILL HAPPEN WHEN YOU CLICK 'Get Page Info' BUTTON  ###
### ----------------------------------------------------------------------------- ###

## Define a Xtools instance, more details see:                             ##
## https://github.com/gesiscss/wikiwho_demo/blob/master/external/xtools.py ##
## https://github.com/gesiscss/wikiwho_demo/blob/master/external/api.py    ##
xtools_api = XtoolsAPI(lng=language_for_api)
xtools_dv = XtoolsDV(xtools_api)
print('Provided through the Xtools API (1)')

## The page you are interested in ##
print('The page that was found:', the_page['title'], f'({abbreviation(languageSelection.value).upper()})')

## Get the page info through Xtools method get_page_info() ##
page_info = xtools_dv.get_page_info(the_page['title'])

## Metadata ##
url = f"{xtools_dv.api.base}page/articleinfo/{xtools_dv.api.project}/" + urllib.parse.quote(the_page['title'])
print("Metadata can be found in:", url)

## Use a dictionary to construct a pd.DataFrame to present the general info from Xtools ##
dict_for_df = {
    'Project name': page_info['project'], 'Page name': page_info['page'], 'Watchers (2)': page_info['watchers'],
    f"Page Views (per {page_info['pageviews_offset']} days)": page_info['pageviews'], 'Revisions': page_info['revisions'],
    'Editors': page_info['editors'], 'minor_edits': page_info['minor_edits'], 'Creator of the page': page_info['author'],
    'Creation Date': page_info['created_at'], 'Creation revision id': page_info['created_rev_id'],
    'Last modified': page_info['modified_at'], 'Last revision id': page_info['last_edit_id'],
    'Content Assessment (3)': page_info['assessment']['value'] if type(page_info['assessment']) != bool else page_info['assessment']
}

df_info = pd.DataFrame.from_dict(dict_for_df, orient='index', columns=['value'])
display(df_info)

## Some footnotes ##
display(md('<sup>**(1)** *A community-built service for article statistics at xtools.wmflabs.org* '
           '**(2)** *Users that added this page to their watchlist.* '
           '**(3)** *See [Wikipedia Content Assessment](https://en.wikipedia.org/wiki/Wikipedia:Content_assessment)*</sup>'))
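
The cells above special-case the `assessment` field, which is treated either as a plain boolean or as a record with a `value` entry, and then relabel the raw keys for display. Below is a small sketch of that logic on a made-up record. The sample values and the helper name `normalize_assessment` are hypothetical and do not come from the Xtools API.

```python
def normalize_assessment(assessment):
    """Return the assessment label, or the bare boolean when no label exists."""
    # Assumption: the field is either a bool or a mapping with a 'value' key,
    # which is what the notebook cell checks for.
    if isinstance(assessment, bool):
        return assessment
    return assessment["value"]


# Made-up sample resembling a page-info record; not real Xtools output.
sample = {"page": "Example", "revisions": 120, "assessment": {"value": "C"}}

# Human-readable labels, mirroring the renaming done in the notebook.
labels = {
    "page": "Page name",
    "revisions": "Revisions",
    "assessment": "Content Assessment (3)",
}

sample["assessment"] = normalize_assessment(sample["assessment"])
table = {labels[key]: value for key, value in sample.items()}
print(table)
```

Keeping the normalization in one function avoids repeating the type check in both the button handler and the step-by-step cell.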

---

# C. Page Views

Provided through the Wikimedia API.
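
As a rough illustration of what a per-article pageviews request looks like, the sketch below builds such a URL by hand. The `REST_BASE` value, the default project, and the helper name `pageviews_url` are assumptions for this example; the notebook takes the base and project from `wikimedia_dv.api`, and its own URL encoding may differ slightly.

```python
from datetime import date
from urllib.parse import quote

# Assumed base URL of the Wikimedia REST API (the notebook uses wikimedia_dv.api.base).
REST_BASE = "https://wikimedia.org/api/rest_v1/"


def pageviews_url(title, start, end, project="en.wikipedia", granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper)."""
    # Spaces become underscores; safe="" also escapes '/' inside titles.
    article = quote(title.replace(" ", "_"), safe="")
    return (f"{REST_BASE}metrics/pageviews/per-article/{project}/"
            f"all-access/all-agents/{article}/{granularity}/"
            f"{start:%Y%m%d}/{end:%Y%m%d}")


print(pageviews_url("The Camp of the Saints", date(2015, 7, 1), date(2021, 3, 14)))
```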

In [9]:
def pageviews_button(b):
    with out_pageviews:
        clear_output()
        
        display(md(f"***Page: {the_page['title']} ({abbreviation(languageSelection.value).upper()})***"))
        # Query request
        wikimedia_api = WikiMediaAPI(lng=abbreviation(languageSelection.value))
        wikimedia_dv = WikiMediaDV(wikimedia_api)
        views = wikimedia_dv.get_pageviews(the_page['title'], 'daily')
        
        # Visualization
        listener = ViewsListener(views)
        inter_func = interact(listener.listen, 
                         begin=Dropdown(options=views.timestamp),
                         end=Dropdown(options=views.timestamp.sort_values(ascending=False)),
                         granularity=Dropdown(options=['Yearly', 'Monthly', 'Weekly', 'Daily'], value='Monthly'))

        # raw data url
        start = 19900101
        today = datetime.today().strftime("%Y%m%d")
        end = int(today)
        article_name = urllib.parse.quote(the_page['title'])
        granularity = 'daily'

        url = (f'{wikimedia_dv.api.base}metrics/pageviews/per-article/{wikimedia_dv.api.project}/'
                f'all-access/all-agents/{article_name}/{granularity}/{start}/{end}')
        display(md(f"Metadata can be found in:"))
        print(url)
                   
        # The df_plotted keeps a reference to the plotted data above
        pageviews_agg = listener.df_plotted['views'].agg({
                            'Total views': sum,
                            'Max views period': max,
                            'Min views period': min,
                            'Average views': 'mean',}).to_frame('Value')
        display(pageviews_agg)
        
        

# create and display the button    
button = widgets.Button(description="Get Pageviews", layout=Layout(width='150px'))
display(button)

# the output widget is used to remove the output after the search field
out_pageviews = Output()
display(out_pageviews)

# set the event
button.on_click(pageviews_button)

# trigger the event with the default value
pageviews_button(button)

Button(description='Get Pageviews', layout=Layout(width='150px'), style=ButtonStyle())

Output()

<span style="color: #626262"> Try it yourself! This is what happens when you click the 'Get Pageviews' button: </span>

In [10]:
%%script false --no-raise-error

### IMPORTANT NOTE: COMMENT OUT THE ABOVE LINE TO EXECUTE THE CELL ###

### ----------------------------------------------------------------------------- ###
### TRY YOURSELF! THIS IS WHAT WILL HAPPEN WHEN YOU CLICK 'Get Pageviews' BUTTON  ###
### ----------------------------------------------------------------------------- ###

## define a WikiMediaAPI instance, more details see:                          ##
## https://github.com/gesiscss/wikiwho_demo/blob/master/external/wikimedia.py ##
## https://github.com/gesiscss/wikiwho_demo/blob/master/external/api.py       ##


wikimedia_api = WikiMediaAPI(lng=abbreviation(languageSelection.value))
wikimedia_dv = WikiMediaDV(wikimedia_api)

## Page of interest ##
print('The page that was found:', the_page['title'], f'({abbreviation(languageSelection.value).upper()})')

## get pageview counts for the article, more details see:                     ##
## https://github.com/gesiscss/wikiwho_demo/blob/master/external/wikimedia.py ##

views = wikimedia_dv.get_pageviews(the_page['title'], 'daily')

## Visualization: the core code lies in ViewsListener; the interact function            ##
## makes it interactive, for more details see:                                          ##
## https://github.com/gesiscss/wikiwho_demo/blob/master/visualization/views_listener.py ##
## https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html           ##

# Create a ViewsListener instance with the page view counts
listener = ViewsListener(views)

# You could customize begin/end dates and granularity to generate different graphs
# e.g. begin='20160809', end='20191020', granularity='Monthly'
begin='20150701'
end='20210314'
granularity='Monthly' # 'Yearly', 'Monthly', 'Weekly' ,'Daily' 

# Metadata
url = (f'{wikimedia_dv.api.base}metrics/pageviews/per-article/'
       f"{wikimedia_dv.api.project}/all-access/all-agents/{urllib.parse.quote(the_page['title'])}/{granularity.lower()}"
       f"/{int(begin)}/{int(end)}")
print('Metadata can be found in:', url)
print('(Note: the API response only supports "daily" and "monthly" granularity)\n')

# Time range you have selected
print('Time range you have selected:')
print('Start date:', begin)
print('End date:', end)
print('Granularity:', granularity)

# Visualization
listener.listen(begin, end, granularity)

# Pageviews aggregation data. Use the attribute "df_plotted".
pageviews_agg = listener.df_plotted['views'].agg({
                    'Total views': sum,
                    'Max views period': max,
                    'Min views period': min,
                    'Average views': 'mean',}).to_frame('Value')

print('Total views of this page:', pageviews_agg['Value']['Total views'])
print('Max views during the selected period:', pageviews_agg['Value']['Max views period'])
print('Min views during the selected period:', pageviews_agg['Value']['Min views period'])
print('Average views during the selected period:', pageviews_agg['Value']['Average views'])
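
The summary printed above reports the total, maximum, minimum, and average views per plotted period. The sketch below reproduces that idea with the standard library on made-up numbers, grouping daily counts into months first. The notebook does the equivalent with pandas on `listener.df_plotted`, so treat this as an illustration rather than the actual implementation.

```python
from collections import defaultdict
from statistics import mean

# Made-up daily counts as (YYYYMMDD, views) pairs; not real pageview data.
daily_views = [("20210101", 10), ("20210115", 30), ("20210203", 20), ("20210220", 40)]

# Group daily counts into monthly totals, keyed by the YYYYMM prefix.
monthly = defaultdict(int)
for day, views in daily_views:
    monthly[day[:6]] += views

# Summarize over the aggregated periods, not over the raw daily values.
periods = list(monthly.values())
summary = {
    "Total views": sum(periods),
    "Max views period": max(periods),
    "Min views period": min(periods),
    "Average views": mean(periods),
}
print(summary)
```

Note that the maximum, minimum, and average depend on the chosen granularity, because they are computed per period after grouping.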

---

# D. Templates and Protection

Provided through the WikiWho API.

In [11]:
def template_button(b):
    global token_source
    global new_template
    global new_protect
    with template_out:
        clear_output()
        
        # WikiWho API.
        wikiwho = WikiWho(lng=abbreviation(languageSelection.value))
        display(md("Downloading all_content from the WikiWho API..."))
        content = wikiwho.dv.all_content(the_page['page_id'])
        display(md("Downloading revisions from the WikiWho API..."))
        revisions = wikiwho.dv.rev_ids_of_article(the_page['page_id'])
        clear_output()
        
        # Wikipedia API
        pp_log = wikipedia_dv.get_protection(the_page['title'])

        # Use ConflictManager to join content and revision tables.
        cm = ConflictManager(content,
                             revisions, 
                             lng=abbreviation(languageSelection.value), 
                             include_stopwords=True)
        cm.calculate()
        clear_output()
        token_source = cm.all_actions.copy()
        
        display(md(f"***Page: {the_page['title']} ({abbreviation(languageSelection.value).upper()})***"))

        # Protection and template listeners
        new_protect = ProtectListener(pp_log, lng=abbreviation(languageSelection.value))
        
        display(md("Analysing protection data..."))
        plot_protect = [new_protect.get_protect(level)[1] for level in ["semi_edit", "semi_move", "fully_edit", "fully_move", "unknown"]]
        plot_protect = pd.concat(plot_protect)

        new_template = TemplateListener(token_source, plot_protect, lng=abbreviation(languageSelection.value),
                                       wikipediadv_api=wikipedia_dv, page=the_page)
        new_template.listen()
        
        
# create and display the button    
button = widgets.Button(description="Get Templates & Protection", layout=Layout(width='200px'))
display(button)

# the output widget is used to remove the output after the search field
template_out = Output()
display(template_out)

# set the event
button.on_click(template_button)

# trigger the event with the default value
template_button(button)

Button(description='Get Templates & Protection', layout=Layout(width='200px'), style=ButtonStyle())

Output()
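
The protection history used in this section comes from Wikipedia's log of protection events. As a hedged sketch, the snippet below builds a MediaWiki `logevents` query for such a log. Whether `wikipedia_dv.get_protection` issues exactly this request is an assumption, and `API_ENDPOINT` and `protection_log_url` are hypothetical names introduced only for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint for English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def protection_log_url(title, limit=50):
    """Build a query URL for the protection log of `title` (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "protect",   # only protection-related log entries
        "letitle": title,
        "lelimit": limit,
        "format": "json",
    }
    # urlencode escapes each value and joins the pairs with '&'.
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(protection_log_url("The Camp of the Saints"))
```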


Now that we have seen some general statistics of the article and the views it attracted, we will take a look at the specific kinds of changes it was subject to over time, and which editors made them.

Click below to go to the next notebook. You can later come back to this notebook and simply enter another article name to start the process over with that new article. 

In [12]:
from utils.notebooks import get_next_notebook
display(HTML(f'<a href="{get_next_notebook()}" target="_blank">Go to next notebook</a>'))

re_hide = """
<script>
var update_input_visibility = function () {
    Jupyter.notebook.get_cells().forEach(function(cell) {
        if (cell.metadata.hide_input) {
            cell.element.find("div.input").hide();
        }
    })
};
update_input_visibility();
</script>
"""
display(HTML(re_hide))

scroll_to_top = """
<script>
document.getElementById('notebook').scrollIntoView();
</script>
"""
display(HTML(scroll_to_top))