# Media Outlets Activity on Wikipedia aggregated by Media Type

The parameters in the cell below can be adjusted to explore other media outlets and time frames.

### How to explore other media types?
The ***media*** parameter can be used to aggregate media outlets by their type. The column `subcategory` in [this other notebook](../media.ipynb?autorun=true) shows the media outlets that belong to each type. Setting the parameter to `'All'` disables the type filter.

***Alternatively***, you can directly use the [organizations API](http://mediamonitoring.gesis.org/api/organizations/swagger/), or access it with the [SMM Wrapper](https://pypi.org/project/smm-wrapper/).

## A. Set Up parameters

In [None]:
# Parameters: 
media = 'TV'
from_date = '2017-09-01'
to_date = '2018-12-31'
aggregation = 'week'
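
Before running the remaining cells, it can help to check that the parameters are well formed. The following is a minimal sketch, not part of the original notebook: `check_parameters` and `ALLOWED_AGGREGATIONS` are hypothetical names, and the set of allowed aggregations is an assumption based on the two values (`'week'` and `'month'`) that section B.3 handles explicitly.

```python
from datetime import date

# Hypothetical helper, not part of the notebook: checks the four parameters
# before any API call is made.
# Assumption: only the two aggregations handled in section B.3 are allowed.
ALLOWED_AGGREGATIONS = {'week', 'month'}

def check_parameters(media, from_date, to_date, aggregation):
    """Return the parsed (start, end) dates or raise ValueError."""
    if not media:
        raise ValueError("media must be a non-empty string, e.g. 'TV' or 'All'")
    start = date.fromisoformat(from_date)  # raises ValueError on a malformed date
    end = date.fromisoformat(to_date)
    if start > end:
        raise ValueError("from_date must not be later than to_date")
    if aggregation not in ALLOWED_AGGREGATIONS:
        raise ValueError(f"aggregation must be one of {sorted(ALLOWED_AGGREGATIONS)}")
    return start, end

start, end = check_parameters('TV', '2017-09-01', '2018-12-31', 'week')
print(start, end)
```

A malformed date or an unsupported aggregation then fails here with a readable message rather than later inside an API request.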

In [None]:
blacklist = [
    # Media outlets to exclude from the analysis
    'Rbb 88.8', 'Deutsche Wirtschaftsnachrichten', 'Hessische/Niedersächsische Allgemeine',
    'Mediengruppe Straubinger Tagblatt/Landshuter Zeitung', 'Niedersächsische Allgemeine',
    'Sindelfinger Zeitung/Böblinger Zeitung', 'Rbb24']

## B. Using APIs
### B.1 Using the SMM Organization API

In [None]:
import pandas as pd
from smm_wrapper import SMMOrganizations

# Create an instance of the SMM wrapper
smm = SMMOrganizations()

# Request the organizations from the API and keep only the media outlets
df = smm.dv.get_organizations()
df = df[~df['name'].isin(blacklist)]
df = df[df['category'] == 'media']

# Keep the outlets of the selected media type that are valid (the ones that contain wp_ids)
if media != 'All':
    media_df = df[df['subcategory'].str.contains(media) & df['wp_ids'].notnull()]
else:
    media_df = df[df['wp_ids'].notnull()]

# Query the Social Media Monitoring API for the Wikipedia changes of each outlet
if len(media_df) != 0:
    wiki_chobs = pd.concat(
        smm.dv.wikipedia(_id=organization_id, from_date=from_date,
                         to_date=to_date, aggregate_by=aggregation)
        for organization_id in media_df.index)
    wiki_chobs = wiki_chobs.groupby('date').agg({'chobs': 'sum'}).reset_index()
else:
    print("No Wikipedia data found for this media type")
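
The aggregation step above concatenates the per-outlet results and sums `chobs` for each date. The same logic can be sketched in plain Python, which may make it easier to see what the `groupby` does. The function `sum_chobs_by_date` and the sample rows are hypothetical and only illustrate the idea; they are not output of the SMM API.

```python
from collections import defaultdict

# Hypothetical per-organization results; the dates and counts are made up.
per_organization = [
    [{'date': '2017-09-04', 'chobs': 3}, {'date': '2017-09-11', 'chobs': 1}],
    [{'date': '2017-09-04', 'chobs': 2}],
]

def sum_chobs_by_date(results):
    """Add up the `chobs` of all organizations for each date."""
    totals = defaultdict(int)
    for rows in results:
        for row in rows:
            totals[row['date']] += row['chobs']
    # Sort by date so the output resembles the grouped data frame
    return sorted(totals.items())

totals = sum_chobs_by_date(per_organization)
print(totals)
```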

### B.2 Using the Wikiwho API

In [None]:
from wikiwho_wrapper import WikiWho

# Use WikiWho to extract conflicts and revisions for every Wikipedia page of each outlet
ww = WikiWho(lng='de')
edit_persistence_gen = (
    ww.dv.edit_persistence(page_id=wp_id, start=from_date, end=to_date)
    for wp_ids in media_df['wp_ids'] for wp_id in wp_ids)
# Skip empty results; `page_df` avoids shadowing the organizations frame `df`
wiki_data = pd.concat(page_df for page_df in edit_persistence_gen if len(page_df) > 0)
wiki_data['undos'] = wiki_data['dels'] + wiki_data['reins']
wiki_data['date'] = pd.to_datetime(wiki_data['year_month'])
# Select the columns with a list: tuple-style selection on a groupby fails in recent pandas
wiki_data = wiki_data.groupby('date')[['conflict', 'elegibles', 'undos']].sum().reset_index()
wiki_data['conflict_score'] = wiki_data['conflict'] / wiki_data['elegibles']
wiki_data.fillna(0, inplace=True)
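
The two derived metrics in the cell above are simple: `undos` is the sum of deletions and reinsertions, and `conflict_score` is the conflict divided by the number of eligible actions. The sketch below illustrates them on made-up rows. The helper `conflict_score` is hypothetical, and it returns 0 whenever nothing was eligible, which is a simplification of the `fillna(0)` step (pandas only produces a missing value for 0/0).

```python
def conflict_score(conflict, elegibles):
    """Ratio of conflict to eligible actions; 0.0 when nothing was eligible."""
    return conflict / elegibles if elegibles else 0.0

# Made-up monthly rows; the key `elegibles` follows the column name used in the notebook.
rows = [
    {'year_month': '2017-09', 'dels': 4, 'reins': 1, 'conflict': 2.0, 'elegibles': 8},
    {'year_month': '2017-10', 'dels': 0, 'reins': 0, 'conflict': 0.0, 'elegibles': 0},
]

for row in rows:
    row['undos'] = row['dels'] + row['reins']
    row['conflict_score'] = conflict_score(row['conflict'], row['elegibles'])
    print(row['year_month'], row['undos'], row['conflict_score'])
```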

### B.3 Using the Wikimedia API
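
The cell in this section fills a URL template with the article title and the date range. As a small standalone illustration, the sketch below shows how the title is percent-encoded and how the dates are converted to the `YYYYMMDD` form. `build_views_url` is a hypothetical helper and the article title is a made-up example.

```python
import urllib.parse

# Same endpoint layout as in the notebook cell; the title used below is made up.
BASE = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
        "/de.wikipedia.org/all-access/user/{wp_title}/daily/{start}/{end}")

def build_views_url(wp_title, from_date, to_date):
    """Fill the endpoint template with an encoded title and YYYYMMDD dates."""
    return BASE.format(
        wp_title=urllib.parse.quote(wp_title, safe=''),  # safe='' also encodes '/'
        start=from_date.replace('-', ''),
        end=to_date.replace('-', ''))

url = build_views_url('Sender/Beispiel TV', '2017-09-01', '2018-12-31')
print(url)
```

Passing `safe=''` matters for titles that contain a slash, since an unencoded `/` would be read as a path separator.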

In [None]:
import requests
import urllib.parse

# open a session
session = requests.Session()

# prepare url
vurl = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
        "/de.wikipedia.org/all-access/user/{wp_title}/daily/"
        f"{from_date.replace('-','')}/{to_date.replace('-','')}")

# Cleaning method for failed responses of the Wikimedia pageviews API
def clean_json(_json):
    """ Complete the json with an empty `items` key when the Wikimedia
    pageviews API fails to find the article
    """
    if 'items' not in _json:
        _json['items'] = []
    return _json
        
# use the wikimedia API to download the views
views = pd.concat([pd.DataFrame(clean_json(session.get(url=vurl.format(
           wp_title=urllib.parse.quote(wp_title, safe=''))).json())['items']) 
           for wp_titles in media_df['wp_titles'] for wp_title in wp_titles])
views['timestamp'] = pd.to_datetime(views['timestamp'], format='%Y%m%d%H')

# Weekly or monthly aggregation of the data
if aggregation == 'week':
    # 'W-SUN' labels each week by its closing Sunday; shift the label back to the Monday
    views = views.groupby([pd.Grouper(key='timestamp', freq='W-SUN')])['views'].sum().reset_index().sort_values('timestamp')
    views['timestamp'] = views['timestamp'] - pd.Timedelta(days=6)
elif aggregation == 'month':
    # 'MS' labels each month by its first day
    views = views.groupby([pd.Grouper(key='timestamp', freq='MS')])['views'].sum().reset_index().sort_values('timestamp')
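
The weekly branch groups the daily views into Monday-to-Sunday weeks and labels each week with its Monday. The same idea can be sketched with the standard library. `week_start` is a hypothetical helper and the view counts are made up.

```python
from collections import defaultdict
from datetime import date, timedelta

def week_start(day):
    """Return the Monday of the week that contains `day`."""
    return day - timedelta(days=day.weekday())  # weekday(): Monday == 0

# Made-up daily view counts
daily_views = {
    date(2017, 9, 1): 10,  # Friday
    date(2017, 9, 3): 5,   # Sunday, same week as the Friday
    date(2017, 9, 4): 7,   # Monday, starts a new week
}

# Sum the daily counts under the Monday that opens each week
weekly = defaultdict(int)
for day, count in daily_views.items():
    weekly[week_start(day)] += count

for monday in sorted(weekly):
    print(monday, weekly[monday])
```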

## C. Plotting
### C.1 Plot Wikipedia Activity

In [None]:
import plotly
from plotly import graph_objs as go
plotly.offline.init_notebook_mode(connected=True)

plotly.offline.iplot({
    "data": [go.Scatter(x=views['timestamp'], y=views['views'], name='Views'),
             go.Scatter(x=wiki_chobs['date'], y=wiki_chobs['chobs'], name='Changes', yaxis='y2')], 
    "layout": go.Layout( title='Wikipedia Activity', yaxis=dict(title='Views'),
    yaxis2=dict(title='Changes', overlaying='y', side='right'))
})

### C.2 Plot Wikipedia Disagreement

In [None]:
plotly.offline.iplot({
    "data": [go.Scatter(x=wiki_data['date'], y=wiki_data['undos'], name='Undos', line_shape='spline'),
            go.Scatter(x=wiki_data['date'], y=wiki_data['conflict_score'], name='Conflict', line_shape='spline', yaxis='y2')], 
    "layout": go.Layout(title='Wikipedia Disagreement', yaxis=dict(title='Undos'),
    yaxis2=dict(title='Conflict', overlaying='y',side='right'))
})