This file describes the requirements and steps needed to produce a time series visualization of English Wikipedia traffic from January 2008 to September 2017, split by mobile and desktop sites, using a Jupyter notebook and the Wikimedia REST API.
- A Jupyter notebook running an R kernel.
  - The notebook can be run locally by installing Jupyter Notebook or, alternatively, on a hosted Jupyter server.
  - If run locally, Python (3.3 or greater, or 2.7) is required to install Jupyter.
- Jupyter/jupyter is licensed under the BSD 3-clause "New" or "Revised" License.
- R is licensed under GPL-2 | GPL-3 (GPL version 2 or version 3).
- Python 2.2 and later are licensed as described at https://docs.python.org/3/license.html.
- Content accessed through the Wikimedia REST API is licensed under the CC-BY-SA 3.0 and GFDL licenses.
- Use of the Wikimedia REST API is subject to the Wikimedia Terms of Use.
The Wikimedia REST API has two endpoints for Wikipedia traffic:
- The Pagecounts API, which provides access to desktop and mobile traffic data from January 2008 to July 2016.
- The Pageviews API, which provides access to desktop, mobile web and mobile app traffic data from July 2015 to the present.
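A minimal Python sketch of how requests to the two endpoints might be built and sent (the notebook itself uses R, so this is illustrative only). The URL templates follow the REST API's aggregate endpoints for legacy pagecounts and for pageviews; the helper names, the monthly granularity default, the timestamps in the usage note and the placeholder User-Agent string are assumptions. For the Pageviews API, setting `agent` to `user` is what restricts results to traffic the API does not classify as spiders.

```python
import json
import urllib.request

# Base URL of the Wikimedia REST API metrics endpoints.
BASE = "https://wikimedia.org/api/rest_v1/metrics"
PROJECT = "en.wikipedia.org"


def pagecounts_url(access_site, start, end, granularity="monthly"):
    """Legacy Pagecounts endpoint; access_site is all-sites, desktop-site or mobile-site."""
    return f"{BASE}/legacy/pagecounts/aggregate/{PROJECT}/{access_site}/{granularity}/{start}/{end}"


def pageviews_url(access, start, end, agent="user", granularity="monthly"):
    """Pageviews endpoint; access is all-access, desktop, mobile-web or mobile-app.

    agent="user" excludes traffic the API classifies as spiders.
    """
    return f"{BASE}/pageviews/aggregate/{PROJECT}/{access}/{agent}/{granularity}/{start}/{end}"


def fetch_items(url, user_agent="hypothetical-project/0.1 (contact@example.org)"):
    """Download one response and return its list of records.

    The default User-Agent is a hypothetical placeholder; replace it with
    your own contact details.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["items"]
```

For example, `pagecounts_url("desktop-site", "2008010100", "2016080100")` returns `https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/aggregate/en.wikipedia.org/desktop-site/monthly/2008010100/2016080100`. Each record returned by `fetch_items` carries a `timestamp` (formatted YYYYMMDDHH) and the month's total in `count` for Pagecounts or `views` for Pageviews.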
The data file created as part of this project has the following structure:
- year (integer) - the year to which the traffic relates {2008, 2009, ... 2017}.
- month (integer) - the month to which the traffic relates {1, 2, ... 12}.
- pagecount_all_views (integer) - the total number of views (English desktop and mobile sites) for the month, as defined by the Pagecounts API.
- pagecount_desktop_views (integer) - the total number of views for the English desktop site for the month, as defined by the Pagecounts API.
- pagecount_mobile_views (integer) - the total number of views for the English mobile site for the month, as defined by the Pagecounts API.
- pageview_all_views (integer) - the total number of views (English desktop and mobile sites) for the month, as defined by the Pageviews API.
- pageview_desktop_views (integer) - the total number of views for the English desktop site for the month, as defined by the Pageviews API.
- pageview_mobile_views (integer) - the total number of views for the English mobile site for the month, as defined by the Pageviews API.
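A Python sketch (again illustrative, not the notebook's R code) of how monthly records from both APIs could be merged into the structure above and written out as CSV. The helper names are hypothetical, and two behaviours are assumptions of this sketch: months not covered by a series are written as 0, and repeated calls for the same column are summed, which is one way to fold mobile web and mobile app views into `pageview_mobile_views`.

```python
import csv
from collections import defaultdict

# Column order of the data file described above.
COLUMNS = [
    "year", "month",
    "pagecount_all_views", "pagecount_desktop_views", "pagecount_mobile_views",
    "pageview_all_views", "pageview_desktop_views", "pageview_mobile_views",
]


def new_table():
    """Map (year, month) -> {column: view count}."""
    return defaultdict(dict)


def add_series(table, column, items, value_key):
    """Add one API series to the table.

    value_key is "count" for Pagecounts records and "views" for Pageviews
    records. Values are summed, so calling this twice for the same column
    (e.g. mobile-web and mobile-app) combines them.
    """
    for item in items:
        ts = item["timestamp"]  # formatted YYYYMMDDHH, e.g. "2008010100"
        key = (int(ts[:4]), int(ts[4:6]))
        table[key][column] = table[key].get(column, 0) + int(item[value_key])


def build_rows(table):
    """Return one dict per month, sorted by date; missing columns get 0."""
    rows = []
    for year, month in sorted(table):
        row = {"year": year, "month": month}
        for column in COLUMNS[2:]:
            row[column] = table[(year, month)].get(column, 0)
        rows.append(row)
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Typical use would be one `add_series` call per downloaded series (for example the `desktop-site` Pagecounts records into `pagecount_desktop_views` with `value_key="count"`), followed by `write_csv(build_rows(table), path)`.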
Please note that views from the Pagecounts API include views from non-human agents (e.g. spiders and web crawlers). Views from the Pageviews API have been filtered to exclude views from non-human agents.
This Jupyter notebook contains the steps and code needed to reproduce this analysis.