mykebrowne/data-512-a1

data-512-a1

Repo for DATA_512 Assignment 1

This file describes the requirements and steps needed to produce a time series visualization of English Wikipedia traffic, split by mobile and desktop sites, from January 2008 to September 2017, using a Jupyter notebook and the Wikimedia REST API.

Software requirements

  • Jupyter notebook running an R kernel.
  • This can be done locally by installing Jupyter notebook or, alternatively, on the Jupyter server.
  • If done locally, Python is required (Python 3.3 or greater, or Python 2.7) to install Jupyter.

Licensing

API documentation

The Wikimedia REST API has two endpoints for Wikipedia traffic:

  • The Pagecounts API, which provides access to desktop and mobile traffic data from January 2008 to July 2016.
  • The Pageviews API, which provides access to desktop, mobile web and mobile app traffic data from July 2015 to present.
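
As a rough illustration of how requests to the two endpoints might be formed, the sketch below builds monthly request URLs. It is written in Python for brevity, whereas the notebook in this repo uses an R kernel. The helper names (`pagecounts_url`, `pageviews_url`), the default project, the `agent` default and the timestamps in the usage note are assumptions for illustration, not taken from this repo.

```python
# Base path of the Wikimedia REST API metrics service.
BASE = "https://wikimedia.org/api/rest_v1/metrics"


def pagecounts_url(access_site, start, end, project="en.wikipedia.org"):
    """Build a monthly Pagecounts (legacy) request URL.

    access_site: e.g. "desktop-site", "mobile-site" or "all-sites".
    start, end:  timestamps in YYYYMMDDHH form (hypothetical values below).
    """
    return (f"{BASE}/legacy/pagecounts/aggregate/"
            f"{project}/{access_site}/monthly/{start}/{end}")


def pageviews_url(access, start, end, agent="user", project="en.wikipedia.org"):
    """Build a monthly Pageviews request URL.

    access: e.g. "desktop", "mobile-web", "mobile-app" or "all-access".
    agent:  "user" is assumed here so that non-human traffic is excluded.
    """
    return (f"{BASE}/pageviews/aggregate/"
            f"{project}/{access}/{agent}/monthly/{start}/{end}")
```

For example, `pageviews_url("mobile-web", "2015070100", "2017093000")` returns a URL that could be fetched with any HTTP client. The sketch performs no network access itself.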

Data file

The data file created as part of this project has the following structure:

  • year (integer) - the year to which the traffic relates {2008, 2009, ... 2017}.
  • month (integer) - the month to which the traffic relates {1, 2, ... 12}.
  • pagecount_all_views - the total number of views (English desktop and mobile sites) as defined by the Pagecounts API.
  • pagecount_desktop_views - the total number of views for the English desktop site as defined by the Pagecounts API.
  • pagecount_mobile_views - the total number of views for the English mobile site as defined by the Pagecounts API.
  • pageview_all_views - the total number of views (English desktop and mobile sites) as defined by the Pageviews API.
  • pageview_desktop_views - the total number of views for the English desktop site as defined by the Pageviews API.
  • pageview_mobile_views - the total number of views for the English mobile site as defined by the Pageviews API.
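
The structure above could be assembled along the following lines. This is a minimal sketch in Python (the notebook itself runs an R kernel), assuming monthly totals have already been collected into dictionaries keyed by `(year, month)`. The function names, the input shape and the choice to write 0 for months an API does not cover are assumptions for illustration, not the author's method.

```python
import csv
import io

# Column order of the data file, as listed above.
COLUMNS = [
    "year", "month",
    "pagecount_all_views", "pagecount_desktop_views", "pagecount_mobile_views",
    "pageview_all_views", "pageview_desktop_views", "pageview_mobile_views",
]


def build_rows(pagecounts, pageviews):
    """Merge per-month totals from both APIs into one row per month.

    Each argument maps (year, month) -> {"desktop": int, "mobile": int}.
    Months missing from one API get 0 for that API's columns (an assumption).
    """
    rows = []
    for year, month in sorted(set(pagecounts) | set(pageviews)):
        pc = pagecounts.get((year, month), {})
        pv = pageviews.get((year, month), {})
        pc_desktop, pc_mobile = pc.get("desktop", 0), pc.get("mobile", 0)
        pv_desktop, pv_mobile = pv.get("desktop", 0), pv.get("mobile", 0)
        rows.append({
            "year": year,
            "month": month,
            "pagecount_all_views": pc_desktop + pc_mobile,
            "pagecount_desktop_views": pc_desktop,
            "pagecount_mobile_views": pc_mobile,
            "pageview_all_views": pv_desktop + pv_mobile,
            "pageview_desktop_views": pv_desktop,
            "pageview_mobile_views": pv_mobile,
        })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with the header in COLUMNS order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because the Pageviews API reports mobile web and mobile app traffic separately, the `"mobile"` value passed in for that API would presumably be the sum of the two; that step is left to the caller here.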

Please note that views from the Pagecounts API include views from non-human agents (e.g. spiders and web crawlers). Views from the Pageviews API have been filtered to exclude views from non-human agents.

Steps to reproduce analysis

This Jupyter notebook contains the steps and code needed to reproduce this analysis.
