Wikipedia Data Analysis Toolkit

  • Author: Felipe Ortega.
  • Contributors: Carlos Martínez, Efrayim D Zitron, Aaron Halfaker.
  • License: GPLv3.

The aim of WikiDAT is to create an extensible toolkit for Wikipedia data analysis, using Python and R.

Several tools are included to automate the extraction and preparation of Wikipedia data from different sources. Their execution can be parallelized in multi-core computing environments, and they are highly customizable with a single configuration file.
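To illustrate the single-configuration-file approach, the sketch below parses a small config and caps the worker count at the available CPU cores. The section and option names (General, num_procs, Database, etc.) are hypothetical placeholders, not WikiDAT's actual configuration keys.

```python
import configparser
import multiprocessing as mp

# Hypothetical configuration; section and option names are illustrative
# placeholders, not WikiDAT's actual keys.
SAMPLE_CONFIG = """
[General]
lang = enwiki
num_procs = 4

[Database]
engine = mysql
db_name = wikidat_enwiki
"""

def load_settings(text):
    """Parse configuration text and return the options as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "lang": parser.get("General", "lang"),
        # Cap the worker count at the number of available CPU cores.
        "num_procs": min(parser.getint("General", "num_procs"),
                         mp.cpu_count()),
        "engine": parser.get("Database", "engine"),
        "db_name": parser.get("Database", "db_name"),
    }

settings = load_settings(SAMPLE_CONFIG)
print(settings["lang"], settings["engine"], settings["num_procs"])
```

A driver script would then pass num_procs to a process pool and the Database options to the storage layer, so one file controls the whole run.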

Different case studies illustrate how to analyze and visualize data from Wikipedia in any language. Outcomes are stored in the results, figs, or traces subdirectories inside the main directory for each case. More cases will be added progressively, covering typical examples of quantitative analyses that can be undertaken with Wikipedia data.
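The per-case output layout described above can be prepared with a short helper; the case name used here ("example_case") is hypothetical, while the subdirectory names follow the layout just described.

```python
from pathlib import Path
import tempfile

def prepare_case_dirs(base, case_name):
    """Create the output subdirectories (results, figs, traces) for one case."""
    case_dir = Path(base) / case_name
    for sub in ("results", "figs", "traces"):
        (case_dir / sub).mkdir(parents=True, exist_ok=True)
    return case_dir

# Use a temporary directory so the sketch is self-contained;
# "example_case" is a hypothetical case-study name.
base = tempfile.mkdtemp()
case = prepare_case_dirs(base, "example_case")
print(sorted(p.name for p in case.iterdir()))
```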

Currently, WikiDAT is compatible with either MySQL or MariaDB for local data storage. Support for PostgreSQL will be available soon (code is being ported). Additional support for unstructured data with MongoDB is also planned.
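As a sketch of the relational storage step, the example below inserts revision metadata into a table. It uses Python's built-in sqlite3 module as a stand-in for MySQL/MariaDB so the snippet is self-contained, and the table and column names are hypothetical, not WikiDAT's actual schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL/MariaDB;
# table and column names are hypothetical, not WikiDAT's schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE revision (
        rev_id  INTEGER PRIMARY KEY,
        page_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,  -- 0 must be accepted as a valid id
        rev_ts  TEXT NOT NULL
    )
""")

rows = [
    (1001, 42, 7, "2013-05-01T12:00:00Z"),
    (1002, 42, 0, "2013-05-01T12:05:00Z"),  # revision with user_id = 0
]
conn.executemany("INSERT INTO revision VALUES (?, ?, ?, ?)", rows)

n_revs = conn.execute(
    "SELECT COUNT(*) FROM revision WHERE page_id = ?", (42,)
).fetchone()[0]
print(n_revs)
```

With a real MySQL/MariaDB backend only the connection call and driver would change; the parameterized INSERT pattern stays the same.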

Required dependencies

For a complete list of hardware and software requirements, please check the requirements file in the repository.
