This repository contains material for the article "What Makes a Link Successful on Wikipedia?". First, we make available a parsing framework for Wikipedia. The Python framework extracts different link features (e.g., network-topological, visual, and semantic-similarity features) from Wikipedia in order to study human navigation. It can be used in combination with the clickstream dataset by Ellery Wulczyn and Dario Taraborelli from Wikimedia. The corresponding Wikipedia XML dump can be found here. Click here for more recent dumps. Additionally, the repository contains sample data extracted from Wikipedia with this framework and used in the paper. We also provide a notebook with an R kernel containing the detailed methodological steps and results from the paper.
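As an illustration of combining link data with the clickstream, the sketch below aggregates clickstream rows into the share of link clicks each (source, target) pair receives out of its source article. It assumes the tab-separated column layout `prev, curr, type, n` used by the recent monthly clickstream dumps (the original release may use a different layout), and the function name `link_click_shares` is hypothetical. This share is only one possible measure and is not necessarily the success measure used in the paper.

```python
import csv
from collections import defaultdict


def link_click_shares(tsv_lines):
    """Hypothetical helper: map (prev, curr) -> share of prev's link clicks.

    Assumes each line is tab-separated as: prev, curr, type, n.
    Only rows of type "link" (internal article-to-article clicks) are counted.
    """
    counts = defaultdict(int)  # clicks per (source, target) pair
    totals = defaultdict(int)  # total link clicks leaving each source
    # QUOTE_NONE because article titles may contain quote characters.
    reader = csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for prev, curr, link_type, n in reader:
        if link_type != "link":
            continue  # skip "external", "other", etc.
        clicks = int(n)
        counts[(prev, curr)] += clicks
        totals[prev] += clicks
    return {pair: c / totals[pair[0]] for pair, c in counts.items()}


if __name__ == "__main__":
    sample = [
        "Cat\tDog\tlink\t30",
        "Cat\tLion\tlink\t10",
        "other-search\tCat\texternal\t500",
    ]
    print(link_click_shares(sample))
```

For the sample rows, `Cat -> Dog` receives 0.75 and `Cat -> Lion` 0.25 of the link clicks leaving `Cat`, while the `external` row is ignored.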
This folder contains all Python scripts needed to set up the database containing all Wikipedia links and their features.
Requirements: MySQL, PyQt4, Xvfb, graph-tool, and plenty of RAM and free hard disk space.
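A quick pre-flight check for the software requirements above could look like the following sketch. It assumes graph-tool and PyQt4 are importable as the Python modules `graph_tool` and `PyQt4`, and that MySQL and Xvfb are reachable via the `mysql` client and `Xvfb` executables on `PATH`; the function name `missing_requirements` is hypothetical. It does not check RAM or disk space.

```python
import importlib.util
import shutil

# Assumed module and executable names for the listed requirements.
PY_MODULES = ["graph_tool", "PyQt4"]
EXECUTABLES = ["mysql", "Xvfb"]


def missing_requirements(modules=PY_MODULES, executables=EXECUTABLES):
    """Return the names of modules/executables that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when an executable is not on PATH.
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing requirements: " + ", ".join(missing))
    else:
        print("All listed requirements found.")
```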
This folder contains an R notebook with mixed-effects hurdle models.
This folder contains a sample of links and their features.
This project is published under the MIT License.