Extract a (social) network from a mediawiki dump
django_wikinetwork lovely day for pep8
sonet lovely day for pep8
stats lovely day for pep8
tests lovely day for pep8
.gitignore fixing usercontributions with quartiles and process_username and proc…
.gitmodules Adding django project to store data in DB. Adding django-evolution as…
.pylintrc pylinting and fixing pywc_plot
.travis.yml nose-1.0.0 from easy_install returns 404
Makefile Trying to make Makefile usable
README.md update readme
__init__.py Fixed a bug (with Max) in django data report views. getUserClass() an…
countries_stats.py minor fix
counttemplates.py some pep8 and cleaning
countwords-groups-sender.py some pep8 and cleaning
countwords_groups.py some pep8 and cleaning
csv_manipulation.py don't output empty lines
download_current_events.py some pep8 and cleaning
download_dumps.sh minor fix
download_page.py some pep8 and cleaning
events_analysis.py some pep8 and cleaning
events_anniversary.py some pep8 and cleaning
gender_stats.py some pep8 and cleaning
graph_analysis.py some pep8 and cleaning
graph_enrich.py add license
graph_longitudinal_analysis.py add license
hist.gnuplot Code for matplotlib bar chart added (but not using it, it's too slow)
pywc.py lovely day for pep8
pywc_derived_data.py if name==main pywc_derived_data
pywc_dic_check.py pywc_plot sliding window + dic_check
pywc_plot.py lovely day for pep8
pywc_revision.py minor bug fix
pywc_revision_merge.py detailed output for pywc revision
pywc_revision_plot.py block fix
random_pages.py option fix
revisions_page.py lovely day for pep8
revisions_talk.py minor fix
setup.cfg rednose doesn't work with node-exclude
setup.py travis fix
signature2graph.py added date/time to logging - no print, yes logging.info!
usercontributions.py usercontribution fix
usercontributions_export.py bug fixed
utpedits2graph.py pylinting
word_frequency.py Added a comment in the 5 files for explaining the line src = deflate(…

README.md

Install

To install the dependencies, run: ./setup.py develop

Scripts

utpedits2graph.py

Count edits on User Talk Pages and create a graph from them. The graph is saved as a pickled iGraph object. It is directed and weighted: for example, two edits made by User A on User B's Talk Page are represented as an edge from A to B with weight = 2. This script should be used on complete dumps and on stub dumps.
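
The weighting rule above can be sketched in a few lines. This is only an illustration using the standard library, not the code of utpedits2graph.py: the function name count_talk_edits and the (editor, owner) input format are assumptions, and the real script reads a dump and builds an iGraph object.

```python
from collections import Counter


def count_talk_edits(edits):
    """Turn (editor, talk_page_owner) pairs into weighted directed edges.

    Hypothetical helper: each pair stands for one edit made by `editor`
    on the User Talk Page of `talk_page_owner`.
    """
    weights = Counter()
    for editor, owner in edits:
        # One more edit from editor to owner increases the edge weight by 1.
        weights[(editor, owner)] += 1
    return weights


# User A edits B's talk page twice, B edits A's talk page once.
edits = [("A", "B"), ("A", "B"), ("B", "A")]
weighted_edges = count_talk_edits(edits)

for (source, target), weight in sorted(weighted_edges.items()):
    print(source, "->", target, "weight =", weight)
```

The resulting (source, target, weight) triples are the kind of data a directed, weighted graph library would take as an edge list.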

signature2graph.py

Like utpedits2graph.py, but counts signatures on User Talk Pages instead of edits.

This script can be used on current dumps.

graph_enrich.py

Given a pickled iGraph object, this script downloads useful information about the users (for example, whether a user is a bot or a sysop) from the Wikipedia API and creates a new pickled iGraph object.

graph_analysis.py

Network analysis tool. Run ./graph_analysis.py --help to see the available parameters. Note: transitivity is not available yet. Do not use summary together with --as-table. Use each of --histogram, --gnuplot and --plot on its own.

graph_longitudinal_analysis.py

Longitudinal network analysis. Available analyses: cumulative and time slices.

usercontributions.py

Given a stub dump, this script counts contributions for every user on the whole wikipedia.

Results are stored in a database. The saved fields are:

| Field | Type | Description |
| --- | --- | --- |
| username | String | |
| lang | String | Data on this user are related to the "lang" wikipedia |
| normal_edits | Integer | Edits on the article namespace |
| namespace_edits | String | This is an array of integers. Each integer represents the number of edits made by this user on pages in a namespace. Namespaces are numbered starting from 0 in the order found at the beginning of the XML dump file |
| first_edit | DateTime | Time of the first (oldest) edit |
| last_edit | DateTime | Time of the last (most recent) edit |
| comments_count | Integer | Number of comments left by this user |
| comments_avg | Float | Average comment length |
| minor | Integer | Number of minor edits |
| welcome | Integer | Number of edits with a comment containing the word "welcome" |
| npov | Integer | Number of edits with a comment containing the word "npov" (neutral point of view) |
| please | Integer | Number of edits with a comment containing the word "please" |
| thanks | Integer | Number of edits with a comment containing the word "thanks" |
| revert | Integer | Number of edits with a comment containing the word "revert" |
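
The namespace_edits field packs one counter per namespace into a single string. The README does not say how the array is serialized, so the sketch below assumes a comma-separated string such as "12,0,3"; the helper name parse_namespace_edits and the example namespace names are also assumptions.

```python
def parse_namespace_edits(raw, namespaces):
    """Map namespace names to edit counts.

    Assumption: `raw` is a comma-separated string of integers, and
    `namespaces` lists the namespace names in the order in which they
    appear at the beginning of the XML dump (position 0 first).
    """
    counts = [int(value) for value in raw.split(",")]
    return dict(zip(namespaces, counts))


# Hypothetical namespace order and stored value.
namespaces = ["Talk", "User", "User talk"]
per_namespace = parse_namespace_edits("12,0,3", namespaces)
print(per_namespace["User talk"])
```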

usercontributions_export.py

Export the data collected by usercontributions.py to a CSV file.

events_anniversary.py

This script collects revision times for all the article and talk pages and for a set of desired pages. The purpose of this analysis is to find out whether pages related to events are edited in the days around the event's anniversary.

Data are stored in a database.

events_analysis.py

The script takes as input a list of desired pages and the Wikipedia language to be analyzed. It retrieves from the database all the revisions for the specified language and computes revision statistics for each page found, such as the number of edits, the number of unique editors and the number of edits made within a range of days around the event's anniversary. The output is written to a bz2-compressed CSV file.
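
One of the statistics mentioned above, edits made within a range of days around the anniversary, can be computed as in the following sketch. It is not taken from events_analysis.py: the function names, the default window of 10 days and the handling of 29 February are assumptions.

```python
from datetime import date


def days_from_anniversary(edit_day, event_day):
    """Distance in days between edit_day and the closest yearly recurrence
    of event_day (previous, same or next calendar year)."""
    candidates = []
    for year in (edit_day.year - 1, edit_day.year, edit_day.year + 1):
        try:
            candidates.append(event_day.replace(year=year))
        except ValueError:
            # Event on 29 February and `year` is not a leap year:
            # fall back to 28 February (an assumption of this sketch).
            candidates.append(date(year, 2, 28))
    return min(abs((edit_day - candidate).days) for candidate in candidates)


def edits_near_anniversary(edit_days, event_day, window=10):
    """Count the edits made at most `window` days from an anniversary."""
    return sum(
        1 for day in edit_days
        if days_from_anniversary(day, event_day) <= window
    )


event = date(2001, 9, 11)
revisions = [date(2005, 9, 9), date(2005, 3, 1), date(2006, 9, 20), date(2007, 1, 1)]
near = edits_near_anniversary(revisions, event, window=10)
print(near)
```

Checking the previous and next calendar year as well keeps the distance correct for events close to the start or end of a year.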

word_frequency.py

Given a list of words, find the frequency of these words in a random set of pages and in a list of desired pages (and the related talk pages).

Data are stored in a database.
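
As a rough illustration of the measure, the relative frequency of a list of words in a page text could be computed as below. The tokenization (lowercased runs of word characters) and the function name word_frequency are assumptions and may differ from what word_frequency.py does.

```python
import re
from collections import Counter


def word_frequency(text, words):
    """Return, for each word in `words`, its share of all tokens in `text`."""
    # Assumption: a token is a lowercased run of word characters.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {
        word: (counts[word.lower()] / total if total else 0.0)
        for word in words
    }


page_text = "The war ended. The peace began after the war."
freq = word_frequency(page_text, ["war", "peace", "treaty"])
print(freq)
```

Running the same function on the random set of pages and on the desired pages gives two sets of frequencies that can be compared.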

countwords_groups.py

Given a current dump, count the words found on every User Talk Page (UTP) and return the results by group (the group the user belongs to).

countwords-groups-sender.py

Note: currently broken, do not use it.

Given a current dump, count the words found on every UTP and return the results by the group the sender of the message belongs to. Group == role (sysop, bot, etc.).

counttemplates.py

Counts how many templates are used on every User Page (not on User Talk Pages).

pywc.py

Script similar to TAWC/LIWC, used to analyse the text in the output of revisions_page.py. Because it was developed for Wikipedia, it contains some functions to clean the text. It takes as input a dictionary of categories, each defined by patterns written as a kind of regular expression, and counts the words that match each category.
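
The category counting idea can be sketched as follows. The dictionary format of pywc.py is not described here, so this example assumes LIWC-style patterns in which a trailing "*" matches any word ending; the names compile_pattern and count_categories and the sample categories are hypothetical.

```python
import re
from collections import Counter


def compile_pattern(pattern):
    """Compile one dictionary entry into a regular expression.

    Assumption: a trailing "*" means "any word ending", as in LIWC
    dictionaries; anything else must match the whole word.
    """
    if pattern.endswith("*"):
        return re.compile(re.escape(pattern[:-1]) + r"\w*")
    return re.compile(re.escape(pattern))


def count_categories(text, dictionary):
    """Count, per category, the words of `text` matching one of its patterns."""
    compiled = {
        category: [compile_pattern(p) for p in patterns]
        for category, patterns in dictionary.items()
    }
    counts = Counter()
    for token in re.findall(r"\w+", text.lower()):
        for category, patterns in compiled.items():
            # fullmatch: the pattern has to cover the entire word.
            if any(p.fullmatch(token) for p in patterns):
                counts[category] += 1
    return counts


dictionary = {"posemo": ["happ*", "good"], "negemo": ["sad*"]}
result = count_categories("Happy people are good; sadness is not happiness.", dictionary)
print(dict(result))
```

A word is counted at most once per category, but it can contribute to several categories if their patterns overlap.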

pywc_plot.py

pywc_plot is used to plot pywc results.

revisions_page.py

This script extracts all the revisions of a list of pages. You can specify which kind of pages you want to analyze (all, content only, talk only). The output is written to a CSV file.

csv_manipulation.py

Used to manage the files exported by revisions_page.py. Text cells can contain newlines, so these files cannot easily be handled with head, tail, awk and similar tools. With this script you can extract only specific pages or a small number of revisions from a huge file, select a start and end date for the revisions, or "condense" them into groups of X words.
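
The reason line-oriented tools fail is that a single CSV record may span several physical lines. A CSV-aware reader handles this, as the sketch below shows with Python's csv module. The column layout (title, date, text) and the helper name select_pages are assumptions, not the actual format produced by revisions_page.py.

```python
import csv
import io


def select_pages(csv_file, wanted_titles, title_column=0):
    """Return the rows whose title is in `wanted_titles`.

    csv.reader keeps quoted cells intact even when they contain newlines,
    which is what line-based tools such as head or awk cannot do.
    """
    reader = csv.reader(csv_file)
    return [row for row in reader if row and row[title_column] in wanted_titles]


# Build a small in-memory file; the first text cell spans two lines.
sample = io.StringIO(newline="")
writer = csv.writer(sample)
writer.writerow(["Rome", "2010-01-01", "first line\nsecond line"])
writer.writerow(["Paris", "2010-01-02", "single line"])
sample.seek(0)

rows = select_pages(sample, {"Rome"})
print(len(rows), repr(rows[0][2]))
```

When reading a real file, open it with newline="" as recommended by the csv module documentation, so that embedded newlines are passed to the reader unchanged.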
