Tools and data visualisation for a data-mining project on the Stack Exchange Database Administration site.

This repository contains the scripts and results of a data-mining project on the database of the Database Administration StackExchange, covering 2008 to 2013.

Short Presentation of Results: on Slideshare.


The Database: The database was found here and consists of XML files containing the Database Administration StackExchange data from 2008 to 2013.
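As a rough illustration of what reading these files involves, here is a minimal Python sketch that streams `<row>` elements out of a dump file. It assumes the public Stack Exchange dump layout (one `<row>` per record, with fields stored as attributes); the sample values and the `iter_rows` helper are hypothetical and are not taken from this repository's scripts.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for Posts.xml. Attribute names follow the public
# Stack Exchange dump layout; the values are made up for illustration.
SAMPLE_POSTS_XML = """<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" CreationDate="2011-01-03T20:46:21.107" OwnerUserId="7" Tags="&lt;mysql&gt;&lt;backup&gt;" />
  <row Id="2" PostTypeId="2" CreationDate="2011-01-03T21:02:10.000" OwnerUserId="9" ParentId="1" />
</posts>
"""


def iter_rows(source):
    """Yield each <row> element's attributes as a dict (hypothetical helper)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # release the element; real dump files can be large


rows = list(iter_rows(io.BytesIO(SAMPLE_POSTS_XML.encode("utf-8"))))
print(rows[0]["Tags"])
```

Passing a file path instead of the in-memory buffer works the same way, since `ET.iterparse` accepts either a filename or a file object.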

Tools: The team used SQL Server and Visual Studio to create the Data Warehouse and Cube, which were then loaded into Tableau to visualise the data. Hadoop was used for 3 MapReduce jobs, and RapidMiner was used to run Association Rules and Clustering algorithms.
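This section does not say what the three MapReduce jobs computed. Purely as an illustration of the map and reduce phases, the sketch below counts tag frequency in plain Python, without Hadoop. The choice of tag counting, the `Tags` format (`<tag1><tag2>`, as in the public Stack Exchange dumps) and every function name are assumptions.

```python
import re
from collections import defaultdict


def map_post(post):
    """Map phase: emit (tag, 1) for every tag on one post."""
    for tag in re.findall(r"<([^<>]+)>", post.get("Tags", "")):
        yield tag, 1


def reduce_tag(tag, counts):
    """Reduce phase: sum the counts emitted for one tag."""
    return tag, sum(counts)


def run_job(posts):
    """Simulate the shuffle step locally by grouping mapper output by key."""
    grouped = defaultdict(list)
    for post in posts:
        for tag, one in map_post(post):
            grouped[tag].append(one)
    return dict(reduce_tag(tag, counts) for tag, counts in grouped.items())


# Made-up sample records, not data from the project.
posts = [
    {"Id": "1", "Tags": "<mysql><backup>"},
    {"Id": "2", "Tags": "<mysql><replication>"},
    {"Id": "3"},  # a record without a Tags attribute emits nothing
]
tag_counts = run_job(posts)
print(tag_counts)
```

On a real cluster the grouping in `run_job` is done by the framework between the two phases; only the mapper and reducer logic would be supplied by the job.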

Data Warehouse and Cube: After cleaning the XML files and loading them into SQL Server, the team created the necessary relations and dimensions, using Posts as the fact table, and removed 5% of the total data because those records were orphaned:
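The text does not spell out which records counted as orphaned. One plausible reading, assumed here, is a fact row that points at a dimension row that does not exist, for example a post whose owner is missing from the users table. A minimal Python sketch of that check, with a hypothetical `split_orphans` helper and made-up records:

```python
def split_orphans(posts, user_ids):
    """Separate posts whose OwnerUserId has no matching user (assumed rule)."""
    kept, orphaned = [], []
    for post in posts:
        owner = post.get("OwnerUserId")
        if owner in user_ids:
            kept.append(post)
        else:
            orphaned.append(post)  # missing or unknown owner
    return kept, orphaned


# Made-up sample data, not taken from the project.
user_ids = {"7", "9"}
posts = [
    {"Id": "1", "OwnerUserId": "7"},
    {"Id": "2", "OwnerUserId": "9"},
    {"Id": "3", "OwnerUserId": "42"},  # owner not in the users table
    {"Id": "4"},                       # no owner recorded at all
]
kept, orphaned = split_orphans(posts, user_ids)
print(len(kept), len(orphaned))
```

In SQL Server the same idea would usually be expressed as an anti-join between the fact table and the dimension table.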

First Results: Metrics on Days, Months, Years as well as Badges and Tag frequency:

General Results


Posts per country:


Clustering: Clustering results relating to Post data:


Clustering results relating to User data:


Association Rules:

Association Rules Results
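For readers unfamiliar with the two measures behind association rules, support and confidence, here is a minimal Python sketch for rules between pairs of items. The project ran its rules in RapidMiner; this is not that implementation. It assumes the items are post tags, and the thresholds, the sample transactions and the `pair_rules` function are illustrative only.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for item pairs above both thresholds."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Made-up tag sets, one per question.
transactions = [
    ["mysql", "backup"],
    ["mysql", "replication"],
    ["mysql", "backup", "innodb"],
    ["sql-server", "backup"],
]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.6)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Full algorithms such as Apriori or FP-Growth extend the same counting to larger itemsets while pruning candidates that cannot reach the support threshold.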

MapReduce Results: