Tools and data visualisations for a data-mining project on the Database Administration Stack Exchange.

README.md

This repository contains the scripts and results of a data-mining project on the Database Administration StackExchange database, covering 2008 to 2013.


Short Presentation of Results: on Slideshare.


Methodology

The Database: The database was found here and consists of XML files containing the Database Administration StackExchange data from 2008 to 2013.
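As a rough illustration of reading such XML files, the sketch below streams `<row>` elements with Python's standard library. It is not one of the project's scripts: the inline sample, the `iter_rows` helper and the attribute names (`Id`, `PostTypeId`, `ParentId`, `OwnerUserId`, `Tags`) are assumptions based on the usual layout of a Stack Exchange `Posts.xml` dump.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-row sample in the usual layout of a Stack Exchange Posts.xml dump.
SAMPLE_POSTS_XML = """<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" CreationDate="2011-01-03T20:46:03.000" OwnerUserId="8" Tags="&lt;mysql&gt;&lt;backup&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" CreationDate="2011-01-03T21:02:10.000" OwnerUserId="9" />
</posts>
"""


def iter_rows(source):
    """Yield the attributes of each <row> element as a dict.

    iterparse reads the file incrementally, and clearing each element
    after use keeps memory flat on large dump files.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()


# A real run would pass a file path instead of the in-memory sample.
rows = list(iter_rows(io.BytesIO(SAMPLE_POSTS_XML.encode("utf-8"))))
for row in rows:
    print(row["Id"], row.get("Tags", ""))
```

Each dict can then be written to a staging table or a CSV file for bulk loading into SQL Server.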

Tools: The team used SQL Server and Visual Studio to create the Data Warehouse and Cube, which were then loaded into Tableau to visualise the data. Hadoop was used for three MapReduce jobs, and RapidMiner to run Association Rules and Clustering algorithms.

Data Warehouse and Cube: After cleaning the XML files and loading them into SQL Server, the team created the necessary relations and dimensions, using Posts as the fact table, and removed 5% of the total data because those records were orphaned.
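The README does not define "orphaned", so the sketch below is only one plausible reading: a post is dropped when its owner is missing from the Users data, or when it is an answer whose parent post does not exist. The `split_orphans` helper, the field names and the sample rows are all hypothetical, and the project itself did this step in SQL Server rather than Python.

```python
def split_orphans(posts, user_ids):
    """Return (kept, orphaned) lists of posts.

    Assumption: a post is orphaned when its OwnerUserId is absent from
    user_ids, or when it has a ParentId that matches no post Id.
    """
    post_ids = {post["Id"] for post in posts}
    kept, orphaned = [], []
    for post in posts:
        owner_ok = post.get("OwnerUserId") in user_ids
        parent = post.get("ParentId")
        parent_ok = parent is None or parent in post_ids
        (kept if owner_ok and parent_ok else orphaned).append(post)
    return kept, orphaned


# Hypothetical sample data, not taken from the real database.
posts = [
    {"Id": 1, "OwnerUserId": 8},
    {"Id": 2, "OwnerUserId": 9, "ParentId": 1},
    {"Id": 3, "OwnerUserId": 99},                 # owner not in the Users data
    {"Id": 4, "OwnerUserId": 8, "ParentId": 42},  # parent post does not exist
]
user_ids = {8, 9}

kept, orphaned = split_orphans(posts, user_ids)
print(f"removed {len(orphaned)} of {len(posts)} posts")
```

This is a single pass: an answer whose parent is itself removed as an orphan stays in `kept`, so a stricter cleanup would repeat the check until nothing changes.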

First Results: Metrics by day, month and year, as well as badge and tag frequency:

General Results

PostsPerMonth

Posts per country:

PostsPerCountry

Clustering: Clustering results relating to Post data:

ClusteringToPosts

Clustering results relating to User data:

ClusteringToUsers

Association Rules:

Association Rules Results
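For readers unfamiliar with association rules, the sketch below shows how support and confidence are computed for rules between pairs of items. It is a brute-force pair count in plain Python, not the RapidMiner process used in the project. Treating each post's tag set as a transaction is an assumption, as are the `pair_rules` helper, the thresholds and the sample tags.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return sorted (lhs, rhs, support, confidence) tuples for item pairs.

    support(a, b)   = share of transactions containing both a and b
    confidence(a→b) = transactions containing both / transactions containing a
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can produce a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical tag sets, one per post.
tag_sets = [
    ["mysql", "backup"],
    ["mysql", "replication"],
    ["mysql", "backup", "innodb"],
    ["sql-server", "backup"],
]

rules = pair_rules(tag_sets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Counting every pair is fine for a small example, but it does not scale to larger itemsets or long transactions, which is why dedicated tools use frequent-itemset algorithms instead.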

MapReduce Results:

MapReduceResults
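The README does not say what the three MapReduce jobs computed. Purely to illustrate the map, shuffle and reduce phases, the sketch below counts tag occurrences in memory; tag counting was chosen only because tag frequency is among the reported metrics. It is not one of the Hadoop jobs, and the function names, the `Tags` format (`<tag1><tag2>`) and the sample posts are assumptions.

```python
import re
from collections import defaultdict


def map_tags(post):
    """Map phase: emit (tag, 1) for every tag on a post."""
    for tag in re.findall(r"<([^<>]+)>", post.get("Tags", "")):
        yield tag, 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the values collected for one key."""
    return key, sum(values)


def run_job(posts):
    """Run the three phases in sequence on an in-memory list of posts."""
    mapped = (pair for post in posts for pair in map_tags(post))
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


# Hypothetical sample posts; answers usually carry no Tags attribute.
sample_posts = [
    {"Id": "1", "Tags": "<mysql><backup>"},
    {"Id": "2"},
    {"Id": "3", "Tags": "<mysql><innodb>"},
]

tag_counts = run_job(sample_posts)
print(tag_counts)
```

On a real cluster the framework performs the shuffle between nodes, and only the map and reduce functions are written by hand.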