This repository contains the scripts and results of a data-mining project on the Database Administration StackExchange database, covering 2008 to 2013.


Short Presentation of Results: on Slideshare.


Methodology

The Database: The data was found here and consists of XML files containing the Database Administration StackExchange database, from 2008 to 2013.
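As a rough illustration of reading such XML files, here is a minimal Python sketch. It assumes the usual StackExchange dump layout, where each record is a `<row>` element whose fields are stored as attributes; the sample data and the `iter_rows` helper are hypothetical and are not part of this repository's scripts.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source):
    """Yield the attributes of each <row> element as a dict, streaming the input."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # release the element once its attributes are copied


# Hypothetical two-row sample standing in for a Posts XML file (assumed layout).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" OwnerUserId="10" Tags="&lt;mysql&gt;&lt;index&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" OwnerUserId="11" />
</posts>"""

rows = list(iter_rows(io.BytesIO(SAMPLE)))
for row in rows:
    print(row)
```

Streaming with `iterparse` keeps memory use flat on large files; passing a file path instead of the in-memory sample works the same way.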

Tools: The team used SQL Server and Visual Studio to create the Data Warehouse and Cube, which were then loaded into Tableau to visualise the data. Hadoop was used for three MapReduce jobs, and RapidMiner was used to run Association Rules and Clustering algorithms.

Data Warehouse and Cube: After cleaning the XML files and loading them into SQL Server, the team created the necessary relations and dimensions, using Posts as the fact table, and removed 5% of the total data because those records were orphaned.
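The idea of dropping orphaned records can be sketched in a few lines of Python. This is only an illustration under an assumption: it treats a post as orphaned when its `OwnerUserId` does not match any known user. The README does not state which relation the team actually checked, and the actual cleanup was done in SQL Server; the `split_orphans` helper and the sample rows are hypothetical.

```python
def split_orphans(posts, user_ids):
    """Split fact rows into those whose owner exists and those that are orphaned."""
    kept, orphaned = [], []
    for post in posts:
        # Assumption: a post is orphaned if its owner key is missing or unknown.
        if post.get("OwnerUserId") in user_ids:
            kept.append(post)
        else:
            orphaned.append(post)
    return kept, orphaned


# Hypothetical sample data, not taken from the real database.
posts = [
    {"Id": 1, "OwnerUserId": 10},
    {"Id": 2, "OwnerUserId": 11},
    {"Id": 3, "OwnerUserId": None},  # no owner recorded
    {"Id": 4, "OwnerUserId": 99},    # owner not present in the user dimension
]
user_ids = {10, 11}

kept, orphaned = split_orphans(posts, user_ids)
orphan_share = len(orphaned) / len(posts)
print(f"kept={len(kept)} orphaned={len(orphaned)} share={orphan_share:.0%}")
```

Reporting the orphaned share alongside the cleaned rows makes a figure like the 5% above easy to verify after each load.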

First Results: Metrics by day, month, and year, as well as badge and tag frequency:

General Results

PostsPerMonth

Posts per country:

PostsPerCountry

Clustering: Clustering results relating to Post data:

ClusteringToPosts

Clustering results relating to User data:

ClusteringToUsers

Association Rules:

Association Rules Results

MapReduce Results:

MapReduceResults