
Tools and data visualisation for a data-mining project on the Database Administration Stack Exchange.


lattas/mining-stack-exchange


This repository contains the scripts and results of a data-mining project on the Database Administration StackExchange database, covering 2008 to 2013.


Short Presentation of Results: on Slideshare.


Methodology

The Database: The database was found here and consists of XML files containing the Database Administration StackExchange data from 2008 to 2013.
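As a rough illustration of reading such files, the sketch below streams `<row>` elements and returns their attributes as dictionaries. It assumes the files follow the public Stack Exchange data dump layout (one `<row .../>` element per record, with fields stored as attributes). The sample data and the helper name `iter_rows` are made up for the example and are not taken from this repository's scripts.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source):
    """Yield each <row> element's attributes as a plain dict.

    iterparse streams the file, so a large dump never has to fit in memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # release the element once its attributes are copied


# Tiny made-up sample in the shape of Posts.xml (values are illustrative only).
SAMPLE_POSTS = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" OwnerUserId="10" CreationDate="2011-01-03T20:00:00.000" Tags="&lt;mysql&gt;&lt;backup&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" OwnerUserId="11" CreationDate="2011-01-04T09:30:00.000" />
</posts>
"""

posts = list(iter_rows(io.BytesIO(SAMPLE_POSTS)))
print(len(posts), posts[0]["Tags"])
```

For a real dump, `iter_rows` could be given a file path or an open binary file instead of the in-memory sample.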

Tools: The team used SQL Server and Visual Studio to create the Data Warehouse and Cube, which were then loaded into Tableau to visualise the data. Hadoop was used for three MapReduce jobs, and RapidMiner was used to run Association Rules and Clustering algorithms.

Data Warehouse and Cube: After cleaning the XML files and loading them into SQL Server, the team created the necessary relations and dimensions, using Posts as the fact table, and removed 5% of the total data because those records were orphaned.
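The text does not say how orphaned records were identified. One plausible reading, shown here purely as an assumption, is that a post is orphaned when its `OwnerUserId` has no matching row in the user dimension, so it cannot be joined from the fact table. The function name `drop_orphans` and the sample records are hypothetical, and the project itself did this step in SQL Server rather than Python.

```python
def drop_orphans(posts, users):
    """Split posts into (kept, orphaned) by checking OwnerUserId against users."""
    known_ids = {user["Id"] for user in users}
    kept, orphaned = [], []
    for post in posts:
        # A post with a missing or unknown owner cannot join to the user dimension.
        if post.get("OwnerUserId") in known_ids:
            kept.append(post)
        else:
            orphaned.append(post)
    return kept, orphaned


# Made-up records for illustration only.
users = [{"Id": "10"}, {"Id": "11"}]
posts = [
    {"Id": "1", "OwnerUserId": "10"},
    {"Id": "2", "OwnerUserId": "11"},
    {"Id": "3", "OwnerUserId": "99"},  # owner not present in users
    {"Id": "4"},                       # owner missing entirely
]

kept, orphaned = drop_orphans(posts, users)
print(f"kept {len(kept)}, dropped {len(orphaned)}")
```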

First Results: Metrics on Days, Months, Years as well as Badges and Tag frequency:

General Results

PostsPerMonth
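A per-month metric like the one charted here can be sketched as a simple count over creation timestamps. This is a minimal sketch that assumes each post carries a `CreationDate` string in the ISO-like form used by the Stack Exchange dumps. The project computed its metrics through the cube and Tableau, so the function `posts_per_month` and the sample values are illustrative only.

```python
from collections import Counter
from datetime import datetime


def posts_per_month(posts):
    """Count posts per (year, month), sorted chronologically."""
    counts = Counter()
    for post in posts:
        created = datetime.fromisoformat(post["CreationDate"])
        counts[(created.year, created.month)] += 1
    return dict(sorted(counts.items()))


# Made-up timestamps for illustration only.
sample = [
    {"CreationDate": "2011-01-03T20:00:00.000"},
    {"CreationDate": "2011-01-28T08:15:00.000"},
    {"CreationDate": "2011-02-01T12:00:00.000"},
]

monthly = posts_per_month(sample)
print(monthly)
```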

Posts per country:

PostsPerCountry

Clustering: Clustering results relating to Post data:

ClusteringToPosts

Clustering results relating to User data:

ClusteringToUsers

Association Rules:

Association Rules Results
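For readers unfamiliar with association rules, the sketch below computes support and confidence for rules between pairs of items. It is not RapidMiner's implementation, and the source does not state which attributes were mined. Treating each post's tags as one transaction is an assumption made for the example, as are the thresholds and the sample tag sets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for item pairs passing both thresholds."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Made-up tag sets for illustration only.
tag_sets = [
    ["mysql", "backup"],
    ["mysql", "replication"],
    ["mysql", "backup", "innodb"],
    ["sql-server", "backup"],
]

rules = pair_rules(tag_sets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Support filters out rare combinations, while confidence measures how often the right-hand item appears when the left-hand item is present.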

MapReduce Results:

MapReduceResults
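The source does not describe what the three MapReduce jobs computed. As a hypothetical illustration of the map, shuffle and reduce pattern, the sketch below counts tag occurrences in memory. It does not use Hadoop, and the function names and sample data are invented for the example.

```python
import re
from collections import defaultdict


def map_tags(post):
    """Map step: emit (tag, 1) for each tag in a post's Tags attribute."""
    for tag in re.findall(r"<([^<>]+)>", post.get("Tags", "")):
        yield tag, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)


# Made-up posts for illustration only; the last one has no tags.
sample_posts = [{"Tags": "<mysql><backup>"}, {"Tags": "<mysql>"}, {}]

pairs = [pair for post in sample_posts for pair in map_tags(post)]
tag_counts = dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())
print(tag_counts)
```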
