
longnh462/OLAP-for-DataScience-Salary

OLAP-for-DataScience-Salary

Report

Full name            Role
Nguyen Hoang Long    Leader
Dang Thi Tuong Vy    Member

Table of Contents

  1. Introduction
  2. SSIS
  3. SSAS
  4. DataMining

Introduction

Online Analytical Processing (OLAP) is software for performing multidimensional analysis at high speed on large volumes of data from a data warehouse, data mart, or other unified, centralized data store. In this project we analyze a dataset of data science salaries. The dataset includes more than 50,000 records, each with 29 columns such as salary, title, tag, and timestamp. We collected this dataset from here.
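To make "multidimensional analysis" concrete, here is a minimal sketch of a roll-up in plain Python. It is not taken from this project: the rows, the city and level values, and the `roll_up` helper are all made up for illustration. It only shows the idea of aggregating one measure (salary) over a chosen set of dimensions.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, level, salary). All values are invented.
facts = [
    ("City A", "Junior", 1000),
    ("City A", "Senior", 2500),
    ("City B", "Junior", 1200),
    ("City B", "Senior", 2700),
    ("City A", "Junior", 1100),
]


def roll_up(rows, key_indexes):
    """Return the average salary grouped by the selected dimension columns."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of salaries, row count]
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key][0] += row[-1]  # salary is the last field
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Finer grain: average salary per (city, level).
by_city_level = roll_up(facts, (0, 1))
# Rolled up: average salary per city only.
by_city = roll_up(facts, (0,))
```

Dropping a column from `key_indexes` moves to a coarser level of the cube, which is the kind of operation an OLAP engine such as SSAS answers from pre-built aggregations.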

SSIS

Based on the data we collected, we decided to divide it into seven dimensions:

  • Company
  • Title
  • Tag
  • Time
  • City
  • Education
  • Level
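As a rough illustration of what splitting the data into dimensions means, the sketch below builds two small dimension lookups and a fact table from flat records. The actual project does this in SSIS; the column names, sample values, and the `build_dimension` helper here are assumptions made only for the example.

```python
def build_dimension(records, column):
    """Map each distinct value of `column` to a surrogate key (1, 2, 3, ...)."""
    keys = {}
    for record in records:
        value = record[column]
        if value not in keys:
            keys[value] = len(keys) + 1
    return keys


# Hypothetical flat records; names and numbers are invented.
records = [
    {"company": "Acme", "title": "Data Scientist", "salary": 120000},
    {"company": "Globex", "title": "Data Engineer", "salary": 110000},
    {"company": "Acme", "title": "Data Engineer", "salary": 115000},
]

dim_company = build_dimension(records, "company")
dim_title = build_dimension(records, "title")

# The fact table keeps the measure and replaces text values with dimension keys.
fact_salary = [
    {
        "company_key": dim_company[r["company"]],
        "title_key": dim_title[r["title"]],
        "salary": r["salary"],
    }
    for r in records
]
```

The same pattern would repeat for the remaining dimensions, with each one contributing one key column to the fact table.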

Next, we build a data pipeline for each dimension. For example, Dim_Time:

Dim_Time image

We then do the same for the other dimensions.
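For readers who want to see what a time dimension row might contain, here is a small Python sketch. It is not the SSIS package itself: the timestamp format, the attribute names, and the `time_dimension_row` function are assumptions, so adjust them to match the real timestamp column.

```python
from datetime import datetime


def time_dimension_row(timestamp_text, fmt="%m/%d/%Y %H:%M:%S"):
    """Derive calendar attributes from one timestamp string.

    The default format is an assumption about how the source stores timestamps.
    """
    ts = datetime.strptime(timestamp_text, fmt)
    return {
        "time_key": int(ts.strftime("%Y%m%d")),  # e.g. 20170607
        "year": ts.year,
        "quarter": (ts.month - 1) // 3 + 1,
        "month": ts.month,
        "day": ts.day,
    }


row = time_dimension_row("6/7/2017 11:33:27")
```

Using a date-based integer as the key is one common choice; an auto-incrementing key would work as well.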

Finally, we have a system like the one below. The exact design depends on the system you want to build, but this is one approach you can use as a reference.

Datapipeline image

Remember to change the data types when you load data into the flat file connection.
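The data type note matters because every field in a flat file arrives as text. The sketch below shows the same idea outside SSIS, in plain Python. The column names, the `COLUMN_TYPES` mapping, and the sample rows are hypothetical and only illustrate casting each column to an explicit type.

```python
import csv
import io

# Hypothetical target type for each column; adapt to the real schema.
COLUMN_TYPES = {"title": str, "salary": float, "yearsofexperience": float}


def cast_row(raw_row, column_types):
    """Convert the text fields of one CSV row to the declared types.

    Empty or unparseable values become None instead of raising an error.
    """
    typed = {}
    for name, caster in column_types.items():
        text = (raw_row.get(name) or "").strip()
        if text == "":
            typed[name] = None
            continue
        try:
            typed[name] = caster(text)
        except ValueError:
            typed[name] = None
    return typed


# In-memory sample standing in for the flat file; values are invented.
sample = io.StringIO(
    "title,salary,yearsofexperience\n"
    "Data Scientist,120000,3.5\n"
    "Data Engineer,n/a,2\n"
)
rows = [cast_row(raw, COLUMN_TYPES) for raw in csv.DictReader(sample)]
```

Turning bad values into `None` is one possible policy; redirecting such rows to an error output, as SSIS allows, is another.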

SSAS

Here is a picture of the schema.

Schema image

DataMining

We use two machine learning algorithms for this work, K-means and KNN, and we use the OLS technique to find the best model for our dataset.
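As a reminder of how KNN works, here is a minimal sketch that predicts a value as the average of the k closest training points. It is not the code from the notebook: the features, the numbers, and the choice of regression (rather than classification) are assumptions for illustration.

```python
import math


def knn_predict(train, query, k=3):
    """Predict a target as the mean target of the k nearest training points.

    train: list of (features, target) pairs; query: a feature tuple.
    Distance is plain Euclidean distance.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical data: (years of experience, education level) -> salary.
train = [
    ((1.0, 0.0), 80.0),
    ((2.0, 0.0), 90.0),
    ((3.0, 1.0), 110.0),
    ((8.0, 1.0), 160.0),
    ((9.0, 2.0), 180.0),
]

prediction = knn_predict(train, (2.5, 0.0), k=3)
```

In practice the features should be scaled first, since Euclidean distance is dominated by whichever column has the largest range.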

OLS result Education

OLS result Yearofexperience
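For context on what an OLS fit computes, the sketch below fits a straight line with one predictor using the closed-form least-squares formulas. The numbers are invented and the `ols_fit` function is a hypothetical helper, not the notebook's code, which may well rely on a statistics library instead.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: years of experience vs. salary (in thousands).
years = [0, 1, 2, 3, 4]
salary = [52, 58, 71, 79, 90]

slope, intercept, r_squared = ols_fit(years, salary)
```

Comparing `r_squared` across candidate predictors is one simple way to judge which variable explains salary better.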

Histogram of some characteristics:

Histogram

You can see more in the file Datamining.ipynb in the Datamining folder.
