savitha91/DataUnderstanding_MLproject

Data understanding in ML project

In this project, we apply statistical methods to understand an IoT sensor dataset and produce a list of feature combinations that can be used when building models.

One of the important steps in an ML project is understanding the data (features).

Possible types of features are:

  1. Continuous
  2. Categorical - Ordinal/Nominal

The following analyses can be done to understand the data better:

Know the data distribution => Univariate analysis
  1. Continuous variable: measures of central tendency (mean, median, mode)
  2. Categorical variable: frequency table
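
As a rough illustration of the univariate step, the sketch below computes central tendency measures for a continuous column and a frequency table for a categorical one. It uses a small in-memory stand-in for the data; the column names `temperature` and `device_state` are hypothetical and are not taken from IOT_train.csv.

```python
import pandas as pd

# Toy stand-in for the sensor data; column names are hypothetical.
df = pd.DataFrame({
    "temperature": [21.5, 22.0, 22.0, 23.5, 30.0],
    "device_state": ["on", "on", "off", "on", "off"],
})

# Continuous variable: measures of central tendency.
central_tendency = {
    "mean": df["temperature"].mean(),
    "median": df["temperature"].median(),
    "mode": df["temperature"].mode().iloc[0],  # first mode if there are ties
}

# Categorical variable: frequency table (number of rows per category).
frequency_table = df["device_state"].value_counts()

print(central_tendency)
print(frequency_table)
```

Comparing the mean with the median is a quick way to spot skew or outliers: here the single high reading pulls the mean above the median.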
Know the dependency between features => Bivariate analysis
  1. Dependency between two continuous variables: Pearson correlation (Pearson's r)
  2. Dependency between two categorical variables: Chi-square test of independence
  3. Dependency between a continuous and a categorical variable: ANOVA test
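
The three bivariate checks listed above could be sketched with `scipy.stats` as follows. This is a minimal example under stated assumptions, not necessarily how this module implements them: the column names, the toy values, and the 0.05 significance level are all hypothetical, and the sample is far too small for the chi-square test to be reliable in practice.

```python
import pandas as pd
from scipy import stats

# Toy stand-in for the sensor data; column names and values are hypothetical.
df = pd.DataFrame({
    "temperature": [20.1, 21.3, 22.8, 24.0, 25.2, 26.9, 28.1, 29.4],
    "humidity": [65.0, 63.2, 60.1, 58.7, 55.0, 52.3, 50.8, 47.5],
    "device_state": ["off", "off", "off", "on", "off", "on", "on", "on"],
    "location": ["lab", "lab", "roof", "roof", "lab", "roof", "lab", "roof"],
})

ALPHA = 0.05  # assumed significance level

# 1. Continuous vs continuous: Pearson correlation coefficient and p-value.
r, p_pearson = stats.pearsonr(df["temperature"], df["humidity"])

# 2. Categorical vs categorical: chi-square test on a contingency table.
contingency = pd.crosstab(df["device_state"], df["location"])
chi2, p_chi2, dof, expected = stats.chi2_contingency(contingency)

# 3. Continuous vs categorical: one-way ANOVA across the category groups.
groups = [g["temperature"].to_numpy() for _, g in df.groupby("device_state")]
f_stat, p_anova = stats.f_oneway(*groups)

# A p-value below ALPHA rejects the null hypothesis of "no dependency".
for name, p in [("pearson", p_pearson), ("chi-square", p_chi2), ("anova", p_anova)]:
    verdict = "dependent" if p < ALPHA else "no evidence of dependency"
    print(f"{name}: p={p:.4f} -> {verdict}")
```

Feature pairs that show a strong dependency are candidates for dropping one of the two, while features that depend on the target are candidates to keep.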

This module can be tested on any dataset after applying the required preprocessing (e.g., converting DateTime variables).
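
One possible way to convert a DateTime variable before running the analysis is to parse it and derive numeric or categorical features from it. The sketch below assumes a hypothetical `timestamp` column stored as strings; the derived feature names are illustrative only.

```python
import pandas as pd

# Toy data; the "timestamp" column name and format are hypothetical.
df = pd.DataFrame({
    "timestamp": ["2021-03-01 08:15:00", "2021-03-01 20:45:00", "2021-03-06 13:30:00"],
    "temperature": [21.5, 19.8, 24.1],
})

# Parse the strings into datetime values.
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Derive features that the statistical tests can consume.
df["hour"] = df["timestamp"].dt.hour                # continuous/ordinal
df["day_of_week"] = df["timestamp"].dt.dayofweek    # Monday=0 ... Sunday=6
df["is_weekend"] = df["day_of_week"] >= 5           # categorical (boolean)

# Drop the raw datetime column once the derived features exist.
features = df.drop(columns="timestamp")
print(features)
```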

Input File : IOT_train.csv
