This repository covers data preprocessing techniques: importing libraries and datasets, dealing with missing values, label encoding, splitting data into training and test sets, and feature scaling.
This project documents the learning and implementation process of data cleaning and preprocessing, the work that comes before applying machine learning models such as regression, classification, or clustering to a dataset.
It explains how to import the required libraries and create the matrices of dependent and independent variables; how to fill missing values in a data frame with the mean, median, or mode, depending on the variable type of the column; how to encode categorical variables as numeric values; how to split the data into training and test sets for model training; and how to apply feature scaling to normalize the independent variables.
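The steps above can be sketched in Python as follows. This is a minimal illustration, not the repository's actual code: the toy data frame, its column names (`Country`, `Age`, `Salary`, `Purchased`), the choice of mean imputation, and the 75/25 split are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy dataset (assumed for illustration only).
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain",
                "Germany", "France", "Spain", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48],
    "Salary": [72000, 48000, 54000, 61000, np.nan, 58000, 52000, 79000],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes"],
})

# 1. Independent variables (X) and dependent variable (y).
X = df[["Country", "Age", "Salary"]].copy()
y = df["Purchased"]

# 2. Missing values: replace NaN in numeric columns with the column mean.
#    strategy="median" or "most_frequent" are the other options mentioned above.
num_cols = ["Age", "Salary"]
imputer = SimpleImputer(strategy="mean")
X[num_cols] = imputer.fit_transform(X[num_cols])

# 3. Encoding: one indicator column per country, integer labels for the target.
X = pd.get_dummies(X, columns=["Country"], dtype=int)
y_encoded = LabelEncoder().fit_transform(y)  # classes are sorted: "No" -> 0, "Yes" -> 1

# 4. Train/test split (75% / 25%, fixed seed so the split is repeatable).
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, test_size=0.25, random_state=0
)
X_train, X_test = X_train.copy(), X_test.copy()

# 5. Feature scaling: fit on the training set only, then reuse on the test set.
scaler = StandardScaler()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])
```

The scaler is fitted on the training rows only so that no information from the test set influences the transformation. A stricter variant would also split before imputing, for the same reason.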
To apply these transformations and methods to your dataset, you need the following tools and libraries:
- Python 2.x or Python 3.x
- Pandas
- NumPy
- Scikit-Learn
- R (For R implementation)
Data preprocessing is a crucial step in any data modeling workflow, so you can use this code to clean and preprocess your dataset whenever you build a statistical model in Python or R.
You can apply these transformations to an existing model to see whether they improve its accuracy.
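One way to check the effect of a transformation is to train the same model with and without it and compare the test accuracy. The sketch below does this for feature scaling. The dataset (scikit-learn's bundled wine data) and the model (k-nearest neighbors) are assumptions chosen for illustration, not part of this repository.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: the wine dataset shipped with scikit-learn.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline: the model trained on the raw, unscaled features.
baseline = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
acc_raw = baseline.score(X_test, y_test)

# Same model, with standardization fitted on the training data only.
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled.fit(X_train, y_train)
acc_scaled = scaled.score(X_test, y_test)

print(f"accuracy without scaling: {acc_raw:.3f}")
print(f"accuracy with scaling:    {acc_scaled:.3f}")
```

Distance-based models such as k-nearest neighbors tend to be sensitive to feature scale, which is why one was picked here. The size of the difference depends on the model and the data, so a single split is only a rough indication.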