Data profiling is the process of examining, analyzing, and creating useful summaries of data.
Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
Exploratory Data Analysis (EDA) is an approach to analyzing data with visual techniques. It is used to discover trends and patterns and to check assumptions with the help of statistical summaries and graphical representations.
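The profiling, cleaning, and summary steps described above can be sketched with pandas. This is a minimal illustration, not a full workflow: the toy dataset, the column names, and the choice to drop (rather than repair) incomplete rows are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy dataset with one duplicate row and one missing value.
df = pd.DataFrame({
    "city": ["Jakarta", "Bandung", "Bandung", "Surabaya"],
    "sales": [120.0, 95.0, 95.0, None],
})

# Profiling: summarize size, missing values, and duplicate rows.
profile = {
    "rows": len(df),
    "missing_per_column": df.isna().sum().to_dict(),
    "duplicate_rows": int(df.duplicated().sum()),
}

# Cleaning: drop exact duplicate rows, then rows with missing values.
clean = df.drop_duplicates().dropna().reset_index(drop=True)

# Statistical summary, often used as a first EDA step before plotting.
summary = clean["sales"].describe()
```

In practice, missing values are often imputed instead of dropped; which is appropriate depends on the dataset.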
- MAP
This method converts ordinal categorical variables into numeric values, using an explicit mapping that preserves their order, so they can be provided to machine learning and deep learning algorithms, which in turn can improve a model's predictions and classification accuracy.
- One Hot Encoding
One-hot encoding converts categorical variables into binary indicator columns, one per category, so they can be provided to machine learning and deep learning algorithms, which in turn can improve a model's predictions and classification accuracy.
- Standard Scaler
StandardScaler standardizes a feature by subtracting the mean and then scaling to unit variance. Scaling to unit variance means dividing the centered values by the standard deviation.
- Min Max Scaler
MinMaxScaler rescales each feature to a given range, usually 0 to 1. Because the transformation is linear, it changes the range of the values without changing the shape of the original distribution.
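The four preprocessing steps above (mapping, one-hot encoding, standard scaling, and min-max scaling) can be sketched together. The toy data and column names are hypothetical, and the two scalers are written out as plain arithmetic to show what is computed; the population standard deviation (`ddof=0`) is used on the assumption that it matches what scikit-learn's `StandardScaler` does.

```python
import pandas as pd

# Hypothetical toy data: "size" is ordinal, "color" is nominal, "price" is numeric.
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "color": ["red", "blue", "red", "green"],
    "price": [10.0, 40.0, 20.0, 30.0],
})

# MAP: encode an ordinal variable with an explicit, ordered mapping.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Standard scaling: subtract the mean, divide by the population standard deviation.
price = df["price"]
price_standard = (price - price.mean()) / price.std(ddof=0)

# Min-max scaling: map the minimum to 0 and the maximum to 1.
price_minmax = (price - price.min()) / (price.max() - price.min())
```

After these steps `price_standard` has mean 0 and unit variance, while `price_minmax` lies between 0 and 1 with the same relative spacing as the original prices.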
- Feature Selection
Feature selection is the method of reducing the number of input variables to your model by using only relevant data and getting rid of noise in the data.
- Feature Importance
Feature (variable) importance indicates how much each feature contributes to the model prediction. In other words, it measures how useful a specific variable is for the current model and its predictions.
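One common way to estimate feature importance is permutation importance: shuffle one column and measure how much the model's error grows. The sketch below is only an illustration of that idea, not necessarily the method used in this project; the synthetic data, the helper names, and the selection threshold of 0.5 are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the target depends strongly on feature 0 and not on feature 1.
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

def fit_linear(X, y):
    """Least-squares fit with an intercept column; returns the coefficient vector."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mean_abs_error(X, y, coef):
    """Mean absolute error of the fitted linear model on (X, y)."""
    A = np.column_stack([X, np.ones(len(X))])
    return float(np.mean(np.abs(y - A @ coef)))

coef = fit_linear(X, y)
baseline = mean_abs_error(X, y, coef)

# Importance of a feature: how much the error grows when that column is shuffled.
importances = []
for j in range(X.shape[1]):
    X_shuffled = X.copy()
    X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
    importances.append(mean_abs_error(X_shuffled, y, coef) - baseline)

# Feature selection: keep features whose importance clears a hypothetical threshold.
selected = [j for j, imp in enumerate(importances) if imp > 0.5]
```

Here only feature 0 survives selection, because shuffling feature 1 barely changes the error.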
- Machine Learning Regression
- Simple Linear Regression
Simple linear regression is a regression model that estimates the relationship between one independent variable and one dependent variable using a straight line.
- Multiple Linear Regression
Multiple linear regression is a regression model that estimates the relationship between a quantitative dependent variable and two or more independent variables using a linear equation (a plane or hyperplane rather than a single straight line).
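Both regression models above can be fitted by ordinary least squares. The following is a minimal NumPy sketch; `fit_linear_regression` is a hypothetical helper written for this example, and the data is generated without noise so the recovered coefficients are easy to check.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares; returns (coefficients, intercept)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # treat a 1-D input as a single feature
    A = np.column_stack([X, np.ones(len(X))])  # extra column for the intercept
    solution, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return solution[:-1], solution[-1]

# Simple linear regression: one independent variable, y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
slope, intercept = fit_linear_regression(x, 2 * x + 1)

# Multiple linear regression: two independent variables, y = 1.5*x1 - 0.5*x2 + 4.
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 5.0], [4.0, 1.0]])
coefs, bias = fit_linear_regression(X, 1.5 * X[:, 0] - 0.5 * X[:, 1] + 4)
```

The only difference between the two cases is the number of columns in `X`; the fitting procedure is the same.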
- Time Series Forecasting
Time series forecasting means predicting future values over a period of time. In this case, forecasting is done with Prophet. Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
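As a rough sketch of the additive model described above, the forecast can be written as a sum of components. The symbols are notation introduced here for illustration and do not appear elsewhere in this document:

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

where $g(t)$ is the non-linear trend, $s(t)$ is the seasonal component (yearly, weekly, and daily), $h(t)$ captures holiday effects, and $\varepsilon_t$ is the error term not explained by the other components.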
Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.
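A common variant is k-fold cross-validation, which repeats the train/validate split so that every observation is used for validation exactly once. The sketch below assumes k = 5, synthetic data, and a straight-line model; `k_fold_indices` is a hypothetical helper, not a library function.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

# Hypothetical data: a noisy straight line.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)

fold_mae = []
for train_idx, val_idx in k_fold_indices(len(x), k=5):
    # Training segment: fit slope and intercept by least squares.
    slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)
    # Validation segment: measure error on data the model did not see.
    predictions = slope * x[val_idx] + intercept
    fold_mae.append(float(np.mean(np.abs(y[val_idx] - predictions))))

# The cross-validated score is the average error across folds.
cv_score = float(np.mean(fold_mae))
```

For time series data, shuffled folds like these leak future information into training, so a split that respects time order is usually preferred.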
Hyperparameter tuning consists of finding a set of optimal hyperparameter values for a learning algorithm while applying this optimized algorithm to any data set. That combination of hyperparameters maximizes the model's performance, minimizing a predefined loss function to produce better results with fewer errors.
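The simplest tuning strategy is a grid search: try each candidate value, measure the loss on held-out data, and keep the best one. The sketch below tunes the regularization strength of a closed-form ridge regression; the model, the candidate grid, and the synthetic data are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression (no intercept): solve (X^T X + alpha*I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def mse(X, y, w):
    """Mean squared error of the linear model with weights w."""
    return float(np.mean((y - X @ w) ** 2))

# Hypothetical data, split into a training part and a validation part.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.3, size=60)
X_train, y_train, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

# Grid search: fit once per candidate value and record the validation loss.
alpha_grid = [0.01, 0.1, 1.0, 10.0, 100.0]
val_loss = {a: mse(X_val, y_val, ridge_fit(X_train, y_train, a)) for a in alpha_grid}

# The tuned hyperparameter is the one that minimizes the predefined loss.
best_alpha = min(val_loss, key=val_loss.get)
```

A single validation split is used here for brevity; combining the grid with cross-validation gives a less noisy estimate of each candidate's loss.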
- Mean Absolute Error (MAE)
Mean Absolute Error (MAE) is calculated by summing the absolute differences between the actual and predicted values over all observations and then dividing the sum by the number of observations.
- Mean Absolute Percentage Error (MAPE)
The Mean Absolute Percentage Error (MAPE) is calculated by dividing each period's absolute error by that period's actual value (the demand) and then averaging the resulting percentage errors.
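Both metrics above are short enough to write out directly. This is a minimal NumPy sketch with made-up numbers; the function names are chosen for the example, and the MAPE version assumes no actual value is zero, since the metric is undefined in that case.

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all observations."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, as a percentage.

    Assumes no actual value is zero.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Hypothetical demand over three periods and the corresponding forecasts.
actual = [100, 200, 300]
predicted = [110, 190, 330]

mae = mean_absolute_error(actual, predicted)              # (10 + 10 + 30) / 3
mape = mean_absolute_percentage_error(actual, predicted)  # (10% + 5% + 10%) / 3
```

MAE is expressed in the units of the data, while MAPE is scale-free, which makes it easier to compare across series with different magnitudes.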