[Machine Learning Part 1] Data Science | Studi Independen | MyEduSolve X Kampus Merdeka

nurulauliyas/Machine-Learning-Regression-and-Timeseries

Machine Learning Regression and Timeseries

Data Profiling

Data profiling is the process of examining, analyzing, and creating useful summaries of data.
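A minimal profiling sketch with pandas, assuming a small hypothetical DataFrame. The column names and the `profile` helper are illustrative and not taken from this repository. It reports each column's type, missing-value count and number of distinct values, then prints the standard numeric summary.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; not the repository's data.
df = pd.DataFrame({
    "category": ["A", "B", "A", None, "C"],
    "price": [120.0, 95.5, np.nan, 110.0, 130.0],
    "rooms": [3, 2, 3, 2, 4],
})

def profile(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per column: data type, number of missing values, number of distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "unique": frame.nunique(),  # NaN is not counted as a distinct value
    })

summary = profile(df)
print(summary)
print(df.describe())  # count, mean, std, min, quartiles, max of numeric columns
```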

Data Cleaning

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
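The cleaning steps can be sketched in pandas as follows. This is an illustration under assumptions: the raw data, the hypothetical `clean` helper and its rules (uppercase categories, drop rows missing `category` or `price`, fill `qty` with the median) are choices made for the example, not the repository's pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical problems: inconsistent spelling,
# numbers stored as text, an unparseable value, a duplicate row, missing values.
raw = pd.DataFrame({
    "category": ["A", "a ", "B", "B", None],
    "price": ["100", "150", "200", "200", "abc"],
    "qty": [1, 2, np.nan, np.nan, 5],
})

def clean(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.copy()
    # Fix formatting: strip whitespace and unify case.
    out["category"] = out["category"].str.strip().str.upper()
    # Fix types: text -> numbers; unparseable values such as "abc" become NaN.
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    # Remove exact duplicate rows.
    out = out.drop_duplicates()
    # Drop rows missing key fields, then impute the remaining numeric gaps.
    out = out.dropna(subset=["category", "price"])
    out["qty"] = out["qty"].fillna(out["qty"].median())
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Whether to drop or impute incomplete rows depends on the dataset; the split used here is only one reasonable choice.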

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is an approach to analyzing data with visual techniques. It is used to discover trends and patterns and to check assumptions with the help of statistical summaries and graphical representations.
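The statistical side of EDA can be sketched on synthetic data, assuming hypothetical columns `size`, `price` and `segment`. `np.histogram` computes the bin counts that a histogram plot would display. With matplotlib installed, `df.hist()` or `plt.hist` would draw the same information.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: price grows roughly linearly with size, plus noise.
rng = np.random.default_rng(0)
size = rng.uniform(30, 150, 200)
df = pd.DataFrame({
    "size": size,
    "price": 2.0 * size + rng.normal(0, 10, 200),
    "segment": rng.choice(["A", "B"], 200),
})

print(df.describe())                                 # statistical summary
corr = df[["size", "price"]].corr()                  # pairwise Pearson correlation
print(corr)
print(df.groupby("segment")["price"].mean())         # compare groups
counts, edges = np.histogram(df["price"], bins=10)   # the data behind a histogram
print(counts, edges)
```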

Feature Engineering

  • MAP
    This method maps the categories of an ordinal variable to integers that preserve their order (for example, low → 0, medium → 1, high → 2), so that the variable can be used by machine learning and deep learning algorithms, which require numeric input.
  • One Hot Encoding
    One hot encoding converts a nominal categorical variable (one without a natural order) into a set of binary 0/1 columns, one per category, so that the model does not assume an order between the categories.
  • Standard Scaler
    StandardScaler standardizes a feature by subtracting its mean and dividing by its standard deviation, so the result has zero mean and unit variance: z = (x − μ) / σ.
  • Min Max Scaler
    MinMaxScaler rescales each feature to a given range, usually 0 to 1, using x' = (x − min) / (max − min). Because the transformation is linear, it changes the scale of the values without changing the shape of the original distribution.
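The four techniques above can be sketched together as follows, assuming a hypothetical DataFrame. The category order in `size_order` is an assumption. `pd.get_dummies` is one way to one-hot encode; scikit-learn's `OneHotEncoder` is an alternative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical example data (column names are illustrative only).
df = pd.DataFrame({
    "size": ["small", "medium", "large", "medium"],   # ordinal
    "color": ["red", "green", "blue", "green"],       # nominal
    "income": [10.0, 20.0, 30.0, 40.0],               # numeric
})

# MAP: ordinal categories -> integers that keep their order (the order is assumed).
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)

# One hot encoding: one 0/1 column per nominal category; "color" is replaced.
df = pd.get_dummies(df, columns=["color"], prefix="color", dtype=int)

# Standard scaler: z = (x - mean) / std, using the population std (ddof=0).
df["income_std"] = StandardScaler().fit_transform(df[["income"]]).ravel()

# Min-max scaler: x' = (x - min) / (max - min), giving values in [0, 1] by default.
df["income_minmax"] = MinMaxScaler().fit_transform(df[["income"]]).ravel()

print(df)
```

In a real pipeline the scalers would be fitted on the training split only and then applied to the test split with `transform`, so that test-set statistics do not leak into training.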

Preprocessing for Modeling

  • Feature Selection
    Feature selection reduces the number of input variables to the model by keeping only the relevant ones and discarding those that add noise.
  • Feature Importance
    Feature (variable) importance indicates how much each feature contributes to the model's predictions, that is, how useful a specific variable is for the current model.
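A sketch of both ideas with scikit-learn on synthetic data, where only `x0` and `x1` drive the target. The specific methods are swapped-in examples and may differ from what the repository uses. Univariate selection is done with `SelectKBest(f_regression)`, and importance is read from a random forest's impurity-based `feature_importances_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical data: y depends on x0 and x1 only; x2 and x3 are pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 300)
feature_names = ["x0", "x1", "x2", "x3"]

# Feature selection: keep the k features with the highest univariate F-score.
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
selected = [n for n, keep in zip(feature_names, selector.get_support()) if keep]
print("selected:", selected)

# Feature importance: impurity-based importances, normalised to sum to 1.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = dict(zip(feature_names, forest.feature_importances_))
print("importances:", importances)
```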

Modeling

  • Machine Learning Regression
    • Simple Linear Regression
      Simple linear regression is a regression model that estimates the relationship between one independent variable and one dependent variable using a straight line.
    • Multiple Linear Regression
      Multiple linear regression is a regression model that estimates the relationship between a quantitative dependent variable and two or more independent variables using a linear equation (a plane or hyperplane rather than a single straight line).
  • Time Series Forecasting
    Time series forecasting predicts future values of a variable from its past observations over time. Here, forecasting is done with Prophet, a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
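The block below sketches simple and multiple linear regression with scikit-learn's `LinearRegression` on hypothetical synthetic data. It then gives a deliberately simplified additive time-series model: a linear trend plus Fourier terms for yearly seasonality, fitted by least squares. That second part is not Prophet. It only illustrates the additive "trend + seasonality" idea Prophet builds on, and the `design` helper is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Simple linear regression: one feature, hypothetical data y = 4 + 2.5x + noise.
x = rng.uniform(0, 10, size=(100, 1))
y = 4 + 2.5 * x[:, 0] + rng.normal(0, 0.5, 100)
simple = LinearRegression().fit(x, y)
print("simple:", simple.intercept_, simple.coef_)

# Multiple linear regression: two features, y = 1 + 2*x1 - 3*x2 + noise.
X = rng.uniform(0, 10, size=(100, 2))
y_multi = 1 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 0.5, 100)
multi = LinearRegression().fit(X, y_multi)
print("multiple:", multi.intercept_, multi.coef_)

# Additive time-series sketch (NOT Prophet): y(t) = trend(t) + seasonality(t).
def design(t, period=365.25, order=3):
    """Columns: intercept, linear trend, and sin/cos Fourier terms for one period."""
    cols = [np.ones_like(t), t]
    for k in range(1, order + 1):
        cols += [np.sin(2 * np.pi * k * t / period),
                 np.cos(2 * np.pi * k * t / period)]
    return np.column_stack(cols)

t = np.arange(3 * 365, dtype=float)  # three years of daily observations
series = 10 + 0.01 * t + 2 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 0.2, t.size)
coef, *_ = np.linalg.lstsq(design(t), series, rcond=None)

t_future = np.arange(t[-1] + 1, t[-1] + 31)  # the next 30 days
forecast = design(t_future) @ coef
print(forecast[:5])
```

With the Prophet library itself, the usual workflow is to fit `Prophet()` on a DataFrame with a `ds` (date) column and a `y` (value) column, build future dates with `make_future_dataframe(periods=...)`, and call `predict` on them; the forecast appears in the `yhat` column.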

Cross Validation

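Cross-validation is a statistical method for evaluating and comparing learning algorithms by dividing the data into segments: some are used to train a model and the rest to validate it. In k-fold cross-validation, the data is split into k folds, and each fold serves once as the validation set while the other k − 1 folds are used for training.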
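A k-fold sketch with scikit-learn on hypothetical synthetic data, scored with MAE. For time series the folds must respect time order. `TimeSeriesSplit` is shown as one way to enforce that, because it always validates on data that comes after the training data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

# Hypothetical data: a linear target with Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.3, 200)

# 5-fold CV: each fold is used exactly once as the validation set.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=kfold,
                         scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores, "mean:", -scores.mean())

# Time-ordered splits: training indices always precede validation indices.
tscv = TimeSeriesSplit(n_splits=5)
n_ts_folds = 0
for train_idx, val_idx in tscv.split(X):
    assert train_idx.max() < val_idx.min()  # no peeking into the future
    n_ts_folds += 1
```

scikit-learn scorers follow a "higher is better" convention, which is why MAE is requested as `neg_mean_absolute_error` and negated back for printing.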

Hyperparameter Tuning

Hyperparameter tuning is the search for the set of hyperparameter values (settings fixed before training, such as a regularization strength) that gives a learning algorithm the best performance on a given dataset. The chosen combination minimizes a predefined loss function, or maximizes a score, on validation data, producing a model with lower error.
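A grid-search sketch with scikit-learn's `GridSearchCV`, assuming Ridge regression as the example model. Ridge is a swapped-in choice because plain linear regression has almost no hyperparameters to tune. Each candidate `alpha` is scored by 5-fold cross-validated MAE, and the best one is refitted on all the data.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Hypothetical data: five features, two of which are irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, -1.0, 0.0, 2.0]) + rng.normal(0, 0.5, 200)

# alpha is Ridge's regularisation strength; the candidate values are arbitrary.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="neg_mean_absolute_error")
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV MAE:", -search.best_score_)
```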

Evaluate Model

  • Mean Absolute Error (MAE)
    Mean Absolute Error (MAE) is the average of the absolute differences between the actual and predicted values over all n observations: MAE = (1/n) Σ |yᵢ − ŷᵢ|.
  • Mean Absolute Percentage Error (MAPE)
    The Mean Absolute Percentage Error (MAPE) is the average of the absolute errors divided by the actual values, computed for each period separately and usually expressed as a percentage: MAPE = (100% / n) Σ |(yᵢ − ŷᵢ) / yᵢ|. It is undefined when any actual value is zero.
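Both metrics can be computed directly from their definitions and cross-checked against scikit-learn. The `mae` and `mape` helpers and the sample values are illustrative. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction, not a percentage, so it is multiplied by 100 for the comparison.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

def mae(y_true, y_pred):
    """Mean of |actual - predicted|."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    """Mean of |actual - predicted| / |actual|, in percent; undefined if an actual is 0."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical actual and predicted values.
actual = [100, 200, 300, 400]
predicted = [110, 190, 330, 400]

print("MAE:", mae(actual, predicted), mean_absolute_error(actual, predicted))
print("MAPE %:", mape(actual, predicted),
      100 * mean_absolute_percentage_error(actual, predicted))
```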