## Choosing the appropriate machine learning algorithm

Choosing the right ML algorithm is half systematic and half case-specific. What does that mean? On the one hand, there are a few general steps you can rely on as a guide when selecting an ML model! On the other hand, your problem or data might be unique and need careful analysis before you can select the right ML model!


So for any machine learning problem, we need to ask these questions to select the right model:

1- Is our data labeled or not?

Yes ---> Supervised Learning

   - Is the output (label) discrete or continuous?

      Discrete ---> Classification

      Continuous ---> Regression

   - Anomaly Detection ---> The goal here is to identify data points that are simply unusual (e.g., a fraud detection problem, where we look for highly unusual credit card spending patterns)

No ---> Unsupervised Learning

   - Clustering

2- Do we need to interact with the environment to optimize the objective function?

Yes ---> Reinforcement Learning
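
The two questions above can be sketched as a small decision function. This is a hypothetical helper for illustration only (the name `suggest_learning_type` and its parameters are not from any library), and real problems such as anomaly detection may not fall neatly into one branch.

```python
def suggest_learning_type(labeled: bool, continuous_label: bool = False,
                          interacts_with_environment: bool = False) -> str:
    """Map answers to the two model-selection questions onto a learning family.

    Hypothetical helper for illustration; it ignores cases such as anomaly
    detection that do not fit neatly into one branch.
    """
    # Question 2: learning by interacting with an environment
    if interacts_with_environment:
        return "Reinforcement Learning"
    # Question 1: labeled data -> supervised learning
    if labeled:
        return "Supervised Learning: " + ("Regression" if continuous_label else "Classification")
    # No labels -> unsupervised learning, e.g. clustering
    return "Unsupervised Learning: Clustering"


# Rental-price prediction: labeled data with a continuous target
print(suggest_learning_type(labeled=True, continuous_label=True))  # → Supervised Learning: Regression
```

The reinforcement-learning question is checked first here only because it does not depend on whether the data is labeled; that ordering is a choice made for this sketch.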

Now that we have a clear picture of our data, we can answer the previous questions directly: yes, we have labeled, continuous data ---> Supervised Learning "Regression"

That leads us to a smaller set of ML algorithms that deal with continuous labeled data: Linear Regression, Bayesian Linear Regression, Decision Forest Regression, Boosted Decision Tree Regression, Fast Forest Quantile Regression, Poisson Regression, Ordinal Regression, and Neural Network Regression.



Here, we reach the second, case-specific part of model selection mentioned above! It requires awareness of the many factors that characterize our problem, mainly: accuracy, linearity, training time, the number of parameters (e.g., number of iterations, error tolerance, etc.), and the number of features.

Finally, we need to highlight that there is "no free lunch" with machine learning models: we need to figure out which requirement(s) we are willing to pay more for!

# Linear Regression

Linear regression models a linear relationship between one or more independent variables (also called predictors or features) and a numeric outcome, or dependent variable.
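
As a minimal sketch of fitting that relationship, here is ordinary least squares for a single feature in plain Python. The area/rent numbers are made-up toy data that lie exactly on a line, so the fitted coefficients are easy to check by hand.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one feature: y ≈ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x) (the 1/n factors cancel)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: floor area (m^2) vs. monthly rent, rent = 300 + 20 * area
area = [30, 40, 50, 60, 70]
rent = [900, 1100, 1300, 1500, 1700]
intercept, slope = fit_simple_linear_regression(area, rent)
print(intercept, slope)  # → 300.0 20.0
```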

Linear regression is often a good first choice for many problems because:

- It is simple, which makes it more explainable!

- It is fast to train, and it is particularly useful when the relationship to be modeled is not extremely complex and you don't have a lot of data.

However,

- For non-linear data, polynomial regression can be quite challenging to design, as one must have some knowledge of the structure of the data and the relationships between feature variables. As a result, these models are not as good as others when it comes to highly complex data.

# Bayesian Linear Regression 

It is a particular version of the linear regression model in which the parameters are estimated by combining prior information about them with the likelihood function!

- With Bayesian linear regression, we obtain the whole range of inferential solutions (a posterior distribution over the parameters), rather than a point estimate and a confidence interval as in classical regression.
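
To make the contrast concrete, here is a minimal sketch for the simplest possible case: a single slope with no intercept, a Gaussian prior on that slope, and Gaussian noise with a known variance. These are simplifying assumptions (a conjugate prior and known noise) chosen so the posterior has a closed form; the function name and defaults are hypothetical.

```python
def bayesian_slope_posterior(x, y, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Posterior N(mean, var) of w in y = w * x + noise.

    Assumes prior w ~ N(prior_mean, prior_var) and noise ~ N(0, noise_var)
    with noise_var known, so the posterior is Gaussian as well.
    """
    # Precisions (inverse variances) add up: prior precision + data precision
    post_precision = 1.0 / prior_var + sum(xi * xi for xi in x) / noise_var
    post_var = 1.0 / post_precision
    # Posterior mean: precision-weighted blend of the prior and the data
    post_mean = post_var * (
        prior_mean / prior_var + sum(xi * yi for xi, yi in zip(x, y)) / noise_var
    )
    return post_mean, post_var


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # least squares alone would give w = 2
mean, var = bayesian_slope_posterior(x, y)
print(mean, var)  # mean ≈ 1.867 (pulled toward the prior mean 0), var ≈ 0.067
```

The output is a whole distribution over the slope, not a single number: with more data the variance shrinks and the mean moves toward the least-squares value of 2.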

# Decision Forest Regression

Decision trees are non-parametric models that split the data in the form of a tree structure! A tree performs a sequence of simple tests on each data point, following the branches of a binary tree until a leaf node (decision) is reached. A decision forest is an ensemble of such trees; for regression, the forest's prediction is the average of the trees' predictions.

- They are efficient in both computation and memory usage during training and prediction.

- They can represent non-linear decision boundaries.

- They perform integrated feature selection and prediction, and they are resilient in the presence of noisy features.
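
A full decision forest takes more than a few lines, but the idea can be sketched with a simplified, hypothetical version: each "tree" here is a single split (a depth-1 regression tree, or stump) chosen to minimize squared error, each one is trained on a bootstrap sample of the rows, and the forest averages their predictions. Real implementations grow much deeper trees and may also sample features; this sketch uses only one feature.

```python
import random


def fit_stump(x, y):
    """Depth-1 regression tree: the split x <= t that minimizes squared error."""
    best = None
    for t in sorted(set(x))[:-1]:  # thresholds that leave both sides non-empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x value: predict the mean everywhere
        mean = sum(y) / len(y)
        return max(x), mean, mean
    return best[1:]


def predict_stump(stump, xi):
    t, left_mean, right_mean = stump
    return left_mean if xi <= t else right_mean


def fit_forest(x, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(x)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_forest(forest, xi):
    # A regression forest averages the predictions of its trees
    return sum(predict_stump(s, xi) for s in forest) / len(forest)


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]  # a step: a non-linear relationship
forest = fit_forest(x, y)
print(predict_forest(forest, 1.5), predict_forest(forest, 5.5))
```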


# Boosted Decision Tree Regression

It is simply a mix of two classical ML techniques: boosting is employed to create an ensemble of regression trees!

"Boosting" means that each tree depends on the prior trees: the algorithm learns by fitting the residuals of the trees that preceded it. Thus, boosting in a decision tree ensemble tends to improve accuracy, with some small risk of less coverage.
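
For squared-error loss, "fitting the residuals of the preceding trees" is exactly what gradient boosting does. Below is a minimal sketch with depth-1 trees (stumps) and a learning rate that shrinks each tree's contribution; the function names and the values `n_rounds=50` and `learning_rate=0.1` are illustrative choices, not settings from any particular library.

```python
def fit_stump(x, y):
    """Depth-1 regression tree: the split x <= t that minimizes squared error."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x value
        mean = sum(y) / len(y)
        return max(x), mean, mean
    return best[1:]


def predict_stump(stump, xi):
    t, left_mean, right_mean = stump
    return left_mean if xi <= t else right_mean


def fit_boosted_stumps(x, y, n_rounds=50, learning_rate=0.1):
    base = sum(y) / len(y)  # start from a constant prediction: the mean
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Each new tree is fit to the residuals left by all previous trees
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi) for pi, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict_boosted(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 2.0, 2.5, 6.0, 6.5, 8.0]
model = fit_boosted_stumps(x, y)
mse_mean = sum((yi - sum(y) / len(y)) ** 2 for yi in y) / len(y)
mse_boost = sum((yi - predict_boosted(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
print(mse_mean, mse_boost)  # the boosted ensemble's training error is lower
```

The small learning rate means each tree corrects only part of the remaining residual, so many trees are needed; this trades training time for a smoother, less overfit ensemble.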

### XGBoost

XGBoost is an implementation of gradient boosted trees that is commonly used for classification and regression problems. Gradient boosting is a supervised learning algorithm that builds an ensemble (set) of weaker models (trees) and sums up their estimates to predict a target variable more accurately.
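
XGBoost's documentation describes scoring each new tree with a second-order approximation of the loss plus a regularization term. For a leaf whose samples have gradient sum G and hessian sum H, the optimal leaf value is w* = -G / (H + λ), and a split is scored by the gain 0.5 * [G_L²/(H_L + λ) + G_R²/(H_R + λ) - (G_L + G_R)²/(H_L + H_R + λ)] - γ. The sketch below hand-codes these two formulas for squared-error loss; it illustrates the math only and is not the `xgboost` library's API (the parameter names `reg_lambda` and `gamma` simply stand for λ and γ).

```python
def leaf_weight(residuals, reg_lambda=1.0):
    """Optimal leaf value w* = -G / (H + lambda) for squared-error loss.

    With loss 0.5 * (y - y_hat)**2, each sample has gradient g = y_hat - y
    (the negative residual) and hessian h = 1, so G = -sum(residuals) and
    H = number of samples in the leaf.
    """
    G = -sum(residuals)
    H = float(len(residuals))
    return -G / (H + reg_lambda)


def split_gain(left_res, right_res, reg_lambda=1.0, gamma=0.0):
    """Regularized gain of splitting one leaf into left and right children."""
    def score(res):
        G, H = -sum(res), float(len(res))
        return G * G / (H + reg_lambda)

    return 0.5 * (score(left_res) + score(right_res)
                  - score(left_res + right_res)) - gamma


print(leaf_weight([2.0, 2.0, 2.0]))          # → 1.5 (the plain mean would be 2.0)
print(split_gain([-2.0, -2.0], [2.0, 2.0]))  # positive: the split separates residuals
print(split_gain([1.0, 1.0], [1.0, 1.0]))    # negative: nothing to separate
```

λ shrinks leaf values toward zero, and γ acts as a minimum gain a split must reach, which is how this regularized objective limits overfitting.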

# Fast Forest Quantile Regression

Quantile regression is useful if we want to understand more about the distribution of the predicted value, rather than getting a single mean prediction. This method has many applications, including:

- Predicting prices

- Estimating student performance or applying growth charts to assess child development

- Discovering predictive relationships in cases where there is only a weak relationship between variables
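
Quantile regression in general is built around the quantile (pinball) loss, which penalizes under- and over-predictions asymmetrically. The minimal sketch below shows that minimizing it over a constant yields a quantile rather than the mean; it does not reproduce how Fast Forest Quantile Regression builds its trees, and the rent values are made up.

```python
def pinball_loss(y_true, y_pred, q):
    """Average quantile (pinball) loss at quantile level q, 0 < q < 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-predictions cost q per unit, over-predictions cost (1 - q)
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(values, q):
    """Constant prediction minimizing the pinball loss.

    The loss is piecewise linear with kinks at the data points, so it is
    enough to search among the observed values.
    """
    return min(sorted(values), key=lambda c: pinball_loss(values, [c] * len(values), q))


rents = [700, 800, 900, 1000, 3000]  # hypothetical monthly rents with one outlier
print(best_constant(rents, 0.5))  # → 900, the median (the mean would be 1280)
print(best_constant(rents, 0.9))  # → 3000, an upper quantile
```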


- Regression trees and random forests are generally great at learning complex, highly non-linear relationships.
- Very easy to interpret and understand (at least for a single tree).

However,

- Prone to major overfitting, especially single deep trees.
- Can be slow and require more memory with larger random forest ensembles.

# K-nearest neighbors (KNN)

KNN is a non-parametric method used for both classification and regression problems. The algorithm uses "feature similarity" to predict the values of new data points: a new point is assigned a value based on how closely it resembles the points in the training set.

- Simple, very easy to implement, and has essentially one hyperparameter (k, the number of neighbors)!
- A variety of distance metrics to choose from.

However,

- As the dataset grows, the efficiency (speed) of the algorithm declines very quickly.
- Curse of dimensionality: KNN works well with a small number of input variables, but as the number of variables grows, it struggles to predict the output of a new data point.
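
A minimal brute-force sketch of KNN regression, assuming Euclidean distance and a plain average of the k nearest targets (weighted averages and other metrics are common alternatives). The listings below are hypothetical.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict the average target of the k training points closest to `query`.

    Uses Euclidean distance (math.dist); other metrics could be swapped in.
    This brute-force version scans the whole training set for every query,
    which is why it slows down as the dataset grows.
    """
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical listings: (floor area in m^2, number of rooms) -> monthly rent
train_x = [(30, 1), (35, 1), (60, 2), (65, 2), (90, 3)]
train_y = [600, 650, 1000, 1050, 1500]
print(knn_regress(train_x, train_y, (62, 2), k=2))  # → 1025.0
```

Because the distance is computed on raw values, floor area dominates the number of rooms here; in practice, features are usually scaled to comparable ranges before applying KNN.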

# Neural Network Regression

Any class of statistical models can be termed a neural network if it uses adaptive weights and can approximate non-linear functions of its inputs. Thus, neural network regression is suited to problems where a more traditional regression model cannot fit a solution.

- Very effective at modeling highly complex non-linear relationships.
- Very flexible in learning almost any kind of relationship between feature variables.
- The more data we have, the better performance we can usually achieve.

However,

- It is known as a black box! Because of its complexity, it is difficult to interpret and understand.
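
To show the "adaptive weights" idea in its smallest form, here is a hypothetical one-hidden-layer network with tanh activations, trained by full-batch gradient descent on mean squared error in plain Python. The architecture, learning rate, and epoch count are illustrative assumptions; real projects would use a deep learning library.

```python
import math
import random


def train_tiny_mlp(xs, ys, hidden=8, lr=0.05, epochs=1000, seed=0):
    """y_hat = sum_j v_j * tanh(w_j * x + b_j) + c, fit by gradient descent on MSE."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(hidden)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]
    v = [rng.uniform(-1, 1) for _ in range(hidden)]
    c = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw, gb, gv, gc = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(wj * x + bj) for wj, bj in zip(w, b)]
            err = sum(vj * hj for vj, hj in zip(v, h)) + c - y
            # Backpropagation: gradients of mean(err^2) for each weight
            gc += 2 * err / n
            for j in range(hidden):
                gv[j] += 2 * err * h[j] / n
                dpre = 2 * err * v[j] * (1 - h[j] ** 2) / n  # tanh'(z) = 1 - tanh(z)^2
                gw[j] += dpre * x
                gb[j] += dpre
        for j in range(hidden):
            w[j] -= lr * gw[j]
            b[j] -= lr * gb[j]
            v[j] -= lr * gv[j]
        c -= lr * gc

    def predict(x):
        return sum(vj * math.tanh(wj * x + bj) for wj, bj, vj in zip(w, b, v)) + c

    return predict


xs = [i / 10 for i in range(-10, 11)]
ys = [x * x for x in xs]  # a simple non-linear target
untrained = train_tiny_mlp(xs, ys, epochs=0)  # same seed, so same starting weights
trained = train_tiny_mlp(xs, ys)


def mse(model):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


print(mse(untrained), mse(trained))  # training error drops after training
```

Even this tiny model has 25 weights with no individual meaning, which hints at why larger networks are hard to interpret.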

# Poisson Regression

Poisson regression is intended for regression models that predict numeric values, typically counts. Therefore, we should use it only if the values we are trying to predict meet the following conditions:

- The response variable has a Poisson distribution.

- Counts cannot be negative. The method will fail outright if you attempt to use it with negative labels.

- A Poisson distribution is a discrete distribution; therefore, it is not meaningful to use this method with non-whole numbers.
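
A minimal sketch of Poisson regression with one feature, assuming the usual log link (log of the expected count is linear in x) and plain gradient descent on the negative log-likelihood. The label check mirrors the conditions above; the function names, learning rate, and data are illustrative.

```python
import math


def fit_poisson_regression(x, y, lr=0.01, epochs=5000):
    """Fit log(E[y | x]) = b0 + b1 * x by gradient descent on the Poisson
    negative log-likelihood (the constant log(y!) terms are dropped)."""
    # Enforce the conditions above: labels must be non-negative whole numbers
    if any(yi < 0 or yi != int(yi) for yi in y):
        raise ValueError("Poisson regression needs non-negative integer counts")
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        mu = [math.exp(b0 + b1 * xi) for xi in x]  # log link keeps mu > 0
        # Gradient of mean(mu - y * log(mu)) with respect to b0 and b1
        g0 = sum(m - yi for m, yi in zip(mu, y)) / n
        g1 = sum((m - yi) * xi for m, yi, xi in zip(mu, y, x)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1


def predict_count(b0, b1, xi):
    return math.exp(b0 + b1 * xi)


x = [0, 1, 2, 3]
y = [1, 2, 4, 8]  # counts that double at each step
b0, b1 = fit_poisson_regression(x, y)
print(b0, b1)  # close to 0 and log(2) ≈ 0.693, since exp(0 + 0.693 * x) doubles
```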


# Ordinal Regression

Ordinal regression is used when the label or target column contains numbers, but the numbers represent a ranking or order rather than a numeric measurement.
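
Ordinal regression can be formulated in several ways, and the text above does not say which one the Ordinal Regression module uses. One common formulation is the cumulative-logit (proportional odds) model: a single score (normally a learned linear function of the features) is compared against increasing thresholds, one per boundary between adjacent ranks. The sketch below only turns a score and hypothetical thresholds into class probabilities; learning the weights and thresholds is left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probabilities(score, thresholds):
    """Class probabilities under a cumulative-logit (proportional odds) model.

    P(y <= k) = sigmoid(thresholds[k] - score), with increasing thresholds;
    the last class takes the remaining probability.
    """
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(y = k) = P(y <= k) - P(y <= k - 1)
        previous = c
    return probs


# Three ordered classes, e.g. "low" < "medium" < "high" (hypothetical thresholds)
thresholds = [-1.0, 1.0]
print(ordinal_probabilities(0.0, thresholds))  # most mass on "medium"
print(ordinal_probabilities(3.0, thresholds))  # a higher score shifts mass to "high"
```

Unlike plain multi-class classification, the order of the classes is built into the model: one score moves probability mass up or down the ranking.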

# Conclusion

So, based on the following factors:

1- The nature of our problem and the target variable (rental price),

2- The previous analysis of how different ML techniques suit a regression problem,

3- The nature of our data set,

the following techniques have been selected to predict the rental price from our data: XGBoost, KNN, and NN (neural network)!