# Chapter 11 Machine learning models, continued
---
### There are countless machine learning 'models'; researchers are inventing new ones all the time.

### There are also **ensemble methods**, which combine several separate models to make a new model!


---
## **Machine learning models, just in this chapter**:

### linear regression
### Ridge
### Lasso
### k nearest neighbors
### logistic regression
### decision trees
### random forests
### Naive Bayes
### Ada Boost
### XGBoost

---
## What is a model?

### A **model** is simply some recipe or function that **takes in appropriate input** (often called features) and **returns appropriate output**.

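As a minimal sketch of "features in, output out", here is a hand-written toy model for the home price example used later in this chapter. The function name `predict_home_price` and every coefficient in it are made up for illustration; a real model would learn such numbers from data.

```python
def predict_home_price(num_rooms: int, crime_rate: float) -> float:
    """Toy model: return a price estimate in thousands of dollars.

    All coefficients are hypothetical and chosen only for illustration.
    """
    base_price = 50.0       # assumed starting price
    price_per_room = 30.0   # assumed value added by each room
    crime_penalty = 4.0     # assumed value lost per unit of crime rate
    return base_price + price_per_room * num_rooms - crime_penalty * crime_rate


# Input (features): 6 rooms, crime rate 2.5. Output: a single number.
estimate = predict_home_price(num_rooms=6, crime_rate=2.5)
print(estimate)  # 220.0
```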
---
## Why do we need so many models?

Because as data scientists we encounter different types of questions/tasks/problems, and some models are better suited to some problems than others. 

This is similar to how there are different types of mathematical functions, e.g.
- linear
- quadratic
- polynomial
- trigonometric
- logarithmic

that model different physical phenomena. For instance, in basic physics, the height of a falling object is modeled as a quadratic function of time t.
The temperature might be modeled as a sinusoidal function of the month.

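The two examples above can be written as small Python functions. This is only a sketch: the starting height, the mean temperature, the amplitude, and the choice of July as the warmest month are all assumed values, not measurements.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def height(t: float, h0: float = 100.0) -> float:
    """Height in meters, t seconds after dropping an object from rest at h0.

    Quadratic in t; air resistance is ignored.
    """
    return h0 - 0.5 * G * t ** 2


def temperature(month: int, mean: float = 12.0, amplitude: float = 10.0) -> float:
    """Hypothetical average temperature (deg C) for month 1..12.

    Sinusoidal in the month, with the peak assumed to fall in July (month 7).
    """
    return mean + amplitude * math.cos(2 * math.pi * (month - 7) / 12)


print(height(2.0))       # height after 2 seconds
print(temperature(7))    # warmest month in this toy model
print(temperature(1))    # coldest month in this toy model
```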
---
## Which is the best model to use?
Trick question, as there is no right answer. Your job, if you become a data scientist, is to find a model (or collection of models) that gives answers that are **good enough** for your purposes!

You should have a general sense of the available models and their use cases, but picking a model (or creating your own) is an art, not a science! And you don't just pick one model and call it done: you experiment with different models, combine different models, etc.

That is what Kaggle competitions and their leaderboards are good for: improving your model, learning from others who are getting better results, etc.

Remember, **we have given up hope of getting the output exactly right**. There is usually no (useful) exact formula for the data science problem at hand (e.g. if I give you the number of rooms, the crime rate, etc., there is simply no way you can be expected to **exactly** compute the home price).

---
## What makes a model good enough?
It depends on what your problem is. If you are working on self-driving cars or disease detection, the threshold for *good enough* should be higher than if you are making an app that classifies whether a picture shows a cat or a dog!

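One way to make *good enough* concrete is to pick an error measure and a tolerance for it. The sketch below uses mean absolute error on made-up home prices; the data and the tolerance of 15 (thousand dollars) are assumptions for illustration, and the right tolerance depends entirely on your problem.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, ignoring their sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up prices in thousands of dollars.
actual_prices = [200.0, 310.0, 150.0]
predicted_prices = [210.0, 300.0, 165.0]

mae = mean_absolute_error(actual_prices, predicted_prices)

TOLERANCE = 15.0  # hypothetical threshold for "good enough"
good_enough = mae <= TOLERANCE

print(round(mae, 2), good_enough)
```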

# Using ChatGPT constructively to learn 

## Ask ChatGPT to explain what each of these machine learning models is

(Me to ChatGPT): Explain what the following machine learning models are. Perhaps make a table

- linear regression
- Ridge
- Lasso
- logistic regression
- decision trees
- random forests
- Naive Bayes
- Ada Boost
- XGBoost

### ChatGPT response (but read legitimate sources (our text, Wikipedia) to be sure!)

Sure, I'd be happy to explain these machine learning models! Here's a brief description of each one, along with some key characteristics and differences. I'll also provide a table summarizing the information.

| Model           | Description                                                                                          | Key Characteristics                                                                                   |
|-----------------|------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|
| Linear Regression | Predicts a continuous target variable based on linear relationships between input features.       | Assumes a linear relationship, sensitive to outliers, uses least squares to minimize prediction error.|
| Ridge           | A regularization technique applied to **linear regression**, adding a penalty term to the loss function to prevent overfitting.| Controls model complexity by adding a L2 regularization term, shrinks coefficient values towards zero. |
| Lasso           | Similar to Ridge but uses L1 regularization, which can lead to some coefficients becoming exactly zero.| Used for feature selection, helps create sparse models, useful for reducing dimensionality.          |
| Logistic Regression | Used for binary **classification**, estimates the probability of an instance belonging to a certain class.| Utilizes the logistic function to transform linear predictions into probabilities.                    |
| Decision Trees  | Hierarchical structure that makes decisions by splitting data based on feature values.             | Easily interpretable, prone to overfitting, can handle both numerical and categorical data.          |
| Random Forests  | Ensemble of decision trees, each trained on a random subset of data, and averages their predictions.| Reduces overfitting, improves generalization, can handle large datasets and high-dimensional data. |
| Naive Bayes     | Probabilistic **classifier** based on Bayes' theorem, assumes features are independent given the class. | Works well with high-dimensional data, computationally efficient, "naive" assumption may limit accuracy.|
| Ada Boost       | **Ensemble method** that sequentially combines weak learners to create a strong **classifier**.           | Focuses on instances with higher misclassification rates, assigns weights to data points.             |
| XGBoost         | Extreme Gradient Boosting, an advanced boosting algorithm that optimizes the **ensemble** creation process.| Uses a more complex regularization approach, handles missing data, supports custom optimization objectives.|

Remember that each of these models has its own strengths and weaknesses, and the choice of model depends on the specific problem you're trying to solve, the nature of your data, and the trade-offs you're willing to make between interpretability, accuracy, and computational complexity.

### OK, that's a lot to take in. Let's organize it based on the two types of prediction problems we've seen
---
## Regression (predict a continuous numerical value, like house price)
- Linear Regression
    - Ridge and Lasso are variants of linear regression that add **regularization**

- k nearest neighbors
- decision trees, random forests

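To make the regression task concrete, here is a minimal sketch of linear regression with a single feature, fitted by ordinary least squares in plain Python. The helper `fit_line` and the data are hypothetical; the prices were chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (not divided by n, it cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: number of rooms vs. price in thousands of dollars.
rooms = [3, 4, 5, 6, 7]
prices = [150.0, 180.0, 210.0, 240.0, 270.0]

slope, intercept = fit_line(rooms, prices)

# Predict a continuous value for an unseen input (8 rooms).
predicted = slope * 8 + intercept
print(slope, intercept, predicted)
```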
---

## Classification (predict a discrete value or category, like cat or dog)
- Logistic Regression ("regression" is a misnomer! I wish it were called logistic classification): applies to binary **classification**, i.e. where there are two categories, like cat or dog
- k nearest neighbors
- decision trees, random forests
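As an illustration of classification, here is a minimal k nearest neighbors sketch in plain Python. The function `knn_predict`, the two features (weight and ear length), and the data points are all made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical features: (weight in kg, ear length in cm).
train_points = [(4.0, 6.0), (3.5, 5.5), (5.0, 6.5),
                (20.0, 10.0), (25.0, 12.0), (30.0, 11.0)]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(train_points, train_labels, (4.5, 6.0)))    # cat
print(knn_predict(train_points, train_labels, (22.0, 11.0)))  # dog
```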
---

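## Bagging methods (can apply to regression or classification)
- A random forest is an ensemble (collection) of decision trees, each trained on a random resample of the data; their predictions are averaged or put to a vote
- Naive Bayes is not a bagging method: it is a standalone probabilistic classifier, so it belongs with the classification models above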
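The bagging idea (train many models on random resamples of the data, then let them vote) can be sketched in plain Python. This is not a random forest: instead of decision trees it uses a very simple one-feature threshold classifier, and it assumes that cats are lighter than dogs. All names and data are hypothetical.

```python
import random
from collections import Counter


def train_stump(sample):
    """Train a one-feature classifier: a threshold halfway between the class means.

    Assumes cats are lighter than dogs (an assumption of this toy example).
    """
    cat_weights = [w for w, label in sample if label == "cat"]
    dog_weights = [w for w, label in sample if label == "dog"]
    if not cat_weights:          # resample contained only dogs
        return lambda weight: "dog"
    if not dog_weights:          # resample contained only cats
        return lambda weight: "cat"
    cat_mean = sum(cat_weights) / len(cat_weights)
    dog_mean = sum(dog_weights) / len(dog_weights)
    threshold = (cat_mean + dog_mean) / 2
    return lambda weight: "cat" if weight < threshold else "dog"


def bagging_fit(data, n_models=25, seed=0):
    """Train n_models classifiers, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = rng.choices(data, k=len(data))  # sample with replacement
        models.append(train_stump(bootstrap))
    return models


def bagging_predict(models, weight):
    """Majority vote over all models."""
    votes = Counter(model(weight) for model in models)
    return votes.most_common(1)[0][0]


# Made-up data: (weight in kg, label).
data = [(4.0, "cat"), (3.5, "cat"), (5.0, "cat"),
        (20.0, "dog"), (25.0, "dog"), (30.0, "dog")]

models = bagging_fit(data)
print(bagging_predict(models, 4.5), bagging_predict(models, 22.0))
```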
---

## Boosting algorithms (can apply to regression or classification)
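- The idea is to train models one after another, with each new model concentrating on the examples the earlier ones got wrong (AdaBoost does this by increasing the weights of misclassified examples)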
- AdaBoost, XGBoost 

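The reweighting step can be sketched for a single round of the classic AdaBoost update for binary classification. The helper `adaboost_reweight` is a hypothetical name, and the example assumes a weak learner that got exactly one of five examples wrong. XGBoost uses a different mechanism (gradient boosting) that is not shown here.

```python
import math


def adaboost_reweight(weights, correct):
    """One AdaBoost round: raise weights of misclassified examples, lower the rest.

    weights: current example weights, summing to 1.
    correct: True where the weak learner was right, False where it was wrong.
    Requires a weighted error strictly between 0 and 1.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - error) / error)  # the weak learner's vote strength
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha


weights = [0.2] * 5                        # start with equal weights
correct = [True, True, False, True, True]  # the third example was misclassified

new_weights, alpha = adaboost_reweight(weights, correct)
print([round(w, 3) for w in new_weights])  # [0.125, 0.125, 0.5, 0.125, 0.125]
```

After the update, the misclassified example carries half of the total weight, so the next weak learner cannot do well while ignoring it.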
## Train/test split

We split the data into two sets: a training set and a test set. The training set is like a practice exam the student is given to study from. The test set is the actual exam, which the student does not see in advance.

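A minimal sketch of such a split in plain Python follows. The helper `split_train_test`, the 25% test fraction, and the seed are assumptions for illustration; in practice you would more likely use a library routine such as scikit-learn's `train_test_split`.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = rows[:]                      # copy, so the input is left untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(20))  # stand-in for 20 data records
train, test = split_train_test(rows)

print(len(train), len(test))  # 15 5
```

The model is fitted on `train` only; `test` is held back and used once, to check how the model does on data it has not seen.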

## Overfitting
Analogy:
This is like a student who memorizes the answers to the practice exams, and so does well on those exams (data the model has seen before) but does poorly on questions they have not seen before.
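The analogy can be sketched with two toy models on made-up data that roughly follows y = 2x. The "memorizer" stores every training answer and falls back on the average training answer for anything new (a hypothetical choice); the "simple rule" just predicts 2x.

```python
def mean_abs_error(model, data):
    """Average absolute prediction error of model over (x, y) pairs."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)


# Made-up noisy data, roughly y = 2x.
train = [(1, 2.3), (2, 3.8), (3, 6.4), (4, 7.7)]
test = [(5, 10.2), (6, 11.9)]

memory = dict(train)                              # the memorized practice answers
fallback = sum(memory.values()) / len(memory)     # guess used for unseen questions


def memorizer(x):
    """Perfect on inputs it has seen, clueless on anything else."""
    return memory.get(x, fallback)


def simple_rule(x):
    """A simpler model that captures the overall trend."""
    return 2 * x


print("memorizer  :", mean_abs_error(memorizer, train), mean_abs_error(memorizer, test))
print("simple rule:", mean_abs_error(simple_rule, train), mean_abs_error(simple_rule, test))
```

The memorizer has zero error on the training data but a much larger error than the simple rule on the test data, which is the signature of overfitting.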

## Regularization
Analogy:
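A penalty for memorizing: the model is charged for being too complicated (for example, for having large coefficients), which pushes it toward simpler rules that carry over to new data.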
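As a sketch of how such a penalty works, consider Ridge regression with one feature and no intercept. Minimizing the sum of squared errors plus `penalty * slope**2` has a closed-form answer, shown below. The helper `ridge_slope` and the data are hypothetical.

```python
def ridge_slope(xs, ys, penalty):
    """Slope w minimizing sum((y - w*x)**2) + penalty * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Made-up data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for penalty in (0.0, 1.0, 14.0):
    print(penalty, ridge_slope(xs, ys, penalty))
```

With no penalty the slope is the ordinary least-squares value of 2; as the penalty grows, the slope shrinks toward zero (it reaches 1 when the penalty is 14).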