# Algorithms & Tools

> Data is more important than tools!

---

## Linear Model

- Classification or regression.
- Easy to interpret, train, and deploy.
- Typically the first model to try, but usually less powerful than more flexible algorithms.
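The notes do not name a library, so the following is a minimal sketch that assumes scikit-learn. It shows both flavours on made-up toy data: `LinearRegression` for regression and `LogisticRegression` for classification.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up data: one feature, six samples
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

# Regression: the target follows y = 2x + 1 exactly
y_reg = 2 * X[:, 0] + 1
reg = LinearRegression().fit(X, y_reg)
print(reg.coef_, reg.intercept_)  # learned slope and intercept

# Classification: the label is 1 when x > 2.5, otherwise 0
y_clf = (X[:, 0] > 2.5).astype(int)
clf = LogisticRegression().fit(X, y_clf)
print(clf.predict([[0.5], [4.5]]))  # predicted classes for two new points
```

The learned coefficients are what makes linear models easy to interpret. Here the slope comes out as 2 and the intercept as 1, which are the values used to generate the data.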

---

## Decision Trees

- Classification or regression.
- Easy to visualize.
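- Nodes are decision points (a test on one feature). Branches route the data to the next node according to the test's outcome, and leaves hold the predictions.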

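- **Random Forests** are an ensemble model: many trees are trained on random
subsets of the data and features, and their predictions are aggregated
(averaged for regression, voted or averaged for classification).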

- **Hierarchical Clustering** is also tree-based, but it is an unsupervised
model.

- **Feature Selection**: trees can also rank features. The features used at the
top nodes are the ones that best split the data, so they tend to be the most
important.
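Here is a minimal sketch of the ideas above. It assumes scikit-learn, which these notes do not name. A single tree is printed as text, a random forest acts as an ensemble of trees, and `feature_importances_` ranks the features. The data is synthetic: only the hypothetical `signal` feature determines the label, while `noise` is random.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic data: "signal" decides the label, "noise" carries no information
signal = rng.uniform(0, 1, 200)
noise = rng.uniform(0, 1, 200)
X = np.column_stack([signal, noise])
y = (signal > 0.5).astype(int)

# A single, shallow tree: print its decision points as text
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["signal", "noise"]))

# A random forest: 50 trees whose predictions are aggregated
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Importance scores: higher means the feature contributed more to the splits
print(dict(zip(["signal", "noise"], forest.feature_importances_)))
```

Because `signal` alone separates the classes, the tree's root split uses it. The forest also assigns it most of the importance. This is the intuition behind using trees for feature selection.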

---

## XGBoost

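- Tree-based ensemble model, composed of weak learners (small trees) trained
  one after another with gradient boosting.
- Often outperforms random forests.
- Highly tunable (e.g. tree depth, learning rate, number of boosting rounds).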
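In boosting, weak learners are small trees added one at a time, and each new tree fits the errors left by the trees before it. The sketch below is *not* XGBoost. It is a plain gradient-boosting loop that uses scikit-learn depth-1 trees ("stumps") on squared error, and it only illustrates the idea. XGBoost adds regularization, second-order gradient information and many engineering optimizations on top. The data, `learning_rate` and `n_rounds` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)

# Synthetic regression data: a noisy sine wave
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 300)

learning_rate = 0.1  # shrinks each stump's contribution
n_rounds = 100       # number of weak learners

# Start from a constant prediction (the mean of the target)
prediction = np.full_like(y, y.mean())
stumps = []
for _ in range(n_rounds):
    # For 0.5 * squared error, the negative gradient is just the residual
    residuals = y - prediction
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    prediction += learning_rate * stump.predict(X)
    stumps.append(stump)

def predict(X_new):
    """Sum the constant start plus every stump's shrunken output."""
    out = np.full(len(X_new), y.mean())
    for s in stumps:
        out += learning_rate * s.predict(X_new)
    return out

mse_start = np.mean((y - y.mean()) ** 2)
mse_end = np.mean((y - prediction) ** 2)
print(mse_start, mse_end)
```

Each stump on its own is a poor model. The training error drops only because many stumps are summed.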

### Example / Practice

1. Import Pandas and XGBoost.
2. Create a `DataFrame`.
3. Create a `DMatrix` (XGBoost's data matrix).
4. Train the model.
5. Predict target values.

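```python
# [1] Import Pandas and XGBoost
import pandas as pd
import xgboost as xgb

# [2] Create DataFrame
df = pd.DataFrame(
    [[1, 2, 0], [3, 4, 1], [5, 6, 0], [7, 8, 1]],
    columns=["num", "amount", "target"],
)

# [3] Create Data Matrix (XGBoost's data structure): features plus label
df_xgb = xgb.DMatrix(df[["num", "amount"]], label=df["target"])

# [4] Train the model
# "binary:hinge" makes predict() return hard 0/1 labels; "eval_metric" is
# only reported when an evaluation set is passed to xgb.train via `evals`
params = {"eval_metric": "logloss", "objective": "binary:hinge"}
bst = xgb.train(params, df_xgb, num_boost_round=10)  # 10 is the default

# [5] Predict target values (here on the training data itself)
print(bst.predict(df_xgb))
```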

---

## AutoGluon

- Framework that automates data preprocessing, model creation, and
  hyperparameter tuning.
- You only need to define the target column (label) and the training time budget.
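The following toy illustration shows the core AutoML loop that tools like AutoGluon automate. It tries several candidate models within a time budget, scores each with cross-validation, and keeps the best one. It is a hypothetical sketch with scikit-learn and is *not* how AutoGluon works internally. AutoGluon also handles preprocessing and combines models through ensembling and stacking. The candidate list and the `time_limit` variable are illustrative assumptions.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Hypothetical search space: a few model families with fixed settings
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

time_limit = 60.0  # seconds; plays the role of a training-time budget
start = time.monotonic()
scores = {}
for name, model in candidates.items():
    if time.monotonic() - start > time_limit:
        break  # budget exhausted: stop trying new models
    # Mean 5-fold cross-validated accuracy
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

# Keep the best-scoring model and refit it on all the data
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)
print(scores, best_name)
```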

---

> AutoGluon belongs to the AutoML family of tools, which automate the ML workflow.

> More AutoML tools can be found in the curated
[awesome-AutoML](https://github.com/windmaple/awesome-AutoML) list.

---

### Example / Practice

1. Import Pandas and AutoGluon's `TabularPredictor`.
2. Create a `DataFrame`.
3. Create a `TabularPredictor` with the label column, then fit it, passing the data, a time limit, and a quality preset.
4. Check the summary of the created models.
5. Evaluate the best model's performance.

```python
# [1] Import Pandas and AutoGluon's TabularPredictor
import pandas as pd
from autogluon.tabular import TabularPredictor

# [2] Create DataFrame
df = pd.DataFrame(
    [[1, 2, 0], [3, 4, 1], [5, 6, 0], [7, 8, 1]],
    columns=["num", "amount", "target"],
)

# [3] Create TabularPredictor with the label column, then fit it with the
# training data, a 60-second time limit and the "best_quality" preset
predictor = TabularPredictor(label="target").fit(
    train_data=df, time_limit=60, presets="best_quality"
)

# [4] Check summary of created models
predictor.fit_summary()

# [5] Evaluate the best model (scored here on the training data itself,
# so the result is optimistic)
predictor.evaluate(df)
```

---

## Exercises

12. **Linear Models**: This first exercise explores two linear models: one for
    classification and one for regression.

13. **XGBoost**: After covering linear models and random forests, this lesson
    puts XGBoost into practice, showing how to use its API to train and test a
    model.

14. **AutoGluon**: To finish this part of the course, we use AutoGluon to easily
    train multiple models for both classification and regression.