# Geometric Models labs

Example notebook for exploring geometric models:
* Geometric algorithms for regression tasks - Linear, Ridge and Lasso regression
* Geometric algorithms for classification tasks - Logistic regression, Support Vector Machines (SVM)

<br>

Documentation:
* **Linear Models** in [Scikit Learn](https://scikit-learn.org/stable/api/sklearn.linear_model.html)
* **Support Vector Machines** in [Scikit Learn](https://scikit-learn.org/stable/api/sklearn.svm.html)

<br>
<br>
Ricardo Almeida, and

[Supervised Learning](https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb) by A. C. Müller and S. Guido, *Introduction to Machine Learning with Python: A Guide for Data Scientists*, O’Reilly, 2017



In [None]:
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import train_test_split

In [None]:
RANDOM_SEED = 7657

TEST_SIZE=0.20

## Linear models for regression tasks

#### Loading dataset

California Housing dataset

In [None]:
housing = fetch_california_housing()

In [None]:
print(housing.DESCR)

In [None]:
housing.feature_names

In [None]:
housing.target_names

In [None]:
housing.target

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, random_state=RANDOM_SEED, test_size=TEST_SIZE)

#### Linear regression

Fitting a Linear regression model and checking its score (the coefficient of determination, R²) on both the training and test sets.

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
lr = LinearRegression()

lr.fit(X_train, y_train)

In [None]:
model = lr
# For regressors, score() returns the coefficient of determination (R^2), not an accuracy
print("R^2 on train set:  {:.3f}".format(model.score(X_train, y_train)))
print("R^2 on test  set:  {:.3f}".format(model.score(X_test, y_test)))
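
For a regressor such as `LinearRegression`, `score` returns the coefficient of determination (R²) rather than a classification accuracy. The following is a minimal sketch on synthetic data (the arrays `X_demo` and `y_demo` are assumptions made up for illustration, not the housing data) showing that `score` agrees with `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data, assumed for illustration only (not the housing data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

demo_lr = LinearRegression().fit(X_demo, y_demo)

# Both lines compute the same quantity: R^2 of the predictions
score_value = demo_lr.score(X_demo, y_demo)
r2_value = r2_score(y_demo, demo_lr.predict(X_demo))

print("score():    {:.4f}".format(score_value))
print("r2_score(): {:.4f}".format(r2_value))
```

An R² of 1 means a perfect fit; a value near 0 means the model does no better than always predicting the mean of the target.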

#### Ridge regression

**Exercise 1.1** - your task is to:
- fit a Ridge regression model
- test different parameter values (example: ```alpha=0.0001, 0.1, ... , 10, 10000```)
- compare performance with the Linear regressor
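
As a starting point, the loop below is a rough sketch of how such a parameter sweep could look. It deliberately uses synthetic data from `make_regression` (an assumption for illustration; the names `X_syn`, `Xs_train`, and `ridge_scores` are hypothetical), so the exercise on the housing data is still left to you:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression problem, assumed for illustration only
X_syn, y_syn = make_regression(n_samples=300, n_features=20,
                               noise=10.0, random_state=0)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.20, random_state=0)

# Fit one Ridge model per alpha and record (train R^2, test R^2)
ridge_scores = {}
for alpha in [0.0001, 0.1, 10, 10000]:
    ridge = Ridge(alpha=alpha).fit(Xs_train, ys_train)
    ridge_scores[alpha] = (ridge.score(Xs_train, ys_train),
                           ridge.score(Xs_test, ys_test))
    print("alpha={:<8} R^2 train: {:.3f}  R^2 test: {:.3f}".format(
        alpha, *ridge_scores[alpha]))
```

Larger `alpha` values shrink the coefficients more strongly, so the training score can only stay the same or drop as `alpha` grows; whether the test score improves depends on the data.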

In [None]:
# Fill here...

In [None]:
# ...

In [None]:
# model = ...
# print("R^2 on train set:  {:.3f}".format(model.score(X_train, y_train)))
# print("R^2 on test  set:  {:.3f}".format(model.score(X_test, y_test)))

#### Lasso regression

**Exercise 1.2** - your task is to:
- fit a Lasso regression model
- test different parameter values (example: ```alpha=0.0001, 0.01, ...``` and ```max_iter=1000, ..., 100000```)
- compare performance with the Linear regressor
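
One property worth watching while varying `alpha` is how many coefficients Lasso sets exactly to zero. The sketch below illustrates this on synthetic data (again an assumption for illustration, with hypothetical names such as `nonzero_counts`), not on the housing data used in the exercise:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic problem where only 5 of the 20 features carry signal (assumed setup)
X_syn, y_syn = make_regression(n_samples=300, n_features=20, n_informative=5,
                               noise=10.0, random_state=0)

# Count the non-zero coefficients for each regularization strength
nonzero_counts = {}
for alpha in [0.0001, 0.01, 1, 100]:
    lasso = Lasso(alpha=alpha, max_iter=100000).fit(X_syn, y_syn)
    nonzero_counts[alpha] = int(np.sum(lasso.coef_ != 0))
    print("alpha={:<8} non-zero coefficients: {}".format(
        alpha, nonzero_counts[alpha]))
```

A generous `max_iter` is used here because small `alpha` values can need more iterations of the solver to converge.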

In [None]:
# Fill here...

In [None]:
# ...

## Linear models for classification tasks

#### Loading dataset

Breast Cancer dataset

In [None]:
cancer = load_breast_cancer()

In [None]:
print(cancer.DESCR)

**Exercise 2.1** - your task is to:
- split the original data (in ```cancer.data``` and ```cancer.target```) into training and dev/validation sets, naming them ```X_train, X_test, y_train, y_test``` as the cells below expect

In [None]:
# Fill here...

#### Logistic regression

**Exercise 2.2** - your task is to:
- fit a ```LogisticRegression``` model and store it in a variable named ```logr``` (the evaluation cell below expects this name)
- with parameters ```C=100, max_iter=20000```
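
In scikit-learn, `C` is the inverse of the regularization strength, so a larger `C` means weaker regularization. The sketch below shows the effect on the size of the learned coefficients, using synthetic data from `make_classification` (an assumption for illustration; `coef_norms` and the other names are hypothetical) rather than the Breast Cancer data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification problem, assumed for illustration only
X_syn, y_syn = make_classification(n_samples=400, n_features=10,
                                   n_informative=5, n_redundant=2,
                                   n_clusters_per_class=1, class_sep=2.0,
                                   random_state=0)

# Record the L2 norm of the coefficient vector for each value of C
coef_norms = {}
for C in [0.01, 1, 100]:
    clf = LogisticRegression(C=C, max_iter=20000).fit(X_syn, y_syn)
    coef_norms[C] = float(np.linalg.norm(clf.coef_))
    print("C={:<6} ||coef||: {:.3f}  train accuracy: {:.3f}".format(
        C, coef_norms[C], clf.score(X_syn, y_syn)))
```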

In [None]:
# Fill here...

In [None]:
model = logr
print("Accuracy on train set:  {:.1f}%".format(model.score(X_train, y_train)*100))
print("Accuracy on test  set:  {:.1f}%".format(model.score(X_test, y_test)*100))

#### Support Vector Machines (SVM)

**Exercise 2.3** - your task is to:
- fit a linear Support Vector Classifier (```LinearSVC```, the scikit-learn SVM classifier that accepts the ```dual``` parameter)
- with parameter ```dual='auto'```
- try distinct values of parameter ```C=1, 10, 100``` (regularization parameter)
- compare performance with the Logistic Regression classifier

In [None]:
# Fill here...

#### Decision Tree (NOT a geometric model, included here only for comparison)

**Exercise 2.4** - your task is to:
- run the following code and compare the performance of the Tree model with the Logistic Regression and SVM models

In [None]:
from sklearn.tree import DecisionTreeClassifier 

In [None]:
MAX_DEPTH = 4
MIN_SAMPLES_LEAF = 10
MIN_SAMPLE_SPLIT = 15

In [None]:
tree = DecisionTreeClassifier(random_state=RANDOM_SEED,
                             max_depth=MAX_DEPTH,
                             min_samples_leaf=MIN_SAMPLES_LEAF,
                             min_samples_split=MIN_SAMPLE_SPLIT)
tree.fit(X_train, y_train)

In [None]:
model = tree
print("Accuracy on train set:  {:.1f}%".format(model.score(X_train, y_train)*100))
print("Accuracy on test  set:  {:.1f}%".format(model.score(X_test, y_test)*100))