![](https://github.com/datawookie/useful-images/raw/master/banner/banner-lab-tensorflow-keras.png)

# Machine Learning Overview

## Part 1: Feature Engineering

You'll create a Decision Tree model to classify points into one of two classes. You'll then engineer a new feature that lets a tree of the same depth separate the classes far more accurately.

In [None]:
from sklearn import datasets, tree, metrics, model_selection
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(17)
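Seeding NumPy's global generator makes the notebook reproducible because scikit-learn falls back on that generator whenever `random_state` is `None`. A minimal sketch of the alternative, passing `random_state` explicitly to each call (the value 17 simply mirrors the seed above):

```python
import numpy as np
from sklearn import datasets

# Passing random_state directly makes each call reproducible on its own,
# independent of NumPy's global random state.
X_a, y_a = datasets.make_circles(n_samples=1500, factor=0.5, noise=0.1, random_state=17)
X_b, y_b = datasets.make_circles(n_samples=1500, factor=0.5, noise=0.1, random_state=17)

print(np.array_equal(X_a, X_b) and np.array_equal(y_a, y_b))
```

Explicit seeds are easier to reason about when cells are run out of order, since no earlier cell can change the result.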

Create some data: two noisy concentric circles, one for each class.

In [None]:
SAMPLES = 1500

X, y = datasets.make_circles(n_samples=SAMPLES, factor=0.5, noise=0.1)
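`make_circles()` returns two concentric rings. With `factor=0.5` the inner ring has half the radius of the outer one, and `noise` adds Gaussian jitter to each coordinate. In scikit-learn the outer ring is labelled 0 and the inner ring 1. A quick sketch that checks the class sizes and the mean distance from the origin for each class (it passes `random_state=17` rather than relying on the global seed):

```python
import numpy as np
from sklearn import datasets

X, y = datasets.make_circles(n_samples=1500, factor=0.5, noise=0.1, random_state=17)

# Distance of each point from the origin.
r = np.hypot(X[:, 0], X[:, 1])

print(np.bincount(y))  # [750 750]
for label in (0, 1):
    print(f"class {label}: mean radius = {r[y == label].mean():.2f}")
```

The mean radii should come out close to 1 and 0.5, which hints at the feature you'll build later.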

Visualise the data.

In [None]:
plt.scatter(X[:, 0], X[:, 1], s=10, color=[{0: '#377eb8', 1: '#ff7f00'}[k] for k in y])
plt.show()

We'll use those data to build a Decision Tree.

**Exercises**

1. Split the data into training and testing sets in an 80:20 proportion. Specify a value for the `random_state` parameter.
2. Create a `DecisionTreeClassifier` object with `max_depth=3`.
3. Fit the model to the training data.
4. Visualise the tree.
5. How deep is the tree? *Hint:* Use the `get_depth()` method.
6. What's the accuracy of the model on the testing data?
7. Add one or more new features to the data. *Hint:* There's one feature which would work rather well. Think about the Pythagorean Theorem.
8. Split the data again into training and testing sets. Use the same `random_state` value for consistency.
9. Fit a new `DecisionTreeClassifier` object to the data. Use `max_depth=3` again.
10. What happened to the accuracy on the testing set?
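The hint in step 7 points at the radial distance of a point $(x_1, x_2)$ from the origin, which follows from the Pythagorean theorem:

$$
r = \sqrt{x_1^2 + x_2^2}
$$

Because the two rings sit at different radii (roughly 1 and 0.5 here), a single threshold on $r$ should separate most points, whereas the original coordinates need many axis-aligned splits to approximate a circular boundary.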

In [None]:
# ------------------------------------------------------------------------------
#
# Your code goes here.
#
# ------------------------------------------------------------------------------

In [None]:
MAX_DEPTH = 3

In [None]:
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2, random_state=13)
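With `test_size=0.2` and 1500 samples, `train_test_split()` puts 1200 rows in the training set and 300 in the testing set. A sketch that checks this and also shows the optional `stratify` argument, which keeps the class proportions equal in both sets (it is not part of the exercise):

```python
import numpy as np
from sklearn import datasets, model_selection

X, y = datasets.make_circles(n_samples=1500, factor=0.5, noise=0.1, random_state=17)

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=13, stratify=y
)

print(X_train.shape, X_test.shape)  # (1200, 2) (300, 2)
# Class counts in the test set: stratification mirrors the 50:50 balance of y.
print(np.bincount(y_test))
```

Here the classes are already balanced, so stratifying changes little, but it matters for imbalanced data.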

In [None]:
model = tree.DecisionTreeClassifier(max_depth=MAX_DEPTH)

In [None]:
model.fit(X_train, y_train);

In [None]:
fig = plt.figure(figsize=(12, 12))
tree.plot_tree(model, filled=True, proportion=True);
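If a figure is inconvenient, `tree.export_text()` renders the same tree as indented text rules. A sketch that rebuilds the model from the same settings; the feature names `x1` and `x2` are chosen for illustration:

```python
from sklearn import datasets, model_selection, tree

X, y = datasets.make_circles(n_samples=1500, factor=0.5, noise=0.1, random_state=17)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=13
)

model = tree.DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Text rendering of the fitted tree: one line per split or leaf.
rules = tree.export_text(model, feature_names=["x1", "x2"])
print(rules)
```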

In [None]:
model.get_depth()
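`get_depth()` reports the length of the longest root-to-leaf path, so with `max_depth=3` it can be at most 3. The related `get_n_leaves()` method counts terminal nodes, which for a binary tree of depth $d$ is at most $2^d$. A short sketch:

```python
from sklearn import datasets, model_selection, tree

X, y = datasets.make_circles(n_samples=1500, factor=0.5, noise=0.1, random_state=17)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=13
)

model = tree.DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

depth = model.get_depth()      # longest path from root to a leaf
leaves = model.get_n_leaves()  # number of terminal nodes
print(depth, leaves)
```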

In [None]:
metrics.accuracy_score(y_test, model.predict(X_test))
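For classifiers, the `score()` method returns mean accuracy, so it matches `metrics.accuracy_score()`. A confusion matrix adds detail on which class the errors fall in. A sketch using the same settings as above:

```python
from sklearn import datasets, metrics, model_selection, tree

X, y = datasets.make_circles(n_samples=1500, factor=0.5, noise=0.1, random_state=17)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=13
)
model = tree.DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

y_pred = model.predict(X_test)
acc = metrics.accuracy_score(y_test, y_pred)
print(acc, model.score(X_test, y_test))

# Rows are true classes, columns are predicted classes.
cm = metrics.confusion_matrix(y_test, y_pred)
print(cm)
```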

In [None]:
# Add a new feature: the radial distance from the origin, sqrt(x1**2 + x2**2).
# Note: re-running this cell inserts the column again.
X = np.insert(X, 2, values=np.sqrt(X[:, 0]**2 + X[:, 1]**2), axis=1)
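To see why the radius helps, here's a sketch that fits a tree with `max_depth=1` (a single split) on the radius alone. It assumes the same data-generation and split settings as above. The learned threshold should land somewhere between the two ring radii:

```python
import numpy as np
from sklearn import datasets, metrics, model_selection, tree

X, y = datasets.make_circles(n_samples=1500, factor=0.5, noise=0.1, random_state=17)
r = np.hypot(X[:, 0], X[:, 1]).reshape(-1, 1)  # radius as a single-column feature matrix

r_train, r_test, y_train, y_test = model_selection.train_test_split(
    r, y, test_size=0.2, random_state=13
)

stump = tree.DecisionTreeClassifier(max_depth=1, random_state=0).fit(r_train, y_train)
threshold = stump.tree_.threshold[0]  # split value at the root node
accuracy = metrics.accuracy_score(y_test, stump.predict(r_test))
print(f"threshold = {threshold:.2f}, accuracy = {accuracy:.3f}")
```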

In [None]:
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2, random_state=13)
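Reusing `random_state=13` matters because the shuffle depends only on the number of rows and the seed, not on the number of columns. A sketch confirming that the rows in the new test set are the same as before (the names `X_aug`, `X_test_old` and `X_test_new` are illustrative):

```python
import numpy as np
from sklearn import datasets, model_selection

X, y = datasets.make_circles(n_samples=1500, factor=0.5, noise=0.1, random_state=17)
X_aug = np.column_stack([X, np.hypot(X[:, 0], X[:, 1])])  # append the radius column

_, X_test_old, _, y_test_old = model_selection.train_test_split(X, y, test_size=0.2, random_state=13)
_, X_test_new, _, y_test_new = model_selection.train_test_split(X_aug, y, test_size=0.2, random_state=13)

# The first two columns of the augmented test set are the original coordinates.
same_rows = np.array_equal(X_test_old, X_test_new[:, :2]) and np.array_equal(y_test_old, y_test_new)
print(same_rows)
```

Because both models are evaluated on identical rows, any change in accuracy comes from the new feature rather than from a different split.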

In [None]:
model = tree.DecisionTreeClassifier(max_depth=MAX_DEPTH)

In [None]:
model.fit(X_train, y_train);

In [None]:
model.get_depth()
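With the radius available, the tree should rely on it almost exclusively. `feature_importances_` (the normalised total impurity reduction contributed by each feature) makes that visible. In this sketch the third column is the radius:

```python
import numpy as np
from sklearn import datasets, model_selection, tree

X, y = datasets.make_circles(n_samples=1500, factor=0.5, noise=0.1, random_state=17)
X_aug = np.column_stack([X, np.hypot(X[:, 0], X[:, 1])])  # columns: x1, x2, r

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X_aug, y, test_size=0.2, random_state=13
)
model = tree.DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, importance in zip(["x1", "x2", "r"], model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```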

In [None]:
metrics.accuracy_score(y_test, model.predict(X_test))
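To compare the two models side by side, here's a sketch that wraps the fit-and-score steps in a helper (`depth3_accuracy` is a hypothetical name) and evaluates a depth-3 tree on the raw coordinates and on the augmented features:

```python
import numpy as np
from sklearn import datasets, metrics, model_selection, tree


def depth3_accuracy(features, labels):
    """Fit a depth-3 tree on an 80:20 split and return test-set accuracy."""
    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        features, labels, test_size=0.2, random_state=13
    )
    model = tree.DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
    return metrics.accuracy_score(y_test, model.predict(X_test))


X, y = datasets.make_circles(n_samples=1500, factor=0.5, noise=0.1, random_state=17)
X_aug = np.column_stack([X, np.hypot(X[:, 0], X[:, 1])])

acc_raw = depth3_accuracy(X, y)
acc_aug = depth3_accuracy(X_aug, y)
print(f"raw: {acc_raw:.3f}, with radius: {acc_aug:.3f}")
```

A depth-3 tree on the raw coordinates can only carve out a few axis-aligned rectangles, so it cannot trace the circular boundary; with the radius it needs just one well-placed split.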