# Nearest Neighbor for Spine Injury Classification

Classify back injuries in hospital patients with **nearest neighbor classification**, based on measurements of the shape and orientation of the pelvis and spine.

The data set contains information from **310** patients. For each patient there are six measurements (the features, x) and a label (y). The label has **3** possible values: `'NO'` (normal), `'DH'` (herniated disk), or `'SL'` (spondylolisthesis).

## 1. Setup notebook

In [1]:
import numpy as np

We divide the data into a training set of 248 patients and a separate test set of 62 patients. The following arrays are created:

* **`trainx`** : The training data's features, one point per row.
* **`trainy`** : The training data's labels.
* **`testx`** : The test data's features, one point per row.
* **`testy`** : The test data's labels.

We will use the training set (`trainx` and `trainy`), with nearest neighbor classification, to predict labels for the test data (`testx`). We will then compare these predictions with the correct labels, `testy`.

Notice that we code the three labels as `0. = 'NO', 1. = 'DH', 2. = 'SL'`.

In [2]:
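# Load dataset and code labels as 0 = 'NO', 1 = 'DH', 2 = 'SL'
labels = ['NO', 'DH', 'SL']

def label_to_index(s):
    # np.loadtxt passes bytes or str to converters depending on the NumPy version
    if isinstance(s, bytes):
        s = s.decode()
    return labels.index(s)

data = np.loadtxt('Files/spine/column_3C.dat', converters={6: label_to_index})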

# Separate features from labels
x = data[:,0:6]
y = data[:,6]

# Divide into training and test set
trainingIndices = list(range(0,20)) + list(range(40,188)) + list(range(230,310))
testIndices     = list(range(20,40)) + list(range(188,230))

trainx = x[trainingIndices,:]
trainy = y[trainingIndices]
testx  = x[testIndices,:]
testy  = y[testIndices]
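
As a quick sanity check on the split, the index lists can be rebuilt on their own (no data file needed) to confirm the sizes stated in this section: 248 training points and 62 test points, with no patient in both sets. This is only an illustrative check; the snake_case names below are local to this snippet and are not part of the notebook.

```python
# Rebuild the same index ranges used for the split (illustrative names)
training_indices = list(range(0, 20)) + list(range(40, 188)) + list(range(230, 310))
test_indices = list(range(20, 40)) + list(range(188, 230))

n_train = len(training_indices)
n_test = len(test_indices)

# No patient should appear in both sets
overlap = set(training_indices) & set(test_indices)

# Together the two sets should cover all 310 patients exactly once
covered = sorted(training_indices + test_indices) == list(range(310))

print(n_train, n_test)        # 248 62
print(len(overlap), covered)  # 0 True
```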

## 2. Nearest neighbor classification with L2 distance
Build a nearest neighbor classifier based on L2 (*Euclidean*) distance.

<font color="magenta">**Goal:**</font> Write a function, **NN_L2**, which takes as input the training data (`trainx` and `trainy`) and the test points (`testx`) and predicts labels for these test points using 1-NN classification. These labels should be returned in a `numpy` array with one entry per test point. For **NN_L2**, the L2 norm should be used as the distance metric.
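
One possible way to write such a function is sketched below. It is a minimal version under a few assumptions, not a reference solution: all arrays are assumed to fit in memory at once (the broadcasting step builds an array of shape test points × training points × features), ties are broken in favor of the first training point, and squared distances are compared because they give the same nearest neighbor as the distances themselves. The `toy_*` arrays are made up for illustration and are not taken from the spine data.

```python
import numpy as np

def NN_L2(trainx, trainy, testx):
    """Predict a label for each row of testx using 1-NN with Euclidean distance."""
    trainx = np.asarray(trainx, dtype=float)
    testx = np.asarray(testx, dtype=float)
    trainy = np.asarray(trainy)

    # diffs[i, j, :] = testx[i] - trainx[j], computed via broadcasting
    diffs = testx[:, np.newaxis, :] - trainx[np.newaxis, :, :]

    # Squared Euclidean distance between every test/training pair
    sq_dists = np.sum(diffs ** 2, axis=2)

    # Index of the closest training point for each test point
    nearest = np.argmin(sq_dists, axis=1)

    # One predicted label per test point, returned as a numpy array
    return trainy[nearest]

# Toy example (made-up points, three classes)
toy_trainx = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
toy_trainy = np.array([0.0, 1.0, 2.0])
toy_testx = np.array([[1.0, 1.0], [9.0, 8.0], [1.0, 9.0]])

toy_pred = NN_L2(toy_trainx, toy_trainy, toy_testx)
print(toy_pred)
```

For a data set of this size (62 test points, 248 training points, 6 features) the intermediate array is small; for much larger data, a loop over test points would use less memory.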

**Observation:**
* L1 Distance: Manhattan Distance
* L2 Distance: Euclidean Distance

<img src="./Files/distances.jpg" align="left" />
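
To make the two distances concrete, here is a small worked example on two made-up vectors (they are not rows from the data set). The L1 distance sums the absolute coordinate differences, while the L2 distance is the square root of the sum of squared differences.

```python
import numpy as np

# Two illustrative points (made up for this example)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# L1 (Manhattan): |1-4| + |2-6| + |3-3| = 3 + 4 + 0 = 7
l1 = np.sum(np.abs(a - b))

# L2 (Euclidean): sqrt(3^2 + 4^2 + 0^2) = sqrt(25) = 5
l2 = np.sqrt(np.sum((a - b) ** 2))

print(l1, l2)  # 7.0 5.0

# The same values via np.linalg.norm
l1_norm = np.linalg.norm(a - b, ord=1)
l2_norm = np.linalg.norm(a - b, ord=2)
```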


<font  style="color:blue"> **Code**</font>
```python
# test function 
testy_L2 = NN_L2(trainx, trainy, testx)
print( type( testy_L2) )
print( len(testy_L2) )
print( testy_L2[40:50] )
```

<font  style="color:magenta"> **Output**</font>
```
<class 'numpy.ndarray'>
62
[ 2.  2.  1.  0.  0.  2.  0.  0.  0.  0.]
```
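
To compare predictions with the correct labels, one common summary is the error rate: the fraction of test points whose predicted label differs from the true one. The helper `error_rate` below is a hypothetical name introduced for this sketch, and the toy arrays are made up; with the notebook's arrays the call would be `error_rate(testy_L2, testy)`.

```python
import numpy as np

def error_rate(predicted, actual):
    """Fraction of entries where the predicted label differs from the actual label."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float(np.mean(predicted != actual))

# Toy labels (made up): positions 1 and 4 disagree, so 2 of 5 are wrong
toy_pred = np.array([2.0, 2.0, 1.0, 0.0, 0.0])
toy_true = np.array([2.0, 1.0, 1.0, 0.0, 2.0])

toy_error = error_rate(toy_pred, toy_true)
print(toy_error)  # 0.4
```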
