<a href="https://colab.research.google.com/github/ryan-rattray/colab_images/blob/main/knn_regression_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# k-Nearest Neighbors Regression

## KNN: Instance-Based Learning
In supervised learning, many models rely on a structured representation built from the training data. For example, decision trees create a branching structure that reflects patterns in the data. This approach is called model-based learning, where the model generalizes from the training data to make predictions.

In contrast, instance-based learning doesn't build an abstract model. Instead, it keeps the training data in memory and uses it directly to make predictions when new data is encountered. There's no traditional "training" phase; the examples are simply stored for future reference.

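K-nearest neighbors (KNN) is a classic example of instance-based learning. To make a prediction, KNN finds the k stored examples most similar to the new point (its "neighbors") and bases its prediction on them; for regression, this is typically the average of the neighbors' target values. So, rather than learning a model ahead of time, it makes decisions on the fly using the available data.

Similarity is commonly measured with the Euclidean distance between the test point $\mathbf{x}_{\text{test}}$ and the $i$-th training example $\mathbf{x}^{(i)}$, computed over all $n$ features: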

$$
d(\mathbf{x}_{\text{test}}, \mathbf{x}^{(i)}) = \sqrt{ \sum_{k=1}^{n} \left( x_{\text{test},k} - x_k^{(i)} \right)^2 }
$$
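
As a rough illustration of how the distance formula above feeds into a prediction, here is a minimal pure-Python sketch. The function name `knn_predict` and the toy data are made up for this example, and it assumes the simplest variant of KNN regression: an unweighted average of the targets of the k nearest neighbors.

```python
import math

def knn_predict(train_X, train_y, x_test, k=3):
    """Predict a target for x_test by averaging its k nearest neighbors."""
    # Euclidean distance from the test point to every stored example.
    distances = [math.dist(x_test, x_i) for x_i in train_X]
    # Indices of the k training examples with the smallest distances.
    nearest = sorted(range(len(train_X)), key=lambda i: distances[i])[:k]
    # Regression prediction: unweighted mean of the neighbors' targets.
    return sum(train_y[i] for i in nearest) / k

# Toy data (invented for illustration): four stored examples with two features.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [10.0, 20.0, 30.0, 100.0]

# The three closest examples are the first three, so the far-away
# point at (5, 5) does not influence the prediction.
prediction = knn_predict(train_X, train_y, (0.2, 0.1), k=3)
print(prediction)
```

Note that nothing is "fit" here: every call to `knn_predict` scans the stored examples, which is what makes the method instance-based.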


![picture](https://raw.githubusercontent.com/ryan-rattray/colab_images/main/rules_vs_models.png)


## Notebook Setup

Running the following code cells will configure the notebook to work with your Google Drive. You will then be able to load and access the scripts for running the demos.

In [1]:
# Allowing Colab to access your Google Drive.
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
# Adding the Colab Notebooks directory to the module search path
# (this does not change the working directory).
import sys
sys.path.append('/content/drive/MyDrive/Colab Notebooks')
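
If the later import of the demo script fails, a quick check like the one below may help narrow down why. This is only a sketch: it assumes the same Drive path as the cell above and the module name `knn_regression_demo_colab` that this notebook imports further down, and the variable names are made up for illustration.

```python
import importlib.util
import os
import sys

# Path appended in the cell above; module name used later in this notebook.
notebooks_dir = '/content/drive/MyDrive/Colab Notebooks'
module_name = 'knn_regression_demo_colab'

# Avoid adding the same entry twice if the cell is re-run.
if notebooks_dir not in sys.path:
    sys.path.append(notebooks_dir)

# Does the directory exist (i.e. is Drive mounted where we expect)?
dir_exists = os.path.isdir(notebooks_dir)
# Can Python locate the module anywhere on the search path?
module_found = importlib.util.find_spec(module_name) is not None

print(f"Directory exists: {dir_exists}")
print(f"Module importable: {module_found}")
```

If the directory exists but the module is not found, the script is most likely stored in a different Drive folder.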


In [4]:
# Running the KNN demo.
'''
Select the value of k by using the slider bar. The plot will update with the
new regression curve.

To view the MSE for the different values of k, click the
"Show Error Plot" button.  The plot will take a few seconds to load.
'''
from knn_regression_demo_colab import run_knn_demo
run_knn_demo()

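
The demo's error plot compares the mean squared error (MSE) across values of k. The sketch below shows one way such a comparison might be computed by hand; it is not the demo's actual code. The helper names and the toy data (y = x² on a 1-D grid) are assumptions made purely for illustration.

```python
import math

def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
    return sum(train_y[i] for i in order[:k]) / k

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy 1-D data invented for illustration: y = x^2 sampled at x = 0..9.
train_X = [(float(x),) for x in range(10)]
train_y = [x[0] ** 2 for x in train_X]

# Held-out points that fall between the training samples.
val_X = [(0.5,), (2.5,), (4.5,), (6.5,), (8.5,)]
val_y = [x[0] ** 2 for x in val_X]

# Evaluate the validation MSE for several values of k.
errors = {}
for k in (1, 2, 5, 10):
    preds = [knn_predict(train_X, train_y, x, k) for x in val_X]
    errors[k] = mse(val_y, preds)
    print(f"k={k:2d}  MSE={errors[k]:.4f}")

best_k = min(errors, key=errors.get)
print("Lowest validation MSE at k =", best_k)
```

With k equal to the number of training points, every prediction collapses to the global mean of the targets, which is why the error grows sharply at the largest k in this toy setup.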

## Implementing KNN

For this example, you will be looking at the Concrete Compressive Strength data set available here:
[Concrete Data](https://archive.ics.uci.edu/dataset/165/concrete+compressive+strength).

The code cells below will pull the data in directly from the UCI Machine Learning Repository site, which allows you to work with the data without having to download a copy to your device or load it into Google Drive.





### Install the ucimlrepo API library

This will allow us to use the UCI API to access data from their site.  When working in Google Colab, any packages that are not automatically included within a Colab session must be installed every time a session is initiated.

In [6]:
# Installing the ucimlrepo API library
!pip install ucimlrepo



In [7]:
# Once installed, the library can be imported for use
from ucimlrepo import fetch_ucirepo

### Load the Data

In [9]:
# fetch dataset
concrete_compressive_strength = fetch_ucirepo(id=165)

In [10]:
# data (as pandas dataframes)
X = concrete_compressive_strength.data.features
y = concrete_compressive_strength.data.targets

In [11]:
# metadata - run this cell if you're interested in learning a bit about the dataset.
print(concrete_compressive_strength.metadata)

{'uci_id': 165, 'name': 'Concrete Compressive Strength', 'repository_url': 'https://archive.ics.uci.edu/dataset/165/concrete+compressive+strength', 'data_url': 'https://archive.ics.uci.edu/static/public/165/data.csv', 'abstract': 'Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients. ', 'area': 'Physics and Chemistry', 'tasks': ['Regression'], 'characteristics': ['Multivariate'], 'num_instances': 1030, 'num_features': 8, 'feature_types': ['Real'], 'demographics': [], 'target_col': ['Concrete compressive strength'], 'index_col': None, 'has_missing_values': 'no', 'missing_values_symbol': None, 'year_of_dataset_creation': 1998, 'last_updated': 'Sun Feb 11 2024', 'dataset_doi': '10.24432/C5PK67', 'creators': ['I-Cheng Yeh'], 'intro_paper': {'ID': 383, 'type': 'NATIVE', 'title': 'Modeling of strength of high-performance concrete using artificial neural networks', 'authors': 'I. Yeh', 'venue': 'C

In [12]:
# variable information
print(concrete_compressive_strength.variables)

                            name     role        type demographic description  \
0                         Cement  Feature  Continuous        None        None   
1             Blast Furnace Slag  Feature     Integer        None        None   
2                        Fly Ash  Feature  Continuous        None        None   
3                          Water  Feature  Continuous        None        None   
4               Superplasticizer  Feature  Continuous        None        None   
5               Coarse Aggregate  Feature  Continuous        None        None   
6                 Fine Aggregate  Feature  Continuous        None        None   
7                            Age  Feature     Integer        None        None   
8  Concrete compressive strength   Target  Continuous        None        None   

    units missing_values  
0  kg/m^3             no  
1  kg/m^3             no  
2  kg/m^3             no  
3  kg/m^3             no  
4  kg/m^3             no  
5  kg/m^3             no  


In [13]:
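# Shorter alias for the fetched dataset object.
# Note: this is the ucimlrepo result object, not a pandas DataFrame.
df = concrete_compressive_strength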

In [16]:
type(X)