# Zumi Obstacle Classification

**Overview:** Train Zumi to tell a 'soft toy' apart from a 'plastic cup' using the k-NN algorithm, then compare this model to a Logistic Regression model.

## 1. Data Collection

### Feature Engineering

The `calculate_sum` function returns the sum of the front_left and front_right sensor readings. It is called inside the `collect_data` function, with the front infrared sensor readings passed as arguments.

In [None]:
# Function to calculate sum.
def calculate_sum(front_left, front_right):
    
    # Calculate sum of front_left and front_right sensors
    result = front_left + front_right
        
    return result

#### Feature engineering - code details

* *def calculate_sum(front_left, front_right)* : `calculate_sum` is the function name; it receives the front_left and front_right sensor readings as arguments.
* *result = front_left + front_right* : The `result` variable stores the sum of the front_left and front_right sensor readings.
* *return result* : The `result` variable is returned by the function.
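
As a quick check, the function can be tried on made-up readings before it is wired into `collect_data`. The values below are hypothetical and are not taken from a real Zumi sensor.

```python
# Function to calculate sum (repeated here so this example runs on its own).
def calculate_sum(front_left, front_right):
    return front_left + front_right

# Hypothetical IR readings, chosen only for illustration.
example_left = 120
example_right = 95

example_sum = calculate_sum(example_left, example_right)
print(example_sum)  # 215
```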

In [None]:
from zumi.zumi import Zumi
from zumi.util.screen import Screen
import time
import pickle

zumi = Zumi()
screen = Screen()
        
IR_FRONT_RIGHT = 0
IR_FRONT_LEFT = 5

DATA_WITH_NOISE = False

### Data Collection - *collect data without noise*

The `collect_data` function is implemented in two different ways.

This gives you the option of collecting data either without noise or with noise added to the data points.

* If you want to add noise to the sensor readings, **do not run the cell block below**. Instead, run the cell block under `Data Augmentation`.

* If you *do not want noise added to the data*, then run the cell below, but do not run the `Data Augmentation` code block.

In [None]:
import pickle

# Readings collected by this cell have no added noise.
DATA_WITH_NOISE = False

# Try loading previous data, if it exists. Otherwise, start with an empty list.
# This list will store our sensor readings and labels.
try:
    with open('zumi_data.pkl', 'rb') as file:
        data = pickle.load(file)
except FileNotFoundError:
    data = []

# Function to collect data.
def collect_data(obstacle_label):
    # Read from front infrared sensors.
    front_left = zumi.get_IR_data(IR_FRONT_LEFT)
    front_right = zumi.get_IR_data(IR_FRONT_RIGHT)
    
    # Call calculate_sum function
    ir_sum = calculate_sum(front_left, front_right)

    # Create a dictionary for each data point and then append it to the data list
    # In addition, add the sum as a new feature for each data point
    data_point = {'front_left': front_left, 'front_right': front_right, 'sum': ir_sum, 'label': obstacle_label}
    data.append(data_point)
    
    return data_point

#### Collecting data - code changes

* *ir_sum = calculate_sum(front_left, front_right)* : Calculates the sum for each data point by passing the left and right sensor readings as arguments to the `calculate_sum` function. The returned value is stored in the `ir_sum` variable.
* *data_point = {'front_left': front_left, 'front_right': front_right, 'sum': ir_sum, 'label': obstacle_label}* : The sum is added as a new feature to the data point.

### Data Augmentation - *collect data with noise*

**Run the cell block below if you want to run the application using noise added to the sensor readings**; do not run the code block above.

The function `add_gaussian_noise` adds noise to a single sensor reading. We call it twice in the `collect_data` function, once with the front_left reading and once with the front_right reading.

    Args:
        - sensor_value: Single sensor reading value.
        - mean: Mean of the Gaussian distribution (default: 0).
        - std_dev: Standard deviation of the Gaussian distribution (default: 0.1).

    Returns:
        - Sensor reading value with added Gaussian noise.
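
To get a feel for how small this noise is, the sketch below perturbs one made-up reading many times. It is only an illustration: it uses the standard library's `random.gauss` in place of `np.random.normal` so that it runs without NumPy, and the `add_gaussian_noise_demo` name, the reading of 100, and the seed are all assumptions.

```python
import random

def add_gaussian_noise_demo(sensor_value, rng, mean=0, std_dev=0.1):
    # Hypothetical stand-in for add_gaussian_noise, using random.gauss instead of NumPy.
    return sensor_value + rng.gauss(mean, std_dev)

rng = random.Random(0)   # fixed seed so the run is repeatable
reading = 100            # made-up IR reading

noisy_readings = [add_gaussian_noise_demo(reading, rng) for _ in range(1000)]
average = sum(noisy_readings) / len(noisy_readings)
largest_shift = max(abs(value - reading) for value in noisy_readings)

print("average noisy reading:", round(average, 2))
print("largest shift from the true reading:", round(largest_shift, 2))
```

With `std_dev=0.1`, each noisy value typically stays within a fraction of a unit of the original reading, so a larger `std_dev` may be worth trying if the augmentation should have a visible effect.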

In [None]:
import numpy as np
import pickle

# Try loading previous data, if it exists. Otherwise, start with an empty list.
# This list will store our sensor readings and labels.
try:
    with open('zumi_data_noise.pkl', 'rb') as file:
        data = pickle.load(file)
except FileNotFoundError:
    data = []

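# Mark this session as collecting noisy data so that it is saved to 'zumi_data_noise.pkl'.
# This must be set at the top level of the cell: assigning it inside a function
# would only create a local variable and leave the global flag unchanged.
DATA_WITH_NOISE = True

# Function to add Gaussian noise to a single sensor reading
def add_gaussian_noise(sensor_value, mean=0, std_dev=0.1):
    noise = np.random.normal(mean, std_dev)
    noisy_sensor_value = sensor_value + noise
    return noisy_sensor_value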

# Function to collect data with added Gaussian noise
def collect_data(obstacle_label):
    # Read from front infrared sensors.
    front_left = zumi.get_IR_data(IR_FRONT_LEFT)
    front_right = zumi.get_IR_data(IR_FRONT_RIGHT)
    
    # Adding Gaussian noise to the sensor readings
    noisy_front_left = add_gaussian_noise(front_left)
    noisy_front_right = add_gaussian_noise(front_right)

    # Call calculate_sum function
    ir_sum = calculate_sum(noisy_front_left, noisy_front_right)

    # Create a dictionary for each data point and then append it to the data list
    # In addition, add the sum as a new feature for each data point
    data_point = {'front_left': noisy_front_left, 'front_right': noisy_front_right, 'sum': ir_sum, 'label': obstacle_label}
    data.append(data_point)
    
    return data_point

To collect data, manually drive Zumi near each obstacle and execute the cell below.

In [None]:
# Collecting multiple readings for soft toy.
print("Drive Zumi near the soft toy and press Enter...")
for _ in range(5):  # Change this number if you want to collect more or fewer readings.
    input()
    collect_data('soft toy')
    print("Collected data for soft toy. Continue or move to the next position.")
    
# Collecting multiple readings for plastic cup.
print("\nDrive Zumi near the plastic cup and press Enter...")
for _ in range(5):  # Change this number if you want to collect more or fewer readings.
    input()
    collect_data('plastic cup')
    print("Collected data for plastic cup. Continue or move to the next position.")

# Saving the updated data for future use.
# Noisy and noise-free readings are kept in separate files.
filename = 'zumi_data_noise.pkl' if DATA_WITH_NOISE else 'zumi_data.pkl'
with open(filename, 'wb') as file:
    pickle.dump(data, file)

print("\nData collection complete and saved!")
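
Before moving on, it may help to confirm that both obstacles have a similar number of readings, since an unbalanced dataset can skew a classifier. A minimal sketch, assuming `data` has the list-of-dictionaries shape built by `collect_data`; the rows in `sample_data` below are made up.

```python
from collections import Counter

# Made-up rows in the same shape that collect_data produces.
sample_data = [
    {'front_left': 30, 'front_right': 28, 'sum': 58, 'label': 'soft toy'},
    {'front_left': 32, 'front_right': 27, 'sum': 59, 'label': 'soft toy'},
    {'front_left': 12, 'front_right': 15, 'sum': 27, 'label': 'plastic cup'},
]

# Count how many readings were collected for each label.
label_counts = Counter(row['label'] for row in sample_data)

for label, count in label_counts.items():
    print(label, count)
```

To run the same check on your own readings, replace `sample_data` with `data`.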

## 2. Storing & Visualizing Data
Print a table of the data collected by Zumi.

In [None]:
# Display data in a tabular format
print("\nCollected Data:")

# Print header
print("front_left", "front_right", "sum", "label")

# Print data
for row in data:
    print(row['front_left'], row['front_right'], row['sum'], row['label'])
    

## 3. Plotting Data

### Data Visualization
Plot a 2D graph and 3D graph of the data points collected by Zumi.

In [None]:
import matplotlib.pyplot as plt
import random
# We will create a dictionary to hold our groups
grouped_data = {}

# Define markers and colors for each label
markers = ['o', 's', '^', '<', '>', 's', '*', 'D']
colors = ['blue', 'green', 'red', 'yellow', 'magenta', 'cyan', 'black'] 
# Iterate over the data to populate the dictionary
for idx, row in enumerate(data):
    label = row['label']
    if label not in grouped_data:
        grouped_data[label] = {'front_left': [], 'front_right': [], 'marker': random.choice(markers), 'color': random.choice(colors)}
    grouped_data[label]['front_left'].append(row['front_left'])
    grouped_data[label]['front_right'].append(row['front_right'])

# Now we have a dictionary with lists of data points for each label

# Plotting
fig, ax = plt.subplots()

for label, group in grouped_data.items():
    ax.scatter(group['front_left'], group['front_right'], marker=group['marker'], color=group['color'], label=label)

ax.legend()

plt.xlabel('Front Left IR Value')
plt.ylabel('Front Right IR Value')
plt.title('IR Sensor Readings for Different Obstacles')
plt.grid(True)
plt.show()

In [None]:
from mpl_toolkits.mplot3d import Axes3D  # Importing 3D plotting functionality
import random

# Create a dictionary to hold our groups
grouped_data = {}

# Define markers and colors for each label
markers = ['o', 's', '^', '<', '>', 's', '*', 'D']  # You can extend this list with more markers if needed
colors = ['blue', 'green', 'red', 'yellow', 'magenta', 'cyan', 'black']  # You can add more colors or use different color palettes

# Iterate over the data to populate the dictionary
for idx, row in enumerate(data):
    label = row['label']
    if label not in grouped_data:
        grouped_data[label] = {'front_left': [], 'front_right': [], 'sum': [], 'marker': random.choice(markers), 'color': random.choice(colors)}
    grouped_data[label]['front_left'].append(row['front_left'])
    grouped_data[label]['front_right'].append(row['front_right'])
    grouped_data[label]['sum'].append(row['sum'])

# Now we have a dictionary with lists of data points for each label, including the 'sum' feature

# Plotting in 3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

for label, group in grouped_data.items():
    ax.scatter(group['front_left'], group['front_right'], group['sum'], marker=group['marker'], color=group['color'], label=label)

ax.set_xlabel('Front Left IR Value')
ax.set_ylabel('Front Right IR Value')
ax.set_zlabel('Sum of Sensor Readings')
ax.set_title('3D Scatter Plot of IR Sensor Readings')
ax.legend()

plt.show()

#### 3. Plotting Data - Details
**Graph 1:** Plot the data on a 2D graph.
* `markers = ['o', 's', '^', '<', '>', 's', '*', 'D']` : Define a list of markers.
* *colors = ['blue', 'green', 'red', 'yellow', 'magenta', 'cyan', 'black']* : Define a list of colors.
* *'marker': random.choice(markers), 'color': random.choice(colors)* : Randomly choose a marker and color from the defined list to represent each label.

**Graph 2:** Plot the data on a 3D graph.
* The same marker and color code implementation from the 2D graph is used.
* *grouped_data[label] = {'front_left': [], 'front_right': [], 'sum': [], ...}* : Add an empty list for the third feature (`sum`) to each label's entry.
* *grouped_data[label]['sum'].append(row['sum'])* : Add the third feature data to the `grouped_data` dictionary.
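
Because `random.choice` picks independently for each label, two labels can occasionally end up with the same marker and color. One possible alternative, sketched here, is to pair each label with the next unused marker and color. The `assign_styles` helper is hypothetical and not part of the notebook.

```python
markers = ['o', 's', '^', '<', '>', '*', 'D']
colors = ['blue', 'green', 'red', 'yellow', 'magenta', 'cyan', 'black']

def assign_styles(labels):
    # Hypothetical helper: give each distinct label its own marker and color, in order.
    # zip stops at the shortest input, so at most 7 labels receive a style here.
    styles = {}
    for label, marker, color in zip(sorted(set(labels)), markers, colors):
        styles[label] = {'marker': marker, 'color': color}
    return styles

styles = assign_styles(['soft toy', 'plastic cup', 'soft toy'])
print(styles)
```

Sorting the labels keeps the assignment the same from one run to the next, which makes plots easier to compare.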

## 4. Implementing & Training k-NN and Logistic Regression
Split the data and train our k-NN classifier.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Split the data into training and test sets.

X = [[row['front_left'], row['front_right'], row['sum']] for row in data]
y = [row['label'] for row in data]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

k = 3  # For simplicity, we'll start with k=3.
knn = KNeighborsClassifier(n_neighbors=k)

# Train the classifier.
knn.fit(X_train, y_train)

# Print out classification report
y_pred = knn.predict(X_test)

print(classification_report(y_test, y_pred))

### Model Comparison

Train a `LogisticRegression` model and compare results to the k-NN classifier.

In [None]:
# Train LogisticRegression Model
from sklearn.linear_model import LogisticRegression
    
# Split data
X_log = [[row['front_left'], row['front_right'], row['sum']] for row in data]
y_log = [row['label'] for row in data]

X_train_log, X_test_log, y_train_log, y_test_log = train_test_split(X_log, y_log, test_size=0.3)
    
# Train LogisticRegression model
logreg = LogisticRegression(C=0.1)
logreg.fit(X_train_log, y_train_log)
   
pred_logreg = logreg.predict(X_test_log)
    
# Print out classification report
print('Logistic Regression Model')
print(classification_report(y_test_log, pred_logreg))
    
# Print prediction score
print("logreg score: {:.2f}".format(logreg.score(X_test_log, y_test_log)))

### Hyperparameter Tuning
Train the k-NN classifier with different values of `k` to determine the best value for the given dataset.

In [None]:
from sklearn.metrics import accuracy_score

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

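# Define a range of k values to try
k_values = [1, 3, 5, 7, 9, 11, 13, 15]
# k-NN cannot use more neighbors than there are training samples,
# so drop any k that is too large for a small dataset.
k_values = [k for k in k_values if k <= len(X_train)]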

# Dictionary to store accuracy scores for different k values
accuracy_scores = {}

# Iterate over each k value
for k in k_values:
    # Initialize k-NN classifier with the current k value
    knn = KNeighborsClassifier(n_neighbors=k)

    # Train the classifier
    knn.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = knn.predict(X_test)

    # Calculate accuracy and store it in the dictionary
    accuracy = accuracy_score(y_test, y_pred)
    accuracy_scores[k] = accuracy
    print("k = {}, Accuracy = {}".format(k, accuracy))

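# Find the best k value based on accuracy
best_k = max(accuracy_scores, key=accuracy_scores.get)
best_accuracy = accuracy_scores[best_k]

print("\nBest k value: {} with accuracy: {}".format(best_k, best_accuracy))

# The loop leaves `knn` trained with the last k tried, so retrain it with the
# best k before it is used in the cells that follow.
knn = KNeighborsClassifier(n_neighbors=best_k)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)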

#### Hyperparameter Tuning - Details
* *k_values = [1, 3, 5, 7, 9, 11, 13, 15]* : Create a list of k_values.
* *accuracy_scores = {}* : Dictionary to store accuracy scores for different k values.
* *for k in k_values* : A for loop that iterates over each k value and assigns it to the k-NN classifier. The accuracy is calculated and stored in the dictionary.
* *best_k = max(accuracy_scores, key=accuracy_scores.get)* : Find the best k value based on accuracy.
* *print("\nBest k value: {} with accuracy: {}".format(best_k, best_accuracy))* : Print the best k value and its accuracy score.
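
Two details of this loop are easy to miss: k-NN cannot look at more neighbors than there are training samples, and `max` returns the first key with the highest score when several values of k tie. The sketch below illustrates both with made-up numbers; `n_train` and the accuracy values are assumptions, not results from Zumi.

```python
k_values = [1, 3, 5, 7, 9, 11, 13, 15]

# Made-up training-set size: 10 readings with test_size=0.2 leaves 8 for training.
n_train = 8

# Keep only the k values that the training set can support.
usable_k = [k for k in k_values if k <= n_train]
print(usable_k)

# Made-up accuracy scores, with a tie between k=3 and k=5.
accuracy_scores = {1: 0.5, 3: 1.0, 5: 1.0, 7: 0.5}

# max scans the keys in insertion order and keeps the first maximum it finds.
best_k = max(accuracy_scores, key=accuracy_scores.get)
print(best_k)
```

Because the k values are tried in increasing order, a tie is resolved in favor of the smallest k.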

### Confusion Matrix
Generate a confusion matrix for the k-NN and Logistic Regression classifiers.

In [None]:
from sklearn.metrics import confusion_matrix
    
# Generate a confusion matrix for k-NN classifier
confusion = confusion_matrix(y_test, y_pred)
print("K-NN confusion matrix:\n{}".format(confusion))
    
# Generate a confusion matrix for LogisticRegression classifier
confusion_log = confusion_matrix(y_test_log, pred_logreg)
print("Logistic Regression confusion matrix:\n{}".format(confusion_log))
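
To make the output easier to read, the sketch below builds the same kind of table by hand from made-up labels. It follows scikit-learn's convention that rows are the true labels and columns are the predicted labels, both in sorted label order.

```python
# Made-up true and predicted labels, for illustration only.
true_labels = ['soft toy', 'soft toy', 'plastic cup', 'plastic cup', 'soft toy']
predicted_labels = ['soft toy', 'plastic cup', 'plastic cup', 'plastic cup', 'soft toy']

labels = sorted(set(true_labels) | set(predicted_labels))  # ['plastic cup', 'soft toy']
index = {label: position for position, label in enumerate(labels)}

# matrix[i][j] counts readings whose true label is labels[i] and predicted label is labels[j].
matrix = [[0] * len(labels) for _ in labels]
for true, predicted in zip(true_labels, predicted_labels):
    matrix[index[true]][index[predicted]] += 1

for label, row in zip(labels, matrix):
    print(label, row)
```

Values on the diagonal are correct predictions; everything off the diagonal is a misclassification.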

### Cross-Validation with cross_val_score
Implement k-fold cross-validation using the `cross_val_score` function.

In [None]:
from sklearn.model_selection import cross_val_score

# K-NN Model
# you can change the number of folds using the "cv" parameter
knn_scores = cross_val_score(knn, X, y, cv=5)
print("K-NN cross-validation scores: {}".format(knn_scores))
# to summarize the cross-validation, compute the mean:
print("K-NN average cross-validation score: {:.2f}".format(knn_scores.mean()))

# Logistic Regression Model
logreg_scores = cross_val_score(logreg, X_log, y_log, cv=5)
print("Logistic Regression cross-validation scores: {}".format(logreg_scores))
# to summarize the cross-validation, compute the mean:
print("Logistic Regression average cross-validation score: {:.2f}".format(logreg_scores.mean()))

## 5. Real-time Decision Making
Instruct Zumi to react differently based on the obstacle type using the `react_to_obstacle` function.

In [None]:
from zumi.personality import Personality

def react_to_obstacle(type):
    
    personality = Personality(zumi, screen)
    
    if (type == 'soft toy'):
        zumi.headlights_on()
        personality.happy()
        time.sleep(2)
        zumi.all_lights_off()
    elif (type == 'plastic cup'):
        zumi.circle_right(speed=60, step=10)
        personality.look_around()
        zumi.circle_left(speed=60, step=10)
    else:
        zumi.all_lights_on()
        time.sleep(3)
        zumi.all_lights_off()

#### `react_to_obstacle` - Details
* *from zumi.personality import Personality* : Import the `Personality` class from Zumi's `personality` module.
* *if (type == 'soft toy')* : If the obstacle type is a `soft toy`, then the following code block executes.
* *zumi.headlights_on()* : Turn Zumi's headlights on.
* *personality.happy()* : Change Zumi's personality to 'happy'.
* *time.sleep(2)* : Wait 2 seconds.
* *zumi.all_lights_off()* : Turn off all of Zumi's lights, including the headlights.
* *elif (type == 'plastic cup')* : If the obstacle type is a `plastic cup`, then the following code block executes.
* *zumi.circle_right(speed=60, step=10)* : Circle Zumi to the right at a speed of 60 and a step of 10.
* *personality.look_around()* : Change Zumi's personality to look around.
* *zumi.circle_left(speed=60, step=10)* : Circle Zumi to the left at a speed of 60 and a step of 10.
* *else* : If the obstacle type is **neither** a `soft toy` nor a `plastic cup`, then the following code block executes.
* *zumi.all_lights_on()* : Turn on all of Zumi's lights.
* *time.sleep(3)* : Wait 3 seconds.
* *zumi.all_lights_off()* : Turn off all of Zumi's lights.

In [None]:
def classify_obstacle():
    # Read from front infrared sensors.
    front_left = zumi.get_IR_data(IR_FRONT_LEFT)
    front_right = zumi.get_IR_data(IR_FRONT_RIGHT)
    
    ir_sum = calculate_sum(front_left, front_right)
    
    prediction = knn.predict([[front_left, front_right, ir_sum]])
    
    # Display the prediction on Zumi's screen
    screen.draw_text("Pred: " + str(prediction[0]))
    
    return prediction[0]

print("Classifying obstacle...")
obstacle_type = classify_obstacle()
# Call the 'react_to_obstacle' function
react_to_obstacle(obstacle_type)
obstacle_type

#### `classify_obstacle` - Code changes & details
* *ir_sum = calculate_sum(front_left, front_right)* : Calculate the sum of the infrared sensor readings.
* *prediction = knn.predict([[front_left, front_right, ir_sum]])* : Provide the front sensor readings and the sum to the predict method of the classifier.
* *react_to_obstacle(obstacle_type)* : Pass the obstacle type to the `react_to_obstacle` function. Watch as Zumi reacts to the predicted obstacle.
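
Note that a classifier trained on only two labels always predicts one of them, so the `else` branch of `react_to_obstacle` would not normally be reached. One possible way to reach it, sketched below, is to fall back to an 'unknown' label when the model is not confident. The `label_or_unknown` helper, the 0.8 threshold, and the probability values are all hypothetical; in practice the probabilities might come from the classifier's `predict_proba` method.

```python
def label_or_unknown(probabilities, threshold=0.8):
    # Hypothetical helper: return the most likely label, or 'unknown' if it is not likely enough.
    best_label = max(probabilities, key=probabilities.get)
    if probabilities[best_label] >= threshold:
        return best_label
    return 'unknown'

# Made-up class probabilities for two readings.
confident = {'plastic cup': 0.0, 'soft toy': 1.0}
uncertain = {'plastic cup': 1 / 3, 'soft toy': 2 / 3}

print(label_or_unknown(confident))   # soft toy
print(label_or_unknown(uncertain))   # unknown
```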

## 6. Evaluation
Check the accuracy of the k-NN and Logistic Regression classifiers using the test set.

In [None]:
from sklearn.metrics import accuracy_score

y_pred = knn.predict(X_test)
accuracy_k = accuracy_score(y_test, y_pred)
print("K-NN accuracy: {}".format(accuracy_k))

y_pred_log = logreg.predict(X_test_log)
accuracy_log = accuracy_score(y_test_log, y_pred_log)
print("Logistic Regression accuracy: {}".format(accuracy_log))
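
Accuracy is the fraction of test readings whose predicted label matches the true label. A small hand calculation with made-up labels shows what `accuracy_score` is reporting:

```python
# Made-up true and predicted labels, for illustration only.
true_labels = ['soft toy', 'plastic cup', 'soft toy', 'plastic cup']
predicted_labels = ['soft toy', 'plastic cup', 'plastic cup', 'plastic cup']

# Count the positions where the prediction matches the true label.
correct = sum(1 for true, predicted in zip(true_labels, predicted_labels) if true == predicted)
accuracy = correct / len(true_labels)

print("accuracy:", accuracy)  # 0.75
```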