# Machine Learning robot with Arduino UNO and LIDAR
##### Author: [Nikodem Bartnik](https://nikodembartnik.pl/), [Indystry.cc](https://indystry.cc/)
This notebook processes the data collected during manual racing and uses it to train the classifier that is later used in the autonomous driving stage. If you prefer a plain Python script, take a look at main.py. The README file in the GitHub repository also has some useful information.

If you want to see how the project works you can take a look at these two videos on YouTube:
- [Machine Learning on Arduino Uno was a Good Idea](https://www.youtube.com/watch?v=PdSDhdciSpE)
- [The Racing Machine with AI and Arduino](https://www.youtube.com/watch?v=KJIKexczPrU)

We will start by importing all the necessary libraries. 

In [17]:
import pandas as pd
import numpy as np
# import matplotlib.pyplot as plt
# import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

## Load Dataset

In [24]:
data = pd.read_csv("../../training_data/square_circle.csv")
print(data.head())

     scan_0    scan_1    scan_2    scan_3    scan_4    scan_5    scan_6  \
0  2.747303  2.909352  3.057540  3.151707  3.252886  3.361839  3.479446   
1  2.764004  2.927501  3.075666  3.170654  3.199146  3.203684  3.209219   
2  2.174155  2.174562  2.175635  2.177377  2.179788  2.182873  2.186637   
3  2.903211  3.076593  3.208294  3.307885  3.414936  3.530263  3.654810   
4  2.910652  3.081300  3.239016  3.338389  3.445139  3.560067  3.684094   

     scan_7    scan_8    scan_9  ...  scan_356  scan_357  scan_358  scan_359  \
0  3.606723  3.744855  3.895228  ...  2.337887  2.451952  2.578549  2.719807   
1  3.215759  3.223313  3.231895  ...  2.351148  2.466140  2.593793  2.736266   
2  2.191086  2.196227  2.202067  ...  2.186104  2.184372  2.183310  2.182917   
3  3.789668  3.936110  4.095628  ...  2.466941  2.588598  2.723752  2.874726   
4  3.818285  3.963880  4.122331  ...  2.479446  2.599790  2.733290  2.882173   

   linear_x  linear_y  linear_z  angular_x  angular_y  angular_z  
0

### Data cleaning

In [25]:
def replace_inf_values(data, max_range=8.0):
    """
    Replace inf with max_range and -inf with 0 in a DataFrame.
    """
    for col in data.columns:
        col_values = data[col].values
        col_values[np.isposinf(col_values)] = max_range
        col_values[np.isneginf(col_values)] = 0.0
        data[col] = col_values

    return data

data = replace_inf_values(data, max_range=8.0)

In [26]:
# Check whether any inf values remain (returns a single True/False)
np.isinf(data.values).any()

False
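
The column-by-column loop above can also be written with pandas' built-in `replace`. The following is a minimal sketch, not the notebook's own method: the helper name `replace_inf_vectorized` and the tiny `demo` frame are made up for illustration, and unlike the loop it returns a new DataFrame instead of modifying the input.

```python
import numpy as np
import pandas as pd

def replace_inf_vectorized(df, max_range=8.0):
    """Return a copy of df with +inf -> max_range and -inf -> 0.0.

    Hypothetical helper, shown as an alternative to the loop-based version.
    """
    return df.replace(np.inf, max_range).replace(-np.inf, 0.0)

# Made-up two-column frame standing in for the LIDAR scan columns
demo = pd.DataFrame({"scan_0": [1.5, np.inf], "scan_1": [-np.inf, 2.0]})
cleaned = replace_inf_vectorized(demo)

print(cleaned)
print(np.isinf(cleaned.to_numpy()).any())
```

Because the input is left untouched, the raw readings stay available if a different `max_range` needs to be tried later.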

In [27]:
X = data.iloc[:, :-6]  # Features: scan_0 to scan_359
y = data.iloc[:, -6:].copy()  # Labels: linear_x to angular_z (copy, so columns can be edited safely)

In [29]:
# Strip leading and trailing spaces from column names
y.columns = y.columns.str.strip()

# Check column names to ensure they are cleaned
print(y.columns)

Index(['linear_x', 'linear_y', 'linear_z', 'angular_x', 'angular_y',
       'angular_z'],
      dtype='object')


In [30]:
y['label'] = y.apply(lambda row: f"{row['linear_x']}_{row['angular_z']}", axis=1)
y = y['label']  # Use the combined label as the target variable

# Example labels, formatted as "<linear_x>_<angular_z>":
# "0.5_0.0"   # Move forward
# "0.0_1.0"   # Turn left
# "0.5_1.0"   # Turn while moving forward
# "0.0_0.0"   # Stop
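
At the driving stage the predicted string has to be turned back into two numbers. The sketch below shows one way to do the round trip, assuming the `linear_x_angular_z` format used above. `encode_label` and `decode_label` are hypothetical helper names, not functions from the repository.

```python
def encode_label(linear_x, angular_z):
    """Build the combined class label, mirroring the f-string used above."""
    return f"{linear_x}_{angular_z}"

def decode_label(label):
    """Split a combined label back into (linear_x, angular_z) as floats."""
    linear_str, angular_str = label.split("_")
    return float(linear_str), float(angular_str)

# Round trip on a few example commands
for command in [(0.5, 0.0), (0.0, 1.0), (0.0, -1.0)]:
    label = encode_label(*command)
    print(label, "->", decode_label(label))
```

Splitting on the underscore is safe here because neither a decimal number nor a minus sign contains one.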

In [31]:
# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [32]:
# Train Random Forest Classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train) 

In [33]:
# Predict and Evaluate
y_pred = clf.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred) * 100
print(f"Accuracy: {accuracy}%")

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Accuracy: 82.94970161977835%
Confusion Matrix:
[[2 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 1 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 3 0]
 [0 0 0 ... 0 0 3]]
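
With this many classes the raw matrix is hard to read, because the rows and columns carry no names. One option, sketched here on made-up labels rather than on the real predictions, is to wrap the matrix in a DataFrame indexed by class label. The `y_true_demo` and `y_pred_demo` lists are illustrative only.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Made-up labels in the same "<linear_x>_<angular_z>" format
y_true_demo = ["0.5_0.0", "0.0_1.0", "0.5_0.0", "0.0_0.0"]
y_pred_demo = ["0.5_0.0", "0.5_0.0", "0.5_0.0", "0.0_0.0"]

# Fix the class order explicitly so rows and columns can be named
labels = sorted(set(y_true_demo) | set(y_pred_demo))
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

cm_df = pd.DataFrame(cm, index=labels, columns=labels)
cm_df.index.name = "true"
cm_df.columns.name = "predicted"
print(cm_df)
```

For the real model, `clf.classes_` gives the label order that scikit-learn used during training and could be passed as `labels`.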


In [34]:
import joblib
joblib.dump(clf, "model_square_circle.pkl")  # Save the model

['model_square_circle.pkl']
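
The saved file can later be read back with `joblib.load`. The sketch below runs the full save and load round trip on a small stand-in model trained on random data, so it works without the real dataset. The array shapes, the temporary path, and the toy labelling rule are all assumptions made for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Random stand-in for 20 scans of 360 distance readings each
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.1, 8.0, size=(20, 360))
# Toy rule only: label depends on the first reading
y_demo = np.where(X_demo[:, 0] > 4.0, "0.5_0.0", "0.0_1.0")

model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X_demo, y_demo)

# Save to a temporary directory, then load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# Predict a command for a single scan (shape 1 x 360)
one_scan = X_demo[:1]
print(restored.predict(one_scan))
```

A model saved this way should be loaded with the same scikit-learn version that produced it, since pickled estimators are not guaranteed to be compatible across versions.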

## Testing with Another Dataset

In [40]:
test_data = pd.read_csv("../../training_data/square_dataset.csv") 

In [41]:
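# Cleaning: reuse replace_inf_values() defined earlier, this time on the new dataset
test_data = replace_inf_values(test_data, max_range=8.0)

# Build features and labels the same way as for the training data
# (assumes square_dataset.csv has the same column layout as square_circle.csv)
X_new = test_data.iloc[:, :-6]                # Features: scan_0 to scan_359
y_new_cols = test_data.iloc[:, -6:].copy()    # linear_x to angular_z
y_new_cols.columns = y_new_cols.columns.str.strip()
y_new = y_new_cols.apply(lambda row: f"{row['linear_x']}_{row['angular_z']}", axis=1)

In [42]:
# Predict on the new dataset
y_pred_new = clf.predict(X_new)

In [43]:
# Calculate accuracy
accuracy = accuracy_score(y_new, y_pred_new)
print(f"Accuracy: {accuracy}")


# Print confusion matrix
print("Confusion Matrix:")
print(confusion_matrix(y_new, y_pred_new))

# The output recorded below repeats the earlier 82.95% result because that run
# scored X_test from the first split; rerun the cells to score square_dataset.csv.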


Accuracy: 0.8294970161977835
Confusion Matrix:
[[2 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 1 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 3 0]
 [0 0 0 ... 0 0 3]]
