Problem Statement:

Deep in the ocean, the brave submarine Explorer set out on a vital mission. Its crew faced the challenge of navigating through dangerous underwater landscapes, wary of hidden threats like rocks and mines. To enhance their safety, the crew employed machine learning techniques. By carefully studying sonar readings, they aimed to create a smart model that could distinguish these dangerous obstacles. With this innovative solution, Explorer and its crew could explore uncharted waters with confidence, safeguarding against potential dangers and ensuring a successful mission.






Objective


1.   Develop a machine learning model to accurately classify sonar signals as rocks or mines using a labeled dataset of sonar readings.

2.   Create a reliable and robust solution for real-world scenarios that require accurate classification of underwater objects.














Workflow

Load the data. 
Explore the data.
Clean the data. 
Feature engineering.
Choose a machine learning algorithm.
Train the model.
Evaluate the model.
Deploy the model. 





Dataset

https://www.kaggle.com/datasets/mayurdalvi/sonar-mine-dataset
Reference


Code


##Import necessary libraries and modules


In [64]:

import numpy as np  # Library for numerical computations and arrays
import pandas as pd  # Library for data manipulation and analysis
import matplotlib.pyplot as plt  # Library for plotting graphs
import seaborn as sns  # Library for enhanced data visualization

from sklearn.model_selection import train_test_split  # Library for splitting the dataset
from sklearn.linear_model import LogisticRegression  # Library for Logistic Regression model
from sklearn.metrics import accuracy_score  # Library for accuracy metric


#Data Collection & Data Processing

In [65]:
# Loading the dataset to a pandas dataframe

sonar_data = pd.read_csv("sonardata.csv", header = None) 

In [66]:
# Display the first few rows of the DataFrame

sonar_data.head()  # head() returns the first 5 rows by default

,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,60
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,R
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078,R
3,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117,R
4,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094,R


In [67]:
# Number of rows and columns

sonar_data.shape  # Returns a (rows, columns) tuple

(208, 61)

In [68]:
# Dimensions of the dataset

print("Number of rows:", sonar_data.shape[0])  # Display the number of rows in the dataset
print("Number of columns:", sonar_data.shape[1])  # Display the number of columns in the dataset


Number of rows: 208
Number of columns: 61


In [69]:
# Generate descriptive statistics for the sonar_data DataFrame

sonar_data.describe() #describe --> statistical measures of the data

,0,1,2,3,4,5,6,7,8,9,...,50,51,52,53,54,55,56,57,58,59
count,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0,...,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0
mean,0.029164,0.038437,0.043832,0.053892,0.075202,0.10457,0.121747,0.134799,0.178003,0.208259,...,0.016069,0.01342,0.010709,0.010941,0.00929,0.008222,0.00782,0.007949,0.007941,0.006507
std,0.022991,0.03296,0.038428,0.046528,0.055552,0.059105,0.061788,0.085152,0.118387,0.134416,...,0.012008,0.009634,0.00706,0.007301,0.007088,0.005736,0.005785,0.00647,0.006181,0.005031
min,0.0015,0.0006,0.0015,0.0058,0.0067,0.0102,0.0033,0.0055,0.0075,0.0113,...,0.0,0.0008,0.0005,0.001,0.0006,0.0004,0.0003,0.0003,0.0001,0.0006
25%,0.01335,0.01645,0.01895,0.024375,0.03805,0.067025,0.0809,0.080425,0.097025,0.111275,...,0.008425,0.007275,0.005075,0.005375,0.00415,0.0044,0.0037,0.0036,0.003675,0.0031
50%,0.0228,0.0308,0.0343,0.04405,0.0625,0.09215,0.10695,0.1121,0.15225,0.1824,...,0.0139,0.0114,0.00955,0.0093,0.0075,0.00685,0.00595,0.0058,0.0064,0.0053
75%,0.03555,0.04795,0.05795,0.0645,0.100275,0.134125,0.154,0.1696,0.233425,0.2687,...,0.020825,0.016725,0.0149,0.0145,0.0121,0.010575,0.010425,0.01035,0.010325,0.008525
max,0.1371,0.2339,0.3059,0.4264,0.401,0.3823,0.3729,0.459,0.6828,0.7106,...,0.1004,0.0709,0.039,0.0352,0.0447,0.0394,0.0355,0.044,0.0364,0.0439


In [70]:
# Retrieve the label column from the sonar_data DataFrame

sonar_data[60] # here the column 60 is selected

0      R
1      R
2      R
3      R
4      R
      ..
203    M
204    M
205    M
206    M
207    M
Name: 60, Length: 208, dtype: object

In [71]:
# Count the samples in each class (column 60 holds the labels)

print("M for Mine and R for Rock")

sonar_data[60].value_counts()  # Calculate the frequency of each unique value in the column labeled '60'

M for Mine and R for Rock


M    111
R     97
Name: 60, dtype: int64

M = Mine
R = Rock
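
The class counts above also give a useful reference point: a classifier that always predicts the majority class would already be right on roughly half of the samples, so any model accuracy should be read against that floor. A minimal sketch of the arithmetic, using only the counts printed above (the variable names are illustrative and not part of the notebook):

```python
# Class counts taken from the value_counts() output above.
class_counts = {"M": 111, "R": 97}

total = sum(class_counts.values())

# Share of each class in the dataset.
proportions = {label: count / total for label, count in class_counts.items()}

# Accuracy of a trivial model that always predicts the most frequent class.
majority_label = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_label] / total

print(f"Total samples: {total}")
print(f"Majority class: {majority_label}")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

The two classes are close to balanced, which is why plain accuracy is a reasonable first metric here.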

#changing categorical column to numerical data


In [98]:
# Create a new DataFrame by copying the original DataFrame
new_sonar_data = sonar_data.copy()

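# Map 'R' to 0 and 'M' to 1 in column 60 of the new DataFrame
# (assigning the result back avoids chained in-place modification, which newer pandas versions warn about)
new_sonar_data[60] = new_sonar_data[60].map({'R': 0, 'M': 1})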



In [100]:
print(new_sonar_data[60])


0      0
1      0
2      0
3      0
4      0
      ..
203    1
204    1
205    1
206    1
207    1
Name: 60, Length: 208, dtype: int64


In [142]:
new_sonar_data.head()

,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,60
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032,0
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,0
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078,0
3,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117,0
4,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094,0


In [143]:
# Group the new_sonar_data DataFrame by the label in column 60 (0 = rock, 1 = mine) and calculate the mean of each feature per group

new_sonar_data.groupby(60).mean()

60,0,1,2,3,4,5,6,7,8,9,...,50,51,52,53,54,55,56,57,58,59
0,0.022498,0.030303,0.035951,0.041447,0.062028,0.096224,0.11418,0.117596,0.137392,0.159325,...,0.012311,0.010453,0.00964,0.009518,0.008567,0.00743,0.007814,0.006677,0.007078,0.006024
1,0.034989,0.045544,0.05072,0.064768,0.086715,0.111864,0.128359,0.149832,0.213492,0.251022,...,0.019352,0.016014,0.011643,0.012185,0.009923,0.008914,0.007825,0.00906,0.008695,0.00693



In [145]:
# Separating data and labels
X = new_sonar_data.drop(columns=60)
Y = new_sonar_data[60]

# X: Contains the data without the column labeled '60'
# Y: Contains the labels, representing the values from the column labeled '60'


In [146]:
print(X)  # Print the data (X) without the column labeled '60'


         0       1       2       3       4       5       6       7       8   \
0    0.0200  0.0371  0.0428  0.0207  0.0954  0.0986  0.1539  0.1601  0.3109   
1    0.0453  0.0523  0.0843  0.0689  0.1183  0.2583  0.2156  0.3481  0.3337   
2    0.0262  0.0582  0.1099  0.1083  0.0974  0.2280  0.2431  0.3771  0.5598   
3    0.0100  0.0171  0.0623  0.0205  0.0205  0.0368  0.1098  0.1276  0.0598   
4    0.0762  0.0666  0.0481  0.0394  0.0590  0.0649  0.1209  0.2467  0.3564   
..      ...     ...     ...     ...     ...     ...     ...     ...     ...   
203  0.0187  0.0346  0.0168  0.0177  0.0393  0.1630  0.2028  0.1694  0.2328   
204  0.0323  0.0101  0.0298  0.0564  0.0760  0.0958  0.0990  0.1018  0.1030   
205  0.0522  0.0437  0.0180  0.0292  0.0351  0.1171  0.1257  0.1178  0.1258   
206  0.0303  0.0353  0.0490  0.0608  0.0167  0.1354  0.1465  0.1123  0.1945   
207  0.0260  0.0363  0.0136  0.0272  0.0214  0.0338  0.0655  0.1400  0.1843   

         9   ...      50      51      52      53   

In [147]:
print(Y)  # Print the labels (Y) from the column labeled '60'


0      0
1      0
2      0
3      0
4      0
      ..
203    1
204    1
205    1
206    1
207    1
Name: 60, Length: 208, dtype: int64


#Training and Test data


In [124]:
# Split the data into training and testing sets

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=1)

# X_train: Training features (80% of X)
# X_test: Testing features (20% of X)
# Y_train: Training labels (80% of Y)
# Y_test: Testing labels (20% of Y)

# test_size=0.2: Specifies that 20% of the data will be used for testing, while 80% will be used for training
# stratify=Y: Ensures that the class distribution is preserved in both the training and testing sets, based on the labels in Y
# random_state=1: Sets a random seed to 1, ensuring the same train-test split is generated for reproducibility


In [125]:
print(X.shape)  # Print the shape of the original data (X)
print(X_train.shape)  # Print the shape of the training data (X_train)
print(X_test.shape)  # Print the shape of the testing data (X_test)


(208, 60)
(166, 60)
(42, 60)
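
The shapes above follow from how a fractional `test_size` is turned into row counts. The sketch below reproduces that arithmetic in plain Python. It assumes the test count is rounded up and the remainder goes to training, which matches the 166/42 split printed above; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Hypothetical helper: assumes the test count is rounded up and the
    remaining samples go to the training set.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

n_train, n_test = split_sizes(208, 0.2)
print(n_train, n_test)  # 166 42
```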


In [148]:
print(X_train) # training data 

         0       1       2       3       4       5       6       7       8   \
86   0.0188  0.0370  0.0953  0.0824  0.0249  0.0488  0.1424  0.1972  0.1873   
55   0.0201  0.0116  0.0123  0.0245  0.0547  0.0208  0.0891  0.0836  0.1335   
73   0.0139  0.0222  0.0089  0.0108  0.0215  0.0136  0.0659  0.0954  0.0786   
95   0.0291  0.0400  0.0771  0.0809  0.0521  0.1051  0.0145  0.0674  0.1294   
30   0.0240  0.0218  0.0324  0.0569  0.0330  0.0513  0.0897  0.0713  0.0569   
..      ...     ...     ...     ...     ...     ...     ...     ...     ...   
194  0.0392  0.0108  0.0267  0.0257  0.0410  0.0491  0.1053  0.1690  0.2105   
119  0.0261  0.0266  0.0223  0.0749  0.1364  0.1513  0.1316  0.1654  0.1864   
68   0.0195  0.0142  0.0181  0.0406  0.0391  0.0249  0.0892  0.0973  0.0840   
92   0.0260  0.0192  0.0254  0.0061  0.0352  0.0701  0.1263  0.1080  0.1523   
56   0.0152  0.0102  0.0113  0.0263  0.0097  0.0391  0.0857  0.0915  0.0949   

         9   ...      50      51      52      53   

In [149]:
print(Y_train)  #training label


86     0
55     0
73     0
95     0
30     0
      ..
194    1
119    1
68     0
92     0
56     0
Name: 60, Length: 166, dtype: int64


#Model Training

Logistic Regression Model

In [150]:
model = LogisticRegression()

# model: Logistic regression classifier object



In [151]:
# Train the logistic regression model using the training data

model.fit(X_train, Y_train)

# X_train: Training data (features)
# Y_train: Training labels

# The `fit()` method is called on the `model` object and takes the training data (`X_train`) and corresponding labels (`Y_train`) as inputs.
# This allows the model to learn from the training data and adjust its internal parameters to find the best decision boundary for classification.

# After this step, the logistic regression model is trained and ready to make predictions on new, unseen data.
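
To make "decision boundary" concrete: a fitted logistic regression computes a weighted sum of the features plus an intercept, passes it through the sigmoid function, and predicts class 1 (mine) when the resulting probability exceeds 0.5. The sketch below illustrates that rule with made-up weights for three features. The real model has 60 learned coefficients, available as `model.coef_` and `model.intercept_` after fitting; `sigmoid` and `predict_one` are illustrative names.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_one(features, weights, intercept):
    """Return (probability of class 1, predicted label) for one sample."""
    score = sum(w * x for w, x in zip(weights, features)) + intercept
    probability = sigmoid(score)
    label = 1 if probability > 0.5 else 0
    return probability, label

# Made-up weights and features, for illustration only.
weights = [2.0, -1.0, 0.5]
intercept = -0.25
features = [0.30, 0.10, 0.40]

probability, label = predict_one(features, weights, intercept)
print(f"P(mine) = {probability:.3f}, predicted label = {label}")
```

A positive score gives a probability above 0.5, so the boundary between the two classes is the set of inputs where the score is exactly zero.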


#Model Evaluation
 

In [152]:
# Predict the labels for the training data
X_train_prediction = model.predict(X_train)


# Calculate the accuracy score on the training data
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)  # accuracy_score expects (y_true, y_pred)

# Print the accuracy score on the training data
print("Accuracy score on training data: ", training_data_accuracy)

Accuracy score on training data:  0.8554216867469879


In [153]:
# Predict the labels for the test data
X_test_prediction = model.predict(X_test)

# Calculate the accuracy score on the test data
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)  # accuracy_score expects (y_true, y_pred)

# Print the accuracy score on the test data
print("Accuracy score on test data: ", test_data_accuracy)



Accuracy score on test data:  0.7857142857142857
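
Accuracy alone does not show which kind of mistake the model makes, and for this problem a mine classified as a rock is presumably the more costly error. As a sketch, the counts behind a confusion matrix can be tallied as below. The labels are toy values, not the notebook's predictions, and `confusion_counts` is a hypothetical helper; with the real data one would pass `Y_test` and `X_test_prediction`, or use scikit-learn's `confusion_matrix`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts

# Toy labels for illustration only (1 = mine, 0 = rock).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
recall_mines = counts["tp"] / (counts["tp"] + counts["fn"])

print(counts)
print(f"accuracy = {accuracy:.3f}, recall on mines = {recall_mines:.3f}")
```

Recall on mines is the fraction of actual mines the model catches, which is the figure to watch when missed mines matter more than false alarms.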


#Making a predictive system 


In [171]:
# Define the input data


input_data = [0.0329,0.0216,0.0386,0.0627,0.1158,0.1482,0.2054,0.1605,0.2532,0.2672,0.3056,0.3161,0.2314,0.2067,0.1804,0.2808,0.4423,0.5947,0.6601,0.5844,0.4539,0.4789,0.5646,0.5281,0.7115,1.0000,0.9564,0.6090,0.5112,0.4000,0.0482,0.1852,0.2186,0.1436,0.1757,0.1428,0.1644,0.3089,0.3648,0.4441,0.3859,0.2813,0.1238,0.0953,0.1201,0.0825,0.0618,0.0141,0.0108,0.0124,0.0104,0.0095,0.0151,0.0059,0.0015,0.0053,0.0016,0.0042,0.0053,0.0074]
# Convert input data to a NumPy array
input_data_as_numpy_array = np.asarray(input_data)

# input_data: List of 60 sonar readings for a single sample
# input_data_as_numpy_array: NumPy array converted from input_data

#reshape the np array as we are predicting for one instance

input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)


In [172]:
# Predict the label for the input data
prediction = model.predict(input_data_reshaped)

# Check the prediction and print the corresponding label
if prediction[0] == 0:
    print("It is a rock")
else:
    print("It is a mine")





It is a mine
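
For reuse, the prediction steps above can be folded into one function that checks the input length before predicting. This is a sketch under assumptions: `classify_reading` and the stand-in `ThresholdModel` are hypothetical names introduced here so the example runs on its own. In the notebook, the fitted `model` would be passed instead.

```python
N_FEATURES = 60  # number of sonar readings per sample in this dataset

def classify_reading(model, reading, n_features=N_FEATURES):
    """Return "rock" or "mine" for a single sonar reading.

    `model` is any object with a scikit-learn style predict() method.
    """
    values = [float(v) for v in reading]
    if len(values) != n_features:
        raise ValueError(f"expected {n_features} values, got {len(values)}")
    # predict() expects a 2-D input: one row per sample.
    prediction = model.predict([values])
    return "rock" if prediction[0] == 0 else "mine"

class ThresholdModel:
    """Hypothetical stand-in for the trained model, for demonstration only."""

    def predict(self, rows):
        # Label a row 1 (mine) when its mean reading exceeds 0.2.
        return [1 if sum(row) / len(row) > 0.2 else 0 for row in rows]

demo_model = ThresholdModel()
print(classify_reading(demo_model, [0.5] * 60))
print(classify_reading(demo_model, [0.01] * 60))
```

Rejecting inputs of the wrong length up front gives a clearer error than letting the model fail on a shape mismatch.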


#Conclusion

We developed a machine learning model that classifies sonar signals as rock or mine. We explored the data, encoded the labels, and split the data into training and testing sets. We then trained a logistic regression classifier and evaluated it: the model reaches about 85.5% accuracy on the training data and about 78.6% on the held-out test data, so its predictions for new signals are useful but not error-free.

This project offers a practical solution for accurately identifying underwater objects using sonar technology. It has potential applications in areas such as marine security, environmental monitoring, and underwater exploration.