# Sonar

<img src="https://frenzy86.s3.eu-west-2.amazonaws.com/python/sonar.jpg" width="800">

http://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks)

NAME: Sonar, Mines vs. Rocks

SUMMARY: This is the data set used by Gorman and Sejnowski in their study of the classification of sonar signals using a neural network [1]. The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock.

SOURCE: The data set was contributed to the benchmark collection by Terry Sejnowski, now at the Salk Institute and the University of California at San Diego. The data set was developed in collaboration with R. Paul Gorman of Allied-Signal Aerospace Technology Center.

MAINTAINER: Scott E. Fahlman

PROBLEM DESCRIPTION:

The file “sonar.mines” contains 111 patterns obtained by bouncing sonar signals off a metal cylinder at various angles and under various conditions. The file “sonar.rocks” contains 97 patterns obtained from rocks under similar conditions. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. The data set contains signals obtained from a variety of different aspect angles, spanning 90 degrees for the cylinder and 180 degrees for the rock.

Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time. The integration aperture for higher frequencies occurs later in time, since these frequencies are transmitted later during the chirp.

The label associated with each record contains the letter “R” if the object is a rock and “M” if it is a mine (metal cylinder). The numbers in the labels are in increasing order of aspect angle, but they do not encode the angle directly.

In [91]:
!wget https://frenzy86.s3.eu-west-2.amazonaws.com/python/data/esercizi/sonar.csv

"wget" is not recognized as an internal or external command,
operable program or batch file.
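The `!wget` cell fails because `wget` is not available as a command in this Windows shell. A minimal cross-platform alternative, assuming only the Python standard library, is to fetch the file with `urllib.request.urlretrieve` and skip the download when the file is already on disk. The helper name `download_if_missing` is hypothetical.

```python
import os
import urllib.request

URL = "https://frenzy86.s3.eu-west-2.amazonaws.com/python/data/esercizi/sonar.csv"


def download_if_missing(url: str, dest: str) -> str:
    """Download `url` to `dest` unless `dest` already exists; return the local path."""
    if not os.path.exists(dest):
        # urlretrieve writes the response body straight to `dest`
        urllib.request.urlretrieve(url, dest)
    return dest

# Usage (needs network access on the first call only):
# path = download_if_missing(URL, "sonar.csv")
```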


In [None]:
import pandas as pd
from summarytools import dfSummary
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split

## Cleaning

In [93]:
path = "sonar.csv"

In [94]:
colsname = range(0,61,1)

In [95]:
colsname = list(colsname)  # integer column names 0..60: 60 band energies plus the label

In [96]:
df = pd.read_csv(path, names=colsname, index_col=False)

In [97]:
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60
0,0.0200,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,0.1609,0.1582,0.2238,0.0645,0.0660,0.2273,0.3100,0.2999,0.5078,0.4797,0.5783,0.5071,0.4328,0.5550,0.6711,0.6415,0.7104,0.8080,0.6791,0.3857,0.1307,0.2604,0.5121,0.7547,0.8537,0.8507,0.6692,0.6097,0.4943,0.2744,0.0510,0.2834,0.2825,0.4256,0.2641,0.1386,0.1051,0.1343,0.0383,0.0324,0.0232,0.0027,0.0065,0.0159,0.0072,0.0167,0.0180,0.0084,0.0090,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,0.4918,0.6552,0.6919,0.7797,0.7464,0.9444,1.0000,0.8874,0.8024,0.7818,0.5212,0.4052,0.3957,0.3914,0.3250,0.3200,0.3271,0.2767,0.4423,0.2028,0.3788,0.2947,0.1984,0.2341,0.1306,0.4182,0.3835,0.1057,0.1840,0.1970,0.1674,0.0583,0.1401,0.1628,0.0621,0.0203,0.0530,0.0742,0.0409,0.0061,0.0125,0.0084,0.0089,0.0048,0.0094,0.0191,0.0140,0.0049,0.0052,0.0044,R
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.2280,0.2431,0.3771,0.5598,0.6194,0.6333,0.7060,0.5544,0.5320,0.6479,0.6931,0.6759,0.7551,0.8929,0.8619,0.7974,0.6737,0.4293,0.3648,0.5331,0.2413,0.5070,0.8533,0.6036,0.8514,0.8512,0.5045,0.1862,0.2709,0.4232,0.3043,0.6116,0.6756,0.5375,0.4719,0.4647,0.2587,0.2129,0.2222,0.2111,0.0176,0.1348,0.0744,0.0130,0.0106,0.0033,0.0232,0.0166,0.0095,0.0180,0.0244,0.0316,0.0164,0.0095,0.0078,R
3,0.0100,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,0.0881,0.1992,0.0184,0.2261,0.1729,0.2131,0.0693,0.2281,0.4060,0.3973,0.2741,0.3690,0.5556,0.4846,0.3140,0.5334,0.5256,0.2520,0.2090,0.3559,0.6260,0.7340,0.6120,0.3497,0.3953,0.3012,0.5408,0.8814,0.9857,0.9167,0.6121,0.5006,0.3210,0.3202,0.4295,0.3654,0.2655,0.1576,0.0681,0.0294,0.0241,0.0121,0.0036,0.0150,0.0085,0.0073,0.0050,0.0044,0.0040,0.0117,R
4,0.0762,0.0666,0.0481,0.0394,0.0590,0.0649,0.1209,0.2467,0.3564,0.4459,0.4152,0.3952,0.4256,0.4135,0.4528,0.5326,0.7306,0.6193,0.2032,0.4636,0.4148,0.4292,0.5730,0.5399,0.3161,0.2285,0.6995,1.0000,0.7262,0.4724,0.5103,0.5459,0.2881,0.0981,0.1951,0.4181,0.4604,0.3217,0.2828,0.2430,0.1979,0.2444,0.1847,0.0841,0.0692,0.0528,0.0357,0.0085,0.0230,0.0046,0.0156,0.0031,0.0054,0.0105,0.0110,0.0015,0.0072,0.0048,0.0107,0.0094,R
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
203,0.0187,0.0346,0.0168,0.0177,0.0393,0.1630,0.2028,0.1694,0.2328,0.2684,0.3108,0.2933,0.2275,0.0994,0.1801,0.2200,0.2732,0.2862,0.2034,0.1740,0.4130,0.6879,0.8120,0.8453,0.8919,0.9300,0.9987,1.0000,0.8104,0.6199,0.6041,0.5547,0.4160,0.1472,0.0849,0.0608,0.0969,0.1411,0.1676,0.1200,0.1201,0.1036,0.1977,0.1339,0.0902,0.1085,0.1521,0.1363,0.0858,0.0290,0.0203,0.0116,0.0098,0.0199,0.0033,0.0101,0.0065,0.0115,0.0193,0.0157,M
204,0.0323,0.0101,0.0298,0.0564,0.0760,0.0958,0.0990,0.1018,0.1030,0.2154,0.3085,0.3425,0.2990,0.1402,0.1235,0.1534,0.1901,0.2429,0.2120,0.2395,0.3272,0.5949,0.8302,0.9045,0.9888,0.9912,0.9448,1.0000,0.9092,0.7412,0.7691,0.7117,0.5304,0.2131,0.0928,0.1297,0.1159,0.1226,0.1768,0.0345,0.1562,0.0824,0.1149,0.1694,0.0954,0.0080,0.0790,0.1255,0.0647,0.0179,0.0051,0.0061,0.0093,0.0135,0.0063,0.0063,0.0034,0.0032,0.0062,0.0067,M
205,0.0522,0.0437,0.0180,0.0292,0.0351,0.1171,0.1257,0.1178,0.1258,0.2529,0.2716,0.2374,0.1878,0.0983,0.0683,0.1503,0.1723,0.2339,0.1962,0.1395,0.3164,0.5888,0.7631,0.8473,0.9424,0.9986,0.9699,1.0000,0.8630,0.6979,0.7717,0.7305,0.5197,0.1786,0.1098,0.1446,0.1066,0.1440,0.1929,0.0325,0.1490,0.0328,0.0537,0.1309,0.0910,0.0757,0.1059,0.1005,0.0535,0.0235,0.0155,0.0160,0.0029,0.0051,0.0062,0.0089,0.0140,0.0138,0.0077,0.0031,M
206,0.0303,0.0353,0.0490,0.0608,0.0167,0.1354,0.1465,0.1123,0.1945,0.2354,0.2898,0.2812,0.1578,0.0273,0.0673,0.1444,0.2070,0.2645,0.2828,0.4293,0.5685,0.6990,0.7246,0.7622,0.9242,1.0000,0.9979,0.8297,0.7032,0.7141,0.6893,0.4961,0.2584,0.0969,0.0776,0.0364,0.1572,0.1823,0.1349,0.0849,0.0492,0.1367,0.1552,0.1548,0.1319,0.0985,0.1258,0.0954,0.0489,0.0241,0.0042,0.0086,0.0046,0.0126,0.0036,0.0035,0.0034,0.0079,0.0036,0.0048,M


In [98]:
df = df.rename(columns={60:"y"})
df["y"] = df["y"].map({"R":0, "M":1})

In [99]:
df["y"].unique()

array([0, 1], dtype=int64)
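Since each of the 60 columns is the energy in one frequency band, and higher bands are integrated later in the chirp, a quick way to see where rocks and mines differ is to average every band per class. A minimal sketch, assuming `df` holds the feature columns plus the 0/1 `y` column created above (0 = rock, 1 = mine); the helper `class_band_profile` and the column name `diff_mine_minus_rock` are hypothetical.

```python
import pandas as pd


def class_band_profile(df: pd.DataFrame, label: str = "y") -> pd.DataFrame:
    """Mean energy per band: one row per band, one column per class."""
    profile = df.groupby(label).mean().T  # rows = bands, columns = classes 0 and 1
    # Positive values mark bands where mines return more energy than rocks on average
    profile["diff_mine_minus_rock"] = profile[1] - profile[0]
    return profile

# Usage: class_band_profile(df)[[0, 1]].plot() draws the two class profiles across the bands.
```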

In [100]:
dfSummary(df)

| No | Variable | Stats / Values | Freqs / (% of Valid) | Missing |
|---|---|---|---|---|
| 1 | 0 [float64] | Mean (sd): 0.0 (0.0); min < med < max: 0.0 < 0.0 < 0.1; IQR (CV): 0.0 (1.3) | 177 distinct values | 0 (0.0%) |
| 2 | 1 [float64] | Mean (sd): 0.0 (0.0); min < med < max: 0.0 < 0.0 < 0.2; IQR (CV): 0.0 (1.2) | 182 distinct values | 0 (0.0%) |
| 3 | 2 [float64] | Mean (sd): 0.0 (0.0); min < med < max: 0.0 < 0.0 < 0.3; IQR (CV): 0.0 (1.1) | 190 distinct values | 0 (0.0%) |
| 4 | 3 [float64] | Mean (sd): 0.1 (0.0); min < med < max: 0.0 < 0.0 < 0.4; IQR (CV): 0.0 (1.2) | 181 distinct values | 0 (0.0%) |
| 5 | 4 [float64] | Mean (sd): 0.1 (0.1); min < med < max: 0.0 < 0.1 < 0.4; IQR (CV): 0.1 (1.4) | 193 distinct values | 0 (0.0%) |
| 6 | 5 [float64] | Mean (sd): 0.1 (0.1); min < med < max: 0.0 < 0.1 < 0.4; IQR (CV): 0.1 (1.8) | 196 distinct values | 0 (0.0%) |
| 7 | 6 [float64] | Mean (sd): 0.1 (0.1); min < med < max: 0.0 < 0.1 < 0.4; IQR (CV): 0.1 (2.0) | 195 distinct values | 0 (0.0%) |
| 8 | 7 [float64] | Mean (sd): 0.1 (0.1); min < med < max: 0.0 < 0.1 < 0.5; IQR (CV): 0.1 (1.6) | 201 distinct values | 0 (0.0%) |
| 9 | 8 [float64] | Mean (sd): 0.2 (0.1); min < med < max: 0.0 < 0.2 < 0.7; IQR (CV): 0.1 (1.5) | 205 distinct values | 0 (0.0%) |
| 10 | 9 [float64] | Mean (sd): 0.2 (0.1); min < med < max: 0.0 < 0.2 < 0.7; IQR (CV): 0.2 (1.5) | 207 distinct values | 0 (0.0%) |


In [101]:
df.columns[:-1]

Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
       36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
       54, 55, 56, 57, 58, 59],
      dtype='object')

In [102]:
for col in df.columns[:-1]:
    # range of each feature column (the label column "y" is excluded)
    print(f"{col} min --> {df[col].min()}, max --> {df[col].max()}")

0 min --> 0.0015, max --> 0.1371
1 min --> 0.0006, max --> 0.2339
2 min --> 0.0015, max --> 0.3059
3 min --> 0.0058, max --> 0.4264
4 min --> 0.0067, max --> 0.401
5 min --> 0.0102, max --> 0.3823
6 min --> 0.0033, max --> 0.3729
7 min --> 0.0055, max --> 0.459
8 min --> 0.0075, max --> 0.6828
9 min --> 0.0113, max --> 0.7106
10 min --> 0.0289, max --> 0.7342
11 min --> 0.0236, max --> 0.706
12 min --> 0.0184, max --> 0.7131
13 min --> 0.0273, max --> 0.997
14 min --> 0.0031, max --> 1.0
15 min --> 0.0162, max --> 0.9988
16 min --> 0.0349, max --> 1.0
17 min --> 0.0375, max --> 1.0
18 min --> 0.0494, max --> 1.0
19 min --> 0.0656, max --> 1.0
20 min --> 0.0512, max --> 1.0
21 min --> 0.0219, max --> 1.0
22 min --> 0.0563, max --> 1.0
23 min --> 0.0239, max --> 1.0
24 min --> 0.024, max --> 1.0
25 min --> 0.0921, max --> 1.0
26 min --> 0.0481, max --> 1.0
27 min --> 0.0284, max --> 1.0
28 min --> 0.0144, max --> 1.0
29 min --> 0.0613, max --> 1.0
30 min --> 0.0482, max --> 0.9657
31 min
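The same min/max check can be written without an explicit loop, and it can also verify the dataset description's claim that every value lies in [0.0, 1.0]. A minimal sketch, assuming the label lives in column `y` and every other column is a feature; the helper name `feature_ranges` is hypothetical.

```python
import pandas as pd


def feature_ranges(df: pd.DataFrame, label: str = "y") -> pd.DataFrame:
    """One row per feature with its min and max; raises if any value is outside [0, 1]."""
    ranges = df.drop(columns=label).agg(["min", "max"]).T
    # The UCI description states every band energy is in the range 0.0 to 1.0
    out_of_range = (ranges["min"] < 0.0) | (ranges["max"] > 1.0)
    if out_of_range.any():
        bad = ranges.index[out_of_range.to_numpy()].tolist()
        raise ValueError(f"features outside [0, 1]: {bad}")
    return ranges

# Usage: feature_ranges(df) returns a 60 x 2 table instead of printed lines.
```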

In [103]:
df[1].max()

0.2339

## Best model

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier, BaggingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier


model_xgb = XGBClassifier()
model_logistic_regression = LogisticRegression(random_state=667)
model_lda = LinearDiscriminantAnalysis()
model_qda = QuadraticDiscriminantAnalysis()
model_knn = KNeighborsClassifier(n_neighbors=3)
model_svc_linear = SVC(kernel='linear', random_state=667)  # SVM with a linear kernel
model_decision_tree = DecisionTreeClassifier(random_state=667)
model_random_forest = RandomForestClassifier(n_estimators=100, random_state=667)
model_gradient_boosting = GradientBoostingClassifier(random_state=667)
model_adaboost = AdaBoostClassifier(random_state=667)
model_bagging = BaggingClassifier(random_state=667)
model_naive_bayes = GaussianNB()
model_mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=300, random_state=667)

In [119]:
# candidate models ("modelli")
modelli = [model_xgb,
           model_logistic_regression,
           model_lda,
           model_qda,
           model_knn,
           model_svc_linear,
           model_decision_tree,
           model_random_forest,
           model_gradient_boosting,
           model_adaboost,
           model_bagging,
           model_naive_bayes,
           model_mlp]

In [120]:
X = df.drop("y", axis=1)
y = df["y"]

In [121]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=667)

In [122]:
for model in modelli:
    model.fit(X_train, y_train)
    print(f"model = {model}, score = {model.score(X_test, y_test)}")

model = XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=None, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=None, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=None, n_jobs=None,
              num_parallel_tree=None, random_state=None, ...), score = 0.7777777777777778
model = LogisticRegression(), score = 0.6825396825396826
model = LinearDiscriminantAnalysis(), score = 0.8095238095238095
model = QuadraticDiscriminantAnalysis(), score = 0.7142857142857143
model = KNeighbo



model = MLPClassifier(max_iter=300), score = 0.7936507936507936
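With only about 200 rows, the 30% test split holds 63 samples (the scores above are multiples of 1/63, e.g. 0.8095 = 51/63), so one misclassified sample moves accuracy by about 1.6 percentage points and the ranking is noisy. A sketch of a steadier comparison, assuming the same `X` and `y`: 5-fold stratified cross-validation reporting mean and spread. Wrapping the scale-sensitive models (logistic regression, linear SVC, k-NN) in a `StandardScaler` pipeline is our assumption, not something the notebook does; the helper `compare_models` is hypothetical.

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, n_splits=5, seed=667):
    """Mean and standard deviation of cross-validated accuracy, best model first."""
    candidates = {
        "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "svc_linear": make_pipeline(StandardScaler(), SVC(kernel="linear")),
        "knn_3": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
        "lda": LinearDiscriminantAnalysis(),  # scale-invariant, no scaler needed
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    # Same folds for every model so the scores are paired
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        rows.append({"model": name, "mean_acc": scores.mean(), "std_acc": scores.std()})
    return pd.DataFrame(rows).sort_values("mean_acc", ascending=False).reset_index(drop=True)

# Usage: compare_models(X, y)
```

Models whose mean accuracies differ by less than their fold-to-fold standard deviation are hard to tell apart on this data.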




In [132]:
from sklearn.model_selection import GridSearchCV
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis()
param_grid = {
    'solver': ['svd', 'lsqr', 'eigen'],
    'shrinkage': [None, 'auto', 0.1, 0.5],
    'tol': [1e-4, 1e-3, 1e-2]
}

grid_search = GridSearchCV(estimator=lda, param_grid=param_grid, scoring='accuracy', cv=5)
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)

Best parameters: {'shrinkage': 0.5, 'solver': 'lsqr', 'tol': 0.0001}


45 fits failed out of a total of 180.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
45 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\lucam\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\model_selection\_validation.py", line 895, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\lucam\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\lucam\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\discriminant_analysis.py", line 621, in fit
    raise NotImplementedError("
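The 45 failures come from the `svd` solver being paired with a non-`None` `shrinkage` (3 shrinkage values × 3 `tol` values × 5 folds = 45): scikit-learn's `LinearDiscriminantAnalysis` only supports shrinkage with the `lsqr` and `eigen` solvers, and `tol` is only used by `svd`, so the `'tol': 0.0001` in the best parameters has no effect. A sketch of a grid without invalid combinations, passing `GridSearchCV` a list of sub-grids; the helper `tune_lda` is hypothetical.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV

# Each dict is expanded separately, so 'svd' is never combined with shrinkage
lda_param_grid = [
    {"solver": ["svd"], "tol": [1e-4, 1e-3, 1e-2]},  # tol only affects 'svd'
    {"solver": ["lsqr", "eigen"], "shrinkage": [None, "auto", 0.1, 0.5]},
]


def tune_lda(X, y, cv=5):
    """Grid-search LDA; error_score='raise' makes any remaining invalid fit fail loudly."""
    search = GridSearchCV(LinearDiscriminantAnalysis(), lda_param_grid,
                          scoring="accuracy", cv=cv, error_score="raise")
    search.fit(X, y)
    return search

# Usage: tune_lda(X_train, y_train).best_params_
```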

In [135]:
from sklearn.model_selection import StratifiedKFold, RandomizedSearchCV

In [136]:
lda_params = {
    'solver': ['svd', 'lsqr', 'eigen'],
    'shrinkage': [None, 'auto', 0.1, 0.3, 0.5, 0.7, 0.9]
}

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

lda = LinearDiscriminantAnalysis()
lda_random_search = RandomizedSearchCV(
    estimator=lda,
    param_distributions=lda_params,
    n_iter=20,  # Number of random combinations
    scoring='accuracy',
    cv=skf,
    random_state=42,
    n_jobs=-1
)

# Fit the model
lda_random_search.fit(X, y)

25 fits failed out of a total of 100.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
25 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\lucam\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\model_selection\_validation.py", line 895, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\lucam\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\lucam\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\discriminant_analysis.py", line 621, in fit
    raise NotImplementedError("

In [138]:
lda_random_search.best_score_

0.7838559814169569

In [139]:
gb_params = {
    'n_estimators': [50, 100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.3],
    'max_depth': [3, 4, 5, 6],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'subsample': [0.6, 0.8, 1.0]
}

gb = GradientBoostingClassifier(random_state=42)
gb_random_search = RandomizedSearchCV(
    estimator=gb,
    param_distributions=gb_params,
    n_iter=20,  # Number of random combinations
    scoring='accuracy',
    cv=skf,
    random_state=42,
    n_jobs=-1
)

# Fit the model
gb_random_search.fit(X, y)

In [140]:
gb_random_search.best_score_

0.8753774680603948

In [141]:
bagging_params = {
    'n_estimators': [50, 100, 200],
    'max_samples': [0.5, 0.7, 0.9, 1.0],
    'max_features': [0.5, 0.7, 1.0],
    'bootstrap': [True, False],
    'bootstrap_features': [True, False]
}

bagging = BaggingClassifier(random_state=42)
bagging_random_search = RandomizedSearchCV(
    estimator=bagging,
    param_distributions=bagging_params,
    n_iter=20,  # Number of random combinations
    scoring='accuracy',
    cv=skf,
    random_state=42,
    n_jobs=-1
)

# Fit the model
bagging_random_search.fit(X, y)

print("Best parameters for Bagging:", bagging_random_search.best_params_)
print("Best score for Bagging:", bagging_random_search.best_score_)    

Best parameters for Bagging: {'n_estimators': 50, 'max_samples': 0.9, 'max_features': 0.5, 'bootstrap_features': True, 'bootstrap': False}
Best score for Bagging: 0.8370499419279908
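The three random searches are fitted on the full `X`, `y`, so each `best_score_` is the maximum mean CV accuracy over 20 sampled configurations. Taking a maximum over many configurations makes it optimistically biased, the test rows took part in the tuning, and the figure is not directly comparable with the single-split scores from the model comparison above. A sketch of a less biased check, assuming the earlier `X_train`/`X_test` split: tune on the training part only, then score the refitted best estimator once on the untouched test part. The helper `search_then_holdout` is hypothetical.

```python
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold


def search_then_holdout(estimator, param_distributions, X_train, y_train,
                        X_test, y_test, n_iter=20, seed=42):
    """Tune on the training split only; return best params, CV accuracy, held-out accuracy."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = RandomizedSearchCV(estimator, param_distributions, n_iter=n_iter,
                                scoring="accuracy", cv=cv, random_state=seed)
    search.fit(X_train, y_train)  # refit=True retrains the best params on all of X_train
    test_acc = search.score(X_test, y_test)  # best_estimator_ scored with 'accuracy'
    return search.best_params_, search.best_score_, test_acc

# Usage with the gradient-boosting grid defined above:
# best_params, cv_acc, test_acc = search_then_holdout(
#     GradientBoostingClassifier(random_state=42), gb_params,
#     X_train, y_train, X_test, y_test)
```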
