# Simple binary classification

The Sonar dataset poses the task of predicting whether an object is a mine or a rock from the strength of sonar returns at different angles. It is a binary (2-class) classification problem, and the two classes are not perfectly balanced. There are 208 observations with 60 input variables and 1 output variable. The variables are as follows:

- Sonar returns at different angles
- ...
- Class (M for mine and R for rock)

Always predicting the most prevalent class gives a baseline classification accuracy of approximately 53%. Top results achieve a classification accuracy of approximately 88%.
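The baseline figure can be reproduced with a few lines of plain Julia. This is a minimal sketch, not part of the original notebook: the `labels` vector is a hypothetical stand-in built from the class counts assumed for the Sonar data (111 mines and 97 rocks, which is consistent with the 53% / 47% split reported in the solution below); in practice it would be the class column of the loaded dataset.

```julia
# Hypothetical label vector with the assumed Sonar class counts
# (111 mines, 97 rocks); replace it with the real class column.
labels = vcat(fill("M", 111), fill("R", 97))

# Count the observations in each class.
class_counts = Dict{String,Int}()
for label in labels
    class_counts[label] = get(class_counts, label, 0) + 1
end

# The baseline classifier always predicts the most frequent class,
# so its accuracy equals the share of that class.
majority_count, majority_class = findmax(class_counts)
baseline_accuracy = majority_count / length(labels)

println("Majority class: ", majority_class)
println("Baseline accuracy: ", round(100 * baseline_accuracy; digits = 1), " %")
```

Any model selected later should beat this number on the test data to be considered useful.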

1. Split the data into training and test sets (take the class imbalance into account).
2. Choose a performance metric suitable for the problem.
3. Fit the following algorithms to the data using their default parameters: Linear Regression, Logistic Regression, SVM, KNN, Decision trees.
4. Select the algorithm with the highest value of the performance metric.
5. Report the performance on the test data.
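Steps 1 and 2 above can be sketched in plain Julia, using only the standard library. This is an illustrative sketch under stated assumptions, not the method used in the solution below: `stratified_split`, `accuracy`, and `balanced_accuracy` are hypothetical helper names, the `labels` vector is a stand-in with the assumed class counts (111 M, 97 R), and the 0.9 split ratio and the seed 42 are arbitrary choices.

```julia
using Random

# Split the row indices class by class, so that the training and test
# parts keep (approximately) the class proportions of the full data.
function stratified_split(labels::AbstractVector; at::Real = 0.9,
                          rng::AbstractRNG = Random.default_rng())
    train_idx, test_idx = Int[], Int[]
    for class in unique(labels)
        idx = shuffle(rng, findall(==(class), labels))
        n_train = round(Int, at * length(idx))
        append!(train_idx, idx[1:n_train])
        append!(test_idx, idx[n_train+1:end])
    end
    return train_idx, test_idx
end

# Fraction of correct predictions.
accuracy(y_true, y_pred) = sum(y_true .== y_pred) / length(y_true)

# Mean of the per-class recalls; less sensitive to class imbalance than accuracy.
function balanced_accuracy(y_true, y_pred)
    recalls = [sum((y_true .== c) .& (y_pred .== c)) / sum(y_true .== c)
               for c in unique(y_true)]
    return sum(recalls) / length(recalls)
end

# Hypothetical labels with the assumed Sonar class counts.
labels = vcat(fill("M", 111), fill("R", 97))
train_idx, test_idx = stratified_split(labels; at = 0.9, rng = MersenneTwister(42))

# Score a trivial "always predict M" classifier on the test part.
y_test = labels[test_idx]
y_pred = fill("M", length(y_test))
println("Accuracy: ", accuracy(y_test, y_pred))
println("Balanced accuracy: ", balanced_accuracy(y_test, y_pred))
```

Because the classes are close to balanced, plain accuracy is a defensible metric here; balanced accuracy is shown as an alternative that exposes a classifier which ignores one class.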

Dataset

[https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv](https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv)

More information

[https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks)](https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks))

## References
- [Top Julia machine learning libraries](https://www.analyticsvidhya.com/blog/2021/05/top-julia-machine-learning-libraries/)
- [Julia language in machine learning: Algorithms, applications, and open issues](https://www.sciencedirect.com/science/article/pii/S157401372030071X)

## Solution

In [None]:
using CSV, DataFrames, PlotlyJS, Random, MLDataUtils, Printf, MLJ, ScikitLearn

In [14]:
dataSonar = CSV.read("Datasets/sonar.csv", DataFrame)
dataSonar#[93:105,:]

| Row | Type | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 |
|---|---|---|---|---|---|---|---|---|---|
| | String1 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | R | 0.02 | 0.0371 | 0.0428 | 0.0207 | 0.0954 | 0.0986 | 0.1539 | 0.1601 |
| 2 | R | 0.0453 | 0.0523 | 0.0843 | 0.0689 | 0.1183 | 0.2583 | 0.2156 | 0.3481 |
| 3 | R | 0.0262 | 0.0582 | 0.1099 | 0.1083 | 0.0974 | 0.228 | 0.2431 | 0.3771 |
| 4 | R | 0.01 | 0.0171 | 0.0623 | 0.0205 | 0.0205 | 0.0368 | 0.1098 | 0.1276 |
| 5 | R | 0.0762 | 0.0666 | 0.0481 | 0.0394 | 0.059 | 0.0649 | 0.1209 | 0.2467 |
| 6 | R | 0.0286 | 0.0453 | 0.0277 | 0.0174 | 0.0384 | 0.099 | 0.1201 | 0.1833 |
| 7 | R | 0.0317 | 0.0956 | 0.1321 | 0.1408 | 0.1674 | 0.171 | 0.0731 | 0.1401 |
| 8 | R | 0.0519 | 0.0548 | 0.0842 | 0.0319 | 0.1158 | 0.0922 | 0.1027 | 0.0613 |
| 9 | R | 0.0223 | 0.0375 | 0.0484 | 0.0475 | 0.0647 | 0.0591 | 0.0753 | 0.0098 |
| 10 | R | 0.0164 | 0.0173 | 0.0347 | 0.007 | 0.0187 | 0.0671 | 0.1056 | 0.0697 |


In [39]:
# Split the rows by class. The groups come out in order of first appearance
# in the data, so R (first row) precedes M.
Rdata, Mdata = groupby(dataSonar, :Type)
NR, NM = size(Rdata,1), size(Mdata,1)
N = NR + NM
# Share of each class, in percent.
NRpart, NMpart = 100*NR/N, 100*NM/N

#println("Percentage of R observations: $NRpart % \n Percentage of M observations: $NMpart %")
@sprintf("Percentage of R observations: %.0f %%, percentage of M observations: %.0f %%", NRpart, NMpart)

"Percentage of R observations: 47 %, percentage of M observations: 53 %"

In [41]:
Rdata

| Row | Type | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 |
|---|---|---|---|---|---|---|---|---|---|
| | String1 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | R | 0.02 | 0.0371 | 0.0428 | 0.0207 | 0.0954 | 0.0986 | 0.1539 | 0.1601 |
| 2 | R | 0.0453 | 0.0523 | 0.0843 | 0.0689 | 0.1183 | 0.2583 | 0.2156 | 0.3481 |
| 3 | R | 0.0262 | 0.0582 | 0.1099 | 0.1083 | 0.0974 | 0.228 | 0.2431 | 0.3771 |
| 4 | R | 0.01 | 0.0171 | 0.0623 | 0.0205 | 0.0205 | 0.0368 | 0.1098 | 0.1276 |
| 5 | R | 0.0762 | 0.0666 | 0.0481 | 0.0394 | 0.059 | 0.0649 | 0.1209 | 0.2467 |
| 6 | R | 0.0286 | 0.0453 | 0.0277 | 0.0174 | 0.0384 | 0.099 | 0.1201 | 0.1833 |
| 7 | R | 0.0317 | 0.0956 | 0.1321 | 0.1408 | 0.1674 | 0.171 | 0.0731 | 0.1401 |
| 8 | R | 0.0519 | 0.0548 | 0.0842 | 0.0319 | 0.1158 | 0.0922 | 0.1027 | 0.0613 |
| 9 | R | 0.0223 | 0.0375 | 0.0484 | 0.0475 | 0.0647 | 0.0591 | 0.0753 | 0.0098 |
| 10 | R | 0.0164 | 0.0173 | 0.0347 | 0.007 | 0.0187 | 0.0671 | 0.1056 | 0.0697 |


### Train and Test Datasets

In [53]:
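# Stratified split: shuffle and split each class separately (90% / 10%),
# so that train and test keep the class proportions of the full data.
R1, R2 = splitobs(shuffleobs(Rdata), at = 0.9)
M1, M2 = splitobs(shuffleobs(Mdata), at = 0.9)

# Stack the per-class parts into the final training and test sets.
train, test = [R1; M1], [R2; M2]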

(187×61 DataFrame
 Row │ Type     a1       a2       a3       a4       a5       a6       a7       ⋯
     │ String1  Float64  Float64  Float64  Float64  Float64  Float64  Float64  ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ R         0.0151   0.032    0.0599   0.105    0.1163   0.1734   0.1679  ⋯
   2 │ R         0.0093   0.0269   0.0217   0.0339   0.0305   0.1172   0.145
   3 │ R         0.0519   0.0548   0.0842   0.0319   0.1158   0.0922   0.1027
   4 │ R         0.0211   0.0319   0.0415   0.0286   0.0121   0.0438   0.1299
   5 │ R         0.0333   0.0221   0.027    0.0481   0.0679   0.0981   0.0843  ⋯
   6 │ R         0.019    0.0038   0.0642   0.0452   0.0333   0.069    0.0901
   7 │ R         0.0109   0.0093   0.0121   0.0378   0.0679   0.0863   0.1004
   8 │ R   