#### Classification of abalone as Male, Female, or Infant from given physical measurements
##### More information about the data can be found at the following link:

https://archive.ics.uci.edu/ml/datasets/Abalone

## Logistic Regression

#### Import necessary libraries

In [230]:
import numpy as np
import pandas as pd

#### Import the .data file (a plain-text file) and save it in CSV format for simplicity. If you already have the data in CSV format, skip this step

In [231]:
col_name=['Sex','Length','Diameter','Height','Whole weight','Shucked weight','Viscera weight','Shelle weight','Rings']
data=pd.read_table('abalone.data',delimiter=',',names=col_name)
data.head()

| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shelle weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 |
| 1 | M | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 |
| 2 | F | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 |
| 3 | M | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 |
| 4 | I | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 |


In [232]:
data.to_csv('abalone.csv',index=False,index_label=False)

#### Load the dataset from the CSV file

In [233]:
df=pd.read_csv('abalone.csv',index_col=None)

In [234]:
df.head()

| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shelle weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 |
| 1 | M | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 |
| 2 | F | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 |
| 3 | M | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 |
| 4 | I | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 |
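
The round trip through `abalone.csv` is a convenience rather than a requirement: `pd.read_csv` can read a header-less, comma-separated file directly when given `header=None` and `names`. The sketch below illustrates this on an in-memory stand-in for `abalone.data`, built from the five records shown above; the variable names `raw` and `sample` are illustrative and not part of the original notebook.

```python
import io

import pandas as pd

# Column names as spelled in this notebook (the raw file has no header row).
col_name = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight',
            'Shucked weight', 'Viscera weight', 'Shelle weight', 'Rings']

# In-memory stand-in for 'abalone.data': the five records shown by head() above.
raw = io.StringIO(
    "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n"
    "M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7\n"
    "F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9\n"
    "M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10\n"
    "I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7\n"
)

# header=None tells pandas the first line is data; names supplies the columns.
sample = pd.read_csv(raw, header=None, names=col_name)
print(sample.shape)
print(sample['Sex'].tolist())
```

With the real file, the same call would take the path `'abalone.data'` in place of `raw`.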


In [253]:
print(df.isnull().sum())

Sex               0
Length            0
Diameter          0
Height            0
Whole weight      0
Shucked weight    0
Viscera weight    0
Shelle weight     0
Rings             0
dtype: int64


#### Encode the categorical Sex column as integer labels

In [235]:
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
df['Sex']=le.fit_transform(df['Sex'])

In [236]:
df.head()

| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shelle weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 |
| 1 | 2 | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 |
| 2 | 0 | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 |
| 3 | 2 | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 |
| 4 | 1 | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 |
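
To make the integer codes in the table easier to read, here is a minimal pure-Python sketch of the mapping `LabelEncoder` applies: the distinct classes are sorted and each one is replaced by its position in that sorted list. The names `to_code` and `decoded` are illustrative only.

```python
# Sex values of the first five records, as shown before encoding.
labels = ['M', 'M', 'F', 'M', 'I']

# Sorted distinct classes; the position in this list becomes the integer code.
classes = sorted(set(labels))
to_code = {c: i for i, c in enumerate(classes)}

codes = [to_code[s] for s in labels]
decoded = [classes[c] for c in codes]  # the inverse mapping, as inverse_transform does

print(classes)
print(codes)
print(decoded)
```

So `F` maps to 0, `I` to 1 and `M` to 2, which matches the `Sex` column after encoding.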


#### Extract features and labels from the dataset

In [238]:
X=df.iloc[:,1:].values
y=df.iloc[:,0]

In [239]:
print(X.shape)
print(y.shape)

(4177, 8)
(4177,)


#### Split the dataset into training and test sets

In [240]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2)
print(X_train.shape)
print(X_test.shape)
y_test=np.array(y_test)

(3341, 8)
(836, 8)
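
The shapes above follow from simple arithmetic. The sketch below assumes that a fractional `test_size` is rounded up to a whole number of samples and that the remainder goes to the training set, which is consistent with the printed shapes.

```python
import math

n_samples = 4177   # rows in the dataset
test_size = 0.2    # fraction requested in train_test_split

# Assumption: the test count is rounded up; the rest forms the training set.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

Because the call above passes no `random_state`, the rows that land in each split change from run to run, so the scores reported later will vary slightly when the notebook is re-executed.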


#### Feature scaling (fit on the training set only, then apply to the test set)

In [241]:
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
X_train=sc.fit_transform(X_train)
X_test=sc.transform(X_test)
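
As a rough illustration of what `fit_transform` and `transform` do per feature, the sketch below standardizes one column with z = (x - mean) / std, where the mean and the (population) standard deviation come from the training values only. The helper names and the tiny "train"/"test" lists are hypothetical; the values are `Length` entries taken from the rows shown earlier, not the actual split.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Return the mean and population standard deviation of a training column."""
    return fmean(column), pstdev(column)


def standardize(column, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(x - mean) / std for x in column]


# Hypothetical mini split of the Length feature (not the real train/test split).
train_length = [0.455, 0.35, 0.53, 0.44]
test_length = [0.33]

mean, std = fit_standardizer(train_length)              # learned from training data only
train_scaled = standardize(train_length, mean, std)
test_scaled = standardize(test_length, mean, std)       # reuses the training statistics

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics on the test data is the reason the notebook calls `sc.transform(X_test)` rather than `sc.fit_transform(X_test)`.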

#### Train the logistic regression model on the training data

In [242]:
from sklearn.linear_model import LogisticRegression
classifier=LogisticRegression(random_state=0)
classifier.fit(X_train,y_train)

LogisticRegression(random_state=0)
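
With three classes, the model has to turn one score per class into a prediction. The sketch below assumes the multinomial (softmax) formulation, which recent scikit-learn versions use for multiclass targets with the default solver; the `scores` are made-up numbers for illustration, not values from the fitted model.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ['F', 'I', 'M']                  # encoded as 0, 1, 2
scores = [0.2, -1.0, 0.5]                  # hypothetical linear scores w_k . x + b_k

probs = softmax(scores)
predicted = classes[probs.index(max(probs))]   # class with the highest probability

print(probs)
print(predicted)
```

On the fitted model, `classifier.predict_proba` returns the corresponding per-class probabilities and `classifier.predict` returns the most probable class.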

#### Predict a single result

In [243]:
prediction=classifier.predict(sc.transform([[0.456,0.345,0.049,1.201,0.2249,0.1245,0.2,8]]))
print('The predicted Sex of Abalone with given properties is ',le.inverse_transform(prediction))

The predicted Sex of Abalone with given properties is  ['M']


#### Predict the test set results

In [244]:
y_test_sex_lable=le.inverse_transform(y_test)

In [245]:
y_predict=classifier.predict(X_test)
y_predict_sex_lable=le.inverse_transform(y_predict)

In [246]:
comparision=[]
for i,j in zip(y_predict_sex_lable,y_test_sex_lable):
    comparision.append([i,j])
# print('Predicted Sex and Actual Sex are as follows:\n', comparision)

#### Build the confusion matrix

In [247]:
from sklearn.metrics import confusion_matrix
cm=confusion_matrix(y_test,y_predict)
print(cm)

[[ 84  44 129]
 [ 11 232  25]
 [ 91  69 151]]
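
In this matrix, rows are the true classes and columns are the predicted classes, in the encoded order `F`, `I`, `M`. A short sketch of how per-class recall (the share of each true class that was predicted correctly) can be read off the printed numbers:

```python
# Confusion matrix printed above: rows = true class, columns = predicted class.
cm = [[84, 44, 129],
      [11, 232, 25],
      [91, 69, 151]]
classes = ['F', 'I', 'M']   # label order 0, 1, 2

# Recall for class k = correct predictions for k / all true samples of k.
recall = {c: cm[i][i] / sum(cm[i]) for i, c in enumerate(classes)}

for c, r in recall.items():
    print(f"{c}: {r:.3f}")
```

Infants are recognized far more reliably (about 0.87) than females (about 0.33) or males (about 0.49); most of the errors are confusions between the two adult classes.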


#### Compute the accuracy score

In [248]:
from sklearn.metrics import accuracy_score
accuracy=accuracy_score(y_test,y_predict)
print('Model Accuracy is {:.2f} %'.format(accuracy*100))

Model Accuracy is 55.86 %
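
The same figure can be recovered from the confusion matrix reported earlier, since accuracy is the sum of the diagonal divided by the total number of test samples. The sketch also computes a majority-class baseline (always predicting the most frequent true class) as a point of comparison; that baseline is an addition for context, not something the original notebook computes.

```python
# Confusion matrix from the test-set evaluation (rows = true, columns = predicted).
cm = [[84, 44, 129],
      [11, 232, 25],
      [91, 69, 151]]

correct = sum(cm[i][i] for i in range(len(cm)))   # diagonal = correct predictions
total = sum(sum(row) for row in cm)               # all test samples
accuracy = correct / total

# Baseline: always predict the most frequent true class in the test set.
baseline = max(sum(row) for row in cm) / total

print('Accuracy: {:.2f} %'.format(accuracy * 100))
print('Majority-class baseline: {:.2f} %'.format(baseline * 100))
```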


#### Use grid search to find the best hyperparameters

In [249]:
from sklearn.model_selection import GridSearchCV

parameters=[{'C':[0.1,0.25,0.5,0.75,1,1.25]}]
grid=GridSearchCV(estimator=classifier,param_grid=parameters,scoring='accuracy',cv=10,n_jobs=1)
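# Note: this run searches on the test split. Tuning is normally done on the
# training split (X_train, y_train) so that the test set stays unseen.
grid.fit(X_test,y_test)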
best_accuracy=grid.best_score_
best_parameters=grid.best_params_
print('Best Accuracy {:.2f}%'.format(best_accuracy*100))
print('Best_Parameters :',best_parameters)

Best Accuracy 54.90%
Best_Parameters : {'C': 0.25}
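
The search above is fitted on `X_test`, `y_test`, so its best score is a cross-validated accuracy on the 836 test rows rather than an estimate tuned on training data. A more conventional arrangement, sketched below under stated assumptions, tunes `C` on the training split only and wraps the scaler and classifier in a pipeline so that scaling is re-fitted inside every fold. The data here is synthetic (`make_classification`), and the names `X_demo`, `pipe`, and `search` are illustrative; no result from this sketch should be read as an abalone score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data with 8 features, standing in for the abalone features.
X_demo, y_demo = make_classification(n_samples=600, n_features=8, n_informative=5,
                                     n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=0, stratify=y_demo)

# Scaling lives inside the pipeline, so each CV fold fits its own scaler.
pipe = make_pipeline(StandardScaler(),
                     LogisticRegression(max_iter=1000, random_state=0))

c_values = [0.1, 0.25, 0.5, 0.75, 1, 1.25]   # same candidate grid as the notebook
search = GridSearchCV(pipe, param_grid={'logisticregression__C': c_values},
                      scoring='accuracy', cv=10)
search.fit(X_tr, y_tr)                        # tune on the training split only

print('Best parameters:', search.best_params_)
print('CV accuracy: {:.2f} %'.format(search.best_score_ * 100))
print('Held-out accuracy: {:.2f} %'.format(search.score(X_te, y_te) * 100))
```

The held-out split is touched only once, at the end, which keeps its score an honest estimate of performance on unseen data.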


#### Use k-fold cross-validation to estimate accuracy

In [250]:
from sklearn.model_selection import cross_val_score
kfold=cross_val_score(estimator=classifier,X=X_train,y=y_train,cv=10)
print("Accuracy: {:.2f} %".format(kfold.mean()*100))
print("Standard Deviation: {:.2f} %".format(kfold.std()*100))

Accuracy: 55.88 %
Standard Deviation: 1.61 %
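
For intuition about what `cv=10` does, here is a minimal sketch of plain k-fold partitioning: the samples are cut into 10 contiguous folds, and each fold serves once as the validation set while the other nine are used for training. This is a simplification. For classifiers, `cross_val_score` with an integer `cv` uses stratified folds that also balance the class proportions, which this sketch does not attempt. The function name `kfold_indices` is hypothetical.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for unshuffled, contiguous k-fold splits."""
    # The first (n_samples % n_splits) folds get one extra sample.
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(kfold_indices(3341, 10))            # 3341 = size of the training set
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)
```

The reported accuracy is the mean of the 10 per-fold scores, and the standard deviation indicates how much the score varies from fold to fold.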
