## Logistic Regression with the Wine Dataset

###### Simple logistic regression for wine classification (UCI Wine dataset)

### Step 1: Import the libraries and load the data

In [6]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [7]:
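# Column names for the UCI Wine data; the file itself has no header row
col = ['class','alcohal','malic acid','ash','alcal','mag','total','flavanoids','nonflavanoids','proanth',
       'color intensity','hue','od280','proline']
# Note: skiprows=1 discards the first wine sample, which is why df has 177 rows instead of 178
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", names=col, skiprows=1)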
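A sketch of an alternative loader, assuming scikit-learn 0.23 or later (needed for `as_frame=True`): `sklearn.datasets.load_wine` bundles a copy of the same UCI Wine data, so no download is needed and all 178 samples are kept. Its labels are 0, 1, 2 rather than the file's 1, 2, 3, and its column names differ from `col`, so it is not a drop-in replacement for `df`.

```python
from sklearn.datasets import load_wine

# Bundled copy of the UCI Wine data: no network access, no header row to skip.
# as_frame=True (scikit-learn >= 0.23) returns pandas objects.
wine = load_wine(as_frame=True)
X_all = wine.data      # DataFrame with the 13 feature columns
y_all = wine.target    # Series with labels 0, 1, 2 (the UCI file uses 1, 2, 3)

print(X_all.shape)                                  # (178, 13)
print(y_all.value_counts().sort_index().tolist())   # [59, 71, 48]
```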

In [8]:
df.head()

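|   | class | alcohal | malic acid | ash | alcal | mag | total | flavanoids | nonflavanoids | proanth | color intensity | hue | od280 | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 1 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 2 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 3 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
| 4 | 1 | 14.2 | 1.76 | 2.45 | 15.2 | 112 | 3.27 | 3.39 | 0.34 | 1.97 | 6.75 | 1.05 | 2.85 | 1450 |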


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 177 entries, 0 to 176
Data columns (total 14 columns):
class              177 non-null int64
alcohal            177 non-null float64
malic acid         177 non-null float64
ash                177 non-null float64
alcal              177 non-null float64
mag                177 non-null int64
total              177 non-null float64
flavanoids         177 non-null float64
nonflavanoids      177 non-null float64
proanth            177 non-null float64
color intensity    177 non-null float64
hue                177 non-null float64
od280              177 non-null float64
proline            177 non-null int64
dtypes: float64(11), int64(3)
memory usage: 19.4 KB


### Step 2: Understand the data and split it into features and target

In [10]:
X = df.iloc[:,1:]   # features: every column after 'class'
y = df.iloc[:,0]    # target: the 'class' column (cultivar 1, 2 or 3)
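Selecting by column name rather than by position is a slightly more robust variant of the split above, since it does not rely on `class` being the first column. The small frame below is a made-up stand-in with the same layout as `df`, used only to show that both selections give the same result.

```python
import pandas as pd

# Hypothetical toy frame laid out like df: the label first, then features
toy = pd.DataFrame({
    "class": [1, 2, 3],
    "alcohal": [13.2, 12.4, 13.7],
    "proline": [1050, 520, 630],
})

# Positional selection, as in the notebook
X_pos = toy.iloc[:, 1:]
y_pos = toy.iloc[:, 0]

# Selection by name: independent of column order
X_named = toy.drop(columns="class")
y_named = toy["class"]

print(X_pos.equals(X_named), y_pos.equals(y_named))   # True True
```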

### Step 3: Split the data into training and test sets

In [11]:
# Hold out 30% for testing; stratify=y keeps the class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7, stratify=y)
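To illustrate what `stratify=y` does, here is a simplified stratified split in plain Python: it samples 30% of each class separately, so every class keeps roughly its share in the test set. It is an illustrative sketch, not scikit-learn's algorithm: `train_test_split` first fixes the test size at ceil(0.3 × 177) = 54 and then allocates it across classes, which is why the classification report further down shows 18/22/14 test samples rather than the 17/21/14 this sketch produces. The class counts 58/71/48 assume the 177-row frame loaded above (the full UCI file has 59/71/48).

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.3, seed=7):
    """Simplified stratified split: draw test_size of each class independently.

    Rounding is done per class, so totals can differ slightly from
    scikit-learn's train_test_split.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

# Class counts of the 177-row frame: 58, 71, 48
labels = [1] * 58 + [2] * 71 + [3] * 48
train_idx, test_idx = stratified_split(labels)
print(Counter(labels[i] for i in test_idx))   # Counter({2: 21, 1: 17, 3: 14})
```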

In [12]:
X_test.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 54 entries, 47 to 67
Data columns (total 13 columns):
alcohal            54 non-null float64
malic acid         54 non-null float64
ash                54 non-null float64
alcal              54 non-null float64
mag                54 non-null int64
total              54 non-null float64
flavanoids         54 non-null float64
nonflavanoids      54 non-null float64
proanth            54 non-null float64
color intensity    54 non-null float64
hue                54 non-null float64
od280              54 non-null float64
proline            54 non-null int64
dtypes: float64(11), int64(2)
memory usage: 5.9 KB


### Step 4: Fit a logistic regression model

In [13]:
logr = LogisticRegression()
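The model repr printed by the next cell (`multi_class='warn'`, `solver='warn'`) comes from scikit-learn 0.20/0.21, where those placeholders fall back to the liblinear solver with a one-vs-rest scheme; from 0.22 onward the defaults are `lbfgs` with a multinomial (softmax) model for multiclass targets. The sketch below contrasts the two ways of turning per-class linear scores into probabilities. The scores are hypothetical; in a real fit the two schemes learn different weights, so their predictions can differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    m = max(scores)                          # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear scores w_k . x + b_k for one sample, one per class
classes = [1, 2, 3]
scores = [2.0, 0.5, -1.0]

# One-vs-rest: an independent binary sigmoid per class, then renormalised to sum to 1
raw = [sigmoid(s) for s in scores]
ovr = [p / sum(raw) for p in raw]

# Multinomial: a single softmax over all class scores
multi = softmax(scores)

# Given the same scores, both pick the class with the largest score
pred_ovr = classes[max(range(len(classes)), key=lambda k: ovr[k])]
pred_multi = classes[max(range(len(classes)), key=lambda k: multi[k])]
print(pred_ovr, pred_multi)   # 1 1
```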

In [14]:
logr.fit(X_train,y_train)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)
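The features sit on very different scales (`proline` is in the hundreds to thousands, `hue` is around 1). With the newer `lbfgs` default and `max_iter=100`, fitting the raw columns may stop with a `ConvergenceWarning`; standardising the features usually avoids this. A minimal sketch, assuming `load_wine` stands in for `df` (so its scores are not directly comparable with the notebook's) and with `max_iter=1000` chosen arbitrarily:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7, stratify=y)

# The scaler is fitted on the training split only, inside the pipeline,
# so no statistics from the test split leak into training.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Wrapping the scaler and the classifier in one pipeline also means `predict` and `score` apply the same scaling to new data automatically.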

### Step 5: Predict the test-set labels

In [15]:
y_pred = logr.predict(X_test)
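`predict` returns one label per row; `predict_proba` returns the class probabilities behind it, with columns ordered as in `classes_`. A sketch, again assuming `load_wine` and a scaled pipeline, fitted on all the data purely to show that the predicted label is the column with the highest probability:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

proba = clf.predict_proba(X[:5])              # shape (5, 3): one column per class
labels = clf.classes_[proba.argmax(axis=1)]   # map column index back to class label

print(np.round(proba, 3))
print(labels, clf.predict(X[:5]))
```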

### Step 6: Check the accuracy

In [16]:
# Mean accuracy on the training set (not the held-out test set)
logr.score(X_train,y_train)

0.991869918699187

In [17]:
# Accuracy on the test set, using scikit-learn's accuracy_score
accuracy_score(y_test,y_pred)

0.9444444444444444
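Note that `logr.score(X_train, y_train)` measures training accuracy: 0.9919 is 122 of the 123 training rows, while the test accuracy of 0.9444 is 51 of 54. Accuracy is simply the fraction of matching labels, as in this stdlib sketch (the label lists are made up to reproduce 51 correct out of 54):

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels: 51 of 54 predictions correct
y_true_demo = [1] * 51 + [2] * 3
y_pred_demo = [1] * 51 + [3] * 3
print(accuracy(y_true_demo, y_pred_demo))   # 0.9444444444444444
```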

In [18]:
confusion_matrix(y_test,y_pred)

array([[16,  2,  0],
       [ 0, 21,  1],
       [ 0,  0, 14]], dtype=int64)
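In scikit-learn's confusion matrix, rows are true classes and columns are predicted classes, ordered by sorted label (1, 2, 3 here), so correct predictions lie on the diagonal. A plain-Python sketch of the same counting, followed by reading the accuracy off the matrix above:

```python
def confusion(y_true, y_pred, labels):
    """Count (true, predicted) pairs: rows are true classes, columns are predictions."""
    pos = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[pos[t]][pos[p]] += 1
    return m

print(confusion([1, 1, 2, 3], [1, 2, 2, 3], labels=[1, 2, 3]))
# [[1, 1, 0], [0, 1, 0], [0, 0, 1]]

# The matrix reported above; row 0 reads "of 18 true class-1 wines,
# 16 were predicted as class 1 and 2 as class 2"
cm = [[16, 2, 0],
      [0, 21, 1],
      [0, 0, 14]]
correct = sum(cm[i][i] for i in range(len(cm)))   # diagonal: correct predictions
total = sum(sum(row) for row in cm)
print(correct, total, correct / total)            # 51 54 0.9444444444444444
```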

In [19]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           1       1.00      0.89      0.94        18
           2       0.91      0.95      0.93        22
           3       0.93      1.00      0.97        14

   micro avg       0.94      0.94      0.94        54
   macro avg       0.95      0.95      0.95        54
weighted avg       0.95      0.94      0.94        54
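Each row of the report can be recomputed from the confusion matrix: precision divides the diagonal entry by its column sum, recall divides it by its row sum (the support), and F1 is their harmonic mean. The macro average weights classes equally, the weighted average weights them by support, and for a single-label multiclass problem the micro average equals plain accuracy (newer scikit-learn versions print an `accuracy` row instead). This sketch reproduces the figures above to two decimals:

```python
cm = [[16, 2, 0],
      [0, 21, 1],
      [0, 0, 14]]
labels = [1, 2, 3]
n = len(labels)

support = [sum(cm[i]) for i in range(n)]                          # row sums
predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # column sums

precision = [cm[k][k] / predicted[k] for k in range(n)]
recall = [cm[k][k] / support[k] for k in range(n)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

macro_f1 = sum(f1) / n
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

for lab, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f"{lab}  {p:.2f}  {r:.2f}  {f:.2f}  {s}")
print(f"macro f1 {macro_f1:.2f}, weighted f1 {weighted_f1:.2f}")
```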



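### Accuracy by random_state value, with and without stratify

Here "score" is the training accuracy from `logr.score(X_train, y_train)` and "acc score" is the test accuracy from `accuracy_score(y_test, y_pred)`, both in percent. The test accuracy moves by several points depending on `random_state` alone, so a single split gives only a rough estimate.

#### Accuracy score without stratify
- random_state = 2: score = 99, acc score = 92
- random_state = 7: score = 97, acc score = 96
- random_state = 21: score = 97, acc score = 94

#### Accuracy score with stratify
- random_state = 2: score = 97, acc score = 100
- random_state = 7: score = 99, acc score = 94
- random_state = 21: score = 98, acc score = 96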
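Since the test accuracy above ranges from 92% to 100% depending only on `random_state`, averaging over several splits gives a steadier estimate. Stratified k-fold cross-validation does this by using every sample for testing exactly once. A sketch, assuming `load_wine`, a scaled pipeline, and 5 shuffled folds (all choices not in the original):

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5 stratified folds: each fold keeps the class proportions of the full data
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)

print(scores)
print(f"mean accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the mean and spread across folds says more about how the model generalises than the score from any one hand-picked `random_state`.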
