# Scikit-learn

* Open-source Python library for machine learning
* Contains tools for data preprocessing, cross-validation, and visualization, all exposed through a unified interface

## Installation

For pip installation, run the following command in the terminal:
    
`pip install scikit-learn`

## Using Scikit-Learn

In [1]:
import sklearn
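
As a quick sanity check after installation, the installed release can be printed. This is a minimal sketch; the version string depends on your environment, so no particular value is assumed here.

```python
import sklearn

# The version string depends on the installed release
version = sklearn.__version__
print("scikit-learn version:", version)
```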

## Data Loading

In [48]:
# Import scikit-learn's bundled datasets module
from sklearn import datasets
# Load data
iris = datasets.load_iris()



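# Print the shape of the data to confirm it loaded (150 samples, 4 features)
print(iris.data.shape)
# The features are stored as a NumPy array
print(type(iris.data[:2]))
# First sample's features and its class label
print(iris.data[0], iris.target[0])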

(150, 4)
<class 'numpy.ndarray'>
[5.1 3.5 1.4 0.2] 0
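
The integer label printed above (`0`) is an index into the species names stored on the loaded dataset. The sketch below shows how the `feature_names` and `target_names` attributes can be used to make the raw arrays readable; the variable `first_species` is introduced here only for illustration.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Column names for the four measurements in iris.data
print(iris.feature_names)
# Species names; the integer labels in iris.target index into this array
print(iris.target_names)

# Species name of the first sample (label 0 in the output above)
first_species = iris.target_names[iris.target[0]]
print(first_species)
```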


## Logistic Regression Classification Example

### Import the model

In [6]:
from sklearn import linear_model

### Create the model

In [13]:
clf = linear_model.LogisticRegression()

### Split the Iris data into training and test sets

In [22]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the samples as a test set
X = iris.data
Y = iris.target
validation_size = 0.20
seed = 7
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=validation_size, random_state=seed)
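
With 150 samples and `test_size=0.20`, the split yields 120 training and 30 test samples. The sketch below confirms the shapes and also shows the optional `stratify` argument, which is not used in this document; it is included only as an assumption about what a reader might want when the class balance of the test set matters.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# stratify=y is an addition for this sketch: it keeps the class
# proportions of y the same in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)

print(X_train.shape, X_test.shape)
# Number of test samples per class
test_counts = np.bincount(y_test)
print(test_counts)
```

Without `stratify`, the per-class counts in the test set vary with the random seed, as the uneven supports in the evaluation section of this document illustrate.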

### Train classifier

In [23]:
clf.fit(X=X_train, y=y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
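
The parameter listing above comes from an older scikit-learn release; newer releases use a different default solver, so predictions may differ slightly from the outputs shown in this document. The following sketch inspects a fitted model. The `max_iter=1000` setting is an assumption made here to avoid convergence warnings and is not part of the original example.

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=7
)

# max_iter=1000 is an assumption for this sketch, not the original setting
clf = linear_model.LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Class labels seen during fitting
print(clf.classes_)
# One row of coefficients per class, one column per feature
print(clf.coef_.shape)
# Predicted class probabilities for the first test sample
proba = clf.predict_proba(X_test[:1])
print(proba)
```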

### Visualize the training data
![image.png](attachment:image.png)
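
The code behind the figure is not included in this document. One possible way to plot the training data is sketched below; plotting the first two features is an assumption made for illustration and may not match the original figure.

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=7
)

fig, ax = plt.subplots()
# One scatter call per class so each species gets its own legend entry
for label, name in enumerate(iris.target_names):
    mask = y_train == label
    ax.scatter(X_train[mask, 0], X_train[mask, 1], label=name)

ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.set_title("Iris training data")
ax.legend()
# In a script, call plt.show() to open the figure window
```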

### Get test data predictions

In [24]:
predictions = clf.predict(X_test)

In [25]:
print(predictions)

[2 2 0 2 2 0 1 1 0 1 2 2 0 2 0 2 2 2 0 0 1 2 1 2 1 2 1 1 2 2]


In [26]:
print(y_test)

[2 1 0 1 2 0 1 1 0 1 1 1 0 2 0 1 2 2 0 0 1 2 1 2 2 2 1 1 2 2]
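
Comparing the two printed arrays by eye is error-prone. As a small sketch, the arrays below are copied from the outputs above and compared with NumPy to locate the misclassified samples.

```python
import numpy as np

# Arrays copied from the printed outputs above
predictions = np.array([2, 2, 0, 2, 2, 0, 1, 1, 0, 1, 2, 2, 0, 2, 0,
                        2, 2, 2, 0, 0, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2])
y_test = np.array([2, 1, 0, 1, 2, 0, 1, 1, 0, 1, 1, 1, 0, 2, 0,
                   1, 2, 2, 0, 0, 1, 2, 1, 2, 2, 2, 1, 1, 2, 2])

# Positions where the prediction differs from the true label
wrong = np.flatnonzero(predictions != y_test)
print(wrong)       # [ 1  3 10 11 15 24]
print(len(wrong))  # 6
```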


## Evaluating the trained model

In [27]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt

#### Accuracy

In [40]:
print("Accuracy of model: %d %% " % (accuracy_score(y_test, predictions) * 100))

Accuracy of model: 80 % 
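
Accuracy is the fraction of test samples predicted correctly. The sketch below reuses the arrays printed earlier and shows the `normalize=False` option of `accuracy_score`, which returns the count of correct predictions instead of the fraction.

```python
from sklearn.metrics import accuracy_score

# Arrays copied from the printed outputs earlier in this document
predictions = [2, 2, 0, 2, 2, 0, 1, 1, 0, 1, 2, 2, 0, 2, 0,
               2, 2, 2, 0, 0, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2]
y_test = [2, 1, 0, 1, 2, 0, 1, 1, 0, 1, 1, 1, 0, 2, 0,
          1, 2, 2, 0, 0, 1, 2, 1, 2, 2, 2, 1, 1, 2, 2]

# Fraction of correct predictions
accuracy = accuracy_score(y_test, predictions)
# Number of correct predictions
n_correct = accuracy_score(y_test, predictions, normalize=False)
print(n_correct, "of", len(y_test), "correct")
```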


#### Confusion matrix

In [41]:
print(confusion_matrix(y_test, predictions))

[[ 7  0  0]
 [ 0  7  5]
 [ 0  1 10]]
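
In scikit-learn's confusion matrix, rows correspond to true classes and columns to predicted classes, so the off-diagonal `5` means five samples of class 1 were predicted as class 2. As a sketch, per-class recall and precision can be derived directly from the matrix copied from the output above.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class)
cm = np.array([[7, 0, 0],
               [0, 7, 5],
               [0, 1, 10]])

# Recall: correct predictions divided by the true count of each class
recall = cm.diagonal() / cm.sum(axis=1)
# Precision: correct predictions divided by the predicted count of each class
precision = cm.diagonal() / cm.sum(axis=0)

print(np.round(recall, 2))
print(np.round(precision, 2))
```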


#### Classification Report

In [42]:
print(classification_report(y_test, predictions))

             precision    recall  f1-score   support

          0       1.00      1.00      1.00         7
          1       0.88      0.58      0.70        12
          2       0.67      0.91      0.77        11

avg / total       0.83      0.80      0.80        30
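
When individual numbers from the report are needed in code, parsing the printed text is fragile. Recent scikit-learn releases accept `output_dict=True`, which returns the same values as a nested dictionary. The sketch below assumes such a release and reuses the arrays printed earlier; there the summary row appears under the key `"weighted avg"`, which should correspond to the `avg / total` row above.

```python
from sklearn.metrics import classification_report

# Arrays copied from the printed outputs earlier in this document
predictions = [2, 2, 0, 2, 2, 0, 1, 1, 0, 1, 2, 2, 0, 2, 0,
               2, 2, 2, 0, 0, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2]
y_test = [2, 1, 0, 1, 2, 0, 1, 1, 0, 1, 1, 1, 0, 2, 0,
          1, 2, 2, 0, 0, 1, 2, 1, 2, 2, 2, 1, 1, 2, 2]

# Nested dict keyed by class label (as a string) and metric name
report = classification_report(y_test, predictions, output_dict=True)

class_1_recall = report["1"]["recall"]
weighted_f1 = report["weighted avg"]["f1-score"]
print(round(class_1_recall, 2), round(weighted_f1, 2))
```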

