# Introduction to Scikit-Learn (sklearn)
    
This notebook demonstrates some of the most useful functions of
the beautiful Scikit-Learn library.
What we're going to cover:

0. An end-to-end Scikit-Learn workflow
1. Getting the data ready
2. Choosing the right estimator/algorithm for our problem
3. Fitting the model/algorithm and using it to make predictions on our data
4. Evaluating a model
5. Improving a model
6. Saving and loading a trained model
7. Putting it all together


## 0. An end-to-end Scikit-Learn Workflow

## 1. Get the data ready

```python
import pandas as pd
import numpy as np

heart_disease = pd.read_csv("heart-disease.csv")
```

```python
# Create x (feature matrix): every column except "target"
x = heart_disease.drop("target", axis=1)

# Create y (labels)
y = heart_disease["target"]
```
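The feature/label split above works on any DataFrame that has a label column. The sketch below runs the same two lines on a tiny hypothetical DataFrame (the `age`/`chol` columns and their values are made up, standing in for `heart-disease.csv`) and adds a missing-value check, which is worth running before fitting because many scikit-learn estimators reject NaN inputs.

```python
import pandas as pd

# Hypothetical miniature stand-in for heart-disease.csv (made-up values)
heart_disease = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233, 250, 204, 236],
    "target": [1, 1, 0, 0],
})

# Total number of missing values across all columns
print(heart_disease.isna().sum().sum())  # 0

# Features: every column except the label (axis=1 drops a column, not a row)
x = heart_disease.drop("target", axis=1)
# Labels: the target column as a pandas Series
y = heart_disease["target"]

print(list(x.columns))   # ['age', 'chol']
print(x.shape, y.shape)  # (4, 2) (4,)
```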


## 2. Choose the right model and hyperparameters


```python
from sklearn.ensemble import RandomForestClassifier

# Instantiate a random forest classifier (clf)
clf = RandomForestClassifier()

# We'll keep the default hyperparameters
clf.get_params()
```

```
{'bootstrap': True,
 'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 100,
 'n_jobs': None,
 'oob_score': False,
 'random_state': None,
 'verbose': 0,
 'warm_start': False}
```
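Default hyperparameters can be overridden either in the constructor or afterwards with `set_params`, and `get_params` always reports the current values. The exact keys listed depend on the installed scikit-learn version (newer releases no longer include `min_impurity_split`, for instance). The values below (`n_estimators=200`, `max_depth=5`) are arbitrary choices for illustration, not tuned recommendations.

```python
from sklearn.ensemble import RandomForestClassifier

# Override a hyperparameter at construction time
clf = RandomForestClassifier(n_estimators=200)

# ...or change one later; set_params returns the same estimator object
clf.set_params(max_depth=5)

params = clf.get_params()
print(params["n_estimators"], params["max_depth"])  # 200 5
```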

## 3. Fit the model to the training data

```python
from sklearn.model_selection import train_test_split

# Hold out 25% of the rows as a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
```
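The call above passes neither `random_state` nor `stratify`, so each re-run shuffles differently and the scores later in the notebook can shift slightly. The sketch below shows both options on hypothetical data (303 rows and the 138/165 class counts are assumptions chosen for illustration). It also shows that `train_test_split` rounds the test count up: 25% of 303 is 75.75, giving 76 test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 303 rows, 2 random features, imbalanced binary labels
rng = np.random.default_rng(0)
x = rng.normal(size=(303, 2))
y = np.array([0] * 138 + [1] * 165)

x_train, x_test, y_train, y_test = train_test_split(
    x, y,
    test_size=0.25,   # 25% held out; the test count is rounded up
    random_state=42,  # fixes the shuffle so the split is reproducible
    stratify=y,       # keeps the 0/1 ratio similar in both splits
)

print(len(x_train), len(x_test))  # 227 76
```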

```python
# Fit the model to the training data (";" hides Jupyter's echo of the returned estimator)
clf.fit(x_train, y_train);
```

```python
# Make a prediction
y_preds = clf.predict(x_test)
```

```python
y_preds
```

```
array([0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1,
       1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 1])
```
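For a classifier such as `RandomForestClassifier`, `predict` returns, for each row, the class with the highest probability from `predict_proba` (the forest's averaged class probabilities). The sketch checks this on synthetic data from `make_classification`, used here only as a stand-in for the heart-disease features. It also confirms that `fit` returns the estimator itself, which is what Jupyter would echo without the trailing `;`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the heart-disease features and labels
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
fitted = clf.fit(X, y)   # fit returns the fitted estimator itself
print(fitted is clf)     # True

proba = clf.predict_proba(X[:5])   # shape (5, 2): P(class 0), P(class 1)
preds = clf.predict(X[:5])

# predict picks the class with the largest probability in each row
print(np.array_equal(preds, clf.classes_[proba.argmax(axis=1)]))  # True
```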

## 4. Evaluate the model on the training data and test data
   

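```python
# Accuracy on the data the model was trained on
clf.score(x_train, y_train)
```

```
1.0
```

```python
# Accuracy on the held-out test data
clf.score(x_test, y_test)
```

```
0.8026315789473685
```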
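A perfect training score next to a test score of about 0.80 is common for a random forest with unlimited tree depth: the trees fit the training rows almost exactly, so only the held-out score says much about generalization. Because one 76-row test set gives a noisy estimate, `cross_val_score` can average accuracy over several splits. The sketch uses synthetic data from `make_classification` in place of the heart-disease data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold CV: each fold serves once as the test set; for classifiers the
# default scoring is accuracy, the same metric clf.score reports
scores = cross_val_score(clf, X, y, cv=5)
print(scores)
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```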

```python
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(classification_report(y_test, y_preds))
```

```
              precision    recall  f1-score   support

           0       0.85      0.74      0.79        39
           1       0.76      0.86      0.81        37

    accuracy                           0.80        76
   macro avg       0.81      0.80      0.80        76
weighted avg       0.81      0.80      0.80        76
```
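`classification_report` can also return a nested dictionary via `output_dict=True`, which is easier to use programmatically than the printed table. The label arrays below are a reconstruction with the same counts as the confusion matrix reported in this notebook (29, 10, 5, 32); they are not the actual `y_test` ordering, but they reproduce the same metrics. Note that the dictionary keys are the class labels as strings.

```python
import numpy as np
from sklearn.metrics import classification_report

# Reconstructed labels matching the notebook's confusion-matrix counts:
# 29 true 0 -> 0, 10 true 0 -> 1, 5 true 1 -> 0, 32 true 1 -> 1
y_true = np.array([0] * 39 + [1] * 37)
y_pred = np.array([0] * 29 + [1] * 10 + [0] * 5 + [1] * 32)

report = classification_report(y_true, y_pred, output_dict=True)
print(round(report["1"]["precision"], 2))  # 0.76
print(round(report["0"]["recall"], 2))     # 0.74
print(round(report["accuracy"], 4))        # 0.8026
```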



```python
print(confusion_matrix(y_test, y_preds))
```

```
[[29 10]
 [ 5 32]]
```
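In scikit-learn's `confusion_matrix`, rows are the true labels and columns are the predicted labels, so for labels `[0, 1]` the cells are `[[TN, FP], [FN, TP]]` with class 1 as the positive class. The arithmetic below recomputes the accuracy, precision, and recall for class 1 directly from the matrix printed above, and matches the classification report.

```python
import numpy as np

# Confusion matrix from the notebook: rows = true label, columns = predicted
cm = np.array([[29, 10],
               [5, 32]])
tn, fp, fn, tp = cm.ravel()   # class 1 treated as the positive class

accuracy = (tp + tn) / cm.sum()   # 61 / 76
precision_1 = tp / (tp + fp)      # 32 / 42
recall_1 = tp / (tp + fn)         # 32 / 37

print(f"{accuracy:.4f} {precision_1:.2f} {recall_1:.2f}")  # 0.8026 0.76 0.86
```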


```python
accuracy_score(y_test, y_preds)
```

```
0.8026315789473685
```

## 5. Improve a model

```python
# Try different numbers of estimators (trees in the forest)
# Seed NumPy's global random state; forests with random_state=None draw from it
np.random.seed(42)
for i in range(10, 100, 10):
    print(f"Trying model with {i} estimators...")
    clf = RandomForestClassifier(n_estimators=i).fit(x_train, y_train)
    print(f"Model accuracy: {clf.score(x_test, y_test) * 100:.2f}%\n")
```

```
Trying model with 10 estimators...
Model accuracy: 73.68%

Trying model with 20 estimators...
Model accuracy: 78.95%

Trying model with 30 estimators...
Model accuracy: 81.58%

Trying model with 40 estimators...
Model accuracy: 78.95%

Trying model with 50 estimators...
Model accuracy: 82.89%

Trying model with 60 estimators...
Model accuracy: 78.95%

Trying model with 70 estimators...
Model accuracy: 82.89%

Trying model with 80 estimators...
Model accuracy: 82.89%

Trying model with 90 estimators...
Model accuracy: 82.89%
```
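The loop picks `n_estimators` by test-set accuracy, so the test set is used both for model selection and for the final evaluation; on 76 rows, the gap between 78.95% and 82.89% is only three predictions. `GridSearchCV` runs the same search with cross-validation on the training split alone and touches the test set once at the end. The sketch uses synthetic data from `make_classification` and the same grid of values as the loop; it illustrates the technique rather than reproducing the notebook's numbers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Same values the loop tries: 10, 20, ..., 90
param_grid = {"n_estimators": list(range(10, 100, 10))}

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,  # 5-fold cross-validation on the training split only
)
grid.fit(x_train, y_train)  # refit=True (default) retrains the best setting on all of x_train

print(grid.best_params_)
print(f"Test accuracy: {grid.score(x_test, y_test):.3f}")
```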



## 6. Save the model and load it


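```python
import pickle

# clf is the last model fitted in the loop above (n_estimators=90);
# the with block closes the file once it has been written
with open("random_forest_model_1.pkl", "wb") as f:
    pickle.dump(clf, f)
```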

### Loading a model

```python
with open("random_forest_model_1.pkl", "rb") as f:
    loaded_model = pickle.load(f)

loaded_model.score(x_test, y_test)
```

```
0.8289473684210527
```
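A quick round-trip check confirms that a reloaded model behaves exactly like the one that was saved. The sketch below writes to a temporary directory rather than the notebook's working directory and uses synthetic data in place of the heart-disease set. Two cautions apply to any pickled model: unpickling can execute arbitrary code, so only load files you trust, and a pickle is most reliably loaded with the same scikit-learn version that created it.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data and a small fitted forest
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "random_forest_model_1.pkl")

    # pickle works with bytes, hence "wb"/"rb"; each with block closes its file
    with open(path, "wb") as f:
        pickle.dump(clf, f)
    with open(path, "rb") as f:
        loaded_model = pickle.load(f)

# The reloaded model should make exactly the same predictions
print(np.array_equal(loaded_model.predict(X), clf.predict(X)))  # True
```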
