# Introduction to Scikit-Learn 

Hi Guys, Welcome to [Tirendaz Academy](https://youtube.com/c/tirendazacademy) 😀

---
I'm going to talk about the Scikit-Learn library.
In short, this notebook covers the following topics:
- Building the model
- Data scaling
- Pipelines
- Model evaluation
- Automatic parameter searches
---
Happy Learning 😀

## Building the Model

In [1]:
from sklearn.ensemble import RandomForestClassifier

In [2]:
clf = RandomForestClassifier(random_state=0)

In [3]:
X = [[1,2,3],
    [11,12,13]]
y = [0,1]

In [4]:
clf.fit(X,y)

RandomForestClassifier(random_state=0)

In [5]:
clf.predict(X)

array([0, 1])

In [6]:
clf.predict([[4,5,6],[14,15,16]])

array([0, 1])
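
If you want to see how confident the model is rather than only the predicted label, a minimal sketch like the one below can help. It reuses the same toy data and calls `predict_proba`, which returns one column per class in the order given by `clf.classes_`. With only two training samples the numbers are not meaningful; this only illustrates the shape of the output.

```python
from sklearn.ensemble import RandomForestClassifier

# Same toy data as in the cells above
X = [[1, 2, 3], [11, 12, 13]]
y = [0, 1]

clf = RandomForestClassifier(random_state=0).fit(X, y)

# One row per sample, one column per class (column order follows clf.classes_)
proba = clf.predict_proba([[4, 5, 6], [14, 15, 16]])

print(clf.classes_)
print(proba)
```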

## Data Scaling

In [7]:
from sklearn.preprocessing import StandardScaler

In [8]:
X = [[0,15],
    [1,-10]]

In [9]:
StandardScaler().fit(X).transform(X)

array([[-1.,  1.],
       [ 1., -1.]])
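
To see where these values come from, here is a small sketch that repeats the scaling by hand with NumPy. `StandardScaler` subtracts the column mean and divides by the column standard deviation (the population standard deviation, which is also NumPy's default). The variable names `manual` and `scaled` are just for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[0, 15], [1, -10]], dtype=float)

scaler = StandardScaler().fit(X)
scaled = scaler.transform(X)

# Manual version: (x - column mean) / column standard deviation
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(scaler.mean_)   # per-column means learned during fit
print(scaler.scale_)  # per-column standard deviations learned during fit
print(np.allclose(manual, scaled))
```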

## Pipelines

In [10]:
from sklearn.pipeline import make_pipeline

In [11]:
from sklearn.linear_model import LogisticRegression

In [12]:
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression())

In [13]:
from sklearn.datasets import load_iris

In [14]:
X, y = load_iris(return_X_y=True)

In [15]:
from sklearn.model_selection import train_test_split

In [16]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [17]:
pipe.fit(X_train, y_train)

Pipeline(steps=[('standardscaler', StandardScaler()),
                ('logisticregression', LogisticRegression())])

In [18]:
from sklearn.metrics import accuracy_score

In [19]:
accuracy_score(y_test, pipe.predict(X_test))

0.9736842105263158
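
It may help to see what the pipeline does for you. The sketch below rebuilds the same steps by hand: the scaler is fitted on the training data only, and the test data is transformed with those training statistics. Assuming default settings, the manual version should give the same predictions as the pipeline. The names `scaler`, `model`, and `manual_pred` are only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pipeline version: scaling and model fitting in one object
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

# Manual version: fit the scaler on the training data only,
# then reuse it to transform the test data
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)
manual_pred = model.predict(scaler.transform(X_test))

same = np.array_equal(pipe.predict(X_test), manual_pred)
print(same)
```

Keeping both steps in one pipeline means you cannot accidentally fit the scaler on the test data.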

## Model Evaluation

In [20]:
from sklearn.datasets import make_regression

In [21]:
X, y = make_regression(n_samples=1000, random_state=0)

In [22]:
from sklearn.linear_model import LinearRegression

In [23]:
lr = LinearRegression()

In [24]:
from sklearn.model_selection import cross_validate

In [25]:
result = cross_validate(lr, X, y)

In [26]:
result["test_score"]

array([1., 1., 1., 1., 1.])
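
The perfect scores above come from the data: `make_regression` builds a noise-free linear target by default (`noise=0.0`), so a linear model fits it exactly. As a rough sketch, the example below adds some noise (the value `20.0` is an arbitrary choice for illustration) and asks `cross_validate` for two metrics with an explicit `cv=5`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# noise=20.0 is an arbitrary illustrative value, not from the original notebook
X, y = make_regression(n_samples=1000, noise=20.0, random_state=0)

result = cross_validate(
    LinearRegression(),
    X,
    y,
    cv=5,
    scoring=["r2", "neg_mean_absolute_error"],
)

# With several metrics, the result keys are named "test_<metric>"
r2_scores = result["test_r2"]
# Error metrics are negated so that higher is always better; flip the sign back
mae_scores = -result["test_neg_mean_absolute_error"]

print(r2_scores.mean(), mae_scores.mean())
```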

## Automatic Parameter Searches

In [27]:
from sklearn.datasets import fetch_california_housing 

In [28]:
X, y = fetch_california_housing(return_X_y=True)

In [29]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [30]:
from sklearn.model_selection import RandomizedSearchCV

In [31]:
from scipy.stats import randint

In [33]:
param_distributions = {"n_estimators": randint(1, 5),
                       "max_depth": randint(5, 10)}

In [34]:
from sklearn.ensemble import RandomForestRegressor
search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    n_iter=5,
    param_distributions=param_distributions,
    random_state=0)

In [35]:
search.fit(X_train, y_train)

RandomizedSearchCV(estimator=RandomForestRegressor(random_state=0), n_iter=5,
                   param_distributions={'max_depth': <scipy.stats._distn_infrastructure.rv_frozen object at 0x00000246C75F2610>,
                                        'n_estimators': <scipy.stats._distn_infrastructure.rv_frozen object at 0x00000246C8145C10>},
                   random_state=0)

In [36]:
search.best_params_

{'max_depth': 9, 'n_estimators': 4}

In [37]:
search.score(X_test, y_test)

0.735363411343253
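
If you want to look at every candidate the search tried, not just the best one, the fitted object keeps them in `cv_results_`. The sketch below is a rough illustration that uses a small synthetic dataset from `make_regression` as a stand-in (an assumption, so it runs without downloading the California housing data). Note that `randint(low, high)` from SciPy excludes the upper bound, so `randint(1, 5)` samples from 1 to 4.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in data (assumption for this sketch, not the housing dataset)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": randint(1, 5),
                         "max_depth": randint(5, 10)},
    n_iter=5,
    random_state=0,
)
search.fit(X_train, y_train)

# One entry per sampled candidate, with its mean cross-validated score
sampled = search.cv_results_["params"]
mean_scores = search.cv_results_["mean_test_score"]
for params, score in zip(sampled, mean_scores):
    print(params, round(score, 3))
```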

Don't forget to follow us on [YouTube](http://youtube.com/tirendazacademy) | [Medium](http://tirendazacademy.medium.com) | [Twitter](http://twitter.com/tirendazacademy) | [GitHub](http://github.com/tirendazacademy) | [LinkedIn](https://www.linkedin.com/in/tirendaz-academy) | [Kaggle](https://www.kaggle.com/tirendazacademy) 😎