# Scikit-learn overview

* Scikit-learn is a library containing many machine learning algorithms.
* It uses a generalized "estimator API" (application programming interface) framework for calling the models.
* This means the way algorithms are imported, fitted, and used is uniform across all algorithms.
* This allows users to easily swap algorithms in and out and test various approaches.
* Important note: this uniform framework also means users can easily apply almost any algorithm effectively without truly understanding what the algorithm is doing.
* Scikit-learn also comes with many convenience tools, including train/test split functions, cross-validation tools, and a variety of metric functions for reporting performance.
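As a rough illustration of swapping algorithms in and out and of the built-in cross-validation tools, the sketch below assumes the bundled diabetes regression dataset (`load_diabetes`) and three arbitrarily chosen regressors. None of these choices come from the notes above; the point is only that the same loop works for every model.

```python
# Sketch: every estimator shares fit/predict/score, so models can be swapped in a loop.
# The dataset and the three regressors are illustrative choices, not a recommendation.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

models = [
    LinearRegression(),
    DecisionTreeRegressor(max_depth=3, random_state=0),
    KNeighborsRegressor(n_neighbors=5),
]

mean_scores = {}
for model in models:
    # cross_val_score fits and scores the model on 5 folds;
    # for regressors the default score is R^2
    scores = cross_val_score(model, X, y, cv=5)
    name = type(model).__name__
    mean_scores[name] = scores.mean()
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```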

**What is the Estimator API?**

The Estimator API is one of the main APIs implemented by Scikit-learn. It provides a consistent interface for a wide range of ML applications, which is why all machine learning algorithms in Scikit-learn are implemented through it. An object that learns from data (by fitting the data) is called an estimator. The same interface is used for classification, regression, and clustering algorithms, and for transformers that extract useful features from raw data.
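A minimal sketch of this shared interface, assuming the bundled iris dataset and three illustrative estimators (`StandardScaler`, `LogisticRegression`, `KMeans`): each one learns from data through `fit`, whether it is a transformer, a classifier, or a clustering model.

```python
# Sketch: a transformer, a classifier and a clustering model all learn through fit().
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Transformer: fit learns the per-feature mean and std, transform applies the scaling
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Classifier (supervised): fit takes features and labels, predict returns class labels
clf = LogisticRegression(max_iter=1000).fit(X_scaled, y)
class_labels = clf.predict(X_scaled)

# Clustering (unsupervised): fit takes only features; labels_ holds cluster assignments
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
cluster_labels = km.labels_
```

Because `fit` returns the estimator itself, calls can be chained as `Estimator(...).fit(X)`.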

**Uses of the Estimator API**

The main uses of estimators are as follows:

**Estimation and decoding of a model**

An estimator object is used for estimation and decoding of a model. The model is estimated as a deterministic function of the following:

* The parameters provided in object construction.
* The global random state (`numpy.random`) if the estimator's `random_state` parameter is set to `None`.
* Any data passed to the most recent call to `fit`, `fit_transform`, or `fit_predict`.
* Any data passed in a sequence of calls to `partial_fit`.
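The sketch below illustrates two of these points with synthetic data from `make_classification`: fixing `random_state` makes two identically configured fits agree, and `partial_fit` builds a model from a sequence of mini-batches. `RandomForestClassifier` and `SGDClassifier` are illustrative choices; `SGDClassifier` is just one of the estimators that support `partial_fit`.

```python
# Sketch: same constructor parameters + fixed random_state + same data -> same model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Two forests built with identical parameters and the same integer random_state
rf_a = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)
rf_b = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)
same_predictions = np.array_equal(rf_a.predict(X), rf_b.predict(X))

# partial_fit: the model is built incrementally from the mini-batches it has seen
sgd = SGDClassifier(random_state=0)
classes = np.unique(y)  # every class must be declared on the first partial_fit call
for X_batch, y_batch in zip(np.array_split(X, 4), np.array_split(y, 4)):
    sgd.partial_fit(X_batch, y_batch, classes=classes)
```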

**Mapping non-rectangular data representations into rectangular data**

An estimator can map a non-rectangular data representation into rectangular data. In simple words, it takes input where each sample is not represented as an array-like object of fixed length and produces an array-like object of features for each sample.

**Distinction between core and outlying samples**

It models the distinction between core and outlying samples by using the following methods:

* `fit`
* `fit_predict`, if transductive
* `predict`, if inductive
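As a sketch, `LocalOutlierFactor` (chosen here for illustration, on a tiny made-up 1-D dataset) supports both modes. By default it is transductive and labels only the samples it was fitted on via `fit_predict`; with `novelty=True` it becomes inductive, so `predict` can label unseen samples. In both cases `1` marks a core (inlier) sample and `-1` an outlier.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X = np.array([[0.0], [0.1], [0.2], [0.3], [10.0]])

# Transductive: fit_predict labels the same samples the model was fitted on
lof = LocalOutlierFactor(n_neighbors=2)
transductive_labels = lof.fit_predict(X)

# Inductive: with novelty=True, fit on training data, then predict on new samples
lof_novelty = LocalOutlierFactor(n_neighbors=2, novelty=True)
lof_novelty.fit(X[:4])  # fit on the core samples only
inductive_labels = lof_novelty.predict(np.array([[0.15], [10.0]]))
```

With `novelty=True`, `fit_predict` is unavailable, and with the default `novelty=False`, `predict` is unavailable, so the choice of mode is made at construction time.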

# Philosophy of Scikit-Learn

* Scikit-Learn's approach to model building focuses on applying models and performance metrics.
* Academic users accustomed to p-value style reporting may also want to explore the statsmodels Python library if they are interested in a more statistical description of models, such as significance levels.

# Supervised Machine Learning Process

We first perform a train/test split of the data for supervised learning, using the following code:

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y)
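A slightly fuller version of the split, on made-up toy arrays: `test_size` controls the fraction of rows held out, and `random_state` fixes the shuffle so that the split is reproducible. The values 0.25 and 101 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 samples with 2 features each (row i is [2i, 2i+1]) and target y[i] = i
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Hold out 25% of the rows as a test set; random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=101)
print(X_train.shape, X_test.shape)
```

The rows of `X` and `y` are shuffled together, so `y_train[i]` is still the target of `X_train[i]`.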

Import the model you want to use from its model family in scikit-learn:

    # model_family and ModelAlgo are placeholders, e.g. sklearn.linear_model and LinearRegression
    from sklearn.model_family import ModelAlgo

Create an instance of the model, passing any hyperparameters:

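    # Hyperparameters are passed as keyword arguments (param1/value1 etc. are placeholders)
    mymodel = ModelAlgo(param1=value1, param2=value2)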

Train (fit) the model on the training data:

    mymodel.fit(X_train, y_train)
    
Use the trained model to predict on the test features:

    predictions = mymodel.predict(X_test)
    
Evaluate the model by comparing its predictions with the true test values, using R2 or an error metric:

    # error_metric is a placeholder, e.g. mean_absolute_error, mean_squared_error or r2_score
    from sklearn.metrics import error_metric
    performance = error_metric(y_test, predictions)
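Putting the steps together, here is one possible end-to-end sketch. It assumes `LinearRegression` in place of `ModelAlgo`, the bundled diabetes dataset, and MAE, RMSE, and R2 as the metrics; any other regressor or metric follows the same pattern.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# 1. Load the data and split it into training and test sets
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

# 2. Create an instance of the model (LinearRegression stands in for ModelAlgo)
mymodel = LinearRegression()

# 3. Train the model on the training data only
mymodel.fit(X_train, y_train)

# 4. Predict on the held-out test features
predictions = mymodel.predict(X_test)

# 5. Evaluate: each metric takes (y_true, y_pred) in that order
mae = mean_absolute_error(y_test, predictions)
rmse = mean_squared_error(y_test, predictions) ** 0.5  # root mean squared error
r2 = r2_score(y_test, predictions)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```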