### Machine learning with Python

Python has many machine learning packages; the one used here is scikit-learn (imported as `sklearn`).
It is one of the most popular machine learning packages for Python and has many built-in tools and algorithms for tasks such as classification, regression, clustering, dimensionality reduction and more.

It is built on top of other Python libraries, such as NumPy and SciPy, and integrates well with the scientific Python ecosystem.

- scikit-learn provides a straightforward and consistent API, making it easy to learn and use.
- It includes an extensive selection of algorithms, such as support vector machines (SVM), random forests, k-nearest neighbors, decision trees, gradient boosting, logistic regression and more.
- It offers tools for data preprocessing: feature scaling, feature extraction and handling missing values.
- It provides tools for model evaluation, including metrics for classification, regression and clustering.
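As a rough illustration of how these pieces fit together, the sketch below chains missing-value handling, scaling and a classifier, then scores the result. The toy data and the choice of `SimpleImputer`, `StandardScaler` and `LogisticRegression` are assumptions made for this example only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features, one missing value (np.nan).
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 180.0],
              [8.0, 20.0],
              [9.0, 35.0],
              [10.0, 10.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Chain preprocessing and the classifier into a single estimator.
clf = make_pipeline(
    SimpleImputer(strategy="mean"),  # fill missing values with the column mean
    StandardScaler(),                # scale features to zero mean, unit variance
    LogisticRegression(),            # the classification algorithm
)
clf.fit(X, y)

# Evaluate with a classification metric (here on the training data itself).
y_pred = clf.predict(X)
print(accuracy_score(y, y_pred))
```

Scoring on the training data is only for brevity here; in practice the score would be computed on held-out data.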

Install it with pip:

> pip install scikit-learn

or with conda:

> conda install scikit-learn


#### ML Process
                          
Data Acquisition --> Data Cleaning --> Model Training and Building --> Model Testing --> Model Deployment

- After data cleaning, the data is split into training data and test data.
- Model testing leads either back to model training (when the results are not yet good enough) or forward to model deployment.
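The stages above can be sketched end to end as follows. This is only an illustration under several assumptions: the bundled iris dataset stands in for data acquisition, dropping rows with missing values stands in for cleaning, `KNeighborsClassifier` is an arbitrary model choice, and a `pickle` round trip stands in for deployment.

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Data acquisition: load a small dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# 2. Data cleaning: drop rows with missing values (iris has none).
mask = ~np.isnan(X).any(axis=1)
X, y = X[mask], y[mask]

# Split the cleaned data into training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 3. Model training and building.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# 4. Model testing: accuracy on data the model has not seen.
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)

# 5. Model deployment (stand-in): serialize the model and load it back.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```

If the test accuracy were too low, the loop would go back to step 3 with different parameters or a different model before deploying.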

#### Scikit-learn

In __scikit-learn__, every machine learning algorithm is exposed through the concept of __"estimators."__ 
This means that each algorithm, whether it's for classification, regression, clustering, or any other task, is represented by a class that follows a common interface and set of methods.
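A small sketch of what this common interface means in practice: three unrelated regressors are created, fitted and queried with exactly the same calls. The toy data and the particular estimators are chosen only for illustration.

```python
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data following y = x.
X = [[0], [1], [2], [3], [4], [5]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

estimators = [
    LinearRegression(),
    DecisionTreeRegressor(random_state=0),
    KNeighborsRegressor(n_neighbors=2),
]

results = {}
for est in estimators:
    est.fit(X, y)                 # same method name for every estimator
    pred = est.predict([[2.5]])   # same prediction call as well
    results[type(est).__name__] = float(pred[0])

print(results)
```

Because every estimator follows the same interface, swapping one algorithm for another usually means changing only the line that constructs it.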

1. Import the model:
from sklearn.linear_model import LinearRegression

`linear_model` is a module containing a family of linear models; `LinearRegression` is the model (estimator) class itself.

In [1]:
from sklearn.linear_model import LinearRegression

In [3]:
model = LinearRegression(fit_intercept=False)
print(model)  # only parameters that differ from their defaults are shown

LinearRegression(fit_intercept=False)


In [6]:
model = LinearRegression(copy_X=True, fit_intercept=True)
print(model)  # both values are the defaults, so nothing is listed


LinearRegression()


__Once you create your model with its parameters, you need to fit it on some data. Before that, the data should be split into training and test sets.__
Here is an example of how to do that:

In [8]:
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5,2)), range(5)
X


array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [9]:
list(y)

[0, 1, 2, 3, 4]

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.3)

X_train

array([[4, 5],
       [8, 9],
       [6, 7]])

In [11]:
y_train

[2, 4, 3]

In [13]:
X_test

array([[0, 1],
       [2, 3]])

In [14]:
y_test

[0, 1]
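Because `train_test_split` shuffles the rows, the split shown above will differ from run to run. One way to make it repeatable is to pass a fixed `random_state`; the value 42 below is arbitrary and not part of the original cells.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), list(range(5))

# The same random_state gives the same shuffle, hence the same split.
split_a = train_test_split(X, y, test_size=0.3, random_state=42)
split_b = train_test_split(X, y, test_size=0.3, random_state=42)

X_train, X_test, y_train, y_test = split_a
print(X_train.shape, X_test.shape)  # (3, 2) (2, 2)
print(np.array_equal(split_a[0], split_b[0]))  # True
```

With 5 samples and `test_size=0.3`, the test set size is rounded up to 2 rows, which leaves 3 rows for training.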

Now that we have split the data, we can train (fit) our model on the training data.
This is done using the `model.fit()` method:

> model.fit(X_train, y_train)

Now that the model has been fit on the training data, it is ready to predict labels or values for the test set.

> predictions = model.predict(X_test)

The trained model predicts values for a new set of data.
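Putting the split, fit and predict steps together, a minimal runnable sketch on the same toy data might look like this. The `random_state=0` argument is an addition for repeatability, and `y` is built with `np.arange` rather than `range` so it comes back as an array.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same toy data as above: y equals X[:, 0] / 2 exactly.
X, y = np.arange(10).reshape((5, 2)), np.arange(5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)          # learn from the training data only

predictions = model.predict(X_test)  # predict values for unseen rows
print(predictions)
print(y_test)
```

Since the target is an exact linear function of the features, the predictions should match `y_test` up to floating-point rounding; real data will not be this clean.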

Once the model can predict labels, we evaluate it by comparing its predictions to the correct values. The evaluation method depends on what sort of algorithm we are using (e.g. regression, clustering, classification).
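To make the dependence on task type concrete, here is a small sketch using one common metric per task from `sklearn.metrics`. The true and predicted values are made up for illustration, and these metrics are examples rather than the only options.

```python
from sklearn.metrics import (
    accuracy_score,
    adjusted_rand_score,
    mean_squared_error,
    r2_score,
)

# Regression: compare continuous predictions to the true values.
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.5, 5.0, 7.5, 9.0]
mse = mean_squared_error(y_true_reg, y_pred_reg)  # 0.125
r2 = r2_score(y_true_reg, y_pred_reg)             # 0.975

# Classification: fraction of labels predicted correctly.
y_true_clf = [0, 1, 1, 0, 1]
y_pred_clf = [0, 1, 0, 0, 1]
acc = accuracy_score(y_true_clf, y_pred_clf)      # 0.8

# Clustering: cluster ids are arbitrary, so compare groupings, not ids.
labels_true = [0, 0, 1, 1]
labels_pred = [1, 1, 0, 0]
ari = adjusted_rand_score(labels_true, labels_pred)  # 1.0

print(mse, r2, acc, ari)
```

The clustering case shows why the metric must match the task: the predicted ids are all "wrong" as labels, yet the grouping is identical, so the adjusted Rand index is perfect.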


__Some methods available on estimators:__

1. model.predict_proba():
    - used for classification problems.
    - returns the probability of each class label for a new observation.
    - model.predict() returns the label with the highest probability.

2. model.score():
    - for classification and regression problems.
    - for classifiers it returns the mean accuracy, which ranges from 0 to 1; for regressors it returns R², which is at most 1 and can be negative for a poor fit.
    - the larger the score, the better the fit.
    
3. model.predict():
    - in unsupervised models
    - predicts cluster labels in clustering algorithms.
    
4. model.transform():
    - in unsupervised models
    - transforms new data into the new basis.
    - accepts one argument, X_new, and returns the new representation of the data based on the unsupervised model.
    
5. model.fit_transform():
   - in unsupervised models.
   - performs a fit and a transform on the same input data.
   - can be more efficient than calling fit() and transform() separately.
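The methods listed above can be tried on a small example. The sketch below uses made-up data and picks `LogisticRegression`, `KMeans` and `PCA` purely as representative estimators; other estimators expose the same methods where they apply.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two well-separated groups of points.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.3]])
y = np.array([0, 0, 0, 1, 1, 1])

# predict_proba / predict / score on a classifier.
clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)   # one row per sample, one column per class
labels = clf.predict(X)        # the class with the highest probability
accuracy = clf.score(X, y)     # mean accuracy for classifiers

# predict on a clustering model: assigns new points to learned clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.predict(np.array([[0.1, 0.1], [5.0, 5.0]]))

# fit_transform / transform on a dimensionality-reduction model.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)                       # fit and transform X
X_new_reduced = pca.transform(np.array([[1.0, 1.0]]))  # reuse the fitted basis

print(proba.shape, accuracy, cluster_ids, X_reduced.shape, X_new_reduced.shape)
```

Note that `transform` on new data reuses what was learned during fitting, so new points land in the same basis as the training data.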
