## Contributing sklearn's decomposition SVD to mlsquare

**Fork the mlsquare repository to your account and clone it.**

**Or just clone https://github.com/mlsquare/mlsquare.git**

* Navigate to the `src/mlsquare/architectures` folder, where the code mapping `TruncatedSVD()` to `tf.linalg.svd()` resides.
* The code mapping the primal model (SVD) to its TF equivalent is in the `sklearn.py` file.

**The following notebook serves as a walkthrough/tutorial.**
* A tutorial on how to contribute new methods to the mlsquare framework.
* A walkthrough for evaluating results with the contributed SVD method.

### 1. Register the proxy SVD model in `mlsquare/architectures/sklearn.py` as follows


In [None]:
#from ..base import registry, BaseModel
from mlsquare.base import registry, BaseModel
from mlsquare.adapters.sklearn import SklearnKerasRegressor, SklearnTfTransformer
from mlsquare.architectures.sklearn import GeneralizedLinearModel
from mlsquare.utils.functions import _parse_params

from abc import abstractmethod
import numpy as np
import tensorflow as tf
import pandas

class DimensionalityReductionModel:
    """
    A base class for all matrix decomposition models.

    This class can be used as a base class for any dimensionality reduction model.
    When subclassing, ensure all required methods are implemented or overridden.
    Please refer to the sklearn decomposition module for more details.

    Methods
    -------
    fit(input_args)
        Fits the model and computes the singular value decomposition.
        Returns the fitted object, which can then be used to transform data.

    fit_transform(input_args)
        Fits the model and returns the input values with reduced dimensions.
    """
    #@abstractmethod
    #def fit(self, X, y= None, **kwargs):
    #    """Needs Implementation in sub classes"""

    @abstractmethod
    def fit_transform(self, X, y=None, **kwargs):
        """Needs Implementation in sub classes"""

@registry.register
class SVD(DimensionalityReductionModel, GeneralizedLinearModel):
    def __init__(self):
        self.adapter = SklearnTfTransformer
        self.module_name = 'sklearn'
        self.name = 'TruncatedSVD'
        self.version = 'default'
        model_params = {'full_matrices': False, 'compute_uv': True, 'name':None}
        self.set_params(params=model_params, set_by='model_init')

    def fit(self, X, y=None, **kwargs):
        self.fit_transform(X)
        return self

    def fit_transform(self, X, y=None,**kwargs):
        model_params= _parse_params(self._model_params, return_as='nested')

        # Convert to a float array, accommodating DataFrame and NumPy inputs:
        # keep float32 as is, cast everything else to float64.
        X = X.values if isinstance(X, pandas.DataFrame) else np.asarray(X)
        X = X.astype(np.float32 if X.dtype == np.float32 else np.float64)

        n_components= self.primal.n_components#using primal attributes passed from adapter
        n_features = X.shape[1]

        if n_components>= n_features:
                raise ValueError("n_components must be < n_features;"
                                 " got %d >= %d" % (n_components, n_features))

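        sess = tf.Session()  # for TF 1.13
        s, u, v = sess.run(tf.linalg.svd(X, full_matrices=model_params['full_matrices'], compute_uv=model_params['compute_uv']))  # for TF 1.13
        # s: singular values
        # u: left singular vectors (normalised projections of the rows of X)
        # v: right singular vectors (orthogonal projection axes), stored as columns;
        #    tf.linalg.svd returns v, not its transpose, so that X = u @ diag(s) @ v.T

        self.components_ = v[:, :n_components].T  # rows are the leading right singular vectors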
        X_transformed = u[:,:n_components] * s[:n_components]

        self.explained_variance_= np.var(X_transformed, axis=0)
        self.singular_values_ = s[:n_components]

        #passing sigma & vh to adapter for subsequent access from adapter object itself.
        model_params={'singular_values_':self.singular_values_,'components_':self.components_}
        self.update_params(model_params)

        return X_transformed

    def transform(self, X):
        return np.dot(X, self.components_.T)

    def inverse_transform(self, X):
        return np.dot(X, self.components_)
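
The relationship between `fit_transform`, `transform` and `inverse_transform` in the class above can be checked with a small NumPy-only sketch. It is an illustration on random toy data, not part of mlsquare: the array sizes and variable names are assumptions. Note that `np.linalg.svd` returns `Vh` (the transpose of V), whereas `tf.linalg.svd` returns `v` itself, so the two conventions differ by a transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))   # toy data: 50 samples, 8 features (assumed sizes)
k = 3                          # number of components to keep

# NumPy returns Vh; its rows are the right singular vectors.
U, s, Vh = np.linalg.svd(X, full_matrices=False)

components = Vh[:k, :]                 # shape (k, n_features), like components_
X_fit_transformed = U[:, :k] * s[:k]   # what fit_transform computes
X_projected = X @ components.T         # what transform computes

# Both routes give the same reduced representation.
same_projection = np.allclose(X_fit_transformed, X_projected)

# inverse_transform maps back to feature space. With k < n_features the
# result is the best rank-k approximation, not the exact original, and the
# Frobenius error equals the norm of the discarded singular values.
X_restored = X_projected @ components
residual = np.linalg.norm(X - X_restored)
expected_residual = np.sqrt(np.sum(s[k:] ** 2))

print(same_projection, np.isclose(residual, expected_residual))
```

The second check is why `inverse_transform` only approximates the original input once components have been truncated.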


### 2. Define a new adapter `SklearnTfTransformer` in `mlsquare/adapters/sklearn.py` that maps `sklearn.decomposition.TruncatedSVD` to `tensorflow.linalg.svd`, then work with the usual sklearn methods
* The adapter serves as a wrapper that delegates operations to the underlying `proxy_model` defined in the architecture.

In [None]:
from mlsquare.utils.functions import _parse_params
import numpy as np
from ..architectures import sklearn

class SklearnTfTransformer():
    """
    Adapter to connect sklearn decomposition methods to their respective TF implementations.

    This class can be used as an adapter for primal decomposition methods that can
    utilise a TF backend for the proxy model.

    Parameters
    ----------
    proxy_model : proxy model instance
        The proxy model passed from dope.

    primal_model : primal model instance
        The primal model passed from dope.

    params : dict, optional
        Additional model params passed by the user.

    Methods
    -------
    fit(X, y)
        Fits the transpiled model.

    transform(X)
        Transforms the input matrix to the truncated dimensions;
        available only after the decomposition has been computed.

    fit_transform(X)
        Fits the model and directly transforms the input matrix to the truncated dimensions.

    inverse_transform(X)
        Maps reduced data back to the original feature space using the
        decomposed matrices (an approximation when components are truncated).
    """

    def __init__(self, proxy_model, primal_model, **kwargs):
        self.primal_model = primal_model
        self.proxy_model = proxy_model
        self.proxy_model.primal = self.primal_model
        #self.proxy_model(primal_model)#to access proxy_model.n_components
        self.params = None

    def fit(self, X, y=None, **kwargs):
        self.proxy_model.X = X
        self.proxy_model.y = y

        if self.params is not None:  # TODO: validate implementation with different types of tune input
            if not isinstance(self.params, dict):
                raise TypeError("Params should be of type 'dict'")
            self.params = _parse_params(self.params, return_as='flat')
            self.proxy_model.update_params(self.params)

        #if self.proxy_model.__class__.__name in ['SVD', 'PCA']:
        if isinstance(self.proxy_model, (sklearn.DimensionalityReductionModel)):
            self.fit_transform(X)

            self.params = self.proxy_model.get_params()
            #to avoid calling model.fit(X).proxy_model for sigma & Vh
            self.components_= self.params['components_']
            self.singular_values_= self.params['singular_values_']
            return self

    def transform(self, X):
        if not isinstance(self.proxy_model, (sklearn.DimensionalityReductionModel)):
            raise AttributeError("'SklearnTfTransformer' object has no attribute 'transform'")
        return self.proxy_model.transform(X)

    def fit_transform(self, X,y=None):
        if not isinstance(self.proxy_model, (sklearn.DimensionalityReductionModel)):
            raise AttributeError("'SklearnTfTransformer' object has no attribute 'fit_transform'")
        self.proxy_model.primal = self.primal_model
        return self.proxy_model.fit_transform(X)

    def inverse_transform(self, X):
        if not isinstance(self.proxy_model, (sklearn.DimensionalityReductionModel)):
            raise AttributeError("'SklearnTfTransformer' object has no attribute 'inverse_transform'")
        return self.proxy_model.inverse_transform(X)
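
The delegation pattern used by the adapter can be reduced to a few lines. The sketch below is a toy illustration in plain Python: `ToyProxy` and `ToyAdapter` are hypothetical names, and the "model" merely rescales a list. It shows the two ideas the adapter relies on: forwarding calls to the proxy, and returning `self` from `fit` so that calls can be chained.

```python
class ToyProxy:
    """Hypothetical stand-in for an architecture-side model."""

    def fit_transform(self, X):
        self.scale_ = max(abs(v) for v in X)   # "learned" state
        return [v / self.scale_ for v in X]

    def transform(self, X):
        return [v / self.scale_ for v in X]


class ToyAdapter:
    """Hypothetical adapter: forwards calls and mirrors the fitted state."""

    def __init__(self, proxy_model):
        self.proxy_model = proxy_model

    def fit(self, X):
        self.proxy_model.fit_transform(X)
        self.scale_ = self.proxy_model.scale_   # expose fitted attribute on the adapter
        return self                             # returning self enables chaining

    def transform(self, X):
        return self.proxy_model.transform(X)


adapter = ToyAdapter(ToyProxy())
result = adapter.fit([1.0, -4.0, 2.0]).transform([2.0, 8.0])
print(result)  # [0.5, 2.0]
```

Copying the fitted attribute onto the adapter mirrors how `SklearnTfTransformer.fit` exposes `components_` and `singular_values_` without requiring the user to reach into `proxy_model`.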


**Registered methods so far:**

In [1]:
from mlsquare.base import registry
registry.data



{('sklearn',
  'TruncatedSVD'): {'default': [<mlsquare.architectures.sklearn.SVD at 0x7fe568ff5ba8>,
   mlsquare.adapters.sklearn.SklearnTfTransformer]},
 ('sklearn',
  'LogisticRegression'): {'default': [<mlsquare.architectures.sklearn.LogisticRegression at 0x7fe568f870b8>,
   mlsquare.adapters.sklearn.SklearnKerasClassifier]},
 ('sklearn',
  'LinearRegression'): {'default': [<mlsquare.architectures.sklearn.LinearRegression at 0x7fe568f87278>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'Ridge'): {'default': [<mlsquare.architectures.sklearn.Ridge at 0x7fe568f87438>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'Lasso'): {'default': [<mlsquare.architectures.sklearn.Lasso at 0x7fe568f875f8>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'ElasticNet'): {'default': [<mlsquare.architectures.sklearn.ElasticNet at 0x7fe568f877f0>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'LinearSVC'): {'defaul
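
The printed structure suggests that the registry maps a `(module_name, name)` pair to a dictionary of versions, each holding a proxy model instance and an adapter class. A minimal sketch of a decorator-based registry that produces the same shape is shown below. `ToyRegistry`, `PlaceholderAdapter` and `ToySVD` are hypothetical names used for illustration only; this is an assumption about the mechanism, not mlsquare's actual implementation.

```python
class ToyRegistry:
    """Hypothetical registry keyed by (module_name, name), then by version."""

    def __init__(self):
        self.data = {}

    def register(self, cls):
        model = cls()  # instantiate once so its attributes can be read
        key = (model.module_name, model.name)
        self.data.setdefault(key, {})[model.version] = [model, model.adapter]
        return cls     # return the class unchanged so it stays usable


toy_registry = ToyRegistry()


class PlaceholderAdapter:
    """Hypothetical adapter class stored alongside the proxy model."""


@toy_registry.register
class ToySVD:
    def __init__(self):
        self.adapter = PlaceholderAdapter
        self.module_name = 'sklearn'
        self.name = 'TruncatedSVD'
        self.version = 'default'


proxy, adapter_cls = toy_registry.data[('sklearn', 'TruncatedSVD')]['default']
print(type(proxy).__name__, adapter_cls.__name__)  # ToySVD PlaceholderAdapter
```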

**Once the new model is registered and its adapter is defined in the mlsquare framework, it can be used through `dope` as described below.**
#### User interaction with `dope`: using sklearn's SVD interface on top of the underlying TF SVD

    

1. a) The user instantiates a primal model, `sklearn.decomposition.TruncatedSVD`, with the argument `n_components` set to the number of required singular components.

   b) The user loads the data and proceeds with the necessary data preparation steps.

2. Import `dope` from mlsquare and `dope` the primal model by passing it to the `dope` function. The `dope` function equips the primal model with the standard sklearn methods: `fit`, `fit_transform`, `save`, `explain`.

3. Carry on with the usual sklearn SVD methods: try out `.fit( )`, `.fit_transform( )` and `.transform( )` with the doped model.

### 1.a Instantiate primal module
* n_components: 10 (number of reduced dimensions)

In [2]:
import numpy as np
from sklearn.decomposition import TruncatedSVD

primal = TruncatedSVD(n_components=10)

In [3]:
primal.get_params()

{'algorithm': 'randomized',
 'n_components': 10,
 'n_iter': 5,
 'random_state': None,
 'tol': 0.0}

### 1.b Data preparation steps required before fitting the SVD model
* The regression results are also evaluated at various stages with varying dimensionality.

In [4]:
import os
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.preprocessing import LabelEncoder


import pandas as pd
reg= linear_model.LinearRegression()

# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
boston = load_boston()
df_x = pd.DataFrame(boston.data, columns=boston.feature_names)
lbe = LabelEncoder()
df_x = df_x.apply(lambda x: lbe.fit_transform(x))  # label-encode each column
df_y = pd.DataFrame(boston.target)


print('original df_x dims:', df_x.shape)
xtrain, xtest, ytrain, ytest = train_test_split(df_x, df_y, test_size=0.2)

original df_x dims: (506, 13)


In [5]:
df_x.head(3)

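|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|---|-------|
| 0 | 0 | 3 | 19 | 0 | 51 | 320 | 172 | 297 | 0 | 34 | 9 | 356 | 53 |
| 1 | 23 | 0 | 56 | 0 | 36 | 279 | 225 | 333 | 1 | 11 | 23 | 356 | 161 |
| 2 | 22 | 0 | 56 | 0 | 36 | 400 | 159 | 333 | 1 | 11 | 23 | 271 | 28 |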


* **Validating results with full dimensionality.**

In [6]:
reg= linear_model.LinearRegression()
reg.fit(xtrain, ytrain)
print(reg.score(xtest, ytest))

0.7102619689909073


* **Validating results with reduced dimensionality through primal model.**

In [7]:
skl_truncated_x = primal.fit(df_x).transform(df_x)

xtrain, xtest, ytrain, ytest = train_test_split(skl_truncated_x, df_y, test_size=0.2)
print('sklearn_svd truncated dims:', skl_truncated_x.shape)
reg= linear_model.LinearRegression()
reg.fit(xtrain, ytrain)
print(reg.score(xtest, ytest))

sklearn_svd truncated dims: (506, 10)
0.7406353348247017


### 2. `dope` the model to obtain the TF-backed SVD

In [8]:
from mlsquare import dope

model = dope(primal)# adapter(proxy_model=proxy_model, primal_model=primal)

Transpiling your model to it's Deep Neural Network equivalent...


In [9]:
print('proxy model object from registry:\n', model.proxy_model, '\n\ncorresponding adapter:\n', model)

proxy model object from registry:
 <mlsquare.architectures.sklearn.SVD object at 0x7fe568ff5ba8> 

corresponding adapter:
 <mlsquare.adapters.sklearn.SklearnTfTransformer object at 0x7fe5402cab38>


In [10]:
??model

### 3. Try out the sklearn methods `.fit( )`, `.transform( )` and `.fit_transform( )` to obtain reduced dimensionality, using sklearn's Boston dataset from `1.b` above
* The doped model can be fitted with either a DataFrame or a NumPy array as input.

In [11]:
inp= np.array(df_x.values, dtype= np.float64)

#dope_truncated_x=model.fit_transform(df_x) #takes in dataframe input
dope_truncated_x= model.fit_transform(inp)

dope_truncated_x.shape
#dimensionality reduced to n_components using tf.linalg.svd

(506, 10)

* **Validating results with reduced dimensionality through the doped model, and checking that the underlying TF method gives approximately faithful results.**

In [12]:
#truncated_x= model.fit(df_x).fit_transform(df_x)
xtrain, xtest, ytrain, ytest = train_test_split(dope_truncated_x, df_y, test_size=0.2)

print('doped_svd truncated dims:', dope_truncated_x.shape)

reg= linear_model.LinearRegression()
reg.fit(xtrain, ytrain)
print(reg.score(xtest, ytest))

doped_svd truncated dims: (506, 10)
0.7128582648403458
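
The three scores above each come from a separate `train_test_split` call without a fixed `random_state`, so part of the difference between them may reflect the random split rather than the features. One way to make such comparisons like-for-like is to reuse a single split for every feature set (in sklearn, passing the same `random_state=` to each `train_test_split` call has this effect). The NumPy-only sketch below illustrates the idea on synthetic data; the data, sizes and the helper `r2_on_split` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 13))                        # synthetic features
y = X @ rng.normal(size=13) + 0.1 * rng.normal(size=200)

# Reduced representation from an exact SVD, keeping 10 components.
U, s, Vh = np.linalg.svd(X, full_matrices=False)
X_reduced = U[:, :10] * s[:10]

# One fixed split, reused for every feature set being compared.
perm = rng.permutation(len(X))
train, test = perm[:160], perm[160:]

def r2_on_split(features):
    """Fit least squares with an intercept on `train`, return R^2 on `test`."""
    A_train = np.column_stack([features[train], np.ones(len(train))])
    A_test = np.column_stack([features[test], np.ones(len(test))])
    coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    pred = A_test @ coef
    ss_res = np.sum((y[test] - pred) ** 2)
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return 1.0 - ss_res / ss_tot

r2_full = r2_on_split(X)
r2_reduced = r2_on_split(X_reduced)
print(f"full: {r2_full:.3f}, reduced: {r2_reduced:.3f}")
```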


____________________________

### Remarks on accessing/evaluating adapter methods & attributes

**Note:**
* `model` : adapter/`SklearnTfTransformer` object (Cell#8).
* `primal` : sklearn/`TruncatedSVD` object (instantiated in Cell#2, fitted in Cell#7).
* `proxy_model` : architecture/`SVD` object (Cell#8).


* `model.fit` implicitly calls the adapter's `.fit_transform`, which routes to the architecture's `proxy_model.fit_transform()`, where `components_`, `singular_values_` and `X_transformed` are computed.
* `model.fit` returns only the adapter object.

In [13]:
model.fit(inp)

<mlsquare.adapters.sklearn.SklearnTfTransformer at 0x7fe5402cab38>

* In sklearn, `primal_model.fit()` computes the model's intrinsic state: the attributes `components_` (Vh) and `singular_values_` (Sigma), along with the truncated input values.
* After `.fit( )`, sklearn lets the user read Sigma and Vh as `primal_model.singular_values_` and `primal_model.components_` respectively; the same should hold after doping the primal model.

In [14]:
primal.singular_values_#Output from primal model post fit

array([12795.41792279,  5233.02454574,  2860.09836322,  2199.84308866,
        1596.72603145,  1118.21187031,   369.75399984,   304.29119115,
         245.93492619,   193.30180866])

In [15]:
model.singular_values_#Output from proxy_model post fit on dope object.

array([12795.41792279,  5233.02454574,  2860.09836322,  2199.84308866,
        1596.72603145,  1118.21187031,   369.75399984,   304.29119115,
         245.93492619,   193.30180866])
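
When comparing two SVD backends programmatically, singular values can be compared directly (for example with `np.allclose`), but the singular vectors are only determined up to sign, so `components_` from two implementations may differ by a factor of -1 per row. The NumPy sketch below illustrates this on random toy data; it is an illustration of the general property, not a statement about what sklearn or TensorFlow return for a particular input.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 6))   # toy data (assumed sizes)
U, s, Vh = np.linalg.svd(X, full_matrices=False)

# Flip the sign of the first pair of singular vectors:
# the result is still a valid SVD of the same matrix.
U_flipped, Vh_flipped = U.copy(), Vh.copy()
U_flipped[:, 0] *= -1
Vh_flipped[0, :] *= -1

same_reconstruction = np.allclose((U * s) @ Vh, (U_flipped * s) @ Vh_flipped)
same_components = np.allclose(Vh, Vh_flipped)
same_up_to_sign = np.allclose(np.abs(Vh), np.abs(Vh_flipped))

print(same_reconstruction, same_components, same_up_to_sign)  # True False True
```

A sign-insensitive check such as comparing absolute values, or comparing reconstructions, is therefore a safer way to validate components across backends.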

* Chained calls such as `model.fit()` followed by `model.transform()` mirror sklearn's `primal_model.fit(inp).transform(inp)`, where every call executes on the same sklearn object.
* The adapter serves as a wrapper around the architecture's methods for every operation required on `proxy_model`, so chained or subsequent calls are always made through the adapter object (`model`) only.
* All methods defined in the architecture (`fit`, `fit_transform`, `transform`, `inverse_transform`) should therefore be accessed through the corresponding adapter methods.

In [16]:
model.fit(inp).transform#chained call on adapter

<bound method SklearnTfTransformer.transform of <mlsquare.adapters.sklearn.SklearnTfTransformer object at 0x7fe5402cab38>>

In [17]:
model.fit_transform

<bound method SklearnTfTransformer.fit_transform of <mlsquare.adapters.sklearn.SklearnTfTransformer object at 0x7fe5402cab38>>