### Contributing sklearn's decomposition SVD to mlsquare

**Fork the mlsquare repository to your account and clone it.**

**Or just clone https://github.com/mlsquare/mlsquare.git**

* Navigate to the `src/mlsquare/architectures` folder, where the code for mapping `TruncatedSVD()` to `tf.linalg.svd()` resides.
* The code for mapping the primal model (SVD) to its TF equivalent is saved in the `sklearn.py` file.

In [1]:
import os
os.getcwd()

'/home/kev/Desktop/mlsquare/src'

In [2]:
import tensorflow as tf
tf.__version__

'1.13.1'

#### 1. Register the proxy SVD model in `mlsquare/architectures/sklearn.py` as follows

In [3]:
# Inside the package, use the relative import: from ..base import registry, BaseModel
from mlsquare.base import registry, BaseModel
from mlsquare.adapters.sklearn import SklearnKerasDecompose
from mlsquare.architectures.sklearn import GeneralizedLinearModel


@registry.register
class SVD(GeneralizedLinearModel):
    def __init__(self):
        # Adapter that exposes sklearn-style methods on top of tf.linalg.svd
        self.adapter = SklearnKerasDecompose
        # (module_name, name) forms the registry key for the primal model
        self.module_name = 'sklearn'
        self.name = 'TruncatedSVD'
        self.version = 'default'
        # Keyword arguments of tf.linalg.svd
        model_params = {'full_matrices': False,
                        'compute_uv': True,
                        'name': None}
        self.set_params(params=model_params, set_by='model_init')

    def create_model(self, **kwargs):
        # Nothing to build here: the decomposition is computed in the adapter
        pass

Using TensorFlow backend.
2019-11-20 14:02:24,075	INFO node.py:423 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-11-20_14-02-24_5650/logs.
2019-11-20 14:02:24,202	INFO services.py:363 -- Waiting for redis server at 127.0.0.1:24890 to respond...
2019-11-20 14:02:24,323	INFO services.py:363 -- Waiting for redis server at 127.0.0.1:34796 to respond...
2019-11-20 14:02:24,326	INFO services.py:760 -- Starting Redis shard with 20.0 GB max memory.
2019-11-20 14:02:24,347	INFO services.py:1384 -- Starting the Plasma object store with 1.0 GB memory using /dev/shm.


#### 2. Define an adapter `SklearnKerasDecompose` in `mlsquare/adapters/sklearn.py` that maps `sklearn.decomposition.TruncatedSVD` to `tensorflow.linalg.svd` and exposes sklearn-style methods

In [4]:
from mlsquare.utils.functions import _parse_params
import numpy as np
import tensorflow as tf


class SklearnKerasDecompose():
    def __init__(self, proxy_model, primal_model, **kwargs):
        self.primal_model = primal_model
        self.params = None  # Temporary!
        self.proxy_model = proxy_model

    def fit(self, X, y=None, **kwargs):
        self.fit_transform(X)
        return self

    def fit_transform(self, X, y=None, **kwargs):
        kwargs.setdefault('full_matrices', False)
        kwargs.setdefault('params', self.params)
        kwargs.setdefault('space', False)
        kwargs.setdefault('compute_uv', True)
        kwargs.setdefault('name', None)
        self.params = kwargs['params']
        X = np.array(X)

        n_components = self.primal_model.n_components
        self.proxy_model.num_components = n_components

        # TF 1.13: the op has to be evaluated inside a session
        with tf.Session() as sess:
            s, u, v = sess.run(tf.linalg.svd(X))
        # s: singular values
        # u: left singular vectors (normalised projection distances)
        # v: right singular vectors stored as columns (orthogonal projection axes),
        #    so that X = u @ diag(s) @ v.T

        # Analogous to TruncatedSVD().components_ (the Vh factor of randomized SVD).
        # tf.linalg.svd returns v rather than its transpose, hence columns + transpose.
        self.proxy_model.components_ = v[:, :n_components].T
        X_transformed = u[:, :n_components] * s[:n_components]
        # Store the n_components leading singular values
        self.singular_values_ = s[:n_components]
        return X_transformed

    def explained_variance_(self):
        print('Method not implemented yet!')

    def explained_variance_ratio_(self):
        print('Method not implemented yet!')

    def explain(self, **kwargs):
        # @param: SHAP or interpret
        print('Coming soon...')
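
The relationship between the SVD factors and the quantities the adapter stores can be checked with a minimal NumPy sketch. It is an illustration under assumptions, not part of mlsquare: the matrix `X` and the value `k` are made up, and `np.linalg.svd` stands in for `tf.linalg.svd`. The one difference to keep in mind is that NumPy returns the transposed factor `Vh`, whereas `tf.linalg.svd` returns `V` with the right singular vectors as columns.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))  # illustrative data, not from the notebook
k = 2                            # hypothetical number of components to keep

# NumPy convention: X = U @ diag(s) @ Vh
U, s, Vh = np.linalg.svd(X, full_matrices=False)

components = Vh[:k, :]            # shape (k, n_features), rows are projection axes
X_transformed = U[:, :k] * s[:k]  # shape (n_samples, k)

# Scaling U by s is the same as projecting X onto the kept axes.
print(np.allclose(X_transformed, X @ components.T))

# Emulate the TF convention (V instead of Vh): the axes are then columns of V.
V = Vh.T
print(np.allclose(V[:, :k].T, components))
```

Both checks print `True`, which is why the leading components have to be taken from the columns of `v` when the factor comes from `tf.linalg.svd`.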

* Registered models so far:

In [5]:
from mlsquare.base import registry
registry.data

{('sklearn',
  'xyz'): {'default': [<mlsquare.architectures.sklearn.SVD_1 at 0x7fc68d0309b0>,
   mlsquare.adapters.sklearn.SklearnKerasDecompose_1]},
 ('sklearn',
  'TruncatedSVD'): {'default': [<__main__.SVD at 0x7fc69ba24be0>,
   mlsquare.adapters.sklearn.SklearnKerasDecompose]},
 ('sklearn',
  'LogisticRegression'): {'default': [<mlsquare.architectures.sklearn.LogisticRegression at 0x7fc6ec451fd0>,
   mlsquare.adapters.sklearn.SklearnKerasClassifier]},
 ('sklearn',
  'LinearRegression'): {'default': [<mlsquare.architectures.sklearn.LinearRegression at 0x7fc6ec45e1d0>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'Ridge'): {'default': [<mlsquare.architectures.sklearn.Ridge at 0x7fc6ec45e390>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'Lasso'): {'default': [<mlsquare.architectures.sklearn.Lasso at 0x7fc6ec45e550>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'ElasticNet'): {'default': [<mlsquare.architectures.s
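
The printed structure suggests that the registry maps a `(module_name, name)` key to a dictionary of versions, each holding a proxy model instance and its adapter class. A minimal sketch of such a registry is given below. It is a hypothetical stand-in written only to illustrate the lookup structure, not mlsquare's actual implementation; `Registry`, `lookup`, `DummySVD` and `DummyAdapter` are invented names.

```python
class Registry:
    """Hypothetical stand-in for a model registry; not mlsquare's code."""

    def __init__(self):
        self.data = {}

    def register(self, cls):
        # Instantiate once so the identifying attributes can be read.
        model = cls()
        key = (model.module_name, model.name)
        self.data.setdefault(key, {})[model.version] = [model, model.adapter]
        return cls

    def lookup(self, module_name, name, version='default'):
        return self.data[(module_name, name)][version]


registry = Registry()


class DummyAdapter:
    """Placeholder adapter used only in this illustration."""


@registry.register
class DummySVD:
    def __init__(self):
        self.adapter = DummyAdapter
        self.module_name = 'sklearn'
        self.name = 'TruncatedSVD'
        self.version = 'default'


proxy, adapter = registry.lookup('sklearn', 'TruncatedSVD')
print(type(proxy).__name__, adapter.__name__)
```

Keying on the primal model's module and class name is what would let a function like `dope` find the right proxy and adapter from nothing but the object the user passes in.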

**Once the new model is registered and the corresponding adapter is defined in the mlsquare framework:**

#### User interaction with `dope`, using the sklearn SVD interface backed by the underlying TF SVD

1. a) Instantiate a primal model `sklearn.decomposition.TruncatedSVD`, with `n_components` set to the number of singular components required.
   b) Load the data and carry out the necessary data preparation steps.

2. Import `dope` from mlsquare and pass the primal model to it. The `dope` function equips the primal model with the standard sklearn methods `fit`, `fit_transform`, `save` and `explain`.

3. Carry on with the usual sklearn SVD methods: try `.fit()` and `.fit_transform()` on the doped model.

#### 1.a Instantiate the primal model
* n_components: 10 (number of reduced dimensions)

In [6]:
from sklearn.decomposition import TruncatedSVD

primal = TruncatedSVD(n_components=10)

In [7]:
primal.get_params()

{'algorithm': 'randomized',
 'n_components': 10,
 'n_iter': 5,
 'random_state': None,
 'tol': 0.0}

#### 1.b Data preparation steps before fitting an SVD model
* Regression results are also evaluated at various stages, with varying dimensionality.

In [8]:
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older release
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.preprocessing import LabelEncoder
import pandas as pd

reg = linear_model.LinearRegression()

boston = load_boston()
df_x = pd.DataFrame(boston.data, columns=boston.feature_names)
# Replace every feature value by its integer label within the column
lbe = LabelEncoder()
df_x = df_x.apply(lambda x: lbe.fit_transform(x))
df_y = pd.DataFrame(boston.target)

xtrain, xtest, ytrain, ytest = train_test_split(df_x, df_y, test_size=0.2)
print(xtrain.shape, xtest.shape)

(404, 13) (102, 13)


In [9]:
df_x.head(5)

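|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|---|-------|
| 0 | 0 | 3 | 19 | 0 | 51 | 320 | 172 | 297 | 0 | 34 | 9 | 356 | 53 |
| 1 | 23 | 0 | 56 | 0 | 36 | 279 | 225 | 333 | 1 | 11 | 23 | 356 | 161 |
| 2 | 22 | 0 | 56 | 0 | 36 | 400 | 159 | 333 | 1 | 11 | 23 | 271 | 28 |
| 3 | 32 | 0 | 16 | 0 | 33 | 383 | 112 | 361 | 2 | 5 | 31 | 311 | 6 |
| 4 | 110 | 0 | 16 | 0 | 33 | 395 | 139 | 361 | 2 | 5 | 31 | 356 | 64 |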


* Validating results with full dimensionality.

In [10]:
reg= linear_model.LinearRegression()
reg.fit(xtrain, ytrain)
print(reg.score(xtest, ytest))

0.7718433212036813


* Validating results with reduced dimensionality through primal model.

In [11]:
skl_truncated_x = primal.fit(df_x).transform(df_x)

xtrain, xtest, ytrain, ytest = train_test_split(skl_truncated_x, df_y, test_size=0.2)
print('sklearn_svd truncated dims:', skl_truncated_x.shape)
reg= linear_model.LinearRegression()
reg.fit(xtrain, ytrain)
print(reg.score(xtest, ytest))

sklearn_svd truncated dims: (506, 10)
0.7610344819179395


#### 2. Dope the primal model to obtain its TF-backed SVD equivalent

In [12]:
from mlsquare import dope

model = dope(primal)# adapter(proxy_model=proxy_model, primal_model=primal)

Transpiling your model to it's Deep Neural Network equivalent...


In [13]:
print('proxy model object from registry:\n', model.proxy_model)

proxy model object from registry:
 <__main__.SVD object at 0x7fc69ba24be0>


#### 3. Try out the sklearn methods `.fit( )` and `.fit_transform( )` to obtain reduced dimensionality.

In [14]:
a= np.random.randn(5,4)
print('input arr a:\n',a,'\n\nshape of a:', a.shape, '\n\ndtype of a:', a.dtype)

input arr a:
 [[-0.37185829  0.92336504  1.5171664   1.18310932]
 [ 0.33509612 -0.25202299  2.27541814 -0.01449869]
 [-1.53221078 -0.50267072 -1.10473735  0.46878196]
 [-0.90897815  0.47738812 -1.0676476  -2.14099275]
 [-1.00401905  0.38661532  0.55889188 -0.32992535]] 

shape of a: (5, 4) 

dtype of a: float64


In [15]:
tf_truncated_x= model.fit_transform(a)
tf_truncated_x

array([[ 1.765971  ,  0.07249544, -1.10125291,  0.59645285],
       [ 1.97173184, -1.00333899, -0.06425587, -0.67453579],
       [-1.16614764,  1.03002233, -1.16152227, -0.52016207],
       [-2.12782336, -1.44799052, -0.36044253,  0.15373504],
       [ 0.04632845, -0.639334  , -1.07900081, -0.05999566]])

In [16]:
tf_truncated_x.shape

(5, 4)
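
The output keeps all 4 columns even though the primal model was created with `n_components=10`: a 5×4 matrix has only 4 singular values, and slicing past the end of an axis simply returns what is there. A short NumPy sketch of that slicing behaviour, using `np.linalg.svd` as a stand-in for the TF call (an assumption made so the snippet runs without TensorFlow):

```python
import numpy as np

a = np.random.randn(5, 4)
n_components = 10  # larger than min(a.shape) == 4

u, s, vh = np.linalg.svd(a, full_matrices=False)

# Slicing beyond the last index is not an error in NumPy; it returns every
# available column, so no truncation happens here.
projected = u[:, :n_components] * s[:n_components]
print(projected.shape)  # (5, 4)
```

Dimensionality is therefore only reduced when `n_components` is smaller than the number of features, as in the Boston example with 13 features.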

* Similarly, with sklearn's Boston dataset from `1.b` above:

In [17]:
inp2= np.array(df_x.values, dtype= np.float64)

In [18]:
dope_truncated_x= model.fit_transform(inp2)
dope_truncated_x

array([[ 471.04962214,  330.53051303,    8.67494127, ...,  -10.20964288,
         -11.40944565,   26.75673524],
       [ 545.473621  ,  266.09903932,   78.12796733, ...,   19.84120668,
          -4.45406447,   -5.85600368],
       [ 477.92571461,  357.67984616,  -89.8250494 , ...,   28.90475808,
          -3.75504454,   -3.6268214 ],
       ...,
       [ 533.89491136,  184.50034054, -103.19963966, ...,   29.29001292,
          13.7427487 ,   -3.59156992],
       [ 545.86519119,  132.99450123, -118.31830091, ...,   28.33873663,
          12.90476995,   -3.26489671],
       [ 463.39533327,  126.36114053,   96.81230318, ...,   29.68789887,
           7.78444792,    4.41248029]])

In [19]:
dope_truncated_x.shape
#dimensionality reduced to n_components using tf.linalg.svd

(506, 10)

* Validating results with reduced dimensionality through the doped model, to check whether the underlying TF method gives approximately faithful results.

In [20]:
#truncated_x= model.fit(df_x).fit_transform(df_x)
xtrain, xtest, ytrain, ytest = train_test_split(dope_truncated_x, df_y, test_size=0.2)

print('doped_svd truncated dims:', dope_truncated_x.shape)

reg= linear_model.LinearRegression()
reg.fit(xtrain, ytrain)
print(reg.score(xtest, ytest))

doped_svd truncated dims: (506, 10)
0.6110820718545925


In [21]:
np.allclose(skl_truncated_x, dope_truncated_x)

False
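
A `False` here does not by itself mean the two projections disagree in substance. Singular vectors are only determined up to sign, and sklearn applies its own sign convention to the factors, so individual columns can come out negated relative to a plain SVD. sklearn's default `randomized` algorithm is also an approximation. Separately, the two R² scores above come from different random splits, since `train_test_split` is called without a `random_state`, so they are not directly comparable. The sketch below illustrates only the sign issue, on made-up data with NumPy; `align_signs` is a hypothetical helper, not an mlsquare or sklearn function.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))  # illustrative data, not the Boston set
k = 3

U, s, Vh = np.linalg.svd(X, full_matrices=False)
proj_a = U[:, :k] * s[:k]

# An equally valid projection: negate one singular-vector pair.
signs = np.array([1.0, -1.0, 1.0])
proj_b = proj_a * signs

print(np.allclose(proj_a, proj_b))  # False


def align_signs(reference, other):
    """Flip columns of `other` so each one correlates positively with `reference`."""
    flip = np.sign(np.sum(reference * other, axis=0))
    flip[flip == 0] = 1.0
    return other * flip


print(np.allclose(proj_a, align_signs(proj_a, proj_b)))  # True
```

Applying the same kind of alignment to `skl_truncated_x` and `dope_truncated_x`, and comparing with a tolerance, would be one way to test whether the remaining gap is more than a sign convention.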