### Contributing sklearn's SVD decomposition to mlsquare

**Fork the mlsquare repository to your account and clone it.**

**Or just clone https://github.com/mlsquare/mlsquare.git**

* Navigate to the `src/mlsquare/architectures` folder, where the code for mapping `TruncatedSVD()` to `tf.linalg.svd()` resides.
* The code that maps the primal model (SVD) to its TF equivalent is kept in the `sklearn.py` file.

In [37]:
import os
os.getcwd()

'/home/kev/Desktop/mlsquare_experiments/src'

In [2]:
import tensorflow as tf
tf.__version__

'1.13.1'

#### 1. Register the proxy SVD model in `mlsquare/architectures/sklearn.py` as follows

In [3]:
#from ..base import registry, BaseModel
from mlsquare.base import registry, BaseModel
from mlsquare.adapters.sklearn import SklearnKerasDecompose
from mlsquare.architectures.sklearn import GeneralizedLinearModel

#from mlsquare.adapters.sklearn import #SurpriselibModels

@registry.register
class SVD(GeneralizedLinearModel):
    def __init__(self):
        self.adapter = SklearnKerasDecompose
        self.module_name = 'sklearn'
        self.name = 'TruncatedSVD'
        self.version = 'default'
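        # Parameter names match the arguments of tf.linalg.svd
        model_params = {'full_matrices': False,
                        'compute_uv': True,
                        'name': None}
        self.set_params(params=model_params, set_by='model_init')

    def create_model(self, **kwargs):
        # Nothing to build here: the decomposition itself runs in the adapter
        pass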

Using TensorFlow backend.
2019-11-28 17:44:53,353	INFO node.py:423 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-11-28_17-44-53_6525/logs.
2019-11-28 17:44:53,493	INFO services.py:363 -- Waiting for redis server at 127.0.0.1:21174 to respond...
2019-11-28 17:44:53,603	INFO services.py:363 -- Waiting for redis server at 127.0.0.1:27128 to respond...
2019-11-28 17:44:53,605	INFO services.py:760 -- Starting Redis shard with 20.0 GB max memory.
2019-11-28 17:44:53,619	INFO services.py:1384 -- Starting the Plasma object store with 1.0 GB memory using /dev/shm.
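
The `registry.data` output shown later suggests that `@registry.register` indexes each proxy model by `(module_name, name)` and then by version, storing the model instance together with its adapter class. mlsquare's actual registry code is not shown in this notebook, so the following is only a minimal stand-in under that assumption; `Registry` and `DummyAdapter` are hypothetical names.

```python
# Hypothetical stand-in for mlsquare's registry, for illustration only.
class Registry:
    def __init__(self):
        # {(module_name, name): {version: [proxy_model_instance, adapter_class]}}
        self.data = {}

    def register(self, cls):
        """Class decorator: instantiate the proxy model and index it."""
        model = cls()
        key = (model.module_name, model.name)
        self.data.setdefault(key, {})[model.version] = [model, model.adapter]
        return cls

    def __getitem__(self, key):
        return self.data[key]


registry = Registry()


class DummyAdapter:  # hypothetical adapter, stands in for SklearnKerasDecompose
    pass


@registry.register
class SVD:
    def __init__(self):
        self.adapter = DummyAdapter
        self.module_name = 'sklearn'
        self.name = 'TruncatedSVD'
        self.version = 'default'


# Look up the proxy model and adapter the same way the notebook does later on
proxy, adapter = registry[('sklearn', 'TruncatedSVD')]['default']
```

Keying on `(module_name, name)` is what would let `dope` find the proxy model from nothing more than the primal object's module and class name.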


#### 2. Define an adapter `SklearnKerasDecompose` in `mlsquare/adapters/sklearn.py` that maps `sklearn.decomposition.TruncatedSVD` to `tensorflow.linalg.svd` and exposes the sklearn methods

In [4]:
from mlsquare.utils.functions import _parse_params
import numpy as np

import tensorflow as tf
from keras.utils import to_categorical

class SklearnKerasDecompose():
    def __init__(self, proxy_model, primal_model, **kwargs):
        self.primal_model = primal_model
        self.params = None ## Temporary!
        self.proxy_model = proxy_model
        self.n_components = primal_model.n_components  # set here so that, as in sklearn, n_components is accessible before .fit_transform
        
    def fit(self, X, y=None, **kwargs):
        self.fit_transform(X)
        return self
    
    def fit_transform(self, X, y=None,**kwargs):
        kwargs.setdefault('full_matrices', False)
        kwargs.setdefault('params', self.params)
        kwargs.setdefault('space', False)
        kwargs.setdefault('compute_uv', True)
        kwargs.setdefault('name', None)
        self.params = kwargs['params']
        X = np.array(X)
        y = np.array(y)
        
        # Open question: should `n_components`, `components_` and `singular_values_`
        # be attributes of the proxy model class or of the adapter class?
        # For now they are kept on the adapter.
        
        k = self.n_components
        n_features = X.shape[1]
        if k >= n_features:
            raise ValueError("n_components must be < n_features;"
                             " got %d >= %d" % (k, n_features))

        # TF 1.13: evaluate the SVD op inside a session
        with tf.Session() as sess:
            s, u, v = sess.run(tf.linalg.svd(X, full_matrices=kwargs['full_matrices'],
                                             compute_uv=kwargs['compute_uv']))
        # s: singular values, in descending order
        # u: left singular vectors
        # v: right singular vectors as columns (tf.linalg.svd returns V, not V^T)

        # Rows of components_ are the top-k right singular vectors,
        # analogous to TruncatedSVD().components_
        self.components_ = v[:, :self.n_components].T

        # Projection of X onto the top-k components
        X_transformed = u[:, :self.n_components] * s[:self.n_components]

        # Store the top-k singular values
        self.singular_values_ = s[:self.n_components]
        
        return X_transformed
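
The slicing in `fit_transform` can be checked without TensorFlow. The sketch below uses NumPy as a stand-in, with a small random matrix as assumed input. Note the difference in conventions: `np.linalg.svd` returns `V^T`, whereas `tf.linalg.svd` returns `s, u, v` with `v` un-transposed.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(8, 5)   # 8 samples, 5 features
k = 3                 # number of components to keep (k < n_features)

# NumPy returns V^T (rows are right singular vectors);
# with tf.linalg.svd one would transpose v first.
u, s, vt = np.linalg.svd(X, full_matrices=False)

components = vt[:k, :]               # shape (k, n_features)
singular_values = s[:k]              # top-k singular values
X_transformed = u[:, :k] * s[:k]     # shape (n_samples, k)

# The same projection is obtained by multiplying X with the components
X_projected = X @ components.T
```

Because `X = U S V^T` and the rows of `V^T` are orthonormal, `X @ components.T` reduces to `U[:, :k] * s[:k]`, which is why the adapter can return the scaled left singular vectors directly.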

* Registered models so far:

In [5]:
from mlsquare.base import registry
registry.data

{('sklearn',
  'xyz'): {'default': [<mlsquare.architectures.sklearn.SVD_1 at 0x7fe8de47e9e8>,
   mlsquare.adapters.sklearn.SklearnKerasDecompose_1]},
 ('sklearn',
  'TruncatedSVD'): {'default': [<__main__.SVD at 0x7fe8ece73630>,
   mlsquare.adapters.sklearn.SklearnKerasDecompose]},
 ('sklearn',
  'LogisticRegression'): {'default': [<mlsquare.architectures.sklearn.LogisticRegression at 0x7fe926d15048>,
   mlsquare.adapters.sklearn.SklearnKerasClassifier]},
 ('sklearn',
  'LinearRegression'): {'default': [<mlsquare.architectures.sklearn.LinearRegression at 0x7fe926d15208>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'Ridge'): {'default': [<mlsquare.architectures.sklearn.Ridge at 0x7fe926d153c8>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'Lasso'): {'default': [<mlsquare.architectures.sklearn.Lasso at 0x7fe926d15588>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'ElasticNet'): {'default': [<mlsquare.architectures.s

**Once the new model is registered and the corresponding adapter is defined in the mlsquare framework:**

#### User interaction with `dope`: sklearn SVD as the interface, TF SVD underneath

1. Set up the primal model and the data:
    * a) The user instantiates a primal model, `sklearn.decomposition.TruncatedSVD`, passing `n_components` as the number of singular components required.
    * b) The user loads the data and carries out the necessary data preparation steps.
2. Import `dope` from mlsquare and pass the primal model to it. The `dope` function equips the primal model with the standard sklearn methods: `fit`, `fit_transform`, `save`, `explain`.
3. Carry on with the usual sklearn SVD methods: try `.fit()` and `.fit_transform()` on the doped model.

#### 1.a Instantiate the primal model
* n_components: 10 (number of reduced dimensions)

In [6]:
from sklearn.decomposition import TruncatedSVD

primal = TruncatedSVD(n_components=10)

In [7]:
primal.get_params()

{'algorithm': 'randomized',
 'n_components': 10,
 'n_iter': 5,
 'random_state': None,
 'tol': 0.0}

#### 1.b Data preparation steps required before fitting the SVD model
* The regression results are also evaluated at several stages, with varying dimensionality.

In [8]:
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.preprocessing import LabelEncoder

import pandas as pd

reg = linear_model.LinearRegression()

boston = load_boston()
df_x = pd.DataFrame(boston.data, columns=boston.feature_names)
# Label-encode each column: every value becomes its index among the column's sorted unique values
lbe = LabelEncoder()
df_x = df_x.apply(lambda x: lbe.fit_transform(x))
df_y = pd.DataFrame(boston.target)


xtrain, xtest, ytrain, ytest = train_test_split(df_x, df_y, test_size=0.2)
print(xtrain.shape, xtest.shape)

(404, 13) (102, 13)


In [9]:
df_x.head(5)

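|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|---|-------|
| 0 | 0 | 3 | 19 | 0 | 51 | 320 | 172 | 297 | 0 | 34 | 9 | 356 | 53 |
| 1 | 23 | 0 | 56 | 0 | 36 | 279 | 225 | 333 | 1 | 11 | 23 | 356 | 161 |
| 2 | 22 | 0 | 56 | 0 | 36 | 400 | 159 | 333 | 1 | 11 | 23 | 271 | 28 |
| 3 | 32 | 0 | 16 | 0 | 33 | 383 | 112 | 361 | 2 | 5 | 31 | 311 | 6 |
| 4 | 110 | 0 | 16 | 0 | 33 | 395 | 139 | 361 | 2 | 5 | 31 | 356 | 64 |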


* Validating results with full dimensionality.

In [10]:
reg= linear_model.LinearRegression()
reg.fit(xtrain, ytrain)
print(reg.score(xtest, ytest))

0.7565245838025516


* Validating results with reduced dimensionality through primal model.

In [11]:
skl_truncated_x = primal.fit(df_x).transform(df_x)

xtrain, xtest, ytrain, ytest = train_test_split(skl_truncated_x, df_y, test_size=0.2)
print('sklearn_svd truncated dims:', skl_truncated_x.shape)
reg= linear_model.LinearRegression()
reg.fit(xtrain, ytrain)
print(reg.score(xtest, ytest))

sklearn_svd truncated dims: (506, 10)
0.7834073266399079


#### 2. Dope the primal model to obtain the TF-backed SVD

In [7]:
from mlsquare import dope

model = dope(primal)  # roughly: adapter(proxy_model=proxy_model, primal_model=primal)

Transpiling your model to it's Deep Neural Network equivalent...


In [13]:
print('proxy model object from registry:\n', model.proxy_model)

proxy model object from registry:
 <__main__.SVD object at 0x7fe8ece73630>


#### 3. Try out sklearn methods-- `.fit( )` & `.fit_transform( )` to obtain reduced dimensionality.

In [14]:
a= np.random.randn(5,4)
print('input arr a:\n',a,'\n\nshape of a:', a.shape, '\n\ndtype of a:', a.dtype)

input arr a:
 [[-1.04549857  0.08445003  0.46343894 -0.08542798]
 [ 0.1164637   0.14950925  0.46024374  0.41852031]
 [ 0.53976946  0.43226543  0.11751776  1.2850961 ]
 [ 1.43346079 -0.41787555 -0.85370613  0.1358221 ]
 [-1.7253289  -0.05526024 -0.63811273 -1.06074793]] 

shape of a: (5, 4) 

dtype of a: float64


In [15]:
tf_truncated_x = model.fit_transform(a)  # should fail, since n_features = a.shape[1] = 4 is smaller than n_components = 10; without a check it runs anyway
tf_truncated_x

array([[ 1.765971  ,  0.07249544, -1.10125291,  0.59645285],
       [ 1.97173184, -1.00333899, -0.06425587, -0.67453579],
       [-1.16614764,  1.03002233, -1.16152227, -0.52016207],
       [-2.12782336, -1.44799052, -0.36044253,  0.15373504],
       [ 0.04632845, -0.639334  , -1.07900081, -0.05999566]])

In [16]:
tf_truncated_x.shape

(5, 4)

In [17]:
tf_truncated_x = model.fit_transform(a)  # raises ValueError once the n_components check is added
tf_truncated_x

ValueError: n_components must be < n_features; got 10 >= 4
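
The guard that produces this error can be isolated as a small function. This is a sketch mirroring the check in the adapter; `check_n_components` is a hypothetical helper name, not part of mlsquare or sklearn.

```python
def check_n_components(n_components, n_features):
    """Mirror the adapter's guard: n_components must be strictly below n_features."""
    if n_components >= n_features:
        raise ValueError("n_components must be < n_features;"
                         " got %d >= %d" % (n_components, n_features))
    return n_components


# The 5x4 array from the cells above has 4 features, so 10 components is rejected
try:
    check_n_components(10, 4)
    message = None
except ValueError as err:
    message = str(err)

# The Boston data has 13 features, so 10 components passes
accepted = check_n_components(10, 13)
```

Running the check before any TF op is created means an invalid request fails fast, without a session ever being opened.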

* Similarly, with sklearn's Boston dataset from `1.b` above:

In [13]:
inp2= np.array(df_x.values, dtype= np.float64)

In [19]:
dope_truncated_x= model.fit_transform(inp2)
dope_truncated_x

array([[ 471.04962214,  330.53051303,    8.67494127, ...,  -10.20964288,
         -11.40944565,   26.75673524],
       [ 545.473621  ,  266.09903932,   78.12796733, ...,   19.84120668,
          -4.45406447,   -5.85600368],
       [ 477.92571461,  357.67984616,  -89.8250494 , ...,   28.90475808,
          -3.75504454,   -3.6268214 ],
       ...,
       [ 533.89491136,  184.50034054, -103.19963966, ...,   29.29001292,
          13.7427487 ,   -3.59156992],
       [ 545.86519119,  132.99450123, -118.31830091, ...,   28.33873663,
          12.90476995,   -3.26489671],
       [ 463.39533327,  126.36114053,   96.81230318, ...,   29.68789887,
           7.78444792,    4.41248029]])

In [20]:
dope_truncated_x.shape
#dimensionality reduced to n_components using tf.linalg.svd

(506, 10)

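* Validating results with reduced dimensionality through the doped model, and checking that the underlying TF method gives approximately faithful results. Each cell draws a fresh, unseeded `train_test_split`, so part of the difference between scores comes from the split rather than from the decomposition.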

In [21]:
#truncated_x= model.fit(df_x).fit_transform(df_x)
xtrain, xtest, ytrain, ytest = train_test_split(dope_truncated_x, df_y, test_size=0.2)

print('doped_svd truncated dims:', dope_truncated_x.shape)

reg= linear_model.LinearRegression()
reg.fit(xtrain, ytrain)
print(reg.score(xtest, ytest))

doped_svd truncated dims: (506, 10)
0.7153780768050211
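
Comparing regression scores is an indirect check of faithfulness. A more direct one, sketched here with NumPy only, compares two transformed matrices column by column while allowing for the sign flips that SVD routines are free to introduce. `same_up_to_sign` is a hypothetical helper, and the eigen-decomposition route merely stands in for a second SVD implementation; neither is part of mlsquare.

```python
import numpy as np


def same_up_to_sign(a, b, atol=1e-8):
    """True if every column of `a` equals the matching column of `b` or its negation."""
    if a.shape != b.shape:
        return False
    return all(
        np.allclose(a[:, j], b[:, j], atol=atol) or np.allclose(a[:, j], -b[:, j], atol=atol)
        for j in range(a.shape[1])
    )


rng = np.random.RandomState(0)
X = rng.randn(20, 6)
k = 3

# Route 1: direct SVD
u, s, vt = np.linalg.svd(X, full_matrices=False)
direct = u[:, :k] * s[:k]

# Route 2: eigenvectors of X^T X are the right singular vectors of X
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
via_eigh = X @ eigvecs[:, top]

agree = same_up_to_sign(direct, via_eigh)
```

The same helper could be applied to `skl_truncated_x` and `dope_truncated_x`, with a looser tolerance, since sklearn's default `randomized` algorithm is itself approximate.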


_________________

* Arranging the matrix transformation operations into the architecture
* Utilising the existing `SklearnKerasRegressor` methods
* Restricting `SklearnKerasRegressor`'s standard methods (`score`, `predict`) for decomposition models

* Registering the model

In [None]:
from abc import abstractmethod
import tensorflow as tf
import pandas

class DimensionalityReductionModel:
    @abstractmethod
    def fit(self, X, y= None):
        """Needs Implementation in sub classes"""
    @abstractmethod
    def fit_transform(self, X, y=None):
        """Needs Implementation in sub classes"""


@registry.register
class SVD(DimensionalityReductionModel, GeneralizedLinearModel):
    def __init__(self):
        self.adapter = SklearnKerasRegressor#SklearnKerasDecompose
        self.module_name = 'sklearn' 
        self.name = 'TruncatedSVD'
        self.version = 'default'
        model_params = {'full_matrices': False,
                       'compute_uv': True,
                      'name':None}

        self.set_params(params=model_params, set_by='model_init')
    def fit(self, X, y=None, **kwargs):
        self.fit_transform(X)
        return self
    def fit_transform(self, X, y=None,**kwargs):
        kwargs.setdefault('full_matrices', False)
        kwargs.setdefault('compute_uv', True)
        kwargs.setdefault('name', None)
        
        # Accept DataFrames and NumPy arrays alike: keep float32 input as float32
        # and cast everything else to float64 (the recommended dtypes)
        X = np.asarray(X)
        X = X.astype(np.float32 if X.dtype == np.float32 else np.float64)
        
        n_components= self.primal.n_components#using primal attributes passed from adapter
        n_features = X.shape[1]
        if n_components >= n_features:
            raise ValueError("n_components must be < n_features;"
                             " got %d >= %d" % (n_components, n_features))

        # TF 1.13: evaluate the SVD op inside a session
        with tf.Session() as sess:
            s, u, v = sess.run(tf.linalg.svd(X, full_matrices=kwargs['full_matrices'],
                                             compute_uv=kwargs['compute_uv']))

        # tf.linalg.svd returns V (right singular vectors as columns), not V^T
        self.components_ = v[:, :n_components].T
        X_transformed = u[:, :n_components] * s[:n_components]

        self.singular_values_ = s[:n_components]
        return X_transformed
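
One detail of `DimensionalityReductionModel` above: `@abstractmethod` is only enforced when the class uses `ABCMeta`, for instance by inheriting from `abc.ABC`. Without it the decorator is accepted silently and a subclass missing `fit_transform` can still be instantiated. The sketch below shows the enforced variant in isolation; `IncompleteSVD` and `CompleteSVD` are hypothetical classes, and whether `ABC` combines cleanly with `GeneralizedLinearModel` is an assumption that has not been checked here.

```python
from abc import ABC, abstractmethod


class DimensionalityReductionModel(ABC):
    @abstractmethod
    def fit(self, X, y=None):
        """Needs implementation in subclasses"""

    @abstractmethod
    def fit_transform(self, X, y=None):
        """Needs implementation in subclasses"""


class IncompleteSVD(DimensionalityReductionModel):   # hypothetical: forgets fit_transform
    def fit(self, X, y=None):
        return self


class CompleteSVD(DimensionalityReductionModel):     # hypothetical: implements both
    def fit(self, X, y=None):
        self.fit_transform(X)
        return self

    def fit_transform(self, X, y=None):
        return X


# Instantiating a subclass with a missing abstract method fails immediately
try:
    IncompleteSVD()
    incomplete_error = None
except TypeError as err:
    incomplete_error = err

complete = CompleteSVD()
```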


* Using existing adapter `SklearnKerasRegressor` for model with minor modifications

In [None]:
from mlsquare.architectures import sklearn
class SklearnKerasRegressor():
    
    def __init__(self, proxy_model, primal_model, **kwargs):
        self.primal_model = primal_model
        self.proxy_model = proxy_model
        self.params = None

    def fit(self, X, y, **kwargs):
        self.proxy_model.X = X
        self.proxy_model.y = y
        self.proxy_model.primal = self.primal_model
        kwargs.setdefault('verbose', 0)
        kwargs.setdefault('epochs', 250)
        kwargs.setdefault('batch_size', 30)
        kwargs.setdefault('params', self.params)
        self.params = kwargs['params']

        if self.params is not None:  # TODO: validate with the different kinds of tune input
            if not isinstance(self.params, dict):
                raise TypeError("Params should be of type 'dict'")
            self.params = _parse_params(self.params, return_as='flat')
            self.proxy_model.update_params(self.params)

        # Short-circuit for matrix decomposition models: no proxy network is trained,
        # the transformed input is returned directly
        if isinstance(self.proxy_model, sklearn.DimensionalityReductionModel):
            return self.proxy_model.fit_transform(X)
        
        primal_model = self.primal_model
        primal_model.fit(X, y)
        y_pred = primal_model.predict(X)
        primal_data = {
            'y_pred': y_pred,
            'model_name': primal_model.__class__.__name__
        }

        self.final_model = get_best_model(X, y, proxy_model=self.proxy_model, primal_data=primal_data,
                                          epochs=kwargs['epochs'], batch_size=kwargs['batch_size'],
                                          verbose=kwargs['verbose'])
        return self.final_model  # Not necessary.

    def score(self, X, y, **kwargs):
        if isinstance(self.proxy_model, (sklearn.DimensionalityReductionModel)):
            raise AttributeError("'SklearnKerasRegressor' object has no attribute 'score'")

        score = self.final_model.evaluate(X, y, **kwargs)
        return score

    def predict(self, X):
        '''
        Pending:
        1) Write a 'filter_sk_params' function(check keras_regressor wrapper) if necessary.
        2) Data checks and data conversions
        '''
        if isinstance(self.proxy_model, (sklearn.DimensionalityReductionModel)):
            raise AttributeError("'SklearnKerasRegressor' object has no attribute 'predict'")
            
        pred = self.final_model.predict(X)
        return pred

    def save(self, filename=None):
        if filename is None:
            raise ValueError(
                'Name Error: to save the model you need to specify the filename')

        pickle.dump(self.final_model, open(filename + '.pkl', 'wb'))

        self.final_model.save(filename + '.h5')

        onnx_model = onnxmltools.convert_keras(self.final_model)
        onnxmltools.utils.save_model(onnx_model, filename + '.onnx')

    def explain(self, **kwargs):
        # @param: SHAP or interpret
        print('Coming soon...')
        return self.final_model.summary()
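
The branching added to the adapter can be reduced to a small, dependency-free pattern: a marker base class identifies decomposition models, `fit` short-circuits to `fit_transform` for them, and `predict` raises. `ToySVD` and `ToyAdapter` below are hypothetical stand-ins, and the regression path is deliberately left out.

```python
class DimensionalityReductionModel:
    """Marker base class for proxy models that transform rather than predict."""


class ToySVD(DimensionalityReductionModel):      # hypothetical proxy model
    def fit_transform(self, X):
        return [row[:1] for row in X]            # placeholder "reduction": keep the first column


class ToyAdapter:                                # hypothetical, mirrors the adapter's branching
    def __init__(self, proxy_model):
        self.proxy_model = proxy_model

    def fit(self, X, y=None):
        if isinstance(self.proxy_model, DimensionalityReductionModel):
            return self.proxy_model.fit_transform(X)
        raise NotImplementedError("regression path omitted in this sketch")

    def predict(self, X):
        if isinstance(self.proxy_model, DimensionalityReductionModel):
            raise AttributeError("'ToyAdapter' object has no attribute 'predict'")
        raise NotImplementedError("regression path omitted in this sketch")


adapter = ToyAdapter(ToySVD())
reduced = adapter.fit([[1, 2], [3, 4]])

try:
    adapter.predict([[1, 2]])
    predict_error = None
except AttributeError as err:
    predict_error = str(err)
```

Checking against a base class rather than a list of class names means a future `PCA` proxy model would get the same treatment simply by inheriting from the marker class.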

* Loading dataframe for test

In [1]:
from sklearn.datasets import load_boston
from sklearn.preprocessing import LabelEncoder

import pandas as pd

boston = load_boston()
df_x = pd.DataFrame(boston.data, columns=boston.feature_names)
# Label-encode each column, as in the earlier data preparation step
lbe = LabelEncoder()
df_x = df_x.apply(lambda x: lbe.fit_transform(x))
df_y = pd.DataFrame(boston.target)

print('original df_x dims:', df_x.shape)

original df_x dims: (506, 13)


* Instantiating the `primal model`

In [2]:
from sklearn.decomposition import TruncatedSVD

primal = TruncatedSVD(n_components=10)

In [3]:
skl_truncated_x = primal.fit(df_x).transform(df_x)

print('sklearn_svd truncated dims:', skl_truncated_x.shape)

sklearn_svd truncated dims: (506, 10)


* Trying the base-class (parent) methods of sklearn's SVD and similar decomposition models

In [4]:
primal.get_params()#inherited from TruncatedSVD.BaseEstimator

{'algorithm': 'randomized',
 'n_components': 10,
 'n_iter': 5,
 'random_state': None,
 'tol': 0.0}

In [5]:
primal.__repr__#inherited from TruncatedSVD.BaseEstimator

<bound method BaseEstimator.__repr__ of TruncatedSVD(algorithm='randomized', n_components=10, n_iter=5,
             random_state=None, tol=0.0)>

In [6]:
primal.__getstate__#inherited from TruncatedSVD.BaseEstimator

<bound method BaseEstimator.__getstate__ of TruncatedSVD(algorithm='randomized', n_components=10, n_iter=5,
             random_state=None, tol=0.0)>

In [7]:
#from sklearn.decomposition import PCA

#pca = PCA(n_components=10)
#pca

In [8]:
#pca_transformed_x = pca.fit_transform(df_x)# PCA's APIs
#pca_transformed_x.shape#equivalent of left singular matrix from svd

* Doping the primal model to use `tf SVD`

In [10]:
from mlsquare.base import registry
registry.data

{('sklearn',
  'TruncatedSVD'): {'default': [<mlsquare.architectures.sklearn.SVD at 0x7f0b7413ceb8>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'LogisticRegression'): {'default': [<mlsquare.architectures.sklearn.LogisticRegression at 0x7f0bb57d8d30>,
   mlsquare.adapters.sklearn.SklearnKerasClassifier]},
 ('sklearn',
  'LinearRegression'): {'default': [<mlsquare.architectures.sklearn.LinearRegression at 0x7f0bb57d8ef0>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'Ridge'): {'default': [<mlsquare.architectures.sklearn.Ridge at 0x7f0bb57e90f0>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'Lasso'): {'default': [<mlsquare.architectures.sklearn.Lasso at 0x7f0bb57e92b0>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'ElasticNet'): {'default': [<mlsquare.architectures.sklearn.ElasticNet at 0x7f0bb57e9470>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'LinearSVC'): {'defau

In [11]:
mod,ada= registry[('sklearn', 'TruncatedSVD')]['default']
print('proxy model:', mod,'\ncorresponding adapter:', ada)

proxy model: <mlsquare.architectures.sklearn.SVD object at 0x7f0b7413ceb8> 
corresponding adapter: <class 'mlsquare.adapters.sklearn.SklearnKerasRegressor'>


In [12]:
model = ada(primal_model=primal, proxy_model=mod)
model#SklearnKerasRegressor class object

<mlsquare.adapters.sklearn.SklearnKerasRegressor at 0x7f0bd035b2b0>

In [13]:
??model

In [14]:
import numpy as np
inp2= np.array(df_x.values, dtype= np.float64)

* Fitting the doped model with either DataFrame or NumPy array inputs

In [15]:
#trans_input = model.fit(df_x, df_y)  # also accepts DataFrame input
trans_input = model.fit(inp2, df_y) #takes in numpy array input

print('transformed input shape:', trans_input.shape)

transformed input shape: (506, 10)
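
Both input types work because the proxy model normalises its input before handing it to TF. The dtype rule can be sketched in isolation as follows; `to_float_array` is a hypothetical helper name, and only NumPy inputs are exercised here (a DataFrame would pass through `np.asarray` in the same way).

```python
import numpy as np


def to_float_array(X):
    """Return X as an ndarray, keeping float32 and casting everything else to float64."""
    X = np.asarray(X)
    target = np.float32 if X.dtype == np.float32 else np.float64
    return X.astype(target)


from_f32 = to_float_array(np.ones((2, 3), dtype=np.float32))   # stays float32
from_int = to_float_array([[1, 2], [3, 4]])                    # nested list of ints -> float64
```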


* Checking how sklearn's SVD handles calls to methods it does not define (`.score()`, `.predict()`), and implementing similar error flagging for the undefined proxy_model APIs
    * A user who treats TruncatedSVD as a regular estimator is likely to try these methods

In [16]:
primal.predict(df_x)#error from sklearn_svd's undefined  api
#primal is an sklearn object

AttributeError: 'TruncatedSVD' object has no attribute 'predict'

In [26]:
model.score(inp2, df_y)  # before any explicit error flagging
# model is an adapter object

AttributeError: 'SklearnKerasRegressor' object has no attribute 'final_model'

In [27]:
model.predict(inp2)  # before any explicit error flagging
# model is an adapter object

AttributeError: 'SklearnKerasRegressor' object has no attribute 'final_model'

* After the change, unimplemented methods raise an explicit `AttributeError`

In [17]:
model.predict(inp2)  # after flagging
# model is an adapter object

AttributeError: 'SklearnKerasRegressor' object has no attribute 'predict'

In [18]:
model.score(inp2, df_y)  # after flagging; an unimplemented method raises an `AttributeError` either way
# model is an adapter object

AttributeError: 'SklearnKerasRegressor' object has no attribute 'score'
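
One caveat with raising `AttributeError` from inside the method body: the method still exists on the adapter, so `hasattr(model, 'predict')` keeps returning `True` and the error only appears on the call. If the adapter should also look to callers like an object without `predict`, one possible alternative is to raise during attribute lookup, for example through a property. This is an assumption about a possible design, not something mlsquare does; `Flagged` and `Hidden` are hypothetical classes.

```python
class Flagged:
    """Raises AttributeError inside the method, as the adapter above does."""
    def predict(self, X):
        raise AttributeError("'Flagged' object has no attribute 'predict'")


class Hidden:
    """Raises AttributeError on attribute lookup instead, via a property."""
    @property
    def predict(self):
        raise AttributeError("'Hidden' object has no attribute 'predict'")


flagged_has_predict = hasattr(Flagged(), 'predict')   # lookup succeeds, error only on call
hidden_has_predict = hasattr(Hidden(), 'predict')     # lookup itself fails
```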