### Contributing SVD to mlsquare

**Fork the mlsquare repository to your account and clone it.**

**Or just clone https://github.com/mlsquare/mlsquare.git**

* Navigate to the `src/mlsquare/architectures` folder, where the code for mapping logistic regression to a DNN resides.
* The code for mapping the primal model (SVD) to its DNN equivalent is saved as `surprise_svd.py`.

In [1]:
import os

In [2]:
os.getcwd()

'/home/kev/Desktop/mlsquare/src'

In [3]:
from mlsquare.base import BaseModel
import tensorflow as tf

#from ..adapters.AdaptDeepctr import DeepCtr
from mlsquare.adapters.AdaptDeepctr import DeepCtr
#from tensorflow.keras.layers import Dense

from deepctr.inputs import build_input_features, input_from_feature_columns
from deepctr.inputs import SparseFeat
from deepctr.layers.interaction import FM
from deepctr.layers.utils import concat_fun

Using TensorFlow backend.


In [4]:
??BaseModel

* Open question: does surpriselib's SVD require a separate adapter, or can it work with the existing sklearn adapter methods?

In [5]:
#from ..base import registry, BaseModel

from mlsquare.base import registry, BaseModel

#from mlsquare.adapters.sklearn import #SurpriselibModels

@registry.register
class SVD(BaseModel):
    def __init__(self):
        self.adapter = DeepCtr
        self.module_name = 'deepctr'
        self.name = 'SVD'
        self.version = 'default'
        #feature_cols= feature_columns
    def create_model(self, feature_columns, **kwargs):
        """Instantiates an SVD-style factorization model built on deepctr's FM layer.

        :param feature_columns: An iterable of deepctr ``SparseFeat`` instances, one per categorical field (e.g. ``movie_id``, ``user_id``).
        :param embedding_size: int, number of latent factors per field (default 100).
        :param l2_reg_embedding: float. L2 regularizer strength applied to the embedding vectors.
        :param l2_reg_linear: float. L2 regularizer strength applied to the linear part (currently unused).
        :param l2_reg_dnn: float. L2 regularizer strength applied to the DNN (currently unused).
        :param init_std: float, standard deviation used to initialize the embedding vectors.
        :param seed: integer, random seed.
        :param bi_dropout: float in [0,1), dropout applied to the interaction output (currently unused).
        :param dnn_dropout: float in [0,1), dropout applied to DNN coordinates (currently unused).
        :return: A Keras model instance whose output is the FM interaction logit.
        """
        # `feature_columns` should be a list of `SparseFeat` instances; each one becomes an Input of shape (None, 1).
        kwargs.setdefault('embedding_size', 100)
        kwargs.setdefault('l2_reg_embedding',1e-5)
        kwargs.setdefault('l2_reg_linear', 1e-5)
        kwargs.setdefault('l2_reg_dnn', 0)
        kwargs.setdefault('init_std',0.0001)
        kwargs.setdefault('seed', 1024)
        kwargs.setdefault('bi_dropout', 0)
        kwargs.setdefault('dnn_dropout', 0)
        
    
        features = build_input_features(feature_columns)

        input_layers = list(features.values())

        sparse_embedding_list, _ = input_from_feature_columns(features,feature_columns, kwargs['embedding_size'], kwargs['l2_reg_embedding'], kwargs['init_std'], kwargs['seed'])
    
        fm_input = concat_fun(sparse_embedding_list, axis=1)
        fm_logit = FM()(fm_input)

        # if task == 'binary':
        #     act_func = 'sigmoid'
        #     n_last = 1
        # elif task == 'multiclass':
        #     act_func = 'softmax'
        #     n_last = 5
        #
        # predictions = Dense(n_last, activation=act_func)(merge_layer)
    
        model = tf.keras.models.Model(inputs=input_layers, outputs=fm_logit)
    
        return model
    
    
    def set_params(self, **kwargs):
        pass

    def get_params(self):
        pass  # return self._model_params

    def update_params(self, params):
        pass  # self._model_params.update(params)

    def adapter(self):
        return self._adapter
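Why an FM layer can play the role of SVD here: with only two sparse fields (movie and user), the FM pairwise-interaction term reduces to the dot product of the two embedding vectors, which is the latent-factor term of an SVD-style predictor. The NumPy sketch below checks that identity using the standard FM formula (to our understanding the same formulation deepctr's `FM` layer uses); it does not call deepctr. Note that surprise's SVD also adds a global mean and user/item biases, r̂ = μ + b_u + b_i + q_iᵀp_u, which this architecture does not model.

```python
import numpy as np


def fm_second_order(embeddings):
    """Pairwise FM interaction term for a single sample.

    embeddings: array of shape (num_fields, embedding_size), one row per field.
    Returns sum_{i<j} <v_i, v_j>, computed via the identity
    0.5 * sum_k[(sum_i v_ik)^2 - sum_i v_ik^2].
    """
    square_of_sum = np.square(embeddings.sum(axis=0))
    sum_of_square = np.square(embeddings).sum(axis=0)
    return 0.5 * np.sum(square_of_sum - sum_of_square)


rng = np.random.default_rng(0)
movie_vec = rng.normal(size=100)  # plays the role of one row of sparse_emb_movie_id
user_vec = rng.normal(size=100)   # plays the role of one row of sparse_emb_user_id

fm_score = fm_second_order(np.stack([movie_vec, user_vec]))
svd_score = movie_vec @ user_vec  # latent-factor term q_i^T p_u of an SVD prediction
print(np.isclose(fm_score, svd_score))  # True
```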

* Defining the interaction between the user and the deepctr-backed SVD interface (provided by the mlsquare library), **once the new model is registered in mlsquare**:
    1. a) The user instantiates a primal SVD model (built from deepctr's `FM` layer) explicitly from a module `mlsquare.models.svd`.
    
    b) The user loads the model object and adapter from `mlsquare.base.registry` and then instantiates them with the required arguments.
    
    2. Thereafter, `dope` equips the above primal model with the standard methods: fit, predict, score, save and explain.
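The registry lookup in step 1.b can be pictured with a small stand-in. This sketch assumes only what the notebook itself shows: `registry.data` maps `(module_name, name)` to `{version: [model_instance, adapter_class]}`, and `@registry.register` instantiates the decorated class (the registered SVD later prints as an object). `MiniRegistry`, `DummySVD` and `DummyAdapter` are hypothetical; mlsquare's actual implementation may differ.

```python
class MiniRegistry:
    """Hypothetical, simplified stand-in for mlsquare.base.registry."""

    def __init__(self):
        # Layout observed in registry.data:
        # {(module_name, name): {version: [model_instance, adapter_class]}}
        self.data = {}

    def register(self, model_cls):
        m = model_cls()  # registration instantiates the architecture class
        self.data.setdefault((m.module_name, m.name), {})[m.version] = [m, m.adapter]
        return model_cls  # returning the class keeps the decorated name usable

    def __getitem__(self, key):
        return self.data[key]


mini_registry = MiniRegistry()


class DummyAdapter:
    """Placeholder for an adapter class such as DeepCtr."""


@mini_registry.register
class DummySVD:
    def __init__(self):
        self.adapter = DummyAdapter
        self.module_name = 'deepctr'
        self.name = 'SVD'
        self.version = 'default'


toy_model, toy_adapter = mini_registry[('deepctr', 'SVD')]['default']
print(type(toy_model).__name__, toy_adapter.__name__)  # DummySVD DummyAdapter
```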

* The following data preparation steps are required to instantiate an SVD model (using the MovieLens 100K `u.data` ratings file):

In [6]:
import os
import pandas as pd

from sklearn.preprocessing import LabelEncoder

from deepctr.inputs import SparseFeat

data_path = os.path.expanduser('u.data')  # MovieLens 100K ratings file, tab-separated

df = pd.read_csv(data_path, sep='\t', names=['user_id', 'movie_id', 'rating', 'timestamp'])

sparse_features = ["movie_id", "user_id"]
y = ['rating']

# Label-encode each sparse feature so its values become contiguous integer ids 0..n-1
for feat in sparse_features:
    lbe = LabelEncoder()
    df[feat] = lbe.fit_transform(df[feat])

# One SparseFeat per field; its vocabulary size is the number of unique ids in that field
feature_columns = [SparseFeat(feat, df[feat].nunique()) for feat in sparse_features]
print(feature_columns)

[SparseFeat:movie_id, SparseFeat:user_id]
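Why the encoding matters: each `SparseFeat` gets a vocabulary size of `df[feat].nunique()`, and the embedding table built from it only has rows for ids in `[0, nunique)`. MovieLens 100K ids start at 1, so without re-encoding the largest raw id would fall outside the table. A toy illustration with made-up rows in the `u.data` layout:

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows in the u.data layout: user_id, movie_id, rating, timestamp (tab-separated)
raw = "10\t5\t3\t100\n20\t7\t4\t101\n10\t7\t1\t102\n30\t9\t5\t103\n"
toy = pd.read_csv(io.StringIO(raw), sep='\t',
                  names=['user_id', 'movie_id', 'rating', 'timestamp'])

for feat in ['movie_id', 'user_id']:
    # fit_transform maps the sorted unique values to 0, 1, 2, ...
    toy[feat] = LabelEncoder().fit_transform(toy[feat])

print(toy['user_id'].tolist())   # [0, 1, 0, 2]
print(toy['movie_id'].tolist())  # [0, 1, 1, 2]
```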


In [7]:
from sklearn.model_selection import train_test_split

trainset, testset= train_test_split(df, test_size=0.2)

train_model_input = [trainset[name].values for name in sparse_features]#includes values from only data[user_id], data[movie_id]
train_y= trainset[y].values

test_model_input = [testset[name].values for name in sparse_features]#includes values from only data[user_id], data[movie_id]
test_y= testset[y].values
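The list form of `train_model_input` only works because its order (`movie_id`, then `user_id`) matches the order of the model's `Input` layers. A sketch of an order-independent variant, assuming the installed TensorFlow/deepctr versions accept a dict keyed by input-layer name (Keras functional models generally do), with a fixed `random_state` added for a reproducible split. The data here is made up.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

sparse_features = ["movie_id", "user_id"]
# Made-up, already label-encoded ratings
toy_df = pd.DataFrame({"movie_id": [0, 1, 1, 2, 0],
                       "user_id": [0, 1, 0, 2, 2],
                       "rating": [3, 4, 1, 5, 2]})

# random_state makes the split reproducible across runs
toy_train, toy_test = train_test_split(toy_df, test_size=0.2, random_state=42)

# Keyed by feature name (= Input layer name) instead of relying on list order
toy_train_input = {name: toy_train[name].values for name in sparse_features}
toy_train_y = toy_train["rating"].values

print(sorted(toy_train_input), len(toy_train_y))  # ['movie_id', 'user_id'] 4
```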

**1.a The user instantiates a primal SVD model (built from deepctr's `FM` layer) explicitly from a module `mlsquare.models.svd`.** (Open question; the cell below is left commented out.)

In [8]:
#from mlsquare.models import svd

#svd_mod = svd.SVD(feature_columns, task='multiclass')

**1.b The user loads the model object and adapter from `mlsquare.base.registry` and then instantiates them with the required arguments.** For example: `registry[('sklearn', 'LogisticRegression')]`.

In [9]:
from mlsquare.base import registry

In [10]:
registry.data

{('sklearn',
  'LogisticRegression'): {'default': [<mlsquare.architectures.sklearn.LogisticRegression at 0x7f229b2626a0>,
   mlsquare.adapters.sklearn.SklearnKerasClassifier]},
 ('sklearn',
  'LinearRegression'): {'default': [<mlsquare.architectures.sklearn.LinearRegression at 0x7f22f03d05f8>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'Ridge'): {'default': [<mlsquare.architectures.sklearn.Ridge at 0x7f22f03d07b8>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'Lasso'): {'default': [<mlsquare.architectures.sklearn.Lasso at 0x7f22f03d0978>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'ElasticNet'): {'default': [<mlsquare.architectures.sklearn.ElasticNet at 0x7f22f03d0b38>,
   mlsquare.adapters.sklearn.SklearnKerasRegressor]},
 ('sklearn',
  'LinearSVC'): {'default': [<mlsquare.architectures.sklearn.LinearSVC at 0x7f22f03d0cf8>,
   mlsquare.adapters.sklearn.SklearnKerasClassifier]},
 ('sklearn',
  'SVC'): {'default

In [11]:
proxy_model, adapter = registry[('sklearn', 'LogisticRegression')]['default']
proxy_model

<mlsquare.architectures.sklearn.LogisticRegression at 0x7f229b2626a0>

In [12]:
proxy_model.create_model

<bound method GeneralizedLinearModel.create_model of <mlsquare.architectures.sklearn.LogisticRegression object at 0x7f229b2626a0>>

* Code blocks taken from the adapters and `tune.py`:

In [13]:
from sklearn.model_selection import train_test_split
from sklearn import datasets

iris = datasets.load_iris()

X = iris.data
Y = iris.target

# Split the data in to test and train batches
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.60, random_state=0)

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

In [14]:
model.fit(x_train, y_train)
y_pred= model.predict(x_train)



In [15]:
model_params = {'layer_1': {'units': 1,  # Make the key name private, e.g. '_layer'
                            'l1': 0,
                            'l2': 0,
                            'activation': 'sigmoid'},
                'optimizer': 'adam',
                'loss': 'binary_crossentropy'}
proxy_model.set_params(params=model_params, set_by='model_init')

In [16]:
X, y, y_pred= proxy_model.transform_data(x_train, y_train, y_pred)
proxy_model.X = X ##  abstract -> model_skeleton
proxy_model.y = y

In [17]:
proxy_model

<mlsquare.architectures.sklearn.LogisticRegression at 0x7f229b2626a0>

In [18]:
proxy_model.create_model()



<keras.engine.sequential.Sequential at 0x7f22f0314470>

In [19]:
proxy_model.fit
# fit doesn't work: it is a method of the `SklearnKerasClassifier` adapter, not of the proxy model.
# The proxy model has to be passed into an adapter object, i.e. adapter(proxy_model, primal_model) (see line 66 of the dope function).

AttributeError: 'LogisticRegression' object has no attribute 'fit'
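The error is expected under this design: the architecture object only describes how to build the network, while the training methods live on the adapter that wraps it. A minimal sketch of that division of labour; `ToyProxy` and `ToyAdapter` are hypothetical and only mimic the `adapter(proxy_model, primal_model)` call mentioned in the cell comment, not mlsquare's real adapter code.

```python
class ToyProxy:
    """Hypothetical proxy: knows how to build a model but has no fit()."""

    def create_model(self, **kwargs):
        return {"built_with": kwargs}  # stands in for a compiled Keras model


class ToyAdapter:
    """Hypothetical adapter: owns the training API and delegates model building."""

    def __init__(self, proxy_model, primal_model=None):
        self.proxy_model = proxy_model
        self.primal_model = primal_model
        self.final_model = None

    def fit(self, X, y, **kwargs):
        # The adapter, not the proxy, calls create_model(); real code would then train on X, y
        self.final_model = self.proxy_model.create_model(**kwargs)
        return self


toy_proxy = ToyProxy()
print(hasattr(toy_proxy, "fit"))  # False, mirroring the AttributeError above
wrapped = ToyAdapter(proxy_model=toy_proxy).fit(X=None, y=None, units=1)
print(wrapped.final_model)  # {'built_with': {'units': 1}}
```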

* Possible interactions:
    * The user queries the registry (populated by `registry.register`) to load the SVD model object and its corresponding adapter.
    * The user then calls `create_model()` on the obtained SVD object, passing the feature columns, which builds the primal model.
    * The primal model is then passed to the adapter directly as the proxy model (or to `dope`, to expose the fit, save and explain methods).

#### 1. The user queries the registry (populated by `registry.register`) to load the SVD model object and its corresponding adapter.

In [20]:
svd_model, svd_adapter = registry[('deepctr', 'SVD')]['default']

In [21]:
print(svd_model, '\n', svd_adapter)

<__main__.SVD object at 0x7f22f032a898> 
 <class 'mlsquare.adapters.AdaptDeepctr.DeepCtr'>


#### 2. The user then calls `create_model()` on the obtained SVD object, passing the feature columns, which builds the primal model.

In [22]:
model = svd_model.create_model(feature_columns)#primal



In [23]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
movie_id (InputLayer)           (None, 1)            0                                            
__________________________________________________________________________________________________
user_id (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
sparse_emb_movie_id (Embedding) (None, 1, 100)       168200      movie_id[0][0]                   
__________________________________________________________________________________________________
sparse_emb_user_id (Embedding)  (None, 1, 100)       94300       user_id[0][0]                    
__________________________________________________________________________________________________
no_mask (N
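The embedding parameter counts in the summary follow directly from vocabulary size × `embedding_size`, which makes a quick sanity check that the label encoding and `SparseFeat` sizes came out as intended (1682 movies and 943 users, matching MovieLens 100K):

```python
embedding_size = 100  # default set via kwargs.setdefault in SVD.create_model
# Vocabulary sizes implied by the summary (MovieLens 100K: 1682 movies, 943 users)
vocab = {"movie_id": 1682, "user_id": 943}

# An Embedding layer holds one embedding_size-long row per id
params = {name: n * embedding_size for name, n in vocab.items()}
print(params)                # {'movie_id': 168200, 'user_id': 94300}
print(sum(params.values()))  # 262500
```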

#### 3. The primal model is then passed to the adapter directly as the proxy model (or to `dope`, to expose the fit, save and explain methods).

In [None]:
model = svd_adapter(proxy_model=proxy_model,)

* The current implementation is faulty: `model.create_model()` is originally called inside `tune.py` (line 29), even though it is already called explicitly in step 2 above to define the model structure.
* Either the model definition has to fit the existing arrangement, by creating the `SparseFeat` objects (cells 6 and 7) and passing `feature_columns` (cell 22) within `AdaptDeepctr` itself, or the optimizer code needs to be modified.

* Alternatively, pass it to `dope` along with the adapter, after making the corresponding changes.
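A sketch of the first option, assuming the adapter is allowed to receive `create_model()` arguments such as `feature_columns`: it stores them and builds the model exactly once, so later steps reuse that model instead of calling `create_model()` again. `BuildOnceAdapter` and `CountingProxy` are hypothetical; the constructor of `AdaptDeepctr.DeepCtr` is not shown in this notebook.

```python
class BuildOnceAdapter:
    """Hypothetical adapter: stores create_model() arguments and builds the model once."""

    def __init__(self, proxy_model, primal_model=None, **model_kwargs):
        self.proxy_model = proxy_model
        self.primal_model = primal_model
        self.model_kwargs = model_kwargs  # e.g. feature_columns=[...]
        self._model = None

    @property
    def model(self):
        # Build lazily on first access and reuse afterwards,
        # so an optimizer step does not call create_model() a second time
        if self._model is None:
            self._model = self.proxy_model.create_model(**self.model_kwargs)
        return self._model


class CountingProxy:
    """Stand-in for the SVD architecture object; counts create_model() calls."""

    def __init__(self):
        self.calls = 0

    def create_model(self, feature_columns, **kwargs):
        self.calls += 1
        return ("model", tuple(feature_columns))


counting_proxy = CountingProxy()
build_once = BuildOnceAdapter(counting_proxy, feature_columns=["movie_id", "user_id"])
first, second = build_once.model, build_once.model
print(counting_proxy.calls, first is second)  # 1 True
```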

In [None]:
#from deepctr
from mlsquare import dope

#m= dope(proxy_model= model, adapter=)