# Retail SVD Recommender

In this example we use SQL to calculate the dot products behind an SVD-based retail product recommendation.

In [1]:
import pandas as pd

# Local libraries should automatically reload
%load_ext autoreload
%autoreload 1

### Get actual purchases

In [2]:
%aimport gpudb_df

_cnxn = gpudb_df.get_odbc()

CUSTOMER_ID = 12358

_sql = """
select 
    rm.STOCK_CODE,
    rp.DESCRIPTION,
    rm.TXN_COUNT
from RETAIL_MATRIX rm
join RETAIL_PROD rp
on rp.STOCK_CODE = rm.STOCK_CODE
where rm.CUSTOMER_ID = {}
order by rm.CUSTOMER_ID, rm.STOCK_CODE
""".format(CUSTOMER_ID)

_actual_df = pd.read_sql(_sql, _cnxn)
_cnxn.close()

_actual_df = _actual_df.set_index('STOCK_CODE')
_actual_df

Connected to GPUdb ODBC Server (6.2.0.9.20180622232941)


DatabaseError: Execution failed on sql '
select 
    rm.STOCK_CODE,
    rp.DESCRIPTION,
    rm.TXN_COUNT
from RETAIL_MATRIX rm
join RETAIL_PROD rp
on rp.STOCK_CODE = rm.STOCK_CODE
where rm.CUSTOMER_ID = 12358
order by rm.CUSTOMER_ID, rm.STOCK_CODE
': ('42S02', '[42S02] [Kinetica][SQLEngine] (31740) Table or view not found: KINETICA..RETAIL_MATRIX (31740) (SQLExecDirectW)')

### Get approximated purchases

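We approximate the purchases of customer $i$ from the truncated SVD factors, where $\mathbf{u}_i$ is that customer's row of $\mathbf{U}$: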

$
\mathbf{\tilde{a}}_i = \mathbf{u}_i \Sigma \mathbf{V}^T
$

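This reduces to a set of dot products: each entry of $\mathbf{\tilde{a}}_i$ is the dot product of the customer's vector with one item's vector. The query that follows never multiplies by $\Sigma$ explicitly, so the stored vectors presumably already include the singular values.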
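As a rough check of that claim, the following NumPy sketch compares the matrix form with per-item dot products. The toy matrix `A`, the rank `k`, and the customer row `i` are made up for illustration and are not taken from the retail tables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy purchase matrix: 6 customers x 5 items.
A = rng.integers(0, 4, size=(6, 5)).astype(float)

# Thin SVD, truncated to the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

i = 3  # row of the customer to approximate

# Matrix form: a~_i = u_i Sigma V^T
approx_matrix = U_k[i] @ np.diag(s_k) @ Vt_k

# Dot-product form: scale the customer vector by the singular values once,
# then take one k-term dot product per item column.
cust_vec = U_k[i] * s_k
approx_dots = np.array([cust_vec @ Vt_k[:, j] for j in range(A.shape[1])])

print(np.allclose(approx_matrix, approx_dots))
```

Folding $\Sigma$ into the customer vector up front is one possible design choice; it would let a query multiply only two stored columns per term.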

In [5]:
%aimport gpudb_df

_cnxn = gpudb_df.get_odbc()

_sql = """
select top 10 
    iv.STOCK_CODE as STOCK_CODE,
    rp.DESCRIPTION,
    (cv.U0 * iv.V0)
    + (cv.U1 * iv.V1) 
    + (cv.U2 * iv.V2) 
    + (cv.U3 * iv.V3) 
    + (cv.U4 * iv.V4) 
    + (cv.U5 * iv.V5) 
    + (cv.U6 * iv.V6) 
    + (cv.U7 * iv.V7) 
    + (cv.U8 * iv.V8) 
    + (cv.U9 * iv.V9) 
    as ITEM_RATING
from RETAIL_CUST_VEC as cv, RETAIL_ITEM_VEC as iv
join RETAIL_PROD rp
    on rp.STOCK_CODE = iv.STOCK_CODE
where cv.CUSTOMER_ID = {}
order by ITEM_RATING desc
""".format(CUSTOMER_ID)

_approx_df = pd.read_sql(_sql, _cnxn)
_cnxn.close()

_approx_df = _approx_df.set_index('STOCK_CODE')
_approx_df

Connected to GPUdb ODBC Server (6.1.0.9.20180315110536)


STOCK_CODE,DESCRIPTION,ITEM_RATING
POST,POSTAGE,0.641927
15056N,EDWARDIAN PARASOL NATURAL,0.311275
21731,RED TOADSTOOL LED NIGHT LIGHT,0.308938
15056BL,EDWARDIAN PARASOL BLACK,0.274233
22326,ROUND SNACK BOXES SET OF4 WOODLAND,0.25011
21166,COOK WITH WINE METAL SIGN,0.25009
21175,GIN AND TONIC DIET METAL SIGN,0.232669
22423,REGENCY CAKESTAND 3 TIER,0.204413
22629,SPACEBOY LUNCH BOX,0.194178
21232,STRAWBERRY CERAMIC TRINKET POT,0.193377
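The same dot-product pattern can be tried without a Kinetica server. The sketch below uses Python's built-in `sqlite3` as a stand-in, with hypothetical two-component tables `CUST_VEC` and `ITEM_VEC` and made-up values. SQLite uses `limit` rather than `top`, and the customer ID is bound as a query parameter instead of being formatted into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical two-component vectors; the notebook's tables have ten.
conn.executescript("""
create table CUST_VEC (CUSTOMER_ID integer, U0 real, U1 real);
create table ITEM_VEC (STOCK_CODE text, V0 real, V1 real);
insert into CUST_VEC values (1, 0.5, 0.1), (2, -0.2, 0.9);
insert into ITEM_VEC values ('A', 1.0, 0.0), ('B', 0.0, 1.0), ('C', 0.5, 0.5);
""")

# One dot product per item for the selected customer, highest rating first.
sql = """
select iv.STOCK_CODE,
       (cv.U0 * iv.V0) + (cv.U1 * iv.V1) as ITEM_RATING
from CUST_VEC as cv cross join ITEM_VEC as iv
where cv.CUSTOMER_ID = ?
order by ITEM_RATING desc
limit 2
"""
rows = conn.execute(sql, (1,)).fetchall()
conn.close()

for stock_code, rating in rows:
    print(stock_code, round(rating, 3))
```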


### Get recommended purchases

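The recommendation is the part of the approximation that the customer's actual purchases do not already cover. In practice, we drop every approximated item the customer has already bought: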

$
\mathbf{\tilde{r}}_i = \mathbf{\tilde{a}}_i - \mathbf{a}_i
$

In [7]:
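# Keep only approximated items that the customer has not already purchased.
# (Recent pandas releases reject a set as a .loc indexer, so use an Index.)
_recommended_df = _approx_df.loc[_approx_df.index.difference(_actual_df.index)]
_recommended_df.sort_values('ITEM_RATING', ascending=False)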

STOCK_CODE,DESCRIPTION,ITEM_RATING
21731,RED TOADSTOOL LED NIGHT LIGHT,0.308938
22326,ROUND SNACK BOXES SET OF4 WOODLAND,0.25011
21166,COOK WITH WINE METAL SIGN,0.25009
21175,GIN AND TONIC DIET METAL SIGN,0.232669
22423,REGENCY CAKESTAND 3 TIER,0.204413
22629,SPACEBOY LUNCH BOX,0.194178
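The filtering step can also be written with a boolean mask. This is a minimal sketch on made-up frames (`approx`, `actual`, and their values are hypothetical, not the notebook's data); `Index.isin` marks the items already purchased, and the mask keeps the rest.

```python
import pandas as pd

# Hypothetical approximated ratings, indexed by stock code.
approx = pd.DataFrame(
    {"ITEM_RATING": [0.6, 0.3, 0.4, 0.2]},
    index=pd.Index(["A", "B", "C", "D"], name="STOCK_CODE"),
)

# Hypothetical actual purchases for the same customer.
actual = pd.DataFrame(
    {"TXN_COUNT": [3, 1]},
    index=pd.Index(["A", "B"], name="STOCK_CODE"),
)

# Drop items the customer already bought, then rank what remains.
mask = ~approx.index.isin(actual.index)
recommended = approx[mask].sort_values("ITEM_RATING", ascending=False)
print(recommended)
```

Unlike a set difference, the mask preserves the row order of the approximated frame, which can matter if that frame is already sorted by rating.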
