## User Recommendation Algorithm

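The task provides, for every user and every day over a six-month period, the number of views, favorites, recommendations, and purchases on each product. The goal is to recommend a certain number of users to the top merchants at the beginning of each month.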

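Suppose there are six product categories (A, B, C, D, E, F). For each user, a preference score per category can be computed from the view, favorite, recommendation, and purchase counts. For example, if user X's counts on category A are (4, 3, 2, 1), and the weights are 1 for a view, 2 for a favorite, 3 for a recommendation, and 5 for a purchase, then X's preference score for category A is 4×1 + 3×2 + 2×3 + 1×5 = 21. Collapsing the four counts into a single score per category reduces the input dimensionality, which may help limit overfitting. The model is then built on each user's per-category preference scores, and users are recommended to the corresponding merchants based on the model's output.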
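
The weighted score described above can be sketched in NumPy as follows. The weights (1, 2, 3, 5) and the counts for user X come from the text; the array layout and the name `preference_score` are assumptions made for illustration.

```python
import numpy as np

# Weights from the text: view=1, favorite=2, recommendation=3, purchase=5
WEIGHTS = np.array([1, 2, 3, 5])

def preference_score(counts):
    """Return weighted preference scores.

    counts: array whose last axis holds (views, favorites, recommendations,
    purchases); any leading axes (users, days, categories) are preserved.
    """
    return np.asarray(counts) @ WEIGHTS

# User X on category A, as in the example above
print(preference_score([4, 3, 2, 1]))  # 21
```

Because the last axis is contracted, the same function could be applied to a whole (users, days, categories, 4) array of raw counts in one call.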

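### Data generation
Generate preference data for 10×1000 users over half a year (180 days) across 10 product categories, with shape (10, 1000, 180, 10). Because users are recommended to merchants at the start of each month, the previous month's 30 days of data form one training sample, so the array is split into samples with overall shape (60000, 30, 10).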
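
A small sketch of the windowing step, using reduced sizes so it runs quickly. It assumes the array is laid out as (group, user, day, category), matching the shape given above; the variable names are illustrative.

```python
import numpy as np

# Reduced sizes for illustration: 2 groups, 3 users, 180 days, 10 categories
groups, users, days, cats, window = 2, 3, 180, 10, 30
half_year = np.arange(groups * users * days * cats).reshape(groups, users, days, cats)

# Each user's 180 days are split into 6 consecutive 30-day windows
monthly = half_year.reshape(-1, window, cats)
print(monthly.shape)  # (36, 30, 10)

# The second window of the first user is that user's days 30..59
assert np.array_equal(monthly[1], half_year[0, 0, 30:60])
```

With the full sizes (10 groups of 1000 users), the same reshape yields 10 × 1000 × 6 = 60000 windows.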

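### Model selection
A user's preference for an item is time-sensitive: the more recent the activity, the better it reflects the user's current needs. We can therefore use an RNN to build a sequence model. Each sample of shape (30, 10), that is, 30 days of preference scores over 10 categories, is fed to the RNN one day at a time for 30 consecutive steps. The model output is then used to recommend the user to a merchant. For example, if the output is (0,1,0,0,0,0,0,0,0,0), the user is recommended to sellers of the second product category.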
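
To make the "one day per step" idea concrete, here is a minimal NumPy sketch of a plain (Elman) RNN forward pass over a 30-day window. The weights are random and untrained, the hidden size of 16 is arbitrary, and the function name `rnn_forward` is hypothetical; it only illustrates the data flow and is not the LSTM model trained later in this notebook.

```python
import numpy as np

def rnn_forward(window, w_xh, w_hh, w_hy):
    """Run a plain RNN over one (time_steps, input_size) window.

    Returns class probabilities computed from the final hidden state.
    """
    h = np.zeros(w_hh.shape[0])
    for day in window:                       # one day's 10 scores per step
        h = np.tanh(day @ w_xh + h @ w_hh)   # update the hidden state
    logits = h @ w_hy
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
time_steps, input_size, hidden, n_classes = 30, 10, 16, 10
window = rng.integers(0, 6, size=(time_steps, input_size)).astype(float)
w_xh = rng.normal(scale=0.1, size=(input_size, hidden))
w_hh = rng.normal(scale=0.1, size=(hidden, hidden))
w_hy = rng.normal(scale=0.1, size=(hidden, n_classes))

probs = rnn_forward(window, w_xh, w_hh, w_hy)
category = int(np.argmax(probs))   # index 1 would mean the second category
print(probs.shape, category)
```

Taking the `argmax` of the output is how a result such as (0,1,0,0,0,0,0,0,0,0) maps to a category index, here index 1 for the second category.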



In [51]:
import numpy as np
import tensorflow as tf


### Generate the training data



In [52]:
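# Strong signal: values 0-39 for 1000 users x 120 days x 10 categories
a = np.random.randint(40, size=(1000, 120, 10))
# Background noise (0-5) for 10 user groups x 1000 users x 180 days x 10 categories
d = np.random.randint(0, 6, size=[10, 1000, 180, 10])
for i in range(10):
    # Group i receives the strong signal only on category i, for the first 120 days
    b = np.zeros((1000, 120, 10), dtype=np.int32)
    b[:, :, i] = a[:, :, i]
    d[i, :, :120, :] += b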

In [53]:
import random

# Shuffle the day axis so the 120 high-signal days are spread across the half year
day_order = list(range(180))
random.shuffle(day_order)
arrdata = d[:, :, day_order, :]


In [54]:
print('The shape of generated data: ',np.shape(arrdata))
arrdata=np.reshape(arrdata,[60000,30,10])
print('Change the shape of generated data: ',np.shape(arrdata))

The shape of generated data:  (10, 1000, 180, 10)
Change the shape of generated data:  (60000, 30, 10)



### Label the training data and shuffle it

In [55]:
# Weight day t by 1.1**t so that more recent days count more
weights = 1.1 ** np.arange(30)
arr2 = np.zeros((60000, 10))

for i in range(60000):
    scores = (arrdata[i] * weights[:, None]).sum(axis=0)
    # One-hot label: the category with the highest weighted total
    # (argmax picks a single category even if two totals tie)
    arr2[i, np.argmax(scores)] = 1

# Shuffle samples and labels with the same permutation
rand_temp = list(range(60000))
random.shuffle(rand_temp)
data = arrdata[rand_temp]
label = arr2[rand_temp]
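
The labeling rule used above (weight day `t` by `1.1**t`, sum over the window, take the largest category) can also be written without Python loops. This is a sketch on a small random array; `make_labels` is a hypothetical helper name, and resolving ties by taking the first maximum is an assumption.

```python
import numpy as np

def make_labels(windows, growth=1.1):
    """One-hot labels from recency-weighted category totals.

    windows: array of shape (n_samples, time_steps, n_categories).
    """
    n_samples, time_steps, n_categories = windows.shape
    weights = growth ** np.arange(time_steps)            # later days weigh more
    totals = np.einsum('stc,t->sc', windows, weights)    # weighted sum over days
    labels = np.zeros((n_samples, n_categories))
    labels[np.arange(n_samples), totals.argmax(axis=1)] = 1
    return labels

rng = np.random.default_rng(0)
toy = rng.integers(0, 6, size=(5, 30, 10))
toy[:, :, 3] += 20            # make category index 3 dominant in every sample
print(make_labels(toy).argmax(axis=1))  # [3 3 3 3 3]
```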
    


### A single sample and its label

In [56]:
print('The shape of data: ',np.shape(data))
print('The shape of labels: ',np.shape(label))
print('data:\n',data[2])
print('label:',label[2])

The shape of data:  (60000, 30, 10)
The shape of labels:  (60000, 10)
data:
 [[ 5  1  1  4  2  5  0  5  2  0]
 [ 1  3  0 43  5  2  2  0  4  0]
 [ 2  1  2  0  0  2  5  5  1  2]
 [ 2  3  2 23  2  4  1  3  5  2]
 [ 2  5  4 34  3  0  3  1  0  2]
 [ 5  0  3 19  0  1  0  5  1  1]
 [ 3  4  0  0  1  4  1  1  4  4]
 [ 1  4  3  0  0  1  0  1  3  4]
 [ 5  4  2  0  3  2  2  4  2  2]
 [ 1  4  2 27  5  1  0  4  3  5]
 [ 2  2  2  7  5  3  0  0  5  1]
 [ 3  0  0  9  1  3  2  1  5  3]
 [ 4  5  1  3  3  2  1  2  4  5]
 [ 2  2  3 38  0  4  5  3  5  1]
 [ 3  2  2  4  3  2  3  4  3  0]
 [ 1  2  2 40  4  4  5  2  0  2]
 [ 5  1  5 27  4  5  3  0  2  2]
 [ 2  4  3  1  5  2  0  3  0  1]
 [ 2  4  1  5  5  0  4  4  4  5]
 [ 1  0  0 24  3  3  5  1  0  3]
 [ 2  4  2 35  2  3  2  5  2  4]
 [ 2  5  1 17  0  1  3  4  3  2]
 [ 4  5  4  4  0  2  4  1  0  4]
 [ 5  5  3 13  1  2  4  2  1  5]
 [ 1  5  0  9  5  2  5  1  5  1]
 [ 2  0  1  5  4  3  4  0  5  2]
 [ 5  0  1 15  5  0  1  2  5  1]
 [ 1  3  1  3  1  1  2  3  4  3]

### Split the data into training, validation, and test sets

In [57]:
split_frac = 0.8
split_idx = int(len(data)*split_frac)
train_x, val_x = data[:split_idx], data[split_idx:]
train_y, val_y = label[:split_idx], label[split_idx:]

test_idx = int(len(val_x)*0.5)
val_x, test_x = val_x[:test_idx], val_x[test_idx:]
val_y, test_y = val_y[:test_idx], val_y[test_idx:]

print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

			Feature Shapes:
Train set: 		(48000, 30, 10) 
Validation set: 	(6000, 30, 10) 
Test set: 		(6000, 30, 10)


### Batching

In [58]:
def get_batches(x, y, batch_size=100):
    """Yield (x, y) mini-batches of size batch_size, dropping any leftover samples."""
    n_batches = len(x)//batch_size
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]


### Build and train the model

In [59]:
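BATCH_SIZE = 64         # samples per mini-batch
TIME_STEP = 30          # days per input window
INPUT_SIZE = 10         # product categories per day
LR = 0.01               # learning rate
epochs = 5              # passes over the training set
lstm_layers = 2         # number of stacked LSTM layers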

x = tf.placeholder(tf.float32, [None, TIME_STEP , INPUT_SIZE],name='x')      
y = tf.placeholder(tf.int32, [None, 10],name='y')
keep_prob=tf.placeholder(tf.float32,name='keep_prob')

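with tf.variable_scope('var_scope'):
    # Build a separate LSTM cell (with dropout) for each layer; reusing one cell
    # object via [drop] * lstm_layers can fail on later TensorFlow 1.x releases
    cells = [tf.contrib.rnn.DropoutWrapper(
                 tf.contrib.rnn.BasicLSTMCell(num_units=64),
                 output_keep_prob=keep_prob)
             for _ in range(lstm_layers)]
    cell = tf.contrib.rnn.MultiRNNCell(cells)
    # final_state holds one LSTM state tuple per layer
    outputs, final_state = tf.nn.dynamic_rnn(cell,x,initial_state=None,  dtype=tf.float32,time_major=False)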

    output = tf.layers.dense(outputs[:, -1, :], 10,name='logits')              
    loss = tf.losses.softmax_cross_entropy(onehot_labels=y, logits=output)           
    train_op = tf.train.AdamOptimizer(LR).minimize(loss)

    correct_pred = tf.equal(tf.argmax(y, 1), tf.argmax(output, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32),name='accuracy')
sess = tf.Session()
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer()) 
sess.run(init_op)     

for i in range(epochs):
    step=0
    for batch_x,batch_y in get_batches(train_x,train_y,batch_size=BATCH_SIZE):
        
        _, loss_ = sess.run([train_op, loss], {x: batch_x, y: batch_y,keep_prob:0.6})
        if step%100==0:
            accuracy_ = sess.run(accuracy, {x: val_x, y: val_y,keep_prob:1.0})
            print('train loss: %.4f' % loss_, '| validation accuracy: %.2f' % accuracy_)
        step+=1
print('-'*40)
accuracy_ = sess.run(accuracy, {x: test_x, y: test_y,keep_prob:1.0})
print('train loss: %.4f' % loss_, '| test accuracy: %.2f' % accuracy_)


train loss: 2.4033 | validation accuracy: 0.56
train loss: 0.0015 | validation accuracy: 1.00
train loss: 0.0004 | validation accuracy: 1.00
train loss: 0.0005 | validation accuracy: 1.00
train loss: 0.0003 | validation accuracy: 1.00
train loss: 0.0004 | validation accuracy: 1.00
train loss: 0.0001 | validation accuracy: 1.00
train loss: 0.0001 | validation accuracy: 1.00
train loss: 0.0002 | validation accuracy: 1.00
train loss: 0.0000 | validation accuracy: 1.00
train loss: 0.0001 | validation accuracy: 1.00
train loss: 0.0000 | validation accuracy: 1.00
train loss: 0.0000 | validation accuracy: 1.00
train loss: 0.0000 | validation accuracy: 1.00
train loss: 0.0000 | validation accuracy: 1.00
train loss: 0.0000 | validation accuracy: 1.00
train loss: 0.0000 | validation accuracy: 1.00
train loss: 0.0000 | validation accuracy: 1.00
train loss: 0.0000 | validation accuracy: 1.00
train loss: 0.0000 | validation accuracy: 1.00
train loss: 0.0001 | validation accuracy: 1.00
train loss: 0

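## Summary

Because the generated data is idealized, the training loss is very small and the prediction accuracy is high (validation accuracy reaches 1.00 in the log above). With real data, a more complex RNN model can be built as needed: adding fully connected layers at the input and output, increasing the number of LSTM layers, or adjusting the learning rate and the number of epochs can all change the model's predictive performance.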