## 实例：Autoencoder与聚类结合在预测用户偏好中的应用

### 项目背景：
- 电商数据，包含30w条数据，一共150种商品品类，1w个会员。特征只有用户的购买记录


### 项目需求：
- 希望根据用户的偏好进行分群，便于以后推荐和预测。

### 数据格式

In [2]:
cat test.txt 

id	goods_name	goods_amount
1	男士手袋	1882.0
2	淑女装	2491.0
2	女士手袋	345.0
4	基础内衣	328.0
5	商务正装	4985.0
5	时尚	969.0
5	女饰品	86.0
6	专业运动	399.0
6	童装（中大童)	2033.0
6	男士配件	38.0


### 项目步骤

#### 数据预处理
- 将数据全部转换为一个30w*150维度的矩阵，变成特征矩阵（此处为部分数据做演示）

In [3]:
#!/usr/bin/python
#coding:utf-8
#Author：Charlotte
import pandas as pd
import numpy as np
import time

#加载数据文件(你可以加载自己的文件，文件格式如上所示)
x=pd.read_table('test.txt',sep = "\t")

#去除NULL值
x.dropna()
a1=list(x.iloc[:,0])
a2=list(x.iloc[:,1])
a3=list(x.iloc[:,2])

#A是商品类别
dicta=dict(zip(a2,zip(a1,a3)))
A=list(dicta.keys())
#B是用户id
B=list(set(a1))

#创建商品类别字典
a = np.arange(len(A))
lista = list(a)
dict_class = dict(zip(A,lista))
#print dict_class

f=open('class.txt','w')
for k ,v in dict_class.items():
    f.write(str(k)+'\t'+str(v)+'\n')
f.close()

#计算运行时间
start=time.clock()

#创建大字典存储数据
dictall = {}
for i in xrange(len(a1)):
    if a1[i] in dictall.keys():
        value = dictall[a1[i]]
        j = dict_class[a2[i]]
        value[j] = a3[i]
        dictall[a1[i]]=value
    else:
        value = list(np.zeros(len(A)))
        j = dict_class[a2[i]]
        value[j] = a3[i]
        dictall[a1[i]]=value

#将字典转化为dataframe
dictall1 = pd.DataFrame(dictall)
dictall_matrix = dictall1.T
print dictall_matrix

end = time.clock()
print "赋值过程运行时间是:%f s"%(end-start)

      0       1      2       3      4       5      6       7      8     9
1   0.0  1882.0    0.0     0.0    0.0     0.0    0.0     0.0    0.0   0.0
2   0.0     0.0  345.0     0.0    0.0     0.0    0.0  2491.0    0.0   0.0
4   0.0     0.0    0.0     0.0    0.0     0.0    0.0     0.0  328.0   0.0
5  86.0     0.0    0.0     0.0    0.0  4985.0  969.0     0.0    0.0   0.0
6   0.0     0.0    0.0  2033.0  399.0     0.0    0.0     0.0    0.0  38.0
赋值过程运行时间是:0.010700 s


In [2]:
cat class.txt

专业运动	4
男士手袋	1
女士手袋	2
童装（中大童)	3
男士配件	9
基础内衣	8
时尚	6
淑女装	7
商务正装	5
女饰品	0


#### 用Autoencoder进行降维

In [5]:
#/usr/bin/python
#coding:utf-8

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing

class AutoEncoder():
    """ Auto Encoder  
    layer      1     2    ...    ...    L-1    L
      W        0     1    ...    ...    L-2
      B        0     1    ...    ...    L-2
      Z              0     1     ...    L-3    L-2
      A              0     1     ...    L-3    L-2
    """
    
    def __init__(self, X, Y, nNodes):
        # training samples
        self.X = X
        self.Y = Y
        # number of samples
        self.M = len(self.X)
        # layers of networks
        self.nLayers = len(nNodes)
        # nodes at layers
        self.nNodes = nNodes
        # parameters of networks
        self.W = list()
        self.B = list()
        self.dW = list()
        self.dB = list()
        self.A = list()
        self.Z = list()
        self.delta = list()
        for iLayer in range(self.nLayers - 1):
            self.W.append( np.random.rand(nNodes[iLayer]*nNodes[iLayer+1]).reshape(nNodes[iLayer],nNodes[iLayer+1]) ) 
            self.B.append( np.random.rand(nNodes[iLayer+1]) )
            self.dW.append( np.zeros([nNodes[iLayer], nNodes[iLayer+1]]) )
            self.dB.append( np.zeros(nNodes[iLayer+1]) )
            self.A.append( np.zeros(nNodes[iLayer+1]) )
            self.Z.append( np.zeros(nNodes[iLayer+1]) )
            self.delta.append( np.zeros(nNodes[iLayer+1]) )
            
        # value of cost function
        self.Jw = 0.0
        # active function (logistic function)
        self.sigmod = lambda z: 1.0 / (1.0 + np.exp(-z))
        # learning rate 1.2
        self.alpha = 2.5
        # steps of iteration 30000
        self.steps = 10000
        
    def BackPropAlgorithm(self):
        # clear values
        self.Jw -= self.Jw
        for iLayer in range(self.nLayers-1):
            self.dW[iLayer] -= self.dW[iLayer]
            self.dB[iLayer] -= self.dB[iLayer]
        # propagation (iteration over M samples)    
        for i in range(self.M):
            # Forward propagation
            for iLayer in range(self.nLayers - 1):
                if iLayer==0: # first layer
                    self.Z[iLayer] = np.dot(self.X[i], self.W[iLayer])
                else:
                    self.Z[iLayer] = np.dot(self.A[iLayer-1], self.W[iLayer])
                self.A[iLayer] = self.sigmod(self.Z[iLayer] + self.B[iLayer])            
            # Back propagation
            for iLayer in range(self.nLayers - 1)[::-1]: # reserve
                if iLayer==self.nLayers-2:# last layer
                    self.delta[iLayer] = -(self.X[i] - self.A[iLayer]) * (self.A[iLayer]*(1-self.A[iLayer]))
                    self.Jw += np.dot(self.Y[i] - self.A[iLayer], self.Y[i] - self.A[iLayer])/self.M
                else:
                    self.delta[iLayer] = np.dot(self.W[iLayer].T, self.delta[iLayer+1]) * (self.A[iLayer]*(1-self.A[iLayer]))
                # calculate dW and dB 
                if iLayer==0:
                    self.dW[iLayer] += self.X[i][:, np.newaxis] * self.delta[iLayer][:, np.newaxis].T
                else:
                    self.dW[iLayer] += self.A[iLayer-1][:, np.newaxis] * self.delta[iLayer][:, np.newaxis].T
                self.dB[iLayer] += self.delta[iLayer] 
        # update
        for iLayer in range(self.nLayers-1):
            self.W[iLayer] -= (self.alpha/self.M)*self.dW[iLayer]
            self.B[iLayer] -= (self.alpha/self.M)*self.dB[iLayer]
        
    def PlainAutoEncoder(self):
        for i in range(self.steps):
            self.BackPropAlgorithm()
            print "step:%d" % i, "Jw=%f" % self.Jw

    def ValidateAutoEncoder(self):
        for i in range(self.M):
            print self.X[i]
            for iLayer in range(self.nLayers - 1):
                if iLayer==0: # input layer
                    self.Z[iLayer] = np.dot(self.X[i], self.W[iLayer])
                else:
                    self.Z[iLayer] = np.dot(self.A[iLayer-1], self.W[iLayer])
                self.A[iLayer] = self.sigmod(self.Z[iLayer] + self.B[iLayer])
                print "\t layer=%d" % iLayer, self.A[iLayer]        

data=[]
index=[]
f=open('./data_matrix.txt','r')
for line in f.readlines():
    ss=line.replace('\n','').split('\t')
    index.append(ss[0])
    ss1=ss[1].split(' ')
    tmp=[]
    for i in xrange(len(ss1)):
        tmp.append(float(ss1[i]))
    data.append(tmp)
f.close()

x = np.array(data)
#print x
#归一化处理
xx = preprocessing.scale(x)
nNodes = np.array([ 10, 5, 10])
ae3 = AutoEncoder(xx,xx,nNodes)
ae3.PlainAutoEncoder()
ae3.ValidateAutoEncoder()

step:0 Jw=17.378615
step:1 Jw=15.703543
step:2 Jw=13.533427
step:3 Jw=11.703695
step:4 Jw=10.511857
step:5 Jw=9.690371
step:6 Jw=9.157876
step:7 Jw=8.728858
step:8 Jw=8.350373
step:9 Jw=8.016493
step:10 Jw=7.725659
step:11 Jw=7.473621
step:12 Jw=7.254889
step:13 Jw=7.064274
step:14 Jw=6.897315
step:15 Jw=6.750286
step:16 Jw=6.620098
step:17 Jw=6.504208
step:18 Jw=6.400517
step:19 Jw=6.307292
step:20 Jw=6.223098
step:21 Jw=6.146740
step:22 Jw=6.077220
step:23 Jw=6.013697
step:24 Jw=5.955462
step:25 Jw=5.901911
step:26 Jw=5.852527
step:27 Jw=5.806864
step:28 Jw=5.764539
step:29 Jw=5.725216
step:30 Jw=5.688604
step:31 Jw=5.654445
step:32 Jw=5.622514
step:33 Jw=5.592612
step:34 Jw=5.564559
step:35 Jw=5.538199
step:36 Jw=5.513390
step:37 Jw=5.490005
step:38 Jw=5.467932
step:39 Jw=5.447067
step:40 Jw=5.427319
step:41 Jw=5.408603
step:42 Jw=5.390846
step:43 Jw=5.373976
step:44 Jw=5.357934
step:45 Jw=5.342660
step:46 Jw=5.328103
step:47 Jw=5.314216
step:48 Jw=5.300956
step:49 Jw=5.288281
step:

step:728 Jw=4.058678
step:729 Jw=4.058562
step:730 Jw=4.058447
step:731 Jw=4.058333
step:732 Jw=4.058218
step:733 Jw=4.058105
step:734 Jw=4.057991
step:735 Jw=4.057879
step:736 Jw=4.057766
step:737 Jw=4.057654
step:738 Jw=4.057543
step:739 Jw=4.057432
step:740 Jw=4.057322
step:741 Jw=4.057211
step:742 Jw=4.057102
step:743 Jw=4.056993
step:744 Jw=4.056884
step:745 Jw=4.056776
step:746 Jw=4.056668
step:747 Jw=4.056561
step:748 Jw=4.056454
step:749 Jw=4.056347
step:750 Jw=4.056241
step:751 Jw=4.056135
step:752 Jw=4.056030
step:753 Jw=4.055925
step:754 Jw=4.055821
step:755 Jw=4.055717
step:756 Jw=4.055613
step:757 Jw=4.055510
step:758 Jw=4.055407
step:759 Jw=4.055305
step:760 Jw=4.055203
step:761 Jw=4.055101
step:762 Jw=4.055000
step:763 Jw=4.054899
step:764 Jw=4.054799
step:765 Jw=4.054699
step:766 Jw=4.054599
step:767 Jw=4.054500
step:768 Jw=4.054401
step:769 Jw=4.054303
step:770 Jw=4.054205
step:771 Jw=4.054107
step:772 Jw=4.054010
step:773 Jw=4.053913
step:774 Jw=4.053816
step:775 Jw=4

step:1418 Jw=4.025476
step:1419 Jw=4.025455
step:1420 Jw=4.025435
step:1421 Jw=4.025415
step:1422 Jw=4.025394
step:1423 Jw=4.025374
step:1424 Jw=4.025354
step:1425 Jw=4.025334
step:1426 Jw=4.025314
step:1427 Jw=4.025293
step:1428 Jw=4.025273
step:1429 Jw=4.025253
step:1430 Jw=4.025233
step:1431 Jw=4.025213
step:1432 Jw=4.025193
step:1433 Jw=4.025173
step:1434 Jw=4.025153
step:1435 Jw=4.025133
step:1436 Jw=4.025114
step:1437 Jw=4.025094
step:1438 Jw=4.025074
step:1439 Jw=4.025054
step:1440 Jw=4.025034
step:1441 Jw=4.025015
step:1442 Jw=4.024995
step:1443 Jw=4.024975
step:1444 Jw=4.024956
step:1445 Jw=4.024936
step:1446 Jw=4.024917
step:1447 Jw=4.024897
step:1448 Jw=4.024878
step:1449 Jw=4.024858
step:1450 Jw=4.024839
step:1451 Jw=4.024820
step:1452 Jw=4.024800
step:1453 Jw=4.024781
step:1454 Jw=4.024762
step:1455 Jw=4.024742
step:1456 Jw=4.024723
step:1457 Jw=4.024704
step:1458 Jw=4.024685
step:1459 Jw=4.024666
step:1460 Jw=4.024646
step:1461 Jw=4.024627
step:1462 Jw=4.024608
step:1463 

step:1821 Jw=4.019279
step:1822 Jw=4.019267
step:1823 Jw=4.019256
step:1824 Jw=4.019244
step:1825 Jw=4.019233
step:1826 Jw=4.019221
step:1827 Jw=4.019210
step:1828 Jw=4.019198
step:1829 Jw=4.019187
step:1830 Jw=4.019175
step:1831 Jw=4.019164
step:1832 Jw=4.019152
step:1833 Jw=4.019141
step:1834 Jw=4.019129
step:1835 Jw=4.019118
step:1836 Jw=4.019107
step:1837 Jw=4.019095
step:1838 Jw=4.019084
step:1839 Jw=4.019072
step:1840 Jw=4.019061
step:1841 Jw=4.019050
step:1842 Jw=4.019038
step:1843 Jw=4.019027
step:1844 Jw=4.019016
step:1845 Jw=4.019005
step:1846 Jw=4.018993
step:1847 Jw=4.018982
step:1848 Jw=4.018971
step:1849 Jw=4.018960
step:1850 Jw=4.018949
step:1851 Jw=4.018937
step:1852 Jw=4.018926
step:1853 Jw=4.018915
step:1854 Jw=4.018904
step:1855 Jw=4.018893
step:1856 Jw=4.018882
step:1857 Jw=4.018871
step:1858 Jw=4.018859
step:1859 Jw=4.018848
step:1860 Jw=4.018837
step:1861 Jw=4.018826
step:1862 Jw=4.018815
step:1863 Jw=4.018804
step:1864 Jw=4.018793
step:1865 Jw=4.018782
step:1866 

step:2308 Jw=4.014923
step:2309 Jw=4.014917
step:2310 Jw=4.014910
step:2311 Jw=4.014903
step:2312 Jw=4.014896
step:2313 Jw=4.014889
step:2314 Jw=4.014882
step:2315 Jw=4.014875
step:2316 Jw=4.014868
step:2317 Jw=4.014862
step:2318 Jw=4.014855
step:2319 Jw=4.014848
step:2320 Jw=4.014841
step:2321 Jw=4.014834
step:2322 Jw=4.014827
step:2323 Jw=4.014821
step:2324 Jw=4.014814
step:2325 Jw=4.014807
step:2326 Jw=4.014800
step:2327 Jw=4.014793
step:2328 Jw=4.014787
step:2329 Jw=4.014780
step:2330 Jw=4.014773
step:2331 Jw=4.014766
step:2332 Jw=4.014760
step:2333 Jw=4.014753
step:2334 Jw=4.014746
step:2335 Jw=4.014739
step:2336 Jw=4.014733
step:2337 Jw=4.014726
step:2338 Jw=4.014719
step:2339 Jw=4.014712
step:2340 Jw=4.014706
step:2341 Jw=4.014699
step:2342 Jw=4.014692
step:2343 Jw=4.014686
step:2344 Jw=4.014679
step:2345 Jw=4.014672
step:2346 Jw=4.014666
step:2347 Jw=4.014659
step:2348 Jw=4.014652
step:2349 Jw=4.014646
step:2350 Jw=4.014639
step:2351 Jw=4.014632
step:2352 Jw=4.014626
step:2353 

step:2851 Jw=4.011931
step:2852 Jw=4.011927
step:2853 Jw=4.011922
step:2854 Jw=4.011918
step:2855 Jw=4.011913
step:2856 Jw=4.011909
step:2857 Jw=4.011905
step:2858 Jw=4.011900
step:2859 Jw=4.011896
step:2860 Jw=4.011892
step:2861 Jw=4.011887
step:2862 Jw=4.011883
step:2863 Jw=4.011878
step:2864 Jw=4.011874
step:2865 Jw=4.011870
step:2866 Jw=4.011865
step:2867 Jw=4.011861
step:2868 Jw=4.011857
step:2869 Jw=4.011852
step:2870 Jw=4.011848
step:2871 Jw=4.011844
step:2872 Jw=4.011839
step:2873 Jw=4.011835
step:2874 Jw=4.011831
step:2875 Jw=4.011826
step:2876 Jw=4.011822
step:2877 Jw=4.011818
step:2878 Jw=4.011813
step:2879 Jw=4.011809
step:2880 Jw=4.011805
step:2881 Jw=4.011800
step:2882 Jw=4.011796
step:2883 Jw=4.011792
step:2884 Jw=4.011788
step:2885 Jw=4.011783
step:2886 Jw=4.011779
step:2887 Jw=4.011775
step:2888 Jw=4.011770
step:2889 Jw=4.011766
step:2890 Jw=4.011762
step:2891 Jw=4.011758
step:2892 Jw=4.011753
step:2893 Jw=4.011749
step:2894 Jw=4.011745
step:2895 Jw=4.011741
step:2896 

step:3413 Jw=4.009885
step:3414 Jw=4.009882
step:3415 Jw=4.009879
step:3416 Jw=4.009876
step:3417 Jw=4.009873
step:3418 Jw=4.009870
step:3419 Jw=4.009867
step:3420 Jw=4.009864
step:3421 Jw=4.009861
step:3422 Jw=4.009858
step:3423 Jw=4.009855
step:3424 Jw=4.009852
step:3425 Jw=4.009849
step:3426 Jw=4.009846
step:3427 Jw=4.009843
step:3428 Jw=4.009840
step:3429 Jw=4.009837
step:3430 Jw=4.009834
step:3431 Jw=4.009831
step:3432 Jw=4.009828
step:3433 Jw=4.009825
step:3434 Jw=4.009822
step:3435 Jw=4.009819
step:3436 Jw=4.009816
step:3437 Jw=4.009813
step:3438 Jw=4.009810
step:3439 Jw=4.009807
step:3440 Jw=4.009804
step:3441 Jw=4.009801
step:3442 Jw=4.009798
step:3443 Jw=4.009795
step:3444 Jw=4.009792
step:3445 Jw=4.009789
step:3446 Jw=4.009786
step:3447 Jw=4.009783
step:3448 Jw=4.009780
step:3449 Jw=4.009777
step:3450 Jw=4.009775
step:3451 Jw=4.009772
step:3452 Jw=4.009769
step:3453 Jw=4.009766
step:3454 Jw=4.009763
step:3455 Jw=4.009760
step:3456 Jw=4.009757
step:3457 Jw=4.009754
step:3458 

step:4043 Jw=4.008293
step:4044 Jw=4.008291
step:4045 Jw=4.008289
step:4046 Jw=4.008287
step:4047 Jw=4.008285
step:4048 Jw=4.008282
step:4049 Jw=4.008280
step:4050 Jw=4.008278
step:4051 Jw=4.008276
step:4052 Jw=4.008274
step:4053 Jw=4.008272
step:4054 Jw=4.008270
step:4055 Jw=4.008268
step:4056 Jw=4.008266
step:4057 Jw=4.008263
step:4058 Jw=4.008261
step:4059 Jw=4.008259
step:4060 Jw=4.008257
step:4061 Jw=4.008255
step:4062 Jw=4.008253
step:4063 Jw=4.008251
step:4064 Jw=4.008249
step:4065 Jw=4.008247
step:4066 Jw=4.008245
step:4067 Jw=4.008243
step:4068 Jw=4.008240
step:4069 Jw=4.008238
step:4070 Jw=4.008236
step:4071 Jw=4.008234
step:4072 Jw=4.008232
step:4073 Jw=4.008230
step:4074 Jw=4.008228
step:4075 Jw=4.008226
step:4076 Jw=4.008224
step:4077 Jw=4.008222
step:4078 Jw=4.008220
step:4079 Jw=4.008217
step:4080 Jw=4.008215
step:4081 Jw=4.008213
step:4082 Jw=4.008211
step:4083 Jw=4.008209
step:4084 Jw=4.008207
step:4085 Jw=4.008205
step:4086 Jw=4.008203
step:4087 Jw=4.008201
step:4088 

step:4625 Jw=4.007220
step:4626 Jw=4.007219
step:4627 Jw=4.007217
step:4628 Jw=4.007215
step:4629 Jw=4.007214
step:4630 Jw=4.007212
step:4631 Jw=4.007211
step:4632 Jw=4.007209
step:4633 Jw=4.007207
step:4634 Jw=4.007206
step:4635 Jw=4.007204
step:4636 Jw=4.007203
step:4637 Jw=4.007201
step:4638 Jw=4.007199
step:4639 Jw=4.007198
step:4640 Jw=4.007196
step:4641 Jw=4.007195
step:4642 Jw=4.007193
step:4643 Jw=4.007191
step:4644 Jw=4.007190
step:4645 Jw=4.007188
step:4646 Jw=4.007187
step:4647 Jw=4.007185
step:4648 Jw=4.007183
step:4649 Jw=4.007182
step:4650 Jw=4.007180
step:4651 Jw=4.007179
step:4652 Jw=4.007177
step:4653 Jw=4.007175
step:4654 Jw=4.007174
step:4655 Jw=4.007172
step:4656 Jw=4.007171
step:4657 Jw=4.007169
step:4658 Jw=4.007168
step:4659 Jw=4.007166
step:4660 Jw=4.007164
step:4661 Jw=4.007163
step:4662 Jw=4.007161
step:4663 Jw=4.007160
step:4664 Jw=4.007158
step:4665 Jw=4.007157
step:4666 Jw=4.007155
step:4667 Jw=4.007153
step:4668 Jw=4.007152
step:4669 Jw=4.007150
step:4670 

step:5278 Jw=4.006306
step:5279 Jw=4.006304
step:5280 Jw=4.006303
step:5281 Jw=4.006302
step:5282 Jw=4.006301
step:5283 Jw=4.006299
step:5284 Jw=4.006298
step:5285 Jw=4.006297
step:5286 Jw=4.006296
step:5287 Jw=4.006295
step:5288 Jw=4.006293
step:5289 Jw=4.006292
step:5290 Jw=4.006291
step:5291 Jw=4.006290
step:5292 Jw=4.006288
step:5293 Jw=4.006287
step:5294 Jw=4.006286
step:5295 Jw=4.006285
step:5296 Jw=4.006284
step:5297 Jw=4.006282
step:5298 Jw=4.006281
step:5299 Jw=4.006280
step:5300 Jw=4.006279
step:5301 Jw=4.006278
step:5302 Jw=4.006276
step:5303 Jw=4.006275
step:5304 Jw=4.006274
step:5305 Jw=4.006273
step:5306 Jw=4.006271
step:5307 Jw=4.006270
step:5308 Jw=4.006269
step:5309 Jw=4.006268
step:5310 Jw=4.006267
step:5311 Jw=4.006265
step:5312 Jw=4.006264
step:5313 Jw=4.006263
step:5314 Jw=4.006262
step:5315 Jw=4.006261
step:5316 Jw=4.006259
step:5317 Jw=4.006258
step:5318 Jw=4.006257
step:5319 Jw=4.006256
step:5320 Jw=4.006255
step:5321 Jw=4.006253
step:5322 Jw=4.006252
step:5323 

step:5853 Jw=4.005673
step:5854 Jw=4.005672
step:5855 Jw=4.005671
step:5856 Jw=4.005670
step:5857 Jw=4.005669
step:5858 Jw=4.005668
step:5859 Jw=4.005667
step:5860 Jw=4.005666
step:5861 Jw=4.005665
step:5862 Jw=4.005664
step:5863 Jw=4.005663
step:5864 Jw=4.005662
step:5865 Jw=4.005661
step:5866 Jw=4.005660
step:5867 Jw=4.005659
step:5868 Jw=4.005658
step:5869 Jw=4.005657
step:5870 Jw=4.005656
step:5871 Jw=4.005655
step:5872 Jw=4.005654
step:5873 Jw=4.005653
step:5874 Jw=4.005652
step:5875 Jw=4.005651
step:5876 Jw=4.005650
step:5877 Jw=4.005649
step:5878 Jw=4.005648
step:5879 Jw=4.005647
step:5880 Jw=4.005646
step:5881 Jw=4.005645
step:5882 Jw=4.005644
step:5883 Jw=4.005644
step:5884 Jw=4.005643
step:5885 Jw=4.005642
step:5886 Jw=4.005641
step:5887 Jw=4.005640
step:5888 Jw=4.005639
step:5889 Jw=4.005638
step:5890 Jw=4.005637
step:5891 Jw=4.005636
step:5892 Jw=4.005635
step:5893 Jw=4.005634
step:5894 Jw=4.005633
step:5895 Jw=4.005632
step:5896 Jw=4.005631
step:5897 Jw=4.005630
step:5898 

step:6490 Jw=4.005106
step:6491 Jw=4.005105
step:6492 Jw=4.005104
step:6493 Jw=4.005103
step:6494 Jw=4.005103
step:6495 Jw=4.005102
step:6496 Jw=4.005101
step:6497 Jw=4.005100
step:6498 Jw=4.005099
step:6499 Jw=4.005099
step:6500 Jw=4.005098
step:6501 Jw=4.005097
step:6502 Jw=4.005096
step:6503 Jw=4.005095
step:6504 Jw=4.005095
step:6505 Jw=4.005094
step:6506 Jw=4.005093
step:6507 Jw=4.005092
step:6508 Jw=4.005092
step:6509 Jw=4.005091
step:6510 Jw=4.005090
step:6511 Jw=4.005089
step:6512 Jw=4.005088
step:6513 Jw=4.005088
step:6514 Jw=4.005087
step:6515 Jw=4.005086
step:6516 Jw=4.005085
step:6517 Jw=4.005084
step:6518 Jw=4.005084
step:6519 Jw=4.005083
step:6520 Jw=4.005082
step:6521 Jw=4.005081
step:6522 Jw=4.005080
step:6523 Jw=4.005080
step:6524 Jw=4.005079
step:6525 Jw=4.005078
step:6526 Jw=4.005077
step:6527 Jw=4.005076
step:6528 Jw=4.005076
step:6529 Jw=4.005075
step:6530 Jw=4.005074
step:6531 Jw=4.005073
step:6532 Jw=4.005072
step:6533 Jw=4.005072
step:6534 Jw=4.005071
step:6535 

step:7102 Jw=4.004659
step:7103 Jw=4.004658
step:7104 Jw=4.004657
step:7105 Jw=4.004657
step:7106 Jw=4.004656
step:7107 Jw=4.004655
step:7108 Jw=4.004655
step:7109 Jw=4.004654
step:7110 Jw=4.004653
step:7111 Jw=4.004653
step:7112 Jw=4.004652
step:7113 Jw=4.004651
step:7114 Jw=4.004651
step:7115 Jw=4.004650
step:7116 Jw=4.004649
step:7117 Jw=4.004649
step:7118 Jw=4.004648
step:7119 Jw=4.004647
step:7120 Jw=4.004647
step:7121 Jw=4.004646
step:7122 Jw=4.004645
step:7123 Jw=4.004645
step:7124 Jw=4.004644
step:7125 Jw=4.004643
step:7126 Jw=4.004643
step:7127 Jw=4.004642
step:7128 Jw=4.004641
step:7129 Jw=4.004641
step:7130 Jw=4.004640
step:7131 Jw=4.004639
step:7132 Jw=4.004639
step:7133 Jw=4.004638
step:7134 Jw=4.004637
step:7135 Jw=4.004637
step:7136 Jw=4.004636
step:7137 Jw=4.004635
step:7138 Jw=4.004635
step:7139 Jw=4.004634
step:7140 Jw=4.004633
step:7141 Jw=4.004633
step:7142 Jw=4.004632
step:7143 Jw=4.004631
step:7144 Jw=4.004631
step:7145 Jw=4.004630
step:7146 Jw=4.004629
step:7147 

step:7663 Jw=4.004312
step:7664 Jw=4.004312
step:7665 Jw=4.004311
step:7666 Jw=4.004311
step:7667 Jw=4.004310
step:7668 Jw=4.004309
step:7669 Jw=4.004309
step:7670 Jw=4.004308
step:7671 Jw=4.004308
step:7672 Jw=4.004307
step:7673 Jw=4.004307
step:7674 Jw=4.004306
step:7675 Jw=4.004306
step:7676 Jw=4.004305
step:7677 Jw=4.004304
step:7678 Jw=4.004304
step:7679 Jw=4.004303
step:7680 Jw=4.004303
step:7681 Jw=4.004302
step:7682 Jw=4.004302
step:7683 Jw=4.004301
step:7684 Jw=4.004300
step:7685 Jw=4.004300
step:7686 Jw=4.004299
step:7687 Jw=4.004299
step:7688 Jw=4.004298
step:7689 Jw=4.004298
step:7690 Jw=4.004297
step:7691 Jw=4.004296
step:7692 Jw=4.004296
step:7693 Jw=4.004295
step:7694 Jw=4.004295
step:7695 Jw=4.004294
step:7696 Jw=4.004294
step:7697 Jw=4.004293
step:7698 Jw=4.004292
step:7699 Jw=4.004292
step:7700 Jw=4.004291
step:7701 Jw=4.004291
step:7702 Jw=4.004290
step:7703 Jw=4.004290
step:7704 Jw=4.004289
step:7705 Jw=4.004288
step:7706 Jw=4.004288
step:7707 Jw=4.004287
step:7708 

step:8303 Jw=4.003975
step:8304 Jw=4.003975
step:8305 Jw=4.003974
step:8306 Jw=4.003974
step:8307 Jw=4.003973
step:8308 Jw=4.003973
step:8309 Jw=4.003972
step:8310 Jw=4.003972
step:8311 Jw=4.003971
step:8312 Jw=4.003971
step:8313 Jw=4.003971
step:8314 Jw=4.003970
step:8315 Jw=4.003970
step:8316 Jw=4.003969
step:8317 Jw=4.003969
step:8318 Jw=4.003968
step:8319 Jw=4.003968
step:8320 Jw=4.003967
step:8321 Jw=4.003967
step:8322 Jw=4.003966
step:8323 Jw=4.003966
step:8324 Jw=4.003965
step:8325 Jw=4.003965
step:8326 Jw=4.003964
step:8327 Jw=4.003964
step:8328 Jw=4.003963
step:8329 Jw=4.003963
step:8330 Jw=4.003962
step:8331 Jw=4.003962
step:8332 Jw=4.003961
step:8333 Jw=4.003961
step:8334 Jw=4.003960
step:8335 Jw=4.003960
step:8336 Jw=4.003959
step:8337 Jw=4.003959
step:8338 Jw=4.003958
step:8339 Jw=4.003958
step:8340 Jw=4.003957
step:8341 Jw=4.003957
step:8342 Jw=4.003957
step:8343 Jw=4.003956
step:8344 Jw=4.003956
step:8345 Jw=4.003955
step:8346 Jw=4.003955
step:8347 Jw=4.003954
step:8348 

step:8935 Jw=4.003691
step:8936 Jw=4.003690
step:8937 Jw=4.003690
step:8938 Jw=4.003689
step:8939 Jw=4.003689
step:8940 Jw=4.003689
step:8941 Jw=4.003688
step:8942 Jw=4.003688
step:8943 Jw=4.003687
step:8944 Jw=4.003687
step:8945 Jw=4.003686
step:8946 Jw=4.003686
step:8947 Jw=4.003686
step:8948 Jw=4.003685
step:8949 Jw=4.003685
step:8950 Jw=4.003684
step:8951 Jw=4.003684
step:8952 Jw=4.003684
step:8953 Jw=4.003683
step:8954 Jw=4.003683
step:8955 Jw=4.003682
step:8956 Jw=4.003682
step:8957 Jw=4.003681
step:8958 Jw=4.003681
step:8959 Jw=4.003681
step:8960 Jw=4.003680
step:8961 Jw=4.003680
step:8962 Jw=4.003679
step:8963 Jw=4.003679
step:8964 Jw=4.003679
step:8965 Jw=4.003678
step:8966 Jw=4.003678
step:8967 Jw=4.003677
step:8968 Jw=4.003677
step:8969 Jw=4.003676
step:8970 Jw=4.003676
step:8971 Jw=4.003676
step:8972 Jw=4.003675
step:8973 Jw=4.003675
step:8974 Jw=4.003674
step:8975 Jw=4.003674
step:8976 Jw=4.003674
step:8977 Jw=4.003673
step:8978 Jw=4.003673
step:8979 Jw=4.003672
step:8980 

step:9532 Jw=4.003457
step:9533 Jw=4.003456
step:9534 Jw=4.003456
step:9535 Jw=4.003456
step:9536 Jw=4.003455
step:9537 Jw=4.003455
step:9538 Jw=4.003455
step:9539 Jw=4.003454
step:9540 Jw=4.003454
step:9541 Jw=4.003453
step:9542 Jw=4.003453
step:9543 Jw=4.003453
step:9544 Jw=4.003452
step:9545 Jw=4.003452
step:9546 Jw=4.003452
step:9547 Jw=4.003451
step:9548 Jw=4.003451
step:9549 Jw=4.003451
step:9550 Jw=4.003450
step:9551 Jw=4.003450
step:9552 Jw=4.003449
step:9553 Jw=4.003449
step:9554 Jw=4.003449
step:9555 Jw=4.003448
step:9556 Jw=4.003448
step:9557 Jw=4.003448
step:9558 Jw=4.003447
step:9559 Jw=4.003447
step:9560 Jw=4.003447
step:9561 Jw=4.003446
step:9562 Jw=4.003446
step:9563 Jw=4.003445
step:9564 Jw=4.003445
step:9565 Jw=4.003445
step:9566 Jw=4.003444
step:9567 Jw=4.003444
step:9568 Jw=4.003444
step:9569 Jw=4.003443
step:9570 Jw=4.003443
step:9571 Jw=4.003442
step:9572 Jw=4.003442
step:9573 Jw=4.003442
step:9574 Jw=4.003441
step:9575 Jw=4.003441
step:9576 Jw=4.003441
step:9577 

In [4]:
cat data_matrix.txt |head

1	0.0 1882.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2	0.0 0.0 345.0 0.0 0.0 0.0 0.0 2491.0 0.0 0.0
4	0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 328.0 0.0
5	86.0 0.0 0.0 0.0 0.0 4895.0 969.0 0.0 0.0 0.0
6	0.0 0.0 0.0 2033.0 399.0 0.0 0.0 0.0 0.0 38.0


#### kmeans聚类

In [None]:
# !/usr/bin/python
# coding:utf-8
# Author :Charlotte

from matplotlib import pyplot
import scipy as sp
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster   import KMeans
from scipy import sparse
import pandas as pd 
import Pycluster as pc
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
import pickle
from sklearn.externals import joblib


#加载数据
data = pd.read_table('data_all.txt',header = None,sep = " ")
x = data.ix[:,1:141]
card = data.ix[:,0]
x1 = np.array(x)
xx = preprocessing.scale(x1)
num_clusters = 5

clf = KMeans(n_clusters=num_clusters,  n_init=1, n_jobs = -1,verbose=1)
clf.fit(xx)
print(clf.labels_)
labels = clf.labels_
#score是轮廓系数
score = metrics.silhouette_score(xx, labels)
# clf.inertia_用来评估簇的个数是否合适，距离越小说明簇分的越好
print (clf.inertia_)
print ("score:%s"%score)

.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate_ix


#### 结果对比
- base版本
    - inertia:252666.064229

    - score:0.676239435

- AE模型跑后的版本
    - inertia:662.704257502

    - score:0.962147623