Neural Network to Classify Otto Products
#from Kaggle project: https://www.kaggle.com/c/otto-group-product-classification-challenge
#from Jiuzhang



In [1]:
from __future__ import division  # must be the first statement; only needed on Python 2
import numpy as np
import pandas as pd
from patsy import dmatrices
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn import metrics
import matplotlib.pyplot as plt

Load data from ./otto_train.csv (the exact path may be different; the cell below reads ../input/train.csv)

In [2]:
data = pd.read_csv('../input/train.csv')

In [3]:
data

In [4]:
data.dtypes
# all feature columns are numeric (integer counts); none are categorical

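The features are the columns from the second column through the second-to-last column (the first column is id, the last is target)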

In [5]:
# data.info() is an alternative to data.dtypes

In [6]:
columns = data.columns[1:-1]  # feature columns; the first column is "id" and the last is "target"

In [7]:
X = data[columns]

In [8]:
y = np.ravel(data['target'])

In [9]:
y.shape

Now let's look at the distribution of the product classes

In [10]:
distribution = data.groupby('target').size() / data.shape[0] * 100.0
# percentage of samples in each class; the percentages sum to 100%
distribution.plot(kind='bar')
plt.ylabel('percentage')
plt.show()
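
The same percentages can also be computed without the explicit division. The following is a minimal sketch on a made-up four-row frame (the `toy` DataFrame is an assumption for illustration, not the Otto data), using pandas' `value_counts(normalize=True)`:

```python
import pandas as pd

# Toy stand-in for the training frame; only the 'target' column matters here
toy = pd.DataFrame({'target': ['Class_1', 'Class_2', 'Class_2', 'Class_3']})

# normalize=True returns fractions that sum to 1; multiply by 100 for percentages
toy_distribution = toy['target'].value_counts(normalize=True).sort_index() * 100.0
print(toy_distribution)
```

On the real data, replacing `toy` with `data` should give the same numbers as the groupby version above.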

Show the distribution of a single feature within each class

In [11]:
# show how a specific feature (feat_20) is distributed within each of the 9 classes
for i in range(9):  # avoid shadowing the built-in id()
    plt.subplot(3, 3, i + 1)  # 3 rows, 3 columns
    # plt.axis('off')  # hide the axes
    data[data.target == 'Class_' + str(i + 1)].feat_20.hist()
plt.show()
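
Histograms can be hard to compare by eye, so a numeric summary per class may help. This is only a sketch on invented values (the `toy` frame and its numbers are assumptions), showing how `groupby` plus `agg` summarizes one feature per class:

```python
import pandas as pd

# Invented values; in the notebook this would be data[['target', 'feat_20']]
toy = pd.DataFrame({
    'target': ['Class_1', 'Class_1', 'Class_2', 'Class_2'],
    'feat_20': [0, 2, 5, 7],
})

# One row per class, one column per summary statistic
per_class = toy.groupby('target')['feat_20'].agg(['mean', 'median', 'max'])
print(per_class)
```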

Show a scatter plot of two features

In [12]:
# observe relationship between two features
plt.scatter(data.feat_19, data.feat_20)
plt.xlabel('feat_19')
plt.ylabel('feat_20')
plt.show()
# this suggests an inverse relationship; a proportional relationship would appear as a straight line

In [13]:
# In contrast, plotting feat_19 against itself gives a straight line
plt.scatter(data.feat_19, data.feat_19)
plt.xlabel('feat_19')
plt.ylabel('feat_19')
plt.show()

We can show the feature-feature correlation matrix for all features

In [14]:
X.corr()

In [15]:
# Let's use visualization to help understand the correlation matrix
# show relationship between all pairs of features
# correlation

fig = plt.figure()
ax = fig.add_subplot(111) # 1 row, 1 col, 1st plot
cax = ax.matshow(X.corr(), interpolation='nearest') # correlation is -1 to 1
fig.colorbar(cax)
plt.xlabel('feature')
plt.ylabel('feature')
plt.show()
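
A 93x93 heat map is hard to read pair by pair, so it can be useful to list the most strongly correlated pairs directly. The sketch below uses a synthetic three-column frame (`toy_X` is an assumption, with column `b` deliberately built to track `a`) and keeps only the upper triangle of the correlation matrix so each pair appears once:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for X: 'b' is built to track 'a', 'c' is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy_X = pd.DataFrame({
    'a': a,
    'b': a + 0.1 * rng.normal(size=200),
    'c': rng.normal(size=200),
})

corr = toy_X.corr()
# Strict upper triangle: drops the diagonal and the mirrored duplicates
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()
# Sort by absolute correlation, strongest pair first
top_pairs = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(top_pairs.head())
```

Applied to the notebook's `X`, the same steps would rank all feature pairs by the strength of their linear relationship.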

In [33]:
num_fea = X.shape[1]

Now initialize the neural network model with two hidden layers; the whole network is 93x30x10x9 (93 input features, hidden layers of 30 and 10 units, 9 output classes)

In [34]:
# alpha is the L2 regularization coefficient
# normally you would try several hidden-layer sizes and keep the best one
model = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(30, 10), random_state=1, verbose=True)
# structure: 93 x 30 x 10 x 9
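
The comment about iterating over the number of nodes could be carried out with a cross-validated grid search. This is a hedged sketch, not the notebook's method: the synthetic data from `make_classification`, the two candidate layer sizes, and `max_iter=200` are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Small synthetic 3-class problem standing in for the Otto data
X_toy, y_toy = make_classification(n_samples=300, n_features=20, n_informative=10,
                                   n_classes=3, random_state=0)

# Candidate architectures are arbitrary; extend the list as needed
param_grid = {'hidden_layer_sizes': [(10,), (30, 10)]}
search = GridSearchCV(
    MLPClassifier(solver='lbfgs', alpha=1e-5, max_iter=200, random_state=1),
    param_grid,
    cv=3,                # 3-fold cross-validation per candidate
    scoring='accuracy',
)
search.fit(X_toy, y_toy)
print(search.best_params_, search.best_score_)
```

The same grid could also include `alpha`, at the cost of fitting more models.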

Training normally takes about 1 minute

In [35]:
model.fit(X, y)
# the features could have been standardized first (e.g. with StandardScaler)
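
One way to standardize the features is to chain a scaler and the classifier in a pipeline, so the scaling learned on the training data is reused at prediction time. The sketch below runs on synthetic data (an assumption) and uses scikit-learn's `make_pipeline` and `StandardScaler`; it is an illustration rather than what this notebook actually ran.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Otto features and labels
X_toy, y_toy = make_classification(n_samples=300, n_features=20, n_informative=10,
                                   n_classes=3, random_state=0)

# The scaler is fitted inside the pipeline, then its output feeds the network
scaled_model = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(30, 10),
                  max_iter=200, random_state=1),
)
scaled_model.fit(X_toy, y_toy)
print(scaled_model.score(X_toy, y_toy))
```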

Check the model's coefficients (weights) and biases

In [19]:
model.intercepts_

In [20]:
print(model.coefs_[0].shape)
print(model.coefs_[1].shape)
print(model.coefs_[2].shape)
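
For a 93x30x10x9 network, the three weight matrices should have shapes (93, 30), (30, 10) and (10, 9), with one bias vector per non-input layer. The sketch below checks this on random data (the `X_fake` and `y_fake` arrays are assumptions, and `max_iter=5` is only there to keep the fit short, so a convergence warning is expected):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Random data with 93 features and 9 class labels, mimicking the Otto shapes only
rng = np.random.default_rng(0)
X_fake = rng.random((180, 93))
y_fake = np.array(['Class_%d' % (i % 9 + 1) for i in range(180)])

net = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(30, 10),
                    max_iter=5, random_state=1)
net.fit(X_fake, y_fake)

weight_shapes = [w.shape for w in net.coefs_]
bias_shapes = [b.shape for b in net.intercepts_]
# Total trainable parameters: all weights plus all biases
n_params = sum(w.size for w in net.coefs_) + sum(b.size for b in net.intercepts_)
print(weight_shapes, bias_shapes, n_params)
```

The parameter count follows from the shapes: 93*30 + 30*10 + 10*9 weights plus 30 + 10 + 9 biases, which is 3229 in total.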

In [21]:
pred = model.predict(X)
pred

Print the accuracy score of the model on the training data

In [22]:
model.score(X, y)

In [23]:
# alternatively, calculate in the following way
sum(pred == y) / len(y)

In [24]:
y

In [25]:
pred

In [26]:
len(y)

In [27]:
sum(pred == y)
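
Accuracy measured on the same rows used for training tends to be optimistic. A held-out split or cross-validation gives a fairer estimate, and the Otto competition itself is scored with multi-class log loss rather than accuracy. The sketch below shows both on synthetic data; the data, the 25% validation size, and `max_iter=200` are assumptions for illustration.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the Otto features and labels
X_toy, y_toy = make_classification(n_samples=600, n_features=20, n_informative=10,
                                   n_classes=3, random_state=0)

# Hold out 25% of the rows, keeping class proportions the same in both parts
X_train, X_val, y_train, y_val = train_test_split(
    X_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=0)

clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(30, 10),
                    max_iter=200, random_state=1)
clf.fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
val_acc = metrics.accuracy_score(y_val, clf.predict(X_val))
val_logloss = metrics.log_loss(y_val, clf.predict_proba(X_val), labels=clf.classes_)

# 3-fold cross-validated accuracy on the full toy set (refits the model 3 times)
cv_scores = cross_val_score(clf, X_toy, y_toy, cv=3)
print(train_acc, val_acc, val_logloss, cv_scores.mean())
```

In the notebook, `X` and `y` would take the place of `X_toy` and `y_toy`.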

Now let's make predictions on the test set, loading the test data first

In [28]:
test_data = pd.read_csv('../input/test.csv')
Xtest = test_data[test_data.columns[1:]]
Xtest

In [29]:
test_prob = model.predict_proba(Xtest)

The output is, for each product, the predicted probability of belonging to each class; we add the id column and write the result to ./otto_prediction.tsv

In [30]:
solution = pd.DataFrame(test_prob, columns=['Class_1','Class_2','Class_3','Class_4','Class_5','Class_6','Class_7','Class_8','Class_9'])
solution

In [31]:
solution['id'] = test_data['id']
cols = solution.columns.tolist()
cols = cols[-1:] + cols[:-1]
solution = solution[cols]

In [32]:
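# index=False keeps the row index out of the file; note that to_csv writes
# comma-separated values by default, despite the .tsv extension used here
solution.to_csv('./otto_prediction.tsv', index=False)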
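
Before submitting, it can be worth checking that the id column comes first and that each row of probabilities sums to 1. The following sketch builds a small submission frame from made-up probabilities (`fake_prob` and `fake_ids` are assumptions) and uses `DataFrame.insert` as a shorter way to put id in front:

```python
import numpy as np
import pandas as pd

# Made-up probabilities for 4 products and 9 classes; rows are normalized to sum to 1
rng = np.random.default_rng(0)
fake_prob = rng.random((4, 9))
fake_prob /= fake_prob.sum(axis=1, keepdims=True)

# With a fitted scikit-learn classifier, model.classes_ gives this column order
class_names = ['Class_%d' % k for k in range(1, 10)]
fake_ids = pd.Series([1, 2, 3, 4], name='id')

submission = pd.DataFrame(fake_prob, columns=class_names)
submission.insert(0, 'id', fake_ids)  # place id first without reordering by hand

# Each row of class probabilities should sum to 1
row_sums = submission[class_names].sum(axis=1)
assert np.allclose(row_sums, 1.0)

csv_text = submission.to_csv(index=False)  # returns a string when no path is given
print(csv_text.splitlines()[0])
```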

Note that we won't know the accuracy on the test set until we submit the predictions to the competition

Summary:

1. A neural network was used to classify products in the Otto dataset
2. A correlation matrix and its visualization were used to inspect feature-feature correlation
3. A classic neural network (93x30x10x9) achieved ~80% accuracy on the training data
4. Training takes as little as about 1 minute on a normal computer
5. In the future, feature engineering can be used to decide which features to use
6. In the future, hyperparameter tuning will be studied to target a higher score

