# Photo to Mood

画像データについて、GracenoteのどのMoodに相当するかを判断するModelを作成します


In [14]:
# グラフが文章中に表示されるようにするおまじない
%matplotlib inline

# autoreload module
%load_ext autoreload
%autoreload 2

# load local package
import sys
import os
current_path = os.getcwd()
sys.path.append(os.path.join(current_path, "../../"))  # load project root


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Load the Data

image_url、moodを左端に設定したファイルから学習データを読み込みます。
なお、今回値はRekognitionのスコアであり、全項目同じ範囲の値のため正規化は行いません。

In [32]:
import os
import numpy as np

data_file = os.path.join(current_path, "./data/training_data.txt")
ignore_column = 1
header = []
data = None

with open(data_file, "rb") as f:
    header = f.readline().decode("utf-8").split()
    data = np.genfromtxt(f, usecols=range(ignore_column, len(header)))
    
X = data[:, 1:]
y = data[:, 0]
header = header[(ignore_column + 1):] # ignore column + y column

print(header)
print(X.shape)
print(y.shape)


['building', 'city', 'clothing', 'downtown', 'maillot', 'swimwear']
(13, 6)
(13,)


## Create the Model

今回扱うのは画像の分類問題になります。そこで、分類問題でよく使われるSupport Vector Machineを利用します。  
特徴量の数が多いため、有効なものに限って使用します。

In [33]:
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif

feature_count = 5
get_headers = lambda s: [i_h[1] for i_h in enumerate(header) if s[i_h[0]]]

selector = SelectKBest(f_classif, k=feature_count).fit(X, y)
selected = selector.get_support()
kbests = sorted(zip(get_headers(selected), selector.scores_[selected]), key=lambda h_s: h_s[1], reverse=True)
print(kbests)

[('building', inf), ('clothing', inf), ('maillot', inf), ('swimwear', inf), ('downtown', -13539310061041850.0)]


  f = msb / msw


## Training the Model

データとモデルがそろったため、学習させてみます。  
パラメーターはGrid Searchで探索します。

In [38]:
from sklearn.cross_validation import train_test_split
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.externals import joblib
from sklearn import svm

X_c = X[:, selected]

x_train, x_test, y_train, y_test = train_test_split(X_c, y, test_size=0.25, random_state=42)

candidates = [{'kernel': ["rbf"], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100]},
              {'kernel': ['linear'], 'C': [1, 10, 100]}]

clf = GridSearchCV(svm.SVC(C=1), candidates, cv=2, scoring="f1")
clf.fit(x_train, y_train)

for params, mean_score, scores in sorted(clf.grid_scores_, key=lambda s: s[1], reverse=True):
    print("%0.3f (+/-%0.03f) for %r" % (mean_score, scores.std() / 2, params))

columns = get_headers(selected)
model = clf.best_estimator_

y_predict = model.predict(x_test)
print(classification_report(y_test, y_predict))


1.000 (+/-0.000) for {'gamma': 0.001, 'kernel': 'rbf', 'C': 100}
1.000 (+/-0.000) for {'kernel': 'linear', 'C': 1}
1.000 (+/-0.000) for {'kernel': 'linear', 'C': 10}
1.000 (+/-0.000) for {'kernel': 'linear', 'C': 100}
0.556 (+/-0.250) for {'gamma': 0.001, 'kernel': 'rbf', 'C': 1}
0.556 (+/-0.250) for {'gamma': 0.0001, 'kernel': 'rbf', 'C': 1}
0.556 (+/-0.250) for {'gamma': 0.001, 'kernel': 'rbf', 'C': 10}
0.556 (+/-0.250) for {'gamma': 0.0001, 'kernel': 'rbf', 'C': 10}
0.556 (+/-0.250) for {'gamma': 0.0001, 'kernel': 'rbf', 'C': 100}
             precision    recall  f1-score   support

        0.0       1.00      1.00      1.00         2
        1.0       1.00      1.00      1.00         2

avg / total       1.00      1.00      1.00         4



  'precision', 'predicted', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'precision', 'predicted', average, warn_for)


## Store the Model

最後に、学習させたモデルを保存します。アプリケーション側で、その結果を確認してみてください。

In [39]:
from sklearn.externals import joblib

print(columns)
joblib.dump(model, "./machine.pkl") 


['building', 'clothing', 'downtown', 'maillot', 'swimwear']


['./machine.pkl',
 './machine.pkl_01.npy',
 './machine.pkl_02.npy',
 './machine.pkl_03.npy',
 './machine.pkl_04.npy',
 './machine.pkl_05.npy',
 './machine.pkl_06.npy',
 './machine.pkl_07.npy',
 './machine.pkl_08.npy',
 './machine.pkl_09.npy',
 './machine.pkl_10.npy',
 './machine.pkl_11.npy']