# Kaggle Challenge: Give Me Some Credit


<img src="https://kaggle2.blob.core.windows.net/competitions/kaggle/2551/logos/front_page.png" style="width:200px;height:100px;">

Give Me Some Credit
Improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years.

Banks play a crucial role in market economies. They decide who can get finance and on what terms and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit. 

Credit scoring algorithms, which make a guess at the probability of default, are the method banks use to determine whether or not a loan should be granted. This competition requires participants to improve on the state of the art in credit scoring, by predicting the probability that somebody will experience financial distress in the next two years.

The goal of this competition is to build a model that borrowers can use to help make the best financial decisions.

Historical data are provided on 250,000 borrowers, and the prize pool is 5,000:

* 3,000 for first, 
* 1,500 for second and 
* 500 for third.

## Methods

I'm going to implement an ensemble classifier using a few methods:

* xgboost
* Random Forest 
* SVM
* PCA
* Artificial Neural Networks

## Scoring Metric: Area Under the Curve (AUC)

Evaluation uses the area under the receiver operating characteristic (ROC) curve, usually called ROC AUC or simply AUC. The ROC curve traces the trade-off between true positives and false positives as the classifier's decision threshold varies.

* Y-axis: true positive rate (TPR), also called sensitivity
* X-axis: false positive rate (FPR), i.e. 1 - specificity

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Sensitivity_and_specificity.svg/700px-Sensitivity_and_specificity.svg.png"  style="height:400px; width:200px">
<caption><center> **Sensitivity and specificity**</center></caption><br>

$$\text{TPR} = \frac{TP}{P}, \qquad P = TP + FN$$

$$\text{FPR} = \frac{FP}{N}, \qquad N = TN + FP$$

where $TP$, $FN$, $FP$ and $TN$ count the true positives, false negatives, false positives and true negatives.
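
A minimal sketch of these two formulas in Python, assuming NumPy; the toy labels, scores and the 0.5 threshold are made up for illustration:

```python
import numpy as np

def tpr_fpr(y_true, y_score, threshold):
    """Return (TPR, FPR) when rows with score >= threshold are predicted positive."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_score) >= threshold
    tp = np.sum(y_pred & y_true)    # defaulters correctly flagged
    fn = np.sum(~y_pred & y_true)   # defaulters missed
    fp = np.sum(y_pred & ~y_true)   # clean borrowers wrongly flagged
    tn = np.sum(~y_pred & ~y_true)  # clean borrowers correctly cleared
    return tp / (tp + fn), fp / (fp + tn)

# Toy example: 1 = defaulted (SeriousDlqin2yrs), 0 = clean
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_score = [0.9, 0.4, 0.35, 0.1, 0.8, 0.7, 0.2, 0.3]
print(tpr_fpr(y_true, y_score, threshold=0.5))
```

With these toy numbers the threshold flags 2 of the 3 defaulters (TPR = 2/3) and 1 of the 5 clean borrowers (FPR = 1/5). Sweeping the threshold from 1 down to 0 traces out the ROC curve.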

<img src="http://mchp-appserv.cpe.umanitoba.ca/concept/roc_gif_small.gif">
<caption><center> **Figure 1**</center></caption><br>

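A high AUC therefore means high sensitivity while keeping the false positive rate low. Equivalently, AUC is the probability that a randomly chosen positive (a borrower who defaulted) receives a higher score than a randomly chosen negative.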

Besides ROC AUC, there's also log loss, which scores the predicted probabilities directly.
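
AUC depends only on how the scores rank positives against negatives, so it can be computed directly from pairwise comparisons. The sketch below uses toy data and assumes NumPy and scikit-learn. It checks the pairwise count against `sklearn.metrics.roc_auc_score(y_true, y_score)`, which takes the true labels first. It also shows what happens when thresholded 0/1 labels are passed instead of probabilities: the ranking inside each class is lost, and the AUC drops here from 13/15 to 11/15.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, y_score):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true][:, None]   # column of positive-class scores
    neg = y_score[~y_true][None, :]  # row of negative-class scores
    return np.mean((pos > neg) + 0.5 * (pos == neg))

y_true = np.array([1, 1, 0, 0, 0, 1, 0, 0])
y_prob = np.array([0.9, 0.4, 0.35, 0.1, 0.8, 0.7, 0.2, 0.3])

auc_prob = roc_auc_score(y_true, y_prob)                       # y_true comes first
auc_hard = roc_auc_score(y_true, (y_prob >= 0.5).astype(int))  # thresholded labels
print(auc_prob, pairwise_auc(y_true, y_prob), auc_hard)
```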

In [None]:
# Import everything the notebook needs
import os
import subprocess

import pandas as pd
import numpy as np

# Plotting: most figures use R's ggplot2 through rpy2's %%R cell magic
import rpy2
%load_ext rpy2.ipython
import matplotlib.pyplot as plt
%matplotlib inline
from tqdm import tqdm_notebook

# Machine learning
## xgboost
import xgboost as xgb
from xgboost import XGBClassifier
## sklearn
import sklearn
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import FunctionTransformer
# sklearn.preprocessing.Imputer was removed in scikit-learn 0.22;
# SimpleImputer replaces it (aliased so the cells below run unchanged)
from sklearn.impute import SimpleImputer as Imputer
## sklearn-pandas (DataFrame-aware transformers)
from sklearn_pandas import DataFrameMapper
from sklearn_pandas import CategoricalImputer
# Scoring
from sklearn.metrics import roc_auc_score
%connect_info

# Input data

The input variables are described below:

| Variable Name | Description | Type |
| --- | --- | --- |
| SeriousDlqin2yrs  | Person experienced 90 days past due delinquency or worse |  Y/N |
| RevolvingUtilizationOfUnsecuredLines  | Total balance on credit cards and personal lines of credit except real estate and no installment debt like car loans divided by the sum of credit limits  | percentage | 
| age | Age of borrower in years  | integer |
| NumberOfTime30-59DaysPastDueNotWorse  | Number of times borrower has been 30-59 days past due but no worse in the last 2 years. | integer |
| DebtRatio | Monthly debt payments, alimony, living costs divided by monthly gross income | percentage |
| MonthlyIncome | Monthly income  | real |
| NumberOfOpenCreditLinesAndLoans | Number of Open loans (installment like car loan or mortgage) and Lines of credit (e.g. credit cards)  | integer |
| NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due. | integer |
| NumberRealEstateLoansOrLines  | Number of mortgage and real estate loans including home equity lines of credit  | integer| 
| NumberOfTime60-89DaysPastDueNotWorse  | Number of times borrower has been 60-89 days past due but no worse in the last 2 years. | integer | 
| NumberOfDependents |  Number of dependents in family excluding themselves (spouse, children etc.) | integer | 

# Exploratory Analysis

## 1. Class imbalance 

In [None]:
X_train, X_test, y_train, y_test, testDF = loadData(preprocessed=False)
defaulted = np.sum(y_train != 0) + np.sum(y_test != 0)
clean = np.sum(y_train == 0) + np.sum(y_test == 0)

classSep = pd.DataFrame({
    "class":["defaulted","clean"],
    "value":[defaulted, clean]
}, index=[0,1])
classSep

In [None]:
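weights = np.array(classSep['value']/np.sum(classSep['value']))
# Inverse-frequency weights: each class gets the other class's share,
# so the minority (defaulted) class receives the larger weight
classweight = {1: weights[1], 0: weights[0]}
classweight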
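
For comparison, here is a sketch of two standard ways to weight an imbalanced binary problem, using hypothetical labels. The first is scikit-learn's `compute_class_weight(class_weight='balanced', ...)`, which uses `n_samples / (n_classes * count)`. The second is the `(# negatives) / (# positives)` ratio commonly passed as XGBoost's `scale_pos_weight`. In both schemes the minority (defaulted) class gets the larger weight, and here they imply the same 7:1 ratio.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 14 clean (0) and 2 defaulted (1) borrowers
y = np.array([0] * 14 + [1] * 2)

# sklearn's 'balanced' heuristic: n_samples / (n_classes * count_per_class)
balanced = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
class_weight = {0: balanced[0], 1: balanced[1]}

# XGBoost-style imbalance knob: (# negatives) / (# positives)
scale_pos_weight = np.sum(y == 0) / np.sum(y == 1)

print(class_weight, scale_pos_weight)
```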

In [None]:
%%R -i classSep -w 10 -h 5 -u in

suppressPackageStartupMessages({
    library(tidyverse)
    library(cowplot)
})

suppressMessages({
    p1 = ggplot(classSep, aes(class, value, fill=class)) +
        geom_col() +
        geom_text(aes(y=value, label=value)) +
        scale_fill_discrete("Default", labels=c("Good", "Default")) +
        scale_y_log10() + ggtitle("Y-Log10 scale")
    p2 = ggplot(classSep, aes(class, value, fill=class)) +
        geom_col() +
        geom_text(aes(y=value, label=value)) +
        scale_fill_discrete("Default", labels=c("Good", "Default")) + ggtitle("Linear Y Scale")
    p = plot_grid(p1, p2, nrow=1)
})
print(p)

In [None]:
%%R -i X_train 

suppressWarnings({ 
        library(tidyverse)
        library(GGally)
        ggpairs(X_train) %>% ggsave(filename="ggpairs.png", w=30, h=30, dpi=300) 
})

<img src="./ggpairs.png">
<caption><center> **Figure 2**: Pairs plot</center></caption><br>

In [None]:
X_train, X_test, y_train, y_test, testDF = loadData(logTransform=True, preprocessed=False)
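
`loadData` is defined outside this section, so exactly what `logTransform=True` does is not shown here. The sketch below is one plausible minimal version. It assumes the helper applies `log(1 + x)` to skewed non-negative columns and that the train/test split is stratified on the label. The helper name `log_transform` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def log_transform(df, columns):
    """Hypothetical helper: return a copy of df with log(1 + x) applied to the given columns."""
    out = df.copy()
    for col in columns:
        out[col] = np.log1p(out[col])
    return out

# Hypothetical toy frame using three of the real column names
df = pd.DataFrame({
    "MonthlyIncome": [0, 1500, 3000, 9000, 250000, np.nan],
    "DebtRatio": [0.1, 0.3, 0.5, 2.0, 3000.0, 0.2],
    "SeriousDlqin2yrs": [0, 0, 1, 0, 1, 0],
})
X = log_transform(df.drop(columns="SeriousDlqin2yrs"), ["MonthlyIncome", "DebtRatio"])
y = df["SeriousDlqin2yrs"]

# stratify=y keeps the default rate (roughly) equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)
```

`np.log1p` keeps zeros at zero and leaves NaNs as NaN, so imputation can still happen later in the pipeline.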

In [None]:
%%R -i X_train 

suppressWarnings({ 
        library(tidyverse)
        library(GGally)
        X_train %>% select(-DebtRatio, -NumberOfTime30.59DaysPastDueNotWorse, -MonthlyIncome, 
                                    -NumberOfOpenCreditLinesAndLoans, -NumberOfTimes90DaysLate, 
                                    -NumberRealEstateLoansOrLines, -NumberOfTime60.89DaysPastDueNotWorse) %>% 
        ggpairs() %>%
        ggsave(filename="ggpairslog.png", w=30, h=30, dpi=300) 
})

<img src="./ggpairslog.png">
<caption><center> **Figure 2b**: Pairs plot (log-transformed features)</center></caption><br>

# Model2: XGB

Now that we know the logistic regression's error rate, let's try a few more models.

## Log columns 

In [None]:
#with log columns
X_train, X_test, y_train, y_test, testDF = loadData(logTransform=True, 
                                                    impute=False, 
                                                    preprocessed=False)
X_train.columns

In [None]:
#sanity check
np.sum(y_train)/len(y_train), np.sum(y_test)/len(y_test)

In [None]:
X_train.head()

In [None]:
# Standardised sklearn pipeline with XGB

# Create a boolean mask for categorical columns
# (none of the columns are object dtype, so this is kept only for generality)
categorical_feature_mask = X_train.dtypes == object

# Get list of categorical column names
categorical_columns = X_train.columns[categorical_feature_mask].tolist()

# Get list of non-categorical column names
non_categorical_columns = X_train.columns[~categorical_feature_mask].tolist()

# Fill the NaNs in each numeric column with that column's median
# (median and mean imputation give almost the same score)
numeric_imputation_mapper = DataFrameMapper(
   [([numeric_feature], Imputer(strategy="median")) for numeric_feature in non_categorical_columns],
   input_df=True,
   df_out=True
)

# Fill the NaNs in each categorical column with its most frequent value
categorical_imputation_mapper = DataFrameMapper(
    [(category_feature, CategoricalImputer()) for category_feature in categorical_columns],
    input_df=True,
    df_out=True
)

# Combine the numeric and categorical transformations
numeric_categorical_union = FeatureUnion([
    ("num_mapper", numeric_imputation_mapper),
    ("cat_mapper", categorical_imputation_mapper)
])

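# GPU training (tree_method='gpu_hist') was tried but did not work here,
# so these settings are kept for reference only and are not passed to the model
params = {
        "n_estimators": 400,
        "tree_method": "gpu_hist",
        "predictor": "gpu_predictor",
}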

# Create full pipeline
#pipeline = Pipeline([
#   ("featureunion", numeric_imputation_mapper),
#   ("clf", xgb.XGBClassifier(max_depth=3, scale_pos_weight=1)) #class imbalance
#])
# scale_pos_weight = (# clean) / (# defaulted) up-weights the minority class
weights = (y_train == 0).sum() / (1.0 * (y_train == 1).sum())
pipeline = Pipeline([
   ("featureunion", numeric_imputation_mapper),
   ("clf", xgb.XGBClassifier(max_depth=10,
                             scale_pos_weight=weights,
                             gamma=20))
])

# Perform 3-fold cross-validation (the next cell prints the mean)
cross_val_scores_cpu = cross_val_score(pipeline, X_train, y_train, scoring="roc_auc", cv=3)

In [None]:
# Print avg. AUC
print("3-fold AUC: ", np.mean(cross_val_scores_cpu))

In [None]:
model = pipeline.fit(X_train, y_train)

In [None]:
# AUC needs scores, not hard labels: use the positive-class probability,
# and pass y_true first, as roc_auc_score expects
dev = model.predict_proba(X_test)[:, 1]
roc_auc_score(y_test, dev)

In [None]:
# Submit the probability of default for each test borrower
predsSubmit = model.predict_proba(testDF)[:, 1]
submit(predsSubmit, "xgb_straitified_processing_weights_gamma.csv", "xgb stratified preprocessing weights gamma")

Amazing!

Adding the class weights and setting gamma to a very high 20 raised my public and private leaderboard scores to:

0.775060, 0.769128
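
Why can a gamma of 20 matter so much? In XGBoost, `gamma` (alias `min_split_loss`) is subtracted from the gain of every candidate split, and the split is only kept if the result is positive. The gain formula from the XGBoost documentation is Gain = 1/2 [G_L²/(H_L+λ) + G_R²/(H_R+λ) - (G_L+G_R)²/(H_L+H_R+λ)] - γ, where G and H are sums of first- and second-order gradients. The sketch below evaluates it for logistic loss on a hypothetical 10-row node, with positive rows up-weighted the way `scale_pos_weight` does. It illustrates the formula and is not XGBoost's actual code.

```python
import numpy as np

def split_gain(g, h, left_mask, reg_lambda=1.0, gamma=0.0):
    """XGBoost-style gain of splitting one node into left/right children."""
    def score(G, H):
        return G * G / (H + reg_lambda)
    GL, HL = g[left_mask].sum(), h[left_mask].sum()
    GR, HR = g[~left_mask].sum(), h[~left_mask].sum()
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma

# Logistic loss evaluated at the initial prediction p = 0.5
y = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
p = np.full(len(y), 0.5)
w = np.where(y == 1, 4.0, 1.0)   # scale_pos_weight = 8 / 2 = 4, applied to positives
g = w * (p - y)                  # weighted first-order gradients
h = w * p * (1 - p)              # weighted second-order gradients (hessians)
left = y == 1                    # a perfect split, for illustration

print(split_gain(g, h, left, gamma=0.0), split_gain(g, h, left, gamma=20.0))
```

Even a perfect split of this small node gains only 16/3 ≈ 5.3, so with gamma = 20 it would be pruned. On the full training set the gradient sums are far larger, so a large gamma plausibly removes only the weaker splits. This is a hedged reading, not a measured result.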

For tuning, there are two common options:

1. Randomised Search
2. Grid Search
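
The grid in the next cell has about 400 candidate settings (19 learning rates × 7 depths × 3 tree counts), which means roughly 1,200 fits with `cv=3`. Randomised search instead draws a fixed number `n_iter` of candidates from distributions. Below is a minimal sketch on synthetic data. It uses a fast `LogisticRegression` as a stand-in for the XGB pipeline; in the notebook you would pass `pipeline` and distributions over its `clf__...` parameters instead.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

# Synthetic, imbalanced stand-in for the credit data (about 10% positives)
X, y = make_classification(n_samples=600, n_features=10, weights=[0.9], random_state=0)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Sample C from a log-uniform distribution instead of enumerating a grid
search = RandomizedSearchCV(
    pipe,
    param_distributions={"clf__C": loguniform(1e-3, 1e2)},
    n_iter=8,            # number of sampled candidates
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```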

In [None]:
gbm_param_grid = {
    'clf__learning_rate': np.arange(0.05, 1, 0.05),
    'clf__max_depth': np.arange(3, 10, 1),
    'clf__n_estimators': np.arange(50, 200, 50),
    # 'clf__gamma': [5, 10, 13, 16, 19, 20]
}

# Perform an exhaustive GridSearchCV over every combination above
grid_roc_auc = GridSearchCV(pipeline,
    param_grid=gbm_param_grid,
    scoring="roc_auc",
    cv=3,
    verbose=1,
    n_jobs=-1
)

# Fit the estimator
grid_roc_auc.fit(X_train, y_train)

# Compute metrics
print(f'my best score: {grid_roc_auc.best_score_}')
print(grid_roc_auc.best_estimator_)

In [None]:
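preds = grid_roc_auc.predict(X_test)  # hard labels, for accuracy
accuracy = float(np.sum(preds == y_test) / y_test.shape[0])
print(f'Accuracy: {accuracy}')
probs = grid_roc_auc.predict_proba(X_test)[:, 1]  # probabilities, for AUC
print(f'AUC: {roc_auc_score(y_test, probs)}')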

Interestingly, with gamma turned on, the AUC is poor.

In [None]:
preds = grid_roc_auc.predict_proba(testDF)[:, 1]

In [None]:
#predsSubmit = model.predict(testDF)
submit(preds, "xgb_straitified_processing_weights_gamma_gridsearch2.csv", "xgb stratified preprocessing weights gamma grid2")

In [None]:
grid_roc_auc.best_estimator_.named_steps['clf']

With the grid search:

| public | private |
| --- | --- |
| 0.790252 | 0.783193 |

# Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier


rf_pipeline = Pipeline([
   ("featureunion", numeric_imputation_mapper),
   ("rf", RandomForestClassifier(random_state=123, n_jobs=-1, class_weight=classweight, n_estimators=600))
])

# Perform 10-fold cross-validation
cross_val_scores = cross_val_score(rf_pipeline, X_train, y_train, scoring="roc_auc", cv=10)
# Fit the random forest pipeline (not the XGB `pipeline`) before scoring
model = rf_pipeline.fit(X_train, y_train)

dev = model.predict_proba(X_test)[:, 1]
testScore = roc_auc_score(y_test, dev)

print("10-fold AUC: ", np.mean(cross_val_scores))
print("test AUC: ", testScore)

In [None]:
rfPreds = model.predict_proba(testDF)[:, 1]
# n_estimators of 500 or 400 gives the same score
submit(rfPreds, "rf_500.csv", "rf 500")

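RF appears to do similarly to XGB: its public and private leaderboard scores are 0.775060 and 0.769128, compared with 0.790252 and 0.783193 for the grid-searched XGB, and it needed less hyperparameter tuning. (These RF numbers are identical to the earlier weighted-XGB submission, which suggests the submitted predictions may have come from the XGB model; worth re-running.)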

## Stacking / ensembling


I tried stacking with a meta-classifier (logistic regression), but it scored worse.
Ensembling by voting did better than stacking, but the score is still worse than either individual predictor.
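
For reference, soft voting (what `EnsembleVoteClassifier(voting='soft')` does) takes a weighted average of the models' predicted probabilities. The sketch below uses hypothetical probabilities. It also shows rank averaging, an alternative that was not tested in this notebook and is offered only as an assumption worth trying. AUC depends only on ranking, and `scale_pos_weight` can shift XGBoost's probabilities upward relative to the random forest's. Averaging normalised ranks sidesteps that calibration mismatch.

```python
import numpy as np
from scipy.stats import rankdata

def soft_vote(prob_list, weights):
    """Weighted average of each model's positive-class probabilities."""
    probs = np.column_stack(prob_list)   # shape (n_samples, n_models)
    w = np.asarray(weights, dtype=float)
    return probs @ w / w.sum()

def rank_average(prob_list, weights):
    """Weighted average of each model's normalised ranks (ignores calibration)."""
    ranks = [rankdata(p) / len(p) for p in prob_list]
    return soft_vote(ranks, weights)

# Hypothetical positive-class probabilities from two models on four borrowers
p_xgb = np.array([0.90, 0.60, 0.20, 0.70])
p_rf = np.array([0.30, 0.10, 0.40, 0.50])

blend = soft_vote([p_xgb, p_rf], weights=[1, 1])
labels = (blend > 0.5).astype(int)   # class 1 only if its averaged probability wins
ranked = rank_average([p_xgb, p_rf], weights=[1, 1])
print(blend, labels, ranked)
```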

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB 
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from mlxtend.classifier import StackingClassifier
from mlxtend.classifier import EnsembleVoteClassifier

X_train, X_test, y_train, y_test, testDF = loadData(
    logTransform=True, 
    impute=False, 
    preprocessed=False, 
    continuous=False
)

from sklearn.pipeline import make_pipeline

pipe1 = make_pipeline(numeric_imputation_mapper,
                      XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=20, learning_rate=0.1, max_delta_step=0,
       max_depth=3, min_child_weight=1, missing=None, n_estimators=150,
       n_jobs=1, nthread=None, objective='binary:logistic', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=13.960728088766986,
       seed=None, silent=True, subsample=1))

pipe2 = make_pipeline(numeric_imputation_mapper,
                      RandomForestClassifier(max_depth=None, random_state=123, class_weight=classweight, n_estimators=400)
                      )


# Initializing models
#clf1 = KNeighborsClassifier(n_neighbors=1)
#using the gridSearch version
#clf2 = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
#       colsample_bytree=1, gamma=20, learning_rate=0.1, max_delta_step=0,
#       max_depth=3, min_child_weight=1, missing=None, n_estimators=150,
#       n_jobs=1, nthread=None, objective='binary:logistic', random_state=0,
#       reg_alpha=0, reg_lambda=1, scale_pos_weight=13.960728088766986,
#       seed=None, silent=True, subsample=1)
#clf1 = RandomForestClassifier(max_depth=None, random_state=123, class_weight=classweight, n_estimators=500)
#clf2 = XGBClassifier(max_depth=3, scale_pos_weight=1) #class imbalance
#clf3 = GaussianNB()
#clf3 = LogisticRegression()
eclf = EnsembleVoteClassifier(clfs=[pipe1, pipe2],
#eclf = EnsembleVoteClassifier(clfs=[clf1, clf2],
                              weights=[1, 1], voting='soft')
#sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], 
#                          meta_classifier=lr)

#params = {
#    'logisticregression__C':  [0.001, 0.01]
#}

#grid = GridSearchCV(estimator=sclf, 
#grid = GridSearchCV(estimator=eclf, 
#                    param_grid=params, 
#                    cv=5,
#                    n_jobs=-1,
#                    scoring='roc_auc',
#                    refit=True)
#X_train.head()
#newX = numeric_imputation_mapper.transform(X_train)
#grid.fit(X_train, y_train)
eclf.fit(X_train, y_train)

In [None]:
#cv_keys = ('mean_test_score', 'std_test_score', 'params')
#for r, _ in enumerate(grid.cv_results_['mean_test_score']):
#    print("%0.3f +/- %0.2f %r"
#          % (grid.cv_results_[cv_keys[0]][r],
#             grid.cv_results_[cv_keys[1]][r] / 2.0,
#             grid.cv_results_[cv_keys[2]][r]))
#newXtest = numeric_imputation_mapper.transform(X_test)
#devstack = grid.predict(newXtest)
#testScore = roc_auc_score(devstack, y_test)
#newXtest = numeric_imputation_mapper.transform(X_test)
#devstack = grid.predict(X_test)
devstack = eclf.predict_proba(X_test)[:, 1]  # soft-vote probability of default
testScore = roc_auc_score(y_test, devstack)
#print('Best parameters: %s' % grid.best_params_)
print(f'test AUC: {testScore}')

In [None]:
#newXtest = numeric_imputation_mapper.transform(testDF)
#stackPreds = grid.predict(newXtest)
votePreds = eclf.predict_proba(testDF)[:, 1]
submit(votePreds, "xgb_rf_voting2.csv", "xgb rf voting2")

Combining RF and XGB brought the score down (private, public): 0.721574, 0.722799.


## Single Layer perceptron

In [None]:
# Keras imports for the perceptron models
import keras
from keras.models import Sequential
from keras.models import Model
#layers
from keras.layers import Dense, Dropout, Input
from keras import regularizers
from keras.wrappers.scikit_learn import KerasClassifier
#import tensorflow as tf
#sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

In [None]:
# fix random seed for reproducibility
np.random.seed(7)

# Convert the features to a NumPy array for Keras
# (DataFrame.as_matrix was removed in pandas 1.0; .values is the replacement)
X_train_matrix = X_train.values
X_train_matrix[0, :]

In [None]:
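from sklearn.utils.class_weight import compute_class_weight
# 'balanced' weights: n_samples / (n_classes * count of each class)
class_weights = compute_class_weight(class_weight='balanced', classes=np.unique(y_train), y=y_train)
class_weights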

In [None]:
def create_baseline():
    # create model
    model = Sequential()
    model.add(Dense(60, input_dim=16, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

estimator = KerasClassifier(build_fn=create_baseline, epochs=100, batch_size=2000, verbose=1)
from sklearn.model_selection import StratifiedKFold
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
results = cross_val_score(estimator, X_train_matrix, y_train.values, cv=kfold)

In [None]:
print("Results: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

In [None]:
estimator.fit(X_train_matrix, y_train)

In [None]:
X_test_matrix = X_test.values  # same conversion as for the training features
estimatorLayer = estimator.predict(X_test_matrix)
accuracy = float(np.sum(estimatorLayer.flatten() == y_test) / y_test.shape[0])
np.unique(estimatorLayer.flatten())

Severely overfitting; time to do some feature engineering and ensembling.

# Multilayer perceptron

In [None]:
from keras.callbacks import EarlyStopping
from sklearn.model_selection import StratifiedKFold
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=0, mode='auto')

# create model
model = Sequential()
model.add(Dense(128, input_shape=(16,), activation='relu'))
model.add(Dropout(0.5))
#model.add(Dense(256, activation='relu', activity_regularizer=regularizers.l1(10e-5)))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.25))
#model.add(Dense(32, activation='relu', activity_regularizer=regularizers.l1(10e-5)))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(1, activation='sigmoid')) #output
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

nb_epoch = 100
batch_size = 2000

#def train_and_evaluate__model(model, data, label, data_test, label_test):

model.fit(
        X_train_matrix, y_train.values, validation_split=0.1, 
        #callbacks=[early_stopping],
        class_weight={0: 0.5, 1: 100},  # Keras expects a dict: class index -> weight
        epochs = nb_epoch,
        batch_size = batch_size
)
# skf = StratifiedKFold(y_train, n_folds=10, shuffle=True)
# for i, (train, test) in enumerate(skf):
#     print "Running Fold", i+1, "/", n_folds
#     model = None # Clearing the NN.
#     model = create_model()
#     train_and_evaluate_model(model, X_train[train], y_train[train], X_test[test], y_test[test))

In [None]:
model.summary()

In [None]:
X_test_matrix.shape, np.sum(y_train.values)

In [None]:
X_test_matrix = X_test.values
preds = model.predict(X_train_matrix)
preds
# Fail: the neural net still thinks everyone is 0
#accuracy = float(np.sum(preds.flatten() == y_test)/y_test.shape[0])
#accuracy

# Autoencoder

(This didn't work: the loss is absurdly large.)
It is based on the inspiration I got from this post by Venelin Valkov:

https://medium.com/@curiousily/credit-card-fraud-detection-using-autoencoders-in-keras-tensorflow-for-hackers-part-vii-20e0c85301bd

In [None]:
X_train, X_test, y_train, y_test, testDF = loadData(logTransform=False, 
                                                    impute=False, 
                                                    preprocessed=True, continuous=True)

In [None]:
X_train.columns

In [None]:
%%R -i X_train 

suppressWarnings({ 
        library(tidyverse)
        library(GGally)
        X_train %>% #select(-DebtRatio, -NumberOfTime30.59DaysPastDueNotWorse, -MonthlyIncome, 
#                                    -NumberOfOpenCreditLinesAndLoans, -NumberOfTimes90DaysLate, 
#                                    -NumberRealEstateLoansOrLines, -NumberOfTime60.89DaysPastDueNotWorse) %>% 
        ggpairs() %>%
        ggsave(filename="ggpairs_selected.png", w=30, h=30, dpi=300) 
})

<img src="./ggpairs_selected.png">
<caption><center> **Figure 3**: Pairs plot</center></caption><br>

In [None]:
defaulting = X_train[y_train == 1]
normal = X_train[y_train == 0]
normal_test = X_test[y_test == 0]

In [None]:
normal = normal.values
normal_test = normal_test.values
normal_test.shape, normal.shape, 

In [None]:
#from sklearn.preprocessing import StandardScaler
#data['Amount'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1, 1))

input_dim = X_train.shape[1]
encoding_dim = 18 

input_layer = Input(shape=(input_dim, ))

encoder = Dense(encoding_dim, activation="tanh", 
                activity_regularizer=regularizers.l1(10e-5))(input_layer)
encoder = Dense(int(encoding_dim / 2), activation="relu")(encoder)

decoder = Dense(int(encoding_dim / 2), activation='tanh')(encoder)
decoder = Dense(input_dim, activation='relu')(decoder)

autoencoder = Model(inputs=input_layer, outputs=decoder)

In [None]:
from keras.callbacks import TensorBoard
from keras.callbacks import ModelCheckpoint 
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=0, mode='auto')

nb_epoch = 100
batch_size = 1000

autoencoder.compile(optimizer='adam', 
                    loss='mean_squared_error', 
                    metrics=['accuracy'])
autoencoder.summary()
checkpointer = ModelCheckpoint(filepath="model2.h5",
                               verbose=0,
                               save_best_only=True)
tensorboard = TensorBoard(log_dir='./logs',
                          histogram_freq=0,
                          write_graph=True,
                          write_images=True)

In [None]:
history = autoencoder.fit(normal, normal,
                    epochs=nb_epoch,
                    batch_size=batch_size,
                    shuffle=True,
                    validation_data=(normal_test, normal_test),
                    verbose=1,
                    callbacks=[tensorboard, checkpointer]).history

# Gave up: the loss is far too big

In [None]:
plt.plot(history['loss'])
plt.plot(history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper right');
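
One plausible reason for the huge loss, not verified here, is feature scale. If `loadData(preprocessed=True, ...)` does not standardise the features, columns such as `MonthlyIncome` (raw values in the thousands) dominate the MSE. The commented-out `StandardScaler` lines above point the same way. The sketch below uses hypothetical data and swaps the Keras autoencoder for PCA via NumPy's SVD. With linear activations and MSE loss, an optimal autoencoder spans the same subspace as PCA. The recipe it shows is to standardise with statistics from the normal rows only, reconstruct, and use the per-row reconstruction error as an anomaly score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: "normal" rows lie near a 2-D subspace of a 5-D space,
# with very different column scales (like MonthlyIncome vs. age)
scales = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])
basis = rng.normal(size=(2, 5))
normal = (rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 5))) * scales
odd = rng.normal(size=(20, 5)) * 3 * scales   # rows off the subspace

# 1) Standardise using normal-row statistics only (as StandardScaler would)
mu, sd = normal.mean(axis=0), normal.std(axis=0)
Zn, Zo = (normal - mu) / sd, (odd - mu) / sd

# 2) Linear "autoencoder": project onto the top-k principal directions and back
k = 2
_, _, Vt = np.linalg.svd(Zn, full_matrices=False)
V = Vt[:k].T                  # (5, k) encoder; V.T acts as the decoder

def recon_error(Z):
    """Mean squared reconstruction error per row."""
    return np.mean((Z - Z @ V @ V.T) ** 2, axis=1)

err_normal, err_odd = recon_error(Zn), recon_error(Zo)
print(err_normal.mean(), err_odd.mean())
```

Rows that do not fit the structure learned from normal borrowers reconstruct poorly, so their error is much larger. In the notebook, `defaulting` rows would play the role of `odd`.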