**The idea is:**

- Feature reduction with PCA
- Data transformation (log, one-hot encoding, NaN imputation)
- Test different regression models

**Things found:**

- Applying a log transformation clearly improves the R² scores.
- Using PCA with 36 components makes training and testing much faster.
- Removing columns with more than 1000 NaNs gives better results than filling them with the column mean.
- There are outliers. Instead of removing them, using a Huber regressor seems to give a good result, since Huber regression is a linear model that is robust to outliers.
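To illustrate the last point, here is a minimal sketch on synthetic data (the data and the +50 outlier shift are assumptions for illustration, not taken from the house-price set) comparing ordinary least squares with scikit-learn's `HuberRegressor` when a few targets are corrupted:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# Synthetic data (assumed for illustration): y = 2x + 1 plus small noise
rng = np.random.RandomState(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * X.ravel() + 1 + rng.normal(scale=0.5, size=100)

# Corrupt the last five targets with large positive outliers
y[-5:] += 50

ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)

print("OLS slope:  ", ols.coef_[0])
print("Huber slope:", huber.coef_[0])
```

Because the Huber loss grows only linearly for large residuals, the five outliers pull the Huber slope far less than the OLS slope, which stays close to the true value of 2.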

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.decomposition import PCA
from sklearn.preprocessing import Imputer
from sklearn.model_selection import KFold
from sklearn import linear_model
from sklearn.metrics import make_scorer
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn import svm
from sklearn.metrics import r2_score
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
import tflearn
import tensorflow as tf
import seaborn
import warnings
warnings.filterwarnings('ignore')

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

sample_submission.csv
test.csv
train.csv



## Data Load ##

I concatenate the training and test sets so that all the data is manipulated just once. SalePrice is extracted into its own variable, "labels", and then dropped from "data".

In [2]:
train = pd.read_csv('../input/train.csv')
labels=train["SalePrice"]
test = pd.read_csv('../input/test.csv')
data = pd.concat([train,test],ignore_index=True)
data = data.drop("SalePrice", axis=1)
ids = test["Id"]
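A small sketch of the concatenate-then-split round trip, using made-up toy frames in place of train.csv and test.csv. Splitting with `len(train)` rather than a hard-coded row count is a suggestion, not what the notebook does:

```python
import pandas as pd

# Toy stand-ins for train.csv / test.csv (hypothetical values)
train = pd.DataFrame({"Id": [1, 2, 3], "LotArea": [8450, 9600, 11250],
                      "SalePrice": [208500, 181500, 223500]})
test = pd.DataFrame({"Id": [4, 5], "LotArea": [9550, 14260]})

labels = train["SalePrice"]
# Stack train on top of test; test rows get NaN for the missing SalePrice
data = pd.concat([train, test], ignore_index=True)
data = data.drop(columns="SalePrice")

# Because train comes first, its rows are simply the first len(train) rows
n_train = len(train)
train_part, test_part = data[:n_train], data[n_train:]
print(train_part.shape, test_part.shape)  # (3, 2) (2, 2)
```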

In [3]:
train.head()

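| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |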


In [4]:
# Count the number of rows in train
train.shape[0]

1460

In [6]:
# Count the number of rows in total
data.shape[0]

2919

In [5]:
# Count the number of NaNs each column has.
nans=pd.isnull(data).sum()
nans[nans>0]

Alley           2721
BsmtCond          82
BsmtExposure      82
BsmtFinSF1         1
BsmtFinSF2         1
BsmtFinType1      79
BsmtFinType2      80
BsmtFullBath       2
BsmtHalfBath       2
BsmtQual          81
BsmtUnfSF          1
Electrical         1
Exterior1st        1
Exterior2nd        1
Fence           2348
FireplaceQu     1420
Functional         2
GarageArea         1
GarageCars         1
GarageCond       159
GarageFinish     159
GarageQual       159
GarageType       157
GarageYrBlt      159
KitchenQual        1
LotFrontage      486
MSZoning           4
MasVnrArea        23
MasVnrType        24
MiscFeature     2814
PoolQC          2909
SaleType           1
TotalBsmtSF        1
Utilities          2
dtype: int64

In [7]:
# Remove Id and the columns with more than a thousand missing values
data = data.drop(columns=["Id", "Alley", "Fence", "MiscFeature", "PoolQC", "FireplaceQu"])
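Instead of listing the dropped columns by hand, they could be selected from the NaN counts. A minimal sketch on a hypothetical toy frame (the threshold of 2 stands in for the 1000 used on the real data):

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): "Alley" is mostly missing
data = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "LotArea": [8450, 9600, np.nan, 9550],
    "Alley": [np.nan, np.nan, np.nan, "Grvl"],
})

threshold = 2  # the notebook's cut-off is 1000 on the full 2919-row data
nans = data.isnull().sum()
to_drop = nans[nans > threshold].index.tolist()
data = data.drop(columns=["Id"] + to_drop)
print(list(data.columns))  # ['LotArea']
```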

In [8]:
# Count the column types
data.dtypes.value_counts()

object     38
int64      25
float64    11
dtype: int64

## Data Manipulation ##

- Apply one-hot encoding: convert categorical variables into dummy/indicator variables.
- Fill NaN with the most frequent value of each column.
- Apply a log transformation.
- Replace the -inf values produced by log(0) with 0.

In [9]:
all_columns = data.columns.values
non_categorical = ["LotFrontage", "LotArea", "MasVnrArea", "BsmtFinSF1", 
                   "BsmtFinSF2", "BsmtUnfSF", "TotalBsmtSF", "1stFlrSF", 
                   "2ndFlrSF", "LowQualFinSF", "GrLivArea", "GarageArea", 
                   "WoodDeckSF", "OpenPorchSF", "EnclosedPorch", "3SsnPorch", 
                   "ScreenPorch","PoolArea", "MiscVal"]

categorical = [value for value in all_columns if value not in non_categorical]
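Note that `pd.get_dummies` with default arguments expands only object (and category) columns, so integer-coded categories such as MSSubClass stay numeric. If the intent behind the `categorical` list was to encode those too (an assumption), the columns can be passed explicitly. Toy example:

```python
import pandas as pd

df = pd.DataFrame({"MSSubClass": [60, 20, 60], "MSZoning": ["RL", "RM", "RL"]})

# Default: only the object column MSZoning is expanded
default_cols = list(pd.get_dummies(df).columns)
print(default_cols)   # ['MSSubClass', 'MSZoning_RL', 'MSZoning_RM']

# Listing the columns explicitly also encodes the numeric code MSSubClass
explicit_cols = list(pd.get_dummies(df, columns=["MSSubClass", "MSZoning"]).columns)
print(explicit_cols)  # ['MSSubClass_20', 'MSSubClass_60', 'MSZoning_RL', 'MSZoning_RM']
```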

In [10]:
# One-hot encoding and NaN imputation (most frequent value per column)
data = pd.get_dummies(data)

imp = Imputer(missing_values='NaN', strategy='most_frequent', axis=0)
data = imp.fit_transform(data)

# Log transformation
data = np.log(data)
labels = np.log(labels)

# log(0) = -inf: replace those entries with 0
data[data==-np.inf]=0
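A caveat about the log step: one-hot columns contain only 0 and 1, and log(1) = 0 while log(0) = -inf, which is then set to 0. Every dummy column therefore becomes all zeros. This is likely why the cumulative explained variance below reaches 1 after 36 components, the same as the number of int64 and float64 columns (25 + 11). The sketch shows the effect on a toy array; `np.log1p` is one possible alternative (not used in this notebook) that keeps the indicators distinct:

```python
import numpy as np

# A toy one-hot column next to a continuous column (hypothetical values)
X = np.array([[1.0, 8450.0],
              [0.0, 9600.0],
              [1.0, 0.0]])

with np.errstate(divide="ignore"):   # silence the log(0) warning
    logged = np.log(X)
logged[logged == -np.inf] = 0
print(logged[:, 0])                  # the dummy column is now constant 0

# log1p(x) = log(1 + x) maps 0 -> 0 and 1 -> log(2), keeping the indicator
print(np.log1p(X)[:, 0])
```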

## Feature reduction ##

There are many features, so I use PCA to reduce them. First, PCA is fitted keeping all components (by default `n_components` is the smaller of the number of rows and columns). Then I keep the smallest number of components whose cumulative explained variance ratio reaches 1.

In [11]:
pca = PCA(whiten=True)
pca.fit(data)
variance = pd.DataFrame(pca.explained_variance_ratio_)
np.cumsum(pca.explained_variance_ratio_)

array([ 0.2248857 ,  0.40281429,  0.52425789,  0.62418823,  0.69580422,
        0.75944463,  0.8116806 ,  0.85647038,  0.89178708,  0.92273755,
        0.94898868,  0.95842727,  0.96637545,  0.97380464,  0.97971901,
        0.98501952,  0.98918839,  0.99199181,  0.99386559,  0.99520919,
        0.99611479,  0.99695667,  0.99771023,  0.99842564,  0.9989402 ,
        0.99933882,  0.99959949,  0.99978254,  0.99988174,  0.99993998,
        0.99998599,  0.99999658,  0.99999871,  0.99999943,  0.99999999,
        1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  1.        ,  1.        ,  1.  
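Rather than reading the cut-off from the printed array, the number of components can be computed. A minimal sketch on synthetic rank-3 data (an assumption for illustration); a threshold slightly below 1 is used because a floating-point cumulative sum is not guaranteed to hit 1.0 exactly:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data whose true rank is 3, embedded in 10 columns
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest k whose cumulative explained variance ratio reaches the threshold
threshold = 0.999
k = int(np.argmax(cumulative >= threshold)) + 1
print(k)  # 3

# scikit-learn applies the same rule when n_components is a float in (0, 1)
pca_auto = PCA(n_components=threshold, svd_solver="full").fit(X)
print(pca_auto.n_components_)  # 3
```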

In [12]:
pca = PCA(n_components=36,whiten=True)
pca = pca.fit(data)
dataPCA = pca.transform(data)

## Data Model Selection ##

A quick test that runs several regression models on our data and compares their 5-fold cross-validated R² scores. First, with the raw features (no PCA).

In [13]:
# Split training and test rows (the training rows come first in data)
train = data[:1460]
test = data[1460:]

In [None]:
# R2 Score

def lets_try(train, labels):
    """Cross-validate several regressors and plot their mean R² scores."""
    results = {}

    def test_model(clf):
        # 5-fold shuffled cross-validation; returns the mean R² score
        cv = KFold(n_splits=5, shuffle=True, random_state=45)
        r2 = make_scorer(r2_score)
        r2_val_score = cross_val_score(clf, train, labels, cv=cv, scoring=r2)
        return [r2_val_score.mean()]

    clf = linear_model.LinearRegression()
    results["Linear"] = test_model(clf)

    clf = linear_model.Ridge()
    results["Ridge"] = test_model(clf)

    clf = linear_model.BayesianRidge()
    results["Bayesian Ridge"] = test_model(clf)

    clf = linear_model.HuberRegressor()
    results["Huber"] = test_model(clf)

    clf = linear_model.Lasso(alpha=1e-4)
    results["Lasso"] = test_model(clf)

    clf = BaggingRegressor()
    results["Bagging"] = test_model(clf)

    clf = RandomForestRegressor()
    results["RandomForest"] = test_model(clf)

    clf = AdaBoostRegressor()
    results["AdaBoost"] = test_model(clf)

    clf = svm.SVR()
    results["SVM RBF"] = test_model(clf)

    clf = svm.SVR(kernel="linear")
    results["SVM Linear"] = test_model(clf)

    # Collect the scores in a DataFrame, sorted from best to worst, and plot them
    results = pd.DataFrame.from_dict(results, orient='index')
    results.columns = ["R Square Score"]
    results = results.sort_values(by="R Square Score", ascending=False)
    results.plot(kind="bar", title="Model Scores")
    axes = plt.gca()
    axes.set_ylim([0.5, 1])
    return results

lets_try(train, labels)
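For reference, R² is 1 - SS_res/SS_tot, so it is at most 1 and can be negative for a poor model. The sketch below (toy numbers and a synthetic Ridge problem, assumed for illustration) checks the formula against `r2_score` and shows that `make_scorer(r2_score)` gives the same cross-validation scores as the built-in `scoring="r2"` string:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, r2_score
from sklearn.model_selection import KFold, cross_val_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

# R² = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
manual = 1 - ss_res / ss_tot
print(manual, r2_score(y_true, y_pred))  # both equal 0.975 (up to rounding)

# make_scorer(r2_score) and the "r2" string give the same CV scores
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)
cv = KFold(n_splits=5, shuffle=True, random_state=45)
a = cross_val_score(Ridge(), X, y, cv=cv, scoring=make_scorer(r2_score))
b = cross_val_score(Ridge(), X, y, cv=cv, scoring="r2")
print(np.allclose(a, b))  # True
```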

Now, let's try the same but using data with PCA applied.

In [14]:
# Split training and test rows (the training rows come first in dataPCA)
train = dataPCA[:1460]
test = dataPCA[1460:]

lets_try(train,labels)

NameError: name 'lets_try' is not defined
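One detail of this comparison: PCA was fitted on all rows before cross-validation, so each validation fold has already influenced the components. A possible alternative (not what this notebook does) is to put PCA inside a scikit-learn `Pipeline`, so it is refitted on the training folds only. A sketch on synthetic low-rank data, where the data, `n_components=5` and `max_iter=1000` are all assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import HuberRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the feature matrix: 5 latent factors in 20 columns
rng = np.random.RandomState(0)
latent = rng.normal(size=(300, 5))
X = latent @ rng.normal(size=(5, 20)) + 0.01 * rng.normal(size=(300, 20))
y = latent @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=300)

# PCA is fitted on the training folds only, then applied to the held-out fold
model = make_pipeline(PCA(n_components=5, whiten=True), HuberRegressor(max_iter=1000))
cv = KFold(n_splits=5, shuffle=True, random_state=45)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(scores.mean())
```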

In [None]:
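cv = KFold(n_splits=5, shuffle=True, random_state=45)

# Hyperparameter grid for HuberRegressor
parameters = {'alpha': [1000, 100, 10],
              'epsilon': [1.2, 1.25, 1.50],
              'tol': [1e-10]}

clf = linear_model.HuberRegressor()
r2 = make_scorer(r2_score)
grid_obj = GridSearchCV(clf, parameters, cv=cv, scoring=r2)
grid_fit = grid_obj.fit(train, labels)
best_clf = grid_fit.best_estimator_

# With refit=True (the default), GridSearchCV has already refit best_estimator_
# on the whole training set, so this extra fit is redundant but harmless
best_clf.fit(train, labels)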
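The fitted model predicts log(SalePrice), so predictions have to be mapped back with `np.exp` before they can be compared with real prices. The notebook does not show this step here; the sketch below illustrates it on synthetic data, with a made-up parameter grid and `max_iter=1000` as assumptions:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in: log-prices that depend linearly on 4 features
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
log_price = 12 + X @ np.array([0.3, -0.2, 0.1, 0.05]) + rng.normal(scale=0.05, size=200)

parameters = {"alpha": [10, 1, 0.1], "epsilon": [1.2, 1.35, 1.5]}
cv = KFold(n_splits=5, shuffle=True, random_state=45)
grid = GridSearchCV(HuberRegressor(max_iter=1000), parameters, cv=cv, scoring="r2")
grid.fit(X, log_price)

# With refit=True (the default), best_estimator_ is already trained on all rows
print(grid.best_params_)

# The model predicts log(price); np.exp maps it back to the price scale
X_new = rng.normal(size=(3, 4))
prices = np.exp(grid.best_estimator_.predict(X_new))
print(prices)
```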

Simple Neural Network
---------------------

Now I am going to try a simple neural network to see if I can improve the result.

In [15]:
# Reshape the labels into a column vector of shape (n_samples, 1) for tflearn
labels_nl = labels.values.reshape(-1, 1)

In [43]:
tf.reset_default_graph()
r2 = tflearn.R2()
net = tflearn.input_data(shape=[None, train.shape[1]])
net = tflearn.fully_connected(net, 30, activation='linear')
net = tflearn.fully_connected(net, 30, activation='linear')
net = tflearn.fully_connected(net, 20, activation='linear')
net = tflearn.fully_connected(net, 20, activation='linear')
net = tflearn.fully_connected(net, 10, activation='linear')
net = tflearn.fully_connected(net, 10, activation='linear')
net = tflearn.fully_connected(net, 1, activation='linear')
sgd = tflearn.SGD(learning_rate=0.1, lr_decay=0.01, decay_step=100)
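# The R2 values this metric logs (also shown as val_acc for the validation set)
# sometimes exceed 1, which the usual R² = 1 - SS_res/SS_tot cannot do.
net = tflearn.regression(net, optimizer=sgd, loss='mean_square', metric=r2)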
model = tflearn.DNN(net)
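Since every layer uses a linear activation, the seven fully connected layers compose to a single affine map, so this network cannot represent anything beyond a linear regression. A quick NumPy check with random weights (the layer widths mirror the network above, with 36 inputs for the 36 PCA components; the weights themselves are arbitrary):

```python
import numpy as np

rng = np.random.RandomState(0)
sizes = [36, 30, 30, 20, 20, 10, 10, 1]  # layer widths as in the network above

# Random weights (scaled to keep values moderate) and biases for each layer
weights = [rng.normal(size=(m, n)) / np.sqrt(m) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]

x = rng.normal(size=(5, sizes[0]))

# Forward pass with identity (linear) activations
h = x
for W, b in zip(weights, biases):
    h = h @ W + b

# Collapse the stack into one affine map: y = x @ W_total + b_total
W_total = np.eye(sizes[0])
b_total = np.zeros(sizes[0])
for W, b in zip(weights, biases):
    W_total = W_total @ W
    b_total = b_total @ W + b

print(np.allclose(h, x @ W_total + b_total))  # True
```

To model anything non-linear, the hidden layers would need a non-linear activation such as `'relu'`.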

In [44]:
model.fit(train, labels_nl,show_metric=True,validation_set=0.2,shuffle=True,n_epoch=50)

---------------------------------
Run id: XYUAEX
Log directory: /tmp/tflearn_logs/
INFO:tensorflow:Summary name StandardError/ (raw) is illegal; using StandardError/__raw_ instead.
---------------------------------
Training samples: 1168
Validation samples: 292
--
Training Step: 1  | total loss: [1m[32m132.00549[0m[0m | time: 0.037s
| SGD | epoch: 001 | loss: 132.00549 - R2: 0.0000 -- iter: 0020/1168
Training Step: 2  | total loss: [1m[32m142.06914[0m[0m | time: 0.044s
| SGD | epoch: 001 | loss: 142.06914 - R2: 0.0000 -- iter: 0040/1168
Training Step: 3  | total loss: [1m[32m143.14580[0m[0m | time: 0.049s
| SGD | epoch: 001 | loss: 143.14580 - R2: 0.0001 -- iter: 0060/1168
Training Step: 4  | total loss: [1m[32m141.59778[0m[0m | time: 0.057s
| SGD | epoch: 001 | loss: 141.59778 - R2: 0.0001 -- iter: 0080/1168
Training Step: 5  | total loss: [1m[32m141.59778[0m[0m | time: 0.064s
| SGD | epoch: 001 | loss: 141.59778 - R2: 0.0001 -- iter: 0100/1168
Training Step: 6  | 

Training Step: 57  | total loss: [1m[32m91.51981[0m[0m | time: 0.291s
| SGD | epoch: 001 | loss: 91.51981 - R2: 0.0426 -- iter: 1140/1168
Training Step: 58  | total loss: [1m[32m89.66460[0m[0m | time: 0.294s
| SGD | epoch: 001 | loss: 89.66460 - R2: 0.0460 -- iter: 1160/1168
Training Step: 59  | total loss: [1m[32m88.88686[0m[0m | time: 1.316s
| SGD | epoch: 001 | loss: 88.88686 - R2: 0.0477 | val_loss: 81.62693 - val_acc: 0.0617 -- iter: 1168/1168
--
Training Step: 60  | total loss: [1m[32m88.08608[0m[0m | time: 0.054s
| SGD | epoch: 002 | loss: 88.08608 - R2: 0.0495 -- iter: 0020/1168
Training Step: 61  | total loss: [1m[32m86.81692[0m[0m | time: 0.063s
| SGD | epoch: 002 | loss: 86.81692 - R2: 0.0515 -- iter: 0040/1168
Training Step: 62  | total loss: [1m[32m85.67319[0m[0m | time: 0.068s
| SGD | epoch: 002 | loss: 85.67319 - R2: 0.0536 -- iter: 0060/1168
Training Step: 63  | total loss: [1m[32m84.95834[0m[0m | time: 0.072s
| SGD | epoch: 002 | loss: 84.95

Training Step: 115  | total loss: [1m[32m27.83018[0m[0m | time: 0.367s
| SGD | epoch: 002 | loss: 27.83018 - R2: 0.3305 -- iter: 1120/1168
Training Step: 116  | total loss: [1m[32m26.70437[0m[0m | time: 0.371s
| SGD | epoch: 002 | loss: 26.70437 - R2: 0.3413 -- iter: 1140/1168
Training Step: 117  | total loss: [1m[32m26.70437[0m[0m | time: 0.375s
| SGD | epoch: 002 | loss: 26.70437 - R2: 0.3413 -- iter: 1160/1168
Training Step: 118  | total loss: [1m[32m25.62698[0m[0m | time: 1.380s
| SGD | epoch: 002 | loss: 25.62698 - R2: 0.3523 | val_loss: 13.28438 - val_acc: 0.4872 -- iter: 1168/1168
--
Training Step: 119  | total loss: [1m[32m24.54867[0m[0m | time: 0.021s
| SGD | epoch: 003 | loss: 24.54867 - R2: 0.3765 -- iter: 0020/1168
Training Step: 120  | total loss: [1m[32m22.19878[0m[0m | time: 0.024s
| SGD | epoch: 003 | loss: 22.19878 - R2: 0.3900 -- iter: 0040/1168
Training Step: 121  | total loss: [1m[32m21.05578[0m[0m | time: 0.027s
| SGD | epoch: 003 | loss

Training Step: 173  | total loss: [1m[32m0.21417[0m[0m | time: 0.245s
| SGD | epoch: 003 | loss: 0.21417 - R2: 0.9901 -- iter: 1100/1168
Training Step: 174  | total loss: [1m[32m0.17586[0m[0m | time: 0.248s
| SGD | epoch: 003 | loss: 0.17586 - R2: 0.9920 -- iter: 1120/1168
Training Step: 175  | total loss: [1m[32m0.17586[0m[0m | time: 0.251s
| SGD | epoch: 003 | loss: 0.17586 - R2: 0.9920 -- iter: 1140/1168
Training Step: 176  | total loss: [1m[32m0.14538[0m[0m | time: 0.253s
| SGD | epoch: 003 | loss: 0.14538 - R2: 0.9931 -- iter: 1160/1168
Training Step: 177  | total loss: [1m[32m0.13345[0m[0m | time: 1.266s
| SGD | epoch: 003 | loss: 0.13345 - R2: 0.9943 | val_loss: 0.03061 - val_acc: 1.0001 -- iter: 1168/1168
--
Training Step: 178  | total loss: [1m[32m0.12247[0m[0m | time: 0.040s
| SGD | epoch: 004 | loss: 0.12247 - R2: 0.9941 -- iter: 0020/1168
Training Step: 179  | total loss: [1m[32m0.12247[0m[0m | time: 0.043s
| SGD | epoch: 004 | loss: 0.12247 - R2

Training Step: 231  | total loss: [1m[32m0.02748[0m[0m | time: 0.274s
| SGD | epoch: 004 | loss: 0.02748 - R2: 0.9996 -- iter: 1080/1168
Training Step: 232  | total loss: [1m[32m0.02516[0m[0m | time: 0.277s
| SGD | epoch: 004 | loss: 0.02516 - R2: 0.9998 -- iter: 1100/1168
Training Step: 233  | total loss: [1m[32m0.02336[0m[0m | time: 0.281s
| SGD | epoch: 004 | loss: 0.02336 - R2: 1.0000 -- iter: 1120/1168
Training Step: 234  | total loss: [1m[32m0.02336[0m[0m | time: 0.300s
| SGD | epoch: 004 | loss: 0.02336 - R2: 1.0000 -- iter: 1140/1168
Training Step: 235  | total loss: [1m[32m0.02129[0m[0m | time: 0.305s
| SGD | epoch: 004 | loss: 0.02129 - R2: 0.9997 -- iter: 1160/1168
Training Step: 236  | total loss: [1m[32m0.02129[0m[0m | time: 1.312s
| SGD | epoch: 004 | loss: 0.02129 - R2: 0.9997 | val_loss: 0.03318 - val_acc: 0.9993 -- iter: 1168/1168
--
Training Step: 237  | total loss: [1m[32m0.02162[0m[0m | time: 0.070s
| SGD | epoch: 005 | loss: 0.02162 - R2

Training Step: 289  | total loss: [1m[32m0.01639[0m[0m | time: 0.472s
| SGD | epoch: 005 | loss: 0.01639 - R2: 0.9999 -- iter: 1060/1168
Training Step: 290  | total loss: [1m[32m0.02368[0m[0m | time: 0.476s
| SGD | epoch: 005 | loss: 0.02368 - R2: 1.0004 -- iter: 1080/1168
Training Step: 291  | total loss: [1m[32m0.02368[0m[0m | time: 0.479s
| SGD | epoch: 005 | loss: 0.02368 - R2: 1.0004 -- iter: 1100/1168
Training Step: 292  | total loss: [1m[32m0.02321[0m[0m | time: 0.481s
| SGD | epoch: 005 | loss: 0.02321 - R2: 0.9994 -- iter: 1120/1168
Training Step: 293  | total loss: [1m[32m0.02247[0m[0m | time: 0.484s
| SGD | epoch: 005 | loss: 0.02247 - R2: 0.9989 -- iter: 1140/1168
Training Step: 294  | total loss: [1m[32m0.02025[0m[0m | time: 0.489s
| SGD | epoch: 005 | loss: 0.02025 - R2: 0.9988 -- iter: 1160/1168
Training Step: 295  | total loss: [1m[32m0.02135[0m[0m | time: 1.509s
| SGD | epoch: 005 | loss: 0.02135 - R2: 0.9988 | val_loss: 0.03197 - val_acc: 1

Training Step: 347  | total loss: [1m[32m0.02273[0m[0m | time: 0.265s
| SGD | epoch: 006 | loss: 0.02273 - R2: 1.0000 -- iter: 1040/1168
Training Step: 348  | total loss: [1m[32m0.02226[0m[0m | time: 0.268s
| SGD | epoch: 006 | loss: 0.02226 - R2: 0.9995 -- iter: 1060/1168
Training Step: 349  | total loss: [1m[32m0.02290[0m[0m | time: 0.271s
| SGD | epoch: 006 | loss: 0.02290 - R2: 0.9998 -- iter: 1080/1168
Training Step: 350  | total loss: [1m[32m0.02158[0m[0m | time: 0.275s
| SGD | epoch: 006 | loss: 0.02158 - R2: 0.9994 -- iter: 1100/1168
Training Step: 351  | total loss: [1m[32m0.02193[0m[0m | time: 0.279s
| SGD | epoch: 006 | loss: 0.02193 - R2: 0.9992 -- iter: 1120/1168
Training Step: 352  | total loss: [1m[32m0.02193[0m[0m | time: 0.281s
| SGD | epoch: 006 | loss: 0.02193 - R2: 0.9990 -- iter: 1140/1168
Training Step: 353  | total loss: [1m[32m0.02056[0m[0m | time: 0.283s
| SGD | epoch: 006 | loss: 0.02056 - R2: 0.9989 -- iter: 1160/1168
Training Step

Training Step: 405  | total loss: [1m[32m0.01602[0m[0m | time: 0.256s
| SGD | epoch: 007 | loss: 0.01602 - R2: 1.0001 -- iter: 1020/1168
Training Step: 406  | total loss: [1m[32m0.01563[0m[0m | time: 0.259s
| SGD | epoch: 007 | loss: 0.01563 - R2: 0.9997 -- iter: 1040/1168
Training Step: 407  | total loss: [1m[32m0.01556[0m[0m | time: 0.265s
| SGD | epoch: 007 | loss: 0.01556 - R2: 0.9992 -- iter: 1060/1168
Training Step: 408  | total loss: [1m[32m0.01603[0m[0m | time: 0.268s
| SGD | epoch: 007 | loss: 0.01603 - R2: 1.0009 -- iter: 1080/1168
Training Step: 409  | total loss: [1m[32m0.01574[0m[0m | time: 0.274s
| SGD | epoch: 007 | loss: 0.01574 - R2: 1.0009 -- iter: 1100/1168
Training Step: 410  | total loss: [1m[32m0.01574[0m[0m | time: 0.276s
| SGD | epoch: 007 | loss: 0.01574 - R2: 1.0009 -- iter: 1120/1168
Training Step: 411  | total loss: [1m[32m0.01569[0m[0m | time: 0.278s
| SGD | epoch: 007 | loss: 0.01569 - R2: 1.0013 -- iter: 1140/1168
Training Step

Training Step: 463  | total loss: [1m[32m0.02075[0m[0m | time: 0.210s
| SGD | epoch: 008 | loss: 0.02075 - R2: 1.0004 -- iter: 1000/1168
Training Step: 464  | total loss: [1m[32m0.02075[0m[0m | time: 0.212s
| SGD | epoch: 008 | loss: 0.02075 - R2: 1.0004 -- iter: 1020/1168
Training Step: 465  | total loss: [1m[32m0.02223[0m[0m | time: 0.215s
| SGD | epoch: 008 | loss: 0.02223 - R2: 0.9994 -- iter: 1040/1168
Training Step: 466  | total loss: [1m[32m0.02094[0m[0m | time: 0.218s
| SGD | epoch: 008 | loss: 0.02094 - R2: 0.9999 -- iter: 1060/1168
Training Step: 467  | total loss: [1m[32m0.02046[0m[0m | time: 0.220s
| SGD | epoch: 008 | loss: 0.02046 - R2: 1.0000 -- iter: 1080/1168
Training Step: 468  | total loss: [1m[32m0.02046[0m[0m | time: 0.223s
| SGD | epoch: 008 | loss: 0.02046 - R2: 1.0000 -- iter: 1100/1168
Training Step: 469  | total loss: [1m[32m0.02157[0m[0m | time: 0.227s
| SGD | epoch: 008 | loss: 0.02157 - R2: 1.0001 -- iter: 1120/1168
Training Step

Training Step: 521  | total loss: [1m[32m0.02457[0m[0m | time: 0.207s
| SGD | epoch: 009 | loss: 0.02457 - R2: 0.9997 -- iter: 0980/1168
Training Step: 522  | total loss: [1m[32m0.02457[0m[0m | time: 0.211s
| SGD | epoch: 009 | loss: 0.02457 - R2: 0.9997 -- iter: 1000/1168
Training Step: 523  | total loss: [1m[32m0.02344[0m[0m | time: 0.214s
| SGD | epoch: 009 | loss: 0.02344 - R2: 0.9999 -- iter: 1020/1168
Training Step: 524  | total loss: [1m[32m0.02344[0m[0m | time: 0.217s
| SGD | epoch: 009 | loss: 0.02344 - R2: 0.9997 -- iter: 1040/1168
Training Step: 525  | total loss: [1m[32m0.02184[0m[0m | time: 0.221s
| SGD | epoch: 009 | loss: 0.02184 - R2: 1.0006 -- iter: 1060/1168
Training Step: 526  | total loss: [1m[32m0.02117[0m[0m | time: 0.225s
| SGD | epoch: 009 | loss: 0.02117 - R2: 1.0010 -- iter: 1080/1168
Training Step: 527  | total loss: [1m[32m0.02117[0m[0m | time: 0.232s
| SGD | epoch: 009 | loss: 0.02117 - R2: 1.0010 -- iter: 1100/1168
Training Step

Training Step: 579  | total loss: [1m[32m0.02221[0m[0m | time: 0.219s
| SGD | epoch: 010 | loss: 0.02221 - R2: 0.9997 -- iter: 0960/1168
Training Step: 580  | total loss: [1m[32m0.02099[0m[0m | time: 0.222s
| SGD | epoch: 010 | loss: 0.02099 - R2: 1.0000 -- iter: 0980/1168
Training Step: 581  | total loss: [1m[32m0.01950[0m[0m | time: 0.225s
| SGD | epoch: 010 | loss: 0.01950 - R2: 1.0000 -- iter: 1000/1168
Training Step: 582  | total loss: [1m[32m0.01994[0m[0m | time: 0.229s
| SGD | epoch: 010 | loss: 0.01994 - R2: 0.9995 -- iter: 1020/1168
Training Step: 583  | total loss: [1m[32m0.01994[0m[0m | time: 0.232s
| SGD | epoch: 010 | loss: 0.01994 - R2: 0.9997 -- iter: 1040/1168
Training Step: 584  | total loss: [1m[32m0.01858[0m[0m | time: 0.235s
| SGD | epoch: 010 | loss: 0.01858 - R2: 0.9997 -- iter: 1060/1168
Training Step: 585  | total loss: [1m[32m0.01774[0m[0m | time: 0.237s
| SGD | epoch: 010 | loss: 0.01774 - R2: 0.9994 -- iter: 1080/1168
Training Step

Training Step: 637  | total loss: [1m[32m0.03030[0m[0m | time: 0.198s
| SGD | epoch: 011 | loss: 0.03030 - R2: 0.9997 -- iter: 0940/1168
Training Step: 638  | total loss: [1m[32m0.03082[0m[0m | time: 0.204s
| SGD | epoch: 011 | loss: 0.03082 - R2: 1.0011 -- iter: 0960/1168
Training Step: 639  | total loss: [1m[32m0.02935[0m[0m | time: 0.210s
| SGD | epoch: 011 | loss: 0.02935 - R2: 1.0014 -- iter: 0980/1168
Training Step: 640  | total loss: [1m[32m0.02742[0m[0m | time: 0.213s
| SGD | epoch: 011 | loss: 0.02742 - R2: 1.0016 -- iter: 1000/1168
Training Step: 641  | total loss: [1m[32m0.02704[0m[0m | time: 0.216s
| SGD | epoch: 011 | loss: 0.02704 - R2: 0.9999 -- iter: 1020/1168
Training Step: 642  | total loss: [1m[32m0.02543[0m[0m | time: 0.227s
| SGD | epoch: 011 | loss: 0.02543 - R2: 1.0007 -- iter: 1040/1168
Training Step: 643  | total loss: [1m[32m0.02343[0m[0m | time: 0.237s
| SGD | epoch: 011 | loss: 0.02343 - R2: 1.0005 -- iter: 1060/1168
Training Step

Training Step: 695  | total loss: [1m[32m0.02057[0m[0m | time: 0.206s
| SGD | epoch: 012 | loss: 0.02057 - R2: 1.0005 -- iter: 0920/1168
Training Step: 696  | total loss: [1m[32m0.02057[0m[0m | time: 0.209s
| SGD | epoch: 012 | loss: 0.02057 - R2: 1.0005 -- iter: 0940/1168
Training Step: 697  | total loss: [1m[32m0.02204[0m[0m | time: 0.214s
| SGD | epoch: 012 | loss: 0.02204 - R2: 1.0013 -- iter: 0960/1168
Training Step: 698  | total loss: [1m[32m0.02204[0m[0m | time: 0.224s
| SGD | epoch: 012 | loss: 0.02204 - R2: 1.0013 -- iter: 0980/1168
Training Step: 699  | total loss: [1m[32m0.01988[0m[0m | time: 0.228s
| SGD | epoch: 012 | loss: 0.01988 - R2: 1.0008 -- iter: 1000/1168
Training Step: 700  | total loss: [1m[32m0.01988[0m[0m | time: 0.231s
| SGD | epoch: 012 | loss: 0.01988 - R2: 1.0008 -- iter: 1020/1168
Training Step: 701  | total loss: [1m[32m0.01970[0m[0m | time: 0.233s
| SGD | epoch: 012 | loss: 0.01970 - R2: 0.9998 -- iter: 1040/1168
Training Step

Training Step: 753  | total loss: [1m[32m0.02699[0m[0m | time: 0.233s
| SGD | epoch: 013 | loss: 0.02699 - R2: 1.0002 -- iter: 0900/1168
Training Step: 754  | total loss: [1m[32m0.02567[0m[0m | time: 0.235s
| SGD | epoch: 013 | loss: 0.02567 - R2: 0.9990 -- iter: 0920/1168
Training Step: 755  | total loss: [1m[32m0.02571[0m[0m | time: 0.237s
| SGD | epoch: 013 | loss: 0.02571 - R2: 1.0002 -- iter: 0940/1168
Training Step: 756  | total loss: [1m[32m0.02527[0m[0m | time: 0.239s
| SGD | epoch: 013 | loss: 0.02527 - R2: 0.9998 -- iter: 0960/1168
Training Step: 757  | total loss: [1m[32m0.02441[0m[0m | time: 0.242s
| SGD | epoch: 013 | loss: 0.02441 - R2: 1.0001 -- iter: 0980/1168
Training Step: 758  | total loss: [1m[32m0.02493[0m[0m | time: 0.244s
| SGD | epoch: 013 | loss: 0.02493 - R2: 1.0003 -- iter: 1000/1168
Training Step: 759  | total loss: [1m[32m0.02490[0m[0m | time: 0.248s
| SGD | epoch: 013 | loss: 0.02490 - R2: 0.9989 -- iter: 1020/1168
Training Step

Training Step: 811  | total loss: [1m[32m0.01501[0m[0m | time: 0.291s
| SGD | epoch: 014 | loss: 0.01501 - R2: 0.9992 -- iter: 0880/1168
Training Step: 812  | total loss: [1m[32m0.01578[0m[0m | time: 0.300s
| SGD | epoch: 014 | loss: 0.01578 - R2: 1.0005 -- iter: 0900/1168
Training Step: 813  | total loss: [1m[32m0.01636[0m[0m | time: 0.302s
| SGD | epoch: 014 | loss: 0.01636 - R2: 1.0002 -- iter: 0920/1168
Training Step: 814  | total loss: [1m[32m0.01580[0m[0m | time: 0.305s
| SGD | epoch: 014 | loss: 0.01580 - R2: 0.9996 -- iter: 0940/1168
Training Step: 815  | total loss: [1m[32m0.01548[0m[0m | time: 0.307s
| SGD | epoch: 014 | loss: 0.01548 - R2: 0.9997 -- iter: 0960/1168
Training Step: 816  | total loss: [1m[32m0.01515[0m[0m | time: 0.308s
| SGD | epoch: 014 | loss: 0.01515 - R2: 0.9990 -- iter: 0980/1168
Training Step: 817  | total loss: [1m[32m0.01427[0m[0m | time: 0.312s
| SGD | epoch: 014 | loss: 0.01427 - R2: 0.9990 -- iter: 1000/1168
Training Step

Training Step: 869  | total loss: [1m[32m0.02066[0m[0m | time: 0.180s
| SGD | epoch: 015 | loss: 0.02066 - R2: 0.9984 -- iter: 0860/1168
Training Step: 870  | total loss: [1m[32m0.02066[0m[0m | time: 0.185s
| SGD | epoch: 015 | loss: 0.02066 - R2: 0.9984 -- iter: 0880/1168
Training Step: 871  | total loss: [1m[32m0.01942[0m[0m | time: 0.188s
| SGD | epoch: 015 | loss: 0.01942 - R2: 0.9982 -- iter: 0900/1168
Training Step: 872  | total loss: [1m[32m0.01873[0m[0m | time: 0.193s
| SGD | epoch: 015 | loss: 0.01873 - R2: 0.9986 -- iter: 0920/1168
Training Step: 873  | total loss: [1m[32m0.01935[0m[0m | time: 0.197s
| SGD | epoch: 015 | loss: 0.01935 - R2: 0.9981 -- iter: 0940/1168
Training Step: 874  | total loss: [1m[32m0.01882[0m[0m | time: 0.201s
| SGD | epoch: 015 | loss: 0.01882 - R2: 0.9986 -- iter: 0960/1168
Training Step: 875  | total loss: [1m[32m0.01954[0m[0m | time: 0.207s
| SGD | epoch: 015 | loss: 0.01954 - R2: 0.9989 -- iter: 0980/1168
Training Step

Training Step: 927  | total loss: [1m[32m0.01809[0m[0m | time: 0.231s
| SGD | epoch: 016 | loss: 0.01809 - R2: 1.0001 -- iter: 0840/1168
Training Step: 928  | total loss: [1m[32m0.01653[0m[0m | time: 0.233s
| SGD | epoch: 016 | loss: 0.01653 - R2: 0.9997 -- iter: 0860/1168
Training Step: 929  | total loss: [1m[32m0.01653[0m[0m | time: 0.236s
| SGD | epoch: 016 | loss: 0.01653 - R2: 0.9997 -- iter: 0880/1168
Training Step: 930  | total loss: [1m[32m0.01519[0m[0m | time: 0.239s
| SGD | epoch: 016 | loss: 0.01519 - R2: 1.0001 -- iter: 0900/1168
Training Step: 931  | total loss: [1m[32m0.01534[0m[0m | time: 0.243s
| SGD | epoch: 016 | loss: 0.01534 - R2: 0.9996 -- iter: 0920/1168
Training Step: 932  | total loss: [1m[32m0.01534[0m[0m | time: 0.249s
| SGD | epoch: 016 | loss: 0.01534 - R2: 0.9996 -- iter: 0940/1168
Training Step: 933  | total loss: [1m[32m0.01481[0m[0m | time: 0.251s
| SGD | epoch: 016 | loss: 0.01481 - R2: 0.9999 -- iter: 0960/1168
Training Step

Training Step: 985  | total loss: [1m[32m0.02159[0m[0m | time: 0.183s
| SGD | epoch: 017 | loss: 0.02159 - R2: 0.9976 -- iter: 0820/1168
Training Step: 986  | total loss: [1m[32m0.02036[0m[0m | time: 0.190s
| SGD | epoch: 017 | loss: 0.02036 - R2: 0.9983 -- iter: 0840/1168
Training Step: 987  | total loss: [1m[32m0.01988[0m[0m | time: 0.193s
| SGD | epoch: 017 | loss: 0.01988 - R2: 0.9991 -- iter: 0860/1168
Training Step: 988  | total loss: [1m[32m0.01944[0m[0m | time: 0.196s
| SGD | epoch: 017 | loss: 0.01944 - R2: 0.9981 -- iter: 0880/1168
Training Step: 989  | total loss: [1m[32m0.01865[0m[0m | time: 0.200s
| SGD | epoch: 017 | loss: 0.01865 - R2: 0.9982 -- iter: 0900/1168
Training Step: 990  | total loss: [1m[32m0.01799[0m[0m | time: 0.206s
| SGD | epoch: 017 | loss: 0.01799 - R2: 0.9986 -- iter: 0920/1168
Training Step: 991  | total loss: [1m[32m0.01799[0m[0m | time: 0.216s
| SGD | epoch: 017 | loss: 0.01799 - R2: 0.9986 -- iter: 0940/1168
Training Step

Training Step: 1043  | total loss: [1m[32m0.01703[0m[0m | time: 0.230s
| SGD | epoch: 018 | loss: 0.01703 - R2: 0.9997 -- iter: 0800/1168
Training Step: 1044  | total loss: [1m[32m0.01605[0m[0m | time: 0.239s
| SGD | epoch: 018 | loss: 0.01605 - R2: 0.9995 -- iter: 0820/1168
Training Step: 1045  | total loss: [1m[32m0.01537[0m[0m | time: 0.245s
| SGD | epoch: 018 | loss: 0.01537 - R2: 0.9994 -- iter: 0840/1168
Training Step: 1046  | total loss: [1m[32m0.01979[0m[0m | time: 0.249s
| SGD | epoch: 018 | loss: 0.01979 - R2: 1.0001 -- iter: 0860/1168
Training Step: 1047  | total loss: [1m[32m0.01882[0m[0m | time: 0.253s
| SGD | epoch: 018 | loss: 0.01882 - R2: 1.0002 -- iter: 0880/1168
Training Step: 1048  | total loss: [1m[32m0.01840[0m[0m | time: 0.264s
| SGD | epoch: 018 | loss: 0.01840 - R2: 1.0000 -- iter: 0900/1168
Training Step: 1049  | total loss: [1m[32m0.01781[0m[0m | time: 0.270s
| SGD | epoch: 018 | loss: 0.01781 - R2: 0.9995 -- iter: 0920/1168
Traini

Training Step: 1101  | total loss: [1m[32m0.02312[0m[0m | time: 0.160s
| SGD | epoch: 019 | loss: 0.02312 - R2: 1.0010 -- iter: 0780/1168
Training Step: 1102  | total loss: [1m[32m0.02817[0m[0m | time: 0.162s
| SGD | epoch: 019 | loss: 0.02817 - R2: 1.0017 -- iter: 0800/1168
Training Step: 1103  | total loss: [1m[32m0.02817[0m[0m | time: 0.164s
| SGD | epoch: 019 | loss: 0.02817 - R2: 1.0017 -- iter: 0820/1168
Training Step: 1104  | total loss: [1m[32m0.02605[0m[0m | time: 0.169s
| SGD | epoch: 019 | loss: 0.02605 - R2: 1.0015 -- iter: 0840/1168
Training Step: 1105  | total loss: [1m[32m0.02605[0m[0m | time: 0.175s
| SGD | epoch: 019 | loss: 0.02605 - R2: 1.0015 -- iter: 0860/1168
Training Step: 1106  | total loss: [1m[32m0.02390[0m[0m | time: 0.178s
| SGD | epoch: 019 | loss: 0.02390 - R2: 1.0015 -- iter: 0880/1168
Training Step: 1107  | total loss: [1m[32m0.02245[0m[0m | time: 0.181s
| SGD | epoch: 019 | loss: 0.02245 - R2: 1.0013 -- iter: 0900/1168
Traini

Training Step: 1159  | total loss: [1m[32m0.01992[0m[0m | time: 0.292s
| SGD | epoch: 020 | loss: 0.01992 - R2: 1.0011 -- iter: 0760/1168
Training Step: 1160  | total loss: [1m[32m0.02029[0m[0m | time: 0.296s
| SGD | epoch: 020 | loss: 0.02029 - R2: 0.9999 -- iter: 0780/1168
Training Step: 1161  | total loss: [1m[32m0.01935[0m[0m | time: 0.298s
| SGD | epoch: 020 | loss: 0.01935 - R2: 1.0000 -- iter: 0800/1168
Training Step: 1162  | total loss: [1m[32m0.01852[0m[0m | time: 0.303s
| SGD | epoch: 020 | loss: 0.01852 - R2: 0.9996 -- iter: 0820/1168
Training Step: 1163  | total loss: [1m[32m0.01817[0m[0m | time: 0.305s
| SGD | epoch: 020 | loss: 0.01817 - R2: 1.0004 -- iter: 0840/1168
Training Step: 1164  | total loss: [1m[32m0.01735[0m[0m | time: 0.308s
| SGD | epoch: 020 | loss: 0.01735 - R2: 1.0002 -- iter: 0860/1168
Training Step: 1165  | total loss: [1m[32m0.01720[0m[0m | time: 0.310s
| SGD | epoch: 020 | loss: 0.01720 - R2: 1.0002 -- iter: 0880/1168
Traini

Training Step: 1217  | total loss: [1m[32m0.01780[0m[0m | time: 0.196s
| SGD | epoch: 021 | loss: 0.01780 - R2: 0.9988 -- iter: 0740/1168
Training Step: 1218  | total loss: [1m[32m0.01797[0m[0m | time: 0.198s
| SGD | epoch: 021 | loss: 0.01797 - R2: 0.9994 -- iter: 0760/1168
Training Step: 1219  | total loss: [1m[32m0.01758[0m[0m | time: 0.201s
| SGD | epoch: 021 | loss: 0.01758 - R2: 0.9987 -- iter: 0780/1168
Training Step: 1220  | total loss: [1m[32m0.01719[0m[0m | time: 0.203s
| SGD | epoch: 021 | loss: 0.01719 - R2: 0.9997 -- iter: 0800/1168
Training Step: 1221  | total loss: [1m[32m0.01719[0m[0m | time: 0.205s
| SGD | epoch: 021 | loss: 0.01719 - R2: 0.9997 -- iter: 0820/1168
Training Step: 1222  | total loss: [1m[32m0.01733[0m[0m | time: 0.208s
| SGD | epoch: 021 | loss: 0.01733 - R2: 0.9996 -- iter: 0840/1168
Training Step: 1223  | total loss: [1m[32m0.02274[0m[0m | time: 0.210s
| SGD | epoch: 021 | loss: 0.02274 - R2: 0.9995 -- iter: 0860/1168
Traini

Training Step: 1275  | total loss: 0.01833 | time: 0.230s
| SGD | epoch: 022 | loss: 0.01833 - R2: 1.0017 -- iter: 0720/1168
Training Step: 1276  | total loss: 0.01833 | time: 0.232s
| SGD | epoch: 022 | loss: 0.01833 - R2: 1.0017 -- iter: 0740/1168
Training Step: 1277  | total loss: 0.01823 | time: 0.233s
| SGD | epoch: 022 | loss: 0.01823 - R2: 1.0010 -- iter: 0760/1168
Training Step: 1278  | total loss: 0.01903 | time: 0.236s
| SGD | epoch: 022 | loss: 0.01903 - R2: 1.0011 -- iter: 0780/1168
Training Step: 1279  | total loss: 0.01870 | time: 0.238s
| SGD | epoch: 022 | loss: 0.01870 - R2: 0.9999 -- iter: 0800/1168
Training Step: 1280  | total loss: 0.01822 | time: 0.254s
| SGD | epoch: 022 | loss: 0.01822 - R2: 0.9999 -- iter: 0820/1168
Training Step: 1281  | total loss: 0.01812 | time: 0.256s
| SGD | epoch: 022 | loss: 0.01812 - R2: 0.9997 -- iter: 0840/1168

Training Step: 1333  | total loss: 0.01980 | time: 0.223s
| SGD | epoch: 023 | loss: 0.01980 - R2: 0.9998 -- iter: 0700/1168
Training Step: 1334  | total loss: 0.01971 | time: 0.227s
| SGD | epoch: 023 | loss: 0.01971 - R2: 1.0001 -- iter: 0720/1168
Training Step: 1335  | total loss: 0.02119 | time: 0.229s
| SGD | epoch: 023 | loss: 0.02119 - R2: 0.9991 -- iter: 0740/1168
Training Step: 1336  | total loss: 0.02158 | time: 0.232s
| SGD | epoch: 023 | loss: 0.02158 - R2: 0.9997 -- iter: 0760/1168
Training Step: 1337  | total loss: 0.02065 | time: 0.236s
| SGD | epoch: 023 | loss: 0.02065 - R2: 0.9999 -- iter: 0780/1168
Training Step: 1338  | total loss: 0.01975 | time: 0.238s
| SGD | epoch: 023 | loss: 0.01975 - R2: 0.9998 -- iter: 0800/1168
Training Step: 1339  | total loss: 0.01765 | time: 0.241s
| SGD | epoch: 023 | loss: 0.01765 - R2: 0.9991 -- iter: 0820/1168

Training Step: 1391  | total loss: 0.01714 | time: 0.139s
| SGD | epoch: 024 | loss: 0.01714 - R2: 1.0013 -- iter: 0680/1168
Training Step: 1392  | total loss: 0.01703 | time: 0.143s
| SGD | epoch: 024 | loss: 0.01703 - R2: 1.0008 -- iter: 0700/1168
Training Step: 1393  | total loss: 0.01674 | time: 0.145s
| SGD | epoch: 024 | loss: 0.01674 - R2: 1.0003 -- iter: 0720/1168
Training Step: 1394  | total loss: 0.02284 | time: 0.147s
| SGD | epoch: 024 | loss: 0.02284 - R2: 1.0013 -- iter: 0740/1168
Training Step: 1395  | total loss: 0.02190 | time: 0.149s
| SGD | epoch: 024 | loss: 0.02190 - R2: 1.0013 -- iter: 0760/1168
Training Step: 1396  | total loss: 0.02054 | time: 0.151s
| SGD | epoch: 024 | loss: 0.02054 - R2: 1.0013 -- iter: 0780/1168
Training Step: 1397  | total loss: 0.02069 | time: 0.153s
| SGD | epoch: 024 | loss: 0.02069 - R2: 1.0013 -- iter: 0800/1168

Training Step: 1449  | total loss: 0.01878 | time: 0.187s
| SGD | epoch: 025 | loss: 0.01878 - R2: 1.0004 -- iter: 0660/1168
Training Step: 1450  | total loss: 0.01861 | time: 0.188s
| SGD | epoch: 025 | loss: 0.01861 - R2: 0.9994 -- iter: 0680/1168
Training Step: 1451  | total loss: 0.01820 | time: 0.199s
| SGD | epoch: 025 | loss: 0.01820 - R2: 0.9996 -- iter: 0700/1168
Training Step: 1452  | total loss: 0.01704 | time: 0.202s
| SGD | epoch: 025 | loss: 0.01704 - R2: 1.0001 -- iter: 0720/1168
Training Step: 1453  | total loss: 0.01704 | time: 0.205s
| SGD | epoch: 025 | loss: 0.01704 - R2: 1.0001 -- iter: 0740/1168
Training Step: 1454  | total loss: 0.01672 | time: 0.208s
| SGD | epoch: 025 | loss: 0.01672 - R2: 1.0007 -- iter: 0760/1168
Training Step: 1455  | total loss: 0.01566 | time: 0.211s
| SGD | epoch: 025 | loss: 0.01566 - R2: 1.0003 -- iter: 0780/1168

Training Step: 1507  | total loss: 0.01687 | time: 0.192s
| SGD | epoch: 026 | loss: 0.01687 - R2: 0.9988 -- iter: 0640/1168
Training Step: 1508  | total loss: 0.01687 | time: 0.195s
| SGD | epoch: 026 | loss: 0.01687 - R2: 0.9994 -- iter: 0660/1168
Training Step: 1509  | total loss: 0.01728 | time: 0.197s
| SGD | epoch: 026 | loss: 0.01728 - R2: 1.0003 -- iter: 0680/1168
Training Step: 1510  | total loss: 0.01692 | time: 0.199s
| SGD | epoch: 026 | loss: 0.01692 - R2: 1.0002 -- iter: 0700/1168
Training Step: 1511  | total loss: 0.01743 | time: 0.202s
| SGD | epoch: 026 | loss: 0.01743 - R2: 1.0000 -- iter: 0720/1168
Training Step: 1512  | total loss: 0.01743 | time: 0.203s
| SGD | epoch: 026 | loss: 0.01743 - R2: 1.0000 -- iter: 0740/1168
Training Step: 1513  | total loss: 0.01680 | time: 0.205s
| SGD | epoch: 026 | loss: 0.01680 - R2: 1.0007 -- iter: 0760/1168

Training Step: 1565  | total loss: 0.01793 | time: 0.181s
| SGD | epoch: 027 | loss: 0.01793 - R2: 0.9990 -- iter: 0620/1168
Training Step: 1566  | total loss: 0.01793 | time: 0.183s
| SGD | epoch: 027 | loss: 0.01793 - R2: 0.9987 -- iter: 0640/1168
Training Step: 1567  | total loss: 0.01637 | time: 0.200s
| SGD | epoch: 027 | loss: 0.01637 - R2: 0.9986 -- iter: 0660/1168
Training Step: 1568  | total loss: 0.01637 | time: 0.202s
| SGD | epoch: 027 | loss: 0.01637 - R2: 0.9986 -- iter: 0680/1168
Training Step: 1569  | total loss: 0.01624 | time: 0.204s
| SGD | epoch: 027 | loss: 0.01624 - R2: 0.9986 -- iter: 0700/1168
Training Step: 1570  | total loss: 0.01679 | time: 0.206s
| SGD | epoch: 027 | loss: 0.01679 - R2: 0.9986 -- iter: 0720/1168
Training Step: 1571  | total loss: 0.01923 | time: 0.210s
| SGD | epoch: 027 | loss: 0.01923 - R2: 0.9988 -- iter: 0740/1168

Training Step: 1623  | total loss: 0.02605 | time: 0.166s
| SGD | epoch: 028 | loss: 0.02605 - R2: 1.0023 -- iter: 0600/1168
Training Step: 1624  | total loss: 0.02437 | time: 0.168s
| SGD | epoch: 028 | loss: 0.02437 - R2: 1.0026 -- iter: 0620/1168
Training Step: 1625  | total loss: 0.02580 | time: 0.170s
| SGD | epoch: 028 | loss: 0.02580 - R2: 1.0025 -- iter: 0640/1168
Training Step: 1626  | total loss: 0.02569 | time: 0.174s
| SGD | epoch: 028 | loss: 0.02569 - R2: 1.0025 -- iter: 0660/1168
Training Step: 1627  | total loss: 0.02442 | time: 0.176s
| SGD | epoch: 028 | loss: 0.02442 - R2: 1.0016 -- iter: 0680/1168
Training Step: 1628  | total loss: 0.02325 | time: 0.178s
| SGD | epoch: 028 | loss: 0.02325 - R2: 1.0016 -- iter: 0700/1168
Training Step: 1629  | total loss: 0.02206 | time: 0.180s
| SGD | epoch: 028 | loss: 0.02206 - R2: 1.0006 -- iter: 0720/1168

Training Step: 1681  | total loss: 0.01802 | time: 0.230s
| SGD | epoch: 029 | loss: 0.01802 - R2: 1.0001 -- iter: 0580/1168
Training Step: 1682  | total loss: 0.01767 | time: 0.234s
| SGD | epoch: 029 | loss: 0.01767 - R2: 1.0000 -- iter: 0600/1168
Training Step: 1683  | total loss: 0.01767 | time: 0.237s
| SGD | epoch: 029 | loss: 0.01767 - R2: 1.0000 -- iter: 0620/1168
Training Step: 1684  | total loss: 0.01755 | time: 0.239s
| SGD | epoch: 029 | loss: 0.01755 - R2: 0.9994 -- iter: 0640/1168
Training Step: 1685  | total loss: 0.01699 | time: 0.245s
| SGD | epoch: 029 | loss: 0.01699 - R2: 0.9996 -- iter: 0660/1168
Training Step: 1686  | total loss: 0.01610 | time: 0.250s
| SGD | epoch: 029 | loss: 0.01610 - R2: 1.0001 -- iter: 0680/1168
Training Step: 1687  | total loss: 0.01610 | time: 0.258s
| SGD | epoch: 029 | loss: 0.01610 - R2: 1.0001 -- iter: 0700/1168

Training Step: 1739  | total loss: 0.01920 | time: 0.154s
| SGD | epoch: 030 | loss: 0.01920 - R2: 1.0003 -- iter: 0560/1168
Training Step: 1740  | total loss: 0.02004 | time: 0.156s
| SGD | epoch: 030 | loss: 0.02004 - R2: 0.9991 -- iter: 0580/1168
Training Step: 1741  | total loss: 0.01963 | time: 0.159s
| SGD | epoch: 030 | loss: 0.01963 - R2: 0.9984 -- iter: 0600/1168
Training Step: 1742  | total loss: 0.01963 | time: 0.161s
| SGD | epoch: 030 | loss: 0.01963 - R2: 0.9984 -- iter: 0620/1168
Training Step: 1743  | total loss: 0.02014 | time: 0.163s
| SGD | epoch: 030 | loss: 0.02014 - R2: 0.9993 -- iter: 0640/1168
Training Step: 1744  | total loss: 0.02069 | time: 0.168s
| SGD | epoch: 030 | loss: 0.02069 - R2: 1.0002 -- iter: 0660/1168
Training Step: 1745  | total loss: 0.01899 | time: 0.170s
| SGD | epoch: 030 | loss: 0.01899 - R2: 1.0002 -- iter: 0680/1168

Training Step: 1797  | total loss: 0.01556 | time: 0.100s
| SGD | epoch: 031 | loss: 0.01556 - R2: 0.9994 -- iter: 0540/1168
Training Step: 1798  | total loss: 0.01614 | time: 0.109s
| SGD | epoch: 031 | loss: 0.01614 - R2: 0.9991 -- iter: 0560/1168
Training Step: 1799  | total loss: 0.01736 | time: 0.113s
| SGD | epoch: 031 | loss: 0.01736 - R2: 1.0002 -- iter: 0580/1168
Training Step: 1800  | total loss: 0.01736 | time: 0.115s
| SGD | epoch: 031 | loss: 0.01736 - R2: 1.0002 -- iter: 0600/1168
Training Step: 1801  | total loss: 0.01823 | time: 0.117s
| SGD | epoch: 031 | loss: 0.01823 - R2: 1.0010 -- iter: 0620/1168
Training Step: 1802  | total loss: 0.02440 | time: 0.119s
| SGD | epoch: 031 | loss: 0.02440 - R2: 1.0015 -- iter: 0640/1168
Training Step: 1803  | total loss: 0.02479 | time: 0.121s
| SGD | epoch: 031 | loss: 0.02479 - R2: 1.0015 -- iter: 0660/1168

Training Step: 1855  | total loss: 0.02020 | time: 0.124s
| SGD | epoch: 032 | loss: 0.02020 - R2: 0.9991 -- iter: 0520/1168
Training Step: 1856  | total loss: 0.01909 | time: 0.126s
| SGD | epoch: 032 | loss: 0.01909 - R2: 0.9991 -- iter: 0540/1168
Training Step: 1857  | total loss: 0.01799 | time: 0.128s
| SGD | epoch: 032 | loss: 0.01799 - R2: 0.9987 -- iter: 0560/1168
Training Step: 1858  | total loss: 0.01733 | time: 0.131s
| SGD | epoch: 032 | loss: 0.01733 - R2: 0.9987 -- iter: 0580/1168
Training Step: 1859  | total loss: 0.01654 | time: 0.133s
| SGD | epoch: 032 | loss: 0.01654 - R2: 0.9978 -- iter: 0600/1168
Training Step: 1860  | total loss: 0.01531 | time: 0.143s
| SGD | epoch: 032 | loss: 0.01531 - R2: 0.9978 -- iter: 0620/1168
Training Step: 1861  | total loss: 0.01415 | time: 0.145s
| SGD | epoch: 032 | loss: 0.01415 - R2: 0.9973 -- iter: 0640/1168

Training Step: 1913  | total loss: 0.01605 | time: 0.093s
| SGD | epoch: 033 | loss: 0.01605 - R2: 1.0008 -- iter: 0500/1168
Training Step: 1914  | total loss: 0.01605 | time: 0.095s
| SGD | epoch: 033 | loss: 0.01605 - R2: 1.0008 -- iter: 0520/1168
Training Step: 1915  | total loss: 0.01576 | time: 0.097s
| SGD | epoch: 033 | loss: 0.01576 - R2: 0.9994 -- iter: 0540/1168
Training Step: 1916  | total loss: 0.01525 | time: 0.099s
| SGD | epoch: 033 | loss: 0.01525 - R2: 0.9991 -- iter: 0560/1168
Training Step: 1917  | total loss: 0.01451 | time: 0.101s
| SGD | epoch: 033 | loss: 0.01451 - R2: 0.9991 -- iter: 0580/1168
Training Step: 1918  | total loss: 0.01602 | time: 0.102s
| SGD | epoch: 033 | loss: 0.01602 - R2: 0.9991 -- iter: 0600/1168
Training Step: 1919  | total loss: 0.01534 | time: 0.104s
| SGD | epoch: 033 | loss: 0.01534 - R2: 0.9996 -- iter: 0620/1168

Training Step: 1971  | total loss: 0.01788 | time: 0.110s
| SGD | epoch: 034 | loss: 0.01788 - R2: 0.9990 -- iter: 0480/1168
Training Step: 1972  | total loss: 0.01793 | time: 0.114s
| SGD | epoch: 034 | loss: 0.01793 - R2: 0.9987 -- iter: 0500/1168
Training Step: 1973  | total loss: 0.02076 | time: 0.119s
| SGD | epoch: 034 | loss: 0.02076 - R2: 0.9998 -- iter: 0520/1168
Training Step: 1974  | total loss: 0.02001 | time: 0.123s
| SGD | epoch: 034 | loss: 0.02001 - R2: 0.9993 -- iter: 0540/1168
Training Step: 1975  | total loss: 0.02001 | time: 0.125s
| SGD | epoch: 034 | loss: 0.02001 - R2: 0.9993 -- iter: 0560/1168
Training Step: 1976  | total loss: 0.01921 | time: 0.129s
| SGD | epoch: 034 | loss: 0.01921 - R2: 0.9992 -- iter: 0580/1168
Training Step: 1977  | total loss: 0.01921 | time: 0.132s
| SGD | epoch: 034 | loss: 0.01921 - R2: 0.9992 -- iter: 0600/1168

Training Step: 2029  | total loss: 0.01663 | time: 0.067s
| SGD | epoch: 035 | loss: 0.01663 - R2: 0.9982 -- iter: 0460/1168
Training Step: 2030  | total loss: 0.01669 | time: 0.069s
| SGD | epoch: 035 | loss: 0.01669 - R2: 0.9980 -- iter: 0480/1168
Training Step: 2031  | total loss: 0.01602 | time: 0.072s
| SGD | epoch: 035 | loss: 0.01602 - R2: 0.9984 -- iter: 0500/1168
Training Step: 2032  | total loss: 0.01602 | time: 0.074s
| SGD | epoch: 035 | loss: 0.01602 - R2: 0.9988 -- iter: 0520/1168
Training Step: 2033  | total loss: 0.01530 | time: 0.076s
| SGD | epoch: 035 | loss: 0.01530 - R2: 0.9992 -- iter: 0540/1168
Training Step: 2034  | total loss: 0.01526 | time: 0.078s
| SGD | epoch: 035 | loss: 0.01526 - R2: 0.9992 -- iter: 0560/1168
Training Step: 2035  | total loss: 0.01453 | time: 0.080s
| SGD | epoch: 035 | loss: 0.01453 - R2: 0.9995 -- iter: 0580/1168

Training Step: 2087  | total loss: 0.01993 | time: 0.102s
| SGD | epoch: 036 | loss: 0.01993 - R2: 1.0001 -- iter: 0440/1168
Training Step: 2088  | total loss: 0.01907 | time: 0.106s
| SGD | epoch: 036 | loss: 0.01907 - R2: 0.9999 -- iter: 0460/1168
Training Step: 2089  | total loss: 0.01898 | time: 0.110s
| SGD | epoch: 036 | loss: 0.01898 - R2: 1.0010 -- iter: 0480/1168
Training Step: 2090  | total loss: 0.01898 | time: 0.112s
| SGD | epoch: 036 | loss: 0.01898 - R2: 1.0004 -- iter: 0500/1168
Training Step: 2091  | total loss: 0.01853 | time: 0.116s
| SGD | epoch: 036 | loss: 0.01853 - R2: 1.0004 -- iter: 0520/1168
Training Step: 2092  | total loss: 0.01780 | time: 0.118s
| SGD | epoch: 036 | loss: 0.01780 - R2: 0.9998 -- iter: 0540/1168
Training Step: 2093  | total loss: 0.02087 | time: 0.120s
| SGD | epoch: 036 | loss: 0.02087 - R2: 0.9999 -- iter: 0560/1168

Training Step: 2145  | total loss: 0.02623 | time: 0.112s
| SGD | epoch: 037 | loss: 0.02623 - R2: 0.9987 -- iter: 0420/1168
Training Step: 2146  | total loss: 0.02492 | time: 0.114s
| SGD | epoch: 037 | loss: 0.02492 - R2: 0.9995 -- iter: 0440/1168
Training Step: 2147  | total loss: 0.02506 | time: 0.117s
| SGD | epoch: 037 | loss: 0.02506 - R2: 0.9995 -- iter: 0460/1168
Training Step: 2148  | total loss: 0.02684 | time: 0.119s
| SGD | epoch: 037 | loss: 0.02684 - R2: 0.9996 -- iter: 0480/1168
Training Step: 2149  | total loss: 0.02476 | time: 0.122s
| SGD | epoch: 037 | loss: 0.02476 - R2: 1.0007 -- iter: 0500/1168
Training Step: 2150  | total loss: 0.02735 | time: 0.124s
| SGD | epoch: 037 | loss: 0.02735 - R2: 1.0007 -- iter: 0520/1168
Training Step: 2151  | total loss: 0.02635 | time: 0.126s
| SGD | epoch: 037 | loss: 0.02635 - R2: 1.0010 -- iter: 0540/1168

Training Step: 2203  | total loss: 0.01900 | time: 0.079s
| SGD | epoch: 038 | loss: 0.01900 - R2: 0.9998 -- iter: 0400/1168
Training Step: 2204  | total loss: 0.02211 | time: 0.087s
| SGD | epoch: 038 | loss: 0.02211 - R2: 1.0002 -- iter: 0420/1168
Training Step: 2205  | total loss: 0.02105 | time: 0.090s
| SGD | epoch: 038 | loss: 0.02105 - R2: 1.0000 -- iter: 0440/1168
Training Step: 2206  | total loss: 0.02105 | time: 0.093s
| SGD | epoch: 038 | loss: 0.02105 - R2: 1.0000 -- iter: 0460/1168
Training Step: 2207  | total loss: 0.01927 | time: 0.096s
| SGD | epoch: 038 | loss: 0.01927 - R2: 0.9998 -- iter: 0480/1168
Training Step: 2208  | total loss: 0.01927 | time: 0.100s
| SGD | epoch: 038 | loss: 0.01927 - R2: 0.9998 -- iter: 0500/1168
Training Step: 2209  | total loss: 0.01742 | time: 0.103s
| SGD | epoch: 038 | loss: 0.01742 - R2: 0.9999 -- iter: 0520/1168

Training Step: 2261  | total loss: 0.02074 | time: 0.120s
| SGD | epoch: 039 | loss: 0.02074 - R2: 1.0013 -- iter: 0380/1168
Training Step: 2262  | total loss: 0.02074 | time: 0.123s
| SGD | epoch: 039 | loss: 0.02074 - R2: 1.0004 -- iter: 0400/1168
Training Step: 2263  | total loss: 0.01978 | time: 0.125s
| SGD | epoch: 039 | loss: 0.01978 - R2: 1.0004 -- iter: 0420/1168
Training Step: 2264  | total loss: 0.01964 | time: 0.128s
| SGD | epoch: 039 | loss: 0.01964 - R2: 0.9998 -- iter: 0440/1168
Training Step: 2265  | total loss: 0.01964 | time: 0.132s
| SGD | epoch: 039 | loss: 0.01964 - R2: 0.9998 -- iter: 0460/1168
Training Step: 2266  | total loss: 0.01822 | time: 0.137s
| SGD | epoch: 039 | loss: 0.01822 - R2: 0.9994 -- iter: 0480/1168
Training Step: 2267  | total loss: 0.01822 | time: 0.139s
| SGD | epoch: 039 | loss: 0.01822 - R2: 0.9994 -- iter: 0500/1168

Training Step: 2319  | total loss: 0.01491 | time: 0.137s
| SGD | epoch: 040 | loss: 0.01491 - R2: 0.9983 -- iter: 0360/1168
Training Step: 2320  | total loss: 0.01503 | time: 0.158s
| SGD | epoch: 040 | loss: 0.01503 - R2: 0.9990 -- iter: 0380/1168
Training Step: 2321  | total loss: 0.01503 | time: 0.160s
| SGD | epoch: 040 | loss: 0.01503 - R2: 0.9990 -- iter: 0400/1168
Training Step: 2322  | total loss: 0.01899 | time: 0.163s
| SGD | epoch: 040 | loss: 0.01899 - R2: 1.0000 -- iter: 0420/1168
Training Step: 2323  | total loss: 0.01908 | time: 0.165s
| SGD | epoch: 040 | loss: 0.01908 - R2: 1.0000 -- iter: 0440/1168
Training Step: 2324  | total loss: 0.01923 | time: 0.166s
| SGD | epoch: 040 | loss: 0.01923 - R2: 0.9991 -- iter: 0460/1168
Training Step: 2325  | total loss: 0.02348 | time: 0.168s
| SGD | epoch: 040 | loss: 0.02348 - R2: 0.9992 -- iter: 0480/1168

Training Step: 2377  | total loss: 0.02176 | time: 0.083s
| SGD | epoch: 041 | loss: 0.02176 - R2: 0.9979 -- iter: 0340/1168
Training Step: 2378  | total loss: 0.02772 | time: 0.086s
| SGD | epoch: 041 | loss: 0.02772 - R2: 0.9999 -- iter: 0360/1168
Training Step: 2379  | total loss: 0.02772 | time: 0.088s
| SGD | epoch: 041 | loss: 0.02772 - R2: 0.9999 -- iter: 0380/1168
Training Step: 2380  | total loss: 0.02580 | time: 0.091s
| SGD | epoch: 041 | loss: 0.02580 - R2: 1.0000 -- iter: 0400/1168
Training Step: 2381  | total loss: 0.02471 | time: 0.094s
| SGD | epoch: 041 | loss: 0.02471 - R2: 0.9997 -- iter: 0420/1168
Training Step: 2382  | total loss: 0.02471 | time: 0.096s
| SGD | epoch: 041 | loss: 0.02471 - R2: 0.9997 -- iter: 0440/1168
Training Step: 2383  | total loss: 0.02609 | time: 0.108s
| SGD | epoch: 041 | loss: 0.02609 - R2: 1.0010 -- iter: 0460/1168

Training Step: 2435  | total loss: 0.03392 | time: 0.103s
| SGD | epoch: 042 | loss: 0.03392 - R2: 1.0017 -- iter: 0320/1168
Training Step: 2436  | total loss: 0.03067 | time: 0.120s
| SGD | epoch: 042 | loss: 0.03067 - R2: 1.0020 -- iter: 0340/1168
Training Step: 2437  | total loss: 0.03067 | time: 0.122s
| SGD | epoch: 042 | loss: 0.03067 - R2: 1.0020 -- iter: 0360/1168
Training Step: 2438  | total loss: 0.02787 | time: 0.126s
| SGD | epoch: 042 | loss: 0.02787 - R2: 1.0011 -- iter: 0380/1168
Training Step: 2439  | total loss: 0.02787 | time: 0.139s
| SGD | epoch: 042 | loss: 0.02787 - R2: 1.0011 -- iter: 0400/1168
Training Step: 2440  | total loss: 0.02485 | time: 0.143s
| SGD | epoch: 042 | loss: 0.02485 - R2: 1.0001 -- iter: 0420/1168
Training Step: 2441  | total loss: 0.02485 | time: 0.151s
| SGD | epoch: 042 | loss: 0.02485 - R2: 1.0001 -- iter: 0440/1168

Training Step: 2493  | total loss: 0.01514 | time: 0.072s
| SGD | epoch: 043 | loss: 0.01514 - R2: 1.0005 -- iter: 0300/1168
Training Step: 2494  | total loss: 0.01566 | time: 0.076s
| SGD | epoch: 043 | loss: 0.01566 - R2: 1.0003 -- iter: 0320/1168
Training Step: 2495  | total loss: 0.01566 | time: 0.078s
| SGD | epoch: 043 | loss: 0.01566 - R2: 0.9999 -- iter: 0340/1168
Training Step: 2496  | total loss: 0.01612 | time: 0.081s
| SGD | epoch: 043 | loss: 0.01612 - R2: 1.0003 -- iter: 0360/1168
Training Step: 2497  | total loss: 0.01840 | time: 0.084s
| SGD | epoch: 043 | loss: 0.01840 - R2: 1.0005 -- iter: 0380/1168
Training Step: 2498  | total loss: 0.01747 | time: 0.087s
| SGD | epoch: 043 | loss: 0.01747 - R2: 1.0005 -- iter: 0400/1168
Training Step: 2499  | total loss: 0.01676 | time: 0.091s
| SGD | epoch: 043 | loss: 0.01676 - R2: 1.0007 -- iter: 0420/1168

Training Step: 2551  | total loss: 0.01979 | time: 0.114s
| SGD | epoch: 044 | loss: 0.01979 - R2: 0.9996 -- iter: 0280/1168
Training Step: 2552  | total loss: 0.01979 | time: 0.125s
| SGD | epoch: 044 | loss: 0.01979 - R2: 0.9997 -- iter: 0300/1168
Training Step: 2553  | total loss: 0.01771 | time: 0.127s
| SGD | epoch: 044 | loss: 0.01771 - R2: 0.9996 -- iter: 0320/1168
Training Step: 2554  | total loss: 0.01771 | time: 0.130s
| SGD | epoch: 044 | loss: 0.01771 - R2: 0.9996 -- iter: 0340/1168
Training Step: 2555  | total loss: 0.01721 | time: 0.132s
| SGD | epoch: 044 | loss: 0.01721 - R2: 0.9991 -- iter: 0360/1168
Training Step: 2556  | total loss: 0.01716 | time: 0.139s
| SGD | epoch: 044 | loss: 0.01716 - R2: 0.9992 -- iter: 0380/1168
Training Step: 2557  | total loss: 0.01716 | time: 0.149s
| SGD | epoch: 044 | loss: 0.01716 - R2: 0.9992 -- iter: 0400/1168

Training Step: 2609  | total loss: 0.01865 | time: 0.033s
| SGD | epoch: 045 | loss: 0.01865 - R2: 0.9993 -- iter: 0260/1168
Training Step: 2610  | total loss: 0.02251 | time: 0.035s
| SGD | epoch: 045 | loss: 0.02251 - R2: 1.0001 -- iter: 0280/1168
Training Step: 2611  | total loss: 0.02243 | time: 0.037s
| SGD | epoch: 045 | loss: 0.02243 - R2: 0.9996 -- iter: 0300/1168
Training Step: 2612  | total loss: 0.02159 | time: 0.039s
| SGD | epoch: 045 | loss: 0.02159 - R2: 0.9990 -- iter: 0320/1168
Training Step: 2613  | total loss: 0.02071 | time: 0.041s
| SGD | epoch: 045 | loss: 0.02071 - R2: 0.9995 -- iter: 0340/1168
Training Step: 2614  | total loss: 0.02715 | time: 0.043s
| SGD | epoch: 045 | loss: 0.02715 - R2: 0.9997 -- iter: 0360/1168
Training Step: 2615  | total loss: 0.02715 | time: 0.045s
| SGD | epoch: 045 | loss: 0.02715 - R2: 0.9997 -- iter: 0380/1168

Training Step: 2667  | total loss: 0.01757 | time: 0.107s
| SGD | epoch: 046 | loss: 0.01757 - R2: 1.0000 -- iter: 0240/1168
Training Step: 2668  | total loss: 0.04863 | time: 0.108s
| SGD | epoch: 046 | loss: 0.04863 - R2: 0.9967 -- iter: 0260/1168
Training Step: 2669  | total loss: 0.04498 | time: 0.113s
| SGD | epoch: 046 | loss: 0.04498 - R2: 0.9966 -- iter: 0280/1168
Training Step: 2670  | total loss: 0.03882 | time: 0.115s
| SGD | epoch: 046 | loss: 0.03882 - R2: 0.9970 -- iter: 0300/1168
Training Step: 2671  | total loss: 0.03576 | time: 0.120s
| SGD | epoch: 046 | loss: 0.03576 - R2: 0.9968 -- iter: 0320/1168
Training Step: 2672  | total loss: 0.03576 | time: 0.122s
| SGD | epoch: 046 | loss: 0.03576 - R2: 0.9968 -- iter: 0340/1168
Training Step: 2673  | total loss: 0.03297 | time: 0.124s
| SGD | epoch: 046 | loss: 0.03297 - R2: 0.9976 -- iter: 0360/1168

Training Step: 2725  | total loss: 0.01986 | time: 0.095s
| SGD | epoch: 047 | loss: 0.01986 - R2: 0.9993 -- iter: 0220/1168
Training Step: 2726  | total loss: 0.01886 | time: 0.098s
| SGD | epoch: 047 | loss: 0.01886 - R2: 0.9998 -- iter: 0240/1168
Training Step: 2727  | total loss: 0.02217 | time: 0.101s
| SGD | epoch: 047 | loss: 0.02217 - R2: 1.0005 -- iter: 0260/1168
Training Step: 2728  | total loss: 0.02142 | time: 0.103s
| SGD | epoch: 047 | loss: 0.02142 - R2: 0.9992 -- iter: 0280/1168
Training Step: 2729  | total loss: 0.02142 | time: 0.114s
| SGD | epoch: 047 | loss: 0.02142 - R2: 0.9992 -- iter: 0300/1168
Training Step: 2730  | total loss: 0.02164 | time: 0.138s
| SGD | epoch: 047 | loss: 0.02164 - R2: 0.9995 -- iter: 0320/1168
Training Step: 2731  | total loss: 0.02132 | time: 0.140s
| SGD | epoch: 047 | loss: 0.02132 - R2: 0.9993 -- iter: 0340/1168

Training Step: 2783  | total loss: 0.01788 | time: 0.053s
| SGD | epoch: 048 | loss: 0.01788 - R2: 0.9987 -- iter: 0200/1168
Training Step: 2784  | total loss: 0.01694 | time: 0.058s
| SGD | epoch: 048 | loss: 0.01694 - R2: 0.9983 -- iter: 0220/1168
Training Step: 2785  | total loss: 0.01594 | time: 0.061s
| SGD | epoch: 048 | loss: 0.01594 - R2: 0.9981 -- iter: 0240/1168
Training Step: 2786  | total loss: 0.01652 | time: 0.064s
| SGD | epoch: 048 | loss: 0.01652 - R2: 0.9983 -- iter: 0260/1168
Training Step: 2787  | total loss: 0.01597 | time: 0.067s
| SGD | epoch: 048 | loss: 0.01597 - R2: 0.9983 -- iter: 0280/1168
Training Step: 2788  | total loss: 0.04044 | time: 0.070s
| SGD | epoch: 048 | loss: 0.04044 - R2: 1.0001 -- iter: 0300/1168
Training Step: 2789  | total loss: 0.03741 | time: 0.072s
| SGD | epoch: 048 | loss: 0.03741 - R2: 1.0001 -- iter: 0320/1168

Training Step: 2841  | total loss: 0.01721 | time: 0.078s
| SGD | epoch: 049 | loss: 0.01721 - R2: 1.0004 -- iter: 0180/1168
Training Step: 2842  | total loss: 0.01711 | time: 0.080s
| SGD | epoch: 049 | loss: 0.01711 - R2: 1.0004 -- iter: 0200/1168
Training Step: 2843  | total loss: 0.01624 | time: 0.090s
| SGD | epoch: 049 | loss: 0.01624 - R2: 1.0004 -- iter: 0220/1168
Training Step: 2844  | total loss: 0.01531 | time: 0.101s
| SGD | epoch: 049 | loss: 0.01531 - R2: 1.0000 -- iter: 0240/1168
Training Step: 2845  | total loss: 0.01546 | time: 0.104s
| SGD | epoch: 049 | loss: 0.01546 - R2: 0.9997 -- iter: 0260/1168
Training Step: 2846  | total loss: 0.01500 | time: 0.108s
| SGD | epoch: 049 | loss: 0.01500 - R2: 1.0001 -- iter: 0280/1168
Training Step: 2847  | total loss: 0.01500 | time: 0.110s
| SGD | epoch: 049 | loss: 0.01500 - R2: 1.0001 -- iter: 0300/1168

Training Step: 2899  | total loss: 0.01592 | time: 0.106s
| SGD | epoch: 050 | loss: 0.01592 - R2: 0.9986 -- iter: 0160/1168
Training Step: 2900  | total loss: 0.01973 | time: 0.111s
| SGD | epoch: 050 | loss: 0.01973 - R2: 0.9991 -- iter: 0180/1168
Training Step: 2901  | total loss: 0.01888 | time: 0.114s
| SGD | epoch: 050 | loss: 0.01888 - R2: 0.9995 -- iter: 0200/1168
Training Step: 2902  | total loss: 0.01891 | time: 0.132s
| SGD | epoch: 050 | loss: 0.01891 - R2: 0.9990 -- iter: 0220/1168
Training Step: 2903  | total loss: 0.01842 | time: 0.136s
| SGD | epoch: 050 | loss: 0.01842 - R2: 0.9987 -- iter: 0240/1168
Training Step: 2904  | total loss: 0.01915 | time: 0.139s
| SGD | epoch: 050 | loss: 0.01915 - R2: 0.9993 -- iter: 0260/1168
Training Step: 2905  | total loss: 0.01826 | time: 0.141s
| SGD | epoch: 050 | loss: 0.01826 - R2: 0.9988 -- iter: 0280/1168

Training Step: 2956  | total loss: 0.02836 | time: 0.046s
| SGD | epoch: 051 | loss: 0.02836 - R2: 1.0007 -- iter: 0120/1168
Training Step: 2957  | total loss: 0.02836 | time: 0.049s
| SGD | epoch: 051 | loss: 0.02836 - R2: 1.0007 -- iter: 0140/1168
Training Step: 2958  | total loss: 0.02654 | time: 0.051s
| SGD | epoch: 051 | loss: 0.02654 - R2: 1.0005 -- iter: 0160/1168
Training Step: 2959  | total loss: 0.02547 | time: 0.054s
| SGD | epoch: 051 | loss: 0.02547 - R2: 1.0005 -- iter: 0180/1168
Training Step: 2960  | total loss: 0.02696 | time: 0.059s
| SGD | epoch: 051 | loss: 0.02696 - R2: 1.0008 -- iter: 0200/1168
Training Step: 2961  | total loss: 0.02696 | time: 0.061s
| SGD | epoch: 051 | loss: 0.02696 - R2: 1.0008 -- iter: 0220/1168
Training Step: 2962  | total loss: 0.02538 | time: 0.076s
| SGD | epoch: 051 | loss: 0.02538 - R2: 1.0005 -- iter: 0240/1168

Training Step: 3014  | total loss: 0.01556 | time: 0.088s
| SGD | epoch: 052 | loss: 0.01556 - R2: 0.9979 -- iter: 0100/1168
Training Step: 3015  | total loss: 0.01556 | time: 0.091s
| SGD | epoch: 052 | loss: 0.01556 - R2: 0.9979 -- iter: 0120/1168
Training Step: 3016  | total loss: 0.01550 | time: 0.092s
| SGD | epoch: 052 | loss: 0.01550 - R2: 0.9977 -- iter: 0140/1168
Training Step: 3017  | total loss: 0.01649 | time: 0.094s
| SGD | epoch: 052 | loss: 0.01649 - R2: 0.9972 -- iter: 0160/1168
Training Step: 3018  | total loss: 0.01662 | time: 0.095s
| SGD | epoch: 052 | loss: 0.01662 - R2: 0.9977 -- iter: 0180/1168
Training Step: 3019  | total loss: 0.01660 | time: 0.098s
| SGD | epoch: 052 | loss: 0.01660 - R2: 0.9985 -- iter: 0200/1168
Training Step: 3020  | total loss: 0.01696 | time: 0.100s
| SGD | epoch: 052 | loss: 0.01696 - R2: 0.9986 -- iter: 0220/1168

Training Step: 3072  | total loss: 0.01811 | time: 0.015s
| SGD | epoch: 053 | loss: 0.01811 - R2: 1.0005 -- iter: 0080/1168
Training Step: 3073  | total loss: 0.01794 | time: 0.022s
| SGD | epoch: 053 | loss: 0.01794 - R2: 1.0004 -- iter: 0100/1168
Training Step: 3074  | total loss: 0.03338 | time: 0.030s
| SGD | epoch: 053 | loss: 0.03338 - R2: 1.0028 -- iter: 0120/1168
Training Step: 3075  | total loss: 0.03338 | time: 0.034s
| SGD | epoch: 053 | loss: 0.03338 - R2: 1.0022 -- iter: 0140/1168
Training Step: 3076  | total loss: 0.03356 | time: 0.036s
| SGD | epoch: 053 | loss: 0.03356 - R2: 1.0022 -- iter: 0160/1168
Training Step: 3077  | total loss: 0.03157 | time: 0.038s
| SGD | epoch: 053 | loss: 0.03157 - R2: 1.0024 -- iter: 0180/1168
Training Step: 3078  | total loss: 0.02714 | time: 0.043s
| SGD | epoch: 053 | loss: 0.02714 - R2: 1.0027 -- iter: 0200/1168

Training Step: 3130  | total loss: 0.02343 | time: 0.051s
| SGD | epoch: 054 | loss: 0.02343 - R2: 0.9988 -- iter: 0060/1168
Training Step: 3131  | total loss: 0.02285 | time: 0.060s
| SGD | epoch: 054 | loss: 0.02285 - R2: 0.9980 -- iter: 0080/1168
Training Step: 3132  | total loss: 0.02396 | time: 0.063s
| SGD | epoch: 054 | loss: 0.02396 - R2: 0.9983 -- iter: 0100/1168
Training Step: 3133  | total loss: 0.02303 | time: 0.066s
| SGD | epoch: 054 | loss: 0.02303 - R2: 0.9983 -- iter: 0120/1168
Training Step: 3134  | total loss: 0.02160 | time: 0.069s
| SGD | epoch: 054 | loss: 0.02160 - R2: 0.9983 -- iter: 0140/1168
Training Step: 3135  | total loss: 0.02160 | time: 0.071s
| SGD | epoch: 054 | loss: 0.02160 - R2: 0.9986 -- iter: 0160/1168
Training Step: 3136  | total loss: 0.02074 | time: 0.073s
| SGD | epoch: 054 | loss: 0.02074 - R2: 0.9984 -- iter: 0180/1168
Traini

Training Step: 3188  | total loss: [1m[32m0.01887[0m[0m | time: 0.041s
| SGD | epoch: 055 | loss: 0.01887 - R2: 1.0003 -- iter: 0040/1168
Training Step: 3189  | total loss: [1m[32m0.01887[0m[0m | time: 0.045s
| SGD | epoch: 055 | loss: 0.01887 - R2: 1.0003 -- iter: 0060/1168
Training Step: 3190  | total loss: [1m[32m0.02146[0m[0m | time: 0.047s
| SGD | epoch: 055 | loss: 0.02146 - R2: 0.9992 -- iter: 0080/1168
Training Step: 3191  | total loss: [1m[32m0.02356[0m[0m | time: 0.050s
| SGD | epoch: 055 | loss: 0.02356 - R2: 0.9982 -- iter: 0100/1168
Training Step: 3192  | total loss: [1m[32m0.02140[0m[0m | time: 0.053s
| SGD | epoch: 055 | loss: 0.02140 - R2: 0.9981 -- iter: 0120/1168
Training Step: 3193  | total loss: [1m[32m0.02077[0m[0m | time: 0.057s
| SGD | epoch: 055 | loss: 0.02077 - R2: 0.9993 -- iter: 0140/1168
Training Step: 3194  | total loss: [1m[32m0.02029[0m[0m | time: 0.060s
| SGD | epoch: 055 | loss: 0.02029 - R2: 0.9992 -- iter: 0160/1168
Traini

Training Step: 3246  | total loss: [1m[32m0.02127[0m[0m | time: 0.084s
| SGD | epoch: 056 | loss: 0.02127 - R2: 0.9995 -- iter: 0020/1168
Training Step: 3247  | total loss: [1m[32m0.02042[0m[0m | time: 0.087s
| SGD | epoch: 056 | loss: 0.02042 - R2: 0.9995 -- iter: 0040/1168
Training Step: 3248  | total loss: [1m[32m0.02050[0m[0m | time: 0.090s
| SGD | epoch: 056 | loss: 0.02050 - R2: 0.9993 -- iter: 0060/1168
Training Step: 3249  | total loss: [1m[32m0.02050[0m[0m | time: 0.092s
| SGD | epoch: 056 | loss: 0.02050 - R2: 0.9993 -- iter: 0080/1168
Training Step: 3250  | total loss: [1m[32m0.01791[0m[0m | time: 0.095s
| SGD | epoch: 056 | loss: 0.01791 - R2: 0.9995 -- iter: 0100/1168
Training Step: 3251  | total loss: [1m[32m0.02215[0m[0m | time: 0.098s
| SGD | epoch: 056 | loss: 0.02215 - R2: 0.9991 -- iter: 0120/1168
Training Step: 3252  | total loss: [1m[32m0.02215[0m[0m | time: 0.100s
| SGD | epoch: 056 | loss: 0.02215 - R2: 0.9991 -- iter: 0140/1168
Traini

Training Step: 3304  | total loss: 0.02819 | time: 1.301s
| SGD | epoch: 056 | loss: 0.02819 - R2: 1.0008 | val_loss: 0.02285 - val_acc: 0.9984 -- iter: 1168/1168
--
Training Step: 3305  | total loss: 0.02819 | time: 0.038s
| SGD | epoch: 057 | loss: 0.02819 - R2: 1.0008 -- iter: 0020/1168
Training Step: 3306  | total loss: 0.02769 | time: 0.042s
| SGD | epoch: 057 | loss: 0.02769 - R2: 0.9994 -- iter: 0040/1168
Training Step: 3307  | total loss: 0.02414 | time: 0.047s
| SGD | epoch: 057 | loss: 0.02414 - R2: 0.9984 -- iter: 0060/1168
Training Step: 3308  | total loss: 0.02414 | time: 0.057s
| SGD | epoch: 057 | loss: 0.02414 - R2: 0.9984 -- iter: 0080/1168
Training Step: 3309  | total loss: 0.02300 | time: 0.061s
| SGD | epoch: 057 | loss: 0.02300 - R2: 0.9990 -- iter: 0100/1168
Training Step: 3310  | total loss: 0.02123 | time: 0.063s
| SGD | epoch: 057 | loss: 0.021

Training Step: 3362  | total loss: 0.02127 | time: 0.204s
| SGD | epoch: 057 | loss: 0.02127 - R2: 1.0012 -- iter: 1160/1168
Training Step: 3363  | total loss: 0.02045 | time: 1.210s
| SGD | epoch: 057 | loss: 0.02045 - R2: 1.0007 | val_loss: 0.02305 - val_acc: 0.9980 -- iter: 1168/1168
--
Training Step: 3364  | total loss: 0.01909 | time: 0.011s
| SGD | epoch: 058 | loss: 0.01909 - R2: 1.0005 -- iter: 0020/1168
Training Step: 3365  | total loss: 0.02044 | time: 0.015s
| SGD | epoch: 058 | loss: 0.02044 - R2: 0.9991 -- iter: 0040/1168
Training Step: 3366  | total loss: 0.02001 | time: 0.020s
| SGD | epoch: 058 | loss: 0.02001 - R2: 0.9982 -- iter: 0060/1168
Training Step: 3367  | total loss: 0.02001 | time: 0.023s
| SGD | epoch: 058 | loss: 0.02001 - R2: 0.9982 -- iter: 0080/1168
Training Step: 3368  | total loss: 0.01912 | time: 0.026s
| SGD | epoch: 058 | loss: 0.019

Training Step: 3420  | total loss: 0.01793 | time: 0.151s
| SGD | epoch: 058 | loss: 0.01793 - R2: 0.9995 -- iter: 1140/1168
Training Step: 3421  | total loss: 0.01793 | time: 0.157s
| SGD | epoch: 058 | loss: 0.01793 - R2: 0.9995 -- iter: 1160/1168
Training Step: 3422  | total loss: 0.01795 | time: 1.163s
| SGD | epoch: 058 | loss: 0.01795 - R2: 0.9999 | val_loss: 0.02309 - val_acc: 0.9983 -- iter: 1168/1168
--
Training Step: 3423  | total loss: 0.01782 | time: 0.036s
| SGD | epoch: 059 | loss: 0.01782 - R2: 1.0004 -- iter: 0020/1168
Training Step: 3424  | total loss: 0.01838 | time: 0.041s
| SGD | epoch: 059 | loss: 0.01838 - R2: 0.9997 -- iter: 0040/1168
Training Step: 3425  | total loss: 0.01730 | time: 0.044s
| SGD | epoch: 059 | loss: 0.01730 - R2: 0.9995 -- iter: 0060/1168
Training Step: 3426  | total loss: 0.01730 | time: 0.047s
| SGD | epoch: 059 | loss: 0.017

Training Step: 3478  | total loss: 0.02015 | time: 0.243s
| SGD | epoch: 059 | loss: 0.02015 - R2: 0.9996 -- iter: 1120/1168
Training Step: 3479  | total loss: 0.02118 | time: 0.247s
| SGD | epoch: 059 | loss: 0.02118 - R2: 0.9999 -- iter: 1140/1168
Training Step: 3480  | total loss: 0.01974 | time: 0.250s
| SGD | epoch: 059 | loss: 0.01974 - R2: 0.9993 -- iter: 1160/1168
Training Step: 3481  | total loss: 0.01974 | time: 1.256s
| SGD | epoch: 059 | loss: 0.01974 - R2: 0.9991 | val_loss: 0.02318 - val_acc: 0.9983 -- iter: 1168/1168
--
Training Step: 3482  | total loss: 0.01961 | time: 0.079s
| SGD | epoch: 060 | loss: 0.01961 - R2: 0.9994 -- iter: 0020/1168
Training Step: 3483  | total loss: 0.01953 | time: 0.082s
| SGD | epoch: 060 | loss: 0.01953 - R2: 0.9988 -- iter: 0040/1168
Training Step: 3484  | total loss: 0.01885 | time: 0.086s
| SGD | epoch: 060 | loss: 0.018

Training Step: 3536  | total loss: 0.02864 | time: 0.267s
| SGD | epoch: 060 | loss: 0.02864 - R2: 1.0028 -- iter: 1100/1168
Training Step: 3537  | total loss: 0.02864 | time: 0.270s
| SGD | epoch: 060 | loss: 0.02864 - R2: 1.0028 -- iter: 1120/1168
Training Step: 3538  | total loss: 0.02760 | time: 0.273s
| SGD | epoch: 060 | loss: 0.02760 - R2: 1.0022 -- iter: 1140/1168
Training Step: 3539  | total loss: 0.02555 | time: 0.277s
| SGD | epoch: 060 | loss: 0.02555 - R2: 1.0012 -- iter: 1160/1168
Training Step: 3540  | total loss: 0.02400 | time: 1.289s
| SGD | epoch: 060 | loss: 0.02400 - R2: 1.0009 | val_loss: 0.02328 - val_acc: 0.9979 -- iter: 1168/1168
--
Training Step: 3541  | total loss: 0.02400 | time: 0.033s
| SGD | epoch: 061 | loss: 0.02400 - R2: 1.0006 -- iter: 0020/1168
Training Step: 3542  | total loss: 0.02440 | time: 0.039s
| SGD | epoch: 061 | loss: 0.024

Training Step: 3594  | total loss: 0.02960 | time: 0.223s
| SGD | epoch: 061 | loss: 0.02960 - R2: 0.9998 -- iter: 1080/1168
Training Step: 3595  | total loss: 0.02759 | time: 0.227s
| SGD | epoch: 061 | loss: 0.02759 - R2: 0.9991 -- iter: 1100/1168
Training Step: 3596  | total loss: 0.02759 | time: 0.229s
| SGD | epoch: 061 | loss: 0.02759 - R2: 0.9987 -- iter: 1120/1168
Training Step: 3597  | total loss: 0.02602 | time: 0.232s
| SGD | epoch: 061 | loss: 0.02602 - R2: 0.9998 -- iter: 1140/1168
Training Step: 3598  | total loss: 0.02487 | time: 0.236s
| SGD | epoch: 061 | loss: 0.02487 - R2: 0.9996 -- iter: 1160/1168
Training Step: 3599  | total loss: 0.02367 | time: 1.243s
| SGD | epoch: 061 | loss: 0.02367 - R2: 0.9989 | val_loss: 0.02328 - val_acc: 0.9985 -- iter: 1168/1168
--
Training Step: 3600  | total loss: 0.02367 | time: 0.052s
| SGD | epoch: 062 | loss: 0.023

Training Step: 3652  | total loss: 0.02445 | time: 0.264s
| SGD | epoch: 062 | loss: 0.02445 - R2: 1.0004 -- iter: 1060/1168
Training Step: 3653  | total loss: 0.02404 | time: 0.266s
| SGD | epoch: 062 | loss: 0.02404 - R2: 1.0003 -- iter: 1080/1168
Training Step: 3654  | total loss: 0.02348 | time: 0.276s
| SGD | epoch: 062 | loss: 0.02348 - R2: 1.0009 -- iter: 1100/1168
Training Step: 3655  | total loss: 0.02311 | time: 0.279s
| SGD | epoch: 062 | loss: 0.02311 - R2: 1.0008 -- iter: 1120/1168
Training Step: 3656  | total loss: 0.02166 | time: 0.283s
| SGD | epoch: 062 | loss: 0.02166 - R2: 0.9999 -- iter: 1140/1168
Training Step: 3657  | total loss: 0.02166 | time: 0.287s
| SGD | epoch: 062 | loss: 0.02166 - R2: 0.9998 -- iter: 1160/1168
Training Step: 3658  | total loss: 0.02054 | time: 1.293s
| SGD | epoch: 062 | loss: 0.02054 - R2: 1.0003 | val_loss: 0.02334 - val

Training Step: 3710  | total loss: 0.02395 | time: 0.196s
| SGD | epoch: 063 | loss: 0.02395 - R2: 1.0013 -- iter: 1040/1168
Training Step: 3711  | total loss: 0.02245 | time: 0.200s
| SGD | epoch: 063 | loss: 0.02245 - R2: 1.0014 -- iter: 1060/1168
Training Step: 3712  | total loss: 0.02142 | time: 0.202s
| SGD | epoch: 063 | loss: 0.02142 - R2: 1.0014 -- iter: 1080/1168
Training Step: 3713  | total loss: 0.02015 | time: 0.203s
| SGD | epoch: 063 | loss: 0.02015 - R2: 1.0009 -- iter: 1100/1168
Training Step: 3714  | total loss: 0.02146 | time: 0.207s
| SGD | epoch: 063 | loss: 0.02146 - R2: 1.0009 -- iter: 1120/1168
Training Step: 3715  | total loss: 0.02263 | time: 0.221s
| SGD | epoch: 063 | loss: 0.02263 - R2: 1.0002 -- iter: 1140/1168
Training Step: 3716  | total loss: 0.02263 | time: 0.223s
| SGD | epoch: 063 | loss: 0.02263 - R2: 1.0002 -- iter: 1160/1168

Training Step: 3768  | total loss: 0.02643 | time: 0.234s
| SGD | epoch: 064 | loss: 0.02643 - R2: 1.0026 -- iter: 1020/1168
Training Step: 3769  | total loss: 0.02586 | time: 0.245s
| SGD | epoch: 064 | loss: 0.02586 - R2: 1.0021 -- iter: 1040/1168
Training Step: 3770  | total loss: 0.02468 | time: 0.249s
| SGD | epoch: 064 | loss: 0.02468 - R2: 1.0013 -- iter: 1060/1168
Training Step: 3771  | total loss: 0.02612 | time: 0.253s
| SGD | epoch: 064 | loss: 0.02612 - R2: 1.0014 -- iter: 1080/1168
Training Step: 3772  | total loss: 0.02612 | time: 0.257s
| SGD | epoch: 064 | loss: 0.02612 - R2: 1.0014 -- iter: 1100/1168
Training Step: 3773  | total loss: 0.02547 | time: 0.261s
| SGD | epoch: 064 | loss: 0.02547 - R2: 1.0014 -- iter: 1120/1168
Training Step: 3774  | total loss: 0.02449 | time: 0.265s
| SGD | epoch: 064 | loss: 0.02449 - R2: 1.0014 -- iter: 1140/1168

Training Step: 3826  | total loss: 0.01632 | time: 0.178s
| SGD | epoch: 065 | loss: 0.01632 - R2: 0.9973 -- iter: 1000/1168
Training Step: 3827  | total loss: 0.01523 | time: 0.183s
| SGD | epoch: 065 | loss: 0.01523 - R2: 0.9971 -- iter: 1020/1168
Training Step: 3828  | total loss: 0.01523 | time: 0.187s
| SGD | epoch: 065 | loss: 0.01523 - R2: 0.9971 -- iter: 1040/1168
Training Step: 3829  | total loss: 0.01487 | time: 0.191s
| SGD | epoch: 065 | loss: 0.01487 - R2: 0.9970 -- iter: 1060/1168
Training Step: 3830  | total loss: 0.01539 | time: 0.197s
| SGD | epoch: 065 | loss: 0.01539 - R2: 0.9972 -- iter: 1080/1168
Training Step: 3831  | total loss: 0.01457 | time: 0.200s
| SGD | epoch: 065 | loss: 0.01457 - R2: 0.9974 -- iter: 1100/1168
Training Step: 3832  | total loss: 0.01457 | time: 0.205s
| SGD | epoch: 065 | loss: 0.01457 - R2: 0.9974 -- iter: 1120/1168

Training Step: 3884  | total loss: 0.01462 | time: 0.201s
| SGD | epoch: 066 | loss: 0.01462 - R2: 1.0005 -- iter: 0980/1168
Training Step: 3885  | total loss: 0.01491 | time: 0.205s
| SGD | epoch: 066 | loss: 0.01491 - R2: 0.9998 -- iter: 1000/1168
Training Step: 3886  | total loss: 0.01477 | time: 0.209s
| SGD | epoch: 066 | loss: 0.01477 - R2: 0.9996 -- iter: 1020/1168
Training Step: 3887  | total loss: 0.01477 | time: 0.214s
| SGD | epoch: 066 | loss: 0.01477 - R2: 0.9996 -- iter: 1040/1168
Training Step: 3888  | total loss: 0.01585 | time: 0.216s
| SGD | epoch: 066 | loss: 0.01585 - R2: 0.9990 -- iter: 1060/1168
Training Step: 3889  | total loss: 0.01530 | time: 0.220s
| SGD | epoch: 066 | loss: 0.01530 - R2: 0.9987 -- iter: 1080/1168
Training Step: 3890  | total loss: 0.01572 | time: 0.224s
| SGD | epoch: 066 | loss: 0.01572 - R2: 0.9993 -- iter: 1100/1168

Training Step: 3942  | total loss: 0.02890 | time: 0.176s
| SGD | epoch: 067 | loss: 0.02890 - R2: 1.0027 -- iter: 0960/1168
Training Step: 3943  | total loss: 0.02630 | time: 0.179s
| SGD | epoch: 067 | loss: 0.02630 - R2: 1.0027 -- iter: 0980/1168
Training Step: 3944  | total loss: 0.02630 | time: 0.182s
| SGD | epoch: 067 | loss: 0.02630 - R2: 1.0020 -- iter: 1000/1168
Training Step: 3945  | total loss: 0.02389 | time: 0.184s
| SGD | epoch: 067 | loss: 0.02389 - R2: 1.0021 -- iter: 1020/1168
Training Step: 3946  | total loss: 0.02385 | time: 0.186s
| SGD | epoch: 067 | loss: 0.02385 - R2: 1.0026 -- iter: 1040/1168
Training Step: 3947  | total loss: 0.03296 | time: 0.188s
| SGD | epoch: 067 | loss: 0.03296 - R2: 1.0032 -- iter: 1060/1168
Training Step: 3948  | total loss: 0.03216 | time: 0.191s
| SGD | epoch: 067 | loss: 0.03216 - R2: 1.0026 -- iter: 1080/1168

Training Step: 4000  | total loss: 0.01889 | time: 0.208s
| SGD | epoch: 068 | loss: 0.01889 - R2: 0.9992 -- iter: 0940/1168
Training Step: 4001  | total loss: 0.01933 | time: 0.211s
| SGD | epoch: 068 | loss: 0.01933 - R2: 0.9994 -- iter: 0960/1168
Training Step: 4002  | total loss: 0.01849 | time: 0.214s
| SGD | epoch: 068 | loss: 0.01849 - R2: 0.9984 -- iter: 0980/1168
Training Step: 4003  | total loss: 0.01909 | time: 0.218s
| SGD | epoch: 068 | loss: 0.01909 - R2: 0.9979 -- iter: 1000/1168
Training Step: 4004  | total loss: 0.01849 | time: 0.223s
| SGD | epoch: 068 | loss: 0.01849 - R2: 0.9981 -- iter: 1020/1168
Training Step: 4005  | total loss: 0.01849 | time: 0.229s
| SGD | epoch: 068 | loss: 0.01849 - R2: 0.9983 -- iter: 1040/1168
Training Step: 4006  | total loss: 0.01798 | time: 0.233s
| SGD | epoch: 068 | loss: 0.01798 - R2: 0.9988 -- iter: 1060/1168

Training Step: 4058  | total loss: 0.01875 | time: 0.245s
| SGD | epoch: 069 | loss: 0.01875 - R2: 1.0001 -- iter: 0920/1168
Training Step: 4059  | total loss: 0.01848 | time: 0.248s
| SGD | epoch: 069 | loss: 0.01848 - R2: 1.0001 -- iter: 0940/1168
Training Step: 4060  | total loss: 0.01803 | time: 0.250s
| SGD | epoch: 069 | loss: 0.01803 - R2: 0.9993 -- iter: 0960/1168
Training Step: 4061  | total loss: 0.01749 | time: 0.252s
| SGD | epoch: 069 | loss: 0.01749 - R2: 1.0000 -- iter: 0980/1168
Training Step: 4062  | total loss: 0.01674 | time: 0.254s
| SGD | epoch: 069 | loss: 0.01674 - R2: 1.0000 -- iter: 1000/1168
Training Step: 4063  | total loss: 0.01681 | time: 0.267s
| SGD | epoch: 069 | loss: 0.01681 - R2: 1.0000 -- iter: 1020/1168
Training Step: 4064  | total loss: 0.01681 | time: 0.270s
| SGD | epoch: 069 | loss: 0.01681 - R2: 1.0000 -- iter: 1040/1168

Training Step: 4116  | total loss: 0.02446 | time: 0.187s
| SGD | epoch: 070 | loss: 0.02446 - R2: 1.0007 -- iter: 0900/1168
Training Step: 4117  | total loss: 0.02354 | time: 0.192s
| SGD | epoch: 070 | loss: 0.02354 - R2: 1.0009 -- iter: 0920/1168
Training Step: 4118  | total loss: 0.03442 | time: 0.195s
| SGD | epoch: 070 | loss: 0.03442 - R2: 1.0021 -- iter: 0940/1168
Training Step: 4119  | total loss: 0.03224 | time: 0.198s
| SGD | epoch: 070 | loss: 0.03224 - R2: 1.0026 -- iter: 0960/1168
Training Step: 4120  | total loss: 0.03222 | time: 0.201s
| SGD | epoch: 070 | loss: 0.03222 - R2: 1.0029 -- iter: 0980/1168
Training Step: 4121  | total loss: 0.03042 | time: 0.205s
| SGD | epoch: 070 | loss: 0.03042 - R2: 1.0028 -- iter: 1000/1168
Training Step: 4122  | total loss: 0.03102 | time: 0.208s
| SGD | epoch: 070 | loss: 0.03102 - R2: 1.0018 -- iter: 1020/1168

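Several `R2` values in the log above exceed 1.0 (e.g. `R2: 1.0028` in epoch 053), which the usual coefficient of determination, `1 - SS_res / SS_tot`, can never do. If I read the tflearn source correctly, its `R2` metric is instead `sum(pred**2) / sum(target**2)`, so a value close to 1 mostly says the predictions have about the right overall magnitude. The small numpy sketch below compares the two definitions on made-up numbers, assuming that reading of tflearn; `tflearn_style_r2` and `coefficient_of_determination` are hypothetical helper names written for this illustration, not tflearn functions.

```python
import numpy as np

def tflearn_style_r2(y_pred, y_true):
    # Assumed tflearn-style metric: summed squared predictions over summed squared targets.
    # It is not bounded by 1: predictions that are slightly too large push it above 1.
    return np.sum(np.square(y_pred)) / np.sum(np.square(y_true))

def coefficient_of_determination(y_pred, y_true):
    # Standard R^2 = 1 - SS_res / SS_tot; its maximum is 1 (perfect predictions).
    ss_res = np.sum(np.square(y_true - y_pred))
    ss_tot = np.sum(np.square(y_true - np.mean(y_true)))
    return 1.0 - ss_res / ss_tot

# Made-up targets on a log(SalePrice)-like scale, with every prediction 1% too high.
y_true = np.array([11.5, 12.0, 12.3, 11.8, 12.6])
y_pred = y_true * 1.01

print(round(tflearn_style_r2(y_pred, y_true), 4))              # 1.0201
print(round(coefficient_of_determination(y_pred, y_true), 4))  # 0.9009
```

With every prediction 1% too high, the ratio goes above 1 while the standard R² drops to about 0.90, so the `R2` figures in these logs should not be read as a coefficient of determination. For that, sklearn's `r2_score` (imported at the top of the notebook) on held-out data is the safer number.
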
Training Step: 42  | total loss: 104.49445 | time: 0.156s
| SGD | epoch: 001 | loss: 104.49445 - R2: 0.0232 -- iter: 0840/1168
Training Step: 43  | total loss: 104.49445 | time: 0.158s
| SGD | epoch: 001 | loss: 104.49445 - R2: 0.0232 -- iter: 0860/1168
Training Step: 44  | total loss: 103.89248 | time: 0.159s
| SGD | epoch: 001 | loss: 103.89248 - R2: 0.0242 -- iter: 0880/1168
Training Step: 45  | total loss: 102.36757 | time: 0.165s
| SGD | epoch: 001 | loss: 102.36757 - R2: 0.0268 -- iter: 0900/1168
Training Step: 46  | total loss: 100.31351 | time: 0.168s
| SGD | epoch: 001 | loss: 100.31351 - R2: 0.0281 -- iter: 0920/1168
Training Step: 47  | total loss: 99.74792 | time: 0.170s
| SGD | epoch: 001 | loss: 99.74792 - R2: 0.0293 -- iter: 0940/1168
Training Step: 48  | total loss: 98.42319 | time: 0.173s
| SGD | epoch: 001 | loss: 98.42319 - R2: 0.0307 -- iter: 0960/1

Training Step: 100  | total loss: 47.93410 | time: 0.208s
| SGD | epoch: 002 | loss: 47.93410 - R2: 0.1820 -- iter: 0820/1168
Training Step: 101  | total loss: 46.59803 | time: 0.211s
| SGD | epoch: 002 | loss: 46.59803 - R2: 0.1947 -- iter: 0840/1168
Training Step: 102  | total loss: 46.59803 | time: 0.213s
| SGD | epoch: 002 | loss: 46.59803 - R2: 0.1947 -- iter: 0860/1168
Training Step: 103  | total loss: 45.53341 | time: 0.214s
| SGD | epoch: 002 | loss: 45.53341 - R2: 0.2079 -- iter: 0880/1168
Training Step: 104  | total loss: 44.33103 | time: 0.216s
| SGD | epoch: 002 | loss: 44.33103 - R2: 0.2157 -- iter: 0900/1168
Training Step: 105  | total loss: 42.97727 | time: 0.219s
| SGD | epoch: 002 | loss: 42.97727 - R2: 0.2157 -- iter: 0920/1168
Training Step: 106  | total loss: 41.66172 | time: 0.220s
| SGD | epoch: 002 | loss: 41.66172 - R2: 0.2235 -- iter: 0940/1168

Training Step: 158  | total loss: 1.00316 | time: 0.149s
| SGD | epoch: 003 | loss: 1.00316 - R2: 0.9488 -- iter: 0800/1168
Training Step: 159  | total loss: 0.90455 | time: 0.151s
| SGD | epoch: 003 | loss: 0.90455 - R2: 0.9541 -- iter: 0820/1168
Training Step: 160  | total loss: 0.82302 | time: 0.159s
| SGD | epoch: 003 | loss: 0.82302 - R2: 0.9571 -- iter: 0840/1168
Training Step: 161  | total loss: 0.74466 | time: 0.161s
| SGD | epoch: 003 | loss: 0.74466 - R2: 0.9606 -- iter: 0860/1168
Training Step: 162  | total loss: 0.67400 | time: 0.165s
| SGD | epoch: 003 | loss: 0.67400 - R2: 0.9652 -- iter: 0880/1168
Training Step: 163  | total loss: 0.61121 | time: 0.169s
| SGD | epoch: 003 | loss: 0.61121 - R2: 0.9676 -- iter: 0900/1168
Training Step: 164  | total loss: 0.55526 | time: 0.172s
| SGD | epoch: 003 | loss: 0.55526 - R2: 0.9722 -- iter: 0920/1168

Training Step: 216  | total loss: 0.01856 | time: 0.123s
| SGD | epoch: 004 | loss: 0.01856 - R2: 0.9990 -- iter: 0780/1168
Training Step: 217  | total loss: 0.01856 | time: 0.125s
| SGD | epoch: 004 | loss: 0.01856 - R2: 0.9990 -- iter: 0800/1168
Training Step: 218  | total loss: 0.02089 | time: 0.130s
| SGD | epoch: 004 | loss: 0.02089 - R2: 0.9993 -- iter: 0820/1168
Training Step: 219  | total loss: 0.02096 | time: 0.132s
| SGD | epoch: 004 | loss: 0.02096 - R2: 0.9997 -- iter: 0840/1168
Training Step: 220  | total loss: 0.02022 | time: 0.154s
| SGD | epoch: 004 | loss: 0.02022 - R2: 0.9994 -- iter: 0860/1168
Training Step: 221  | total loss: 0.01922 | time: 0.156s
| SGD | epoch: 004 | loss: 0.01922 - R2: 0.9994 -- iter: 0880/1168
Training Step: 222  | total loss: 0.01878 | time: 0.159s
| SGD | epoch: 004 | loss: 0.01878 - R2: 1.0003 -- iter: 0900/1168

Training Step: 274  | total loss: 0.03065 | time: 0.241s
| SGD | epoch: 005 | loss: 0.03065 - R2: 0.9990 -- iter: 0760/1168
Training Step: 275  | total loss: 0.02683 | time: 0.244s
| SGD | epoch: 005 | loss: 0.02683 - R2: 0.9995 -- iter: 0780/1168
Training Step: 276  | total loss: 0.02595 | time: 0.248s
| SGD | epoch: 005 | loss: 0.02595 - R2: 0.9996 -- iter: 0800/1168
Training Step: 277  | total loss: 0.02595 | time: 0.249s
| SGD | epoch: 005 | loss: 0.02595 - R2: 0.9996 -- iter: 0820/1168
Training Step: 278  | total loss: 0.02505 | time: 0.252s
| SGD | epoch: 005 | loss: 0.02505 - R2: 1.0001 -- iter: 0840/1168
Training Step: 279  | total loss: 0.02416 | time: 0.255s
| SGD | epoch: 005 | loss: 0.02416 - R2: 1.0016 -- iter: 0860/1168
Training Step: 280  | total loss: 0.02469 | time: 0.258s
| SGD | epoch: 005 | loss: 0.02469 - R2: 1.0007 -- iter: 0880/1168

Training Step: 332  | total loss: 0.01677 | time: 0.245s
| SGD | epoch: 006 | loss: 0.01677 - R2: 0.9996 -- iter: 0740/1168
Training Step: 333  | total loss: 0.01593 | time: 0.249s
| SGD | epoch: 006 | loss: 0.01593 - R2: 0.9996 -- iter: 0760/1168
Training Step: 334  | total loss: 0.01586 | time: 0.254s
| SGD | epoch: 006 | loss: 0.01586 - R2: 0.9992 -- iter: 0780/1168
Training Step: 335  | total loss: 0.01638 | time: 0.258s
| SGD | epoch: 006 | loss: 0.01638 - R2: 1.0002 -- iter: 0800/1168
Training Step: 336  | total loss: 0.01609 | time: 0.262s
| SGD | epoch: 006 | loss: 0.01609 - R2: 1.0007 -- iter: 0820/1168
Training Step: 337  | total loss: 0.01534 | time: 0.266s
| SGD | epoch: 006 | loss: 0.01534 - R2: 1.0006 -- iter: 0840/1168
Training Step: 338  | total loss: 0.01534 | time: 0.270s
| SGD | epoch: 006 | loss: 0.01534 - R2: 1.0006 -- iter: 0860/1168

Training Step: 390  | total loss: 0.02396 | time: 0.127s
| SGD | epoch: 007 | loss: 0.02396 - R2: 0.9985 -- iter: 0720/1168
Training Step: 391  | total loss: 0.02299 | time: 0.135s
| SGD | epoch: 007 | loss: 0.02299 - R2: 0.9989 -- iter: 0740/1168
Training Step: 392  | total loss: 0.02159 | time: 0.142s
| SGD | epoch: 007 | loss: 0.02159 - R2: 0.9991 -- iter: 0760/1168
Training Step: 393  | total loss: 0.03084 | time: 0.145s
| SGD | epoch: 007 | loss: 0.03084 - R2: 1.0002 -- iter: 0780/1168
Training Step: 394  | total loss: 0.03171 | time: 0.152s
| SGD | epoch: 007 | loss: 0.03171 - R2: 0.9987 -- iter: 0800/1168
Training Step: 395  | total loss: 0.03013 | time: 0.161s
| SGD | epoch: 007 | loss: 0.03013 - R2: 0.9985 -- iter: 0820/1168
Training Step: 396  | total loss: 0.03017 | time: 0.164s
| SGD | epoch: 007 | loss: 0.03017 - R2: 0.9994 -- iter: 0840/1168

Training Step: 448  | total loss: 0.04262 | time: 0.219s
| SGD | epoch: 008 | loss: 0.04262 - R2: 0.9969 -- iter: 0700/1168
Training Step: 449  | total loss: 0.04065 | time: 0.220s
| SGD | epoch: 008 | loss: 0.04065 - R2: 0.9982 -- iter: 0720/1168
Training Step: 450  | total loss: 0.03745 | time: 0.227s
| SGD | epoch: 008 | loss: 0.03745 - R2: 0.9983 -- iter: 0740/1168
Training Step: 451  | total loss: 0.03657 | time: 0.231s
| SGD | epoch: 008 | loss: 0.03657 - R2: 0.9978 -- iter: 0760/1168
Training Step: 452  | total loss: 0.03418 | time: 0.234s
| SGD | epoch: 008 | loss: 0.03418 - R2: 0.9985 -- iter: 0780/1168
Training Step: 453  | total loss: 0.03222 | time: 0.253s
| SGD | epoch: 008 | loss: 0.03222 - R2: 0.9994 -- iter: 0800/1168
Training Step: 454  | total loss: 0.03069 | time: 0.256s
| SGD | epoch: 008 | loss: 0.03069 - R2: 0.9997 -- iter: 0820/1168

Training Step: 506  | total loss: 0.03153 | time: 0.135s
| SGD | epoch: 009 | loss: 0.03153 - R2: 0.9991 -- iter: 0680/1168
Training Step: 507  | total loss: 0.02967 | time: 0.138s
| SGD | epoch: 009 | loss: 0.02967 - R2: 0.9998 -- iter: 0700/1168
Training Step: 508  | total loss: 0.02901 | time: 0.139s
| SGD | epoch: 009 | loss: 0.02901 - R2: 0.9988 -- iter: 0720/1168
Training Step: 509  | total loss: 0.02801 | time: 0.142s
| SGD | epoch: 009 | loss: 0.02801 - R2: 0.9991 -- iter: 0740/1168
Training Step: 510  | total loss: 0.03401 | time: 0.144s
| SGD | epoch: 009 | loss: 0.03401 - R2: 0.9997 -- iter: 0760/1168
Training Step: 511  | total loss: 0.03401 | time: 0.145s
| SGD | epoch: 009 | loss: 0.03401 - R2: 0.9997 -- iter: 0780/1168
Training Step: 512  | total loss: 0.02988 | time: 0.147s
| SGD | epoch: 009 | loss: 0.02988 - R2: 0.9996 -- iter: 0800/1168

Training Step: 564  | total loss: 0.01579 | time: 0.153s
| SGD | epoch: 010 | loss: 0.01579 - R2: 1.0002 -- iter: 0660/1168
Training Step: 565  | total loss: 0.01579 | time: 0.155s
| SGD | epoch: 010 | loss: 0.01579 - R2: 1.0002 -- iter: 0680/1168
Training Step: 566  | total loss: 0.03007 | time: 0.157s
| SGD | epoch: 010 | loss: 0.03007 - R2: 1.0011 -- iter: 0700/1168
Training Step: 567  | total loss: 0.03007 | time: 0.159s
| SGD | epoch: 010 | loss: 0.03007 - R2: 1.0011 -- iter: 0720/1168
Training Step: 568  | total loss: 0.02827 | time: 0.161s
| SGD | epoch: 010 | loss: 0.02827 - R2: 0.9996 -- iter: 0740/1168
Training Step: 569  | total loss: 0.02827 | time: 0.163s
| SGD | epoch: 010 | loss: 0.02827 - R2: 0.9996 -- iter: 0760/1168
Training Step: 570  | total loss: 0.02656 | time: 0.165s
| SGD | epoch: 010 | loss: 0.02656 - R2: 0.9997 -- iter: 0780/1168

Training Step: 622  | total loss: 0.02536 | time: 0.185s
| SGD | epoch: 011 | loss: 0.02536 - R2: 0.9997 -- iter: 0640/1168
Training Step: 623  | total loss: 0.02470 | time: 0.187s
| SGD | epoch: 011 | loss: 0.02470 - R2: 0.9991 -- iter: 0660/1168
Training Step: 624  | total loss: 0.02470 | time: 0.190s
| SGD | epoch: 011 | loss: 0.02470 - R2: 0.9991 -- iter: 0680/1168
Training Step: 625  | total loss: 0.02437 | time: 0.193s
| SGD | epoch: 011 | loss: 0.02437 - R2: 0.9994 -- iter: 0700/1168
Training Step: 626  | total loss: 0.02444 | time: 0.195s
| SGD | epoch: 011 | loss: 0.02444 - R2: 0.9990 -- iter: 0720/1168
Training Step: 627  | total loss: 0.02343 | time: 0.196s
| SGD | epoch: 011 | loss: 0.02343 - R2: 0.9990 -- iter: 0740/1168
Training Step: 628  | total loss: 0.02715 | time: 0.199s
| SGD | epoch: 011 | loss: 0.02715 - R2: 1.0001 -- iter: 0760/1168

Training Step: 680  | total loss: 0.01812 | time: 0.163s
| SGD | epoch: 012 | loss: 0.01812 - R2: 0.9988 -- iter: 0620/1168
Training Step: 681  | total loss: 0.01765 | time: 0.165s
| SGD | epoch: 012 | loss: 0.01765 - R2: 0.9993 -- iter: 0640/1168
Training Step: 682  | total loss: 0.01756 | time: 0.167s
| SGD | epoch: 012 | loss: 0.01756 - R2: 1.0006 -- iter: 0660/1168
Training Step: 683  | total loss: 0.03397 | time: 0.169s
| SGD | epoch: 012 | loss: 0.03397 - R2: 1.0008 -- iter: 0680/1168
Training Step: 684  | total loss: 0.03586 | time: 0.171s
| SGD | epoch: 012 | loss: 0.03586 - R2: 1.0008 -- iter: 0700/1168
Training Step: 685  | total loss: 0.03349 | time: 0.173s
| SGD | epoch: 012 | loss: 0.03349 - R2: 1.0001 -- iter: 0720/1168
Training Step: 686  | total loss: 0.03309 | time: 0.175s
| SGD | epoch: 012 | loss: 0.03309 - R2: 1.0012 -- iter: 0740/1168

Training Step: 738  | total loss: 0.02422 | time: 0.173s
| SGD | epoch: 013 | loss: 0.02422 - R2: 0.9997 -- iter: 0600/1168
Training Step: 739  | total loss: 0.02422 | time: 0.175s
| SGD | epoch: 013 | loss: 0.02422 - R2: 0.9997 -- iter: 0620/1168
Training Step: 740  | total loss: 0.02330 | time: 0.177s
| SGD | epoch: 013 | loss: 0.02330 - R2: 0.9996 -- iter: 0640/1168
Training Step: 741  | total loss: 0.02090 | time: 0.180s
| SGD | epoch: 013 | loss: 0.02090 - R2: 1.0001 -- iter: 0660/1168
Training Step: 742  | total loss: 0.02001 | time: 0.184s
| SGD | epoch: 013 | loss: 0.02001 - R2: 1.0003 -- iter: 0680/1168
Training Step: 743  | total loss: 0.02001 | time: 0.185s
| SGD | epoch: 013 | loss: 0.02001 - R2: 1.0003 -- iter: 0700/1168
Training Step: 744  | total loss: 0.02062 | time: 0.189s
| SGD | epoch: 013 | loss: 0.02062 - R2: 0.9997 -- iter: 0720/1168

Training Step: 796  | total loss: 0.02716 | time: 0.112s
| SGD | epoch: 014 | loss: 0.02716 - R2: 0.9996 -- iter: 0580/1168
Training Step: 797  | total loss: 0.02678 | time: 0.114s
| SGD | epoch: 014 | loss: 0.02678 - R2: 0.9992 -- iter: 0600/1168
Training Step: 798  | total loss: 0.02571 | time: 0.115s
| SGD | epoch: 014 | loss: 0.02571 - R2: 0.9992 -- iter: 0620/1168
Training Step: 799  | total loss: 0.02237 | time: 0.118s
| SGD | epoch: 014 | loss: 0.02237 - R2: 0.9995 -- iter: 0640/1168
Training Step: 800  | total loss: 0.02237 | time: 0.120s
| SGD | epoch: 014 | loss: 0.02237 - R2: 0.9995 -- iter: 0660/1168
Training Step: 801  | total loss: 0.02216 | time: 0.121s
| SGD | epoch: 014 | loss: 0.02216 - R2: 0.9989 -- iter: 0680/1168
Training Step: 802  | total loss: 0.02167 | time: 0.123s
| SGD | epoch: 014 | loss: 0.02167 - R2: 0.9989 -- iter: 0700/1168

Training Step: 854  | total loss: 0.01717 | time: 0.163s
| SGD | epoch: 015 | loss: 0.01717 - R2: 1.0004 -- iter: 0560/1168
Training Step: 855  | total loss: 0.01717 | time: 0.166s
| SGD | epoch: 015 | loss: 0.01717 - R2: 1.0004 -- iter: 0580/1168
Training Step: 856  | total loss: 0.01710 | time: 0.167s
| SGD | epoch: 015 | loss: 0.01710 - R2: 1.0004 -- iter: 0600/1168
Training Step: 857  | total loss: 0.02043 | time: 0.170s
| SGD | epoch: 015 | loss: 0.02043 - R2: 0.9993 -- iter: 0620/1168
Training Step: 858  | total loss: 0.02043 | time: 0.172s
| SGD | epoch: 015 | loss: 0.02043 - R2: 0.9993 -- iter: 0640/1168
Training Step: 859  | total loss: 0.01898 | time: 0.174s
| SGD | epoch: 015 | loss: 0.01898 - R2: 1.0000 -- iter: 0660/1168
Training Step: 860  | total loss: 0.01829 | time: 0.176s
| SGD | epoch: 015 | loss: 0.01829 - R2: 1.0004 -- iter: 0680/1168

Training Step: 912  | total loss: 0.01855 | time: 0.093s
| SGD | epoch: 016 | loss: 0.01855 - R2: 1.0005 -- iter: 0540/1168
Training Step: 913  | total loss: 0.01643 | time: 0.096s
| SGD | epoch: 016 | loss: 0.01643 - R2: 1.0001 -- iter: 0560/1168
Training Step: 914  | total loss: 0.01797 | time: 0.098s
| SGD | epoch: 016 | loss: 0.01797 - R2: 1.0004 -- iter: 0580/1168
Training Step: 915  | total loss: 0.01797 | time: 0.100s
| SGD | epoch: 016 | loss: 0.01797 - R2: 1.0004 -- iter: 0600/1168
Training Step: 916  | total loss: 0.01731 | time: 0.102s
| SGD | epoch: 016 | loss: 0.01731 - R2: 1.0006 -- iter: 0620/1168
Training Step: 917  | total loss: 0.01694 | time: 0.107s
| SGD | epoch: 016 | loss: 0.01694 - R2: 1.0001 -- iter: 0640/1168
Training Step: 918  | total loss: 0.01694 | time: 0.116s
| SGD | epoch: 016 | loss: 0.01694 - R2: 1.0001 -- iter: 0660/1168

Training Step: 970  | total loss: 0.02903 | time: 0.061s
| SGD | epoch: 017 | loss: 0.02903 - R2: 0.9995 -- iter: 0520/1168
Training Step: 971  | total loss: 0.02903 | time: 0.067s
| SGD | epoch: 017 | loss: 0.02903 - R2: 0.9995 -- iter: 0540/1168
Training Step: 972  | total loss: 0.02696 | time: 0.069s
| SGD | epoch: 017 | loss: 0.02696 - R2: 1.0000 -- iter: 0560/1168
Training Step: 973  | total loss: 0.02713 | time: 0.071s
| SGD | epoch: 017 | loss: 0.02713 - R2: 0.9987 -- iter: 0580/1168
Training Step: 974  | total loss: 0.02596 | time: 0.073s
| SGD | epoch: 017 | loss: 0.02596 - R2: 0.9995 -- iter: 0600/1168
Training Step: 975  | total loss: 0.02495 | time: 0.075s
| SGD | epoch: 017 | loss: 0.02495 - R2: 0.9990 -- iter: 0620/1168
Training Step: 976  | total loss: 0.02356 | time: 0.077s
| SGD | epoch: 017 | loss: 0.02356 - R2: 1.0005 -- iter: 0640/1168

Training Step: 1028  | total loss: 0.01807 | time: 0.132s
| SGD | epoch: 018 | loss: 0.01807 - R2: 0.9993 -- iter: 0500/1168
Training Step: 1029  | total loss: 0.01802 | time: 0.133s
| SGD | epoch: 018 | loss: 0.01802 - R2: 1.0002 -- iter: 0520/1168
Training Step: 1030  | total loss: 0.01768 | time: 0.142s
| SGD | epoch: 018 | loss: 0.01768 - R2: 1.0002 -- iter: 0540/1168
Training Step: 1031  | total loss: 0.01761 | time: 0.144s
| SGD | epoch: 018 | loss: 0.01761 - R2: 0.9999 -- iter: 0560/1168
Training Step: 1032  | total loss: 0.01694 | time: 0.147s
| SGD | epoch: 018 | loss: 0.01694 - R2: 1.0009 -- iter: 0580/1168
Training Step: 1033  | total loss: 0.01694 | time: 0.148s
| SGD | epoch: 018 | loss: 0.01694 - R2: 1.0006 -- iter: 0600/1168
Training Step: 1034  | total loss: 0.01597 | time: 0.151s
| SGD | epoch: 018 | loss: 0.01597 - R2: 0.9999 -- iter: 0620/1168

Training Step: 1086  | total loss: 0.02140 | time: 0.094s
| SGD | epoch: 019 | loss: 0.02140 - R2: 1.0014 -- iter: 0480/1168
Training Step: 1087  | total loss: 0.02269 | time: 0.098s
| SGD | epoch: 019 | loss: 0.02269 - R2: 1.0011 -- iter: 0500/1168
Training Step: 1088  | total loss: 0.02269 | time: 0.099s
| SGD | epoch: 019 | loss: 0.02269 - R2: 1.0011 -- iter: 0520/1168
Training Step: 1089  | total loss: 0.02284 | time: 0.101s
| SGD | epoch: 019 | loss: 0.02284 - R2: 1.0007 -- iter: 0540/1168
Training Step: 1090  | total loss: 0.03281 | time: 0.104s
| SGD | epoch: 019 | loss: 0.03281 - R2: 1.0017 -- iter: 0560/1168
Training Step: 1091  | total loss: 0.03061 | time: 0.106s
| SGD | epoch: 019 | loss: 0.03061 - R2: 1.0019 -- iter: 0580/1168
Training Step: 1092  | total loss: 0.03020 | time: 0.107s
| SGD | epoch: 019 | loss: 0.03020 - R2: 1.0010 -- iter: 0600/1168

Training Step: 1144  | total loss: 0.02457 | time: 0.091s
| SGD | epoch: 020 | loss: 0.02457 - R2: 1.0008 -- iter: 0460/1168
Training Step: 1145  | total loss: 0.02348 | time: 0.096s
| SGD | epoch: 020 | loss: 0.02348 - R2: 0.9997 -- iter: 0480/1168
Training Step: 1146  | total loss: 0.02160 | time: 0.099s
| SGD | epoch: 020 | loss: 0.02160 - R2: 0.9995 -- iter: 0500/1168
Training Step: 1147  | total loss: 0.02117 | time: 0.101s
| SGD | epoch: 020 | loss: 0.02117 - R2: 1.0003 -- iter: 0520/1168
Training Step: 1148  | total loss: 0.02163 | time: 0.103s
| SGD | epoch: 020 | loss: 0.02163 - R2: 1.0006 -- iter: 0540/1168
Training Step: 1149  | total loss: 0.02077 | time: 0.105s
| SGD | epoch: 020 | loss: 0.02077 - R2: 1.0006 -- iter: 0560/1168
Training Step: 1150  | total loss: 0.02037 | time: 0.107s
| SGD | epoch: 020 | loss: 0.02037 - R2: 1.0001 -- iter: 0580/1168

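Reading these logs: `Training Step` is a global counter that keeps running across epochs, while `iter` counts the samples seen so far in the current epoch, out of the 1168 training rows. The `iter` increments (20 in the run above, 50 in the run below, 64 after the `Training samples` header) suggest batch sizes of 20, 50 and 64; assuming that, each epoch takes ceil(1168 / batch_size) steps, i.e. 59, 24 and 19. The sketch below (the helper names are hypothetical, not TFLearn functions) maps a global step back to the epoch and `iter` values printed in the log:

```python
import math
from typing import Tuple


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Optimizer steps per epoch; a final partial batch still counts as one step."""
    return math.ceil(n_samples / batch_size)


def locate_step(step: int, n_samples: int, batch_size: int) -> Tuple[int, int]:
    """Map a 1-based global step to (epoch, samples seen in that epoch)."""
    per_epoch = steps_per_epoch(n_samples, batch_size)
    epoch = (step - 1) // per_epoch + 1
    batch_in_epoch = (step - 1) % per_epoch + 1
    # The last batch of an epoch is smaller, so clip the sample count at n_samples.
    seen = min(batch_in_epoch * batch_size, n_samples)
    return epoch, seen


N_TRAIN = 1168  # "Training samples" reported in the log

# (global step, assumed batch size) pairs taken from log lines.
for step, batch in [(742, 20), (24, 50), (312, 50), (58, 64)]:
    epoch, seen = locate_step(step, N_TRAIN, batch)
    print(f"step {step:4d} (batch {batch}): epoch {epoch:03d} -- iter {seen:04d}/{N_TRAIN}")
```

The four printed lines match the log entries `epoch: 013 ... iter: 0680/1168`, `epoch: 001 ... iter: 1168/1168`, `epoch: 013 ... iter: 1168/1168` and `epoch: 004 ... iter: 0064/1168`, which supports the batch-size reading.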
Training Step: 20  | total loss: 126.52018 | time: 0.139s
| SGD | epoch: 001 | loss: 126.52018 - R2: 0.0050 -- iter: 1000/1168
Training Step: 21  | total loss: 125.32157 | time: 0.145s
| SGD | epoch: 001 | loss: 125.32157 - R2: 0.0050 -- iter: 1050/1168
Training Step: 22  | total loss: 122.21944 | time: 0.149s
| SGD | epoch: 001 | loss: 122.21944 - R2: 0.0056 -- iter: 1100/1168
Training Step: 23  | total loss: 121.22050 | time: 0.153s
| SGD | epoch: 001 | loss: 121.22050 - R2: 0.0068 -- iter: 1150/1168
Training Step: 24  | total loss: 119.43989 | time: 1.164s
| SGD | epoch: 001 | loss: 119.43989 - R2: 0.0075 | val_loss: 118.12394 - val_acc: 0.0097 -- iter: 1168/1168
--
Training Step: 25  | total loss: 119.43989 | time: 0.005s
| SGD | epoch: 002 | loss: 119.43989 - R2: 0.0082 -- iter: 0050/1168
Training Step: 26  | total loss: 117.90457 | time: 0.010s
| SGD | epoch: 002

Training Step: 77  | total loss: 73.34504 | time: 0.079s
| SGD | epoch: 004 | loss: 73.34504 - R2: 0.0848 -- iter: 0250/1168
Training Step: 78  | total loss: 71.31438 | time: 0.083s
| SGD | epoch: 004 | loss: 71.31438 - R2: 0.0906 -- iter: 0300/1168
Training Step: 79  | total loss: 70.28136 | time: 0.087s
| SGD | epoch: 004 | loss: 70.28136 - R2: 0.0937 -- iter: 0350/1168
Training Step: 80  | total loss: 69.23603 | time: 0.093s
| SGD | epoch: 004 | loss: 69.23603 - R2: 0.0969 -- iter: 0400/1168
Training Step: 81  | total loss: 68.14595 | time: 0.098s
| SGD | epoch: 004 | loss: 68.14595 - R2: 0.1002 -- iter: 0450/1168
Training Step: 82  | total loss: 67.13463 | time: 0.101s
| SGD | epoch: 004 | loss: 67.13463 - R2: 0.1035 -- iter: 0500/1168
Training Step: 83  | total loss: 67.13463 | time: 0.104s
| SGD | epoch: 004 | loss: 67.13463 - R2: 0.1035 -- iter: 0550/1168

Training Step: 134  | total loss: 10.73291 | time: 0.061s
| SGD | epoch: 006 | loss: 10.73291 - R2: 0.5880 -- iter: 0700/1168
Training Step: 135  | total loss: 9.15298 | time: 0.064s
| SGD | epoch: 006 | loss: 9.15298 - R2: 0.6071 -- iter: 0750/1168
Training Step: 136  | total loss: 9.15298 | time: 0.066s
| SGD | epoch: 006 | loss: 9.15298 - R2: 0.6071 -- iter: 0800/1168
Training Step: 137  | total loss: 7.69646 | time: 0.071s
| SGD | epoch: 006 | loss: 7.69646 - R2: 0.6467 -- iter: 0850/1168
Training Step: 138  | total loss: 6.99029 | time: 0.076s
| SGD | epoch: 006 | loss: 6.99029 - R2: 0.6702 -- iter: 0900/1168
Training Step: 139  | total loss: 6.34728 | time: 0.080s
| SGD | epoch: 006 | loss: 6.34728 - R2: 0.6922 -- iter: 0950/1168
Training Step: 140  | total loss: 5.76537 | time: 0.085s
| SGD | epoch: 006 | loss: 5.76537 - R2: 0.7130 -- iter: 1000/1168

Training Step: 192  | total loss: 0.05623 | time: 1.176s
| SGD | epoch: 008 | loss: 0.05623 - R2: 0.9982 | val_loss: 0.03310 - val_acc: 1.0004 -- iter: 1168/1168
--
Training Step: 193  | total loss: 0.05263 | time: 0.046s
| SGD | epoch: 009 | loss: 0.05263 - R2: 0.9985 -- iter: 0050/1168
Training Step: 194  | total loss: 0.05263 | time: 0.048s
| SGD | epoch: 009 | loss: 0.05263 - R2: 0.9985 -- iter: 0100/1168
Training Step: 195  | total loss: 0.05148 | time: 0.055s
| SGD | epoch: 009 | loss: 0.05148 - R2: 0.9992 -- iter: 0150/1168
Training Step: 196  | total loss: 0.04838 | time: 0.057s
| SGD | epoch: 009 | loss: 0.04838 - R2: 0.9988 -- iter: 0200/1168
Training Step: 197  | total loss: 0.04301 | time: 0.060s
| SGD | epoch: 009 | loss: 0.04301 - R2: 0.9989 -- iter: 0250/1168
Training Step: 198  | total loss: 0.04301 | time: 0.064s
| SGD | epoch: 009 | loss: 0.04301 - R2

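The `R2` column and its validation counterpart `val_acc` regularly go above 1.0 (for example `val_acc: 1.0004` at step 192). The usual coefficient of determination, R² = 1 - SS_res / SS_tot, can never exceed 1, because SS_res is never negative and SS_tot is positive, so the logged value is evidently computed differently and should not be read as a standard R². To check the model against the textbook definition, recompute it on held-out predictions, for instance with scikit-learn's `r2_score`, which the notebook already imports. The numpy sketch below implements the textbook formula; the second function is a hypothetical ratio-of-sums-of-squares score, included only to show how a number that hovers around 1.0 and can exceed it arises when predictions and targets share the same scale. Whether TFLearn's `R2` metric is computed this way is an assumption to verify against the TFLearn source; the toy data is made up.

```python
import numpy as np


def r2_standard(y_true, y_pred):
    """Textbook coefficient of determination: 1 - SS_res / SS_tot (never above 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def sum_of_squares_ratio(y_true, y_pred):
    """Hypothetical alternative score: sum(pred^2) / sum(true^2).
    Close to 1 whenever predictions have the same overall scale as the targets,
    and free to exceed 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sum(y_pred ** 2) / np.sum(y_true ** 2)


# Made-up log-scale targets and predictions, for illustration only.
y_true = np.array([12.0, 12.2, 11.8, 12.4])
y_pred = np.array([12.1, 12.1, 11.9, 12.3])

print(r2_standard(y_true, y_pred))                  # approximately 0.8
print(sum_of_squares_ratio(y_true, y_pred))         # just below 1
print(sum_of_squares_ratio(y_true, y_pred + 0.01))  # slightly larger predictions push it above 1
```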
Training Step: 250  | total loss: 0.01965 | time: 0.078s
| SGD | epoch: 011 | loss: 0.01965 - R2: 1.0007 -- iter: 0500/1168
Training Step: 251  | total loss: 0.01922 | time: 0.081s
| SGD | epoch: 011 | loss: 0.01922 - R2: 1.0012 -- iter: 0550/1168
Training Step: 252  | total loss: 0.01893 | time: 0.083s
| SGD | epoch: 011 | loss: 0.01893 - R2: 1.0006 -- iter: 0600/1168
Training Step: 253  | total loss: 0.01800 | time: 0.085s
| SGD | epoch: 011 | loss: 0.01800 - R2: 1.0006 -- iter: 0650/1168
Training Step: 254  | total loss: 0.02015 | time: 0.086s
| SGD | epoch: 011 | loss: 0.02015 - R2: 1.0009 -- iter: 0700/1168
Training Step: 255  | total loss: 0.02241 | time: 0.090s
| SGD | epoch: 011 | loss: 0.02241 - R2: 0.9999 -- iter: 0750/1168
Training Step: 256  | total loss: 0.02209 | time: 0.093s
| SGD | epoch: 011 | loss: 0.02209 - R2: 1.0000 -- iter: 0800/1168

Training Step: 308  | total loss: 0.02952 | time: 0.118s
| SGD | epoch: 013 | loss: 0.02952 - R2: 0.9994 -- iter: 1000/1168
Training Step: 309  | total loss: 0.02977 | time: 0.122s
| SGD | epoch: 013 | loss: 0.02977 - R2: 0.9997 -- iter: 1050/1168
Training Step: 310  | total loss: 0.02895 | time: 0.128s
| SGD | epoch: 013 | loss: 0.02895 - R2: 1.0000 -- iter: 1100/1168
Training Step: 311  | total loss: 0.02895 | time: 0.130s
| SGD | epoch: 013 | loss: 0.02895 - R2: 1.0004 -- iter: 1150/1168
Training Step: 312  | total loss: 0.02637 | time: 1.136s
| SGD | epoch: 013 | loss: 0.02637 - R2: 1.0004 | val_loss: 0.02804 - val_acc: 0.9969 -- iter: 1168/1168
--
Training Step: 313  | total loss: 0.02521 | time: 0.004s
| SGD | epoch: 014 | loss: 0.02521 - R2: 0.9996 -- iter: 0050/1168
Training Step: 314  | total loss: 0.02528 | time: 0.012s
| SGD | epoch: 014 | loss: 0.02528 - R2

Training Step: 366  | total loss: 0.01821 | time: 0.055s
| SGD | epoch: 016 | loss: 0.01821 - R2: 0.9994 -- iter: 0300/1168
Training Step: 367  | total loss: 0.01821 | time: 0.057s
| SGD | epoch: 016 | loss: 0.01821 - R2: 0.9999 -- iter: 0350/1168
Training Step: 368  | total loss: 0.01821 | time: 0.060s
| SGD | epoch: 016 | loss: 0.01821 - R2: 0.9994 -- iter: 0400/1168
Training Step: 369  | total loss: 0.01865 | time: 0.069s
| SGD | epoch: 016 | loss: 0.01865 - R2: 0.9998 -- iter: 0450/1168
Training Step: 370  | total loss: 0.02051 | time: 0.076s
| SGD | epoch: 016 | loss: 0.02051 - R2: 1.0002 -- iter: 0500/1168
Training Step: 371  | total loss: 0.02051 | time: 0.079s
| SGD | epoch: 016 | loss: 0.02051 - R2: 1.0002 -- iter: 0550/1168
Training Step: 372  | total loss: 0.02039 | time: 0.082s
| SGD | epoch: 016 | loss: 0.02039 - R2: 1.0000 -- iter: 0600/1168

Training Step: 424  | total loss: 0.02189 | time: 0.081s
| SGD | epoch: 018 | loss: 0.02189 - R2: 0.9999 -- iter: 0800/1168
Training Step: 425  | total loss: 0.02194 | time: 0.083s
| SGD | epoch: 018 | loss: 0.02194 - R2: 1.0000 -- iter: 0850/1168
Training Step: 426  | total loss: 0.02178 | time: 0.085s
| SGD | epoch: 018 | loss: 0.02178 - R2: 1.0001 -- iter: 0900/1168
Training Step: 427  | total loss: 0.02086 | time: 0.088s
| SGD | epoch: 018 | loss: 0.02086 - R2: 0.9998 -- iter: 0950/1168
Training Step: 428  | total loss: 0.02282 | time: 0.091s
| SGD | epoch: 018 | loss: 0.02282 - R2: 0.9999 -- iter: 1000/1168
Training Step: 429  | total loss: 0.02152 | time: 0.095s
| SGD | epoch: 018 | loss: 0.02152 - R2: 0.9999 -- iter: 1050/1168
Training Step: 430  | total loss: 0.02152 | time: 0.100s
| SGD | epoch: 018 | loss: 0.02152 - R2: 1.0003 -- iter: 1100/1168

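Because the notebook output is truncated, the loss curves are easier to compare after pulling the `Training Step` / `total loss` pairs out of the raw text. A minimal sketch, assuming the console output has been saved as plain text (`parse_log` and `split_runs` are hypothetical helpers, not TFLearn API). It strips the ANSI colour codes that TFLearn wraps around the total loss, with or without the leading escape byte, and starts a new run whenever the step counter goes back down, as it does each time a new training run starts in this log:

```python
import re

# ANSI colour codes such as "\x1b[1m\x1b[32m"; the ESC byte is often lost when output is copied.
ANSI = re.compile(r"\x1b?\[[0-9;]*m")
STEP = re.compile(r"Training Step:\s*(\d+)\s*\|\s*total loss:\s*([0-9.]+)")


def parse_log(text):
    """Return (step, total_loss) pairs from TFLearn-style progress lines."""
    records = []
    for line in text.splitlines():
        match = STEP.search(ANSI.sub("", line))
        if match:
            records.append((int(match.group(1)), float(match.group(2))))
    return records


def split_runs(records):
    """Group records into runs; a step counter that does not increase marks a new run."""
    runs, current = [], []
    for step, loss in records:
        if current and step <= current[-1][0]:
            runs.append(current)
            current = []
        current.append((step, loss))
    if current:
        runs.append(current)
    return runs


sample = """Training Step: 1150  | total loss: [1m[32m0.02037[0m[0m | time: 0.107s
| SGD | epoch: 020 | loss: 0.02037 - R2: 1.0001 -- iter: 0580/1168
Training Step: 20  | total loss: [1m[32m126.52018[0m[0m | time: 0.139s
Training Step: 21  | total loss: [1m[32m125.32157[0m[0m | time: 0.145s
"""

for run in split_runs(parse_log(sample)):
    print(run)
```

Each run's pairs can then be plotted with matplotlib, which the notebook already imports, to compare how quickly the different batch sizes bring the loss down.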
---------------------------------
Training samples: 1168
Validation samples: 292
--
Training Step: 1  | time: 0.020s
| SGD | epoch: 001 | loss: 0.00000 - R2: 0.0000 -- iter: 0064/1168
Training Step: 2  | total loss: 131.53171 | time: 0.022s
| SGD | epoch: 001 | loss: 131.53171 - R2: 0.0000 -- iter: 0128/1168
Training Step: 3  | total loss: 142.00473 | time: 0.030s
| SGD | epoch: 001 | loss: 142.00473 - R2: 0.0001 -- iter: 0192/1168
Training Step: 4  | total loss: 142.00473 | time: 0.054s
| SGD | epoch: 001 | loss: 142.00473 - R2: 0.0001 -- iter: 0256/1168
Training Step: 5  | total loss: 141.43661 | time: 0.064s
| SGD | epoch: 001 | loss: 141.43661 - R2: 0.0001 -- iter: 0320/1168
Training Step: 6  | total loss: 140.51355 | time: 0.068s
| SGD | epoch: 001 | loss: 140.51355 - R2: 0.0002 -- iter: 0384/1168
Training Step: 7  | total loss: 138.87376 | time: 0.070s
| SGD | epoch: 001 | loss:

Training Step: 58  | total loss: 91.21967 | time: 0.070s
| SGD | epoch: 004 | loss: 91.21967 - R2: 0.0433 -- iter: 0064/1168
Training Step: 59  | total loss: 90.50325 | time: 0.073s
| SGD | epoch: 004 | loss: 90.50325 - R2: 0.0449 -- iter: 0128/1168
Training Step: 60  | total loss: 88.18044 | time: 0.076s
| SGD | epoch: 004 | loss: 88.18044 - R2: 0.0486 -- iter: 0192/1168
Training Step: 61  | total loss: 87.40452 | time: 0.079s
| SGD | epoch: 004 | loss: 87.40452 - R2: 0.0502 -- iter: 0256/1168
Training Step: 62  | total loss: 87.40452 | time: 0.082s
| SGD | epoch: 004 | loss: 87.40452 - R2: 0.0519 -- iter: 0320/1168
Training Step: 63  | total loss: 86.68460 | time: 0.084s
| SGD | epoch: 004 | loss: 86.68460 - R2: 0.0519 -- iter: 0384/1168
Training Step: 64  | total loss: 85.01126 | time: 0.088s
| SGD | epoch: 004 | loss: 85.01126 - R2: 0.0556 -- iter: 0448/1168

Training Step: 115  | total loss: 32.87838 | time: 0.082s
| SGD | epoch: 007 | loss: 32.87838 - R2: 0.2854 -- iter: 0064/1168
Training Step: 116  | total loss: 31.72815 | time: 0.087s
| SGD | epoch: 007 | loss: 31.72815 - R2: 0.2949 -- iter: 0128/1168
Training Step: 117  | total loss: 30.53178 | time: 0.089s
| SGD | epoch: 007 | loss: 30.53178 - R2: 0.3051 -- iter: 0192/1168
Training Step: 118  | total loss: 29.44820 | time: 0.095s
| SGD | epoch: 007 | loss: 29.44820 - R2: 0.3147 -- iter: 0256/1168
Training Step: 119  | total loss: 27.17241 | time: 0.100s
| SGD | epoch: 007 | loss: 27.17241 - R2: 0.3361 -- iter: 0320/1168
Training Step: 120  | total loss: 27.17241 | time: 0.104s
| SGD | epoch: 007 | loss: 27.17241 - R2: 0.3472 -- iter: 0384/1168
Training Step: 121  | total loss: 24.97681 | time: 0.108s
| SGD | epoch: 007 | loss: 24.97681 - R2: 0.3583 -- iter: 0448/1168

Training Step: 172  | total loss: 0.30995 | time: 0.078s
| SGD | epoch: 010 | loss: 0.30995 - R2: 0.9860 -- iter: 0064/1168
Training Step: 173  | total loss: 0.30995 | time: 0.083s
| SGD | epoch: 010 | loss: 0.30995 - R2: 0.9860 -- iter: 0128/1168
Training Step: 174  | total loss: 0.25599 | time: 0.088s
| SGD | epoch: 010 | loss: 0.25599 - R2: 0.9884 -- iter: 0192/1168
Training Step: 175  | total loss: 0.23286 | time: 0.095s
| SGD | epoch: 010 | loss: 0.23286 - R2: 0.9892 -- iter: 0256/1168
Training Step: 176  | total loss: 0.21199 | time: 0.100s
| SGD | epoch: 010 | loss: 0.21199 - R2: 0.9900 -- iter: 0320/1168
Training Step: 177  | total loss: 0.19780 | time: 0.105s
| SGD | epoch: 010 | loss: 0.19780 - R2: 0.9914 -- iter: 0384/1168
Training Step: 178  | total loss: 0.18064 | time: 0.109s
| SGD | epoch: 010 | loss: 0.18064 - R2: 0.9924 -- iter: 0448/1168

Training Step: 230  | total loss: 0.04833 | time: 0.076s
| SGD | epoch: 013 | loss: 0.04833 - R2: 0.9989 -- iter: 0128/1168
Training Step: 231  | total loss: 0.04833 | time: 0.081s
| SGD | epoch: 013 | loss: 0.04833 - R2: 0.9989 -- iter: 0192/1168
Training Step: 232  | total loss: 0.04267 | time: 0.085s
| SGD | epoch: 013 | loss: 0.04267 - R2: 0.9993 -- iter: 0256/1168
Training Step: 233  | total loss: 0.04267 | time: 0.087s
| SGD | epoch: 013 | loss: 0.04267 - R2: 0.9998 -- iter: 0320/1168
Training Step: 234  | total loss: 0.04208 | time: 0.090s
| SGD | epoch: 013 | loss: 0.04208 - R2: 0.9998 -- iter: 0384/1168
Training Step: 235  | total loss: 0.04147 | time: 0.093s
| SGD | epoch: 013 | loss: 0.04147 - R2: 0.9996 -- iter: 0448/1168
Training Step: 236  | total loss: 0.03880 | time: 0.095s
| SGD | epoch: 013 | loss: 0.03880 - R2: 0.9998 -- iter: 0512/1168

Training Step: 288  | total loss: 0.05178 | time: 0.091s
| SGD | epoch: 016 | loss: 0.05178 - R2: 1.0011 -- iter: 0192/1168
Training Step: 289  | total loss: 0.05178 | time: 0.093s
| SGD | epoch: 016 | loss: 0.05178 - R2: 1.0011 -- iter: 0256/1168
Training Step: 290  | total loss: 0.04821 | time: 0.096s
| SGD | epoch: 016 | loss: 0.04821 - R2: 1.0003 -- iter: 0320/1168
Training Step: 291  | total loss: 0.04708 | time: 0.098s
| SGD | epoch: 016 | loss: 0.04708 - R2: 1.0002 -- iter: 0384/1168
Training Step: 292  | total loss: 0.04409 | time: 0.102s
| SGD | epoch: 016 | loss: 0.04409 - R2: 0.9998 -- iter: 0448/1168
Training Step: 293  | total loss: 0.04100 | time: 0.105s
| SGD | epoch: 016 | loss: 0.04100 - R2: 0.9998 -- iter: 0512/1168
Training Step: 294  | total loss: 0.03865 | time: 0.108s
| SGD | epoch: 016 | loss: 0.03865 - R2: 0.9996 -- iter: 0576/1168

Training Step: 346  | total loss: 0.02480 | time: 0.065s
| SGD | epoch: 019 | loss: 0.02480 - R2: 0.9991 -- iter: 0256/1168
Training Step: 347  | total loss: 0.02424 | time: 0.069s
| SGD | epoch: 019 | loss: 0.02424 - R2: 0.9991 -- iter: 0320/1168
Training Step: 348  | total loss: 0.02365 | time: 0.073s
| SGD | epoch: 019 | loss: 0.02365 - R2: 0.9995 -- iter: 0384/1168
Training Step: 349  | total loss: 0.02380 | time: 0.076s
| SGD | epoch: 019 | loss: 0.02380 - R2: 0.9995 -- iter: 0448/1168
Training Step: 350  | total loss: 0.02503 | time: 0.078s
| SGD | epoch: 019 | loss: 0.02503 - R2: 1.0002 -- iter: 0512/1168
Training Step: 351  | total loss: 0.02395 | time: 0.079s
| SGD | epoch: 019 | loss: 0.02395 - R2: 1.0001 -- iter: 0576/1168
Training Step: 352  | total loss: 0.02317 | time: 0.081s
| SGD | epoch: 019 | loss: 0.02317 - R2: 0.9992 -- iter: 0640/1168

Training Step: 21  | total loss: 125.28558 | time: 0.056s
| SGD | epoch: 002 | loss: 125.28558 - R2: 0.0055 -- iter: 0128/1168
Training Step: 22  | total loss: 123.73078 | time: 0.064s
| SGD | epoch: 002 | loss: 123.73078 - R2: 0.0061 -- iter: 0192/1168
Training Step: 23  | total loss: 123.73078 | time: 0.067s
| SGD | epoch: 002 | loss: 123.73078 - R2: 0.0061 -- iter: 0256/1168
Training Step: 24  | total loss: 122.69962 | time: 0.070s
| SGD | epoch: 002 | loss: 122.69962 - R2: 0.0067 -- iter: 0320/1168
Training Step: 25  | total loss: 121.28933 | time: 0.073s
| SGD | epoch: 002 | loss: 121.28933 - R2: 0.0073 -- iter: 0384/1168
Training Step: 26  | total loss: 120.18050 | time: 0.075s
| SGD | epoch: 002 | loss: 120.18050 - R2: 0.0080 -- iter: 0448/1168
Training Step: 27  | total loss: 119.08270 | time: 0.078s
| SGD | epoch: 002 | loss: 119.08270 - R2: 0.0087 -- iter: 05

Training Step: 78  | total loss: 72.82771 | time: 0.012s
| SGD | epoch: 005 | loss: 72.82771 - R2: 0.0860 -- iter: 0128/1168
Training Step: 79  | total loss: 71.97852 | time: 0.015s
| SGD | epoch: 005 | loss: 71.97852 - R2: 0.0887 -- iter: 0192/1168
Training Step: 80  | total loss: 70.23663 | time: 0.020s
| SGD | epoch: 005 | loss: 70.23663 - R2: 0.0942 -- iter: 0256/1168
Training Step: 81  | total loss: 69.35363 | time: 0.025s
| SGD | epoch: 005 | loss: 69.35363 - R2: 0.0972 -- iter: 0320/1168
Training Step: 82  | total loss: 68.32396 | time: 0.028s
| SGD | epoch: 005 | loss: 68.32396 - R2: 0.1004 -- iter: 0384/1168
Training Step: 83  | total loss: 68.32396 | time: 0.032s
| SGD | epoch: 005 | loss: 68.32396 - R2: 0.1004 -- iter: 0448/1168
Training Step: 84  | total loss: 66.25510 | time: 0.035s
| SGD | epoch: 005 | loss: 66.25510 - R2: 0.1072 -- iter: 0512/1168

Training Step: 135  | total loss: 11.19360 | time: 0.109s
| SGD | epoch: 008 | loss: 11.19360 - R2: 0.5605 -- iter: 0128/1168
Training Step: 136  | total loss: 9.58802 | time: 0.115s
| SGD | epoch: 008 | loss: 9.58802 - R2: 0.5971 -- iter: 0192/1168
Training Step: 137  | total loss: 9.58802 | time: 0.118s
| SGD | epoch: 008 | loss: 9.58802 - R2: 0.5971 -- iter: 0256/1168
Training Step: 138  | total loss: 8.07572 | time: 0.121s
| SGD | epoch: 008 | loss: 8.07572 - R2: 0.6369 -- iter: 0320/1168
Training Step: 139  | total loss: 7.39605 | time: 0.124s
| SGD | epoch: 008 | loss: 7.39605 - R2: 0.6561 -- iter: 0384/1168
Training Step: 140  | total loss: 6.74696 | time: 0.127s
| SGD | epoch: 008 | loss: 6.74696 - R2: 0.6761 -- iter: 0448/1168
Training Step: 141  | total loss: 6.74696 | time: 0.130s
| SGD | epoch: 008 | loss: 6.74696 - R2: 0.6761 -- iter: 0512/1168

Training Step: 193  | total loss: 0.05801 | time: 0.011s
| SGD | epoch: 011 | loss: 0.05801 - R2: 0.9985 -- iter: 0192/1168
Training Step: 194  | total loss: 0.05457 | time: 0.015s
| SGD | epoch: 011 | loss: 0.05457 - R2: 0.9988 -- iter: 0256/1168
Training Step: 195  | total loss: 0.05457 | time: 0.018s
| SGD | epoch: 011 | loss: 0.05457 - R2: 0.9988 -- iter: 0320/1168
Training Step: 196  | total loss: 0.05096 | time: 0.021s
| SGD | epoch: 011 | loss: 0.05096 - R2: 0.9992 -- iter: 0384/1168
Training Step: 197  | total loss: 0.04819 | time: 0.024s
| SGD | epoch: 011 | loss: 0.04819 - R2: 0.9987 -- iter: 0448/1168
Training Step: 198  | total loss: 0.04537 | time: 0.029s
| SGD | epoch: 011 | loss: 0.04537 - R2: 0.9988 -- iter: 0512/1168
Training Step: 199  | total loss: 0.04537 | time: 0.031s
| SGD | epoch: 011 | loss: 0.04537 - R2: 0.9988 -- iter: 0576/1168

Training Step: 251  | total loss: 0.03781 | time: 0.056s
| SGD | epoch: 014 | loss: 0.03781 - R2: 1.0002 -- iter: 0256/1168
Training Step: 252  | total loss: 0.03514 | time: 0.060s
| SGD | epoch: 014 | loss: 0.03514 - R2: 0.9997 -- iter: 0320/1168
Training Step: 253  | total loss: 0.03514 | time: 0.064s
| SGD | epoch: 014 | loss: 0.03514 - R2: 0.9997 -- iter: 0384/1168
Training Step: 254  | total loss: 0.03290 | time: 0.066s
| SGD | epoch: 014 | loss: 0.03290 - R2: 0.9998 -- iter: 0448/1168
Training Step: 255  | total loss: 0.03096 | time: 0.069s
| SGD | epoch: 014 | loss: 0.03096 - R2: 0.9991 -- iter: 0512/1168
Training Step: 256  | total loss: 0.03048 | time: 0.072s
| SGD | epoch: 014 | loss: 0.03048 - R2: 0.9994 -- iter: 0576/1168
Training Step: 257  | total loss: 0.03048 | time: 0.074s
| SGD | epoch: 014 | loss: 0.03048 - R2: 0.9994 -- iter: 0640/1168

Training Step: 309  | total loss: 0.04570 | time: 0.027s
| SGD | epoch: 017 | loss: 0.04570 - R2: 1.0007 -- iter: 0320/1168
Training Step: 310  | total loss: 0.04269 | time: 0.032s
| SGD | epoch: 017 | loss: 0.04269 - R2: 1.0000 -- iter: 0384/1168
Training Step: 311  | total loss: 0.03958 | time: 0.035s
| SGD | epoch: 017 | loss: 0.03958 - R2: 0.9995 -- iter: 0448/1168
Training Step: 312  | total loss: 0.03958 | time: 0.037s
| SGD | epoch: 017 | loss: 0.03958 - R2: 0.9995 -- iter: 0512/1168
Training Step: 313  | total loss: 0.03832 | time: 0.040s
| SGD | epoch: 017 | loss: 0.03832 - R2: 0.9993 -- iter: 0576/1168
Training Step: 314  | total loss: 0.03369 | time: 0.043s
| SGD | epoch: 017 | loss: 0.03369 - R2: 0.9998 -- iter: 0640/1168
Training Step: 315  | total loss: 0.03241 | time: 0.046s
| SGD | epoch: 017 | loss: 0.03241 - R2: 0.9995 -- iter: 0704/1168

Training Step: 367  | total loss: 0.05014 | time: 0.047s
| SGD | epoch: 020 | loss: 0.05014 - R2: 0.9990 -- iter: 0384/1168
Training Step: 368  | total loss: 0.04738 | time: 0.050s
| SGD | epoch: 020 | loss: 0.04738 - R2: 0.9985 -- iter: 0448/1168
Training Step: 369  | total loss: 0.04738 | time: 0.052s
| SGD | epoch: 020 | loss: 0.04738 - R2: 0.9985 -- iter: 0512/1168
Training Step: 370  | total loss: 0.04129 | time: 0.055s
| SGD | epoch: 020 | loss: 0.04129 - R2: 0.9989 -- iter: 0576/1168
Training Step: 371  | total loss: 0.03986 | time: 0.058s
| SGD | epoch: 020 | loss: 0.03986 - R2: 0.9995 -- iter: 0640/1168
Training Step: 372  | total loss: 0.03787 | time: 0.062s
| SGD | epoch: 020 | loss: 0.03787 - R2: 0.9995 -- iter: 0704/1168
Training Step: 373  | total loss: 0.03832 | time: 0.066s
| SGD | epoch: 020 | loss: 0.03832 - R2: 0.9997 -- iter: 0768/1168

Training Step: 42  | total loss: 106.20991 | time: 0.093s
| SGD | epoch: 003 | loss: 106.20991 - R2: 0.0219 -- iter: 0256/1168
Training Step: 43  | total loss: 103.97104 | time: 0.098s
| SGD | epoch: 003 | loss: 103.97104 - R2: 0.0242 -- iter: 0320/1168
Training Step: 44  | total loss: 103.97104 | time: 0.104s
| SGD | epoch: 003 | loss: 103.97104 - R2: 0.0242 -- iter: 0384/1168
Training Step: 45  | total loss: 102.86520 | time: 0.107s
| SGD | epoch: 003 | loss: 102.86520 - R2: 0.0255 -- iter: 0448/1168
Training Step: 46  | total loss: 100.74028 | time: 0.110s
| SGD | epoch: 003 | loss: 100.74028 - R2: 0.0280 -- iter: 0512/1168
Training Step: 47  | total loss: 99.62471 | time: 0.112s
| SGD | epoch: 003 | loss: 99.62471 - R2: 0.0293 -- iter: 0576/1168
Training Step: 48  | total loss: 98.63583 | time: 0.114s
| SGD | epoch: 003 | loss: 98.63583 - R2: 0.0307 -- iter: 0640/1

Training Step: 99  | total loss: 48.87430 | time: 0.022s
| SGD | epoch: 006 | loss: 48.87430 - R2: 0.1815 -- iter: 0256/1168
Training Step: 100  | total loss: 47.66395 | time: 0.025s
| SGD | epoch: 006 | loss: 47.66395 - R2: 0.1880 -- iter: 0320/1168
Training Step: 101  | total loss: 46.44298 | time: 0.028s
| SGD | epoch: 006 | loss: 46.44298 - R2: 0.1946 -- iter: 0384/1168
Training Step: 102  | total loss: 45.28674 | time: 0.031s
| SGD | epoch: 006 | loss: 45.28674 - R2: 0.2012 -- iter: 0448/1168
Training Step: 103  | total loss: 45.28674 | time: 0.033s
| SGD | epoch: 006 | loss: 45.28674 - R2: 0.2081 -- iter: 0512/1168
Training Step: 104  | total loss: 44.10451 | time: 0.035s
| SGD | epoch: 006 | loss: 44.10451 - R2: 0.2081 -- iter: 0576/1168
Training Step: 105  | total loss: 42.95579 | time: 0.036s
| SGD | epoch: 006 | loss: 42.95579 - R2: 0.2230 -- iter: 0640/1168


Training Step: 156  | total loss: 1.31199 | time: 0.088s
| SGD | epoch: 009 | loss: 1.31199 - R2: 0.9332 -- iter: 0256/1168
Training Step: 157  | total loss: 1.18835 | time: 0.090s
| SGD | epoch: 009 | loss: 1.18835 - R2: 0.9399 -- iter: 0320/1168
Training Step: 158  | total loss: 1.07782 | time: 0.092s
| SGD | epoch: 009 | loss: 1.07782 - R2: 0.9443 -- iter: 0384/1168
Training Step: 159  | total loss: 0.97596 | time: 0.094s
| SGD | epoch: 009 | loss: 0.97596 - R2: 0.9548 -- iter: 0448/1168
Training Step: 160  | total loss: 0.88095 | time: 0.096s
| SGD | epoch: 009 | loss: 0.88095 - R2: 0.9548 -- iter: 0512/1168
Training Step: 161  | total loss: 0.79489 | time: 0.098s
| SGD | epoch: 009 | loss: 0.79489 - R2: 0.9590 -- iter: 0576/1168
Training Step: 162  | total loss: 0.65455 | time: 0.100s
| SGD | epoch: 009 | loss: 0.65455 - R2: 0.9680 -- iter: 0640/1168

Training Step: 214  | total loss: 0.03913 | time: 0.087s
| SGD | epoch: 012 | loss: 0.03913 - R2: 0.9996 -- iter: 0320/1168
Training Step: 215  | total loss: 0.03655 | time: 0.090s
| SGD | epoch: 012 | loss: 0.03655 - R2: 0.9998 -- iter: 0384/1168
Training Step: 216  | total loss: 0.03571 | time: 0.093s
| SGD | epoch: 012 | loss: 0.03571 - R2: 0.9996 -- iter: 0448/1168
Training Step: 217  | total loss: 0.03374 | time: 0.097s
| SGD | epoch: 012 | loss: 0.03374 - R2: 0.9996 -- iter: 0512/1168
Training Step: 218  | total loss: 0.03374 | time: 0.101s
| SGD | epoch: 012 | loss: 0.03374 - R2: 0.9996 -- iter: 0576/1168
Training Step: 219  | total loss: 0.03100 | time: 0.104s
| SGD | epoch: 012 | loss: 0.03100 - R2: 1.0002 -- iter: 0640/1168
Training Step: 220  | total loss: 0.02898 | time: 0.107s
| SGD | epoch: 012 | loss: 0.02898 - R2: 1.0004 -- iter: 0704/1168

Training Step: 272  | total loss: 0.03990 | time: 0.027s
| SGD | epoch: 015 | loss: 0.03990 - R2: 0.9995 -- iter: 0384/1168
Training Step: 273  | total loss: 0.03735 | time: 0.041s
| SGD | epoch: 015 | loss: 0.03735 - R2: 0.9997 -- iter: 0448/1168
Training Step: 274  | total loss: 0.03735 | time: 0.047s
| SGD | epoch: 015 | loss: 0.03735 - R2: 0.9997 -- iter: 0512/1168
Training Step: 275  | total loss: 0.03352 | time: 0.054s
| SGD | epoch: 015 | loss: 0.03352 - R2: 0.9998 -- iter: 0576/1168
Training Step: 276  | total loss: 0.03195 | time: 0.059s
| SGD | epoch: 015 | loss: 0.03195 - R2: 1.0001 -- iter: 0640/1168
Training Step: 277  | total loss: 0.03195 | time: 0.061s
| SGD | epoch: 015 | loss: 0.03195 - R2: 1.0001 -- iter: 0704/1168
Training Step: 278  | total loss: 0.03058 | time: 0.063s
| SGD | epoch: 015 | loss: 0.03058 - R2: 1.0001 -- iter: 0768/1168

Training Step: 330  | total loss: 0.04258 | time: 0.029s
| SGD | epoch: 018 | loss: 0.04258 - R2: 1.0001 -- iter: 0448/1168
Training Step: 331  | total loss: 0.04009 | time: 0.031s
| SGD | epoch: 018 | loss: 0.04009 - R2: 0.9996 -- iter: 0512/1168
Training Step: 332  | total loss: 0.03571 | time: 0.034s
| SGD | epoch: 018 | loss: 0.03571 - R2: 0.9995 -- iter: 0576/1168
Training Step: 333  | total loss: 0.03355 | time: 0.037s
| SGD | epoch: 018 | loss: 0.03355 - R2: 0.9991 -- iter: 0640/1168
Training Step: 334  | total loss: 0.03205 | time: 0.040s
| SGD | epoch: 018 | loss: 0.03205 - R2: 0.9991 -- iter: 0704/1168
Training Step: 335  | total loss: 0.03205 | time: 0.042s
| SGD | epoch: 018 | loss: 0.03205 - R2: 0.9991 -- iter: 0768/1168
Training Step: 336  | total loss: 0.03074 | time: 0.044s
| SGD | epoch: 018 | loss: 0.03074 - R2: 0.9991 -- iter: 0832/1168

Training Step: 388  | total loss: 0.02187 | time: 0.109s
| SGD | epoch: 021 | loss: 0.02187 - R2: 0.9995 -- iter: 0512/1168
Training Step: 389  | total loss: 0.02203 | time: 0.113s
| SGD | epoch: 021 | loss: 0.02203 - R2: 1.0000 -- iter: 0576/1168
Training Step: 390  | total loss: 0.02192 | time: 0.117s
| SGD | epoch: 021 | loss: 0.02192 - R2: 1.0000 -- iter: 0640/1168
Training Step: 391  | total loss: 0.02192 | time: 0.120s
| SGD | epoch: 021 | loss: 0.02192 - R2: 1.0000 -- iter: 0704/1168
Training Step: 392  | total loss: 0.02288 | time: 0.122s
| SGD | epoch: 021 | loss: 0.02288 - R2: 1.0005 -- iter: 0768/1168
Training Step: 393  | total loss: 0.02288 | time: 0.124s
| SGD | epoch: 021 | loss: 0.02288 - R2: 1.0005 -- iter: 0832/1168
Training Step: 394  | total loss: 0.02222 | time: 0.126s
| SGD | epoch: 021 | loss: 0.02222 - R2: 1.0001 -- iter: 0896/1168

Training Step: 446  | total loss: 0.02309 | time: 0.039s
| SGD | epoch: 024 | loss: 0.02309 - R2: 1.0008 -- iter: 0576/1168
Training Step: 447  | total loss: 0.02297 | time: 0.041s
| SGD | epoch: 024 | loss: 0.02297 - R2: 1.0004 -- iter: 0640/1168
Training Step: 448  | total loss: 0.02285 | time: 0.045s
| SGD | epoch: 024 | loss: 0.02285 - R2: 1.0001 -- iter: 0704/1168
Training Step: 449  | total loss: 0.02149 | time: 0.048s
| SGD | epoch: 024 | loss: 0.02149 - R2: 0.9995 -- iter: 0768/1168
Training Step: 450  | total loss: 0.02107 | time: 0.055s
| SGD | epoch: 024 | loss: 0.02107 - R2: 0.9997 -- iter: 0832/1168
Training Step: 451  | total loss: 0.02107 | time: 0.057s
| SGD | epoch: 024 | loss: 0.02107 - R2: 0.9997 -- iter: 0896/1168
Training Step: 452  | total loss: 0.02049 | time: 0.059s
| SGD | epoch: 024 | loss: 0.02049 - R2: 0.9996 -- iter: 0960/1168

Training Step: 504  | total loss: 0.02460 | time: 0.118s
| SGD | epoch: 027 | loss: 0.02460 - R2: 0.9997 -- iter: 0640/1168
Training Step: 505  | total loss: 0.02335 | time: 0.121s
| SGD | epoch: 027 | loss: 0.02335 - R2: 1.0001 -- iter: 0704/1168
Training Step: 506  | total loss: 0.02436 | time: 0.124s
| SGD | epoch: 027 | loss: 0.02436 - R2: 1.0006 -- iter: 0768/1168
Training Step: 507  | total loss: 0.02369 | time: 0.127s
| SGD | epoch: 027 | loss: 0.02369 - R2: 1.0005 -- iter: 0832/1168
Training Step: 508  | total loss: 0.02369 | time: 0.130s
| SGD | epoch: 027 | loss: 0.02369 - R2: 1.0005 -- iter: 0896/1168
Training Step: 509  | total loss: 0.02257 | time: 0.134s
| SGD | epoch: 027 | loss: 0.02257 - R2: 0.9995 -- iter: 0960/1168
Training Step: 510  | total loss: 0.02257 | time: 0.136s
| SGD | epoch: 027 | loss: 0.02257 - R2: 0.9995 -- iter: 1024/1168

Training Step: 562  | total loss: 0.01686 | time: 0.129s
| SGD | epoch: 030 | loss: 0.01686 - R2: 0.9992 -- iter: 0704/1168
Training Step: 563  | total loss: 0.01686 | time: 0.131s
| SGD | epoch: 030 | loss: 0.01686 - R2: 0.9992 -- iter: 0768/1168
Training Step: 564  | total loss: 0.01803 | time: 0.133s
| SGD | epoch: 030 | loss: 0.01803 - R2: 1.0000 -- iter: 0832/1168
Training Step: 565  | total loss: 0.01974 | time: 0.137s
| SGD | epoch: 030 | loss: 0.01974 - R2: 1.0001 -- iter: 0896/1168
Training Step: 566  | total loss: 0.01974 | time: 0.144s
| SGD | epoch: 030 | loss: 0.01974 - R2: 1.0001 -- iter: 0960/1168
Training Step: 567  | total loss: 0.01979 | time: 0.150s
| SGD | epoch: 030 | loss: 0.01979 - R2: 1.0008 -- iter: 1024/1168
Training Step: 568  | total loss: 0.04184 | time: 0.152s
| SGD | epoch: 030 | loss: 0.04184 - R2: 1.0006 -- iter: 1088/1168

Training Step: 620  | total loss: 0.01613 | time: 0.138s
| SGD | epoch: 033 | loss: 0.01613 - R2: 0.9991 -- iter: 0768/1168
Training Step: 621  | total loss: 0.01709 | time: 0.140s
| SGD | epoch: 033 | loss: 0.01709 - R2: 0.9985 -- iter: 0832/1168
Training Step: 622  | total loss: 0.01709 | time: 0.143s
| SGD | epoch: 033 | loss: 0.01709 - R2: 0.9987 -- iter: 0896/1168
Training Step: 623  | total loss: 0.01642 | time: 0.146s
| SGD | epoch: 033 | loss: 0.01642 - R2: 0.9990 -- iter: 0960/1168
Training Step: 624  | total loss: 0.01642 | time: 0.147s
| SGD | epoch: 033 | loss: 0.01642 - R2: 0.9992 -- iter: 1024/1168
Training Step: 625  | total loss: 0.01904 | time: 0.151s
| SGD | epoch: 033 | loss: 0.01904 - R2: 0.9997 -- iter: 1088/1168
Training Step: 626  | total loss: 0.02055 | time: 0.154s
| SGD | epoch: 033 | loss: 0.02055 - R2: 1.0001 -- iter: 1152/1168

Training Step: 678  | total loss: 0.03063 | time: 0.120s
| SGD | epoch: 036 | loss: 0.03063 - R2: 0.9992 -- iter: 0832/1168
Training Step: 679  | total loss: 0.02906 | time: 0.122s
| SGD | epoch: 036 | loss: 0.02906 - R2: 0.9991 -- iter: 0896/1168
Training Step: 680  | total loss: 0.02698 | time: 0.124s
| SGD | epoch: 036 | loss: 0.02698 - R2: 0.9999 -- iter: 0960/1168
Training Step: 681  | total loss: 0.02366 | time: 0.126s
| SGD | epoch: 036 | loss: 0.02366 - R2: 1.0002 -- iter: 1024/1168
Training Step: 682  | total loss: 0.02366 | time: 0.128s
| SGD | epoch: 036 | loss: 0.02366 - R2: 1.0002 -- iter: 1088/1168
Training Step: 683  | total loss: 0.02349 | time: 0.130s
| SGD | epoch: 036 | loss: 0.02349 - R2: 1.0006 -- iter: 1152/1168
Training Step: 684  | total loss: 0.02222 | time: 1.135s
| SGD | epoch: 036 | loss: 0.02222 - R2: 1.0004 | val_loss: 0.02782 - val_acc: 0
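The `iter` counter in these lines grows by 64 samples per training step and wraps at 1168. Every validation line falls on a step number that is a multiple of 19 (684, 741, 798, 855, 912). The arithmetic check below confirms that each epoch takes 19 steps and that the last step covers only 16 samples. It assumes a batch size of 64, inferred from the log because the training call is not shown in this excerpt. A second assumption is that `train.csv` holds the 1460 rows of the Kaggle House Prices training set. In that case 1168 is exactly 80% of it, which is consistent with a 20% validation split.

```python
import math

def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

n_train = 1168   # from "iter: 1168/1168" in the log
batch = 64       # "iter" grows by 64 on every training step (inferred)
spe = steps_per_epoch(n_train, batch)
last_batch = n_train - (spe - 1) * batch   # size of the final, partial batch

# Global step at the end of epoch 36, where the log prints val_loss
end_of_epoch_36 = 36 * spe

# Assumption: train.csv has 1460 rows (Kaggle House Prices training set)
total_rows = 1460
held_out = total_rows - n_train

print(spe, last_batch, end_of_epoch_36, held_out, held_out / total_rows)
```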

Training Step: 736  | total loss: 0.03384 | time: 0.064s
| SGD | epoch: 039 | loss: 0.03384 - R2: 1.0004 -- iter: 0896/1168
Training Step: 737  | total loss: 0.03069 | time: 0.067s
| SGD | epoch: 039 | loss: 0.03069 - R2: 1.0001 -- iter: 0960/1168
Training Step: 738  | total loss: 0.02920 | time: 0.070s
| SGD | epoch: 039 | loss: 0.02920 - R2: 0.9999 -- iter: 1024/1168
Training Step: 739  | total loss: 0.02920 | time: 0.071s
| SGD | epoch: 039 | loss: 0.02920 - R2: 0.9999 -- iter: 1088/1168
Training Step: 740  | total loss: 0.02754 | time: 0.073s
| SGD | epoch: 039 | loss: 0.02754 - R2: 1.0000 -- iter: 1152/1168
Training Step: 741  | total loss: 0.02472 | time: 1.083s
| SGD | epoch: 039 | loss: 0.02472 - R2: 0.9998 | val_loss: 0.02767 - val_acc: 1.0002 -- iter: 1168/1168
--
Training Step: 742  | total loss: 0.02472 | time: 0.089s
| SGD | epoch: 040 | loss: 0.02472 - R2

Training Step: 794  | total loss: 0.02480 | time: 0.157s
| SGD | epoch: 042 | loss: 0.02480 - R2: 1.0000 -- iter: 0960/1168
Training Step: 795  | total loss: 0.02447 | time: 0.162s
| SGD | epoch: 042 | loss: 0.02447 - R2: 1.0000 -- iter: 1024/1168
Training Step: 796  | total loss: 0.02365 | time: 0.177s
| SGD | epoch: 042 | loss: 0.02365 - R2: 0.9999 -- iter: 1088/1168
Training Step: 797  | total loss: 0.02333 | time: 0.182s
| SGD | epoch: 042 | loss: 0.02333 - R2: 0.9999 -- iter: 1152/1168
Training Step: 798  | total loss: 0.02270 | time: 1.187s
| SGD | epoch: 042 | loss: 0.02270 - R2: 1.0000 | val_loss: 0.02835 - val_acc: 1.0000 -- iter: 1168/1168
--
Training Step: 799  | total loss: 0.02134 | time: 0.085s
| SGD | epoch: 043 | loss: 0.02134 - R2: 0.9996 -- iter: 0064/1168
Training Step: 800  | total loss: 0.02001 | time: 0.089s
| SGD | epoch: 043 | loss: 0.02001 - R2

Training Step: 852  | total loss: 0.03841 | time: 0.086s
| SGD | epoch: 045 | loss: 0.03841 - R2: 0.9992 -- iter: 1024/1168
Training Step: 853  | total loss: 0.03841 | time: 0.088s
| SGD | epoch: 045 | loss: 0.03841 - R2: 0.9992 -- iter: 1088/1168
Training Step: 854  | total loss: 0.03645 | time: 0.090s
| SGD | epoch: 045 | loss: 0.03645 - R2: 0.9992 -- iter: 1152/1168
Training Step: 855  | total loss: 0.03473 | time: 1.096s
| SGD | epoch: 045 | loss: 0.03473 - R2: 0.9991 | val_loss: 0.02804 - val_acc: 1.0013 -- iter: 1168/1168
--
Training Step: 856  | total loss: 0.03074 | time: 0.129s
| SGD | epoch: 046 | loss: 0.03074 - R2: 0.9993 -- iter: 0064/1168
Training Step: 857  | total loss: 0.03074 | time: 0.131s
| SGD | epoch: 046 | loss: 0.03074 - R2: 0.9993 -- iter: 0128/1168
Training Step: 858  | total loss: 0.02854 | time: 0.135s
| SGD | epoch: 046 | loss: 0.02854 - R2

Training Step: 910  | total loss: 0.04127 | time: 0.198s
| SGD | epoch: 048 | loss: 0.04127 - R2: 0.9996 -- iter: 1088/1168
Training Step: 911  | total loss: 0.03840 | time: 0.201s
| SGD | epoch: 048 | loss: 0.03840 - R2: 0.9993 -- iter: 1152/1168
Training Step: 912  | total loss: 0.03603 | time: 1.206s
| SGD | epoch: 048 | loss: 0.03603 - R2: 0.9998 | val_loss: 0.02823 - val_acc: 1.0007 -- iter: 1168/1168
--
Training Step: 913  | total loss: 0.03282 | time: 0.056s
| SGD | epoch: 049 | loss: 0.03282 - R2: 0.9997 -- iter: 0064/1168
Training Step: 914  | total loss: 0.03053 | time: 0.063s
| SGD | epoch: 049 | loss: 0.03053 - R2: 0.9998 -- iter: 0128/1168
Training Step: 915  | total loss: 0.02974 | time: 0.069s
| SGD | epoch: 049 | loss: 0.02974 - R2: 0.9999 -- iter: 0192/1168
Training Step: 916  | total loss: 0.02783 | time: 0.072s
| SGD | epoch: 049 | loss: 0.02783 - R2
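Several `R2` readings above exceed 1.0, for example 1.0008 at step 446 and `val_acc: 1.0013` at the end of epoch 45. The coefficient of determination, R² = 1 − SS_res / SS_tot, can never be larger than 1, so this column must be measuring something else. One candidate is a ratio of the sum of squared predictions to the sum of squared targets. That is an assumption to check against the installed tflearn source. Such a ratio exceeds 1 whenever the predictions overshoot slightly. The sketch below contrasts the two quantities with NumPy on made-up values. For the textbook value on real predictions, `r2_score` from scikit-learn (already imported in this notebook) can be used.

```python
import numpy as np

def coefficient_of_determination(y_true, y_pred):
    """Textbook R^2 = 1 - SS_res / SS_tot; it never exceeds 1."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def squared_sum_ratio(y_true, y_pred):
    """Assumed form of the logged "R2": sum(pred^2) / sum(true^2)."""
    return np.sum(y_pred ** 2) / np.sum(y_true ** 2)

# Made-up log-price targets, and predictions that overshoot each one by 0.1
y_true = np.array([11.5, 12.0, 12.5])
y_pred = y_true + 0.1

print(coefficient_of_determination(y_true, y_pred))  # below 1
print(squared_sum_ratio(y_true, y_pred))             # above 1
```

Shuffling the predictions between houses leaves such a ratio unchanged while ruining the fit. If the metric does take this form, the `val_loss` column is the more trustworthy signal in this log.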

Training Step: 16  | total loss: 143.41374 | time: 0.083s
| SGD | epoch: 001 | loss: 143.41374 - R2: 0.0000 -- iter: 1024/1168
Training Step: 17  | total loss: 142.94255 | time: 0.087s
| SGD | epoch: 001 | loss: 142.94255 - R2: 0.0000 -- iter: 1088/1168
Training Step: 18  | total loss: 142.94255 | time: 0.091s
| SGD | epoch: 001 | loss: 142.94255 - R2: 0.0000 -- iter: 1152/1168
Training Step: 19  | total loss: 141.60745 | time: 1.104s
| SGD | epoch: 001 | loss: 141.60745 - R2: 0.0000 | val_loss: 142.42757 - val_acc: 0.0001 -- iter: 1168/1168
--
Training Step: 20  | total loss: 141.60745 | time: 0.057s
| SGD | epoch: 002 | loss: 141.60745 - R2: 0.0000 -- iter: 0064/1168
Training Step: 21  | total loss: 141.06168 | time: 0.060s
| SGD | epoch: 002 | loss: 141.06168 - R2: 0.0001 -- iter: 0128/1168
Training Step: 22  | total loss: 141.00168 | time: 0.064s
| SGD | epoch: 002
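This second run starts with a loss of about 143 and an `R2` of 0.0000. Two assumptions are needed to interpret that, and neither is visible in this excerpt: that the network's target is the natural log of `SalePrice`, and that the loss is a mean-square error. Under both, a starting loss of about 143 is roughly what a network would score while its outputs are still near zero, because a log-price around 12, squared, is about 144. The `R2` column reading zero at the same moment would also fit, if that metric is built from squared predictions. The prices in the sketch are made up, not taken from `train.csv`.

```python
import numpy as np

# Made-up sale prices (not taken from train.csv)
prices = np.array([120000.0, 163000.0, 250000.0])
# Assumption: the network's target is the natural log of SalePrice
y = np.log(prices)

def mean_square(pred, target):
    """Mean-square error, assumed to be the loss used in this run."""
    return np.mean((pred - target) ** 2)

# Outputs of an untrained network that predicts ~0 for every house
untrained = np.zeros_like(y)
loss0 = mean_square(untrained, y)

print(np.round(y, 2))    # log-prices, all close to 12
print(round(loss0, 1))   # close to 12**2 = 144
```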

Training Step: 72  | total loss: 137.40627 | time: 0.050s
| SGD | epoch: 004 | loss: 137.40627 - R2: 0.0007 -- iter: 0960/1168
Training Step: 73  | total loss: 137.08157 | time: 0.053s
| SGD | epoch: 004 | loss: 137.08157 - R2: 0.0007 -- iter: 1024/1168
Training Step: 74  | total loss: 137.08157 | time: 0.056s
| SGD | epoch: 004 | loss: 137.08157 - R2: 0.0007 -- iter: 1088/1168
Training Step: 75  | total loss: 137.01056 | time: 0.058s
| SGD | epoch: 004 | loss: 137.01056 - R2: 0.0008 -- iter: 1152/1168
Training Step: 76  | total loss: 136.75594 | time: 1.062s
| SGD | epoch: 004 | loss: 136.75594 - R2: 0.0008 | val_loss: 135.69459 - val_acc: 0.0010 -- iter: 1168/1168
--
Training Step: 77  | total loss: 136.45366 | time: 0.015s
| SGD | epoch: 005 | loss: 136.45366 - R2: 0.0008 -- iter: 0064/1168
Training Step: 78  | total loss: 136.33934 | time: 0.020s
| SGD | epoch: 005

Training Step: 128  | total loss: 131.24088 | time: 0.119s
| SGD | epoch: 007 | loss: 131.24088 - R2: 0.0024 -- iter: 0896/1168
Training Step: 129  | total loss: 130.96761 | time: 0.121s
| SGD | epoch: 007 | loss: 130.96761 - R2: 0.0024 -- iter: 0960/1168
Training Step: 130  | total loss: 130.82938 | time: 0.124s
| SGD | epoch: 007 | loss: 130.82938 - R2: 0.0025 -- iter: 1024/1168
Training Step: 131  | total loss: 130.59085 | time: 0.127s
| SGD | epoch: 007 | loss: 130.59085 - R2: 0.0025 -- iter: 1088/1168
Training Step: 132  | total loss: 130.59085 | time: 0.129s
| SGD | epoch: 007 | loss: 130.59085 - R2: 0.0026 -- iter: 1152/1168
Training Step: 133  | total loss: 130.33284 | time: 1.135s
| SGD | epoch: 007 | loss: 130.33284 - R2: 0.0026 | val_loss: 129.11835 - val_acc: 0.0031 -- iter: 1168/1168
--
Training Step: 134  | total loss: 130.27161 | time: 0.007s
| SGD | epo

Training Step: 184  | total loss: 124.42446 | time: 0.126s
| SGD | epoch: 010 | loss: 124.42446 - R2: 0.0052 -- iter: 0832/1168
Training Step: 185  | total loss: 124.18766 | time: 0.128s
| SGD | epoch: 010 | loss: 124.18766 - R2: 0.0053 -- iter: 0896/1168
Training Step: 186  | total loss: 123.97144 | time: 0.131s
| SGD | epoch: 010 | loss: 123.97144 - R2: 0.0054 -- iter: 0960/1168
Training Step: 187  | total loss: 123.97144 | time: 0.134s
| SGD | epoch: 010 | loss: 123.97144 - R2: 0.0054 -- iter: 1024/1168
Training Step: 188  | total loss: 123.84539 | time: 0.141s
| SGD | epoch: 010 | loss: 123.84539 - R2: 0.0055 -- iter: 1088/1168
Training Step: 189  | total loss: 123.71983 | time: 0.143s
| SGD | epoch: 010 | loss: 123.71983 - R2: 0.0055 -- iter: 1152/1168
Training Step: 190  | total loss: 123.56448 | time: 1.151s
| SGD | epoch: 010 | loss: 123.56448 - R2: 0.0056 | va

Training Step: 240  | total loss: 118.65435 | time: 0.123s
| SGD | epoch: 013 | loss: 118.65435 - R2: 0.0091 -- iter: 0768/1168
Training Step: 241  | total loss: 118.65435 | time: 0.125s
| SGD | epoch: 013 | loss: 118.65435 - R2: 0.0092 -- iter: 0832/1168
Training Step: 242  | total loss: 118.47023 | time: 0.127s
| SGD | epoch: 013 | loss: 118.47023 - R2: 0.0093 -- iter: 0896/1168
Training Step: 243  | total loss: 118.18983 | time: 0.139s
| SGD | epoch: 013 | loss: 118.18983 - R2: 0.0094 -- iter: 0960/1168
Training Step: 244  | total loss: 118.08816 | time: 0.143s
| SGD | epoch: 013 | loss: 118.08816 - R2: 0.0095 -- iter: 1024/1168
Training Step: 245  | total loss: 117.74529 | time: 0.148s
| SGD | epoch: 013 | loss: 117.74529 - R2: 0.0095 -- iter: 1088/1168
Training Step: 246  | total loss: 117.74529 | time: 0.156s
| SGD | epoch: 013 | loss: 117.74529 - R2: 0.0096 -- i

Training Step: 296  | total loss: 112.02618 | time: 0.128s
| SGD | epoch: 016 | loss: 112.02618 - R2: 0.0143 -- iter: 0704/1168
Training Step: 297  | total loss: 111.96439 | time: 0.130s
| SGD | epoch: 016 | loss: 111.96439 - R2: 0.0143 -- iter: 0768/1168
Training Step: 298  | total loss: 111.96439 | time: 0.131s
| SGD | epoch: 016 | loss: 111.96439 - R2: 0.0144 -- iter: 0832/1168
Training Step: 299  | total loss: 112.17057 | time: 0.133s
| SGD | epoch: 016 | loss: 112.17057 - R2: 0.0145 -- iter: 0896/1168
Training Step: 300  | total loss: 112.17057 | time: 0.135s
| SGD | epoch: 016 | loss: 112.17057 - R2: 0.0146 -- iter: 0960/1168
Training Step: 301  | total loss: 112.30653 | time: 0.138s
| SGD | epoch: 016 | loss: 112.30653 - R2: 0.0147 -- iter: 1024/1168
Training Step: 302  | total loss: 112.30653 | time: 0.139s
| SGD | epoch: 016 | loss: 112.30653 - R2: 0.0148 -- i

Training Step: 352  | total loss: 106.09842 | time: 0.052s
| SGD | epoch: 019 | loss: 106.09842 - R2: 0.0205 -- iter: 0640/1168
Training Step: 353  | total loss: 106.09842 | time: 0.055s
| SGD | epoch: 019 | loss: 106.09842 - R2: 0.0206 -- iter: 0704/1168
Training Step: 354  | total loss: 105.97124 | time: 0.058s
| SGD | epoch: 019 | loss: 105.97124 - R2: 0.0207 -- iter: 0768/1168
Training Step: 355  | total loss: 106.06054 | time: 0.061s
| SGD | epoch: 019 | loss: 106.06054 - R2: 0.0208 -- iter: 0832/1168
Training Step: 356  | total loss: 105.87783 | time: 0.065s
| SGD | epoch: 019 | loss: 105.87783 - R2: 0.0209 -- iter: 0896/1168
Training Step: 357  | total loss: 105.87783 | time: 0.067s
| SGD | epoch: 019 | loss: 105.87783 - R2: 0.0211 -- iter: 0960/1168
Training Step: 358  | total loss: 105.71082 | time: 0.069s
| SGD | epoch: 019 | loss: 105.71082 - R2: 0.0212 -- i

Training Step: 408  | total loss: 100.53463 | time: 0.093s
| SGD | epoch: 022 | loss: 100.53463 - R2: 0.0280 -- iter: 0576/1168
Training Step: 409  | total loss: 100.42049 | time: 0.095s
| SGD | epoch: 022 | loss: 100.42049 - R2: 0.0281 -- iter: 0640/1168
Training Step: 410  | total loss: 100.19537 | time: 0.097s
| SGD | epoch: 022 | loss: 100.19537 - R2: 0.0283 -- iter: 0704/1168
Training Step: 411  | total loss: 100.15783 | time: 0.100s
| SGD | epoch: 022 | loss: 100.15783 - R2: 0.0284 -- iter: 0768/1168
Training Step: 412  | total loss: 100.03988 | time: 0.103s
| SGD | epoch: 022 | loss: 100.03988 - R2: 0.0286 -- iter: 0832/1168
Training Step: 413  | total loss: 100.05118 | time: 0.105s
| SGD | epoch: 022 | loss: 100.05118 - R2: 0.0287 -- iter: 0896/1168
Training Step: 414  | total loss: 99.87553 | time: 0.108s
| SGD | epoch: 022 | loss: 99.87553 - R2: 0.0288 -- ite

Training Step: 465  | total loss: 94.05252 | time: 0.106s
| SGD | epoch: 025 | loss: 94.05252 - R2: 0.0372 -- iter: 0576/1168
Training Step: 466  | total loss: 94.05252 | time: 0.108s
| SGD | epoch: 025 | loss: 94.05252 - R2: 0.0373 -- iter: 0640/1168
Training Step: 467  | total loss: 93.81573 | time: 0.110s
| SGD | epoch: 025 | loss: 93.81573 - R2: 0.0376 -- iter: 0704/1168
Training Step: 468  | total loss: 93.67560 | time: 0.112s
| SGD | epoch: 025 | loss: 93.67560 - R2: 0.0377 -- iter: 0768/1168
Training Step: 469  | total loss: 93.57709 | time: 0.114s
| SGD | epoch: 025 | loss: 93.57709 - R2: 0.0379 -- iter: 0832/1168
Training Step: 470  | total loss: 93.54059 | time: 0.116s
| SGD | epoch: 025 | loss: 93.54059 - R2: 0.0381 -- iter: 0896/1168
Training Step: 471  | total loss: 93.50279 | time: 0.117s
| SGD | epoch: 025 | loss: 93.50279 - R2: 0.0382 -- iter: 0960/1168

Training Step: 522  | total loss: 88.71466 | time: 0.083s
| SGD | epoch: 028 | loss: 88.71466 - R2: 0.0478 -- iter: 0576/1168
Training Step: 523  | total loss: 88.51614 | time: 0.084s
| SGD | epoch: 028 | loss: 88.51614 - R2: 0.0480 -- iter: 0640/1168
Training Step: 524  | total loss: 88.34012 | time: 0.086s
| SGD | epoch: 028 | loss: 88.34012 - R2: 0.0482 -- iter: 0704/1168
Training Step: 525  | total loss: 88.25346 | time: 0.088s
| SGD | epoch: 028 | loss: 88.25346 - R2: 0.0484 -- iter: 0768/1168
Training Step: 526  | total loss: 87.96184 | time: 0.089s
| SGD | epoch: 028 | loss: 87.96184 - R2: 0.0487 -- iter: 0832/1168
Training Step: 527  | total loss: 87.94492 | time: 0.091s
| SGD | epoch: 028 | loss: 87.94492 - R2: 0.0489 -- iter: 0896/1168
Training Step: 528  | total loss: 87.87538 | time: 0.093s
| SGD | epoch: 028 | loss: 87.87538 - R2: 0.0491 -- iter: 0960/1168

Training Step: 579  | total loss: 82.21805 | time: 0.032s
| SGD | epoch: 031 | loss: 82.21805 - R2: 0.0609 -- iter: 0576/1168
Training Step: 580  | total loss: 81.45316 | time: 0.036s
| SGD | epoch: 031 | loss: 81.45316 - R2: 0.0613 -- iter: 0640/1168
Training Step: 581  | total loss: 81.37074 | time: 0.039s
| SGD | epoch: 031 | loss: 81.37074 - R2: 0.0617 -- iter: 0704/1168
Training Step: 582  | total loss: 81.37074 | time: 0.041s
| SGD | epoch: 031 | loss: 81.37074 - R2: 0.0620 -- iter: 0768/1168
Training Step: 583  | total loss: 81.22315 | time: 0.043s
| SGD | epoch: 031 | loss: 81.22315 - R2: 0.0622 -- iter: 0832/1168
Training Step: 584  | total loss: 81.19643 | time: 0.045s
| SGD | epoch: 031 | loss: 81.19643 - R2: 0.0624 -- iter: 0896/1168
Training Step: 585  | total loss: 81.11991 | time: 0.046s
| SGD | epoch: 031 | loss: 81.11991 - R2: 0.0627 -- iter: 0960/1168

Training Step: 636  | total loss: 75.69452 | time: 0.084s
| SGD | epoch: 034 | loss: 75.69452 - R2: 0.0768 -- iter: 0576/1168
Training Step: 637  | total loss: 75.55344 | time: 0.086s
| SGD | epoch: 034 | loss: 75.55344 - R2: 0.0770 -- iter: 0640/1168
Training Step: 638  | total loss: 75.55344 | time: 0.088s
| SGD | epoch: 034 | loss: 75.55344 - R2: 0.0773 -- iter: 0704/1168
Training Step: 639  | total loss: 75.42263 | time: 0.094s
| SGD | epoch: 034 | loss: 75.42263 - R2: 0.0776 -- iter: 0768/1168
Training Step: 640  | total loss: 75.43690 | time: 0.098s
| SGD | epoch: 034 | loss: 75.43690 - R2: 0.0778 -- iter: 0832/1168
Training Step: 641  | total loss: 75.42479 | time: 0.103s
| SGD | epoch: 034 | loss: 75.42479 - R2: 0.0781 -- iter: 0896/1168
Training Step: 642  | total loss: 75.37476 | time: 0.108s
| SGD | epoch: 034 | loss: 75.37476 - R2: 0.0783 -- iter: 0960/1168

Training Step: 693  | total loss: 68.68979 | time: 0.137s
| SGD | epoch: 037 | loss: 68.68979 - R2: 0.0964 -- iter: 0576/1168
Training Step: 694  | total loss: 68.63064 | time: 0.140s
| SGD | epoch: 037 | loss: 68.63064 - R2: 0.0967 -- iter: 0640/1168
Training Step: 695  | total loss: 68.58636 | time: 0.142s
| SGD | epoch: 037 | loss: 68.58636 - R2: 0.0970 -- iter: 0704/1168
Training Step: 696  | total loss: 68.41179 | time: 0.146s
| SGD | epoch: 037 | loss: 68.41179 - R2: 0.0975 -- iter: 0768/1168
Training Step: 697  | total loss: 68.41179 | time: 0.150s
| SGD | epoch: 037 | loss: 68.41179 - R2: 0.0977 -- iter: 0832/1168
Training Step: 698  | total loss: 68.15466 | time: 0.153s
| SGD | epoch: 037 | loss: 68.15466 - R2: 0.0981 -- iter: 0896/1168
Training Step: 699  | total loss: 68.15466 | time: 0.156s
| SGD | epoch: 037 | loss: 68.15466 - R2: 0.0985 -- iter: 0960/1168

Training Step: 750  | total loss: 61.89136 | time: 0.025s
| SGD | epoch: 040 | loss: 61.89136 - R2: 0.1202 -- iter: 0576/1168
Training Step: 751  | total loss: 61.87102 | time: 0.028s
| SGD | epoch: 040 | loss: 61.87102 - R2: 0.1205 -- iter: 0640/1168
Training Step: 752  | total loss: 61.68518 | time: 0.033s
| SGD | epoch: 040 | loss: 61.68518 - R2: 0.1208 -- iter: 0704/1168
Training Step: 753  | total loss: 61.64745 | time: 0.037s
| SGD | epoch: 040 | loss: 61.64745 - R2: 0.1214 -- iter: 0768/1168
Training Step: 754  | total loss: 61.64745 | time: 0.047s
| SGD | epoch: 040 | loss: 61.64745 - R2: 0.1217 -- iter: 0832/1168
Training Step: 755  | total loss: 61.27486 | time: 0.052s
| SGD | epoch: 040 | loss: 61.27486 - R2: 0.1224 -- iter: 0896/1168
Training Step: 756  | total loss: 61.12571 | time: 0.056s
| SGD | epoch: 040 | loss: 61.12571 - R2: 0.1228 -- iter: 0960/1168

Training Step: 807  | total loss: 54.47004 | time: 0.120s
| SGD | epoch: 043 | loss: 54.47004 - R2: 0.1502 -- iter: 0576/1168
Training Step: 808  | total loss: 54.32015 | time: 0.123s
| SGD | epoch: 043 | loss: 54.32015 - R2: 0.1506 -- iter: 0640/1168
Training Step: 809  | total loss: 54.16050 | time: 0.132s
| SGD | epoch: 043 | loss: 54.16050 - R2: 0.1512 -- iter: 0704/1168
Training Step: 810  | total loss: 54.07117 | time: 0.136s
| SGD | epoch: 043 | loss: 54.07117 - R2: 0.1518 -- iter: 0768/1168
Training Step: 811  | total loss: 54.07117 | time: 0.139s
| SGD | epoch: 043 | loss: 54.07117 - R2: 0.1523 -- iter: 0832/1168
Training Step: 812  | total loss: 53.90161 | time: 0.147s
| SGD | epoch: 043 | loss: 53.90161 - R2: 0.1530 -- iter: 0896/1168
Training Step: 813  | total loss: 53.33619 | time: 0.152s
| SGD | epoch: 043 | loss: 53.33619 - R2: 0.1541 -- iter: 0960/1168

Training Step: 864  | total loss: 46.76671 | time: 0.080s
| SGD | epoch: 046 | loss: 46.76671 - R2: 0.1876 -- iter: 0576/1168
Training Step: 865  | total loss: 46.68961 | time: 0.082s
| SGD | epoch: 046 | loss: 46.68961 - R2: 0.1882 -- iter: 0640/1168
Training Step: 866  | total loss: 46.29147 | time: 0.085s
| SGD | epoch: 046 | loss: 46.29147 - R2: 0.1891 -- iter: 0704/1168
Training Step: 867  | total loss: 46.19859 | time: 0.087s
| SGD | epoch: 046 | loss: 46.19859 - R2: 0.1899 -- iter: 0768/1168
Training Step: 868  | total loss: 46.19859 | time: 0.091s
| SGD | epoch: 046 | loss: 46.19859 - R2: 0.1906 -- iter: 0832/1168
Training Step: 869  | total loss: 45.95046 | time: 0.095s
| SGD | epoch: 046 | loss: 45.95046 - R2: 0.1916 -- iter: 0896/1168
Training Step: 870  | total loss: 45.72201 | time: 0.098s
| SGD | epoch: 046 | loss: 45.72201 - R2: 0.1922 -- iter: 0960/1168

Training Step: 921  | total loss: 38.49412 | time: 0.055s
| SGD | epoch: 049 | loss: 38.49412 - R2: 0.2352 -- iter: 0576/1168
Training Step: 922  | total loss: 38.22150 | time: 0.059s
| SGD | epoch: 049 | loss: 38.22150 - R2: 0.2361 -- iter: 0640/1168
Training Step: 923  | total loss: 38.09934 | time: 0.063s
| SGD | epoch: 049 | loss: 38.09934 - R2: 0.2371 -- iter: 0704/1168
Training Step: 924  | total loss: 38.09934 | time: 0.066s
| SGD | epoch: 049 | loss: 38.09934 - R2: 0.2379 -- iter: 0768/1168
Training Step: 925  | total loss: 38.03635 | time: 0.070s
| SGD | epoch: 049 | loss: 38.03635 - R2: 0.2386 -- iter: 0832/1168
Training Step: 926  | total loss: 37.91795 | time: 0.073s
| SGD | epoch: 049 | loss: 37.91795 - R2: 0.2394 -- iter: 0896/1168
Training Step: 927  | total loss: 37.74556 | time: 0.076s
| SGD | epoch: 049 | loss: 37.74556 - R2: 0.2403 -- iter: 0960/1168
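The runs are hard to compare by scrolling through this output. The second run's loss falls only from about 143 to about 38 over 49 epochs, while the third is below 0.05 by epoch 11. The sketch below pulls the per-batch `epoch`, `loss` and `R2` values out of the raw text so the curves can be tabulated or plotted. The regular expression is written against the line format visible here, and it skips lines truncated before the `R2` value. `parse_log` and `last_loss_per_epoch` are hypothetical helper names, not part of tflearn.

```python
import re

# Matches lines such as
# "| SGD | epoch: 049 | loss: 37.74556 - R2: 0.2403 -- iter: 0960/1168"
SGD_LINE = re.compile(r"epoch: (\d+) \| loss: ([\d.]+) - R2: ([\d.]+)")

def parse_log(text):
    """Return (epoch, loss, r2) tuples for every complete SGD line in text."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in SGD_LINE.finditer(text)
    ]

def last_loss_per_epoch(rows):
    """Map each epoch to the last loss logged for it."""
    per_epoch = {}
    for epoch, loss, _ in rows:
        per_epoch[epoch] = loss  # later lines overwrite earlier ones
    return per_epoch

sample = """\
| SGD | epoch: 001 | loss: 143.41374 - R2: 0.0000 -- iter: 1024/1168
| SGD | epoch: 001 | loss: 141.60745 - R2: 0.0000 | val_loss: 142.42757 - val_acc: 0.0001 -- iter: 1168/1168
| SGD | epoch: 002
| SGD | epoch: 049 | loss: 37.74556 - R2: 0.2403 -- iter: 0960/1168
"""

rows = parse_log(sample)
print(last_loss_per_epoch(rows))
```

For a full cell's output, `sample` would be replaced by the captured log text. Feeding each run separately gives one loss curve per configuration.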

Training Step: 26  | total loss: 121.03569 | time: 0.078s
| SGD | epoch: 002 | loss: 121.03569 - R2: 0.0081 -- iter: 0448/1168
Training Step: 27  | total loss: 119.63425 | time: 0.081s
| SGD | epoch: 002 | loss: 119.63425 - R2: 0.0088 -- iter: 0512/1168
Training Step: 28  | total loss: 118.68800 | time: 0.083s
| SGD | epoch: 002 | loss: 118.68800 - R2: 0.0103 -- iter: 0576/1168
Training Step: 29  | total loss: 117.38301 | time: 0.087s
| SGD | epoch: 002 | loss: 117.38301 - R2: 0.0103 -- iter: 0640/1168
Training Step: 30  | total loss: 116.43993 | time: 0.091s
| SGD | epoch: 002 | loss: 116.43993 - R2: 0.0111 -- iter: 0704/1168
Training Step: 31  | total loss: 114.81978 | time: 0.093s
| SGD | epoch: 002 | loss: 114.81978 - R2: 0.0127 -- iter: 0768/1168
Training Step: 32  | total loss: 114.81978 | time: 0.096s
| SGD | epoch: 002 | loss: 114.81978 - R2: 0.0127 -- iter: 08

Training Step: 83  | total loss: 65.40740 | time: 0.130s
| SGD | epoch: 005 | loss: 65.40740 - R2: 0.1113 -- iter: 0448/1168
Training Step: 84  | total loss: 64.42296 | time: 0.139s
| SGD | epoch: 005 | loss: 64.42296 - R2: 0.1151 -- iter: 0512/1168
Training Step: 85  | total loss: 62.25517 | time: 0.145s
| SGD | epoch: 005 | loss: 62.25517 - R2: 0.1234 -- iter: 0576/1168
Training Step: 86  | total loss: 61.03180 | time: 0.149s
| SGD | epoch: 005 | loss: 61.03180 - R2: 0.1280 -- iter: 0640/1168
Training Step: 87  | total loss: 59.78436 | time: 0.164s
| SGD | epoch: 005 | loss: 59.78436 - R2: 0.1328 -- iter: 0704/1168
Training Step: 88  | total loss: 58.63967 | time: 0.170s
| SGD | epoch: 005 | loss: 58.63967 - R2: 0.1376 -- iter: 0768/1168
Training Step: 89  | total loss: 58.63967 | time: 0.180s
| SGD | epoch: 005 | loss: 58.63967 - R2: 0.1376 -- iter: 0832/1168

Training Step: 140  | total loss: 2.98776 | time: 0.095s
| SGD | epoch: 008 | loss: 2.98776 - R2: 0.8588 -- iter: 0448/1168
Training Step: 141  | total loss: 2.69816 | time: 0.099s
| SGD | epoch: 008 | loss: 2.69816 - R2: 0.8734 -- iter: 0512/1168
Training Step: 142  | total loss: 2.69816 | time: 0.101s
| SGD | epoch: 008 | loss: 2.69816 - R2: 0.8734 -- iter: 0576/1168
Training Step: 143  | total loss: 2.43573 | time: 0.102s
| SGD | epoch: 008 | loss: 2.43573 - R2: 0.8859 -- iter: 0640/1168
Training Step: 144  | total loss: 2.19875 | time: 0.105s
| SGD | epoch: 008 | loss: 2.19875 - R2: 0.8972 -- iter: 0704/1168
Training Step: 145  | total loss: 1.79294 | time: 0.119s
| SGD | epoch: 008 | loss: 1.79294 - R2: 0.9169 -- iter: 0768/1168
Training Step: 146  | total loss: 1.79294 | time: 0.130s
| SGD | epoch: 008 | loss: 1.79294 - R2: 0.9256 -- iter: 0832/1168

Training Step: 198  | total loss: 0.04284 | time: 0.048s
| SGD | epoch: 011 | loss: 0.04284 - R2: 0.9998 -- iter: 0512/1168
Training Step: 199  | total loss: 0.04082 | time: 0.056s
| SGD | epoch: 011 | loss: 0.04082 - R2: 0.9998 -- iter: 0576/1168
Training Step: 200  | total loss: 0.03919 | time: 0.060s
| SGD | epoch: 011 | loss: 0.03919 - R2: 0.9992 -- iter: 0640/1168
Training Step: 201  | total loss: 0.03653 | time: 0.063s
| SGD | epoch: 011 | loss: 0.03653 - R2: 0.9999 -- iter: 0704/1168
Training Step: 202  | total loss: 0.03483 | time: 0.066s
| SGD | epoch: 011 | loss: 0.03483 - R2: 0.9997 -- iter: 0768/1168
Training Step: 203  | total loss: 0.03281 | time: 0.074s
| SGD | epoch: 011 | loss: 0.03281 - R2: 0.9997 -- iter: 0832/1168
Training Step: 204  | total loss: 0.03281 | time: 0.087s
| SGD | epoch: 011 | loss: 0.03281 - R2: 0.9997 -- iter: 0896/1168
Training Step

Training Step: 256  | total loss: 0.02952 | time: 0.069s
| SGD | epoch: 014 | loss: 0.02952 - R2: 0.9996 -- iter: 0576/1168
Training Step: 257  | total loss: 0.02845 | time: 0.073s
| SGD | epoch: 014 | loss: 0.02845 - R2: 0.9998 -- iter: 0640/1168
Training Step: 258  | total loss: 0.02696 | time: 0.076s
| SGD | epoch: 014 | loss: 0.02696 - R2: 0.9999 -- iter: 0704/1168
Training Step: 259  | total loss: 0.02663 | time: 0.098s
| SGD | epoch: 014 | loss: 0.02663 - R2: 0.9998 -- iter: 0768/1168
Training Step: 260  | total loss: 0.02584 | time: 0.122s
| SGD | epoch: 014 | loss: 0.02584 - R2: 0.9998 -- iter: 0832/1168
Training Step: 261  | total loss: 0.02501 | time: 0.126s
| SGD | epoch: 014 | loss: 0.02501 - R2: 0.9998 -- iter: 0896/1168
Training Step: 262  | total loss: 0.02501 | time: 0.134s
| SGD | epoch: 014 | loss: 0.02501 - R2: 0.9998 -- iter: 0960/1168

Training Step: 314  | total loss: 0.01647 | time: 0.103s
| SGD | epoch: 017 | loss: 0.01647 - R2: 0.9997 -- iter: 0640/1168
Training Step: 315  | total loss: 0.01647 | time: 0.106s
| SGD | epoch: 017 | loss: 0.01647 - R2: 0.9997 -- iter: 0704/1168
Training Step: 316  | total loss: 0.01760 | time: 0.119s
| SGD | epoch: 017 | loss: 0.01760 - R2: 1.0000 -- iter: 0768/1168
Training Step: 317  | total loss: 0.01834 | time: 0.125s
| SGD | epoch: 017 | loss: 0.01834 - R2: 1.0003 -- iter: 0832/1168
Training Step: 318  | total loss: 0.02067 | time: 0.130s
| SGD | epoch: 017 | loss: 0.02067 - R2: 1.0003 -- iter: 0896/1168
Training Step: 319  | total loss: 0.02059 | time: 0.134s
| SGD | epoch: 017 | loss: 0.02059 - R2: 0.9994 -- iter: 0960/1168
Training Step: 320  | total loss: 0.02011 | time: 0.138s
| SGD | epoch: 017 | loss: 0.02011 - R2: 0.9992 -- iter: 1024/1168

Training Step: 372  | total loss: 0.02078 | time: 0.089s
| SGD | epoch: 020 | loss: 0.02078 - R2: 1.0001 -- iter: 0704/1168
Training Step: 373  | total loss: 0.02230 | time: 0.093s
| SGD | epoch: 020 | loss: 0.02230 - R2: 1.0005 -- iter: 0768/1168
Training Step: 374  | total loss: 0.02146 | time: 0.097s
| SGD | epoch: 020 | loss: 0.02146 - R2: 1.0002 -- iter: 0832/1168
Training Step: 375  | total loss: 0.02228 | time: 0.104s
| SGD | epoch: 020 | loss: 0.02228 - R2: 0.9999 -- iter: 0896/1168
Training Step: 376  | total loss: 0.02269 | time: 0.107s
| SGD | epoch: 020 | loss: 0.02269 - R2: 0.9998 -- iter: 0960/1168
Training Step: 377  | total loss: 0.02269 | time: 0.111s
| SGD | epoch: 020 | loss: 0.02269 - R2: 0.9998 -- iter: 1024/1168
Training Step: 378  | total loss: 0.02189 | time: 0.120s
| SGD | epoch: 020 | loss: 0.02189 - R2: 0.9995 -- iter: 1088/1168

Training Step: 430  | total loss: 0.04278 | time: 0.081s
| SGD | epoch: 023 | loss: 0.04278 - R2: 0.9991 -- iter: 0768/1168
Training Step: 431  | total loss: 0.03748 | time: 0.085s
| SGD | epoch: 023 | loss: 0.03748 - R2: 0.9994 -- iter: 0832/1168
Training Step: 432  | total loss: 0.03588 | time: 0.091s
| SGD | epoch: 023 | loss: 0.03588 - R2: 0.9995 -- iter: 0896/1168
Training Step: 433  | total loss: 0.03588 | time: 0.094s
| SGD | epoch: 023 | loss: 0.03588 - R2: 0.9995 -- iter: 0960/1168
Training Step: 434  | total loss: 0.03394 | time: 0.097s
| SGD | epoch: 023 | loss: 0.03394 - R2: 0.9995 -- iter: 1024/1168
Training Step: 435  | total loss: 0.03268 | time: 0.102s
| SGD | epoch: 023 | loss: 0.03268 - R2: 0.9996 -- iter: 1088/1168
Training Step: 436  | total loss: 0.03104 | time: 0.105s
| SGD | epoch: 023 | loss: 0.03104 - R2: 0.9996 -- iter: 1152/1168

Training Step: 488  | total loss: 0.02738 | time: 0.143s
| SGD | epoch: 026 | loss: 0.02738 - R2: 0.9994 -- iter: 0832/1168
Training Step: 489  | total loss: 0.02636 | time: 0.150s
| SGD | epoch: 026 | loss: 0.02636 - R2: 0.9996 -- iter: 0896/1168
Training Step: 490  | total loss: 0.02411 | time: 0.154s
| SGD | epoch: 026 | loss: 0.02411 - R2: 0.9997 -- iter: 0960/1168
Training Step: 491  | total loss: 0.02411 | time: 0.160s
| SGD | epoch: 026 | loss: 0.02411 - R2: 0.9997 -- iter: 1024/1168
Training Step: 492  | total loss: 0.02372 | time: 0.163s
| SGD | epoch: 026 | loss: 0.02372 - R2: 0.9998 -- iter: 1088/1168
Training Step: 493  | total loss: 0.02269 | time: 0.167s
| SGD | epoch: 026 | loss: 0.02269 - R2: 0.9998 -- iter: 1152/1168
Training Step: 494  | total loss: 0.02250 | time: 1.178s
| SGD | epoch: 026 | loss: 0.02250 - R2: 0.9988 | val_loss: 0.02961 - val_acc: 1

Training Step: 546  | total loss: 0.02578 | time: 0.066s
| SGD | epoch: 029 | loss: 0.02578 - R2: 0.9999 -- iter: 0896/1168
Training Step: 547  | total loss: 0.02448 | time: 0.070s
| SGD | epoch: 029 | loss: 0.02448 - R2: 1.0001 -- iter: 0960/1168
Training Step: 548  | total loss: 0.04440 | time: 0.073s
| SGD | epoch: 029 | loss: 0.04440 - R2: 1.0002 -- iter: 1024/1168
Training Step: 549  | total loss: 0.04141 | time: 0.078s
| SGD | epoch: 029 | loss: 0.04141 - R2: 1.0000 -- iter: 1088/1168
Training Step: 550  | total loss: 0.03969 | time: 0.081s
| SGD | epoch: 029 | loss: 0.03969 - R2: 1.0000 -- iter: 1152/1168
Training Step: 551  | total loss: 0.03874 | time: 1.089s
| SGD | epoch: 029 | loss: 0.03874 - R2: 0.9997 | val_loss: 0.02878 - val_acc: 1.0023 -- iter: 1168/1168
--
Training Step: 552  | total loss: 0.03449 | time: 0.067s
| SGD | epoch: 030 | loss: 0.03449 - R2

Training Step: 604  | total loss: 0.02182 | time: 0.093s
| SGD | epoch: 032 | loss: 0.02182 - R2: 1.0001 -- iter: 0960/1168
Training Step: 605  | total loss: 0.02182 | time: 0.097s
| SGD | epoch: 032 | loss: 0.02182 - R2: 1.0001 -- iter: 1024/1168
Training Step: 606  | total loss: 0.02265 | time: 0.100s
| SGD | epoch: 032 | loss: 0.02265 - R2: 0.9999 -- iter: 1088/1168
Training Step: 607  | total loss: 0.02147 | time: 0.104s
| SGD | epoch: 032 | loss: 0.02147 - R2: 0.9999 -- iter: 1152/1168
Training Step: 608  | total loss: 0.02083 | time: 1.113s
| SGD | epoch: 032 | loss: 0.02083 - R2: 0.9996 | val_loss: 0.02911 - val_acc: 1.0034 -- iter: 1168/1168
--
Training Step: 609  | total loss: 0.02009 | time: 0.060s
| SGD | epoch: 033 | loss: 0.02009 - R2: 0.9995 -- iter: 0064/1168
Training Step: 610  | total loss: 0.02063 | time: 0.063s
| SGD | epoch: 033 | loss: 0.02063 - R2

Training Step: 662  | total loss: 0.02340 | time: 0.168s
| SGD | epoch: 035 | loss: 0.02340 - R2: 1.0003 -- iter: 1024/1168
Training Step: 663  | total loss: 0.02441 | time: 0.174s
| SGD | epoch: 035 | loss: 0.02441 - R2: 1.0005 -- iter: 1088/1168
Training Step: 664  | total loss: 0.02527 | time: 0.179s
| SGD | epoch: 035 | loss: 0.02527 - R2: 1.0002 -- iter: 1152/1168
Training Step: 665  | total loss: 0.02507 | time: 1.186s
| SGD | epoch: 035 | loss: 0.02507 - R2: 1.0002 | val_loss: 0.02715 - val_acc: 1.0016 -- iter: 1168/1168
--
Training Step: 666  | total loss: 0.02406 | time: 0.032s
| SGD | epoch: 036 | loss: 0.02406 - R2: 1.0001 -- iter: 0064/1168
Training Step: 667  | total loss: 0.02406 | time: 0.042s
| SGD | epoch: 036 | loss: 0.02406 - R2: 1.0001 -- iter: 0128/1168
Training Step: 668  | total loss: 0.04817 | time: 0.054s
| SGD | epoch: 036 | loss: 0.04817 - R2

Training Step: 720  | total loss: 0.02774 | time: 0.123s
| SGD | epoch: 038 | loss: 0.02774 - R2: 1.0000 -- iter: 1088/1168
Training Step: 721  | total loss: 0.02637 | time: 0.127s
| SGD | epoch: 038 | loss: 0.02637 - R2: 0.9998 -- iter: 1152/1168
Training Step: 722  | total loss: 0.02524 | time: 1.133s
| SGD | epoch: 038 | loss: 0.02524 - R2: 0.9995 | val_loss: 0.02794 - val_acc: 1.0032 -- iter: 1168/1168
--
Training Step: 723  | total loss: 0.02508 | time: 0.034s
| SGD | epoch: 039 | loss: 0.02508 - R2: 0.9998 -- iter: 0064/1168
Training Step: 724  | total loss: 0.02426 | time: 0.054s
| SGD | epoch: 039 | loss: 0.02426 - R2: 0.9993 -- iter: 0128/1168
Training Step: 725  | total loss: 0.02349 | time: 0.056s
| SGD | epoch: 039 | loss: 0.02349 - R2: 0.9997 -- iter: 0192/1168
Training Step: 726  | total loss: 0.02349 | time: 0.059s
| SGD | epoch: 039 | loss: 0.02349 - R2

Training Step: 778  | total loss: 0.02721 | time: 0.178s
| SGD | epoch: 041 | loss: 0.02721 - R2: 1.0004 -- iter: 1152/1168
Training Step: 779  | total loss: 0.03240 | time: 1.188s
| SGD | epoch: 041 | loss: 0.03240 - R2: 1.0016 | val_loss: 0.02827 - val_acc: 0.9957 -- iter: 1168/1168
--
Training Step: 780  | total loss: 0.03240 | time: 0.078s
| SGD | epoch: 042 | loss: 0.03240 - R2: 1.0016 -- iter: 0064/1168
Training Step: 781  | total loss: 0.03521 | time: 0.083s
| SGD | epoch: 042 | loss: 0.03521 - R2: 1.0019 -- iter: 0128/1168
Training Step: 782  | total loss: 0.03469 | time: 0.088s
| SGD | epoch: 042 | loss: 0.03469 - R2: 1.0002 -- iter: 0192/1168
Training Step: 783  | total loss: 0.03307 | time: 0.094s
| SGD | epoch: 042 | loss: 0.03307 - R2: 0.9997 -- iter: 0256/1168
Training Step: 784  | total loss: 0.03191 | time: 0.099s
| SGD | epoch: 042 | loss: 0.03191 - R2

Training Step: 836  | total loss: 0.03389 | time: 1.224s
| SGD | epoch: 044 | loss: 0.03389 - R2: 1.0000 | val_loss: 0.02818 - val_acc: 1.0028 -- iter: 1168/1168
--
Training Step: 837  | total loss: 0.03389 | time: 0.068s
| SGD | epoch: 045 | loss: 0.03389 - R2: 1.0000 -- iter: 0064/1168
Training Step: 838  | total loss: 0.03243 | time: 0.073s
| SGD | epoch: 045 | loss: 0.03243 - R2: 1.0000 -- iter: 0128/1168
Training Step: 839  | total loss: 0.03125 | time: 0.084s
| SGD | epoch: 045 | loss: 0.03125 - R2: 0.9997 -- iter: 0192/1168
Training Step: 840  | total loss: 0.02856 | time: 0.087s
| SGD | epoch: 045 | loss: 0.02856 - R2: 0.9993 -- iter: 0256/1168
Training Step: 841  | total loss: 0.02604 | time: 0.102s
| SGD | epoch: 045 | loss: 0.02604 - R2: 0.9993 -- iter: 0320/1168
Training Step: 842  | total loss: 0.02457 | time: 0.104s
| SGD | epoch: 045 | loss: 0.02457 - R2

Training Step: 894  | total loss: 0.02189 | time: 0.023s
| SGD | epoch: 048 | loss: 0.02189 - R2: 0.9999 -- iter: 0064/1168
Training Step: 895  | total loss: 0.02204 | time: 0.027s
| SGD | epoch: 048 | loss: 0.02204 - R2: 1.0000 -- iter: 0128/1168
Training Step: 896  | total loss: 0.02176 | time: 0.038s
| SGD | epoch: 048 | loss: 0.02176 - R2: 1.0003 -- iter: 0192/1168
Training Step: 897  | total loss: 0.02113 | time: 0.052s
| SGD | epoch: 048 | loss: 0.02113 - R2: 0.9999 -- iter: 0256/1168
Training Step: 898  | total loss: 0.02017 | time: 0.062s
| SGD | epoch: 048 | loss: 0.02017 - R2: 0.9998 -- iter: 0320/1168
Training Step: 899  | total loss: 0.01942 | time: 0.066s
| SGD | epoch: 048 | loss: 0.01942 - R2: 0.9999 -- iter: 0384/1168
Training Step: 900  | total loss: 0.02645 | time: 0.069s
| SGD | epoch: 048 | loss: 0.02645 - R2: 1.0016 -- iter: 0448/1168

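The step and `iter` counters in these logs can be reproduced from two figures the log itself reports or implies: 1,168 training and 292 validation samples (an 80/20 split of 1,460 rows) and 64 rows per step, read off the `iter` column. A short sketch of that bookkeeping; the constant and function names are illustrative and do not come from the notebook.

```python
import math

# Figures read off the log (assumed constant across the runs shown):
N_TRAIN = 1168  # "Training samples: 1168"
N_VAL = 292     # "Validation samples: 292"
BATCH = 64      # the "iter" column advances 64 rows per training step

# The last batch of an epoch is partial, so round up: 19 steps per epoch.
STEPS_PER_EPOCH = math.ceil(N_TRAIN / BATCH)


def epoch_of(step: int) -> int:
    """1-based epoch that a global training step falls in."""
    return math.ceil(step / STEPS_PER_EPOCH)


def iter_of(step: int) -> int:
    """Rows consumed in the current epoch after this step (the 'iter' column)."""
    step_in_epoch = (step - 1) % STEPS_PER_EPOCH + 1
    return min(step_in_epoch * BATCH, N_TRAIN)


print("validation fraction:", N_VAL / (N_TRAIN + N_VAL))
for step in (87, 494, 931, 932):
    print(f"Step {step}: epoch {epoch_of(step):03d}, iter {iter_of(step):04d}/{N_TRAIN}")
```

With 19 steps per epoch, step 931 is the last step of epoch 49 and step 932 opens epoch 50 at `iter: 0064`, as in the log; the final batch of each epoch holds only 1168 - 18 × 64 = 16 rows.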
---------------------------------
Training samples: 1168
Validation samples: 292
--
Training Step: 1  | time: 0.043s
| SGD | epoch: 001 | loss: 0.00000 - R2: 0.0000 -- iter: 0064/1168
Training Step: 2  | total loss: 129.04245 | time: 0.046s
| SGD | epoch: 001 | loss: 129.04245 - R2: 0.0000 -- iter: 0128/1168
Training Step: 3  | total loss: 132.48660 | time: 0.048s
| SGD | epoch: 001 | loss: 132.48660 - R2: 0.0014 -- iter: 0192/1168
Training Step: 4  | total loss: 124.09962 | time: 0.050s
| SGD | epoch: 001 | loss: 124.09962 - R2: 0.0054 -- iter: 0256/1168
Training Step: 5  | total loss: 115.59547 | time: 0.054s
| SGD | epoch: 001 | loss: 115.59547 - R2: 0.0116 -- iter: 0320/1168
Training Step: 6  | total loss: 99.43223 | time: 0.056s
| SGD | epoch: 001 | loss: 99.43223 - R2: 0.0302 -- iter: 0384/1168
Training Step: 7  | total loss: 92.18421 | time: 0.058s
| SGD | epoch: 001 | loss: 92.

Training Step: 59  | total loss: 0.03424 | time: 0.023s
| SGD | epoch: 004 | loss: 0.03424 - R2: 1.0014 -- iter: 0128/1168
Training Step: 60  | total loss: 0.03313 | time: 0.026s
| SGD | epoch: 004 | loss: 0.03313 - R2: 1.0015 -- iter: 0192/1168
Training Step: 61  | total loss: 0.03049 | time: 0.031s
| SGD | epoch: 004 | loss: 0.03049 - R2: 1.0009 -- iter: 0256/1168
Training Step: 62  | total loss: 0.03049 | time: 0.033s
| SGD | epoch: 004 | loss: 0.03049 - R2: 1.0009 -- iter: 0320/1168
Training Step: 63  | total loss: 0.02816 | time: 0.038s
| SGD | epoch: 004 | loss: 0.02816 - R2: 0.9996 -- iter: 0384/1168
Training Step: 64  | total loss: 0.02730 | time: 0.043s
| SGD | epoch: 004 | loss: 0.02730 - R2: 0.9990 -- iter: 0448/1168
Training Step: 65  | total loss: 0.02730 | time: 0.045s
| SGD | epoch: 004 | loss: 0.02730 - R2: 0.9987 -- iter: 0512/1168

Training Step: 117  | total loss: 0.02486 | time: 0.018s
| SGD | epoch: 007 | loss: 0.02486 - R2: 0.9987 -- iter: 0192/1168
Training Step: 118  | total loss: 0.02486 | time: 0.022s
| SGD | epoch: 007 | loss: 0.02486 - R2: 0.9987 -- iter: 0256/1168
Training Step: 119  | total loss: 0.02446 | time: 0.026s
| SGD | epoch: 007 | loss: 0.02446 - R2: 0.9987 -- iter: 0320/1168
Training Step: 120  | total loss: 0.02449 | time: 0.029s
| SGD | epoch: 007 | loss: 0.02449 - R2: 0.9986 -- iter: 0384/1168
Training Step: 121  | total loss: 0.02447 | time: 0.035s
| SGD | epoch: 007 | loss: 0.02447 - R2: 0.9985 -- iter: 0448/1168
Training Step: 122  | total loss: 0.02348 | time: 0.037s
| SGD | epoch: 007 | loss: 0.02348 - R2: 0.9985 -- iter: 0512/1168
Training Step: 123  | total loss: 0.02415 | time: 0.040s
| SGD | epoch: 007 | loss: 0.02415 - R2: 0.9991 -- iter: 0576/1168

Training Step: 175  | total loss: 0.02224 | time: 0.031s
| SGD | epoch: 010 | loss: 0.02224 - R2: 0.9999 -- iter: 0256/1168
Training Step: 176  | total loss: 0.02177 | time: 0.034s
| SGD | epoch: 010 | loss: 0.02177 - R2: 1.0001 -- iter: 0320/1168
Training Step: 177  | total loss: 0.02183 | time: 0.037s
| SGD | epoch: 010 | loss: 0.02183 - R2: 0.9999 -- iter: 0384/1168
Training Step: 178  | total loss: 0.02125 | time: 0.040s
| SGD | epoch: 010 | loss: 0.02125 - R2: 0.9999 -- iter: 0448/1168
Training Step: 179  | total loss: 0.02086 | time: 0.044s
| SGD | epoch: 010 | loss: 0.02086 - R2: 0.9995 -- iter: 0512/1168
Training Step: 180  | total loss: 0.02056 | time: 0.047s
| SGD | epoch: 010 | loss: 0.02056 - R2: 0.9995 -- iter: 0576/1168
Training Step: 181  | total loss: 0.02029 | time: 0.053s
| SGD | epoch: 010 | loss: 0.02029 - R2: 0.9993 -- iter: 0640/1168

Training Step: 233  | total loss: 0.04148 | time: 0.016s
| SGD | epoch: 013 | loss: 0.04148 - R2: 1.0003 -- iter: 0320/1168
Training Step: 234  | total loss: 0.03925 | time: 0.021s
| SGD | epoch: 013 | loss: 0.03925 - R2: 0.9998 -- iter: 0384/1168
Training Step: 235  | total loss: 0.03925 | time: 0.024s
| SGD | epoch: 013 | loss: 0.03925 - R2: 0.9998 -- iter: 0448/1168
Training Step: 236  | total loss: 0.03810 | time: 0.027s
| SGD | epoch: 013 | loss: 0.03810 - R2: 0.9995 -- iter: 0512/1168
Training Step: 237  | total loss: 0.03885 | time: 0.030s
| SGD | epoch: 013 | loss: 0.03885 - R2: 0.9997 -- iter: 0576/1168
Training Step: 238  | total loss: 0.03639 | time: 0.033s
| SGD | epoch: 013 | loss: 0.03639 - R2: 1.0000 -- iter: 0640/1168
Training Step: 239  | total loss: 0.03466 | time: 0.034s
| SGD | epoch: 013 | loss: 0.03466 - R2: 0.9997 -- iter: 0704/1168

Training Step: 291  | total loss: 0.04283 | time: 0.133s
| SGD | epoch: 016 | loss: 0.04283 - R2: 1.0004 -- iter: 0384/1168
Training Step: 292  | total loss: 0.04067 | time: 0.137s
| SGD | epoch: 016 | loss: 0.04067 - R2: 1.0001 -- iter: 0448/1168
Training Step: 293  | total loss: 0.03839 | time: 0.141s
| SGD | epoch: 016 | loss: 0.03839 - R2: 1.0003 -- iter: 0512/1168
Training Step: 294  | total loss: 0.03839 | time: 0.144s
| SGD | epoch: 016 | loss: 0.03839 - R2: 1.0003 -- iter: 0576/1168
Training Step: 295  | total loss: 0.03666 | time: 0.147s
| SGD | epoch: 016 | loss: 0.03666 - R2: 1.0002 -- iter: 0640/1168
Training Step: 296  | total loss: 0.03561 | time: 0.154s
| SGD | epoch: 016 | loss: 0.03561 - R2: 0.9998 -- iter: 0704/1168
Training Step: 297  | total loss: 0.03322 | time: 0.168s
| SGD | epoch: 016 | loss: 0.03322 - R2: 0.9999 -- iter: 0768/1168

Training Step: 349  | total loss: 0.02480 | time: 0.078s
| SGD | epoch: 019 | loss: 0.02480 - R2: 0.9994 -- iter: 0448/1168
Training Step: 350  | total loss: 0.02413 | time: 0.081s
| SGD | epoch: 019 | loss: 0.02413 - R2: 0.9994 -- iter: 0512/1168
Training Step: 351  | total loss: 0.02623 | time: 0.084s
| SGD | epoch: 019 | loss: 0.02623 - R2: 1.0000 -- iter: 0576/1168
Training Step: 352  | total loss: 0.02623 | time: 0.086s
| SGD | epoch: 019 | loss: 0.02623 - R2: 1.0000 -- iter: 0640/1168
Training Step: 353  | total loss: 0.02506 | time: 0.089s
| SGD | epoch: 019 | loss: 0.02506 - R2: 0.9997 -- iter: 0704/1168
Training Step: 354  | total loss: 0.02483 | time: 0.091s
| SGD | epoch: 019 | loss: 0.02483 - R2: 0.9998 -- iter: 0768/1168
Training Step: 355  | total loss: 0.02412 | time: 0.102s
| SGD | epoch: 019 | loss: 0.02412 - R2: 0.9998 -- iter: 0832/1168

Training Step: 407  | total loss: 0.02328 | time: 0.094s
| SGD | epoch: 022 | loss: 0.02328 - R2: 0.9996 -- iter: 0512/1168
Training Step: 408  | total loss: 0.02185 | time: 0.098s
| SGD | epoch: 022 | loss: 0.02185 - R2: 0.9997 -- iter: 0576/1168
Training Step: 409  | total loss: 0.02185 | time: 0.113s
| SGD | epoch: 022 | loss: 0.02185 - R2: 0.9997 -- iter: 0640/1168
Training Step: 410  | total loss: 0.02098 | time: 0.139s
| SGD | epoch: 022 | loss: 0.02098 - R2: 1.0000 -- iter: 0704/1168
Training Step: 411  | total loss: 0.02098 | time: 0.149s
| SGD | epoch: 022 | loss: 0.02098 - R2: 0.9998 -- iter: 0768/1168
Training Step: 412  | total loss: 0.02021 | time: 0.153s
| SGD | epoch: 022 | loss: 0.02021 - R2: 0.9999 -- iter: 0832/1168
Training Step: 413  | total loss: 0.02160 | time: 0.155s
| SGD | epoch: 022 | loss: 0.02160 - R2: 0.9997 -- iter: 0896/1168

Training Step: 465  | total loss: 0.02256 | time: 0.030s
| SGD | epoch: 025 | loss: 0.02256 - R2: 0.9988 -- iter: 0576/1168
Training Step: 466  | total loss: 0.02127 | time: 0.035s
| SGD | epoch: 025 | loss: 0.02127 - R2: 0.9989 -- iter: 0640/1168
Training Step: 467  | total loss: 0.02127 | time: 0.042s
| SGD | epoch: 025 | loss: 0.02127 - R2: 0.9989 -- iter: 0704/1168
Training Step: 468  | total loss: 0.02078 | time: 0.045s
| SGD | epoch: 025 | loss: 0.02078 - R2: 0.9992 -- iter: 0768/1168
Training Step: 469  | total loss: 0.02093 | time: 0.047s
| SGD | epoch: 025 | loss: 0.02093 - R2: 0.9989 -- iter: 0832/1168
Training Step: 470  | total loss: 0.02176 | time: 0.049s
| SGD | epoch: 025 | loss: 0.02176 - R2: 0.9988 -- iter: 0896/1168
Training Step: 471  | total loss: 0.02083 | time: 0.051s
| SGD | epoch: 025 | loss: 0.02083 - R2: 0.9988 -- iter: 0960/1168

Training Step: 523  | total loss: 0.02804 | time: 0.152s
| SGD | epoch: 028 | loss: 0.02804 - R2: 0.9977 -- iter: 0640/1168
Training Step: 524  | total loss: 0.02833 | time: 0.158s
| SGD | epoch: 028 | loss: 0.02833 - R2: 0.9980 -- iter: 0704/1168
Training Step: 525  | total loss: 0.02833 | time: 0.176s
| SGD | epoch: 028 | loss: 0.02833 - R2: 0.9980 -- iter: 0768/1168
Training Step: 526  | total loss: 0.02803 | time: 0.186s
| SGD | epoch: 028 | loss: 0.02803 - R2: 0.9985 -- iter: 0832/1168
Training Step: 527  | total loss: 0.02683 | time: 0.188s
| SGD | epoch: 028 | loss: 0.02683 - R2: 0.9985 -- iter: 0896/1168
Training Step: 528  | total loss: 0.02558 | time: 0.192s
| SGD | epoch: 028 | loss: 0.02558 - R2: 0.9987 -- iter: 0960/1168
Training Step: 529  | total loss: 0.02500 | time: 0.213s
| SGD | epoch: 028 | loss: 0.02500 - R2: 0.9988 -- iter: 1024/1168

Training Step: 581  | total loss: 0.03217 | time: 0.107s
| SGD | epoch: 031 | loss: 0.03217 - R2: 1.0010 -- iter: 0704/1168
Training Step: 582  | total loss: 0.03068 | time: 0.111s
| SGD | epoch: 031 | loss: 0.03068 - R2: 1.0012 -- iter: 0768/1168
Training Step: 583  | total loss: 0.02968 | time: 0.114s
| SGD | epoch: 031 | loss: 0.02968 - R2: 1.0008 -- iter: 0832/1168
Training Step: 584  | total loss: 0.02675 | time: 0.120s
| SGD | epoch: 031 | loss: 0.02675 - R2: 1.0000 -- iter: 0896/1168
Training Step: 585  | total loss: 0.02532 | time: 0.134s
| SGD | epoch: 031 | loss: 0.02532 - R2: 0.9997 -- iter: 0960/1168
Training Step: 586  | total loss: 0.02644 | time: 0.138s
| SGD | epoch: 031 | loss: 0.02644 - R2: 0.9999 -- iter: 1024/1168
Training Step: 587  | total loss: 0.02644 | time: 0.149s
| SGD | epoch: 031 | loss: 0.02644 - R2: 0.9999 -- iter: 1088/1168

Training Step: 639  | total loss: 0.03167 | time: 0.088s
| SGD | epoch: 034 | loss: 0.03167 - R2: 0.9990 -- iter: 0768/1168
Training Step: 640  | total loss: 0.02813 | time: 0.091s
| SGD | epoch: 034 | loss: 0.02813 - R2: 0.9992 -- iter: 0832/1168
Training Step: 641  | total loss: 0.02813 | time: 0.095s
| SGD | epoch: 034 | loss: 0.02813 - R2: 0.9992 -- iter: 0896/1168
Training Step: 642  | total loss: 0.02656 | time: 0.097s
| SGD | epoch: 034 | loss: 0.02656 - R2: 0.9993 -- iter: 0960/1168
Training Step: 643  | total loss: 0.02586 | time: 0.099s
| SGD | epoch: 034 | loss: 0.02586 - R2: 0.9991 -- iter: 1024/1168
Training Step: 644  | total loss: 0.02442 | time: 0.101s
| SGD | epoch: 034 | loss: 0.02442 - R2: 0.9991 -- iter: 1088/1168
Training Step: 645  | total loss: 0.02734 | time: 0.103s
| SGD | epoch: 034 | loss: 0.02734 - R2: 0.9998 -- iter: 1152/1168

Training Step: 697  | total loss: 0.02126 | time: 0.086s
| SGD | epoch: 037 | loss: 0.02126 - R2: 0.9996 -- iter: 0832/1168
Training Step: 698  | total loss: 0.02069 | time: 0.094s
| SGD | epoch: 037 | loss: 0.02069 - R2: 0.9997 -- iter: 0896/1168
Training Step: 699  | total loss: 0.02017 | time: 0.107s
| SGD | epoch: 037 | loss: 0.02017 - R2: 0.9991 -- iter: 0960/1168
Training Step: 700  | total loss: 0.01970 | time: 0.112s
| SGD | epoch: 037 | loss: 0.01970 - R2: 0.9984 -- iter: 1024/1168
Training Step: 701  | total loss: 0.01970 | time: 0.115s
| SGD | epoch: 037 | loss: 0.01970 - R2: 0.9984 -- iter: 1088/1168
Training Step: 702  | total loss: 0.02191 | time: 0.119s
| SGD | epoch: 037 | loss: 0.02191 - R2: 0.9987 -- iter: 1152/1168
Training Step: 703  | total loss: 0.02186 | time: 1.126s
| SGD | epoch: 037 | loss: 0.02186 - R2: 0.9984 | val_loss: 0.01886 - val_acc: 0

Training Step: 755  | total loss: 0.03699 | time: 0.162s
| SGD | epoch: 040 | loss: 0.03699 - R2: 0.9997 -- iter: 0896/1168
Training Step: 756  | total loss: 0.03494 | time: 0.164s
| SGD | epoch: 040 | loss: 0.03494 - R2: 1.0000 -- iter: 0960/1168
Training Step: 757  | total loss: 0.03156 | time: 0.169s
| SGD | epoch: 040 | loss: 0.03156 - R2: 0.9996 -- iter: 1024/1168
Training Step: 758  | total loss: 0.03156 | time: 0.176s
| SGD | epoch: 040 | loss: 0.03156 - R2: 0.9994 -- iter: 1088/1168
Training Step: 759  | total loss: 0.02998 | time: 0.179s
| SGD | epoch: 040 | loss: 0.02998 - R2: 0.9994 -- iter: 1152/1168
Training Step: 760  | total loss: 0.02859 | time: 1.183s
| SGD | epoch: 040 | loss: 0.02859 - R2: 0.9987 | val_loss: 0.01886 - val_acc: 0.9974 -- iter: 1168/1168
--
Training Step: 761  | total loss: 0.02734 | time: 0.031s
| SGD | epoch: 041 | loss: 0.02734 - R2

Training Step: 813  | total loss: 0.03551 | time: 0.132s
| SGD | epoch: 043 | loss: 0.03551 - R2: 0.9994 -- iter: 0960/1168
Training Step: 814  | total loss: 0.03380 | time: 0.140s
| SGD | epoch: 043 | loss: 0.03380 - R2: 0.9991 -- iter: 1024/1168
Training Step: 815  | total loss: 0.03246 | time: 0.145s
| SGD | epoch: 043 | loss: 0.03246 - R2: 0.9997 -- iter: 1088/1168
Training Step: 816  | total loss: 0.03070 | time: 0.150s
| SGD | epoch: 043 | loss: 0.03070 - R2: 0.9998 -- iter: 1152/1168
Training Step: 817  | total loss: 0.02995 | time: 1.167s
| SGD | epoch: 043 | loss: 0.02995 - R2: 0.9994 | val_loss: 0.01886 - val_acc: 0.9974 -- iter: 1168/1168
--
Training Step: 818  | total loss: 0.02995 | time: 0.082s
| SGD | epoch: 044 | loss: 0.02995 - R2: 0.9994 -- iter: 0064/1168
Training Step: 819  | total loss: 0.02858 | time: 0.085s
| SGD | epoch: 044 | loss: 0.02858 - R2

Training Step: 871  | total loss: 0.05010 | time: 0.044s
| SGD | epoch: 046 | loss: 0.05010 - R2: 0.9992 -- iter: 1024/1168
Training Step: 872  | total loss: 0.04650 | time: 0.046s
| SGD | epoch: 046 | loss: 0.04650 - R2: 0.9989 -- iter: 1088/1168
Training Step: 873  | total loss: 0.04417 | time: 0.049s
| SGD | epoch: 046 | loss: 0.04417 - R2: 0.9993 -- iter: 1152/1168
Training Step: 874  | total loss: 0.04012 | time: 1.057s
| SGD | epoch: 046 | loss: 0.04012 - R2: 0.9989 | val_loss: 0.01886 - val_acc: 0.9974 -- iter: 1168/1168
--
Training Step: 875  | total loss: 0.03852 | time: 0.007s
| SGD | epoch: 047 | loss: 0.03852 - R2: 0.9987 -- iter: 0064/1168
Training Step: 876  | total loss: 0.03852 | time: 0.011s
| SGD | epoch: 047 | loss: 0.03852 - R2: 0.9987 -- iter: 0128/1168
Training Step: 877  | total loss: 0.03571 | time: 0.015s
| SGD | epoch: 047 | loss: 0.03571 - R2

Training Step: 929  | total loss: 0.05765 | time: 0.184s
| SGD | epoch: 049 | loss: 0.05765 - R2: 0.9993 -- iter: 1088/1168
Training Step: 930  | total loss: 0.05336 | time: 0.187s
| SGD | epoch: 049 | loss: 0.05336 - R2: 0.9993 -- iter: 1152/1168
Training Step: 931  | total loss: 0.04881 | time: 1.193s
| SGD | epoch: 049 | loss: 0.04881 - R2: 0.9997 | val_loss: 0.01886 - val_acc: 0.9974 -- iter: 1168/1168
--
Training Step: 932  | total loss: 0.04881 | time: 0.035s
| SGD | epoch: 050 | loss: 0.04881 - R2: 0.9997 -- iter: 0064/1168
Training Step: 933  | total loss: 0.04506 | time: 0.040s
| SGD | epoch: 050 | loss: 0.04506 - R2: 0.9999 -- iter: 0128/1168
Training Step: 934  | total loss: 0.04248 | time: 0.052s
| SGD | epoch: 050 | loss: 0.04248 - R2: 0.9997 -- iter: 0192/1168
Training Step: 935  | total loss: 0.04020 | time: 0.065s
| SGD | epoch: 050 | loss: 0.04020 - R2

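Several of the logged `R2` values exceed 1.0 (for example 1.0014 at step 59 and 1.0016 at step 779). The coefficient of determination, R² = 1 - SS_res / SS_tot, cannot do that because SS_res ≥ 0, so the `R2` column (and `val_acc`, which appears to report the same metric on the validation split) is likely a different statistic and is worth cross-checking. A minimal sketch using scikit-learn's `r2_score` (already imported at the top of the notebook), assuming the network's validation predictions are available as arrays of log prices; the helper name and the synthetic data are hypothetical. The RMSE line additionally assumes the loss is mean squared error on the log scale, which this part of the log does not confirm.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def holdout_report(y_true_log, y_pred_log):
    """Standard metrics for log-price targets and predictions (hypothetical helper)."""
    y_true_log = np.asarray(y_true_log, dtype=float)
    y_pred_log = np.asarray(y_pred_log, dtype=float)
    return {
        # Coefficient of determination: 1 - SS_res / SS_tot, never above 1.
        "r2": float(r2_score(y_true_log, y_pred_log)),
        # Root mean squared error in log-price units.
        "rmse": float(np.sqrt(mean_squared_error(y_true_log, y_pred_log))),
    }


# Synthetic stand-ins for 292 validation targets and network outputs.
rng = np.random.default_rng(0)
y_val = rng.normal(loc=12.0, scale=0.4, size=292)
y_hat = y_val + rng.normal(scale=0.14, size=292)
print(holdout_report(y_val, y_hat))

# Only if val_loss is an MSE on log prices: RMSE = sqrt(val_loss).
print(round(float(np.sqrt(0.01886)), 4))
```

Under that MSE assumption, the `val_loss` of 0.01886 reported at every validation checkpoint shown from epoch 37 to 49 of the second run corresponds to an RMSE of about 0.1373 in log-price units.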
Training Step: 35  | total loss: 0.25084 | time: 0.099s
| SGD | epoch: 002 | loss: 0.25084 - R2: 0.9912 -- iter: 1024/1168
Training Step: 36  | total loss: 0.20783 | time: 0.102s
| SGD | epoch: 002 | loss: 0.20783 - R2: 0.9934 -- iter: 1088/1168
Training Step: 37  | total loss: 0.17345 | time: 0.103s
| SGD | epoch: 002 | loss: 0.17345 - R2: 0.9955 -- iter: 1152/1168
Training Step: 38  | total loss: 0.14962 | time: 1.107s
| SGD | epoch: 002 | loss: 0.14962 - R2: 0.9958 | val_loss: 0.02435 - val_acc: 0.9919 -- iter: 1168/1168
--
Training Step: 39  | total loss: 0.12463 | time: 0.021s
| SGD | epoch: 003 | loss: 0.12463 - R2: 0.9971 -- iter: 0064/1168
Training Step: 40  | total loss: 0.10835 | time: 0.038s
| SGD | epoch: 003 | loss: 0.10835 - R2: 0.9988 -- iter: 0128/1168
Training Step: 41  | total loss: 0.09314 | time: 0.040s
| SGD | epoch: 003 | loss: 0.09314 - R2: 0.999

Training Step: 93  | total loss: 0.02562 | time: 0.170s
| SGD | epoch: 005 | loss: 0.02562 - R2: 0.9998 -- iter: 1088/1168
Training Step: 94  | total loss: 0.02562 | time: 0.177s
| SGD | epoch: 005 | loss: 0.02562 - R2: 1.0003 -- iter: 1152/1168
Training Step: 95  | total loss: 0.02460 | time: 1.189s
| SGD | epoch: 005 | loss: 0.02460 - R2: 0.9999 | val_loss: 0.01837 - val_acc: 0.9949 -- iter: 1168/1168
--
Training Step: 96  | total loss: 0.02460 | time: 0.042s
| SGD | epoch: 006 | loss: 0.02460 - R2: 0.9999 -- iter: 0064/1168
Training Step: 97  | total loss: 0.02512 | time: 0.049s
| SGD | epoch: 006 | loss: 0.02512 - R2: 1.0001 -- iter: 0128/1168
Training Step: 98  | total loss: 0.02525 | time: 0.054s
| SGD | epoch: 006 | loss: 0.02525 - R2: 1.0000 -- iter: 0192/1168
Training Step: 99  | total loss: 0.02326 | time: 0.059s
| SGD | epoch: 006 | loss: 0.02326 - R2: 0.999

Training Step: 151  | total loss: 0.04108 | time: 0.108s
| SGD | epoch: 008 | loss: 0.04108 - R2: 0.9993 -- iter: 1152/1168
Training Step: 152  | total loss: 0.03921 | time: 1.113s
| SGD | epoch: 008 | loss: 0.03921 - R2: 0.9995 | val_loss: 0.01812 - val_acc: 0.9952 -- iter: 1168/1168
--
Training Step: 153  | total loss: 0.03770 | time: 0.032s
| SGD | epoch: 009 | loss: 0.03770 - R2: 0.9988 -- iter: 0064/1168
Training Step: 154  | total loss: 0.03847 | time: 0.037s
| SGD | epoch: 009 | loss: 0.03847 - R2: 0.9991 -- iter: 0128/1168
Training Step: 155  | total loss: 0.03622 | time: 0.041s
| SGD | epoch: 009 | loss: 0.03622 - R2: 0.9993 -- iter: 0192/1168
Training Step: 156  | total loss: 0.03356 | time: 0.044s
| SGD | epoch: 009 | loss: 0.03356 - R2: 0.9993 -- iter: 0256/1168
Training Step: 157  | total loss: 0.03134 | time: 0.046s
| SGD | epoch: 009 | loss: 0.03134 - R2

Training Step: 209  | total loss: 0.04570 | time: 1.123s
| SGD | epoch: 011 | loss: 0.04570 - R2: 0.9999 | val_loss: 0.01814 - val_acc: 0.9952 -- iter: 1168/1168
--
Training Step: 210  | total loss: 0.04570 | time: 0.069s
| SGD | epoch: 012 | loss: 0.04570 - R2: 0.9999 -- iter: 0064/1168
Training Step: 211  | total loss: 0.04271 | time: 0.072s
| SGD | epoch: 012 | loss: 0.04271 - R2: 1.0002 -- iter: 0128/1168
Training Step: 212  | total loss: 0.03775 | time: 0.087s
| SGD | epoch: 012 | loss: 0.03775 - R2: 1.0004 -- iter: 0192/1168
Training Step: 213  | total loss: 0.03775 | time: 0.094s
| SGD | epoch: 012 | loss: 0.03775 - R2: 1.0004 -- iter: 0256/1168
Training Step: 214  | total loss: 0.03532 | time: 0.099s
| SGD | epoch: 012 | loss: 0.03532 - R2: 1.0006 -- iter: 0320/1168
Training Step: 215  | total loss: 0.03305 | time: 0.105s
| SGD | epoch: 012 | loss: 0.03305 - R2

Training Step: 267  | total loss: 0.05557 | time: 0.033s
| SGD | epoch: 015 | loss: 0.05557 - R2: 0.9983 -- iter: 0064/1168
Training Step: 268  | total loss: 0.05557 | time: 0.037s
| SGD | epoch: 015 | loss: 0.05557 - R2: 0.9983 -- iter: 0128/1168
Training Step: 269  | total loss: 0.04828 | time: 0.042s
| SGD | epoch: 015 | loss: 0.04828 - R2: 0.9985 -- iter: 0192/1168
Training Step: 270  | total loss: 0.04479 | time: 0.046s
| SGD | epoch: 015 | loss: 0.04479 - R2: 0.9985 -- iter: 0256/1168
Training Step: 271  | total loss: 0.04304 | time: 0.050s
| SGD | epoch: 015 | loss: 0.04304 - R2: 0.9986 -- iter: 0320/1168
Training Step: 272  | total loss: 0.04304 | time: 0.052s
| SGD | epoch: 015 | loss: 0.04304 - R2: 0.9986 -- iter: 0384/1168
Training Step: 273  | total loss: 0.04093 | time: 0.055s
| SGD | epoch: 015 | loss: 0.04093 - R2: 0.9990 -- iter: 0448/1168
Training Step

Training Step: 325  | total loss: 0.02492 | time: 0.018s
| SGD | epoch: 018 | loss: 0.02492 - R2: 0.9995 -- iter: 0128/1168
Training Step: 326  | total loss: 0.02412 | time: 0.023s
| SGD | epoch: 018 | loss: 0.02412 - R2: 0.9993 -- iter: 0192/1168
Training Step: 327  | total loss: 0.02524 | time: 0.027s
| SGD | epoch: 018 | loss: 0.02524 - R2: 0.9998 -- iter: 0256/1168
Training Step: 328  | total loss: 0.02524 | time: 0.029s
| SGD | epoch: 018 | loss: 0.02524 - R2: 0.9998 -- iter: 0320/1168
Training Step: 329  | total loss: 0.02720 | time: 0.033s
| SGD | epoch: 018 | loss: 0.02720 - R2: 1.0002 -- iter: 0384/1168
Training Step: 330  | total loss: 0.02653 | time: 0.039s
| SGD | epoch: 018 | loss: 0.02653 - R2: 1.0002 -- iter: 0448/1168
Training Step: 331  | total loss: 0.02530 | time: 0.041s
| SGD | epoch: 018 | loss: 0.02530 - R2: 1.0001 -- iter: 0512/1168
Training Step

Training Step: 383  | total loss: 0.02511 | time: 0.028s
| SGD | epoch: 021 | loss: 0.02511 - R2: 0.9983 -- iter: 0192/1168
Training Step: 384  | total loss: 0.02392 | time: 0.030s
| SGD | epoch: 021 | loss: 0.02392 - R2: 0.9983 -- iter: 0256/1168
Training Step: 385  | total loss: 0.02362 | time: 0.033s
| SGD | epoch: 021 | loss: 0.02362 - R2: 0.9987 -- iter: 0320/1168
Training Step: 386  | total loss: 0.02596 | time: 0.036s
| SGD | epoch: 021 | loss: 0.02596 - R2: 0.9994 -- iter: 0384/1168
Training Step: 387  | total loss: 0.04556 | time: 0.040s
| SGD | epoch: 021 | loss: 0.04556 - R2: 1.0011 -- iter: 0448/1168
Training Step: 388  | total loss: 0.04556 | time: 0.044s
| SGD | epoch: 021 | loss: 0.04556 - R2: 1.0011 -- iter: 0512/1168
Training Step: 389  | total loss: 0.04337 | time: 0.046s
| SGD | epoch: 021 | loss: 0.04337 - R2: 1.0010 -- iter: 0576/1168
Training Step

Training Step: 441  | total loss: 0.02970 | time: 0.017s
| SGD | epoch: 024 | loss: 0.02970 - R2: 0.9997 -- iter: 0256/1168
Training Step: 442  | total loss: 0.02885 | time: 0.021s
| SGD | epoch: 024 | loss: 0.02885 - R2: 0.9993 -- iter: 0320/1168
Training Step: 443  | total loss: 0.02701 | time: 0.025s
| SGD | epoch: 024 | loss: 0.02701 - R2: 0.9991 -- iter: 0384/1168
Training Step: 444  | total loss: 0.02867 | time: 0.029s
| SGD | epoch: 024 | loss: 0.02867 - R2: 0.9998 -- iter: 0448/1168
Training Step: 445  | total loss: 0.03022 | time: 0.037s
| SGD | epoch: 024 | loss: 0.03022 - R2: 1.0003 -- iter: 0512/1168
Training Step: 446  | total loss: 0.02933 | time: 0.042s
| SGD | epoch: 024 | loss: 0.02933 - R2: 1.0004 -- iter: 0576/1168
Training Step: 447  | total loss: 0.02933 | time: 0.044s
| SGD | epoch: 024 | loss: 0.02933 - R2: 0.9990 -- iter: 0640/1168
Training Step

Training Step: 499  | total loss: 0.03319 | time: 0.033s
| SGD | epoch: 027 | loss: 0.03319 - R2: 0.9987 -- iter: 0320/1168
Training Step: 500  | total loss: 0.03259 | time: 0.036s
| SGD | epoch: 027 | loss: 0.03259 - R2: 0.9983 -- iter: 0384/1168
Training Step: 501  | total loss: 0.03259 | time: 0.038s
| SGD | epoch: 027 | loss: 0.03259 - R2: 0.9983 -- iter: 0448/1168
Training Step: 502  | total loss: 0.03112 | time: 0.044s
| SGD | epoch: 027 | loss: 0.03112 - R2: 0.9986 -- iter: 0512/1168
Training Step: 503  | total loss: 0.02903 | time: 0.053s
| SGD | epoch: 027 | loss: 0.02903 - R2: 0.9985 -- iter: 0576/1168
Training Step: 504  | total loss: 0.02815 | time: 0.055s
| SGD | epoch: 027 | loss: 0.02815 - R2: 0.9986 -- iter: 0640/1168
Training Step: 505  | total loss: 0.02803 | time: 0.057s
| SGD | epoch: 027 | loss: 0.02803 - R2: 0.9988 -- iter: 0704/1168
Training Step

Training Step: 557  | total loss: 0.03503 | time: 0.100s
| SGD | epoch: 030 | loss: 0.03503 - R2: 1.0005 -- iter: 0384/1168
Training Step: 558  | total loss: 0.03517 | time: 0.106s
| SGD | epoch: 030 | loss: 0.03517 - R2: 1.0005 -- iter: 0448/1168
Training Step: 559  | total loss: 0.03335 | time: 0.115s
| SGD | epoch: 030 | loss: 0.03335 - R2: 1.0002 -- iter: 0512/1168
Training Step: 560  | total loss: 0.03132 | time: 0.117s
| SGD | epoch: 030 | loss: 0.03132 - R2: 0.9996 -- iter: 0576/1168
Training Step: 561  | total loss: 0.02948 | time: 0.123s
| SGD | epoch: 030 | loss: 0.02948 - R2: 0.9991 -- iter: 0640/1168
Training Step: 562  | total loss: 0.02799 | time: 0.128s
| SGD | epoch: 030 | loss: 0.02799 - R2: 0.9988 -- iter: 0704/1168
Training Step: 563  | total loss: 0.02478 | time: 0.131s
| SGD | epoch: 030 | loss: 0.02478 - R2: 0.9985 -- iter: 0768/1168
Training Step

Training Step: 615  | total loss: 0.03464 | time: 0.116s
| SGD | epoch: 033 | loss: 0.03464 - R2: 0.9994 -- iter: 0448/1168
Training Step: 616  | total loss: 0.03282 | time: 0.119s
| SGD | epoch: 033 | loss: 0.03282 - R2: 0.9991 -- iter: 0512/1168
Training Step: 617  | total loss: 0.03057 | time: 0.123s
| SGD | epoch: 033 | loss: 0.03057 - R2: 0.9995 -- iter: 0576/1168
Training Step: 618  | total loss: 0.02881 | time: 0.134s
| SGD | epoch: 033 | loss: 0.02881 - R2: 0.9994 -- iter: 0640/1168
Training Step: 619  | total loss: 0.02881 | time: 0.147s
| SGD | epoch: 033 | loss: 0.02881 - R2: 0.9994 -- iter: 0704/1168
Training Step: 620  | total loss: 0.02508 | time: 0.151s
| SGD | epoch: 033 | loss: 0.02508 - R2: 0.9999 -- iter: 0768/1168
Training Step: 621  | total loss: 0.02519 | time: 0.156s
| SGD | epoch: 033 | loss: 0.02519 - R2: 0.9999 -- iter: 0832/1168
Training Step

Training Step: 673  | total loss: 0.02094 | time: 0.091s
| SGD | epoch: 036 | loss: 0.02094 - R2: 0.9984 -- iter: 0512/1168
Training Step: 674  | total loss: 0.02094 | time: 0.097s
| SGD | epoch: 036 | loss: 0.02094 - R2: 0.9994 -- iter: 0576/1168
Training Step: 675  | total loss: 0.02450 | time: 0.103s
| SGD | epoch: 036 | loss: 0.02450 - R2: 0.9997 -- iter: 0640/1168
Training Step: 676  | total loss: 0.02406 | time: 0.106s
| SGD | epoch: 036 | loss: 0.02406 - R2: 0.9997 -- iter: 0704/1168
Training Step: 677  | total loss: 0.02408 | time: 0.112s
| SGD | epoch: 036 | loss: 0.02408 - R2: 0.9997 -- iter: 0768/1168
Training Step: 678  | total loss: 0.02444 | time: 0.115s
| SGD | epoch: 036 | loss: 0.02444 - R2: 0.9997 -- iter: 0832/1168
Training Step: 679  | total loss: 0.02569 | time: 0.116s
| SGD | epoch: 036 | loss: 0.02569 - R2: 1.0000 -- iter: 0896/1168
Training Step

Training Step: 731  | total loss: 0.02101 | time: 0.069s
| SGD | epoch: 039 | loss: 0.02101 - R2: 0.9997 -- iter: 0576/1168
Training Step: 732  | total loss: 0.02029 | time: 0.072s
| SGD | epoch: 039 | loss: 0.02029 - R2: 0.9997 -- iter: 0640/1168
Training Step: 733  | total loss: 0.01943 | time: 0.082s
| SGD | epoch: 039 | loss: 0.01943 - R2: 0.9992 -- iter: 0704/1168
Training Step: 734  | total loss: 0.02106 | time: 0.085s
| SGD | epoch: 039 | loss: 0.02106 - R2: 0.9998 -- iter: 0768/1168
Training Step: 735  | total loss: 0.02464 | time: 0.099s
| SGD | epoch: 039 | loss: 0.02464 - R2: 1.0001 -- iter: 0832/1168
Training Step: 736  | total loss: 0.02358 | time: 0.105s
| SGD | epoch: 039 | loss: 0.02358 - R2: 1.0001 -- iter: 0896/1168
Training Step: 737  | total loss: 0.02302 | time: 0.110s
| SGD | epoch: 039 | loss: 0.02302 - R2: 0.9998 -- iter: 0960/1168
Training Step

Training Step: 789  | total loss: 0.02513 | time: 0.058s
| SGD | epoch: 042 | loss: 0.02513 - R2: 0.9994 -- iter: 0640/1168
Training Step: 790  | total loss: 0.02513 | time: 0.063s
| SGD | epoch: 042 | loss: 0.02513 - R2: 0.9994 -- iter: 0704/1168
Training Step: 791  | total loss: 0.02634 | time: 0.073s
| SGD | epoch: 042 | loss: 0.02634 - R2: 0.9999 -- iter: 0768/1168
Training Step: 792  | total loss: 0.02495 | time: 0.078s
| SGD | epoch: 042 | loss: 0.02495 - R2: 1.0005 -- iter: 0832/1168
Training Step: 793  | total loss: 0.02495 | time: 0.087s
| SGD | epoch: 042 | loss: 0.02495 - R2: 1.0005 -- iter: 0896/1168
Training Step: 794  | total loss: 0.02357 | time: 0.093s
| SGD | epoch: 042 | loss: 0.02357 - R2: 1.0003 -- iter: 0960/1168
Training Step: 795  | total loss: 0.02357 | time: 0.096s
| SGD | epoch: 042 | loss: 0.02357 - R2: 1.0003 -- iter: 1024/1168
Training Step

Training Step: 847  | total loss: 0.05293 | time: 0.081s
| SGD | epoch: 045 | loss: 0.05293 - R2: 0.9984 -- iter: 0704/1168
Training Step: 848  | total loss: 0.04908 | time: 0.087s
| SGD | epoch: 045 | loss: 0.04908 - R2: 0.9982 -- iter: 0768/1168
Training Step: 849  | total loss: 0.04600 | time: 0.093s
| SGD | epoch: 045 | loss: 0.04600 - R2: 0.9986 -- iter: 0832/1168
Training Step: 850  | total loss: 0.04357 | time: 0.097s
| SGD | epoch: 045 | loss: 0.04357 - R2: 0.9985 -- iter: 0896/1168
Training Step: 851  | total loss: 0.04344 | time: 0.102s
| SGD | epoch: 045 | loss: 0.04344 - R2: 0.9986 -- iter: 0960/1168
Training Step: 852  | total loss: 0.04085 | time: 0.107s
| SGD | epoch: 045 | loss: 0.04085 - R2: 0.9983 -- iter: 1024/1168
Training Step: 853  | total loss: 0.03884 | time: 0.111s
| SGD | epoch: 045 | loss: 0.03884 - R2: 0.9986 -- iter: 1088/1168
Training Step

Training Step: 905  | total loss: 0.02264 | time: 0.125s
| SGD | epoch: 048 | loss: 0.02264 - R2: 1.0008 -- iter: 0768/1168
Training Step: 906  | total loss: 0.02188 | time: 0.127s
| SGD | epoch: 048 | loss: 0.02188 - R2: 1.0006 -- iter: 0832/1168
Training Step: 907  | total loss: 0.02194 | time: 0.129s
| SGD | epoch: 048 | loss: 0.02194 - R2: 1.0004 -- iter: 0896/1168
Training Step: 908  | total loss: 0.04517 | time: 0.132s
| SGD | epoch: 048 | loss: 0.04517 - R2: 1.0000 -- iter: 0960/1168
Training Step: 909  | total loss: 0.04391 | time: 0.144s
| SGD | epoch: 048 | loss: 0.04391 - R2: 0.9999 -- iter: 1024/1168
Training Step: 910  | total loss: 0.04391 | time: 0.155s
| SGD | epoch: 048 | loss: 0.04391 - R2: 0.9999 -- iter: 1088/1168
Training Step: 911  | total loss: 0.03958 | time: 0.159s
| SGD | epoch: 048 | loss: 0.03958 - R2: 1.0000 -- iter: 1152/1168
Training Step

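Note that the "R2" column (and "val_acc", which is the same metric on the validation split) goes above 1.0 in many steps. The coefficient of determination, 1 - SS_res/SS_tot, can never exceed 1, so this is probably not the usual R². As far as I can tell, tflearn's `R2` metric is computed as sum(prediction²) / sum(target²); treat that as an assumption and check it against `tflearn/metrics.py`. If the labels are log prices (log(SalePrice) is roughly 12), that ratio stays close to 1 even for a model that only predicts the mean. A minimal NumPy sketch comparing the two quantities (the function names `r2_coefficient` and `sum_sq_ratio` and the toy targets are hypothetical, not from this notebook):

```python
import numpy as np


def r2_coefficient(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (never above 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def sum_sq_ratio(y_true, y_pred):
    """Assumed form of tflearn's 'R2': sum(y_pred**2) / sum(y_true**2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sum(y_pred ** 2) / np.sum(y_true ** 2)


# Hypothetical log-scale targets (log(SalePrice) is roughly 12).
y = np.array([11.5, 12.0, 12.5])

# A useless model that always predicts the mean of the targets.
mean_pred = np.full_like(y, y.mean())
print(r2_coefficient(y, mean_pred))  # 0.0
print(sum_sq_ratio(y, mean_pred))    # about 0.9988

# A model that overshoots every target by 0.1.
biased_pred = y + 0.1
print(r2_coefficient(y, biased_pred))  # about 0.94
print(sum_sq_ratio(y, biased_pred))    # above 1 (about 1.0167)
```

Because the ratio mostly reflects the size of the targets, it is probably safer to judge the network with `r2_score` from `sklearn.metrics` (already imported at the top) on the validation predictions.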
Training Step: 11  | total loss: 71.83313 | time: 0.100s
| SGD | epoch: 001 | loss: 71.83313 - R2: 0.0908 -- iter: 0704/1168
Training Step: 12  | total loss: 65.21367 | time: 0.103s
| SGD | epoch: 001 | loss: 65.21367 - R2: 0.1125 -- iter: 0768/1168
Training Step: 13  | total loss: 51.73347 | time: 0.109s
| SGD | epoch: 001 | loss: 51.73347 - R2: 0.1688 -- iter: 0832/1168
Training Step: 14  | total loss: 45.18029 | time: 0.115s
| SGD | epoch: 001 | loss: 45.18029 - R2: 0.2040 -- iter: 0896/1168
Training Step: 15  | total loss: 38.80954 | time: 0.124s
| SGD | epoch: 001 | loss: 38.80954 - R2: 0.2449 -- iter: 0960/1168
Training Step: 16  | total loss: 38.80954 | time: 0.126s
| SGD | epoch: 001 | loss: 38.80954 - R2: 0.2449 -- iter: 1024/1168
Training Step: 17  | total loss: 32.54639 | time: 0.129s
| SGD | epoch: 001 | loss: 32.54639 - R2: 0.2929 -- iter: 1088/1168
Traini

Training Step: 69  | total loss: 0.16620 | time: 0.142s
| SGD | epoch: 004 | loss: 0.16620 - R2: 0.9999 -- iter: 0768/1168
Training Step: 70  | total loss: 0.16754 | time: 0.148s
| SGD | epoch: 004 | loss: 0.16754 - R2: 1.0011 -- iter: 0832/1168
Training Step: 71  | total loss: 0.17113 | time: 0.156s
| SGD | epoch: 004 | loss: 0.17113 - R2: 0.9998 -- iter: 0896/1168
Training Step: 72  | total loss: 0.17113 | time: 0.160s
| SGD | epoch: 004 | loss: 0.17113 - R2: 0.9998 -- iter: 0960/1168
Training Step: 73  | total loss: 0.17469 | time: 0.162s
| SGD | epoch: 004 | loss: 0.17469 - R2: 1.0014 -- iter: 1024/1168
Training Step: 74  | total loss: 0.16630 | time: 0.166s
| SGD | epoch: 004 | loss: 0.16630 - R2: 1.0014 -- iter: 1088/1168
Training Step: 75  | total loss: 0.16598 | time: 0.169s
| SGD | epoch: 004 | loss: 0.16598 - R2: 1.0011 -- iter: 1152/1168
Training Step: 76  |

Training Step: 127  | total loss: 0.17213 | time: 0.139s
| SGD | epoch: 007 | loss: 0.17213 - R2: 1.0010 -- iter: 0832/1168
Training Step: 128  | total loss: 0.16558 | time: 0.142s
| SGD | epoch: 007 | loss: 0.16558 - R2: 1.0008 -- iter: 0896/1168
Training Step: 129  | total loss: 0.16558 | time: 0.147s
| SGD | epoch: 007 | loss: 0.16558 - R2: 1.0008 -- iter: 0960/1168
Training Step: 130  | total loss: 0.16136 | time: 0.152s
| SGD | epoch: 007 | loss: 0.16136 - R2: 1.0008 -- iter: 1024/1168
Training Step: 131  | total loss: 0.16136 | time: 0.155s
| SGD | epoch: 007 | loss: 0.16136 - R2: 1.0001 -- iter: 1088/1168
Training Step: 132  | total loss: 0.16941 | time: 0.160s
| SGD | epoch: 007 | loss: 0.16941 - R2: 0.9998 -- iter: 1152/1168
Training Step: 133  | total loss: 0.16656 | time: 1.172s
| SGD | epoch: 007 | loss: 0.16656 - R2: 0.9998 | val_loss: 0.15110 - val_acc: 1

Training Step: 185  | total loss: 0.16910 | time: 0.108s
| SGD | epoch: 010 | loss: 0.16910 - R2: 0.9996 -- iter: 0896/1168
Training Step: 186  | total loss: 0.16937 | time: 0.113s
| SGD | epoch: 010 | loss: 0.16937 - R2: 0.9992 -- iter: 0960/1168
Training Step: 187  | total loss: 0.16714 | time: 0.122s
| SGD | epoch: 010 | loss: 0.16714 - R2: 0.9997 -- iter: 1024/1168
Training Step: 188  | total loss: 0.16714 | time: 0.127s
| SGD | epoch: 010 | loss: 0.16714 - R2: 0.9997 -- iter: 1088/1168
Training Step: 189  | total loss: 0.16548 | time: 0.131s
| SGD | epoch: 010 | loss: 0.16548 - R2: 1.0005 -- iter: 1152/1168
Training Step: 190  | total loss: 0.16454 | time: 1.139s
| SGD | epoch: 010 | loss: 0.16454 - R2: 0.9995 | val_loss: 0.15090 - val_acc: 1.0041 -- iter: 1168/1168
--
Training Step: 191  | total loss: 0.16454 | time: 0.053s
| SGD | epoch: 011 | loss: 0.16454 - R2

Training Step: 243  | total loss: 0.18964 | time: 0.172s
| SGD | epoch: 013 | loss: 0.18964 - R2: 0.9959 -- iter: 0960/1168
Training Step: 244  | total loss: 0.18745 | time: 0.176s
| SGD | epoch: 013 | loss: 0.18745 - R2: 0.9971 -- iter: 1024/1168
Training Step: 245  | total loss: 0.17988 | time: 0.180s
| SGD | epoch: 013 | loss: 0.17988 - R2: 0.9971 -- iter: 1088/1168
Training Step: 246  | total loss: 0.17988 | time: 0.184s
| SGD | epoch: 013 | loss: 0.17988 - R2: 0.9971 -- iter: 1152/1168
Training Step: 247  | total loss: 0.17835 | time: 1.191s
| SGD | epoch: 013 | loss: 0.17835 - R2: 0.9984 | val_loss: 0.15090 - val_acc: 1.0040 -- iter: 1168/1168
--
Training Step: 248  | total loss: 0.17674 | time: 0.040s
| SGD | epoch: 014 | loss: 0.17674 - R2: 0.9980 -- iter: 0064/1168
Training Step: 249  | total loss: 0.17674 | time: 0.045s
| SGD | epoch: 014 | loss: 0.17674 - R2

Training Step: 301  | total loss: 0.15364 | time: 0.116s
| SGD | epoch: 016 | loss: 0.15364 - R2: 0.9991 -- iter: 1024/1168
Training Step: 302  | total loss: 0.15566 | time: 0.120s
| SGD | epoch: 016 | loss: 0.15566 - R2: 0.9995 -- iter: 1088/1168
Training Step: 303  | total loss: 0.15571 | time: 0.123s
| SGD | epoch: 016 | loss: 0.15571 - R2: 1.0004 -- iter: 1152/1168
Training Step: 304  | total loss: 0.15854 | time: 1.133s
| SGD | epoch: 016 | loss: 0.15854 - R2: 1.0008 | val_loss: 0.15090 - val_acc: 1.0040 -- iter: 1168/1168
--
Training Step: 305  | total loss: 0.15854 | time: 0.077s
| SGD | epoch: 017 | loss: 0.15854 - R2: 1.0008 -- iter: 0064/1168
Training Step: 306  | total loss: 0.15037 | time: 0.082s
| SGD | epoch: 017 | loss: 0.15037 - R2: 1.0008 -- iter: 0128/1168
Training Step: 307  | total loss: 0.16013 | time: 0.088s
| SGD | epoch: 017 | loss: 0.16013 - R2

Training Step: 359  | total loss: 0.16995 | time: 0.138s
| SGD | epoch: 019 | loss: 0.16995 - R2: 1.0015 -- iter: 1088/1168
Training Step: 360  | total loss: 0.16997 | time: 0.149s
| SGD | epoch: 019 | loss: 0.16997 - R2: 1.0017 -- iter: 1152/1168
Training Step: 361  | total loss: 0.16997 | time: 1.155s
| SGD | epoch: 019 | loss: 0.16997 - R2: 1.0017 | val_loss: 0.15090 - val_acc: 1.0040 -- iter: 1168/1168
--
Training Step: 362  | total loss: 0.16655 | time: 0.031s
| SGD | epoch: 020 | loss: 0.16655 - R2: 1.0019 -- iter: 0064/1168
Training Step: 363  | total loss: 0.16512 | time: 0.035s
| SGD | epoch: 020 | loss: 0.16512 - R2: 1.0022 -- iter: 0128/1168
Training Step: 364  | total loss: 0.16214 | time: 0.043s
| SGD | epoch: 020 | loss: 0.16214 - R2: 1.0021 -- iter: 0192/1168
Training Step: 365  | total loss: 0.16128 | time: 0.047s
| SGD | epoch: 020 | loss: 0.16128 - R2

Training Step: 417  | total loss: 0.16263 | time: 0.123s
| SGD | epoch: 022 | loss: 0.16263 - R2: 1.0017 -- iter: 1152/1168
Training Step: 418  | total loss: 0.16116 | time: 1.128s
| SGD | epoch: 022 | loss: 0.16116 - R2: 1.0012 | val_loss: 0.15090 - val_acc: 1.0040 -- iter: 1168/1168
--
Training Step: 419  | total loss: 0.15646 | time: 0.062s
| SGD | epoch: 023 | loss: 0.15646 - R2: 1.0009 -- iter: 0064/1168
Training Step: 420  | total loss: 0.15646 | time: 0.064s
| SGD | epoch: 023 | loss: 0.15646 - R2: 1.0002 -- iter: 0128/1168
Training Step: 421  | total loss: 0.15832 | time: 0.072s
| SGD | epoch: 023 | loss: 0.15832 - R2: 0.9995 -- iter: 0192/1168
Training Step: 422  | total loss: 0.15832 | time: 0.074s
| SGD | epoch: 023 | loss: 0.15832 - R2: 0.9997 -- iter: 0256/1168
Training Step: 423  | total loss: 0.16352 | time: 0.076s
| SGD | epoch: 023 | loss: 0.16352 - R2

Training Step: 475  | total loss: 0.16728 | time: 1.221s
| SGD | epoch: 025 | loss: 0.16728 - R2: 1.0003 | val_loss: 0.15090 - val_acc: 1.0040 -- iter: 1168/1168
--
Training Step: 476  | total loss: 0.16240 | time: 0.040s
| SGD | epoch: 026 | loss: 0.16240 - R2: 1.0003 -- iter: 0064/1168
Training Step: 477  | total loss: 0.16240 | time: 0.045s
| SGD | epoch: 026 | loss: 0.16240 - R2: 1.0003 -- iter: 0128/1168
Training Step: 478  | total loss: 0.15752 | time: 0.050s
| SGD | epoch: 026 | loss: 0.15752 - R2: 1.0015 -- iter: 0192/1168
Training Step: 479  | total loss: 0.15611 | time: 0.052s
| SGD | epoch: 026 | loss: 0.15611 - R2: 1.0015 -- iter: 0256/1168
Training Step: 480  | total loss: 0.16766 | time: 0.056s
| SGD | epoch: 026 | loss: 0.16766 - R2: 1.0005 -- iter: 0320/1168
Training Step: 481  | total loss: 0.17281 | time: 0.060s
| SGD | epoch: 026 | loss: 0.17281 - R2

Training Step: 533  | total loss: 0.16039 | time: 0.043s
| SGD | epoch: 029 | loss: 0.16039 - R2: 1.0003 -- iter: 0064/1168
Training Step: 534  | total loss: 0.16039 | time: 0.047s
| SGD | epoch: 029 | loss: 0.16039 - R2: 1.0003 -- iter: 0128/1168
Training Step: 535  | total loss: 0.16902 | time: 0.053s
| SGD | epoch: 029 | loss: 0.16902 - R2: 1.0002 -- iter: 0192/1168
Training Step: 536  | total loss: 0.16477 | time: 0.058s
| SGD | epoch: 029 | loss: 0.16477 - R2: 0.9996 -- iter: 0256/1168
Training Step: 537  | total loss: 0.16477 | time: 0.061s
| SGD | epoch: 029 | loss: 0.16477 - R2: 0.9996 -- iter: 0320/1168
Training Step: 538  | total loss: 0.15862 | time: 0.063s
| SGD | epoch: 029 | loss: 0.15862 - R2: 1.0005 -- iter: 0384/1168
Training Step: 539  | total loss: 0.16100 | time: 0.066s
| SGD | epoch: 029 | loss: 0.16100 - R2: 1.0004 -- iter: 0448/1168
Training Step

Training Step: 591  | total loss: 0.16963 | time: 0.085s
| SGD | epoch: 032 | loss: 0.16963 - R2: 0.9978 -- iter: 0128/1168
Training Step: 592  | total loss: 0.16588 | time: 0.088s
| SGD | epoch: 032 | loss: 0.16588 - R2: 0.9969 -- iter: 0192/1168
Training Step: 593  | total loss: 0.16273 | time: 0.090s
| SGD | epoch: 032 | loss: 0.16273 - R2: 0.9969 -- iter: 0256/1168
Training Step: 594  | total loss: 0.16216 | time: 0.092s
| SGD | epoch: 032 | loss: 0.16216 - R2: 0.9970 -- iter: 0320/1168
Training Step: 595  | total loss: 0.15920 | time: 0.094s
| SGD | epoch: 032 | loss: 0.15920 - R2: 0.9993 -- iter: 0384/1168
Training Step: 596  | total loss: 0.16396 | time: 0.097s
| SGD | epoch: 032 | loss: 0.16396 - R2: 1.0002 -- iter: 0448/1168
Training Step: 597  | total loss: 0.17322 | time: 0.101s
| SGD | epoch: 032 | loss: 0.17322 - R2: 0.9992 -- iter: 0512/1168
Training Step

Training Step: 649  | total loss: 0.16000 | time: 0.073s
| SGD | epoch: 035 | loss: 0.16000 - R2: 1.0019 -- iter: 0192/1168
Training Step: 650  | total loss: 0.16000 | time: 0.080s
| SGD | epoch: 035 | loss: 0.16000 - R2: 1.0019 -- iter: 0256/1168
Training Step: 651  | total loss: 0.15825 | time: 0.085s
| SGD | epoch: 035 | loss: 0.15825 - R2: 1.0015 -- iter: 0320/1168
Training Step: 652  | total loss: 0.16632 | time: 0.087s
| SGD | epoch: 035 | loss: 0.16632 - R2: 1.0017 -- iter: 0384/1168
Training Step: 653  | total loss: 0.16711 | time: 0.091s
| SGD | epoch: 035 | loss: 0.16711 - R2: 1.0002 -- iter: 0448/1168
Training Step: 654  | total loss: 0.16721 | time: 0.096s
| SGD | epoch: 035 | loss: 0.16721 - R2: 1.0008 -- iter: 0512/1168
Training Step: 655  | total loss: 0.16362 | time: 0.098s
| SGD | epoch: 035 | loss: 0.16362 - R2: 1.0006 -- iter: 0576/1168
Training Step

Training Step: 707  | total loss: 0.16112 | time: 0.081s
| SGD | epoch: 038 | loss: 0.16112 - R2: 1.0019 -- iter: 0256/1168
Training Step: 708  | total loss: 0.16822 | time: 0.087s
| SGD | epoch: 038 | loss: 0.16822 - R2: 1.0028 -- iter: 0320/1168
Training Step: 709  | total loss: 0.16822 | time: 0.089s
| SGD | epoch: 038 | loss: 0.16822 - R2: 1.0028 -- iter: 0384/1168
Training Step: 710  | total loss: 0.17007 | time: 0.092s
| SGD | epoch: 038 | loss: 0.17007 - R2: 1.0019 -- iter: 0448/1168
Training Step: 711  | total loss: 0.17154 | time: 0.096s
| SGD | epoch: 038 | loss: 0.17154 - R2: 1.0028 -- iter: 0512/1168
Training Step: 712  | total loss: 0.16921 | time: 0.100s
| SGD | epoch: 038 | loss: 0.16921 - R2: 1.0023 -- iter: 0576/1168
Training Step: 713  | total loss: 0.16620 | time: 0.105s
| SGD | epoch: 038 | loss: 0.16620 - R2: 1.0030 -- iter: 0640/1168
Training Step

Training Step: 765  | total loss: 0.17855 | time: 0.092s
| SGD | epoch: 041 | loss: 0.17855 - R2: 0.9970 -- iter: 0320/1168
Training Step: 766  | total loss: 0.17855 | time: 0.095s
| SGD | epoch: 041 | loss: 0.17855 - R2: 0.9970 -- iter: 0384/1168
Training Step: 767  | total loss: 0.17674 | time: 0.098s
| SGD | epoch: 041 | loss: 0.17674 - R2: 1.0013 -- iter: 0448/1168
Training Step: 768  | total loss: 0.17615 | time: 0.102s
| SGD | epoch: 041 | loss: 0.17615 - R2: 1.0013 -- iter: 0512/1168
Training Step: 769  | total loss: 0.17481 | time: 0.104s
| SGD | epoch: 041 | loss: 0.17481 - R2: 1.0005 -- iter: 0576/1168
Training Step: 770  | total loss: 0.17481 | time: 0.107s
| SGD | epoch: 041 | loss: 0.17481 - R2: 1.0005 -- iter: 0640/1168
Training Step: 771  | total loss: 0.17651 | time: 0.113s
| SGD | epoch: 041 | loss: 0.17651 - R2: 0.9994 -- iter: 0704/1168
Training Step

Training Step: 823  | total loss: 0.15983 | time: 0.036s
| SGD | epoch: 044 | loss: 0.15983 - R2: 0.9975 -- iter: 0384/1168
Training Step: 824  | total loss: 0.15800 | time: 0.038s
| SGD | epoch: 044 | loss: 0.15800 - R2: 0.9983 -- iter: 0448/1168
Training Step: 825  | total loss: 0.15910 | time: 0.041s
| SGD | epoch: 044 | loss: 0.15910 - R2: 0.9996 -- iter: 0512/1168
Training Step: 826  | total loss: 0.16325 | time: 0.045s
| SGD | epoch: 044 | loss: 0.16325 - R2: 0.9993 -- iter: 0576/1168
Training Step: 827  | total loss: 0.16873 | time: 0.048s
| SGD | epoch: 044 | loss: 0.16873 - R2: 0.9994 -- iter: 0640/1168
Training Step: 828  | total loss: 0.16873 | time: 0.050s
| SGD | epoch: 044 | loss: 0.16873 - R2: 0.9994 -- iter: 0704/1168
Training Step: 829  | total loss: 0.16259 | time: 0.052s
| SGD | epoch: 044 | loss: 0.16259 - R2: 0.9990 -- iter: 0768/1168
Training Step

Training Step: 881  | total loss: 0.16662 | time: 0.131s
| SGD | epoch: 047 | loss: 0.16662 - R2: 1.0003 -- iter: 0448/1168
Training Step: 882  | total loss: 0.16662 | time: 0.134s
| SGD | epoch: 047 | loss: 0.16662 - R2: 0.9992 -- iter: 0512/1168
Training Step: 883  | total loss: 0.16595 | time: 0.136s
| SGD | epoch: 047 | loss: 0.16595 - R2: 0.9992 -- iter: 0576/1168
Training Step: 884  | total loss: 0.16352 | time: 0.138s
| SGD | epoch: 047 | loss: 0.16352 - R2: 0.9978 -- iter: 0640/1168
Training Step: 885  | total loss: 0.16340 | time: 0.140s
| SGD | epoch: 047 | loss: 0.16340 - R2: 0.9978 -- iter: 0704/1168
Training Step: 886  | total loss: 0.16061 | time: 0.143s
| SGD | epoch: 047 | loss: 0.16061 - R2: 0.9984 -- iter: 0768/1168
Training Step: 887  | total loss: 0.16271 | time: 0.144s
| SGD | epoch: 047 | loss: 0.16271 - R2: 0.9987 -- iter: 0832/1168
Training Step

Training Step: 939  | total loss: 0.15899 | time: 0.118s
| SGD | epoch: 050 | loss: 0.15899 - R2: 0.9992 -- iter: 0512/1168
Training Step: 940  | total loss: 0.15261 | time: 0.120s
| SGD | epoch: 050 | loss: 0.15261 - R2: 1.0005 -- iter: 0576/1168
Training Step: 941  | total loss: 0.14686 | time: 0.123s
| SGD | epoch: 050 | loss: 0.14686 - R2: 1.0017 -- iter: 0640/1168
Training Step: 942  | total loss: 0.14873 | time: 0.125s
| SGD | epoch: 050 | loss: 0.14873 - R2: 1.0008 -- iter: 0704/1168
Training Step: 943  | total loss: 0.15587 | time: 0.141s
| SGD | epoch: 050 | loss: 0.15587 - R2: 1.0011 -- iter: 0768/1168
Training Step: 944  | total loss: 0.15587 | time: 0.144s
| SGD | epoch: 050 | loss: 0.15587 - R2: 1.0011 -- iter: 0832/1168
Training Step: 945  | total loss: 0.15708 | time: 0.150s
| SGD | epoch: 050 | loss: 0.15708 - R2: 1.0033 -- iter: 0896/1168
Training Step

Training Step: 45  | total loss: 0.24341 | time: 0.127s
| SGD | epoch: 003 | loss: 0.24341 - R2: 0.9965 -- iter: 0448/1168
Training Step: 46  | total loss: 0.23544 | time: 0.129s
| SGD | epoch: 003 | loss: 0.23544 - R2: 0.9965 -- iter: 0512/1168
Training Step: 47  | total loss: 0.21038 | time: 0.132s
| SGD | epoch: 003 | loss: 0.21038 - R2: 0.9981 -- iter: 0576/1168
Training Step: 48  | total loss: 0.21038 | time: 0.135s
| SGD | epoch: 003 | loss: 0.21038 - R2: 0.9981 -- iter: 0640/1168
Training Step: 49  | total loss: 0.20412 | time: 0.137s
| SGD | epoch: 003 | loss: 0.20412 - R2: 0.9977 -- iter: 0704/1168
Training Step: 50  | total loss: 0.20128 | time: 0.141s
| SGD | epoch: 003 | loss: 0.20128 - R2: 0.9975 -- iter: 0768/1168
Training Step: 51  | total loss: 0.20128 | time: 0.144s
| SGD | epoch: 003 | loss: 0.20128 - R2: 0.9975 -- iter: 0832/1168
Training Step: 52  |

Training Step: 103  | total loss: 0.15408 | time: 0.018s
| SGD | epoch: 006 | loss: 0.15408 - R2: 0.9999 -- iter: 0512/1168
Training Step: 104  | total loss: 0.15633 | time: 0.019s
| SGD | epoch: 006 | loss: 0.15633 - R2: 0.9993 -- iter: 0576/1168
Training Step: 105  | total loss: 0.15620 | time: 0.021s
| SGD | epoch: 006 | loss: 0.15620 - R2: 0.9989 -- iter: 0640/1168
Training Step: 106  | total loss: 0.15742 | time: 0.024s
| SGD | epoch: 006 | loss: 0.15742 - R2: 0.9995 -- iter: 0704/1168
Training Step: 107  | total loss: 0.15840 | time: 0.026s
| SGD | epoch: 006 | loss: 0.15840 - R2: 1.0008 -- iter: 0768/1168
Training Step: 108  | total loss: 0.15769 | time: 0.029s
| SGD | epoch: 006 | loss: 0.15769 - R2: 0.9991 -- iter: 0832/1168
Training Step: 109  | total loss: 0.15726 | time: 0.030s
| SGD | epoch: 006 | loss: 0.15726 - R2: 0.9991 -- iter: 0896/1168
Training Step

Training Step: 161  | total loss: 0.17268 | time: 0.042s
| SGD | epoch: 009 | loss: 0.17268 - R2: 0.9992 -- iter: 0576/1168
Training Step: 162  | total loss: 0.17779 | time: 0.048s
| SGD | epoch: 009 | loss: 0.17779 - R2: 0.9990 -- iter: 0640/1168
Training Step: 163  | total loss: 0.17647 | time: 0.053s
| SGD | epoch: 009 | loss: 0.17647 - R2: 1.0000 -- iter: 0704/1168
Training Step: 164  | total loss: 0.17881 | time: 0.059s
| SGD | epoch: 009 | loss: 0.17881 - R2: 0.9994 -- iter: 0768/1168
Training Step: 165  | total loss: 0.18637 | time: 0.064s
| SGD | epoch: 009 | loss: 0.18637 - R2: 0.9990 -- iter: 0832/1168
Training Step: 166  | total loss: 0.18637 | time: 0.065s
| SGD | epoch: 009 | loss: 0.18637 - R2: 0.9990 -- iter: 0896/1168
Training Step: 167  | total loss: 0.18544 | time: 0.068s
| SGD | epoch: 009 | loss: 0.18544 - R2: 0.9983 -- iter: 0960/1168

Training Step: 219  | total loss: 0.16209 | time: 0.082s
| SGD | epoch: 012 | loss: 0.16209 - R2: 0.9989 -- iter: 0640/1168
Training Step: 220  | total loss: 0.16660 | time: 0.084s
| SGD | epoch: 012 | loss: 0.16660 - R2: 0.9990 -- iter: 0704/1168
Training Step: 221  | total loss: 0.17066 | time: 0.089s
| SGD | epoch: 012 | loss: 0.17066 - R2: 0.9990 -- iter: 0768/1168
Training Step: 222  | total loss: 0.16604 | time: 0.096s
| SGD | epoch: 012 | loss: 0.16604 - R2: 0.9997 -- iter: 0832/1168
Training Step: 223  | total loss: 0.16604 | time: 0.100s
| SGD | epoch: 012 | loss: 0.16604 - R2: 0.9997 -- iter: 0896/1168
Training Step: 224  | total loss: 0.16722 | time: 0.104s
| SGD | epoch: 012 | loss: 0.16722 - R2: 0.9999 -- iter: 0960/1168
Training Step: 225  | total loss: 0.17348 | time: 0.108s
| SGD | epoch: 012 | loss: 0.17348 - R2: 1.0007 -- iter: 1024/1168

Training Step: 277  | total loss: 0.16244 | time: 0.064s
| SGD | epoch: 015 | loss: 0.16244 - R2: 1.0007 -- iter: 0704/1168
Training Step: 278  | total loss: 0.15913 | time: 0.068s
| SGD | epoch: 015 | loss: 0.15913 - R2: 1.0014 -- iter: 0768/1168
Training Step: 279  | total loss: 0.15676 | time: 0.072s
| SGD | epoch: 015 | loss: 0.15676 - R2: 1.0024 -- iter: 0832/1168
Training Step: 280  | total loss: 0.15462 | time: 0.076s
| SGD | epoch: 015 | loss: 0.15462 - R2: 1.0033 -- iter: 0896/1168
Training Step: 281  | total loss: 0.15750 | time: 0.086s
| SGD | epoch: 015 | loss: 0.15750 - R2: 1.0028 -- iter: 0960/1168
Training Step: 282  | total loss: 0.15509 | time: 0.091s
| SGD | epoch: 015 | loss: 0.15509 - R2: 1.0025 -- iter: 1024/1168
Training Step: 283  | total loss: 0.15703 | time: 0.099s
| SGD | epoch: 015 | loss: 0.15703 - R2: 1.0014 -- iter: 1088/1168

Training Step: 335  | total loss: 0.16731 | time: 0.124s
| SGD | epoch: 018 | loss: 0.16731 - R2: 1.0004 -- iter: 0768/1168
Training Step: 336  | total loss: 0.16940 | time: 0.127s
| SGD | epoch: 018 | loss: 0.16940 - R2: 1.0005 -- iter: 0832/1168
Training Step: 337  | total loss: 0.16947 | time: 0.129s
| SGD | epoch: 018 | loss: 0.16947 - R2: 1.0002 -- iter: 0896/1168
Training Step: 338  | total loss: 0.17246 | time: 0.131s
| SGD | epoch: 018 | loss: 0.17246 - R2: 1.0002 -- iter: 0960/1168
Training Step: 339  | total loss: 0.16547 | time: 0.132s
| SGD | epoch: 018 | loss: 0.16547 - R2: 1.0005 -- iter: 1024/1168
Training Step: 340  | total loss: 0.15841 | time: 0.135s
| SGD | epoch: 018 | loss: 0.15841 - R2: 1.0002 -- iter: 1088/1168
Training Step: 341  | total loss: 0.15131 | time: 0.139s
| SGD | epoch: 018 | loss: 0.15131 - R2: 0.9983 -- iter: 1152/1168

Training Step: 393  | total loss: 0.16192 | time: 0.123s
| SGD | epoch: 021 | loss: 0.16192 - R2: 0.9990 -- iter: 0832/1168
Training Step: 394  | total loss: 0.16136 | time: 0.126s
| SGD | epoch: 021 | loss: 0.16136 - R2: 1.0000 -- iter: 0896/1168
Training Step: 395  | total loss: 0.16465 | time: 0.130s
| SGD | epoch: 021 | loss: 0.16465 - R2: 1.0023 -- iter: 0960/1168
Training Step: 396  | total loss: 0.16510 | time: 0.134s
| SGD | epoch: 021 | loss: 0.16510 - R2: 1.0034 -- iter: 1024/1168
Training Step: 397  | total loss: 0.16510 | time: 0.137s
| SGD | epoch: 021 | loss: 0.16510 - R2: 1.0041 -- iter: 1088/1168
Training Step: 398  | total loss: 0.16754 | time: 0.139s
| SGD | epoch: 021 | loss: 0.16754 - R2: 1.0041 -- iter: 1152/1168
Training Step: 399  | total loss: 0.16797 | time: 1.145s
| SGD | epoch: 021 | loss: 0.16797 - R2: 0.9995 | val_loss: 0.14237 - val_acc: 1

Training Step: 451  | total loss: 0.16103 | time: 0.117s
| SGD | epoch: 024 | loss: 0.16103 - R2: 1.0027 -- iter: 0896/1168
Training Step: 452  | total loss: 0.15864 | time: 0.121s
| SGD | epoch: 024 | loss: 0.15864 - R2: 1.0011 -- iter: 0960/1168
Training Step: 453  | total loss: 0.15751 | time: 0.125s
| SGD | epoch: 024 | loss: 0.15751 - R2: 1.0012 -- iter: 1024/1168
Training Step: 454  | total loss: 0.16226 | time: 0.131s
| SGD | epoch: 024 | loss: 0.16226 - R2: 1.0009 -- iter: 1088/1168
Training Step: 455  | total loss: 0.16226 | time: 0.134s
| SGD | epoch: 024 | loss: 0.16226 - R2: 1.0009 -- iter: 1152/1168
Training Step: 456  | total loss: 0.16388 | time: 1.143s
| SGD | epoch: 024 | loss: 0.16388 - R2: 1.0002 | val_loss: 0.14237 - val_acc: 1.0036 -- iter: 1168/1168
--
Training Step: 457  | total loss: 0.16389 | time: 0.052s
| SGD | epoch: 025 | loss: 0.16389 - R2

Training Step: 509  | total loss: 0.16415 | time: 0.107s
| SGD | epoch: 027 | loss: 0.16415 - R2: 0.9988 -- iter: 0960/1168
Training Step: 510  | total loss: 0.16966 | time: 0.109s
| SGD | epoch: 027 | loss: 0.16966 - R2: 0.9980 -- iter: 1024/1168
Training Step: 511  | total loss: 0.16689 | time: 0.112s
| SGD | epoch: 027 | loss: 0.16689 - R2: 0.9980 -- iter: 1088/1168
Training Step: 512  | total loss: 0.16502 | time: 0.114s
| SGD | epoch: 027 | loss: 0.16502 - R2: 0.9987 -- iter: 1152/1168
Training Step: 513  | total loss: 0.16409 | time: 1.119s
| SGD | epoch: 027 | loss: 0.16409 - R2: 0.9990 | val_loss: 0.14237 - val_acc: 1.0036 -- iter: 1168/1168
--
Training Step: 514  | total loss: 0.17187 | time: 0.103s
| SGD | epoch: 028 | loss: 0.17187 - R2: 0.9988 -- iter: 0064/1168
Training Step: 515  | total loss: 0.17137 | time: 0.109s
| SGD | epoch: 028 | loss: 0.17137 - R2

Training Step: 567  | total loss: 0.16882 | time: 0.091s
| SGD | epoch: 030 | loss: 0.16882 - R2: 1.0031 -- iter: 1024/1168
Training Step: 568  | total loss: 0.16882 | time: 0.093s
| SGD | epoch: 030 | loss: 0.16882 - R2: 1.0031 -- iter: 1088/1168
Training Step: 569  | total loss: 0.17087 | time: 0.094s
| SGD | epoch: 030 | loss: 0.17087 - R2: 1.0034 -- iter: 1152/1168
Training Step: 570  | total loss: 0.17087 | time: 1.100s
| SGD | epoch: 030 | loss: 0.17087 - R2: 1.0034 | val_loss: 0.14237 - val_acc: 1.0036 -- iter: 1168/1168
--
Training Step: 571  | total loss: 0.16816 | time: 0.002s
| SGD | epoch: 031 | loss: 0.16816 - R2: 1.0024 -- iter: 0064/1168
Training Step: 572  | total loss: 0.16786 | time: 0.004s
| SGD | epoch: 031 | loss: 0.16786 - R2: 1.0024 -- iter: 0128/1168
Training Step: 573  | total loss: 0.16719 | time: 0.008s
| SGD | epoch: 031 | loss: 0.16719 - R2

Training Step: 625  | total loss: 0.16619 | time: 0.148s
| SGD | epoch: 033 | loss: 0.16619 - R2: 1.0040 -- iter: 1088/1168
Training Step: 626  | total loss: 0.16619 | time: 0.151s
| SGD | epoch: 033 | loss: 0.16619 - R2: 1.0040 -- iter: 1152/1168
Training Step: 627  | total loss: 0.16463 | time: 1.158s
| SGD | epoch: 033 | loss: 0.16463 - R2: 1.0039 | val_loss: 0.14237 - val_acc: 1.0036 -- iter: 1168/1168
--
Training Step: 628  | total loss: 0.16616 | time: 0.083s
| SGD | epoch: 034 | loss: 0.16616 - R2: 1.0062 -- iter: 0064/1168
Training Step: 629  | total loss: 0.16164 | time: 0.087s
| SGD | epoch: 034 | loss: 0.16164 - R2: 1.0067 -- iter: 0128/1168
Training Step: 630  | total loss: 0.16164 | time: 0.090s
| SGD | epoch: 034 | loss: 0.16164 - R2: 1.0067 -- iter: 0192/1168
Training Step: 631  | total loss: 0.16329 | time: 0.092s
| SGD | epoch: 034 | loss: 0.16329 - R2

Training Step: 683  | total loss: 0.16525 | time: 0.134s
| SGD | epoch: 036 | loss: 0.16525 - R2: 1.0019 -- iter: 1152/1168
Training Step: 684  | total loss: 0.16525 | time: 1.138s
| SGD | epoch: 036 | loss: 0.16525 - R2: 1.0019 | val_loss: 0.14237 - val_acc: 1.0036 -- iter: 1168/1168
--
Training Step: 685  | total loss: 0.16571 | time: 0.105s
| SGD | epoch: 037 | loss: 0.16571 - R2: 1.0021 -- iter: 0064/1168
Training Step: 686  | total loss: 0.16565 | time: 0.109s
| SGD | epoch: 037 | loss: 0.16565 - R2: 1.0015 -- iter: 0128/1168
Training Step: 687  | total loss: 0.16878 | time: 0.111s
| SGD | epoch: 037 | loss: 0.16878 - R2: 1.0023 -- iter: 0192/1168
Training Step: 688  | total loss: 0.16687 | time: 0.113s
| SGD | epoch: 037 | loss: 0.16687 - R2: 1.0025 -- iter: 0256/1168
Training Step: 689  | total loss: 0.16490 | time: 0.118s
| SGD | epoch: 037 | loss: 0.16490 - R2

Training Step: 741  | total loss: 0.16185 | time: 1.164s
| SGD | epoch: 039 | loss: 0.16185 - R2: 1.0032 | val_loss: 0.14237 - val_acc: 1.0036 -- iter: 1168/1168
--
Training Step: 742  | total loss: 0.16173 | time: 0.007s
| SGD | epoch: 040 | loss: 0.16173 - R2: 1.0019 -- iter: 0064/1168
Training Step: 743  | total loss: 0.15819 | time: 0.009s
| SGD | epoch: 040 | loss: 0.15819 - R2: 1.0027 -- iter: 0128/1168
Training Step: 744  | total loss: 0.16238 | time: 0.011s
| SGD | epoch: 040 | loss: 0.16238 - R2: 1.0038 -- iter: 0192/1168
Training Step: 745  | total loss: 0.15985 | time: 0.013s
| SGD | epoch: 040 | loss: 0.15985 - R2: 1.0026 -- iter: 0256/1168
Training Step: 746  | total loss: 0.16961 | time: 0.015s
| SGD | epoch: 040 | loss: 0.16961 - R2: 1.0009 -- iter: 0320/1168
Training Step: 747  | total loss: 0.16604 | time: 0.017s
| SGD | epoch: 040 | loss: 0.16604 - R2

Training Step: 799  | total loss: 0.17168 | time: 0.017s
| SGD | epoch: 043 | loss: 0.17168 - R2: 0.9997 -- iter: 0064/1168
Training Step: 800  | total loss: 0.17934 | time: 0.020s
| SGD | epoch: 043 | loss: 0.17934 - R2: 0.9971 -- iter: 0128/1168
Training Step: 801  | total loss: 0.18624 | time: 0.025s
| SGD | epoch: 043 | loss: 0.18624 - R2: 0.9948 -- iter: 0192/1168
Training Step: 802  | total loss: 0.18438 | time: 0.029s
| SGD | epoch: 043 | loss: 0.18438 - R2: 0.9960 -- iter: 0256/1168
Training Step: 803  | total loss: 0.18438 | time: 0.031s
| SGD | epoch: 043 | loss: 0.18438 - R2: 0.9958 -- iter: 0320/1168
Training Step: 804  | total loss: 0.17353 | time: 0.037s
| SGD | epoch: 043 | loss: 0.17353 - R2: 0.9970 -- iter: 0384/1168
Training Step: 805  | total loss: 0.17104 | time: 0.040s
| SGD | epoch: 043 | loss: 0.17104 - R2: 0.9992 -- iter: 0448/1168

Training Step: 857  | total loss: 0.16426 | time: 0.078s
| SGD | epoch: 046 | loss: 0.16426 - R2: 1.0004 -- iter: 0128/1168
Training Step: 858  | total loss: 0.16426 | time: 0.081s
| SGD | epoch: 046 | loss: 0.16426 - R2: 1.0004 -- iter: 0192/1168
Training Step: 859  | total loss: 0.16104 | time: 0.084s
| SGD | epoch: 046 | loss: 0.16104 - R2: 1.0002 -- iter: 0256/1168
Training Step: 860  | total loss: 0.15882 | time: 0.089s
| SGD | epoch: 046 | loss: 0.15882 - R2: 0.9991 -- iter: 0320/1168
Training Step: 861  | total loss: 0.15570 | time: 0.096s
| SGD | epoch: 046 | loss: 0.15570 - R2: 0.9988 -- iter: 0384/1168
Training Step: 862  | total loss: 0.15570 | time: 0.101s
| SGD | epoch: 046 | loss: 0.15570 - R2: 0.9981 -- iter: 0448/1168
Training Step: 863  | total loss: 0.15408 | time: 0.103s
| SGD | epoch: 046 | loss: 0.15408 - R2: 0.9981 -- iter: 0512/1168

Training Step: 915  | total loss: 0.16171 | time: 0.070s
| SGD | epoch: 049 | loss: 0.16171 - R2: 1.0019 -- iter: 0192/1168
Training Step: 916  | total loss: 0.16719 | time: 0.072s
| SGD | epoch: 049 | loss: 0.16719 - R2: 1.0025 -- iter: 0256/1168
Training Step: 917  | total loss: 0.16735 | time: 0.082s
| SGD | epoch: 049 | loss: 0.16735 - R2: 1.0028 -- iter: 0320/1168
Training Step: 918  | total loss: 0.16971 | time: 0.085s
| SGD | epoch: 049 | loss: 0.16971 - R2: 1.0018 -- iter: 0384/1168
Training Step: 919  | total loss: 0.17299 | time: 0.090s
| SGD | epoch: 049 | loss: 0.17299 - R2: 1.0001 -- iter: 0448/1168
Training Step: 920  | total loss: 0.17299 | time: 0.092s
| SGD | epoch: 049 | loss: 0.17299 - R2: 1.0001 -- iter: 0512/1168
Training Step: 921  | total loss: 0.17689 | time: 0.100s
| SGD | epoch: 049 | loss: 0.17689 - R2: 0.9980 -- iter: 0576/1168

Training Step: 21  | total loss: 18.47350 | time: 0.012s
| SGD | epoch: 002 | loss: 18.47350 - R2: 0.4456 -- iter: 0128/1168
Training Step: 22  | total loss: 10.79813 | time: 0.023s
| SGD | epoch: 002 | loss: 10.79813 - R2: 0.5911 -- iter: 0192/1168
Training Step: 23  | total loss: 10.79813 | time: 0.031s
| SGD | epoch: 002 | loss: 10.79813 - R2: 0.5911 -- iter: 0256/1168
Training Step: 24  | total loss: 7.98193 | time: 0.034s
| SGD | epoch: 002 | loss: 7.98193 - R2: 0.6694 -- iter: 0320/1168
Training Step: 25  | total loss: 5.83977 | time: 0.036s
| SGD | epoch: 002 | loss: 5.83977 - R2: 0.7544 -- iter: 0384/1168
Training Step: 26  | total loss: 4.33254 | time: 0.038s
| SGD | epoch: 002 | loss: 4.33254 - R2: 0.8207 -- iter: 0448/1168
Training Step: 27  | total loss: 3.26186 | time: 0.040s
| SGD | epoch: 002 | loss: 3.26186 - R2: 0.8608 -- iter: 0512/1168

Training Step: 79  | total loss: 0.15464 | time: 0.062s
| SGD | epoch: 005 | loss: 0.15464 - R2: 0.9994 -- iter: 0192/1168
Training Step: 80  | total loss: 0.13859 | time: 0.069s
| SGD | epoch: 005 | loss: 0.13859 - R2: 1.0004 -- iter: 0256/1168
Training Step: 81  | total loss: 0.14730 | time: 0.081s
| SGD | epoch: 005 | loss: 0.14730 - R2: 1.0015 -- iter: 0320/1168
Training Step: 82  | total loss: 0.15296 | time: 0.095s
| SGD | epoch: 005 | loss: 0.15296 - R2: 1.0000 -- iter: 0384/1168
Training Step: 83  | total loss: 0.15708 | time: 0.099s
| SGD | epoch: 005 | loss: 0.15708 - R2: 1.0009 -- iter: 0448/1168
Training Step: 84  | total loss: 0.15460 | time: 0.104s
| SGD | epoch: 005 | loss: 0.15460 - R2: 1.0006 -- iter: 0512/1168
Training Step: 85  | total loss: 0.15634 | time: 0.108s
| SGD | epoch: 005 | loss: 0.15634 - R2: 1.0005 -- iter: 0576/1168

Training Step: 137  | total loss: 0.15745 | time: 0.087s
| SGD | epoch: 008 | loss: 0.15745 - R2: 0.9986 -- iter: 0256/1168
Training Step: 138  | total loss: 0.16009 | time: 0.090s
| SGD | epoch: 008 | loss: 0.16009 - R2: 0.9987 -- iter: 0320/1168
Training Step: 139  | total loss: 0.15798 | time: 0.094s
| SGD | epoch: 008 | loss: 0.15798 - R2: 0.9990 -- iter: 0384/1168
Training Step: 140  | total loss: 0.15779 | time: 0.098s
| SGD | epoch: 008 | loss: 0.15779 - R2: 1.0077 -- iter: 0448/1168
Training Step: 141  | total loss: 0.15779 | time: 0.103s
| SGD | epoch: 008 | loss: 0.15779 - R2: 1.0077 -- iter: 0512/1168
Training Step: 142  | total loss: 0.16467 | time: 0.107s
| SGD | epoch: 008 | loss: 0.16467 - R2: 1.0043 -- iter: 0576/1168
Training Step: 143  | total loss: 0.16467 | time: 0.111s
| SGD | epoch: 008 | loss: 0.16467 - R2: 1.0037 -- iter: 0640/1168

Training Step: 195  | total loss: 0.16022 | time: 0.046s
| SGD | epoch: 011 | loss: 0.16022 - R2: 0.9989 -- iter: 0320/1168
Training Step: 196  | total loss: 0.15945 | time: 0.055s
| SGD | epoch: 011 | loss: 0.15945 - R2: 0.9984 -- iter: 0384/1168
Training Step: 197  | total loss: 0.15600 | time: 0.062s
| SGD | epoch: 011 | loss: 0.15600 - R2: 0.9977 -- iter: 0448/1168
Training Step: 198  | total loss: 0.15920 | time: 0.067s
| SGD | epoch: 011 | loss: 0.15920 - R2: 0.9966 -- iter: 0512/1168
Training Step: 199  | total loss: 0.15920 | time: 0.069s
| SGD | epoch: 011 | loss: 0.15920 - R2: 0.9966 -- iter: 0576/1168
Training Step: 200  | total loss: 0.15555 | time: 0.074s
| SGD | epoch: 011 | loss: 0.15555 - R2: 0.9974 -- iter: 0640/1168
Training Step: 201  | total loss: 0.15555 | time: 0.079s
| SGD | epoch: 011 | loss: 0.15555 - R2: 0.9974 -- iter: 0704/1168

Training Step: 253  | total loss: 0.16604 | time: 0.062s
| SGD | epoch: 014 | loss: 0.16604 - R2: 0.9968 -- iter: 0384/1168
Training Step: 254  | total loss: 0.16378 | time: 0.066s
| SGD | epoch: 014 | loss: 0.16378 - R2: 0.9979 -- iter: 0448/1168
Training Step: 255  | total loss: 0.16238 | time: 0.070s
| SGD | epoch: 014 | loss: 0.16238 - R2: 0.9965 -- iter: 0512/1168
Training Step: 256  | total loss: 0.16388 | time: 0.075s
| SGD | epoch: 014 | loss: 0.16388 - R2: 0.9978 -- iter: 0576/1168
Training Step: 257  | total loss: 0.16096 | time: 0.081s
| SGD | epoch: 014 | loss: 0.16096 - R2: 0.9971 -- iter: 0640/1168
Training Step: 258  | total loss: 0.16535 | time: 0.086s
| SGD | epoch: 014 | loss: 0.16535 - R2: 0.9982 -- iter: 0704/1168
Training Step: 259  | total loss: 0.16686 | time: 0.089s
| SGD | epoch: 014 | loss: 0.16686 - R2: 0.9979 -- iter: 0768/1168

Training Step: 311  | total loss: 0.16506 | time: 0.067s
| SGD | epoch: 017 | loss: 0.16506 - R2: 0.9997 -- iter: 0448/1168
Training Step: 312  | total loss: 0.16581 | time: 0.071s
| SGD | epoch: 017 | loss: 0.16581 - R2: 0.9988 -- iter: 0512/1168
Training Step: 313  | total loss: 0.16291 | time: 0.076s
| SGD | epoch: 017 | loss: 0.16291 - R2: 0.9990 -- iter: 0576/1168
Training Step: 314  | total loss: 0.16291 | time: 0.085s
| SGD | epoch: 017 | loss: 0.16291 - R2: 0.9990 -- iter: 0640/1168
Training Step: 315  | total loss: 0.15673 | time: 0.089s
| SGD | epoch: 017 | loss: 0.15673 - R2: 0.9983 -- iter: 0704/1168
Training Step: 316  | total loss: 0.15597 | time: 0.093s
| SGD | epoch: 017 | loss: 0.15597 - R2: 0.9980 -- iter: 0768/1168
Training Step: 317  | total loss: 0.15597 | time: 0.096s
| SGD | epoch: 017 | loss: 0.15597 - R2: 0.9980 -- iter: 0832/1168

Training Step: 369  | total loss: 0.15529 | time: 0.101s
| SGD | epoch: 020 | loss: 0.15529 - R2: 0.9979 -- iter: 0512/1168
Training Step: 370  | total loss: 0.16211 | time: 0.106s
| SGD | epoch: 020 | loss: 0.16211 - R2: 0.9982 -- iter: 0576/1168
Training Step: 371  | total loss: 0.16103 | time: 0.110s
| SGD | epoch: 020 | loss: 0.16103 - R2: 0.9992 -- iter: 0640/1168
Training Step: 372  | total loss: 0.15996 | time: 0.117s
| SGD | epoch: 020 | loss: 0.15996 - R2: 0.9992 -- iter: 0704/1168
Training Step: 373  | total loss: 0.16071 | time: 0.123s
| SGD | epoch: 020 | loss: 0.16071 - R2: 0.9988 -- iter: 0768/1168
Training Step: 374  | total loss: 0.16071 | time: 0.127s
| SGD | epoch: 020 | loss: 0.16071 - R2: 0.9988 -- iter: 0832/1168
Training Step: 375  | total loss: 0.15760 | time: 0.131s
| SGD | epoch: 020 | loss: 0.15760 - R2: 0.9983 -- iter: 0896/1168

Training Step: 427  | total loss: 0.15649 | time: 0.093s
| SGD | epoch: 023 | loss: 0.15649 - R2: 0.9988 -- iter: 0576/1168
Training Step: 428  | total loss: 0.14983 | time: 0.096s
| SGD | epoch: 023 | loss: 0.14983 - R2: 0.9976 -- iter: 0640/1168
Training Step: 429  | total loss: 0.14983 | time: 0.098s
| SGD | epoch: 023 | loss: 0.14983 - R2: 0.9976 -- iter: 0704/1168
Training Step: 430  | total loss: 0.15439 | time: 0.103s
| SGD | epoch: 023 | loss: 0.15439 - R2: 0.9968 -- iter: 0768/1168
Training Step: 431  | total loss: 0.15140 | time: 0.105s
| SGD | epoch: 023 | loss: 0.15140 - R2: 0.9967 -- iter: 0832/1168
Training Step: 432  | total loss: 0.14952 | time: 0.106s
| SGD | epoch: 023 | loss: 0.14952 - R2: 0.9976 -- iter: 0896/1168
Training Step: 433  | total loss: 0.14996 | time: 0.109s
| SGD | epoch: 023 | loss: 0.14996 - R2: 0.9980 -- iter: 0960/1168

Training Step: 485  | total loss: 0.16075 | time: 0.101s
| SGD | epoch: 026 | loss: 0.16075 - R2: 0.9988 -- iter: 0640/1168
Training Step: 486  | total loss: 0.16075 | time: 0.107s
| SGD | epoch: 026 | loss: 0.16075 - R2: 0.9982 -- iter: 0704/1168
Training Step: 487  | total loss: 0.15350 | time: 0.112s
| SGD | epoch: 026 | loss: 0.15350 - R2: 0.9982 -- iter: 0768/1168
Training Step: 488  | total loss: 0.14898 | time: 0.118s
| SGD | epoch: 026 | loss: 0.14898 - R2: 0.9996 -- iter: 0832/1168
Training Step: 489  | total loss: 0.15722 | time: 0.122s
| SGD | epoch: 026 | loss: 0.15722 - R2: 0.9996 -- iter: 0896/1168
Training Step: 490  | total loss: 0.15616 | time: 0.124s
| SGD | epoch: 026 | loss: 0.15616 - R2: 0.9989 -- iter: 0960/1168
Training Step: 491  | total loss: 0.15555 | time: 0.129s
| SGD | epoch: 026 | loss: 0.15555 - R2: 0.9986 -- iter: 1024/1168

Training Step: 543  | total loss: 0.16436 | time: 0.140s
| SGD | epoch: 029 | loss: 0.16436 - R2: 0.9952 -- iter: 0704/1168
Training Step: 544  | total loss: 0.15840 | time: 0.144s
| SGD | epoch: 029 | loss: 0.15840 - R2: 0.9967 -- iter: 0768/1168
Training Step: 545  | total loss: 0.15840 | time: 0.146s
| SGD | epoch: 029 | loss: 0.15840 - R2: 0.9967 -- iter: 0832/1168
Training Step: 546  | total loss: 0.15615 | time: 0.150s
| SGD | epoch: 029 | loss: 0.15615 - R2: 0.9975 -- iter: 0896/1168
Training Step: 547  | total loss: 0.15615 | time: 0.160s
| SGD | epoch: 029 | loss: 0.15615 - R2: 0.9975 -- iter: 0960/1168
Training Step: 548  | total loss: 0.15433 | time: 0.163s
| SGD | epoch: 029 | loss: 0.15433 - R2: 0.9985 -- iter: 1024/1168
Training Step: 549  | total loss: 0.15317 | time: 0.166s
| SGD | epoch: 029 | loss: 0.15317 - R2: 0.9986 -- iter: 1088/1168

Training Step: 601  | total loss: 0.14503 | time: 0.122s
| SGD | epoch: 032 | loss: 0.14503 - R2: 0.9982 -- iter: 0768/1168
Training Step: 602  | total loss: 0.15114 | time: 0.128s
| SGD | epoch: 032 | loss: 0.15114 - R2: 0.9981 -- iter: 0832/1168
Training Step: 603  | total loss: 0.15114 | time: 0.131s
| SGD | epoch: 032 | loss: 0.15114 - R2: 0.9981 -- iter: 0896/1168
Training Step: 604  | total loss: 0.15302 | time: 0.149s
| SGD | epoch: 032 | loss: 0.15302 - R2: 0.9981 -- iter: 0960/1168
Training Step: 605  | total loss: 0.15664 | time: 0.153s
| SGD | epoch: 032 | loss: 0.15664 - R2: 0.9989 -- iter: 1024/1168
Training Step: 606  | total loss: 0.15128 | time: 0.158s
| SGD | epoch: 032 | loss: 0.15128 - R2: 0.9991 -- iter: 1088/1168
Training Step: 607  | total loss: 0.15128 | time: 0.162s
| SGD | epoch: 032 | loss: 0.15128 - R2: 0.9991 -- iter: 1152/1168

Training Step: 659  | total loss: 0.16345 | time: 0.115s
| SGD | epoch: 035 | loss: 0.16345 - R2: 0.9995 -- iter: 0832/1168
Training Step: 660  | total loss: 0.15541 | time: 0.146s
| SGD | epoch: 035 | loss: 0.15541 - R2: 0.9997 -- iter: 0896/1168
Training Step: 661  | total loss: 0.14655 | time: 0.152s
| SGD | epoch: 035 | loss: 0.14655 - R2: 1.0000 -- iter: 0960/1168
Training Step: 662  | total loss: 0.14665 | time: 0.158s
| SGD | epoch: 035 | loss: 0.14665 - R2: 1.0015 -- iter: 1024/1168
Training Step: 663  | total loss: 0.14745 | time: 0.164s
| SGD | epoch: 035 | loss: 0.14745 - R2: 1.0017 -- iter: 1088/1168
Training Step: 664  | total loss: 0.14695 | time: 0.169s
| SGD | epoch: 035 | loss: 0.14695 - R2: 1.0011 -- iter: 1152/1168
Training Step: 665  | total loss: 0.14503 | time: 1.175s
| SGD | epoch: 035 | loss: 0.14503 - R2: 1.0003 | val_loss: 0.16063 - val_acc: 0

Training Step: 717  | total loss: 0.15614 | time: 0.121s
| SGD | epoch: 038 | loss: 0.15614 - R2: 0.9982 -- iter: 0896/1168
Training Step: 718  | total loss: 0.15888 | time: 0.127s
| SGD | epoch: 038 | loss: 0.15888 - R2: 0.9985 -- iter: 0960/1168
Training Step: 719  | total loss: 0.15909 | time: 0.131s
| SGD | epoch: 038 | loss: 0.15909 - R2: 0.9968 -- iter: 1024/1168
Training Step: 720  | total loss: 0.15909 | time: 0.142s
| SGD | epoch: 038 | loss: 0.15909 - R2: 0.9968 -- iter: 1088/1168
Training Step: 721  | total loss: 0.15928 | time: 0.145s
| SGD | epoch: 038 | loss: 0.15928 - R2: 0.9952 -- iter: 1152/1168
Training Step: 722  | total loss: 0.16082 | time: 1.152s
| SGD | epoch: 038 | loss: 0.16082 - R2: 0.9954 | val_loss: 0.16063 - val_acc: 0.9945 -- iter: 1168/1168
--
Training Step: 723  | total loss: 0.16309 | time: 0.049s
| SGD | epoch: 039 | loss: 0.16309 - R2

Training Step: 775  | total loss: 0.15057 | time: 0.138s
| SGD | epoch: 041 | loss: 0.15057 - R2: 1.0007 -- iter: 0960/1168
Training Step: 776  | total loss: 0.14815 | time: 0.142s
| SGD | epoch: 041 | loss: 0.14815 - R2: 1.0002 -- iter: 1024/1168
Training Step: 777  | total loss: 0.14815 | time: 0.148s
| SGD | epoch: 041 | loss: 0.14815 - R2: 1.0002 -- iter: 1088/1168
Training Step: 778  | total loss: 0.15309 | time: 0.151s
| SGD | epoch: 041 | loss: 0.15309 - R2: 0.9997 -- iter: 1152/1168
Training Step: 779  | total loss: 0.14835 | time: 1.158s
| SGD | epoch: 041 | loss: 0.14835 - R2: 1.0014 | val_loss: 0.16063 - val_acc: 0.9945 -- iter: 1168/1168
--
Training Step: 780  | total loss: 0.14835 | time: 0.041s
| SGD | epoch: 042 | loss: 0.14835 - R2: 1.0014 -- iter: 0064/1168
Training Step: 781  | total loss: 0.14394 | time: 0.044s
| SGD | epoch: 042 | loss: 0.14394 - R2

Training Step: 833  | total loss: 0.15571 | time: 0.139s
| SGD | epoch: 044 | loss: 0.15571 - R2: 0.9966 -- iter: 1024/1168
Training Step: 834  | total loss: 0.15870 | time: 0.144s
| SGD | epoch: 044 | loss: 0.15870 - R2: 0.9982 -- iter: 1088/1168
Training Step: 835  | total loss: 0.15817 | time: 0.146s
| SGD | epoch: 044 | loss: 0.15817 - R2: 0.9979 -- iter: 1152/1168
Training Step: 836  | total loss: 0.16059 | time: 1.152s
| SGD | epoch: 044 | loss: 0.16059 - R2: 0.9986 | val_loss: 0.16063 - val_acc: 0.9945 -- iter: 1168/1168
--
Training Step: 837  | total loss: 0.15755 | time: 0.056s
| SGD | epoch: 045 | loss: 0.15755 - R2: 0.9995 -- iter: 0064/1168
Training Step: 838  | total loss: 0.15985 | time: 0.070s
| SGD | epoch: 045 | loss: 0.15985 - R2: 0.9991 -- iter: 0128/1168
Training Step: 839  | total loss: 0.16316 | time: 0.073s
| SGD | epoch: 045 | loss: 0.16316 - R2

Training Step: 891  | total loss: 0.16434 | time: 0.109s
| SGD | epoch: 047 | loss: 0.16434 - R2: 0.9996 -- iter: 1088/1168
Training Step: 892  | total loss: 0.16508 | time: 0.115s
| SGD | epoch: 047 | loss: 0.16508 - R2: 0.9992 -- iter: 1152/1168
Training Step: 893  | total loss: 0.16635 | time: 1.122s
| SGD | epoch: 047 | loss: 0.16635 - R2: 0.9988 | val_loss: 0.16063 - val_acc: 0.9945 -- iter: 1168/1168
--
Training Step: 894  | total loss: 0.15601 | time: 0.005s
| SGD | epoch: 048 | loss: 0.15601 - R2: 1.0004 -- iter: 0064/1168
Training Step: 895  | total loss: 0.16165 | time: 0.010s
| SGD | epoch: 048 | loss: 0.16165 - R2: 0.9987 -- iter: 0128/1168
Training Step: 896  | total loss: 0.16165 | time: 0.014s
| SGD | epoch: 048 | loss: 0.16165 - R2: 0.9987 -- iter: 0192/1168
Training Step: 897  | total loss: 0.16729 | time: 0.017s
| SGD | epoch: 048 | loss: 0.16729 - R2
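
Two things in the training log above are worth decoding. First, `iter` advances by 64 and tops out at 1168, and the step counter reads 399 at the end of epoch 21 (and 665 at the end of epoch 35 in the second run). That fits a batch size of 64 with 19 steps per epoch, and 1168 is what remains of the 1460 rows in `train.csv` after holding out 20% for validation (the 20% split is our assumption; it is not shown in this section). Second, the `R2` column regularly exceeds 1.0, which the textbook coefficient of determination, `1 - SS_res / SS_tot`, never does. The sketch below checks the step arithmetic and compares sklearn's `r2_score` with a ratio of summed squares. The helper `sum_sq_ratio` is hypothetical: we assume it is closer to what TFLearn's `R2` metric reports, which should be verified against the TFLearn source for the installed version. The arrays are synthetic log-prices, not the competition data.

```python
import math

import numpy as np
from sklearn.metrics import r2_score

# --- Step arithmetic read off the log ---
n_labelled = 1460                  # rows in train.csv
n_val = round(n_labelled * 0.2)    # ASSUMED 20% validation hold-out -> 292
n_train = n_labelled - n_val       # 1168, matches "iter: 1168/1168"
batch_size = 64                    # "iter" grows by 64 per step
steps_per_epoch = math.ceil(n_train / batch_size)  # 18.25 rounds up to 19

# The step counter at the end of epoch k is k * steps_per_epoch
end_of_epoch_21 = 21 * steps_per_epoch   # the first run logs step 399 here
end_of_epoch_35 = 35 * steps_per_epoch   # the second run logs step 665 here


# --- Why the logged "R2" can exceed 1.0 ---
def r2_textbook(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (never above 1)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def sum_sq_ratio(y_true, y_pred):
    """HYPOTHETICAL stand-in for TFLearn's metric: sum(pred^2) / sum(true^2)."""
    return np.sum(y_pred ** 2) / np.sum(y_true ** 2)


rng = np.random.default_rng(0)
y_true = rng.normal(12.0, 0.4, size=n_val)             # synthetic log-prices
y_pred = y_true + 0.05 + rng.normal(0.0, 0.15, n_val)  # slightly biased upward

print("textbook R2:", r2_textbook(y_true, y_pred))
print("sklearn r2_score:", r2_score(y_true, y_pred))
print("sum-of-squares ratio:", sum_sq_ratio(y_true, y_pred))
```

Under this assumed definition, a model that always predicts the mean log-price would also score close to 1.0, because log-prices sit far from zero. Computing `r2_score` on the validation split is therefore a safer way to compare the DNN with the sklearn models.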

In [None]:
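# Make predictions on the test set. This assumes `test` has by now been
# replaced by the same preprocessed (log/one-hot/PCA) features used for training.
predictions_huber = best_clf.predict(test)
predictions_DNN = model.predict(test)

# The models were trained on log(SalePrice), so map predictions back to prices
predictions_huber = np.exp(predictions_huber)
predictions_DNN = np.exp(predictions_DNN)
# TFLearn returns one single-value row per sample; flatten to a 1-D array
predictions_DNN = predictions_DNN.reshape(-1)

# Only the DNN predictions are submitted; predictions_huber is kept for comparison
sub = pd.DataFrame({
    "Id": ids,
    "SalePrice": predictions_DNN
})

sub.to_csv("prices_submission.csv", index=False)
# print(sub)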
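
The cell above assumes the models were fit on a plain `np.log` of `SalePrice`, so `np.exp` maps predictions back to dollars; if `np.log1p` had been used instead, `np.expm1` would be the matching inverse. A minimal sketch of a helper that performs that inversion and sanity-checks the submission before it is written is shown below. `build_submission` and the sample values are hypothetical and are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def build_submission(ids, log_preds):
    """Turn log-scale predictions into a two-column Id/SalePrice frame.

    Assumes the target was transformed with np.log (use np.expm1 for np.log1p).
    """
    # TFLearn-style output is a list of one-element lists; flatten it to 1-D
    prices = np.exp(np.asarray(log_preds, dtype=float).ravel())
    if len(prices) != len(ids):
        raise ValueError("expected one prediction per Id")
    # Overflow in np.exp yields inf, which the finiteness check catches
    if not np.all(np.isfinite(prices)) or np.any(prices <= 0):
        raise ValueError("prices must be finite and positive")
    return pd.DataFrame({"Id": np.asarray(ids), "SalePrice": prices})


# Hypothetical usage with made-up values
sample_ids = pd.Series([1461, 1462, 1463])
sample_log_preds = [[11.7], [12.0], [12.1]]
sub = build_submission(sample_ids, sample_log_preds)
print(sub)
# sub.to_csv("prices_submission.csv", index=False)
```

Passing `np.asarray(ids)` rather than the Series avoids pandas index alignment, which matters if `ids` comes from a frame whose index no longer starts at 0.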