# Disjoint-domain network

### Ethan Blackwood
### October 23, 2020

**Goal**: Train and analyze the 4-disjoint-domain network from Rogers & McClelland (2008) (Figures R3-R5). The 4 domains share no items, contexts, or attributes. Even so, the network learns to extract a feature common to all of them: how similar each item is to the other items in its own domain.

In [2]:
import numpy as np
from datetime import datetime as dt
import importlib

import disjoint_domain as dd

import ddnet
#importlib.reload(ddnet)

Common training procedure:

In [3]:
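def train_n_dd_nets(n=36, run_type='', net_params=None, train_params=None):
    """Train n DisjointDomainNets and save their stacked results to one .npz file.

    Entries in net_params and train_params override the defaults set below.
    The snapshots, reports, parameter snapshots (if any), and each network's y
    from the n runs are stacked along a new leading axis and saved as
    data/<run_type>_dd_res_<timestamp>.npz (no prefix if run_type is empty).
    """
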
    # get some defaults
    ctx_per_domain, n_domains, n_items, n_ctx, attrs_per_context = dd.get_net_dims()
    device, torchfp = dd.init_torch()
    
    net_defaults = {
        'ctx_per_domain': ctx_per_domain,
        'attrs_per_context': attrs_per_context,
        'n_domains': n_domains,
        'device': device,
        'torchfp': torchfp
    }
    if net_params is None:
        net_params = {}
    net_params = {**net_defaults, **net_params}
    if net_params['device'].type == 'cuda':
        print('Using CUDA')
    else:
        print('Using CPU')
    
    train_defaults = {
        'lr': 0.01,
        'scheduler': None,
        'num_epochs': 3001,
        'batch_size': 16,
        'report_freq': 50,
        'snap_freq': 50,
        'snap_freq_scale': 'lin',
        'holdout_testing': 'none',
        'test_thresh': 0.97,
        'test_max_epochs': 10000,
        'reports_per_test': 4,
        'do_combo_testing': True
    }
    if train_params is None:
        train_params = {}
    train_params = {**train_defaults, **train_params}
    
    snaps_all = []
    reports_all = []
    parameters_all = []
    ys_all = []
    
    for i in range(n):
        print(f'Training Iteration {i+1}')
        print('---------------------')
        
        net = ddnet.DisjointDomainNet(**net_params)
        res = net.do_training(**train_params)
        
        snaps_all.append(res['snaps'])
        reports_all.append(res['reports'])
        if 'params' in res:
            parameters_all.append(res['params'])
            
        ys_all.append(net.y.cpu().numpy())

        print('')

    snaps = {}
    for snap_type in snaps_all[0].keys():
        snaps[snap_type] = np.stack([snaps_one[snap_type] for snaps_one in snaps_all])
        
    reports = {}
    for report_type in reports_all[0].keys():
        reports[report_type] = np.stack([reports_one[report_type] for reports_one in reports_all])
        
    if len(parameters_all) > 0:
        parameters = {}
        for param_type in parameters_all[0].keys():
            parameters[param_type] = np.stack([params_one[param_type] for params_one in parameters_all])
    else:
        parameters = None
        
    ys = np.stack(ys_all)
    
    if run_type != '':
        run_type += '_'

    np.savez(f'data/{run_type}dd_res_{dt.now():%Y-%m-%d_%H-%M-%S}',
             snapshots=snaps, reports=reports, ys=ys, net_params=net_params,
             train_params=train_params, parameters=parameters)
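`np.savez` stores each dict argument (and `None`) as a 0-d object array. Reading these files back therefore requires `allow_pickle=True` and unwrapping each such entry with `.item()`. Below is a minimal loader sketch. `load_dd_results` and `latest_run` are hypothetical helpers, not part of `dd` or `ddnet`, and the glob pattern assumes the file naming used in `train_n_dd_nets` above.

```python
import glob

import numpy as np


def load_dd_results(path):
    """Load a results file written by train_n_dd_nets into a dict (hypothetical helper).

    np.savez stores each dict (and None) as a 0-d object array, so the file
    must be opened with allow_pickle=True and those entries unwrapped with .item().
    """
    res = {}
    with np.load(path, allow_pickle=True) as f:
        for key in f.files:
            arr = f[key]
            if arr.dtype == object and arr.ndim == 0:
                res[key] = arr.item()  # dict of stacked arrays, or None
            else:
                res[key] = arr  # plain numeric array, e.g. ys
    return res


def latest_run(run_type=''):
    """Path of the most recent results file for run_type (hypothetical helper), or None."""
    prefix = f'{run_type}_' if run_type else ''
    # Timestamps are zero-padded (%Y-%m-%d_%H-%M-%S), so lexicographic order is chronological
    paths = sorted(glob.glob(f'data/{prefix}dd_res_*.npz'))
    return paths[-1] if paths else None
```

For example, `load_dd_results(latest_run('merged_repr'))['reports']` would return the stacked reports of the most recent merged-representation run. Because `net_params` holds a `torch.device`, unpickling it also imports PyTorch.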

Base network: hold out one item/context combination per domain for testing.
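The held-out pairs printed in the log (e.g. `A4*/A1`) appear to name one item and one context per domain. The sketch below shows that kind of split under a few assumptions: 8 items and 4 contexts per domain (as the labels suggest), items paired only with contexts from their own domain, and the `*`/`@`/`$` suffixes ignored. `choose_combo_holdouts` is hypothetical and is not necessarily how `ddnet` selects its holdouts.

```python
import itertools
import random


def choose_combo_holdouts(n_domains=4, items_per_domain=8, ctx_per_domain=4, seed=None):
    """Hold out one (item, context) pair per domain; return (holdouts, training pairs).

    Hypothetical helper: labels follow the 'A4'/'A1' style printed in the logs,
    without the item-type suffixes.
    """
    rng = random.Random(seed)
    domains = [chr(ord('A') + d) for d in range(n_domains)]
    holdouts = {}
    for dom in domains:
        item = rng.randint(1, items_per_domain)  # randint is inclusive on both ends
        ctx = rng.randint(1, ctx_per_domain)
        holdouts[dom] = (f'{dom}{item}', f'{dom}{ctx}')

    # Assumed: items are only ever paired with contexts from their own domain.
    train = [
        (f'{dom}{i}', f'{dom}{c}')
        for dom in domains
        for i, c in itertools.product(range(1, items_per_domain + 1),
                                      range(1, ctx_per_domain + 1))
        if (f'{dom}{i}', f'{dom}{c}') != holdouts[dom]
    ]
    return holdouts, train


holdouts, train = choose_combo_holdouts(seed=0)
print(', '.join(f'{item}/{ctx}' for item, ctx in holdouts.values()))
```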

In [5]:
train_n_dd_nets()

Using CUDA
Training Iteration 1
---------------------
Holding out: A4*/A1, B2*/B4, C6@/C3, D5@/D2
Epoch    0 end: loss = 208.480, acc = 0.586, test acc = 0.966
Epoch   50 end: loss = 110.484, acc = 0.965, test acc = 0.969
Epoch  100 end: loss = 109.579, acc = 0.968, test acc = 0.969
Epoch  150 end: loss = 109.093, acc = 0.969, test acc = 0.969
Epoch  200 end: loss = 108.801, acc = 0.969, test acc = 0.969
Epoch  250 end: loss = 108.914, acc = 0.969, test acc = 0.969
Epoch  300 end: loss = 108.707, acc = 0.969, test acc = 0.969
Epoch  350 end: loss = 108.652, acc = 0.969, test acc = 0.969
Epoch  400 end: loss = 108.456, acc = 0.969, test acc = 0.969
Epoch  450 end: loss =  94.005, acc = 0.929, test acc = 0.857
Epoch  500 end: loss =  82.498, acc = 0.892, test acc = 0.858
Epoch  550 end: loss =  74.956, acc = 0.871, test acc = 0.837
Epoch  600 end: loss =  67.817, acc = 0.876, test acc = 0.823
Epoch  650 end: loss =  62.717, acc = 0.877, test acc = 0.812
Epoch  700 end: loss =  58.303, ac

Network with a merged representation layer (items and contexts both project to all representation units)
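One way to see how the two architectures differ: separate item and context representation layers are equivalent to a single merged layer whose weight matrix is block-diagonal. Merging therefore lets every item and context unit connect to every representation unit, so the off-diagonal blocks that were zero become trainable. The linear sketch below assumes 32 items, 16 contexts, 16 units per separate layer, and a merged layer sized as their sum. The actual `ddnet` layer sizes and nonlinearities may differ.

```python
import numpy as np

# Assumed sizes (hypothetical, suggested by the item/context labels in the logs)
n_items, n_ctx, n_repr = 32, 16, 16
rng = np.random.default_rng(0)


def separate_repr(item, ctx, W_item, W_ctx):
    # Items and contexts each feed their own representation layer (pre-activation)
    return np.concatenate([item @ W_item, ctx @ W_ctx])


def merged_repr(item, ctx, W):
    # Every input unit can project to every representation unit (pre-activation)
    return np.concatenate([item, ctx]) @ W


W_item = rng.standard_normal((n_items, n_repr))
W_ctx = rng.standard_normal((n_ctx, n_repr))

# Separate layers == merged layer with a block-diagonal weight matrix
W_block = np.zeros((n_items + n_ctx, 2 * n_repr))
W_block[:n_items, :n_repr] = W_item
W_block[n_items:, n_repr:] = W_ctx

item = np.eye(n_items)[3]  # one-hot item input
ctx = np.eye(n_ctx)[1]     # one-hot context input
assert np.allclose(separate_repr(item, ctx, W_item, W_ctx), merged_repr(item, ctx, W_block))
```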

In [6]:
train_n_dd_nets(run_type='merged_repr', net_params={'merged_repr': True})

Using CUDA
Training Iteration 1
---------------------
Holding out: A8$/A4, B5@/B1, C4*/C2, D6@/D3
Epoch    0 end: loss = 208.998, acc = 0.580, test acc = 0.959
Epoch   50 end: loss = 110.484, acc = 0.967, test acc = 0.969
Epoch  100 end: loss = 109.628, acc = 0.969, test acc = 0.969
Epoch  150 end: loss = 109.076, acc = 0.969, test acc = 0.969
Epoch  200 end: loss = 108.980, acc = 0.969, test acc = 0.969
Epoch  250 end: loss = 108.994, acc = 0.969, test acc = 0.969
Epoch  300 end: loss = 108.914, acc = 0.969, test acc = 0.969
Epoch  350 end: loss = 108.637, acc = 0.969, test acc = 0.969
Epoch  400 end: loss =  94.725, acc = 0.932, test acc = 0.884
Epoch  450 end: loss =  83.528, acc = 0.882, test acc = 0.832
Epoch  500 end: loss =  72.968, acc = 0.875, test acc = 0.843
Epoch  550 end: loss =  64.154, acc = 0.875, test acc = 0.840
Epoch  600 end: loss =  57.884, acc = 0.882, test acc = 0.843
Epoch  650 end: loss =  53.357, acc = 0.899, test acc = 0.856
Epoch  700 end: loss =  49.506, ac

In [5]:
# Systematically vary layer sizes and which representation layers are used
scaling_params = {
    '': {},
    '_half_repr': {'item_repr_units': 8, 'ctx_repr_units': 8},
    '_half_hidden': {'hidden_units': 16},
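    # NB: '_half_hidden' uses 16 hidden units, so 8 here is a quarter rather than a half;
    # 16 may have been intended (it would also make the no_repr skip below exact)
    '_half_both': {'item_repr_units': 8, 'ctx_repr_units': 8, 'hidden_units': 8}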
}

train_params = {'num_epochs': 8001}

for tag, scaling in scaling_params.items():
    train_n_dd_nets(run_type=f'normal{tag}', net_params=scaling, train_params=train_params)
    train_n_dd_nets(run_type=f'merged_repr{tag}', net_params={**scaling, 'merged_repr': True}, train_params=train_params)
    train_n_dd_nets(run_type=f'no_item_repr{tag}', net_params={**scaling, 'use_item_repr': False}, train_params=train_params)        
    train_n_dd_nets(run_type=f'no_ctx_repr{tag}', net_params={**scaling, 'use_ctx_repr': False}, train_params=train_params)
    
    if tag not in ['_half_repr', '_half_both']: # these would be pointless
        train_n_dd_nets(run_type=f'no_repr{tag}', net_params={**scaling, 'use_item_repr': False, 'use_ctx_repr': False}, train_params=train_params)

Using CUDA
Training Iteration 1
---------------------
Holding out: A4*/A3, B5@/B1, C7$/C2, D1*/D4
Epoch    0 end: loss = 208.669, weighted acc = 0.301, test weighted acc = 0.500
Epoch   50 end: loss = 110.880, weighted acc = 0.499, test weighted acc = 0.500
Epoch  100 end: loss = 109.747, weighted acc = 0.500, test weighted acc = 0.500
Epoch  150 end: loss = 109.626, weighted acc = 0.500, test weighted acc = 0.500
Epoch  200 end: loss = 109.252, weighted acc = 0.500, test weighted acc = 0.500
Epoch  250 end: loss = 109.207, weighted acc = 0.500, test weighted acc = 0.500
Epoch  300 end: loss = 109.149, weighted acc = 0.500, test weighted acc = 0.500
Epoch  350 end: loss = 109.011, weighted acc = 0.500, test weighted acc = 0.500
Epoch  400 end: loss =  95.438, weighted acc = 0.481, test weighted acc = 0.455
Epoch  450 end: loss =  84.017, weighted acc = 0.464, test weighted acc = 0.440
Epoch  500 end: loss =  76.501, weighted acc = 0.449, test weighted acc = 0.442
Epoch  550 end: loss =
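The loop above runs 4 variants for each of the 4 scaling tags, plus 2 `no_repr` runs. That gives 18 run types, or 18 × 36 = 648 networks at the default `n=36`. The sketch below reproduces only the naming logic, without training. It can help when locating the saved files, which follow the pattern `data/<run_type>_dd_res_<timestamp>.npz`.

```python
scaling_tags = ['', '_half_repr', '_half_hidden', '_half_both']
variants = ['normal', 'merged_repr', 'no_item_repr', 'no_ctx_repr']

run_types = []
for tag in scaling_tags:
    run_types += [f'{v}{tag}' for v in variants]
    # Mirrors the skip in the training loop
    if tag not in ['_half_repr', '_half_both']:
        run_types.append(f'no_repr{tag}')

n_nets_per_run = 36  # default n in train_n_dd_nets
print(len(run_types), len(run_types) * n_nets_per_run)  # prints: 18 648
```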

In [8]:
# Redo just the beginning of training while saving parameters to look more closely
train_params = {'num_epochs': 500, 'report_freq': 10, 'snap_freq': 10, 'param_snapshots': True}
tag = 'short_save_params'

train_n_dd_nets(run_type=tag, train_params=train_params)
train_n_dd_nets(run_type=tag + '_merged_repr', net_params={'merged_repr': True}, train_params=train_params)
train_n_dd_nets(run_type=tag + '_no_item_repr', net_params={'use_item_repr': False}, train_params=train_params)
train_n_dd_nets(run_type=tag + '_no_ctx_repr', net_params={'use_ctx_repr': False}, train_params=train_params)
train_n_dd_nets(run_type=tag + '_no_repr', net_params={'use_item_repr': False, 'use_ctx_repr': False}, train_params=train_params)

Using CUDA
Training Iteration 1
---------------------
Holding out: A3*/A4, B1*/B1, C7$/C2, D2*/D3
Epoch   0 end: loss = 209.135, acc = 0.586, test acc = 0.969
Epoch  10 end: loss = 111.290, acc = 0.966, test acc = 0.965
Epoch  20 end: loss = 111.446, acc = 0.966, test acc = 0.969
Epoch  30 end: loss = 111.499, acc = 0.967, test acc = 0.969
Epoch  40 end: loss = 111.058, acc = 0.969, test acc = 0.969
Epoch  50 end: loss = 110.541, acc = 0.968, test acc = 0.969
Epoch  60 end: loss = 110.315, acc = 0.968, test acc = 0.969
Epoch  70 end: loss = 110.249, acc = 0.969, test acc = 0.969
Epoch  80 end: loss = 110.399, acc = 0.968, test acc = 0.969
Epoch  90 end: loss = 110.235, acc = 0.969, test acc = 0.969
Epoch 100 end: loss = 109.930, acc = 0.969, test acc = 0.969
Epoch 110 end: loss = 109.974, acc = 0.969, test acc = 0.969
Epoch 120 end: loss = 109.691, acc = 0.969, test acc = 0.969
Epoch 130 end: loss = 109.771, acc = 0.969, test acc = 0.969
Epoch 140 end: loss = 109.589, acc = 0.969, test

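Re-run the ablations without reducing the total number of units: units removed from the representation layers are reallocated to the hidden layer. Also save snapshots of both the hidden and representation layers.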
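The hidden sizes in the next cell (64 and 48) are consistent with keeping the total unit count fixed, provided the defaults are 32 hidden units and 16 units per representation layer. Those defaults are an assumption inferred from the earlier `'_half_*'` settings (16 hidden and 8 representation units described as halves). They are not read from `ddnet`. `reallocated_hidden` is a hypothetical helper.

```python
# Assumed defaults, inferred from the '_half_*' settings (16 hidden / 8 repr units = half);
# these are not read from ddnet.
DEFAULT_UNITS = {'item_repr_units': 16, 'ctx_repr_units': 16, 'hidden_units': 32}


def reallocated_hidden(use_item_repr=True, use_ctx_repr=True, units=DEFAULT_UNITS):
    """Hidden-layer size that keeps the total unit count fixed when repr layers are dropped."""
    hidden = units['hidden_units']
    if not use_item_repr:
        hidden += units['item_repr_units']
    if not use_ctx_repr:
        hidden += units['ctx_repr_units']
    return hidden


print(reallocated_hidden(False, False), reallocated_hidden(False, True), reallocated_hidden(True, False))
# prints: 64 48 48
```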

In [4]:
train_params = {'num_epochs': 5001}

train_n_dd_nets(train_params=train_params)
train_n_dd_nets(run_type='no_repr_reallocate',
                net_params={'use_item_repr': False, 'use_ctx_repr': False, 'hidden_units': 64},
                train_params=train_params)

train_n_dd_nets(run_type='merged_repr', net_params={'merged_repr': True},
                train_params=train_params)
train_n_dd_nets(run_type='no_item_repr_reallocate', train_params=train_params,
                net_params={'use_item_repr': False, 'hidden_units': 48})
train_n_dd_nets(run_type='no_ctx_repr_reallocate', train_params=train_params,
                net_params={'use_ctx_repr': False, 'hidden_units': 48})

Using CUDA
Training Iteration 1
---------------------
Holding out: A4*/A1, B7$/B3, C3*/C4, D2*/D2
Epoch    0 end: loss = 208.647, weighted acc = 0.300, test weighted acc = 0.498
Epoch   50 end: loss = 110.713, weighted acc = 0.500, test weighted acc = 0.500
Epoch  100 end: loss = 109.941, weighted acc = 0.500, test weighted acc = 0.500
Epoch  150 end: loss = 109.435, weighted acc = 0.500, test weighted acc = 0.500
Epoch  200 end: loss = 109.246, weighted acc = 0.500, test weighted acc = 0.500
Epoch  250 end: loss = 109.092, weighted acc = 0.500, test weighted acc = 0.500
Epoch  300 end: loss = 108.993, weighted acc = 0.500, test weighted acc = 0.500
Epoch  350 end: loss = 109.046, weighted acc = 0.500, test weighted acc = 0.500
Epoch  400 end: loss =  97.931, weighted acc = 0.487, test weighted acc = 0.473
Epoch  450 end: loss =  86.011, weighted acc = 0.469, test weighted acc = 0.448
Epoch  500 end: loss =  77.892, weighted acc = 0.446, test weighted acc = 0.439
Epoch  550 end: loss =

Try a smaller item representation layer to see whether it delays the network's use of domain information.

In [3]:
train_n_dd_nets(run_type='small_item_repr', net_params={'item_repr_ratio': 0.25})

Using CUDA
Training Iteration 1
---------------------
Holding out: A1*/A2, B6@/B3, C5@/C4, D2*/D1
Epoch    0 end: loss = 223.386, acc = 0.504, test acc = 0.964
Epoch   50 end: loss = 110.388, acc = 0.969, test acc = 0.969
Epoch  100 end: loss = 109.739, acc = 0.969, test acc = 0.969
Epoch  150 end: loss = 109.454, acc = 0.969, test acc = 0.969
Epoch  200 end: loss = 109.089, acc = 0.969, test acc = 0.969
Epoch  250 end: loss = 109.008, acc = 0.969, test acc = 0.969
Epoch  300 end: loss = 108.858, acc = 0.969, test acc = 0.969
Epoch  350 end: loss = 108.840, acc = 0.969, test acc = 0.969
Epoch  400 end: loss = 108.068, acc = 0.969, test acc = 0.969
Epoch  450 end: loss =  89.054, acc = 0.917, test acc = 0.852
Epoch  500 end: loss =  78.860, acc = 0.881, test acc = 0.855
Epoch  550 end: loss =  71.151, acc = 0.867, test acc = 0.831
Epoch  600 end: loss =  65.392, acc = 0.861, test acc = 0.827
Epoch  650 end: loss =  59.928, acc = 0.872, test acc = 0.811
Epoch  700 end: loss =  54.811, ac

In [4]:
train_n_dd_nets(run_type='all_ratios_0.5', 
                net_params={'item_repr_ratio': 0.5, 'ctx_repr_ratio': 0.5, 'hidden_ratio': 0.5},
                train_params={'num_epochs': 4001})

Using CUDA
Training Iteration 1
---------------------
Holding out: A5@/A2, B2*/B3, C7$/C1, D6@/D4
Epoch    0 end: loss = 265.586, acc = 0.142, test acc = 0.917
Epoch   50 end: loss = 109.260, acc = 0.969, test acc = 0.969
Epoch  100 end: loss = 109.165, acc = 0.969, test acc = 0.969
Epoch  150 end: loss = 109.171, acc = 0.969, test acc = 0.969
Epoch  200 end: loss = 109.029, acc = 0.969, test acc = 0.969
Epoch  250 end: loss = 108.894, acc = 0.969, test acc = 0.969
Epoch  300 end: loss = 108.814, acc = 0.969, test acc = 0.969
Epoch  350 end: loss = 108.800, acc = 0.969, test acc = 0.969
Epoch  400 end: loss = 108.552, acc = 0.969, test acc = 0.969
Epoch  450 end: loss = 108.465, acc = 0.969, test acc = 0.969
Epoch  500 end: loss = 108.415, acc = 0.969, test acc = 0.969
Epoch  550 end: loss = 108.484, acc = 0.969, test acc = 0.969
Epoch  600 end: loss = 108.463, acc = 0.969, test acc = 0.969
Epoch  650 end: loss = 108.251, acc = 0.969, test acc = 0.969
Epoch  700 end: loss =  99.507, ac

Train for longer to look for a possible re-emergence of separation by item type.
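One way to quantify separation by item type over training is to compare, at each snapshot, two averages: the mean correlation between representations of items of the same type, and the mean correlation between items of different types. The sketch below assumes a snapshot can be indexed as an (items × units) array. The function name and the use of Pearson correlation are choices made here. The type labels would have to be supplied; they may correspond to the `*`/`@`/`$` markers printed with held-out items.

```python
import numpy as np


def within_between_similarity(reprs, labels):
    """Mean correlation between item representations with the same vs. different labels.

    reprs: (n_items, n_units) activations from one snapshot
    labels: length-n_items sequence of group labels (e.g. item type)
    """
    corr = np.corrcoef(reprs)  # (n_items, n_items) Pearson correlations between rows
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # exclude each item's self-correlation
    within = corr[same & off_diag].mean()
    between = corr[~same].mean()
    return within, between
```

To look for re-emergence, apply this to each snapshot in turn and plot `within - between` against epoch. Note that `np.corrcoef` returns NaN for any item whose representation is constant across units.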

In [6]:
train_n_dd_nets(run_type='longer', train_params={'num_epochs': 6001, 'snap_freq': 100, 'report_freq': 100})

Using CUDA
Training Iteration 1
---------------------
Holding out: A8$/A2, B6@/B1, C4*/C3, D5@/D4
Epoch    0 end: loss = 208.962, acc = 0.579, test acc = 0.969
Epoch  100 end: loss = 109.327, acc = 0.969, test acc = 0.969
Epoch  200 end: loss = 108.935, acc = 0.969, test acc = 0.969
Epoch  300 end: loss = 108.636, acc = 0.969, test acc = 0.969
Epoch  400 end: loss = 108.309, acc = 0.969, test acc = 0.969
Epoch  500 end: loss =  79.766, acc = 0.883, test acc = 0.847
Epoch  600 end: loss =  65.169, acc = 0.864, test acc = 0.820
Epoch  700 end: loss =  55.824, acc = 0.884, test acc = 0.822
Epoch  800 end: loss =  49.262, acc = 0.895, test acc = 0.822
Epoch  900 end: loss =  47.621, acc = 0.898, test acc = 0.808
Epoch 1000 end: loss =  46.554, acc = 0.900, test acc = 0.831
Epoch 1100 end: loss =  44.418, acc = 0.910, test acc = 0.876
Epoch 1200 end: loss =  43.643, acc = 0.910, test acc = 0.876
Epoch 1300 end: loss =  43.341, acc = 0.910, test acc = 0.876
Epoch 1400 end: loss =  41.549, ac

In [8]:
train_n_dd_nets(run_type='half_hidden_longer', net_params={'hidden_ratio': 0.5},
                train_params={'num_epochs': 6001, 'snap_freq': 100, 'report_freq': 100})

Using CUDA
Training Iteration 1
---------------------
Holding out: A8$/A2, B2*/B1, C6@/C4, D3*/D3
Epoch    0 end: loss = 245.821, acc = 0.316, test acc = 0.960
Epoch  100 end: loss = 109.683, acc = 0.969, test acc = 0.969
Epoch  200 end: loss = 109.418, acc = 0.969, test acc = 0.969
Epoch  300 end: loss = 109.033, acc = 0.969, test acc = 0.969
Epoch  400 end: loss = 108.874, acc = 0.969, test acc = 0.969
Epoch  500 end: loss = 108.723, acc = 0.969, test acc = 0.969
Epoch  600 end: loss =  90.916, acc = 0.924, test acc = 0.855
Epoch  700 end: loss =  77.165, acc = 0.871, test acc = 0.832
Epoch  800 end: loss =  69.089, acc = 0.859, test acc = 0.847
Epoch  900 end: loss =  64.396, acc = 0.862, test acc = 0.802
Epoch 1000 end: loss =  62.582, acc = 0.860, test acc = 0.810
Epoch 1100 end: loss =  59.548, acc = 0.873, test acc = 0.810
Epoch 1200 end: loss =  58.125, acc = 0.871, test acc = 0.800
Epoch 1300 end: loss =  56.153, acc = 0.873, test acc = 0.801
Epoch 1400 end: loss =  55.164, ac

Troubleshooting: train with testing disabled.

In [None]:
train_n_dd_nets(run_type='no_test', train_params={'do_combo_testing': False,
                                                  'holdout_testing': 'none'})

Try a simple item tree, to see whether separation between item classes still occurs when all items are equally "typical".

Result: this produces a markedly different pattern, with no similarity between subgroups of different domains that exceeds the similarity within a domain.

In [13]:
train_n_dd_nets(run_type='simplified_no_holdout', net_params={'simple_item_tree': True},
                train_params={'holdout_testing': 'none'})

Training Iteration 1
---------------------
Epoch    0 end: loss = 205.913, acc = 0.598
Epoch  100 end: loss = 109.594, acc = 0.969
Epoch  200 end: loss = 108.824, acc = 0.969
Epoch  300 end: loss = 108.700, acc = 0.969
Epoch  400 end: loss = 108.665, acc = 0.969
Epoch  500 end: loss = 108.466, acc = 0.969
Epoch  600 end: loss =  97.810, acc = 0.950
Epoch  700 end: loss =  77.425, acc = 0.850
Epoch  800 end: loss =  68.138, acc = 0.868
Epoch  900 end: loss =  55.826, acc = 0.895
Epoch 1000 end: loss =  47.401, acc = 0.899
Epoch 1100 end: loss =  37.915, acc = 0.917
Epoch 1200 end: loss =  29.719, acc = 0.936
Epoch 1300 end: loss =  25.079, acc = 0.944
Epoch 1400 end: loss =  22.493, acc = 0.952
Epoch 1500 end: loss =  20.901, acc = 0.957
Epoch 1600 end: loss =  19.798, acc = 0.959
Epoch 1700 end: loss =  19.125, acc = 0.961
Epoch 1800 end: loss =  18.431, acc = 0.963
Epoch 1900 end: loss =  16.261, acc = 0.964
Epoch 2000 end: loss =  14.847, acc = 0.968
Epoch 2100 end: loss =  13.996, a

Base network with hold-out testing

In [None]:
train_n_dd_nets(run_type='ho_both', train_params={'holdout_testing': 'all'})

Try holding out only item or context at a time

In [12]:
train_n_dd_nets(run_type='ho_item', train_params={'holdout_testing': 'item'})

Training Iteration 1
---------------------
Holding out item: A5@
Epoch    0 end: loss = 209.626, acc = 0.579, epochs for new item =   1279
Epoch  100 end: loss = 110.009, acc = 0.969
Epoch  200 end: loss = 109.327, acc = 0.969
Epoch  300 end: loss = 109.071, acc = 0.969
Epoch  400 end: loss = 103.845, acc = 0.968, epochs for new item =   1853
Epoch  500 end: loss =  87.080, acc = 0.901
Epoch  600 end: loss =  70.469, acc = 0.853
Epoch  700 end: loss =  57.883, acc = 0.870
Epoch  800 end: loss =  46.932, acc = 0.894, epochs for new item =   1021
Epoch  900 end: loss =  37.408, acc = 0.913
Epoch 1000 end: loss =  26.754, acc = 0.940
Epoch 1100 end: loss =  22.195, acc = 0.955
Epoch 1200 end: loss =  20.284, acc = 0.958, epochs for new item =    789
Epoch 1300 end: loss =  19.142, acc = 0.963
Epoch 1400 end: loss =  17.945, acc = 0.964
Epoch 1500 end: loss =  14.895, acc = 0.967
Epoch 1600 end: loss =  11.673, acc = 0.972, epochs for new item =    396
Epoch 1700 end: loss =  10.154, acc =

In [13]:
train_n_dd_nets(run_type='ho_context', train_params={'holdout_testing': 'context'})

Training Iteration 1
---------------------
Holding out context: C4
Epoch    0 end: loss = 211.982, acc = 0.569, epochs for new context =   1013
Epoch  100 end: loss = 108.563, acc = 0.968
Epoch  200 end: loss = 107.968, acc = 0.969
Epoch  300 end: loss = 107.663, acc = 0.969
Epoch  400 end: loss = 107.477, acc = 0.969, epochs for new context =    690
Epoch  500 end: loss =  96.449, acc = 0.964
Epoch  600 end: loss =  75.668, acc = 0.864
Epoch  700 end: loss =  57.252, acc = 0.892
Epoch  800 end: loss =  43.294, acc = 0.904, epochs for new context =    383
Epoch  900 end: loss =  31.842, acc = 0.919
Epoch 1000 end: loss =  22.180, acc = 0.943
Epoch 1100 end: loss =  15.670, acc = 0.954
Epoch 1200 end: loss =  11.994, acc = 0.968, epochs for new context =    299
Epoch 1300 end: loss =   9.862, acc = 0.976
Epoch 1400 end: loss =   8.450, acc = 0.981
Epoch 1500 end: loss =   7.399, acc = 0.984
Epoch 1600 end: loss =   6.634, acc = 0.986, epochs for new context =    304
Epoch 1700 end: loss
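The `epochs for new item/context` figures above count how long the network takes to learn a held-out item or context. The sketch below shows that measurement under one assumption: training on the held-out patterns continues until accuracy reaches `test_thresh` (default 0.97) or `test_max_epochs` (default 10000) is hit. `train_one_epoch` and `eval_acc` are hypothetical callables, presumably operating on a copy of the network so the main run is unaffected. In the logs, a test runs on every fourth report (epochs 0, 400, 800, ...), which is consistent with `reports_per_test = 4`.

```python
def epochs_to_criterion(train_one_epoch, eval_acc, thresh=0.97, max_epochs=10000):
    """Train until held-out accuracy reaches thresh; return the number of epochs used.

    train_one_epoch: callable running one epoch on the held-out patterns (hypothetical)
    eval_acc: callable returning current accuracy on those patterns (hypothetical)
    Returns max_epochs if the threshold is never reached.
    """
    for epoch in range(max_epochs):
        if eval_acc() >= thresh:
            return epoch
        train_one_epoch()
    return max_epochs
```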

Try a simple item tree, to see whether separation between item classes still occurs when all items are equally "typical".

Result: this produces a markedly different pattern, with no similarity between subgroups of different domains that exceeds the similarity within a domain.

In [4]:
train_n_dd_nets(run_type='simplified_no_holdout', net_params={'simple_item_tree': True},
                train_params={'holdout_testing': 'none'})

Training Iteration 1
---------------------
Epoch    0 end: loss = 206.241, acc = 0.590
Epoch  100 end: loss = 109.462, acc = 0.968
Epoch  200 end: loss = 108.880, acc = 0.969
Epoch  300 end: loss = 108.726, acc = 0.969
Epoch  400 end: loss = 108.587, acc = 0.969
Epoch  500 end: loss = 108.492, acc = 0.969
Epoch  600 end: loss = 108.499, acc = 0.969
Epoch  700 end: loss = 108.346, acc = 0.969
Epoch  800 end: loss = 108.407, acc = 0.969
Epoch  900 end: loss = 108.298, acc = 0.969
Epoch 1000 end: loss = 108.313, acc = 0.969
Epoch 1100 end: loss = 108.229, acc = 0.969
Epoch 1200 end: loss = 108.169, acc = 0.969
Epoch 1300 end: loss = 108.237, acc = 0.969
Epoch 1400 end: loss =  88.505, acc = 0.942
Epoch 1500 end: loss =  67.182, acc = 0.872
Epoch 1600 end: loss =  48.577, acc = 0.904
Epoch 1700 end: loss =  43.639, acc = 0.918
Epoch 1800 end: loss =  42.378, acc = 0.919
Epoch 1900 end: loss =  39.936, acc = 0.921
Epoch 2000 end: loss =  35.435, acc = 0.933
Epoch 2100 end: loss =  28.163, a