
NCC got random guess performance on TCEP #45

Open
wpzdm opened this issue Oct 1, 2019 · 7 comments
Labels
Investigation Investigation of a possible bug

Comments

@wpzdm

wpzdm commented Oct 1, 2019

I tested NCC using half of the TCEP pairs for training and half for testing, repeating the random train/test split 100 times.
Code:

import numpy as np
from cdt.causality.pairwise import NCC

tueb, labels = load_tuebingen(shuffle=True)  # local helper loading the Tuebingen (TCEP) pairs

def test_NCC():
    from sklearn.model_selection import train_test_split
    method = NCC
    print(method)
    m = method()

    accs = []
    for n in range(100):
        X_tr, X_te, y_tr, y_te = train_test_split(tueb, labels, train_size=.5)
        m.fit(X_tr, y_tr, epochs=10000)
        r = m.predict_dataset(X_te)
        acc = np.mean(r.values * y_te.values > 0)  # correct iff prediction sign matches label sign
        accs.append(acc)
        print(acc, file=open('ncc_.txt', 'a'))
    print(np.mean(accs), np.std(accs), file=open('ncc_.txt', 'a'))

The average accuracy over ~60 runs so far is 50.03%.

My first guess is overfitting, but I am also running with epochs=500 and there seems to be no big difference (although the training accuracies look less like overfitting).

Thank you,
Abel
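One thing worth checking in the snippet above: `m` is created once and `fit` is called 100 times on the same instance, so depending on whether `NCC.fit` reinitializes the network weights, later splits may start from an already-trained model. A minimal sketch of the per-split re-instantiation pattern; a simple sklearn `LogisticRegression` on synthetic data stands in for NCC here, since NCC itself needs cdt installed:

```python
# Same evaluation loop, but re-creating the classifier on every split so no
# state can leak between runs.  Stand-in data/model, not the NCC pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, random_state=0)  # stand-in for (tueb, labels)

accs = []
for n in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.5, random_state=n)
    m = LogisticRegression()  # fresh model per split -- no carried-over weights
    m.fit(X_tr, y_tr)
    accs.append(m.score(X_te, y_te))

print(np.mean(accs), np.std(accs))
```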

@diviyank
Collaborator

diviyank commented Oct 1, 2019

My guess is that 50 pairs is nowhere near enough to train NCC; I would suggest using a polynomial generator to generate ~2000 pairs:

from cdt.data import CausalPairGenerator
c = CausalPairGenerator('polynomial')
data, labels = c.generate(2000, 500)  # 2000 pairs of 500 points each

Best,
Diviyan

@wpzdm
Author

wpzdm commented Oct 2, 2019

Both the sample size and the number of training epochs have an influence:
With 50 training vs 50 testing pairs, epochs=500 gives an average accuracy of ~50% as mentioned, while epochs=200 brings it up to ~55%.
With all 99 pairs for training vs 1 for testing, epochs=500 gives ~65% accuracy, but epochs=1000 drops back to ~49%.
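Worth noting for interpreting these numbers: with only ~50 held-out pairs, accuracy estimates are dominated by sampling noise, so single-run figures like 55% or 65% are hard to distinguish from chance. A quick back-of-the-envelope check under the null of random guessing:

```python
# Standard error of measured accuracy over n pairs under random guessing
# (p = 0.5) is sqrt(p * (1 - p) / n).
import math

n = 50
se = math.sqrt(0.5 * 0.5 / n)   # ~0.071
half_width = 1.96 * se          # 95% interval half-width, ~0.139
print(f"chance-level accuracy on {n} pairs: 0.50 +/- {half_width:.3f}")
```

So on 50 test pairs, anything from roughly 36% to 64% is consistent with chance on a single split; averaging over many random splits, as done above, is the right call.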

I will also try to train on artificial pairs.

Thank you!

@diviyank
Collaborator

diviyank commented Oct 2, 2019

When 50 testing vs 50 training, if epochs=500, average acc is ~50% as mentioned, while if epochs=200, average acc goes up to ~55%.

There might be some overfitting hidden here; I'll be waiting for the extensive results on artificial pairs :)

@wpzdm
Author

wpzdm commented Oct 4, 2019

Hi

I tried training on 3000 artificial pairs. The test performance on TCEP is still only slightly better than random guessing.
And strangely, NCC seems to overfit even with only 5 training epochs.

Code (I checked that CausalPairGenerator returns pairs with random directions, so I didn't shuffle):

import numpy as np
import pandas as pd
from cdt.causality.pairwise import NCC
from cdt.data import CausalPairGenerator

def test_NCC():
    method = NCC
    print(method)
    m = method()

    # 3 x 1000 artificial pairs of 500 points each, from different mechanisms
    data0, dirs0 = CausalPairGenerator('polynomial').generate(1000, 500)
    data1, dirs1 = CausalPairGenerator('gp_add').generate(1000, 500)
    data2, dirs2 = CausalPairGenerator('nn').generate(1000, 500)
    data = pd.concat([data0, data1, data2])
    dirs = pd.concat([dirs0, dirs1, dirs2])

    m.fit(data, dirs, epochs=5)
    r = m.predict_dataset(tueb)  # tueb/labels: the TCEP pairs loaded earlier
    acc = np.mean(r.values * labels.values > 0)

    print(acc)

Output with 1000 epochs:

Epochs: 100% 1000/1000 [2:02:04<00:00, 7.72s/it, Acc=0.983]
0.5294117647058824

10 epochs:

Epochs: 100% 10/10 [00:56<00:00, 5.60s/it, Acc=0.845]
0.5490196078431373

5 epochs:

Epochs: 100% 5/5 [00:43<00:00, 8.96s/it, Acc=0.849]
0.5588235294117647
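The pattern here (training accuracy of 0.85+ after a handful of epochs, near-chance test accuracy) is the classic overfitting signature. One common way to pick the epoch count is to hold out part of the artificial pairs as a validation set and keep the epoch with the best validation accuracy. A generic sketch of that protocol; an sklearn MLP trained with `partial_fit` stands in for NCC, since the real run needs cdt and hours of training:

```python
# Early-stopping sketch: track validation accuracy after each training pass
# and remember the best epoch.  Stand-in model/data, not the NCC pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), learning_rate_init=0.01, random_state=0)
best_val, best_epoch = 0.0, 0
for epoch in range(1, 51):
    clf.partial_fit(X_tr, y_tr, classes=np.unique(y))  # one optimization pass per "epoch"
    val_acc = clf.score(X_val, y_val)
    if val_acc > best_val:
        best_val, best_epoch = val_acc, epoch

print(f"best validation accuracy {best_val:.3f} at epoch {best_epoch}")
```

The same idea applies to the cdt run: reserve, say, 500 of the 3000 generated pairs for validation and stop training when validation accuracy stops improving.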

@diviyank
Collaborator

Hi,
Right, I'll look into it.

@diviyank diviyank added the Investigation Investigation of a possible bug label Oct 14, 2019
@sAviOr287

Hi,

Has this ever been solved?

Thanks in advance

@diviyank
Collaborator

Hello,
I didn't get an answer from the author; I'll take another look at the implementation myself.
