
CGNN results question #63

Closed · sAviOr287 opened this issue May 4, 2020 · 8 comments

sAviOr287 commented May 4, 2020

Hi,

I have re-run the CGNN pairwise experiments, and I can confirm that I get the same results for the Multi, Gauss, Net, and Tueb datasets in terms of AUPRC (ensembling 12 different runs):

AUPR: 0.95 MULTI
AUPR: 0.80 GAUSS
AUPR: 0.90 NET

However, when I look at the accuracy, i.e. predicting the actual causal direction, I get 0.43, 0.46, and 0.49 respectively.

I compute the accuracy with the following script:

from numpy import genfromtxt
from cdt.data import load_dataset  # dispatches to the custom CE loaders shown in a later comment
from cdt.metrics import precision_recall

for dataset_name in ['multi', 'gauss', 'net']:
    data, labels = load_dataset(dataset_name)
    res = genfromtxt('results/res2_{}.csv'.format(dataset_name), delimiter=',', skip_header=True)
    idx = 0
    acc = 0
    labels = labels.to_numpy()
    for data_ in res[:, 1]:
        if data_ < 0 and labels[idx] == -1:
            acc += 1
            idx += 1  # EDIT
        elif data_ > 0 and labels[idx] == 1:
            acc += 1
            idx += 1  # EDIT
        else:
            idx += 1

    acc /= (res.shape[0] - 1)
    print(res.shape[0])
    print('{} ACC: {}'.format(dataset_name, acc))
    aupr, curve = precision_recall(labels[:res.shape[0]], res[:, 1])
    print('AUPR: {}'.format(aupr))

This method also gives me around 74% unweighted accuracy on the Tueb dataset.

So my question is whether this is expected, whether I should be computing the accuracy differently, or whether the accuracy perhaps doesn't matter at all?

Thanks in advance for the clarification.

Best

diviyank (Collaborator) commented May 4, 2020

Hello, the results look good, so the accuracy should follow. Could you attach a sample of data_?
I think the predictions are not in the expected shape.

Best,

sAviOr287 (Author) commented

res2_gauss.csv.zip
Hi,

I have attached the CSV file that comes out after training the model.

Thanks for your help!

Best

sAviOr287 (Author) commented

Here is also how I loaded the data; I added the following to cdt/data/loader.py:

import os
import random

import pandas as pd

from ..utils.io import read_causal_pairs


def load_ce_gauss(shuffle=False):
    dirname = os.path.dirname(os.path.realpath(__file__))

    data = read_causal_pairs('{}/resources/CE-Gauss_pairs.csv'.format(dirname), scale=False)
    labels = pd.read_csv('{}/resources/CE-Gauss_targets.csv'.format(dirname)).set_index('SampleID')

    if shuffle:
        # randomly swap cause and effect for roughly half the pairs,
        # labelling the swapped pairs -1
        for i in range(len(data)):
            if random.choice([True, False]):
                labels.iloc[i, 0] = -1
                buffer = data.iloc[i, 0]
                data.iloc[i, 0] = data.iloc[i, 1]
                data.iloc[i, 1] = buffer
    return data, labels


def load_ce_multi(shuffle=False):
    dirname = os.path.dirname(os.path.realpath(__file__))

    data = read_causal_pairs('{}/resources/CE-Multi_pairs.csv'.format(dirname), scale=False)
    labels = pd.read_csv('{}/resources/CE-Multi_targets.csv'.format(dirname)).set_index('SampleID')

    if shuffle:
        for i in range(len(data)):
            if random.choice([True, False]):
                labels.iloc[i, 0] = -1
                buffer = data.iloc[i, 0]
                data.iloc[i, 0] = data.iloc[i, 1]
                data.iloc[i, 1] = buffer
    return data, labels


def load_ce_net(shuffle=False):
    dirname = os.path.dirname(os.path.realpath(__file__))

    data = read_causal_pairs('{}/resources/CE-Net_pairs.csv'.format(dirname), scale=False)
    labels = pd.read_csv('{}/resources/CE-Net_targets.csv'.format(dirname)).set_index('SampleID')

    if shuffle:
        for i in range(len(data)):
            if random.choice([True, False]):
                labels.iloc[i, 0] = -1
                buffer = data.iloc[i, 0]
                data.iloc[i, 0] = data.iloc[i, 1]
                data.iloc[i, 1] = buffer
    return data, labels
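
(As an aside, the three loaders differ only in the dataset name, so they could be collapsed into one parameterized helper. A minimal sketch under the same assumptions; load_ce is a hypothetical name, not part of cdt:)

import os
import random

import pandas as pd

from ..utils.io import read_causal_pairs


def load_ce(name, shuffle=False):
    """Load a CE-<name> pairs dataset, e.g. load_ce('Gauss')."""
    dirname = os.path.dirname(os.path.realpath(__file__))
    data = read_causal_pairs('{}/resources/CE-{}_pairs.csv'.format(dirname, name), scale=False)
    labels = pd.read_csv('{}/resources/CE-{}_targets.csv'.format(dirname, name)).set_index('SampleID')
    if shuffle:
        for i in range(len(data)):
            if random.choice([True, False]):
                # negate the label rather than hard-coding -1, in case the targets are mixed
                labels.iloc[i, 0] = -labels.iloc[i, 0]
                buffer = data.iloc[i, 0]
                data.iloc[i, 0] = data.iloc[i, 1]
                data.iloc[i, 1] = buffer
    return data, labels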

diviyank (Collaborator) commented May 7, 2020

Hello,
Whoops, I forgot to ask: do you have the labels as well?

sAviOr287 (Author) commented

Oh yes, I have:

Archive.zip

which I downloaded from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3757KX

Thanks for the reply

Best

diviyank (Collaborator) commented May 7, 2020

Thanks for getting back to me quickly.

There seems to be an issue with your accuracy computation; I got an accuracy of 0.72 on this dataset:

import pandas as pd
import numpy as np
from sklearn.metrics import average_precision_score, accuracy_score

preds = pd.read_csv('res2_gauss.csv')
labels = pd.read_csv('CE-Gauss_targets.csv')

print(labels.shape, preds.shape)
print(labels.columns, preds.columns)

# Returns: (300, 2) (300, 2)
# Returns: Index(['SampleID', 'Target'], dtype='object') Index(['SampleID', 'Predictions'], dtype='object')

average_precision_score(labels.Target, preds.Predictions)  # equals the AUPR

# Returns: 0.8027886920926466

# threshold the scores at zero to get hard direction predictions
preds.loc[preds.Predictions > 0, 'Predictions'] = 1
preds.loc[preds.Predictions < 0, 'Predictions'] = -1
accuracy_score(labels.Target, preds.Predictions)

# Returns: 0.7233333333333334

From my point of view, however, accuracy might not be the best metric for evaluating causal discovery algorithms: an algorithm's confidence has to be taken into account, which gives it the possibility of not committing to a prediction when it is uncertain (not answering is better than giving a wrong causal direction).
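
For instance, one simple way to take confidence into account (a hypothetical sketch, not part of the cdt API) is to report accuracy only on the most confident fraction of pairs and treat the rest as abstentions:

import numpy as np


def accuracy_at_decision_rate(scores, labels, rate=0.8):
    """Accuracy on the `rate` fraction of pairs with the largest |score|;
    the remaining low-confidence pairs count as abstentions."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    k = int(round(rate * len(scores)))          # number of committed predictions
    keep = np.argsort(-np.abs(scores))[:k]      # indices of the most confident pairs
    preds = np.where(scores[keep] > 0, 1, -1)   # the sign of the score gives the direction
    return (preds == labels[keep]).mean()


# e.g. accuracy when committing on only the 80% most confident pairs:
# accuracy_at_decision_rate(preds.Predictions, labels.Target, rate=0.8)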

Best regards,
Diviyan

sAviOr287 (Author) commented

Thanks a lot!
Sorry, I was an idiot: I forgot to increment the idx variable.

Thanks for your help

Sorry for the inconvenience

diviyank (Collaborator) commented May 7, 2020

No problem, glad I could help!
I'll be closing this issue; have a good day!

diviyank closed this as completed May 7, 2020