
CGNN results question #63

Closed · sAviOr287 opened this issue May 4, 2020 · 8 comments

sAviOr287 commented May 4, 2020

Hi,

I have re-run the CGNN pairwise experiments, and I can confirm that I get the same results for the Multi, Gauss, Net, and Tueb datasets in terms of AUPRC (ensembling 12 different runs):

AUPR: 0.95 MULTI
AUPR: 0.80 GAUSS
AUPR: 0.90 NET

However, when I look at the accuracy, i.e. predicting the actual causal direction, I get 0.43, 0.46, and 0.49 respectively.

I compute the accuracy with the following script:

from numpy import genfromtxt
from cdt.data import load_dataset  # dispatches to the custom CE loaders shown in a later comment
from cdt.metrics import precision_recall

for dataset_name in ['multi', 'gauss', 'net']:
    data, labels = load_dataset(dataset_name)
    res = genfromtxt('results/res2_{}.csv'.format(dataset_name), delimiter=',', skip_header=True)
    idx = 0
    acc = 0
    labels = labels.to_numpy()
    for data_ in res[:, 1]:
        if data_ < 0 and labels[idx] == -1:
            acc += 1
            idx += 1  # EDIT
        elif data_ > 0 and labels[idx] == 1:
            acc += 1
            idx += 1  # EDIT
        else:
            idx += 1

    acc /= (res.shape[0] - 1)
    print(res.shape[0])
    print('{} ACC: {}'.format(dataset_name, acc))
    aupr, curve = precision_recall(labels[:res.shape[0]], res[:, 1])
    print('AUPR: {}'.format(aupr))

This method also gives me around 74% unweighted accuracy on the Tueb dataset.

So my question is whether this is expected, whether I should be computing the accuracy differently, or whether the accuracy perhaps doesn't matter at all?

Thanks in advance for the clarification.

Best

diviyank (Collaborator) commented May 4, 2020

Hello, the results look good, so the accuracy should follow. Could you attach a sample of data_?
I think the predictions are not in the expected shape.

Best,

sAviOr287 (Author) commented

res2_gauss.csv.zip
Hi,

I have attached the CSV file that comes out after training the model.

Thanks for your help!

Best

sAviOr287 (Author) commented

Here is also how I loaded the data; I added the following to cdt/data/loader.py:

import os
import random

import pandas as pd

from ..utils.io import read_causal_pairs


def load_ce_gauss(shuffle=False):
    dirname = os.path.dirname(os.path.realpath(__file__))

    data = read_causal_pairs('{}/resources/CE-Gauss_pairs.csv'.format(dirname), scale=False)
    labels = pd.read_csv('{}/resources/CE-Gauss_targets.csv'.format(dirname)).set_index('SampleID')

    if shuffle:
        # randomly swap cause and effect for roughly half the pairs,
        # labelling the swapped pairs -1
        for i in range(len(data)):
            if random.choice([True, False]):
                labels.iloc[i, 0] = -1
                buffer = data.iloc[i, 0]
                data.iloc[i, 0] = data.iloc[i, 1]
                data.iloc[i, 1] = buffer
    return data, labels


def load_ce_multi(shuffle=False):
    dirname = os.path.dirname(os.path.realpath(__file__))

    data = read_causal_pairs('{}/resources/CE-Multi_pairs.csv'.format(dirname), scale=False)
    labels = pd.read_csv('{}/resources/CE-Multi_targets.csv'.format(dirname)).set_index('SampleID')

    if shuffle:
        for i in range(len(data)):
            if random.choice([True, False]):
                labels.iloc[i, 0] = -1
                buffer = data.iloc[i, 0]
                data.iloc[i, 0] = data.iloc[i, 1]
                data.iloc[i, 1] = buffer
    return data, labels


def load_ce_net(shuffle=False):
    dirname = os.path.dirname(os.path.realpath(__file__))

    data = read_causal_pairs('{}/resources/CE-Net_pairs.csv'.format(dirname), scale=False)
    labels = pd.read_csv('{}/resources/CE-Net_targets.csv'.format(dirname)).set_index('SampleID')

    if shuffle:
        for i in range(len(data)):
            if random.choice([True, False]):
                labels.iloc[i, 0] = -1
                buffer = data.iloc[i, 0]
                data.iloc[i, 0] = data.iloc[i, 1]
                data.iloc[i, 1] = buffer
    return data, labels
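
(As an aside, the three loaders differ only in the dataset name, so they could be collapsed into one parameterized helper. A minimal sketch under the same assumptions; load_ce is a hypothetical name, not part of cdt:)

import os
import random

import pandas as pd

from ..utils.io import read_causal_pairs


def load_ce(name, shuffle=False):
    """Load a CE-<name> pairs dataset, e.g. load_ce('Gauss')."""
    dirname = os.path.dirname(os.path.realpath(__file__))
    data = read_causal_pairs('{}/resources/CE-{}_pairs.csv'.format(dirname, name), scale=False)
    labels = pd.read_csv('{}/resources/CE-{}_targets.csv'.format(dirname, name)).set_index('SampleID')
    if shuffle:
        for i in range(len(data)):
            if random.choice([True, False]):
                # negate the label rather than hard-coding -1, in case the targets are mixed
                labels.iloc[i, 0] = -labels.iloc[i, 0]
                buffer = data.iloc[i, 0]
                data.iloc[i, 0] = data.iloc[i, 1]
                data.iloc[i, 1] = buffer
    return data, labels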

diviyank (Collaborator) commented May 7, 2020

Hello,
Whoops, I forgot to ask: do you have the labels as well?

sAviOr287 (Author) commented

Oh yes, I have:

Archive.zip

which I downloaded from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3757KX

Thanks for the reply

Best

diviyank (Collaborator) commented May 7, 2020

Thanks for getting back to me quickly.

There seems to be an issue with your accuracy computation; I got an accuracy of 0.72 on this dataset:

import pandas as pd
import numpy as np
from sklearn.metrics import average_precision_score, accuracy_score

preds = pd.read_csv('res2_gauss.csv')
labels = pd.read_csv('CE-Gauss_targets.csv')

print(labels.shape, preds.shape)
print(labels.columns, preds.columns)

# Returns: (300, 2) (300, 2)
# Returns: Index(['SampleID', 'Target'], dtype='object') Index(['SampleID', 'Predictions'], dtype='object')

average_precision_score(labels.Target, preds.Predictions)  # equals the AUPR

# Returns: 0.8027886920926466

# threshold the scores at zero to get hard direction predictions
preds.loc[preds.Predictions > 0, 'Predictions'] = 1
preds.loc[preds.Predictions < 0, 'Predictions'] = -1
accuracy_score(labels.Target, preds.Predictions)

# Returns: 0.7233333333333334

From my point of view, however, accuracy might not be the best metric for evaluating causal discovery algorithms: an algorithm's confidence has to be taken into account, which gives it the possibility of not committing to a prediction when it is uncertain (not answering is better than giving a wrong causal direction).
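
For instance, one simple way to take confidence into account (a hypothetical sketch, not part of the cdt API) is to report accuracy only on the most confident fraction of pairs and treat the rest as abstentions:

import numpy as np


def accuracy_at_decision_rate(scores, labels, rate=0.8):
    """Accuracy on the `rate` fraction of pairs with the largest |score|;
    the remaining low-confidence pairs count as abstentions."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    k = int(round(rate * len(scores)))          # number of committed predictions
    keep = np.argsort(-np.abs(scores))[:k]      # indices of the most confident pairs
    preds = np.where(scores[keep] > 0, 1, -1)   # the sign of the score gives the direction
    return (preds == labels[keep]).mean()


# e.g. accuracy when committing on only the 80% most confident pairs:
# accuracy_at_decision_rate(preds.Predictions, labels.Target, rate=0.8)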

Best regards,
Diviyan

sAviOr287 (Author) commented

Thanks a lot!
Sorry, I was an idiot: I forgot to increment the idx variable.

Thanks for your help

Sorry for the inconvenience

diviyank (Collaborator) commented May 7, 2020

No problem, glad I could help!
I'll be closing this issue; have a good day!

diviyank closed this as completed May 7, 2020