how to replicate the HDO distribution plot? #2

Closed
wenjiyanli opened this issue Mar 6, 2024 · 4 comments

Comments

@wenjiyanli

Hello,

I'm attempting to create the HDO distribution graph as depicted in Figure 4, specifically for the Cora dataset. I have acquired the embedding results best_emb from the raw HGCN model and am in the process of calculating the hyperbolic distances from the origin. However, the graph I'm generating does not align well with the one presented in the article. Below is the code I'm using; could you please assist me in pinpointing any possible issues? I would greatly appreciate your help.

`manifold = geoopt.PoincareBall()
origin = torch.zeros(2708, 8)
hyperbolic_distance = manifold.dist(best_emb, origin)
print("Hyperbolic distance to the origin:", hyperbolic_distance)

import networkx as nx
import matplotlib.pyplot as plt

hdo_values = hyperbolic_distance.cpu().detach().numpy()
hdo_values.shape #(2708,)
hdo_values.mean() #5.1924605
hdo_values.min() #2.5806413
hdo_values.max() #6.1599727, especially the max value is very big!

plt.figure(figsize=(10, 6))
plt.hist(hdo_values, color='blue', alpha=0.5)
plt.title('HDO Distribution')
plt.xlabel('HDO Value')
plt.ylabel('Ratio')
plt.show()

`

Best regards.

@marlin-codes
Owner

marlin-codes commented Mar 10, 2024

Thanks for your interest! The following is our code for the HDO figure, for your information.

from os.path import expanduser

import matplotlib.font_manager as font_manager
from matplotlib.ticker import MultipleLocator

# Linux Libertine font used for titles and tick labels; adjust the path for your system.
fontpath = expanduser('~/.local/share/fonts/LinLibertine_R.ttf')
prop = font_manager.FontProperties(fname=fontpath)

def one(f):
    return '{:.1f}'.format(f)

def plot_HDO_distribution():
    import numpy as np
    import matplotlib.pyplot as plt
    def get_colors(length, i):
        if i == 0:
            return plt.cm.plasma(np.linspace(-0.5, 1, length))
        else:
            return plt.cm.cool(np.linspace(-0.5, 1, length))

    filepath = './results/distance_curv/icml23/dist_data_final/'
    for dim in [16, 64, 256]:
        for dataset in ['cora', 'citeseer', 'disease_nc', 'airport']:
            data0 = np.loadtxt(filepath + '{}/{}_{}/{}_HDO0.txt'.format(dataset, dataset, dim, dataset))
            data1 = np.loadtxt(filepath + '{}/{}_{}/{}_HDO1.txt'.format(dataset, dataset, dim, dataset))

            key_nodes0 = np.loadtxt(filepath + '{}/{}_{}/{}_d20.txt'.format(dataset, dataset, dim, dataset))
            key_nodes1 = np.loadtxt(filepath + '{}/{}_{}/{}_d21.txt'.format(dataset, dataset, dim, dataset))

            dist0 = np.expand_dims(data0, 1)
            dist1 = np.expand_dims(data1, 1)
            minvalue0, center0, meanvalue0, maxvalue0 = key_nodes0
            minvalue1, center1, meanvalue1, maxvalue1 = key_nodes1
            plt.xlim([0, 8.5])
            plt.ylim([0, 0.5])
            margin = 0.1
            freqs0 = []
            freqs1 = []
            # Count nodes in half-open bins [r, r + margin) so values on a bin edge are not dropped.
            for r in np.arange(0, 7, margin):
                freqs0.append(np.where((dist0 >= r) & (dist0 < r + margin))[0].shape[0])
                freqs1.append(np.where((dist1 >= r) & (dist1 < r + margin))[0].shape[0])
            # colors0 = get_colors(len(freqs0), 1)
            # colors1 = get_colors(len(freqs0), 0)
            plt.bar(np.arange(0, 7, margin), np.array(freqs0) / np.sum(freqs0), color="#1F77B4", edgecolor="white",
                    width=margin, alpha=0.8, label='HGCN')
            plt.bar(np.arange(0, 7, margin), np.array(freqs1) / np.sum(freqs1), color="#FE7E0D", edgecolor="white",
                    width=margin, alpha=0.9, label='Ours')
            plt.legend(loc='upper right', prop={'size': 15})
            row_labels = ['STATS', 'ROOT', 'MIN', 'MEAN', 'MAX']  # ROOT/ HC
            table_vals = [['HGCN', 'Ours'],
                          [one(center0), one(center1)],
                          [one(minvalue0), one(minvalue1)],
                          [one(meanvalue0), one(meanvalue1)],
                          [one(maxvalue0), one(maxvalue1)]
                          ]
            the_table = plt.table(cellText=table_vals, colWidths=[0.12] * 2,
                                  rowLabels=row_labels,
                                  colLoc='center', rowLoc='left', cellLoc='center',
                                  edges='closed',
                                  bbox=(0.20, 0.55, 0.27, 0.4))
            the_table.auto_set_font_size(False)
            the_table.set_fontsize(14)
            plt.title('{}'.format(dataset.capitalize()) + r'($\mathcal{H}^{%d}$)' % dim, fontproperties=prop,
                      fontsize=20)
            plt.yticks(fontproperties=prop, size=20)
            plt.xticks(fontproperties=prop, size=20)
            ax = plt.gca()
            x_major_locator = MultipleLocator(1)
            ax.xaxis.set_major_locator(x_major_locator)
            plt.savefig('./results/icml2023/hdo/pdf/{}_{}.pdf'.format(dataset, dim), bbox_inches='tight', pad_inches=0)
            plt.clf()
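
The binning loop in the function above can also be written with `np.histogram`. The following is a minimal sketch, not the authors' code: the helper name `hdo_ratios` and its parameters are hypothetical, and it assumes the per-node distances are already available as a 1-D array. Like the loop, it ignores distances outside `[0, max_dist]` and normalizes the counts so the bar heights sum to 1.

```python
import numpy as np


def hdo_ratios(distances, max_dist=7.0, margin=0.1):
    """Return (left_edges, ratios) for a normalized HDO histogram.

    `hdo_ratios`, `max_dist`, and `margin` are illustrative names, not part
    of the original repository.
    """
    distances = np.asarray(distances, dtype=float).ravel()
    n_bins = int(round(max_dist / margin))
    # Bin edges 0, margin, 2 * margin, ..., max_dist.
    edges = np.linspace(0.0, max_dist, n_bins + 1)
    # np.histogram silently drops values that fall outside the edges.
    counts, _ = np.histogram(distances, bins=edges)
    total = counts.sum()
    if total == 0:
        ratios = np.zeros(n_bins, dtype=float)
    else:
        ratios = counts / total
    return edges[:-1], ratios


left, ratios = hdo_ratios([0.05, 0.15, 0.17, 6.95])
```

The result can be passed to `plt.bar(left, ratios, width=0.1, align='edge')` so that each bar spans its own bin.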

The HDO is computed by

if self.manifold.name == 'PoincareBall':
    d2 = self.manifold.dist0(embeddings, c=c).mean()
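
For readers without the repository at hand, here is a rough sketch of a per-node distance to the origin on the Poincaré ball. It assumes that `dist0` follows the standard closed form d(0, x) = (2 / sqrt(c)) * artanh(sqrt(c) * ||x||) for curvature -c; this has not been checked against the repository's implementation. The function name `poincare_dist0` and the clipping constant are illustrative. Dropping the `.mean()` keeps one distance per node, which is what a distribution plot needs.

```python
import numpy as np


def poincare_dist0(x, c=1.0, eps=1e-7):
    """Distance from the origin to each row of x on a Poincare ball of curvature -c.

    Hypothetical helper: uses the standard closed form, which may differ in
    detail (e.g. clipping) from the repository's `dist0`.
    """
    x = np.asarray(x, dtype=float)
    sqrt_c = np.sqrt(c)
    norm = np.linalg.norm(x, axis=-1)
    # Keep the artanh argument strictly below 1 for points at the ball boundary.
    arg = np.clip(sqrt_c * norm, 0.0, 1.0 - eps)
    return 2.0 / sqrt_c * np.arctanh(arg)


# Example embeddings: the origin and a point at Euclidean norm tanh(0.5).
emb = np.array([[0.0, 0.0], [np.tanh(0.5), 0.0]])
per_node = poincare_dist0(emb)   # one distance per node
mean_dist = per_node.mean()      # analogous to d2 in the snippet above
```

If the embeddings live on a different manifold (for example the hyperboloid), this formula does not apply and the manifold's own distance function should be used instead.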

@wenjiyanli
Author

It is very clear. I greatly appreciate your help!

@wenjiyanli
Author

I hope I'm not being too bothersome with another question, but could you please provide more details on how to compute 'self.d0', 'self.d2', and 'self.hdo'? I guess that the 'd2' below might be used to compute 'self.hdo', but I'm unsure how 'self.d0' and 'self.d2' are calculated in the NCModel. Could you clarify the differences between them?

if self.manifold.name == 'PoincareBall':
    d2 = self.manifold.dist0(embeddings, c=c).mean()

class NCModel(BaseModel):
    def __init__(self, args):
        super(NCModel, self).__init__(args)
        self.args = args
        self.decoder = model2decoder[args.model](self.c, args)
        if args.n_classes > 2:
            self.f1_average = 'micro'
        else:
            self.f1_average = 'binary'
        if args.pos_weight:
            self.weights = torch.Tensor([1., 1. / data['labels'][idx_train].mean()])
        else:
            self.weights = torch.Tensor([1.] * args.n_classes)
        if not args.cuda == -1:
            self.weights = self.weights.to(args.device)
        self.d0 = []
        self.d2 = []
        self.hdo = []
        self.center = None
        self.activation = lambda x: x

I would greatly appreciate any additional details you can provide. Thank you for your patience and assistance.

@marlin-codes
Owner

marlin-codes commented Mar 10, 2024

Thanks for your question. All three variables are lists:

  • d0 records the distance from the root (HC point) to the origin; its length equals the number of epochs actually run
  • d2 records the mean distance of all nodes to the origin; its length equals the number of epochs actually run
  • hdo records each node's distance to the origin; its length equals the number of nodes
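
As a rough illustration of how three such lists might be filled during training, consider the sketch below. It is not the repository's code: the `HDOTracker` class and its `update` method are hypothetical, the distance uses the standard Poincaré-ball closed form, and the root (HC point) is replaced by the Euclidean mean of the embeddings purely as a placeholder, since the thread does not say how the root is obtained.

```python
import numpy as np


def dist0(x, c=1.0, eps=1e-7):
    """Closed-form Poincare-ball distance to the origin (assumed, see lead-in)."""
    sqrt_c = np.sqrt(c)
    norm = np.linalg.norm(np.asarray(x, dtype=float), axis=-1)
    return 2.0 / sqrt_c * np.arctanh(np.clip(sqrt_c * norm, 0.0, 1.0 - eps))


class HDOTracker:
    """Hypothetical container mirroring the d0 / d2 / hdo lists."""

    def __init__(self):
        self.d0 = []   # one entry per epoch: root distance to the origin
        self.d2 = []   # one entry per epoch: mean node distance to the origin
        self.hdo = []  # one entry per node: distances from the latest epoch

    def update(self, embeddings, c=1.0):
        per_node = dist0(embeddings, c)
        # Placeholder for the root (HC point): Euclidean mean of the embeddings.
        # This is an assumption; the actual root computation is not shown in the thread.
        center = np.asarray(embeddings, dtype=float).mean(axis=0)
        self.d0.append(float(dist0(center, c)))
        self.d2.append(float(per_node.mean()))
        self.hdo = per_node.tolist()


rng = np.random.default_rng(0)
tracker = HDOTracker()
for epoch in range(3):
    # Random points well inside the unit ball stand in for model embeddings.
    emb = rng.uniform(-0.3, 0.3, size=(5, 4))
    tracker.update(emb)
```

After three epochs on five nodes, `d0` and `d2` each hold three values while `hdo` holds five, which matches the lengths described above.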
