how to replicate the HDO distribution plot? #2

wenjiyanli · 2024-03-06T14:11:15Z

Hello,

I'm attempting to create the HDO distribution graph as depicted in Figure 4, specifically for the Cora dataset. I have acquired the embedding results best_emb from the raw HGCN model and am in the process of calculating the hyperbolic distances from the origin. However, the graph I'm generating does not align well with the one presented in the article. Below is the code I'm using; could you please assist me in pinpointing any possible issues? I would greatly appreciate your help.

`manifold = geoopt.PoincareBall()
origin = torch.zeros(2708, 8)
hyperbolic_distance = manifold.dist(best_emb, origin)
print("Hyperbolic distance to the origin:", hyperbolic_distance)

import networkx as nx
import matplotlib.pyplot as plt

hdo_values = hyperbolic_distance.cpu().detach().numpy()
hdo_values.shape #(2708,)
hdo_values.mean() #5.1924605
hdo_values.min() #2.5806413
hdo_values.max() #6.1599727, especially the max value is very big!

plt.figure(figsize=(10, 6))
plt.hist(hdo_values, color='blue', alpha=0.5)
plt.title('HDO Distribution')
plt.xlabel('HDO Value')
plt.ylabel('Ratio')
plt.show()

`

Best regards.


marlin-codes · 2024-03-10T01:48:01Z

Thanks for your interest! The following is our code for the HDO figure, for your information.

from os.path import expanduser
import matplotlib.font_manager as font_manager
fontpath = expanduser('~/.local/share/fonts/LinLibertine_R.ttf')
prop = font_manager.FontProperties(fname=fontpath)
from matplotlib.pyplot import MultipleLocator

def one(f):
    return '{:.1f}'.format(f)

def plot_HDO_distribution():
    import numpy as np
    import matplotlib.pyplot as plt
    def get_colors(length, i):
        if i == 0:
            return plt.cm.plasma(np.linspace(-0.5, 1, length))
        else:
            return plt.cm.cool(np.linspace(-0.5, 1, length))

    filepath = './results/distance_curv/icml23/dist_data_final/'
    for dim in [16, 64, 256]:
        for dataset in ['cora', 'citeseer', 'disease_nc', 'airport']:
            data0 = np.loadtxt(filepath + '{}/{}_{}/{}_HDO0.txt'.format(dataset, dataset, dim, dataset))
            data1 = np.loadtxt(filepath + '{}/{}_{}/{}_HDO1.txt'.format(dataset, dataset, dim, dataset))

            key_nodes0 = np.loadtxt(filepath + '{}/{}_{}/{}_d20.txt'.format(dataset, dataset, dim, dataset))
            key_nodes1 = np.loadtxt(filepath + '{}/{}_{}/{}_d21.txt'.format(dataset, dataset, dim, dataset))

            dist0 = np.expand_dims(data0, 1)
            dist1 = np.expand_dims(data1, 1)
            minvalue0, center0, meanvalue0, maxvalue0 = key_nodes0
            minvalue1, center1, meanvalue1, maxvalue1 = key_nodes1
            plt.xlim([0, 8.5])
            plt.ylim([0, 0.5])
            margin = 0.1
            freqs0 = []
            freqs1 = []
            for r in np.arange(0, 7, margin):
                freqs0.append(np.where((dist0 > r) & (dist0 < r + margin))[0].shape[0])
                freqs1.append(np.where((dist1 > r) & (dist1 < r + margin))[0].shape[0])
            # colors0 = get_colors(len(freqs0), 1)
            # colors1 = get_colors(len(freqs0), 0)
            plt.bar(np.arange(0, 7, margin), np.array(freqs0) / np.sum(freqs0), color="#1F77B4", edgecolor="white",
                    width=margin, alpha=0.8, label='HGCN')
            plt.bar(np.arange(0, 7, margin), np.array(freqs1) / np.sum(freqs1), color="#FE7E0D", edgecolor="white",
                    width=margin, alpha=0.9, label='Ours')
            plt.legend(loc='upper right', prop={'size': 15})
            row_labels = ['STATS', 'ROOT', 'MIN', 'MEAN', 'MAX']  # ROOT/ HC
            table_vals = [['HGCN', 'Ours'],
                          [one(center0), one(center1)],
                          [one(minvalue0), one(minvalue1)],
                          [one(meanvalue0), one(meanvalue1)],
                          [one(maxvalue0), one(maxvalue1)]
                          ]
            the_table = plt.table(cellText=table_vals, colWidths=[0.12] * 2,
                                  rowLabels=row_labels,
                                  colLoc='center', rowLoc='left', cellLoc='center',
                                  edges='closed',
                                  bbox=(0.20, 0.55, 0.27, 0.4))
            the_table.auto_set_font_size(False)
            the_table.set_fontsize(14)
            # Raw string so that '\m' in '\mathcal' is not treated as an (invalid) escape sequence.
            plt.title('{}'.format(dataset.capitalize()) + r'($\mathcal{H}^{%d}$)' % dim, fontproperties=prop,
                      fontsize=20)
            plt.yticks(fontproperties=prop, size=20)
            plt.xticks(fontproperties=prop, size=20)
            ax = plt.gca()
            x_major_locator = MultipleLocator(1)
            ax.xaxis.set_major_locator(x_major_locator)
            plt.savefig('./results/icml2023/hdo/pdf/{}_{}.pdf'.format(dataset, dim), bbox_inches='tight', pad_inches=0)
            plt.clf()
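
For readers who only want the bar heights, the binning step in the function above can be sketched without NumPy. This is an illustration, not the authors' code: the name `normalized_frequencies` is made up here, and it uses half-open bins `[r, r + width)` where the posted loop uses strict inequalities on both sides, so values lying exactly on a bin edge are counted here but dropped there. Note that the heights are ratios (counts divided by the total count), whereas `plt.hist` with default arguments plots raw counts.

```python
def normalized_frequencies(values, lo=0.0, hi=7.0, width=0.1):
    """Fraction of values falling into each bin of the given width on [lo, hi).

    Hypothetical helper: mirrors the range (0 to 7) and margin (0.1) used in
    the posted plotting code, but with half-open bins.
    """
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # Guard against floating-point rounding pushing the index to n_bins.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    # Normalise by the number of values that landed inside [lo, hi).
    return [cnt / total for cnt in counts]


if __name__ == "__main__":
    demo = normalized_frequencies([0.05, 0.12, 0.15, 6.95, 7.5])
    print(len(demo), sum(demo))
```

The resulting list can be passed to `plt.bar` together with the left bin edges, as the posted code does with `np.arange(0, 7, margin)`.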

The HDO is computed by

if self.manifold.name == 'PoincareBall':
    d2 = self.manifold.dist0(embeddings, c=c).mean()
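
As a point of reference, the distance from the origin in the Poincaré ball of curvature -c has the standard closed form d(0, x) = (2 / sqrt(c)) * artanh(sqrt(c) * ||x||). The sketch below implements that formula in plain Python. It assumes, without having checked the repository, that `dist0` follows the same formula; the function names `poincare_dist0` and `hdo_values` are made up for this example.

```python
import math


def poincare_dist0(x, c=1.0):
    """Distance from the origin to point x in the Poincare ball of curvature -c.

    Standard closed form; assumed (not verified) to match the repo's dist0.
    """
    sqrt_c = math.sqrt(c)
    norm = math.sqrt(sum(v * v for v in x))
    # Clamp so artanh stays finite for points numerically on the boundary.
    arg = min(sqrt_c * norm, 1.0 - 1e-7)
    return 2.0 / sqrt_c * math.atanh(arg)


def hdo_values(embeddings, c=1.0):
    """Per-node distances to the origin (one value per embedding row)."""
    return [poincare_dist0(x, c) for x in embeddings]


if __name__ == "__main__":
    print(hdo_values([[0.0, 0.0], [0.5, 0.0]], c=1.0))
```

The curvature `c` has to be the one the model was trained with; evaluating the same embeddings under a different `c` (for example a library default of 1.0) changes every distance. The snippet quoted above takes `.mean()` of these values, which gives a single number per call, while a distribution plot needs the per-node values.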

wenjiyanli · 2024-03-10T02:07:56Z

It is very clear. I greatly appreciate your help!

wenjiyanli · 2024-03-10T02:57:12Z

I hope I'm not being too bothersome with another question, but could you please provide more details on how to compute 'self.d0', 'self.d2', and 'self.hdo'? I guess that the 'd2' below might be used to compute 'self.hdo', but I'm unsure how 'self.d0' and 'self.d2' are calculated in the NCModel. Could you clarify the differences between them for me?

if self.manifold.name == 'PoincareBall':
    d2 = self.manifold.dist0(embeddings, c=c).mean()

class NCModel(BaseModel):
    def __init__(self, args):
        super(NCModel, self).__init__(args)
        self.args = args
        self.decoder = model2decoder[args.model](self.c, args)
        if args.n_classes > 2:
            self.f1_average = 'micro'
        else:
            self.f1_average = 'binary'
        if args.pos_weight:
            self.weights = torch.Tensor([1., 1. / data['labels'][idx_train].mean()])
        else:
            self.weights = torch.Tensor([1.] * args.n_classes)
        if not args.cuda == -1:
            self.weights = self.weights.to(args.device)
        self.d0 = []
        self.d2 = []
        self.hdo = []
        self.center = None
        self.activation = lambda x: x

I would greatly appreciate any additional details you can provide. Thank you for your patience and assistance.

marlin-codes · 2024-03-10T03:36:38Z

Thanks for your question. The three variables are lists:

d0 records the distance from the root (HC point) to the origin; its length equals the number of epochs actually run.
d2 records the mean distance of the nodes to the origin; its length also equals the number of epochs actually run.
hdo records the distance of every node to the origin; its length equals the number of nodes.
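
To make the shapes of these three lists concrete, here is a minimal bookkeeping sketch. It is not taken from the repository: the class `HDOTracker` and the method `record_epoch` are hypothetical, the root distance is passed in as a given number because how the root (HC point) is obtained is not described in this thread, and overwriting `hdo` at every epoch is an assumption (the actual code may keep the values from a specific epoch instead).

```python
class HDOTracker:
    """Hypothetical container mirroring the d0 / d2 / hdo lists described above."""

    def __init__(self):
        self.d0 = []   # one entry per epoch: root (HC point) distance to the origin
        self.d2 = []   # one entry per epoch: mean node distance to the origin
        self.hdo = []  # one entry per node: distance of each node to the origin

    def record_epoch(self, node_distances, root_distance):
        # node_distances: per-node distances to the origin for this epoch.
        # root_distance: distance of the root to the origin (computed elsewhere).
        self.d0.append(root_distance)
        self.d2.append(sum(node_distances) / len(node_distances))
        # Assumption: keep only the most recent per-node distances.
        self.hdo = list(node_distances)


if __name__ == "__main__":
    tracker = HDOTracker()
    tracker.record_epoch([1.0, 2.0, 3.0], root_distance=0.5)
    tracker.record_epoch([2.0, 3.0, 4.0], root_distance=0.4)
    print(len(tracker.d0), len(tracker.d2), len(tracker.hdo))
```

After training, `hdo` is the list one would histogram for the distribution plot, while `d0` and `d2` trace how the root and the mean distance move over epochs.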

wenjiyanli closed this as completed Mar 10, 2024

wenjiyanli reopened this Mar 10, 2024

wenjiyanli closed this as completed Jun 24, 2024
