Comparison between weighted and unweighted hypergraphs in clustering #99

Closed
young917 opened this issue Sep 9, 2022 · 4 comments
young917 commented Sep 9, 2022

Hello, thank you for sharing this excellent library.

I want to compare the performance of clustering on weighted and unweighted hypergraphs as described in K. Hayashi, S. Aksoy, C. Park, H. Park, "Hypergraph random walks, Laplacians, and clustering".
I tried to do this based on Tutorial 11.
However, I think some modifications are needed.
h = hnx.Hypergraph(hnx.StaticEntitySet(data=data, weights=w))
I think this code does not make a weighted hypergraph.
Instead, this should be revised as below.
h = hnx.Hypergraph(hnx.StaticEntitySet(data=data, weights=w), weights=w)

Additionally, I cannot reproduce the result that weighted hypergraphs perform better than unweighted ones.
So I would like to ask whether the code below is a correct way to cluster an unweighted hypergraph and to evaluate the clusterings by NMI score.

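# imports (all_categories and num_clus are assumed to be defined beforehand, e.g. as in Tutorial 11)
import numpy as np
import hypernetx as hnx
from scipy.sparse import coo_matrix
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

# get data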
categories = all_categories[[1,15]]
twenty_train = fetch_20newsgroups(subset='test', categories=categories, shuffle=True, random_state=42)
doc_types=dict()
for i,x in enumerate(twenty_train.filenames):
    doc_types[i]=x.split('/')[-2]
tfidf_vect = TfidfVectorizer()
X_tfidf = tfidf_vect.fit_transform(twenty_train.data)

# construct hypergraph
mat = coo_matrix(X_tfidf)
edges = mat.col
nodes = mat.row
data = np.array([edges,nodes]).T
weights = mat.data
# clustering on weighted hypergraph
h = hnx.Hypergraph(hnx.StaticEntitySet(data=data,weights=weights),weights=weights)
clusters=hnx.spec_clus(h,num_clus,weights=True)
# clustering on unweighted hypergraph
uwh = hnx.Hypergraph(hnx.StaticEntitySet(data=data,weights=None))
uw_clusters = hnx.spec_clus(uwh,num_clus,weights=False)

# construct clustering label list
categoryindexing = {}
_labels = {} # answer
for i in range(X_tfidf.shape[0]):
    if doc_types[i] not in categoryindexing:
        categoryindexing[doc_types[i]] = len(categoryindexing)
    ci = categoryindexing[doc_types[i]]
    _labels[i] = ci
labels = [_labels[i] for i in range(X_tfidf.shape[0])]
_w_pred = {} #  labels from weighted hypergraph
for i in clusters:
    for v in clusters[i]:
        _w_pred[v] = i
w_pred = [_w_pred[i] for i in range(X_tfidf.shape[0])]
_uw_pred = {} # labels from unweighted hypergraph
for i in uw_clusters:
    for v in uw_clusters[i]:
        _uw_pred[v] = i
uw_pred = [_uw_pred[i] for i in range(X_tfidf.shape[0])]

# evaluation on NMI score
from sklearn.metrics.cluster import normalized_mutual_info_score as NMI_SCORE
score = NMI_SCORE(labels, w_pred)
print(score) # 0.73
score = NMI_SCORE(labels, uw_pred)
print(score) # 0.78
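
For anyone who wants to sanity-check the evaluation step on its own, here is a minimal sketch in plain Python, independent of HNX and scikit-learn. It assumes the clustering comes back as a dict mapping cluster id to a collection of item indices, as the loops above treat it. The helper names `clusters_to_labels`, `entropy` and `nmi` are made up for this illustration. The NMI uses arithmetic-mean normalization, which is the default of scikit-learn's `normalized_mutual_info_score`.

```python
import math
from collections import Counter


def clusters_to_labels(clusters, n_items):
    """Invert {cluster_id: iterable of item indices} into a per-item label list."""
    labels = [None] * n_items
    for cluster_id, members in clusters.items():
        for item in members:
            labels[item] = cluster_id
    if any(label is None for label in labels):
        raise ValueError("some items were not assigned to any cluster")
    return labels


def entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Normalized mutual information with arithmetic-mean normalization."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    denom = (entropy(labels_a) + entropy(labels_b)) / 2
    return 1.0 if denom == 0 else mi / denom


# Toy data (hypothetical): six items, two true classes.
truth = [0, 0, 0, 1, 1, 1]
# Same partition as the truth, but with the cluster ids swapped.
pred = clusters_to_labels({0: [3, 4, 5], 1: [0, 1, 2]}, n_items=6)
# A partition that mixes the two classes.
noisy = clusters_to_labels({0: [0, 1, 3], 1: [2, 4, 5]}, n_items=6)

print(nmi(truth, pred))
print(nmi(truth, noisy))
```

NMI does not depend on how the cluster ids are numbered, so the label-building loops above should not bias the comparison either way. If the two scores differ, the difference comes from the partitions themselves.
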
brendapraggastis (Collaborator) commented

We have contacted the authors and will look with them at your code. Will post when we have more information.

thosvarley commented Sep 9, 2022

I have run into the same issue. If I run:

hnx.Hypergraph(edges, weights=[w1, w2, w3...])

the resulting hypergraph has all weights equal to 1.0.

If I try the line that young917 suggested above, it doesn't work either.

brendapraggastis (Collaborator) commented

This is a bug. We are working on it.

brendapraggastis (Collaborator) commented

@thosvarley and @young917 HNX 2.0 will be released on Saturday May 13. You will add cell weights to the incidence matrix using the cell_weights keyword in the hypergraph constructor. Please read the documentation for formatting and let us know if anything is unclear.
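
To make "cell weights in the incidence matrix" concrete, here is a rough sketch in plain Python. It is not HNX's API: the `incidence` helper and the toy triples are invented for illustration. Each (edge, node) pair carries its own weight, for example a tf-idf value, and the weighted matrix stores that value where the unweighted one stores 1.

```python
def incidence(triples, weighted=True):
    """Build a node-by-edge incidence matrix from (edge, node, weight) triples."""
    nodes = sorted({node for _, node, _ in triples})
    edges = sorted({edge for edge, _, _ in triples})
    row = {node: i for i, node in enumerate(nodes)}
    col = {edge: j for j, edge in enumerate(edges)}
    matrix = [[0.0] * len(edges) for _ in nodes]
    for edge, node, weight in triples:
        # Weighted: keep the per-cell weight. Unweighted: membership only.
        matrix[row[node]][col[edge]] = weight if weighted else 1.0
    return nodes, edges, matrix


# Hypothetical data: words are hyperedges, documents are nodes.
triples = [
    ("word_a", "doc_0", 0.8),
    ("word_a", "doc_1", 0.2),
    ("word_b", "doc_1", 0.5),
]

nodes, edges, W = incidence(triples, weighted=True)
_, _, U = incidence(triples, weighted=False)

print(W)
print(U)

# If a "weighted" matrix contains only 0s and 1s, the weights were dropped.
weights_dropped = all(value in (0.0, 1.0) for matrix_row in W for value in matrix_row)
print(weights_dropped)
```

The same kind of check applies to the matrix a library returns: if every nonzero entry is 1.0 even though weights were passed, the weights were ignored, which is the symptom reported earlier in this thread.
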

bonicim added a commit that referenced this issue Jun 29, 2023
Merge in HYP/hypernetx from releases/v2.0.2 to develop

* commit 'dd76f358ef5d6f2b76c24260bb7edb6ad4ac98c0':
  bump: version 2.0.1 → 2.0.2
  Fix import try catch block; update pypi workflow