Is it possible to use a heterogeneous PyG object to find anomalies? #60

Closed
monk1337 opened this issue Nov 30, 2022 · 2 comments

Comments

@monk1337

Hi, thank you for this awesome package. I am working with heterogeneous graphs and knowledge graphs. For example, if I use the well-known MovieLens dataset and construct a heterogeneous graph, can I feed it to model.fit(data)?

@monk1337
Author

I tried experimenting with a heterogeneous graph, but I am getting errors.

My data looks like this:

import pandas as pd
from torch_geometric.data import HeteroData
import torch
from torch_geometric.data import download_url, extract_zip

url = 'https://files.grouplens.org/datasets/movielens/ml-latest-small.zip'
extract_zip(download_url(url, '.'), '.')

movie_path = './ml-latest-small/movies.csv'
rating_path = './ml-latest-small/ratings.csv'

print(pd.read_csv(movie_path).head())
print(pd.read_csv(rating_path).head())

def load_node_csv(path, index_col, encoders=None, **kwargs):
    df = pd.read_csv(path, index_col=index_col, **kwargs)
    mapping = {index: i for i, index in enumerate(df.index.unique())}

    x = None
    if encoders is not None:
        xs = [encoder(df[col]) for col, encoder in encoders.items()]
        x = torch.cat(xs, dim=-1)

    return x, mapping



class SequenceEncoder(object):
    def __init__(self, model_name='all-MiniLM-L6-v2', device=None):
        pass

    @torch.no_grad()
    def __call__(self, df):
        x = torch.rand(df.shape[0], 384)
        # For demo purposes, random 384-dim vectors stand in for sentence embeddings; no actual model is loaded.
        print(x.shape)
        return x.cpu()
    
    
class GenresEncoder(object):
    def __init__(self, sep='|'):
        self.sep = sep

    def __call__(self, df):
        genres = set(g for col in df.values for g in col.split(self.sep))
        mapping = {genre: i for i, genre in enumerate(genres)}

        x = torch.zeros(len(df), len(mapping))
        for i, col in enumerate(df.values):
            for genre in col.split(self.sep):
                x[i, mapping[genre]] = 1
        return x
    
    
movie_x, movie_mapping = load_node_csv(
    movie_path, index_col='movieId', encoders={
        'title': SequenceEncoder(),
        'genres': GenresEncoder()
    })


_, user_mapping = load_node_csv(rating_path, index_col='userId')



data = HeteroData()

data['user'].num_nodes = len(user_mapping)  # Users do not have any features.
data['movie'].x = movie_x

print(data)

Then I feed this data to a PyGOD model:


# train a dominant detector
from pygod.models import DOMINANT
model = DOMINANT(num_layers=4, epoch=2)  # hyperparameters can be set here

model.fit(data)  # data is a Pytorch Geometric data object

which raises the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [39], line 1
----> 1 model.fit(data)  # data is a Pytorch Geometric data object

File ~/miniforge3/envs/M1Max/lib/python3.8/site-packages/pygod/models/dominant.py:129, in DOMINANT.fit(self, G, y_true)
    111 def fit(self, G, y_true=None):
    112     """
    113     Fit detector with input data.
    114 
   (...)
    127         Fitted estimator.
    128     """
--> 129     G.node_idx = torch.arange(G.x.shape[0])
    130     G.s = to_dense_adj(G.edge_index)[0]
    132     # automated balancing by std

File ~/miniforge3/envs/M1Max/lib/python3.8/site-packages/torch_geometric/data/hetero_data.py:133, in HeteroData.__getattr__(self, key)
    131 elif bool(re.search('_dict$', key)):
    132     return self.collect(key[:-5])
--> 133 raise AttributeError(f"'{self.__class__.__name__}' has no "
    134                      f"attribute '{key}'")

AttributeError: 'HeteroData' has no attribute 'x'

@kayzliu
Member

kayzliu commented Nov 30, 2022

PyGOD only works with homogeneous graphs so far. One potential solution is to convert the heterogeneous graph to a homogeneous one with HeteroData.to_homogeneous(). Hope this helps.
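The conversion suggested above can be illustrated with a minimal, framework-free sketch. Everything here is an assumption for illustration: the helper name `hetero_to_homogeneous`, the dict-based inputs, and zero-padding of missing features are hypothetical choices, not PyG's `to_homogeneous()` implementation. The sketch shows the core idea: each node type gets a contiguous block of global ids, edge endpoints are shifted by those offsets, and feature rows are padded to a common width. The result is a single `x` and `edge_index`, the two attributes that `DOMINANT.fit` reads according to the traceback above.

```python
# Hypothetical helper: a hand-rolled illustration of heterogeneous-to-homogeneous
# conversion. It is NOT PyG's HeteroData.to_homogeneous() implementation.

def hetero_to_homogeneous(node_feats, edges):
    """Merge typed nodes and edges into one graph.

    node_feats: {node_type: list of feature rows (lists of floats)}
    edges: {(src_type, relation, dst_type): (src_ids, dst_ids)} using per-type ids
    Returns (x, edge_index, node_type): one feature row per node, edges as
    [src_list, dst_list] in global ids, and an integer type id per node.
    """
    # Give each node type a contiguous block of global ids.
    offsets, start = {}, 0
    for t, rows in node_feats.items():
        offsets[t] = start
        start += len(rows)

    # Zero-pad rows (an assumption) so every node has the same feature width.
    width = max((len(r) for rows in node_feats.values() for r in rows), default=0)
    x, node_type = [], []
    for type_id, rows in enumerate(node_feats.values()):
        for r in rows:
            x.append(list(r) + [0.0] * (width - len(r)))
            node_type.append(type_id)

    # Shift per-type edge endpoints into the global id space.
    src_all, dst_all = [], []
    for (src_t, _rel, dst_t), (src, dst) in edges.items():
        src_all += [s + offsets[src_t] for s in src]
        dst_all += [d + offsets[dst_t] for d in dst]
    return x, [src_all, dst_all], node_type


# Toy example: 2 featureless users, 3 movies with 2-dim features.
node_feats = {
    "user": [[] for _ in range(2)],
    "movie": [[0.1, 1.0], [0.4, 0.0], [0.9, 1.0]],
}
edges = {("user", "rates", "movie"): ([0, 0, 1], [0, 2, 1])}

x, edge_index, node_type = hetero_to_homogeneous(node_feats, edges)
print(edge_index)  # [[0, 0, 1], [2, 4, 3]]
```

In PyG itself, the built-in route is data.to_homogeneous(). Note that the HeteroData in the question has only `user` and `movie` nodes and no edge types yet, so the ratings would need to be added as user-to-movie edges first. The featureless `user` nodes probably also need a feature vector of the same width as `movie.x` so that the merged graph ends up with an `x` attribute. Zero-padding is only one option; one-hot or learned embeddings are common alternatives.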

kayzliu closed this as completed on Mar 27, 2023