Is it possible to use a heterogeneous PyG object to find anomalies? #60

monk1337 · 2022-11-30T09:14:46Z

Hi, thank you for this awesome package. I am working with heterogeneous graphs and knowledge graphs. For example, if I construct a heterogeneous graph from the well-known MovieLens dataset, can I pass it to model.fit(data)?

monk1337 · 2022-11-30T10:39:01Z

I tried experimenting with a heterogeneous graph, but I am getting errors.

My data looks like this:

import pandas as pd
from torch_geometric.data import HeteroData
import torch
from torch_geometric.data import download_url, extract_zip

url = 'https://files.grouplens.org/datasets/movielens/ml-latest-small.zip'
extract_zip(download_url(url, '.'), '.')

movie_path = './ml-latest-small/movies.csv'
rating_path = './ml-latest-small/ratings.csv'

print(pd.read_csv(movie_path).head())
print(pd.read_csv(rating_path).head())

def load_node_csv(path, index_col, encoders=None, **kwargs):
    df = pd.read_csv(path, index_col=index_col, **kwargs)
    mapping = {index: i for i, index in enumerate(df.index.unique())}

    x = None
    if encoders is not None:
        xs = [encoder(df[col]) for col, encoder in encoders.items()]
        x = torch.cat(xs, dim=-1)

    return x, mapping



class SequenceEncoder(object):
    def __init__(self, model_name='all-MiniLM-L6-v2', device=None):
        pass

    @torch.no_grad()
    def __call__(self, df):
        # For demo purposes, return random 384-dim embeddings instead of
        # loading the actual sentence model.
        x = torch.rand(df.shape[0], 384)
        print(x.shape)
        return x.cpu()
    
    
class GenresEncoder(object):
    def __init__(self, sep='|'):
        self.sep = sep

    def __call__(self, df):
        genres = set(g for col in df.values for g in col.split(self.sep))
        mapping = {genre: i for i, genre in enumerate(genres)}

        x = torch.zeros(len(df), len(mapping))
        for i, col in enumerate(df.values):
            for genre in col.split(self.sep):
                x[i, mapping[genre]] = 1
        return x
    
    
movie_x, movie_mapping = load_node_csv(
    movie_path, index_col='movieId', encoders={
        'title': SequenceEncoder(),
        'genres': GenresEncoder()
    })


_, user_mapping = load_node_csv(rating_path, index_col='userId')



data = HeteroData()

data['user'].num_nodes = len(user_mapping)  # Users do not have any features.
data['movie'].x = movie_x

print(data)

I then feed this data to a PyGOD model:


# train a dominant detector
from pygod.models import DOMINANT
model = DOMINANT(num_layers=4, epoch=2)  # hyperparameters can be set here

model.fit(data)  # data is a Pytorch Geometric data object

This gives the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [39], line 1
----> 1 model.fit(data)  # data is a Pytorch Geometric data object

File ~/miniforge3/envs/M1Max/lib/python3.8/site-packages/pygod/models/dominant.py:129, in DOMINANT.fit(self, G, y_true)
    111 def fit(self, G, y_true=None):
    112     """
    113     Fit detector with input data.
    114 
   (...)
    127         Fitted estimator.
    128     """
--> 129     G.node_idx = torch.arange(G.x.shape[0])
    130     G.s = to_dense_adj(G.edge_index)[0]
    132     # automated balancing by std

File ~/miniforge3/envs/M1Max/lib/python3.8/site-packages/torch_geometric/data/hetero_data.py:133, in HeteroData.__getattr__(self, key)
    131 elif bool(re.search('_dict$', key)):
    132     return self.collect(key[:-5])
--> 133 raise AttributeError(f"'{self.__class__.__name__}' has no "
    134                      f"attribute '{key}'")

AttributeError: 'HeteroData' has no attribute 'x'

kayzliu · 2022-11-30T18:08:15Z

PyGOD only works with homogeneous graphs so far. One potential solution is to convert the heterogeneous graph to a homogeneous one with to_homogeneous. Hope this helps.

kayzliu closed this as completed Mar 27, 2023
