HinSAGE Node Classification
This is an example of using a Heterogenous extension to the GraphSAGE algorithm , called HinSAGE, to classify the nodes in a heterogeneous network (a network with multiple node and link types).
This example uses the Yelp dataset and aims to predict the 'elite' status of users. It does this as a binary classification task predicting if the user has had 'elite' status in any year or has never had 'elite' status.
Install the StellarGraph library following the instructions at: https://github.com/stellargraph/stellargraph
Additional requirements are Pandas, Numpy and Scikit-Learn. These are installed as depdendencies of the StellarGraph library.
Currently the examples in this directory use the Yelp dataset. The Yelp dataset can be obtained by navigating to https://www.yelp.com/dataset, selecting "Download Dataset" and signing the licence agreement. Then, download the JSON dataset and uncompress it to an appropriate location in your filesystem.
There are several different versions of the dataset, enumerated by "rounds",
to check which round the dataset is, look for the file named "Yelp_Dataset_Challenge_Round_XX.pdf"
where "XX" is the round number. The current script supports rounds 12 and 13.
The path to the Yelp dataset used below should contain the JSON files including
user.json depending on the round number.
The example code uses a preprocessed version of the dataset that is generated
To run this script, specify:
the path to the Yelp dataset that you downloaded (
the output directory (
and the round number (
-r, which defaults to 13).
python yelp-preprocessing.py -r XX -l <path_to_yelp_dataset> -o .
By default the script will filter the graph to contain only businesses in the state of Wisconsin. To change this to another state, or to "false" to use the entire dataset (warning: this will make the machine learning example run very slowly and will require a lot of memory as the entire graph will be loaded).
Example usage to run without filtering:
python yelp-preprocessing.py -l <path_to_yelp_dataset> -o . --filter_state=false
Running the example
The example script can be run on supplying the location of the preprocessed Yelp dataset.
python yelp-example.py -l <location_of_preprocessed_data>
Additional command line arguments are available to tune the training of the model, to see a
description of these arguments use the
python yelp-example.py --help
 W. L. Hamilton, R. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” presented at NIPS 2017 (arXiv:1706.02216).