Skip to content
forked from fukien/BiANE

Codes for our SIGIR 2020 paper "BiANE: Bipartite Attributed Network Embedding".

License

Notifications You must be signed in to change notification settings

samlidippos/BiANE

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

BiANE

Codes for our SIGIR 2020 paper BiANE: Bipartite Attributed Network Embedding

Dataset

Dataset should be processed as following:

user_id.tsv: [user_name, '\t', ,user_id], user node id; (user_id should start from 0)

item_id.tsv: [item_name, '\t', ,item_id], item node id; (item_id should start from 0)

adjlist_user_id.tsv: [user_name, '\t', ,user id for adjlist], user node ids of the adjacency list file; (adjlist_user_id should start from 0, which is exactly the same as user_id.)

adjlist_item_id.tsv: [item_name, '\t', ,item_id for adjlist], item node ids of the adjacency list file; (It's suggested that the adjlist_item_id should start from the end of adjlist_user_id. For instance, if adjlist_user_id is from 0 to 100, the adjlist_item_id should start from 101.)

adjlist.txt: [node_itself neighbor_node_0 neighbor_node_1 nerighbor_node_2 neighbor_node_3 ... neighbor_node_k], the adjacency list for the graph (training set), each node is represented as its adjlist id;

train.csv: [user_id, item_id], the dataset for embedding model training. It only contains true links of the inter-partition relations. We take them as positive cases and randomly sample negative cases during the training process to model the inter-partition proximity;

valid.tsv: [user_id, '\t', item_id, '\t', label], the dataset for embedding model validation. It contains both positive cases and negative cases (randomly sampled) for inter-partition links. label indicates that whether the link relation is positive or not. The ratio of positives to negatives is 1:1;

train.tsv: [user_id, '\t', item_id, '\t', label], the dataset for training link prediction model (a logistic regression model). The label information and positive to negative ratio is the same to valid.tsv;

test.tsv: [user_id, '\t', tem_id, '\t', label], the test set for link prediction. The label information and positive to negative ratio is the same to valid.tsv;

user_attr.pkl: user_attr[user_id][:] , a matrix of user attributes;

item_attr.pkl: item_attr[item_id][:], a matrix of item attributes;

emb.txt:
        node_number, dimension (skip this line)
        <\s>(invalid token), embedding (skip this line)
        node_adjlist_id, embedding
        ......                   

, a matrix of high-order structure features for nodes. Each node is adjlist id. This file is the output of metapath2vec++.

Training Output

{dataset}_best_model.pkl, the parameters of the trained AutoEncoders.

Usage

Installation

Please refer to Non-Metric Space Library (NMSLIB) for HNSW installation.

Model Training

  • AMiner:
    cd model
    python gen_metapath.py --dataset ami --path_per_node 10 --path_length 81
    ./code_metapath2vec/metapath2vec -train ../data/ami/metapath_ami.txt -output ../data/ami/emb_ami -pp 1 -size 128 -window 3 -negative 5 -threads 32
    python train.py --dataset ami
    
  • MovieLens
    cd model
    python gen_metapath.py --dataset mvl --path_per_node 10 --path_length 81
    ./code_metapath2vec/metapath2vec -train ../data/mvl/metapath_mvl.txt -output ../data/mvl/emb_mvl -pp 1 -size 128 -window 3 -negative 5 -threads 32
    python train.py --dataset mvl --lambda_6 10 --lambda_9 10 --attr_dim_0_u 23 --attr_dim_0_v 18 --attr_dim_1 32 --attr_dim_2 64 --struc_dim_1 96 --struc_dim_2 64
    

Link Prediction

  • AMiner:
    python link_prediction.py --dataset ami
    
  • MovieLens
    python link_prediction.py --dataset mvl --attr_dim_0_u 23 --attr_dim_0_v 18 --attr_dim_1 32 --attr_dim_2 64 --struc_dim_1 96 --struc_dim_2 64
    

About

Codes for our SIGIR 2020 paper "BiANE: Bipartite Attributed Network Embedding".

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 63.3%
  • C++ 30.3%
  • C 6.0%
  • Makefile 0.4%