This repository contains the code for the paper "Analyzing Heterogeneous Network with Missing Attributes by Unsupervised Contrastive Learning", which has been accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
Requirements:
- Python 3.7
- NumPy
- SciPy
- scikit-learn
- NetworkX
- PyTorch
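A quick way to confirm that the packages listed above are importable before running anything. This is a sketch, not part of the repository: the module names are the usual import names for these packages (scikit-learn imports as `sklearn`, PyTorch as `torch`), and it only checks that each package can be found, since the README pins no package versions.

```python
import importlib.util
import sys
from typing import List, Sequence

# Import names for the packages listed above; scikit-learn is imported
# as "sklearn" and PyTorch as "torch".
REQUIRED_MODULES = ("numpy", "scipy", "sklearn", "networkx", "torch")


def missing_requirements(modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level import names in `modules` that cannot be found.

    find_spec only locates a module without importing it, so a broken
    installation can still pass this check.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The README lists Python 3.7; this only reports the running version.
    print("Python", sys.version.split()[0])
    missing = missing_requirements()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Because the training commands pass `--cuda`, it may also be worth checking `torch.cuda.is_available()` once PyTorch is installed.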
The preprocessed data are available at Baidu Netdisk (password: hgca) or Google Drive.
Please extract the zip file to the folder data.
Before running the code, please create a directory named checkpoint.
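The two setup steps (extracting the downloaded archive into data and creating checkpoint) can be scripted as below. This is a minimal sketch under stated assumptions: the helper name `prepare_workspace` is hypothetical, the archive path is whatever file you downloaded, and the archive's internal layout is not described here, so check afterwards that the dataset folders (for example data/ACM) end up where the code expects them.

```python
import os
import zipfile


def prepare_workspace(archive_path: str,
                      data_dir: str = "data",
                      checkpoint_dir: str = "checkpoint") -> None:
    """Hypothetical helper: unpack the data archive and create checkpoint/."""
    # Extract the downloaded zip file into the data folder.
    os.makedirs(data_dir, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(data_dir)
    # The README asks for a directory named checkpoint before running the code.
    os.makedirs(checkpoint_dir, exist_ok=True)
```

Run it from the repository root, e.g. `prepare_workspace("path/to/downloaded.zip")`, where the path is a placeholder for your own download location.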
To run the code on each dataset:
python run.py --cuda --dataset ACM --metapath-weight 0.4#0.6
python run.py --cuda --dataset Yelp --metapath-weight 0.2#0.6#0.2
python run.py --cuda --dataset DBLP --metapath-weight 0.1#0.1#0.8
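The `--metapath-weight` value is a list of numbers joined by `#`: two for ACM and three for Yelp and DBLP, presumably one weight per metapath. In all three commands the weights sum to 1. The sketch below parses such a string for use as an argparse `type=` callable; the function name is hypothetical, and treating "non-negative, sums to 1" as a validity check is an assumption drawn from the examples, not something the README states. run.py's own parsing may differ.

```python
import argparse
import math
from typing import List


def parse_metapath_weights(spec: str) -> List[float]:
    """Turn a '#'-separated string such as '0.4#0.6' into a list of floats.

    Assumption: one non-negative weight per metapath, summing to 1,
    as in the example commands.
    """
    try:
        weights = [float(part) for part in spec.split("#")]
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"not a '#'-separated list of numbers: {spec!r}") from None
    if any(w < 0 for w in weights):
        raise argparse.ArgumentTypeError(f"weights must be non-negative: {spec!r}")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-6):
        raise argparse.ArgumentTypeError(f"weights should sum to 1, got {sum(weights):g}")
    return weights


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--metapath-weight", type=parse_metapath_weights,
                        default="0.4#0.6")
    args = parser.parse_args(["--metapath-weight", "0.2#0.6#0.2"])
    print(args.metapath_weight)  # [0.2, 0.6, 0.2]
```

argparse also applies `type=` to string defaults, so the default above becomes `[0.4, 0.6]`. Unquoted `0.4#0.6` is safe in common shells such as bash, because `#` only starts a comment at the beginning of a word.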
To preprocess the data from the raw files (taking ACM as an example):
- Download data.zip and extract it to the folder data (you can delete all folders other than raw).
- In the data folder, run the following to process the raw data; this generates a folder named ACM in the data folder:
  python preprocess_ACM.py
- In the preprocess folder, run the following to sample nodes for batch training; this generates a folder named indices in the data/ACM folder:
  python sampling.py --dataset ACM
- In the preprocess folder, run the following to generate sampled node sequences for training metapath2vec; this generates a file named walks_ACM.txt in the preprocess folder:
  python walk.py --dataset ACM
- Learn joint embeddings via metapath2vec, which generates a file named metapath2vec_ACM_embeddings.txt in the preprocess folder:
  cd metapath2vec
  ./metapath2vec -train ../preprocess/walks_ACM.txt -output ../preprocess/metapath2vec_ACM_embeddings -pp 0 -size 128 -window 4 -negative 10 -threads 32
- In the preprocess folder, run the following to process metapath2vec_ACM_embeddings.txt; this generates metapath2vec_emb_node.npy and metapath2vec_emb_word.npy in the data/ACM folder:
  python embedding.py --dataset ACM
- Finally, run:
  python run.py --cuda --dataset ACM --metapath-weight 0.4#0.6
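The ACM steps above run in a fixed order, each from its own folder, so they can be collected into one driver script. This is a sketch under stated assumptions, not part of the repository: it assumes it is launched from the repository root, that run.py and the data, preprocess, and metapath2vec folders sit at that root, that the metapath2vec binary has already been built, and that data.zip has already been extracted. The helper names `acm_steps` and `run_steps` are hypothetical. By default it only prints the commands.

```python
import subprocess
import sys
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (working directory relative to repo root, argv)


def acm_steps(threads: int = 32) -> List[Step]:
    """The ACM preprocessing and training commands, in the order listed above."""
    py = sys.executable  # use the current interpreter instead of a bare "python"
    return [
        ("data", [py, "preprocess_ACM.py"]),
        ("preprocess", [py, "sampling.py", "--dataset", "ACM"]),
        ("preprocess", [py, "walk.py", "--dataset", "ACM"]),
        ("metapath2vec", ["./metapath2vec", "-train", "../preprocess/walks_ACM.txt",
                          "-output", "../preprocess/metapath2vec_ACM_embeddings",
                          "-pp", "0", "-size", "128", "-window", "4",
                          "-negative", "10", "-threads", str(threads)]),
        ("preprocess", [py, "embedding.py", "--dataset", "ACM"]),
        (".", [py, "run.py", "--cuda", "--dataset", "ACM",
               "--metapath-weight", "0.4#0.6"]),
    ]


def run_steps(steps: List[Step], dry_run: bool = True) -> None:
    """Print each step; when dry_run is False, also execute it in its folder."""
    for cwd, argv in steps:
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True aborts at the first step that exits with an error
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Print the commands only; call run_steps(acm_steps(), dry_run=False) to execute.
    run_steps(acm_steps())
```

Passing each command as an argument list with `cwd=` replaces the `cd metapath2vec` step without changing the driver's own working directory. Lower `threads` if your machine has fewer cores than the 32 used in the README.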
Please refer to the code for detailed parameters.
Dongxiao He, et al. "Analyzing Heterogeneous Network with Missing Attributes by Unsupervised Contrastive Learning," IEEE Trans. Neural Netw. Learn. Syst.