
edits to readme

srijankr committed Jul 15, 2019
1 parent e849ea6 commit 680cbbce598ada27e19d320a6ae38f4f655e7754
Showing with 127 additions and 23 deletions.
  1. +112 −17 README.md
  2. +2 −1 evaluate_interaction_prediction.py
  3. +2 −1 evaluate_state_change_prediction.py
  4. +8 −1 initialize.sh
  5. +3 −3 jodie.py
129 README.md
@@ -1,34 +1,129 @@
# JODIE
Code for ACM SIGKDD 2019 paper "Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks"
## JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks

Link to paper: https://cs.stanford.edu/~srijan/pubs/jodie-kdd2019.pdf
This repository contains the code for the paper "Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks" by Srijan Kumar, Xikun Zhang, and Jure Leskovec. The paper was published at the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2019.

Project website with links to datasets: http://snap.stanford.edu/jodie/
#### Authors: [Srijan Kumar](http://cs.stanford.edu/~srijan) (srijan@cs.stanford.edu), Xikun Zhang (xikunz2@illinois.edu)
#### [Project website with links to the datasets](http://snap.stanford.edu/jodie/)
#### [Link to the paper](https://cs.stanford.edu/~srijan/pubs/jodie-kdd2019.pdf)


### Introduction
JODIE is a representation learning framework for temporal interaction networks. Given a sequence of entity-entity interactions, JODIE learns a dynamic embedding trajectory for every entity, which can then be used for various downstream machine learning tasks. JODIE is fast and makes accurate predictions on temporal interaction networks.

JODIE can be used for two broad categories of tasks:
(1) **Interaction prediction**: Which two entities will interact next? This has applications in recommender systems and modeling network evolution.
(2) **State change prediction**: When does the state of an entity change (e.g., from normal to abnormal)? This has applications in anomaly detection, ban prediction, dropout and churn prediction, fraud and account compromise, and more.

If you make use of this code, the JODIE algorithm, the T-batch algorithm, or the datasets in your work, please cite the following paper:

```
@inproceedings{kumar2019predicting,
title={Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks},
author={Kumar, Srijan and Zhang, Xikun and Leskovec, Jure},
booktitle={Proceedings of the 25th ACM SIGKDD international conference on Knowledge discovery and data mining},
year={2019},
organization={ACM}
}
```

### Motivation
Temporal interaction networks provide an expressive language to represent time-evolving and dynamic interactions between entities. Representation learning provides a powerful tool to model and reason on networks. However, as networks evolve over time, a single (static) embedding becomes insufficient to represent the changing behavior of the entities and the dynamics of the network.

JODIE is a representation learning framework that embeds every entity in a Euclidean space and models each entity's evolution as an embedding trajectory in this space. JODIE learns to project/forecast the embedding trajectories into the future to make predictions about the entities and their interactions. These trajectories can be trained for downstream tasks, such as recommendations and predictions. JODIE scales to large networks by employing a novel t-Batch algorithm that creates batches of independent edges that can be processed simultaneously.

### Setup

To initialize the directories needed to store data and outputs, use the following command. This will create `data/`, `saved_models/`, and `results/` directories.
```
$ chmod +x initialize.sh
$ ./initialize.sh
```

To download the datasets used in the paper, use the following command. This will download four datasets under the `data/` directory: `reddit.csv`, `wikipedia.csv`, `mooc.csv`, and `lastfm.csv`.
```
$ chmod +x download_data.sh
$ ./download_data.sh
```

### Run the JODIE code

To train the JODIE model, use the following command. This will save a model for every epoch in the `saved_models/<network>/` directory.
```
$ python jodie.py --network reddit --model jodie --epochs 50
```

This code can be given the following command-line arguments:
```
(1) `--network`: this is the name of the file which has the data in the `data/` directory. The file should be named `<network>.csv`, where `<network> = reddit` in the example above. The dataset format is explained below. This is a required argument.
(2) `--model`: this is the name of the model and the file where the model will be saved in the `saved_models/` directory. Default value: jodie.
(3) `--gpu`: this is the id of the gpu where the model is run. Default value: -1 (to run on the GPU with the most free memory).
(4) `--epochs`: this is the number of epochs to train the model. Default value: 50.
(5) `--embedding_dim`: this is the number of dimensions of the dynamic embedding. Default value: 128.
(6) `--train_proportion`: this is the fraction of interactions (from the beginning) that are used for training. The next 10% are used for validation and the next 10% for testing. Default value: 0.8.
(7) `--state_change`: this is a boolean input indicating if the training is done with state change prediction along with interaction prediction. Default value: True.
```
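The arguments listed above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the repository's `jodie.py`: `str2bool` and `build_parser` are hypothetical names introduced here for illustration. One point worth knowing is that `argparse` with `type=bool` converts any non-empty string (including `"False"`) to `True`, so the sketch assumes an explicit string-to-boolean converter for `--state_change`.

```python
import argparse


def str2bool(value):
    """Hypothetical helper: map common true/false spellings to a bool."""
    text = str(value).strip().lower()
    if text in ("true", "t", "yes", "y", "1"):
        return True
    if text in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    """Illustrative parser mirroring the arguments documented above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--network", required=True)
    parser.add_argument("--model", default="jodie")
    parser.add_argument("--gpu", default=-1, type=int)
    parser.add_argument("--epochs", default=50, type=int)
    parser.add_argument("--embedding_dim", default=128, type=int)
    parser.add_argument("--train_proportion", default=0.8, type=float)
    # str2bool (an assumption of this sketch) lets "False" parse as False;
    # with type=bool, bool("False") would evaluate to True.
    parser.add_argument("--state_change", default=True, type=str2bool)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--network", "reddit", "--state_change", "False"])
print(args.state_change, args.epochs)  # prints: False 50
```

With this converter, `--state_change False` disables state change training, while omitting the flag keeps the documented default of `True`.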

### Run the T-Batch code

To create T-Batches of a temporal network, use the following command. This will save the T-Batches to the `results/tbatches_<network>.csv` file. Note that the entire input will be converted to T-Batches. To convert only the training data, pass a file that contains only the training interactions.

```
$ python tbatch.py --network reddit
```

This code can be given the following command-line arguments:
```
(1) `--network`: this is the name of the file which has the data in the `data/` directory. The file should be named `<network>.csv`, where `<network> = reddit` in the example above. The dataset format is explained below. This is a required argument.
```
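To illustrate what "batches of independent edges" means, here is a minimal sketch of one way to build such batches. It is not the repository's `tbatch.py`; the function name `assign_tbatches` is hypothetical, and the sketch assumes the rule that each interaction is placed in the batch right after the latest batch that already contains its user or its item. Under that rule no user or item appears twice within a batch, and each entity's interactions stay in temporal order across batches.

```python
from collections import defaultdict


def assign_tbatches(interactions):
    """Group time-ordered (user, item) pairs into batches of independent edges.

    Returns a list of batches; each batch is a list of interaction indices.
    """
    last_user_batch = defaultdict(lambda: -1)  # last batch id seen per user
    last_item_batch = defaultdict(lambda: -1)  # last batch id seen per item
    batches = defaultdict(list)
    for index, (user, item) in enumerate(interactions):
        # The interaction must come after every earlier batch that
        # already touches this user or this item.
        batch_id = max(last_user_batch[user], last_item_batch[item]) + 1
        last_user_batch[user] = batch_id
        last_item_batch[item] = batch_id
        batches[batch_id].append(index)
    return [batches[b] for b in range(len(batches))]


# (user, item) pairs in timestamp order
pairs = [("0", "0"), ("2", "1"), ("5", "0"), ("3", "2")]
print(assign_tbatches(pairs))  # prints: [[0, 1, 3], [2]]
```

In this example the third interaction shares item `0` with the first one, so it is pushed to the second batch; the other three interactions touch disjoint users and items and can be processed together.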

### Run the evaluation code

To evaluate the trained model's performance in predicting interactions, use the following command. This will add the performance numbers to the `results/interaction_prediction_<network>.txt` file.
```
$ python evaluate_interaction_prediction.py --network reddit --model jodie --epoch 49
```

To evaluate the trained model's performance in predicting user state change, use the following command. This will add the performance numbers to the `results/state_change_prediction_<network>.txt` file.
```
$ python evaluate_state_change_prediction.py --network reddit --model jodie --epoch 49
```

### Dataset format

The networks are stored under the `data/` folder, one file per network. The filename should be `<network>.csv`.

The network should be in the following format:
```
One line per interaction/edge.
Each line should be: *user, item, timestamp, state label, comma-separated array of features*. The first line is a header that describes the network format.
*User* and *item* fields can be alphanumeric.
*Timestamp* should be in cardinal (numeric) format, not in datetime format.
*State label* should be 1 whenever the user state changes, 0 otherwise. If there are no state labels, use 0 for all interactions.
*Feature list* can be as long as desired. It should be at least 1-dimensional. If there are no features, use 0 for all interactions.
```

For example, the first few lines of a dataset can be:
```
user,item,timestamp,state_label,comma_separated_list_of_features
0,0,0.0,0,0.1,0.3,10.7
2,1,6.0,0,0.2,0.4,0.6
5,0,41.0,0,0.1,15.0,0.6
3,2,49.0,1,100.7,0.8,0.9
```
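A file in this format could be read and split chronologically along the following lines. This is a sketch under stated assumptions, not the repository's loader: `load_network` and `split_bounds` are hypothetical names, alphanumeric user and item fields are mapped to integer ids in order of first appearance, and the split follows the description of `--train_proportion` (training fraction first, then the next 10% for validation, and the rest for testing).

```python
import csv
import io


def load_network(handle):
    """Parse a <network>.csv stream into parallel lists, one entry per interaction."""
    reader = csv.reader(handle)
    next(reader)  # skip the header line that describes the format
    user_ids, item_ids = {}, {}
    users, items, timestamps, labels, features = [], [], [], [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Map alphanumeric ids to consecutive integers on first appearance.
        users.append(user_ids.setdefault(row[0], len(user_ids)))
        items.append(item_ids.setdefault(row[1], len(item_ids)))
        timestamps.append(float(row[2]))
        labels.append(int(row[3]))
        features.append([float(x) for x in row[4:]])
    return users, items, timestamps, labels, features


def split_bounds(num_interactions, train_proportion=0.8):
    """Return the end indices of the training and validation ranges."""
    train_end = int(num_interactions * train_proportion)
    validation_end = int(num_interactions * (train_proportion + 0.1))
    return train_end, validation_end


sample = io.StringIO(
    "user,item,timestamp,state_label,comma_separated_list_of_features\n"
    "0,0,0.0,0,0.1,0.3,10.7\n"
    "2,1,6.0,0,0.2,0.4,0.6\n"
    "5,0,41.0,0,0.1,15.0,0.6\n"
    "3,2,49.0,1,100.7,0.8,0.9\n"
)
users, items, timestamps, labels, features = load_network(sample)
print(users, items, labels)  # prints: [0, 1, 2, 3] [0, 1, 0, 2] [0, 0, 0, 1]
print(split_bounds(100))     # prints: (80, 90)
```

Interactions with index below `train_end` would be used for training, those between `train_end` and `validation_end` for validation, and the remainder for testing.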

### References
Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks. S. Kumar, X. Zhang, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2019.

@@ -1,6 +1,7 @@
'''
This code evaluates the validation and test performance of the model trained in jodie.py.
The task is: interaction prediction, i.e., predicting which item a user will interact with.
This has applications in recommender systems and modeling network evolution.
How to run:
$ python evaluate_interaction_prediction.py --network reddit --model jodie --epoch 49
@@ -1,6 +1,7 @@
'''
This code evaluates the validation and test performance of the model trained in jodie.py.
The task is: user state change prediction, i.e., predicting when the state of a user changes from one to another, say from normal to abnormal.
This has applications in detecting anomalies, fraud, churn, account compromise, and so on.
How to run:
$ python evaluate_state_change_prediction.py --network reddit --model jodie --epoch 49
@@ -1,6 +1,13 @@
#!/bin/bash

mkdir data/

mkdir results/

mkdir saved_models/
mkdir saved_models/reddit/
mkdir saved_models/wikipedia/
mkdir saved_models/mooc/
mkdir saved_models/lastfm/


@@ -3,7 +3,7 @@
The task is: interaction prediction.
How to run:
$ python jodie.py --network reddit --model jodie --epochs 50
Paper: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks. S. Kumar, X. Zhang, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2019.
'''
@@ -19,8 +19,8 @@
parser.add_argument('--gpu', default=-1, type=int, help='ID of the gpu to run on. If set to -1 (default), the GPU with most free memory will be chosen.')
parser.add_argument('--epochs', default=50, type=int, help='Number of epochs to train the model')
parser.add_argument('--embedding_dim', default=128, type=int, help='Number of dimensions of the dynamic embedding')
parser.add_argument('--train_proportion', default=0.8, type=float, help='Fraction of interactions (from the beginning) that are used for training. The next 10%% are used for validation and the next 10%% for testing')
parser.add_argument('--state_change', default=True, type=bool, help='True if training with state change of users along with interaction prediction. False otherwise. By default, set to True.')
args = parser.parse_args()

args.datapath = "data/%s.csv" % args.network
