update to README
srijankr committed Jul 15, 2019
1 parent 4c35350 commit f3d0063
Showing 1 changed file with 36 additions and 22 deletions.
58 changes: 36 additions & 22 deletions README.md
@@ -1,18 +1,20 @@
## JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks
## JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks (ACM SIGKDD 2019)

This repository has the code for the paper:
*Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks*. Srijan Kumar, Xikun Zhang, Jure Leskovec. The paper is published at ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2019.

#### Authors: [Srijan Kumar](http://cs.stanford.edu/~srijan) (srijan@cs.stanford.edu), [Xikun Zhang]() (xikunz2@illinois.edu)
*Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks. Srijan Kumar, Xikun Zhang, and Jure Leskovec.*
The paper is published at the *ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2019.*

#### Code authors: [Srijan Kumar](http://cs.stanford.edu/~srijan) (srijan@cs.stanford.edu), [Xikun Zhang]() (xikunz2@illinois.edu)
#### [Project website with links to the datasets](http://snap.stanford.edu/jodie/)
#### [Link to the paper](https://cs.stanford.edu/~srijan/pubs/jodie-kdd2019.pdf)

### Introduction
JODIE is a representation learning framework for temporal interaction networks. Given a sequence of entity-entity interactions, JODIE learns a dynamic embedding trajectory for every entity, which can then be used for various downstream machine learning tasks. JODIE is fast and makes accurate predictions on temporal interaction networks.

JODIE can be used for two broad categories of tasks:
1. **Interaction prediction**: Which two entities will interact next? This has applications in recommender system and modeling network evolution.
2. **State change prediction**: When does the state of an entity change (e.g., from normal to abnormal)? This has applications in anomaly detection, ban prediction, dropout and churn prediction, fraud and account compromise, and more.
1. **Interaction prediction**: Which two entities will interact next? Example applications are recommender systems and modeling network evolution.
2. **State change prediction**: When does the state of an entity change (e.g., from normal to abnormal)? Example applications are anomaly detection, ban prediction, dropout and churn prediction, and fraud and account compromise.

If you make use of this code, the JODIE algorithm, the T-batch algorithm, or the datasets in your work, please cite the following paper:
```
@@ -44,51 +46,63 @@ To download the datasets used in the paper, use the following command. This will
$ ./download_data.sh
```

### Run the JODIE code
### Running the JODIE code

To train the JODIE model, use the following command. This will save a model for every epoch in the `saved_models/<network>/` directory.
To train the JODIE model using the `data/<network>.csv` dataset, use the following command. This will save a model for every epoch in the `saved_models/<network>/` directory.
```
$ python jodie.py --network reddit --model jodie --epochs 50
$ python jodie.py --network <network> --model jodie --epochs 50
```

This code can be given the following command-line arguments:
1. `--network`: this is the name of the file which has the data in the `data/` directory. The file should be named `<network>.csv`, where `<network> = reddit` in the example above. The dataset format is explained below. This is a required argument.
1. `--network`: this is the name of the file which has the data in the `data/` directory. The file should be named `<network>.csv`. The dataset format is explained below. This is a required argument.
2. `--model`: this is the name of the model and the file where the model will be saved in the `saved_models/` directory. Default value: jodie.
3. `--gpu`: this is the ID of the GPU on which the model is run. Default value: -1 (to run on the GPU with the most free memory).
4. `--epochs`: this is the maximum number of epochs for which the model is trained. Default value: 50.
5. `--embedding_dim`: this is the number of dimensions of the dynamic embedding. Default value: 128.
6. `--train_proportion`: this is the fraction of interactions (from the beginning) that are used for training. The next 10% are used for validation and the following 10% for testing. Default value: 0.8.
7. `--state_change`: this is a boolean input indicating if the training is done with state change prediction along with interaction prediction. Default value: True.
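
As a rough illustration of how `--train_proportion` partitions the data, the sketch below computes the index ranges of a chronological split. It assumes the interactions are already sorted by time; the helper name `split_indices` and its `val_proportion` parameter are hypothetical and are not taken from `jodie.py`.

```python
def split_indices(num_interactions, train_proportion=0.8, val_proportion=0.1):
    """Return (train, validation, test) index ranges for a chronological split.

    Hypothetical helper for illustration only: the first `train_proportion`
    of the interactions are used for training, the next `val_proportion`
    for validation, and the same share after that for testing.
    """
    train_end = round(num_interactions * train_proportion)
    val_size = round(num_interactions * val_proportion)
    val_end = min(train_end + val_size, num_interactions)
    test_end = min(val_end + val_size, num_interactions)
    return range(0, train_end), range(train_end, val_end), range(val_end, test_end)


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
```

With the default value of 0.8, no interactions are left over after the test range; with a smaller value, the trailing interactions would simply be unused in this sketch.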

### Run the T-Batch code
### Evaluate the model

To create T-Batches of a temporal network, use the following command. This will save a file with T-Batches in the `results/tbatches_<network>.csv` file. Note that the entire input will be converted to T-Batches. To convert only training data, please input a file with only the training interactions.
#### Interaction prediction

To evaluate the performance of the model for the interaction prediction task, use the following command. The command iteratively evaluates the performance for all epochs of the model and outputs the final test performance.
```
$ python tbatch.py --network reddit
$ chmod +x evaluate_all_epochs.sh
$ ./evaluate_all_epochs.sh <network> interaction
```

This code can be given the following command-line arguments:
1. `--network`: this is the name of the file which has the data in the `data/` directory. The file should be named `<network>.csv`, where `<network> = reddit` in the example above. The dataset format is explained below. This is a required argument.
To evaluate the trained model's performance for predicting interactions in **only one epoch**, use the following command. This will output the performance numbers to the `results/interaction_prediction_<network>.txt` file.
```
$ python evaluate_interaction_prediction.py --network <network> --model jodie --epoch 49
```

The file `get_final_performance_numbers.py` reads the per-epoch outputs stored in the `results/` folder and finds the epoch with the best validation performance.
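
The selection step can be sketched as follows. This is only an illustration of choosing the epoch by validation performance and then reporting that epoch's test performance; the in-memory dictionary layout, the function name `best_validation_epoch`, and the scores are all made up, and the actual script parses the text files in `results/`, whose format is not shown here.

```python
def best_validation_epoch(results):
    """Pick the epoch with the highest validation score.

    `results` is a hypothetical mapping: epoch -> {"validation": float, "test": float},
    where a higher score is assumed to be better.
    Returns (best_epoch, test_score_at_best_epoch).
    """
    best_epoch = max(results, key=lambda epoch: results[epoch]["validation"])
    return best_epoch, results[best_epoch]["test"]


# Made-up scores, purely for illustration.
example_results = {
    0: {"validation": 0.61, "test": 0.60},
    1: {"validation": 0.70, "test": 0.68},
    2: {"validation": 0.66, "test": 0.69},
}
epoch, test_score = best_validation_epoch(example_results)
print(epoch, test_score)
```

Note that the test score is read at the epoch chosen on validation data, not at the epoch with the highest test score, so the reported number is not tuned on the test set.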

### Run the evaluation code
#### State change prediction

To evaluate the trained model's performance in predicting interactions, use the following command.
To evaluate the performance of the model for the state change prediction task, use the following command. The command iteratively evaluates the performance for all epochs of the model and outputs the final test performance.
```
$ python evaluate_interaction_prediction.py --network reddit --model jodie --epoch 49
$ chmod +x evaluate_all_epochs.sh
$ ./evaluate_all_epochs.sh <network> state
```

To evaluate the trained model's performance in predicting interactions, use the following command. This will add the performance numbers to the `results/interaction_prediction_<network>.txt` file.
To evaluate the trained model's performance for predicting state change in **only one epoch**, use the following command. This will output the performance numbers to the `results/state_change_prediction_<network>.txt` file.
```
$ python evaluate_interaction_prediction.py --network reddit --model jodie --epoch 49
$ python evaluate_state_change_prediction.py --network <network> --model jodie --epoch 49
```

To evaluate the trained model's performance in predicting user state change, use the following command. This will add the performance numbers to the `results/state_change_prediction_<network>.txt` file.
### Run the T-Batch code

To create T-Batches of a temporal network, use the following command. This will save a file with T-Batches in the `results/tbatches_<network>.csv` file. Note that the entire input will be converted to T-Batches. To convert only training data, please input a file with only the training interactions.

```
$ python evaluate_state_change_prediction.py --network reddit --model jodie --epoch 49
$ python tbatch.py --network <network>
```

This code can be given the following command-line arguments:
1. `--network`: this is the name of the file which has the data in the `data/` directory. The file should be named `<network>.csv`. The dataset format is explained below. This is a required argument.
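
This README does not spell out how T-Batches are formed. As a rough sketch of the idea described in the JODIE paper, each user and each item appears at most once per batch, and every interaction lands in a later batch than the previous interaction of its user and of its item. The function below is illustrative only: the name `assign_tbatches` and the `(user, item)` tuple input are assumptions, and `tbatch.py` may differ.

```python
def assign_tbatches(interactions):
    """Group time-ordered (user, item) interactions into batches.

    Illustrative sketch, not the code in tbatch.py. An interaction goes into
    batch max(last batch of its user, last batch of its item) + 1, so no user
    or item occurs twice in one batch and per-entity time order is preserved.
    """
    last_user_batch = {}  # user -> index of the last batch containing it
    last_item_batch = {}  # item -> index of the last batch containing it
    batches = []
    for user, item in interactions:
        idx = max(last_user_batch.get(user, -1), last_item_batch.get(item, -1)) + 1
        if idx == len(batches):
            batches.append([])
        batches[idx].append((user, item))
        last_user_batch[user] = idx
        last_item_batch[item] = idx
    return batches


# Toy input with hypothetical user and item ids.
toy = [("u1", "i1"), ("u2", "i2"), ("u1", "i2"), ("u3", "i1")]
for index, batch in enumerate(assign_tbatches(toy)):
    print(index, batch)
```

In the toy input, the first two interactions share no user or item and fall into the same batch, while the last two each reuse an entity from the first batch and are pushed to the second.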


### Dataset format

The networks are stored under the `data/` folder, one file per network. The filename should be `<network>.csv`.