Feature/gcn lstm #1085
Conversation
Check out this pull request on ReviewNB. You'll be able to see the Jupyter notebook diff and discuss changes. Powered by ReviewNB.
demos/spatio-temporal/input_data.py (Outdated)

```python
import numpy as np
import pandas as pd
import pickle as pkl
```
Consider possible security implications associated with pickle module.
Code Climate has analyzed commit 92a2c65 and detected 1 issue on this pull request. Here's the issue category breakdown:

View more on Code Climate.
I've got a few questions about the approach and data used here, as well as a couple of minor things about the Graph_Convolution_LSTM class.
demos/spatio-temporal/input_data.py (Outdated)

```python
import pickle as pkl


def load_sz_data(dataset):
    sz_adj = pd.read_csv(r'data/sz_adj.csv', header=None)
```
What's the sz data?
This is an alternative dataset of taxi trajectories. I had it as an alternative but am not using it. I will remove it.
demos/spatio-temporal/input_data.py (Outdated)

```python
    return sz_tf, adj


def load_los_data(dataset):
    los_adj = pd.read_csv(r'data/los_adj.csv', header=None)
```
What's the los data? It would be best if we could add it to stellargraph/datasets/datasets.py instead of having an extra script for the demo. If it ends up being too awkward to add it there, feel free to open an issue for follow-up work.
Yes, it can easily be added to stellargraph/datasets/datasets.py.

There are a number of other methods that I think of as data-related, such as formatting the data in a time series to be fed into a time series forecasting model. So I had it all in one utility file where anyone interested in just doing spatio-temporal work can look them up, instead of looking them up in a centralized utility file. This was just for convenience. I can merge it into stellargraph/datasets/datasets.py.
stellargraph/layer/gcn_lstm.py (Outdated)

```python
from .preprocessing_layer import GraphPreProcessingLayer


class GraphConvolution(Layer):
```
As far as I can tell, this differs from the existing gcn.GraphConvolution layer in four ways:

1. the adjacency matrix is a constant
2. it doesn't squeeze out the batch dimension
3. it does some transposes in the matrix multiplications
4. it doesn't have special handling for final_layer=True

Is this accurate? Could you expand on why 1 and 3 in particular are needed?
> As far as I can tell, this differs from the existing gcn.GraphConvolution layer in four ways:
>
> 1. the adjacency matrix is a constant

Yes, it is. There are two pragmatic reasons for that, but neither is binding.

1. With temporal data, the graph structure is fixed but the features change, so from a performance perspective it's better to just load the adjacency at initialization. However, we could make it more general for dynamic graphs, and also more aligned with the existing graph_convolution layer.
2. The graph convolution happens for each timestep of each observation. For example, for a sequence of 10 timesteps, GCN is performed for each one of them with the same static graph but different features, so passing the adjacency matrix in each build seems like a big overhead.

I am sure there is a more optimal way of doing this. This was just meant to produce a working version.
> 2. it doesn't squeeze out the batch dimension

Yes, and this is by design. Keras requires the batch dimension whereas GCN doesn't, so even though a unit batch dimension is passed it gets squeezed inside the layer. However, in gcn_lstm the data is passed as a 3D tensor [batch, sequence, nodes], so we don't need to squeeze the batch dimension.

Maybe we can add not squeezing as an option to the existing graph_convolution layer, but I didn't want to update it as a first pass until I had the actual dimensions I need for gcn_lstm figured out. This can be merged in the more polished version.
> 3. it does some transposes in the matrix multiplications

Yes, due to the dimensions. Typically you have nodes × features dimensions. Here we have batch × features × nodes, and the graph convolution has to act over the node dimension, so I transpose before convolving and then flip it back.
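For illustration, a minimal sketch of that transpose-multiply-transpose pattern (hypothetical, not the PR's actual code), assuming inputs of shape [batch, features, nodes], a precomputed normalized adjacency `a_hat` of shape [nodes, nodes], and a trainable `kernel` of shape [features, units]:

```python
import tensorflow as tf

def fixed_adj_graph_conv(inputs, a_hat, kernel):
    # inputs: [batch, features, nodes]; bring nodes forward: [batch, nodes, features]
    x = tf.transpose(inputs, perm=[0, 2, 1])
    # aggregate each node's neighbourhood with the fixed adjacency: [batch, nodes, features]
    x = tf.einsum("ij,bjf->bif", a_hat, x)
    # linear transform on the feature dimension: [batch, nodes, units]
    x = tf.einsum("bnf,fu->bnu", x, kernel)
    # flip back to the original layout: [batch, units, nodes]
    return tf.transpose(x, perm=[0, 2, 1])
```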
> 4. it doesn't have special handling for final_layer=True

Since graph_convolution here feeds into the RNN layer, it's never a final_layer, so it's never true. I can pass it as false when I define the architecture, so this could easily be handled with the graph_convolution layer we already have. I am just keeping it false without any special handling. I'll add a comment to explain this in the code.
> Is this accurate? Could you expand on why 1 and 3 in particular are needed?

See above. Any thoughts on these design choices are really appreciated!
> Yes, it is. There are two pragmatic reasons for that, but neither is binding.

Makes sense; seems fine to start with it as written for now 👍

> However, in gcn_lstm the data is passed as a 3D tensor [batch, sequence, nodes]

And batch > 1 here?

> Yes, due to the dimensions. Typically you have nodes × features dimensions. Here we have batch × features × nodes, and the graph convolution has to act over the node dimension, so I transpose before convolving and then flip it back.

Ah, right, our other GCN layers have input shape [batch, nodes, features], makes sense!

> Since graph_convolution here feeds into the RNN layer, it's never a final_layer, so it's never true. I can pass it as false when I define the architecture, so this could easily be handled with the graph_convolution layer we already have. I am just keeping it false without any special handling. I'll add a comment to explain this in the code.

The other GCN layers are usually stacked into another dense layer to compute predictions (or similar), so they're not the "final layer" of the (whole) model either. The final_layer handling is so that the full-batch layer only yields predictions for the nodes that were passed to the .flow method, rather than all nodes.

Given that, this code currently seems to just yield predictions for all nodes. I think this means it's not possible to train on only a subset of labelled nodes? (For instance, doing a train/test split across the nodes, rather than the times?) I guess this isn't too important for these sorts of datasets, because it's time that is interesting, and so it's assumed we'll always have information for every node?
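As a tiny illustration of what that final_layer handling amounts to (not stellargraph's actual code), keeping only the outputs for the nodes requested via .flow:

```python
import tensorflow as tf

# predictions for all nodes: [batch, nodes, units]
outputs = tf.random.normal([1, 5, 4])
# hypothetical ids of the nodes that were passed to `.flow`
target_ids = tf.constant([[0, 3]])
# gather only the requested nodes' predictions: [batch, targets, units]
selected = tf.gather(outputs, target_ids, batch_dims=1)
```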
```diff
@@ -36,3 +36,4 @@
 from .rgcn import *
 from .watch_your_step import *
 from .knowledge_graph import *
+from .gcn_lstm import *
```
Both gcn and gcn_lstm now include a class called GraphConvolution, which means these `*` imports will shadow each other as the stellargraph.layer.GraphConvolution name. I think in particular the gcn_lstm one will "win", so existing code importing stellargraph.layer.GraphConvolution will switch from using the GCN one to using the GCN-LSTM one. They're not interchangeable, so that code will break.

One way to fix this would be to rename the gcn_lstm GraphConvolution to something like FixedAdjacencyGraphConvolution or even just LSTMGraphConvolution so that it no longer overlaps with the gcn one.
I didn't think about it causing conflicts. I'll fix the name. However, why would gcn_lstm's GraphConvolution win in the import?
It's last, and so overwrites the earlier one. I guess one can think of it like:

```python
GraphConvolution = gcn.GraphConvolution
...
GraphConvolution = gcn_lstm.GraphConvolution
```
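A self-contained toy version of that rebinding (the class bodies are placeholders, not the real layers):

```python
class GraphConvolution:       # stands in for gcn.GraphConvolution
    kind = "gcn"

class GraphConvolution:       # stands in for gcn_lstm.GraphConvolution; this rebinding wins
    kind = "gcn_lstm"

print(GraphConvolution.kind)  # prints "gcn_lstm"
```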
stellargraph/layer/gcn_lstm.py (Outdated)

```python
    return output


class Graph_Convolution_LSTM(Model):
```
Our other "models" aren't actually formal models, meaning they don't inherit from tf.keras.Model. I think it would be good to stick to that.

Also, Python style is to not have underscores in class names, e.g.

```diff
-class Graph_Convolution_LSTM(Model):
+class GraphConvolutionLSTM(Model):
```

but potentially it's best to break the "style" slightly and call it:

```diff
-class Graph_Convolution_LSTM(Model):
+class GCN_LSTM(Model):
```
stellargraph/layer/gcn_lstm.py
Outdated
self.dense = Dense(self.outputs, activation=self.activations[2]) | ||
self.dropout = Dropout(self.dropout) | ||
|
||
def call(self, inputs): |
Echoing the comment about the name, our "model" classes usually have two methods:

```python
def __call__(self, inputs):  # like this `call`, but note the double underscore, to make it work like `some_model(inputs)`
    ...

def build(self):  # builds appropriate input tensors and then applies the model
    inputs = ...
    outputs = self(inputs)
    return inputs, outputs
```
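For concreteness, a minimal runnable sketch of that convention (the class name and layer sizes are illustrative, not the PR's actual code):

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, LSTM

class SketchModel:
    def __init__(self, seq_len, n_nodes):
        self.seq_len = seq_len
        self.n_nodes = n_nodes
        self._lstm = LSTM(16)
        self._dense = Dense(n_nodes)

    def __call__(self, x):
        # apply the stacked layers to an existing tensor
        return self._dense(self._lstm(x))

    def build(self):
        # create appropriate input tensors, then apply the model to them
        x_inp = Input(shape=(self.seq_len, self.n_nodes))
        return x_inp, self(x_inp)

# usage: wrap the returned tensors into a formal Keras model
model = tf.keras.Model(*SketchModel(seq_len=10, n_nodes=207).build())
```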
Thanks for implementing this! Aside from @huonw's feedback, I have a comment about sequence_data_preparation and integrating this with the library.

Also, re. storing the adjacency in the GraphConvolution: I guess this makes GCN-LSTM non-inductive? I'm not familiar with spatio-temporal modelling, but is this important? Are there cases when you'd want to train on one graph and test on another? Or even cases where the adjacency matrix changes with time?
demos/spatio-temporal/input_data.py (Outdated)

```python
    test_scaled = (test_data - min_speed) / (max_speed - min_speed)
    return train_scaled, test_scaled


def sequence_data_preparation(seq_len, pre_len, train_data, test_data):
```
I think this seems like something that should be in stellargraph/mappers rather than a demo script. It looks like sequence_data_preparation is essential for using GCN-LSTM, so we should include it in the library.

Also, I think this could be refactored into a stellargraph generator and either a keras Sequence or a tf Dataset. If I'm understanding this right, say you have N nodes, T time steps, and sequence length L: this function takes in an N x T array and returns an N x T x L array. This could get quite big! With a Sequence or Dataset object you could generate batches of N x batch_times x L on the fly, to avoid having to construct the full N x T x L array (see the sketch below). What do you think?
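A rough sketch of such a keras Sequence, under the assumption stated above that the raw data is an N x T array (the class and its details are hypothetical, not the PR's code):

```python
import numpy as np
from tensorflow.keras.utils import Sequence

class SlidingWindowSequence(Sequence):
    """Yields (history, target) batches from an N x T array on the fly."""

    def __init__(self, data, seq_len, pre_len, batch_size):
        self.data = data              # shape [N, T]: one time series per node
        self.seq_len = seq_len        # history window length L
        self.pre_len = pre_len        # how many steps ahead to predict
        self.batch_size = batch_size
        self.n_windows = data.shape[1] - seq_len - pre_len + 1

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(self.n_windows / self.batch_size))

    def __getitem__(self, idx):
        starts = range(idx * self.batch_size,
                       min((idx + 1) * self.batch_size, self.n_windows))
        # X: [batch, L, N] histories; y: [batch, N] values pre_len steps ahead
        X = np.stack([self.data[:, s:s + self.seq_len].T for s in starts])
        y = np.stack([self.data[:, s + self.seq_len + self.pre_len - 1] for s in starts])
        return X, y

# e.g. SlidingWindowSequence(speeds, seq_len=10, pre_len=1, batch_size=32)
```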
The plots in the notebooks look really good 👍
```diff
@@ -849,3 +850,55 @@ def load(self):
         )

         return StellarGraph(nodes=nodes, edges=edges, edge_weight_column="time"), edges


+@experimental(reason="the data isn't downloaded automatically", issues=[9999])
```
The 9999 in my suggestion here and in test-demo-notebooks.sh was a placeholder that needs to be replaced by a real issue number. At the moment, this is creating a link like https://github.com/stellargraph/stellargraph/issues/9999, which doesn't currently exist. Could you open an issue about this and then replace the 9999 here and in test-demo-notebooks.sh?
Ah. Ok! I'll fix this.
Great! Reminder:

```diff
-@experimental(reason="the data isn't downloaded automatically", issues=[9999])
+@experimental(reason="the data isn't downloaded automatically", issues=[1303])
```
Nice! Just a few docstring edits needed, I think.
stellargraph/layer/gcn_lstm.py (Outdated)

```diff
@@ -212,41 +195,41 @@ def call(self, features):
 class GraphConvolutionLSTM:

     """
-    A stack of 2 Graph Convolutional layers followed by an LSTM, Dropout and, Dense layer.
+    A stack of N1 Graph Convolutional layers followed by N2 LSTM, Dropout and, Dense layer.
```
I think this might read a bit clearer:

```diff
-    A stack of N1 Graph Convolutional layers followed by N2 LSTM, Dropout and, Dense layer.
+    A stack of N1 Graph Convolutional layers followed by N2 LSTM layers, a Dropout layer, and a Dense layer.
```
```
    2. 1 LSTM layer
    The StellarGraph implementation is built as a stack of the following set of layers:
    1. User specified no. of Graph Convolutional layers
    2. User specified no. of LSTM layers
    3. 1 Dense layer
    4. 1 Dropout layer
    The last two layers consistently showed better performance and regularization experimentally.
```
I think the docstring has changed a fair bit now; do you get better performance compared to the implementation in the paper?
Actually it's very hard to compare with the paper. The paper reports results different from what I get when I run their code. And their implementation is quite different from what they propose in the paper :-).

But this implementation does better than the simple LSTM or naive baseline.
```
    seq_len: No. of LSTM cells
    adj: unweighted/weighted adjacency matrix of [no. of nodes by no. of nodes] dimension
    gc_layers: No. of Graph Convolution layers in the stack. The output of each layer is equal to the sequence length.
    lstm_layer_size (list of int): Output sizes of LSTM layers in the stack.
```
Small typo:

```diff
-    lstm_layer_size (list of int): Output sizes of LSTM layers in the stack.
+    lstm_layer_sizes (list of int): Output sizes of LSTM layers in the stack.
```
stellargraph/layer/gcn_lstm.py (Outdated)

```diff
@@ -47,6 +38,7 @@ class FixedAdjacencyGraphConvolution(Layer):

     Args:
+        units (int): dimensionality of output feature vectors
         A (N x N): weighted/unweighted adjacency matrix
         activation (str or func): nonlinear activation applied to layer's output to obtain output features
         use_bias (bool): toggles an optional bias
         final_layer (bool): If False the layer returns output for all nodes,
```
I don't think there's a final_layer arg here:

```diff
-        final_layer (bool): If False the layer returns output for all nodes,
```
LGTM :)
```bash
# FIXME #849: CI does not have neo4j
# FIXME #907: socialcomputing.asu.edu is down
# FIXME #1303: METR_LA dataset can't be downloaded automatically
# FIXME #818: datasets can't be downloaded
# FIXME #819: out-of-memory
```
This has added a whole pile more lines than it needs to: the #849 and #907 FIXMEs have been resurrected, and the #818 and #819 ones have been duplicated.

```diff
-# FIXME #849: CI does not have neo4j
-# FIXME #907: socialcomputing.asu.edu is down
-# FIXME #1303: METR_LA dataset can't be downloaded automatically
-# FIXME #818: datasets can't be downloaded
-# FIXME #819: out-of-memory
+# FIXME #1303: METR_LA dataset can't be downloaded automatically
```
GCN-LSTM is a stack of graph convolution and LSTM layers used for time series prediction on spatio-temporal data. We use a 2-layer graph convolution network to leverage the graph structure. The output of the graph convolution network is fed into an LSTM-based sequence-to-sequence model that is jointly trained on the output of the GCN and the historical speeds of a segment.

This work is inspired by the paper: T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. The authors have made the implementation of their model available in their GitHub repo. There are a few differences between the architecture proposed in the paper and the implementation with regards to the graph convolution component, such as:

- lehaifeng/T-GCN#18
- lehaifeng/T-GCN#14

Architecture:
We use the 2-layer graph convolution proposed by Kipf and Welling (ICLR 2017). The output of the GCN feeds into an LSTM layer that combines the GCN output with the time-series history to learn a combined spatio-temporal end-to-end model. Dense and dropout layers are further added, as they help improve model performance; a rough sketch of the full stack follows. Note, these final two layers are not part of the architecture proposed in the T-GCN paper.
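A hedged sketch of that stack (layer sizes, names, and the plain adjacency propagation are illustrative placeholders, not the PR's actual API; a real GCN layer would also apply a trainable kernel per layer):

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Input, Lambda, LSTM

def build_gcn_lstm_sketch(adj, seq_len, n_nodes):
    a_hat = tf.constant(adj, dtype=tf.float32)  # fixed normalized adjacency, [nodes, nodes]
    # graph propagation over the node dimension (trainable kernels omitted for brevity)
    propagate = Lambda(lambda t: tf.nn.relu(tf.einsum("ij,btj->bti", a_hat, t)))
    x_inp = Input(shape=(seq_len, n_nodes))     # [batch, time, nodes]
    h = propagate(propagate(x_inp))             # two graph convolution steps
    h = LSTM(64)(h)                             # sequence model over the time dimension
    h = Dropout(0.5)(h)                         # regularization, as noted above
    out = Dense(n_nodes)(h)                     # one prediction per node for the next step
    return tf.keras.Model(x_inp, out)
```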