
Feature/gcn lstm #1085

Merged
merged 49 commits from feature/gcn-lstm into develop on Apr 21, 2020

Conversation

habiba-h
Contributor

GCN-LSTM is a stack of graph convolution and LSTM layers used for time-series prediction on spatio-temporal data.
We use a 2-layer graph convolution network to leverage the graph structure. The output of the graph convolution network is fed into an LSTM-based sequence-to-sequence model that is jointly trained on the GCN output and the historical speeds of a segment.

This work is inspired by the paper: T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction.

The authors have made their implementation available in their GitHub repo.

There are a few differences between the architecture proposed in the paper and the implementation with regard to the graph convolution component, such as:

  • lehaifeng/T-GCN#18

  • lehaifeng/T-GCN#14

Architecture:

  • We use the 2-layer graph convolution proposed by Kipf and Welling (ICLR2017).

  • The output of the GCN feeds into an LSTM layer that combines the GCN output with the time-series history to learn a combined spatio-temporal end-to-end model.

  • A dense layer and a dropout layer are further added, as they help improve model performance (see the sketch after this list). Note that these final two layers are not part of the architecture proposed in the T-GCN paper.
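To make the description above concrete, here is a minimal sketch of such a stack in Keras. The layer sizes, the random stand-in adjacency matrix, and the plain Lambda layers are illustrative assumptions only, not the PR's actual GCN_LSTM implementation (a real GCN layer would use a normalised adjacency matrix and trainable weights).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical sizes and a random stand-in adjacency matrix, for illustration only.
n_nodes, seq_len = 20, 10
adj = np.random.rand(n_nodes, n_nodes).astype("float32")

inputs = layers.Input(shape=(seq_len, n_nodes))  # [batch, time, nodes]

# Two GCN-style steps: mix information across neighbouring nodes at every timestep.
x = layers.Lambda(lambda t: tf.einsum("btn,nm->btm", t, adj))(inputs)
x = layers.Lambda(lambda t: tf.einsum("btn,nm->btm", t, adj))(x)

# LSTM over the time dimension, then dropout and a dense prediction head.
x = layers.LSTM(64)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(n_nodes)(x)  # next-step value for every node

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")
model.summary()
```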

Reviewer Checklist

  • The code for gcn-lstm class is available.
  • The gcn-lstm class implements the architecture explained above.
  • Demo notebook runs and clearly demonstrates how to use GCN-LSTM for spatio-temporal data.

@habiba-h habiba-h added this to the Sprint 26 (20 Mar) milestone Mar 17, 2020
@habiba-h habiba-h self-assigned this Mar 17, 2020
@review-notebook-app

Check out this pull request on  ReviewNB

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.


import numpy as np
import pandas as pd
import pickle as pkl

Consider possible security implications associated with pickle module.
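As a side note on that warning, here is a minimal sketch of one common alternative for numeric arrays (the file name is made up): NumPy's .npz format loaded with allow_pickle=False refuses to unpickle arbitrary objects, unlike pickle.load, which can execute attacker-controlled code.

```python
import numpy as np

# Save and reload a numeric array without going through pickle.
adj = np.random.rand(5, 5)
np.savez("adj.npz", adj=adj)

with np.load("adj.npz", allow_pickle=False) as data:
    loaded = data["adj"]

assert np.allclose(adj, loaded)
```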

@codeclimate

codeclimate bot commented Mar 17, 2020

Code Climate has analyzed commit 92a2c65 and detected 1 issue on this pull request.

Here's the issue category breakdown:

Category Count
Security 1

View more on Code Climate.

@huonw huonw (Member) left a comment

I've got a few questions about the approach and data used here, as well as a couple of minor things about the Graph_Convolution_LSTM class.

import pickle as pkl

def load_sz_data(dataset):
sz_adj = pd.read_csv(r'data/sz_adj.csv',header=None)
Member

What's the sz data?

Contributor Author

This is an alternative dataset of taxi trajectories. I had it as an alternative but I'm not using it; I will remove it.

return sz_tf, adj

def load_los_data(dataset):
los_adj = pd.read_csv(r'data/los_adj.csv',header=None)
Member

What's the los data? It would be best if we could add it to stellargraph/datasets/datasets.py instead of having an extra script for the demo. If it ends up being too awkward to add it there, feel free to open an issue for follow up work.

Contributor Author

Yes, it can easily be added to stellargraph/datasets/datasets.py.
There are a number of other methods that I think of as data-related, such as formatting the data into time-series sequences to be fed into a forecasting model. Having them all in one utility file means anyone interested in just doing spatio-temporal work can look them up in one place, instead of hunting for them in a centralized utility file. This was just for convenience; I can merge it into stellargraph/datasets/datasets.py.

from .preprocessing_layer import GraphPreProcessingLayer


class GraphConvolution(Layer):
Member

As far as I can tell, this differs from the existing gcn.GraphConvolution layer in four ways:

  1. the adjacency matrix is a constant
  2. it doesn't squeeze out the batch dimension
  3. it does some transposes in the matrix multiplications
  4. it doesn't have special handling for final_layer=True

Is this accurate? Could you expand on why 1 and 3 in particular are needed?

Contributor Author

As far as I can tell, this differs from the existing gcn.GraphConvolution layer in four ways:

  1. the adjacency matrix is a constant

Yes, it is. There are two pragmatic reasons for that, but neither is binding.

  1. With temporal data the structure is fixed while the features change, so from a performance perspective it is better to just load the adjacency matrix at initialization. However, we can make it more general for dynamic graphs, and also more aligned with the already existing graph_convolution layer.
  2. The graph convolution happens for each timestep of each observation. For example, for a sequence of 10 timesteps, the GCN is performed for each one of them with the same static graph but different features per timestep, so passing the adjacency matrix in each build seems like a big overhead.
     I am sure there is a more optimal way of doing this; this was just meant to produce a working version.

  2. it doesn't squeeze out the batch dimension

Yes, and this is by design. Keras requires the batch dimension whereas GCN doesn't, so even though a unit batch dimension is passed it gets squeezed inside the layer. However, in gcn_lstm the data is passed as a 3D tensor [batch, sequence, nodes], so we don't need to squeeze the batch dimension.
Maybe we can add the "no squeezing" behaviour as an option to the existing graph_convolution layer, but I didn't want to update it as a first pass until I had the actual dimensions I need for gcn_lstm figured out. This can be merged in the more polished version.

  3. it does some transposes in the matrix multiplications

Yes, due to the dimensions. Typically you have nodes x features dimensions; here we have batch x features x nodes. Since the graph convolution happens on the feature dimension, I transpose it before convolving on the feature dimension and then flip it back (see the sketch at the end of this comment).

  4. it doesn't have special handling for final_layer=True

Since here graph_convolution is stacked with the RNN layer, i.e. it feeds into the RNN, it is never a final_layer, so it's never true. I can pass it as False when I define the architecture, so this can easily be handled with the graph_convolution layer we already have. I am just keeping it False without any special handling. I'll add a comment to explain this in the code.

Is this accurate? Could you expand on why 1 and 3 in particular are needed?

See above. Any thoughts on these design choices are really appreciated!
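To illustrate points 1 and 3 concretely, here is a minimal sketch of a GCN-style layer that captures a fixed adjacency matrix at construction time and uses the transpose-convolve-transpose pattern on [batch, features, nodes] input. The class name, weight handling, and shapes are simplified assumptions, not the PR's actual layer.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

class FixedAdjacencyGraphConvolutionSketch(layers.Layer):
    """Illustrative only: a GCN-style layer with a fixed adjacency matrix."""

    def __init__(self, units, adj, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        # Point 1: the adjacency matrix is stored as a constant at construction
        # time instead of being passed in with every batch.
        self.adj = tf.constant(adj, dtype=tf.float32)

    def build(self, input_shape):
        n_features = int(input_shape[1])  # input is [batch, features, nodes]
        self.kernel = self.add_weight(
            name="kernel", shape=(n_features, self.units), initializer="glorot_uniform"
        )
        super().build(input_shape)

    def call(self, inputs):
        # Point 3: transpose [batch, features, nodes] -> [batch, nodes, features]
        # so the neighbourhood aggregation (A @ X) and the feature projection
        # (X @ W) can be applied, then transpose back.
        x = tf.transpose(inputs, perm=[0, 2, 1])         # [batch, nodes, features]
        x = tf.einsum("ij,bjf->bif", self.adj, x)        # aggregate over neighbours
        x = tf.einsum("bif,fu->biu", x, self.kernel)     # project features to `units`
        return tf.transpose(x, perm=[0, 2, 1])           # back to [batch, units, nodes]

# Tiny smoke test with a 4-node graph and 6 input features.
adj = np.eye(4, dtype="float32")
layer = FixedAdjacencyGraphConvolutionSketch(units=8, adj=adj)
out = layer(tf.random.normal([2, 6, 4]))
print(out.shape)  # (2, 8, 4)
```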

@huonw huonw (Member) Mar 25, 2020

Yes, it is. There are two pragmatic reasons for that but neither are binding.

Makes sense; seems fine to start with it as written for now 👍

However, in the gcn_lstm, the data is passed as a 3D tensor [batch, sequence, nodes]

And batch > 1 here?

Yes, due to the dimensions. Typically you have nodes x features dimensions; here we have batch x features x nodes. Since the graph convolution happens on the feature dimension, I transpose it before convolving on the feature dimension and then flip it back.

Ah, right, our other GCN layers have input shape [batch, nodes, features], makes sense!

Since here graph_convolution is stacked with the RNN layer, i.e. it feeds into the RNN, it is never a final_layer, so it's never true. I can pass it as False when I define the architecture, so this can easily be handled with the graph_convolution layer we already have. I am just keeping it False without any special handling. I'll add a comment to explain this in the code.

The other GCN layers are usually stacked into another dense layer to compute predictions (or similar), so they're not the "final layer" of the (whole) model either. The final_layer handling is so that the full-batch layer only yields predictions for the nodes that were passed to the .flow method, rather than all nodes.

Given that, this code currently seems to just yield predictions for all nodes. I think this means it's not possible to train on only a subset of labelled nodes? (For instance, do a train/test split across the nodes, rather than the times?) I guess this isn't too important for these sort of datasets, because it's time that is interesting and so it's assumed we'll always have information for every node?

@@ -36,3 +36,4 @@
from .rgcn import *
from .watch_your_step import *
from .knowledge_graph import *
from .gcn_lstm import *
Member

Both gcn and gcn_lstm now include a class called GraphConvolution, which means these * imports will shadow each other under the stellargraph.layer.GraphConvolution name. I think in particular the gcn_lstm one will "win", so existing code importing stellargraph.layer.GraphConvolution will switch from using the GCN one to using the GCN-LSTM one. They're not interchangeable, so that code will break.

One way to fix this would be to rename the gcn_lstm GraphConvolution to something like FixedAdjacencyGraphConvolution or even just LSTMGraphConvolution so that it no longer overlaps with the gcn one.

Contributor Author

I didn't think about it causing conflicts. I'll fix the name.
However, why would the gcn_lstm GraphConvolution win in the import?

Member

It's last, and so overwrites the earlier one. I guess one can think of it like:

GraphConvolution = gcn.GraphConvolution

...

GraphConvolution = gcn_lstm.GraphConvolution

return output


class Graph_Convolution_LSTM(Model):
Member

Our other "models" aren't actually formal models, meaning they don't inherit from tf.keras.Model. I think it would be good to stick to that.

Also, Python style is to not have underscores in class names, e.g.

Suggested change
class Graph_Convolution_LSTM(Model):
class GraphConvolutionLSTM(Model):

but potentially it's best to break the "style" slightly and call it:

Suggested change
class Graph_Convolution_LSTM(Model):
class GCN_LSTM(Model):

self.dense = Dense(self.outputs, activation=self.activations[2])
self.dropout = Dropout(self.dropout)

def call(self, inputs):
Member

Echoing the comment about the name, our "model" classes usually have two methods:

def __call__(self, inputs): # like this `call`, but note the double-underscore, to make it work like `some_model(inputs)`
    ...

def build(self): # builds appropriate input tensors and then applies the model
    inputs = ...
    outputs = self(inputs)
    return inputs, outputs
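For context, a minimal hypothetical sketch of how a class following that __call__/build pattern plugs into Keras; the class and method bodies are illustrative assumptions, not the library's actual API.

```python
from tensorflow.keras import layers, Model

class TinyModelSketch:
    """Illustrative only: the __call__/build pattern described above."""

    def __init__(self, seq_len, n_nodes, units=16):
        self.seq_len, self.n_nodes, self.units = seq_len, n_nodes, units

    def __call__(self, inputs):
        # Apply the model's layers to already-built input tensors.
        x = layers.LSTM(self.units)(inputs)
        return layers.Dense(self.n_nodes)(x)

    def build(self):
        # Build appropriate input tensors and then apply the model.
        inputs = layers.Input(shape=(self.seq_len, self.n_nodes))
        outputs = self(inputs)
        return inputs, outputs

sketch = TinyModelSketch(seq_len=10, n_nodes=4)
x_inp, x_out = sketch.build()
model = Model(inputs=x_inp, outputs=x_out)
model.compile(optimizer="adam", loss="mae")
```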

kieranricardo previously approved these changes Mar 20, 2020

@kieranricardo kieranricardo (Contributor) left a comment

thanks for implementing this! aside from @huonw's feedback I have a comment about sequence_data_preparation and integrating this with the library.

Also, re. storing the adjacency matrix in the GraphConvolution layer: I guess this makes GCN-LSTM non-inductive? I'm not familiar with spatio-temporal modelling, but is this important? Are there cases where you'd want to train on one graph and test on another? Or even cases where the adjacency matrix changes with time?

test_scaled = (test_data - min_speed)/ (max_speed - min_speed)
return train_scaled, test_scaled

def sequence_data_preparation(seq_len, pre_len, train_data, test_data):
Contributor

I think this seems like something that should be in stellargraph/mappers rather than a demo script. It looks like sequence_data_preparation is essential for using GCN-LSTM, so we should include it in the library.

Also, I think this could be refactored into a stellargraph generator and either a keras Sequence or a tf Dataset. If I'm understanding this right, say you have N nodes, T time steps, and sequence length L: this function takes in an N x T array and returns an N x T x L array. This could get quite big! With a Sequence or Dataset object you could generate batches of N x batch_times x L on the fly to avoid having to construct the N x T x L array. What do you think?
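For illustration only (not code from the PR), here is a minimal keras Sequence that yields sliding windows on the fly instead of materialising the full windowed array; the class name, argument names, and shapes are hypothetical.

```python
import numpy as np
from tensorflow.keras.utils import Sequence

class SlidingWindowSequence(Sequence):
    """Illustrative only: generate (history, target) windows on the fly."""

    def __init__(self, series, seq_len, pre_len, batch_size=32):
        # series: array of shape [T, N] (time steps x nodes)
        self.series = series
        self.seq_len, self.pre_len, self.batch_size = seq_len, pre_len, batch_size
        self.n_windows = len(series) - seq_len - pre_len + 1

    def __len__(self):
        return int(np.ceil(self.n_windows / self.batch_size))

    def __getitem__(self, idx):
        starts = range(idx * self.batch_size,
                       min((idx + 1) * self.batch_size, self.n_windows))
        # History windows and the value `pre_len` steps after each window.
        x = np.stack([self.series[s : s + self.seq_len] for s in starts])
        y = np.stack([self.series[s + self.seq_len + self.pre_len - 1] for s in starts])
        return x, y  # x: [batch, seq_len, N], y: [batch, N]

# Example: 1000 time steps for 4 nodes, 10-step history, predicting 3 steps ahead.
gen = SlidingWindowSequence(np.random.rand(1000, 4), seq_len=10, pre_len=3)
xb, yb = gen[0]
print(xb.shape, yb.shape)  # (32, 10, 4) (32, 4)
```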

@kieranricardo kieranricardo dismissed their stale review March 20, 2020 02:17

didn't mean to approve here, sorry

@huonw huonw (Member) left a comment

The plots in the notebooks look really good 👍


@huonw huonw mentioned this pull request Apr 21, 2020
@@ -849,3 +850,55 @@ def load(self):
)

return StellarGraph(nodes=nodes, edges=edges, edge_weight_column="time"), edges


@experimental(reason="the data isn't downloaded automatically", issues=[9999])
Member

The 9999 in my suggestion here and in test-demo-notebooks.sh was a placeholder that needs to be replaced by a real issue number. At the moment, this is creating a link like https://github.com/stellargraph/stellargraph/issues/9999, which doesn't currently exist. Could you open an issue about this and then replace the 9999 here and in test-demo-notebooks.sh?

Contributor Author

Ah. Ok! I'll fix this.

Member

Great! Reminder:

Suggested change
@experimental(reason="the data isn't downloaded automatically", issues=[9999])
@experimental(reason="the data isn't downloaded automatically", issues=[1303])

kieranricardo previously approved these changes Apr 21, 2020

@kieranricardo kieranricardo (Contributor) left a comment

nice! just a few docstring edits needed I think

@@ -212,41 +195,41 @@ def call(self, features):
class GraphConvolutionLSTM:

"""
A stack of 2 Graph Convolutional layers followed by an LSTM, Dropout and, Dense layer.
A stack of N1 Graph Convolutional layers followed by N2 LSTM, Dropout and, Dense layer.
Contributor

I think this might read a bit clearer:

Suggested change
A stack of N1 Graph Convolutional layers followed by N2 LSTM, Dropout and, Dense layer.
A stack of N1 Graph Convolutional layers followed by N2 LSTM layers, a Dropout layer, and a Dense layer.

2. 1 LSTM layer
The StellarGraph implementation is built as a stack of the following set of layers:
1. User specified no. of Graph Convolutional layers
2. User specified no. of LSTM layers
3. 1 Dense layer
4. 1 Dropout layer
The last two layers consistently showed better performance and regularization experimentally.
Contributor

I think the docstring has changed a fair bit now; do you get better performance compared to the implementation in the paper?

Contributor Author

Actually it's very hard to compare with the paper. The paper reports results that differ from what I get when I run their code, and their implementation is quite different from what they propose in the paper :-).
But this implementation does better than a simple LSTM or naive baseline.

seq_len: No. of LSTM cells
adj: unweighted/weighted adjacency matrix of [no.of nodes by no. of nodes dimension
gc_layers: No. of Graph Convolution layers in the stack. The output of each layer is equal to sequence length.
lstm_layer_size (list of int): Output sizes of LSTM layers in the stack.
Contributor

small typo

Suggested change
lstm_layer_size (list of int): Output sizes of LSTM layers in the stack.
lstm_layer_sizes (list of int): Output sizes of LSTM layers in the stack.

@@ -47,6 +38,7 @@ class FixedAdjacencyGraphConvolution(Layer):

Args:
units (int): dimensionality of output feature vectors
A (N x N): weighted/unweighted adjacency matrix
activation (str or func): nonlinear activation applied to layer's output to obtain output features
use_bias (bool): toggles an optional bias
final_layer (bool): If False the layer returns output for all nodes,
Contributor

I don't think there's a final_layer arg here

Suggested change
final_layer (bool): If False the layer returns output for all nodes,

@kieranricardo kieranricardo dismissed their stale review April 21, 2020 06:54

ah hit the wrong button again!

@kieranricardo kieranricardo (Contributor) left a comment

LGTM :)

Comment on lines 41 to 45
# FIXME #849: CI does not have neo4j
# FIXME #907: socialcomputing.asu.edu is down
# FIXME #1303: METR_LA dataset can't be downloaded automatically
# FIXME #818: datasets can't be downloaded
# FIXME #819: out-of-memory
Member

This has added a whole pile more lines than it needs to: the #849 and #907 FIXMEs have been resurrected and the #818 and #819 ones have been duplicated.

Suggested change
# FIXME #849: CI does not have neo4j
# FIXME #907: socialcomputing.asu.edu is down
# FIXME #1303: METR_LA dataset can't be downloaded automatically
# FIXME #818: datasets can't be downloaded
# FIXME #819: out-of-memory
# FIXME #1303: METR_LA dataset can't be downloaded automatically


@habiba-h habiba-h merged commit ee993bb into develop Apr 21, 2020
@habiba-h habiba-h deleted the feature/gcn-lstm branch April 21, 2020 09:27