Graph Networks support #931

Open
wants to merge 2 commits into base: master

Conversation

@timokau (Contributor) commented Jun 9, 2019

I'm implementing a deepq agent that uses a graph neural network as its Q-network.

That means that observations are graphs and (at least in my case) the edges represent possible actions. The graph_net transforms an input graph into an output graph that has the Q-values on its edges. The interesting thing about this is that the observation-space and action-space sizes are variable.

These are the minimal changes I had to make in order to support graph_nets. They are not ready to be merged; they currently break existing functionality.

Is there interest in having this upstreamed? If so, I'd appreciate some comments on how best to do that. The main challenges were:

  • assumption that num_actions is a constant (replaced by taking the number of columns of the Q-value matrix instead; see the sketch after this list)
  • assumption that the observations are numpy arrays (why does the replay buffer convert to numpy arrays?)
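
For the first point, a rough sketch of what I mean (illustrative only, not the actual diff; the function name and shapes are made up):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """Epsilon-greedy action selection when the number of actions varies per observation."""
    q_values = np.asarray(q_values)
    num_actions = q_values.shape[-1]  # read off the Q-vector itself, not a fixed constant
    if np.random.rand() < epsilon:
        return np.random.randint(num_actions)  # explore uniformly over the current action set
    return int(np.argmax(q_values))            # exploit: pick the highest-valued edge/action
```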

Edit: For reference, this is how I use graph_nets.

This is destructive and will break support for regular q-networks.
@yangysc commented Aug 13, 2019

@timokau Hi, I find your work interesting. But why did you choose DQN instead of PPO? Is PPO better?

@timokau (Contributor, Author) commented Aug 13, 2019

Basically because DQN is a lot simpler. PPO would be interesting to explore in the future, but I wanted to start with a simple option.

The nice thing about policy-gradient methods is their better theoretical convergence guarantees. I think practical results tend to be a bit better too, though I'm not sure. I think policy-gradient is generally on-policy and therefore less sample efficient, but I'm not entirely certain on that either.

@yangysc commented Aug 14, 2019

> Basically because DQN is a lot simpler. PPO would be interesting to explore in the future, but I wanted to start with a simple option.
>
> The nice thing about policy-gradient methods is their better theoretical convergence guarantees. I think practical results tend to be a bit better too, though I'm not sure. I think policy-gradient is generally on-policy and therefore less sample efficient, but I'm not entirely certain on that either.

I use PPO a lot, and its performance is better than DQN's. Since many graph networks output node embeddings, I'm not sure how to combine them with a DRL framework where each edge is an action. May I ask whether DeepMind's graph_net and DQN work well in your experiment? My task needs an LSTM policy, so DQN is not suitable anymore. I'm now hoping to combine PPO with a GCN. But the output of a GCN is usually a node-embedding matrix rather than information stored on the edges, which is a little troublesome.

@timokau (Contributor, Author) commented Aug 14, 2019

> > Basically because DQN is a lot simpler. PPO would be interesting to explore in the future, but I wanted to start with a simple option.
> > The nice thing about policy-gradient methods is their better theoretical convergence guarantees. I think practical results tend to be a bit better too, though I'm not sure. I think policy-gradient is generally on-policy and therefore less sample efficient, but I'm not entirely certain on that either.
>
> I use PPO a lot, and its performance is better than DQN's. Since many graph networks output node embeddings, I'm not sure how to combine them with a DRL framework where each edge is an action.

There may be a misunderstanding here. DeepMind's graph networks do not produce graph embeddings. Instead, they map an attributed input graph to an attributed output graph of the same shape (but with different attributes). You can then interpret the edge attributes (or any attributes, really) as Q-values (or as something representing the policy directly; I'm not too familiar with policy-gradient methods).
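
To make that concrete, here is a rough sketch of the graph-in/graph-out idea with DeepMind's graph_nets on Sonnet 2 / TensorFlow 2 (not the code from this PR; the graph sizes, feature dimensions, and MLP widths are made up):

```python
import numpy as np
import sonnet as snt
import tensorflow as tf
from graph_nets import modules, utils_tf

# One observation: a tiny attributed graph with 3 nodes and 4 directed edges.
observation = {
    "globals": np.zeros(2, dtype=np.float32),
    "nodes": np.random.rand(3, 5).astype(np.float32),
    "edges": np.random.rand(4, 6).astype(np.float32),
    "senders": np.array([0, 0, 1, 2]),
    "receivers": np.array([1, 2, 2, 0]),
}
input_graphs = utils_tf.data_dicts_to_graphs_tuple([observation])

# The network returns a graph of the same shape; because the edge MLP's last
# layer has width 1, each output edge attribute is a single scalar that can be
# read as that edge's Q-value.
graph_net = modules.GraphNetwork(
    edge_model_fn=lambda: snt.nets.MLP([32, 1]),
    node_model_fn=lambda: snt.nets.MLP([32]),
    global_model_fn=lambda: snt.nets.MLP([32]),
)
output_graphs = graph_net(input_graphs)
q_values = tf.squeeze(output_graphs.edges, axis=-1)  # one Q-value per edge/action
```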

> May I ask whether DeepMind's graph_net and DQN work well in your experiment? My task needs an LSTM policy, so DQN is not suitable anymore. I'm now hoping to combine PPO with a GCN. But the output of a GCN is usually a node-embedding matrix rather than information stored on the edges, which is a little troublesome.

It didn't achieve the performance I was hoping for, but it did successfully learn. This paper used a similar architecture and apparently had success as well.

@yangysc commented Aug 14, 2019

> There may be a misunderstanding here. DeepMind's graph networks do not produce graph embeddings. Instead, they map an attributed input graph to an attributed output graph of the same shape (but with different attributes). You can then interpret the edge attributes (or any attributes, really) as Q-values (or as something representing the policy directly; I'm not too familiar with policy-gradient methods).

Yeah, sorry, I didn't make that clear. I know that DeepMind's graph network is different from the currently popular models like GCN or GAT, because it outputs a new graph instead of node embeddings. Combining DeepMind's graph network with current DRL frameworks is harder, since those frameworks expect a feature matrix as input.

@timokau (Contributor, Author) commented Aug 14, 2019

You just need to write a function that takes an attributed graph and produces a tensor of action values. I do that here: https://github.com/timokau/wsn-embedding-rl/blob/41d02ac6f27eec005da90cfd1fd699a2127d4704/q_network.py#L169
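
Roughly, for a batched GraphsTuple it comes down to splitting the stacked edge attributes back into one Q-value vector per graph. A sketch of that idea (not the linked q_network.py; it assumes the output edge attributes have shape [total_edges, 1]):

```python
import numpy as np

def per_graph_q_values(output_graphs):
    """Split a batched output GraphsTuple into one 1-D Q-value array per graph."""
    per_edge = np.asarray(output_graphs.edges).squeeze(-1)  # [total_edges]
    n_edge = np.asarray(output_graphs.n_edge)               # number of edges in each graph
    return np.split(per_edge, np.cumsum(n_edge)[:-1])
```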

@yangysc commented Aug 14, 2019

> You just need to write a function that takes an attributed graph and produces a tensor of action values. I do that here: https://github.com/timokau/wsn-embedding-rl/blob/41d02ac6f27eec005da90cfd1fd699a2127d4704/q_network.py#L169

Thanks, I'll check this out and try to combine it with rllib. I hope you don't mind if I trouble you tomorrow in case I misunderstand something.

@timokau (Contributor, Author) commented Aug 14, 2019

Sure. For what it's worth, after a failed experiment with rl-coach I went with modifying baselines instead of using any library. The lack of abstractions in baselines makes it incredibly easy to modify. Maybe you'll have more luck with rllib.
