Graph Networks support #931

Open
wants to merge 2 commits into base: master

Conversation

@timokau (Contributor) commented Jun 9, 2019

I'm implementing a deepq agent that uses a graph neural network as its Q-network.

That means that observations are graphs and (at least in my case) the edges represent possible actions. The graph_net transforms an input graph into an output graph that has the Q-values on its edges. The interesting thing about this is that the observation-space and action-space sizes are variable.

These are the minimal changes I had to make in order to support graph_nets. They are not ready to be merged; they currently break existing functionality.

Is there interest in having this upstreamed? If so, I'd appreciate some comments on how best to do that. The main challenges were:

  • assumption that num_actions is a constant (replaced by taking the number of columns of the Q-value matrix instead; see the sketch after this list)
  • assumption that the observations are numpy arrays (why does the replay buffer convert to numpy arrays?)
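
For the first point, a rough sketch of what I mean (illustrative only, not the actual diff; the function name and shapes are made up):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """Epsilon-greedy action selection when the number of actions varies per observation."""
    q_values = np.asarray(q_values)
    num_actions = q_values.shape[-1]  # read off the Q-vector itself, not a fixed constant
    if np.random.rand() < epsilon:
        return np.random.randint(num_actions)  # explore uniformly over the current action set
    return int(np.argmax(q_values))            # exploit: pick the highest-valued edge/action
```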

Edit: For reference, this is how I use graph_nets.

This is destructive and will break support for regular q-networks.
@yangysc commented Aug 13, 2019

@timokau Hi, I find your work interesting. But why did you choose DQN instead of PPO? Is PPO better?

@timokau (Contributor, Author) commented Aug 13, 2019

Basically because DQN is a lot simpler. PPO would be interesting to explore in the future, but I wanted to start with a simple option.

The nice thing about policy-gradient methods is their better theoretical convergence guarantees. I think practical results tend to be a bit better too, though I'm not sure. I think policy-gradient is generally on-policy and therefore less sample efficient, but I'm not entirely certain on that either.

@yangysc commented Aug 14, 2019

> Basically because DQN is a lot simpler. PPO would be interesting to explore in the future, but I wanted to start with a simple option.
>
> The nice thing about policy-gradient methods is their better theoretical convergence guarantees. I think practical results tend to be a bit better too, though I'm not sure. I think policy-gradient is generally on-policy and therefore less sample efficient, but I'm not entirely certain on that either.

I use PPO a lot, and its performance is better than DQN's. Since many graph networks output node embeddings, I'm not sure how to combine them with a DRL framework where each edge is an action. May I ask whether DeepMind's graph_net and DQN work well in your experiment? My task needs an LSTM policy, so DQN is not suitable anymore. I'm now hoping to combine PPO with a GCN. But the output of a GCN is usually a node-embedding matrix rather than information stored on the edges, which is a little troublesome.

@timokau (Contributor, Author) commented Aug 14, 2019

> > Basically because DQN is a lot simpler. PPO would be interesting to explore in the future, but I wanted to start with a simple option.
> > The nice thing about policy-gradient methods is their better theoretical convergence guarantees. I think practical results tend to be a bit better too, though I'm not sure. I think policy-gradient is generally on-policy and therefore less sample efficient, but I'm not entirely certain on that either.
>
> I use PPO a lot, and its performance is better than DQN's. Since many graph networks output node embeddings, I'm not sure how to combine them with a DRL framework where each edge is an action.

There may be a misunderstanding here. DeepMind's graph networks do not produce graph embeddings. Instead, they map an attributed input graph to an attributed output graph of the same shape (but with different attributes). You can then interpret the edge attributes (or any attributes, really) as Q-values (or as something representing the policy directly; I'm not too familiar with policy-gradient methods).
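
To make that concrete, here is a rough sketch of the graph-in/graph-out idea with DeepMind's graph_nets on Sonnet 2 / TensorFlow 2 (not the code from this PR; the graph sizes, feature dimensions, and MLP widths are made up):

```python
import numpy as np
import sonnet as snt
import tensorflow as tf
from graph_nets import modules, utils_tf

# One observation: a tiny attributed graph with 3 nodes and 4 directed edges.
observation = {
    "globals": np.zeros(2, dtype=np.float32),
    "nodes": np.random.rand(3, 5).astype(np.float32),
    "edges": np.random.rand(4, 6).astype(np.float32),
    "senders": np.array([0, 0, 1, 2]),
    "receivers": np.array([1, 2, 2, 0]),
}
input_graphs = utils_tf.data_dicts_to_graphs_tuple([observation])

# The network returns a graph of the same shape; because the edge MLP's last
# layer has width 1, each output edge attribute is a single scalar that can be
# read as that edge's Q-value.
graph_net = modules.GraphNetwork(
    edge_model_fn=lambda: snt.nets.MLP([32, 1]),
    node_model_fn=lambda: snt.nets.MLP([32]),
    global_model_fn=lambda: snt.nets.MLP([32]),
)
output_graphs = graph_net(input_graphs)
q_values = tf.squeeze(output_graphs.edges, axis=-1)  # one Q-value per edge/action
```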

> May I ask whether DeepMind's graph_net and DQN work well in your experiment? My task needs an LSTM policy, so DQN is not suitable anymore. I'm now hoping to combine PPO with a GCN. But the output of a GCN is usually a node-embedding matrix rather than information stored on the edges, which is a little troublesome.

It didn't achieve the performance I was hoping for, but it did successfully learn. This paper used a similar architecture and apparently had success as well.

@yangysc commented Aug 14, 2019

> There may be a misunderstanding here. DeepMind's graph networks do not produce graph embeddings. Instead, they map an attributed input graph to an attributed output graph of the same shape (but with different attributes). You can then interpret the edge attributes (or any attributes, really) as Q-values (or as something representing the policy directly; I'm not too familiar with policy-gradient methods).

Yeah, sorry, I didn't make that clear. I know that DeepMind's graph network is different from the currently popular models like GCN or GAT, because it outputs a new graph instead of node embeddings. Combining DeepMind's graph network with current DRL frameworks is harder, since those frameworks expect a feature matrix as input.

@timokau (Contributor, Author) commented Aug 14, 2019

You just need to write a function that takes an attributed graph and produces a tensor of action values. I do that here: https://github.com/timokau/wsn-embedding-rl/blob/41d02ac6f27eec005da90cfd1fd699a2127d4704/q_network.py#L169
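
Roughly, for a batched GraphsTuple it comes down to splitting the stacked edge attributes back into one Q-value vector per graph. A sketch of that idea (not the linked q_network.py; it assumes the output edge attributes have shape [total_edges, 1]):

```python
import numpy as np

def per_graph_q_values(output_graphs):
    """Split a batched output GraphsTuple into one 1-D Q-value array per graph."""
    per_edge = np.asarray(output_graphs.edges).squeeze(-1)  # [total_edges]
    n_edge = np.asarray(output_graphs.n_edge)               # number of edges in each graph
    return np.split(per_edge, np.cumsum(n_edge)[:-1])
```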

@yangysc commented Aug 14, 2019

> You just need to write a function that takes an attributed graph and produces a tensor of action values. I do that here: https://github.com/timokau/wsn-embedding-rl/blob/41d02ac6f27eec005da90cfd1fd699a2127d4704/q_network.py#L169

Thanks, I'll check this out and try to combine it with rllib. I hope you don't mind if I trouble you tomorrow in case I misunderstand something.

@timokau (Contributor, Author) commented Aug 14, 2019

Sure. For what it's worth, after a failed experiment with rl-coach I went with modifying baselines instead of using any library. The lack of abstractions in baselines makes it incredibly easy to modify. Maybe you'll have more luck with rllib.
