Graph Networks support #931
base: master
Conversation
This is destructive and will break support for regular q-networks.
@timokau Hi, I find your work interesting. But why did you choose DQN instead of PPO? Is PPO better?
Basically because DQN is a lot simpler. PPO would be interesting to explore in the future, but I wanted to start with a simple option. The nice thing about policy-gradient methods is their better theoretical convergence guarantees. I think practical results tend to be a bit better too, though I'm not sure. I think policy-gradient methods are generally on-policy and therefore less sample efficient, but I'm not entirely certain of that either.
I use PPO a lot, and its performance is better than DQN's. Since most graph networks output node embeddings, I'm not sure how to combine them with a DRL framework where each edge is an action. May I ask: do DeepMind's graph_nets and DQN work well in your experiment? My task needs an LSTM policy, so DQN is not suitable. I'm now looking into combining PPO with a GCN, but the output of a GCN is usually a node-embedding matrix rather than information stored on the edges, which is a little troublesome.
There may be a misunderstanding here. DeepMind's graph networks do not produce graph embeddings. Instead, they map an attributed input graph to an attributed output graph of the same shape (but with different attributes). You can then interpret the edge attributes (or any attributes, really) as Q-values (or as something representing the policy directly; I'm not too familiar with policy-gradient methods).
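For illustration, here is a minimal sketch of that idea (not my actual setup): the module sizes, feature dimensions and the TF2/Sonnet 2 style are assumptions, but the structure is a graph_nets `GraphNetwork` whose edge model ends in a single unit, so each output edge carries one scalar that can be read as a Q-value.

```python
# Minimal sketch: map an attributed input graph to an attributed output graph
# and read the per-edge outputs as Q-values. Assumes graph_nets with a
# TF2/Sonnet 2 setup; all sizes below are placeholders.
import numpy as np
import sonnet as snt
import tensorflow as tf
from graph_nets import modules, utils_tf

# One observation: a small attributed graph (3 nodes, 2 directed edges).
input_graph = utils_tf.data_dicts_to_graphs_tuple([{
    "nodes": np.random.rand(3, 5).astype(np.float32),   # node attributes
    "edges": np.random.rand(2, 4).astype(np.float32),   # edge attributes
    "senders": np.array([0, 1], dtype=np.int32),
    "receivers": np.array([1, 2], dtype=np.int32),
    "globals": np.zeros(7, dtype=np.float32),
}])

# The graph network returns a graph with the same structure but new
# attributes. Ending the edge model in a single unit gives one scalar per
# edge, which is interpreted as that edge's Q-value.
graph_net = modules.GraphNetwork(
    edge_model_fn=lambda: snt.nets.MLP([64, 1]),
    node_model_fn=lambda: snt.nets.MLP([64]),
    global_model_fn=lambda: snt.nets.MLP([64]),
)

output_graph = graph_net(input_graph)
q_values = tf.squeeze(output_graph.edges, axis=-1)  # shape [num_edges]
```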
It didn't achieve the performance I was hoping for, but it did successfully learn. This paper used a similar architecture and apparently had success as well.
Yeah, sorry that I didn't make it clear. I know that DeepMind's graph network is different from the currently popular ones like GCN or GAT, because DeepMind's graph network outputs a new graph instead of node embeddings. Combining DeepMind's graph network with a current DRL framework is harder, since the framework expects a feature matrix as input.
You just need to write a function that takes an attributed graph and produces a tensor of action values. I do that here: https://github.com/timokau/wsn-embedding-rl/blob/41d02ac6f27eec005da90cfd1fd699a2127d4704/q_network.py#L169
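Roughly, such a function can look like this (a simplified sketch, not the linked implementation; the argument names and feature shapes are made up for illustration):

```python
import numpy as np
import tensorflow as tf
from graph_nets import utils_tf


def q_values_from_graph(graph_net, node_feats, edge_feats, senders, receivers, global_feats):
    """Take one attributed graph and return a tensor with one Q-value per edge.

    `graph_net` is assumed to be a graph network whose edge model ends in a
    single output unit, e.g. the one sketched above.
    """
    observation = utils_tf.data_dicts_to_graphs_tuple([{
        "nodes": node_feats.astype(np.float32),
        "edges": edge_feats.astype(np.float32),
        "senders": senders.astype(np.int32),
        "receivers": receivers.astype(np.int32),
        "globals": global_feats.astype(np.float32),
    }])
    output = graph_net(observation)           # same structure, new attributes
    return tf.squeeze(output.edges, axis=-1)  # one action value per edge
```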
Thanks, I'll check this and try to combine it with rllib. I hope you don't mind if I trouble you tomorrow in case I misunderstand something.
Sure. For what it's worth, after a failed experiment with rl-coach I went with modifying baselines instead of using any library. The lack of abstractions in baselines makes it incredibly easy to modify. Maybe you'll have more luck with rllib.
I'm implementing a deepq agent using graph neural networks as a Q-network.
That means that observations are graphs and (at least in my case) the edges represent possible actions. The graph_net transforms an input graph into an output graph, which has the q-values in the edges. The interesting thing about this is that observation-space and action-space sizes are variable.
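To make the variable action space concrete, here is a toy illustration in plain numpy (not code from this change): each observation graph brings its own edge set, so the number of available actions changes from step to step.

```python
import numpy as np

def select_action(edge_q_values, epsilon, rng):
    """edge_q_values: 1-D array with one Q-value per edge of the current graph."""
    num_actions = len(edge_q_values)           # varies with the observation
    if rng.random() < epsilon:
        return int(rng.integers(num_actions))  # explore: pick any edge
    return int(np.argmax(edge_q_values))       # exploit: pick the best edge

rng = np.random.default_rng(0)
print(select_action(np.array([0.1, 0.7, 0.3]), 0.1, rng))            # 3 actions
print(select_action(np.array([0.5, 0.2, 0.9, 0.4, 0.8]), 0.1, rng))  # 5 actions
```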
These are the minimal changes I had to make in order to support graph_nets. They are not ready to be merged; they currently break existing functionality.
Is there interest in having this upstreamed? If so, I'd appreciate some comments on how best to do that. The main challenges were:

- `num_actions` is a constant (replaced by taking the number of columns of the q-value matrix instead; see the sketch below)

Edit: For reference, this is how I use graph_nets.
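For illustration, here is a hedged sketch of the `num_actions` point above (assumed shapes, not the actual diff): the action count is taken from the Q-value matrix itself, so epsilon-greedy selection no longer needs a fixed constant.

```python
import tensorflow as tf

def epsilon_greedy(q_values, epsilon):
    """q_values: float tensor of shape [batch, num_actions]; num_actions varies."""
    batch_size = tf.shape(q_values)[0]
    num_actions = tf.shape(q_values)[1]   # previously a fixed constant
    greedy = tf.argmax(q_values, axis=1, output_type=tf.int32)
    random = tf.random.uniform([batch_size], minval=0, maxval=num_actions,
                               dtype=tf.int32)
    explore = tf.random.uniform([batch_size]) < epsilon
    return tf.where(explore, random, greedy)
```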