[question] Theoretical bound for the Connector env #238

Open
ashok-arora opened this issue Apr 18, 2024 · 5 comments
Comments

@ashok-arora

Hey,
Is there a theoretical guarantee for the Connector env that a solution (even a suboptimal one) always exists, i.e. that all agents can be connected without overlapping paths?

@clement-bonnet
Collaborator

Hi @ashok-arora,
For now, Connector comes with two implemented generators: UniformRandomGenerator and RandomWalkGenerator. The former randomly samples pairs of points (starts and targets) and does not guarantee solvability, i.e. that a solution exists. In contrast, the latter guarantees a solution by constructing the instance from random walks, with the downside of being slower and generating potentially easier instances.
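To make the difference concrete, here is a minimal pure-Python sketch of the two ideas. It is not Jumanji's implementation: the function names `uniform_pairs`, `random_walk_pairs` and `free_neighbours`, the `max_steps` parameter, and the exact walk rules are all assumptions made for illustration. The point is only that endpoints taken from disjoint walks come with a valid non-overlapping routing, while uniformly sampled endpoints come with no such guarantee.

```python
import random


def free_neighbours(cell, grid_size, occupied):
    """Return the in-bounds, unoccupied 4-neighbours of a cell."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [
        (nr, nc)
        for nr, nc in candidates
        if 0 <= nr < grid_size and 0 <= nc < grid_size and (nr, nc) not in occupied
    ]


def uniform_pairs(grid_size, num_agents, rng):
    """Sample distinct start/target cells uniformly. Solvability is not checked."""
    cells = [(r, c) for r in range(grid_size) for c in range(grid_size)]
    picked = rng.sample(cells, 2 * num_agents)
    return [(picked[2 * i], picked[2 * i + 1]) for i in range(num_agents)]


def random_walk_pairs(grid_size, num_agents, max_steps, rng):
    """Grow one walk per agent over free cells only (hypothetical rules).

    The walks never share a cell, so routing each agent along its own walk
    is a valid solution: the instance is solvable by construction.
    """
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    occupied = set()
    pairs, paths = [], []
    for _ in range(num_agents):
        # Only start from a free cell that can take at least one step,
        # so that the start and the target always differ.
        starts = [
            (r, c)
            for r in range(grid_size)
            for c in range(grid_size)
            if (r, c) not in occupied and free_neighbours((r, c), grid_size, occupied)
        ]
        if not starts:
            raise ValueError("grid too crowded for another walk")
        path = [rng.choice(starts)]
        occupied.add(path[0])
        for _ in range(max_steps):
            options = free_neighbours(path[-1], grid_size, occupied)
            if not options:
                break  # the walk is boxed in, stop early
            path.append(rng.choice(options))
            occupied.add(path[-1])
        pairs.append((path[0], path[-1]))
        paths.append(path)
    return pairs, paths
```

A call such as `random_walk_pairs(6, 3, 5, random.Random(0))` returns both the start/target pairs and the walks that witness solvability. Short walks also illustrate why such instances can be easier: the targets tend to sit close to their starts.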
With RandomWalkGenerator there is no known optimal return, but you can use the ratio_connections metric, which scores an agent's performance between 0 and 1, where 1 corresponds to an optimal policy.
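As a rough illustration of what such a metric measures, the sketch below assumes it is simply the fraction of agents that have reached their target. That reading is an assumption based on the name and the description above, and the standalone function here is hypothetical; the real value is computed inside the environment.

```python
def ratio_connections(connected):
    """Fraction of agents that are connected, as a float in [0, 1].

    `connected` holds one boolean per agent. This is an illustrative
    stand-in, not the environment's own computation.
    """
    if not connected:
        raise ValueError("need at least one agent")
    return sum(1 for flag in connected if flag) / len(connected)
```

For example, `ratio_connections([True, False, True, True])` gives 0.75, and a value of 1.0 means every agent is connected, which on a solvable instance is the best any policy can do.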
Hope this helps!

@ashok-arora
Author

Hi @clement-bonnet,
Thank you for the quick reply. I found the generators here but wasn't able to find the ratio_connections metric. Is it in a separate place?
Also, I was wondering if it makes sense to penalise the agent for the episode if the solvability is not guaranteed?

@clement-bonnet
Collaborator

ratio_connections is returned as an extras metric inside the timestep object, which is the output of the reset or step function. The ratio is computed as part of the environment dynamics here.

Regarding the reward, you can implement whatever reward you wish here. The dense reward formulation that is already implemented gives a small penalty per timestep, encouraging fast wiring. When the instance is not solvable, the penalty is indeed given at every timestep until the horizon is reached. Although I don't think this is a problem, it does make more sense to combine this reward with a solvable generator, so I would recommend using RandomWalkGenerator. Hope this answers your question.
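The effect of a per-timestep penalty on an unsolvable instance can be sketched in a few lines. This is a simplified model, not the implemented reward function: `step_reward`, `episode_return`, the `connect_times` representation and the default penalty value are assumptions for illustration, and it assumes the penalty applies once per unconnected agent per step.

```python
def step_reward(is_connected, timestep_penalty=-0.03):
    """Total reward for one step: one penalty per agent still unconnected."""
    unconnected = sum(1 for flag in is_connected if not flag)
    return timestep_penalty * unconnected


def episode_return(connect_times, horizon, timestep_penalty=-0.03):
    """Sum the step rewards over an episode of `horizon` steps.

    connect_times[i] is the step at which agent i connects, or None if it
    never does (for instance because the instance is unsolvable).
    """
    total = 0.0
    for t in range(1, horizon + 1):
        is_connected = [ct is not None and ct <= t for ct in connect_times]
        total += step_reward(is_connected, timestep_penalty)
    return total
```

With a penalty of -1.0 and a horizon of 5, `episode_return([2, 3], 5, -1.0)` is -3.0, while `episode_return([2, None], 5, -1.0)` is -6.0: the agent that can never connect is penalised at every step until the horizon, regardless of what the policy does. That unavoidable term is why this reward pairs more naturally with a generator that guarantees solvability.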

@ashok-arora
Author

Thank you so much for the response, Clement. Lastly, was the Connector env introduced in the Jumanji paper, or is there a precursor to it?

@sash-a
Collaborator

sash-a commented Apr 23, 2024

Hey @ashok-arora, I implemented this a while ago, but I'm not aware of any previous environment that this is based on. It was just meant to be a very simple PCB routing env.
