Made AutoDiffCostFunction._tmp_optim_vars copies of original #155
Conversation
I don't have much context regarding the effect of this change, but thanks for fixing!
```diff
@@ -135,7 +135,7 @@ def _register_vars_in_list(var_list_, is_optim=False):

         # The following are auxiliary Variable objects to hold tensor data
         # during jacobian computation without modifying the original Variable objects
-        self._tmp_optim_vars = tuple(Variable(data=v.data) for v in optim_vars)
+        self._tmp_optim_vars = tuple(v.copy() for v in optim_vars)
```
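For context, a minimal sketch of the behavioral difference this line changes, assuming `Variable(data=...)` wraps the given tensor by reference while `copy()` clones the underlying data; the `Variable` class below is a stripped-down stand-in, not the actual theseus implementation:

```python
import torch

# Hypothetical minimal Variable to illustrate the difference; not the real theseus class.
class Variable:
    def __init__(self, data: torch.Tensor):
        self.data = data

    def copy(self) -> "Variable":
        # clone() as discussed below: new storage, still on the autograd graph
        return Variable(self.data.clone())

v = Variable(torch.zeros(3))
shared = Variable(data=v.data)  # old approach: wraps the same tensor object
shared.data.add_(1.0)           # in-place change leaks into the original
print(v.data)                   # tensor([1., 1., 1.])

w = Variable(torch.zeros(3))
copied = w.copy()               # new approach: owns a cloned tensor
copied.data.add_(1.0)           # the original stays untouched
print(w.data)                   # tensor([0., 0., 0.])
```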
This makes sense. I dug again into `Variable.copy` and found that `data.clone()` is called. I want to make sure the intent here is to have `_tmp_optim_vars` connected to the compute graph. More generally, is this okay, or should we actually be doing `data.detach().clone()` in all our copy methods?
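For reference, a minimal plain-PyTorch sketch of the distinction being asked about here (no theseus code involved): `clone()` keeps the result on the autograd graph, while `detach().clone()` cuts it off.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2

cloned = y.clone()             # new storage, still connected to the graph
detached = y.detach().clone()  # new storage, no longer tracks gradients

print(cloned.requires_grad)    # True  -> gradients can flow back to x
print(detached.requires_grad)  # False -> the graph connection is cut
```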
More generally, I don't think we should be doing `detach().clone()` in our copy methods, but maybe we should add a flag that controls this.
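A rough sketch of what such a flag could look like; the `detach` keyword and this stripped-down `Variable` are hypothetical, not the current theseus API:

```python
import torch

class Variable:
    def __init__(self, data: torch.Tensor, name: str = ""):
        self.data = data
        self.name = name

    def copy(self, new_name: str = "", detach: bool = False) -> "Variable":
        # detach=False (current behavior): the copy stays on the compute graph.
        # detach=True: cut the copy off the graph before cloning.
        data = self.data.detach().clone() if detach else self.data.clone()
        return Variable(data, name=new_name or self.name + "_copy")
```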
Maybe we can provide a separate `detach()` function? If it makes sense, we can flag an issue for this for now. When you call `Objective.copy()` to get a new objective, is it still connected to the previous compute graph?
Based on an offline conversation, we might revisit the detach idea later, since it could be tricky to set up what should be just cloned vs. also detached when calling everything recursively from Objective.
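To illustrate why threading that choice through is tricky, a hypothetical sketch; the `Objective` internals below are assumed for illustration and are not the real class:

```python
class Objective:
    def __init__(self, cost_functions):
        self.cost_functions = cost_functions

    def copy(self, detach: bool = False) -> "Objective":
        # A single flag would have to be forwarded consistently through every
        # nested copy (cost functions, their optimization and auxiliary
        # variables), and each level would need to decide whether detaching
        # applies to it. That recursive bookkeeping is the tricky part.
        return Objective([cf.copy(detach=detach) for cf in self.cost_functions])
```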
I guess it depends on where you call it. If you do something like the code below, I'm almost sure that the resulting graph would be connected.
```python
objective = create_objective()
optimizer = SomeOptimizer(objective)
layer = TheseusLayer(optimizer)

input_dict = compute_some_initial_vars()
sol_dict = layer.forward(input_dict)

new_objective = objective.copy()
new_optimizer = SomeOptimizer(new_objective)
layer2 = TheseusLayer(new_optimizer)
sol = layer2.forward()  # no input here, keep whatever variables you cloned

loss = do_some_stuff(sol)
loss.backward()
```
This preserves optimization variable types and can solve errors like this.
Besides running unit tests, I also ran the `backward_modes.py` script and the first tutorial. Seems to be working w/o issues after this change. cc @exhaustin