This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

TorchModuleGraph traced_model parameter needs to be traced in a specific way #2574

Closed
zheng-ningxin opened this issue Jun 17, 2020 · 0 comments
zheng-ningxin commented Jun 17, 2020

Environment:

  • NNI version: master(latest)
  • NNI mode (local|remote|pai): local
  • Server OS (for remote mode only): Linux
  • Python version: 3.7
  • PyTorch/TensorFlow version: 1.4.0
  • Is conda/virtualenv/venv used?: yes
  • Is running in Docker?: No

I ran into an interesting problem with TorchModuleGraph. In the latest version of NNI, TorchModuleGraph offers two ways to build the graph: users can either pass the model and a dummy input, in which case the model is traced internally, or pass an already traced model so that it does not have to be traced again. Passing the model and dummy input works fine. However, when a traced_model is passed to TorchModuleGraph, the model has to be traced in a specific way; otherwise graph construction can fail.
To reproduce:

import torch
import torch.nn as nn

class Mymodule(nn.Module):
    def __init__(self):
        super(Mymodule, self).__init__()
        # Three grouped convolutions chained together
        self.c1 = nn.Conv2d(4, 20, 3, groups=2)
        self.c2 = nn.Conv2d(20, 20, 2, groups=2)
        self.c3 = nn.Conv2d(20, 20, 3, groups=10)

    def forward(self, data):
        out = self.c1(data)
        out = self.c2(out)
        out = self.c3(out)
        return out

net = Mymodule().cuda()
# Dummy input used for tracing
data = torch.rand(1, 4, 224, 244).cuda()
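
As a quick sanity check that the model itself is well-formed, the sketch below recomputes the grouping constraints and the output size by hand. It assumes stride 1, no padding, and dilation 1 (the nn.Conv2d defaults, since the constructor calls above do not override them); the helper names conv_out and output_shape are made up for illustration.

```python
# (in_channels, out_channels, kernel_size, groups) for c1, c2, c3 above
layers = [(4, 20, 3, 2), (20, 20, 2, 2), (20, 20, 3, 10)]

def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    # Standard output-size formula for a convolution along one dimension
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def output_shape(batch, channels, height, width):
    for in_ch, out_ch, kernel, groups in layers:
        # Each layer must receive the channel count it was declared with
        assert channels == in_ch
        # Grouped convolutions need both channel counts divisible by groups
        assert in_ch % groups == 0 and out_ch % groups == 0
        channels = out_ch
        height = conv_out(height, kernel)
        width = conv_out(width, kernel)
    return (batch, channels, height, width)

shape = output_shape(1, 4, 224, 244)
print(shape)  # (1, 20, 219, 239)
```

All assertions pass for the input above, so the layer configuration itself does not look like the source of the failure.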

When the model is traced in the following way (the same way TorchModuleGraph traces it internally when given a model and dummy input) and the traced model is passed to TorchModuleGraph, it works fine.

with torch.onnx.set_training(net, False):
    trace = torch.jit.trace(net, data)
    torch._C._jit_pass_inline(trace.graph)
    _graph = TorchModuleGraph(net, data, trace)
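
As far as the training flag is concerned, torch.onnx.set_training(net, False) in PyTorch 1.4 switches the module to eval mode for the duration of the block and restores the previous mode on exit. Below is a minimal sketch of that behaviour, not the library's actual code: eval_mode is a hypothetical helper, and FakeModule is a stand-in exposing only the training flag and train() so that the example runs without PyTorch.

```python
from contextlib import contextmanager

@contextmanager
def eval_mode(module):
    """Put `module` in eval mode inside the block, then restore its old mode."""
    was_training = module.training
    module.train(False)
    try:
        yield module
    finally:
        module.train(was_training)

class FakeModule:
    """Hypothetical stand-in for nn.Module: only `training` and `train()`."""
    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

fake = FakeModule()
with eval_mode(fake):
    inside = fake.training   # mode seen while tracing would run
after = fake.training        # mode once the block has exited
```

If this reading is right, the model is in the same mode during torch.jit.trace as it would be after simply calling net.eval(), so the model mode alone should not change the resulting trace.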

In contrast, when the model is traced in a second way (shown in the following code), an error is raised from inside the tensorboard code.

net.eval()
trace = torch.jit.trace(net, data)
torch._C._jit_pass_inline(trace.graph)
_graph = TorchModuleGraph(net, data, trace)
Output: 
Traceback (most recent call last):
  File "test_torchmodule_graph.py", line 32, in <module>
    _graph = TorchModuleGraph(net, data, trace)
  File "/home/core/znx/nni/build/nni/_graph_utils.py", line 238, in __init__
    self.name_to_node, self.input_to_node, self.output_to_node = self._build_graph()
  File "/home/core/znx/nni/build/nni/_graph_utils.py", line 525, in _build_graph
    node_cpps, input_to_node, output_to_node, 'module')
  File "/home/core/znx/nni/build/nni/_graph_utils.py", line 364, in _expand_module_node
    node_group, inputs=inputs, outputs=outputs)
  File "/home/core/znx/nni/build/nni/_graph_utils.py", line 210, in __init__
    self.add_nodes(node_cpps)
  File "/home/core/znx/nni/build/nni/_graph_utils.py", line 216, in add_nodes
    nodepy = NodePyOP(node_cpp)
  File "/home/core/anaconda3/envs/nnidoc/lib/python3.7/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 92, in __init__
    self.attributes = str({k: node_cpp[k] for k in node_cpp.attributeNames()}).replace("'", ' ')
  File "/home/core/anaconda3/envs/nnidoc/lib/python3.7/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 92, in <dictcomp>
    self.attributes = str({k: node_cpp[k] for k in node_cpp.attributeNames()}).replace("'", ' ')
TypeError: 'torch._C.Node' object is not subscriptable
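
The failing expression is node_cpp[k], which only works if the node type defines __getitem__. A possible workaround, sketched here as an assumption rather than a confirmed fix, is to read each attribute through its typed getter: torch._C.Node provides kindOf(name), which returns a kind string such as 'i' or 's', and getter methods with those same names. FakeNode below is a hypothetical stand-in that mimics this interface so the example runs without PyTorch, and node_attributes is an illustrative helper name.

```python
class FakeNode:
    """Hypothetical stand-in for torch._C.Node: typed getters, no __getitem__."""
    def __init__(self, attrs):
        self._attrs = attrs  # maps name -> (kind, value)

    def attributeNames(self):
        return list(self._attrs)

    def kindOf(self, name):
        return self._attrs[name][0]

    def i(self, name):  # integer attribute getter
        return self._attrs[name][1]

    def s(self, name):  # string attribute getter
        return self._attrs[name][1]

def node_attributes(node):
    # getattr(node, kind) selects the getter that matches the attribute's kind
    return {k: getattr(node, node.kindOf(k))(k) for k in node.attributeNames()}

node = FakeNode({"groups": ("i", 2), "name": ("s", "c1")})
attrs = node_attributes(node)

# Subscripting the stand-in fails the same way the traceback shows
try:
    node["groups"]
    subscriptable = True
except TypeError:
    subscriptable = False
```

Reading attributes this way does not depend on whether the node type happens to be subscriptable when the graph is built.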

More interestingly, when I trace the model in the second way (net.eval() instead of torch.onnx.set_training) but import torchvision first, everything works fine again.

import torchvision
net.eval()
trace = torch.jit.trace(net, data)
torch._C._jit_pass_inline(trace.graph)
_graph = TorchModuleGraph(net, data, trace)
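
One possible explanation, offered as an unverified assumption: a module that gets imported as a side effect of calling torch.onnx.set_training, or of importing torchvision, may attach __getitem__ to torch._C.Node at import time, so whether node_cpp[k] works would depend on what has been imported so far. The sketch below imitates that effect in plain Python. The Node class is a local stand-in, and install_getitem_patch is a hypothetical function playing the role of the import.

```python
class Node:
    """Local stand-in for a C++-backed node type with typed getters only."""
    def __init__(self, attrs):
        self._attrs = attrs

    def kindOf(self, name):
        return "i"  # every attribute is an integer in this toy example

    def i(self, name):
        return self._attrs[name]

def is_subscriptable(node, key):
    try:
        node[key]
        return True
    except TypeError:
        return False

def install_getitem_patch():
    # Imitates a module that adds __getitem__ to the class when imported
    def _getitem(self, name):
        return getattr(self, self.kindOf(name))(name)
    Node.__getitem__ = _getitem

node = Node({"groups": 2})
before = is_subscriptable(node, "groups")  # False: no __getitem__ yet
install_getitem_patch()                    # stands in for the extra import
after = is_subscriptable(node, "groups")   # True: the class is now patched
value = node["groups"]
```

If this is what happens, checking hasattr(torch._C.Node, '__getitem__') right before constructing TorchModuleGraph in each variant would confirm it or rule it out.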

I'll keep updating if I find something new.
