Per the pytorch/caffe2 Readme I am asking here.
I would like to use an existing network definition and weights from the model zoo as the backbone for a new network. In this specific example the architecture is SqueezeNet, and the new network simply has a different shape for the top parameterized layers ['conv10_w', 'conv10_b'], to accommodate a set of classes different from the ImageNet classes.
Unfortunately, it is not clear to me from the documentation, tutorials, or examples how to achieve this. Some notes on my setup: I built caffe2+OpenCV from source at the current master, into a Python 2.7.12 virtualenv, with CUDA 9.0 and cuDNN 7.0.
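To make the intended change concrete, here is a minimal NumPy sketch of the parameter swap described above. It is not the linked script and does not call any Caffe2 API: a plain dict stands in for the blobs loaded from the model zoo, and the helper name `replace_classifier` is hypothetical. It assumes conv10 is a 1x1 convolution over 512 input channels with weights laid out as (out_channels, in_channels, kH, kW); the `conv1_w` shape is only a placeholder.

```python
import numpy as np

def replace_classifier(pretrained, num_classes, in_channels=512, seed=0):
    """Return a new dict that reuses every pretrained blob except
    conv10_w / conv10_b, which are re-initialized for `num_classes` outputs."""
    rng = np.random.RandomState(seed)
    # Keep all backbone blobs as-is (same array objects, no copy).
    params = {name: blob for name, blob in pretrained.items()
              if name not in ("conv10_w", "conv10_b")}
    # Assumed layout: (out_channels, in_channels, kernel_h, kernel_w), 1x1 kernel.
    params["conv10_w"] = (0.01 * rng.randn(num_classes, in_channels, 1, 1)).astype(np.float32)
    params["conv10_b"] = np.zeros(num_classes, dtype=np.float32)
    return params

# Stand-in for blobs loaded from the model zoo (shapes assumed, values are dummies).
pretrained = {
    "conv1_w": np.zeros((64, 3, 3, 3), dtype=np.float32),
    "conv10_w": np.zeros((1000, 512, 1, 1), dtype=np.float32),
    "conv10_b": np.zeros(1000, dtype=np.float32),
}

params = replace_classifier(pretrained, num_classes=2)
for name in sorted(params):
    print(name, params[name].shape)
```

In the real network, only the two re-initialized blobs would need fresh initialization before training; everything else would come from the pretrained weights.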
I wrote a script (based on https://nbviewer.jupyter.org/gist/kyamagu/6cff70840c10ca374e069a3a7eb00cb4/dogs-vs-cats.ipynb) that I think should do this: https://gist.github.com/johncorring/d735675e75add96fbdfbcc40fa00f3ba
I get the following error message:
Traceback (most recent call last):
File "dogsvscats.py", line 184, in
shtyp = workspace.InferShapesAndTypes([train_model.net])
File "/home/john/Code/pytorch/build/caffe2/python/workspace.py", line 258, in InferShapesAndTypes
blobdesc_prototxt = C.infer_shapes_and_types_from_workspace(net_protos)
MemoryError: std::bad_alloc
This isn't very helpful, especially since cross-referencing against the caffe2 docs doesn't yield anything.
When I comment out the offending line and try to continue to training, I receive a segfault that I have narrowed down to line 204, workspace.RunNet(train_model.net). lldb returns the following stack trace:
thread #1: tid = 9130, 0x00007fffaa112240 libcaffe2.so`void caffe2::math::CopyMatrix<float, caffe2::CPUContext>(int, int, float const*, int, int, float*, int, int, caffe2::CPUContext*) + 208, name = 'python', stop reason = signal SIGSEGV: address access protected (fault address: 0xb15400000)
frame #0: 0x00007fffaa112240 libcaffe2.so`void caffe2::math::CopyMatrix<float, caffe2::CPUContext>(int, int, float const*, int, int, float*, int, int, caffe2::CPUContext*) + 208
frame #1: 0x00007fffaa11392f libcaffe2.so`void caffe2::math::Im2Col<float, caffe2::CPUContext, (caffe2::StorageOrder)2>(int, int, int, int, int, int, int, int, int, int, int, int, int, float const*, float*, caffe2::CPUContext*, int) + 1087
frame #2: 0x00007fffaa3f52b1 libcaffe2.so`caffe2::ConvOp<float, caffe2::CPUContext>::RunOnDeviceWithOrderNCHW()::{lambda(caffe2::Tensor*)#1}::operator()(caffe2::Tensor*) const + 1169
frame #3: 0x00007fffaa3f77f8 libcaffe2.so`caffe2::ConvOp<float, caffe2::CPUContext>::RunOnDeviceWithOrderNCHW() + 2712
frame #4: 0x00007fffaa1c93ed libcaffe2.so`caffe2::ConvPoolOpBase<caffe2::CPUContext>::RunOnDevice() + 301
frame #5: 0x00007fffa9fb52e5 libcaffe2.so`caffe2::Operator<caffe2::CPUContext>::Run(int) + 229
frame #6: 0x00007fffaa09275c libcaffe2.so`caffe2::SimpleNet::Run() + 460
frame #7: 0x00007fffaa0aeb8a libcaffe2.so`caffe2::Workspace::RunNet(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 954
frame #8: 0x00007fffab11a277 caffe2_pybind11_state_gpu.so`void pybind11::cpp_function::initialize<caffe2::python::addGlobalMethods(pybind11::module&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool)#21}, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, pybind11::name, pybind11::scope, pybind11::sibling>(caffe2::python::addGlobalMethods(pybind11::module&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool)#21}&&, bool (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) + 311
frame #9: 0x00007fffab160220 caffe2_pybind11_state_gpu.so`pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 3552
frame #10: 0x00000000004c30ce python`PyEval_EvalFrameEx + 29342
frame #11: 0x00000000004b9ab6 python`PyEval_EvalCodeEx + 774
frame #12: 0x00000000004c1e6f python`PyEval_EvalFrameEx + 24639
frame #13: 0x00000000004b9ab6 python`PyEval_EvalCodeEx + 774
frame #14: 0x00000000004c16e7 python`PyEval_EvalFrameEx + 22711
frame #15: 0x00000000004b9ab6 python`PyEval_EvalCodeEx + 774
frame #16: 0x00000000004eb30f python`??? + 63
frame #17: 0x00000000004e5422 python`PyRun_FileExFlags + 130
frame #18: 0x00000000004e3cd6 python`PyRun_SimpleFileExFlags + 390
frame #19: 0x0000000000493ae2 python`Py_Main + 1554
frame #20: 0x00007ffff7810830 libc.so.6`__libc_start_main(main=(python`main), argc=2, argv=0x00007fffffffda18, init=, fini=, rtld_fini=, stack_end=0x00007fffffffda08) + 240 at libc-start.c:291
frame #21: 0x00000000004933e9 python`_start + 41