This repository has been archived by the owner on Feb 7, 2023. It is now read-only.

issue during CreateNet in MNIST tutorial #1012

Closed
fbadaud opened this issue Aug 3, 2017 · 9 comments

Comments

@fbadaud

fbadaud commented Aug 3, 2017

Hello,
I am following https://caffe2.ai/docs/tutorial-MNIST.html on my installation of Caffe2 on Ubuntu 16.04.
The command workspace.CreateNet(train_model.net) produces the following error:

Traceback for operator 26 in network mnist_train
:11
:8
/home/b2bot/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py:2881
/home/b2bot/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py:2821
/home/b2bot/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py:2717
/home/b2bot/anaconda2/lib/python2.7/site-packages/ipykernel/zmqshell.py:501
/home/b2bot/anaconda2/lib/python2.7/site-packages/ipykernel/ipkernel.py:196
/home/b2bot/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py:390
/home/b2bot/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py:228
/home/b2bot/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py:276
/home/b2bot/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py:275
/home/b2bot/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py:414
/home/b2bot/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py:472
/home/b2bot/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py:440
/home/b2bot/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py:275
/home/b2bot/anaconda2/lib/python2.7/site-packages/tornado/ioloop.py:887
/home/b2bot/anaconda2/lib/python2.7/site-packages/zmq/eventloop/ioloop.py:177
/home/b2bot/anaconda2/lib/python2.7/site-packages/ipykernel/kernelapp.py:474
/home/b2bot/anaconda2/lib/python2.7/site-packages/traitlets/config/application.py:658
/home/b2bot/anaconda2/lib/python2.7/site-packages/ipykernel/main.py:3
/home/b2bot/anaconda2/lib/python2.7/runpy.py:72
/home/b2bot/anaconda2/lib/python2.7/runpy.py:174

RuntimeError Traceback (most recent call last)
in ()
----> 1 workspace.CreateNet(train_model.net)

/home/b2bot/Project/caffe2/caffe2/build/caffe2/python/workspace.pyc in CreateNet(net, overwrite, input_blobs)
145 C.Workspace.current._last_failed_op_net_position,
146 GetNetName(net),
--> 147 StringifyProto(net), overwrite,
148 )
149

/home/b2bot/Project/caffe2/caffe2/build/caffe2/python/workspace.pyc in CallWithExceptionIntercept(func, op_id_fetcher, net_name, *args, **kwargs)
164 def CallWithExceptionIntercept(func, op_id_fetcher, net_name, *args, **kwargs):
165 try:
--> 166 return func(*args, **kwargs)
167 except Exception:
168 op_id = op_id_fetcher()

RuntimeError: [enforce fail at operator.cc:73] schema->Verify(operator_def). Operator def did not pass schema checking: input: "iter" output: "mnist_train/Iter" name: "" type: "Iter"

Any idea how to debug and fix this?
Thanks in advance, Fran

@fbadaud
Author

fbadaud commented Aug 4, 2017

This is solved by following the notebook at https://github.com/caffe2/caffe2/blob/master/caffe2/python/tutorials/MNIST.ipynb, which works.
In the body of AddTrainingOperators, the line that defines ITER is changed to:
ITER = brew.iter(model, "iter")

With this change, AddTrainingOperators no longer reports a warning, and creating the training network no longer crashes.
It might be useful to update the corresponding line on the page https://caffe2.ai/docs/tutorial-MNIST.html
BR
Fran

@wxie2017

wxie2017 commented Aug 6, 2017

Hi, @fbadaud

I get an error when running the tutorial:

RuntimeError: [enforce fail at leveldb.cc:70] status.ok(). Failed to open leveldb /root/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb. Invalid argument: /root/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb: does not exist (create_if_missing is false) Error from operator:
output: "dbreader_/root/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb" name: "" type: "CreateDB" arg { name: "db_type" s: "leveldb" } arg { name: "db" s: "/root/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb" }

I have those image files in the folder:
-rwxrwxr-x 1 1000 1000 7840016 Mar 17 05:43 t10k-images-idx3-ubyte
-rwxrwxr-x 1 1000 1000 10008 Mar 17 05:43 t10k-labels-idx1-ubyte
-rwxrwxr-x 1 1000 1000 47040016 Mar 17 05:43 train-images-idx3-ubyte
-rwxrwxr-x 1 1000 1000 60008 Mar 17 05:43 train-labels-idx1-ubyte

Could you please check what the error here is?

best regards,
wxie

@fbadaud
Author

fbadaud commented Aug 7, 2017

Hi wxie,
I am not connected right now and cannot look at your error in detail, but you might check that the path in your command matches the location of the leveldb files.

@wxie2017

Hi, @fbadaud

Thanks. I am not sure what you want me to check, so I am posting the complete error:

execfile("mnist-cnn-tutorial.py")
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_gpu
Necessities imported!
Bookkeeping function created
WARNING:root:You are creating an op that the ModelHelper does not recognize: Iter.
name: "mnist_train_init"
op {
output: "dbreader_/root/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb"
name: ""
type: "CreateDB"
arg {
name: "db_type"
s: "leveldb"
}
arg {
name: "db"
s: "/root/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb"
}
}
op {
output: "conv1_w"
name: ""
type: "XavierFill"
arg {
name: "shape"
ints: 20
ints: 1

...
Protocol buffers files have been created in your root folder: /root/work/caffe2-example
Traceback for operator 0 in network mnist_train_init
/usr/local/caffe2/python/model_helper.py:418
mnist-cnn-tutorial.py:69
mnist-cnn-tutorial.py:151
:1
Traceback (most recent call last):
File "", line 1, in
File "mnist-cnn-tutorial.py", line 200, in
workspace.RunNetOnce(train_model.param_init_net)
File "/usr/local/caffe2/python/workspace.py", line 183, in RunNetOnce
StringifyProto(net),
File "/usr/local/caffe2/python/workspace.py", line 175, in CallWithExceptionIntercept
raise ex
RuntimeError: [enforce fail at leveldb.cc:70] status.ok(). Failed to open leveldb /root/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb. Invalid argument: /root/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb: does not exist (create_if_missing is false) Error from operator:
output: "dbreader_/root/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb" name: "" type: "CreateDB" arg { name: "db_type" s: "leveldb" } arg { name: "db" s: "/root/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb" }

And also my data directory:
~/work/caffe2-example/mnist-cnn # ls
backup mnist-train-nchw-leveldb t10k-images-idx3-ubyte t10k-labels-idx1-ubyte train-images-idx3-ubyte train-labels-idx1-ubyte
~/work/caffe2-example/mnist-cnn # ls mnist-train-nchw-leveldb/
LOCK LOG LOG.old

Could you please check?

regards,
wxie

@fbadaud
Author

fbadaud commented Aug 22, 2017

Hi wxie,

In your ~/work/caffe2-example/mnist-cnn/ directory, run pwd and verify that the path corresponds to:
/root/work/caffe2-example/mnist-cnn/

If it matches, you might look at the permissions on the files in this directory.

Hope it helps.
BR
Francois
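
The two checks suggested here (does the path used by the script resolve to the directory being inspected, and does the current user have rights on it) can be sketched in Python. This is only an illustration under stated assumptions: the function name describe_db_path is made up for this note and is not part of Caffe2, and it only uses the standard library.

```python
import os


def describe_db_path(db_path):
    """Collect basic facts about the path handed to the DB reader.

    Hypothetical helper for illustration; not a Caffe2 function.
    """
    # Expand "~" and follow symlinks so the result can be compared with pwd.
    resolved = os.path.realpath(os.path.expanduser(db_path))
    is_dir = os.path.isdir(resolved)
    # os.access reports the rights of the user running this script.
    readable = os.access(resolved, os.R_OK)
    writable = os.access(resolved, os.W_OK)
    # Only list the directory when it exists and can be read.
    entries = sorted(os.listdir(resolved)) if (is_dir and readable) else []
    return {
        "resolved": resolved,
        "is_dir": is_dir,
        "readable": readable,
        "writable": writable,
        "entries": entries,
    }
```

Calling describe_db_path with the same string that is passed as the "db" argument in the error message, then comparing "resolved" with the output of pwd, shows whether the script and the shell are looking at the same directory.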

@wxie2017

Hi, @fbadaud:

The pwd is OK.
The permissions look like this:
~/work/caffe2-example/mnist-cnn # pwd
/root/work/caffe2-example/mnist-cnn
~/work/caffe2-example/mnist-cnn # ls -al
total 53678
drwxr-xr-x 4 root root 4096 2017-08-06 10:50 .
drwxr-xr-x 4 root root 4096 2017-08-06 10:57 ..
drwxr-xr-x 3 root root 4096 2017-08-06 10:49 backup
drwxr-xr-x 2 root root 4096 2017-08-19 09:38 mnist-train-nchw-leveldb
-rwxrwxr-x 1 1000 1000 7840016 2017-03-17 05:43 t10k-images-idx3-ubyte
-rwxrwxr-x 1 1000 1000 10008 2017-03-17 05:43 t10k-labels-idx1-ubyte
-rwxrwxr-x 1 1000 1000 47040016 2017-03-17 05:43 train-images-idx3-ubyte
-rwxrwxr-x 1 1000 1000 60008 2017-03-17 05:43 train-labels-idx1-ubyte
~/work/caffe2-example/mnist-cnn # cd mnist-train-nchw-leveldb/
~/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb # ls -al
total 8
drwxr-xr-x 2 root root 4096 2017-08-19 09:38 .
drwxr-xr-x 4 root root 4096 2017-08-06 10:50 ..
-rw-r--r-- 1 root root 0 2017-08-06 10:50 LOCK
-rw-r--r-- 1 root root 0 2017-08-19 09:38 LOG
-rw-r--r-- 1 root root 0 2017-08-06 10:50 LOG.old
~/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb # chmod 755 *
~/work/caffe2-example/mnist-cnn/mnist-train-nchw-leveldb # ls -al
total 8
drwxr-xr-x 2 root root 4096 2017-08-19 09:38 .
drwxr-xr-x 4 root root 4096 2017-08-06 10:50 ..
-rwxr-xr-x 1 root root 0 2017-08-06 10:50 LOCK
-rwxr-xr-x 1 root root 0 2017-08-19 09:38 LOG
-rwxr-xr-x 1 root root 0 2017-08-06 10:50 LOG.old

The problem remains the same, and the file permissions are now:
~/work/caffe2-example # ls -al mnist-cnn/mnist-train-nchw-leveldb/
total 8
drwxr-xr-x 2 root root 4096 2017-08-26 10:36 .
drwxr-xr-x 4 root root 4096 2017-08-06 10:50 ..
-rwxr-xr-x 1 root root 0 2017-08-06 10:50 LOCK
-rw-r--r-- 1 root root 0 2017-08-26 10:36 LOG
-rwxr-xr-x 1 root root 0 2017-08-19 09:38 LOG.old

Where could the problem be?

best regards,
wxie
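
A note on the listings above: the mnist-train-nchw-leveldb directory holds only empty LOCK, LOG and LOG.old files. A LevelDB database that has actually been created normally also contains a CURRENT file and a MANIFEST-* file, and LevelDB reports "does not exist (create_if_missing is false)" when CURRENT is missing. So the directory may exist without ever having been populated, which would make this a data problem and not a permission problem. A minimal sketch of that check follows; the function name looks_like_leveldb is made up for this note and is not a Caffe2 or LevelDB API.

```python
import os


def looks_like_leveldb(db_dir):
    """Return True if db_dir holds the files an initialized LevelDB has.

    Hypothetical helper for illustration. It only inspects file names;
    it does not open the database or validate its contents.
    """
    if not os.path.isdir(db_dir):
        return False
    names = os.listdir(db_dir)
    has_current = "CURRENT" in names
    has_manifest = any(name.startswith("MANIFEST-") for name in names)
    return has_current and has_manifest
```

For a directory like the one listed above (LOCK, LOG and LOG.old only) this returns False, which suggests regenerating or re-downloading the training database before running the script again.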

@fbadaud
Author

fbadaud commented Aug 29, 2017

Hi wxie2017,

I ran the tutorial again on my side and now get an error that I think is similar to yours. If I find a solution I will share it.
I also think it is a different error from the one I first posted, so it is better to close this issue and open a new one.
BR
Francois

@fbadaud fbadaud closed this as completed Aug 29, 2017
@fbadaud
Author

fbadaud commented Aug 30, 2017

Hi again wxie,

As I said yesterday, I got an error almost identical to yours when running my script for this tutorial.

This is a different error from the one in my initial ticket about ITER; the new one relates to the leveldb files. In my case it occurs while running the network, and the crash is caused by .leveldb files left in my root directory by a previous execution of the script. I ran rm -r *.leveldb in my root directory and the run passes again.

On your side the error occurs earlier, during initialization of the network. It looks as if some leveldb cannot be produced and used by the RunNetOnce step. Maybe you have a write permission issue in your working directory; otherwise I suggest you post a new ticket for this problem.
Best regards
FBadaud
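
The cleanup described above (rm -r *.leveldb in the root directory) can be sketched in Python with a dry-run default, so the matches can be reviewed before anything is deleted. The function name remove_stale_leveldbs and the dry_run parameter are assumptions made for this note, not part of Caffe2. Unlike the shell command, this sketch only touches directories.

```python
import os
import shutil


def remove_stale_leveldbs(root_dir, dry_run=True):
    """Find (and optionally delete) *.leveldb directories under root_dir.

    Hypothetical helper for illustration. Only direct children of
    root_dir are considered, mirroring `rm -r *.leveldb` run there.
    """
    matches = sorted(
        os.path.join(root_dir, name)
        for name in os.listdir(root_dir)
        if name.endswith(".leveldb")
        and os.path.isdir(os.path.join(root_dir, name))
    )
    if not dry_run:
        for path in matches:
            # Recursively delete the leftover database directory.
            shutil.rmtree(path)
    return matches
```

Calling it first with the default dry_run=True returns the list of directories that would be removed; passing dry_run=False performs the deletion.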
