This repository has been archived by the owner on Feb 7, 2023. It is now read-only.

LMDB example failed on Power8 #517

Open
hma02 opened this issue May 5, 2017 · 7 comments

@hma02

hma02 commented May 5, 2017

Hi,

I have successfully built caffe2 on a Power8 Minsky machine. When I run the LMDB example in caffe2/python/examples, I get the following error. Any suggestions as to what might be wrong? Thanks in advance.

$ python -u lmdb_create_example.py --output_file ./example_db
>>> Write database...
Inserted 0 rows
Inserted 16 rows
Inserted 32 rows
Inserted 48 rows
Inserted 64 rows
Inserted 80 rows
Inserted 96 rows
Inserted 112 rows
Checksum/write: 1746794
>>> Read database...
Traceback (most recent call last):
  File "lmdb_create_example.py", line 108, in <module>
    main()
  File "lmdb_create_example.py", line 104, in main
    read_db_with_caffe2(args.output_file, checksum)
  File "lmdb_create_example.py", line 76, in read_db_with_caffe2
    workspace.RunNetOnce(model.param_init_net)
  File "/mnt/data/hma02/caffe2/usr/local/caffe2/python/workspace.py", line 161, in RunNetOnce
    return C.run_net_once(StringifyProto(net))
RuntimeError: [enforce fail at db.h:174] db_. Cannot open db: ./example_db of type lmdb Error from operator: 
output: "dbreader_./example_db" name: "" type: "CreateDB" arg { name: "db_type" s: "lmdb" } arg { name: "db" s: "./example_db" }
@lukeyeager
Contributor

Try giving it an absolute path, e.g. $(readlink -f ./example_db)

@hma02
Author

hma02 commented May 6, 2017

@lukeyeager
Thanks, but the error persists with an absolute path.

$ python -u lmdb_create_example.py --output_file $(readlink -f ./example_db)
>>> Write database...
Inserted 0 rows
Inserted 16 rows
Inserted 32 rows
Inserted 48 rows
Inserted 64 rows
Inserted 80 rows
Inserted 96 rows
Inserted 112 rows
Checksum/write: 1743005
>>> Read database...
Traceback (most recent call last):
  File "lmdb_create_example.py", line 108, in <module>
    main()
  File "lmdb_create_example.py", line 104, in main
    read_db_with_caffe2(args.output_file, checksum)
  File "lmdb_create_example.py", line 76, in read_db_with_caffe2
    workspace.RunNetOnce(model.param_init_net)
  File "/mnt/data/hma02/caffe2/usr/local/caffe2/python/workspace.py", line 161, in RunNetOnce
    return C.run_net_once(StringifyProto(net))
RuntimeError: [enforce fail at db.h:174] db_. Cannot open db: /mnt/data/hma02/caffe2/tmp/practice/lmdb_caffe2/example_db of type lmdb Error from operator: 
output: "dbreader_/mnt/data/hma02/caffe2/tmp/practice/lmdb_caffe2/example_db" name: "" type: "CreateDB" arg { name: "db_type" s: "lmdb" } arg { name: "db" s: "/mnt/data/hma02/caffe2/tmp/practice/lmdb_caffe2/example_db" }

@hyc

hyc commented May 15, 2017

What kind of filesystem is /mnt/data? At a guess, you're using NFS or some other remote filesystem; this is explicitly not supported by LMDB.
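
One way to answer the filesystem question without reading `df` output by eye is to look the path up in the mount table. The following is only a minimal sketch: the helper names, the list of "remote" filesystem types, and the sample mount table are all illustrative assumptions, and it expects text in the Linux /proc/mounts format (device, mount point, type, ...).

```python
import os

# Illustrative, non-exhaustive set of network filesystem types (assumption).
REMOTE_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"}


def filesystem_type(path, mounts_text):
    """Return the fs type of the longest mount point that contains `path`.

    `mounts_text` is text in /proc/mounts format. Mount points containing
    escaped spaces (\\040) are not handled in this sketch.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so /mnt/data does not match /mnt/database.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_probably_remote(fs_type):
    """True if the type is in the illustrative remote-filesystem set."""
    return fs_type in REMOTE_FS_TYPES


# Hypothetical mount table, for demonstration only.
SAMPLE_MOUNTS = """\
/dev/sda4 / ext4 rw 0 0
/dev/sdb1 /scratch xfs rw 0 0
server:/data /mnt/data nfs4 rw 0 0
"""

fs = filesystem_type("/mnt/data/hma02/example_db", SAMPLE_MOUNTS)
print(fs, is_probably_remote(fs))
```

On a real Linux machine the second argument would be the contents of /proc/mounts, read with `open("/proc/mounts").read()`.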

@hma02
Author

hma02 commented May 15, 2017

@hyc
Thanks for looking at this issue. /mnt/data is indeed a network file system. I then tried /scratch, which is xfs, and it still does not work, as shown below. What kind of file system should I use, then? Thanks.

$ df -Th
Filesystem                      Type      Size  Used Avail Use% Mounted on
udev                            devtmpfs  243G     0  243G   0% /dev
tmpfs                           tmpfs      52G   19M   52G   1% /run
/dev/sda4                       ext4      321G   13G  293G   5% /
tmpfs                           tmpfs     256G     0  256G   0% /dev/shm
tmpfs                           tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs                           tmpfs     256G     0  256G   0% /sys/fs/cgroup
/dev/sdb1                       xfs       447G  249G  198G  56% /scratch
/dev/sda2                       ext4      923M   99M  761M  12% /boot
nas1.mlrg.soe.uoguelph.ca:/home nfs4       23T  8.1T   15T  36% /export/mlrg
tmpfs                           tmpfs      52G     0   52G   0% /run/user/65983
nas1.mlrg.soe.uoguelph.ca:/data nfs4       44T   29T   15T  67% /mnt/data
tmpfs                           tmpfs      52G   64K   52G   1% /run/user/65716

$ python -c 'import caffe2;print caffe2.__file__'
/usr/local/caffe2/__init__.pyc

$ python -u lmdb_create_example.py --output_file /scratch/example_db
>>> Write database...
Inserted 0 rows
Inserted 16 rows
Inserted 32 rows
Inserted 48 rows
Inserted 64 rows
Inserted 80 rows
Inserted 96 rows
Inserted 112 rows
Checksum/write: 1745234
>>> Read database...
Traceback (most recent call last):
  File "lmdb_create_example.py", line 108, in <module>
    main()
  File "lmdb_create_example.py", line 104, in main
    read_db_with_caffe2(args.output_file, checksum)
  File "lmdb_create_example.py", line 76, in read_db_with_caffe2
    workspace.RunNetOnce(model.param_init_net)
  File "/usr/local/caffe2/python/workspace.py", line 161, in RunNetOnce
    return C.run_net_once(StringifyProto(net))
RuntimeError: [enforce fail at db.h:174] db_. Cannot open db: /scratch/example_db of type lmdb Error from operator: 
output: "dbreader_/scratch/example_db" name: "" type: "CreateDB" arg { name: "db_type" s: "lmdb" } arg { name: "db" s: "/scratch/example_db" }

@hyc

hyc commented May 15, 2017

Sorry, no ideas for now. Your output doesn't show the specific error code returned by the underlying system, so there is no way to know why it's failing.
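
Since the enforce message hides the underlying error, a few simple causes can at least be ruled out from the filesystem side. This is a rough sketch, not a diagnosis of the Power8 failure: the helper name is hypothetical, and it assumes the default LMDB layout, where the environment is a directory containing data.mdb (and normally lock.mdb).

```python
import os


def lmdb_dir_problems(db_path):
    """Return a list of obvious layout/permission problems (empty if none).

    Hypothetical helper; assumes the default LMDB layout in which the
    environment path is a directory holding a data.mdb file.
    """
    problems = []
    if not os.path.isdir(db_path):
        problems.append("not a directory: %s" % db_path)
        return problems
    data_file = os.path.join(db_path, "data.mdb")
    if not os.path.isfile(data_file):
        problems.append("missing data.mdb")
    elif not os.access(data_file, os.R_OK):
        problems.append("data.mdb is not readable")
    return problems


# Example: check the path that was passed to --output_file.
print(lmdb_dir_problems("./example_db"))
```

An empty list only means these basic checks passed; the actual error code would still have to come from LMDB itself.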

@hma02
Author

hma02 commented May 19, 2017

I tried building caffe2 on x86_64 Ubuntu 16, and this example works there.

$ python -u lmdb_create_example.py --output_file $(readlink -f ./example_db)
>>> Write database...
Inserted 0 rows
Inserted 16 rows
Inserted 32 rows
Inserted 48 rows
Inserted 64 rows
Inserted 80 rows
Inserted 96 rows
Inserted 112 rows
Checksum/write: 1744241
>>> Read database...
Checksum/read: 1744241

@deepali-c

I tried this on a Power8 Minsky, and it showed the following error:

$ python -u lmdb_create_example.py --output_file ~/example_db
>>> Write database...
Traceback (most recent call last):
  File "lmdb_create_example.py", line 108, in <module>
    main()
  File "lmdb_create_example.py", line 101, in main
    checksum = create_db(args.output_file)
  File "lmdb_create_example.py", line 47, in create_db
    img_tensor.float_data.extend(flatten_img)
  File "/usr/lib/python2.7/dist-packages/google/protobuf/internal/containers.py", line 123, in extend
    if not elem_seq:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So I updated the code to use:

-            img_tensor.float_data.extend(flatten_img)
+            img_tensor.float_data.extend(flatten_img.flat)

It then worked:

$ python -u lmdb_create_example.py --output_file ~/example_db
>>> Write database...
Inserted 0 rows
Inserted 16 rows
Inserted 32 rows
Inserted 48 rows
Inserted 64 rows
Inserted 80 rows
Inserted 96 rows
Inserted 112 rows
Checksum/write: 1746043
>>> Read database...
Checksum/read: 1746043

The target file system is xfs.
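
The ValueError in the traceback above appears to come from the `if not elem_seq:` truthiness test, which NumPy refuses to evaluate for an array with more than one element. The sketch below reproduces that behavior with a hypothetical stand-in function (it is not protobuf's actual implementation) and shows why passing `.flat`, or alternatively `.tolist()`, avoids it.

```python
import numpy as np


def extend_like_traceback(values, elem_seq):
    """Hypothetical stand-in mimicking the check seen in the traceback."""
    if not elem_seq:  # raises ValueError for a multi-element ndarray
        return
    for elem in elem_seq:
        values.append(float(elem))


def try_extend(seq):
    """Return the extended list, or the string 'ValueError' on failure."""
    values = []
    try:
        extend_like_traceback(values, seq)
    except ValueError:
        return "ValueError"
    return values


flatten_img = np.arange(6, dtype=np.float32)

print(try_extend(flatten_img))           # ndarray: truth value is ambiguous
print(try_extend(flatten_img.flat))      # flat iterator: truthiness uses len()
print(try_extend(flatten_img.tolist()))  # plain list: also works
```

Whether the original call fails may depend on the installed protobuf version, which would be consistent with the write step succeeding on some of the machines in this thread.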
