When running the caffe-model job with FfDL, databroker_s3 always fails to pull one of the files, s3://mnist_lmdb_data/train/data.mdb:
Using Object Storage account test at http://s3.default.svc.cluster.local
Download start: Mon Feb 12 19:38:10 UTC 2018
Downloading from bucket mnist_lmdb_data to /job/mnist_lmdb_data
Completed 256.0 KiB/68.8 MiB (213.3 KiB/s) with 4 file(s) remaining
Completed 264.0 KiB/68.8 MiB (217.1 KiB/s) with 4 file(s) remaining
download: s3://mnist_lmdb_data/test/lock.mdb to job/mnist_lmdb_data/test/lock.mdb
Completed 264.0 KiB/68.8 MiB (217.1 KiB/s) with 3 file(s) remaining
Completed 520.0 KiB/68.8 MiB (259.9 KiB/s) with 3 file(s) remaining
Completed 776.0 KiB/68.8 MiB (369.5 KiB/s) with 3 file(s) remaining
Completed 1.0 MiB/68.8 MiB (469.2 KiB/s) with 3 file(s) remaining
Completed 1.3 MiB/68.8 MiB (585.1 KiB/s) with 3 file(s) remaining
Completed 1.5 MiB/68.8 MiB (701.2 KiB/s) with 3 file(s) remaining
Completed 1.8 MiB/68.8 MiB (817.2 KiB/s) with 3 file(s) remaining
Completed 2.0 MiB/68.8 MiB (933.2 KiB/s) with 3 file(s) remaining
Completed 2.3 MiB/68.8 MiB (825.7 KiB/s) with 3 file(s) remaining
Completed 2.5 MiB/68.8 MiB (916.3 KiB/s) with 3 file(s) remaining
Completed 2.8 MiB/68.8 MiB (973.5 KiB/s) with 3 file(s) remaining
Completed 3.0 MiB/68.8 MiB (1.0 MiB/s) with 3 file(s) remaining
Completed 3.3 MiB/68.8 MiB (1.1 MiB/s) with 3 file(s) remaining
Completed 3.5 MiB/68.8 MiB (1.2 MiB/s) with 3 file(s) remaining
Completed 3.8 MiB/68.8 MiB (1.3 MiB/s) with 3 file(s) remaining
Completed 4.0 MiB/68.8 MiB (1.3 MiB/s) with 3 file(s) remaining
Completed 4.3 MiB/68.8 MiB (1.4 MiB/s) with 3 file(s) remaining
Completed 4.5 MiB/68.8 MiB (1.5 MiB/s) with 3 file(s) remaining
Completed 4.8 MiB/68.8 MiB (1.4 MiB/s) with 3 file(s) remaining
Completed 5.0 MiB/68.8 MiB (1.5 MiB/s) with 3 file(s) remaining
Completed 5.3 MiB/68.8 MiB (1.6 MiB/s) with 3 file(s) remaining
Completed 5.5 MiB/68.8 MiB (1.7 MiB/s) with 3 file(s) remaining
Completed 5.6 MiB/68.8 MiB (1.7 MiB/s) with 3 file(s) remaining
Completed 5.9 MiB/68.8 MiB (1.6 MiB/s) with 3 file(s) remaining
Completed 6.1 MiB/68.8 MiB (1.7 MiB/s) with 3 file(s) remaining
Completed 6.4 MiB/68.8 MiB (1.7 MiB/s) with 3 file(s) remaining
Completed 6.6 MiB/68.8 MiB (1.8 MiB/s) with 3 file(s) remaining
Completed 6.9 MiB/68.8 MiB (1.9 MiB/s) with 3 file(s) remaining
Completed 7.1 MiB/68.8 MiB (1.8 MiB/s) with 3 file(s) remaining
Completed 7.4 MiB/68.8 MiB (1.9 MiB/s) with 3 file(s) remaining
Completed 7.6 MiB/68.8 MiB (2.0 MiB/s) with 3 file(s) remaining
Completed 7.9 MiB/68.8 MiB (2.0 MiB/s) with 3 file(s) remaining
Completed 8.1 MiB/68.8 MiB (2.1 MiB/s) with 3 file(s) remaining
Completed 8.4 MiB/68.8 MiB (2.1 MiB/s) with 3 file(s) remaining
Completed 8.6 MiB/68.8 MiB (2.2 MiB/s) with 3 file(s) remaining
Completed 8.9 MiB/68.8 MiB (1.9 MiB/s) with 3 file(s) remaining
Completed 8.9 MiB/68.8 MiB (1.8 MiB/s) with 3 file(s) remaining
download: s3://mnist_lmdb_data/train/lock.mdb to job/mnist_lmdb_data/train/lock.mdb
Killed
Killed
download failed: s3://mnist_lmdb_data/train/data.mdb to job/mnist_lmdb_data/train/data.mdb [Errno 12] Cannot allocate memory
I also tried increasing the job memory and using IBM Cloud Object Storage, but I still hit the same issue. So I believe the issue could be
We may have to increase the loadTrainingDataMemInMB configuration:
FfDL/lcm/service/lcm/constants.go, line 77 in 8ddc3b8: 300
@whummer increasing the loadTrainingDataMemInMB did solve this problem, thanks.
Let's make 300 the default?
Sure, 300 seems to work for all of our examples.
Merge pull request IBM#8 from sboagibm/disable_metrics
b193515
hard-coded disable push metrics