speed is imbalanced and slow #195
Your mapsize is a terabyte, which is probably too big for your system. It won't matter, though, since your file isn't that big. Your iterable `[k for k, v in txn.cursor()]` is receiving the entire value for each key and then dropping it. Much faster would be to iterate over the keys only.
Yeah, this is slow, but `txn.get(k)` is slow too. It seems the random `get` method is much slower than iteration.
That's expected. Iteration is always going to be faster, because the keys are next to each other on disk and in memory, so your operating system's and storage system's read-ahead loads the next key while you're doing other work. This is particularly true for spinning disks, where random access is 100x slower than sequential access. LMDB is going to be slow if either you have more data than RAM, or you have spinning-disk storage.
Yes, that's exactly what I found. Thank you for your reply.
This pull request enhances the speed of cache creation for the LSUN dataset. For "kitchen_train", cache creation was taking more than two hours; this change speeds it up to within minutes. The issue was pulling the large image value for each key and then dropping it. For more details, please refer to issue jnwatson/py-lmdb#195.
* Only pull keys from db in lsun for faster cache. This pull request enhances the speed of cache creation for the LSUN dataset. For "kitchen_train", cache creation was taking more than two hours; this change speeds it up to within minutes. The issue was pulling the large image value for each key and then dropping it. For more details, please refer to issue jnwatson/py-lmdb#195.
* Fixed bug in lsun.py when loading multiple categories
* Make linter happy
Hi, I use the lmdb library to train my deep learning model with PyTorch; however, the speed is imbalanced and slow. I think I can do better with your help.
I'm using Ubuntu, and the version of lmdb is 0.94, installed via pip3.
The output of `free -m` is:

```
              total       used       free     shared    buffers     cached
Mem:          15938       6131        159        756       9647       8711
Swap:          2047          0       2047
```
My lmdb dataset is about 150 GB. The keys are UUIDs with some additional info, and the values are undecoded JPEG images in bytes. I'm reading from the dataset at random (random reads).
Here's my script in Python; I think it's easy to read even if you don't know anything about PyTorch:
PyTorch will launch 16 workers that use this class to read images at random.
Can you help me out?