neural-style running multi-thread on one cpu but not on another #124

Closed
1SittingDuck opened this issue Jan 21, 2016 · 3 comments

Comments

@1SittingDuck

Hi JC

Firstly, thanks for publishing your work on the Neural Algorithm in Lua. I have learned a great deal from following your guides and advice. The speed from cuDNN is amazing for testing various parameters. However, since my GPUs have only 2 GB of memory and I want images larger than 512 pixels, I have been running neural-style in CPU mode with 1024-1600 pixel images.

On one server with a 3820 CPU and 64 GB of memory I can start four 1024-pixel runs, which use approx. 15-16 GB of memory. System load runs around 4, which indicates luajit is only utilizing one core per process. The result is that it takes days to complete an image.

On a second server with a 4820K CPU and 64 GB of memory I can start two 1024-pixel runs, which use approx. 30 GB of memory per luajit process, and system load is around 8, indicating it is utilizing all 8 cores. The result is that it only takes a few hours for each image.
Both servers are running the same Ubuntu 14.04 with the latest patches and identical torch, cunn, neural-style, etc.
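
The load-average reasoning above can be written down as a small calculation. This is only a rough sketch in Python: the function name `busy_threads_per_process` is hypothetical, and it assumes the reported load comes almost entirely from the luajit processes, which the load average alone cannot confirm.

```python
def busy_threads_per_process(load_average, process_count):
    """Rough estimate of busy threads per process.

    Assumes (hypothetically) that nothing else on the machine
    contributes meaningfully to the load average.
    """
    if process_count <= 0:
        raise ValueError("process_count must be positive")
    return load_average / process_count


# Figures reported in this issue
first_server = busy_threads_per_process(4, 4)   # 3820: four runs, load around 4
second_server = busy_threads_per_process(8, 2)  # 4820K: two runs, load around 8

print(first_server)   # 1.0
print(second_server)  # 4.0
```

On a live system, `os.getloadavg()[0]` returns the one-minute load average that could be passed in as `load_average`.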

I would really like to get the system with the 3820 CPU to utilize more memory and processors per luajit process. Is there some inherent difference between these two processors that limits the 3820 that you may be aware of? From Intel specs and benchmarks these two processors should be very close in performance, but my experience has shown quite the contrary when running neural-style and luajit.

Any ideas on where I can look to uncover the reasons, or perhaps even how to configure the 3820 system to reach the same speed and resource usage that I'm getting from the 4820K, would be greatly appreciated.

@jcjohnson
Owner

neural-style depends on torch7 for multithreaded CPU speedup, which in turn depends on a BLAS implementation (usually OpenBLAS). My guess is that on one machine you have torch and BLAS properly configured to use multiple cores, and on the other machine either your torch install is not properly linked to your BLAS install, or your BLAS install was configured to use only a single core.

I've never had to debug this myself, but some starting points might be this GitHub issue concerning torch7 installation and OpenBLAS, and the OpenBLAS documentation.
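
One way to rule out a thread-count setting as the cause is to set the standard `OMP_NUM_THREADS` and `OPENBLAS_NUM_THREADS` environment variables before launching a run. The Python sketch below only builds such an environment; the helper name `blas_thread_env` is hypothetical, and whether these variables have any effect depends on how the BLAS library on the machine was built. They cannot repair a torch install that is not linked against a multithreaded BLAS.

```python
import os


def blas_thread_env(num_threads, base_env=None):
    """Return a copy of the environment with BLAS/OpenMP thread counts set.

    Hypothetical helper: it only prepares variables and does not verify
    that the installed BLAS honours them.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(num_threads)
    env["OPENBLAS_NUM_THREADS"] = str(num_threads)
    return env


# Start from an empty environment here so the result is easy to inspect
env = blas_thread_env(8, base_env={})
print(env)
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when starting a CPU-mode run. If the load still stays at one core per process, the torch and BLAS linkage described above is the more likely culprit.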

@1SittingDuck
Author

Many thanks! You pinpointed the issue correctly. I had installed torch7 on the non-multi-threading server sometime in September 2015.
So I just reinstalled torch7 on that server and voila! It is multi-threading.
Yippee!

@jcjohnson
Owner

Glad you figured it out!
