the first service running Caffe largely determines the leak: it leaves a leak behind whose size is directly proportional to the size of the training dataset
the leak appears to be independent of the input connector (observed with both CSV and text)
the leak grows slowly with every new service, in proportion to its initial size. This suggests that a structure somewhere is being incremented, with its initial size set by the first service that runs.
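One way to check the proportionality described above is to record memory after each service is destroyed and compare the per-service growth to the initial leak. The following is only a minimal sketch: `leak_growth` is a hypothetical helper, and the sample values are illustrative, not measurements from this issue.

```python
def leak_growth(samples_mb, baseline_mb):
    """Return (initial_leak, per_service_growth, growth_ratios).

    samples_mb[i] is the memory measured after service i has been destroyed;
    baseline_mb is the memory measured before any service ran.
    Hypothetical helper, not part of Caffe or the server.
    """
    if not samples_mb:
        raise ValueError("need at least one sample")
    # Leak left behind by the first service.
    initial_leak = samples_mb[0] - baseline_mb
    # Additional memory retained by each subsequent service.
    growth = [b - a for a, b in zip(samples_mb, samples_mb[1:])]
    # If growth is proportional to the initial leak, these ratios stay roughly constant.
    ratios = [g / initial_leak for g in growth] if initial_leak else []
    return initial_leak, growth, ratios


# Illustrative values only (MB), not taken from the report.
initial, growth, ratios = leak_growth([300.0, 310.0, 320.0], baseline_mb=100.0)
```

Roughly constant ratios across runs with different training set sizes would be consistent with a structure whose increment depends on the first service's footprint.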
A first set of investigations appears to rule out Caffe's net destruction as the cause, and a first valgrind pass has not revealed much yet.
In GPU-only mode, valgrind reports large chunks held by CUDA (libcuda.so). This is also visible when monitoring the run with nvidia-smi.
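To sample GPU memory at each phase rather than reading it off the console, nvidia-smi can be queried from a script. This is a rough sketch that assumes `nvidia-smi` is on the PATH; the function names are hypothetical, and only the parsing step is exercised without a GPU.

```python
import subprocess


def parse_memory_used(output):
    """Parse nvidia-smi CSV output: one integer (used memory) per GPU line."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def gpu_memory_used():
    """Return used memory per GPU as reported by nvidia-smi.

    Requires an NVIDIA driver and nvidia-smi on PATH.
    """
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory_used(out)
```

Calling `gpu_memory_used()` before start, during training, after service destruction, and after server termination gives one reading per phase that can be compared across runs.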
Typical GPU memory usage on a 4GB GPU during a run:
before start: 152MB
training start: 198MB
training: 589MB
training finished, net is up for prediction: 436MB
service destruction: 200MB
server termination: 152MB
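The figures above can be turned into per-phase deltas to show how much memory is still held once the service is gone. A small sketch using only the values listed above; the variable and function names are illustrative.

```python
# GPU memory in use (MB) at each phase, as reported above.
phases = [
    ("before start", 152),
    ("training start", 198),
    ("training", 589),
    ("training finished, net is up for prediction", 436),
    ("service destruction", 200),
    ("server termination", 152),
]


def phase_deltas(phases):
    """Return (phase name, change in MB relative to the previous phase)."""
    return [
        (name, used - prev_used)
        for (_, prev_used), (name, used) in zip(phases, phases[1:])
    ]


deltas = phase_deltas(phases)
baseline = phases[0][1]
# Memory still held after the service is destroyed but before the server exits.
retained_after_destruction = dict(phases)["service destruction"] - baseline
```

With these numbers, 48MB is still held after service destruction and is only returned when the server terminates.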
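About 48MB remains allocated after service destruction (200MB versus 152MB before start) and is only released at server termination. Either Caffe is allocating some GPU data and not releasing it, or there is a way to clear the GPU memory overhead after it has been used within a Caffe net.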
In summary, there is a weird leak behavior with Caffe.