
CAT 2000 Dataset/Memory Use Issues #9

Open
Hamcastle opened this issue Mar 19, 2019 · 2 comments

Comments

@Hamcastle

Hi Matthias,

Nice work -- this package's great.

I'm having an issue with the code for the CAT 2000 training dataset and with the package's memory footprint.

I've created some saliency maps for the CAT 2000 training data outside pysaliency itself.

I can load the stimuli and fixations for the set from a local copy using the "pysaliency.get_cat2000_train()" command.

The saliency maps are organized into the same folder structure as the source data, with each saliency map contained in its category-specific folder.

If I try to load them from the folder containing these category-specific sub-folders using "pysaliency.SaliencyMapModelFromDirectory", I get the error in the attached screenshot.

It looks like this comes from the fact that this class doesn't search sub-directories recursively. I can get around it by using "pysaliency.SaliencyMapModelFromFiles" with a list of paths to each saliency map, roughly as sketched below.
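
Roughly, the workaround looks like this (the saliency map directory name and the *.png pattern are placeholders for my local setup, and I'm assuming SaliencyMapModelFromFiles pairs the files with the stimuli by position):

    import glob
    import os

    import pysaliency

    # Stimuli and fixations for the CAT 2000 training set (local copy)
    stimuli, fixations = pysaliency.get_cat2000_train()

    # My saliency maps sit in category-specific sub-folders that mirror the
    # source data; sorting should reproduce the ordering of the stimuli.
    filenames = sorted(glob.glob(os.path.join('cat2000_saliency_maps', '*', '*.png')))

    model = pysaliency.SaliencyMapModelFromFiles(stimuli, filenames)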

The trouble is that this seems to use quite a bit of memory.

My computer has 32 GB of RAM and a 32 GB swap partition. Calculating the Judd AUC score with a "SaliencyMapModelFromDirectory" model for the MIT 1003 dataset consumes about 8.1 GB. I also note that even after the score is calculated, the additional memory is not released without restarting the IPython kernel.

If I try to calculate the same score for the CAT 2000 training set using the "SaliencyMapModelFromFiles" approach, it fills both the RAM and the swap partition completely, causing the IPython kernel to die.

Could you recommend a way to work with this dataset that is a bit more memory-efficient? Do you have a sense of what might otherwise be responsible for the memory use?

In case you think this might be a system- or Python-environment-specific issue, here are (I think) all of the relevant specs:

OS: Ubuntu 16.04 LTS
Python environment:

  • Anaconda
  • Python 3.5.5
  • Numpy 1.14.5
  • imageio 2.3.0
  • boltons 18.0.0
  • scipy 1.1.0
  • pysaliency built and installed from source using the version hosted here (it looks like I cloned the repo on September 24, 2018, so the codebase is at commit 6d9c394)

[attached screenshot: from_directory_error]

Thanks again!

@matthias-k
Owner

matthias-k commented Mar 19, 2019

Hi Dylan,

Thanks for reporting this bug to me! In October, in 4d014c4, I implemented nested directories for HDF5 models, but apparently I forgot to do the same for directory-based models. I'll fix this over the next few days. By the way, pysaliency has changed quite a bit since September 2018, so it might be worth updating :).

Regarding your memory issue: pysaliency uses a caching mechanism to keep saliency maps in memory and avoid recomputing them all the time. I admit that for file-based models this is usually unnecessary, and I should change the default in those cases. You can always disable the caching by passing caching=False as a keyword argument to the model constructor, as in SaliencyMapModelFromFiles(stimuli, filenames, caching=False).
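
A minimal sketch of what that could look like end to end (the saliency map directory and the *.png pattern are just placeholders for however you built your list of paths, assuming one map per stimulus in matching order):

    import glob
    import os

    import pysaliency

    # CAT 2000 training stimuli and fixations
    stimuli, fixations = pysaliency.get_cat2000_train()

    # Placeholder: one saliency map per stimulus, in the same order as the
    # stimuli, gathered from the category sub-folders.
    filenames = sorted(glob.glob(os.path.join('cat2000_saliency_maps', '*', '*.png')))

    # With caching disabled, each saliency map is read from disk when it is
    # needed and not kept in memory afterwards.
    model = pysaliency.SaliencyMapModelFromFiles(stimuli, filenames, caching=False)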

@Hamcastle
Author

Matthias,

Thanks for the very speedy reply. Setting the caching flag to False solved the issue. Will do on the update! Feel free to mark closed, unless you want to wait for whatever changes you end up making :P
