
CAT 2000 Dataset/Memory Use Issues #9

Open
Hamcastle opened this issue Mar 19, 2019 · 2 comments

Comments

@Hamcastle

Hi Matthias,

Nice work -- this package's great.

I'm having an issue with the code for the CAT 2000 training dataset and with the package's memory footprint.

I've created some saliency maps for the CAT 2000 training data outside pysaliency itself.

I can load the stimuli and fixations for the set from a local copy using the "pysaliency.get_cat2000_train()" command.

The saliency maps are organized into the same folder structure as the source data, with each saliency map contained in its category-specific folder.

If I try to load them from the folder containing these category-specific sub-folders using "pysaliency.SaliencyMapModelFromDirectory", I get the error in the attached screenshot.

It looks like this comes from the fact that this class doesn't search sub-directories recursively. I can get around it by using "pysaliency.SaliencyMapModelFromFiles" with a list of paths to each saliency map, roughly as sketched below.
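
Roughly, the workaround looks like this (the saliency map directory name and the *.png pattern are placeholders for my local setup, and I'm assuming SaliencyMapModelFromFiles pairs the files with the stimuli by position):

    import glob
    import os

    import pysaliency

    # Stimuli and fixations for the CAT 2000 training set (local copy)
    stimuli, fixations = pysaliency.get_cat2000_train()

    # My saliency maps sit in category-specific sub-folders that mirror the
    # source data; sorting should reproduce the ordering of the stimuli.
    filenames = sorted(glob.glob(os.path.join('cat2000_saliency_maps', '*', '*.png')))

    model = pysaliency.SaliencyMapModelFromFiles(stimuli, filenames)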

The trouble is that this seems to use quite a bit of memory.

My computer has 32 GB of RAM and a 32 GB swap partition. Calculating the Judd AUC score with a "SaliencyMapModelFromDirectory" model for the MIT 1003 dataset consumes about 8.1 GB. I also note that even after the score is calculated, the additional memory is not released without restarting the IPython kernel.

If I try to calculate the same score for the CAT 2000 training set using the "SaliencyMapModelFromFiles" approach, it fills both the RAM and the swap partition completely, causing the IPython kernel to die.

Could you recommend a way to work with this dataset that is a bit more memory-efficient? Do you have a sense of what might otherwise be responsible for the memory use?

In case you think this might be a system- or Python-environment-specific issue, here are (I think) all of the relevant specs:

OS: Ubuntu 16.04 LTS
Python environment:

  • Anaconda
  • Python 3.5.5
  • Numpy 1.14.5
  • imageio 2.3.0
  • boltons 18.0.0
  • scipy 1.1.0
  • pysaliency built and installed from source using the version hosted here (it looks like I cloned the repo on September 24, 2018, so the codebase is at commit 6d9c394)

[attached screenshot: from_directory_error]

Thanks again!

@matthias-k
Owner

matthias-k commented Mar 19, 2019

Hi Dylan,

Thanks for reporting this bug to me! In October, in 4d014c4, I implemented nested directories for HDF5 models, but apparently I forgot to do the same for directory-based models. I'll fix this over the next few days. By the way, pysaliency has changed quite a bit since September 2018, so it might be worth updating :).

Regarding your memory issue: pysaliency uses a caching mechanism to keep saliency maps in memory and avoid recomputing them all the time. I admit that for file-based models this is usually unnecessary, and I should change the default in those cases. You can always disable the caching by passing caching=False as a keyword argument to the model constructor, as in SaliencyMapModelFromFiles(stimuli, filenames, caching=False).
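
A minimal sketch of what that could look like end to end (the saliency map directory and the *.png pattern are just placeholders for however you built your list of paths, assuming one map per stimulus in matching order):

    import glob
    import os

    import pysaliency

    # CAT 2000 training stimuli and fixations
    stimuli, fixations = pysaliency.get_cat2000_train()

    # Placeholder: one saliency map per stimulus, in the same order as the
    # stimuli, gathered from the category sub-folders.
    filenames = sorted(glob.glob(os.path.join('cat2000_saliency_maps', '*', '*.png')))

    # With caching disabled, each saliency map is read from disk when it is
    # needed and not kept in memory afterwards.
    model = pysaliency.SaliencyMapModelFromFiles(stimuli, filenames, caching=False)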

@Hamcastle
Author

Matthias,

Thanks for the very speedy reply. Setting the caching flag to False solved the issue. Will do on the update! Feel free to mark closed, unless you want to wait for whatever changes you end up making :P
