
TICA appears 100x slower (on large dataset) in 2.5.1 vs. 2.4 #1284

Closed
rafwiewiora opened this issue Mar 29, 2018 · 13 comments

@rafwiewiora

rafwiewiora commented Mar 29, 2018

Hi guys,

I'm re-running scoring on my 5 ms dataset, and after updating to 2.5.1 things look weird - we've gone from 3 hours for the TICA covariance calculation to 300 hours:

2.4:

calculate covariances:   0% (  11/2509) [ ] eta 3:30:14 |

2.5.1:

calculate covariances:   0%| | 8/467871 [00:17<319:26:36,  2.46s/it]

(it's not just slow at the beginning of the calculation; I let it run overnight too and progress was consistent with this ETA)

This is 2509 trajectories, 492,961 frames, 6557 dimensions.

Code:

import pyemma
import glob
import numpy as np

splits_train = np.load('/cbio/jclab/home/rafal.wiewiora/repos/MSM_play/set8_apo_11707_11709_FINAL/feat_choice_scoring_new_scheme_2/splits_train_dist.npy')

split = 0
source = pyemma.coordinates.source(list(np.array(glob.glob('../data_cut_start_noH_stride10_featurized/dist_cross/*/*.npy'))[list(splits_train[split])]))


lag_time = 10
tica_kinetic = pyemma.coordinates.tica(lag=lag_time, kinetic_map=True, var_cutoff=1)
stages = [source, tica_kinetic]
pipeline = pyemma.coordinates.pipeline(stages, chunksize = 1000)

(I also ran this with the data in memory via pyemma.coordinates.load - same thing.)

conda info:

Current conda install:

               platform : linux-64
          conda version : 4.3.34
       conda is private : False
      conda-env version : 4.3.34
    conda-build version : not installed
         python version : 3.5.5.final.0
       requests version : 2.18.3
       root environment : /cbio/jclab/home/rafal.wiewiora/anaconda3  (writable)
    default environment : /cbio/jclab/home/rafal.wiewiora/anaconda3
       envs directories : /cbio/jclab/home/rafal.wiewiora/anaconda3/envs
                          /cbio/jclab/home/rafal.wiewiora/.conda/envs
          package cache : /cbio/jclab/home/rafal.wiewiora/anaconda3/pkgs
                          /cbio/jclab/home/rafal.wiewiora/.conda/pkgs
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/omnia/linux-64
                          https://conda.anaconda.org/omnia/noarch
                          https://conda.anaconda.org/salilab/linux-64
                          https://conda.anaconda.org/salilab/noarch
                          https://repo.continuum.io/pkgs/main/linux-64
                          https://repo.continuum.io/pkgs/main/noarch
                          https://repo.continuum.io/pkgs/free/linux-64
                          https://repo.continuum.io/pkgs/free/noarch
                          https://repo.continuum.io/pkgs/r/linux-64
                          https://repo.continuum.io/pkgs/r/noarch
                          https://repo.continuum.io/pkgs/pro/linux-64
                          https://repo.continuum.io/pkgs/pro/noarch
            config file : /cbio/jclab/home/rafal.wiewiora/.condarc
             netrc file : None
           offline mode : False
             user-agent : conda/4.3.34 requests/2.18.3 CPython/3.5.5 Linux/2.6.32-573.18.1.el6.x86_64 CentOS/6.7 glibc/2.12    
                UID:GID : 20077:3008

conda list attached: conda_list.txt

I don't imagine you would really have put out a 100x slower new version, so presumably something is wrong on my side? I don't have time right now to transfer all this data to a different system to test, so I'm sticking with 2.4 for now and putting this out here for comments.

@rafwiewiora
Author

Realized I hadn't tried this in a fresh conda environment - unfortunately today is the last day of the Torque license on the old cluster I was reporting the above from, and it appears to have just gone off, so I can't try there anymore. Transferring the data to our new cluster and will check there on a fresh conda install - will report soon.

@marscher
Member

marscher commented Mar 30, 2018 via email

@marscher
Member

marscher commented Mar 30, 2018 via email

@marscher
Member

marscher commented Apr 4, 2018

Actually I'm certain that this is the fault of the automatic chunk size computation, as the progress bar output shows how many work pieces are used (2509 vs. 467871). This adds tremendous Python overhead. So by passing a chunksize parameter != None, you should regain the old runtime speed.
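
For concreteness, a minimal sketch of this workaround, following the call pattern from the report above; trajs stands in for the list of feature files, and the chunk size of 10000 frames is only an illustrative value, not a recommended setting:

import pyemma

source = pyemma.coordinates.source(trajs)
# passing an explicit chunksize avoids the automatic chunk size computation
tica_kinetic = pyemma.coordinates.tica(source, lag=10, kinetic_map=True,
                                       var_cutoff=1, chunksize=10000)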

@franknoe
Contributor

franknoe commented Apr 4, 2018 via email

@franknoe
Contributor

franknoe commented Apr 4, 2018

I think in 2.5.1 we added Nystroem TICA, and as part of that you did a code reorganization, correct? Is that the point when this problem was introduced?

@marscher, can you confirm Rafal's observation on our tutorial notebooks?

@rafwiewiora
Author

Indeed, setting the chunksize to 0 or to a large enough value solves it, but the problem still persists if I use the pipeline.

(using only 1k trajectories here rather than the 2509 above)

source = pyemma.coordinates.source(trajs[:1000])
tica_kinetic = pyemma.coordinates.tica(source, lag=10, kinetic_map=True, var_cutoff=1, chunksize=0)

gives normal speed:

calculate covariances
0% 3/1000 [00:07<38:58, 2.35s/it]

BUT

source = pyemma.coordinates.source(trajs[:1000])
tica_kinetic = pyemma.coordinates.tica(lag=10, kinetic_map=True, var_cutoff=1, chunksize=0)
stages = [source, tica_kinetic]
pipeline = pyemma.coordinates.pipeline(stages, chunksize=0)

still chunks into single frames:

calculate covariances
0% 7/186618 [00:12<85:52:42, 1.66s/it]

why?

@clonker
Member

clonker commented Apr 7, 2018

The chunk size was not passed on correctly when using a pipeline. Also, there was a bug in the default chunk size calculation where the number of frames that fit into memory was divided by the dimension a second time, yielding a much smaller chunk size than would actually have been possible.

The PR should fix this behavior.
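
As an illustration (not PyEMMA's actual code), the shape of the bug described above can be sketched as follows; the memory budget and dimension are inferred from the numbers quoted in the next comment:

# hypothetical reconstruction of the default chunk size calculation
max_bytes = 256 * 1024 * 1024      # assumed per-chunk memory budget ("256m" in the config)
dim, itemsize = 174, 4             # dimension and float32 size implied by the BPTI numbers below

frames_per_chunk = max_bytes // (dim * itemsize)   # intended default: ~385k frames
buggy_chunk = frames_per_chunk // dim              # extra division by dim: ~2.2k frames

print(frames_per_chunk, buggy_chunk)               # 385683 2216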

@franknoe
Contributor

franknoe commented Apr 7, 2018 via email

@clonker
Member

clonker commented Apr 9, 2018

When using the BPTI notebook I can confirm that the chunksize before the fix was off:

  • before:
    • chunksize: tica_obj.chunksize = 2216
    • nbytes: tica_obj.chunksize * inp.dimension() * 4 = 1542336, which is about 1.5 MB
  • after:
    • chunksize: tica_obj.chunksize = 385683
    • nbytes: tica_obj.chunksize * inp.dimension() * 4 = 268435368, which is about 270 MB

The 270 megabytes are to be understood in the decimal representation, i.e., 1 MB = 1000 kB = 1000 * 1000 bytes, whereas the config is built on the base-2 representation, i.e., 1 MiB = 1024 KiB = 1024 * 1024 bytes. In that representation it is 255 MiB. Since the maximal chunk size in the config file is given as 256m and not 256MiB, I think we should use the decimal representation (see here).
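
As a quick check of the arithmetic, using the chunk size quoted above and the dimension of 174 implied by the nbytes figures:

nbytes = 385683 * 174 * 4     # chunksize * inp.dimension() * sizeof(float32)
print(nbytes)                 # 268435368
print(nbytes / 1000**2)       # ~268.4 decimal MB (quoted as "about 270 MB")
print(nbytes / 1024**2)       # ~256.0 MiB, i.e. just under 256 (255 MiB when rounded down)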

The actual runtime is pretty much the same, as the BPTI data is comparatively low-dimensional (170 dimensions, vs. roughly 6500 in Rafal's case) and the chunk size calculation contained one division by the number of dimensions too many. I have timed the TICA estimation and output process, with the following results:

  • before the fix: 2.2 s ± 45.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
  • after the fix: 2.23 s ± 31.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
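
A rough sketch of how such a timing could be reproduced; the input reader and feature file list here are placeholders for the BPTI tutorial setup, not the exact code used:

import timeit
import pyemma

# placeholder: a reader over the BPTI feature trajectories
inp = pyemma.coordinates.source(bpti_feature_files)

def estimate_and_output():
    tica_obj = pyemma.coordinates.tica(inp, lag=10)  # estimation
    return tica_obj.get_output()                     # output/transform pass

# 7 repeats of 10 loops each, matching the quoted measurement scheme
times = timeit.repeat(estimate_and_output, repeat=7, number=10)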

@franknoe
Contributor

franknoe commented Apr 9, 2018 via email

@marscher
Member

marscher commented Apr 9, 2018 via email

@franknoe
Contributor

franknoe commented Apr 9, 2018 via email

marscher added a commit that referenced this issue Apr 10, 2018
- Fixed the calculation of the default chunk size in `iterable.py` and pass the chunk size into estimation when using a pipeline.
- DataInMemory now returns the dtype of the first array as output_type
- output_type now returns an instance of dtype rather than the class definition
- Fixes #1284