
Lazy decomposition raises error on mask check #2605

Closed
thomasaarholt opened this issue Dec 22, 2020 · 10 comments · Fixed by #2657

@thomasaarholt (Contributor) commented Dec 22, 2020

Noticed by @erh3cq

if navigation_mask or signal_mask:

The above line should read if navigation_mask.any() or signal_mask.any(). There might be more places with a similar error as well. I'm not sure whether the case where navigation_mask is None would raise the same error.

Note that @erh3cq might have a more general error with lazy decomposition, worth taking a look at (see link above).
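For illustration, here is a minimal NumPy sketch of why the original check fails and a None-safe version of the suggested fix. The `mask_given` helper is hypothetical, not HyperSpy API:

```python
import numpy as np

# Hypothetical masks standing in for HyperSpy's navigation/signal masks.
navigation_mask = np.array([True, False, False])
signal_mask = np.zeros(5, dtype=bool)

# `if navigation_mask or signal_mask:` raises ValueError for boolean arrays
# with more than one element, because their truth value is ambiguous.
try:
    if navigation_mask or signal_mask:
        pass
except ValueError as e:
    print(e)  # "The truth value of an array with more than one element is ambiguous..."

# A None-safe version of the suggested .any() fix:
def mask_given(mask):
    return mask is not None and mask.any()

print(mask_given(navigation_mask))  # True
print(mask_given(signal_mask))      # False
print(mask_given(None))             # False
```

The `is not None` guard matters because callers may legitimately pass no mask at all, and `None.any()` would raise an AttributeError.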

@erh3cq (Contributor) commented Dec 23, 2020

@thomasaarholt
My hyperspy/hyperspy/_signals/lazy.py file does not have the same decomposition() function. I rebased to RELEASE_next_minor yesterday. Is this from a different branch? For example, it doesn't contain:

if algorithm != "svd" and output_dimension is None:
    raise ValueError("With the %s the output_dimension "
                     "must be specified" % algorithm)
if output_dimension and blocksize / output_dimension < num_chunks:
    num_chunks = np.ceil(blocksize / output_dimension)
blocksize *= num_chunks
# LEARN
if algorithm == 'PCA':
    from sklearn.decomposition import IncrementalPCA
    obj = IncrementalPCA(n_components=output_dimension)
    method = partial(obj.partial_fit, **kwargs)
    reproject = True

elif algorithm == 'ORPCA':
    from hyperspy.learn.rpca import ORPCA
    kwg = {'fast': True}
    kwg.update(kwargs)
    obj = ORPCA(output_dimension, **kwg)
    method = partial(obj.fit, iterating=True)

elif algorithm == 'ONMF':
    from hyperspy.learn.onmf import ONMF
    batch_size = kwargs.pop('batch_size', None)
    obj = ONMF(output_dimension, **kwargs)
    method = partial(obj.fit, batch_size=batch_size)
elif algorithm != "svd":
    raise ValueError('algorithm not known')

@ericpre (Member) commented Dec 23, 2020

@thomasaarholt's link is pointing to the current RELEASE_next_minor branch.

@thomasaarholt (Contributor, Author) commented

@erh3cq any chance you didn't git fetch upstream before rebasing? I suggest you double-check by installing the latest master via pip and then reporting back :)

@erh3cq (Contributor) commented Dec 24, 2020

@thomasaarholt I think GitHub Desktop automatically fetches from origin (not upstream) before rebasing. My repo is showing updates that are one day old. I'll give it another whirl and report back.

@ericpre (Member) commented Dec 27, 2020

Is there a traceback or a minimal example to reproduce?

@erh3cq (Contributor) commented Dec 28, 2020

@thomasaarholt I didn't reinstall with pip, but I did a forced reset of my repo to upstream:

C:\Users\Owner\Documents\GitHub\hyperspy>git checkout RELEASE_next_minor
Already on 'RELEASE_next_minor'
Your branch is up to date with 'origin/RELEASE_next_minor'.

C:\Users\Owner\Documents\GitHub\hyperspy>git pull upstream RELEASE_next_minor
From https://github.com/hyperspy/hyperspy
 * branch                RELEASE_next_minor -> FETCH_HEAD
Already up to date.

C:\Users\Owner\Documents\GitHub\hyperspy>git reset --hard upstream/RELEASE_next_minor
HEAD is now at 400250823 Merge pull request #2488 from sempicor/RELEASE_next_minor

C:\Users\Owner\Documents\GitHub\hyperspy>git push origin RELEASE_next_minor --force
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/erh3cq/hyperspy.git
 + e98543815...400250823 RELEASE_next_minor -> RELEASE_next_minor (forced update)

I checked and my erh3cq/RELEASE_next_minor (origin) matches the one in hyperspy/hyperspy (upstream) but not the one you mentioned above. It looks like that is from a different branch than RELEASE_next_minor. Are we supposed to be using a different base branch now?

@ericpre I don't have an MWE, but I pasted the traceback on Gitter last week. Recopied below:
In:

SI = hs.load(file+'.emd', select_type='spectrum_image', sum_frames=False, sum_EDS_detectors=True, SI_dtype=float, lazy=True)
SI.decomposition(algorithm='svd')

Out:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-0b692a368f27> in <module>
----> 1 SI.decomposition(algorithm='svd')

~\Programs\Anaconda\envs\hyperspy\lib\site-packages\hyperspy\_signals\eds_tem.py in decomposition(self, normalize_poissonian_noise, navigation_mask, closing, *args, **kwargs)
    527         if isinstance(navigation_mask, float):
    528             navigation_mask = self.vacuum_mask(navigation_mask, closing).data
--> 529         super().decomposition(
    530             normalize_poissonian_noise=normalize_poissonian_noise,
    531             navigation_mask=navigation_mask, *args, **kwargs)

~\Programs\Anaconda\envs\hyperspy\lib\site-packages\hyperspy\_signals\lazy.py in decomposition(self, normalize_poissonian_noise, algorithm, output_dimension, signal_mask, navigation_mask, get, num_chunks, reproject, bounds, **kwargs)
    784                     self._unfolded4decomposition = self.unfold()
    785                     # TODO: implement masking
--> 786                     if navigation_mask or signal_mask:
    787                         raise NotImplemented(
    788                             "Masking is not yet implemented for lazy SVD."

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

@ericpre (Member) commented Dec 28, 2020

In the meantime, you can work around this issue by passing navigation_mask=None in the decomposition call.

@erh3cq (Contributor) commented Dec 29, 2020

@ericpre That fixed the mask error but raised the following:

C:\Users\Owner\Programs\Anaconda\envs\hyperspy\lib\site-packages\dask\array\core.py:3901: PerformanceWarning: Increasing number of chunks by factor of 24
  result = blockwise(
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-77a8fb05694c> in <module>
----> 1 SI.decomposition(algorithm='svd', navigation_mask=None)

~\Programs\Anaconda\envs\hyperspy\lib\site-packages\hyperspy\_signals\eds_tem.py in decomposition(self, normalize_poissonian_noise, navigation_mask, closing, *args, **kwargs)
    527         if isinstance(navigation_mask, float):
    528             navigation_mask = self.vacuum_mask(navigation_mask, closing).data
--> 529         super().decomposition(
    530             normalize_poissonian_noise=normalize_poissonian_noise,
    531             navigation_mask=navigation_mask, *args, **kwargs)

~\Programs\Anaconda\envs\hyperspy\lib\site-packages\hyperspy\_signals\lazy.py in decomposition(self, normalize_poissonian_noise, algorithm, output_dimension, signal_mask, navigation_mask, get, num_chunks, reproject, bounds, **kwargs)
    788                             "Masking is not yet implemented for lazy SVD."
    789                         )
--> 790                     U, S, V = svd(self.data)
    791                     factors = V.T
    792                     explained_variance = S ** 2 / self.data.shape[0]

~\Programs\Anaconda\envs\hyperspy\lib\site-packages\dask\array\linalg.py in svd(a)
    805     dask.array.linalg.tsqr: Implementation for tall-and-skinny arrays
    806     """
--> 807     return tsqr(a, compute_svd=True)
    808 
    809 

~\Programs\Anaconda\envs\hyperspy\lib\site-packages\dask\array\linalg.py in tsqr(data, compute_svd, _max_vchunk_size)
    108 
    109     if not (data.ndim == 2 and nc == 1):  # Is a matrix  # Only one column block
--> 110         raise ValueError(
    111             "Input must have the following properties:\n"
    112             "  1. Have two dimensions\n"

ValueError: Input must have the following properties:
  1. Have two dimensions
  2. Have only one column of blocks

Note: This function (tsqr) supports QR decomposition in the case of
tall-and-skinny matrices (single column chunk/block; see qr)Current shape: (2710016, 4096),
Current chunksize: (308736, 64)

@erh3cq (Contributor) commented Dec 29, 2020

"Unchunking" the spectral axis as suggested by @pquinn-dls solves the last error.

@francisco-dlp (Member) commented Mar 9, 2021

Fixed in #2657
