This repository has been archived by the owner on Nov 26, 2023. It is now read-only.

KShape bug #61

Closed
pr4deepr opened this issue Jul 21, 2021 · 11 comments
Labels
bug Something isn't working

Comments

@pr4deepr
Contributor

Describe the bug
I am using k-Shape Clustering to determine whether I can find clusters in my traces based on the shapes of the peaks. I get the following error when I click Start.

D:\Anaconda3\envs\mesmerize\lib\site-packages\sklearn\utils\deprecation.py:143: FutureWarning: The sklearn.cluster.k_means_ module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.
  warnings.warn(message, FutureWarning)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "D:\Anaconda3\envs\mesmerize\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\Anaconda3\envs\mesmerize\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\Anaconda3\envs\mesmerize\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "d:\anaconda3\envs\mesmerize\lib\site-packages\mesmerize\plotting\widgets\kshape\kshape_process.py", line 16, in <module>
    from mesmerize.common.configuration import HAS_TSLEARN, get_sys_config
  File "D:\Anaconda3\envs\mesmerize\lib\site-packages\mesmerize\__init__.py", line 1, in <module>
    from .analysis import *
  File "D:\Anaconda3\envs\mesmerize\lib\site-packages\mesmerize\analysis\__init__.py", line 3, in <module>
    from .math import cross_correlation, drfft_dtw, tvregdiff
  File "D:\Anaconda3\envs\mesmerize\lib\site-packages\mesmerize\analysis\math\drfft_dtw.py", line 17, in <module>
    raw_curve: np.ndarray = None, rf_curve: np.ndarray = None) -> list:
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\context.py", line 56, in Manager
    m.start()
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\managers.py", line 513, in start
    self._process.start()
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

===== 2021.07.21 10:07:11 =====

I followed your video to perform peak detection and got to the point where I had peak features. I can successfully view different peak parameters using the BeeSwarm Plot. When I go back to the flowchart, connect K-shape to the Peak Detect node, and try the clustering with the default options, I get the error above. This happens for any peak parameter I select in the data column.


On a related note for K-shape, when I try _pf_peak_curve in the data column, I get this error:

Traceback (most recent call last):
  File "d:\anaconda3\envs\mesmerize\lib\site-packages\mesmerize\common\qdialogs.py", line 52, in fn
    return func(self, *args, **kwargs)
  File "d:\anaconda3\envs\mesmerize\lib\site-packages\mesmerize\plotting\widgets\kshape\widget.py", line 713, in start_process
    padded = self.pad_input_data(self.input_arrays, method='fill-size')
  File "d:\anaconda3\envs\mesmerize\lib\site-packages\mesmerize\plotting\widgets\kshape\widget.py", line 664, in pad_input_data
    s = c.size
AttributeError: 'float' object has no attribute 'size'

Operating System & specs (CPU, RAM etc.). Please complete the following information:

  • Windows 10, 32 GB RAM, AMD Ryzen 9 5900X 12-Core

Details about your Mesmerize install

  • Anaconda navigator
  • Python 3.6
  • I had a lot of trouble with the Windows installation, especially with pandas version conflicts and caiman taking too long to install, and eventually settled on this:
    • Use this first: conda create -n mesmerize -c conda-forge caiman pandas~=0.25.3 python=3.6
    • Follow the rest of the instructions on the Mesmerize docs website, except for the caiman and pandas installation.

Thanks for developing and supporting Mesmerize.

Cheers
Pradeep

@pr4deepr pr4deepr added the bug Something isn't working label Jul 21, 2021
@kushalkolar
Owner

@pr4deepr thanks for providing details!

The freeze issue has to do with a peculiarity of Windows w.r.t. starting processes (Windows cannot fork, so child processes are spawned fresh). I have to figure out which module(s) to protect with an if __name__ == '__main__' guard to solve this. If you know what I'm talking about, you can try protecting mesmerize\analysis\math\drfft_dtw.py with an if __name__ == '__main__' guard. I'm currently travelling, so I could try in a few days.
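The traceback points at drfft_dtw.py line 17, the last line of a function signature, and then straight into multiprocessing's Manager(). That suggests (an assumption, not checked against the mesmerize source) that a Manager is created while the module is being imported, for example as a default argument value. Python evaluates default arguments once, when the def statement runs, which for a module-level function means import time; under spawn, the child re-imports mesmerize and hits the same call before it has finished bootstrapping. A minimal sketch of the effect and of the usual fix, where make_resource, eager and lazy are hypothetical names and make_resource stands in for Manager():

```python
from typing import List, Optional

# Records every time the (hypothetical) expensive resource is created.
created: List[str] = []


def make_resource() -> dict:
    """Stand-in for an expensive call such as multiprocessing.Manager()."""
    created.append("resource")
    return {}


# Anti-pattern: this default is evaluated once, when the `def` statement runs.
# For a module-level function that is import time, so merely importing the
# module creates the resource (with a real Manager, it starts a process).
def eager(data: list, store: dict = make_resource()) -> dict:
    store["n"] = len(data)
    return store


# Safer: use None as a sentinel and create the resource only when called.
def lazy(data: list, store: Optional[dict] = None) -> dict:
    if store is None:
        store = make_resource()
    store["n"] = len(data)
    return store


# Defining `eager` already created one resource; `lazy` has created none yet.
created_at_import = len(created)
```

Creating the Manager inside the function that needs it (or in the entry script, under the guard) means that importing the module no longer starts a process.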

I suspect that your other issue ('float' object has no attribute 'size') happens because some of your peak curves are NaNs for some reason. If you open the peak editor GUI, make sure that each peak is flanked by a base on either side.
The DropNa node will let you drop NaNs:
http://docs.mesmerizelab.org/en/master/user_guides/flowchart/nodes.html#dropna

I think that if you set axis to _pf_peak_curve and how to any, it should remove the NaNs from the _pf_peak_curve data. Play around with the settings, because I don't remember exactly.
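The 'float' object has no attribute 'size' error is consistent with this: pandas represents a missing value as the float NaN, so a row with a missing peak curve holds a float instead of an array, and pad_input_data fails at c.size. A minimal pandas sketch of what the DropNa node is expected to do, assuming one array per row in _pf_peak_curve. The toy data and the curve_id column are hypothetical, and this calls pandas.DataFrame.dropna directly; the node's own options may map onto it differently.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one peak curve per row, one of them missing.
curves = pd.Series(
    [np.array([0.0, 1.0, 0.2]), np.nan, np.array([0.1, 0.8])],
    dtype=object,
)
df = pd.DataFrame({"curve_id": ["a", "b", "c"], "_pf_peak_curve": curves})

# The missing curve is a float NaN, which has no `.size` attribute.
missing = df["_pf_peak_curve"].isna()
print(df.loc[missing, "curve_id"].tolist())  # ['b']

# Drop rows whose peak curve is missing before padding/clustering.
clean = df.dropna(axis=0, how="any", subset=["_pf_peak_curve"])
sizes = [c.size for c in clean["_pf_peak_curve"]]
print(sizes)  # [3, 2]
```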

For the installation, yeah, pandas, h5py, numpy and tensorflow are in a bit of a compatibility mess at the moment w.r.t. versions.

mamba might help reduce the installation time, see:
#53 (comment)

@pr4deepr
Contributor Author

Thanks heaps for the feedback.

  • I may try using module protection as you mentioned.

  • The dropna node is really useful, but I'll make sure my peaks are flanked by bases. Yes, you are right, there are NaNs.

  • I'll try the _pf_peak_curve tip you mentioned. Hoping that helps.

  • Will give mamba a go. With the command I mentioned it installs reasonably quickly, but perhaps mamba is the long term solution!!

Happy travels. Glad you can at least travel; I don't think we can do that anytime soon! 😷

@pr4deepr
Contributor Author

pr4deepr commented Jul 22, 2021

So, I went to mesmerize\analysis\__init__.py and added this part:

if __name__ == '__main__':
    from .math import drfft_dtw

It seemed to fix the issue.

One of my peaks didn't have a base. Is there an easy way to detect which cell/trace has it? In the console, I saw this line:

\anaconda3\envs\mesmerize\lib\site-packages\mesmerize\analysis\compute_peak_features.py:106: UserWarning: Peak at curve index: <0> is not flanked by bases on both sides, ignoring
  warn(f"Peak at curve index: <{p_ix}> is not flanked by bases on both sides, ignoring")

The dropna node suggestion worked well.
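The warning seems to report only an index (p_ix in the source line shown above), not which trace it came from. One generic way to find the trace is to run the per-trace computation inside warnings.catch_warnings(record=True) and store the trace's identifier next to each captured message. A sketch assuming a hypothetical per-trace function peak_features that imitates the warning text above (it is not mesmerize's real API), with hypothetical trace IDs:

```python
import warnings
from typing import Dict, List, Tuple


def peak_features(peaks: List[int], bases: List[int]) -> List[int]:
    """Hypothetical stand-in: keep only peaks with a base on both sides."""
    kept = []
    for p_ix, peak in enumerate(peaks):
        if any(b < peak for b in bases) and any(b > peak for b in bases):
            kept.append(peak)
        else:
            warnings.warn(
                f"Peak at curve index: <{p_ix}> is not flanked by bases on both sides, ignoring"
            )
    return kept


# Toy traces (hypothetical): cell_2's only peak has no base before it.
traces: Dict[str, dict] = {
    "cell_1": {"peaks": [10, 30], "bases": [5, 20, 40]},
    "cell_2": {"peaks": [8], "bases": [12]},
}

problems: List[Tuple[str, str]] = []
for trace_id, t in traces.items():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # don't suppress repeated warnings
        peak_features(t["peaks"], t["bases"])
    for w in caught:
        problems.append((trace_id, str(w.message)))

for trace_id, msg in problems:
    print(trace_id, msg)
```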

@kushalkolar
Owner

kushalkolar commented Jul 22, 2021 via email

@pr4deepr
Contributor Author

pr4deepr commented Jul 22, 2021

Actually, scrap that previous comment; it doesn't make sense.

@pr4deepr
Contributor Author

pr4deepr commented Jul 22, 2021

So, is this good practice?
Wrap the drfft_dtw code in a main() function and then add this at the end of the module?

if __name__ == '__main__':
    main()
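The main() plus guard layout is the standard idiom for code that should run only when a file is executed as a script. Under spawn, a child re-imports the parent's script under the name "__mp_main__" (the run_name visible in the traceback), so the guarded block is skipped there. Note that the guard only protects code in the file being run; it does not stop module-level code in an imported package from executing on import. The sketch below loads the same hypothetical module source under three names. Its main() only sets a flag, standing in for code that would start a Manager or Pool.

```python
import importlib.util
import tempfile
import textwrap
from pathlib import Path

# Hypothetical module using the main() + guard layout.
MODULE_SOURCE = textwrap.dedent('''
    RAN_MAIN = False

    def main():
        global RAN_MAIN
        RAN_MAIN = True  # stands in for starting a Manager or Pool

    if __name__ == "__main__":
        main()
''')


def load_as(name: str, path: Path):
    """Execute the file at `path` as a module whose __name__ is `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "guard_example.py"
    path.write_text(MODULE_SOURCE)
    as_import = load_as("guard_example", path)  # like `import guard_example`
    as_script = load_as("__main__", path)       # like `python guard_example.py`
    as_child = load_as("__mp_main__", path)     # how a spawned child re-imports it

print(as_import.RAN_MAIN, as_script.RAN_MAIN, as_child.RAN_MAIN)  # False True False
```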

@kushalkolar
Owner

I just took a look at that module, and I think it'd be better placed in the mesmerize_manuscripts_repo than within mesmerize, since it doesn't exactly fit anywhere within mesmerize itself. I'll reorganize it in a few days and maybe make a new release, but for now you should be able to safely remove that module's import from mesmerize.analysis.__init__.
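This also explains why the earlier workaround of putting the import under if __name__ == '__main__': in mesmerize\analysis\__init__.py made the error go away. Inside a package's __init__.py, __name__ is the package's dotted name (e.g. mesmerize.analysis), never '__main__', so the guarded import never runs, which is equivalent to removing it. A self-contained check using a throwaway package; demo_pkg and IMPORTED_HEAVY are hypothetical names.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical __init__.py with a guarded import, as in the workaround.
INIT_SOURCE = '''
IMPORTED_HEAVY = False
if __name__ == "__main__":
    IMPORTED_HEAVY = True  # stands in for `from .math import drfft_dtw`
'''

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(INIT_SOURCE)

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new files are visible
    try:
        pkg = importlib.import_module("demo_pkg")
    finally:
        sys.path.remove(tmp)

print(pkg.__name__)        # demo_pkg
print(pkg.IMPORTED_HEAVY)  # False: the guarded line never ran
```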

@pr4deepr
Contributor Author

Thanks for that.
Also, is there updated documentation or video available for KShapes with the new options?
Thanks for all the tutorial videos you posted; they were extremely helpful! If it helps, I attended your I2K tutorial last year, which is how I got onto using MESmerize.

@kushalkolar
Owner

@pr4deepr
I haven't updated the kshape docs with the gridsearch options. Here's an overview from an internal email:

I added a gridsearch feature to the kshape clustering GUI in mesmerize. It allows you to select a partition range, npart rng, & a combination number, ncombs.

npart rng is a range of partition values for the “search space”.
In each iteration of the gridsearch, it will sort the data (by either peak width or amplitude, see the sortby param), and randomly select cluster seeds from each partition.
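A rough sketch of one seeding step as described: sort the curves by a feature (peak amplitude here), split the sorted order into n_clusters partitions, and draw one random curve from each partition as a cluster seed. This is an interpretation of the description, not the mesmerize implementation, and partition_seeds is a hypothetical name. Since mesmerize checks HAS_TSLEARN (see the first traceback), the fit itself presumably uses tslearn's KShape, whose init parameter accepts an array of initial centroids shaped (n_clusters, ts_size, d); the last line produces that shape.

```python
import numpy as np


def partition_seeds(curves: np.ndarray, n_clusters: int, rng: np.random.Generator) -> np.ndarray:
    """Return the index of one randomly chosen curve per partition.

    curves: 2D array (n_curves, n_timepoints), already padded to equal length.
    """
    # Sort curves by peak amplitude (stand-in for the `sortby` option).
    order = np.argsort(curves.max(axis=1))
    # Split the sorted indices into n_clusters contiguous partitions.
    partitions = np.array_split(order, n_clusters)
    # Draw one seed index from each partition.
    return np.array([rng.choice(part) for part in partitions])


rng = np.random.default_rng(0)
curves = rng.random((12, 20))  # toy data: 12 curves, 20 timepoints each
seeds = partition_seeds(curves, n_clusters=3, rng=rng)
init = curves[seeds][:, :, np.newaxis]  # shape (3, 20, 1)
```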

ncombs is the number of random cluster seed combinations to try for each partition value.

Note that ncombs is an exponent: the number of combinations is 10^ncombs, so the default value of 2 will do 100 combinations.

Unlike single kShape, the gridsearch is multithreaded, so it will run the npartitions * ncombs kShape iterations simultaneously, using as many threads as you’ve set in your system configuration.
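Reading ncombs as the exponent from the note above, the search space has len(npart rng) × 10^ncombs entries, one k-Shape fit each. A sketch that enumerates that grid with itertools.product and runs it on a thread pool sized like the system configuration. fit_one is a hypothetical placeholder for a single seeded fit and returns a made-up value, not a real inertia; the parameter values are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Tuple


def fit_one(job: Tuple[int, int]) -> Tuple[int, int, float]:
    """Hypothetical placeholder for one seeded k-Shape fit."""
    n_clusters, combo_ix = job
    placeholder_inertia = float(n_clusters + combo_ix)  # not a real result
    return n_clusters, combo_ix, placeholder_inertia


npart_rng = range(2, 6)  # n_clusters values to try (illustrative)
ncombs = 2               # exponent: 10**2 = 100 seed combinations per value
n_threads = 4            # would come from the system configuration

# One job per (n_clusters, seed combination) pair.
jobs = list(product(npart_rng, range(10 ** ncombs)))
with ThreadPoolExecutor(max_workers=n_threads) as pool:
    results = list(pool.map(fit_one, jobs))  # results keep the order of `jobs`

print(len(jobs))  # 400 = 4 values of n_clusters x 100 combinations
```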

When it’s done you’ll get a heatmap like this (but bigger). n_clusters (i.e. npartitions) are along the y-axis (labels are color coded), and seeded combinations are along the x-axis. The heatmap visualizes the inertia of each k-Shape model. The inertia is the within-cluster sum of squares, so the smaller the inertia value, the tighter the clusters. You can click on the squares in the heatmap to visualize that model in the rest of the kshape GUI.

Note: If you close the heatmap, you’ll have to call this.kga_inertia_heatmap.show() in the console to get it back; I haven’t made a button yet.

You shouldn't necessarily pick the model with the lowest inertia, but it will help narrow down the choice of a suitable model. Some models with very low inertia might have empty cluster(s), which skews the inertia value; these models should be avoided.
=> I think that in the current state of the heatmap, models that have empty clusters are indicated with a specific color so you can easily avoid them (I think it makes them white?).
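To make the selection advice concrete, a sketch that computes inertia from each model's labels and centroids, flags models with an empty cluster, and ranks the remaining ones. It uses plain squared Euclidean distance for simplicity, whereas k-Shape uses a shape-based distance derived from normalized cross-correlation, so these numbers will not match the heatmap; the models are hypothetical toy data. Model B has exactly the same inertia as A but an empty cluster, so it is dropped.

```python
import numpy as np


def inertia(X: np.ndarray, labels: np.ndarray, centroids: np.ndarray) -> float:
    """Within-cluster sum of squared Euclidean distances (stand-in for the k-Shape distance)."""
    return float(np.sum((X - centroids[labels]) ** 2))


def has_empty_cluster(labels: np.ndarray, n_clusters: int) -> bool:
    """True if at least one cluster has no members."""
    return bool(np.any(np.bincount(labels, minlength=n_clusters) == 0))


# Toy data (hypothetical): four 2-point "curves" and three candidate models.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
models = {
    "A": (np.array([0, 0, 1, 1]), np.array([[0.0, 0.5], [5.0, 5.5]])),
    "B": (np.array([0, 0, 1, 1]), np.array([[0.0, 0.5], [5.0, 5.5], [9.0, 9.0]])),  # cluster 2 empty
    "C": (np.array([0, 1, 1, 1]), np.array([[0.0, 0.0], [5.0, 5.0]])),
}

scores = {}
for name, (labels, centroids) in models.items():
    scores[name] = (inertia(X, labels, centroids), has_empty_cluster(labels, len(centroids)))

# Drop models with empty clusters, then rank the rest by inertia (lowest first).
candidates = sorted((value, name) for name, (value, empty) in scores.items() if not empty)
print(candidates)  # [(1.0, 'A'), (42.0, 'C')]
```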

You could google "gridsearch parameters" to learn about what a gridsearch is.

Glad to hear you found the tutorials and I2K workshop helpful! :)

@pr4deepr
Contributor Author

Thanks for this detailed comment.
I'll be testing it out and will post here or in gitter chat if anything is unclear!!

@kushalkolar
Owner

Closing due to inactivity.

Development

No branches or pull requests

2 participants