
Improve GPCCA.optimize(): Change complex conjugate splitting handling etc #13

Merged (46 commits, Jan 29, 2021)

Conversation

@msmdev (Owner) commented Jan 25, 2021

Currently it is not possible to optimize for cluster numbers m in [m_min, m_max] if the interval contains cluster numbers m that would lead to splitting of complex conjugate eigenvalues, since this raises a ValueError. This renders GPCCA.optimize() dysfunctional for some typical use cases.
Thus I made changes to adapt this behavior: now it only raises if we are optimizing for a single cluster number m and this m would lead to splitting, or if m_max would cause splitting.
Further, I added some properties, like a vector crispness_values of all crispness values for m in [m_min, m_max] and a vector eigenvalues of all eigenvalues associated with m in [m_min, m_max].
Additionally, I changed some property names, e.g. cluster_crispness -> n_macrostates_crispness and n_metastable -> n_macrostates.
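The skip-instead-of-raise behavior hinges on detecting when a cut at m would separate a complex-conjugate pair. A minimal sketch of such a check (the helper name is hypothetical; the actual check in pyGPCCA lives inside the Schur decomposition/sorting routines):

```python
import numpy as np

def splits_conjugate_pair(eigenvalues, m):
    """Return True if keeping only the first m eigenvalues would separate
    a complex-conjugate pair. Illustrative helper, not pyGPCCA code."""
    lam_m, lam_next = eigenvalues[m - 1], eigenvalues[m]
    return bool(np.iscomplex(lam_m) and np.isclose(lam_m, np.conj(lam_next)))

# Conjugate pair sits at positions 2 and 3 (0-based):
evals = np.array([1.0, 0.9, 0.5 + 0.1j, 0.5 - 0.1j, 0.3])
splits_conjugate_pair(evals, 3)  # True: m=3 cuts between the pair
splits_conjugate_pair(evals, 4)  # False: the pair stays together
```

With a check like this, optimize() can warn and skip an offending m inside [m_min, m_max], and only raise when the offending m is the single requested cluster number or equals m_max.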

@michalk8 could you please take a look at this and tell me what you think?
@Marius1311 I know you are very busy, but could you maybe also take a quick look, since this would be a significant change?

@msmdev requested a review from michalk8 January 25, 2021 12:43
@msmdev (Owner) commented Jan 25, 2021

Hmm, why is doc building failing? Locally it works fine for me... mb. the example-notebook.ipynb...

@michalk8 (Collaborator)

> Hmm, why is doc building failing? Locally it works fine for me... maybe the example-notebook.ipynb...

If you have any warnings, it's set to fail on RTD. This is changed in #12 (in .readthedocs.yml, set fail_on_warning: false). With this, it warns me with:
Pygments lexer name 'ipython3' is not known
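The Read the Docs option mentioned above corresponds to this fragment of .readthedocs.yml (a sketch of the relevant key only; the rest of the file is omitted):

```yaml
# Stop Read the Docs from failing the build on Sphinx warnings.
sphinx:
  fail_on_warning: false
```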

@Marius1311 (Collaborator)

Will try to go over this tomorrow!

@Marius1311 (Collaborator) left a comment

I think it's really important that we have tests for this. Also, @msmdev , you mentioned that this would still fail if m_max from the given range splits complex conjugates, has that been resolved?


@property # type: ignore[misc]
@d.dedent
def crispness_values(self) -> OArray:
Collaborator:

Maybe a better naming convention is needed here, you have top_eigenvalues <-> n_macrostates_crispness and eigenvalues <-> crispness_values, but it's not very obvious from the naming that one relates to the optimal cluster number whereas the other is an iterable over a range of cluster numbers.

@msmdev (Owner):

Yes, it is difficult here, since we also accept intervals containing only one cluster number, and I wouldn't really call that "optimal" then...
But yes, strictly speaking even then it would be the optimal number out of an "interval" with only one number in it.
I didn't really have good ideas here; maybe you guys @Marius1311 @michalk8 have nicer ones in mind?

@msmdev (Owner):

Tried to settle this here.
@Marius1311 what is your opinion on this?

Collaborator:

Sorry, I don't get it, what's the new naming convention you propose?

@msmdev (Owner):

I not only renamed some properties, but also reordered all of them. Maybe take a look here.

@msmdev (Owner) commented Jan 26, 2021

> I think it's really important that we have tests for this. Also, @msmdev, you mentioned that this would still fail if m_max from the given range splits complex conjugates, has that been resolved?

Both GPCCA.optimize and GPCCA.minChi fail (also in the main branch) if m_max causes a split, since _do_schur_helper and _sorted_schur raise if splitting occurs.
I don't see a way to change this, since computing a partial Schur decomposition, as well as sorting a Schur form, for a dimension splitting a complex eigenvalue pair will fail.

@Marius1311 (Collaborator)

> I think it's really important that we have tests for this. Also, @msmdev, you mentioned that this would still fail if m_max from the given range splits complex conjugates, has that been resolved?

> Both GPCCA.optimize and GPCCA.minChi fail (also in the main branch) if m_max causes a split, since _do_schur_helper and _sorted_schur raise if splitting occurs.
> I don't see a way to change this, since computing a partial Schur decomposition, as well as sorting a Schur form, for a dimension splitting a complex eigenvalue pair will fail.

I think that's okay for now.

@msmdev (Owner) commented Jan 26, 2021

@Marius1311 @michalk8 currently (since one of the commits I made today) I get a lot of strange errors when trying to optimize for any interval of cluster numbers for the toy matrix from ML-2020-10-17_GPCCA_toy_example.ipynb.
Before that everything worked fine...
I guess I have to dive in to find the reason...

@msmdev (Owner) commented Jan 26, 2021

> @Marius1311 @michalk8 currently (since one of the commits I made today) I get a lot of strange errors when trying to optimize for any interval of cluster numbers for the toy matrix from ML-2020-10-17_GPCCA_toy_example.ipynb.
> Before that everything worked fine...
> I guess I have to dive in to find the reason...

@Marius1311 @michalk8 this seems to originate from using np.nan, since np.argmax here will take np.nan as the maximum if it occurs.

Any suggestions what to do instead?
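The np.argmax pitfall described above is easy to reproduce (a minimal sketch, not pyGPCCA code):

```python
import numpy as np

# Every comparison with NaN is False, so once np.argmax encounters a NaN
# it is never displaced: the NaN's index wins.
crispness = np.array([0.5, np.nan, 0.9])
print(np.argmax(crispness))     # 1 -- the NaN, not the true maximum
print(np.nanargmax(crispness))  # 2 -- ignores NaN entries
```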

@msmdev (Owner) commented Jan 26, 2021

np.nanargmax might do the job, but I don't like the following warning in the docs:
"Warning: the results cannot be trusted if a slice contains only NaNs and -Infs."

@msmdev (Owner) commented Jan 26, 2021

> np.nanargmax might do the job, but I don't like the following warning in the docs:
> "Warning: the results cannot be trusted if a slice contains only NaNs and -Infs."

Ok, this helps to get rid of some errors, but now there are runtime warnings

/home/breuter/g-pcca/pyGPCCA/pygpcca/_gpcca.py:994: UserWarning: Clustering into 4 clusters will split complex conjugate eigenvalues. Skipping clustering into 4 clusters.
  f"Clustering into {m} clusters will split complex conjugate eigenvalues. "
/home/breuter/g-pcca/pyGPCCA/pygpcca/_gpcca.py:1025: RuntimeWarning: invalid value encountered in greater
  if np.any(np.array(crispness_list) > 0.0):

Using NaN is really inconvenient.

@Marius1311 shouldn't 7 be the first m to trigger a splitting warning for your toy example? Here it already warns at 4.

@michalk8 (Collaborator)

> Using NaN is really inconvenient.

@msmdev I'd then use -1 or something which is not in the [0, 1] interval.

@msmdev (Owner) commented Jan 26, 2021

> Using NaN is really inconvenient.

> @msmdev I'd then use -1 or something which is not in the [0, 1] interval.

Ok for crispness and chi, but not for rot_matrix, since negative values are legal there.
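For crispness the sentinel idea looks like this (a sketch; variable names are illustrative):

```python
import numpy as np

# -1 is a safe sentinel for skipped cluster numbers because crispness
# lives in [0, 1]; np.argmax then never selects a skipped entry,
# as long as at least one valid value is present.
crispness = np.array([0.5, -1.0, 0.9])  # -1.0 marks a skipped m
print(np.argmax(crispness))  # 2 -- the valid maximum
```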

@msmdev (Owner) commented Jan 29, 2021

@michalk8 any idea what's the reason for the following warning?

/Users/runner/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
      return f(*args, **kwds)

@msmdev (Owner) commented Jan 29, 2021

@michalk8 thank you very much!
Anyway, it is kind of concerning that mu=0 is again not working on macOS - I thought we had fixed it.

@michalk8 have you tried the following regarding the failing tests on macOS?:

  • change dtype of the elements of p from np.complex128 to np.complex64 here;
  • change type of p back to list here.

@michalk8 mentioned this pull request Jan 29, 2021
@michalk8 (Collaborator)

@msmdev not yet, doing so now. However, even if it works on c64/list, I think it's too brittle to consider this a fix.

@michalk8 (Collaborator)

@msmdev changing to c64 and removing the skipped tests works. Changing to list does not.

@msmdev (Owner) commented Jan 29, 2021

> @msmdev changing to c64 and removing the skipped tests works. Changing to list does not.

Ok, this is strange... let me think about this for a moment...
Maybe meanwhile you could check in #17 whether the matrix is indeed always near singular for mu=0?

@msmdev (Owner) commented Jan 29, 2021

@michalk8 after having done the final fixes, could you please accept the changes?
Then I can merge and deploy - or did I forget anything?

@michalk8 (Collaborator)

Done. The tests should pass (I've added a test which checks for the np.linalg.LinAlgError using the matrices in #17).
After they pass, you can merge and deploy (hopefully it goes smoothly).
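The shape of such a regression check, sketched with a generic singular matrix (the actual problematic matrices come from #17 and are not reproduced here):

```python
import numpy as np

# Placeholder input: any exactly singular matrix triggers the same error
# class (np.linalg.LinAlgError) that the regression test checks for.
singular = np.zeros((3, 3))
try:
    np.linalg.solve(singular, np.ones(3))
    raised = False
except np.linalg.LinAlgError:
    raised = True
print(raised)  # True
```

In the test suite this would typically be written with pytest.raises(np.linalg.LinAlgError) around the failing call.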

@msmdev (Owner) commented Jan 29, 2021

Cool, thank you again @michalk8 for your excellent work!
I am very happy that we could track the macOS complex issue down to the source - otherwise I would have been reluctant to deploy.

@msmdev (Owner) commented Jan 29, 2021

@michalk8 hmmm, it fails on macOS because the RuntimeError in the linalg error test is not raised x.x

@michalk8 (Collaborator) commented Jan 29, 2021

> @michalk8 hmmm, it fails on macOS because the RuntimeError in the linalg error test is not raised x.x

I wanted to be clever by parametrizing z; it wasn't a good idea. LM should be the correct one.
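For context, "LM" here refers to the sorting target "largest magnitude" used when sorting eigenvalues/Schur forms. Its effect can be sketched with plain NumPy (illustrative only, not the pyGPCCA sorting code):

```python
import numpy as np

# "LM" = order by largest magnitude |lambda|, as opposed to e.g. "LR"
# (largest real part).
evals = np.array([0.2, -0.9, 0.5 + 0.5j])
lm_order = np.argsort(-np.abs(evals))  # indices sorted by |lambda| descending
print(evals[lm_order])  # magnitudes: 0.9, ~0.707, 0.2
```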

@msmdev (Owner) commented Jan 29, 2021

@michalk8 again test_sort_real_schur_linalg_error failing on mac...

@michalk8 (Collaborator)

> I wanted to be clever by parametrizing z; it wasn't a good idea. LM should be the correct one.

Ok, I think I will remove the test - in #17, I've swapped the Python versions so that it's more comparable; here it stays the same (so that we have PETSc/SLEPc coverage on 3.7 and 3.8). The matrix there was only py37...

Regarding

> @michalk8 just recognized that there are some tests completely based on mu=0, like test_schur_b_pos, so maybe complete removal is not practical.

I removed the mu=0 cases that seem to be problematic (number of macrostates={8,9}); for 3/4 macrostates in other tests, this doesn't seem to be a problem. If you want, I can change the mu(0) calls to e.g. mu(10), but then this begs the question: what about the ground truth, e.g. in test_cluster_by_isa, where we have the memberships for mu=0 and all the GT vectors for mu0 in conftest?

@msmdev (Owner) commented Jan 29, 2021

@michalk8 another issue: I tried to run the example from the docs with binder. It loads and everything, but fails when I try to execute the first cell to import the packages (matplotlib, numpy, gpcca), because it can't find matplotlib - numpy and gpcca are found.

@msmdev (Owner) commented Jan 29, 2021

> I wanted to be clever by parametrizing z; it wasn't a good idea. LM should be the correct one.

> Ok, I think I will remove the test - in #17, I've swapped the Python versions so that it's more comparable; here it stays the same (so that we have PETSc/SLEPc coverage on 3.7 and 3.8). The matrix there was only py37...

> Regarding

> @michalk8 just recognized that there are some tests completely based on mu=0, like test_schur_b_pos, so maybe complete removal is not practical.

> I removed the mu=0 cases that seem to be problematic (number of macrostates={8,9}); for 3/4 macrostates in other tests, this doesn't seem to be a problem. If you want, I can change the mu(0) calls to e.g. mu(10), but then this begs the question: what about the ground truth, e.g. in test_cluster_by_isa, where we have the memberships for mu=0 and all the GT vectors for mu0 in conftest?

I think we should stay with mu=0 as far as possible, since there is in principle nothing wrong with it and the issue with macOS was merely technical in my opinion.

@michalk8 (Collaborator)

> @michalk8 another issue: I tried to run the example from the docs with binder. It loads and everything, but fails when I try to execute the first cell to import the packages (matplotlib, numpy, gpcca), because it can't find matplotlib - numpy and gpcca are found.

Can confirm - the binder requirements are not executed - I think it gets the requirements.txt instead of docs/source/binder/requirements (will take a look into this), but on master matplotlib is specified there.

> I think we should stay with mu=0 as far as possible, since there is in principle nothing wrong with it and the issue with macOS was merely technical in my opinion.

Agreed.

@msmdev (Owner) commented Jan 29, 2021

@michalk8 should I wait with merging until the binder issue is fixed?

@michalk8 (Collaborator)

Yes please, almost there!

@msmdev (Owner) commented Jan 29, 2021

sure! :D

@michalk8 (Collaborator)

Ok, done. Once this is merged, the binder should work (it will pip install from master, since we're not on PyPI; plus, the docs always show the latest release, so I think it's fine).
Feel free to merge.

@msmdev (Owner) commented Jan 29, 2021

Alright! Let's go with it then!

@msmdev merged commit 7006afb into main Jan 29, 2021
@msmdev (Owner) commented Jan 29, 2021

> Ok, done. Once this is merged, the binder should work (it will pip install from master, since we're not on PyPI; plus, the docs always show the latest release, so I think it's fine).
> Feel free to merge.

@michalk8 yup, binder works smoothly now! :)

@Marius1311 deleted the improve_optimize branch September 21, 2021 13:07