
Improve GPCCA.optimize(): Change complex conjugate splitting handling etc #13

Merged (46 commits, Jan 29, 2021)

Conversation

@msmdev (Owner) commented Jan 25, 2021

Currently it is not possible to optimize for cluster numbers m in [m_min, m_max] if the interval contains cluster numbers m that would lead to splitting of complex conjugate eigenvalues, since this raises a ValueError. This renders GPCCA.optimize() dysfunctional for some typical use cases.
Thus I made changes to adapt this behavior: now it only raises if we are optimizing for a single cluster number m and this m would lead to splitting, or if m_max would cause splitting.
Further, I added some properties, like a vector crispness_values of all crispness values for m in [m_min, m_max] and a vector eigenvalues of all eigenvalues associated with m in [m_min, m_max].
Additionally, I changed some property names, e.g. cluster_crispness -> n_macrostates_crispness and n_metastable -> n_macrostates.
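The skip-instead-of-raise behavior hinges on detecting when a cut at m would separate a complex-conjugate pair. A minimal sketch of such a check (the helper name is hypothetical; the actual check in pyGPCCA lives inside the Schur decomposition/sorting routines):

```python
import numpy as np

def splits_conjugate_pair(eigenvalues, m):
    """Return True if keeping only the first m eigenvalues would separate
    a complex-conjugate pair. Illustrative helper, not pyGPCCA code."""
    lam_m, lam_next = eigenvalues[m - 1], eigenvalues[m]
    return bool(np.iscomplex(lam_m) and np.isclose(lam_m, np.conj(lam_next)))

# Conjugate pair sits at positions 2 and 3 (0-based):
evals = np.array([1.0, 0.9, 0.5 + 0.1j, 0.5 - 0.1j, 0.3])
splits_conjugate_pair(evals, 3)  # True: m=3 cuts between the pair
splits_conjugate_pair(evals, 4)  # False: the pair stays together
```

With a check like this, optimize() can warn and skip an offending m inside [m_min, m_max], and only raise when the offending m is the single requested cluster number or equals m_max.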

@michalk8 could you please take a look at this and tell me what you think?
@Marius1311 I know you are very busy, but could you maybe also take a quick look, since this would be a significant change?

@msmdev requested a review from michalk8 January 25, 2021 12:43
@msmdev (Owner) commented Jan 25, 2021

Hmm, why is doc building failing? Locally it works fine for me... mb. the example-notebook.ipynb...

@michalk8 (Collaborator)

> Hmm, why is doc building failing? Locally it works fine for me... maybe the example-notebook.ipynb...

If you have any warnings, it's set to fail on RTD. This is changed in #12 (in .readthedocs.yml, set fail_on_warning: false). With this, it warns me with:
Pygments lexer name 'ipython3' is not known
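The Read the Docs option mentioned above corresponds to this fragment of .readthedocs.yml (a sketch of the relevant key only; the rest of the file is omitted):

```yaml
# Stop Read the Docs from failing the build on Sphinx warnings.
sphinx:
  fail_on_warning: false
```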

@Marius1311 (Collaborator)

Will try to go over this tomorrow!

@Marius1311 (Collaborator) left a comment

I think it's really important that we have tests for this. Also, @msmdev , you mentioned that this would still fail if m_max from the given range splits complex conjugates, has that been resolved?


@property # type: ignore[misc]
@d.dedent
def crispness_values(self) -> OArray:
Collaborator:

Maybe a better naming convention is needed here, you have top_eigenvalues <-> n_macrostates_crispness and eigenvalues <-> crispness_values, but it's not very obvious from the naming that one relates to the optimal cluster number whereas the other is an iterable over a range of cluster numbers.

@msmdev (Owner):

Yes, it is difficult here, since we also accept intervals containing only one cluster number, and I wouldn't really call that "optimal" then...
But yes, strictly speaking even then it would be the optimal number out of an "interval" with only one number in it.
I didn't really have good ideas here; maybe you guys @Marius1311 @michalk8 have nicer ones in mind?

@msmdev (Owner):

Tried to settle this here.
@Marius1311 what is your opinion on this?

Collaborator:

Sorry, I don't get it, what's the new naming convention you propose?

@msmdev (Owner):

I not only renamed some properties, but also reordered all of them. Maybe take a look here.

@msmdev (Owner) commented Jan 26, 2021

> I think it's really important that we have tests for this. Also, @msmdev, you mentioned that this would still fail if m_max from the given range splits complex conjugates, has that been resolved?

Both GPCCA.optimize and GPCCA.minChi fail (also in the main branch) if m_max causes a split, since _do_schur_helper and _sorted_schur raise if splitting occurs.
I don't see a way to change this, since computing a partial Schur decomposition, as well as sorting a Schur form, for a dimension splitting a complex eigenvalue pair will fail.

@Marius1311 (Collaborator)

> I think it's really important that we have tests for this. Also, @msmdev, you mentioned that this would still fail if m_max from the given range splits complex conjugates, has that been resolved?

> Both GPCCA.optimize and GPCCA.minChi fail (also in the main branch) if m_max causes a split, since _do_schur_helper and _sorted_schur raise if splitting occurs.
> I don't see a way to change this, since computing a partial Schur decomposition, as well as sorting a Schur form, for a dimension splitting a complex eigenvalue pair will fail.

I think that's okay for now.

@msmdev (Owner) commented Jan 26, 2021

@Marius1311 @michalk8 currently (since one of the commits I made today) I get a lot of strange errors when trying to optimize for any interval of cluster numbers for the toy matrix from ML-2020-10-17_GPCCA_toy_example.ipynb.
Before that everything worked fine...
I guess I have to dive in to find the reason...

@msmdev (Owner) commented Jan 26, 2021

> @Marius1311 @michalk8 currently (since one of the commits I made today) I get a lot of strange errors when trying to optimize for any interval of cluster numbers for the toy matrix from ML-2020-10-17_GPCCA_toy_example.ipynb.
> Before that everything worked fine...
> I guess I have to dive in to find the reason...

@Marius1311 @michalk8 this seems to originate from using np.nan, since np.argmax here will take np.nan as the maximum if it occurs.

Any suggestions what to do instead?
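The np.argmax pitfall described above is easy to reproduce (a minimal sketch, not pyGPCCA code):

```python
import numpy as np

# Every comparison with NaN is False, so once np.argmax encounters a NaN
# it is never displaced: the NaN's index wins.
crispness = np.array([0.5, np.nan, 0.9])
print(np.argmax(crispness))     # 1 -- the NaN, not the true maximum
print(np.nanargmax(crispness))  # 2 -- ignores NaN entries
```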

@msmdev (Owner) commented Jan 26, 2021

np.nanargmax might do the job, but I don't like the following warning in the docs:
"Warning: the results cannot be trusted if a slice contains only NaNs and -Infs."

@msmdev (Owner) commented Jan 26, 2021

> np.nanargmax might do the job, but I don't like the following warning in the docs:
> "Warning: the results cannot be trusted if a slice contains only NaNs and -Infs."

Ok, this helps to get rid of some errors, but now there are runtime warnings

/home/breuter/g-pcca/pyGPCCA/pygpcca/_gpcca.py:994: UserWarning: Clustering into 4 clusters will split complex conjugate eigenvalues. Skipping clustering into 4 clusters.
  f"Clustering into {m} clusters will split complex conjugate eigenvalues. "
/home/breuter/g-pcca/pyGPCCA/pygpcca/_gpcca.py:1025: RuntimeWarning: invalid value encountered in greater
  if np.any(np.array(crispness_list) > 0.0):

Using NaN is really inconvenient.

@Marius1311 shouldn't 7 be the first m to trigger a splitting warning for your toy example? Here it already warns at 4.

@michalk8 (Collaborator)

> Using NaN is really inconvenient.

@msmdev I'd then use -1 or something which is not in the [0, 1] interval.

@msmdev (Owner) commented Jan 26, 2021

> Using NaN is really inconvenient.

> @msmdev I'd then use -1 or something which is not in the [0, 1] interval.

Ok for crispness and chi, but not for rot_matrix, since negative values are legal there.
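For crispness the sentinel idea looks like this (a sketch; variable names are illustrative):

```python
import numpy as np

# -1 is a safe sentinel for skipped cluster numbers because crispness
# lives in [0, 1]; np.argmax then never selects a skipped entry,
# as long as at least one valid value is present.
crispness = np.array([0.5, -1.0, 0.9])  # -1.0 marks a skipped m
print(np.argmax(crispness))  # 2 -- the valid maximum
```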

@msmdev (Owner) commented Jan 29, 2021

@michalk8 any idea what's the reason for the following warning?

/Users/runner/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
      return f(*args, **kwds)

@msmdev (Owner) commented Jan 29, 2021

@michalk8 thank you very much!
Anyway, it is kind of concerning that mu=0 is again not working on macOS - I thought we had fixed it.

@michalk8 have you tried the following regarding the failing tests on macOS?:

  • change dtype of the elements of p from np.complex128 to np.complex64 here;
  • change type of p back to list here.

@michalk8 mentioned this pull request Jan 29, 2021
@michalk8 (Collaborator)

@msmdev not yet, doing so now. However, even if it works on c64/list, I think it's too brittle to consider this a fix.

@michalk8 (Collaborator)

@msmdev changing to c64 and removing the skipped tests works. Changing to list does not.

@msmdev (Owner) commented Jan 29, 2021

> @msmdev changing to c64 and removing the skipped tests works. Changing to list does not.

Ok, this is strange... let me think about this for a moment...
Maybe meanwhile you could check in #17 whether the matrix is indeed always near singular for mu=0?

@msmdev (Owner) commented Jan 29, 2021

@michalk8 after having done the final fixes, could you please accept the changes?
Then I can merge and deploy - or did I forget anything?

@michalk8 (Collaborator)

Done. The tests should pass (I've added a test which checks for the np.linalg.LinAlgError using the matrices in #17).
After they pass, you can merge and deploy (hopefully it goes smoothly).
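The shape of such a regression check, sketched with a generic singular matrix (the actual problematic matrices come from #17 and are not reproduced here):

```python
import numpy as np

# Placeholder input: any exactly singular matrix triggers the same error
# class (np.linalg.LinAlgError) that the regression test checks for.
singular = np.zeros((3, 3))
try:
    np.linalg.solve(singular, np.ones(3))
    raised = False
except np.linalg.LinAlgError:
    raised = True
print(raised)  # True
```

In the test suite this would typically be written with pytest.raises(np.linalg.LinAlgError) around the failing call.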

@msmdev (Owner) commented Jan 29, 2021

Cool, thank you again @michalk8 for your excellent work!
I am very happy that we could track the macOS complex issue down to the source - otherwise I would have been reluctant to deploy.

@msmdev (Owner) commented Jan 29, 2021

@michalk8 hmmm, it fails on macOS because the RuntimeError in the linalg error test is not raised x.x

@michalk8 (Collaborator) commented Jan 29, 2021

> @michalk8 hmmm, it fails on macOS because the RuntimeError in the linalg error test is not raised x.x

I wanted to be clever by parametrizing z; it wasn't a good idea. LM should be the correct one.
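For context, "LM" here refers to the sorting target "largest magnitude" used when sorting eigenvalues/Schur forms. Its effect can be sketched with plain NumPy (illustrative only, not the pyGPCCA sorting code):

```python
import numpy as np

# "LM" = order by largest magnitude |lambda|, as opposed to e.g. "LR"
# (largest real part).
evals = np.array([0.2, -0.9, 0.5 + 0.5j])
lm_order = np.argsort(-np.abs(evals))  # indices sorted by |lambda| descending
print(evals[lm_order])  # magnitudes: 0.9, ~0.707, 0.2
```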

@msmdev (Owner) commented Jan 29, 2021

@michalk8 again test_sort_real_schur_linalg_error failing on mac...

@michalk8 (Collaborator)

> I wanted to be clever by parametrizing z; it wasn't a good idea. LM should be the correct one.

Ok, I think I will remove the test - in #17, I've swapped the Python versions so that it's more comparable; here it stays the same (so that we have PETSc/SLEPc coverage on 3.7 and 3.8). The matrix there was only py37...

Regarding

> @michalk8 just recognized that there are some tests completely based on mu=0, like test_schur_b_pos, so maybe complete removal is not practical.

I removed the mu=0 cases that seem to be problematic (number of macrostates={8,9}); for 3/4 macrostates in other tests, this doesn't seem to be a problem. If you want, I can change the mu(0) calls to e.g. mu(10), but then this begs the question: what about the ground truth, e.g. in test_cluster_by_isa, where we have the memberships for mu=0 and all the GT vectors for mu0 in conftest?

@msmdev (Owner) commented Jan 29, 2021

@michalk8 another issue: I tried to run the example from the docs with binder. It loads and everything, but fails when I try to execute the first cell to import the packages (matplotlib, numpy, gpcca), because it can't find matplotlib - numpy and gpcca are found.

@msmdev (Owner) commented Jan 29, 2021

> I wanted to be clever by parametrizing z; it wasn't a good idea. LM should be the correct one.

> Ok, I think I will remove the test - in #17, I've swapped the Python versions so that it's more comparable; here it stays the same (so that we have PETSc/SLEPc coverage on 3.7 and 3.8). The matrix there was only py37...

> Regarding

> @michalk8 just recognized that there are some tests completely based on mu=0, like test_schur_b_pos, so maybe complete removal is not practical.

> I removed the mu=0 cases that seem to be problematic (number of macrostates={8,9}); for 3/4 macrostates in other tests, this doesn't seem to be a problem. If you want, I can change the mu(0) calls to e.g. mu(10), but then this begs the question: what about the ground truth, e.g. in test_cluster_by_isa, where we have the memberships for mu=0 and all the GT vectors for mu0 in conftest?

I think we should stay with mu=0 as far as possible, since there is in principle nothing wrong with it and the issue with macOS was merely technical in my opinion.

@michalk8 (Collaborator)

> @michalk8 another issue: I tried to run the example from the docs with binder. It loads and everything, but fails when I try to execute the first cell to import the packages (matplotlib, numpy, gpcca), because it can't find matplotlib - numpy and gpcca are found.

Can confirm - the binder requirements are not executed - I think it gets the requirements.txt instead of docs/source/binder/requirements (will take a look into this), but on master matplotlib is specified there.

> I think we should stay with mu=0 as far as possible, since there is in principle nothing wrong with it and the issue with macOS was merely technical in my opinion.

Agreed.

@msmdev (Owner) commented Jan 29, 2021

@michalk8 should I wait with merging until the binder issue is fixed?

@michalk8 (Collaborator)

Yes please, almost there!

@msmdev (Owner) commented Jan 29, 2021

sure! :D

@michalk8 (Collaborator)

Ok, done. Once this is merged, the binder should work (it will pip install from master, since we're not on PyPI; plus, the docs always show the latest release, so I think it's fine).
Feel free to merge.

@msmdev (Owner) commented Jan 29, 2021

Alright! Let's go with it then!

@msmdev merged commit 7006afb into main Jan 29, 2021
@msmdev (Owner) commented Jan 29, 2021

> Ok, done. Once this is merged, the binder should work (it will pip install from master, since we're not on PyPI; plus, the docs always show the latest release, so I think it's fine).
> Feel free to merge.

@michalk8 yup, binder works smoothly now! :)

@Marius1311 deleted the improve_optimize branch September 21, 2021 13:07