-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplex #22
Comments
The central differentiation is to select which parameters to use based upon their derivatives. If the derivatives are already known, it skips the central differentiation. How else do you propose we select parameters? |
Based on the differentiation in the preceding Gradient step, even though it has made a change in between. Doing a very expensive differentiation at the start of the Simplex sort of defeats the purpose of it. This is what we did before. Have you actually analyzed my original code (Unix, awk, C), or did you go only from Elaine’s python? /Per-Ola From: Eric Hansen [mailto:notifications@github.com] The central differentiation is to select which parameters to use based upon their derivatives. If the derivatives are already known, it skips the central differentiation. How else do you propose we select parameters? — Confidentiality Notice: This message is private and may contain confidential and proprietary information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this message is not permitted and may be unlawful. |
It's a simple change to take the derivatives from the starting FF from gradient. What about when you don't want to use gradient at all, just simplex? I propose leaving simplex as is in case you really don't have the derivatives, and having gradient give the initial derivatives to whatever FF it finds was best. |
At one point didn't the code try and only optimize 10 parameters at a time? It would be nice to avoid that extra calculation cost if it is unnecessary. For example some of my optimizations are looking at all the force constants (26 parameters) and it then generates 52 FF's calculating data from 30 structures. This amounts to parametrizations that go beyond two days, and especially when we consider our Schrödinger Licenses being a constant problem, it becomes more of a problem beyond just calculation time. |
For simplex it limits to a certain number of parameters, which is indeed 10. You can change that with I don't have any max parameter limit on the gradient based methods. Also, I finished coding my suggestion in the last comment. Just testing it now. |
You only need the derivatives for selection. If you ONLY run simplex, it's your own responsibility to select the <=10 parameters. You could do this, for example, from a previously written out Jacobian, or from knowing... Sent from my iPhone On 3 juni 2016, at 17:05, Eric Hansen <notifications@github.commailto:notifications@github.com> wrote: For simplex it limits to a certain number of parameters, which is indeed 10. You can change that with simlex.Simplex.max_paramshttps://github.com/ericchansen/q2mm/blob/0a4f2c75e1ef36e81a339c1cc0fbf574a44b635b/simplex.py#L59. I don't have any max parameter limit on the gradient based methods. Also, I finished coding my suggestion in the last comment. Just testing it now. You are receiving this because you authored the thread. Confidentiality Notice: This message is private and may contain confidential and proprietary information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this message is not permitted and may be unlawful. |
Warning: This hasn't been thoroughly tested. Gradient determines the parameter derivatives based off of central differentiation. It then uses these derivatives to generate many trial FFs, and ultimately returns the best FF. Typically, this best FF didn't include any parameter derivatives, since it has changed somewhat from the initial FF. Now, the best FF is given the derivatives from the initial FF to use as an estimate. These estimated derivatives can be used by simplex for parameter selection, saving us the time of another central differentiation step. Other warning: This may mess up loop cycles that don't include gradient. I'll look into this when I get more time. If gradient only calculates parameter derivatives when they're missing, then looping through gradient over and over again would never do central differentiation more than once.
Check out commit c7ad106. Not tested as much as I'd like, so please let me know if I broke anything. I have to hold off on submitting any Q2MM jobs due to a token dispute going on now between our lab and several other labs on campus. If you guys could test this and let me know, that'd be great. Also, I have another concern. This change may result in some undesired behavior when we loop only using gradient, or when simplex doesn't produce any improvements. See the commit message for more information. Just warning everyone. I can work on it soon. |
Numerical differentiation should not be dependent on whether gradients exist or not. GRAD should always do differentiation, SIMP never. The only case when SIMP should need derivatives (mainly 2nd) is when it does parameter selection. Forward them when available; if not present, the SIMP runs with all parameters it is given. |
... but OK, SIMP should be allowed to complain and break if it is given more than 10 parameters (preferably as a constant that can be modified by expert users). |
I have been trying out the commit that is referenced above. On line 347 of simplex.py should that be "o" instead of "i"? |
Tony pointed out a variable misname in issue #22. Fixed.
Yes, it appears that it should be. I'm going to change the o on line 342 to an i though. I typically use i as my integer in enumerations. Might as well stick to that. |
I'm running into a problem with the updated version of simplex. If the "if" statement on line 105 is false and goes to the else statement on line 136, then on line 149, you reference ff_rows and ff_cols that aren't defined. |
Simplex has been broken since commit c7ad106. I would advise not using until someone has time to work on it. Unfortunately, that commit should have been in its own branch until it was thoroughly tested. Then we could have used the old version. |
Calculating derivatives in this fashion doesn't take much time, so let's just do it every time. Previously, in commit c7ad106, I was worried about a potential malfunction in gradient if we didn't cycle between gradient and simplex or if simplex didn't make any improvements. I don't think that's actually a problem anymore because central differentiation is done every time we use gradient no matter what. Still working on issue #22. Will have to do a lot of work with simplex still.
Each MM3* parameter has the attributes ff_row and ff_col. If the number of parameters has to be reduced during a simplex optimization, simplex will save the values of the MM3* FF row and column for the parameters with the lowest 2nd derivatives. These are then used later on to reduce the number of parameters in various FFs. Note that this method is specific to MM3*, and should be replaced with a more general method that will work with other FFs. Anyway, the lists that contained the FF rows and columns, ff_rows and ff_cols, were declared within an if statement, and sometimes they have to be used outside of that if statement. This commit moves their declaration outside the if statement. Also added more comments. Working on issue #22.
I'm not having any issues anymore. Anyone else want to double check me? |
Should "params" be defined with ff_row and f_cols? Also, around line 69 should you have a return statement? Should the 'return best_ff' be before the else statement and 'return self.new_ffs' be in the else statement? |
Just moved a few things around in the best_ff property. Working on issue #22.
I made a commit earlier that may do what you're talking about, but I didn't reference this issue. I just updated that now. Do the changes in commit a809c7b do what you are talking about? |
Yes, exactly. |
I downloaded the iss22 branch (first time I tried accessing another branch :-) ), running it, not crashing. However, the Simplex also is not behaving as it should. It has thrown away my best point, continued with a worse set. Will continue to test, se what I can find. To help understanding, I think it would helpt if the simplex ORDERED FF SCORES were printed after each cycle, now I had to infer from behavior what happened. But it was clearly going wrong, I started out with a simplex of only two parameters and the scores "93.8 396.1 0.3", and a bit later it tried to expand in a score of "23.7", impossible unless it threw away the "0.3" score first. Will do my own modifications of the code and test, but I remember having a simple, fast test case with only two bond lengths in one small structure (I'm using glycol), fast! |
And now my example crashed, after the simplex, in a new place, datatypes.py line 281, seems "self.path" is NoneType. I'll work on it today, upload a simple example if I cannot crack it. |
If anyone wants it, here is my error message from the example where I'm just doing two simple bond lengths: opt |
Merging Per-Ola's changes to simplex, which were made in reference to issue #22.
Merged your changes Per-Ola, they look good. However, I'm guessing those changes don't address the bug in your last comment. Here's my take on your error message. At the end of
Because that's so big, I only store it on Apparently, whatever FF it is trying to write doesn't have that property, so instead it attempts to read the file again, rather than use the lines stored in memory. However, it also can't do that because the property These properties must be copied to the the FF it is attempting to write at some point. Somewhere in your example, I'm assuming the code goes off the standard path and ends up at this point where the attributes don't get copied like they should. I say that because my examples I tried running very recently last night seemed to work fine. We should probably look for a diverging |
Tabs were changed back to spaces. Added comment. Reference issue #22.
It's possible that simplex will generate a set of trial FFs where all of the scores are the same. At this point, simplex should exit. Previously, there was a break instead of a raise that prevented simplex from exiting properly. Also added in another layer of checking thanks to Per-Ola's suggestions. Now there are two chances to catch the potential error. Working on issue #22.
In going over other changes, I noticed that one aspect of simplex was not behaving as intended in the 1998 publication on Q2MM methodology. This is corrected here. Reference the publication, specifically page 1163, for more details. Norrby, P.-O. & Liljefors, T. Automated molecular mechanics parameterization with simultaneous utilization of experimental and quantum mechanical data Journal of Computational Chemistry, Wiley-Blackwell, 1998, 19, 1146-1166. Reference issue #22.
Very simple merge resolution here. Nothing to say really. Referencing issue #22. simplex.py
The following was taken from the pull request. I wish it updated on here automatically, but I guess I have to copy and paste it. Also unfortunate, but many of the commits in that pull request will not appear here because issue #22 was not referenced in the commit message. Lots of helpful changes inside here [I was referencing the pull request of q2mm/q2mm:mysimplex. Still need to do testing before going forward. Here's some things to consider.
|
I merged your iss22 branch into my own branch and worked from that, but couldn’t push back since I had no write access to your code, and there was no such branch under Q2MM/q2mm. I had to create a new branch and push that to Q2MM/q2mm, where I had write access. Any other way I could have done it? /Per-Ola From: Eric Hansen [mailto:notifications@github.com] The following was taken from the pull request. I wish it updated on here automatically, but I guess I have to copy and paste it. Also unfortunate, but many of the commits in that pull request will not appear here because issue #22#22 was not referenced in the commit message. Lots of helpful changes inside here [I was referencing the pull request of q2mm/q2mm:mysimplexhttps://github.com/Q2MM/q2mm/tree/mysimplex]. Still need to do testing before going forward. Here's some things to consider.
— Confidentiality Notice: This message is private and may contain confidential and proprietary information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this message is not permitted and may be unlawful. |
Not sure. It's pretty messing working between forks like this (we just have to be careful where we submit issues in the future). Typically everything between forks is handled with pull requests, which isn't very conducive for close collaboration or coding together. Better suited for adopting fully fleshed out features. |
A float that didn't change was being recalculated over and over inside a loop. Changed such that it is only calculcated once before the loop. Also changed error handling to accomodate this. Changes made based upon Per-Ola's suggestion. Reference issue #22.
There are FF objects, instances of datatypes.MM3, that have the attribute lines. datatypes.MM3.lines is a list of strings, literally every line from the mm3.fld file. Since this is rather large, I try not to store it in memory more than once. By the way, datatypes.MM3.lines is used to write FFs in datatypes.MM3.export_ff. datatypes.MM3.export_ff accepts several arguments. The following is copied from its __doc__. def export_ff(self, path=None, params=None, lines=None): """ Exports the force field to a file, typically mm3.fld. Parameters ---------- path : string File to be written or overwritten. params : list of `datatypes.Param` (or subclass) lines : list of strings This is what is generated when you read mm3.fld using readlines(). """ The code used to check if lines and self.lines was None, however, self.lines is now actually a property. When self.lines was updated to being a property, this negatively affected that behavior. For example, if you were to say if self.lines is None: do_stuff() This would actually call upon the property self.lines. This property looks like this @Property def lines(self): if self._lines is None: with open(self.path, 'r') as f: self._lines = f.readlines() return self._lines @lines.setter def lines(self, x): self._lines = x This means that you can call upon self.lines, it will check if self._lines is None. If it is, then it attempts to read the lines. This would cause problems when self.path was also None because it couldn't open the file to read the lines. Basically, self.lines was being called when self.path was None, which resulted in errors. This works on issue #22. There's also a very good chance this will fix many of the problems Per-Ola has been reporting. More fine tuning may be needed, but I think this is the core issue.
Simplex was previously misnaming FFs during differentiation. That has now been corrected. Reference issue #22.
A float that didn't change was being recalculated over and over inside a loop. Changed such that it is only calculcated once before the loop. Also changed error handling to accomodate this. Changes made based upon Per-Ola's suggestion. Reference issue #22.
There are FF objects, instances of datatypes.MM3, that have the attribute lines. datatypes.MM3.lines is a list of strings, literally every line from the mm3.fld file. Since this is rather large, I try not to store it in memory more than once. By the way, datatypes.MM3.lines is used to write FFs in datatypes.MM3.export_ff. datatypes.MM3.export_ff accepts several arguments. The following is copied from its __doc__. def export_ff(self, path=None, params=None, lines=None): """ Exports the force field to a file, typically mm3.fld. Parameters ---------- path : string File to be written or overwritten. params : list of `datatypes.Param` (or subclass) lines : list of strings This is what is generated when you read mm3.fld using readlines(). """ The code used to check if lines and self.lines was None, however, self.lines is now actually a property. When self.lines was updated to being a property, this negatively affected that behavior. For example, if you were to say if self.lines is None: do_stuff() This would actually call upon the property self.lines. This property looks like this @Property def lines(self): if self._lines is None: with open(self.path, 'r') as f: self._lines = f.readlines() return self._lines @lines.setter def lines(self, x): self._lines = x This means that you can call upon self.lines, it will check if self._lines is None. If it is, then it attempts to read the lines. This would cause problems when self.path was also None because it couldn't open the file to read the lines. Basically, self.lines was being called when self.path was None, which resulted in errors. This works on issue #22. There's also a very good chance this will fix many of the problems Per-Ola has been reporting. More fine tuning may be needed, but I think this is the core issue.
Simplex was previously misnaming FFs during differentiation. That has now been corrected. Reference issue #22.
It looks like issue #22 is resolved. Just ran some tests, and all looks good.
Okay, we could still go for some functionality changes, but I think all the bugs are gone. |
Sometimes unrestricted optimizations techniques generate absurd parameter values. If these values exceed what is allowed in the MM3* FF format (I'm not sure exactly what this is, but I think it's probably 999.9999), then the entire method of reading and writing FFs breaks down. THIS IS A TEMPORARY FIX. In this commit, I simply prevent absurd values from being written. Ideally, export_ff should raise an exception when attempting to write rediculous parameter values. This exception should then be handled appropriately in the various sections of the code that call upon export_ff.
Simplex shouldn't experience the bugs mentioned in issue #22. However, other design changes to simplex still need to be made.
Something is badly wrong with the Simplex. When I'm testing it now, for a large set of parameters, when it starts the simplex, it initializes by doing numerical differentiation, for 150 parameters! This is never meaningful. First of all, a subset of no more than 10 parameters must be selected from the preceding Grad calculation (the initial Jacobian of that one). Second, we should never do forward AND backward when starting a Simplex. We usually use the initial point, and each of the <=10 parameters changed in the forward direction only, that is the starting Simplex.
The text was updated successfully, but these errors were encountered: