Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplex #22

Closed
peonor opened this issue Jun 3, 2016 · 25 comments
Closed

Simplex #22

peonor opened this issue Jun 3, 2016 · 25 comments
Labels

Comments

@peonor
Copy link
Contributor

peonor commented Jun 3, 2016

Something is badly wrong with the Simplex. When I'm testing it now, for a large set of parameters, when it starts the simplex, it initializes by doing numerical differentiation, for 150 parameters! This is never meaningful. First of all, a subset of no more than 10 parameters must be selected from the preceding Grad calculation (the initial Jacobian of that one). Second, we should never do forward AND backward when starting a Simplex. We usually use the initial point, and each of the <=10 parameters changed in the forward direction only, that is the starting Simplex.

@ericchansen
Copy link
Owner

The central differentiation is to select which parameters to use based upon their derivatives. If the derivatives are already known, it skips the central differentiation.

How else do you propose we select parameters?

@peonor
Copy link
Contributor Author

peonor commented Jun 3, 2016

Based on the differentiation in the preceding Gradient step, even though it has made a change in between. Doing a very expensive differentiation at the start of the Simplex sort of defeats the purpose of it. This is what we did before.

Have you actually analyzed my original code (Unix, awk, C), or did you go only from Elaine’s python?

/Per-Ola

From: Eric Hansen [mailto:notifications@github.com]
Sent: den 3 juni 2016 16:31
To: ericchansen/q2mm q2mm@noreply.github.com
Cc: Norrby, Per-Ola Per-Ola.Norrby@astrazeneca.com; Author author@noreply.github.com
Subject: Re: [ericchansen/q2mm] Simplex (#22)

The central differentiation is to select which parameters to use based upon their derivatives. If the derivatives are already known, it skips the central differentiation.

How else do you propose we select parameters?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHubhttps://github.com//issues/22#issuecomment-223594702, or mute the threadhttps://github.com/notifications/unsubscribe/ANVSNBISXnI9f_avzfoiQCaAwVPDSUhOks5qIDqggaJpZM4Itj0_.


Confidentiality Notice: This message is private and may contain confidential and proprietary information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this message is not permitted and may be unlawful.

@ericchansen
Copy link
Owner

It's a simple change to take the derivatives from the starting FF from gradient.

What about when you don't want to use gradient at all, just simplex?

I propose leaving simplex as is in case you really don't have the derivatives, and having gradient give the initial derivatives to whatever FF it finds was best.

@arosale4
Copy link
Contributor

arosale4 commented Jun 3, 2016

At one point didn't the code try and only optimize 10 parameters at a time? It would be nice to avoid that extra calculation cost if it is unnecessary. For example some of my optimizations are looking at all the force constants (26 parameters) and it then generates 52 FF's calculating data from 30 structures. This amounts to parametrizations that go beyond two days, and especially when we consider our Schrödinger Licenses being a constant problem, it becomes more of a problem beyond just calculation time.

@ericchansen
Copy link
Owner

For simplex it limits to a certain number of parameters, which is indeed 10. You can change that with simlex.Simplex.max_params.

I don't have any max parameter limit on the gradient based methods.

Also, I finished coding my suggestion in the last comment. Just testing it now.

@peonor
Copy link
Contributor Author

peonor commented Jun 3, 2016

You only need the derivatives for selection. If you ONLY run simplex, it's your own responsibility to select the <=10 parameters. You could do this, for example, from a previously written out Jacobian, or from knowing...

Sent from my iPhone

On 3 juni 2016, at 17:05, Eric Hansen <notifications@github.commailto:notifications@github.com> wrote:

For simplex it limits to a certain number of parameters, which is indeed 10. You can change that with simlex.Simplex.max_paramshttps://github.com/ericchansen/q2mm/blob/0a4f2c75e1ef36e81a339c1cc0fbf574a44b635b/simplex.py#L59.

I don't have any max parameter limit on the gradient based methods.

Also, I finished coding my suggestion in the last comment. Just testing it now.

You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHubhttps://github.com//issues/22#issuecomment-223604368, or mute the threadhttps://github.com/notifications/unsubscribe/ANVSNO5JOV6-NcrVinnlZiX9nL8pWkzqks5qIEKigaJpZM4Itj0_.


Confidentiality Notice: This message is private and may contain confidential and proprietary information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this message is not permitted and may be unlawful.

ericchansen added a commit that referenced this issue Jun 3, 2016
Warning: This hasn't been thoroughly tested.

Gradient determines the parameter derivatives based off of central
differentiation. It then uses these derivatives to generate many
trial FFs, and ultimately returns the best FF.

Typically, this best FF didn't include any parameter derivatives,
since it has changed somewhat from the initial FF. Now, the best
FF is given the derivatives from the initial FF to use as an
estimate.

These estimated derivatives can be used by simplex for parameter
selection, saving us the time of another central differentiation
step.

Other warning: This may mess up loop cycles that don't include
gradient. I'll look into this when I get more time. If gradient
only calculates parameter derivatives when they're missing, then
looping through gradient over and over again would never do
central differentiation more than once.
@ericchansen
Copy link
Owner

Check out commit c7ad106.

Not tested as much as I'd like, so please let me know if I broke anything. I have to hold off on submitting any Q2MM jobs due to a token dispute going on now between our lab and several other labs on campus. If you guys could test this and let me know, that'd be great.

Also, I have another concern. This change may result in some undesired behavior when we loop only using gradient, or when simplex doesn't produce any improvements. See the commit message for more information. Just warning everyone. I can work on it soon.

@peonor
Copy link
Contributor Author

peonor commented Jun 7, 2016

Numerical differentiation should not be dependent on whether gradients exist or not. GRAD should always do differentiation, SIMP never. The only case when SIMP should need derivatives (mainly 2nd) is when it does parameter selection. Forward them when available; if not present, the SIMP runs with all parameters it is given.

@peonor
Copy link
Contributor Author

peonor commented Jun 7, 2016

... but OK, SIMP should be allowed to complain and break if it is given more than 10 parameters (preferably as a constant that can be modified by expert users).

@arosale4
Copy link
Contributor

arosale4 commented Jun 8, 2016

I have been trying out the commit that is referenced above. On line 347 of simplex.py should that be "o" instead of "i"?

ericchansen added a commit that referenced this issue Jun 8, 2016
Tony pointed out a variable misname in issue #22. Fixed.
@ericchansen
Copy link
Owner

Yes, it appears that it should be. I'm going to change the o on line 342 to an i though. I typically use i as my integer in enumerations. Might as well stick to that.

@nberkel05
Copy link
Contributor

I'm running into a problem with the updated version of simplex. If the "if" statement on line 105 is false and goes to the else statement on line 136, then on line 149, you reference ff_rows and ff_cols that aren't defined.

@ericchansen
Copy link
Owner

ericchansen commented Jun 21, 2016

Simplex has been broken since commit c7ad106. I would advise not using until someone has time to work on it.

Unfortunately, that commit should have been in its own branch until it was thoroughly tested. Then we could have used the old version.

ericchansen pushed a commit that referenced this issue Jun 23, 2016
ericchansen added a commit that referenced this issue Jun 23, 2016
Calculating derivatives in this fashion doesn't take much time, so
let's just do it every time.

Previously, in commit c7ad106, I was worried about a potential
malfunction in gradient if we didn't cycle between gradient and
simplex or if simplex didn't make any improvements. I don't think
that's  actually a problem anymore because central differentiation is
done every time we use gradient no matter what.

Still working on issue #22. Will have to do a lot of work with
simplex still.
ericchansen added a commit that referenced this issue Jun 27, 2016
Each MM3* parameter has the attributes ff_row and ff_col. If the number of
parameters has to be reduced during a simplex optimization, simplex will save
the values of the MM3* FF row and column for the parameters with the lowest 2nd
derivatives. These are then used later on to reduce the number of parameters in
various FFs.

Note that this method is specific to MM3*, and should be replaced with a more
general method that will work with other FFs.

Anyway, the lists that contained the FF rows and columns, ff_rows and ff_cols,
were declared within an if statement, and sometimes they have to be used outside
of that if statement. This commit moves their declaration outside the if
statement.

Also added more comments.

Working on issue #22.
@ericchansen
Copy link
Owner

I'm not having any issues anymore. Anyone else want to double check me?

@nberkel05
Copy link
Contributor

Should "params" be defined with ff_row and f_cols? Also, around line 69 should you have a return statement? Should the 'return best_ff' be before the else statement and 'return self.new_ffs' be in the else statement?

ericchansen added a commit that referenced this issue Jun 28, 2016
Just moved a few things around in the best_ff property.

Working on issue #22.
@ericchansen
Copy link
Owner

I made a commit earlier that may do what you're talking about, but I didn't reference this issue. I just updated that now. Do the changes in commit a809c7b do what you are talking about?

@nberkel05
Copy link
Contributor

Yes, exactly.

@peonor
Copy link
Contributor Author

peonor commented Jun 28, 2016

I downloaded the iss22 branch (first time I tried accessing another branch :-) ), running it, not crashing. However, the Simplex also is not behaving as it should. It has thrown away my best point, continued with a worse set. Will continue to test, se what I can find.

To help understanding, I think it would helpt if the simplex ORDERED FF SCORES were printed after each cycle, now I had to infer from behavior what happened. But it was clearly going wrong, I started out with a simplex of only two parameters and the scores "93.8 396.1 0.3", and a bit later it tried to expand in a score of "23.7", impossible unless it threw away the "0.3" score first. Will do my own modifications of the code and test, but I remember having a simple, fast test case with only two bond lengths in one small structure (I'm using glycol), fast!

@peonor
Copy link
Contributor Author

peonor commented Jun 28, 2016

And now my example crashed, after the simplex, in a new place, datatypes.py line 281, seems "self.path" is NoneType. I'll work on it today, upload a simple example if I cannot crack it.

@peonor
Copy link
Contributor Author

peonor commented Jun 28, 2016

If anyone wants it, here is my error message from the example where I'm just doing two simple bond lengths:

opt
simplex -- Writing best force field from simplex.
Traceback (most recent call last):
File "/home/kcjh508/q2mm/loop.py", line 192, in
main(sys.argv[1:])
File "/home/kcjh508/q2mm/loop.py", line 186, in main
loop.run_loop_input(lines)
File "/home/kcjh508/q2mm/loop.py", line 108, in run_loop_input
self.ff = loop.opt_loop()
File "/home/kcjh508/q2mm/loop.py", line 42, in opt_loop
self.loop_lines, score=self.ff.score)
File "/home/kcjh508/q2mm/loop.py", line 148, in run_loop_input
self.ff = simp.run(r_data=self.ref_data)
File "/home/kcjh508/q2mm/opt.py", line 32, in wrapper
return func(_args, *_kwargs)
File "/home/kcjh508/q2mm/simplex.py", line 318, in run
best_ff.export_ff(best_ff.path)
File "/home/kcjh508/q2mm/datatypes.py", line 633, in export_ff
if lines is None and self.lines is None:
File "/home/kcjh508/q2mm/datatypes.py", line 281, in lines
with open(self.path, 'r') as f:
TypeError: coercing to Unicode: need string or buffer, NoneType found

ericchansen added a commit that referenced this issue Jun 28, 2016
Merging Per-Ola's changes to simplex, which were made in reference to issue #22.
@ericchansen
Copy link
Owner

Merged your changes Per-Ola, they look good.

However, I'm guessing those changes don't address the bug in your last comment.

Here's my take on your error message. At the end of simplex.run, it wants to write the best FF. It then goes to datatypes.export_ff to do so. Each FF object has an attribute called lines (might look something like ff.lines), which is list of strings, literally it has every line of mm3.fld. It is generated by something like

with open(somefilename, 'r') as f:
    ff.lines = f.readlines()

Because that's so big, I only store it on self.ff, or the initial FF, not all the copies.

Apparently, whatever FF it is trying to write doesn't have that property, so instead it attempts to read the file again, rather than use the lines stored in memory. However, it also can't do that because the property path (maybe ff.path or best_ff.path) is also missing.

These properties must be copied to the the FF it is attempting to write at some point.

Somewhere in your example, I'm assuming the code goes off the standard path and ends up at this point where the attributes don't get copied like they should. I say that because my examples I tried running very recently last night seemed to work fine. We should probably look for a diverging if statement or something similar. I'll check it out more after a few meetings I have this morning.

@ericchansen ericchansen mentioned this issue Jun 28, 2016
ericchansen added a commit that referenced this issue Jun 28, 2016
Tabs were changed back to spaces. Added comment.

Reference issue #22.
ericchansen added a commit that referenced this issue Jun 28, 2016
It's possible that simplex will generate a set of trial FFs where
all of the scores are the same. At this point, simplex should exit.

Previously, there was a break instead of a raise that prevented
simplex from exiting properly.

Also added in another layer of checking thanks to Per-Ola's
suggestions. Now there are two chances to catch the potential
error.

Working on issue #22.
ericchansen added a commit that referenced this issue Jun 28, 2016
In going over other changes, I noticed that one aspect of simplex was
not behaving as intended in the 1998 publication on Q2MM methodology.

This is corrected here. Reference the publication, specifically page
1163,  for more details.

Norrby, P.-O. & Liljefors, T. Automated molecular mechanics
parameterization with simultaneous utilization of experimental and
quantum mechanical data Journal of Computational Chemistry,
Wiley-Blackwell, 1998, 19, 1146-1166.

Reference issue #22.
ericchansen added a commit that referenced this issue Jun 28, 2016
Very simple merge resolution here. Nothing to say really.

Referencing issue #22.
	simplex.py
@ericchansen
Copy link
Owner

ericchansen commented Jun 28, 2016

The following was taken from the pull request. I wish it updated on here automatically, but I guess I have to copy and paste it. Also unfortunate, but many of the commits in that pull request will not appear here because issue #22 was not referenced in the commit message.

Lots of helpful changes inside here [I was referencing the pull request of q2mm/q2mm:mysimplex.

Still need to do testing before going forward. Here's some things to consider.

  1. Is simplex reading the estimated derivatives properly from gradient?
  2. How best to handle raise when all trial parameter sets produce the same objective function score.
  3. If simplex follows gradient, should simplex only act on parameters with negative 2nd derivatives? Would simplex do nothing if no parameter has a negative 2nd derivative? Or would we rather keep it the same and have it act on which parameters have the lowest 2nd derivatives, not necessarily negative?
  4. If the user wants to run only simplex and specifies more parameters, than the maximum parameter number set inside simplex.Simplex.__init__, do we want to let them use that number of parameters or should it check the 2nd derivatives?

@peonor
Copy link
Contributor Author

peonor commented Jun 28, 2016

I merged your iss22 branch into my own branch and worked from that, but couldn’t push back since I had no write access to your code, and there was no such branch under Q2MM/q2mm. I had to create a new branch and push that to Q2MM/q2mm, where I had write access. Any other way I could have done it?

/Per-Ola

From: Eric Hansen [mailto:notifications@github.com]
Sent: den 28 juni 2016 17:43
To: ericchansen/q2mm q2mm@noreply.github.com
Cc: Norrby, Per-Ola Per-Ola.Norrby@astrazeneca.com; Author author@noreply.github.com
Subject: Re: [ericchansen/q2mm] Simplex (#22)

The following was taken from the pull request. I wish it updated on here automatically, but I guess I have to copy and paste it. Also unfortunate, but many of the commits in that pull request will not appear here because issue #22#22 was not referenced in the commit message.

Lots of helpful changes inside here [I was referencing the pull request of q2mm/q2mm:mysimplexhttps://github.com/Q2MM/q2mm/tree/mysimplex].

Still need to do testing before going forward. Here's some things to consider.

  1. Is simplex reading the estimated derivatives properly from gradient?
  2. How best to handle raise when all trial parameter sets produce the same objective function score.
  3. If simplex follows gradient, should simplex only act on parameters with negative 2nd derivatives? Would simplex do nothing if no parameter has a negative 2nd derivative? Or would we rather keep it the same and have it act on which parameters have the lowest 2nd derivatives, not necessarily negative?
  4. If the user wants to run only simplex and specifies more parameters, than the maximum parameter number set inside simplex.Simplex.init, do we want to let them use that number of parameters or should it check the 2nd derivatives?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHubhttps://github.com//issues/22#issuecomment-229090400, or mute the threadhttps://github.com/notifications/unsubscribe/ANVSNLDDlzwHNTSOSjoRpmNmnwW6MJgfks5qQUEegaJpZM4Itj0_.


Confidentiality Notice: This message is private and may contain confidential and proprietary information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this message is not permitted and may be unlawful.

@ericchansen
Copy link
Owner

Not sure. It's pretty messing working between forks like this (we just have to be careful where we submit issues in the future). Typically everything between forks is handled with pull requests, which isn't very conducive for close collaboration or coding together. Better suited for adopting fully fleshed out features.

ericchansen added a commit that referenced this issue Jun 28, 2016
A float that didn't change was being recalculated over and over
inside a loop. Changed such that it is only calculcated once before
the loop.

Also changed error handling to accomodate this.

Changes made based upon Per-Ola's suggestion.

Reference issue #22.
ericchansen added a commit that referenced this issue Jun 28, 2016
There are FF objects, instances of datatypes.MM3, that have the attribute lines.
datatypes.MM3.lines is a list of strings, literally every line from the mm3.fld
file. Since this is rather large, I try not to store it in memory more than
once.

By the way, datatypes.MM3.lines is used to write FFs in datatypes.MM3.export_ff.

datatypes.MM3.export_ff accepts several arguments. The following is copied from
its __doc__.

    def export_ff(self, path=None, params=None, lines=None):
        """
        Exports the force field to a file, typically mm3.fld.

        Parameters
        ----------
        path : string
               File to be written or overwritten.
        params : list of `datatypes.Param` (or subclass)
        lines : list of strings
                This is what is generated when you read mm3.fld using
                readlines().
        """

The code used to check if lines and self.lines was None, however, self.lines is
now actually a property. When self.lines was updated to being a property, this
negatively affected that behavior.

For example, if you were to say

    if self.lines is None:
        do_stuff()

This would actually call upon the property self.lines. This property looks like
this

    @Property
    def lines(self):
        if self._lines is None:
            with open(self.path, 'r') as f:
                self._lines = f.readlines()
        return self._lines
    @lines.setter
    def lines(self, x):
        self._lines = x

This means that you can call upon self.lines, it will check if self._lines is
None. If it is, then it attempts to read the lines. This would cause problems
when self.path was also None because it couldn't open the file to read the
lines.

Basically, self.lines was being called when self.path was None, which resulted
in errors.

This works on issue #22. There's also a very good chance this will fix many of
the problems Per-Ola has been reporting. More fine tuning may be needed, but I
think this is the core issue.
ericchansen added a commit that referenced this issue Jun 29, 2016
Derivatives are now calculcated every time.

Unfortunately, there is still no way to calculate derivatives if we
restart the job from a par_diff_???.txt file.

References issue #22 and #8.
ericchansen added a commit that referenced this issue Jun 29, 2016
Simplex was previously misnaming FFs during differentiation. That has
now been corrected.

Reference issue #22.
ericchansen added a commit that referenced this issue Jun 29, 2016
A float that didn't change was being recalculated over and over
inside a loop. Changed such that it is only calculcated once before
the loop.

Also changed error handling to accomodate this.

Changes made based upon Per-Ola's suggestion.

Reference issue #22.
ericchansen added a commit that referenced this issue Jun 29, 2016
There are FF objects, instances of datatypes.MM3, that have the attribute lines.
datatypes.MM3.lines is a list of strings, literally every line from the mm3.fld
file. Since this is rather large, I try not to store it in memory more than
once.

By the way, datatypes.MM3.lines is used to write FFs in datatypes.MM3.export_ff.

datatypes.MM3.export_ff accepts several arguments. The following is copied from
its __doc__.

    def export_ff(self, path=None, params=None, lines=None):
        """
        Exports the force field to a file, typically mm3.fld.

        Parameters
        ----------
        path : string
               File to be written or overwritten.
        params : list of `datatypes.Param` (or subclass)
        lines : list of strings
                This is what is generated when you read mm3.fld using
                readlines().
        """

The code used to check if lines and self.lines was None, however, self.lines is
now actually a property. When self.lines was updated to being a property, this
negatively affected that behavior.

For example, if you were to say

    if self.lines is None:
        do_stuff()

This would actually call upon the property self.lines. This property looks like
this

    @Property
    def lines(self):
        if self._lines is None:
            with open(self.path, 'r') as f:
                self._lines = f.readlines()
        return self._lines
    @lines.setter
    def lines(self, x):
        self._lines = x

This means that you can call upon self.lines, it will check if self._lines is
None. If it is, then it attempts to read the lines. This would cause problems
when self.path was also None because it couldn't open the file to read the
lines.

Basically, self.lines was being called when self.path was None, which resulted
in errors.

This works on issue #22. There's also a very good chance this will fix many of
the problems Per-Ola has been reporting. More fine tuning may be needed, but I
think this is the core issue.
ericchansen added a commit that referenced this issue Jun 29, 2016
Derivatives are now calculcated every time.

Unfortunately, there is still no way to calculate derivatives if we
restart the job from a par_diff_???.txt file.

References issue #22 and #8.
ericchansen added a commit that referenced this issue Jun 29, 2016
Simplex was previously misnaming FFs during differentiation. That has
now been corrected.

Reference issue #22.
ericchansen added a commit that referenced this issue Jun 30, 2016
It looks like issue #22 is resolved. Just ran some tests, and all looks
good.
@ericchansen
Copy link
Owner

Okay, we could still go for some functionality changes, but I think all the bugs are gone.

@ericchansen ericchansen mentioned this issue Jun 30, 2016
ericchansen referenced this issue in Q2MM/q2mm Jul 6, 2016
Sometimes unrestricted optimizations techniques generate absurd parameter
values. If these values exceed what is allowed in the MM3* FF format (I'm
not sure exactly what this is, but I think it's probably 999.9999), then
the entire method of reading and writing FFs breaks down.

THIS IS A TEMPORARY FIX. In this commit, I simply prevent absurd values from
being written.

Ideally, export_ff should raise an exception when attempting to write
rediculous parameter values. This exception should then be handled
appropriately in the various sections of the code that call upon export_ff.
ericchansen added a commit that referenced this issue Jul 6, 2016
Simplex shouldn't experience the bugs mentioned in issue #22. However, other design changes to simplex still need to be made.
This was referenced Jul 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants