
Improved implementation of shooting point selector pick (avoid model that might introduce index error in subclasses) #1110

Conversation

@hejung (Contributor) commented Apr 5, 2022

After 5 years of using ops it bit me today: with the old implementation it was possible to have idx == len(prob_list) in the last iteration, which would then raise an IndexError when trying to get the associated probability from prob_list.

Or, in other words: in 5 years of ops I never tried to select the last frame of a trajectory as a new shooting point, which is a good thing, I would say :)

EDIT: typo

@codecov codecov bot commented Apr 5, 2022

Codecov Report

Merging #1110 (502f0a2) into master (9131891) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1110      +/-   ##
==========================================
- Coverage   81.76%   81.75%   -0.01%     
==========================================
  Files         142      142              
  Lines       15612    15612              
==========================================
- Hits        12765    12764       -1     
- Misses       2847     2848       +1     
Impacted Files Coverage Δ
openpathsampling/shooting.py 100.00% <100.00%> (ø)
openpathsampling/netcdfplus/cache.py 63.72% <0.00%> (-0.33%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9131891...502f0a2.

@sroet (Member) commented Apr 5, 2022

Huh, I don't think I have ever seen someone run into that bound xD

Do you mind adding a test for that, probably by overriding _biases to return [0 for _ in trajectory[:-1]] + [1] and checking that the last index is indeed picked?

@sroet (Member) commented Apr 5, 2022

Hmm, I copied the test to the current master and it does not fail.

I do agree with this fix, but I am a bit puzzled by how this can even be hit.

self._rng.random() should always return something in [0, 1) (0 included, 1 excluded) which means that rand*sum(prob_list) < sum(prob_list) is always true. So the prob <= rand check in the while loop should be False after summing all probabilities (before an out of bounds index is generated).

Did you somehow run this with a random number generator that has 1 included as a possibility?
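The invariant sroet describes can be checked with a standalone sketch of the loop in question (pick_index and prob_list are illustrative names, not the actual OPS implementation): because rng.random() is strictly below 1, rand is strictly below sum(prob_list), so the running total overtakes rand at or before the last entry, provided it accumulates in the same left-to-right order as sum().

```python
import random

def pick_index(prob_list, rng=random):
    """Pick index i with probability prob_list[i] / sum(prob_list).

    Sketch of the unbounded cumulative-sum loop discussed above.
    rng.random() lies in [0, 1), so rand < sum(prob_list), and the
    running total must exceed rand no later than the last entry,
    as long as it accumulates in the same order as sum().
    """
    rand = rng.random() * sum(prob_list)
    idx = 0
    prob = prob_list[0]
    while prob <= rand:
        idx += 1
        prob += prob_list[idx]
    return idx
```

With prob_list = [0, 0, 1] this always returns 2, and with [1, 0, 0] it always returns 0, without ever indexing past the end.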

@hejung (Contributor, Author) commented Apr 5, 2022

Hm. Strange that they pass. But maybe not so strange after all; I agree with your judgment that the first condition should be enough to not restart the loop once we have summed up all probabilities. I guess the issue was floating-point inaccuracies, i.e. that prob < sum_prob even when we have summed everything up. So if the random number is almost 1, we could have prob <= rand even after adding up all the probabilities?

Admittedly the error came from my own selector which now has (in the relevant part of pick):

rand = np.random.random() * sum_bias
idx = 0
prob = biases[0]
while prob <= rand and idx < len(biases) - 1:
    idx += 1
    prob += biases[idx]

And before, it did not have the -1. I only recalled that this is basically what we do in the ops base selector and thought it would be good to fix it there too.
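The effect of the added bound can be seen in a standalone sketch (illustrative names, not the OPS code): with the idx < len(biases) - 1 guard, the loop can never produce an out-of-range index, even in the pathological all-zero-bias case that would send the unbounded loop past the end.

```python
import random

def pick_bounded(biases, rng=random):
    # Cumulative-sum selection with the extra guard: idx stops at
    # the last valid index no matter what the running sum does.
    rand = rng.random() * sum(biases)
    idx = 0
    prob = biases[0]
    while prob <= rand and idx < len(biases) - 1:
        idx += 1
        prob += biases[idx]
    return idx
```

pick_bounded([0.0] * 5) returns 4 instead of raising IndexError, and biases of [0, 0, 0, 1] always select the last frame, which is exactly the test case suggested earlier in the thread.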

@sroet (Member) commented Apr 5, 2022

I guess the issue was floating-point inaccuracies, i.e. that prob < sum_prob even when we have summed everything up. So if the random number is almost 1, we could have prob <= rand even after adding up all the probabilities?

The problem is that sum does the exact same thing as our loop, so there should be no floating-point inaccuracies between the two. Even if there were, the only inaccuracies that could happen during summation are s + e == s for e <<< s, which would mean you select a frame that is earlier than the real one.
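That the built-in sum matches the loop bit for bit can be confirmed directly; Python's sum is documented to add items left to right, which is exactly the accumulation order of the while loop (a quick sanity check, not code from the PR):

```python
import random

rng = random.Random(42)
values = [rng.random() for _ in range(1000)]

# accumulate in the same left-to-right order as the while loop
running = 0.0
for v in values:
    running += v

# built-in sum() also adds left to right, so the totals agree
# exactly, with no floating-point discrepancy between the two
assert running == sum(values)
```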

However, there is a second way to trigger that bug: if all biases are 0, did you check for that in your selector?

@hejung (Contributor, Author) commented Apr 5, 2022

I do check that the sum of all biases is not 0, so yes.
However, I use np.sum, which might be the issue...? I think it is not safe to assume it will do the same as the while loop. As for the Python sum implementation: I didn't check whether it does exactly the same as the while loop, but if that is the case you are indeed right that this cannot be triggered in ops, except if all biases are 0. Up to you if you still want the "fix" or if we should rather check whether all biases are 0.
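That summation order can matter at all is easy to demonstrate with plain Python floats (math.fsum computes the correctly rounded total; whether np.sum actually regroups in any given case depends on its pairwise blocking, so this only illustrates the order-dependence itself):

```python
import math

# Floating-point addition is not associative: adding 1.0 into 1e16
# is lost to rounding, so the left-to-right total misses it entirely.
values = [1e16, 1.0, -1e16]

assert sum(values) == 0.0        # left to right: 1e16 + 1.0 rounds back to 1e16
assert math.fsum(values) == 1.0  # correctly rounded total
```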

For reference, the full code for my own pick is:

    def pick(self, trajectory):
        """Return the index of the chosen snapshot within trajectory."""
        biases = self._biases(trajectory)
        sum_bias = np.sum(biases)
        if sum_bias == 0.:
            logger.error('Model not able to give educated guess. '
                         'Choosing based on luck.')
            # we cannot give any meaningful advice and choose at random
            return np.random.randint(len(trajectory))

        rand = np.random.random() * sum_bias
        idx = 0
        prob = biases[0]
        while prob <= rand and idx < len(biases) - 1:
            idx += 1
            prob += biases[idx]

        # let the model know which SP we chose
        self.model.register_sp(trajectory[idx])
        return idx

@sroet (Member) commented Apr 5, 2022

Ah yeah, I thought numpy does something clever like sorting before summing (this was not true; it uses pairwise summation instead).

I do still agree with the fix; I was just a bit puzzled.

For your own selector, you probably want to use np.random.choice instead 😉

biases = np.array(self._biases(trajectory))
biases /= np.sum(biases)
idx = np.random.choice(len(trajectory), p=biases)
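sroet's snippet uses the legacy np.random module; a minimal runnable sketch of the same idea with the newer Generator API (the bias values here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng()

# hypothetical biases; only the last frame has nonzero weight
biases = np.array([0.0, 0.0, 0.0, 1.0])
probs = biases / np.sum(biases)

# choice draws an index with the given probabilities, so an
# out-of-range index is impossible by construction
idx = rng.choice(len(biases), p=probs)
assert idx == 3  # only the last entry has nonzero probability
```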

@dwhswenson dwhswenson changed the title fixed possible (but very rare) out of bound access in shooting point selector pick implementation Improved implementation of shooting point selector pick (avoid model that might introduce index error in subclasses) Apr 5, 2022
@dwhswenson (Member) left a comment

I'm fine with the change -- it feels like safer code to me. But I think you're right that the issue is mixing in np.sum; we always use left-to-right summation here. Because of that, I think we're actually safe from this already (only because we happened to use the regular sum -- I would not have thought of this issue while writing the code!)

This code is an improvement and a better model for others implementing selectors, so I'm accepting it. Marking it as a misc PR though (not a bugfix).

@dwhswenson dwhswenson merged commit a414001 into openpathsampling:master Apr 5, 2022
@dwhswenson dwhswenson mentioned this pull request Jan 4, 2024