Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix random seed in tracking #1602

Merged
merged 17 commits into from Aug 22, 2018

Conversation

GuillaumeTh
Copy link
Contributor

Fix random seed by voxels during the tracking to be reproducible in each voxels. This fix impact the PFT and the local tracking.

More details in issue #1596 .

@gabknight, @skoudoro everything looks fine for the PFT results.

@pep8speaks
Copy link

pep8speaks commented Jul 29, 2018

Hello @GuillaumeTh, Thank you for updating !

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 17, 2018 at 16:40 Hours UTC

@codecov-io
Copy link

codecov-io commented Jul 29, 2018

Codecov Report

Merging #1602 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1602      +/-   ##
==========================================
+ Coverage   87.34%   87.35%   +0.01%     
==========================================
  Files         246      246              
  Lines       31811    31841      +30     
  Branches     3451     3456       +5     
==========================================
+ Hits        27785    27815      +30     
  Misses       3204     3204              
  Partials      822      822
Impacted Files Coverage Δ
dipy/tracking/local/tests/test_tracking.py 95.58% <100%> (+0.08%) ⬆️
dipy/tracking/local/localtracking.py 97.75% <100%> (+0.16%) ⬆️
dipy/tracking/utils.py 88.92% <100%> (+0.35%) ⬆️
dipy/tracking/tests/test_utils.py 99.29% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a6aa5a...0805e4d. Read the comment docs.

@GuillaumeTh
Copy link
Contributor Author

I don't know how to resolve the PEP8 issue

where = np.repeat(where, seeds_per_voxel, axis=0)
seeds = where + grid - .5
seeds = []
for i in range(1, seeds_per_voxel + 1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much slower than before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible but you call this function one time so I think if it's slower it's not a real problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will benchmark before and after the fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Garyfallidis Yes, it is slower I check to be faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, @Garyfallidis I think I can't be more faster the slower line is to set the random seed in numpy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will definitively be a bit slower than before. However, I think that, as @GuillaumeTh said, this is called only once at the beginning, to generate the seeds. In the overall processing time, this should be negligible, and has the big advantage of allowing a truly reproducible tracking (between runs on the same dataset).

@@ -116,6 +117,9 @@ def _generate_streamlines(self):
B = F.copy()
for s in self.seeds:
s = np.dot(lin, s) + offset
# Fix the random seed in numpy and random
random.seed(np.sum(s))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GuillaumeTh Not sure what is the fix here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Garyfallidis We fix the random seed with the sum of seed coordinates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more clear if I say " Set the random seed in numpy and random" ?

Copy link
Contributor

@gabknight gabknight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx @GuillaumeTh for this.

@@ -439,6 +438,8 @@ def random_seeds_from_mask(mask, seeds_count=1, seed_count_per_voxel=True,
The mapping between voxel indices and the point space for seeds. A
seed point at the center the voxel ``[i, j, k]`` will be represented as
``[x, y, z]`` where ``[x, y, z, 1] == np.dot(affine, [i, j, k , 1])``.
random_seed : int
The seed for the ramdom seed generator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ramdom -> random
The "seed" is a bit confusing here in the doc since we use seed for 2 different things. I suggest adding after "seed generator" : (numpy.random.seed)

@@ -116,6 +117,9 @@ def _generate_streamlines(self):
B = F.copy()
for s in self.seeds:
s = np.dot(lin, s) + offset
# Set the random seed in numpy and random
random.seed(np.sum(s))
np.random.seed(np.sum(s.astype(np.int)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random.seed(.) should also take as input the random_seed parameter. something like:
random.seed(np.sum(s) + random_seed). Otherwise, the same seed will give the same streamline, even with a different random_seedparameter

Also, np.random.seed(np.sum(s.astype(np.int))) will results in the same initial direction for seeds located a different positions in the same voxel (e.g. using nearest neinourg interpolation for the SH). I suggest using:

s_random_seed = hash(np.sum(s) + random_seed)
random.seed(s_random_seed)
np.random.seed(s_random_seed)

seeds = asarray(seeds)

np.random.seed(random_seed)
if not seed_count_per_voxel:
# Randomize the seeds and select the requested amount
np.random.shuffle(seeds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, the expected behavior won't happen if seed_count_per_voxel==False.

With seed_count_per_voxel==True, your modification makes that if I fixe random_seed and generate streamlines with 1 seeds per voxel, then with 2 seeds per voxel, the first set of streamlines is included in the second set.

With seed_count_per_voxel==False and fixe random_seed, If I generate 10,000 streamlines, then 10,001 streamlines, the first set is not guaranteed to be in the second.

To adress this, you will need to first generate the 1st seed for all voxel and do the np.random.shuffle(seeds), if len(seeds) < seed_count generate an additional seed per voxel, shuffle, append to seeds and redo the test. Then seeds = seeds[:seeds_count].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll modify the function to have this feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we want less than len(seeds) we have not the same seeds. Something that could be correct this problem is to shuffle the data first and then compute the seeds.

@gabknight Do you agree ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will push this feature. I tested it and it works correctly for npv and nt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I think you solved it but by shuffling the indices instead of the seeds.

Can you add a test comparing the resulting seed positions, with the same random_seed but various npv and nt, e.g. with a mask of 100 voxel:

random_seeds_from_mask(2, True)[:150] 
== random_seeds_from_mask(3, True)[:150] 
== random_seeds_from_mask(150, False)[:150] 
== random_seeds_from_mask(500, False)[:150]

for s in where:
# Set the random seed with the current seed, the current value of
# seeds per voxel and the global random seed.
np.random.seed((s + 1) * i + random_seed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, I think this should be np.random.seed(hash((np.sum(s) + 1) * i + random_seed)).

@GuillaumeTh
Copy link
Contributor Author

GuillaumeTh commented Jul 31, 2018

@gabknight after the dipy tests I saw that is know a good idea to use directly a hash because the hash could be superior than 2^32 -1 and it crash when we set the random seed. Maybe just remove the hash for random_seeds_from_mask and _generate_streamlines ? Or module with 2^32 - 1.

In python2 everything is fine but not in python3.

@gabknight
Copy link
Contributor

@GuillaumeTh OK, I don't know what is best for the hash. I suggested to use hash(.) because this is what happen under the hood for random.seed(.): (from https://docs.python.org/2/library/random.html)

random.seed(a=None)¶
Initialize internal state of the random number generator.

None or no argument seeds from current time or from an operating system specific randomness source if available (see the os.urandom() function for details on availability).

If a is not None or an int or a long, then hash(a) is used instead. Note that the hash values for some types are nondeterministic when PYTHONHASHSEED is enabled.

Adding hash(.)%(2^32 - 1) is probabably OK.

@GuillaumeTh
Copy link
Contributor Author

I added the test and the hash correction.

Any things else @gabknight @Garyfallidis @skoudoro ?

@GuillaumeTh
Copy link
Contributor Author

Hi,

I would like to use this code quickly. Is it possible to say me if everything is ok @skoudoro @gabknight ?

@gabknight
Copy link
Contributor

LGTM. thx @GuillaumeTh

@@ -1,4 +1,5 @@
import numpy as np
import random
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to be really "strict", this import should be first in its own block since this is a core Python import.

@@ -195,15 +195,13 @@ def test_probabilistic_odf_weighted_tracker():
def allclose(x, y):
return x.shape == y.shape and np.allclose(x, y)

path = [False, False]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but why was this modified?

where = np.repeat(where, seeds_per_voxel, axis=0)
seeds = where + grid - .5
seeds = []
for i in range(1, seeds_per_voxel + 1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will definitively be a bit slower than before. However, I think that, as @GuillaumeTh said, this is called only once at the beginning, to generate the seeds. In the overall processing time, this should be negligible, and has the big advantage of allowing a truly reproducible tracking (between runs on the same dataset).

@@ -413,16 +413,15 @@ def seeds_from_mask(mask, density=[1, 1, 1], voxel_size=None, affine=None):


def random_seeds_from_mask(mask, seeds_count=1, seed_count_per_voxel=True,
affine=None):
affine=None, random_seed=0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GuillaumeTh Could you test how complicated it would be to also manage the case where random_seed=None, i.e. set to the previous default behavior? In case someone wants to keep the original behavior.

@GuillaumeTh GuillaumeTh force-pushed the NF_set_random_seed_in_tracking branch from e063f16 to 34849fc Compare August 17, 2018 15:07
@jchoude
Copy link
Contributor

jchoude commented Aug 20, 2018

LGTM. Since @gabknight has already given a thumbs up, I'll wait until tuesday evening to merge.

@jchoude jchoude merged commit 6376cd2 into dipy:master Aug 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants