New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix random seed in tracking #1602
Fix random seed in tracking #1602
Conversation
Hello @GuillaumeTh, Thank you for updating ! Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on August 17, 2018 at 16:40 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #1602 +/- ##
==========================================
+ Coverage 87.34% 87.35% +0.01%
==========================================
Files 246 246
Lines 31811 31841 +30
Branches 3451 3456 +5
==========================================
+ Hits 27785 27815 +30
Misses 3204 3204
Partials 822 822
Continue to review full report at Codecov.
|
I don't know how to resolve the PEP8 issue |
where = np.repeat(where, seeds_per_voxel, axis=0) | ||
seeds = where + grid - .5 | ||
seeds = [] | ||
for i in range(1, seeds_per_voxel + 1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much slower than before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible but you call this function one time so I think if it's slower it's not a real problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will benchmark before and after the fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Garyfallidis Yes, it is slower I check to be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, @Garyfallidis I think I can't be more faster the slower line is to set the random seed in numpy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will definitively be a bit slower than before. However, I think that, as @GuillaumeTh said, this is called only once at the beginning, to generate the seeds. In the overall processing time, this should be negligible, and has the big advantage of allowing a truly reproducible tracking (between runs on the same dataset).
dipy/tracking/local/localtracking.py
Outdated
@@ -116,6 +117,9 @@ def _generate_streamlines(self): | |||
B = F.copy() | |||
for s in self.seeds: | |||
s = np.dot(lin, s) + offset | |||
# Fix the random seed in numpy and random | |||
random.seed(np.sum(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GuillaumeTh Not sure what is the fix here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Garyfallidis We fix the random seed with the sum of seed coordinates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it more clear if I say " Set the random seed in numpy and random" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx @GuillaumeTh for this.
dipy/tracking/utils.py
Outdated
@@ -439,6 +438,8 @@ def random_seeds_from_mask(mask, seeds_count=1, seed_count_per_voxel=True, | |||
The mapping between voxel indices and the point space for seeds. A | |||
seed point at the center the voxel ``[i, j, k]`` will be represented as | |||
``[x, y, z]`` where ``[x, y, z, 1] == np.dot(affine, [i, j, k , 1])``. | |||
random_seed : int | |||
The seed for the ramdom seed generator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ramdom -> random
The "seed" is a bit confusing here in the doc since we use seed for 2 different things. I suggest adding after "seed generator" : (numpy.random.seed)
dipy/tracking/local/localtracking.py
Outdated
@@ -116,6 +117,9 @@ def _generate_streamlines(self): | |||
B = F.copy() | |||
for s in self.seeds: | |||
s = np.dot(lin, s) + offset | |||
# Set the random seed in numpy and random | |||
random.seed(np.sum(s)) | |||
np.random.seed(np.sum(s.astype(np.int))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
random.seed(.)
should also take as input the random_seed
parameter. something like:
random.seed(np.sum(s) + random_seed)
. Otherwise, the same seed will give the same streamline, even with a different random_seed
parameter
Also, np.random.seed(np.sum(s.astype(np.int)))
will results in the same initial direction for seeds located a different positions in the same voxel (e.g. using nearest neinourg interpolation for the SH). I suggest using:
s_random_seed = hash(np.sum(s) + random_seed)
random.seed(s_random_seed)
np.random.seed(s_random_seed)
dipy/tracking/utils.py
Outdated
seeds = asarray(seeds) | ||
|
||
np.random.seed(random_seed) | ||
if not seed_count_per_voxel: | ||
# Randomize the seeds and select the requested amount | ||
np.random.shuffle(seeds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, the expected behavior won't happen if seed_count_per_voxel==False
.
With seed_count_per_voxel==True
, your modification makes that if I fixe random_seed
and generate streamlines with 1 seeds per voxel, then with 2 seeds per voxel, the first set of streamlines is included in the second set.
With seed_count_per_voxel==False
and fixe random_seed
, If I generate 10,000 streamlines, then 10,001 streamlines, the first set is not guaranteed to be in the second.
To adress this, you will need to first generate the 1st seed for all voxel and do the np.random.shuffle(seeds)
, if len(seeds) < seed_count
generate an additional seed per voxel, shuffle, append to seeds and redo the test. Then seeds = seeds[:seeds_count]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll modify the function to have this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if we want less than len(seeds) we have not the same seeds. Something that could be correct this problem is to shuffle the data first and then compute the seeds.
@gabknight Do you agree ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will push this feature. I tested it and it works correctly for npv and nt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I think you solved it but by shuffling the indices instead of the seeds.
Can you add a test comparing the resulting seed positions, with the same random_seed
but various npv and nt, e.g. with a mask of 100 voxel:
random_seeds_from_mask(2, True)[:150]
== random_seeds_from_mask(3, True)[:150]
== random_seeds_from_mask(150, False)[:150]
== random_seeds_from_mask(500, False)[:150]
dipy/tracking/utils.py
Outdated
for s in where: | ||
# Set the random seed with the current seed, the current value of | ||
# seeds per voxel and the global random seed. | ||
np.random.seed((s + 1) * i + random_seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, I think this should be np.random.seed(hash((np.sum(s) + 1) * i + random_seed))
.
@gabknight after the dipy tests I saw that is know a good idea to use directly a hash because the hash could be superior than 2^32 -1 and it crash when we set the random seed. Maybe just remove the hash for random_seeds_from_mask and _generate_streamlines ? Or module with 2^32 - 1. In python2 everything is fine but not in python3. |
@GuillaumeTh OK, I don't know what is best for the hash. I suggested to use
Adding hash(.)%(2^32 - 1) is probabably OK. |
I added the test and the hash correction. Any things else @gabknight @Garyfallidis @skoudoro ? |
Hi, I would like to use this code quickly. Is it possible to say me if everything is ok @skoudoro @gabknight ? |
LGTM. thx @GuillaumeTh |
dipy/tracking/local/localtracking.py
Outdated
@@ -1,4 +1,5 @@ | |||
import numpy as np | |||
import random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to be really "strict", this import should be first in its own block since this is a core Python import.
@@ -195,15 +195,13 @@ def test_probabilistic_odf_weighted_tracker(): | |||
def allclose(x, y): | |||
return x.shape == y.shape and np.allclose(x, y) | |||
|
|||
path = [False, False] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something, but why was this modified?
where = np.repeat(where, seeds_per_voxel, axis=0) | ||
seeds = where + grid - .5 | ||
seeds = [] | ||
for i in range(1, seeds_per_voxel + 1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will definitively be a bit slower than before. However, I think that, as @GuillaumeTh said, this is called only once at the beginning, to generate the seeds. In the overall processing time, this should be negligible, and has the big advantage of allowing a truly reproducible tracking (between runs on the same dataset).
dipy/tracking/utils.py
Outdated
@@ -413,16 +413,15 @@ def seeds_from_mask(mask, density=[1, 1, 1], voxel_size=None, affine=None): | |||
|
|||
|
|||
def random_seeds_from_mask(mask, seeds_count=1, seed_count_per_voxel=True, | |||
affine=None): | |||
affine=None, random_seed=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GuillaumeTh Could you test how complicated it would be to also manage the case where random_seed=None
, i.e. set to the previous default behavior? In case someone wants to keep the original behavior.
e063f16
to
34849fc
Compare
LGTM. Since @gabknight has already given a thumbs up, I'll wait until tuesday evening to merge. |
Fix random seed by voxels during the tracking to be reproducible in each voxels. This fix impact the PFT and the local tracking.
More details in issue #1596 .
@gabknight, @skoudoro everything looks fine for the PFT results.