New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix random seed in tracking #1602

Merged
merged 17 commits into from Aug 22, 2018

Conversation

Projects
None yet
6 participants
@GuillaumeTh
Copy link
Contributor

GuillaumeTh commented Jul 28, 2018

Fix random seed by voxels during the tracking to be reproducible in each voxels. This fix impact the PFT and the local tracking.

More details in issue #1596 .

@gabknight, @skoudoro everything looks fine for the PFT results.

@pep8speaks

This comment has been minimized.

Copy link

pep8speaks commented Jul 29, 2018

Hello @GuillaumeTh, Thank you for updating !

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 17, 2018 at 16:40 Hours UTC
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jul 29, 2018

Codecov Report

Merging #1602 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1602      +/-   ##
==========================================
+ Coverage   87.34%   87.35%   +0.01%     
==========================================
  Files         246      246              
  Lines       31811    31841      +30     
  Branches     3451     3456       +5     
==========================================
+ Hits        27785    27815      +30     
  Misses       3204     3204              
  Partials      822      822
Impacted Files Coverage Δ
dipy/tracking/local/tests/test_tracking.py 95.58% <100%> (+0.08%) ⬆️
dipy/tracking/local/localtracking.py 97.75% <100%> (+0.16%) ⬆️
dipy/tracking/utils.py 88.92% <100%> (+0.35%) ⬆️
dipy/tracking/tests/test_utils.py 99.29% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a6aa5a...0805e4d. Read the comment docs.

@GuillaumeTh

This comment has been minimized.

Copy link
Contributor

GuillaumeTh commented Jul 29, 2018

I don't know how to resolve the PEP8 issue

where = np.repeat(where, seeds_per_voxel, axis=0)
seeds = where + grid - .5
seeds = []
for i in range(1, seeds_per_voxel + 1):

This comment has been minimized.

@Garyfallidis

Garyfallidis Jul 30, 2018

Member

This looks much slower than before

This comment has been minimized.

@GuillaumeTh

GuillaumeTh Jul 30, 2018

Contributor

It is possible but you call this function one time so I think if it's slower it's not a real problem.

This comment has been minimized.

@GuillaumeTh

GuillaumeTh Jul 30, 2018

Contributor

I will benchmark before and after the fix

This comment has been minimized.

@GuillaumeTh

GuillaumeTh Jul 30, 2018

Contributor

@Garyfallidis Yes, it is slower I check to be faster.

This comment has been minimized.

@GuillaumeTh

GuillaumeTh Jul 30, 2018

Contributor

So, @Garyfallidis I think I can't be more faster the slower line is to set the random seed in numpy.

This comment has been minimized.

@jchoude

jchoude Aug 16, 2018

Contributor

This will definitively be a bit slower than before. However, I think that, as @GuillaumeTh said, this is called only once at the beginning, to generate the seeds. In the overall processing time, this should be negligible, and has the big advantage of allowing a truly reproducible tracking (between runs on the same dataset).

@@ -116,6 +117,9 @@ def _generate_streamlines(self):
B = F.copy()
for s in self.seeds:
s = np.dot(lin, s) + offset
# Fix the random seed in numpy and random
random.seed(np.sum(s))

This comment has been minimized.

@Garyfallidis

Garyfallidis Jul 30, 2018

Member

@GuillaumeTh Not sure what is the fix here.

This comment has been minimized.

@GuillaumeTh

GuillaumeTh Jul 30, 2018

Contributor

@Garyfallidis We fix the random seed with the sum of seed coordinates.

This comment has been minimized.

@GuillaumeTh

GuillaumeTh Jul 30, 2018

Contributor

Is it more clear if I say " Set the random seed in numpy and random" ?

@gabknight
Copy link
Contributor

gabknight left a comment

Thx @GuillaumeTh for this.

@@ -439,6 +438,8 @@ def random_seeds_from_mask(mask, seeds_count=1, seed_count_per_voxel=True,
The mapping between voxel indices and the point space for seeds. A
seed point at the center the voxel ``[i, j, k]`` will be represented as
``[x, y, z]`` where ``[x, y, z, 1] == np.dot(affine, [i, j, k , 1])``.
random_seed : int
The seed for the ramdom seed generator.

This comment has been minimized.

@gabknight

gabknight Jul 31, 2018

Contributor

ramdom -> random
The "seed" is a bit confusing here in the doc since we use seed for 2 different things. I suggest adding after "seed generator" : (numpy.random.seed)

@@ -116,6 +117,9 @@ def _generate_streamlines(self):
B = F.copy()
for s in self.seeds:
s = np.dot(lin, s) + offset
# Set the random seed in numpy and random
random.seed(np.sum(s))
np.random.seed(np.sum(s.astype(np.int)))

This comment has been minimized.

@gabknight

gabknight Jul 31, 2018

Contributor

random.seed(.) should also take as input the random_seed parameter. something like:
random.seed(np.sum(s) + random_seed). Otherwise, the same seed will give the same streamline, even with a different random_seedparameter

Also, np.random.seed(np.sum(s.astype(np.int))) will results in the same initial direction for seeds located a different positions in the same voxel (e.g. using nearest neinourg interpolation for the SH). I suggest using:

s_random_seed = hash(np.sum(s) + random_seed)
random.seed(s_random_seed)
np.random.seed(s_random_seed)
seeds = asarray(seeds)

np.random.seed(random_seed)
if not seed_count_per_voxel:
# Randomize the seeds and select the requested amount
np.random.shuffle(seeds)

This comment has been minimized.

@gabknight

gabknight Jul 31, 2018

Contributor

As far as I understand, the expected behavior won't happen if seed_count_per_voxel==False.

With seed_count_per_voxel==True, your modification makes that if I fixe random_seed and generate streamlines with 1 seeds per voxel, then with 2 seeds per voxel, the first set of streamlines is included in the second set.

With seed_count_per_voxel==False and fixe random_seed, If I generate 10,000 streamlines, then 10,001 streamlines, the first set is not guaranteed to be in the second.

To adress this, you will need to first generate the 1st seed for all voxel and do the np.random.shuffle(seeds), if len(seeds) < seed_count generate an additional seed per voxel, shuffle, append to seeds and redo the test. Then seeds = seeds[:seeds_count].

This comment has been minimized.

@GuillaumeTh

GuillaumeTh Jul 31, 2018

Contributor

I'll modify the function to have this feature.

This comment has been minimized.

@GuillaumeTh

GuillaumeTh Jul 31, 2018

Contributor

But if we want less than len(seeds) we have not the same seeds. Something that could be correct this problem is to shuffle the data first and then compute the seeds.

@gabknight Do you agree ?

This comment has been minimized.

@GuillaumeTh

GuillaumeTh Jul 31, 2018

Contributor

I will push this feature. I tested it and it works correctly for npv and nt

This comment has been minimized.

@gabknight

gabknight Aug 3, 2018

Contributor

OK. I think you solved it but by shuffling the indices instead of the seeds.

Can you add a test comparing the resulting seed positions, with the same random_seed but various npv and nt, e.g. with a mask of 100 voxel:

random_seeds_from_mask(2, True)[:150] 
== random_seeds_from_mask(3, True)[:150] 
== random_seeds_from_mask(150, False)[:150] 
== random_seeds_from_mask(500, False)[:150]
for s in where:
# Set the random seed with the current seed, the current value of
# seeds per voxel and the global random seed.
np.random.seed((s + 1) * i + random_seed)

This comment has been minimized.

@gabknight

gabknight Jul 31, 2018

Contributor

For consistency, I think this should be np.random.seed(hash((np.sum(s) + 1) * i + random_seed)).

@GuillaumeTh

This comment has been minimized.

Copy link
Contributor

GuillaumeTh commented Jul 31, 2018

@gabknight after the dipy tests I saw that is know a good idea to use directly a hash because the hash could be superior than 2^32 -1 and it crash when we set the random seed. Maybe just remove the hash for random_seeds_from_mask and _generate_streamlines ? Or module with 2^32 - 1.

In python2 everything is fine but not in python3.

@gabknight

This comment has been minimized.

Copy link
Contributor

gabknight commented Aug 3, 2018

@GuillaumeTh OK, I don't know what is best for the hash. I suggested to use hash(.) because this is what happen under the hood for random.seed(.): (from https://docs.python.org/2/library/random.html)

random.seed(a=None)¶
Initialize internal state of the random number generator.

None or no argument seeds from current time or from an operating system specific randomness source if available (see the os.urandom() function for details on availability).

If a is not None or an int or a long, then hash(a) is used instead. Note that the hash values for some types are nondeterministic when PYTHONHASHSEED is enabled.

Adding hash(.)%(2^32 - 1) is probabably OK.

@GuillaumeTh

This comment has been minimized.

Copy link
Contributor

GuillaumeTh commented Aug 3, 2018

I added the test and the hash correction.

Any things else @gabknight @Garyfallidis @skoudoro ?

@GuillaumeTh

This comment has been minimized.

Copy link
Contributor

GuillaumeTh commented Aug 13, 2018

Hi,

I would like to use this code quickly. Is it possible to say me if everything is ok @skoudoro @gabknight ?

@gabknight

This comment has been minimized.

Copy link
Contributor

gabknight commented Aug 16, 2018

LGTM. thx @GuillaumeTh

@@ -1,4 +1,5 @@
import numpy as np
import random

This comment has been minimized.

@jchoude

jchoude Aug 16, 2018

Contributor

If we want to be really "strict", this import should be first in its own block since this is a core Python import.

@@ -195,15 +195,13 @@ def test_probabilistic_odf_weighted_tracker():
def allclose(x, y):
return x.shape == y.shape and np.allclose(x, y)

path = [False, False]

This comment has been minimized.

@jchoude

jchoude Aug 16, 2018

Contributor

I might be missing something, but why was this modified?

where = np.repeat(where, seeds_per_voxel, axis=0)
seeds = where + grid - .5
seeds = []
for i in range(1, seeds_per_voxel + 1):

This comment has been minimized.

@jchoude

jchoude Aug 16, 2018

Contributor

This will definitively be a bit slower than before. However, I think that, as @GuillaumeTh said, this is called only once at the beginning, to generate the seeds. In the overall processing time, this should be negligible, and has the big advantage of allowing a truly reproducible tracking (between runs on the same dataset).

@@ -413,16 +413,15 @@ def seeds_from_mask(mask, density=[1, 1, 1], voxel_size=None, affine=None):


def random_seeds_from_mask(mask, seeds_count=1, seed_count_per_voxel=True,
affine=None):
affine=None, random_seed=0):

This comment has been minimized.

@jchoude

jchoude Aug 16, 2018

Contributor

@GuillaumeTh Could you test how complicated it would be to also manage the case where random_seed=None, i.e. set to the previous default behavior? In case someone wants to keep the original behavior.

@GuillaumeTh GuillaumeTh force-pushed the GuillaumeTh:NF_set_random_seed_in_tracking branch from e063f16 to 34849fc Aug 17, 2018

@jchoude

This comment has been minimized.

Copy link
Contributor

jchoude commented Aug 20, 2018

LGTM. Since @gabknight has already given a thumbs up, I'll wait until tuesday evening to merge.

@jchoude jchoude merged commit 6376cd2 into nipy:master Aug 22, 2018

3 checks passed

codecov/patch 100% of diff hit (target 87.34%)
Details
codecov/project 87.35% (+0.01%) compared to 5a6aa5a
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment