Fix random seed in tracking #1602

GuillaumeTh · 2018-07-28T22:52:10Z

Fix the random seed per voxel during tracking, so that the result is reproducible in each voxel. This fix impacts both PFT and local tracking.

More details in issue #1596.

@gabknight, @skoudoro everything looks fine for the PFT results.

pep8speaks · 2018-07-29T19:24:21Z

Hello @GuillaumeTh, Thank you for updating !

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 17, 2018 at 16:40 Hours UTC

codecov-io · 2018-07-29T21:33:56Z

Codecov Report

Merging #1602 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1602      +/-   ##
==========================================
+ Coverage   87.34%   87.35%   +0.01%     
==========================================
  Files         246      246              
  Lines       31811    31841      +30     
  Branches     3451     3456       +5     
==========================================
+ Hits        27785    27815      +30     
  Misses       3204     3204              
  Partials      822      822

Impacted Files	Coverage Δ
dipy/tracking/local/tests/test_tracking.py	`95.58% <100%> (+0.08%)`	⬆️
dipy/tracking/local/localtracking.py	`97.75% <100%> (+0.16%)`	⬆️
dipy/tracking/utils.py	`88.92% <100%> (+0.35%)`	⬆️
dipy/tracking/tests/test_utils.py	`99.29% <100%> (+0.01%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5a6aa5a...0805e4d.

GuillaumeTh · 2018-07-29T22:28:12Z

I don't know how to resolve the PEP8 issue

Garyfallidis · 2018-07-30T16:53:13Z

dipy/tracking/utils.py

-    where = np.repeat(where, seeds_per_voxel, axis=0)
-    seeds = where + grid - .5
+    seeds = []
+    for i in range(1, seeds_per_voxel + 1):


This looks much slower than before.

That is possible, but this function is called only once, so even if it is slower I don't think it is a real problem.

I will benchmark before and after the fix.

@Garyfallidis Yes, it is slower. I am checking whether I can make it faster.

@Garyfallidis I don't think I can make it any faster: the slowest line is the one that sets the random seed in numpy.
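
One way to run the benchmark mentioned above is with `timeit`. The sketch below rests on assumptions: `vectorized_seeds` and `looped_seeds` are hypothetical helpers written for this comparison, not the functions in `dipy/tracking/utils.py`, the `grid` draw is a guess at what the old code did, and the mask size is arbitrary.

```python
import timeit

import numpy as np


def vectorized_seeds(where, seeds_per_voxel):
    # Old approach: repeat the voxel indices and draw every offset at once.
    where = np.repeat(where, seeds_per_voxel, axis=0)
    grid = np.random.random(where.shape)  # assumed form of the old draw
    return where + grid - .5


def looped_seeds(where, seeds_per_voxel, random_seed=0):
    # New approach (simplified): reseed numpy before each draw so that a
    # given voxel always receives the same offsets.
    seeds = []
    for i in range(1, seeds_per_voxel + 1):
        for s in where:
            np.random.seed((int(np.sum(s)) + 1) * i + random_seed)
            seeds.append(s + np.random.random(3) - .5)
    return np.asarray(seeds)


# Hypothetical 10 x 10 x 10 mask with every voxel selected.
where = np.argwhere(np.ones((10, 10, 10), dtype=bool))

t_vec = timeit.timeit(lambda: vectorized_seeds(where, 2), number=5)
t_loop = timeit.timeit(lambda: looped_seeds(where, 2), number=5)
print("vectorized: %.4f s, looped: %.4f s" % (t_vec, t_loop))
```

The absolute numbers depend on the machine; what matters for the discussion is the ratio between the two timings and how it compares with the cost of the tracking itself.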

Garyfallidis · 2018-07-30T16:54:00Z

dipy/tracking/local/localtracking.py

@@ -116,6 +117,9 @@ def _generate_streamlines(self):
        B = F.copy()
        for s in self.seeds:
            s = np.dot(lin, s) + offset
+            # Fix the random seed in numpy and random
+            random.seed(np.sum(s))


@GuillaumeTh Not sure what the fix is here.

@Garyfallidis We fix the random seed with the sum of the seed coordinates.

Is it clearer if I say "Set the random seed in numpy and random"?
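
A minimal sketch of what seeding from the sum of the seed coordinates buys, using only the standard `random` module. `first_draw` is a hypothetical helper and the coordinates are made up; the point is that the value drawn for a seed point no longer depends on which seed points were processed before it.

```python
import random


def first_draw(seed_point):
    # Reset the generator from the seed coordinates, then draw once.
    random.seed(sum(seed_point))
    return random.random()


# Two made-up seed positions.
seeds = [(1.2, 3.4, 5.6), (7.0, 8.5, 9.1)]

forward = [first_draw(s) for s in seeds]
backward = [first_draw(s) for s in reversed(seeds)]

# Each seed point gets the same draw whatever the processing order.
assert forward == backward[::-1]
```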

gabknight

Thx @GuillaumeTh for this.

gabknight · 2018-07-31T08:55:03Z

dipy/tracking/utils.py

@@ -439,6 +438,8 @@ def random_seeds_from_mask(mask, seeds_count=1, seed_count_per_voxel=True,
        The mapping between voxel indices and the point space for seeds. A
        seed point at the center the voxel ``[i, j, k]`` will be represented as
        ``[x, y, z]`` where ``[x, y, z, 1] == np.dot(affine, [i, j, k , 1])``.
+    random_seed : int
+        The seed for the ramdom seed generator.


ramdom -> random
The word "seed" is a bit confusing in this docstring, since we use it for two different things. I suggest adding "(numpy.random.seed)" after "seed generator".

gabknight · 2018-07-31T09:31:14Z

dipy/tracking/local/localtracking.py

@@ -116,6 +117,9 @@ def _generate_streamlines(self):
        B = F.copy()
        for s in self.seeds:
            s = np.dot(lin, s) + offset
+            # Set the random seed in numpy and random
+            random.seed(np.sum(s))
+            np.random.seed(np.sum(s.astype(np.int)))


random.seed(.) should also take the random_seed parameter as input, something like random.seed(np.sum(s) + random_seed). Otherwise, the same seed will give the same streamline, even with a different random_seed parameter.

Also, np.random.seed(np.sum(s.astype(np.int))) will result in the same initial direction for seeds located at different positions in the same voxel (e.g. using nearest neighbour interpolation for the SH). I suggest using:

    s_random_seed = hash(np.sum(s) + random_seed)
    random.seed(s_random_seed)
    np.random.seed(s_random_seed)
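
The collision described above can be checked directly. This sketch assumes two made-up seed positions inside the same voxel and a `random_seed` of 0. It uses the builtin `int` where the diff uses `np.int`, since that alias has been removed from recent NumPy releases.

```python
import numpy as np

random_seed = 0
# Two hypothetical seed positions inside the same voxel.
s1 = np.array([10.2, 20.3, 30.1])
s2 = np.array([10.7, 20.4, 30.2])

# Truncating to integers first gives both positions the same key,
# so both would start from the same generator state.
key1 = int(np.sum(s1.astype(int)))
key2 = int(np.sum(s2.astype(int)))
assert key1 == key2 == 60

# Hashing the floating-point sum keeps the two positions apart.
h1 = hash(float(np.sum(s1)) + random_seed)
h2 = hash(float(np.sum(s2)) + random_seed)
assert h1 != h2
```

Note that the raw hash values are not guaranteed to be valid arguments for `np.random.seed`, so the block stops short of seeding with them.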

gabknight · 2018-07-31T10:13:37Z

dipy/tracking/utils.py

    seeds = asarray(seeds)

+    np.random.seed(random_seed)
    if not seed_count_per_voxel:
        # Randomize the seeds and select the requested amount
        np.random.shuffle(seeds)


As far as I understand, the expected behavior won't happen if seed_count_per_voxel==False.

With seed_count_per_voxel==True, your modification means that if I fix random_seed and generate streamlines with 1 seed per voxel, then with 2 seeds per voxel, the first set of streamlines is included in the second set.

With seed_count_per_voxel==False and a fixed random_seed, if I generate 10,000 streamlines, then 10,001 streamlines, the first set is not guaranteed to be in the second.

To address this, you will need to first generate the first seed for every voxel and do the np.random.shuffle(seeds). If len(seeds) < seeds_count, generate an additional seed per voxel, shuffle, append to seeds and redo the test. Then seeds = seeds[:seeds_count].

I'll modify the function to have this feature.

But if we want fewer than len(seeds), we do not get the same seeds. Something that could correct this problem is to shuffle the data first and then compute the seeds.

@gabknight Do you agree?

I will push this feature. I tested it and it works correctly for npv and nt.

OK. I think you solved it, but by shuffling the indices instead of the seeds.

Can you add a test comparing the resulting seed positions, with the same random_seed but various npv and nt, e.g. with a mask of 100 voxels:

random_seeds_from_mask(2, True)[:150] == random_seeds_from_mask(3, True)[:150] == random_seeds_from_mask(150, False)[:150] == random_seeds_from_mask(500, False)[:150]
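
The requested check could look like the sketch below. `sketch_random_seeds` is a hypothetical stand-in that only mimics the scheme discussed in this thread (shuffle the voxel indices once, then derive each seed from the voxel, the pass number and `random_seed`). It is not dipy's `random_seeds_from_mask`, and it uses local `np.random.RandomState` objects instead of the global generator to keep the example self-contained.

```python
import numpy as np


def sketch_random_seeds(voxels, seeds_count, per_voxel, random_seed=0):
    """Hypothetical stand-in, not dipy's random_seeds_from_mask."""
    voxels = np.asarray(voxels)
    n_voxels = len(voxels)
    # Shuffle the voxel order once, using only the global random seed.
    order = np.random.RandomState(random_seed).permutation(n_voxels)
    voxels = voxels[order]
    # Number of passes over the voxels needed to get enough seeds.
    if per_voxel:
        passes = seeds_count
    else:
        passes = -(-seeds_count // n_voxels)  # ceiling division
    seeds = []
    for i in range(1, passes + 1):
        for v in voxels:
            # The key depends on the voxel, the pass and the global seed.
            key = ((int(np.sum(v)) + 1) * i + random_seed) % (2**32 - 1)
            rng = np.random.RandomState(key)
            seeds.append(v + rng.random_sample(3) - .5)
    seeds = np.asarray(seeds)
    return seeds if per_voxel else seeds[:seeds_count]


# Mask of 100 voxels, as in the request above.
vox = np.argwhere(np.ones((5, 5, 4), dtype=bool))

a = sketch_random_seeds(vox, 2, True)[:150]
b = sketch_random_seeds(vox, 3, True)[:150]
c = sketch_random_seeds(vox, 150, False)[:150]
d = sketch_random_seeds(vox, 500, False)[:150]

assert np.array_equal(a, b)
assert np.array_equal(a, c)
assert np.array_equal(a, d)
```

Because the shuffle happens before any seed is drawn, asking for more seeds only appends to the list, which is what makes the smaller set a prefix of the larger one.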

gabknight · 2018-07-31T13:06:09Z

dipy/tracking/utils.py

+        for s in where:
+            # Set the random seed with the current seed, the current value of
+            # seeds per voxel and the global random seed.
+            np.random.seed((s + 1) * i + random_seed)


For consistency, I think this should be np.random.seed(hash((np.sum(s) + 1) * i + random_seed)).

GuillaumeTh · 2018-07-31T22:41:00Z

@gabknight After running the dipy tests I saw that it is not a good idea to use a hash directly, because the hash can be greater than 2^32 - 1 and it crashes when we set the random seed. Maybe just remove the hash for random_seeds_from_mask and _generate_streamlines? Or take it modulo 2^32 - 1.

In Python 2 everything is fine, but not in Python 3.

gabknight · 2018-08-03T08:51:33Z

@GuillaumeTh OK, I don't know what is best for the hash. I suggested using hash(.) because this is what happens under the hood in random.seed(.) (from https://docs.python.org/2/library/random.html):

random.seed(a=None)
Initialize internal state of the random number generator.

None or no argument seeds from current time or from an operating system specific randomness source if available (see the os.urandom() function for details on availability).

If a is not None or an int or a long, then hash(a) is used instead. Note that the hash values for some types are nondeterministic when PYTHONHASHSEED is enabled.

Adding hash(.) % (2^32 - 1) is probably OK.
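
The failure and the proposed fix can be reproduced in a few lines. This is a sketch under assumptions: the coordinate sum is made up, and the power is written `2**32` because `^` is bitwise XOR in Python. `np.random.seed` only accepts integers between 0 and 2**32 - 1.

```python
import numpy as np

random_seed = 0
coord_sum = 60.6  # hypothetical sum of one seed's coordinates

raw = hash(coord_sum + random_seed)

try:
    np.random.seed(raw)
    raw_accepted = True
except ValueError:
    # Raised when the value is outside the range numpy accepts.
    raw_accepted = False

# Reducing the hash modulo 2**32 - 1 always gives a usable seed.
bounded = raw % (2**32 - 1)
np.random.seed(bounded)
first = np.random.random()

# Seeding again with the same value reproduces the same draw.
np.random.seed(bounded)
assert np.random.random() == first
```

Python's `%` returns a non-negative result for a positive modulus, so `bounded` stays in range even when the hash is negative.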

GuillaumeTh · 2018-08-03T14:05:11Z

I added the test and the hash correction.

Anything else @gabknight @Garyfallidis @skoudoro?

GuillaumeTh · 2018-08-13T14:05:10Z

Hi,

I would like to use this code soon. Could you tell me if everything is OK, @skoudoro @gabknight?

gabknight · 2018-08-16T11:25:15Z

LGTM. thx @GuillaumeTh

jchoude · 2018-08-16T15:44:27Z

dipy/tracking/local/localtracking.py

@@ -1,4 +1,5 @@
 import numpy as np
+import random


If we want to be really "strict", this import should be first in its own block since this is a core Python import.

jchoude · 2018-08-16T15:47:38Z

dipy/tracking/local/tests/test_tracking.py

@@ -195,15 +195,13 @@ def test_probabilistic_odf_weighted_tracker():
    def allclose(x, y):
        return x.shape == y.shape and np.allclose(x, y)

-    path = [False, False]


I might be missing something, but why was this modified?

jchoude · 2018-08-16T15:51:54Z

dipy/tracking/utils.py

-    where = np.repeat(where, seeds_per_voxel, axis=0)
-    seeds = where + grid - .5
+    seeds = []
+    for i in range(1, seeds_per_voxel + 1):


This will definitely be a bit slower than before. However, I think that, as @GuillaumeTh said, this is called only once at the beginning, to generate the seeds. In the overall processing time, this should be negligible, and it has the big advantage of allowing truly reproducible tracking (between runs on the same dataset).

jchoude · 2018-08-16T15:53:08Z

dipy/tracking/utils.py

@@ -413,16 +413,15 @@ def seeds_from_mask(mask, density=[1, 1, 1], voxel_size=None, affine=None):


 def random_seeds_from_mask(mask, seeds_count=1, seed_count_per_voxel=True,
-                           affine=None):
+                           affine=None, random_seed=0):


@GuillaumeTh Could you test how complicated it would be to also manage the case where random_seed=None, i.e. set to the previous default behavior? In case someone wants to keep the original behavior.
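
One possible shape for this, as a sketch only: `maybe_seed` and `draw_offsets` are hypothetical helpers, not dipy functions. When `random_seed` is `None` the global numpy generator is left untouched, which corresponds to the behavior before this PR; otherwise it is seeded.

```python
import numpy as np


def maybe_seed(random_seed=None):
    """Seed numpy's global generator only when a seed is given."""
    if random_seed is not None:
        np.random.seed(random_seed)


def draw_offsets(n, random_seed=None):
    # Hypothetical stand-in for the random part of seed generation.
    maybe_seed(random_seed)
    return np.random.random((n, 3)) - .5


# With an integer seed, two calls agree.
assert np.array_equal(draw_offsets(4, 0), draw_offsets(4, 0))

# With None, each call continues from the generator's current state,
# so consecutive calls give different offsets.
assert not np.array_equal(draw_offsets(4), draw_offsets(4))
```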

jchoude · 2018-08-20T13:18:33Z

LGTM. Since @gabknight has already given a thumbs up, I'll wait until Tuesday evening to merge.

Garyfallidis reviewed Jul 30, 2018

gabknight reviewed Jul 31, 2018

jchoude reviewed Aug 16, 2018

GuillaumeTh added 16 commits August 17, 2018 10:54

Fix random seed in tracking

85a5d2b

Update tests and example

d79559a

Remove testing print

912e07b

Fix Doctest error

fc90bea

Fix comments

dcb1341

Remove testing function

237103e

Modify shuffle and update example

392fa91

Add random_seed parameter and use hash

f336470

Change range for arange

8304a67

Unify random hash

0336594

Add abs for 0,0,0 case

9a841bf

Module the hash value

cab9e90

Add test for random_seeds_from_mask

0a3ce32

Fix PEP8

015b54a

Set the random seed to None

33c6984

Move random in if condition

34849fc

GuillaumeTh force-pushed the NF_set_random_seed_in_tracking branch from e063f16 to 34849fc Compare August 17, 2018 15:07

Add test for code coverage

0805e4d

jchoude merged commit 6376cd2 into dipy:master Aug 22, 2018
