Reproduce a given transform #208
Comments
I agree this would be useful, and that what you suggest makes sense. But maybe a way to do this without changing so much code everywhere is to get the current seed of the PyTorch random number generator and add it to the parameters dictionary, or rather to use a ... Useful links:
|
I'll keep adding some findings here, for reference.

Py 3.8.2 (torchio) ~ ipython    ✔ 100% 01:23 13:48:29
Python 3.8.2 (default, Mar 26 2020, 10:43:30)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import torch
In [2]: torch.save(torch.get_rng_state(), '/tmp/rng.pth')
In [3]: torch.rand(1)
Out[3]: tensor([0.8218])
In [4]:
Do you really want to exit ([y]/n)?
Py 3.8.2 (torchio) ~ ipython ✔ 100% 41.41s 13:49:12
Python 3.8.2 (default, Mar 26 2020, 10:43:30)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import torch
In [2]: torch.set_rng_state(torch.load('/tmp/rng.pth'))
In [3]: torch.rand(1)
Out[3]: tensor([0.8218]) |
I think this would do:

In [2]: %run /tmp/romain.py
In [3]: do_random_stuff()
Out[3]: (tensor([0.6550]), 6129670197736294978)
In [4]: do_random_stuff(6129670197736294978)
Out[4]: (tensor([0.6550]), 6129670197736294978)

where

import torch


def do_random_stuff(seed=None):
    if seed is None:
        seed = torch.seed()
    else:
        torch.manual_seed(seed)
    return torch.rand(1), seed
|
So you could access the seed used for a specific transformation from the transform history, and use it as an argument to reproduce the exact result. Would that work for you? If yes, I'll open a PR soon. |
This sounds like a nice, lazy solution! I guess one should be careful about where to put the get_rng_state and set calls (in the transform init...?). Something else to check is whether only torch is used to get random numbers (and not numpy.random); it seems to be the case for your transforms (but not for the one I added, but well, I'll change it). Thanks |
I strongly suggest you do change your random generators to PyTorch in your implementation. I've had trouble before with multiprocessing. Using PyTorch to generate random numbers ensures good results. I think it's related to pytorch/pytorch#5059. |
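For context, a minimal sketch (not from this thread) of the kind of issue pytorch/pytorch#5059 describes: with fork-based DataLoader workers, NumPy's global RNG state is copied into every worker, so numpy.random can return the same values in different workers, whereas PyTorch re-seeds its own generator per worker. The RandomPairs dataset is hypothetical.

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class RandomPairs(Dataset):
    # hypothetical dataset returning one NumPy and one torch random number
    def __len__(self):
        return 4

    def __getitem__(self, index):
        return np.random.rand(), torch.rand(1).item()


if __name__ == '__main__':
    loader = DataLoader(RandomPairs(), num_workers=2, batch_size=None)
    for numpy_value, torch_value in loader:
        # with fork-based workers, the NumPy values tend to repeat across
        # workers, while the torch values differ
        print(numpy_value, torch_value)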
Why is that? For each transform you need to get the specific random state to save in the history, no? (and not the seed from the beginning of the code). Just to be sure I am clear enough: from the saved history of a given transform (say transform 600), I want to retrieve the same random parameters. |
Maybe this script can help me explain what I mean:

import torch


class RandomMultiply:
    def __init__(self, seed=None):
        self.seed = seed

    def get_params(self):
        if self.seed is None:
            seed = torch.seed()
        else:
            seed = self.seed
            torch.manual_seed(seed)
        random_number = torch.rand(1).item()
        return seed, random_number

    def __call__(self, x):
        seed, random_number = self.get_params()
        x.n = random_number
        random_params = {
            'seed': seed,
            'random_number': random_number,
        }
        x.history.append(random_params)
        return x


class Number:
    def __init__(self):
        self.n = 100
        self.history = []


def main():
    number = Number()
    num_iterations = 1000
    transform = RandomMultiply()
    results = []
    for _ in range(num_iterations):
        number = transform(number)
        results.append(number.n)
    print('I love this result:', results[600])
    random_params = number.history[600]
    print('The number was {random_number} and the seed was {seed}'.format(**random_params))
    new_number = Number()
    reproducer = RandomMultiply(seed=random_params['seed'])
    reproduction = reproducer(new_number)
    print('\nReproduced result:', reproduction.n)
    print('Random parameters:', reproduction.history[0])


if __name__ == "__main__":
    main()

Output:

I love this result: 0.36917614936828613
The number was 0.36917614936828613 and the seed was 9140603397059799205
Reproduced result: 0.36917614936828613
Random parameters: {'seed': 9140603397059799205, 'random_number': 0.36917614936828613}

However, sometimes I get an error when I run the script:

I love this result: 0.738286554813385
The number was 0.738286554813385 and the seed was 17402630908179140122
Traceback (most recent call last):
  File "/tmp/romain.py", line 55, in <module>
    main()
  File "/tmp/romain.py", line 48, in main
    reproduction = reproducer(new_number)
  File "/tmp/romain.py", line 18, in __call__
    seed, random_number = self.get_params()
  File "/tmp/romain.py", line 13, in get_params
    torch.manual_seed(seed)
  File "/usr/local/Caskroom/miniconda/base/envs/torchio/lib/python3.8/site-packages/torch/random.py", line 34, in manual_seed
    return default_generator.manual_seed(seed)
RuntimeError: Overflow when unpacking long
|
It works if I use seed & (2**63 - 1) to keep the seed below 2**63. |
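For the record, a minimal illustration of the overflow and of the masking workaround (behavior of the PyTorch version used in this thread; the values are non-deterministic):

import torch

seed = torch.seed()  # re-seeds the default generator non-deterministically and returns the new seed
# torch.manual_seed(seed) can raise "RuntimeError: Overflow when unpacking long"
# when the returned seed does not fit in a signed 64-bit integer
torch.manual_seed(seed & (2**63 - 1))  # masking to 63 bits avoids the overflow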
I love these results too! It is still not 100% clear to me what torch.seed is doing... Is it equivalent to a torch.manual_seed call but with a non-deterministic random number? (so in the end it keeps the randomness). The thing I do not like about using torch.manual_seed(seed) is that it influences all the code using random numbers that follows, but for my use case I guess I can live with that. |
I think so, yes. It probably calls a very low-level RNG and sets the seed with that.
I guess the ideal implementation would include resetting the random state after getting the random parameters, but hopefully that's not a problem. |
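A minimal sketch of that idea with torch.random.fork_rng, which restores the global RNG state when the block exits (the example only checks the CPU generator):

import torch

torch.manual_seed(0)
state_before = torch.get_rng_state()

with torch.random.fork_rng():
    torch.manual_seed(12345)  # seeding inside the block...
    value = torch.rand(1)     # ...and drawing random numbers...

state_after = torch.get_rng_state()
assert torch.equal(state_before, state_after)  # ...leaves the outer RNG state untouched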
But you will break reproducibility of the code if one wants to compare results obtained before and after the changes... I guess it is OK, one just needs to know. |
Why not use torch.get_rng_state()? (It won't modify the random state...) |
Yes but
|
Maybe we can see how other libraries using PyTorch deal with this, if they do. |
|
Yes, sure... no hurry. (And my first option is maybe not so much code... mostly copy-paste.) It also has the advantage of being explicit and more independent of the deep PyTorch seeding... |
For reference: if I add a manual_seed at the top of the script, I'd expect to always get the same result, but it wasn't happening. Using the version below, which wraps get_params in a fork_rng context and masks the seed:

import torch
torch.manual_seed(25074390)
class RandomMultiply:
    def __init__(self, seed=None):
        self.seed = seed

    def get_params(self):
        if self.seed is None:
            seed = torch.seed()
        else:
            seed = self.seed
        torch.manual_seed(seed & (2**63 - 1))
        random_number = torch.rand(1).item()
        return seed, random_number

    def __call__(self, x):
        with torch.random.fork_rng():
            seed, random_number = self.get_params()
        x.n = random_number
        random_params = {
            'seed': seed,
            'random_number': random_number,
        }
        x.history.append(random_params)
        return x


class Number:
    def __init__(self):
        self.n = 100
        self.history = []


def main():
    number = Number()
    num_iterations = 1000
    transform = RandomMultiply()
    results = []
    for _ in range(num_iterations):
        number = transform(number)
        results.append(number.n)
    print('I love this result:', results[600])
    random_params = number.history[600]
    print('The number was {random_number} and the seed was {seed}'.format(**random_params))
    new_number = Number()
    reproducer = RandomMultiply(seed=random_params['seed'])
    reproduction = reproducer(new_number)
    print('\nReproduced result:', reproduction.n)
    print('Random parameters:', reproduction.history[0])
    print('A reproducible random number:', torch.rand(1).item())


if __name__ == "__main__":
    main()

Output:

Py 3.8.2 (torchio) ~/tmp python romain.py     ✔ 68% (1:50) 14:03:23
I love this result: 0.9413431882858276
The number was 0.9413431882858276 and the seed was 11325910110281226277
Reproduced result: 0.9413431882858276
Random parameters: {'seed': 11325910110281226277, 'random_number': 0.9413431882858276}
A reproducible random number: 0.1055898666381836
Py 3.8.2 (torchio) ~/tmp python romain.py ✔ 68% (1:50) 14:03:25
I love this result: 0.4556175470352173
The number was 0.4556175470352173 and the seed was 11295961902343433785
Reproduced result: 0.4556175470352173
Random parameters: {'seed': 11295961902343433785, 'random_number': 0.4556175470352173}
A reproducible random number: 0.1055898666381836

I'll try with generators. |
New test using generators. I'm probably doing something wrong. The same seed generates different results. Each call had a different manual seed at the top of the script.

def get_params(self):
    generator = torch.Generator()
    if self.seed is None:
        seed = generator.initial_seed()
    else:
        seed = self.seed
    generator.manual_seed(seed)
    random_number = torch.rand(1).item()
    return seed, random_number

Py 3.8.2 (torchio) ~/tmp python romain.py     ✔ 57% (1:33) 14:19:36
I love this result: 0.1055898666381836
The number was 0.1055898666381836 and the seed was 67280421310721
Reproduced result: 0.1055898666381836
Random parameters: {'seed': 67280421310721, 'random_number': 0.1055898666381836}
A reproducible random number: 0.1055898666381836
Py 3.8.2 (torchio) ~/tmp python romain.py ✔ 56% (1:24) 14:20:03
I love this result: 0.9377833008766174
The number was 0.9377833008766174 and the seed was 67280421310721
Reproduced result: 0.9377833008766174
Random parameters: {'seed': 67280421310721, 'random_number': 0.9377833008766174} <---- same seed as above but yields different number
A reproducible random number: 0.9377833008766174 |
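For what it's worth, a possible reading of the snippet above (my interpretation, not something stated later in the thread): the generator is never passed to torch.rand, so the drawn number still comes from the global RNG, and initial_seed() on a fresh Generator returns the same default value on every run (67280421310721 in the outputs above). A sketch that actually uses the generator and draws the per-call seed from the global RNG, so a manual_seed at the top of the script keeps the whole run reproducible:

import torch


def get_params(seed=None):
    generator = torch.Generator()
    if seed is None:
        # draw the per-call seed from the global RNG, so a manual_seed at the
        # top of the script makes the whole run reproducible
        seed = int(torch.randint(0, 2**63 - 1, (1,)).item())
    generator.manual_seed(seed)
    # pass the generator explicitly; a plain torch.rand(1) would ignore it
    random_number = torch.rand(1, generator=generator).item()
    return seed, random_number


torch.manual_seed(25074390)
seed, value = get_params()
assert get_params(seed) == (seed, value)  # the same seed reproduces the same number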
I do not see the problem... The fact that you obtain different results when you run the code twice is what is expected (with random numbers), no? What do I miss? |
Note that I changed the seed at the top for each run. What I want is that the result of a transform is deterministic given a certain input and seed, independently of the manual seed I set at the beginning. BUT if I do set a manual seed at the beginning of the script, I want exactly the same thing to happen every time I run it. |
OK, I see, I missed the second line of your example... I may not have the latest Python nor the latest PyTorch. |
Try this #208 (comment) |
It is already in your example... |
Oh, sorry! |
I've asked in the PyTorch forum: https://discuss.pytorch.org/t/get-current-random-seed-for-reproducibility/87597 |
Hello, if I may add a little piece of information that I remember from my computer science lectures: a pseudo-random number generator computes a progression from the given seed. Therefore, to be able to find the same result, we not only need to set the same seed, but we also need to call the random functions as many times as they were called the first time. |
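A tiny illustration of that point:

import torch

torch.manual_seed(0)
first = torch.rand(1)   # first draw after seeding
second = torch.rand(1)  # second draw: same seed, different value

torch.manual_seed(0)
# re-seeding and repeating the same number of calls reproduces the sequence
assert torch.equal(torch.rand(1), first)
assert not torch.equal(first, second)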
Hello again, I started from your code @fepegar but used the rng_state instead of the seed in order to reproduce the results (I think the rng_state contains the full information of the generator). This worked. To get the result, I used:

import torch

torch.manual_seed(25074390)


class RandomMultiply:
    def __init__(self, rng_state=torch.get_rng_state()):
        self.rng_state = rng_state
        torch.set_rng_state(rng_state)

    def get_params(self):
        """
        if self.seed is None:
            seed = torch.seed()
        else:
            seed = self.seed
        torch.manual_seed(seed & (2**63 - 1))
        """
        rng_state = torch.get_rng_state()
        random_number = torch.rand(1).item()
        return rng_state, random_number

    def __call__(self, x):
        # with torch.random.fork_rng():
        rng_state, random_number = self.get_params()
        x.n = random_number
        random_params = {
            'rng_state': rng_state,
            'random_number': random_number,
        }
        x.history.append(random_params)
        return x


class Number:
    def __init__(self):
        self.n = 100
        self.history = []


def main():
    number = Number()
    num_iterations = 1000
    transform = RandomMultiply()
    results = []
    for _ in range(num_iterations):
        number = transform(number)
        results.append(number.n)
    print('I love this result:', results[600])
    random_params = number.history[600]
    print("The number was {random_number} and the rng_state was {rng_state}".format(**random_params))
    new_number = Number()
    reproducer = RandomMultiply(rng_state=random_params["rng_state"])
    reproduction = reproducer(new_number)
    print('\nReproduced result:', reproduction.n)
    print('Random parameters:', reproduction.history[0])
    torch.set_rng_state(reproduction.history[0]["rng_state"])
    print('A reproducible random number:', torch.rand(1).item())


if __name__ == "__main__":
    main()
|
Hi @GReguig, thanks for investigating this with us. We briefly discussed passing the random state in #208 (comment), but I'd like to use the seed if possible, so that the API is not changed and because it's easier and more intuitive for users. However, if that solution satisfies these conditions and we can't think of anything else, I guess we'll need to follow that direction. P.S.: you can specify the language for syntax highlighting of fenced code blocks in Markdown, i.e. print(1 if b is None else 'hey') rendered as plain text vs the same line highlighted as Python. |
Oh alright, sorry about that. I will investigate this a little bit more then. Thanks |
It will if people use the same version of the libraries they used for the experiment, which they should be logging. You could say this about any software.
If we managed to set the seed only for the current transform, that would not happen.
"Double" feels arbitrary, and for e.g. the coarse field in ...
It is indeed nice to see what the parameters were in the log.
I'm not sure I follow here.
I thought they were not necessary because the user is meant to know what transforms they applied. But I agree it'd be useful if you apply e.g. ...
Ok, I agree with most of these. Still not sure about the implementation. I do like the ...
It'll definitely be great to have some help once we know what we're after. |
After 10 different experiments, I personally do not remember them all... My idea was that each transform would call the same get_params function (instead of what is done today, where it is defined in each transform class). Let us know how we can help |
Ideally you would log the status of your code (last commit), or even the code itself, in the experiment log. I use TensorBoard and ...
Where would you put that function? How does it know which parameters it needs to generate?
Thanks! I will. |
I think that the new transform in #222 is a perfect example of why using a seed/random state would be more helpful. How do you reproduce that? Actually, how do you reproduce e.g. a ...? Maybe the best solution is adding a flag to ask the user whether they'd like the random state to be saved. |
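To make the flag idea concrete, a rough sketch with hypothetical names (RandomTransformSketch, save_rng_state), not the torchio API: the transform would only store the RNG state in its history entry when the user opts in, since a full RNG state is much larger than a seed.

import torch


class Subject:
    # minimal stand-in for a torchio subject/sample
    def __init__(self):
        self.n = None
        self.history = []


class RandomTransformSketch:
    def __init__(self, save_rng_state=False):
        self.save_rng_state = save_rng_state  # the flag being discussed

    def __call__(self, subject):
        entry = {'name': type(self).__name__}
        if self.save_rng_state:
            # opt-in: a full RNG state is heavier than a single integer seed
            entry['rng_state'] = torch.get_rng_state()
        subject.n = torch.rand(1).item()  # stand-in for the actual random transform
        subject.history.append(entry)
        return subject


transformed = RandomTransformSketch(save_rng_state=True)(Subject())
print(transformed.history[0].keys())  # dict_keys(['name', 'rng_state'])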
How about randomly generating a seed for a given transform (and saving it) and setting it specifically for this transform (and the given subject)? This would only require saving the generated seed and resetting it at the beginning of the transform. I would imagine something like this (the key point would be to systematically reset the seed manually):

import torch

torch.manual_seed(25074390)


def gen_seed():
    """
    Random seed generator to avoid overflow
    :return:
    """
    return torch.randint(0, 10**8, (1,))


class RandomMultiply:
    def get_params(self):
        torch.manual_seed(seed=self.seed)
        random_number = torch.rand(1).item()
        return self.seed, random_number

    def __call__(self, x, seed=gen_seed()):
        self.seed = seed
        seed, random_number = self.get_params()
        x.n = random_number
        random_params = {
            'seed': seed,
            'random_number': random_number,
        }
        x.history.append(random_params)
        return x


class Number:
    def __init__(self):
        self.n = 100
        self.history = []


def main():
    number = Number()
    num_iterations = 1000
    transform = RandomMultiply()
    results = []
    for _ in range(num_iterations):
        number = transform(number)
        results.append(number.n)
    print('I love this result:', results[600])
    random_params = number.history[600]
    print('The number was {random_number} and the seed was {seed}'.format(**random_params))
    new_number = Number()
    reproducer = RandomMultiply()
    reproduction = reproducer(new_number, seed=random_params['seed'])
    print('\nReproduced result:', reproduction.n)
    print('Random parameters:', reproduction.history[0])
    torch.manual_seed(reproduction.history[0]["seed"])
    print('A reproducible random number:', torch.rand(1).item())


if __name__ == "__main__":
    main()

Would it work? |
OK, you make a point there. But it may not be that critical to reproduce the same transform with different random values, if the std (and mean) are the same.
This is another point in your favor, although of less importance. |
I guess I would put it in the abstract Transform class (all transforms inherit from it, right?). This may be a good addition in this code refactoring. Today each transform uses uniform sampling, but I am sure there are other interesting ways, and here we would have to code it only once... so that it can be used in any transform... |
I really don't see the point of this. It looks like a job for method inheritance, which is the way it works now. |
@romainVala it looks like I edited your comment #208 (comment) when I tried to reply to it! Sorry about that :D |
I do think that if we want to make it reproducible, it needs to be 100% reproducible. I think @GReguig's solution might work! I added an option to input a manual seed at the beginning of the script and got this:
|
Although I don't think it's a good idea to add an argument to |
I guess that the point of this stuff is to reproduce a single transform once, so it wouldn't be that bad. |
I've implemented in #226 a solution similar to what @GReguig proposed. The seed is now passed when calling, not when instantiating. It took me a while, but I've managed to add the transforms history as a string, so it can be propagated even to the patches sampled by the ...

import torchio as tio

colin = tio.datasets.Colin27()
transform = tio.RandomAffine()
transformed = transform(colin)
transform_class, seed = transformed.get_applied_transforms()[-1]
new_transform = transform_class()
new_transformed = new_transform(colin, seed=seed)

In that example, ... |
Sounds good! |
That's correct, only the seed is saved at the moment. I've been thinking about how to retrieve the parameters. Maybe they can be stored in |
Please, no after-the-fact retrieval... |
What about a method in the transform that returns the parameters given a seed? That way you wouldn't need to apply it to get them.

seed = [get seed from sample history]
parameters = transform.get_parameters_from_seed(seed)
|
OK, sorry if I am not clear enough (or if it is a bad idea). Instead of having each transform define its own get_params function, why not have a common function that does it (and each transform would call it)? But this is then an extra argument for the use of a generic get_params function... |
I prefer the previous way, where the parameters were directly included in the history. Maybe it is more "user friendly" if the specific behavior (saving the parameters and/or the seeding) could be a user choice... |
Saving all the parameters is quite problematic. As we said before, to get fully reproducible results, the whole image would need to be saved for e.g. ... Saving a large number of parameters for e.g. ... I think we are the library in the field that allows for so much reproducibility with the current system. I think this should be enough, especially because it's a niche (although important) feature that most users will never care about. |
Yes, we discussed the difficulty of saving a large number of parameters, and this was the reason why you removed them from the sample dictionary (with the argument that it would make torch's collate job too complicated, although it was working...). I think there are 2 distinct objectives that are overlapping in this discussion: ... I thought it was more convenient to have the 2nd functionality directly in torchio (as a user option), but having a way to program it outside torchio is fine with me. I agree reproducibility is important (though I never used it), but I am not sure most users will never care about getting the transform parameters. My point 2 is not about reproducibility; it is needed to look at transform-induced bias on model prediction. (OK, nobody is doing that today, they only want the data augmentation stuff, and the deep model will solve it for you, but I think it is important to better understand model generalization...) By the way, is having the option to control different random distributions for the parameters (normal, uniform, Poisson, etc.) worth a new issue? |
The parameters were not being saved in the batch! That has been added in #226.
I think that the current implementation is a good compromise between feasibility in terms of how much information must be stored to reproduce/extract parameters and amount of work for the user.
Yes, please. |
Concerning my comment on your PR, maybe it is better to discuss it here (since there is still some confusion (or disagreement...) about the main objective). The reproducibility is not only about reproducing the random parameters of a given transform, but about being able (from a saved history) to reproduce all the transforms applied to the image. As it was done before, I could read from the history the order of the transforms that were applied (I was just missing some "basic" transforms because no information was added to the history (for instance RescaleIntensity), but this could be solved by adding the input parameters with ...). So I think we should keep add_transform as it was before, and just add an extra key to handle the seeding. Do you think we can mix both ways? If not, how can I retrieve that a RescaleIntensity has been applied with a given percentile...? |
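To picture what "mixing both ways" could look like, a sketch of two history entries (the keys and layout are assumptions for illustration, not the actual torchio implementation): non-random transforms keep their input parameters readable in the history, and random transforms add a seed key so their parameters can be re-generated.

history = [
    {
        'transform': 'RescaleIntensity',
        'percentiles': (0.5, 99.5),  # input parameters, directly readable from the history
    },
    {
        'transform': 'RandomAffine',
        'seed': 11325910110281226277,  # enough to re-generate the random parameters
    },
]
for entry in history:
    print(entry['transform'], entry)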
So I've been thinking of something like this in ...:

def get_params_from_seed(self, seed: Optional[int] = None):
    with torch.random.fork_rng():
        torch.manual_seed(seed)
        params = self.get_params()  # pylint: disable=no-member
    return params

But as I said in #226, sometimes you have different parameters for each image in the sample. Using 4D images (#238) will make this even more complex, but I guess that's a different issue. |
Sounds good. |
🚀 Feature
Given the history information saved in the sample for a given transform, how can I apply the exact same transform to a new volume?
Motivation
After doing a lot of data augmentation for training and validation, I would like to visually check some specific volume (the one having the worst loss for instance).
Since saving all the transformed volumes can have a very high cost (in terms of disk space), one prefers to save the transform's parameters and have the possibility to re-create the exact same transformed volume.
Pitch
I implemented this once for RandomElasticDeformation.
Alternatives
Maybe a more generic way would be to separate the apply_transform code into 2 distinct parts:
one to get the random params
one to apply them
so that the apply code could be reused in both cases
....
Additional context