-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should be able to seed Scaper generation for reproducibility #54
Conversation
Okay, took a shot at implementing this! It passes all the current test cases. I edited the Scaper object to take a RandomState object to generate all of the random numbers through out. If it's None, int, or RandomState, it does what scikit-learn does (copied their check_random_state function basically).
TODO
|
Alright, added a test case for checking if seeding works. It's just iterating over a bunch of seeds, making different scaper generators with the same seed, and generating audio from each generator. Then the audio is compared to make sure that it's all the same, given the same seed. There was a small issue with having to deepcopy the seed before passing it to the Scaper init function but after that, the code worked as written. |
Reviewing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @pseeth ! Please see my comments in the code review, basically:
- Remove all
import random
- Rename new sampling functions to make names more descriptive (see comments)
- Add code to test not just output audio but also JAMS and txt when generating with seeding (code already exist, should be easy fix, see comments).
All of these should be very easy/quick to address. Thanks!
I addressed the tinier stuff in the review. Will work on updating the test case now! |
Done! Passes the new test with 100% cov on my machine now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outside of one rouge import random
that's still there this all looks good to me!
Sounds good! Yeah, you can see the commit history on my fork here: https://github.com/pseeth/scaper/commits/master For reference for merge conflicts. I'll work on modifying source time next! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes so far look perfect! Onto the docs :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pseeth see few minor comments in addition to the proposed update to the docs once we add reset_event_spec()
docs/changes.rst
Outdated
@@ -3,6 +3,10 @@ | |||
Changelog | |||
--------- | |||
|
|||
v1.2.0 | |||
~~~~~~ | |||
- Added a random_state parameter to Scaper, which allows all runs to be perfectly reproducible given the same audio and the same random seed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"...to Scaper" --> "...to the Scaper object"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please add a note about replacing numpydoc with napoleon
docs/examples.rst
Outdated
|
||
# create a scaper | ||
sc = scaper.Scaper(duration, fg_folder, bg_folder) | ||
sc = scaper.Scaper(duration, fg_folder, bg_folder, random_state=seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is great, but, as discussed, we might want to update it if we add a function along the lines of Scaper.reset_event_spec()
docs/tutorial.rst
Outdated
@@ -100,6 +100,31 @@ when we add foreground events, we'll have to specify an ``snr`` | |||
be louder (or softer) with respect to the background level specified by | |||
``sc.ref_db``. | |||
|
|||
Seeding the Scaper generator for reproducibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Seeding the Scaper object for reproducibility"
scaper/core.py
Outdated
''' | ||
Create a Scaper object. | ||
|
||
Parameters | ||
---------- | ||
duration : float | ||
Duration of the soundscape, in seconds. | ||
fg_path : str | ||
Path to foreground folder. | ||
bg_path : str | ||
Path to background folder. | ||
protected_labels : list | ||
Provide a list of protected foreground labels. When a foreground | ||
label is in the protected list it means that when a sound event | ||
matching the label gets added to a soundscape instantiation the | ||
duration of the source audio file cannot be altered, and the | ||
duration value that was provided in the specification will be | ||
ignored. | ||
|
||
Adding labels to the protected list is useful for sound events | ||
whose semantic validity would be lost if the sound were trimmed | ||
before the sound event ends, for example an animal vocalization | ||
such as a dog bark. | ||
random_state : int, RandomState instance or None, optional (default=None) | ||
If int, random_state is the seed used by the random number | ||
generator; If RandomState instance, random_state is the random number | ||
generator; If None, the random number generator is the RandomState | ||
instance used by np.random. | ||
|
||
''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did all this go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It went up to under the object description, which is what is used by sphinx to generate the documentation. The __init__
doesn't get documented by default when generating the docs. So i put it in the other place. I could put it into two places if needed, but it seems like it'd be easy for them to get out of sync with one another.
@@ -42,10 +42,9 @@ | |||
], | |||
extras_require={ | |||
'docs': [ | |||
'sphinx==1.2.3', # autodoc was broken in 1.3.1 | |||
'sphinxcontrib-napoleon', | |||
'sphinx', # autodoc was broken in 1.3.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the pin to 1.2.3 no longer necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the docs generate for me using the latest sphinx - this likely needs testing from another computer to make sure the docs generate everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Practically there, few minor fixes and we're good.
docs/examples.rst
Outdated
@@ -53,29 +48,32 @@ Example: synthesizing 1000 soundscapes in one go | |||
time_stretch_min = 0.8 | |||
time_stretch_max = 1.2 | |||
|
|||
# generate a random seed for this Scaper object | |||
seed = np.random.randint(0, 100000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will give a different seed each time. I think the goal is to illustrate, in the simplest manner possible, how to get reproducible results. So I would replace this with just seed = 123
.
docs/examples.rst
Outdated
seed = np.random.randint(0, 100000) | ||
|
||
## alternate ways to define random state: | ||
# seed = np.random.RandomState(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep consistent with the previous example, seed = np.random.RandomState(123)
.
docs/examples.rst
Outdated
# or don't define any random state. runs will be random and not reproducible in | ||
# this case. you can use np.random.get_state() to reproduce the run after the fact | ||
# if needed: | ||
# seed = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it belongs more in a tutorial (i.e. in the docs) than in this example. Let's remove it from here.
docs/examples.rst
Outdated
sc.ref_db = ref_db | ||
|
||
# reset the event specifications for foreground and background at the beginning | ||
# of each loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reset the event specifications for foreground and background at the beginning of each loop to clear all previously added events
tests/test_core.py
Outdated
@@ -751,6 +751,66 @@ def test_scaper_init(): | |||
assert sc.fade_out_len == 0.01 # 10 ms | |||
|
|||
|
|||
def test_scaper_reset(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're not testing a proper reset, let's rename this test_reset_fgbg_event_spec()
tests/test_core.py
Outdated
generators.append(_create_scaper_with_random_seed(seed)) | ||
|
||
generators.append(_create_scaper_with_random_seed(seed)) | ||
generators[-1].set_random_state(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since unit tests should be as modular/small/simple as possible, please let's break this out into a separate test_set_random_state()
scaper/util.py
Outdated
@@ -168,7 +169,7 @@ def _check_random_state(seed): | |||
elif isinstance(seed, (numbers.Integral, np.integer, int)): | |||
return np.random.RandomState(seed) | |||
elif isinstance(seed, np.random.RandomState): | |||
return seed | |||
return deepcopy(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use a deepcopy here? if the user is providing a state object, wouldn't they expect that same object to be used, rather than a deep copy? Say they provide this object to Scaper but also to other processes - they'd expect it to be the same object shared across all the code, not copies, right? Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, moved it into the test.
…d commits) Squashed commits: [b149f03] editied comment on test [0cac882] unit test for set_random_state, deepcopy for the seed, updated docs [8c056bd] adding to changelog [ac702f3] adding reset stuff and adjusting tutorial [932b7a6] edited docs, minor alteration to _check_random_state [15086e9] make subheading for seed in tutorial [323de90] small docs edit [0b0d62b] updating docs [6b7d7f3] merged master and passing tests for seeding [aee24f9] removing duplicates in choose [ede4e9f] adding warning, removing duplicates [bfa2a5a] whoops, left a random! [6ee5a1f] updated test case [e0e51b4] addressed naming and got rid of TODOs [c407807] updating gitignore to not keep .DS_Store on mac [519307c] editing to make merge easier [0a1c3cf] editing a bit of style [552bf79] added and passed tests for seeding - works! [622b969] implemented seeding, passes current test cases but not new ones yet
We should be able to seed Scaper so that multiple runs with the same seed on the same underlying data yield the exact same mixtures. Addresses #36.
For now, I've just placed TODOs.
Will try to follow the discussion in #36 when implementing.
This change is![Reviewable](https://camo.githubusercontent.com/23b05f5fb48215c989e92cc44cf6512512d083132bd3daf689867c8d9d386888/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)