Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should be able to seed Scaper generation for reproducibility #54

Merged
merged 3 commits into from
Feb 3, 2020

Conversation

pseeth
Copy link
Collaborator

@pseeth pseeth commented Apr 2, 2019

We should be able to seed Scaper so that multiple runs with the same seed on the same underlying data yield the exact same mixtures. Addresses #36.

For now, I've just placed TODOs.

Will try to follow the discussion in #36 when implementing.


This change is Reviewable

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 6f8891a on pseeth:seeding into 3c83bba on justinsalamon:master.

@coveralls
Copy link

coveralls commented Apr 2, 2019

Coverage Status

Coverage remained the same at 100.0% when pulling ea948cf on pseeth:seeding into 803a26b on justinsalamon:master.

@pseeth
Copy link
Collaborator Author

pseeth commented Apr 3, 2019

Okay, took a shot at implementing this! It passes all the current test cases. I edited the Scaper object to take a RandomState object to generate all of the random numbers through out. If it's None, int, or RandomState, it does what scikit-learn does (copied their check_random_state function basically).

  • Added test cases to test the check_random_state function.
  • Changed _get_value_from_dist to take a RandomState object
  • Moved implementations of SUPPORTED_DIST functions to utils.py, where they get wrapped a bit and take in a random_state to generate from the distribution instead of generating using the global RandomState object.

TODO

  • Add test cases for checking if seeding actually works...

@pseeth
Copy link
Collaborator Author

pseeth commented Apr 3, 2019

Alright, added a test case for checking if seeding works. It's just iterating over a bunch of seeds, making different scaper generators with the same seed, and generating audio from each generator. Then the audio is compared to make sure that it's all the same, given the same seed. There was a small issue with having to deepcopy the seed before passing it to the Scaper init function but after that, the code worked as written.

@justinsalamon
Copy link
Owner

Reviewing

Copy link
Owner

@justinsalamon justinsalamon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @pseeth ! Please see my comments in the code review, basically:

  • Remove all import random
  • Rename new sampling functions to make names more descriptive (see comments)
  • Add code to test not just output audio but also JAMS and txt when generating with seeding (code already exist, should be easy fix, see comments).

All of these should be very easy/quick to address. Thanks!

scaper/core.py Outdated Show resolved Hide resolved
scaper/core.py Outdated Show resolved Hide resolved
scaper/util.py Outdated Show resolved Hide resolved
scaper/util.py Outdated Show resolved Hide resolved
scaper/util.py Outdated Show resolved Hide resolved
scaper/util.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
scaper/util.py Outdated Show resolved Hide resolved
scaper/core.py Show resolved Hide resolved
@pseeth
Copy link
Collaborator Author

pseeth commented Apr 6, 2019

I addressed the tinier stuff in the review. Will work on updating the test case now!

@pseeth
Copy link
Collaborator Author

pseeth commented Apr 6, 2019

Done! Passes the new test with 100% cov on my machine now.

Copy link
Owner

@justinsalamon justinsalamon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside of one rouge import random that's still there this all looks good to me!

scaper/util.py Outdated Show resolved Hide resolved
@justinsalamon
Copy link
Owner

OK @pseeth, assuming the tests still pass (waiting for travis to complete) looks like this PR is good to merge. But... you mentioned it'll be easier to merge #53 first, right? In that case I'll hold off on merging this until we merge #53, then we can come back to this one.

@pseeth
Copy link
Collaborator Author

pseeth commented Apr 8, 2019

Sounds good! Yeah, you can see the commit history on my fork here:

https://github.com/pseeth/scaper/commits/master

For reference for merge conflicts. I'll work on modifying source time next!

Copy link
Owner

@justinsalamon justinsalamon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes so far look perfect! Onto the docs :)

Copy link
Owner

@justinsalamon justinsalamon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pseeth see few minor comments in addition to the proposed update to the docs once we add reset_event_spec()

docs/changes.rst Outdated
@@ -3,6 +3,10 @@
Changelog
---------

v1.2.0
~~~~~~
- Added a random_state parameter to Scaper, which allows all runs to be perfectly reproducible given the same audio and the same random seed.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"...to Scaper" --> "...to the Scaper object"

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please add a note about replacing numpydoc with napoleon


# create a scaper
sc = scaper.Scaper(duration, fg_folder, bg_folder)
sc = scaper.Scaper(duration, fg_folder, bg_folder, random_state=seed)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is great, but, as discussed, we might want to update it if we add a function along the lines of Scaper.reset_event_spec()

@@ -100,6 +100,31 @@ when we add foreground events, we'll have to specify an ``snr``
be louder (or softer) with respect to the background level specified by
``sc.ref_db``.

Seeding the Scaper generator for reproducibility
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Seeding the Scaper object for reproducibility"

docs/tutorial.rst Show resolved Hide resolved
scaper/core.py Outdated
Comment on lines 847 to 868
'''
Create a Scaper object.

Parameters
----------
duration : float
Duration of the soundscape, in seconds.
fg_path : str
Path to foreground folder.
bg_path : str
Path to background folder.
protected_labels : list
Provide a list of protected foreground labels. When a foreground
label is in the protected list it means that when a sound event
matching the label gets added to a soundscape instantiation the
duration of the source audio file cannot be altered, and the
duration value that was provided in the specification will be
ignored.

Adding labels to the protected list is useful for sound events
whose semantic validity would be lost if the sound were trimmed
before the sound event ends, for example an animal vocalization
such as a dog bark.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number
generator; If RandomState instance, random_state is the random number
generator; If None, the random number generator is the RandomState
instance used by np.random.

'''
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did all this go?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It went up to under the object description, which is what is used by sphinx to generate the documentation. The __init__ doesn't get documented by default when generating the docs. So i put it in the other place. I could put it into two places if needed, but it seems like it'd be easy for them to get out of sync with one another.

@@ -42,10 +42,9 @@
],
extras_require={
'docs': [
'sphinx==1.2.3', # autodoc was broken in 1.3.1
'sphinxcontrib-napoleon',
'sphinx', # autodoc was broken in 1.3.1
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the pin to 1.2.3 no longer necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the docs generate for me using the latest sphinx - this likely needs testing from another computer to make sure the docs generate everywhere

Copy link
Owner

@justinsalamon justinsalamon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Practically there, few minor fixes and we're good.

@@ -53,29 +48,32 @@ Example: synthesizing 1000 soundscapes in one go
time_stretch_min = 0.8
time_stretch_max = 1.2

# generate a random seed for this Scaper object
seed = np.random.randint(0, 100000)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will give a different seed each time. I think the goal is to illustrate, in the simplest manner possible, how to get reproducible results. So I would replace this with just seed = 123.

seed = np.random.randint(0, 100000)

## alternate ways to define random state:
# seed = np.random.RandomState(0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep consistent with the previous example, seed = np.random.RandomState(123).

Comment on lines 57 to 60
# or don't define any random state. runs will be random and not reproducible in
# this case. you can use np.random.get_state() to reproduce the run after the fact
# if needed:
# seed = None
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it belongs more in a tutorial (i.e. in the docs) than in this example. Let's remove it from here.

sc.ref_db = ref_db

# reset the event specifications for foreground and background at the beginning
# of each loop.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset the event specifications for foreground and background at the beginning of each loop to clear all previously added events

@@ -751,6 +751,66 @@ def test_scaper_init():
assert sc.fade_out_len == 0.01 # 10 ms


def test_scaper_reset():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're not testing a proper reset, let's rename this test_reset_fgbg_event_spec()

generators.append(_create_scaper_with_random_seed(seed))

generators.append(_create_scaper_with_random_seed(seed))
generators[-1].set_random_state(seed)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since unit tests should be as modular/small/simple as possible, please let's break this out into a separate test_set_random_state()

scaper/util.py Outdated
@@ -168,7 +169,7 @@ def _check_random_state(seed):
elif isinstance(seed, (numbers.Integral, np.integer, int)):
return np.random.RandomState(seed)
elif isinstance(seed, np.random.RandomState):
return seed
return deepcopy(seed)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use a deepcopy here? if the user is providing a state object, wouldn't they expect that same object to be used, rather than a deep copy? Say they provide this object to Scaper but also to other processes - they'd expect it to be the same object shared across all the code, not copies, right? Or am I missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, moved it into the test.

…d commits)

Squashed commits:
[b149f03] editied comment on test
[0cac882] unit test for set_random_state, deepcopy for the seed, updated docs
[8c056bd] adding to changelog
[ac702f3] adding reset stuff and adjusting tutorial
[932b7a6] edited docs, minor alteration to _check_random_state
[15086e9] make subheading for seed in tutorial
[323de90] small docs edit
[0b0d62b] updating docs
[6b7d7f3] merged master and passing tests for seeding
[aee24f9] removing duplicates in choose
[ede4e9f] adding warning, removing duplicates
[bfa2a5a] whoops, left a random!
[6ee5a1f] updated test case
[e0e51b4] addressed naming and got rid of TODOs
[c407807] updating gitignore to not keep .DS_Store on mac
[519307c] editing to make merge easier
[0a1c3cf] editing a bit of style
[552bf79] added and passed tests for seeding - works!
[622b969] implemented seeding, passes current test cases but not new ones yet
@justinsalamon justinsalamon merged commit 7fb77a9 into justinsalamon:master Feb 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants