Should be able to seed Scaper generation for reproducibility #54

pseeth · 2019-04-02T11:24:39Z

We should be able to seed Scaper so that multiple runs with the same seed on the same underlying data yield the exact same mixtures. Addresses #36.

For now, I've just placed TODOs.

Will try to follow the discussion in #36 when implementing.


coveralls · 2019-04-02T11:37:10Z

Coverage remained the same at 100.0% when pulling 6f8891a on pseeth:seeding into 3c83bba on justinsalamon:master.

coveralls · 2019-04-02T11:37:12Z

Coverage remained the same at 100.0% when pulling ea948cf on pseeth:seeding into 803a26b on justinsalamon:master.

pseeth · 2019-04-03T01:25:22Z

Okay, took a shot at implementing this! It passes all the current test cases. I edited the Scaper object to take a RandomState object and use it to generate all of the random numbers throughout. The argument can be None, an int, or a RandomState, and it is handled the same way scikit-learn handles it (I basically copied their check_random_state function).
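For reference, a minimal sketch of that scikit-learn-style seed handling. It is modeled on scikit-learn's check_random_state; the helper in this PR is named _check_random_state and may differ in its details, so treat the function below as illustrative rather than Scaper's actual code.

```python
import numbers

import numpy as np


def check_random_state(seed):
    """Turn ``seed`` into a np.random.RandomState instance.

    Modeled on scikit-learn's check_random_state; Scaper's own
    _check_random_state may differ in its details.
    """
    if seed is None or seed is np.random:
        # None: reuse the global RandomState that backs np.random.* calls
        return np.random.mtrand._rand
    if isinstance(seed, (numbers.Integral, np.integer)):
        # int: build a fresh generator seeded with that value
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # RandomState: use the caller's generator as-is
        return seed
    raise ValueError(
        "%r cannot be used to seed a numpy.random.RandomState instance" % (seed,)
    )
```

With this, check_random_state(42).uniform(size=3) returns the same three numbers on every call, while check_random_state(None) shares its state with the module-level np.random functions.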

- Added test cases for the check_random_state function.
- Changed _get_value_from_dist to take a RandomState object.
- Moved the implementations of the SUPPORTED_DIST functions to util.py, where they are lightly wrapped so that they take a random_state and sample from it instead of from the global RandomState object.
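A minimal sketch of what "wrapped to take a random_state" can look like. The helper names, the dispatch dict, and the set of distributions below are assumptions for illustration (the real names were changed during review, and only a few simple distributions are shown); the point is that every draw goes through the random_state argument rather than the global np.random functions.

```python
import numpy as np


def _sample_const(value, random_state):
    # Hypothetical helper: a constant ignores the generator entirely
    return value


def _sample_choose(options, random_state):
    # Pick one candidate using the *passed* generator (keeps Python types)
    return options[random_state.randint(len(options))]


def _sample_uniform(low, high, random_state):
    return random_state.uniform(low, high)


def _sample_normal(mean, std, random_state):
    return random_state.normal(mean, std)


# Hypothetical dispatch table; Scaper's real SUPPORTED_DIST may differ
SUPPORTED_DIST = {
    "const": _sample_const,
    "choose": _sample_choose,
    "uniform": _sample_uniform,
    "normal": _sample_normal,
}


def get_value_from_dist(dist_tuple, random_state):
    """Sample from a distribution tuple such as ('uniform', 0, 10)."""
    name, *params = dist_tuple
    return SUPPORTED_DIST[name](*params, random_state)
```

Two generators seeded identically then yield identical values for the same sequence of distribution tuples.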

TODO

~~Add test cases for checking if seeding actually works...~~

pseeth · 2019-04-03T02:22:22Z

Alright, added a test case for checking whether seeding works. It iterates over a bunch of seeds, creates several Scaper generators with the same seed, and generates audio from each one. The audio is then compared to make sure it is identical for a given seed. The one snag was that the seed had to be deep-copied before being passed to the Scaper init function; after that, the code worked as written.
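A minimal sketch of that test's structure, with a hypothetical ToyScaper standing in for Scaper (it just emits noise from the RandomState it holds). It also illustrates the presumable reason for the deepcopy: a single RandomState handed to several generators is shared, so each one advances the same stream and their outputs stop matching.

```python
import copy

import numpy as np


class ToyScaper:
    """Hypothetical stand-in for a Scaper object.

    It stores whatever RandomState it is given (or seeds one from an int)
    and draws every random value from it; "audio" here is white noise.
    """

    def __init__(self, random_state):
        if isinstance(random_state, np.random.RandomState):
            self.random_state = random_state
        else:
            self.random_state = np.random.RandomState(random_state)

    def generate(self, n_samples=16):
        return self.random_state.uniform(-1.0, 1.0, size=n_samples)


def test_scaper_seeding():
    seeds = [0, 10, 20, np.random.RandomState(0), np.random.RandomState(123)]
    for seed in seeds:
        # Deep-copy so every generator starts from an identical, unshared
        # state; deepcopy of an int simply returns the int.
        generators = [ToyScaper(copy.deepcopy(seed)) for _ in range(3)]
        outputs = [g.generate() for g in generators]
        for out in outputs[1:]:
            assert np.array_equal(outputs[0], out)
```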

justinsalamon · 2019-04-05T22:51:14Z

Reviewing

justinsalamon

Nice work @pseeth! Please see my comments in the code review. Basically:

- Remove all `import random`.
- Rename the new sampling functions to make their names more descriptive (see comments).
- Add code to test not just the output audio but also the JAMS and txt files when generating with seeding (the code already exists, so this should be an easy fix; see comments).
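One likely reason for removing `import random` (an inference, not stated in the review): Python's standard `random` module keeps its own global generator, separate from NumPy's, so seeding a RandomState does nothing for `random.choice`. A sketch with hypothetical helper names and labels:

```python
import random

import numpy as np

LABELS = ["car_horn", "dog_bark", "siren", "jackhammer"]


def pick_label_stdlib(random_state):
    # Uses Python's global `random` generator; random_state is ignored,
    # so a fixed NumPy seed does NOT make this reproducible.
    return random.choice(LABELS)


def pick_label_seeded(random_state):
    # Uses the RandomState passed in, so a fixed seed fixes the result.
    return LABELS[random_state.randint(len(LABELS))]


def draw_labels(picker, seed, n=10):
    random_state = np.random.RandomState(seed)
    return [picker(random_state) for _ in range(n)]


run_a = draw_labels(pick_label_seeded, seed=5)
run_b = draw_labels(pick_label_seeded, seed=5)
```

run_a and run_b always match; two calls to draw_labels(pick_label_stdlib, seed=5) generally will not.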

All of these should be very easy/quick to address. Thanks!

scaper/core.py

scaper/util.py

tests/test_core.py

scaper/util.py

scaper/core.py

pseeth · 2019-04-06T02:19:20Z

I addressed the tinier stuff in the review. Will work on updating the test case now!

pseeth · 2019-04-06T02:28:58Z

Done! Passes the new test with 100% cov on my machine now.

justinsalamon

Apart from one rogue `import random` that's still there, this all looks good to me!

scaper/util.py

justinsalamon · 2019-04-08T04:15:05Z

OK @pseeth, assuming the tests still pass (waiting for Travis to complete), it looks like this PR is good to merge. But... you mentioned it'll be easier to merge #53 first, right? In that case I'll hold off on merging this until we merge #53, then we can come back to this one.

pseeth · 2019-04-08T04:20:58Z

Sounds good! Yeah, you can see the commit history on my fork here:

https://github.com/pseeth/scaper/commits/master

That should be a useful reference when resolving merge conflicts. I'll work on modifying source time next!

justinsalamon

Changes so far look perfect! Onto the docs :)

justinsalamon

@pseeth see a few minor comments, in addition to the proposed update to the docs once we add reset_event_spec()

justinsalamon · 2020-02-01T20:27:17Z

docs/changes.rst

@@ -3,6 +3,10 @@
 Changelog
 ---------

+v1.2.0
+~~~~~~
+- Added a random_state parameter to Scaper, which allows all runs to be perfectly reproducible given the same audio and the same random seed.


"...to Scaper" --> "...to the Scaper object"

Also please add a note about replacing numpydoc with napoleon

justinsalamon · 2020-02-01T20:30:44Z

docs/examples.rst


        # create a scaper
-        sc = scaper.Scaper(duration, fg_folder, bg_folder)
+        sc = scaper.Scaper(duration, fg_folder, bg_folder, random_state=seed)


This example is great, but, as discussed, we might want to update it if we add a function along the lines of Scaper.reset_event_spec()
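Judging by its indentation, the diff above appears to create the Scaper object inside the loop. A hypothetical sketch of why that interacts with an integer seed: if every iteration built a fresh object from the same int, each would restart the same random stream and every soundscape would come out identical. Creating the object once and clearing its event specification each iteration (what a reset_event_spec()-style function would allow) gives distinct soundscapes that still repeat exactly across runs. ToyScaper is an assumed stand-in, not Scaper's API.

```python
import numpy as np


class ToyScaper:
    """Hypothetical stand-in: holds a RandomState and 'generates' one value."""

    def __init__(self, random_state):
        self.random_state = np.random.RandomState(random_state)

    def generate(self):
        return self.random_state.uniform(0.0, 10.0)


seed = 123

# Pattern A: a new object per iteration, each seeded with the same int.
# Every iteration restarts the same stream, so all outputs are identical.
per_loop = [ToyScaper(seed).generate() for _ in range(5)]

# Pattern B: one object created before the loop and reused. Outputs differ
# from one another, yet the whole run repeats exactly for the same seed.
sc = ToyScaper(seed)
reused = [sc.generate() for _ in range(5)]
```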

justinsalamon · 2020-02-01T20:31:33Z

docs/tutorial.rst

@@ -100,6 +100,31 @@ when we add foreground events, we'll have to specify an ``snr``
 be louder (or softer) with respect to the background level specified by
 ``sc.ref_db``.

+Seeding the Scaper generator for reproducibility


"Seeding the Scaper object for reproducibility"

docs/tutorial.rst

justinsalamon · 2020-02-01T20:35:40Z

scaper/core.py

-        '''
-        Create a Scaper object.
-
-        Parameters
-        ----------
-        duration : float
-            Duration of the soundscape, in seconds.
-        fg_path : str
-            Path to foreground folder.
-        bg_path : str
-            Path to background folder.
-        protected_labels : list
-            Provide a list of protected foreground labels. When a foreground
-            label is in the protected list it means that when a sound event
-            matching the label gets added to a soundscape instantiation the
-            duration of the source audio file cannot be altered, and the
-            duration value that was provided in the specification will be
-            ignored.
-
-            Adding labels to the protected list is useful for sound events
-            whose semantic validity would be lost if the sound were trimmed
-            before the sound event ends, for example an animal vocalization
-            such as a dog bark.
-        random_state : int, RandomState instance or None, optional (default=None)
-            If int, random_state is the seed used by the random number 
-            generator; If RandomState instance, random_state is the random number 
-            generator; If None, the random number generator is the RandomState 
-            instance used by np.random.
-
-        '''


Where did all this go?

It moved up under the class description, which is what Sphinx uses to generate the documentation. The __init__ docstring doesn't get documented by default when generating the docs, so I put it in the other place. I could put it in both places if needed, but it seems like it'd be easy for the two to get out of sync.
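For context, a hypothetical docs/conf.py fragment (not taken from this PR) showing the Sphinx settings involved. autodoc's autoclass_content option controls whether the class docstring, the __init__ docstring, or both are rendered; its default, "class", matches the observation above. Napoleon, which parses NumPy-style sections such as "Parameters", ships with Sphinx as sphinx.ext.napoleon (since Sphinx 1.3).

```python
# docs/conf.py (fragment): hypothetical settings, not copied from this PR.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",  # parses NumPy/Google-style docstrings
]

# "class" (Sphinx's default) renders only the class docstring;
# "both" appends the __init__ docstring, "init" uses only __init__'s.
autoclass_content = "class"

# Keep NumPy-style sections ("Parameters", "Returns", ...) enabled.
napoleon_numpy_docstring = True
```

Keeping the parameter docs only in the class docstring, as done here, avoids the duplication that "both" would otherwise encourage.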

justinsalamon · 2020-02-01T20:36:34Z

setup.py

@@ -42,10 +42,9 @@
    ],
    extras_require={
        'docs': [
-                'sphinx==1.2.3',  # autodoc was broken in 1.3.1
-                'sphinxcontrib-napoleon',
+                'sphinx',  # autodoc was broken in 1.3.1


is the pin to 1.2.3 no longer necessary?

the docs generate for me using the latest sphinx - this likely needs testing from another computer to make sure the docs generate everywhere

justinsalamon

Practically there, few minor fixes and we're good.

justinsalamon · 2020-02-02T02:23:34Z

docs/examples.rst

@@ -53,29 +48,32 @@ Example: synthesizing 1000 soundscapes in one go
    time_stretch_min = 0.8
    time_stretch_max = 1.2

+    # generate a random seed for this Scaper object
+    seed = np.random.randint(0, 100000)


This will give a different seed each time. I think the goal is to illustrate, in the simplest manner possible, how to get reproducible results. So I would replace this with just seed = 123.

justinsalamon · 2020-02-02T02:24:39Z

docs/examples.rst

+    seed = np.random.randint(0, 100000)
+
+    ## alternate ways to define random state:
+    #     seed = np.random.RandomState(0)


To keep consistent with the previous example, seed = np.random.RandomState(123).
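A small sketch, assuming Scaper treats an int and a RandomState the way the _check_random_state snippet in this PR does: with a fixed value such as 123, passing the int or RandomState(123) should produce the same random stream, and the same stream on every run. The helper name as_random_state is hypothetical.

```python
import numpy as np


def as_random_state(seed):
    # Minimal stand-in for Scaper's seed handling: a RandomState passes
    # through unchanged, an int seeds a new generator.
    if isinstance(seed, np.random.RandomState):
        return seed
    return np.random.RandomState(seed)


from_int = as_random_state(123).uniform(size=4)
from_state = as_random_state(np.random.RandomState(123)).uniform(size=4)

# By contrast, seed = np.random.randint(0, 100000) picks a new seed on each
# run, so results would only be reproducible if that seed were logged.
```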

justinsalamon · 2020-02-02T02:25:12Z

docs/examples.rst

+    # or don't define any random state. runs will be random and not reproducible in 
+    # this case. you can use np.random.get_state() to reproduce the run after the fact
+    # if needed:
+    #     seed = None


This feels like it belongs more in a tutorial (i.e. in the docs) than in this example. Let's remove it from here.
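For the tutorial, the after-the-fact approach mentioned in that comment could look like the sketch below. Per the docstring, random_state=None means the generator is the RandomState instance used by np.random, so snapshotting its state with np.random.get_state() before a run and restoring it with np.random.set_state() replays the run. The generate_soundscape_values function is a hypothetical stand-in for a Scaper run.

```python
import numpy as np


def generate_soundscape_values(n=4):
    # Hypothetical stand-in for a run with random_state=None: every draw
    # comes from the global RandomState behind np.random.
    return np.random.uniform(0.0, 1.0, size=n)


saved_state = np.random.get_state()   # snapshot taken before the run
first_run = generate_soundscape_values()

np.random.set_state(saved_state)      # rewind the global generator
replayed_run = generate_soundscape_values()
```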

justinsalamon · 2020-02-02T02:26:34Z

docs/examples.rst

-        sc.ref_db = ref_db
+
+        # reset the event specifications for foreground and background at the beginning
+        # of each loop.


reset the event specifications for foreground and background at the beginning of each loop to clear all previously added events
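A hypothetical sketch of the loop pattern that comment describes. ToyScaper only tracks event labels, and the method names reset_fg_event_spec / reset_bg_event_spec are assumptions echoing the test name test_reset_fgbg_event_spec, not a confirmed API.

```python
class ToyScaper:
    """Hypothetical stand-in that only tracks added events."""

    def __init__(self):
        self.fg_spec = []
        self.bg_spec = []

    def add_background(self, label):
        self.bg_spec.append(label)

    def add_event(self, label):
        self.fg_spec.append(label)

    def reset_fg_event_spec(self):
        self.fg_spec = []

    def reset_bg_event_spec(self):
        self.bg_spec = []


sc = ToyScaper()
events_per_soundscape = []
for _ in range(3):
    # Clear everything added in the previous iteration before adding new events
    sc.reset_fg_event_spec()
    sc.reset_bg_event_spec()
    sc.add_background("park")
    sc.add_event("dog_bark")
    events_per_soundscape.append((list(sc.bg_spec), list(sc.fg_spec)))

# Without the resets, events pile up across iterations.
leaky = ToyScaper()
for _ in range(3):
    leaky.add_event("dog_bark")
```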

justinsalamon · 2020-02-02T02:33:36Z

tests/test_core.py

@@ -751,6 +751,66 @@ def test_scaper_init():
    assert sc.fade_out_len == 0.01  # 10 ms


+def test_scaper_reset():


Since we're not testing a proper reset, let's rename this test_reset_fgbg_event_spec()

justinsalamon · 2020-02-02T02:35:14Z

tests/test_core.py

            generators.append(_create_scaper_with_random_seed(seed))
+
+        generators.append(_create_scaper_with_random_seed(seed))
+        generators[-1].set_random_state(seed)


Since unit tests should be as modular/small/simple as possible, please let's break this out into a separate test_set_random_state()
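A hypothetical sketch of what a standalone test_set_random_state() could check: seeding through set_random_state() after construction should produce the same output as passing the same seed to the constructor. ToyScaper and its generate() method are assumptions; only the set_random_state name comes from the diff above.

```python
import copy

import numpy as np


class ToyScaper:
    """Hypothetical stand-in exposing a set_random_state() method."""

    def __init__(self, random_state):
        self.set_random_state(random_state)

    def set_random_state(self, random_state):
        # RandomState instances pass through; ints seed a new generator
        if isinstance(random_state, np.random.RandomState):
            self.random_state = random_state
        else:
            self.random_state = np.random.RandomState(random_state)

    def generate(self, n=8):
        return self.random_state.uniform(size=n)


def test_set_random_state():
    for seed in [0, 10, np.random.RandomState(0)]:
        # Reference: seed supplied at construction time
        seeded_at_init = ToyScaper(copy.deepcopy(seed))
        # Same seed supplied afterwards, overriding an unrelated initial seed
        seeded_later = ToyScaper(999)
        seeded_later.set_random_state(copy.deepcopy(seed))
        assert np.array_equal(seeded_at_init.generate(), seeded_later.generate())
```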

justinsalamon · 2020-02-02T16:54:01Z

scaper/util.py

@@ -168,7 +169,7 @@ def _check_random_state(seed):
    elif isinstance(seed, (numbers.Integral, np.integer, int)):
        return np.random.RandomState(seed)
    elif isinstance(seed, np.random.RandomState):
-        return seed
+        return deepcopy(seed)


why use a deepcopy here? if the user is providing a state object, wouldn't they expect that same object to be used, rather than a deep copy? Say they provide this object to Scaper but also to other processes - they'd expect it to be the same object shared across all the code, not copies, right? Or am I missing something?

Fair, moved it into the test.
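A short sketch of the behavior the review settled on: the helper returns the caller's RandomState as-is, so every consumer shares one stream, and a test that needs independent but identical generators makes its own copy. The helper below is a simplified stand-in for _check_random_state.

```python
import copy

import numpy as np


def check_random_state(seed):
    # Pass-through behavior kept after review: a RandomState is returned
    # as-is (no deepcopy); anything else seeds a new generator.
    if isinstance(seed, np.random.RandomState):
        return seed
    return np.random.RandomState(seed)


shared = np.random.RandomState(0)
consumer_a = check_random_state(shared)
consumer_b = check_random_state(shared)
# Both consumers hold the very same object, so their draws interleave
# along a single stream instead of repeating each other.
draws = [consumer_a.random_sample(), consumer_b.random_sample()]

# Where a test needs independent-but-identical generators, it can copy
# the state itself, leaving the caller's object untouched.
original = np.random.RandomState(0)
private_copy = copy.deepcopy(original)
private_copy.random_sample(5)  # advances only the copy
```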

…d commits) Squashed commits:

- [b149f03] editied comment on test
- [0cac882] unit test for set_random_state, deepcopy for the seed, updated docs
- [8c056bd] adding to changelog
- [ac702f3] adding reset stuff and adjusting tutorial
- [932b7a6] edited docs, minor alteration to _check_random_state
- [15086e9] make subheading for seed in tutorial
- [323de90] small docs edit
- [0b0d62b] updating docs
- [6b7d7f3] merged master and passing tests for seeding
- [aee24f9] removing duplicates in choose
- [ede4e9f] adding warning, removing duplicates
- [bfa2a5a] whoops, left a random!
- [6ee5a1f] updated test case
- [e0e51b4] addressed naming and got rid of TODOs
- [c407807] updating gitignore to not keep .DS_Store on mac
- [519307c] editing to make merge easier
- [0a1c3cf] editing a bit of style
- [552bf79] added and passed tests for seeding - works!
- [622b969] implemented seeding, passes current test cases but not new ones yet

justinsalamon requested changes Apr 5, 2019


This was referenced Apr 5, 2019

Adding saving of sources, take 3 #55

Merged

Output sources as separate files alongside mixture #52

Closed

justinsalamon requested changes Apr 8, 2019


scaper/util.py (outdated, resolved)

justinsalamon approved these changes Apr 8, 2019


justinsalamon mentioned this pull request Apr 8, 2019

Support default values for distribution tuples #41

Open

adding TODOs

43ff378

pseeth force-pushed the seeding branch from c7bd1a5 to 6b7d7f3 on January 30, 2020 22:30

justinsalamon approved these changes Jan 31, 2020


justinsalamon requested changes Feb 1, 2020


pseeth mentioned this pull request Feb 2, 2020

Function to reset the event specification in the Scaper object #69

Closed

justinsalamon requested changes Feb 2, 2020


justinsalamon reviewed Feb 2, 2020


pseeth force-pushed the seeding branch from 61d404a to 55519b0 on February 2, 2020 18:19

fixing docstring

ea948cf

justinsalamon approved these changes Feb 2, 2020


justinsalamon approved these changes Feb 3, 2020


justinsalamon merged commit 7fb77a9 into justinsalamon:master Feb 3, 2020

justinsalamon mentioned this pull request Feb 5, 2020

Scaper and random seeds #36

Closed

justinsalamon mentioned this pull request Feb 15, 2020

Factor out distribution logic #59

Closed
