Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support default values for distribution tuples #41

Open
beasteers opened this issue Feb 20, 2019 · 15 comments
Open

Support default values for distribution tuples #41

beasteers opened this issue Feb 20, 2019 · 15 comments

Comments

@beasteers
Copy link
Contributor

It would be convenient to be able to omit the source_time for background files and have it start at any point in the recording. So essentially, have it default to ('uniform', [0, bg_audio_file_duration]). And the same goes for event_time. I'd like to be able to just specify ('uniform',) for example and have them randomly placed throughout the file.

Similarly but less important, it would be nice to be able to omit event_duration and have it default to ('const', fg_audio_file_duration - source_time)

In general, I think providing sensible defaults for parameters (like label and source_file default to ('choose', []), source_time defaults to ('const', 0), etc) would be helpful.

@beasteers beasteers changed the title Support uniform based on file durations Support uniform distribution based on file durations Feb 20, 2019
@justinsalamon
Copy link
Owner

justinsalamon commented Apr 5, 2019

It would be convenient to be able to omit the source_time for background files and have it start at any point in the recording.

Why?

So essentially, have it default to ('uniform', [0, bg_audio_file_duration]).

I don't think that's a good default. If anything I think (const, 0) would be a more logical default (i.e. by default start at the beginning of the track).

Anyway, this is a bigger discussion than specific default values, since right now Scaper does not support any default values. So the first question that needs to be resolved is:

should scaper support default values and why?

Right now it's not clear to me why default values should be supported, but I'm open to suggestions!

@beasteers
Copy link
Contributor Author

Why would you need the background to start at the beginning? Isn't the point of background tracks that they don't really have a start?

If I'm generating a bunch of soundscapes from a limited set of background files, I would want the background to be different every time unless otherwise specified, instead of just cycling through the same starts of X number of tracks.

The real benefit to that as a source_time is that, say I only have 2 background recordings, one is 10mins, one is 2mins, and I'm generating a bunch of 30 second soundscapes. I want to utilize the entirety of my 12mins of background audio to create those soundscapes, but I would have to set a source_time at ('uniform', 0, 90) to avoid clipping in the two minute clip (giving me only 4mins to work with) or set it at ('uniform', 0, 550) and have a majority of the samples taken from the two minute clip be of the last 30 seconds.

So if I omit source_time, I'd be implying, "take any 30 seconds from my background audio" and would effectively use the entirety of the available audio.

@beasteers
Copy link
Contributor Author

Scaper should support defaults because:

  • It's cumbersome to have to write out values for variables that should be implied if they're not present.

    • time_stretch, pitch_shift: if they're not mentioned, then they're probably not wanted
    • label, source_file: if it's not mentioned, just use any one of them
    • source_time (foreground): start at the beginning
    • source_time (background): just take any valid snippet from the background track
    • event_duration: if it's not mentioned, I would expect it to just play the entire track
    • event_time: if they didn't say when, meh just put it anywhere
  • For values that depend on properties of the source audio (fg.event_time, fg.event_duration, bg.source_time), there isn't a way to know the audio durations prior to sampling. The other option would be to support something like e.g. event_duration=('const', None) and have that be how you specify using the entire foreground audio clip. I just think it makes sense that if you don't specify values for certain properties, you would want to generate as diverse a set of soundscapes as possible.

@justinsalamon
Copy link
Owner

justinsalamon commented Apr 6, 2019

If I'm generating a bunch of soundscapes from a limited set of background files, I would want the background to be different every time unless otherwise specified, instead of just cycling through the same starts of X number of tracks.

This is a valid use case, but I don't know that it's necessarily the most common use case. Will have to give it some thought.

It's cumbersome to have to write out values for variables that should be implied if they're not present.

Could you paste some example code to illustrate this?

  • time_stretch, pitch_shift: if they're not mentioned, then they're probably not wanted

I tend to agree.

  • label, source_file: if it's not mentioned, just use any one of them

Agreed.

  • source_time (foreground): start at the beginning

Agreed.

  • source_time (background): just take any valid snippet from the background track

Not convinced this is the best default behavior.

  • event_duration: if it's not mentioned, I would expect it to just play the entire track

Which track? The source file or the output soundscape? Another option would be to randomly choose a duration for the event. It's not clear to me which default value would make most sense here.

  • event_time: if they didn't say when, meh just put it anywhere

Anywhere according to which distribution? Normal? Uniform? Again, it's not clear to me what a good default value would be.

To summarize, it's not clear to me that there are sensible defaults for each field, and consequently it's not clear to me that it makes sense to have default values at all. Note that "not clear" doesn't mean I disagree, in fact, I tend to support the idea of having default values, but I think this requires careful hashing out before looking at any PR code.

@beasteers
Copy link
Contributor Author

beasteers commented Apr 6, 2019

So I think it'll help if we clarify the assumptions being made on what the source audio looks like, because it seems like this might be the source of some disagreement? My base assumptions that I've been using are:

background events:

  • usually longer, assuming we're taking audio from some long field recordings, anywhere from a minute to an hour
  • possibly fewer in number than foreground events (which is fine because of longer lengths)

foreground events:

  • a single event
  • basically thinking like a sound effects library

So maybe the assumptions I'm making are wrong, but the defaults I picked are based off of these so I'm thinking that's why we may have different ideas about what they should be.

This is a valid use case, but I don't know that it's necessarily the most common use case. Will have to give it some thought.

source_time (background): just take any valid snippet from the background track
Not convinced this is the best default behavior.

So are the background events you use typically cut up into small chunks? I may just be misunderstanding how scaper is meant to be used. Otherwise it seems a bit wasteful to ignore all background audio after [0, sc.duration].

Could you paste some example code to illustrate this?

# I just find this:
sc.add_event(label=('const', 'dog_bark'),
             event_time=('uniform', 5, 10),
             snr=('normal', 10, 3))

# to be a bit more expressive than this:
sc.add_event(label=('const', 'dog_bark'),
             source_file=('choose', []),
             source_time=('const', 0),
             event_time=('uniform', 5, 10),
             event_duration=('truncnorm', 3, 1, 0.5, 5),
             snr=('normal', 10, 3),
             pitch_shift=None,
             time_stretch=None)

Which track? The source file or the output soundscape? Another option would be to randomly choose a duration for the event. It's not clear to me which default value would make most sense here.

The entire source file, my bad. This default is going off the assumption that we have foreground events that are isolated. So if I had a folder of dog barks and I told scaper sc.add_event(label=('const', 'dog_bark')) It would add a single dog bark.

I understand how it wouldn't be a reasonable default if the foreground events are long and don't contain one event each, but in that case, the user could just override the default with their desired value/distribution. I agree that a randomly chosen duration would not make sense.

Anywhere according to which distribution? Normal? Uniform? Again, it's not clear to me what a good default value would be.

I think a basic assumption would be uniform across the length of the soundscape. If I say add an event to this soundscape, the least biased way would be to uniformly place it across all valid values.

@justinsalamon
Copy link
Owner

justinsalamon commented Apr 6, 2019

Your assumptions about the source audio are aligned with mine and with how scaper is suppsed to be used (background files are expected to be longer compared to foreground files; foreground files should contain a single, isolated sound event, though note the event could be short [dog bark] or long [continuous siren sound]).

Your code example illustrates nicely how defaults would make coding in scaper more concise. I think the challenge will be to identify the best default values for each parameter.

In particular, I'm not sure about how to choose the start time (source time) for the background. For some use cases it makes sense to choose a start time randomly (e.g. adding environmental foreground sounds on top of an environmental background track), but for others it might make more sense to always start at the beginning (e.g. adding musical instrument sounds on top of a background beat track). In either case it probably doesn't make sense to choose a time > background_source_duration - soundscape_duration.

I'll give this some further thought. Unfortunately I don't think scaper has a large enough userbase (yet!) to poll users, which would be the best thing to do. But I guess we could make a best effort in setting default values and then if users request changes those can be considered. What I definitely want to avoid is setting default values that lead to "bad behavior", i.e. the user thinking they're doing X while in fact they're doing Y.

@beasteers
Copy link
Contributor Author

Hm I never thought about using scaper for music as using a continuous distribution would hardly, if ever, produce soundscapes with events on beat. If that is a use case, I would have expected a binomial distribution to be implemented or something.

I understand. I still think we need a way of sampling source_time from ('uniform', 0, background_source_duration - soundscape_duration) though because in the case of uneven length background files, having to set it arbitrarily would produce biased samples.

@justinsalamon
Copy link
Owner

For music the most convenient default case, potentially, is for source_time to be 0 by default such that if you have a beat track, and then different instrument tracks, you can create "sub-mixes" by randomly choosing a subset of instruments and mixing them together, all starting at time 0 so that the tracks remain synchronized. In fact we did just that with MedleyDB as an augmentation technique. @pseeth might have additional thoughts.

@beasteers
Copy link
Contributor Author

Oh interesting. Gotcha. My thought is that, the music case can be accommodated by adding source_time=('const', 0), but the environmental case currently can't. So I'm fine tabling the discussion about defaults, but it is a minor annoyance for me that we can't randomly sample uniformly across all background audio.

So regardless of whether it's the default, I think this behavior should be implemented to allow:

fg.event_time=('uniform', 0, sc.duration - event_duration), 
fg.event_duration=('const', fg.source_duration), 
bg.source_time=('uniform', 0, bg.source_duration - sc.duration)

This has already been implemented in #51 where setting each of these to None will give each of these distributions. Another option would be to do something like ('uniform', 0, None) but that might complicate distribution validation quite a bit.

I'm open to taking out the default values of #51 if that could be a compromise.

@pseeth
Copy link
Collaborator

pseeth commented Apr 6, 2019

Hey guys, I just saw this discussion. Mixing coherent music mixtures is actually quite tricky and a bit hacky. I have this script to do it with musdb if you're interested in seeing how I'm doing it:

https://gist.github.com/pseeth/51902c231f69b42ddc7274af20b27d24

As you can see, it's a little bit insane. The script above makes 20k mixtures from musdb assuming you reorganize the musdb to match the Scaper style:

musdb/
  train/
     bass/
        song0.wav
        song1.wav
        ...
     drums/
        song0.wav
        song1.wav
        ...
     other/
        song0.wav
        song1.wav
        ...
     vocals/
        song0.wav
        song1.wav
        ...

One thing of note: I am not using a start time of 0 across the board. What I do is sample a random start time for one of the sources (vocals in this case), and then tie the start time for all the other sources to that selected start time. This complicates things a bit in Scaper as to replicate this internally, Scaper would have to first instantiate the required event (vocals), and then update the remaining events to be tied to the start time of the required event.

Oh incidentally, the gist I posted above is also an instance of generating the jams files in one thread and synthesizing them in parallel across multiple threads (as discussed in #36).

@justinsalamon
Copy link
Owner

So regardless of whether it's the default, I think this behavior should be implemented to allow:

fg.event_time=('uniform', 0, sc.duration - event_duration), 
fg.event_duration=('const', fg.source_duration), 
bg.source_time=('uniform', 0, bg.source_duration - sc.duration)

I'm not sure I'm following, this is already possible: if you set a foreground event_time = ('uniform', 0, sc.duration) it'll use a start time between 0 and sc.duration - event_duration. If you set the label to be protected, it will use the source file duration as the event duration. And similarly there's nothing preventing you from setting a start time for the background event other than 0.

@justinsalamon justinsalamon changed the title Support uniform distribution based on file durations Support default values for distribution tuples Apr 8, 2019
@justinsalamon
Copy link
Owner

This is how I'd summarize the thread so far:

  • It would be useful to support default values for distribution tuples so that scaper code can be more concise
  • Some parameters have natural default values (e.g. None for time stretching and pitch shifting)
  • Other parameters (e.g. source_time) are less straightforward and different use cases would benefit from different default values.

All in all I'm in favor of supporting default values, but it's a pretty major API change and would require (1) determining the best default value for each parameter and (2) implementing the change including unit testing and documentation.

I'm happy to look into this once we finish merging #53, #54, #55.

@beasteers
Copy link
Contributor Author

Fair point. I hadn't considered setting all labels as protected to force the full clip duration. For event_time, that is also a solution, though it will introduce a slight bias towards events occurring in the interval [sc.duration - event_duration, sc.duration], but assuming the foreground events are short that shouldn't be a huge deal.

For background source_time, it's not just about setting the background source_time to a value other than zero, it's about being able to set the background source_time to a value that allows us to use the entirety of our background audio in an unbiased manner. See from above:

say I only have 2 background recordings, one is 10mins, one is 2mins, and I'm generating a bunch of 30 second soundscapes. I want to utilize the entirety of my 12mins of background audio to create those soundscapes, but I would have to set a source_time at ('uniform', 0, 90) to avoid clipping in the two minute clip (giving me only 4mins total to work with between the two files), or set it at ('uniform', 0, 550) and have a majority of the samples taken from the two minute clip be of the last 30 seconds.
So if I omit source_time (or however it's implemented), I'd be implying, "take any 30 seconds from my background audio" and would effectively use the entirety of the available audio.

I don't think it's that much of an API change purely because it's entirely backwards compatible. But yes I agree it will take a bit of work with determining defaults and unit test/docs.

@justinsalamon
Copy link
Owner

justinsalamon commented Apr 9, 2019

For background source_time, it's not just about setting the background source_time to a value other than zero, it's about being able to set the background source_time to a value that allows us to use the entirety of our background audio in an unbiased manner. See from above:

The PR we're currently working on (#53) adjusts the source time prior to sampling to give an unbiased sampling of the valid range given the actual source time and event duration. So I think that PR addresses your concern.

@pseeth
Copy link
Collaborator

pseeth commented Feb 29, 2020

#53 is now in Scaper, @beasteers, does it address this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants