Support default values for distribution tuples #41
Comments
Why?
I don't think that's a good default. If anything I think Anyway, this is a bigger discussion than specific default values, since right now Scaper does not support any default values. So the first question that needs to be resolved is: should scaper support default values and why? Right now it's not clear to me why default values should be supported, but I'm open to suggestions! |
Why would you need the background to start at the beginning? Isn't the point of background tracks that they don't really have a start? If I'm generating a bunch of soundscapes from a limited set of background files, I would want the background to be different every time unless otherwise specified, instead of just cycling through the same starts of X number of tracks. The real benefit to that as a source_time is that, say I only have 2 background recordings, one is 10mins, one is 2mins, and I'm generating a bunch of 30 second soundscapes. I want to utilize the entirety of my 12mins of background audio to create those soundscapes, but I would have to set a source_time at So if I omit source_time, I'd be implying, "take any 30 seconds from my background audio" and would effectively use the entirety of the available audio. |
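A minimal sketch of the sampling described above, written as plain Python rather than Scaper code. The helper name `sample_background_start` and the file names are hypothetical, and the upper bound of `file_duration - soundscape_duration` is an assumption made here so that the excerpt always fits inside the recording:

```python
import random


def sample_background_start(file_duration, soundscape_duration, rng=random):
    """Pick a start offset (seconds) uniformly so the excerpt fits in the file.

    Hypothetical helper for illustration only; not part of Scaper.
    """
    latest_start = file_duration - soundscape_duration
    if latest_start < 0:
        raise ValueError("background file is shorter than the soundscape")
    return rng.uniform(0.0, latest_start)


# Assumed durations in seconds: one 10-minute and one 2-minute recording.
backgrounds = {"bg_long.wav": 600.0, "bg_short.wav": 120.0}
soundscape_duration = 30.0

rng = random.Random(0)  # seeded only to make the sketch repeatable
starts = {
    name: sample_background_start(duration, soundscape_duration, rng)
    for name, duration in backgrounds.items()
}
```

With this bound, any 30-second window of either recording can be chosen, instead of always the first 30 seconds of each file.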
Scaper should support defaults because:
This is a valid use case, but I don't know that it's necessarily the most common use case. Will have to give it some thought.
Could you paste some example code to illustrate this?
I tend to agree.
Agreed.
Agreed.
Not convinced this is the best default behavior.
Which track? The source file or the output soundscape? Another option would be to randomly choose a duration for the event. It's not clear to me which default value would make most sense here.
Anywhere according to which distribution? Normal? Uniform? Again, it's not clear to me what a good default value would be. To summarize, it's not clear to me that there are sensible defaults for each field, and consequently it's not clear to me that it makes sense to have default values at all. Note that "not clear" doesn't mean I disagree, in fact, I tend to support the idea of having default values, but I think this requires careful hashing out before looking at any PR code. |
So I think it'll help if we clarify the assumptions being made on what the source audio looks like, because it seems like this might be the source of some disagreement? My base assumptions that I've been using are: background events:
foreground events:
So maybe the assumptions I'm making are wrong, but the defaults I picked are based off of these so I'm thinking that's why we may have different ideas about what they should be.
So are the background events you use typically cut up into small chunks? I may just be misunderstanding how scaper is meant to be used. Otherwise it seems a bit wasteful to ignore all background audio after [0, sc.duration].
The entire source file, my bad. This default is going off the assumption that we have foreground events that are isolated. So if I had a folder of dog barks and I told scaper I understand how it wouldn't be a reasonable default if the foreground events are long and don't contain one event each, but in that case, the user could just override the default with their desired value/distribution. I agree that a randomly chosen duration would not make sense.
I think a basic assumption would be uniform across the length of the soundscape. If I say add an event to this soundscape, the least biased way would be to uniformly place it across all valid values. |
Your assumptions about the source audio are aligned with mine and with how scaper is supposed to be used (background files are expected to be longer compared to foreground files; foreground files should contain a single, isolated sound event, though note the event could be short [dog bark] or long [continuous siren sound]). Your code example illustrates nicely how defaults would make coding in scaper more concise. I think the challenge will be to identify the best default values for each parameter. In particular, I'm not sure about how to choose the start time (source time) for the background. For some use cases it makes sense to choose a start time randomly (e.g. adding environmental foreground sounds on top of an environmental background track), but for others it might make more sense to always start at the beginning (e.g. adding musical instrument sounds on top of a background beat track). In either case it probably doesn't make sense to choose a time > I'll give this some further thought. Unfortunately I don't think scaper has a large enough userbase (yet!) to poll users, which would be the best thing to do. But I guess we could make a best effort in setting default values and then if users request changes those can be considered. What I definitely want to avoid is setting default values that lead to "bad behavior", i.e. the user thinking they're doing X while in fact they're doing Y. |
Hm I never thought about using scaper for music as using a continuous distribution would hardly, if ever, produce soundscapes with events on beat. If that is a use case, I would have expected a binomial distribution to be implemented or something. I understand. I still think we need a way of sampling source_time from |
For music the most convenient default case, potentially, is for |
Oh interesting. Gotcha. My thought is that, the music case can be accommodated by adding So regardless of whether it's the default, I think this behavior should be implemented to allow:
This has already been implemented in #51 where setting each of these to I'm open to taking out the default values of #51 if that could be a compromise. |
Hey guys, I just saw this discussion. Mixing coherent music mixtures is actually quite tricky and a bit hacky. I have this script to do it with musdb if you're interested in seeing how I'm doing it: https://gist.github.com/pseeth/51902c231f69b42ddc7274af20b27d24 As you can see, it's a little bit insane. The script above makes 20k mixtures from musdb assuming you reorganize the musdb to match the Scaper style:
One thing of note: I am not using a start time of 0 across the board. What I do is sample a random start time for one of the sources (vocals in this case), and then tie the start time for all the other sources to that selected start time. This complicates things a bit in Scaper as to replicate this internally, Scaper would have to first instantiate the required event (vocals), and then update the remaining events to be tied to the start time of the required event. Oh incidentally, the gist I posted above is also an instance of generating the jams files in one thread and synthesizing them in parallel across multiple threads (as discussed in #36). |
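The tied start time idea can be sketched roughly as follows. This is not the code from the gist; `tied_source_times` and the stem durations are hypothetical, and the sketch assumes the start is bounded by the shortest stem so that every excerpt fits:

```python
import random


def tied_source_times(stem_durations, anchor, excerpt_duration, rng=random):
    """Sample one start time for the anchor stem and reuse it for every stem.

    stem_durations maps stem name -> duration in seconds.
    Returns stem name -> ('const', start) tuple. Hypothetical helper.
    """
    if anchor not in stem_durations:
        raise KeyError(anchor)
    # Assumption: bound the start by the shortest stem so all excerpts fit.
    latest_start = min(stem_durations.values()) - excerpt_duration
    if latest_start < 0:
        raise ValueError("excerpt is longer than the shortest stem")
    start = rng.uniform(0.0, latest_start)
    return {name: ("const", start) for name in stem_durations}


# Assumed example: four stems of one 200-second song, 30-second mixtures.
stems = {"vocals": 200.0, "drums": 200.0, "bass": 200.0, "other": 200.0}
source_times = tied_source_times(stems, "vocals", 30.0, random.Random(1))
```

Every stem ends up with the same `('const', start)` tuple, which keeps the sources aligned in time while still varying the excerpt from mixture to mixture.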
I'm not sure I'm following, this is already possible: if you set a foreground |
This is how I'd summarize the thread so far:
All in all I'm in favor of supporting default values, but it's a pretty major API change and would require (1) determining the best default value for each parameter and (2) implementing the change including unit testing and documentation. I'm happy to look into this once we finish merging #53, #54, #55. |
Fair point. I hadn't considered setting all labels as protected to force the full clip duration. For For background source_time, it's not just about setting the background source_time to a value other than zero, it's about being able to set the background source_time to a value that allows us to use the entirety of our background audio in an unbiased manner. See from above:
I don't think it's that much of an API change purely because it's entirely backwards compatible. But yes I agree it will take a bit of work with determining defaults and unit test/docs. |
The PR we're currently working on (#53) adjusts the source time prior to sampling to give an unbiased sampling of the valid range given the actual source time and event duration. So I think that PR addresses your concern. |
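As a rough illustration of what adjusting the range before sampling could look like (this is a guess at the idea, not the code from #53), the sketch below shrinks a uniform source time range so that the source time plus the event duration never runs past the end of the source file. The function name `clamp_uniform_source_time` is hypothetical:

```python
import random


def clamp_uniform_source_time(low, high, source_duration, event_duration):
    """Shrink a uniform range so source_time + event_duration fits the file.

    Hypothetical helper; returns a ('uniform', low, high) style tuple.
    """
    latest_valid = max(0.0, source_duration - event_duration)
    new_low = min(max(low, 0.0), latest_valid)
    new_high = min(max(high, new_low), latest_valid)
    return ("uniform", new_low, new_high)


# Assumed example: 600 s background, 30 s soundscape, user asks for [0, 600].
dist = clamp_uniform_source_time(0.0, 600.0, 600.0, 30.0)
_, lo, hi = dist
source_time = random.Random(2).uniform(lo, hi)
```

Sampling from the clamped range means every valid start is equally likely. Sampling from the full range and clipping afterwards would instead pile probability onto the latest valid start.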
#53 is now in Scaper, @beasteers, does it address this issue? |
It would be convenient to be able to omit the source_time for background files and have it start at any point in the recording. So essentially, have it default to ('uniform', [0, bg_audio_file_duration]). And the same goes for event_time. I'd like to be able to just specify ('uniform',) for example and have them randomly placed throughout the file.

Similarly but less important, it would be nice to be able to omit event_duration and have it default to ('const', fg_audio_file_duration - source_time).

In general, I think providing sensible defaults for parameters (like label and source_file default to ('choose', []), source_time defaults to ('const', 0), etc) would be helpful.
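To make the request concrete, here is a small sketch of how omitted parameters could be filled in with default distribution tuples. It does not call Scaper; `PROPOSED_DEFAULTS` and `with_defaults` are hypothetical names, and the default values are just the ones suggested in this issue:

```python
import copy

# Defaults proposed in this issue (an assumption, not current Scaper behavior).
PROPOSED_DEFAULTS = {
    "label": ("choose", []),
    "source_file": ("choose", []),
    "source_time": ("const", 0),
}


def with_defaults(**overrides):
    """Return an event spec where any omitted field falls back to a default."""
    spec = copy.deepcopy(PROPOSED_DEFAULTS)  # avoid sharing the mutable lists
    spec.update(overrides)
    return spec


# Only the label is given; source_file and source_time come from the defaults.
spec = with_defaults(label=("const", "dog_bark"))
```

A caller who wants different behavior just passes the field explicitly, so existing code that already specifies every tuple would be unaffected.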