This is great! I'm not too experienced with ML development, but I follow a lot of audio ML research, and for a good while I've been thinking that this approach should be the way to do things. Looking forward to playing around with ddsp for an upcoming project.
Got a few questions...
In some places, the harmonic distribution seems nearly synonymous with the amplitude distribution a(n), as a model of the variation between partials' spectral magnitudes, but then it's also referenced as a model of spectral centroid. Can you elaborate on the difference between the harmonic distribution and a(n)? I use "overtone distribution" in my code to refer to discrete frequency distributions of partials relative to a fundamental (inharmonic-timbre stuff)... which is probably contributing to my confusion. 😛
I'll be synthesizing novel inharmonic timbres with retuned pitches, using (mostly) harmonic timbres as inputs. Remapping/interpolating f(0) seems easy with the current model. I'm wondering whether it's viable to remap overtone partials to an arbitrary frequency distribution with the current model... i.e., instead of multiplying the fundamental by integers, simply multiply it by some predefined set of rational numbers/floats. Since I'll be synthesizing novel timbres, I won't necessarily have training sets to provide as inputs for training an unconstrained oscillator bank via a loss function... so I'm thinking the process could be: train the current model, still limited to f(0), on the given input, and then remap the partial frequencies onto inharmonic frequency sets at the additive synth, while still using the other features generated by the encoder and/or interpolated. Does that make sense / any immediate issues with that idea?
You use 101 partials in the synth... which, for harmonic timbres, would extend past the 8 kHz Nyquist limit for any pitch above ~80 Hz. Is that just to cover the entire frequency range for any reasonable pitch? Also curious why you limited it to a 16 kHz sample rate... a real-time constraint, faster training, or something else?
Thanks and stay safe out there! Sorry for the wall of text.
sinusoidal_amps = amplitude * harmonic_distribution
harmonic_distribution sums to 1 for every moment in time (thus I call it a distribution), so it distributes "amplitude" among the harmonics.
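To make the relationship concrete, here's a minimal NumPy sketch of that factorization (illustrative shapes and values, not DDSP's actual implementation):

```python
import numpy as np

# Hypothetical control shapes: [time, n_harmonics] for the distribution,
# [time, 1] for the overall amplitude envelope.
n_frames, n_harmonics = 4, 8
rng = np.random.default_rng(0)

# Raw network outputs, normalized so each frame sums to 1
# (a distribution over harmonics at every moment in time).
raw = rng.random((n_frames, n_harmonics))
harmonic_distribution = raw / raw.sum(axis=-1, keepdims=True)

# A single loudness envelope per frame.
amplitude = np.array([[0.1], [0.5], [0.9], [0.3]])  # [time, 1]

# Per-harmonic amplitudes: the distribution "spends" the total amplitude.
sinusoidal_amps = amplitude * harmonic_distribution

# Each frame's harmonic amps sum back to that frame's total amplitude.
assert np.allclose(sinusoidal_amps.sum(axis=-1, keepdims=True), amplitude)
```

So a(n) here is the total amplitude at frame n, while the harmonic distribution decides how that total is split across partials; shifting weight toward higher harmonics is what moves the spectral centroid.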
Yah, you can do inharmonicity; it's even an input to harmonic_synthesis(), I just don't use it at the moment. I have some "pure sinusoidal" models, but haven't open-sourced them yet.
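The remapping idea from the question can be sketched in plain NumPy: synthesize with arbitrary (possibly inharmonic) frequency ratios instead of integer multiples of f0. The `ratios` and amplitudes below are made up for illustration; they stand in for whatever the encoder/model would provide.

```python
import numpy as np

sample_rate = 16000
f0 = 220.0
n_samples = sample_rate  # one second

# Integer ratios give a harmonic tone; arbitrary floats give an
# inharmonic one (these happen to resemble ideal-bar partials).
ratios = np.array([1.0, 2.76, 5.40, 8.93])
amps = np.array([1.0, 0.5, 0.25, 0.125])

freqs = f0 * ratios                               # [n_partials]

# Zero any partial at or above Nyquist to avoid aliasing.
amps = np.where(freqs < sample_rate / 2, amps, 0.0)

# Static additive synthesis: sum of sinusoids at the remapped frequencies.
t = np.arange(n_samples) / sample_rate
audio = np.sum(amps[:, None] * np.sin(2 * np.pi * freqs[:, None] * t), axis=0)
audio /= np.max(np.abs(audio))
```

In DDSP the per-frame amplitudes and f0 would come from the model, and only the ratio vector changes, which is why the "train harmonic, remap at the synth" plan is plausible.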
16 kHz is just a standard in ML, since audio ML is hard. We're working on extending it to 48 kHz and almost have it working; in principle it should be a lot easier than with other ML models, but there are a couple of hyperparameters to tweak. Also, personally, I kind of like the retro sound of lower sample rates (hides the model mistakes :). But yes, you can work at higher sample rates.
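On the 101-partials question: at a 16 kHz sample rate the Nyquist limit is 8 kHz, so for a harmonic tone only floor(8000 / f0) partials fit below Nyquist, and with 101 harmonics any f0 above ~79 Hz pushes the top harmonic past it (those amplitudes can simply be zeroed). A quick back-of-envelope check:

```python
# How many of 101 harmonics stay below Nyquist at a few pitches
# (example f0 values chosen for illustration).
nyquist = 16000 / 2
for f0 in (65.4, 110.0, 440.0):  # roughly C2, A2, A4
    usable = int(nyquist // f0)
    print(f"f0 = {f0:6.1f} Hz -> {min(usable, 101)} of 101 harmonics below Nyquist")
```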