fix/improve HRTF normalization #27
Fix the minimum phase conversion, improve the behind/in front difference with a lowpass, and remove the base frequencies. Generally, clean up and debug mit.py into something that's maintainable; it was thrown together as a quick proof of concept before the library even existed. Also, increase the FFT length for the magnitude responses to match the sample rate; this gives much better results when averaging power and so on. Refs #27. Doesn't close it because we want to collect feedback.
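For readers following along: the usual way to do the minimum phase conversion mentioned above is the real-cepstrum (homomorphic) method, which keeps each HRIR's magnitude response and reconstructs the phase a minimum-phase system with that magnitude would have. The sketch below is not taken from mit.py; it's a generic NumPy implementation of that standard technique, and the full-length symmetric `magnitude` input and the `1e-9` floor are assumptions for illustration.

```python
import numpy as np

def minimum_phase_from_magnitude(magnitude):
    """Reconstruct a minimum-phase impulse response from a full-length FFT
    magnitude response (e.g. np.abs(np.fft.fft(hrir, n))) via the real
    cepstrum / homomorphic method."""
    n = len(magnitude)
    # Floor the magnitude so log() stays finite; the 1e-9 value is arbitrary.
    log_mag = np.log(np.maximum(magnitude, 1e-9))
    cepstrum = np.fft.ifft(log_mag).real
    # Fold the anticausal half of the cepstrum onto the causal half.
    window = np.zeros(n)
    window[0] = 1.0
    window[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        window[n // 2] = 1.0
    min_phase_spectrum = np.exp(np.fft.fft(window * cepstrum))
    return np.fft.ifft(min_phase_spectrum).real
```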
Hello, I've been testing the last commit and have some feedback and questions. I've been using OpenAL Soft for my games and am in the process of migrating to Synthizer. As it stands, Synthizer does not perform as I expect, so could you please let me know if I'm missing a setting of some sort in the following scenario? Note that HRTF is enabled.
I can provide demos for both OpenAL Soft and Synthizer if required, in which you can move the listener around the source to test the problem I've described. As always, good work.
It's possible that they're doing something I'm not, but I can't look at their implementation because it's GPL. There are a few possibilities here:
We'll want to determine which. For reverb that's a simple enough answer, because you either are or aren't using one. For near-field effects, modifying your demo to do the same thing but with larger ranges will tell us; I think you have to touch max distance, but there might be another thing to tweak. In general, if you can get the same effect at larger ranges, it's probably not near-field. If neither of those is relevant, then they're probably doing something interesting and I'll want demo sounds or demo programs or whatever you've got, but it'd be helpful if you could try to eliminate the possibilities first. How's the rest of it? One of the problems is that I'm too close to the audio to do blind listening tests. Does it function as you'd expect at larger ranges, or is there also a problem there? And lastly, are you using a custom dataset or the one built into OpenALSoft?
I'm not using any reverb myself, and I don't think OpenAL Soft uses one in the background. For both libraries, I went with no configuration other than enabling HRTF. I haven't changed OpenAL Soft's HRTF dataset either; it's just the default. It does look like near-field effects, as the difference becomes crystal clear when the source is close. For example, when the source is 2 meters above and to the left, I can literally turn my head to point at it, but when it's more than 10 or so, it gets difficult to pinpoint with the same accuracy. I'll try to get you the demos today.
OK. That's basically what I needed to know. I'll get that to you when I can, but it's going to require tracking down literature. I might be able to do some short-term things, though.
Are you a C or Python user? If I give you links to specific commits, can you work with that? I probably won't have anything for a week or so, but it would be nice if I could put out betas or something without having to spam releases all the time.
I'm wrapping it in C#, but I can definitely work with commits; I can compile it myself for my wrapper.
It looks like SYZ_DISTANCE_MODEL_EXPONENTIAL is the closest to their default implementation. I'm still experimenting and will let you know if I can find a match to do a better quality check. In either case, it sounds awesome.
Yeah. That's probably a factor too. Have you seen this? OpenAL has all of the same distance models, but I believe their defaults are different. Perhaps mine should be tweaked to match, though we'll have to wait a bit for 0.9 since that's a breaking change. Also, there are definitely docs improvements to make here. You're supposed to match the distance model to whatever environment you're representing, but the parameters don't map well to physical properties, so you kind of have to do it by ear. If you find some good defaults that just make you go "WOW" out of the box, post them here and I'll probably just use them. It's quite difficult for the person writing the audio library to evaluate it, because a lot of how hearing works is that it sounds better if you know what you're hearing.

I'll still be looking into near-field effects, though. It would be neat if we could get it, and Creative definitely has it. That video implies that OpenALSoft doesn't (OpenALSoft starts at 1:39), but I have trouble believing that, because my preliminary research suggests that it's beyond super easy, in the sense that someone who knows more than me could just wave their hand and add some terms to a formula, and OpenALSoft has been doing theirs for 10 years. Maybe there's an unexpired Creative patent floating around.
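For anyone comparing the two libraries, the distance models being discussed are the ones defined in the OpenAL specification; Synthizer's exact implementation may differ in details, but a rough sketch of the spec's attenuation formulas (parameter names here are illustrative, not Synthizer's API) looks like this:

```python
def distance_gain(distance, model="exponential", ref=1.0, rolloff=1.0, max_dist=100.0):
    """Attenuation curves as defined in the OpenAL spec (clamped variants).

    ref      -- reference distance at which gain is 1.0
    rolloff  -- how aggressively gain falls off with distance
    max_dist -- distance beyond which no further attenuation is applied
    """
    d = min(max(distance, ref), max_dist)
    if model == "inverse":
        return ref / (ref + rolloff * (d - ref))
    if model == "linear":
        return max(0.0, 1.0 - rolloff * (d - ref) / (max_dist - ref))
    if model == "exponential":
        return (d / ref) ** -rolloff
    raise ValueError(f"unknown distance model: {model}")
```

OpenAL's default is the inverse clamped model with a reference distance of 1 and a rolloff factor of 1, which attenuates more gently at close range than an aggressive exponential setup; that difference alone can change how "close" a nearby source feels.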
I've tried to replicate what Creative has with OpenALSoft, but no matter how slowly I move the source, there's just one snap point where the sound jumps across to the other side. I would definitely love for that effect to exist. I'll keep testing to see if I'm missing a setting.
Going to go ahead and close this out because it seems as though there are no complaints. I'm deferring the near-field stuff under the "it's as good as OpenALSoft" justification, though perhaps this issue will come back if it turns out we aren't as good as them. Fortunately, most future changes with regard to the HRTF should no longer be nearly as painful as the first round of this issue.
Reading this thread, I have a couple of questions on the HRTF rendering:
Thanks a lot, and sorry for the long list of questions.
The object reference lists all the properties you can set, and the tutorial does show how to set properties. Everything that takes an enum value is bound to Python enums. You might have to read the Python bindings source code to work a little bit of it out, but I've had multiple people figure this out without a problem, so it would be good to understand where exactly the difficulty is. It may not be entirely clear why you'd want to set something, but everything you could possibly want to set is documented.

I think you're using coordinates wrong. If you move a sound from x = 1 to x = -1 without involving y or z, you're moving the sound through the center of the head. You need to move multiple coordinates at once in a realistic fashion, for example (1, 1, 0), (0.5, 1, 0), etc. If you're not trying to map things to an actual 3D space and just need to throw pan values at the library, then you can grab PannedSource and control the gain, azimuth, and elevation yourself; I suspect this is what you want. This would also address your "it only works for integer positions" observation: if you move a sound through the center of the head, it can only be on the right, in the middle, or on the left.

If by near-field correction you mean something like the video I linked above, yes, eventually. If by near-field correction you mean working out an entire second set of HRIRs and crossfading or somesuch, probably not. No one publishes that data as open source, I haven't found any good resources on doing it myself through mathematical analysis, and making the data myself requires a good bit of specialized hardware and access to an anechoic chamber for the duration of the project.

I'm going to guess that maybe you're sighted. I'm blind, and when I listen to the Oculus video I can't even reliably tell what is and isn't a volumetric source, and their HRTF is full of artifacts. They're probably using an ambisonics implementation, which allows for volumetric sources at the cost of quality and accuracy. In practice, the best that's possible is "this source sounds small" versus "this source sounds big", with no real subtlety between them, and whether it even works depends on the input sound as well. Most of the rest of the "size" perception comes from knowing what the source is (e.g. trains are big, cars are smaller than trains, the source is bigger than a point, hey it's a car, boom, you have "perceived" how big it is) or from seeing it and combining that with what you're hearing. I can go into more detail as to why this is specifically hard, but suffice it to say that I don't consider the gain worth the trade-off and time spent, especially since it probably also means doing patent research to find something that's not going to get us shut down by Oculus or whoever else's toes doing this is bound to step on. Sorry I can't be more optimistic here. I absolutely agree volumetric sources would be cool.
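To make the coordinate point above concrete, here's a small illustrative sketch that moves a source on a circle around the listener's head instead of sliding it straight through it. The `source.position` usage and the chosen radius are assumptions for the example, not a statement about Synthizer's exact API:

```python
import math

def circle_positions(radius=2.0, height=0.0, steps=32):
    """Positions on a horizontal circle of the given radius around the origin
    (the listener's head), so the source never passes through the head."""
    for i in range(steps):
        angle = 2.0 * math.pi * i / steps
        yield (radius * math.cos(angle), radius * math.sin(angle), height)

# Hypothetical usage with a 3D source object; adjust to your wrapper:
# for x, y, z in circle_positions(radius=2.0):
#     source.position = (x, y, z)
#     time.sleep(0.05)
```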
Thanks for your answers. On near-field correction, yes, I meant what's in the video you shared. In that video I find Creative's near-field effect a bit disturbing, but it nonetheless works, I think; OpenAL does not work from that perspective. Regarding the HRTF spatialization, yes, I am trying to map things to an actual 3D space. Before going full steam with 3D movement I was just assessing my perception of the spatialization in each dimension, hence playing only with x, or y, or z independently of each other. I know that the x dimension usually renders well but that height and depth are more challenging to perceive properly. For instance, when trying Google Resonance I realized that y and z were just not perceivable, at least to me. Maybe I have big ears vs. the default HRTF params, haha. I still do not understand why moving the sound source to 0.1 or 0.5 or 1.0 does not make any audible difference. Regarding properties, I definitely missed the references you mentioned. Can you please point me to the object reference list file or doc? For the Python bindings, is there a particular source file I should look at? Thanks a lot.
I think the problem you're having with 0.1 vs. 1.0 sounding the same is the same thing @SemihBudak had earlier in this thread: you need to set the distance model params differently. I will make them match the OpenAL defaults in the near future, which will help at least a bit. Every HRTF implementation out there that I know of that's not a sound card from Creative models the head as a point, and all that distance controls is the volume. If you hit synthizer.github.io, the properties are all documented in the object reference and map to Python in a straightforward manner.

Some people can hear height with HRTF. Some can't. It's subtle, and the problem is that it's incredibly personal. Even in an entirely real environment, height perception is usually much worse than you'd give it credit for. For me, with Synthizer/OpenALSoft/etc., it's most obvious if the source is at about 45 degrees off center, but your mileage may vary. Azimuth has both the gain difference and the interaural time delay, but height cues are entirely about how sound bounces off your individual ears/head, which isn't something that can be averaged in a dataset. Maybe I can do better at some point; occasionally I find a demo of someone who has, notably the Blue Ripple people, but (again) all of that research is seemingly highly proprietary, so who knows if I can figure out how to duplicate it or not.
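One possible explanation for 0.1, 0.5, and 1.0 sounding identical, assuming a clamped distance model with a reference distance of 1.0 (OpenAL's default), is that every distance below the reference clamps up to it, so all three positions get exactly the same gain. This is an illustrative calculation, not Synthizer's actual output:

```python
# Inverse-clamped attenuation with ref = 1.0 and rolloff = 1.0 (OpenAL's defaults).
# Distances below the reference clamp up to it, so 0.1, 0.5, and 1.0 all yield
# the same gain, which would match the "no audible difference" observation above.
ref, rolloff = 1.0, 1.0
for distance in (0.1, 0.5, 1.0, 2.0, 10.0):
    d = max(distance, ref)
    gain = ref / (ref + rolloff * (d - ref))
    print(f"distance {distance:>4}: gain {gain:.3f}")
```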
OK, thanks. I have done some testing and have some feedback. This way we can compare similar sound simulation renders, but with our different perceptions.
Regarding CPU usage, the library is quite efficient for now; I do not observe significant usage. I did only play with one source, though.
It is normal for a horizontal circle to sound like it is at the level of your head, because the positions you are feeding the library are at the height of the listener's head. The discontinuity is because most of the datasets don't contain data below an elevation of about -45 degrees. At some point I will figure out synthesizing some fake data for those angles, but this has so far only been a problem in theory.

The lack of depth is probably the lack of near-field effects, plus you haven't added a reverb. Reverb will get you a little of the way there, but you'll probably need to play with manually tweaking gains for the time being. It's also possible that the Steam people are using an occlusion filter to simulate air, but that particular implementation does things like raycasting level geometry that I don't have the resources for.

Most of the "depth" stuff you're talking about isn't just HRTF but the library as a whole, and I'm working on it. In the interest of setting expectations appropriately, you're not going to get what you want on a short time horizon. This is a pre-1.0 weekend project. I'm competitive in the OSS space, but when it comes to "how does insert-proprietary-solution-here do x", I have to reverse engineer it from first principles, because all of the research on this stuff happens inside the VR companies and they're not telling us how they do it. The publicly available stuff is something like 5 years behind what Steam, Oculus, etc. are doing, not just on audio but on everything they're involved in.
OK, I understand the situation. Thanks for setting expectations; I appreciate that this is not easy at all and that the VR companies have lots of resources to work on this domain. I need to try your reverb functions; it's on my to-do list. I have been told reverb helps a lot with sound spatialisation, but I am not so convinced for outdoor scenarios, especially if one does not use vision on top of sound perception. What about SOFA profiles, is that something your library manages or will manage, in your view? When I played with SOFA profiles in the past (a YouTube video playing a wave sound on a beach), to be honest, I had difficulty picking one in particular; they kind of all sounded the same to me. What is the default dataset you are using?
Even if you buy the best HRTF that exists, you will not be able to get an exact distance estimate out of anyone. The best you can do is "yeah, it sounds close" or "maybe it's far away". A lot of the distance perception is also you combining sight with the demo. I'm getting the impression that you think HRTF alone does way more than it can. Even the best blind people in the real world can't get beyond "pretty close" or "hm, probably far away", unless we're talking about echolocation, which encodes the distance in the amount of time it takes the sound generated by the echolocator to return. The primary cues for distance are the direct path as compared to a reverb (in the distance, most of what you hear is the reverb, and the late reflections arrive closer in time to the direct sound), how much sounds move when you turn your head (closer sounds move more), and so on. My best guess for e.g. Steam is that they're fading out a stereo crossover and faking it, then just saying that sounds beyond a meter or two aren't going to be simulated to that level of detail. I also wouldn't be surprised if their demos/library put in a reverb by default.

Since I already know what you're working on, it is important to point out that HRTF isn't magically a lot better for blind people: we aren't going to get much more out of it than you can with your eyes closed. We're just better at processing what we do get. Much of the perceived quality and accuracy comes from combining sound and sight without realizing it; in particular, the Steam Audio demos have audible artifacts to me, even if they do convey a feeling of "space" better. Things like audiogames always add additional mechanisms that provide more information to one degree or another when precise distance judgements are required.

The dataset that's currently being used is MIT KEMAR: https://sound.media.mit.edu/resources/KEMAR.html. This is what OpenALSoft uses. I may eventually look at switching to CIPIC, which samples the lower elevations of the KEMAR head and has a higher horizontal resolution. In general, changing the HRTF can help if you find exactly the right one, but almost no one bothers changing it because it's a laborious process for a not hugely significant gain.

I'm not going to support the SOFA format directly. I might support it indirectly later, but getting an HRTF from a dataset to what actually goes into a library like this requires a lot of processing, as demonstrated in our Python script for MIT. A lot of that is reusable, but there is typically some degree of processing required, so at best it'd have to be a custom file format and at worst a reimplementation of a lot of NumPy's stuff in C++ (most deps for serious C++ math aren't even close to public domain, which is a goal of this library). It's also worth noting that a large part of why I don't use more recent data is that most of the recent data isn't available for commercial use without prior permission from the authors. This may or may not factor into your decisions as to what you end up using.
We have a really good story around ITD, but a not-so-great one around HRTF normalization. Specifically, we end up throwing out too much of the frequency-dependent effects. I'll need to sit down for a weekend or something and hammer on it until we get better quality on the normalization. It may be possible to borrow from the MATLAB scripts that come with the MIT dataset.
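One common normalization approach in the literature (not necessarily what mit.py does today) is diffuse-field equalization: power-average the magnitude responses over all measured directions and divide each HRTF by that average, so only the direction-dependent part of the spectrum is kept. A rough NumPy sketch, with the `hrirs` array shape and the regularization floor as assumptions:

```python
import numpy as np

def diffuse_field_equalize(hrirs, fft_length=None):
    """hrirs: array of shape (num_directions, ir_length).
    Returns magnitude responses with the power-averaged (diffuse-field)
    component divided out, shape (num_directions, fft_length // 2 + 1)."""
    if fft_length is None:
        fft_length = hrirs.shape[1]
    spectra = np.fft.rfft(hrirs, n=fft_length, axis=1)
    magnitudes = np.abs(spectra)
    # Power-average over all directions, then take the square root to get
    # the diffuse-field magnitude response.
    diffuse_field = np.sqrt(np.mean(magnitudes ** 2, axis=0))
    # Regularize so we don't blow up where the average is tiny.
    return magnitudes / np.maximum(diffuse_field, 1e-6)
```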