Audio normalisation? #253

srchild · 2022-05-07T20:42:57Z

srchild
May 7, 2022

{new user alert, I will probably make some silly comments/questions/requests!}

First, congratulations to Patrick McGuire and anyone else involved. This is a great idea and it installed and set up easily.

Audio levels seem really low on the archived clips, whether I listen on my PC or on my phone, and whether I listen to my own recordings or to virginia.birdnetpi.com or tyreso.birdnetpi.com

Perhaps the audio could be normalised before being saved?

On a quick browse of the source code I noticed that you use ffmpeg in birdnet_recording.sh. ffmpeg can be used to normalise audio e.g. see:

https://superuser.com/questions/323119/how-can-i-normalize-audio-using-ffmpeg

and

http://ffmpeg.org/ffmpeg-all.html#loudnorm

Here's a loudnorm recipe to test (I can't test it now - it is dark and there are no birds singing!)

http://johnriselvato.com/ffmpeg-how-to-normalize-audio/

mcguirepr89 · 2022-05-07T21:14:03Z

mcguirepr89
May 7, 2022
Maintainer

Hello, @srchild

I had to do some googling to know more about your suggestion, as I'm not an audio-buff.

Here are my normalization results per your links:
Before (without) normalization:
https://virginia.birdnetpi.com/index.php?filename=Carolina_Chickadee-99-2022-05-06-birdnet-08:43:42.mp3

After (with) normalization:
https://virginia.birdnetpi.com/index.php?filename=Normalized_Carolina_Chickadee-99-2022-05-06-birdnet-08:43:42.mp3

Let me know if that is what you meant

6 replies

mcguirepr89 May 7, 2022
Maintainer

It will be important to test the differences with the same samples going into the model to determine whether normalization should be happening on the audio before it goes into the model or after (during extraction).

There are a lot of surprising differences in results based on mics, so I would assume that normalizing differences will yield similarly sensitive results.

This type of stuff will need lots and lots of testing to determine what constitutes a good default. Adding it as a post-processing optional feature, however, would allow folks to start testing in an "experimental" capacity so as not to only have a few folks in development weighing in.

Thoughts?

srchild May 7, 2022
Author

Yes that is what I meant. Your normalised version is much easier to listen to without having to boost my playback volume much higher than needed for any other audio listening.

Simplistically, it is a matter of increasing the gain so that the recordings are more easily audible, but not so much that you clip the peaks. I think the ffmpeg parameters in that recipe should protect you against over-boosting and clipping, so long as you follow that recipe or something similar.

It is certainly worth some caution to avoid distorting the recordings and affecting the analysis. I should think (I also am not an audio buff) that if you do linear normalisation that should avoid distortion. Some non-linear normalisation algorithms may affect the analysis e.g. if they boost lower volumes much more than higher volumes, decreasing the dynamic range.
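A minimal sketch of the linear (peak) normalisation described above, assuming 16-bit PCM samples held in a NumPy array. The function name `peak_normalise` and the -3 dBFS default are illustrative and not taken from BirdNET-Pi. One constant gain is applied to every sample, so the dynamic range is unchanged.

```python
import numpy as np

def peak_normalise(samples: np.ndarray, target_dbfs: float = -3.0) -> np.ndarray:
    """Scale int16 audio by one constant gain so its peak sits at target_dbfs."""
    full_scale = 32767.0
    as_float = samples.astype(np.float64)  # avoid int16 overflow on abs(-32768)
    peak = np.max(np.abs(as_float))
    if peak == 0:
        return samples.copy()  # pure silence: nothing to scale
    target_peak = full_scale * 10 ** (target_dbfs / 20.0)
    gain = target_peak / peak
    scaled = np.round(as_float * gain)
    # Clipping is only a guard; for targets at or below 0 dBFS the scaled
    # peak never exceeds full scale.
    return np.clip(scaled, -32768, 32767).astype(np.int16)
```

A non-linear approach (compression, or loudness normalisation with dynamic adjustment) would change the relative levels within the clip, which is the case flagged above as a possible risk for analysis.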

There is also the possibility of using the un-normalised recording for analysis, then normalising only the recordings that are to be saved so that they can be played back more easily.

ehpersonal38 May 7, 2022
Collaborator

I remember when I first set my BirdNET-Pi up, I played around with noise reduction - I would enable noise reduction, play a single bird sample on my phone that was close to the microphone, and note the average confidence for 10 or so trials. Then repeat with noise reduction off. For my setup noise reduction actually seemed to make detection confidence a little worse.

Anyways, we'd need something like that to test the optimal microphone configuration (normalization, gain, noise reduction, pass filters, etc.), just less tedious. And maybe we'd go through all that work just to discover the default microphone with average gain is best! 😆
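The manual A/B test described above could be made less tedious with a small helper that summarises the confidence scores from each configuration. This is only a sketch: `compare_confidence` is a hypothetical name, and the scores would have to come from replaying the same sample through BirdNET under each setting.

```python
from statistics import mean, stdev

def compare_confidence(baseline: list[float], processed: list[float]) -> dict[str, float]:
    """Summarise detection confidence for two microphone/processing setups."""
    if len(baseline) < 2 or len(processed) < 2:
        raise ValueError("need at least two trials per configuration")
    return {
        "baseline_mean": mean(baseline),
        "processed_mean": mean(processed),
        # A positive difference means the processed setup scored higher.
        "difference": mean(processed) - mean(baseline),
        "baseline_stdev": stdev(baseline),
        "processed_stdev": stdev(processed),
    }
```

With only ten or so trials per setting, the standard deviations give a rough idea of whether a difference in means is larger than the trial-to-trial spread.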

srchild May 8, 2022
Author

Noise reduction needs to know what you consider noise and what you consider signal. For example a noise reduction algorithm might remove high frequencies which it assumes to be hiss/background noise, in favour of clarity of human voices in the mid-range, and in doing so remove or distort high-pitched bird song.

srchild May 8, 2022
Author

Normalising the saved clips alone might work best. Normalising the whole 15-second recordings might not work as well, because during the longer period (15 sec vs 3-6 sec) there may be some loud transients (dogs barking, car doors slamming, or even another louder bird) which would reduce the headroom for normalisation.
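The headroom argument can be illustrated numerically. The sketch below uses a synthetic signal (a quiet tone with one loud click standing in for a slamming door), not real recordings, and compares the peak-normalisation gain available to the full 15-second recording with the gain available to a 3-second extract that excludes the click. The sample rate and amplitudes are assumptions chosen for the illustration.

```python
import numpy as np

def available_gain_db(samples: np.ndarray) -> float:
    """Gain in dB that raises the loudest sample to full scale (1.0)."""
    peak = np.max(np.abs(samples))
    return float(-20.0 * np.log10(peak))

rate = 48_000  # assumed sample rate
t = np.arange(15 * rate) / rate
clip = 0.01 * np.sin(2 * np.pi * 3000 * t)  # quiet "bird" at 1% of full scale
clip[10 * rate] = 0.5                       # loud transient at the 10 s mark

extract = clip[3 * rate : 6 * rate]         # 3 s extract that misses the transient

whole_gain = available_gain_db(clip)
extract_gain = available_gain_db(extract)
print(f"whole recording: {whole_gain:.1f} dB, extract only: {extract_gain:.1f} dB")
```

In this synthetic case the whole recording can only be raised by about 6 dB before the click reaches full scale, while the extract on its own could be raised by about 40 dB.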

CaiusX · 2022-05-08T09:07:41Z

CaiusX
May 8, 2022
Collaborator

Hi @srchild and @mcguirepr89 & gang

I think normalization and/or other effects on the wav recording should be done post analysis, and that the user should have an option to save the audio file:

- As is: useful if you are keeping the samples for future model training ("this is what my RPi heard and this is what it interpreted"). These can be reviewed, given ticks where the ID is correct and crosses where it is incorrect, with the correct ID inserted into a DB field and potentially fed back into a training system (still to be identified/developed).
- Processed: this is what my RPi heard, optimised for human consumption.
- Not at all: I'm not interested in the sounds, only the data for some further analysis (minimal system).

@mcguirepr89 - this loops back to our previous chat about including "sound-system-setup" as a database. It doesn't have to be big, just "Microphone Type, Recording ALSA settings, saving format, saving ffmpeg settings ......." with date/time set and updated when reset?
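As a sketch of what such a settings log might look like, the following uses Python's built-in sqlite3 module. The table name, column names and the `log_setup` helper are all hypothetical, chosen to mirror the fields listed above; nothing here reflects an existing BirdNET-Pi schema.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS sound_system_setup (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    recorded_at     TEXT NOT NULL,   -- ISO 8601 time the settings took effect
    microphone_type TEXT,
    alsa_settings   TEXT,
    saving_format   TEXT,
    ffmpeg_settings TEXT
)
"""

def log_setup(conn, microphone_type, alsa_settings, saving_format, ffmpeg_settings):
    """Append one row each time the audio configuration is changed."""
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO sound_system_setup "
        "(recorded_at, microphone_type, alsa_settings, saving_format, ffmpeg_settings) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), microphone_type,
         alsa_settings, saving_format, ffmpeg_settings),
    )
    conn.commit()
```

Appending a new row on every change, rather than updating one row in place, would let each detection be matched to the settings that were active when it was recorded.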

Your thinking?

Best
CaiusX

4 replies

srchild May 8, 2022
Author

I agree there is value in preserving the original recordings without processing.

I think that easy access to listen to the recordings is also valuable. In my (very) limited use of this so far I have seen some outrageous misidentifications, e.g. apparently I have a Eurasian eagle-owl in my garden, the first one recorded in the UK for over 200 years. Being able to listen to the recording helps me understand them: the "eagle-owl" was actually a computer beep picked up by the microphone! Audio file attached just for fun.

The problem is that if I turn up my computer volume to maximum to listen to recordings and forget to turn it down again then I get deafened whenever some other website starts playing audio.

How about normalising only during playback? This will preserve the original data. Most clips will never be played back, so there will be little extra load on the CPU. Normalising during playback could be a user configuration option, or perhaps a per-playback option such as a checkbox next to the recording: "check this box to boost audio volume during playback".

_By_Date_2022-05-05_Eurasian_Eagle-Owl_Eurasian_Eagle-Owl-74-2022-05-05-birdnet-22_53_56.zip

DD4WH May 9, 2022
Collaborator

"How about normalising only during playback. This will preserve the original data."

That is the best option in my opinion! I would prefer a user configuration option: once set, then forget :-).

mcguirepr89 May 9, 2022
Maintainer

I like these ideas -- we'll play around with implementing them and see what speaks to the user ;)

mhaeberli May 12, 2023

still need this, I think!

Mattmanandeddie · 2023-05-13T23:13:51Z

Mattmanandeddie
May 13, 2023

Just spitballing ideas,
Has anyone tried JACK audio with Calf studio gear? I've used it on my desktop running Ubuntu Studio to make a bad mic sound better for voice. I am no audio pro either, but Calf lets you build a complete rack of pro studio equipment in software and Qjackctl lets you plug everything together with virtual cables. I have BirdNET-Pi on a RPi3B+. I think it would struggle with the load, but a RPi4 might work with a desktop environment. I have another instance of BirdNET-Pi in an LXC container. I'll try and see if I can get JACK & Calf to work with a GUI somehow; CLI audio editing isn't fun.
X11 forwarding over SSH maybe???
RTP stream>>Desktop or LXC using JACK>>LXC BirdNET???
Hmmmm,
Just thoughts.

10 replies

morrowwm Jan 19, 2024

Sorry for the delay, I have been thinking on how to avoid the distortion with no progress. There might be a less severe filter that provides a better result. I'll work on making the settings part of configuration.

DMontgomery40 Jan 31, 2024

@morrowwm any luck? Even something just to test? My dad is obsessed with the birdnet-pi setup I made him, but now he can't hear anything above 6 kHz anymore, even with hearing aids. It's killing me, I really want to find a solution for him, and have tried some sloppy automations to run files through Audacity, but I don't know what I'm doing there... Even if you have something to mess around with that's better but not ready for production, I would love to help test and tweak it!

morrowwm Jan 31, 2024

I've been distracted by other projects.

Did you try the hack I listed above?

118,119c117,124
<     sox -V1 "${h}/${OLDFILE}" "${NEWSPECIES_BYDATE}/${NEWFILE}" \
<       trim ="${START}" ="${END}"
---
>     # reduce noise and normalize
>     # get a noise profile from "quiet" preceding call. 
>     # This is not robust. The call might be at the beginning of the extraction.
>     sox "${h}/${OLDFILE}" -n trim 0 2.0 noiseprof /tmp/noise.profile 
>     # apply noise profile
>     sox --norm=-3 "${h}/${OLDFILE}" /tmp/temp.wav noisered /tmp/noise.profile 0.25
> 
>     # output
>     sox -V1 /tmp/temp.wav "${NEWSPECIES_BYDATE}/${NEWFILE}" trim ="${START}" ="${END}"

Maybe tweak the noise reduction parameter (0.25 here), or change the trim window. Let me move this up the priority list.

morrowwm Feb 1, 2024

Not too many birds here this time of year. I do hear crows, so I'm suspicious there's something amiss with my equipment.

DMontgomery40 Feb 1, 2024

Oh thank you so much. I did try the code above, and if the first two seconds of every recording were silent, it would be perfect. Unfortunately that's rarely the case, even when I went down to one second. If that window is clear, it sounds great; if there's any bird in it at all, it sounds like robot aliens.

For now I've just tried this for my dad:
sox -V1 "${h}/${OLDFILE}" "${NEWSPECIES_BYDATE}/${NEWFILE}" highpass 500 gain +3 trim "${START}" "${END}"

It's... meh. It kinda helps, the bird is louder but so is everything else of course
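One possible way to relax the assumption that the first two seconds are quiet (not something tried in this thread) is to search the recording for its lowest-energy window and build the noise profile from that instead. The sketch below only locates the window; `quietest_window` is a hypothetical helper, and the start time it returns would stand in for the fixed `trim 0 2.0` in the noiseprof step.

```python
import numpy as np

def quietest_window(samples: np.ndarray, rate: int, window_s: float = 2.0,
                    hop_s: float = 0.5) -> float:
    """Return the start time in seconds of the window with the lowest RMS level."""
    win = int(window_s * rate)
    hop = int(hop_s * rate)
    if win <= 0 or hop <= 0 or len(samples) < win:
        raise ValueError("window and hop must be positive and fit in the recording")
    x = samples.astype(np.float64)
    best_start, best_rms = 0, float("inf")
    # Slide the window through the recording and keep the quietest position.
    for start in range(0, len(x) - win + 1, hop):
        rms = np.sqrt(np.mean(x[start:start + win] ** 2))
        if rms < best_rms:
            best_start, best_rms = start, rms
    return best_start / rate
```

A low-energy window can still contain faint birdsong, so this would reduce rather than remove the risk of profiling a call as noise.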

morrowwm · 2024-02-02T16:04:33Z

morrowwm
Feb 2, 2024

Got some birds today. I'm experimenting with NOT changing the noise profile every time, i.e. not trying to find only noise in a sound file which might contain a bird song. It doesn't seem to work having a noise profile which you never change.

Here are my current changes. In scripts/server.py:

$ diff ~/src/BirdNET-Pi/scripts/server.py ./server.py
16a17
> import sox
321d321
< 
475a476,477
>         num_detections = 0 # WMM
> 
482a485
>                         num_detections = num_detections + 1 # WMM
613a617,623
> 
> 
>         if num_detections == 0:
>             print(f"WMM: {num_detections} detections in {full_file_name}, using for noise profile")
>             tfm = sox.Transformer()
>             tfm.noiseprof(full_file_name, "/home/bill/noise.prof")
>

and in scripts/extract_new_birdsounds.sh

$ diff ~/src/BirdNET-Pi/scripts/extract_new_birdsounds.sh ./extract_new_birdsounds.sh
118,119c118,126
<     sox -V1 "${h}/${OLDFILE}" "${NEWSPECIES_BYDATE}/${NEWFILE}" \
<       trim ="${START}" ="${END}"
---
>     #sox -V1 "${h}/${OLDFILE}" "${NEWSPECIES_BYDATE}/${NEWFILE}" \
>     #  trim ="${START}" ="${END}"
> 
>     # reduce noise and normalize
>     # uses a noise profile from "quiet" preceding call. 
>     sox --norm=-3 "${h}/${OLDFILE}" /tmp/temp.wav noisered /home/bill/noise.prof 0.15
> 
>     # output
>     sox -V1 /tmp/temp.wav "${NEWSPECIES_BYDATE}/${NEWFILE}" trim ="${START}" ="${END}"

If there are no detections, generate a new noise profile. Awaiting some actual birds now.

I tried a highpass filter too, to diminish the low frequency alien warbles. I agree, the results were unimpressive.

I also looked at some more exotic noise reduction tools, for example using machine learning methods. The ones I found are pretty big packages, and mostly focused on recognizing human speech, then removing everything else. This makes me think one could enhance the BirdNET-Analyzer package itself, to decrease the amplitude of the parts of a sound file which don't match what it identifies. That's a more involved undertaking.

2 replies

morrowwm Feb 7, 2024

This seems to be working slightly better. Maybe some ~~low~~high-pass filtering?

Common_Raven-90-2024-02-07.mp3.zip

krummrey May 23, 2024

During the day I pretty much have a bird singing from dawn till dusk. So even when no species is reliably detected, there are birds singing. Maybe doing a noise profile during the night would work better? Or building a noise profile only when the detection score is below 1% would be more reliable.
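The two suggestions above (night-time only, or a very low detection score) could be expressed as a simple gate in front of the profile update. This is a hedged sketch: the function name, the night hours and the choice to require both conditions at once are assumptions that would need tuning per site, and the 1% default simply mirrors the figure mentioned above.

```python
from datetime import datetime

def should_update_noise_profile(max_confidence: float, when: datetime,
                                threshold: float = 0.01,
                                night_start: int = 23, night_end: int = 4) -> bool:
    """Return True if a recording is a plausible 'noise only' sample."""
    # The night window wraps past midnight, e.g. 23:00 to 03:59.
    is_night = when.hour >= night_start or when.hour < night_end
    return is_night and max_confidence < threshold
```

Requiring both conditions is the stricter reading; replacing `and` with `or` would give the looser either/or version, at the cost of sometimes profiling daytime recordings that contain undetected song.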

Vollpflock · 2024-02-07T20:50:38Z

Vollpflock
Feb 7, 2024

"This seems to be working slightly better. Maybe some low-pass filtering? Common_Raven-90-2024-02-07.mp3.zip"

You mean high-pass?

1 reply

morrowwm Feb 7, 2024

Yes, high-pass, not low pass. I've edited my posting.

dwreski · 2024-05-31T13:06:56Z

dwreski
May 31, 2024

Should we expect to see this improvement in the current version? It's been several months, but I'm also having this issue with barely audible recordings. Are there instructions for a newbie that I can implement to improve audio quality?

4 replies

morrowwm May 31, 2024

I'm not happy with the way it distorts the sound. But it could go in as an option, I suppose. I have another project ahead of work on this right now. @krummrey has a good suggestion above.

morrowwm Jun 5, 2024

Update: there was a bug in the extract_new_birdsounds.sh change. Fixed now.
Update 2: improved debugging output.

Try this, anyone who is interested. Two files, and you may have to install the sox python library.

Both files go in the scripts folder.

In server.py, if no detections are made, a noise profile is created using the sox.Transformer() method. That noise profile is used in extract_new_birdsounds.sh, as well as filtering out low frequencies. There is some distortion. Maybe too much.

P.S. You might want to experiment with the noise reduction level, circa line 122 in extract_new_birdsounds.sh:

sox "${h}/${OLDFILE}" /tmp/temp1.wav noisered ${NOISE_PROF} 0.15
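To compare several noise reduction levels side by side, one option is to generate one sox command per level and listen to the results. The sketch below only builds and prints the command lines; it does not run them. The output naming scheme, the example paths and the list of levels are assumptions, while the `noisered <profile> <amount>` form follows the usage shown above.

```python
import shlex
from pathlib import Path

def noisered_commands(source: str, profile: str, amounts=(0.05, 0.15, 0.25)):
    """Build one sox invocation per noise-reduction amount."""
    src = Path(source)
    commands = []
    for amount in amounts:
        # e.g. clip.wav -> clip_nr0.15.wav, written next to the source file
        out = src.with_name(f"{src.stem}_nr{amount:.2f}{src.suffix}")
        commands.append(["sox", str(src), str(out), "noisered", profile, f"{amount:.2f}"])
    return commands

for cmd in noisered_commands("/tmp/clip.wav", "/tmp/noise.prof"):
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the lists passed to `subprocess.run` once the paths point at real files.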

If there is interest, I'll polish it enough to make a pull request.

birdnet.zip

dwreski Jun 11, 2024

Will this eventually be made available as a simple version update? I'm not new to Linux, but I don't have the resources to experiment with this right now.

morrowwm Jun 11, 2024

In my opinion, it's not working well enough yet.

morrowwm · 2024-06-12T18:52:25Z

morrowwm
Jun 12, 2024

I've made a substantial change to this experiment, which is yielding much better results for me.

I gave up on using the sox noise reduction function, and wrote a small Python script which uses the noisereduce Python module (https://pypi.org/project/noisereduce/). Homepage: https://github.com/timsainb/noisereduce. You'll have to run pip install noisereduce before using it.

The script is simple:

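#!/usr/bin/env python3
"""Reduce noise in a WAV file: first argument is the input, second the output."""

import sys
from scipy.io import wavfile
import noisereduce as nr

if len(sys.argv) != 3:
    sys.exit(f"usage: {sys.argv[0]} INPUT.wav OUTPUT.wav")

# load data: sample rate in Hz and the samples as a NumPy array
rate, data = wavfile.read(sys.argv[1])

# perform noise reduction with the library's default settings
reduced_noise = nr.reduce_noise(y=data, sr=rate)
print("Completed noise reduction")

# write the result at the original sample rate
wavfile.write(sys.argv[2], rate, reduced_noise)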

and yields good noise reduction. A blue jay stopped by this afternoon. Showing the sound files in Audacity: the first is the input, scaled up to be visible; the second is after noise reduction and a 1 kHz high-pass; and the third is BirdNET's output after trimming to just the one call.

And here are the individual sound files.
bluejay_soundfiles.zip.zip

Latest version of my changes:

birdnet_changes.zip

This can be cleaned up a bit to remove output of the working files, and probably move more processing into the python script, since it has the sound data open. But it looks very promising. No distortion like sox noisered was doing.

0 replies

srchild · 2024-06-12T21:00:02Z

srchild
Jun 12, 2024
Author

I've enabled that now.

Simon

1 reply

morrowwm Jun 13, 2024

I've moved the high pass filter and normalization steps into the new python script now, which should save some processing time. The noise reduction is taking around 4 seconds on a RPi 4.

These are the changed files for this experiment. server.py can actually be put back to baseline, I think.

birdnet_changes.zip
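The updated script is only available in the attached archive, so the following is an independent sketch of a 1 kHz high-pass step and not the author's code. To stay self-contained it uses a crude FFT brick-wall filter with NumPy alone; a Butterworth filter from scipy.signal would be the more usual choice. The name `fft_highpass` is hypothetical.

```python
import numpy as np

def fft_highpass(samples: np.ndarray, rate: int, cutoff_hz: float = 1000.0) -> np.ndarray:
    """Zero every frequency bin below cutoff_hz and return float64 samples."""
    spectrum = np.fft.rfft(samples.astype(np.float64))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    spectrum[freqs < cutoff_hz] = 0.0  # discard hum, traffic rumble, etc.
    return np.fft.irfft(spectrum, n=len(samples))
```

The result is floating point, so it would need scaling and conversion back to the original sample type before being written out as a WAV file. A brick-wall cut can also introduce ringing around sharp transients, which is one reason a gentler filter is normally preferred.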
