Audio data file naming standard for Orcasound? #7

scottveirs · 2023-02-03T21:47:12Z

scottveirs
Feb 3, 2023
Maintainer

As we get serious about computing noise metrics and also consider when to implement a suite of improvements to the orcanode code, 2023 is a strategic time to try to agree upon any new standards we want for naming Orcasound audio data files -- both the lossy streaming segments and the lossless recordings.

Please review this on-going discussion of the associated date-time format issue in the orcanode repo. Feel free to offer links to other similar conversations you have seen in the bioacoustic community.

We can also discuss the current AWS data "structures" for streaming and archived audio data in separate S3 buckets:

streaming bucket > device + hydrophone location > audio format > UNIX epoch
e.g. streaming-orcasound-net > rpi_sunset_bay/ > hls/ > 1654712898/
archive bucket > device + hydrophone location > YYYY-MM-DD_HH-MM-SS_device_location-samplerate-numchannels.flac
e.g. archive-orcasound-net > rpi_port_townsend/ > 2023-01-21_16-42-31_rpi_port_townsend-48000-2.flac
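
To make the two current layouts concrete, here is a minimal Python sketch that reads both naming patterns back into structured values. The function names (`parse_archive_name`, `epoch_dir_to_utc`) are hypothetical and not part of orcanode. The sketch also assumes the archive timestamp carries no time zone information, so it is parsed as a naive datetime; which time zone it represents is part of the date-time format question linked above.

```python
import re
from datetime import datetime, timezone

# Pattern for the archive layout described above:
# YYYY-MM-DD_HH-MM-SS_device_location-samplerate-numchannels.flac
ARCHIVE_NAME = re.compile(
    r"^(?P<start>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})_"
    r"(?P<node>.+)-(?P<sample_rate>\d+)-(?P<channels>\d+)\.flac$"
)


def parse_archive_name(filename):
    """Split an archive filename into start time, node, sample rate, channels."""
    m = ARCHIVE_NAME.match(filename)
    if m is None:
        raise ValueError(f"not an archive-style filename: {filename!r}")
    return {
        # Assumption: no time zone is encoded, so this stays a naive datetime.
        "start": datetime.strptime(m["start"], "%Y-%m-%d_%H-%M-%S"),
        "node": m["node"],
        "sample_rate": int(m["sample_rate"]),
        "channels": int(m["channels"]),
    }


def epoch_dir_to_utc(dirname):
    """Convert a streaming-bucket directory name (UNIX epoch seconds) to UTC."""
    return datetime.fromtimestamp(int(dirname.strip("/")), tz=timezone.utc)


info = parse_archive_name("2023-01-21_16-42-31_rpi_port_townsend-48000-2.flac")
print(info["node"], info["sample_rate"], info["channels"], info["start"])
print(epoch_dir_to_utc("1654712898/").isoformat())
```

The UNIX epoch directory is unambiguous about time zone but unreadable by eye, while the archive name is readable but needs an agreed time zone convention.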

Associated issues:

CaseCal · 2023-02-07T19:41:01Z

CaseCal
Feb 7, 2023
Maintainer

I think the format Ben showed here may be the best fit for this project. Having a start and stop timestamp in the filename will make scanning and collating files based on a date range much easier.

The extra dimensions we have that are not addressed in the current schema are the time- and frequency-domain granularity. I think we could address these either through the filename (Ben's examples had precision in seconds, so this is close, but we would probably want to change it to something like seconds per sample?) or through the location.

Here are examples for a 6-hour file with data at one-minute, 10 Hz granularity.

Ex 1: rpi_port_townsend/20230101T085500_20230101T145500_60s_10hz.parquet
Ex 2: rpi_port_townsend/60_seconds/10_hz/20230101T085500_20230101T145500.parquet
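
As a rough illustration, both examples can be generated from the same four pieces of information. This is only a sketch: the helper names `ex1_key` and `ex2_key` and their parameter names are made up here, and the timestamps are written exactly as given, with no time zone handling.

```python
from datetime import datetime

# Compact timestamp style used in both examples, e.g. 20230101T085500
STAMP = "%Y%m%dT%H%M%S"


def ex1_key(node, start, stop, avg_seconds, band_hz):
    """Ex 1: granularity encoded in the filename."""
    span = f"{start.strftime(STAMP)}_{stop.strftime(STAMP)}"
    return f"{node}/{span}_{avg_seconds}s_{band_hz}hz.parquet"


def ex2_key(node, start, stop, avg_seconds, band_hz):
    """Ex 2: granularity encoded in the location ("directories")."""
    span = f"{start.strftime(STAMP)}_{stop.strftime(STAMP)}"
    return f"{node}/{avg_seconds}_seconds/{band_hz}_hz/{span}.parquet"


start = datetime(2023, 1, 1, 8, 55, 0)
stop = datetime(2023, 1, 1, 14, 55, 0)  # 6 hours later
print(ex1_key("rpi_port_townsend", start, stop, 60, 10))
print(ex2_key("rpi_port_townsend", start, stop, 60, 10))
```

Either way the start and stop timestamps sort lexicographically in time order, which is what makes scanning by date range straightforward.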

One additional option is to include some "pre-made" frequency band settings, such as "3rd-octave", "12-octave", "10_log_hz" etc.

1 reply

scottveirs Feb 25, 2023
Maintainer Author

Sorry missed this til now, @CaseCal --

I think either example will work for naming the files that hold the noise levels (for any given averaging time and frequency bands).

When I think about eventually computing noise metrics for the bioacoustic dashboard's noise plot, I imagine Example 2 might work best. (Because it's faster to search a (shorter) file list for the requested datetime within the sub-directory for the requested averaging time and frequency bands, than it is to search a (longer) file list?)
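
Here is a small Python sketch of that lookup, assuming the Example 2 layout: restrict the key list to the sub-directory for the requested averaging time and frequency bands, then keep the files whose start/stop span overlaps the requested datetime range. The function names and the in-memory list of keys are illustrative only; against S3 the same prefix would be passed to the list request so that only the shorter list comes back.

```python
from datetime import datetime
from pathlib import PurePosixPath

STAMP = "%Y%m%dT%H%M%S"


def file_span(key):
    """Return (start, stop) parsed from a key ending in START_STOP.parquet."""
    start_text, stop_text = PurePosixPath(key).stem.split("_")
    return datetime.strptime(start_text, STAMP), datetime.strptime(stop_text, STAMP)


def overlapping(keys, prefix, query_start, query_stop):
    """Keys under `prefix` whose time span overlaps [query_start, query_stop)."""
    hits = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # different node, averaging time, or frequency bands
        start, stop = file_span(key)
        if start < query_stop and stop > query_start:
            hits.append(key)
    return hits


keys = [
    "rpi_port_townsend/60_seconds/10_hz/20230101T025500_20230101T085500.parquet",
    "rpi_port_townsend/60_seconds/10_hz/20230101T085500_20230101T145500.parquet",
    "rpi_port_townsend/60_seconds/10_hz/20230101T145500_20230101T205500.parquet",
    "rpi_port_townsend/1_seconds/10_hz/20230101T085500_20230101T145500.parquet",
]
prefix = "rpi_port_townsend/60_seconds/10_hz/"
print(overlapping(keys, prefix, datetime(2023, 1, 1, 10), datetime(2023, 1, 1, 12)))
```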

One idea is to clearly convey that the parquet files hold noise levels (aka Sound Pressure Levels), e.g. by adding SPL to the filename or adding it as a "directory", as a way to differentiate them from raw audio data files?

Note that Ben does this by putting an SPL prefix on the BC Hydrophone Network noise level files, e.g. SPL-timeline_OWNER_LOCATION_HYDROPHONE_PERIOD.csv
