How to append to existing wav without reading the wav data first? #22

Boscop · 2018-03-13T07:39:34Z

I want to record an audio stream from a live performance to wav, in chunks every few seconds.
How to append to an existing wav file with hound without having to read the existing wav data first?
(The recording session could be many hours long.)

ruuda · 2018-03-13T10:54:54Z

That’s an interesting problem, and one that I do not have a good answer to currently.

To clarify, having a long-running process that writes the file in one go is not an option for you?

I can imagine adding something like

WavWriter::append(writer: W, spec: WavSpec) -> Result<WavWriter<W>>
  where W: Read + Write + Seek

that would assert that the current header agrees with the spec, seeks to the end, and behaves as if the WavWriter has already written the samples. Would that solve your problem?

As a temporary workaround, you could open the file in append mode, and append samples manually using Sample::write. This would invalidate the header, which you would need to fix up manually like in WavWriter::finalize_internal().

Boscop · 2018-03-14T04:46:35Z

To clarify, having a long-running process that writes the file in one go is not an option for you?

Unfortunately it's not an option because the data would be multiple GB and it has to run as a 32-bit process (and the rest of the application already uses a lot of RAM). And the stream session length doesn't have an upper bound, and it should be possible to run this for a long time (even days) with constant RAM usage..

But this append function would be perfect for this :)
I think it would make sense to add it, because it would be useful for other types of applications with similar use cases, too.
And it would make sense to also add a convenience method like create that opens a file and wraps in in a BufWriter..

Btw, I'm also concerned about losing already written wav data when the application crashes after some samples were written before the writer was finalized to write the header..
E.g. let's say the application was running for an hour, wrote an hour of wav samples but since it keeps the WavWriter around, it's not finalized so the header hasn't been updated, and then it crashes, leaving the wav file in an inconsistent state, and the hour of wav samples not reflected in the header..
To avoid this, would you recommend finalizing the writer and re-opening the file on every append (every few seconds), or would it be inefficient?
Maybe it also makes sense to allow calling finalize_internal() from user code, so that there is a way to update the header without consuming the writer? (Then finalize_internal() would have to seek back to the end after writing, and have a different name.)

Or let's say I open this corrupted wav file again with append after restarting the application, will it seek to the correct end of the samples, or to the end of where the header says the samples end?
If it seeks to the actual end of the samples, so that finalizing this writer would write the correct header, without losing the samples that were previously not reflected in the header, that would be great :)
So it would only have to do this check and round it to a correct boundary to continue writing from (e.g. if the application crashed after writing the sample for the left channel to position i, before writing the one for the right channel at i+1, it should round down to the boundary i and continue writing samples from there.
So like:

let sample_bytes = file_size - header_size;
let sample_total_size = self.bytes_per_sample * self.spec.channels;
let seek_pos = sample_bytes / sample_total_size * sample_total_size; // round down
// delete remaining bytes after seek_pos and seek to there to continue writing from there

Then after this writer is finalized it will have repaired the corrupt wav file without losing the samples that were recorded previously..

ruuda · 2018-03-14T16:59:21Z

Unfortunately it's not an option because the data would be multiple GB and it has to run as a 32-bit process (and the rest of the application already uses a lot of RAM). And the stream session length doesn't have an upper bound, and it should be possible to run this for a long time (even days) with constant RAM usage.

This should be possible even with the normal writer, it uses constant memory, and only few bytes by itself. If you use an underlying BufWriter, its size can be controlled if you construct it manually. Also, please be aware that a wav file can store at most 4 GiB of audio data. The size is limited by a 32-bit length field in the header 😕

Maybe it also makes sense to allow calling finalize_internal() from user code

I think it would make sense to add a flush(), that updates the header, and flushes the underlying writer, for this purpose.

Or let's say I open this corrupted wav file again with append after restarting the application, will it seek to the correct end of the samples, or to the end of where the header says the samples end?

I would say in general the safe thing to do is to return an Err when the file is corrupt, rather than silently overwriting either the header or the data. Such a file would look to Hound as a valid wav file, but with bytes appended. In your case, if you wrote the file and you know there should be no tailing bytes (Hound does not write them), then it should be safe to interpret them as audio data and fix up the headers.

I will try to have a go at this next week.

Boscop · 2018-03-14T22:32:14Z

Ah right, the BufWriter would use constant RAM, so when recording a stream without interruptions (app crashes) it would work but I also want to do other stuff like continue a stream recording session after restarting the app (which requires appending) and concatenating wav files into one etc. so I think it makes sense to add the append() functionality, and flush() too.

ruuda · 2018-03-24T15:42:50Z

I have a work in progress in the append branch. I think in the end it is better to not pass in the spec and just read it from the existing file, otherwise I would have to introduce a new error variant, which would be a breaking change. Verifying that the spec of the existing file is as expected is something that the user could do.

Hound can read more kinds of files than it can write, so appending will not always be possible. I still have to handle that properly and return Error::Unsupported. But at least for files written by Hound itself, appending should work for now.

Boscop · 2018-03-25T12:02:38Z

Thanks :)

I think it's ok to just read the spec from the existing file.

Btw, now after opening a WavWriter for appending, is there any way to manually check the spec against what it should be?
Could WavWriter have a getter for spec?
Or how else should the user verify the spec? :)

ruuda · 2018-03-25T17:26:13Z

The writer does store the spec, so it could have a getter, good idea! I’ll add one.

ruuda · 2018-04-05T11:51:24Z

I pushed another update that makes the implementation more robust. (It will not corrupt files that were not written by Hound.) Does the current API suit your needs @Boscop? If it does I will make a new release shortly.

Boscop · 2018-04-05T21:19:48Z

Yes, thanks :)

Btw, I see that you are using a custom WriteExt trait.
Is there a reason why you aren't using WriteBytesExt?

ruuda · 2018-04-07T21:43:38Z

Hound 3.4.0 with append support is now on crates.io! I renamed WavWriter::append() to WavWriter::new_append(), and WavWriter::append() is the convenience constructor that takes a path.

Btw, I see that you are using a custom WriteExt trait. Is there a reason why you aren't using WriteBytesExt?

The reason is mostly historical; I introduced the trait in March 2015 (c61733a), analogous to ReadExt in Claxon. Reading integers of specific size and endianness had been part of the std::io::Reader trait for a long time, but around that time, the entire std::io module was overhauled, and reading integers was removed from the standard library. (This was before Rust 1.0.) I started to move away from the old std::io in February 2015. Many crates were published around that time to cover the gap, including byteorder. I am not sure if I was aware of them at that time. I just implemented the few parts I needed myself.

And now that all of the code is already there and working fine, there is little advantage to switching to an external crate. There are no code size advantages because all of the functions are inlined, and there is a significant cost to dependencies that is often underestimated.

ruuda closed this as completed Apr 7, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to append to existing wav without reading the wav data first? #22

How to append to existing wav without reading the wav data first? #22

Boscop commented Mar 13, 2018

ruuda commented Mar 13, 2018

Boscop commented Mar 14, 2018

ruuda commented Mar 14, 2018

Boscop commented Mar 14, 2018

ruuda commented Mar 24, 2018

Boscop commented Mar 25, 2018 •

edited

ruuda commented Mar 25, 2018

ruuda commented Apr 5, 2018

Boscop commented Apr 5, 2018

ruuda commented Apr 7, 2018

How to append to existing wav without reading the wav data first? #22

How to append to existing wav without reading the wav data first? #22

Comments

Boscop commented Mar 13, 2018

ruuda commented Mar 13, 2018

Boscop commented Mar 14, 2018

ruuda commented Mar 14, 2018

Boscop commented Mar 14, 2018

ruuda commented Mar 24, 2018

Boscop commented Mar 25, 2018 • edited

ruuda commented Mar 25, 2018

ruuda commented Apr 5, 2018

Boscop commented Apr 5, 2018

ruuda commented Apr 7, 2018

Boscop commented Mar 25, 2018 •

edited