Question re: "Simultaneous" writes & reads with buffer #324

Open
FrankBoesing opened this issue Sep 13, 2021 · 7 comments

Comments

@FrankBoesing

FrankBoesing commented Sep 13, 2021

Hello Bill,

Question:
Your TeensySDIOLogger example is great.
I want to create an extension for the Teensy audio library: two classes that can play and record WAV files, possibly several files at the same time.
The audio library runs on a grid of about 3 ms.
I would like to use the ring buffer. Can it also be used for reading?
How would you go about this, or can you give me some advice on how best to proceed? My question is only about your library; I can handle the rest :)

@greiman
Owner

greiman commented Sep 13, 2021

The RingBuf class can be used for read or write.

The problem with SD cards is that they are not designed for multiple data streams. They are designed for a single stream with a standard format as specified by the SD Association. Most cards are optimized for phones that use several GB of cache. This allows hundreds of MB of cache for each file and matches the card's huge internal flash page size.

Each manufacturer optimizes card buffering, caching, and algorithms for their target application. You should try a variety of cards.

Try cards like the Kingston Canvas series, which are optimized for games, cameras, and the Raspberry Pi.

Some cards optimize reads based on how the file was written. If you write the card with small writes, it will mark it for a small read cache.

So look at sites for gaming, cameras like GoPro, and the Raspberry Pi, buy some of the recommended cards, and test them with your app.

RingBuf has a pair of functions for each direction: read-into and read-out for reading, write-in and write-out for writing. I only provided write examples, since that's where the biggest latency problem occurs.

I have used the read functions to increase read performance in apps where a binary file is read and a .csv file is written. The real purpose is to improve write performance by not interleaving reads. More data is written than read, so buffering reads is a win.
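A minimal sketch of the two directions, for illustration only. `SketchRingBuf` and the `MockFile` stand-in are hypothetical, with method names loosely modeled on the pairs described above (`write`/`writeOut` for recording, `readIn`/`read` for playback). SdFat's actual `RingBuf` interface may differ, so check `RingBuf.h` before relying on any of these names. Both directions move data in contiguous chunks, so the file sees a few large transfers instead of byte-at-a-time calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an SD file: a byte vector plus a read position.
struct MockFile {
  std::vector<uint8_t> data;
  size_t pos = 0;
  size_t read(uint8_t* dst, size_t n) {
    n = std::min(n, data.size() - pos);
    if (n) std::memcpy(dst, data.data() + pos, n);
    pos += n;
    return n;
  }
  size_t write(const uint8_t* src, size_t n) {
    data.insert(data.end(), src, src + n);  // append-only, enough for the sketch
    return n;
  }
};

// Hypothetical ring buffer with one pair of calls per direction.
template <size_t N>
class SketchRingBuf {
 public:
  explicit SketchRingBuf(MockFile* file) : file_(file) { assert(file_ != nullptr); }
  size_t bytesUsed() const { return used_; }
  size_t bytesFree() const { return N - used_; }

  // Recording, step 1: copy application data into the buffer.
  size_t write(const uint8_t* src, size_t n) {
    n = std::min(n, bytesFree());
    for (size_t done = 0; done < n;) {
      size_t chunk = std::min(n - done, N - head_);  // contiguous space before wrap
      std::memcpy(buf_ + head_, src + done, chunk);
      advance(head_, chunk);
      used_ += chunk;
      done += chunk;
    }
    return n;
  }
  // Recording, step 2: drain up to n buffered bytes to the file.
  size_t writeOut(size_t n) {
    n = std::min(n, used_);
    for (size_t done = 0; done < n;) {
      size_t chunk = std::min(n - done, N - tail_);
      file_->write(buf_ + tail_, chunk);
      advance(tail_, chunk);
      used_ -= chunk;
      done += chunk;
    }
    return n;
  }
  // Playback, step 1: prefetch up to n bytes from the file (stops at EOF).
  size_t readIn(size_t n) {
    n = std::min(n, bytesFree());
    size_t done = 0;
    while (done < n) {
      size_t chunk = std::min(n - done, N - head_);
      size_t got = file_->read(buf_ + head_, chunk);
      advance(head_, got);
      used_ += got;
      done += got;
      if (got < chunk) break;  // end of file
    }
    return done;
  }
  // Playback, step 2: hand buffered bytes to the application.
  size_t read(uint8_t* dst, size_t n) {
    n = std::min(n, used_);
    for (size_t done = 0; done < n;) {
      size_t chunk = std::min(n - done, N - tail_);
      std::memcpy(dst + done, buf_ + tail_, chunk);
      advance(tail_, chunk);
      used_ -= chunk;
      done += chunk;
    }
    return n;
  }

 private:
  static void advance(size_t& index, size_t count) { index = (index + count) % N; }
  MockFile* file_;
  uint8_t buf_[N] = {};
  size_t head_ = 0;  // next byte to fill
  size_t tail_ = 0;  // next byte to drain
  size_t used_ = 0;
};
```

For playback, the audio update would call `read()` once per block while a lower-priority context tops the buffer up with `readIn()` whenever enough space is free; for recording the roles are reversed.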

@greiman
Owner

greiman commented Sep 13, 2021

You need to experiment with transfer sizes. In the bin-to-csv example I used big reads and waited until the buffer was empty, since this gave the best performance for converting a file.

For most users, RingBuf is useful for avoiding write-latency problems, and writing a single 512-byte sector aligned on a 512-byte boundary works well.
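One way to follow the 512-byte advice is to drain the buffer only when the write can end exactly on a sector boundary of the file. The helpers below (`alignedWriteSize`, `drainAligned`) are hypothetical and only track positions; a real logger would pass the returned count to whatever drains the ring buffer to the file. If the file starts with, say, a 44-byte header, the first write is shortened to 468 bytes and every later write is a full, aligned 512-byte sector.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kSectorSize = 512;

// Hypothetical helper: number of buffered bytes to write now so that the write
// ends exactly on a 512-byte file boundary. Returns 0 if not enough is buffered yet.
uint32_t alignedWriteSize(uint64_t filePos, uint32_t bytesBuffered) {
  // Bytes left until the next boundary; a full sector when already aligned.
  uint32_t toBoundary = kSectorSize - static_cast<uint32_t>(filePos % kSectorSize);
  return bytesBuffered >= toBoundary ? toBoundary : 0;
}

// Drains as many aligned chunks as possible, updating the file position and
// buffered byte count; returns the number of write calls that would be made.
uint32_t drainAligned(uint64_t& filePos, uint32_t& bytesBuffered) {
  uint32_t writes = 0;
  while (uint32_t n = alignedWriteSize(filePos, bytesBuffered)) {
    filePos += n;  // stands in for writing n bytes to the file
    bytesBuffered -= n;
    assert(filePos % kSectorSize == 0);  // every write ends on a boundary
    writes++;
  }
  return writes;
}
```

Leftover bytes (fewer than one sector) simply stay in the buffer until the next call.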

You can use the RingBuf with ISRs. See the Teensy DMA ADC example.
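The general pattern behind buffering from an ISR is a single-producer/single-consumer queue: the interrupt only advances the write counter and the main loop only advances the read counter, so no lock is needed. This is a generic sketch with a hypothetical `SpscQueue` class, not how SdFat implements `RingBuf`; see the example mentioned above for the library's own usage.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Generic single-producer/single-consumer queue (hypothetical, not SdFat code).
// Only the ISR calls push() and only the main loop calls pop(), so each counter
// has exactly one writer and no lock is required. N must be a power of two.
template <typename T, size_t N>
class SpscQueue {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

 public:
  // Producer side (ISR). Returns false and drops the item when the queue is full.
  bool push(const T& item) {
    size_t head = head_.load(std::memory_order_relaxed);
    size_t tail = tail_.load(std::memory_order_acquire);
    if (head - tail == N) return false;
    buf_[head & (N - 1)] = item;
    head_.store(head + 1, std::memory_order_release);  // publish the item
    return true;
  }
  // Consumer side (main loop). Returns false when the queue is empty.
  bool pop(T& item) {
    size_t tail = tail_.load(std::memory_order_relaxed);
    size_t head = head_.load(std::memory_order_acquire);
    if (head == tail) return false;
    item = buf_[tail & (N - 1)];
    tail_.store(tail + 1, std::memory_order_release);  // free the slot
    return true;
  }
  // Snapshot of the fill level; only meaningful from the producer or consumer.
  size_t size() const {
    size_t used = head_.load(std::memory_order_acquire) - tail_.load(std::memory_order_acquire);
    assert(used <= N);
    return used;
  }

 private:
  std::array<T, N> buf_{};
  std::atomic<size_t> head_{0};  // total items pushed (free-running counter)
  std::atomic<size_t> tail_{0};  // total items popped (free-running counter)
};
```

The free-running counters make "full" (`head - tail == N`) and "empty" (`head == tail`) distinguishable without wasting a slot; unsigned wraparound keeps the difference correct because N is a power of two.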

@FrankBoesing
Author

Thank you Bill! Your answer is very helpful, and I'll see what is doable.

I bought a Kingston Canvas Go! Plus and a Sandisk Extreme, both rated with "A2".

@greiman
Owner

greiman commented Sep 14, 2021

It's too bad SD cards are not better documented. The internals are trade secrets. I bought about 20 cards and wrote programs to classify them for various uses.

I gave up when I found that the history of how areas were written changed read performance, and writing cards several times with different profiles changed write performance.

There is a sizable database in high-end cards to optimize performance. Wear-leveling algorithms are almost impossible to determine experimentally but are critical to understanding maximum write latency.

Cards assume file structures are in known places, so formatting is critical. FAT areas use small buffering while data areas use large buffers, and you should place critical things in reserved areas that won't be written other than during formatting.

It shows what a market worth many billions of dollars can produce.

@FrankBoesing
Author

FrankBoesing commented Sep 16, 2021

Hi Bill,

I have tested a few cards. I am very surprised by the results.
My test program plays two 8-channel/16-bit WAV files and one 2-channel/16-bit WAV file at the same time.
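For a sense of scale, here is a sketch of the data rate this test needs. It assumes the Teensy audio library's default 128-sample blocks at about 44.1 kHz (the library's exact rate may differ slightly), which fits the roughly 3 ms grid mentioned at the start of the thread. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One WAV stream: channel count and bytes per sample (2 for 16-bit).
struct Stream {
  uint32_t channels;
  uint32_t bytesPerSample;
};

// Bytes one stream consumes per audio block of blockSamples sample frames.
uint32_t bytesPerBlock(const Stream& s, uint32_t blockSamples) {
  return s.channels * s.bytesPerSample * blockSamples;
}

uint32_t totalBytesPerBlock(const Stream* streams, size_t count, uint32_t blockSamples) {
  uint32_t total = 0;
  for (size_t i = 0; i < count; i++) total += bytesPerBlock(streams[i], blockSamples);
  return total;
}

// Length of one audio block in milliseconds.
double blockPeriodMs(uint32_t blockSamples, double sampleRateHz) {
  assert(sampleRateHz > 0);
  return 1000.0 * blockSamples / sampleRateHz;
}

// Sustained data rate that all streams together need from the card.
double bytesPerSecond(const Stream* streams, size_t count, double sampleRateHz) {
  double total = 0;
  for (size_t i = 0; i < count; i++)
    total += static_cast<double>(streams[i].channels) * streams[i].bytesPerSample * sampleRateHz;
  return total;
}
```

Under these assumptions each block lasts about 2.9 ms and needs 2048 bytes (4 sectors) for each 8-channel file plus 512 bytes (1 sector) for the stereo file: 4608 bytes per block, or about 1.59 MB/s spread over three files.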

The newly purchased cards perform very differently. The best is the Kingston Canvas Go! Plus.
But even a no-name card that I took out of an old smartphone many(!) years ago is better than the newly purchased Sandisk Extreme (A2).

I am aware that the results only apply to this test, and the cards probably perform differently in other tests - but it is still amazing.

The % is AudioProcessorUsageMax() from Paul's Audio Library. Smaller values are better.

  1. Kingston Canvas Go! Plus 64GB, A2 (23%)
  2. MicroData 16GB (40%)
  3. Sandisk Ultra 32GB (48%)
  4. Unbranded black card, no labels, quite old! 8GB (61%)
  5. Sandisk Ultra 32GB, A1 (72%)
  6. Sandisk Extreme 64GB, A2 (80%)
  7. Sandisk Ultra 16GB (>240%?!): test stops, does not work.

@greiman
Owner

greiman commented Sep 16, 2021

Often old cards are better for multiple streams since they have smaller flash pages. They manage flash in smaller units.

Modern cards have huge flash pages. All cards emulate 512-byte sectors.

Flash is managed by AUs, Allocation Units. Each AU has a number of RUs, Record Units. RUs are a multiple of 16KB.

The AU for large high performance SDHC cards can be as large as 4MB with 512KB RUs. SDXC cards can have 64 MB AUs.
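The AU/RU numbers above reduce to simple arithmetic. A sketch with hypothetical names, assuming AU boundaries line up with logical sector 0 of the card: with a 4 MB AU and 512 KB RUs there are 8 RUs per AU and 1024 sectors per RU, and any logical sector can be mapped to the AU and RU it falls in.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kSectorBytes = 512;

struct FlashGeometry {
  uint64_t auBytes;  // Allocation Unit size
  uint64_t ruBytes;  // RU size (a multiple of 16 KB)
};

struct FlashLocation {
  uint64_t au;      // which AU the sector falls in
  uint64_t ruInAu;  // which RU inside that AU
};

uint64_t rusPerAu(const FlashGeometry& g) { return g.auBytes / g.ruBytes; }
uint64_t sectorsPerRu(const FlashGeometry& g) { return g.ruBytes / kSectorBytes; }

// Assumes AU boundaries are aligned to logical sector 0 of the card.
FlashLocation locate(uint64_t sector, const FlashGeometry& g) {
  assert(g.ruBytes % (16 * 1024) == 0 && g.auBytes % g.ruBytes == 0);
  uint64_t byteAddr = sector * kSectorBytes;
  return {byteAddr / g.auBytes, (byteAddr % g.auBytes) / g.ruBytes};
}
```

For example, sector 1023 is the last sector of the first RU, sector 1024 starts the second RU, and sector 8192 starts the second AU.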

This makes sense for a card that can write at 200 MB/s on a PC, Mac, or phone. Not so much for a microcontroller.

Flash is written in units of an RU. If you write a file using single-sector write commands, an entire RU will contain just 512 bytes.
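To put that in numbers, here is a sketch with hypothetical helper names comparing the worst case described above, where every single-sector write ends up alone in its own RU, against data packed into full RUs, using the 512 KB RU quoted for large SDHC cards. A 1 MB file then spreads over 2048 RUs instead of 2, and each RU is only 1/1024 full.

```cpp
#include <cassert>
#include <cstdint>

uint64_t ceilDiv(uint64_t a, uint64_t b) { return (a + b - 1) / b; }

// Worst case described above: each write command ends up alone in its own RU.
uint64_t rusWorstCase(uint64_t fileBytes, uint64_t writeBytes, uint64_t ruBytes) {
  assert(writeBytes > 0 && writeBytes <= ruBytes);
  return ceilDiv(fileBytes, writeBytes);
}

// Best case: the file's data packed into completely filled RUs.
uint64_t rusPacked(uint64_t fileBytes, uint64_t ruBytes) {
  return ceilDiv(fileBytes, ruBytes);
}

// Fraction of each RU holding useful data in the worst case.
double ruFillFraction(uint64_t writeBytes, uint64_t ruBytes) {
  return static_cast<double>(writeBytes) / static_cast<double>(ruBytes);
}
```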

When you read the file, internal data rates will be very high if RUs are only partially filled. Card performance is specified in terms of the time to read 256 fully filled RUs.

When free AUs are needed, the card garbage-collects partially filled RUs by copying data, rewriting RUs, and remapping. This causes latency problems.

You can imagine how many choices engineers have to make in the effort to optimize all of this with parallelism and algorithms.

@greiman
Owner

greiman commented Sep 16, 2021

One of the wildest things is that the read performance of a file can change if writing other files causes garbage collection of the first file.

You can see why interleaving operations causes fragmentation on writes and performance problems on reads, since buffers are RU-sized.

Cards try to fill RUs in an AU by reading partially filled RUs, add data and write to a new RU in the AU. This results in fragmented AUs.
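A toy model can make the interleaving effect concrete. It is not how any real controller works (those algorithms are trade secrets); it simply assumes the card places incoming sectors into RUs in arrival order, whatever file they belong to. Under that assumption, two 1 MB files written sector by sector in alternation each end up spread over 4 RUs of 512 KB, versus 2 when each file is written in one run or in RU-sized chunks. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model only: sectors are placed into RUs strictly in arrival order,
// whatever file they belong to. Element i of the result is the file that
// owns the i-th sector written to the card.
std::vector<int> writeOrder(int files, size_t sectorsEach, size_t chunkSectors) {
  assert(chunkSectors > 0);
  std::vector<int> order;
  for (size_t done = 0; done < sectorsEach; done += chunkSectors)
    for (int f = 0; f < files; f++)  // round-robin between files
      for (size_t k = 0; k < chunkSectors && done + k < sectorsEach; k++)
        order.push_back(f);
  return order;
}

// Number of distinct RUs a reader of `file` has to touch.
size_t rusHoldingFile(const std::vector<int>& order, size_t sectorsPerRu, int file) {
  std::set<size_t> rus;
  for (size_t i = 0; i < order.size(); i++)
    if (order[i] == file) rus.insert(i / sectorsPerRu);
  return rus.size();
}
```

In this model any chunk smaller than an RU mixes the files inside each RU, which is one way to see why large transfers help with multiple streams.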
