Speed of Format and Free cluster count - Dedicated vs Shared #329

Closed
KurtE opened this issue Oct 7, 2021 · 46 comments

@KurtE

KurtE commented Oct 7, 2021

Hi @greiman (and @PaulStoffregen and @mjs513):

For a while now several of us have been experimenting with adding MTP support for the different Teensy boards.

One of the major issues I have run into with MTP integration is that the host will often time out if operations, including startup, take very long to complete. For example, at sketch startup, if we specify a number of storages for the host, including one or more SD cards, and it takes very long for us to answer requests from the host, it will time out and MTP will not function at all.

Another place that we are working with is the ability to format the SD card. Short version of the story:
I have been seeing that formatting a larger SDCard over SPI was taking a very long time. In my case I am testing using a 32GB Samsung card and a call

sd.format(&Serial);

was taking on the order of, let's say, 45 seconds. Note: I started off using the SD library to make the calls, but then converted the example to use just SdFat and got the same results.

Doing some experimenting, I am finding a drastic difference in timing with SHARED_SPI versus DEDICATED_SPI.

Simple test sketch:

// List files, format, list files
#include "SdFat.h"
#include "sdios.h"
const int chipSelect = 10; // BUILTIN_SDCARD;
#define SPI_SPEED SD_SCK_MHZ(33)  // adjust to sd card 
SdFat sd;

void setup()
{
  // Open serial communications and wait for port to open:
  Serial.begin(9600);
  while (!Serial) {
    ; // wait for serial port to connect.
  }

  Serial.print("Initializing SD card...");
  pinMode(chipSelect, OUTPUT);
  digitalWriteFast(chipSelect, LOW);
//  if(!sd.begin(SdSpiConfig(chipSelect, DEDICATED_SPI, SPI_SPEED))) {
  if(!sd.begin(SdSpiConfig(chipSelect, SHARED_SPI, SPI_SPEED))) {
  //if (!sd.begin(chipSelect)) {
    Serial.println("initialization failed!");
    return;
  }

  // Lets print out free cluster count:
  elapsedMillis emFormat = 0;
  Serial.printf("Free Cluster Count: %u dt: ", sd.freeClusterCount());
  Serial.println(emFormat);
  
  
  Serial.println("Press any key to reformat disk");
  while (Serial.read() == -1);
  while (Serial.read() != -1) ;
  emFormat = 0;
  sd.format(&Serial);
  Serial.printf("Format complete %u\n", (uint32_t)emFormat);

  Serial.println("done!");
}

void loop()
{
  // nothing happens after setup finishes.
}

Test run:

Initializing SD card...Free Cluster Count: 976991 dt: 5596
Press any key to reformat disk
Writing FAT ................................
Format Done
Format complete 46467
done!

As you can see, in this run it took about 5.6 seconds to compute the number of free clusters and about 46.5 seconds to do the format.

Changing to DEDICATED_SPI drastically changes these timings:

Initializing SD card...Free Cluster Count: 976991 dt: 1165
Press any key to reformat disk
Writing FAT ................................
Format Done
Format complete 2952
done!

But knowing that there could be other devices on the SPI bus, is there some way to say: please do this operation as if we were in dedicated mode?

@PaulStoffregen

I wrote a reply on the forum just now, while you were writing this. Here's the text (hopefully I didn't butcher SdFat's inner details too much?)

My understanding of DEDICATED_SPI is it makes use of a write multiple sectors command. But to do this, it must leave CS asserted low between individual sectors, because the library is designed around minimal RAM use. So if you have a flash chip with LittleFS or a sensor or any other SPI chip, and it tries to make use of SPI at the wrong time while SdFat is leaving CS asserted, all that non-SD communication could get written to the SD card and get the card state out of sync with SdFat. That's why it's called DEDICATED_SPI.

Writing multiple sectors is much faster because the underlying media has large block size. When SHARED_SPI writes just one 512 byte sector, the SD card has to do a read-modify-write operation on a huge block. Doing that over and over for each 512 byte sector is slow.

@PaulStoffregen

PaulStoffregen commented Oct 7, 2021

Something I've wanted to explore is a way to increase SdFat's cache size at runtime, similar to how we have Serial1.addMemoryForRead(buffer, size) and Serial1.addMemoryForWrite(buffer, size) on the serial ports. The other use case that suffers terrible performance is playing more than 1 WAV file simultaneously. A large cache could allow holding the FAT sectors for several open files. Together with a smarter WAV player which has larger buffers and a scheduler so we can stagger larger reads, my hope is to eventually get SdFat many-file read performance to rival the Wav Trigger product which can play 14 stereo files simultaneously using 4 bit SDIO.

Maybe larger cache would also allow SHARED_SPI to run faster?

@greiman
Owner

greiman commented Oct 8, 2021

Future SD cards will require too much memory for many streams on boards like Teensy. This is required for UHS cards like this:

Maximum bus speed of FD624 is 624MB/s

Wav Trigger already is limited to old SD designs. Here is a note from The Wav Trigger site:

Interestingly, the lesson at the moment appears to be to stay away from the newer UHS speed class cards (a “U” with a 1 or 3 inside of it.) While these cards have apparently been optimized for use in cameras, doing so has a detrimental affect on their random access times and makes them unsuitable for the WAV Trigger.

New cards are being designed for phones, PCs, and cameras with lots of memory. They are designed for systems like Linux with huge disk caches.

I will not chase this problem with SdFat.

These cards have huge Allocation Units (AU) and Record Units (RU).

The host should manage data areas with the unit of AU and transfer data in units of an RU.

An AU for a modern card is measured in units of 4MB and can be as large as 64MB. An AU consists of a number of record units (RUs), and an RU can be up to 512KB for modern cards. RUs are a multiple of 16KB, so always do at least this size transfer.

If you write less than an RU, then the next write causes the card to read the RU, add the new data, and write a new RU. Soon the AU must be copied to recover flash. This causes huge latency problems.

If you read less than an RU you will likely reread the RU several times.

You can try to manage buffering. Use contiguous exFAT files; they have no FAT entries and are designed to be preallocated for write. Do transfers in power-of-two sector counts, up to the 128KB exFAT cluster size. SdFat will not use its internal cache, will do a single multi-sector transfer, and will not access the FAT or bitmap for the exFAT file.
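
As a rough illustration of the chunking advice above, the helper below splits a transfer into power-of-two sector counts capped at 128KB. It is a host-side sketch in plain C++, not SdFat code: `planChunks`, `largestPowerOfTwoAtMost`, and the constants are hypothetical names, and the sketch does not deal with aligning the starting sector to an RU boundary, which would also matter on a real card.

```cpp
#include <cstdint>
#include <vector>

// 512-byte sectors and the 128KB cap come from the comment above.
// All names here are hypothetical and exist only for this illustration.
constexpr uint32_t kSectorBytes = 512;
constexpr uint32_t kMaxChunkSectors = (128u * 1024u) / kSectorBytes;  // 256 sectors

// Largest power of two that is <= n. Caller must pass n > 0.
uint32_t largestPowerOfTwoAtMost(uint32_t n) {
  uint32_t p = 1;
  while (p <= n / 2) {
    p *= 2;
  }
  return p;
}

// Split a transfer of totalSectors into power-of-two chunks of at most 128KB.
std::vector<uint32_t> planChunks(uint32_t totalSectors) {
  std::vector<uint32_t> chunks;
  while (totalSectors > 0) {
    uint32_t chunk = largestPowerOfTwoAtMost(totalSectors);
    if (chunk > kMaxChunkSectors) {
      chunk = kMaxChunkSectors;
    }
    chunks.push_back(chunk);
    totalSectors -= chunk;
  }
  return chunks;
}
```

For a 600-sector transfer this plan is 256, 256, 64, 16, and 8 sectors. Each chunk would then map to one multi-sector read or write call.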

@FrankBoesing

FrankBoesing commented Oct 8, 2021

I have not tried 14 files, but simultaneously playing two 8-channel files and recording one stereo file works flawlessly if you increase the audio block size. My wave player can do that. The point is: 3 ms is too tight a corset.

For 14 files, i'd suggest to read in loop().

@FrankBoesing

+ you need a fast card

@greiman
Owner

greiman commented Oct 8, 2021

The problem is not playing a many channel file. It's playing 14 stereo files.

Each file with newer UHS cards requires very large buffers for good performance.

@PaulStoffregen

To really make simultaneous audio playing work we need to eventually move to a non-blocking API. If we're playing 7 files and (hypothetically) reads are taking 4 audio updates (3ms each), a scheduler needs to be able to request a non-blocking read to bring in enough data for the next 28 updates. That also means we will need 14K buffer size for each of those 7 files, or about 100K RAM.
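
The arithmetic behind those figures can be checked with a few lines of C++. This sketch assumes 128-sample audio blocks of 16-bit stereo data (512 bytes per roughly 3 ms update); that block format is an assumption, not something stated above, and the function names are made up for illustration.

```cpp
#include <cstdint>

// Assumed audio block format (not stated in the thread):
// 128 samples per update, 2 channels, 2 bytes per sample.
constexpr uint32_t kSamplesPerUpdate = 128;
constexpr uint32_t kChannels = 2;
constexpr uint32_t kBytesPerSample = 2;

// Bytes one stereo file consumes per audio update.
constexpr uint32_t bytesPerUpdate() {
  return kSamplesPerUpdate * kChannels * kBytesPerSample;
}

// If reads are serviced one file at a time and each read takes
// readLatencyUpdates audio updates, every file must be able to play
// from its buffer for files * readLatencyUpdates updates.
constexpr uint32_t updatesToBuffer(uint32_t files, uint32_t readLatencyUpdates) {
  return files * readLatencyUpdates;
}

// Buffer size needed for one file, in bytes.
constexpr uint32_t bufferBytesPerFile(uint32_t files, uint32_t readLatencyUpdates) {
  return updatesToBuffer(files, readLatencyUpdates) * bytesPerUpdate();
}

// Total RAM needed across all files, in bytes.
constexpr uint32_t totalBufferBytes(uint32_t files, uint32_t readLatencyUpdates) {
  return files * bufferBytesPerFile(files, readLatencyUpdates);
}
```

With 7 files and 4 updates per read this gives 28 updates, 14336 bytes (14K) per file, and 100352 bytes in total, which matches the rough 100K estimate.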

@FrankBoesing

Bill: I have not disputed that.
Paul: Or just use a MemFile and fill it outside the 3ms.

@PaulStoffregen

But the slow format problem with SHARED_SPI is much simpler, since it's just writing zeros. There's no need for lots of RAM usage. Just adding an API at the driver level so the format code can write a sequence of blocks to all zeros should let us get nearly the best SPI speed.

@greiman
Owner

greiman commented Oct 8, 2021

A non-blocking API won't do it. Only buffering huge transfers will. The standard for SD cards allows huge read latencies, hundreds of ms, if you read less than an RU for many regions of an SD.

@PaulStoffregen

We can do large buffers when we have the 8 Mbyte PSRAM chip. :)

@KurtE
Author

KurtE commented Oct 8, 2021

It would be great to be able to do lots of this stuff, but at this point I am trying to cherry-pick a few simple things, like:

Why would this call, on an SPI drive, be 5 times slower in shared SPI mode?

Serial.printf("Free Cluster Count: %u dt: ", sd.freeClusterCount());
More details up on the thread: https://forum.pjrc.com/threads/68139-Teensyduino-File-System-Integration-including-MTP-and-MSC?p=290570&viewfull=1#post290570

But on the external 32GB card, here are the timings for the format call and freeClusterCount in dedicated mode:

Initializing SD card...Free Cluster Count: 976991 dt: 1165
Format complete 2952

And in shared mode:

Initializing SD card...Free Cluster Count: 976991 dt: 5596
Format complete 46467

So: the free cluster count went from a little over 1 second to 5.6 seconds.
The format went from about 3 seconds to nearly 46.5 seconds.

The only difference in the code was which begin call was uncommented:

  //if (!sd.begin(SdSpiConfig(chipSelect, DEDICATED_SPI, SPI_SPEED))) {
  if (!sd.begin(SdSpiConfig(chipSelect, SHARED_SPI, SPI_SPEED))) {

The comments about DEDICATED_SPI sounded like it enables the optimization of reading multiple sectors at a time.
But what I also noticed using a logic analyzer was that the gap between the transfer of bytes was also significantly different.

@KurtE
Author

KurtE commented Oct 8, 2021

Sorry I know this is a side question, but was wondering what I would think is a simple thing.

That is suppose I have a pointer to an SDClass object.
Is there a simple way to find out if this object is being serviced by SPI and if so what it is configured for. That is, what IO pin, What speed...

So far I don't see an easy way to get to its config settings.. But I may be missing something obvious.

Thanks

@greiman
Owner

greiman commented Oct 8, 2021

Here is the wrapper for the Teensy SPI driver.

As you can see, it uses SPI.transfer(buf, count) for dedicated and shared SPI.

It needs to do a memcpy or memset since buf gets sent or clobbered.

@greiman
Owner

greiman commented Oct 8, 2021

Here is the class definition for the SPI wrapper.

@greiman
Owner

greiman commented Oct 8, 2021

You can call begin with dedicated SPI to format the card then call begin in shared mode to access the card.

@mjs513

mjs513 commented Oct 8, 2021

You mean like this - or did I mess something up - but least now we know it should work - was something Kurt and I were experimenting with.

bool SDClass::format(int type, char progressChar, Print& pr)
{
	SdCard *card = sdfs.card();
	if (!card) return false; // no SD card
	uint32_t sectors = card->sectorCount();
	if (sectors <= 12288) return false; // card too small
	uint8_t *buf = (uint8_t *)malloc(512);
	if (!buf) return false; // unable to allocate memory
	bool ret;
	//Serial.printf("CS PIN IN USE: %d\n", cspin);
	if(cspin < 254) sdfs.begin(SdSpiConfig(cspin, DEDICATED_SPI, SD_SCK_MHZ(16)));
	if (sectors > 67108864) {
#ifdef __arm__
		ExFatFormatter exFatFormatter;
		ret = exFatFormatter.format(card, buf, &pr);
#else
		ret = false;
#endif
	} else {
		FatFormatter fatFormatter;
		ret = fatFormatter.format(card, buf, &pr);
	}
	free(buf);
	if (ret) {
		// TODO: Is begin() really necessary?  Is a quicker way possible?
		if(cspin < 254) {
			sdfs.begin(SdSpiConfig(cspin, SHARED_SPI, SD_SCK_MHZ(16)));
		}
	}
	return ret;
}

It does seem to speed the formatting up. Oops, just saw an error that I just fixed.

@PaulStoffregen

FWIW, apparently a new class of "A2" rated cards are now on the market which claim to give minimum 4000 random 4K reads per second, but only if command queuing is used.

@KurtE
Author

KurtE commented Oct 8, 2021

As you can see, it uses SPI.transfer(buf, count) for dedicated and shared SPI.

It needs to do a memcpy or memset since buf gets sent or clobbered.

Note: Teensy has better transfer methods:

	void setTransferWriteFill(uint8_t ch ) {_transferWriteFill = ch;}
	void transfer(const void * buf, void * retbuf, size_t count);

They do not clobber the input. And you can set the transfer fill character to something like 0 and not have to pass in a transfer buffer.

Here is the class definition for the SPI wrapper.
Again, in order to, for example, switch to dedicated and then back to shared, it would be nice to know that the SDClass object was using SPI and to get back its parameters.

Yes I can get the SDCard object using the SDClass.card() method.
and in the SdCard header file there is an isSPI, but with that you need to pass in a CFG
And the SDCardInterface I don't see anything that gets me back to the CFG...
But again probably missing something

@greiman
Owner

greiman commented Oct 8, 2021

4000 random 4K reads per second, but only if command queuing is used.

Command queuing is not supported in SPI mode and I don't think it is supported by the NXP SDIO controller.

Edit: NXP: Support SD/SDIO standard, up to version 3.0. Need 6.0.

@greiman
Owner

greiman commented Oct 8, 2021

And the SDCardInterface I don't see anything that gets me back to the CFG...

All the SPI driver knows about is the SPI port and the SPI speed. These two are copied from SdSpiConfig in the begin call.

@PaulStoffregen

How does one read the AU and RU sizes? The only info I can find for AU size is part of the 64 byte SD status register. Looks like SdSpiCard class can read it, but SdCardInterface can't.

@FrankBoesing

"A2" does not say much - I have a Sandisk A2 - I don't use it for Teensy anymore, because it was too slow.

@PaulStoffregen

It's probably far from a full RU size write (and I still have no idea how to even discover what the card's RU size actually is), but I made a quick hack to FatFormatter initFatDir() to allocate a 16K buffer and call writeSectors() rather than writeSector(). It lets SHARED_SPI format almost as fast as DEDICATED_SPI on a 32GB Samsung EVO card.

@greiman
Owner

greiman commented Oct 9, 2021

"A2" does not say much - I have a Sandisk A2 - I don't use it for Teensy anymore, because it was too slow.

A2 is not slow, it is just unusable on Teensy. The Teensy SDIO controller is stone-age. It does not support SD features newer than 2010. Use of SPI is even worse: SPI does not support all 2010 features.

Edit: The first SD spec was dated April 3, 2006. Until then SD was defined mainly by SanDisk.

For all practical purposes assume an RU of 512KB for modern cards. There are three RU size classes for all cards:
Table4_61
SDXC cards have the same RU size. SDXC cards have an AU size up to 64MB.

The size of the AU can be determined by ACMD13.

I have the source for exFAT SD cards in new Samsung Android devices. Buffer pools are huge; think GB, not MB. Soon you will see a new socket for these cards:

SdExpress
SD Express card supports the basic SD interface in UHS-I mode as well as PCIe/NVMe interface.
SD Express Card supports either PCIe Gen 3 or Gen 4 interface, with either 1 lane (1 TX, 1 RX) or 2
lanes (2 TX, 2 RX) as defined by PCI-SIG including hot plug-in/removal support. Due to limited space, a
limited necessary number of out-of-band signals was adopted in SD Express from the PCI-Standard as
described in Section 3.7.3. PCIe Gen 3 supports bit rate of up to 8Gbps, which allows up to 985MB/s (1
lane) or 1,969MB/s (2 lanes) per each direction with 128/130 coding. Additionally, PCIe Gen 4 supports
bit rate of up to 16Gbps, which allows up to 1,969MB/s (1 lane) or 3,938MB/s (2 lanes) per each direction.

There will be microSDs in the 600-700 MB/sec range.

@greiman
Owner

greiman commented Oct 9, 2021

Currently I do a max transfer of one cluster in SdFat for shared SPI mode. That limits multi-block transfers to 128KB for shared SPI with exFAT. For dedicated SPI and FIFO SDIO there is no limit. You can write an entire SD for a pre-allocated exFAT file.

exFAT is great. It has an allocated length and used length. Contiguous files don't use the FAT so this allows any size multi-block transfer with no access to other parts of the SD.

As a result of the exFAT spec, you can see an enormous write latency if a contiguous file becomes non-contiguous. In that case the entire FAT chain must be constructed, which can take many seconds for a huge file.

@greiman
Owner

greiman commented Oct 9, 2021

I made a quick hack to FatFormatter initFatDir() to allocate a 16K buffer

Using shared SPI burns flash. If your card has a 512KB RU you will get a factor of eight extra wear with 16KB writes.

I am amazed how well writing 512 byte sectors works on an Uno with shared SPI. Could be a factor of 1024 wear. Data is moved in the card at hundreds of MB/sec to write at 200 KB/sec.
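
The 1024 figure can be reproduced with a deliberately crude model in which every write smaller than an RU forces the card to rewrite one full RU. That worst-case rule is a simplifying assumption for illustration (real card controllers may combine or cache writes), and `wearFactor` is a hypothetical name.

```cpp
#include <cstdint>

// Crude worst-case model (an assumption, not the SD specification):
// each write smaller than an RU makes the card rewrite one whole RU,
// so the extra wear is the RU size divided by the write size, rounded up.
constexpr uint32_t wearFactor(uint32_t ruBytes, uint32_t writeBytes) {
  return writeBytes >= ruBytes ? 1u : (ruBytes + writeBytes - 1u) / writeBytes;
}
```

For a 512KB RU and 512-byte sector writes, `wearFactor(512 * 1024, 512)` evaluates to 1024.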

I should probably make a call to switch between shared and dedicated SPI. I am totally redoing how dedicated/shared SPI works in the new beta so I will experiment. That way you could switch to dedicated for format or count free space.

@PaulStoffregen

PaulStoffregen commented Oct 9, 2021

Yup, I've seen those new cards. My relatively new Canon camera uses them. Yes, I hear your frustration that the newer cards are designed around systems with gigabytes of RAM. Obviously we don't have anywhere near that amount of memory, even with small external PSRAM chips added.

But we do have so much more than the 2K to 16K memory of 1990s-era microcontrollers. Even Raspberry Pi's new low-end chip has 256K RAM, and we can expect that trend from all future chips as the older IC fabs with small wafers become increasingly unprofitable. The other major trend we're seeing is users building far more sophisticated projects by leveraging complex libraries for displays with GUIs, audio, video, networking, machine learning, etc.

My long-term concern is we're building those complex libraries on top of storage infrastructure designed around accessing a single file at a time, and only with blocking APIs. Yes, I know modern cards have substantial latency for random read & write. But a fixed cache of only 1 data sector and 1 FAT/bitmap sector is only going to make the matter much worse when someone plays a sound clip while their display library needs to read a JPEG image and a web server library wants to read a html file. Having to wait that latency in a blocking call, rather than being able to request a read and get a callback when the buffer is filled....

I completely understand if you're not interested in supporting multiple file access. I know you've put a lot of work into achieving amazing single file performance with only 1 or 2 sector cache. Arduino is only belatedly embracing non-blocking APIs (their latest SD library got a non-blocking write only months ago) so there isn't a well established de-facto standard to follow.

I want to build these complex libraries and do so in a way where users can combine them together for their projects with the sort of ease with which they can run multiple programs on their PC. So I guess my main question is about the general direction of SdFat looking into the near future?

@mjs513

mjs513 commented Oct 9, 2021

Morning Bill
I do have an unrelated question, kind of along the lines of what Kurt asked. Fair warning: I still get myself confused easily :) This all relates to getting the initial SdSpiConfig settings. So here goes.

If I run your SDInfo sketch (yes i do look at your test sketches) I can see how you can get config settings relatively easily. But I think that all presumes you specify them using:

#define SD_CONFIG SdSpiConfig(SD_CS_PIN, SHARED_SPI, SD_SCK_MHZ(16))
so you are specifying that SD_CONFIG has the same struct (I think this is the right verbiage) as SdSpiConfig. I was looking at the SdSpiDriver, where begin sets up:
SdSpiConfig spiConfig
Now the next question is I can't seem to figure out a way to get the config information if I do begin like this:
begin(SdSpiConfig(csPin, SHARED_SPI, SD_SCK_MHZ(16)))
I tried a couple of things to use spiConfig directly, but that's not accessible.

So my question is whether there is any way to get the config info when SdSpiConfig is specified in the begin call. And the follow-up, since I have the feeling the answer is going to be nope: how can I do a mod (locally, of course) to get it?

Oh, all this is to try and do what you suggested: throw it into dedicated before formatting and then put it back in shared when done, but I need the original settings to know what config to put it back to. Not sure this is the final way to go, but I want to try it. Really curious. So far it seems to work, but this is the last piece of the puzzle.

PS. Love the explanation on the a2 cards. Man that's a lot going into the future with pcie.

Thanks
Mike

@greiman
Owner

greiman commented Oct 9, 2021

My long-term concern is we're building those complex libraries on top of storage infrastructure designed around accessing a single file at a time

I have been here long ago. Physicists often design new things and hire Programmers/Engineers to implement the designs in big experiments. I was involved with a disk exec for early Cray super-computers. I was at UCB when the BSD UNIX disk cache was developed and helped with tests. It's amazing how powerful memory is for filesystem performance.

The answer is always memory. You move I/O out of the filesystem and make the filesystem only access pages. The filesystem does adaptive read-ahead and write-behind; the filesystem can even be a user process. You don't do async calls to drivers in the filesystem layer. Drivers and the paging system use threads or lightweight tasks in the kernel that do context switches based on events/interrupts.

Too bad there is no good free RTOS for a base. Poor Arduino is trying to use mbed. mbed evolved from an OK kernel but the HAL layer is a hodgepodge of wrappers around bad company software.

@greiman
Owner

greiman commented Oct 9, 2021

Now the next question is I can't seem to figure out a way to get the config information if I do begin like this:
begin(SdSpiConfig(csPin, SHARED_SPI, SD_SCK_MHZ(16)))
Tried a couple of things to use spiConfig directly but that's not accessible.

The parts of the configuration in the sd.begin(config) call are not saved in a single place. SPI port and SPI speed are saved here at about line 84 of SdSpiArduinoDriver.h

 private:
  SPIClass *m_spi;
  SPISettings m_spiSettings;

The shared/dedicated SPI mode is currently saved here at about line 358 of SdSpiCard.h
bool m_sharedSpi = true;

You can't just change and restore these items and I am in the process of changing the structure of shared/dedicated SPI.

SdFat-beta now has two classes to implement SdSpiCard.

#if HAS_SDIO_CLASS
class SharedSpiCard : public SdCardInterface {
#elif USE_BLOCK_DEVICE_INTERFACE
class SharedSpiCard : public BlockDeviceInterface {
#else  // HAS_SDIO_CLASS
class SharedSpiCard {
#endif  // HAS_SDIO_CLASS

and
class DedicatedSpiCard : public SharedSpiCard {

In short I maintain SdFat for simple users who don't modify internals. sd.begin(config) will continue to work but your mods may not.

@greiman
Owner

greiman commented Oct 9, 2021

Yup, I've seen those new cards. My relatively new Canon camera uses them.

Do you have CFexpress or SD Express?

@mjs513

mjs513 commented Oct 9, 2021

Thanks Bill.
Actually wasn't looking to set them directly but to save them and then reconfigure with something like:
begin(SdSpiConfig(SD_CS_PIN, Oldsharedordedicated, oldmckSpeed))

CsPin wouldn't change of course.

Thanks for your help, this was driving me crazy. I will have to look at your beta2.1.1-beta to see what's coming down the pike.

@PaulStoffregen

PaulStoffregen commented Oct 9, 2021

Pretty sure these aren't CFexpress. I believe the higher end camera uses that type.

image

@greiman
Owner

greiman commented Oct 9, 2021

Those are UHS-II SDs. UHS-II can do up to 312MB/s.

The new SD Express cards use the PCIe bus at 3940MB/s for PCIe Gen.4 × 2 Lane.

I have a PC that I use for AI with Gen.4 SSDs with up to 6600MB/s sequential reads. It does a full backup at 3000MB/s in less than one minute.

@PaulStoffregen

Wow, that's pretty amazing speed.

@PaulStoffregen

PaulStoffregen commented Oct 11, 2021

I believe we have found a solution to the SHARED_SPI performance problem with FAT32 freeClusterCount() and format(). Code is on this branch:

https://github.com/PaulStoffregen/SdFat/tree/writeSectorsSame

This adds a readSectorsCallback() and writeSectorsCallback() to BlockDeviceInterface, to allow use of fast read multiple and write multiple sectors of any length with only a single 512 byte buffer. A callback function is used to refill the buffer before writing each sector, or to make use of the buffer after reading each sector.

The performance with SHARED_SPI becomes approx the same as DEDICATED_SPI on these long operations, and it probably is much better for internal wear on the SD card.
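
To make the callback idea concrete, here is a host-side model of a block device that writes a run of sectors while reusing one 512-byte buffer. It is a sketch only: `MockBlockDevice`, the callback signature, and `fillZeros` are hypothetical and are not taken from the branch above, whose actual signatures may differ. On real hardware the loop would sit inside a single write-multiple-sectors command.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kSectorBytes = 512;

// Hypothetical in-memory stand-in for an SD card. Not SdFat's BlockDeviceInterface.
class MockBlockDevice {
 public:
  // Called before each sector is written so the caller can refill the buffer.
  using FillCallback = void (*)(uint32_t sector, uint8_t* buf, void* context);

  explicit MockBlockDevice(uint32_t sectorCount)
      : sectorCount_(sectorCount),
        data_(static_cast<std::size_t>(sectorCount) * kSectorBytes, 0xFF) {}

  // Write `count` consecutive sectors using a single 512-byte buffer.
  // Counts as one command no matter how many sectors are written.
  bool writeSectorsCallback(uint32_t sector, uint32_t count,
                            FillCallback fill, void* context) {
    if (static_cast<uint64_t>(sector) + count > sectorCount_) {
      return false;  // range runs past the end of the device
    }
    uint8_t buf[kSectorBytes];
    ++commands_;
    for (uint32_t i = 0; i < count; ++i) {
      fill(sector + i, buf, context);
      std::memcpy(&data_[static_cast<std::size_t>(sector + i) * kSectorBytes],
                  buf, kSectorBytes);
    }
    return true;
  }

  uint8_t byteAt(std::size_t offset) const { return data_[offset]; }
  uint32_t commands() const { return commands_; }

 private:
  uint32_t sectorCount_;
  std::vector<uint8_t> data_;
  uint32_t commands_ = 0;
};

// Fill callback for a format-style pass: every sector is all zeros.
void fillZeros(uint32_t /*sector*/, uint8_t* buf, void* /*context*/) {
  std::memset(buf, 0, kSectorBytes);
}
```

Calling `dev.writeSectorsCallback(2, 4, fillZeros, nullptr)` on an 8-sector device zeroes sectors 2 through 5 with one command and a 512-byte stack buffer, which is the property that lets format() avoid a large RAM buffer.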

@PaulStoffregen

Not sure if readSectorsCallback() and writeSectorsCallback() would be a welcome addition to SdFat... but if so, would be happy to make any needed changes and send a pull request.

Or maybe this could be considered for the redesign version?

@greiman
Owner

greiman commented Oct 11, 2021

I think the simple answer to slow shared SPI for format or other cases like scan the FAT or bit-map could be a switch between shared and dedicated SPI mode.

This would allow a section of fast SD I/O when you can assure the SPI bus won't be accessed by another device. A callback won't change the fact that a transfer is killed if CS is raised.

Something like this for the SdCard classes when dedicated/shared SPI is enabled in SdFatConfig.h :

  bool isShared();  // return true if in shared mode.
  void setShared(bool mode);  // Set shared/dedicated mode for true/false.

Your code is dead since I already completely changed how dedicated/shared SPI works in SdFat-beta.

Edit: Actually, I will need an API that allows for the case where only shared SPI is supported, as for some Uno users. Probably a fail return for the set mode.

@PaulStoffregen

No worries about abandoned code. If the next SdFat offers a better way, happy to use it.

But we're not even using this from outside the library. It's just edits in FatLib to speed up format and freeClusterCount when shared SPI is used.

No other SPI device should access the SPI bus if SPI.beginTransaction() was called and the code hasn't returned to the main program.

@FrankBoesing

FrankBoesing commented Oct 12, 2021

Thought I'd just try to play 14 files.
Works well.

@PaulStoffregen

@KurtE - Maybe time to close this issue?

We now have a workaround for the original 2 shared SPI performance problems. Seems likely future SdFat will make that workaround unnecessary with an API to switch SPI modes.

Bill and Frank proved enough performance exists to play multiple audio files concurrently.

Sounds like the only path to eliminating the read latency from hard real-time DSP work is a full RTOS, used to build a second data-reading thread which uses SdFat's blocking API to in turn provide non-blocking service to the DSP thread. Can't say I'm excited about that answer, but it seems to be the official answer and I really don't wish to argue any further.

@KurtE
Author

KurtE commented Oct 12, 2021

Thanks. Yes, we addressed the main issues I mentioned.

Other issues should probably be put into a new thread specific to those issues.

Thanks all

@KurtE KurtE closed this as completed Oct 12, 2021
@greiman
Owner

greiman commented Oct 13, 2021

I am now testing the changes to allow switching to dedicated SPI to optimize format() and freeClusterCount(). Here is an example fix for freeClusterCount().

  uint32_t freeClusterCount() const {
    bool switchSpi = hasDedicatedSpi() && !isDedicatedSpi();
    if (switchSpi && !setDedicatedSpi(true)) {
      return 0;
    }
    uint32_t rtn = Vol::freeClusterCount();
    if (switchSpi && !setDedicatedSpi(false)) {
      return 0;
    }
    return rtn;    
  }

Users can optimize their code with these calls.
sd.hasDedicatedSpi(), sd.isDedicatedSpi(), sd.setDedicatedSpi(value)
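
One way user code might wrap those three calls is a small scope guard that switches to dedicated SPI on entry and back to shared on exit. This is only a sketch: `DedicatedSpiGuard` and `MockCard` are hypothetical, and only the method names hasDedicatedSpi(), isDedicatedSpi(), and setDedicatedSpi() come from the comment above. Unlike the freeClusterCount() example, the guard cannot report a failure when switching back.

```cpp
// Hypothetical RAII helper. Works with any type that provides
// hasDedicatedSpi(), isDedicatedSpi(), and setDedicatedSpi(bool).
template <typename Card>
class DedicatedSpiGuard {
 public:
  explicit DedicatedSpiGuard(Card& card)
      : card_(card),
        switched_(card.hasDedicatedSpi() && !card.isDedicatedSpi() &&
                  card.setDedicatedSpi(true)) {}

  // Restore shared mode only if this guard made the switch.
  ~DedicatedSpiGuard() {
    if (switched_) {
      card_.setDedicatedSpi(false);
    }
  }

  DedicatedSpiGuard(const DedicatedSpiGuard&) = delete;
  DedicatedSpiGuard& operator=(const DedicatedSpiGuard&) = delete;

  // True if the guard switched the card into dedicated mode.
  bool switched() const { return switched_; }

 private:
  Card& card_;
  bool switched_;
};

// Minimal mock with the same three methods, for host-side experiments only.
struct MockCard {
  bool supportsDedicated = true;
  bool dedicated = false;

  bool hasDedicatedSpi() const { return supportsDedicated; }
  bool isDedicatedSpi() const { return dedicated; }
  bool setDedicatedSpi(bool value) {
    if (value && !supportsDedicated) {
      return false;  // models a build where only shared SPI is available
    }
    dedicated = value;
    return true;
  }
};
```

A long operation such as a format would then run inside the scope of a `DedicatedSpiGuard<MockCard> guard(card);` declaration, provided no other device touches the SPI bus while the guard is alive.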

@greiman
Owner

greiman commented Oct 13, 2021

Here is the result of placing a call to sd.setDedicatedSpi between tests in the bench example:

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
586.44,6655,686,872
5144.03,7265,98,99

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1233.96,1069,295,414
5219.21,101,97,97


5 participants