Bitbanged SD-Card interface for the NEC V25 port #2445

Merged
ghaerr merged 5 commits into ghaerr:master from swausd:master
Nov 7, 2025

Conversation

@swausd
Contributor

@swausd swausd commented Nov 7, 2025

Implementation of a bitbanged SPI interface using 4 pins on a parallel port of the NEC V25. The port is based on the SSD-SD block driver and the 8018x SPI interface created by @cocus. This interface is rather slow - feels like a floppy disk - as every transferred byte has to go through about 80 assembler instructions.
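
For readers unfamiliar with bit-banging, here is a minimal C sketch of how one byte moves over a software-driven SPI bus in mode 0. It is not the driver from this PR (which uses V25 inline assembler with SET1/CLR1); the `spi_pins` hooks and the `sim_*` loopback stub are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

/* Hypothetical pin-access hooks; on real hardware these would touch port bits. */
struct spi_pins {
    void (*set_sck)(int level);
    void (*set_mosi)(int level);
    int  (*get_miso)(void);
};

/* Transfer one byte, MSB first, SPI mode 0 (clock idles low, sample on rising edge). */
static uint8_t spi_bitbang_transfer(const struct spi_pins *p, uint8_t out)
{
    uint8_t in = 0;
    for (int i = 0; i < 8; i++) {
        p->set_mosi((out & 0x80) != 0);   /* present the next bit while SCK is low */
        out = (uint8_t)(out << 1);
        p->set_sck(1);                    /* rising edge: both sides sample */
        in = (uint8_t)((in << 1) | (p->get_miso() & 1));
        p->set_sck(0);                    /* falling edge: slave shifts out its next bit */
    }
    return in;
}

/* Host-side loopback stub for trying the routine without hardware: MISO mirrors MOSI. */
static int sim_mosi;
static void sim_set_sck(int level)  { (void)level; }
static void sim_set_mosi(int level) { sim_mosi = level; }
static int  sim_get_miso(void)      { return sim_mosi; }
static const struct spi_pins sim_pins = { sim_set_sck, sim_set_mosi, sim_get_miso };
```

Every bit costs several port accesses plus the loop overhead, which is why a software SPI on a slow CPU ends up at tens of instructions per byte.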

Some existing files of the NEC V25 port had to be reformatted.

Picture of console output with 2GB SD card, FAT16, no partition:
[Image: ELKS-necv25-2]

swausd added 5 commits November 7, 2025 18:30
While preparing the SD-Card interface for the NEC V25 port, I noticed that //-style comments at the end of #defines lead to problems in inline assembler. So include files and dependent source files had to be reformatted.
Reformatting because of //-style comments. Set up "struct serial_info" for the default 9600 baud rate. This prevents artefacts on the first baud rate switch from 9600 baud to 9600 baud again.
Two possible clock frequencies are documented in the comment for CONFIG_NECV25_FCPU. Both options lead to dividers with minimal baud rate errors.
These are the changed files to integrate the SD-Card interface for the NEC V25 port into ELKS.
This file contains the bitbanged SPI interface using the 4 highest bits of Port 2, which is integrated into the NEC V25. Assembler opcodes specific to the NEC V20/V25 (SET1 and CLR1) are used to set and clear single bits. Using these atomic instructions eliminates the need for CLI/STI to block interrupts.
@ghaerr
Owner

ghaerr commented Nov 7, 2025

Well, @swausd - this is pretty dang impressive!! Even though you're saying it's slow, it's very cool.

I'm thinking at some point I'll rename CONFIG_BLK_DEV_SSD_SD8018X to CONFIG_BLK_DEV_SSD_SPI to be less confusing, then use the architecture to select the specific SPI driver file(s). But this is fine for now.

I notice you've changed CONFIG_NECV25_FCPU from 1.4MHz to 2.2MHz... is this what you want? I thought you said the system couldn't handle the overclocking?

Also, are you running out of ROMFS space using CONFIG_APP_xxx defines trying to find the right set of applications? I'm wondering whether we should perhaps use a new tag in Applications instead for this architecture? What specific applications are you trying to add/subtract in the limited ROMFS filesystem, and how big exactly is it?

I'll wait for your answer on CONFIG_NECV25_FCPU before committing.

Thank you!

@ghaerr
Owner

ghaerr commented Nov 7, 2025

Also, any ideas why a retry is required on going to IDLE at startup? Do you suppose the hardware requires a bit more time after initialization, or is this perhaps required for all commands? I'm not familiar with SPI or this driver (yet) so I'm just wondering out loud...

@toncho11
Contributor

toncho11 commented Nov 7, 2025

Also, FAT support in ELKS is quite slow compared to Minix. Just switching to Minix might give you a 20-40% boost in I/O.

@swausd
Contributor Author

swausd commented Nov 7, 2025

Hi @ghaerr ,
About clock: I tried different chips, and some can handle the 22 MHz (the CPU runs at half of that). Oddly, the chip with the highest clock marking (8 MHz, which means a 16 MHz crystal) can't handle it. Older chips without specific markings (should run at 5 MHz) can be overclocked. NEC sold them with up to 10 MHz specification. I think at some point they started selecting and marking the better chips.
So yes, 22 MHz is what I want.

About ROMFS: I selected the apps, that fit into the space I have, and which are most likely used after boot. Everything else can go to the SD card. I hope this is a good match for now.

About retry: I tried giving the SD card way more time after the go-idle command, but that doesn't help. One retry fixes it. Maybe this is specific to my cheap 2GB SD cards.

@swausd
Contributor Author

swausd commented Nov 7, 2025

Also, FAT support in ELKS is quite slow compared to Minix. Just switching to Minix might give you a 20-40% boost in I/O.

@toncho11 , thanks for the advice. Did not try minix fs. Right now, I do not know how to create and use such fs with ubuntu.

@ghaerr ghaerr merged commit cc0e1a3 into ghaerr:master Nov 7, 2025
@ghaerr
Owner

ghaerr commented Nov 7, 2025

I do not know how to create and use such fs with ubuntu.

We have an mfs tool that will both format the image and add files. It lives in tools/mfs. Take a look at image/Make.images to see how it's used to create our MINIX images using mfs mkfs ... and mfs genfs ... etc. None of this requires any loopback filesystems, root, or that kind of stuff.

@cocus
Contributor

cocus commented Nov 7, 2025

This interface is rather slow - feels like a floppy disk - as every transferred byte has to go through about 80 assembler instructions.

Have you measured the clock of your SPI? I get 200-something kHz on my 20 MHz SBC.

@toncho11
Contributor

toncho11 commented Nov 8, 2025

@swausd You use the first serial interface, SIO0 (I suppose), to communicate with your Linux computer through USB. But there is a second one, SIO1. Can SIO1 (in synchronous master mode) be used to communicate with the SD card? In that case the clock could reach ~0.5–1 MHz.

For example:

V25 (SIO1)                SD card
-------------------------------------
TxD1  ------------------>  DI (pin 2)
RxD1  <------------------  DO (pin 7)
CLK1  ------------------>  CLK (pin 5)
P20   ------------------>  CS  (pin 1)
GND   ------------------   GND (pins 3/6)
+3.3V ------------------   VDD (pin 4)

You must handle level shifting on at least the signals going from the V25 to the SD card (MOSI, SCK, CS).

Here are some potential functions to be used with a new such driver:

#include <stdint.h>
#include "v25_regs.h"   // define register addresses for your system

#define SIO1_BASE   0xFF30
#define S1MOD       (*(volatile uint8_t*)(SIO1_BASE + 0x00))
#define S1CON       (*(volatile uint8_t*)(SIO1_BASE + 0x01))
#define S1BUF       (*(volatile uint8_t*)(SIO1_BASE + 0x02))
#define S1STS       (*(volatile uint8_t*)(SIO1_BASE + 0x03))
#define S1CLK       (*(volatile uint8_t*)(SIO1_BASE + 0x04))

// Example GPIO for chip select (use your board's port)
#define PORT2       (*(volatile uint8_t*)0xFF20)
#define SD_CS_BIT   0x01  // P20 = bit 0 of port 2, matching the wiring above

static inline void sd_spi_cs_low(void)  { PORT2 &= ~SD_CS_BIT; }
static inline void sd_spi_cs_high(void) { PORT2 |=  SD_CS_BIT; }

// Wait macros
#define WAIT_TX_READY()  while(!(S1STS & 0x01))  /* bit0 = TX ready */
#define WAIT_RX_READY()  while(!(S1STS & 0x02))  /* bit1 = RX ready */

// ------------------------------------------------------

void sd_spi_init(void)
{
    // Reset port pins
    sd_spi_cs_high();

    // Configure SIO1 for synchronous master, 8-bit, MSB first
    S1CON = 0x00;        // Disable channel
    S1MOD = 0x40;        // SYNC mode, master (bit6=1)
    S1CLK = 0x02;        // Divider: Fcpu / 8  (adjust for ~500kHz–1MHz)
    S1CON = 0x15;        // Enable TX/RX, internal clock

    // Clear status flags
    (void)S1STS;
}

// ------------------------------------------------------

uint8_t sd_spi_transfer(uint8_t out)
{
    WAIT_TX_READY();
    S1BUF = out;          // Send one byte
    WAIT_RX_READY();
    return S1BUF;         // Return received byte
}

// ------------------------------------------------------

void sd_send_command(uint8_t cmd, uint32_t arg)
{
    sd_spi_cs_low();
    sd_spi_transfer(0x40 | cmd);
    sd_spi_transfer(arg >> 24);
    sd_spi_transfer(arg >> 16);
    sd_spi_transfer(arg >> 8);
    sd_spi_transfer(arg);
    sd_spi_transfer(0x95);      // Valid CRC7 + stop bit for CMD0 with arg 0 only
    sd_spi_transfer(0xFF);      // Extra clock
    sd_spi_cs_high();
}
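
Note that the function above raises CS right after the command bytes, so the card's answer is never read. In SPI mode the card replies with an R1 byte whose top bit is clear, after a few 0xFF filler bytes. A minimal sketch of polling for it follows; the `spi_xfer_fn` hook, the `SD_R1_TIMEOUT` convention and the `stub_xfer` test stub are hypothetical names for illustration, not part of the ELKS driver.

```c
#include <stdint.h>

/* Hypothetical byte-exchange hook: sends one byte, returns the byte clocked in. */
typedef uint8_t (*spi_xfer_fn)(uint8_t out);

#define SD_R1_TIMEOUT 0xFF   /* returned when no response arrives (assumed convention) */

/* Clock out 0xFF until the card answers; an R1 response always has bit 7 clear. */
static uint8_t sd_wait_r1(spi_xfer_fn xfer, int max_tries)
{
    while (max_tries-- > 0) {
        uint8_t r = xfer(0xFF);
        if ((r & 0x80) == 0)
            return r;
    }
    return SD_R1_TIMEOUT;
}

/* Host-side stub: the card sends two filler bytes, then reports "idle" (0x01). */
static int stub_calls;
static uint8_t stub_xfer(uint8_t out)
{
    (void)out;
    return (++stub_calls < 3) ? 0xFF : 0x01;
}
```

A caller would keep CS low, send the six command bytes, call `sd_wait_r1()` with a small retry count such as 8, and only then raise CS.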

Clock polarity and timing

The V25 synchronous serial in master mode uses a fixed clock polarity (rising edge sample).
The SD card expects SPI mode 0 (CPOL=0, CPHA=0) by default — this matches most V25 implementations.
If you find data shifted by one bit, you can invert the clock phase in software by toggling the “edge” bit in the SIO1 mode register.

I suppose you already did the voltage leveling, but still:

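| Signal | V25 Pin    | SD Pin    | Direction | Level Shifting?   |
|--------|------------|-----------|-----------|-------------------|
| MOSI   | TxD1       | 2 (DI)    | Out       | Yes (5V → 3.3V)   |
| MISO   | RxD1       | 7 (DO)    | In        | Usually OK direct |
| SCK    | CLK1       | 5 (CLK)   | Out       | Yes (5V → 3.3V)   |
| CS     | P20 (GPIO) | 1 (CS)    | Out       | Yes (5V → 3.3V)   |
| VDD    | 3.3V Reg.  | 4 (VDD)   | Power     |                   |
| GND    | GND        | 3,6 (GND) | Ground    |                   |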

@swausd
Contributor Author

swausd commented Nov 8, 2025

Hi @toncho11,
for serial communication with my PC I use serial 1 of the V25. Serial 0 has some additional special functions like synchronous I/O. That's why I have kept it free for now. I used the bitbanging approach first, because there was the good and functional example from @cocus, and serial 0 could then be used as a 2nd serial port in ELKS. I did expect it to be slow, but not this slow. Right now I am using my oscilloscope to find out what happens on the SPI bus and to measure the bit timing, as @cocus advised in his last post.
My first impression is that there is not always communication going on. It looks like the CPU struggles with the big (2GB) FAT filesystem. You already advised having a look at minix fs. I will also try to create a much smaller fs. In any case, I don't know what to put on 2GB of storage anyway ;-)

Can this serial SIO1 (in synchronous master mode) be used to communicate with the SD card? In this case speed can reach ~0.5–1 MHz clock.

I reread the V25 datasheet this morning, but some questions are still unanswered. For example, there are separate clock lines for Rx and Tx. You mentioned the sample timing and polarity of the clock. As I understand it, this cannot be changed in the V25 by configuration. Maybe external hardware has to be added. I will look into this after trying different file systems. My problem is that I only have the V25 datasheet. Application notes or examples for the V25 are very sparse on the internet.

I suppose you already did the voltage leveling

Yes, I use one of the cheap interface boards usually used with Arduino. This one has a micro SD holder, 3.3V regulator and level shifter on board.

I should even try switching the cheap 2GB SD-Card for a newer sandisk card.

Still a lot to do. But that is the way we want it, don't we. If all is fixed, there is nothing left to do. My sons always ask me: Why are you doing all this? Do you have a use case? No I don't have a use case. I do it just for fun :-)

@toncho11
Contributor

toncho11 commented Nov 8, 2025

Yes, in DOS, when doing dir, you can have a very long pause until the free blocks are calculated, etc. That is why I use small partitions of 30 MB. A similar problem can exist in ELKS; I was already thinking of suggesting that in my previous post.

Minix 20 MB is the best.

In the end fast IO is not needed that much because programs are small to load and you usually have everything written on the SD card by your host computer. Speed is needed when we will have a swap file for example. Then it becomes critical.

@swausd
Contributor Author

swausd commented Nov 8, 2025

@ghaerr , @toncho11 @cocus ,
I checked the different cycle times with an oscilloscope:
6.5 usec = 155 kHz per SPI bit cycle
50 usec = 20 kHz per single SPI byte cycle
77 usec = 13 kHz per byte in a block transfer (this is the optimum: in real life add time for reading the data + preparing the data + managing the file system - see below)

I then tested the time to write my boot ROMFS, 426 blocks, to a newly formatted, empty SD card with "cp -v /bin/* /mnt", with the following results:
120 sec = 3.5 KB per sec for FAT16 fs
55 sec = 7.7 KB per sec for minix fs (this is double the optimum time mentioned above).
So @toncho11 was right: FAT16 is significantly slower than minix!
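
The figures above can be reproduced with a few lines of integer arithmetic. This is only a sketch: the helper names are made up, and the 1 KiB block size is an assumption inferred from the reported KB/s values (426 blocks in 120 s giving about 3.5 KB/s).

```c
#include <stdint.h>

/* Frequency in Hz for a period given in tenths of a microsecond (6.5 us -> 65). */
static uint32_t period_to_hz(uint32_t tenths_of_us)
{
    return (uint32_t)(10000000UL / tenths_of_us);
}

/* Throughput in bytes per second for copying `blocks` blocks of 1 KiB (assumed size). */
static uint32_t throughput_bps(uint32_t blocks, uint32_t seconds)
{
    return (uint32_t)(blocks * 1024UL / seconds);
}
```

With these, `period_to_hz(65)` gives about 154 kHz for the bit clock and `period_to_hz(770)` about 13,000 bytes per second as the raw block-transfer ceiling, while `throughput_bps(426, 55)` gives roughly 7,900 bytes per second for the minix copy, a bit over half of that ceiling.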

I also checked different SD cards (cheap old 2GB noname vs new 64 GB sandisk). The difference is not really noticeable.

So next I check if and how I can use the 2nd serial port, serial0, for SPI. This port has a special synchronous mode.

@ghaerr
Owner

ghaerr commented Nov 8, 2025

FAT16 is significantly slower than minix!

I knew it was slower, but didn't think it was THAT much slower!! I think the main reason is that the FAT filesystem driver is pretty much a 512-byte sector-oriented FS hacked on top of a Linux 1K block-oriented buffered I/O system, along with LFN directory searches being inefficient. I've been meaning to implement some kind of kernel profiling so that we can learn more exactly where bottlenecks might be.

Also I think on very slow systems like the 8088 and V25, (buffer) data copying and the buffered I/O system starts becoming a major portion of kernel time spent doing things, compared to other kernel activity. Changing the number of EXT buffers allocated can help, but can't overcome slow memory copy speeds. For instance, in a buffered I/O system, the kernel will always perform I/O into a system buffer, then keep that buffer while copying the data to the application, rather than performing the I/O directly to an application buffer. When the filesystem data is spread out and buffer count small, this slows things down considerably.

@swausd
Contributor Author

swausd commented Nov 8, 2025

@ghaerr
I created a minix filesystem with sudo ./mfs -v /dev/sdc mkfs -1 -n14 -i2048 -s16384 using the elks/elks/tools/bin/mfs directly on a SD-Card. Looks like reading and writing on my V25 ELKS system is working ok.
When I do fsck -vl /dev/ssd on the unmounted device in ELKS, I get the error: fsck: unable to read super block

Should this work or do I have to use a special version? sudo ./mfsck -vl /dev/sdc from elks/elks/tools/bin on ubuntu seems to work as expected showing a file list and statistics.

@ghaerr
Owner

ghaerr commented Nov 8, 2025

Should this work or do I have to use a special version?

This should work. Looking at the disk_utils/fsck.c source code:

        if (BLOCK_SIZE != read(IN, super_block_buffer, BLOCK_SIZE))
                die("unable to read super block");

It appears that read is failing to return BLOCK_SIZE, which is 1024. I would bet it's a negative error code instead. You'll have to add a printf(..., errno) to see what is happening. (Add the revised bin/fsck to the SSD card to avoid reflashing ROMFS).

Should this be the case, it points to a possible problem in the /dev/ssd driver when reading multiple sectors. The upper layer of the SSD driver will convert this into two sector reads, if I'm not mistaken. You might add printk's to the SSD driver to trace return values, or possibly easier, just run hd /dev/ssd | more and check that the output is the same in ELKS and on the host. (hd is now built on the host and should be in PATH automatically, in tools/bin.)

After learning a bit more, I can continue to help debug what is happening.

@swausd
Contributor Author

swausd commented Nov 8, 2025

Thanks for the quick answer. Will look into this tomorrow.

@toncho11
Contributor

toncho11 commented Nov 8, 2025

@toncho11
Contributor

toncho11 commented Nov 8, 2025

@swausd Here is one possible configuration. I consulted with AI.

The SPI protocol uses a single clock line (SCK), shared between master and slave.
All data (MOSI and MISO) are synchronized to that same clock.

So — when you adapt the V25’s SIO1 to talk to an SD card via SPI — you must make the transmit and receive clocks the same.

That means:

Use the TxC (transmit clock output) as the SPI SCK line to the SD card.

Tie the RxC pin to the same line, so receive shifts occur on the same clock edges.

Hardware connection summary:

        NEC V25 SIO1                     SD Card (SPI mode)
        -----------------------------    -------------------
        TxD1  ------------------------->  DI (MOSI)
        RxD1  <-------------------------  DO (MISO)
        TxC1  ------------------------->  CLK (SCK)
        RxC1  ------------------┐
                                └------- (tied to TxC1)
        GPIO (e.g. P20) -------->  CS (chip select)
        GND   ------------------->  GND
        3.3V Reg ---------------->  VDD

You make the V25 synchronous master, so TxC1 provides the clock, and RxC1 sees the same clock edges for incoming bits.

Configuration summary (SIO registers):

  • Mode register: synchronous, 8-bit, master mode

  • Clock source: internal (TxC output)

  • RxC: configured as input, tied to TxC

  • Chip select: controlled by GPIO (manual toggle before/after each block)

This way, transmit and receive operations are fully synchronized like SPI.

Yes, the V25 has separate Tx and Rx clock pins — but for SPI communication, you must tie them together and use the transmit clock (TxC) as the shared SPI clock. That way, both Tx and Rx shift on the same edges, emulating SPI correctly.

So we have full-duplex SPI-style communication, even though:

  • TxC is only providing the clock,
  • TxD is sending data, and
  • RxD is receiving data on the same clock edges.

@swausd
Contributor Author

swausd commented Nov 8, 2025

And you use this

Yes!

@swausd
Contributor Author

swausd commented Nov 8, 2025

@toncho11 , many thanks for the input!

Yes, the V25 has separate Tx and Rx clock pins — but for SPI communication, you must tie them together and use the transmit clock (TxC) as the shared SPI clock. That way, both Tx and Rx shift on the same edges, emulating SPI correctly.

Sounds plausible! I will look into this, after I have checked the fsck problem (see above)

@ghaerr
Owner

ghaerr commented Nov 8, 2025

You might add printk's to the SSD driver to trace return values,

Also I've just noticed that replacing debug_blk with printk in ssd.c and debug with printk in ssd-sd.c (and some printk's in sd_read) will give lots of info without too much work, you might start with that, although a ROMFS re-flash will be required.

@swausd
Contributor Author

swausd commented Nov 8, 2025

Hi @ghaerr ,
I could not resist making one last try for today. I added a printk() at the entry of nearly every function in ssd.c and ssd-sd.c, changed debug to printk in ssd.c, and changed read_tables() in fsck.c to

int ret = read(IN, super_block_buffer, BLOCK_SIZE);
printf("read_tables %d\n", ret);
// if (BLOCK_SIZE != read(IN, super_block_buffer, BLOCK_SIZE))
die("unable to read super block");

This is what I get on boot:

ELKS Setup START
32K ext buffers, 8K cache, 1 req hdrs
ssd_init enter
ssddev_init enter
sd_initialize enter
sd_cmd_go_idle enter
sd_send_cmd enter
sd_send_cmd enter
sd_check_version enter
sd_send_cmd enter
sd_leave_idle enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_send_acmd enter
sd_send_cmd enter
sd_send_cmd enter
sd_is_hc enter
sd_send_cmd enter
sd_read_csd enter
sd_send_cmd enter
sd_wait_for_token enter
sd_get_capacity_from_csd enter
ssd: 2013265920K disk
NECV25 machine, cpu 3, syscaps 0, 512K base ram, 16 tasks, 64 files, 96 inodes
ELKS 0.9.0-dev (54224 text, 0 ftext, 5776 data, 2496 bss, 57262 heap)
Kernel text e062 data 80 end 1080 top 8000 446+0+0K free
VFS: Mounted root device /dev/rom (0600) romfs filesystem.
Running /etc/rc.sys script
Sorry, clock not supported on this system.
Thu Jan 1 00:00:02 1970

ELKS 0.9.0-dev

[2.18 secs] login: root

# fsck -vl /dev/ssd

SSD: open
read_tables 0 <<------------ return value from int ret = read(IN, super_block_buffer, BLOCK_SIZE);
fsck: unable to read super block
SSD: release
#

I could not find out which function is called by fsck read(IN, super_block_buffer, BLOCK_SIZE)

Any suggestion for further research?

@ghaerr
Owner

ghaerr commented Nov 8, 2025

Ok @swausd, I can see you're getting as addicted to this stuff as I am. So read is returning EOF, thanks for finding that. Let me take a deeper dive at this while you sleep. It appears that the read data might be being copied from a system buffer since there is no actual sd_read taking place after opening /dev/ssd. This seems strange since all buffers from /dev/ssd should have been marked invalid. You're not booting from /dev/ssd, but instead from ROMFS, correct?

@ghaerr
Owner

ghaerr commented Nov 8, 2025

@swausd

All right - I've figured it out: all block device drivers are required to set the block device's i_size field to the number of bytes contained on the (possibly dynamically changed) device. This is done in ssd.c using inode->i_size = ssd_num_sects << 9.

In your case, ssd_num_sects is set in the ssd_init() function from the result of the ssddev_init() sub-driver function call. Your mounted SD card happens to report a huge number of sectors (2013265920K in your debug output), which exposes a general bug. I think what is happening is that this large 32-bit value is being shifted left by 9 (to convert sectors to bytes) and ends up setting i_size to zero. Thus, the read in fsck fails and returns EOF, thinking you're beyond the media boundary!

Another kernel bug smashed!

Try testing this by forcing a much smaller return value from ssddev_init(). Let me think about a final fix; the value will have to be clamped in ssd.c such that a left shift << 9 won't overflow an unsigned long.
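
To make the overflow concrete, here is a small host-side sketch. It is not the fix that went into the kernel; the function names and the clamping policy are assumptions for illustration, and the value 2013265920 from the log is used only as an example sector count.

```c
#include <stdint.h>

/* Naive conversion: the 32-bit shift silently drops the high bits. */
static uint32_t sectors_to_bytes_naive(uint32_t sects)
{
    return (uint32_t)(sects << 9);
}

/* Largest sector count whose byte size still fits in 32 bits. */
#define MAX_SECTS_32 (UINT32_MAX >> 9)

/* Clamped conversion: oversized devices report the largest representable size. */
static uint32_t sectors_to_bytes_clamped(uint32_t sects)
{
    if (sects > MAX_SECTS_32)
        sects = MAX_SECTS_32;
    return (uint32_t)(sects << 9);
}
```

2013265920 is 0x78000000, so all of its set bits sit above bit 22 and are shifted out entirely, leaving a size of zero. The clamped variant instead reports 0xFFFFFE00 bytes, just under 4 GiB.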

Thank you!

@swausd
Contributor Author

swausd commented Nov 8, 2025

Booting from ROMFS after power on. And there was no read from ssd-sd.c

@ghaerr
Owner

ghaerr commented Nov 8, 2025

@swausd, I went ahead and fixed this in #2448 since other kernel source cleanup/changes were committed. You should be able to perform a git pull and everything should just work.

@cocus
Contributor

cocus commented Nov 9, 2025

Yes, the V25 has separate Tx and Rx clock pins — but for SPI communication, you must tie them together and use the transmit clock (TxC) as the shared SPI clock. That way, both Tx and Rx shift on the same edges, emulating SPI correctly.

So we have full-duplex SPI-style communication, even though:

* TxC is only providing the clock,

* TxD is sending data, and

* RxD is receiving data on the same clock edges.

This might work. In fact, the 80c188 also has two UARTs and one (or both?) can work in this way. Too bad I've selected the UART0 for the main console, because that's probably the only one that supports synchronous. Should check, but please let us know.
The important thing here is that in synchronous mode there's no start/stop bit, so it behaves much like SPI. Bridging the TxC and RxC together should work.

HOWEVER, I'm not sure if the CPU can generate the TxC or if it needs to be provided outside. I think it might just be able to generate it from the main CPU clk.

Please let us know, since this might speed things up a LOT. Common SD cards can operate in 20MHz SPI mode without issues when your traces aren't that long (5cm/2" or less).

@toncho11
Contributor

toncho11 commented Nov 9, 2025

@fhendrikx
I was thinking if Solo86 can benefit from a SD card functionality, but if you have a CF card reader already then one just needs to use CF to SD card adapter. In this adapter you just plug a SD card in a CF wrapper.

@swausd
Contributor Author

swausd commented Nov 9, 2025

but if you have a CF card reader already then one just needs to use CF to SD card adapter. In this adapter you just plug a SD card in a CF wrapper.

Last year I built a system based on a 68000 variant, a 16 MHz 68EZ328 alias Dragonball, and ported CP/M-68K to it. This CPU has a slow built-in SPI interface. I wanted to go faster and added a 16-bit bus-mapped CF interface. This was significantly faster. Then I tried a CF-to-SD adapter on this CF interface, and this was even faster than the SanDisk Ultra II CF cards.

@fhendrikx
Contributor

@fhendrikx I was thinking if Solo86 can benefit from a SD card functionality, but if you have a CF card reader already then one just needs to use CF to SD card adapter. In this adapter you just plug a SD card in a CF wrapper.

As you say, you can use one of those adaptors to use SD cards in a CF slot. TBH I can't remember testing it, but should work just fine.

@swausd
Contributor Author

swausd commented Nov 9, 2025

Hi @ghaerr

You should be able to perform a git pull and everything should just work.

I just did a git pull and a rebuild, and everything just works! Many thanks!
I feel like Marty McFly in the film "Back to the Future". I mention a problem, go to sleep, and when I wake up everything is fixed. Looking at the time stamp, it was even fixed before I mentioned it :-) If I am Marty, you must be Emmett "Doc" Brown commanding the DeLorean time machine.

[Image: ELKS-necv25-4]

@swausd
Contributor Author

swausd commented Nov 9, 2025

@toncho11, @cocus
I understand SD cards use SPI mode 0. According to my understanding of the datasheet the synchronous clock of the V25 would be SPI mode 3 (inversion and also phase shifted mode 0). For a quick first test I inverted the CLK signal in the current bit-banged software driver (but implemented no phase shift). As expected it did not work :-(

Despite @toncho11 finding:

If you find data shifted by one bit, you can invert the clock phase in software by toggling the “edge” bit in the SIO1 mode register.

In my real life CPU there is no edge configuration option. I think I have seen that NEC did an update on the V25 with changes among others to the serial port (added a fifo?). But that doesn't help. My CPUs are of the first variant.

I will check if a hardware inverter for the signal to the SD card CLK can fix this. But I think a flip flop gate is necessary. That would fill up board space, add hardware complexity and the second serial port is lost for ELKS. I then would like a different solution better, for example an ATmega coprocessor for a parallel bus to SPI interface.
I will report back if I have new knowledge.

@toncho11
Contributor

toncho11 commented Nov 9, 2025

@swausd Try to ask ChatGPT. I got:

From the µPD70320 datasheet:

TxC / RxC in synchronous mode behaves like CPOL = 1, CPHA = 1, i.e., SPI mode 3

There is no configuration option to invert edges or shift phase.

That means the V25’s synchronous serial will:

  • Idle high

  • Sample data on rising edges, but relative to a high idle (mode 3), not mode 0

SD cards in SPI mode 0 expect idle low, sample on rising edges.

Possible solutions:

  1. Bit-banged SPI (software clock on GPIO):

You can manually toggle SCK at the exact edges and idle state the SD card expects.

This gives you full control over CPOL/CPHA.

  2. Use a simple edge converter circuit:

Use a logic inverter or XOR gate to produce the correct CPOL/CPHA combination from the V25 clock.

Some people use a D flip-flop to shift phase by half a clock cycle, effectively converting mode 3 → mode 0.

  3. Use SPI mode 3 on SD card (if supported by card):

Some SD cards work with mode 3 for initialization, but not all.

Might work for later reads/writes, but risky for standard initialization sequence.

@swausd
Contributor Author

swausd commented Nov 9, 2025

Hi @toncho11,
thanks for your support! This is what I think:

  1. Bit banging is slow, so I don't think that this approach would be significantly better. And there remains the problem of synchronizing the bit-banged clock with the clock of the serial transmitter (which only uses its own internal clock).
  2. I think an inverter alone does not solve the problem. We have to phase shift or delay the clock too! Flip-flops or shift registers are fine for this, but only if you have a synchronous, continuously running clock - which we don't have. Maybe I am wrong on this?
  3. That might work in special cases. But I don't want to rely on this. And I don't know how to persuade the SD card to switch SPI mode. Maybe a last resort if my current approach does not work.

Currently I am trying to invert and delay the serial Tx clock using five 7404 gates in series to the SD card, plus one 7404 gate to invert this clock back again for feeding the serial Rx. Usually propagation delay is something we don't want. Here it could be the solution. With a very high SPI clock the propagation delay is significant, but so is the signal distortion.

Will report.

@cocus
Contributor

cocus commented Nov 9, 2025

  2. I think an inverter alone does not solve the problem. We have to phase shift or delay the clock too! Flip-flops or shift registers are fine for this, but only if you have a synchronous, continuously running clock - which we don't have. Maybe I am wrong on this?

I didn't dive deep enough to check, but if the phase problem is greater than 1 data bit, what if a 9 bit serial is used and the first (i.e. the first one to depart the CPU) is always a dummy bit?

Otherwise for the idle state, yeah, a simple inverter should work. Not sure if that fixes the phase problem because I didn't check the CPU datasheet. It might be possible to create some additional hardware to make it work.

EDIT: sorry for the 2x post, I was using another account and forgot to switch to my personal one :)

@swausd
Contributor Author

swausd commented Nov 9, 2025

@cocus
synchronous mode on the V25 serial0 port is fixed to 8 bits. There is no config option. On the oscilloscope it looks like in the datasheet.

So there is no simple way to shift the phase by 1/2 clock cycle (180 degrees), which, together with the inversion, is necessary to get from SPI mode 3 to mode 0. That is my understanding, but maybe I am missing something.

@cocus
Contributor

cocus commented Nov 9, 2025

Yeah, I was looking at the datasheet, but I might be missing something. Have a look at the TX:
[Image: datasheet timing diagram for TX]

And also on the RX, there's no picture, but they're saying that the data is sampled at the rising edge of the RxC.
[Image: datasheet excerpt on RX sampling]

As far as I understand, this is Mode 0 (CPOL=0, CPHA=0), not 3 (CPOL=1, CPHA=1) for both.
[Image: SPI mode timing diagram]

The only difference is the idle state of the clock, but the edges are pretty well aligned to mode 0

@swausd
Contributor Author

swausd commented Nov 9, 2025

@cocus
Look at your first picture from the datasheet with the Tx signal. The Tx clock is fed to the SD card and to the Serial0 Rx. If /SCK0 is inverted, the first rising edge comes before D7 is valid. This can't work. SCK0 has to be inverted and delayed by 1/2 clock cycle.

@swausd
Contributor Author

swausd commented Nov 10, 2025

@toncho11 , @cocus ,
I have a sort of working system using the synchronous mode of serial 0. I looked up what Elm Chan has to say about SPI and SD cards. He states that SD cards use SPI mode 0 but nearly all also work with mode 3 (as @toncho11 stated above and I ignored…). So I worked on the software to use hardware SPI and put converting mode 3 to mode 0 on hold.
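
One way to see why mode 3 can stand in for mode 0: in both, data is sampled on the rising clock edge, and only the idle level of the clock differs. A small sketch of the standard CPOL/CPHA mapping follows (the struct and function names are illustrative only):

```c
#include <stdbool.h>

/* Clock polarity and phase for SPI modes 0..3. */
struct spi_mode { int cpol; int cpha; };

static struct spi_mode spi_mode_params(int mode)
{
    struct spi_mode m = { (mode >> 1) & 1, mode & 1 };
    return m;
}

/* Data is sampled on the rising clock edge exactly when CPOL == CPHA:
 * mode 0 samples on the leading (rising) edge of a low-idle clock,
 * mode 3 samples on the trailing (rising) edge of a high-idle clock. */
static bool spi_samples_on_rising_edge(int mode)
{
    struct spi_mode m = spi_mode_params(mode);
    return m.cpol == m.cpha;
}
```

Modes 1 and 2 sample on the falling edge instead, which is why a plain clock inversion (mode 3 to mode 1) does not help.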

Problem I have now is, it works with max. „baud rate setting“. As soon as this baud rate is reduced, the receiver reports wrong values - reproducible. On the oscilloscope everything looks good. Receiver, transmitter and clock all shows the expected values and in sync, Signals look clean too. But the receiver thinks different :-( So there is some work to do.

Speed has doubled compared to best values before. Copy /bin from ROMFS to SD card now is done in 29 sec. Not bad. May be it can be pushed further with some optimisation using inline assembler in the block read/write routines.

I will report…

@toncho11
Contributor

The usual culprits are:

  • sampling edge vs data setup/hold mismatch (CPHA/CPOL or phase shift),

  • clock-source/divider quantization or fractional error (baud generator mismatch), or

  • input filtering/synchroniser timing that behaves differently at different frequencies.

Fixes: verify both ends use the same clock domain, measure exact timing of sample instant vs MOSI, and add a small programmable delay (or hardware phase-shifter) so the receive-sample point sits squarely in the data valid window.
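The "sample point inside the data valid window" check can be written down as a small helper for use with scope measurements. This is only a sketch: the struct, the field names and the idea of expressing everything in nanoseconds from a common reference edge are assumptions for illustration, and the actual setup/hold figures would have to come from the V25 and SD card datasheets.

```c
#include <assert.h>
#include <stdbool.h>

/* All times in nanoseconds, measured from the same reference clock edge. */
struct bit_timing {
    long data_valid_start;  /* when the data line becomes stable */
    long data_valid_end;    /* when it starts changing again */
    long setup;             /* receiver setup time before the sample edge */
    long hold;              /* receiver hold time after the sample edge */
};

/* True if sampling at sample_at respects both setup and hold time. */
static bool sample_point_ok(const struct bit_timing *t, long sample_at)
{
    return sample_at - t->setup >= t->data_valid_start &&
           sample_at + t->hold  <= t->data_valid_end;
}

/* Centre of the usable window, i.e. the point with equal margin on both sides. */
static long best_sample_point(const struct bit_timing *t)
{
    long earliest = t->data_valid_start + t->setup;
    long latest   = t->data_valid_end - t->hold;
    assert(latest >= earliest);   /* otherwise no valid sample point exists */
    return earliest + (latest - earliest) / 2;
}
```

Comparing the measured sample instant against `best_sample_point()` gives the delay a programmable delay line or phase shifter would have to add.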

Alternatives:

| Type | Description | Typical parts |
| --- | --- | --- |
| CPLD (Complex Programmable Logic Device) | A tiny programmable logic chip that can modify, delay, or invert digital signals in real time. Works like a “hardware glue logic box.” | Lattice MachXO2-1200, Xilinx XC9572XL, Lattice iCE40HX1K |
| MCU (Microcontroller) | A small programmable CPU (e.g. AVR, PIC, STM32) that receives data from the V25 on the serial port, performs proper SPI communication with the peripheral, and sends results back. | ATmega328 (Pro Mini), ATtiny85, STM32F0/F1 |

@toncho11
Contributor

                  +-----------------------------------+
                  |        NEC V25 (µPD70320)         |
                  |                                   |
                  |    TxD1 ----->--------------------|--> Serial Data Out
                  |    RxD1 <-----<-------------------|<-- Serial Data In
                  |    TxC1 ----->--------------------|--> Serial Clock Out
                  |    GND ---------------------------|---+
                  |    VCC (5V) ----------------------|---+-------+
                  +-----------------------------------+           |
                                                                   |
                                                                   |
                    Simple 3-wire synchronous serial @5 V          |
                                                                   |
                                                                   v
                 +------------------------------------------------------+
                 |                    ATtiny85 (bridge MCU)             |
                 |                                                      |
                 |  (UART/SYNC interface to V25)                        |
                 |  PB0 (MOSI/DI)  <----  RxD1 (V25 Out)                |
                 |  PB1 (MISO/DO)  ---->  TxD1 (V25 In)                 |
                 |  PB2 (CLK/SCK)  <----> TxC1 (shared clock)           |
                 |                                                      |
                 |  (SPI interface to SD card, 3.3 V)                   |
                 |  PB3 (MOSI)  -------------------------->  SD-DI      |
                 |  PB4 (MISO)  <--------------------------  SD-DO      |
                 |  PB5 (SCK)   -------------------------->  SD-CLK     |
                 |  PB1 (reuse or extra GPIO) ------------>  SD-CS      |
                 |                                                      |
                 |  VCC = 5 V                                           |
                 |  GND = common ground                                 |
                 +------------------------------------------------------+
                                  |   |   |   | 
                                 [LVC level shifters 5 V→3.3 V] 
                                  |   |   |   |
                                 SD-DI SD-CLK SD-CS SD-DO(3.3 V)

| Path | Protocol | Function |
| --- | --- | --- |
| V25 ⇄ ATtiny85 | Simple synchronous serial (custom command protocol) | Commands like “read block,” “write block,” “status” |
| ATtiny85 ⇄ SD | SPI mode 0, 3.3 V | Actual SD transactions |

@ghaerr
Owner

ghaerr commented Nov 10, 2025

Speed has doubled compared to best values before. Copy /bin from ROMFS to SD card now is done in 29 sec. Not bad. May be it can be pushed further with some optimisation using inline assembler in the block read/write routines.

Fantastic progress going from 55 secs to 29 secs using MINIX filesystem and mode 3. Are you still going to use Mode 0 for initialization, as was mentioned above, or is the whole driver now running Mode 3? I'm wondering if there's a way for the init routine to determine whether the SD card supports Mode 3, or whether we should even be concerned about that?

@cocus
Contributor

cocus commented Nov 10, 2025

To be fair, I don't think adding an MCU just to do the phase shift is a good idea. For instance, I remember that some PIC microcontrollers can be used in parallel bus systems as slave devices. Most of them have SPI transceivers. In fact, I bet there are commercial chips that can work in parallel bus systems and are SPI master devices (I've seen I2C ones at least, but they have to exist).
Not sure if adding a simple capacitor delay for the clk might do the trick. I don't think so :(. In any case, there are some other alternatives but all of them require some changes on the core design of the board.

From my point of view, maybe the "high" baud rate works due to the fact that the wiring might add an additional delay that's just "good enough" for the SD. Just speculating, but excellent work!

@toncho11
Contributor

I like the idea of the ATtiny85 because it is a single-chip solution. But indeed it does not have a real UART and SPI. While its SPI approaches the real thing, the serial side must go through a bit-banged UART. It will also have to read each byte on serial and then write it on SPI. The final speed will not be impressive.

Limited throughput:

  • Max speed for software UART is ~19.2 kbps at 8 MHz. Faster causes bit errors.
  • Hardware SPI is faster (~MHz), but software UART is the bottleneck.
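To put the software UART bottleneck into numbers, the payload rate follows from the baud rate and the frame length. The sketch below assumes 8N1 framing (10 bits on the wire per byte) and a 512-byte SD sector; both are assumptions for the example, not figures from the thread.

```c
#include <assert.h>

/* Payload bytes per second on an asynchronous serial link. */
static unsigned long uart_bytes_per_second(unsigned long baud,
                                           unsigned bits_per_frame)
{
    assert(baud > 0 && bits_per_frame > 0);
    return baud / bits_per_frame;
}

/* Milliseconds needed to move nbytes over the link, rounded up. */
static unsigned long uart_transfer_ms(unsigned long nbytes,
                                      unsigned long baud,
                                      unsigned bits_per_frame)
{
    assert(baud > 0 && bits_per_frame > 0);
    unsigned long long bits = (unsigned long long)nbytes * bits_per_frame;
    return (unsigned long)((bits * 1000ULL + baud - 1) / baud);
}
```

With these assumptions, `uart_bytes_per_second(19200, 10)` is 1920 bytes per second and `uart_transfer_ms(512, 19200, 10)` is 267 ms per sector, before any protocol overhead.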

@toncho11
Contributor

toncho11 commented Nov 10, 2025

Or maybe the ATmega328PB in TQFP: it is small enough (7 mm x 7 mm) and has hardware UART and SPI.

@toncho11
Contributor

toncho11 commented Nov 11, 2025

Hmm, what about a kind of mountable driver (similar to Minix/FAT) over the serial port? When you mount remote serial 0 /dev/rs0 then it uses the serial port to connect to a modern host computer. The modern computer exposes its file system through a protocol and a host program written in C, .NET, Java or Python.

This way we get "networking" and it avoids the need for an SD card entirely. You use serial port 1 for communication and serial port 0 for file transfer (as before). It saves you from having to eject the SD card, copy to it and put it back each time you want to add a new file!

Both the sd card and this idea of a serial mountable driver are limited by the same UART speed of the CPU.

@cocus
Contributor

cocus commented Nov 11, 2025 via email

@ghaerr
Owner

ghaerr commented Nov 11, 2025

what about a kind of mountable driver (similar or Minix/FAT) over the serial port?

NFS over SLIP does that. NFS and most all other remote filesystem protocols require UDP (instead of TCP), since it is connectionless and so much quicker than TCP.

Technically that's been done before, with protocols like SLIP. Does ELKS support it?

Aside from the fact that it would be very, very slow, first one would have to write an NFS (or other) filesystem for ELKS from scratch, and secondly, UDP would have to be implemented in the ELKS TCP/IP stack, a huge undertaking in both programming effort and memory usage.

Even moving the "SPI" SD implementation through a UART serial port/driver would end up increasing the number of instructions executed per byte sent from the current 80+ to 500-1000+. So, after it's all done you'd probably be looking at 11+ minutes to transfer /bin to or from another system. I haven't even considered how a lower baud rate would perform, ELKS maxes out at ~9600 baud on early CPUs before getting UART receiver overrun errors due to ELKS interrupt overhead. We do have the CONFIG_FAST_IRQ4 option for bypassing kernel overhead that allows baud rates up to ~38.4K, depending on the UART FIFO size.
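One way to arrive at a figure in that range is to scale a measured copy time by the ratio of instructions per byte. The sketch below assumes the transfer time grows linearly with the per-byte instruction count and takes the 55 s bit-banged copy of /bin mentioned earlier in the thread as the baseline; whether the estimate above was derived this way is an assumption.

```c
#include <assert.h>

/* Scale a measured transfer time by the change in instructions per byte,
 * assuming time is proportional to the per-byte instruction count. */
static double scaled_transfer_seconds(double measured_s,
                                      double instr_per_byte_old,
                                      double instr_per_byte_new)
{
    assert(instr_per_byte_old > 0.0);
    return measured_s * instr_per_byte_new / instr_per_byte_old;
}
```

`scaled_transfer_seconds(55.0, 80.0, 1000.0)` gives 687.5 s, roughly 11.5 minutes, and the 500-instruction case gives 343.75 s, a little under 6 minutes.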

@swausd
Contributor Author

swausd commented Nov 11, 2025

@ghaerr :

Are you still going to use Mode 0 for initialization, as was mentioned above, or is the whole driver now running Mode 3?

Only mode 3 is used. And no slow down for initialization, it runs at the max clock.

I'm wondering if there's a way for the init routine to determine whether the SD card supports Mode 3, or whether we should even be concerned about that?

Don't know how to get this info. Didn't see this mentioned on Elm Chan site.

@swausd
Contributor Author

swausd commented Nov 11, 2025

@toncho11, @cocus :
I like adding mass storage to the V25 MCU with available resources. Any additional processor adds hardware complexity and size. That's why I dropped the idea of interfacing a CF card. It would require adding IO address space to my system (right now the memory address space is filled up 100%) and would require 3.3V level shifting. So more size, hardware complexity and only a small addition in speed because of only an 8 bit interface.

About parallel to SPI: I have seen multiple discussions on the web why there isn't an available chip doing this like for I2C. But there really seems to be none.

Right now the speed of SD cards with my V25 is OK with some potential for optimization.

@toncho11
Contributor

toncho11 commented Nov 11, 2025

1] I am suggesting a driver that works over a pure serial port. No network packet packaging. You define the protocol. You can ask ChatGPT to define the protocol for you: all commands and their parameters. It is suggesting something similar to FTP optimized for serial. Then with some effort you can ask it to write the ELKS driver and the Linux host program.

2] I see that Plan 9 has a protocol called 9P. A reduced set of this protocol can also be used for this task:

[ELKS] <—UART—> [Linux host running u9fs]

You can simply redirect u9fs I/O to /dev/ttyS0.
Your ELKS driver will then send binary 9P messages (like Tversion, Twalk, Tread), and u9fs will respond.
This needs no IP, TCP, or SLIP — pure serial bytes.
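For a feel of what "pure serial bytes" means here, this is a minimal sketch of encoding the first 9P message, Tversion, into a buffer that a driver could then push out over the UART. The wire layout (`size[4] type[1] tag[2] msize[4] version[s]`, little-endian, strings prefixed by a 2-byte length) is the standard 9P2000 format; the function names and the 512-byte `msize` used in the usage note are hypothetical, and no such ELKS driver exists in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define P9_TVERSION 100      /* 9P message type for Tversion */
#define P9_NOTAG    0xFFFFu  /* tag value used for version negotiation */

/* 9P integers are little-endian on the wire. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    put_le16(p, (uint16_t)(v & 0xFFFF));
    put_le16(p + 2, (uint16_t)(v >> 16));
}

/* Build a Tversion message. Returns the total message length,
 * or 0 if it does not fit into the buffer. */
static size_t p9_build_tversion(uint8_t *buf, size_t cap,
                                uint32_t msize, const char *version)
{
    assert(buf != NULL && version != NULL);
    size_t vlen = strlen(version);
    size_t total = 4 + 1 + 2 + 4 + 2 + vlen;   /* size field counts itself */

    if (vlen > 0xFFFF || total > cap)
        return 0;
    put_le32(buf, (uint32_t)total);
    buf[4] = P9_TVERSION;
    put_le16(buf + 5, P9_NOTAG);
    put_le32(buf + 7, msize);
    put_le16(buf + 11, (uint16_t)vlen);
    memcpy(buf + 13, version, vlen);
    return total;
}
```

`p9_build_tversion(buf, sizeof buf, 512, "9P2000")` produces a 19-byte message; the host side would answer with an Rversion in the same framing.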

@swausd
Contributor Author

swausd commented Nov 12, 2025

Hi @ghaerr;
I finished work on the hardware SPI driver for my V25 port. These are the results:

  1. Driver can only use SPI Mode 3 (hardware limit of V25) on serial port 0.
  2. Clock is configurable by setting the prescaler value in the driver code. Default is CPU clock / 2, which is the max clock usable. For my current system that is 5 MHz. With the bit-banged driver it was about 128 kHz.
  3. I introduced two new assembler functions for block read/write and changed ssd-sd.c accordingly.
  4. To be minimally invasive I used a new config variable CONFIG_HW_SPI set in config.h, but you may have a better idea how to do this.
  5. Minix filesystem should be used as this is much faster.
  6. The driver works with all the different SD cards I have (3 different brands, two no-name and a SanDisk compact extreme)
  7. Some benchmarks with minix FS:
    7.1 cp /bin/* /mnt/d1 (60s/17s/12s); times are for (bit-banged, any SD / no-name SD / SanDisk extreme SD)
    7.2 cp /mnt/d1/* /mnt/d2 (100s/24s/21s)
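The benchmark figures translate into speedup factors with a one-line helper; the sketch below is only arithmetic on the numbers listed above, and the function name is made up.

```c
#include <assert.h>

/* Speedup factor of a new measurement relative to a baseline. */
static double speedup(double baseline, double improved)
{
    assert(improved > 0.0);
    return baseline / improved;
}
```

For the first benchmark, `speedup(60, 17)` is about 3.5 and `speedup(60, 12)` is exactly 5; for the second, `speedup(100, 24)` is about 4.2 and `speedup(100, 21)` about 4.8. The SPI clock ratio, `speedup(5000000, 128000)`, is about 39, so the copy times improve far less than the raw clock does, which suggests that per-block overhead outside the SPI transfer itself takes a large share of the time.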

The bit-banged driver should only be used if serial port 0 is not available, and for now it has to be manually selected in the Makefile.
Please look at the commit on my elks git repository and tell me what changes are necessary for a pull request.

@ghaerr
Owner

ghaerr commented Nov 12, 2025

Hello @swausd,

I finished work on the hardware SPI driver for my V25 port.
Please look at the commit on my elks git repository

Fantastic work @swausd, as usual! Thank you! You've done an amazing job showing 5x speedup using CONFIG_HW_SPI and your new hardware SPI driver. I like how CONFIG_HW_SPI now allows either software or hardware SPI to be used.

I would suggest the following minor changes from your current repo, then yes, please post a PR:

  • Add ifdef CONFIG_HW_SPI in drivers/block/Makefile to select the SPI driver.
  • Move CONFIG_HW_SPI out of config.h into necv25.config for greater flexibility.
  • The #ifdef CONFIG_HW_SPI around spi_read_block/spi_write_block in ssd-sd.c is OK for now, but even better might be to define a new spi_read_block wrapper function in ssd-sd.c for the ifndef CONFIG_HW_SPI case which handles the calls to spi_receive/transmit for the software driver. Then there's less ifdefs and spi_read_block/spi_write_block remains the upper level function to send/receive data. That also allows you to remove the ifdef CONFIG_HW_SPI in spi-8018x.h. (And for good measure rename spi-8018x.h to spi.h).
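The wrapper idea from the last bullet could look roughly like the sketch below. The names `spi_read_block`, `spi_write_block`, `spi_receive` and `spi_transmit` come from the discussion, but the signatures are assumed, and the byte-level routines are replaced by trivial stand-ins so the example compiles on its own; the real ones live in the bit-banged driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the byte-level routines of the software driver.
 * Signatures are assumptions; a counter plays the role of the SD card. */
static uint8_t fake_card_byte;
static uint8_t spi_receive(void)    { return fake_card_byte++; }
static void spi_transmit(uint8_t b) { fake_card_byte = b; }

#ifndef CONFIG_HW_SPI
/* Software fallback: same names as the hardware block routines, so
 * ssd-sd.c can call spi_read_block/spi_write_block without ifdefs. */
static void spi_read_block(uint8_t *buf, size_t count)
{
    assert(buf != NULL);
    while (count--)
        *buf++ = spi_receive();
}

static void spi_write_block(const uint8_t *buf, size_t count)
{
    assert(buf != NULL);
    while (count--)
        spi_transmit(*buf++);
}
#endif
```

With this shape, the hardware build supplies the assembler versions of the two block functions and the software build supplies these loops, so the only remaining conditional is the one that picks which definition gets compiled.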

Everything else looks great, thank you!
