Bitbanged SD-Card interface for the NEC V25 port #2445
While preparing the SD-Card interface for the NEC V25 port I noticed that a //-style comment at the end of a #define leads to problems in inline assembler. So the include files and dependent source files had to be reformatted.
Reformatting because of //-style comments. Setup of "struct serial_info" for the default 9600 baud rate. This prevents artefacts on the first baud rate switch from 9600 baud to 9600 baud again.
Two possible clock frequencies are documented in the comment for CONFIG_NECV25_FCPU. Both options lead to dividers for minimal baud rate errors.
These are the changed files to integrate the SD-Card interface for the NEC V25 Port into ELKS.
This file contains the bitbanged SPI interface, using the four highest bits of Port 2, which is integrated into the NEC V25. Assembler opcodes specific to the NEC V20/V25 (SET1 and CLR1) are used to set and clear single bits. Using these atomic instructions eliminates the need for CLI/STI to block interrupts.
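For readers unfamiliar with bit-banging, here is a minimal sketch of a one-byte SPI transfer. It is not the driver from this PR: the pin masks, the `sim_port` variable (a plain variable standing in for the Port 2 register), the loopback in `miso_read()` and the choice of SPI mode 0 are all assumptions made for illustration. On the V25 the `pin_set()`/`pin_clr()` helpers would each map to a single SET1/CLR1 instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pin masks on the four highest bits of the port. */
#define PIN_CS   0x10u
#define PIN_SCK  0x20u
#define PIN_MOSI 0x40u
#define PIN_MISO 0x80u

/* Plain variable standing in for the V25 Port 2 register (assumption). */
static uint8_t sim_port;

/* On the V25 each of these would be a single SET1/CLR1 instruction. */
static void pin_set(uint8_t mask) { sim_port = (uint8_t)(sim_port | mask); }
static void pin_clr(uint8_t mask) { sim_port = (uint8_t)(sim_port & ~mask); }

/* Simulated wiring for this sketch: MOSI is looped back to MISO. */
static int miso_read(void) { return (sim_port & PIN_MOSI) ? 1 : 0; }

/* Shift one byte out MSB first and return the byte shifted in (SPI mode 0). */
uint8_t spi_transfer(uint8_t out)
{
    uint8_t in = 0;

    assert((sim_port & PIN_SCK) == 0);      /* mode 0: clock idles low */
    for (int bit = 0; bit < 8; bit++) {
        if (out & 0x80u)
            pin_set(PIN_MOSI);
        else
            pin_clr(PIN_MOSI);
        out = (uint8_t)(out << 1);
        pin_set(PIN_SCK);                   /* rising edge: sample the input */
        in = (uint8_t)((in << 1) | miso_read());
        pin_clr(PIN_SCK);                   /* falling edge: output may change */
    }
    return in;
}
```

With the loopback, `spi_transfer(0xA5)` returns `0xA5`. Every bit costs several port accesses plus loop overhead, which is why a software implementation like this is slow compared to a hardware shift register.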
|
Well, @swausd - this is pretty dang impressive!! Even though you're saying it's slow, it's very cool. I'm thinking at some point I'll rename CONFIG_BLK_DEV_SSD_SD8018X to CONFIG_BLK_DEV_SSD_SPI to be less confusing, then use the architecture to select the specific SPI driver file(s). But this is fine for now. I notice you've changed CONFIG_NECV25_FCPU from 1.4MHz to 2.2MHz... is this what you want? I thought you said the system couldn't handle the overclocking? Also, are you running out of ROMFS space using CONFIG_APP_xxx defines trying to find the right set of applications? I'm wondering whether we should perhaps use a new tag in Applications instead for this architecture? What specific applications are you trying to add/subtract in the limited ROMFS filesystem, and how big exactly is it? I'll wait for your answer on CONFIG_NECV25_FCPU before committing. Thank you! |
|
Also, any ideas why a retry is required on going to IDLE at startup? Do you suppose the hardware requires a bit more time after initialization, or is this perhaps required for all commands? I'm not familiar with SPI or this driver (yet) so I'm just wondering out loud... |
|
Also, FAT support in ELKS is quite slow compared to Minix. Just switching to Minix might give you a 20-40% boost in I/O. |
|
Hi @ghaerr , About ROMFS: I selected the apps that fit into the space I have and that are most likely to be used after boot. Everything else can go to the SD card. I hope this is a good match for now. About the retry: I tried giving the SD card much more time after the go idle command, but that doesn't help. One retry fixes it. Maybe this is specific to my cheap 2GB SD cards.
@toncho11 , thanks for the advice. I have not tried the Minix fs yet. Right now, I do not know how to create and use such a filesystem with Ubuntu.
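The single retry on the go idle command described above can be sketched as a bounded retry loop. This is only an illustration, not the code in ssd-sd.c: `sd_go_idle()`, the `sd_cmd0_fn` callback and the `fake_cmd0()` card model are hypothetical names. The one fact taken from the SD specification is that a card answers CMD0 with an R1 response of 0x01 once it is in the idle state.

```c
#include <assert.h>
#include <stdint.h>

#define SD_R1_IDLE 0x01u    /* R1 response: card is in idle state */

/* Hypothetical callback that sends CMD0 and returns the R1 response byte. */
typedef uint8_t (*sd_cmd0_fn)(void);

/* Try CMD0 up to max_tries times.
 * Returns the attempt number that succeeded, or -1 if none did. */
int sd_go_idle(sd_cmd0_fn send_cmd0, int max_tries)
{
    assert(send_cmd0 != 0 && max_tries > 0);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (send_cmd0() == SD_R1_IDLE)
            return attempt;
    }
    return -1;
}

/* Fake card for this sketch: no answer (0xFF) on the first CMD0, idle on the second. */
static int fake_calls;
static uint8_t fake_cmd0(void)
{
    return (++fake_calls == 1) ? 0xFFu : SD_R1_IDLE;
}
```

With this fake card, `sd_go_idle(fake_cmd0, 2)` succeeds on the second attempt, matching the behaviour reported for the 2GB cards. Returning the attempt number makes it easy to log how many tries a given card needs.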
We have an |
Have you measured the clock of your SPI? I've got 200-and-something kHz on my 20MHz SBC
|
@swausd You use the first serial interface SIO0 (I suppose) for communicating with your Linux computer through USB. But there is a second one SIO1. Can this serial SIO1 (in synchronous master mode) be used to communicate with the SD card? In this case speed can reach ~0.5–1 MHz clock. For example: You must handle level shifting on at least the signals going from the V25 to the SD card (MOSI, SCK, CS). Here are some potential functions to be used with a new such driver: Clock polarity and timing The V25 synchronous serial in master mode uses a fixed clock polarity (rising edge sample). I suppose you already did the voltage leveling, but still:
|
|
Hi @toncho11,
I reread the V25 datasheet this morning, but there are still some questions left unanswered. For example, there are different clock lines for Rx and Tx. You mentioned the sample timing and polarity of the clock. I understand that this cannot be changed in the V25 by configuration. Maybe external hardware has to be added. I will look into this after trying different file systems. My problem is that I only have the V25 datasheet. Application notes or examples for the V25 are very sparse on the internet.
Yes, I use one of the cheap interface boards usually used with Arduino. This one has a micro SD holder, a 3.3V regulator and a level shifter on board. I should also try swapping the cheap 2GB SD card for a newer SanDisk card. Still a lot to do. But that is the way we want it, isn't it? If all is fixed, there is nothing left to do. My sons always ask me: Why are you doing all this? Do you have a use case? No, I don't have a use case. I do it just for fun :-) |
|
Yes, in DOS, when doing dir you can often have a very long pause until the free blocks are calculated etc. That is why I use small partitions of 30MB. A similar problem can exist in ELKS; I was already thinking of suggesting that in my previous post. Minix 20 MB is the best. In the end, fast I/O is not needed that much, because programs are small to load and you usually have everything written to the SD card by your host computer. Speed will be needed when we have a swap file, for example. Then it becomes critical. |
|
@ghaerr , @toncho11 @cocus , I then tested the time to write my boot ROMFS, 426 blocks, to a newly formatted, empty SD card using "cp -v /bin/* /mnt", with the following results: I also checked different SD cards (cheap old 2GB no-name vs new 64 GB SanDisk). The difference is not really noticeable. So next I will check if and how I can use the 2nd serial port, serial0, for SPI. This port has a special synchronous mode. |
I knew it was slower, but didn't think it was THAT much slower!! I think the main reason is that the FAT filesystem driver is pretty much a 512-byte sector-oriented FS hacked on top of a Linux 1K block-oriented buffered I/O system, along with LFN directory searches being inefficient. I've been meaning to implement some kind of kernel profiling so that we can learn more exactly where bottlenecks might be. Also I think on very slow systems like the 8088 and V25, (buffer) data copying and the buffered I/O system starts becoming a major portion of kernel time spent doing things, compared to other kernel activity. Changing the number of EXT buffers allocated can help, but can't overcome slow memory copy speeds. For instance, in a buffered I/O system, the kernel will always perform I/O into a system buffer, then keep that buffer while copying the data to the application, rather than performing the I/O directly to an application buffer. When the filesystem data is spread out and buffer count small, this slows things down considerably. |
|
@ghaerr Should this work or do I have to use a special version? |
This should work. Looking at the disk_utils/fsck.c source code: It appears that Should this be the case, it points to a possible problem in the /dev/ssd driver, when reading multiple sectors. The upper layer of the SSD driver will convert this into two sector reads, if I'm not mistaken. You might add printk's to the SSD driver to trace return values, or easier possibly just run After learning a bit more, I can continue to help debug what is happening. |
|
Thanks for the quick answer. Will look into this tomorrow. |
|
@swausd And you use this https://hackspark.fr/en/electronics/932-micro-sd-card-reader-module-spi-interfaces-with-level-converter-chip-arduino-compatible.html for both level shifting and SD card reader? |
|
@swausd Here is one possible configuration. I consulted with AI. The SPI protocol uses a single clock line (SCK), shared between master and slave. So — when you adapt the V25’s SIO1 to talk to an SD card via SPI — you must make the transmit and receive clocks the same. That means: Use the TxC (transmit clock output) as the SPI SCK line to the SD card. Tie the RxC pin to the same line, so receive shifts occur on the same clock edges. Hardware connection summary: You make the V25 synchronous master, so TxC1 provides the clock, and RxC1 sees the same clock edges for incoming bits. Configuration summary (SIO registers):
This way, transmit and receive operations are fully synchronized like SPI. Yes, the V25 has separate Tx and Rx clock pins — but for SPI communication, you must tie them together and use the transmit clock (TxC) as the shared SPI clock. That way, both Tx and Rx shift on the same edges, emulating SPI correctly. So we have full-duplex SPI-style communication, even though:
|
Yes! |
|
@toncho11 , many thanks for the input!
Sounds plausible! I will look into this, after I have checked the fsck problem (see above) |
Also I've just noticed that replacing debug_blk with printk in ssd.c and debug with printk in ssd-sd.c (and some printk's in sd_read) will give lots of info without too much work, you might start with that, although a ROMFS re-flash will be required. |
|
Hi @ghaerr , int ret = read(IN, super_block_buffer, BLOCK_SIZE); This is what I get on boot: ELKS Setup START ELKS 0.9.0-dev [2.18 secs] login: root
SSD: open I could not find out which function is called by fsck read(IN, super_block_buffer, BLOCK_SIZE) Any suggestion for further research? |
|
Ok @swausd, I can see you're getting as addicted to this stuff as I am. So |
|
All right - I've figured it out: all block device drivers are required to set the block device's i_size field to the number of bytes contained on the (possibly dynamically changed) device. This is done in ssd.c using In your case, this is set from the ssd_init() function into ssd_num_sects from the result of the ssddev_init() sub driver function call. Also in your case, but a bug in general, your mounted SD card happens to have a huge number of sectors, which (shown from your debug output) is 2013265920K. Thus I think what is happening is that this large 32-bit value is being shifted left by 9 (to convert sectors to bytes) and ends up setting i_size to zero. Thus, the Another kernel bug smashed! Try testing this by forcing a much smaller return value from ssddev_init(). Let me think about a final fix; the value will have to be clamped in ssd.c such that a left shift << 9 won't overflow an unsigned long. Thank you! |
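The overflow described above is easy to reproduce in isolation. The sketch below is not the fix that went into ssd.c: the function names are made up, and `uint32_t` stands in for the 32-bit `unsigned long` of the ELKS kernel. It also assumes that the figure 2013265920 from the debug output is the sector count.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9                              /* 512-byte sectors */
#define MAX_SECTORS  (UINT32_MAX >> SECTOR_SHIFT)   /* largest count that survives << 9 */

static_assert(MAX_SECTORS == 0x7FFFFFu, "clamp limit for 32-bit byte sizes");

/* Naive conversion: the high bits are silently shifted out of a 32-bit value. */
uint32_t sectors_to_bytes_wrapping(uint32_t sectors)
{
    return (uint32_t)(sectors << SECTOR_SHIFT);
}

/* Clamped conversion: limit the sector count first so the shift cannot overflow. */
uint32_t sectors_to_bytes_clamped(uint32_t sectors)
{
    if (sectors > MAX_SECTORS)
        sectors = MAX_SECTORS;
    return (uint32_t)(sectors << SECTOR_SHIFT);
}
```

2013265920 is 0x78000000, so shifting it left by 9 leaves all-zero low 32 bits: `sectors_to_bytes_wrapping(2013265920u)` returns 0, which matches the zero i_size seen by fsck. The clamped version returns 0xFFFFFE00 instead, i.e. just under 4 GiB.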
|
Booting from ROMFS after power on. And there was no read from ssd-sd.c |
This might work. In fact, the 80c188 also has two UARTs and one (or both?) can work in this way. Too bad I've selected the UART0 for the main console, because that's probably the only one that supports synchronous. Should check, but please let us know. HOWEVER, I'm not sure if the CPU can generate the TxC or if it needs to be provided outside. I think it might just be able to generate it from the main CPU clk. Please let us know, since this might speed things up a LOT. Common SD cards can operate in 20MHz SPI mode without issues when your traces aren't that long (5cm/2" or less). |
|
@fhendrikx |
Last year I built a system based on a 68000 variant, a 16 MHz 68EZ328 alias Dragonball, and ported CP/M-68K to it. This CPU has a slow built-in SPI interface. I wanted to go faster and added a 16-bit bus-mapped CF interface. This was significantly faster. Then I tried a CF to SD adapter on this CF interface, and this was even faster than the SanDisk Ultra II CF cards. |
As you say, you can use one of those adaptors to use SD cards in a CF slot. TBH I can't remember testing it, but should work just fine. |
|
Hi @ghaerr
I just did a git pull and a rebuild and everything just works! Many thanks!
|
|
@toncho11, @cocus Despite @toncho11 finding:
In my real life CPU there is no I will check if a hardware inverter for the signal to the SD card CLK can fix this. But I think a flip flop gate is necessary. That would fill up board space, add hardware complexity and the second serial port is lost for ELKS. I then would like a different solution better, for example an ATmega coprocessor for a parallel bus to SPI interface. |
|
@swausd Try to ask ChatGPT. I got: From the µPD70320 datasheet: TxC / RxC in synchronous mode behaves like CPOL = 1, CPHA = 1, i.e., SPI mode 3 There is no configuration option to invert edges or shift phase. That means the V25’s synchronous serial will:
SD cards in SPI mode 0 expect idle low, sample on rising edges. Possible solutions:
You can manually toggle SCK at the exact edges and idle state the SD card expects. This gives you full control over CPOL/CPHA.
Use a logic inverter or XOR gate to produce the correct CPOL/CPHA combination from the V25 clock. Some people use a D flip-flop to shift phase by half a clock cycle, effectively converting mode 3 → mode 0.
Some SD cards work with mode 3 for initialization, but not all. Might work for later reads/writes, but risky for standard initialization sequence. |
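As a reference for the mode discussion above, the four standard SPI modes can be decoded with a few lines of C. This is a generic sketch of the usual CPOL/CPHA convention, not anything taken from the V25 datasheet; the function names are invented for illustration.

```c
#include <assert.h>

/* Standard numbering: mode = (CPOL << 1) | CPHA. */
struct spi_mode {
    int cpol;   /* clock idle level: 0 = low, 1 = high */
    int cpha;   /* 0 = sample on leading edge, 1 = sample on trailing edge */
};

struct spi_mode spi_mode_bits(int mode)
{
    struct spi_mode m;

    assert(mode >= 0 && mode <= 3);
    m.cpol = (mode >> 1) & 1;
    m.cpha = mode & 1;
    return m;
}

/* Level of SCK while the bus is idle. */
int spi_idle_level(int mode)
{
    return spi_mode_bits(mode).cpol;
}

/* Nonzero if data is sampled on the rising clock edge.
 * Leading edge rises when CPOL = 0, trailing edge rises when CPOL = 1,
 * so the sample edge is rising exactly when CPOL equals CPHA. */
int spi_samples_on_rising(int mode)
{
    struct spi_mode m = spi_mode_bits(mode);
    return m.cpol == m.cpha;
}
```

Modes 0 and 3 both sample on the rising edge; they differ in the idle level of the clock and in the edge on which the data changes. That may explain why some cards tolerate mode 3, though it should not be read as a guarantee for every card.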
|
Hi @toncho11,
Currently I try to invert and delay the serial Tx clock using five 7404 gates in series to the SD card + one 7404 gate to invert this clock back again for feeding to the serial Rx. Usually propagation delay is something we don't want. Here it could be the solution. With very high SPI clock the propagation delay is significant, but signal distortion as well. Will report. |
I didn't dive deep enough to check, but if the phase problem is greater than 1 data bit, what if a 9 bit serial is used and the first (i.e. the first one to depart the CPU) is always a dummy bit? Otherwise for the idle state, yeah, a simple inverter should work. Not sure if that fixes the phase problem because I didn't check the CPU datasheet. It might be possible to create some additional hardware to make it work. EDIT: sorry for the 2x post, I was using another account and forgot to switch to my personal one :) |
|
@cocus So there is no simple option to phase shift by 1/2 clock cycle / 180 degrees, which (together with the inversion) is necessary to get from SPI mode 0 to mode 3. That is my understanding, but maybe I am missing something. |
|
@cocus |
|
@toncho11 , @cocus , The problem I have now is that it works with the max. "baud rate setting". As soon as this baud rate is reduced, the receiver reports wrong values, reproducibly. On the oscilloscope everything looks good. Receiver, transmitter and clock all show the expected values and are in sync, and the signals look clean too. But the receiver thinks differently :-( So there is some work to do. Speed has doubled compared to the best values before. Copying /bin from ROMFS to the SD card now takes 29 sec. Not bad. Maybe it can be pushed further with some optimisation using inline assembler in the block read/write routines. I will report… |
|
The usual culprits are:
Fixes: verify both ends use the same clock domain, measure exact timing of sample instant vs MOSI, and add a small programmable delay (or hardware phase-shifter) so the receive-sample point sits squarely in the data valid window. Alternatives:
|
|
Fantastic progress going from 55 secs to 29 secs using MINIX filesystem and mode 3. Are you still going to use Mode 0 for initialization, as was mentioned above, or is the whole driver now running Mode 3? I'm wondering if there's a way for the init routine to determine whether the SD card supports Mode 3, or whether we should even be concerned about that? |
|
To be fair, I don't think adding a MCU just to do the phase shift is a good idea. For instance, I remember that some PIC microcontrollers can be used in parallel bus systems as slave devices. Most of them have SPI transceivers. In fact, I bet there are commercial chips that can work in parallel bus systems and are SPI master devices (I've seen I2C ones at least, but they have to exist). From my point of view, maybe the "high" baud rate works due to the fact that the wiring might add an additional delay that's just "good enough" for the SD. Just speculating. but excellent work! |
|
I like the idea of the ATtiny85 because it is a single-chip solution. But indeed it does not have a real UART or SPI. While its SPI approaches the real thing, the serial side must go through bitbanged serial. It would also have to read each byte on serial and then write it on SPI. The final speed will not be impressive. Limited throughput:
|
|
Or maybe ATmega328PB TQFP - it is both small enough 7mm x 7mm and has hardware UART and SPI. |
|
Hmm, what about a kind of mountable driver (similar to Minix/FAT) over the serial port? When you mount remote serial 0 /dev/rs0, it uses the serial port to connect to a modern host computer. The modern computer exposes its file system through a protocol and a host program written in C, .NET, Java or Python. This way we get "networking" and it avoids the need for the SD card entirely. You use serial port 1 for communication and serial port 0 for file transfer (as before). It saves you from having to eject the SD card, copy to it and put it back each time you want to add a new file! Both the SD card and this idea of a serial mountable driver are limited by the same UART speed of the CPU. |
|
Technically that's been done before, with protocols like SLIP. Does ELKS support it?
|
NFS over SLIP does that. NFS and most all other remote filesystem protocols require UDP (instead of TCP), since it is connectionless and so much quicker than TCP.
Aside from the fact that it would be very, very slow, one would first have to write an NFS (or other) filesystem for ELKS from scratch, and secondly, UDP would have to be implemented in the ELKS TCP/IP stack, a huge undertaking in both programming effort and memory usage. Even moving the "SPI" SD implementation through a UART serial port/driver would end up increasing the number of instructions executed per byte sent from the current 80+ to 500-1000+. So, after it's all done you'd probably be looking at 11+ minutes to transfer /bin to or from another system. I haven't even considered how a lower baud rate would perform; ELKS maxes out at ~9600 baud on early CPUs before getting UART receiver overrun errors due to ELKS interrupt overhead. We do have the CONFIG_FAST_IRQ4 option for bypassing kernel overhead that allows baud rates up to ~38.4K, depending on the UART FIFO size. |
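To put the baud rates mentioned above in perspective, the pure wire time of an asynchronous serial transfer can be estimated with a one-line calculation. This is a back-of-the-envelope sketch, not a measurement: it assumes 8N1 framing (10 bits on the wire per byte), ignores all protocol and CPU overhead, and assumes that the 426 ROMFS blocks mentioned earlier in the thread are 1 KiB each.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE_8N1 10u   /* start bit + 8 data bits + stop bit */

/* Whole seconds of wire time needed to move `bytes` at `baud` bits per second. */
uint32_t serial_transfer_seconds(uint32_t bytes, uint32_t baud)
{
    assert(baud > 0);
    return (uint32_t)(((uint64_t)bytes * BITS_PER_BYTE_8N1) / baud);
}
```

Under these assumptions, `serial_transfer_seconds(426u * 1024u, 9600u)` gives 454 seconds (about 7.5 minutes) and the same call with 38400 baud gives 113 seconds. Any real implementation would be slower than this lower bound, which is consistent in direction with the estimate given above and far from the 29 seconds reported for the SD card.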
|
@ghaerr :
Only mode 3 is used. And no slow down for initialization, it runs at the max clock.
Don't know how to get this info. Didn't see this mentioned on Elm Chan site. |
|
@toncho11, @cocus : About parallel to SPI: I have seen multiple discussions on the web why there isn't an available chip doing this like for I2C. But there really seems to be none. Right now the speed of SD cards with my V25 is OK with some potential for optimization. |
|
1] I am suggesting a driver that works over a pure serial port. No network packet packaging. You define the protocol. You can ask ChatGPT to define the protocol for you, with all commands and their parameters. It suggests something similar to FTP, optimized for serial. Then, with some effort, you can ask it to write the ELKS driver and the Linux host program. 2] I see that Plan 9 has a protocol called 9P. A reduced set of this protocol can also be used for this task: [ELKS] <—UART—> [Linux host running u9fs] You can simply redirect u9fs I/O to /dev/ttyS0. |
|
Hi @ghaerr;
The bitbanged driver should only be used if serial port 0 is not available, and for now it has to be selected manually in the Makefile. |
|
Hello @swausd,
Fantastic work @swausd, as usual! Thank you! You've done an amazing job showing 5x speedup using CONFIG_HW_SPI and your new hardware SPI driver. I like how CONFIG_HW_SPI now allows either software or hardware SPI to be used. I would suggest the following minor changes from your current repo, then yes, please post a PR:
Everything else looks great, thank you! |




Implementation of a bitbanged SPI interface using 4 pins on a parallel port of the NEC V25. The port is based on the SSD-SD block driver and the 8018x SPI interface created by @cocus. This interface is rather slow (it feels like a floppy disk), as every transferred byte has to go through about 80 assembler instructions.
Some existing files of the NEC V25 port had to be reformatted.
Picture of console output with 2GB SD card, FAT16, no partition:
