Conversation
|
Nice. Please switch it to ON by default. I will try to test it. |
|
It is ON now by default.
|
|
Ah, yes. Sorry. |
|
Everything seems to work with the latest build except for the missing cfa* entries in /dev. Once I rebooted with root=cfa1 and disabled=hda I was not able to mount any /dev/cf* because there are none. It is strange because at boot /dev/cf* seems to be present. The system boots normally and prints /dev/cfa1 for root, but when you look in /dev it is not there. The /dev/hda* entries are there and I have 43 entries in /dev. Maybe I reached the max for /dev? Maybe the /dev/hda* entries are not renamed to /dev/cfa* when disabled=hda? |
|
I use a minix image this time and xtide=3. |
|
Let me check why there is not a /dev/cfa* on MINIX filesystems. Strange, because that's what I test with. It could be that |
Yes - that's the problem, the new /dev/cf* entries had not been added to |
|
So I ran tests: copying the bin folder 2 times on a MINIX filesystem with time cba1: 4m.16s and 4m.17s. I restarted between runs to avoid any caching. |
|
Chatgpt suggested these optimizations: |
Wow, the speed difference sure is big. Although XUB is extremely tightly-coded ASM language, we should be able to do better than twice as slow though. It will be hard to replicate on my side since I can't actually emulate XUB, but I'll try to think up some ideas as to why.
Well - no. XUB definitely does 16-bit transfers for XTIDE cards, but XTCF is purposely built for 8-bit I/O, so I don't think that can be the reason. (It might be worth testing with your XTIDE v2 card and setting xtide=2 to both test that portion of the ATA CF driver as well as speed sometime though, since that I/O mechanism is 16-bit transfers, which was the purpose of the "high-speed" XTIDE v2 mod).
Both of these would be great - except that the INSB/OUTSB (and INSW/OUTSW) instructions are not present on the 8088. They were all added in the 80186. We use a loop like the following for the recently-introduced fast I/O: That is, three instructions per 8-bit read. That could be optimized further by doing two I/O instructions per loop using something like: I'll try that. QEMU is so fast on my box that it's very hard to tell the difference in speeds, but differences should show when lots of data is being copied.
While not using ASM for the polling, I don't think these are the issue since they only occur once per sector read, whereas the I/O itself occurs 512x per sector. It seems to me only the latter has the capability to decrease the total throughput by half. I'll look into both though. Thanks for your testing and suggestions. |
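The two loop shapes discussed above (one port read per iteration versus two) can be sketched in portable C. This is only an illustration, not the driver's actual GCC inline ASM: `read_data_port()` is a hypothetical stand-in for the 8-bit IN instruction, and both function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CF data port. A real driver would
 * execute an 8-bit IN instruction here; this fake returns 0, 1, 2, ... */
static uint8_t fake_port_next;

static uint8_t read_data_port(void)
{
    return fake_port_next++;
}

/* One port read per loop iteration: read, store, loop. */
static void read_sector_bytewise(uint8_t *buf, size_t count)
{
    while (count--)
        *buf++ = read_data_port();
}

/* Two port reads per loop iteration, so the loop overhead is paid
 * half as often. count must be even (a 512-byte sector is). */
static void read_sector_unrolled2(uint8_t *buf, size_t count)
{
    assert(count % 2 == 0);
    for (size_t i = 0; i < count; i += 2) {
        buf[i] = read_data_port();
        buf[i + 1] = read_data_port();
    }
}
```

Both functions fill the buffer identically; whether the unrolled form is measurably faster on a real 8088 is something only hardware or a cycle-accurate emulator could show.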
Was this test made reading from, or writing to, the CF card? This could make a big difference. The current driver design waits up to 50ms after each CF read for the CF to become "not busy". This was part of the original design and could be slowing things down a lot. On writing, the CF driver waits for the card to signal not busy as well, since according to spec it could potentially take seconds for some sectors to write to flash. So this wait is not as easily removed. It sounds like a potential redesign of the driver wait mechanism might make sense, at least on the read side, if your testing was CF reads only. |
|
It was: |
|
📚 Intel Manual Confirmation (8086 Programmer's Reference Manual) Opcode: 6C Available on: 8086, 8088, 80186, 80188 ✅ Conclusion They are safe to use for your XT-IDE driver on real 8086 hardware. |
|
So it should be: |
|
Chatgpt rewrite: |
|
Are you using chatGPT for "Intel Manual Confirmation (8086 Programmer's Reference Manual)"?? Did you actually read the manual, or are you still believing chatGPT?
Sorry, but not true. Here's the original Intel application note on the 80186 showing the new instructions: None of the emulators nor disassemblers I've written for 8086 include opcode 6C, it's part of a well defined "undefined" section of opcodes in the 8086. The speed issue is somewhere else, I wish it were as easy as just adding INSB/OUTSB. |
In that case, we've likely got the worst of three worlds: possibly slow I/O using single IN or OUT byte instruction per loop; wait after read, and wait after write. Can you perform the test(s) FD/HD -> CF or CF -> FD/HD separately? That might help figure where the issue(s) are. I suspect they are in all three areas. I can change the driver for speed, at the potential risk of the third issue above: not knowing when a write fails to the CF card. What do you think? |
|
Sorry. I won't be able to test for a while.
No problem, I almost believed it for a moment, but had to set the record straight.
I'm looking at the XUB source and am seeing ways this can be sped up, which I'll add. Thanks for your testing, I'll devise some ways to test over here and hopefully improve the speed. |
|
I think we can start by adding a FASTCF (for fast CF cards) or ATACF (CF only) flag that reduces the write-confirmation wait in ata_wait to a few ms for CF cards. ChatGPT says that a write should typically take between 2 and 7 ms including the confirmation, so we could check every 3 ms. Why not set the wait time dynamically? Maybe measure the response time at boot and adjust the wait. Or collect statistics on the first write and then adjust for all future writes. |
|
Also ELKS reported boot time is 2 times slower with cfa compared to hda. |
|
I was thinking of the character that spins in the upper-right corner of the screen on all reads/writes. I think the tight loop in ata_wait() and the console might interfere. Also, it seems over-polling can degrade performance on some CF cards. We can:
|
Wow, that's a big difference. I'm looking into this, I don't think things like the spinning cursor have much to do with a 2x difference in speed between our ATA CF driver and XUB through our BIOS driver (the cursor spins during both anyways). However, our BIOS driver uses a cylinder track cache which could be producing a big difference - it certainly does on floppies. Both that mechanism and XUB use multi-sector reads which might also change the timing. For now, it's quite a lot of work to pull out the BIOS track caching, and the ELKS block drivers will use 2-sector read/writes on MINIX, and larger full-cylinder multi-sector reads with floppy track caching. We need to prove that track caching is the bottleneck before implementing it, but the BIOS driver doesn't use track caching for hard drives, just 2-sector read/writes. So there's lots going on under the hood that could contribute to this. Also, setting the number of EXT buffers larger (or smaller) than 64k could also make a difference. Lots of tuning that'd need to be done on real hardware if it matters.
I've finally hacked up a version of PCem that will emulate both XTIDE v1 as well as (for the moment) XT CF, so that I can actually test w/o real hardware. BTW, the ATA CF XTIDE works, so no big rush to test your real hardware. This setup will allow me to test enhancements to the insb and outsb macros for XT CF to try to speed it up. I'm not really sure if that's a bottleneck or not. I just tested using the FASTIO vs not for a 500 sector
We don't want more delays in waiting. Remember, the waiting is only performed while it has to: when the device is signaling that it is not ready. We can't decrease that wait time, and the max wait times are set by the ATA spec. If your device is faster, then the waits will be smaller. There is an unneeded wait after reading a sector that will be removed, and I found that we're not actually checking for a write error in the wait after the write. Both will be fixed.
We check as fast as possible; waiting longer to check doesn't help any because there's nothing else for the system to do when locked in a read/write loop using |
|
I see that there are a thousand aspects. So a write, read or wait can not be interrupted by the kernel for let's say console output? Still I think the console output might be slowing the whole IO. The CPU must execute it after all. |
Yes, the I/O wait is interrupted by the kernel via the hardware interrupt, which occurs every 10ms. Every 80ms, the spinner is updated (that's just less than 1/10 second). Very little time, really. Remember, our BIOS driver going through XUB does the same thing (updating the spinner), so the spinner isn't different between the two, but there's apparently a 2x speed difference in booting anyways.
Yes, it might slow it down by 1/10 second overall. No way does it slow down from 9 seconds to 18. Can you remind me what the boot timing numbers were for ATA CF vs XUB? I can compare them to the PCem emulator which is currently booting from ATA CF in 6.64 secs. |
|
On my Amstrad 1640 I got 3.5 seconds and 7 seconds for cfa and hda. I was suggesting to avoid over-polling to prevent CF card bus saturation by adding a small delay (small nop loop) between the bsy register checks. I think it is worth checking. |
|
Do not blame me too much. I am too tired this evening :). Chatgpt is suggesting this: Benefits:
|
|
The BIOS might be able to better handle both console and int 13h. It is aware of both. |
|
Based on some indirect evidence I think the read is more problematic. When I use the dd command towards the floppy I noticed that it was slower than on DOS. If the floppy speed is the same, then it must be the cfa read, |
None of my emulators will emulate bus saturation. If you really think this would make a difference, go ahead and add it and test on your hardware. I haven't read anything about a delay required when polling the status register, but who knows? I can also add it if you want to test again without compiling.
The ATA CF driver is essentially the same as what ChatGPT is suggesting (with all the other listed benefits), with the exception of the bus delay. If you can have ChatGPT dig up a link as to how excessive polling slows down CF cards, I'd definitely like to read about that.
You mean that hda is 3.5 and cfa is 2x as slow - 7? |
Yes. |
I see that an error is reported only after bsy is cleared, so the above code is not even valid. I thought that if we could detect an error early on, we could save some time before bsy is cleared. But no. Anyway, this would only matter if there were many errors. |
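The ordering described here (poll until BSY clears, and only then look at the error bit) can be sketched as below. This is a simplified illustration and not the ELKS `ata_wait` code: the function name, return codes, the poll-count timeout (a real driver would more likely bound the wait by time), and the simulated device are all assumptions made for the example. Only the BSY (0x80) and ERR (0x01) status bits come from the ATA register definition.

```c
#include <stdint.h>

#define ATA_STATUS_BSY 0x80   /* device busy; other bits not valid yet */
#define ATA_STATUS_ERR 0x01   /* error; meaningful only once BSY is clear */

/* Hypothetical status source. A real driver would read the ATA
 * status register with a port IN instruction. */
typedef uint8_t (*status_reader)(void *ctx);

/* Poll until BSY clears, then check ERR.
 * Returns 0 on success, -1 on device error, -2 on timeout. */
static int wait_not_busy(status_reader read_status, void *ctx,
                         unsigned max_polls)
{
    while (max_polls--) {
        uint8_t status = read_status(ctx);
        if (status & ATA_STATUS_BSY)
            continue;               /* still busy: ERR cannot be trusted */
        return (status & ATA_STATUS_ERR) ? -1 : 0;
    }
    return -2;                      /* BSY never cleared within the budget */
}

/* Simulated device: reports busy for busy_polls reads, then final_status. */
struct fake_device {
    unsigned busy_polls;
    uint8_t final_status;
};

static uint8_t fake_status(void *ctx)
{
    struct fake_device *dev = ctx;
    if (dev->busy_polls) {
        dev->busy_polls--;
        return ATA_STATUS_BSY;
    }
    return dev->final_status;
}
```

With this shape a fast card leaves the loop as soon as it drops BSY, so the maximum wait only matters for a slow or failing device.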
A (somewhat crude) idea would be to video the boot screen from HDA, then from CFA. Since one is 3.5 seconds slower than the other, we might be able to tell what is taking all the extra time literally from watching the video. That's quite a bit of time difference and one would think we should be able to notice something... I'll keep playing with PCem to see if I can get anywhere near 2x changes, but so far nothing. |
Holy heck! PCem is duplicating the speed issue, 2.25 seconds with root=hda1, 6.64 seconds with root=cfa1. XUB is being emulated as well through BIOS hda1. It seems the boot is approximately the same until the filesystem is mounted via VFS. Then, our cfa1 driver is taking much longer than hda1 via XUB. I should be able to figure out somehow where the problem is. |

Uses GCC ASM extension for fast I/O in ATA CF driver.
Currently turned on using FASTIO=1 in ata.c. Tested on QEMU but not real hardware. ATA CF I/O should be as fast as possible now. IDE query code also updated to use insw() macro.
Getting the GCC ASM constraints correct took a little doing on this one, given ia16-elf-gcc doesn't seem to exactly match GCC documentation.