ADF write corruption #32
Comments
I noticed something similar recently. I copied real floppies using the internal floppy disk drive to a Gotek connected as the second/external drive on an A500. I used the Workbench 1.3 diskcopy utility from the command line. Everything seemed to work fine until I tried to use the copies from the disks. I get disk corrupted errors (I do not remember the exact error message at the moment, need to check) for many of them but not all. I have programming skills in C, an oscilloscope, a logic analyzer and a ST_LINK/V2-1 debugger/programmer from a Nucleo-64 board. The only thing I lack is time to really dive into it. But I hope this is better during the Christmas holidays. Happy to get your advice @keirf where to start and how to approach the issue. Anyway thanks for the cool project and providing it for free! |
Always worth trying another USB stick. bstrobel: First port of call is to make a debug build and then you get serial logging. This will tell you, for example, whether any writes were missed (which, if writing to an ADF, will generally mean your flash drive is very slow on some writes). |
Ok, will try this. Will take some time since I have to set up the tool chain first. I'll come back once I have it up and running. |
Sorry, that I need to ask, but how to make a debug build? I have set up the dev env and have also executed a "make dist" successfully. But how do I tell make to build a debug version of FlashFloppy? |
debug=y make -j8 gotek You can adjust the baud rate at the top of src/console.c (default is 3Mbaud) |
Here is the log file of FlashFloppy 0.95a while copying a file that resulted in checksum errors. FlashFloppy compiled with Aside from the normal "Write x/y" lines, I noticed these:
An attempt to read the copied file (it's a lha file so I'm just doing a "checksum error on disk block 980". |
Can you attach the damaged ADF? There is nothing obvious in the log. All the Write x/y lines appear to come in groups of 11 (as expected), and there are no warnings about bad headers or missed writes. This might therefore indicate that writes are getting into Gotek RAM okay but writeback to USB is going wrong... |
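The "groups of 11" above correspond to the 11 sectors of an AmigaDOS double-density track. A minimal sketch of how a block number from an error message maps to a position on the disk and in the ADF file, assuming the standard 80-cylinder, 2-head, 11-sector, 512-byte geometry (the function name `locate_block` is made up for illustration and is not part of FlashFloppy):

```python
# Standard Amiga double-density geometry (880 KiB ADF image).
SECTORS_PER_TRACK = 11
BYTES_PER_SECTOR = 512
HEADS = 2
CYLINDERS = 80


def locate_block(block: int) -> dict:
    """Map an AmigaDOS block number to its disk position and ADF byte offset."""
    total = CYLINDERS * HEADS * SECTORS_PER_TRACK
    if not 0 <= block < total:
        raise ValueError(f"block {block} outside 0..{total - 1}")
    # ADF stores tracks in order: cyl 0 head 0, cyl 0 head 1, cyl 1 head 0, ...
    track, sector = divmod(block, SECTORS_PER_TRACK)
    cylinder, head = divmod(track, HEADS)
    return {
        "cylinder": cylinder,
        "head": head,
        "sector": sector,
        "adf_offset": block * BYTES_PER_SECTOR,
    }


if __name__ == "__main__":
    print(locate_block(980))
```

For block 980 from the error message this gives cylinder 44, head 1, sector 1, at byte offset 501760 (0x7A800) in the ADF, which is where to look in a hex editor. |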
Here it is: |
Okay, so I extracted the test file from the adf ignoring the checksum errors, hexdumped it and took a Then I noticed that the errors consist of either:
|
If it's not a hardware problem (stick, Gotek) then another plausible explanation is a bug in the USB stack, perhaps sending the same packet twice (USB full-speed packets are 64 bytes). There's a data FIFO in the STM32 chip's USB hardware, and I wonder what happens if that doesn't get filled quickly enough as data is sent. You could try raising the priority of the USB IRQ -- it is USB_IRQ_PRI at the bottom of inc/util.h and you could try changing the value from 14 to 5. |
I'll try that and report back, thanks. |
I was installing MSDOS622 using 0.9.5a today, and I had several read errors. |
Raised the USB IRQ priority to 5, didn't seem to help. Also I compared two ADFs after doing the same operation on both (copy testfile df0:). Both ADFs were corrupted at random different places, so this is not deterministic at the ADF level. However, I can see a pattern: duplications always happen at offsets ending in 0xC0.
|
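The "same packet sent twice" theory can be checked mechanically rather than by eye. A rough sketch, assuming you have a known-good copy of the file and the corrupted one extracted from the ADF; it reports every 64-byte-aligned offset where the corrupted data differs from the original and is identical to its own preceding 64 bytes. The helper name and the synthetic demo data are made up for illustration:

```python
PACKET = 64  # USB full-speed packet size, as mentioned in the thread


def find_duplicated_packets(good: bytes, bad: bytes, packet: int = PACKET):
    """Offsets where `bad` differs from `good` and repeats its own previous packet."""
    hits = []
    limit = min(len(good), len(bad))
    for off in range(packet, limit - packet + 1, packet):
        chunk = bad[off:off + packet]
        if chunk != good[off:off + packet] and chunk == bad[off - packet:off]:
            hits.append(off)
    return hits


if __name__ == "__main__":
    # Synthetic demo: duplicate the packet at 0x80 into 0xC0.
    good = bytes(i % 251 for i in range(512))
    bad = bytearray(good)
    bad[0xC0:0x100] = bad[0x80:0xC0]
    for off in find_duplicated_packets(good, bytes(bad)):
        print(f"duplicated packet at 0x{off:X}")
```

With real files, read both with `open(path, "rb").read()` and check whether the reported offsets all end in 0xC0. |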
Wouldn't particularly expect the corruption to be deterministic, this sort of bug often isn't. You haven't said whether you've tried other USB sticks or Gotek hardware, yet? |
I might have solved this problem and it was NOT related to FF. I reformatted the Amiga hard drive (actually a 2 GB CF) and installed WB 2.1 but this time I used scsi.device 43.45 + PFS3 filesystem instead of the default scsi.device + FFS. I'm using Kickstart 37.300 which is known to cause problems with HD sizes >40 MB so it might make sense that the corruption actually originated in the HD reads. I'm cautious because bad HD reads don't explain ADF FFS metadata corruption very well (CRC errors when reading back from the ADF). But so far I've tested writing to 8 ADF images on both a Sandisk Cruzer Edge 16 GB (4-5 MB/s write speed) and a Kingston Datatraveler G4 16 GB (~16 MB/s write speed) and they all worked fine whereas I would get 100% chance of corruption with my previous config. |
I wonder if there are any viruses that can cause that sort of behaviour? The definitive test would be to revert your setup, reproduce the errors on FlashFloppy, then test the same configuration with a real floppy drive. But that sounds like quite a pain! Do you want to test some more before closing this ticket down? |
You did of course set maxtransfer to 0x1fe00 |
@tomse Yes, I used maxtransfer=0x1fe00 in both scenarios. I think switching from kickstart's scsi.device to 43.45 was the key. @keirf Well there were two more reports but you can count mine as a no-bug. The FFS metadata corruption is plausible considering that there is no memory protection and a misbehaved scsi.device could affect anywhere in RAM. |
Also, I'm attaching the FF binary up to commit 09e5b85 with debug enabled, serial bitrate = 230400 and USB IRQ priority = 5 to help people test without going through installing the ARM toolchain and all. |
bstrobel seems to have gone quiet. I will probably close this ticket down next week if there are no further reports. |
Quiet yes, but I have not lost interest in this topic. Anyway, I had some problems with my Amiga that I needed to solve first. That's why I couldn't investigate further. The connector of the PSU was corroded and causing unstable power supply. It took me some time to find this out. The Amiga seems to be stable now. I will test tomorrow. I think the PSU problem could also be a possible cause of the corrupted files. I'll let you know. |
No good news. Each file is corrupt it seems. Test procedure:
Result:
The 5 ADF files and screenshots showing the comparison results can be found here: |
I just noticed that I can easily attach files and pictures to these messages. So for convenience here are the screenshots: How I copied the files on the Amiga: WinMerge displaying the differences: My hex editor showing one occurrence of a difference. Note the strange data pattern there... Zip containing the 5 ADF files that should be identical: |
One more thing I forgot to mention. I used AmigaDOS diskcopy in verify mode. So it verifies each track after it copied it to the destination disk. This verification always succeeds. It never shows an error. I wonder how this can be. Does your software cache the data somehow? I would expect that it always reads the data from the file from the stick if it is asked to... If this is the case it would more point to a problem on the Amiga, but how can this be? |
And finally for today the debug log. I have built the version 0.9.5a with debug enabled and baudrate set to 115200. I have renamed the version to 0.9.5a-d and used the .upd file to update the firmware of the Gotek. I think you're right that the USB stick might be too slow, because I see this in the log: Here is the full log. It covers a whole disk copy: I have only 3 USB sticks (don't need them so often any more these days). They are all of the same type: "Kingston Data Travel 2.0 16GByte" USB 2.0 Sticks. They work well in my oscilloscope, for OS installs on PCs etc. |
Stick was too slow, try another. I will improve performance with slower sticks in a future release. If you can find one these days, I like Lexar. The reason slow sticks are a problem for disk copiers (except ones that do write+verify track by track) is that there is no flow control: disk copier moves immediately to next track and starts writing to the target, while USB stick is still writing previous track, and while that's happening FlashFloppy can't buffer that next track meanwhile (I will fix that in future). And your Diskcopy verify works because it only verifies that it sees valid amigados track data, it doesn't check that the track data is correct (ie the result of the diskcopy)! |
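The missing flow control described above can be illustrated with a toy model. This is a sketch only, not how FlashFloppy is implemented: it assumes the copier delivers one complete track every fixed period, that write-back to the stick is serialized, and that a track arriving when all RAM buffers are occupied is simply lost. All timings are invented.

```python
def simulate(track_period_ms, flush_ms, buffers):
    """Return indices of tracks dropped by a device with `buffers` track buffers.

    Track i arrives at i * track_period_ms and needs flush_ms[i] to be
    written back to the USB stick. Write-backs run one at a time.
    """
    pending = []   # finish times of tracks still held in RAM (flushing or queued)
    dropped = []
    for i, cost in enumerate(flush_ms):
        now = i * track_period_ms
        pending = [t for t in pending if t > now]   # finished flushes free a buffer
        if len(pending) >= buffers:
            dropped.append(i)                       # nowhere to put this track
            continue
        start = max(now, pending[-1]) if pending else now
        pending.append(start + cost)
    return dropped


if __name__ == "__main__":
    # One slow write (500 ms) among otherwise fast ones (50 ms).
    flush = [50, 50, 500, 50, 50, 50]
    for n in (1, 2, 3):
        print(n, "buffer(s): dropped tracks", simulate(200, flush, n))
```

In this model a single stall on the stick costs two tracks with one buffer, one track with two buffers and none with three, which is the intuition behind buffering or pipelining writes. It also shows that buffering only hides stalls up to the amount of RAM available. |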
Yes, I think you're right. But... As I wrote above, the disk copy software almost always reads back immediately what it has written to verify the success (not only on the Amiga). I think it is a bug that the disk copy software seems to be happy but the data has actually not been written to the stick. The only explanation for this to me is that FlashFloppy returns the just written data from its internal memory and does not actually read it from the stick. To illustrate the severity just look at my use case. These sticks were not the cheapest and are not very old. They are not fast but sufficient for occasional data transfer. (And Kingston is even something like a "brand".) So I as a user would assume they work fine. How can I know that they are too slow for FlashFloppy? How can I be sure that the data has been successfully written even if I use different sticks? There needs to be consistent behavior in this case and definitely a notification to the user. Not sure how to fix this. Didn't look at your code. But I think if it is not easily achievable to always read the data from the stick I would at least display an error message on the display or blink the LEDs or something. I'd suggest you reopen this case... |
Oh sorry, I just noticed that you already answered that diskcopy thing. Yes, you're right, they most probably don't compare the read-back data to the data they wrote. Didn't think of this. Anyway, you need to notify the user. |
For example try with XCopy with verify enabled. You will find your copies are reliable. Regarding "notify the user" you mean indicate via display when writes are dropped? |
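The difference between the two kinds of verify discussed here can be shown in a few lines. This is a simplified sketch: a "track" is just a payload plus a checksum, and `zlib.crc32` stands in for the real AmigaDOS sector checksums, which use a different algorithm. The function names are hypothetical.

```python
import zlib


def make_track(payload: bytes) -> bytes:
    """Build a toy track: payload followed by a 4-byte checksum."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def is_valid(track: bytes) -> bool:
    """Validity-only verify: does the stored checksum match the payload?"""
    payload, stored = track[:-4], track[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored


def matches_source(track: bytes, source_payload: bytes) -> bool:
    """Compare verify: is the payload what the copier meant to write?"""
    return track[:-4] == source_payload


if __name__ == "__main__":
    old_track = make_track(b"stale data already in the image")
    new_payload = b"data the copier tried to write"
    readback = old_track          # the write was dropped, the old track remains
    print("valid track data:", is_valid(readback))
    print("matches source:  ", matches_source(readback, new_payload))
```

A dropped write leaves the previous, internally consistent track in place, so a validity-only check passes while a compare against the source fails. |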
Yes, exactly. For Amiga I would even "eject" the floppy image. That would notify the user of the Amiga that the copy has failed. You know, I copied about 20 floppies to images in a row only to find out afterwards that I cannot trust a single one of the image copies. |
Well AmigaDOS diskcopy is a bit crap to be fair. Still... it should at least be a config option to error on missed write. I might add: And make either warn or error the default... |
Yes, I agree about Amiga diskcopy. :) I guess I had to learn it the hard way since I have had the Amiga for only a few months. Aside from that I wonder if other applications are always better than the Amiga. For instance I just can't imagine that a keyboard/synthesizer from the 80's or 90's has a better copy program (my Gotek was originally shipped and packaged for fitting into a synthesizer). I still think ejecting the image - which looks like ejecting the floppy to the device - is the best way to show something is wrong. The image is unusable anyway. On the OLED you could just display something like "Stick too slow, image ejected". On the 7 seg display "SL" for "too SLow" maybe? |
@keirf Are you thinking of double-buffering the track? I can try to implement that and send a PR over for you to review. No guarantees at all since I'm probably overoptimistic about my understanding of the code. |
@jamarju if you can decipher my code and implement that then you are a coding ninja :) |
I'm thinking of setting |
Hi again! So I ordered new USB sticks from Amazon: SanDisk Ultra Flair 16GB USB-Flash-Laufwerk USB 3.0 mit bis zu 130 MB/Sek. I also used X-Copy 6.0.1 for copying, but: Here is the debug log: Anyway, I also did some tests and this is what I noticed:
To sum up I doubt the "slow USB stick" theory seriously. I think there are some other things wrong. What do you think? |
@bstrobel looking at the log you are certainly correct there is something else up. Actually X-Copy glitches the WGATE signal at the start of track writes for some reason. I added a workaround for that back in August but I think I broke that workaround not long after! Please can you try the patch on branch wgate-glitch-filter (i.e., "git checkout wgate-glitch-filter")? Hopefully it will work better. Really you should not see missed writes in X-Copy Verify mode. Let's drill into this further if you still do. You may for example need to put a logic analyser on the floppy interface (especially the WGATE line). |
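The idea of filtering WGATE glitches can be tried offline on a logic analyser capture. The following is a sketch under stated assumptions and is not the code on the wgate-glitch-filter branch: the input is a list of (time in microseconds, level) transitions for an active-low signal, and the minimum pulse width is an arbitrary, hypothetical threshold.

```python
def filter_glitches(edges, min_us):
    """Keep only asserted (low) pulses that last at least `min_us`.

    edges: list of (time_us, level) transitions, level 0 = asserted.
    Returns a list of (start_us, end_us) tuples.
    """
    pulses = []
    start = None
    for t, level in edges:
        if level == 0 and start is None:
            start = t                      # falling edge: pulse begins
        elif level == 1 and start is not None:
            if t - start >= min_us:        # long enough to be a real write
                pulses.append((start, t))
            start = None                   # rising edge: pulse ends
    return pulses


if __name__ == "__main__":
    # A 2 us glitch followed by a roughly 200 ms track write.
    edges = [(0, 0), (2, 1), (100, 0), (200_100, 1)]
    print(filter_glitches(edges, min_us=10))
```

Running the same function with `min_us=0` shows every pulse including the glitches, which makes it easy to see how many short pulses a given copier produces. |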
Thanks @keirf! I tested the wgate-glitch-filter branch. Same result. At least as far as I can tell from the log. Here it is: I noticed one strange thing. Where the log stops there is no "missed write" message but actually X-Copy is reporting "Verify Fehler". A lot of "missed write" errors were actually logged before when X-Copy didn't report an error. So it seems "missed write" messages and actual verify errors in X-Copy are not (directly) connected. I'm happy to invest some time into this. No problem! Starting today I'm 3 days at home with lots of time for this. I will connect the logic analyser to the Gotek and try to make some sense of the signals. Do you have a good link where I can find diagrams for the expected data flow? |
@bstrobel I would expect to see approx 400ms or so spent on each track during write. 200ms with WGATE asserted and activity on WDATA. Then 200ms with RDATA active as the Amiga reads the track back to verify.. The "missed write messages" seem to occur at the end of your track writes. Perhaps on your system X-Copy is glitching the WGATE line at the end of a track write? A logic analyser trace would determine what is going on there. The log shows track 54 being written twice. That's weird. It either means X-Copy deliberately made two attempts on that track (unlikely you would get a verify fail on two consecutive writes though, would expect fails to be randomly distributed across tracks). Or perhaps a head-step signal was missed. Worth getting a logic analyser trace for the STEP and SIDE signals too. You don't have a speaker mod on the STEP line I assume? I am around tomorrow, doing some other jobs including an A2000 repair, but will have time to advise you if you want to communicate on this ticket or via email (my email is in the Git repo commits). I may also have time to try X-Copy myself and see if I can repro the issue. |
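The 200 ms figures follow from the nominal 300 rpm rotation speed of a 3.5-inch double-density drive: one revolution to write a track and one more to read it back. A small sketch of that arithmetic, with a helper for checking measured pulse widths from a capture; the 15% tolerance is an arbitrary assumption:

```python
def revolution_ms(rpm: float = 300.0) -> float:
    """Time for one disk revolution in milliseconds."""
    return 60_000.0 / rpm


def write_verify_ms(rpm: float = 300.0, passes: int = 2) -> float:
    """One write pass plus one verify pass per track by default."""
    return passes * revolution_ms(rpm)


def looks_like_track_pass(pulse_ms: float, rpm: float = 300.0,
                          tolerance: float = 0.15) -> bool:
    """True if a measured pulse is within `tolerance` of one revolution."""
    expected = revolution_ms(rpm)
    return abs(pulse_ms - expected) <= tolerance * expected


if __name__ == "__main__":
    print("one revolution:", revolution_ms(), "ms")
    print("write + verify:", write_verify_ms(), "ms per track")
    print("205 ms pulse plausible:", looks_like_track_pass(205.0))
```

Pulses far shorter than one revolution, such as microsecond-scale glitches, fail this check and are worth a closer look in the trace. |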
It appears I will not have as much time as I thought today. Anyway I'm going to connect the LA and set everything up now until I have to leave for some other stuff. I found the Amiga Service Manual provides all information that is needed. I downloaded it from here http://amiga-manuals.xiik.net/amiga.php. I use the Gotek as an external floppy drive. The external connector and logic board is from Ebay. Not sure I can trust it fully. I have another external "real" 3.5" floppy drive as well. I'll play around a bit and see if I can find something noticeable. By end of tomorrow I should be able to post a few screenshots from the LA. I will also check the signal quality directly at the Gotek connector with my oscilloscope. Maybe there are some unwanted glitches on the signal lines. I'll keep you posted. |
I was able to spend some time on this. I connected the LA and used X-Copy 6.01 to do a DOSCOPY+. It failed after the second sector: As you can see WGATE is active (low) for 2 periods of about 200 msec each with a very short break. I assume this is the first sector. That one is verified OK by X-Copy. For the second sector WGATE is also active for two 200 msec periods but with a break of about 200 msec in between. May this be the issue? Nothing happens after what you can see in the screenshot. I triggered on DS0 because this is what the drive sees (although it is DS2 in fact.) This is the full capture: You can download the LA software from the Saleae website and drill into it yourself. But I guess you know this already. :) |
Thanks I use Saleae myself so this is useful. SIDE signal would have been more interesting to sample than, say, TRACK00. Then we can tell when it is trying to read/write same track back-to-back, vs switching side and doing other track on same cylinder (if you see what I mean). I will also see about trying this myself. I know last time I tested X-Copy it was a different version (X-Copy (1993)(Cachet).adf). Can you provide me a link to exactly which 1992 version you tested (eg. a link on this site: http://jope.fi/xcopy/masterdisk.html). It is possible that different versions behave differently, also I may have tested DOSCOPY rather than DOSCOPY+, but not sure that should make much difference... |
I will do another trace with SIDE signal instead of TRACK00. I thought with TRACK00 we could reference with the STEP and DIRECTION signal which track is actually written. But we are stuck on track 0 already, so it doesn't make sense. |
Here is the sample: Regarding the X-Copy version, I use a disk that I got with my Amiga. When I boot from it I get this screen: Then I press F6 for version 6.01. |
Discussed this with @bstrobel offline and it seems there are possibly two issues:
|
FYI I will work on pipelined/buffered writes after I release v0.9.7a next week. With luck it will be in v0.9.8a. |
Thanks Keir! Great news!
|
Please can you try the following patched firmware. Ignore the version number (it is v0.9.7a with patches on top). Pipelines writes. Should fix your issues. Alternatively you can build tip of current master (this contains the same patches). |
Thanks Keirf! I tried your ZIP. Unfortunately it doesn't work. It always fails on the track 0, side 1: I've also run the LA once with DOSCOPY and once with DOSCOPY+. Both recordings are in this zip file: |
Oh dear. Can you get a serial log output? I may also add more serial logging for you. Perhaps we can continue this by email (my email is in my Github profile page)? I should add I was unable to reproduce this back-to-back writes behaviour, but perhaps it is something to do with using multiple/external drives (I tested DF0-to-DF0 only). The LA output is weird in some respects, eg READY does not follow SEL0. Should be permanently asserted (ie LOW) while the Gotek is selected.... Perhaps READY is actually SIDE? That would make more sense... A further note: RDATA is inactive now for the second back-to-back write, which indicates pipelining is somewhat working for you. It seems to read both sides of DF1, then write both sides of DF2, then verify side 0, then verify side 1 (and fail), then retry write-verify-write-verify of side 1 before failing. Very unhappy on that side 1.... needs serial logging. |
Closing this as @bstrobel issue is not related to the new write pipelining patches, and we suspect hardware problems. If further analysis tells us different then we will open a new issue ticket specific to that. |
Just to finally confirm the hardware issue. Maybe this can also help others that use external floppy adapters from kmtech.co.uk which are available on ebay. This adapter uses a 4 NAND 74HC00 instead of the required open collector version 74LS38. This leads to conflicts between the external floppies on the RDY line which obviously caused this problem. I swapped the 74HC00 with a SN7403N (another open collector version of the 7400). Now I can copy from my 3.5" Floppy (DF1:, 2nd drive on the Amiga) to my Gotek (DF2:, 3rd drive on the Amiga) without problems. Thanks again, Keir, for the great work and the patience with me. :) PS: The other issue was, the input lines of the unused gates and flipflops of the 74HCs are not connected at all. This is just plain bad design and can cause random problems. |
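Why a push-pull gate causes trouble on a shared line such as RDY can be sketched with a very simplified logic model. It assumes a pulled-up, active-low bus: open-collector outputs either pull the line low or release it, while push-pull outputs always drive it, either low or high. Real contention produces an intermediate voltage that depends on drive strengths, which this model only labels as "contention".

```python
def bus_level(drivers):
    """Resolve a pulled-up shared line.

    drivers: list of (kind, wants_low) with kind "oc" (open collector)
    or "pp" (push-pull). Returns 0, 1 or "contention".
    """
    pulls_low = any(wants_low for _, wants_low in drivers)
    drives_high = any(kind == "pp" and not wants_low
                      for kind, wants_low in drivers)
    if pulls_low and drives_high:
        return "contention"      # one output sinks while another sources
    return 0 if pulls_low else 1


if __name__ == "__main__":
    # Two open-collector drives: the selected one asserts RDY cleanly.
    print(bus_level([("oc", True), ("oc", False)]))
    # One drive behind a push-pull gate holds the line high against the other.
    print(bus_level([("oc", True), ("pp", False)]))
```

With open-collector parts on every drive, an idle drive simply lets go of the line, so only the selected drive determines its level. |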
Ah yes I will bear this in mind for future user problems. In fact I have seen these problems in the KMTech design myself, years back, and informed Kevin Mount but never received any response. So I cannot recommend any of his products. |
I'm getting sector checksum errors when writing to an ADF image. Sorry for being so vague, I will try some way to reproduce it consistently. So far it's happened while saving a civilization game but also just copying files over from Workbench. If there is useful data from the firmware out to the serial port I can try and dump that too. Also THANKS for building this, this software is a godsend (therefore you must be a god).