Problem with mass storage in Netusbee #70

Closed
Kroll2017 opened this Issue Sep 25, 2017 · 111 comments

7 participants

Kroll2017 commented Sep 25, 2017

Yesterday I was able to use mass storage with the NetUSBee on my Falcon with the CT63 and CTPCI too.
I have MiNT 1-18-release and the 1-19 build from 24.09; after running loader.prg from the USB directory, I get this message:

pid 39 (loader): invalid chip ID 0400

I also found older versions that are no longer developed, such as 1-18-1 from 8.02.2016, and with that one I can read the mass storage directory without any trouble.

Collaborator

atic-atac commented Sep 25, 2017

I don't think the USB code has ever worked for NetUSBee.

Kroll2017 commented Sep 25, 2017

It is true; I made a screenshot under 1-18-1, as I wrote in my first post.
Drive q:\ is the directory for the mass storage.
[Screenshot: snap_001]

The system is an Atari Falcon 060 with a Radeon card.

Collaborator

atic-atac commented Sep 25, 2017

Well, that's good news if it's now working. Maybe it's because of other fixes to interrupt handling.

So what is the problem with ...

pid 39 (loader): invalid chip ID 0400

?

Member

DavidGZ commented Sep 26, 2017

The driver did work, but not reliably: transfers got stuck randomly. You might be lucky and transfer some files, but sooner or later it failed.

The message you're getting now means the driver failed to read the chip identification register, which is one of the first things the driver does. The latest commits to the driver code should be checked to see whether anything related to the read and write timings (delays) was changed; I don't remember.
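
For context on that message, here is a minimal sketch of the kind of probe check that produces it. It assumes an ISP116x-style host controller whose chip-ID register reads back with 0x61 in the high byte; the mask 0xff00 and magic value 0x6100 are borrowed from the Linux isp116x driver and are assumptions here, not values quoted from the FreeMiNT sources. `chip_id_ok()` is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed values, borrowed from the Linux isp116x driver; they are not
   quoted from the FreeMiNT NetUSBee sources. */
#define HCCHIPID_MASK  0xff00u
#define HCCHIPID_MAGIC 0x6100u

/* Hypothetical helper: returns 1 if the value read back from the
   chip-ID register looks like an ISP116x, otherwise logs it in the
   same format as the reported message and returns 0. */
static int chip_id_ok(uint16_t id)
{
    if ((id & HCCHIPID_MASK) != HCCHIPID_MAGIC) {
        printf("invalid chip ID %04x\n", (unsigned)id);
        return 0;
    }
    return 1;
}
```

Under these assumptions a read of 0x0400 fails the check whatever its cause; a mistimed register access returning garbage is one plausible way to get such a value, which fits the suggestion to look at timing changes.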

Kroll2017 commented Sep 26, 2017

Thank you very much for the explanation that it unfortunately only works randomly.
I just wanted to point out that this is MiNT version 1-18-1, and that recent builds do not work.
I hope it will be improved in the near future, and not only for the 060 mode.

Member

mikrosk commented Sep 26, 2017

@Kroll2017 just to be on the safe side, can you first try the build from Alan's website: http://www.freemint.org/builds/freemint (trunk)? If that doesn't work, let's try the oldest available build: https://github.com/freemint/freemint.github.io/blob/ff1825c33d46e238fb817640f2ca797101b00382/builds/freemint/master/freemint-1-19-f21-020.zip ... and if that doesn't work either, I'll try to prepare a few builds for you.

Kroll2017 commented Sep 27, 2017

@mikrosk
Today I tested it; the photos show the results.
I ran all tests on a Falcon 060 with a CTPCI and a Radeon card, with the NetUSBee mass storage connected.

First, I installed the trunk-26 build, and it does not work; it behaves just like before (see photo).
[Photo: photo_02]

Then I downloaded freemint-1-19-f21-020.zip. It has a USB directory; I ran loader.prg, and it looks like the driver is installed, but no additional drive appears. Is this the driver for the NetUSBee?

[Photo: photo_01]

There is also a freemint-1-19-f21-usb4tos.zip file. I ran usb.prg and storage.prg; the drivers were installed, but the result was the same as above: no additional drive.

[Photo: photo_04]

Please let me know if something is wrong, and I will test it.

Member

DavidGZ commented Sep 27, 2017

The last two screenshots show that you were loading the EHCI driver, which is for the Milan. It looks like there is no NetUSBee driver in those directories.

Kroll2017 commented Sep 27, 2017

I think so, but the USB directory in freemint-1-19-f21-020.zip contains nothing else (see picture), and the freemint-1-19-f21-usb4tos.zip directory contains the UNICORN drivers.

Member

mikrosk commented Sep 27, 2017

Well, if nothing else, this bug report has revealed that netusbee.ucd was overlooked in the build script. That is now fixed. What I don't follow is this: how did you test the latest kernel if no netusbee.ucd was included?

Unless @DavidGZ has some magic suggestion about what to look for, I'll try to find my script which was able to build a kernel for each commit, and build something to test for all the commits to the 1.18 release.

Kroll2017 commented Sep 28, 2017

@mikrosk, maybe I don't understand you, but in my previous post, the first photo (the build from 26 September) shows that netusbee.ucd is included.

Member

mikrosk commented Sep 28, 2017

In that case I assume you used the build from freemint.org, right? I'll look into this after your networking issue.

Kroll2017 commented Sep 28, 2017

Yes, exactly, of course you're right. I've always downloaded new builds from freemint.org.

Member

mikrosk commented Oct 4, 2017

Thanks to @Kroll2017's heroic patience we have pinpointed not only the offending commit but also a hint for the reason.

Everything broke, rather surprisingly, in 1adba0e (just changing the default CPU for storage.udd and netusbee.ucd to 68000!).

./loader.prg before this change:
[Screenshot: usb_good]

./loader.prg after this change:
[Screenshot: usb_bad]

See the SIGFPEs? Now I'm waiting for confirmation that it's really caused only by this: storage.udd gets compiled for 060 in newer kernels, so that one is off the list, but netusbee.ucd is still built for 000 by default, even in the latest kernel.

Collaborator

atic-atac commented Oct 4, 2017

All this needs building for all CPUs. NetUSBee plugs into the ROM port so it can work on plain STs too.

I suspect it's down to mixing of CPU code, whereas the entire build really needs fixing to build for one CPU type.

Member

mikrosk commented Oct 4, 2017

Still, it shouldn't be a problem. We don't use floats or anything related. Does the kernel module somehow depend on the stack frame format?

Collaborator

atic-atac commented Oct 4, 2017

I don't know, but if someone has the time to dig in. 👍

Member

vinriviere commented Oct 4, 2017

(just changing the default CPU for storage.udd and netusbee.ucd to 68000!)

Any change which is not tested should be considered buggy by default. Never assume that small changes can't break anything, specifically when related to CPU settings.

Collaborator

atic-atac commented Oct 4, 2017

If changing the CPU setting has caused it then the driver should be fixed, no CPU setting should cause this level of driver bug. Just changing the CPU setting back doesn't cut it.

Member

mikrosk commented Oct 11, 2017

Unfortunately, the source history is full of regressions and untested code. For instance, at one point the netusbee/ethernat drivers broke because of an innocent-looking removal in 9bffd56 and stayed broken for one year (!) until 4d39030 (ethernat) and 6517d21 (netusbee).

So while this issue doesn't seem to be related to the NetUSBee at all (everything suggests an XHDI-related bug, as, ironically, everything works up until the proper commit 0ef684a), the actual investigation was really painful and misleading (the CPU and endian issues directly affect the NetUSBee driver).

I understand CVS didn't exactly encourage people to work on their own branches, but with a code base like this, git bisect is basically useless. :-/

Collaborator

atic-atac commented Oct 12, 2017

That goes to show that no one has been interested in NetUSBee or EtherNAT development for a very long time. Things get broken when drivers receive this level of zero attention.

Member

mikrosk commented Oct 13, 2017

OK, so in the end, this is a combination of two problems:

  1. The CPU issue. Even with 020-60 storage.udd, one still needs to compile netusbee.ucd with 020-60 (or 060).
  2. Yet another regression, quite recent actually: 7017068#diff-a7b18cf4474e621324fbb8245e20bb93R475. While there are two commits which are supposed to fix the regression introduced by 7017068 (a7061c4 and c67d9f1), the former doesn't seem to have any effect. Basically it looks like the stack size is ignored (I tried setting three times the value, i.e. around 270 KB).

Because if I change

unsigned char buffer[65536];
to

static unsigned char buffer[65536];

@Kroll2017 reports all is good. This looks like an equally deep problem in the kernel (and/or module handling) as the CPU one.
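
A minimal sketch of that change, with hypothetical function names standing in for the real driver routine (which is not quoted in this thread). Declaring the 64 KB buffer `static` takes it out of the function's stack frame and places it in the module's BSS, so it no longer depends on how much stack the caller provides. The price is reentrancy: every call shares the same buffer.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 65536

/* Before: 64 KB lives on the stack of whoever calls this. In a kernel
   module that is the kernel's stack, and _stksize does not apply. */
static unsigned long fill_on_stack(unsigned char pattern)
{
    unsigned char buffer[BUF_SIZE];
    unsigned long sum = 0;

    memset(buffer, pattern, sizeof buffer);
    for (size_t i = 0; i < sizeof buffer; i++)
        sum += buffer[i];
    return sum;
}

/* After: the buffer is allocated once, in the BSS, outside any stack
   frame. Not reentrant: concurrent or recursive callers share it. */
static unsigned long fill_static(unsigned char pattern)
{
    static unsigned char buffer[BUF_SIZE];
    unsigned long sum = 0;

    memset(buffer, pattern, sizeof buffer);
    for (size_t i = 0; i < sizeof buffer; i++)
        sum += buffer[i];
    return sum;
}
```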

Member

mikrosk commented Oct 16, 2017

Both mysteries solved:

  1. The CPU difference is causing different code path taken in https://github.com/freemint/freemint/blob/master/sys/mint/arch/delay.h#L102 (see the discussion in the ML)

  2. The stack corruption is also clear: _stksize is valid only for the TOS target! So when the driver is compiled as a kernel module, the stack is abused beyond its (supervisor?) limits. My guess would be that this is the reason why @DavidGZ used so many kmalloc() calls. Is the static buffer solution presented above sufficient as a fix?

Collaborator

atic-atac commented Oct 16, 2017

I think in this case we need to do a kmalloc on MINT, and leave as "unsigned char" for TOSONLY.
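
A sketch of the split suggested here, assuming a `TOSONLY` macro as named in the comment. `module_alloc()` and `module_free()` are hypothetical wrappers; in a MiNT build they would map to the kernel's kmalloc() and its matching free routine, but plain malloc()/free() are used so the sketch compiles anywhere.

```c
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 65536

/* Hypothetical wrappers: in a MiNT build these would map to the
   kernel's kmalloc() and its matching free routine. Plain malloc()/
   free() are used here so the sketch compiles on any host. */
#define module_alloc(n) malloc(n)
#define module_free(p)  free(p)

/* Fills a 64 KB work buffer and returns the byte sum via *sum_out.
   Returns 0 on success, -1 if the buffer could not be allocated. */
static int fill_and_sum(unsigned char pattern, unsigned long *sum_out)
{
#ifdef TOSONLY
    /* TOS program: runs on its own stack, sized through _stksize. */
    unsigned char buffer[BUF_SIZE];
#else
    /* MiNT kernel module: no private stack, so take it from the heap. */
    unsigned char *buffer = module_alloc(BUF_SIZE);
    if (buffer == NULL)
        return -1;
#endif
    unsigned long sum = 0;

    memset(buffer, pattern, BUF_SIZE);
    for (size_t i = 0; i < BUF_SIZE; i++)
        sum += buffer[i];
    *sum_out = sum;

#ifndef TOSONLY
    module_free(buffer);
#endif
    return 0;
}
```

Unlike a static buffer, each call gets its own allocation, so the function stays reentrant; the cost is an allocation per call and a failure path to handle.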

Member

mikrosk commented Oct 16, 2017

Any specific reason for such direction? I mean, static would ensure the buffer goes into heap anyway.

Collaborator

atic-atac commented Oct 16, 2017

On second thoughts, yes, static is fine. I'd be concerned about recursiveness, but that would also blow the stack anyway.

Member

DavidGZ commented Oct 16, 2017

I didn't remember why I made those memory allocations dynamic, so I did a quick search in my mail and found these threads:

https://mikro.naprvyraz.sk/mint/201410/msg00058.html

https://mikro.naprvyraz.sk/mint/201411/msg00029.html

By the way, why would mallocs cause problems with recursion in TOS? If I understand it correctly, every time the function is entered and malloc is called, we get a new position in heap memory.

Member

mikrosk commented Feb 19, 2018

@th-otto I meant regarding the underscores, not the symbol itself. :)

I've just seen that when I typed M680x0 with underscores, GitHub's editor removed them when the comment was posted.

Contributor

th-otto commented Feb 19, 2018

BTW, I've found the reason now: it was caused by commit gcc-mirror/gcc@17e4f17, which maps all the -m680* switches to the corresponding -mcpu= switch. It will be fixed soon (I'm currently updating the archives to 7.3.0).

Member

DavidGZ commented May 2, 2018

I must reopen this issue. I now have the hardware to test this myself, and it doesn't work for me.
I even checked out the sources for the commit that supposedly fixed the driver, and the device is not detected. So this driver continues to be very unreliable; it's my personal nightmare ;-).

The new hope is that the TOS version of the driver seems rock solid, at least on the 030. On the 060 I've found some minor issues: usb.acc, which handles hot-plugging, doesn't always detect the device being attached, and partitioning with hddrutil.app (HDDRIVER) "stalls" the device and completely freezes the USB controller in the NetUSBee. Transferring files, however, works very well.

The TOS version shares most of its code with the MiNT version, so I hope this can give us some hints.
Using the TOS binaries with MiNT doesn't work either, whether they are loaded before or after the kernel; they show the same symptoms as the MiNT modules, which could be another important hint.

DavidGZ reopened this May 2, 2018

Contributor

Perdrix24 commented May 3, 2018

Please, don't get too frustrated this time ;-)

For the USB.ACC, I see that it polls every second. I'll try half a second, like you are doing for MiNT polling, because I too have issues with the 060; it works better without usb.acc. Testers report that mass storage works solidly on the ST, Falcon and TT. The mouse is crashing; I hope someone can determine why.

Member

DavidGZ commented May 3, 2018

Please, don't get too frustrated this time ;-)

I'm..... ALREADY ;-)

For the USB.ACC, I see that it polls every second. I'll try half a second, like you are doing for MiNT
polling, because I too have issues with 060, it works better without the usb.acc.

I tried that some days ago, and it didn't seem to improve plug/unplug detection on the 060. This led me to think that the problem was in the driver and not the accessory, also because of the HDDRIVER issue I wrote about above. Today I tried 200 ms, and it may have improved, but I'm not really sure; from time to time detection still fails, perhaps 2% or 3% of the time. Attached is the accessory, which sets the timer to 200 ms if a 060 is detected (it rings the system bell when it does). @Perdrix24, could you send it to your 060 beta tester? Maybe he has a better sense of whether it's working better. (Or do you have a CT60 yourself?)
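
A minimal sketch of how an accessory can pick its polling interval from the `_CPU` cookie, which holds 0, 10, 20, 30, 40 or 60 for the 68000 through the 68060. The 200 ms and 1 s values come from this thread; the function names are hypothetical, and this is not the source of the attached accessory. On real TOS the jar is reached through the pointer at `_p_cookies` (0x5A0), which must be read in supervisor mode; here a plain array stands in for it.

```c
#include <stdint.h>
#include <stddef.h>

/* One cookie jar entry: a four-character ID and a 32-bit value.
   The jar ends with an entry whose ID is 0. */
typedef struct {
    uint32_t id;
    uint32_t value;
} cookie;

#define COOKIE_ID(a, b, c, d)                                   \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |            \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Linear search of the jar; returns 1 and stores the value if found. */
static int find_cookie(const cookie *jar, uint32_t id, uint32_t *value)
{
    for (; jar != NULL && jar->id != 0; jar++) {
        if (jar->id == id) {
            *value = jar->value;
            return 1;
        }
    }
    return 0;
}

/* Hypothetical policy matching the accessory's description: poll every
   200 ms on a 68060, otherwise keep the 1 s default. */
static unsigned poll_interval_ms(const cookie *jar)
{
    uint32_t cpu;

    if (find_cookie(jar, COOKIE_ID('_', 'C', 'P', 'U'), &cpu) && cpu == 60)
        return 200;
    return 1000;
}
```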
usb_acc.zip

Member

DavidGZ commented May 3, 2018

About the hddrutil.app issue: it happens 100% of the time for me with the 060, so it can be reliably reproduced. I'm testing with hddrutil from HDDRIVER version 8.46. Select the "Partition" option in the "Harddisk" menu; hddrutil will complain that it can't determine the device size, and after that you can't use the device any more.

Contributor

Perdrix24 commented May 4, 2018

@DavidGZ

Could you send it to your 060 beta tester? May be he has a better perception whether it's working better. (Or do you have a CT60 yourself?)
I have a CT63 with CTPCI/Radeon. I can test it, but I will also send it to Kroll, my beta tester.

Member

DavidGZ commented May 8, 2018

@Kroll2017 I have a new driver for you to test (attached). Do you have a rev 6 68060 CPU? I'd like to see how the drivers behave at high clock frequencies. For me, at 66 MHz, it's now working very reliably: I haven't had a failed transfer in the days since I made the latest important changes to the code. In 030 mode it's another story; I can't make it work reliably under MiNT, although the TOS driver works nicely with the 030.

The hddrutil issue more or less went away when I upgraded to the latest HDDRIVER version. I say more or less because it still does strange things when partitioning the USB device: sometimes I have to try several times before hddrutil reads the device capacity and lets me enter the partition dialog. But at least it no longer hangs the NetUSBee's USB function as the older HDDRIVER version did.
netusbee.zip

Kroll2017 commented May 8, 2018

Kroll2017 commented May 8, 2018

Member

DavidGZ commented May 9, 2018

@Kroll2017, thanks for the testing. I removed the delays for reading/writing the host controller registers; this works fine here at 66 MHz, but I guess it's not right for 95 MHz. To confirm that this is the problem, could you please set your CT60's clock to 66 MHz and try the driver again?

In any case I'm going to prepare another driver with delays for accessing the HC registers.

Member

DavidGZ commented May 10, 2018

Hi @Kroll2017, could you please try these drivers to see if any of them work at 95 MHz?
Still I'd like to know if the previous driver from 2 days ago works for you at 66 MHz.

Thanks!

netusbee_v1.zip
netusbee_v2.zip
netusbee_v3.zip
netusbee_v4.zip
netusbee_v5.zip

Kroll2017 commented May 10, 2018

Member

DavidGZ commented May 11, 2018

is working OK. Please wait for the next test; I have a problem with the SD card in my Falcon CT63 (I have just lost all partitions :(, I don't know why).

I'm sorry to read that. I hope that you have a backup of your data.
Do you think it has anything to do with the driver testing?
Don't worry about the other tests there is no hurry.

Kroll2017 commented May 11, 2018

Kroll2017 commented May 15, 2018

Member

DavidGZ commented May 15, 2018

Hi Kroll, thanks a lot for the report.

I managed to recover most of the data on my SD card, unfortunately I still do not know what caused the loss of all partitions. I installed

I've done many failed tests with my Falcon and never lost hard disk data. What I have seen after a failed transfer is lost space on the partition the file was being copied to; I recover that space by running fsck.msdos on the partition.

The size of the netusbee.ucd file in the netusbee_1 and netusbee_4 archives is the same.

This is expected; thanks for telling me anyway.

Member

DavidGZ commented May 22, 2018

Yesterday I pushed the code that makes the driver reliable in 060 mode.
I'm leaving the issue open, as the 030 version still needs to be improved.

Kroll2017 commented May 22, 2018

Member

DavidGZ commented May 23, 2018

Hi Kroll, thanks for the feedback.

I've taken a quick look, and it seems the trunk raw build (old style) takes the modules compiled for the 68000, while the other package has the module compiled for 020-60. This explains the difference in size.
In any case, both should work, but it doesn't surprise me that one of them doesn't. I don't have time now, but I'll explain the reasons later.

Member

DavidGZ commented May 24, 2018

As promised here is an explanation of what we have to deal with :-)

  1. The USB host controller chip in the NetUSBee is a little bit picky about the timings between accesses to its registers.

  2. This chip sits behind an interface (the ROM port) that is also picky about the timings when data is read/written through it.

  3. The NetUSBee driver should work with CPUs from the 68000 to the 68060, and we are trying to have only one binary for all CPUs (at least for TOS).

  4. The same source files also support three operating systems (TOS, MiNT and MagiC). It's puzzling how well the same code works in 030 mode under TOS but not under MiNT.

  5. How the compiler arranges the instructions around the register access routines influences the timings. The C code to read and write the registers is a mess (this is my fault ;-)); small changes make the compiler rearrange the instructions and alter the timings. This could be why the 68000 code doesn't work while the 020-60 code does. We could write the routines in assembler, but I think a small cleanup of the C code would be enough for the compiler to generate better code. I've already done that, but now the timings must be readjusted again.
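
Points 1, 2 and 5 suggest keeping the register access pattern out of the optimizer's hands. A minimal sketch, assuming a two-port interface where a register index is written first and the value is transferred second: `volatile` makes the compiler emit each port access exactly once and in program order relative to the other volatile accesses, and a busy-wait with a `volatile` counter puts an explicit gap between the two phases. The port layout, names and loop counts are hypothetical; the real addresses and delays are not given in this thread, and a loop count tuned for one CPU and clock gives a different delay on another.

```c
#include <stdint.h>

/* Hypothetical two-port interface: the register index goes to 'cmd',
   the register value is then transferred through 'data'. On real
   hardware these would sit at fixed ROM-port addresses. */
typedef struct {
    volatile uint16_t cmd;
    volatile uint16_t data;
} hc_ports;

/* Busy-wait. The counter is volatile, so every iteration must be
   performed; the optimizer cannot delete or shorten the loop. */
static void hc_delay(unsigned loops)
{
    for (volatile unsigned i = 0; i < loops; i++)
        ;
}

/* Select a register, wait, then read it. Both port accesses are to
   volatile objects, so each is emitted exactly once, in this order. */
static uint16_t hc_read_reg16(hc_ports *p, uint8_t reg, unsigned loops)
{
    p->cmd = reg;
    hc_delay(loops);
    return p->data;
}

/* Same pattern for a write, with a gap after the data phase as well
   so that back-to-back accesses are also spaced out. */
static void hc_write_reg16(hc_ports *p, uint8_t reg, uint16_t val,
                           unsigned loops)
{
    p->cmd = reg;
    hc_delay(loops);
    p->data = val;
    hc_delay(loops);
}
```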

I'd like to work on something else for a while. Keep in mind that when the code doesn't work, it doesn't fail immediately. I transfer one 100 MB file for testing, and sometimes it fails on the second or third attempt. The transfer rate with a Falcon in 030 mode is around 100 kb/s, so sometimes it takes a while to see that your code doesn't work.

Contributor

th-otto commented May 24, 2018

Member

DavidGZ commented May 25, 2018

Thanks a lot for the advice.

It also helps to check the generated code with different compiler versions,

I have some questions about this, but I'm going to ask them on the list so as not to go off topic here.

mikrosk referenced this issue Jun 11, 2018 in Freemint 1.19 #85 (Open)

Member

DavidGZ commented Jul 21, 2018

I have cleaned up the routines that access the NetUSBee's USB registers, and now, FINALLY, the MiNT driver works on the 68030 too. Before I push these commits, I'm waiting for some people to test on other systems to make sure there are no regressions on them. I hope these changes also fix the TOS driver, which wasn't running reliably on 030 machines faster than a standard Falcon.

Member

mikrosk commented Nov 8, 2018

@DavidGZ has the testing mentioned above been done?

Member

DavidGZ commented Nov 8, 2018

Yes, the tests were done and all the code was pushed. There are still some issues with some TTs, but according to some people there are known hardware problems with the cartridge port on some TTs. It was also reported on atari-forum that the driver doesn't work on some exotic hardware (MegaST+PAK/3); I'm out of ideas there and don't know what the problem could be.

Member

mikrosk commented Nov 8, 2018

Unless @Kroll2017 (the original reporter) has something new, closing.

mikrosk closed this Nov 8, 2018
