This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
usb/uas: blacklist a some Seagate USB3.0 disks with broken UAS
Patch found here: https://github.com/armbian/build/blob/06f056ae9581471c1c1760c5b2096446abc9c125/patch/kernel/odroidxu4-next/uas-blacklist-more-seagate-enclosures.patch
Written by ThomasKaiser
Change-Id: I8170983030bc8c912ed996560dfb7a21b2e6a39c
This patch will not do what it claims. It only enables a "quirk". It needs "US_FL_IGNORE_UAS" instead of "US_FL_NO_ATA_1X".
Hello @OtherCrashOverride, you are right!
Should I change it to ignore UAS or leave it as is?
Changing it should be enough for now until we determine a long term strategy for UAS in general. I will make the change, test it, and submit a pull request in a few minutes.
Ok, Thank you @OtherCrashOverride
This patch will need to be reverted before my pull request can be merged. I created a patch for the drive I have and could test. The other drive in this patch will need its own patch, but I do not know what its SCSI identification name is (on my drive it's different from the product name).
Pull request is here: #291
@OtherCrashOverride
I have the other drive I think
0x0bc2:0x3321
here is a full lsusb -v for it
https://pastebin.com/1bdjhjq8
Using this works (with usb-storage built as a module): options usb-storage quirks=0x0bc2:0x3321:u
so I assume that it is the correct SCSI ID
Let me know if you need me to run any other commands to get what you need for the 2nd patch :)
Just post what it says in dmesg log. For example:
[ 6.756279] scsi 0:0:0:0: Direct-Access Seagate Expansion 9300 PQ: 0 ANSI: 6
I will write the patch if nobody else has.
The string is only important for those reporting issues. The driver doesn't care. It's for us to match against when an issue is reported.
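For anyone collecting these identification strings from several drives, here is a minimal Python sketch that pulls the vendor, model and revision out of a dmesg line like the one above. The function name and the regular expression are illustrative assumptions, not anything from the kernel. It also assumes the vendor is a single word and the revision is the last token before `PQ:`, which holds for the two Seagate lines quoted in this thread but may not hold for every device.

```python
import re

# Hypothetical pattern for the SCSI inquiry line printed in dmesg.
# The optional "[ 6.756279]" timestamp prefix is ignored by using search().
SCSI_LINE = re.compile(
    r"scsi (?P<addr>\d+:\d+:\d+:\d+): (?P<type>\S+)\s+(?P<ident>.+?)\s+"
    r"PQ: (?P<pq>\d+) ANSI: (?P<ansi>\d+)"
)


def parse_scsi_inquiry(line):
    """Return a dict with vendor/model/revision, or None if the line does not match."""
    match = SCSI_LINE.search(line)
    if match is None:
        return None
    tokens = match.group("ident").split()
    if len(tokens) < 3:
        # Assumption: we need at least vendor, one model word and a revision.
        return None
    return {
        "address": match.group("addr"),
        "type": match.group("type"),
        "vendor": tokens[0],                # assumed to be a single word
        "model": " ".join(tokens[1:-1]),    # everything between vendor and revision
        "revision": tokens[-1],             # assumed to be the last token
    }
```

Feeding it the line above should give vendor `Seagate`, model `Expansion` and revision `9300`.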
So while the rest of the Linux world has identified the issue with Seagate's broken ASM1153 firmware and applies the respective quirk, with ODROID-XU4 it's necessary to completely disable UAS?
How much testing has been done? Have any of you ever looked through the commit history of the file in question and searched for the string 'ATA_1'?
I have an alternative proposal presented here:
https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=100#p188661
scsi 1:0:0:0: Direct-Access Seagate Expansion Desk 0604 PQ: 0 ANSI: 6
is the output for my drives :)
Does the US_FL_NO_ATA_1X quirk not fix the UAS issue you have @OtherCrashOverride ?
Should I try it and see if it fixes my issue?
I have USB storage as module in my current kernel, so I think I can test with:
options usb-storage quirks=0x0bc2:0x3321:t
(note t instead of u)
Looks like t is the US_FL_NO_ATA_1X flag.
And u is the US_FL_IGNORE_UAS flag.
(https://en.opensuse.org/SDB:USB_3.0_Hard_Drive_troubleshooting)
I'm trying that now.
Drives now show as using UAS.
I couldn't see any dmesg output related to the quirk (not sure if there should be any).
@OtherCrashOverride that's completely disabling UAS. Why do you think those Seagate devices you and @matthuisman own are so different compared to all other Seagates? Why do you think UAS needs to be totally disabled instead of adding quirks to deal correctly with their broken firmware (NO_REPORT_LUNS maybe also needed)?
@matthuisman I've a couple of JMS567 (USB part identical to the JMS561 used in Cloudshell 2) that require the US_FL_BROKEN_FUA and US_FL_NO_REPORT_OPCODES quirks, which aren't reported via dmesg. It's great that at least you follow the usual/social behaviour, trying to help identify the necessary quirks through testing so an appropriate fix can be sent upstream to the Linux kernel USB maintainers.
At the end of the day, if we can get UAS working - then surely that should be the goal?
It seems to be the new standard and therefore I assume is going to get the most support & updates in the kernel?
Is there any test I can run to check if the quirk is working?
The old fault used to just happen randomly (usually after an hour or two of playing a video).
I guess if it doesn't happen within the next few days - I'd call it a fix for me.
However, I'd prefer a more solid method for confirming it's fixed.
Also, can you set multiple quirks?
eg. t (NO_ATA) and j (no report luns)? (maybe a comma looking at source code)
options usb-storage quirks=0x0bc2:0x3321:t,j
Huh? There's nothing you need to 'get UAS working' -- UAS simply works. Unfortunately there are a few devices out there with firmware flaws that require quirks (as we know this applies to most if not all Seagate disk enclosures that need individual handling due to unfortunate vendor behaviour using many different product IDs for one and the same firmware+ASM1153 combination).
The problems ODROID XU4 users are facing are:
dd, especially with large blocksizes, is a great way to fool yourself if your real use case is dealing with a lot of small files. Sequential disk performance with large blocksizes does not reflect the situation with random IO and many small files -- better use fio or iozone if you're interested in real-world performance -- UAS or not then makes a difference even with HDDs)
Instead of 'protecting' the ODROID XU4 community from UAS it would be way better to practise social behaviour, get in touch with USB kernel maintainers and get those 2 other broken Seagate devices you own handled correctly upstream.
Oh man, you're pretty sensitive.
I meant to say "if we can get UAS working on these particular devices - then that should be the goal"
I prefer quirk vs disabling - as to me that seems the correct way forward (due to UAS obviously being more supported / developed in the future),
The "micro community" is irrelevant.
I could easily have found the same issue using my X86_64.
Don't blame Odroid for having an excellent SBC that people actually want to use for NAS with their HDDs.
What makes Linux strong is that patches can come from anywhere.
Not everyone wants to try to use their mailing lists etc.
Some people prefer their "micro communities".
Also, isn't it best to have patches / changes tested in smaller communities before hitting mainline?
They can then get pushed to mainline from these "micro communities" as you put it.
You blindly sent a patch in to mainline for quirks before even testing them or getting any kind of feedback from people who do.
How is that smart?
@matthuisman The US_FL_NO_ATA_1X flag does not address the problems I have seen with UAS:
https://patchwork.kernel.org/patch/5703251/
If your drive is always unavailable (no /dev/sda), then the flag addresses the issue. Since there are so many differing reports of UAS being broken everywhere on everything, I can only offer that the flag does not apply to my specific drive and issue. Since your drive is operational, we can infer that it also does not apply to that drive.
Oh right...
My drive is present with and without that flag, meaning US_FL_NO_ATA_1X is not a fix for 0x0bc2:0x3321.
So, I'm probably best to actually test using the NO_REPORT_LUNS flag,
as that's the only other unusual flag for Seagates (so far)
OK... that flag does CRAZY things (do not use!)
It keeps adding the drive over and over and over.
So, if US_FL_NO_ATA_1X is not the issue (as my drive does show up)
and NO_REPORT_LUNS is not the issue (completely breaks)...
then what other UAS option / quirk (apart from disabling) do we have?
A SCSI LUN identifies a logical unit of a device. If the drive is present (/dev/sda), this is also unlikely to be an issue.
https://en.wikipedia.org/wiki/Logical_unit_number
Maybe I should start trying the only other two in that file:
US_FL_NO_REPORT_OPCODES ("f" flag)
US_FL_BROKEN_FUA (can't see a flag for this?)
or combination of both
Do you have a test / stress test that can force an error?
(not for testing speed but for testing for errors)
linux/Documentation/kernel-parameters.txt
Lines 4193 to 4243 in 050bc4e
https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=50#p188246
Since UAS is broken differently on different devices, the test may or may not affect your drive.
I have also observed that copying a large data file to the drive (60GB+) will randomly fault on occasion.
Hi,
If the US_FL_NO_ATA_1X quirk does not help and adding a "u" flag does help then chances are USB-3 bulk-streams are broken on the USB controller on the odroidxu4.
Given that I am the (ex) maintainer of the uas kernel driver and that I have seen 0 bug-reports on issues with the seagate enclosures in question when used with regular PCs that seems quite likely.
Other possible causes are a bad usb cable and/or power-supply issues. Are these USB disk enclosures powered through the USB-bus? If so that is a likely cause of the problem; chances are the odroidxu4 simply cannot deliver enough reliable power through its USB connector. You could try using a powered USB-3 hub in between.
And before people dismiss my bad cable / power-issues argument with a "but it works with usb-storage" counter argument, please keep in mind that UAS is a much MUCH more efficient protocol: the number of io-requests the disk serves per second with usb-storage is easily 10 times less than it can serve through the uas driver. And this much higher io-load has a tendency to expose weak cables / power-supplies. I've already helped dozens of users solve UAS issues by replacing a bad cable or fixing power-supply issues.
Talking about power supply issues, I remember recently helping a user with a Seagate enclosure (IIRC) which did have an external power-supply, but the included original power-supply was too weak to handle heavy io-loads. So if your enclosure does have an external power-supply, try replacing that.
If none of this helps, then I believe the right fix is to disable USB-3 bulk streams on the odroidxu4. Disabling uas on these enclosures altogether is not the right fix; as said, there are 0 bug reports from regular PC-users with these. To do this you will need a patch similar to this one:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/usb/host/xhci-pci.c?id=d95815ba6a0f287213118c136e64d8c56daeaeab
Regards,
Hans
p.s.
About setting multiple-quirks on a single device, simply combine the letters, e.g. : options usb-storage quirks=0x0bc2:0x3321:tj
About the US_FL_NO_ATA_1X only being necessary if the device does not show up at all, that is not entirely true; things like smartd can cause issues after probe too. What would be useful to diagnose this is "dmesg" output from after the problem happened. This will show the troublesome scsi commands from when things went south, and the ATA_1X pass through command is easily recognized.
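To make the syntax point concrete, here is a small Python sketch that decodes a `quirks=` value into flag names. It is only an illustration: the function name is made up, and the letter table covers just the four letters mentioned in this thread (f, j, t, u), not the kernel's full list. It relies on the format described in kernel-parameters.txt, where commas separate whole VID:PID:Flags entries, so the letters for one device have to be concatenated.

```python
# Only the letters discussed in this thread; the kernel defines more.
QUIRK_LETTERS = {
    "f": "US_FL_NO_REPORT_OPCODES",
    "j": "US_FL_NO_REPORT_LUNS",
    "t": "US_FL_NO_ATA_1X",
    "u": "US_FL_IGNORE_UAS",
}


def parse_quirks(value):
    """Decode 'VID:PID:FLAGS[,VID:PID:FLAGS...]' into (vid, pid, [flag names]) tuples."""
    entries = []
    for entry in value.split(","):          # commas separate entries, not flags
        parts = entry.split(":")
        if len(parts) != 3:
            raise ValueError(f"malformed quirks entry: {entry!r}")
        vid, pid, letters = parts
        unknown = [c for c in letters if c not in QUIRK_LETTERS]
        if unknown:
            raise ValueError(f"letters not covered by this sketch: {unknown}")
        flags = [QUIRK_LETTERS[c] for c in letters]
        entries.append((int(vid, 16), int(pid, 16), flags))
    return entries
```

With this, `parse_quirks("0x0bc2:0x3321:tj")` yields one entry carrying both flags, while `"0x0bc2:0x3321:t,j"` is rejected because the trailing `j` is read as a separate, incomplete entry.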
@jwrdegoede , we already eliminated the XU4 as the cause since the issues are reproducible on Intel x86-64 equipment.
https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=50#p188246
Also, https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=100#p188580
and https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1584557
Despite being USB powered, the drive in question also functions normally in Windows and MacOS environments.
Oh well, I also completely overlooked "As the HDD is USB powered" in your forum posts. :(
So let's add N°5 to the list of 'ODROID XU4 UAS' problems:
@matthuisman are you able to test in a reasonable manner (ruling out potential powering problems and providing dmesg output as Hans suggested)?
We have not identified any Odroid XU4 UAS problems. Only Linux UAS problems have been reported. The issues reported affect all USB3 UAS capable platforms.
@ThomasKaiser
I have 3x of the same drive (0x0bc2:0x3321)
All powered from mains (not USB powered)
They are running via a USB3.0 HUB, but I can connect directly to USB if required.
They are all EXT4 and I am also using AUFS (but cant test directly to drives not AUFS mount)
I am using ARCH and am familiar with compiling kernel (I build my own with AUFS and USB as module).
I am happy to compile new kernels with patches to test etc.
Here is my dmesg output:
https://forum.odroid.com/viewtopic.php?f=146&t=26016#p186301
I suspect I may have cut some off, so may need to try to get it to fault again.
Also, when you google the "ERROR Transfer event for disabled endpoint or incorrect stream ring" error, there are quite a few posts around the internet about it (and these are not XU4 users)
https://bbs.archlinux.org/viewtopic.php?id=192850
https://answers.launchpad.net/ubuntu/+question/404094
https://askubuntu.com/questions/50866/external-usb-3-0-hard-drive-is-not-recognised-when-plugged-into-usb-3-port
https://bugzilla.kernel.org/show_bug.cgi?id=189631
This seems like exact same issue happening on a RPI with Seagate drives:
raspberrypi/linux#1287
Anyone have a good test not using dd as I have data on my drives and don't want to wipe?
Looks like fio just writes a file so won't wipe my data
Very slow - but I guess this is due to 4k byte size?
Updated to --bs=5M
Run status group 0 (all jobs): READ: bw=101MiB/s (106MB/s), 101MiB/s-101MiB/s (106MB/s-106MB/s), io=4095MiB (4294MB), run=40451-40451msec
Run status group 0 (all jobs): WRITE: bw=99.2MiB/s (104MB/s), 99.2MiB/s-99.2MiB/s (104MB/s-104MB/s), io=4095MiB (4294MB), run=41288-41288msec
Still haven't managed to get UAS to fail.... hmmmm
Not using any quirks (just trying to find a way to make it error)
I have it as a module (this wouldn't for some reason fix our issue I assume?)
UPDATE
OK, I got an error when I used fio on my AUFS virtual directory
I had to change direct=0 (due to destination does not support O_DIRECT)
Oooo, tried the exact same fio test (direct=0) to the drive direct (not via AUFS) and managed to get an error on the 3rd run
Perfect, so this is the test I can run (a few times) to test for a fix
fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=test --bs=5M --iodepth=64 --size=4G --readwrite=randread
Still get same error with
options usb-storage quirks=0x0bc2:0x3321:f (US_FL_NO_REPORT_OPCODES)
Now trying 0x0bc2:0x3321:t (US_FL_NO_ATA_1X) ...
On 6th run without error.... Looking good
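Since the fault only shows up on some runs, repeating the test by hand gets tedious. Below is a rough Python sketch of how the loop could be automated: it reuses the fio options from the command above and counts new dmesg lines containing the xhci error text quoted earlier in this thread. The helper names are made up, and it assumes that this error string is the one that marks the fault, that `dmesg` is readable by the current user, and that the kernel log does not wrap during the test.

```python
import subprocess

# Error text quoted earlier in the thread; assumed to mark the UAS fault.
XHCI_ERROR = "ERROR Transfer event for disabled endpoint or incorrect stream ring"


def fio_command(size="4G", mode="randread", filename="test"):
    """Build the fio command line used in this thread (direct=0, 5M blocks)."""
    return [
        "fio", "--randrepeat=1", "--ioengine=libaio", "--direct=0",
        "--gtod_reduce=1", "--name=test", f"--filename={filename}",
        "--bs=5M", "--iodepth=64", f"--size={size}", f"--readwrite={mode}",
    ]


def count_errors(dmesg_text, needle=XHCI_ERROR):
    """Count kernel log lines that contain the error text."""
    return sum(1 for line in dmesg_text.splitlines() if needle in line)


def read_dmesg():
    """Return the current kernel log as text (may need root on some systems)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True).stdout


def run_until_error(max_runs=15):
    """Run fio up to max_runs times; return the run number that produced a new error, else None."""
    baseline = count_errors(read_dmesg())
    for run in range(1, max_runs + 1):
        subprocess.run(fio_command(), check=False)
        if count_errors(read_dmesg()) > baseline:
            return run
    return None
```

Calling `run_until_error()` from the directory on the drive under test returns the number of the first failing run, or `None` if every run passed. A clean result is only weak evidence, as the fault can still appear on a later run.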
OK, I have done a lot (15+) of those 4G tests without any errors.
Then I did a "big boy" 20g R/W test to make extra sure
fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=test --bs=5M --iodepth=64 --size=20G --readwrite=randrw
No errors in dmesg. WOOHOO
WAIT!!!!!!
So, went back to a small 4g read test and the error just happened!! :(
BUGGER!
This one mentions the same potential fix Hans already suggested:
xhci->quirks |= XHCI_BROKEN_STREAMS;
(since you're knowledgeable, it seems like a good idea to try this out)
As for the tests: a 4K blocksize will be slow of course, especially with HDDs, but is maybe better suited to trigger errors. I'll get my XU4 back later this week and will then run a series of tests with an SSD on both an ASM1153 (with original and not 'branded' Seagate firmware) and a JMS567. Unfortunately we are not able to test on XU4 without a USB hub in between host controller and USB-to-SATA bridge.
Do any of you know whether the Hardkernel guys might have dev samples without the GL3521 so that the USB port is directly accessible?