usb/uas: blacklist some Seagate USB3.0 disks with broken UAS

mdrjr committed May 1, 2017
1 parent a08d2e3 commit c18781b1ef56a800572e8342488504e4e818013a
Showing with 15 additions and 0 deletions.
  1. +15 −0 drivers/usb/storage/unusual_uas.h
@@ -58,6 +58,14 @@ UNUSUAL_DEV(0x0bc2, 0x2312, 0x0000, 0x9999,
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_NO_ATA_1X),
/* https://forum.odroid.com/viewtopic.php?f=146&t=26016 */
UNUSUAL_DEV(0x0bc2, 0x2322, 0x0000, 0x9999,
"Seagate",
"Expansion",
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_NO_ATA_1X),
/* https://bbs.archlinux.org/viewtopic.php?id=183190 */
UNUSUAL_DEV(0x0bc2, 0x3312, 0x0000, 0x9999,
"Seagate",
@@ -79,6 +87,13 @@ UNUSUAL_DEV(0x0bc2, 0x3320, 0x0000, 0x9999,
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_NO_ATA_1X),
/* https://forum.odroid.com/viewtopic.php?f=146&t=26016 */
UNUSUAL_DEV(0x0bc2, 0x3321, 0x0000, 0x9999,
"Seagate",
"Expansion",
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_NO_ATA_1X),
/* Reported-by: Bogdan Mihalcea <bogdan.mihalcea@infim.ro> */
UNUSUAL_DEV(0x0bc2, 0xa003, 0x0000, 0x9999,
"Seagate",

32 comments on commit c18781b

OtherCrashOverride May 2, 2017

This patch will not do what it claims. It only enables a "quirk". It needs "US_FL_IGNORE_UAS" instead of "US_FL_NO_ATA_1X".

mdrjr May 2, 2017

Collaborator

Hello @OtherCrashOverride, you are right!
Should I change it to ignore UAS or leave it as is?

OtherCrashOverride May 2, 2017

Changing it should be enough for now until we determine a long term strategy for UAS in general. I will make the change, test it, and submit a pull request in a few minutes.

mdrjr May 2, 2017

Collaborator

Ok, Thank you @OtherCrashOverride

OtherCrashOverride May 2, 2017

This patch will need to be reverted before my pull request can be merged. I created a patch for the drive I have and could test. The other drive in this patch will need its own patch, but I do not know what its SCSI identification name is (on my drive it's different from the product name).

OtherCrashOverride May 2, 2017

Pull request is here: #291

[    5.752044] usb 4-1.2: UAS is blacklisted for this device, using usb-storage instead
[    5.752062] usb-storage 4-1.2:1.0: USB Mass Storage device detected
[    5.752827] usb-storage 4-1.2:1.0: Quirks match for vid 0bc2 pid 2322: 800000
[    5.753059] scsi host0: usb-storage 4-1.2:1.0
...
[    6.756279] scsi 0:0:0:0: Direct-Access     Seagate  Expansion        9300 PQ: 0 ANSI: 6
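
The `Quirks match` line in the log above prints the active flag word in hex. A minimal sketch of decoding such a value follows. The bit values are assumptions based on the flag table in `include/linux/usb_storage.h` and should be checked against your kernel tree; only `IGNORE_UAS = 0x00800000` is consistent with the `800000` shown in the log. The helper name `decode_quirks` is hypothetical.

```python
# Assumed bit values; verify against include/linux/usb_storage.h in your tree.
US_FL = {
    "IGNORE_UAS": 0x00800000,        # matches the "800000" in the log above
    "BROKEN_FUA": 0x01000000,        # assumption
    "NO_ATA_1X": 0x02000000,         # assumption
    "NO_REPORT_OPCODES": 0x04000000, # assumption
    "NO_REPORT_LUNS": 0x10000000,    # assumption
}


def decode_quirks(hex_mask):
    """Return the names of the known flags set in a hex mask such as '800000'."""
    mask = int(hex_mask, 16)
    return [name for name, bit in US_FL.items() if mask & bit]


print(decode_quirks("800000"))  # ['IGNORE_UAS']
```

Under these assumptions, a drive that only carried the flag from the original commit would report a different mask, which is a quick way to tell the two patches apart from a log.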

matthuisman May 2, 2017

@OtherCrashOverride
I have the other drive I think

0x0bc2:0x3321

here is a full lsusb -v for it
https://pastebin.com/1bdjhjq8

Using this works (when module): options usb-storage quirks=0x0bc2:0x3321:u
so I assume that it is the correct SCSI ID

Let me know if you need me to run any other commands to get what you need for the 2nd patch :)

OtherCrashOverride May 2, 2017

Just post what it says in dmesg log. For example:
[ 6.756279] scsi 0:0:0:0: Direct-Access Seagate Expansion 9300 PQ: 0 ANSI: 6

I will write the patch if nobody else has.

The string is only important for those reporting issues. The driver doesn't care. It's for us to match against when an issue is reported.
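
For anyone collecting these identification strings from several reporters, a small sketch of pulling the vendor, model and revision out of such a dmesg line is shown here. The regular expression and the name `parse_scsi_line` are illustrative assumptions; in particular it assumes the vendor field contains no spaces, which holds for the lines quoted in this thread but not for every device.

```python
import re

# Assumes the vendor is a single word; the model may contain spaces.
SCSI_LINE = re.compile(
    r"scsi (?P<hctl>\d+:\d+:\d+:\d+): (?P<type>\S+)\s+"
    r"(?P<vendor>\S+)\s+(?P<model>.+?)\s+(?P<rev>\S+)\s+"
    r"PQ: (?P<pq>\d+) ANSI: (?P<ansi>\d+)"
)


def parse_scsi_line(line):
    """Return a dict of the fields in a 'scsi h:c:t:l: ...' line, or None."""
    match = SCSI_LINE.search(line)
    return match.groupdict() if match else None


line = "[ 6.756279] scsi 0:0:0:0: Direct-Access Seagate Expansion 9300 PQ: 0 ANSI: 6"
info = parse_scsi_line(line)
print(info["vendor"], "|", info["model"], "|", info["rev"])
```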

ThomasKaiser May 2, 2017

So while the rest of the Linux world has identified the issue with Seagate's broken ASM1153 firmware and applies the respective quirk, with ODROID-XU4 it's necessary to completely disable UAS?

How much testing has been done? Have any of you ever looked through the commit history of the file in question and searched for the string 'ATA_1'?

OtherCrashOverride May 2, 2017

I have an alternative proposal presented here:
https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=100#p188661

matthuisman May 3, 2017

scsi 1:0:0:0: Direct-Access Seagate Expansion Desk 0604 PQ: 0 ANSI: 6

is the output for my drives :)

Does the US_FL_NO_ATA_1X quirk not fix the UAS issue you have @OtherCrashOverride ?
Should I try it and see if it fixes my issue?

I have USB storage as module in my current kernel, so I think I can test with:
options usb-storage quirks=0x0bc2:0x3321:t
(note t instead of u)

Looks like t is the US_FL_NO_ATA_1X flag.
And u is the US_FL_IGNORE_UAS flag.
(https://en.opensuse.org/SDB:USB_3.0_Hard_Drive_troubleshooting)

I'm trying that now.
The drives now show as using UAS.
I couldn't see any dmesg output related to the quirk (not sure if there should be any).
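
Since not every quirk is announced in dmesg, one way to at least confirm which driver is bound is the `Driver=` field that `lsusb -t` prints per interface. A rough sketch of reading that field follows. The sample text is illustrative and not taken from this thread, and the function name `storage_drivers` is hypothetical.

```python
import re


def storage_drivers(lsusb_tree):
    """Return the Driver= value of every Mass Storage interface listed."""
    pattern = re.compile(r"Class=Mass Storage, Driver=([^,\s]*)")
    return pattern.findall(lsusb_tree)


# Illustrative sample in the style of `lsusb -t` output (not from the thread).
sample = """\
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M
"""
print(storage_drivers(sample))  # ['uas']
```

This only shows whether `uas` or `usb-storage` claimed the interface; it says nothing about whether a flag such as `t` took effect.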

ThomasKaiser May 3, 2017

I have an alternative proposal presented here

@OtherCrashOverride that's completely disabling UAS. Why do you think those Seagate devices you and @matthuisman own are so different from all other Seagates? Why do you think UAS needs to be totally disabled instead of adding quirks to deal correctly with their broken firmware (NO_REPORT_LUNS may also be needed)?

ThomasKaiser May 3, 2017

I couldn't see any dmesg related to the quirk (not sure if there should be)

@matthuisman I have a couple of JMS567 (USB part identical to the JMS561 used in Cloudshell 2) that require the US_FL_BROKEN_FUA and US_FL_NO_REPORT_OPCODES quirks, which aren't reported via dmesg. It's great that you at least follow the usual social behaviour of helping to identify the necessary quirks through testing, so that an appropriate fix can be sent upstream to the Linux kernel USB maintainers.

matthuisman May 3, 2017

At the end of the day, if we can get UAS working - then surely that should be the goal?

It seems to be the new standard and therefore I assume is going to get the most support & updates in the kernel?

Is there any test I can run to check if the quirk is working?
The old fault used to just happen randomly (usually after an hour or two of playing a video).
I guess if it doesn't happen within the next few days - I'd call it a fix for me.

However, I'd prefer a more solid method for confirming it's fixed.

Also, can you set multiple quirks?
eg. t (NO_ATA) and j (no report luns)? (maybe a comma looking at source code)
options usb-storage quirks=0x0bc2:0x3321:t,j

ThomasKaiser May 3, 2017

At the end of the day, if we can get UAS working - then surely that should be the goal?

Huh? There's nothing you need to 'get UAS working' -- UAS simply works. Unfortunately there are a few devices out there with firmware flaws that require quirks (as we know this applies to most if not all Seagate disk enclosures that need individual handling due to unfortunate vendor behaviour using many different product IDs for one and the same firmware+ASM1153 combination).

The problems ODROID XU4 users are facing are:

  • being trapped in a micro community where UAS has only been around for a few months, while the rest of the world has been using it for years
  • the internal USB hub adding complexity (see this great example where, most probably, only the USB2 part of the hub came back up after a reset and the SuperSpeed data lines were cut)
  • the average user not being aware of cabling/contact problems that now also count as 'UAS problems'
  • lacking knowledge of how to measure the impact of storage protocols (using dd, especially with large block sizes, is a great way to fool yourself if your real use case deals with a lot of small files; sequential disk performance with large block sizes does not reflect the situation with random I/O and many small files, so better use fio or iozone if you're interested in real-world performance, where UAS or not makes a difference even with HDDs)
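
To make the last point concrete, here is a rough sketch of the two access patterns being contrasted: one pass of large sequential reads versus many small reads at random offsets. It is not a replacement for fio or iozone. The function names are hypothetical, `os.pread` limits it to Unix-like systems, and unless the test file is much larger than RAM (or caches are dropped first) the page cache will dominate the timings.

```python
import os
import random
import time


def read_sequential(path, block_size=1024 * 1024):
    """Read the whole file front to back in large blocks; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    return total


def read_random(path, count, block_size=4096, seed=0):
    """Read `count` small blocks at random aligned offsets; return bytes read."""
    rng = random.Random(seed)
    blocks = max(os.path.getsize(path) // block_size, 1)
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(count):
            offset = rng.randrange(blocks) * block_size
            total += len(os.pread(fd, block_size, offset))
    finally:
        os.close(fd)
    return total


def timed(fn, *args):
    """Run fn(*args) and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    nbytes = fn(*args)
    return nbytes, time.perf_counter() - start
```

Calling `timed(read_sequential, path)` and `timed(read_random, path, 10000)` on a large file stored on the drive under test gives two throughput figures. A large gap between them is the effect described above, and it is the random figure that tends to reflect workloads with many small files.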

Instead of 'protecting' the ODROID XU4 community from UAS, it would be far better to behave socially: get in touch with the USB kernel maintainers and get those 2 other broken Seagate devices you own handled correctly upstream.

matthuisman May 3, 2017

Oh man, you're pretty sensitive.

I meant to say "if we can get UAS working on these particular devices - then that should be the goal"

I prefer a quirk over disabling, as that seems to me the correct way forward (due to UAS obviously being more supported / developed in the future).

The "micro community" is irrelevant.
I could easily have found the same issue using my X86_64.
Don't blame Odroid for having an excellent SBC that people actually want to use for NAS with their HDDs.

What makes Linux strong is that patches can come from anywhere.
Not everyone wants to try to use their mailing lists etc.
Some people prefer their "micro communities".

Also, isn't it best to have patches / changes tested in smaller communities before hitting mainline?
They can then get pushed to mainline from these "micro communities" as you put it.

You blindly sent a patch into mainline for quirks before even testing them or getting any kind of feedback from people who do.

How is that smart?

OtherCrashOverride May 3, 2017

@matthuisman The US_FL_NO_ATA_1X flag does not address the problems I have seen with UAS:
https://patchwork.kernel.org/patch/5703251/

these need the US_FL_NO_ATA_1X to not crash when udev probes them.

If your drive is always unavailable (no /dev/sda), then the flag addresses the issue. Since there are so many differing reports of UAS being broken everywhere on everything, I can only offer that the flag does not apply to my specific drive and issue. Since your drive is operational, we can infer that it also does not apply to that drive.

matthuisman May 3, 2017

Oh right...
My drive is present with and without that flag, meaning US_FL_NO_ATA_1X is not a fix for 0x0bc2:0x3321.

So, I'm probably best to actually test using the NO_REPORT_LUNS flag,
as that's the only other unusual flag for Seagates (so far)

matthuisman May 3, 2017

OK... that flag does CRAZY things (do not use!)
It keeps adding the drive over and over and over.

So, if US_FL_NO_ATA_1X is not the issue (as my drive does show up)
and NO_REPORT_LUNS is not the issue (completely breaks)...

then what other UAS option / quirk (apart from disabling) do we have?

OtherCrashOverride May 3, 2017

A SCSI LUN identifies a logical unit of a device. If the drive is present (/dev/sda), this is also unlikely to be an issue.
https://en.wikipedia.org/wiki/Logical_unit_number

matthuisman May 3, 2017

Maybe I should start trying the only other two in that file:

US_FL_NO_REPORT_OPCODES ("f" flag)
US_FL_BROKEN_FUA (can't see a flag for this?)
or combination of both

Do you have a test / stress test that can force an error?
(not for testing speed but for testing for errors)

OtherCrashOverride May 3, 2017

then what other UAS option / quirk (apart from disabling) do we have?

usb-storage.quirks=
[UMS] A list of quirks entries to supplement or override the built-in unusual_devs list. List entries are separated by commas. Each entry has the form VID:PID:Flags where VID and PID are Vendor and Product ID values (4-digit hex numbers) and Flags is a set of characters, each corresponding to a common usb-storage quirk flag as follows:
a = SANE_SENSE (collect more than 18 bytes of sense data);
b = BAD_SENSE (don't collect more than 18 bytes of sense data);
c = FIX_CAPACITY (decrease the reported device capacity by one sector);
d = NO_READ_DISC_INFO (don't use READ_DISC_INFO command);
e = NO_READ_CAPACITY_16 (don't use READ_CAPACITY_16 command);
f = NO_REPORT_OPCODES (don't use report opcodes command, uas only);
g = MAX_SECTORS_240 (don't transfer more than 240 sectors at a time, uas only);
h = CAPACITY_HEURISTICS (decrease the reported device capacity by one sector if the number is odd);
i = IGNORE_DEVICE (don't bind to this device);
j = NO_REPORT_LUNS (don't use report luns command, uas only);
l = NOT_LOCKABLE (don't try to lock and unlock ejectable media);
m = MAX_SECTORS_64 (don't transfer more than 64 sectors = 32 KB at a time);
n = INITIAL_READ10 (force a retry of the initial READ(10) command);
o = CAPACITY_OK (accept the capacity reported by the device);
p = WRITE_CACHE (the device cache is ON by default);
r = IGNORE_RESIDUE (the device reports bogus residue values);
s = SINGLE_LUN (the device has only one Logical Unit);
t = NO_ATA_1X (don't allow ATA(12) and ATA(16) commands, uas only);
u = IGNORE_UAS (don't bind to the uas driver);
w = NO_WP_DETECT (don't test whether the medium is write-protected);
y = ALWAYS_SYNC (issue a SYNCHRONIZE_CACHE even if the device claims no cache).
Example: quirks=0419:aaf5:rl,0421:0433:rc
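
The entry format described above (commas between entries, `VID:PID:Flags` within each) can be checked with a few lines of code before a value goes into a modprobe configuration. This is a sketch only: `parse_quirks` is a hypothetical helper, the letter table is copied from the text above, and the kernel's own parser may treat malformed input differently.

```python
FLAG_NAMES = {
    "a": "SANE_SENSE", "b": "BAD_SENSE", "c": "FIX_CAPACITY",
    "d": "NO_READ_DISC_INFO", "e": "NO_READ_CAPACITY_16",
    "f": "NO_REPORT_OPCODES", "g": "MAX_SECTORS_240",
    "h": "CAPACITY_HEURISTICS", "i": "IGNORE_DEVICE",
    "j": "NO_REPORT_LUNS", "l": "NOT_LOCKABLE", "m": "MAX_SECTORS_64",
    "n": "INITIAL_READ10", "o": "CAPACITY_OK", "p": "WRITE_CACHE",
    "r": "IGNORE_RESIDUE", "s": "SINGLE_LUN", "t": "NO_ATA_1X",
    "u": "IGNORE_UAS", "w": "NO_WP_DETECT", "y": "ALWAYS_SYNC",
}


def parse_quirks(value):
    """Parse 'VID:PID:Flags[,VID:PID:Flags...]' into (vid, pid, [flag names])."""
    entries = []
    for entry in value.split(","):
        parts = entry.split(":")
        if len(parts) != 3:
            raise ValueError(f"malformed entry: {entry!r}")
        vid, pid, flags = parts
        names = [FLAG_NAMES[ch] for ch in flags]  # KeyError on unknown letters
        entries.append((int(vid, 16), int(pid, 16), names))
    return entries


print(parse_quirks("0419:aaf5:rl,0421:0433:rc"))
```

Because the comma separates whole entries, a value such as `0x0bc2:0x3321:t,j` is split into `0x0bc2:0x3321:t` and a stray `j`, which this sketch rejects. Several flags for one device are written as adjacent letters instead.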

OtherCrashOverride May 3, 2017

Do you have a test / stress test that can force an error?

https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=50#p188246

Since UAS is broken differently on different devices, the test may or may not affect your drive.

OtherCrashOverride May 3, 2017

I have also observed that copying a large data file to the drive (60GB+) will randomly fault on occasion.

jwrdegoede May 3, 2017

Hi,

If the US_FL_NO_ATA_1X quirk does not help and adding a "u" flag does help then chances are USB-3 bulk-streams are broken on the USB controller on the odroidxu4.

Given that I am the (ex) maintainer of the uas kernel driver and that I have seen 0 bug-reports on issues with the seagate enclosures in question when used with regular PCs that seems quite likely.

Other possible causes are a bad USB cable and/or power-supply issues. Are these USB disk enclosures powered through the USB bus? If so, that is a likely cause of the problem: chances are the odroidxu4 simply cannot deliver enough reliable power through its USB connector. You could try using a powered USB-3 hub in between.

And before people dismiss my bad cable / power-issues argument with a "but it works with usb-storage" counter-argument, please keep in mind that UAS is a much MUCH more efficient protocol: the number of io-requests the disk serves per second with usb-storage is easily 10 times lower than what it can serve through the uas driver. This much higher io-load has a tendency to expose weak cables / power-supplies. I've already helped dozens of users solve UAS issues by replacing a bad cable or fixing power-supply issues.

Talking about power-supply issues, I remember recently helping a user with a Seagate enclosure (IIRC) which did have an external power-supply, but the included original power-supply was too weak to handle heavy io-loads. So if your enclosure does have an external power-supply, try replacing that.

If none of this helps, then I believe the right fix is to disable USB-3 bulk streams on the odroidxu4. Disabling uas on these enclosures altogether is not the right fix; as said, there are 0 bug reports from regular PC users with these. To do this you will need a patch similar to this one:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/usb/host/xhci-pci.c?id=d95815ba6a0f287213118c136e64d8c56daeaeab

Regards,

Hans

p.s.

About setting multiple-quirks on a single device, simply combine the letters, e.g. : options usb-storage quirks=0x0bc2:0x3321:tj

About US_FL_NO_ATA_1X only being necessary if the device does not show up at all: that is not entirely true, since things like smartd can cause issues after probe too. What would be useful for diagnosing this is the "dmesg" output from after the problem happened. It will show the troublesome SCSI commands from when things went south, and the ATA_1X pass-through command is easily recognized.
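
As a small aid for reading such output, the sketch below classifies a SCSI command by its first CDB byte. The opcodes `0xa1` and `0x85` are the standard ATA PASS-THROUGH(12) and ATA PASS-THROUGH(16) commands. The input format (a space-separated hex dump of the CDB) and the sample bytes are assumptions for illustration, since the exact dmesg layout varies between kernel versions, and `ata_passthrough` is a hypothetical name.

```python
# Standard SCSI opcodes for the ATA pass-through commands.
ATA_PASS_THROUGH = {
    0xA1: "ATA PASS-THROUGH(12)",
    0x85: "ATA PASS-THROUGH(16)",
}


def ata_passthrough(cdb_hex):
    """Return the pass-through name if the CDB's first byte is 0xa1 or 0x85."""
    first_byte = cdb_hex.split()[0]  # raises IndexError on an empty string
    return ATA_PASS_THROUGH.get(int(first_byte, 16))


# Illustrative CDB bytes, not taken from any log in this thread.
print(ata_passthrough("85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00"))
```

A CDB that starts with any other opcode, for example `28` for READ(10), returns `None`, meaning the failing command was not an ATA pass-through.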

OtherCrashOverride May 3, 2017

@jwrdegoede, we already eliminated the XU4 as the cause since the issues are reproducible on Intel x86-64 equipment.
https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=50#p188246

So what is interesting about the above? It's from a Core i5 PC, not an XU4. It faults identically on both.

Also, https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=100#p188580
and https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1584557

Despite being USB powered, the drive in question also functions normally in Windows and MacOS environments.

ThomasKaiser May 3, 2017

Despite being USB powered

Oh well, I also completely overlooked "As the HDD is USB powered" in your forum posts. :(

So let's add N°5 to the list of 'ODROID XU4 UAS' problems:

  • average users not being aware of the higher power requirements that come with UAS being more efficient than usb-storage; the resulting power problems now also count as 'UAS problems'

@matthuisman are you able to test in a reasonable manner (ruling out potential powering problems and providing dmesg output as Hans suggested)?

OtherCrashOverride May 3, 2017

So let's add N°5 to the list of 'ODROID XU4 UAS' problems:

We have not identified any Odroid XU4 UAS problems. Only Linux UAS problems have been reported. The issues reported affect all USB3 UAS capable platforms.

matthuisman May 3, 2017

@ThomasKaiser

I have 3x of the same drive (0x0bc2:0x3321)
All powered from mains (not USB powered)

They are running via a USB3.0 HUB, but I can connect directly to USB if required.
They are all EXT4 and I am also using AUFS (but cant test directly to drives not AUFS mount)

I am using ARCH and am familiar with compiling kernel (I build my own with AUFS and USB as module).

I am happy to compile new kernels with patches to test etc.

Here is my dmesg output:
https://forum.odroid.com/viewtopic.php?f=146&t=26016#p186301

I suspect I may have cut some off, so may need to try to get it to fault again.

Also, when you google the "ERROR Transfer event for disabled endpoint or incorrect stream ring" error, there are quite a few posts around the internet about it (and these are not XU4 users)

https://bbs.archlinux.org/viewtopic.php?id=192850
https://answers.launchpad.net/ubuntu/+question/404094
https://askubuntu.com/questions/50866/external-usb-3-0-hard-drive-is-not-recognised-when-plugged-into-usb-3-port
https://bugzilla.kernel.org/show_bug.cgi?id=189631

This seems like the exact same issue happening on an RPi with Seagate drives:
raspberrypi/linux#1287

matthuisman May 3, 2017

Anyone have a good test not using dd as I have data on my drives and don't want to wipe?

Looks like fio just writes a file so won't wipe my data

pacman -S fio
cd /media/sda1

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
fio-2.19
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][0.6%][r=668KiB/s,w=232KiB/s][r=167,w=58 IOPS][eta 01h:15m:48s]

Very slow - but I guess this is due to the 4k block size?

Updated to --bs=5M

Run status group 0 (all jobs): READ: bw=101MiB/s (106MB/s), 101MiB/s-101MiB/s (106MB/s-106MB/s), io=4095MiB (4294MB), run=40451-40451msec

Run status group 0 (all jobs): WRITE: bw=99.2MiB/s (104MB/s), 99.2MiB/s-99.2MiB/s (104MB/s-104MB/s), io=4095MiB (4294MB), run=41288-41288msec

Still haven't managed to get UAS to fail.... hmmmm
Not using any quirks (just trying to find a way to make it error)

I have it as a module (this wouldn't for some reason fix our issue I assume?)

UPDATE

OK, I got an error when I used fio on my AUFS virtual directory
I had to change direct=0 (due to destination does not support O_DIRECT)

[  183.893214] sd 3:0:0:0: [sdc] tag#0 data cmplt err -75 uas-tag 1 inflight: CMD
[  183.899105] sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 00 76 89 5f 00 00 40 00
[  215.531207] sd 3:0:0:0: [sdc] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD
[  215.537384] sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 00 76 89 5f 00 00 40 00
[  215.545184] scsi host3: uas_eh_bus_reset_handler start
[  215.631350] usb 4-1.1.4: reset SuperSpeed USB device number 7 using xhci-hcd
[  215.659713] scsi host3: uas_eh_bus_reset_handler success

Oooo, tried the exact same fio test (direct=0) against the drive directly (not via AUFS) and managed to get an error on the 3rd run

[  577.002420] sd 3:0:0:0: [sdc] tag#0 data cmplt err -75 uas-tag 1 inflight: CMD
[  577.008339] sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 01 1c 9a 5f 00 00 20 00
[  609.770555] sd 3:0:0:0: [sdc] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD
[  609.776749] sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 01 1c 9a 5f 00 00 20 00
[  609.784525] scsi host3: uas_eh_bus_reset_handler start
[  609.870655] usb 4-1.1.4: reset SuperSpeed USB device number 7 using xhci-hcd
[  609.898943] scsi host3: uas_eh_bus_reset_handler success
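A small sketch for reading these log lines, in case it helps when comparing faults. Opcode 0x28 is a plain SCSI READ(10), so the failing command is an ordinary read, not an ATA pass-through; in a READ(10) CDB, bytes 2-5 hold the big-endian LBA and bytes 7-8 the number of blocks. The "err -75" is errno 75, EOVERFLOW on Linux. The helper name decode_read10 is made up for illustration.

```shell
#!/bin/sh
# decode_read10: hypothetical helper. Takes the ten CDB bytes (hex, as
# printed by the kernel after "CDB: opcode=0x28") and prints the LBA and
# block count.
# READ(10) layout: byte 0 opcode, bytes 2-5 LBA, bytes 7-8 transfer length,
# both big-endian.
decode_read10() {
    [ "$#" -eq 10 ] || { echo "expected 10 CDB bytes" >&2; return 1; }
    [ "$1" = "28" ] || { echo "not a READ(10) CDB" >&2; return 1; }
    lba=$(( 0x$3$4$5$6 ))
    blocks=$(( 0x$8$9 ))
    echo "lba=$lba blocks=$blocks"
}

# The two failing commands from the logs above:
decode_read10 28 00 00 76 89 5f 00 00 40 00
decode_read10 28 00 01 1c 9a 5f 00 00 20 00
```

Both faults are small reads (64 and 32 blocks) at unrelated positions on the disk, which is at least consistent with the problem not being tied to one bad region.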

Perfect, so this is the test I can run (a few times) to test for a fix

fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=test --bs=5M --iodepth=64 --size=4G --readwrite=randread
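Since the fault only shows up every few runs, the repetition could be scripted. The following is a sketch only: the function names are hypothetical, it assumes fio is installed and dmesg is readable, and it treats a new "uas_eh_abort_handler" line in the kernel log as the failure signal (a wrapped log buffer would confuse the count).

```shell
#!/bin/sh
# count_uas_errors: hypothetical helper. Counts kernel log lines on stdin
# that mention the UAS abort handler seen in the logs above.
count_uas_errors() {
    grep -c 'uas_eh_abort_handler' || true
}

# run_fio_rounds: hypothetical wrapper. Repeats the read test N times in the
# current directory and stops at the first round after which a new UAS abort
# shows up in dmesg.
run_fio_rounds() {
    rounds=$1
    before=$(dmesg | count_uas_errors)
    i=1
    while [ "$i" -le "$rounds" ]; do
        fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 \
            --name=test --filename=test --bs=5M --iodepth=64 --size=4G \
            --readwrite=randread >/dev/null
        after=$(dmesg | count_uas_errors)
        if [ "$after" -gt "$before" ]; then
            echo "UAS abort logged during round $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no UAS abort in $rounds rounds"
}

# Usage, from the mounted drive:
#   cd /media/sda1 && run_fio_rounds 10
```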

Still get same error with
options usb-storage quirks=0x0bc2:0x3321:f (US_FL_NO_REPORT_OPCODES)

Now trying 0x0bc2:0x3321:t (US_FL_NO_ATA_1X) ...
On 6th run without error.... Looking good

matthuisman May 4, 2017

OK, I have done a lot (15+) of those 4G tests without any errors.

Then I did a "big boy" 20g R/W test to make extra sure
fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=test --bs=5M --iodepth=64 --size=20G --readwrite=randrw

Run status group 0 (all jobs):
   READ: bw=46.9MiB/s (49.8MB/s), 46.9MiB/s-46.9MiB/s (49.8MB/s-49.8MB/s), io=10.5GiB (10.8GB), run=219878-219878msec
  WRITE: bw=46.4MiB/s (48.6MB/s), 46.4MiB/s-46.4MiB/s (48.6MB/s-48.6MB/s), io=9.98GiB (10.7GB), run=219878-219878msec

No errors in dmesg. WOOHOO

WAIT!!!!!!

So, went back to a small 4g read test and the error just happened!! :(

BUGGER!

ThomasKaiser May 4, 2017

https://bbs.archlinux.org/viewtopic.php?id=192850

This one mentions the same potential fix Hans already suggested: xhci->quirks |= XHCI_BROKEN_STREAMS; (since you're knowledgeable seems like a good idea to try this out)

As for the tests: a 4K block size will of course be slow, especially with HDDs, but may be better suited to triggering errors. I'll get my XU4 back later this week and will then run a series of tests with an SSD on both an ASM1153 (with original and not 'branded' Seagate firmware) and a JMS567. Unfortunately we are not able to test on the XU4 without a USB hub in between the host controller and the USB-to-SATA bridge.

Does any of you know whether the Hardkernel guys might have dev samples without the GL3521, so that the USB port is directly accessible?
