Unable to configure user/dedicated workqueue with the "config-wq" command #11
I think the 5.11 kernel may still have the driver bug that prevents the wq mode change from taking effect. I believe that has been fixed in later kernels. Can you see if a 5.12 kernel works any better?
"threshold" is only for shared wqs. Try without the "--threshold" option.
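As a rough illustration of that rule, the helper below prints a `config-wq` command line and adds `--threshold` only when the mode is `shared`. The function name `wq_cmd` and its argument order are hypothetical; the flags are the ones quoted in this thread, and `--name` and `--block-on-fault` are left out for brevity. It only prints the command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: print an accel-config config-wq command line.
# Usage: wq_cmd MODE WQ_PATH [THRESHOLD]
wq_cmd() {
    mode=$1
    wq=$2
    threshold=${3:-}
    cmd="accel-config config-wq --group-id=0 --type=user --mode=$mode --wq-size=16 --priority=10"
    # A threshold only applies to shared wqs, so it is dropped for any other mode.
    if [ "$mode" = "shared" ] && [ -n "$threshold" ]; then
        cmd="$cmd --threshold=$threshold"
    fi
    printf '%s %s\n' "$cmd" "$wq"
}
```

For example, `wq_cmd dedicated dsa0/wq0.0 15` prints a command without `--threshold`, while `wq_cmd shared dsa0/wq0.0 15` keeps it.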
Thanks @ramesh-thomas. I read the log wrong and mistook it for an incorrect-mode problem rather than a threshold one.
# accel-config list
[
]
# accel-config config-wq --type=user --mode=dedicated --name="dsa0.0" --group-id=0 --wq-size=16 --priority=10 --block-on-fault=1 dsa0/wq0.0
# accel-config config-engine --group-id=0 dsa0/engine0.0
# accel-config enable-device dsa0
failed in dsa0
enabled 0 device(s) out of 1
Error[ 0x15] dsa0: Invalid group config: lack of wq or engines
# accel-config list
[
]
# accel-config load-config -c dsa0.conf
# accel-config enable-device dsa0
enabled 1 device(s) out of 1
# accel-config enable-wq dsa0/wq0.0
enabled 1 wq(s) out of 1
#
OK, thanks, we will try that. In principle, it works in 5.11 through "load-config".
@ozhuraki you don't need to switch kernels. I misread your earlier log.
Can you try rebooting, or resetting by unloading and reloading the idxd module? Those commands worked for me.
After rebooting, "config-wq" works, but only once.
# modprobe -r idxd
# modprobe idxd
# accel-config list
[
]
# accel-config config-wq --type=user --mode=dedicated --name="dsa0.0" --group-id=0 --wq-size=16 --priority=10 --block-on-fault=1 dsa0/wq0.0
# accel-config config-engine --group-id=0 dsa0/engine0.0
# accel-config enable-device dsa0
failed in dsa0
enabled 0 device(s) out of 1
Error[ 0x16] dsa0: Invalid group config: wq misconfigured
# accel-config list
[
]
# accel-config load-config -c dsa0.conf
# accel-config enable-device dsa0
enabled 1 device(s) out of 1
# accel-config enable-wq dsa0/wq0.0
enabled 1 wq(s) out of 1
#
Can you try putting the group id as the first parameter? I wonder if there's an ordering issue for whatever reason.
# accel-config list
[
]
# accel-config config-wq --group-id=0 --type=user --mode=dedicated --name="dsa0.0" --wq-size=16 --priority=10 --block-on-fault=1 dsa0/wq0.0
# accel-config config-engine --group-id=0 dsa0/engine0.0
# accel-config enable-device dsa0
failed in dsa0
enabled 0 device(s) out of 1
Error[ 0x16] dsa0: Invalid group config: wq misconfigured
# accel-config list
[
]
# accel-config load-config -c dsa0.conf
# accel-config enable-device dsa0
enabled 1 device(s) out of 1
# accel-config enable-wq dsa0/wq0.0
enabled 1 wq(s) out of 1
#
Can you attach the dsa0.conf? Also, given it's a dedicated wq, can you try the latest upstream kernel? 5.15-rc5 would be great. Thanks!
# cat dsa0.conf
[
{
"dev":"dsa0",
"token_limit":0,
"groups":[
{
"dev":"group0.0",
"tokens_reserved":0,
"use_token_limit":0,
"tokens_allowed":8,
"grouped_workqueues":[
{
"dev":"wq0.0",
"mode":"dedicated",
"size":16,
"group_id":0,
"priority":10,
"block_on_fault":1,
"type":"user",
"name":"dsa0.0",
"threshold":15
}
],
"grouped_engines":[
{
"dev":"engine0.0",
"group_id":0
}
]
},
{
"dev":"group0.1",
"tokens_reserved":0,
"use_token_limit":0,
"tokens_allowed":8,
"grouped_workqueues":[
{
"dev":"wq0.1",
"mode":"dedicated",
"size":16,
"group_id":1,
"priority":10,
"block_on_fault":1,
"type":"user",
"name":"dsa0.1",
"threshold":15
}
],
"grouped_engines":[
{
"dev":"engine0.1",
"group_id":1
}
]
},
{
"dev":"group0.2",
"tokens_reserved":0,
"use_token_limit":0,
"tokens_allowed":8,
"grouped_workqueues":[
{
"dev":"wq0.2",
"mode":"dedicated",
"size":16,
"group_id":2,
"priority":10,
"block_on_fault":1,
"type":"user",
"name":"dsa0.2",
"threshold":15
}
],
"grouped_engines":[
{
"dev":"engine0.2",
"group_id":2
}
]
},
{
"dev":"group0.3",
"tokens_reserved":0,
"use_token_limit":0,
"tokens_allowed":8,
"grouped_workqueues":[
{
"dev":"wq0.3",
"mode":"dedicated",
"size":16,
"group_id":3,
"priority":10,
"block_on_fault":1,
"type":"user",
"name":"dsa0.3",
"threshold":15
}
],
"grouped_engines":[
{
"dev":"engine0.3",
"group_id":3
}
]
}
]
}
]
#
Don't you have a conf file that configures only a single wq, the same as the command line does? Can you run 'accel-config list -i' after you have configured via the command line? I'm curious what accel-config has configured at that point.
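A single-wq conf file of the kind asked for here might look like the sketch below. It is the `group0.0` block of the `dsa0.conf` posted above, copied unchanged (including its `threshold` field) with the other three groups removed. The function name `write_single_wq_conf` and the output file name are hypothetical, and whether `load-config` accepts a file reduced this far is not confirmed here.

```shell
#!/bin/sh
# Hypothetical helper: write a conf holding only group0.0 (wq0.0 + engine0.0).
# Usage: write_single_wq_conf PATH
write_single_wq_conf() {
    cat > "$1" <<'EOF'
[
  {
    "dev":"dsa0",
    "token_limit":0,
    "groups":[
      {
        "dev":"group0.0",
        "tokens_reserved":0,
        "use_token_limit":0,
        "tokens_allowed":8,
        "grouped_workqueues":[
          {
            "dev":"wq0.0",
            "mode":"dedicated",
            "size":16,
            "group_id":0,
            "priority":10,
            "block_on_fault":1,
            "type":"user",
            "name":"dsa0.0",
            "threshold":15
          }
        ],
        "grouped_engines":[
          {
            "dev":"engine0.0",
            "group_id":0
          }
        ]
      }
    ]
  }
]
EOF
}
```

Calling `write_single_wq_conf dsa0-single.conf` produces a file that could then be passed to `accel-config load-config -c dsa0-single.conf` for comparison with the command-line path.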
Reducing the conf to fewer than 3 workqueues doesn't work, i.e. such a configuration fails to load through "load-config".
# accel-config list
[
]
# accel-config list --idle | jq '.[].dev' | grep dsa
"dsa0"
"dsa1"
"dsa2"
"dsa3"
"dsa4"
"dsa5"
"dsa6"
"dsa7"
# accel-config list --idle | jq '.[0]'
{
"dev": "dsa0",
"token_limit": 0,
"max_groups": 4,
"max_work_queues": 8,
"max_engines": 4,
"work_queue_size": 128,
"numa_node": 0,
"op_cap": [
"0x1003f03ff",
"0",
"0",
"0"
],
"gen_cap": "0x40915f010f",
"version": "0x100",
"state": "disabled",
"max_tokens": 96,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"configurable": 1,
"pasid_enabled": 1,
"cdev_major": 234,
"clients": 0,
"groups": [
{
"dev": "group0.0",
"tokens_reserved": 0,
"use_token_limit": 0,
"tokens_allowed": 8,
"traffic_class_a": 0,
"traffic_class_b": 1,
"grouped_engines": [
{
"dev": "engine0.0",
"group_id": 0
}
]
},
{
"dev": "group0.1",
"tokens_reserved": 0,
"use_token_limit": 0,
"tokens_allowed": 8,
"traffic_class_a": 0,
"traffic_class_b": 1,
"grouped_engines": [
{
"dev": "engine0.1",
"group_id": 1
}
]
},
{
"dev": "group0.2",
"tokens_reserved": 0,
"use_token_limit": 0,
"tokens_allowed": 8,
"traffic_class_a": 0,
"traffic_class_b": 1,
"grouped_engines": [
{
"dev": "engine0.2",
"group_id": 2
}
]
},
{
"dev": "group0.3",
"tokens_reserved": 0,
"use_token_limit": 0,
"tokens_allowed": 8,
"traffic_class_a": 0,
"traffic_class_b": 1,
"grouped_engines": [
{
"dev": "engine0.3",
"group_id": 3
}
]
}
],
"ungrouped workqueues": [
{
"dev": "wq0.0",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 1,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
},
{
"dev": "wq0.1",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 1,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
},
{
"dev": "wq0.2",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 1,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
},
{
"dev": "wq0.3",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 1,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
},
{
"dev": "wq0.4",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 0,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
},
{
"dev": "wq0.5",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 0,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
},
{
"dev": "wq0.6",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 0,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
},
{
"dev": "wq0.7",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 0,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
}
]
}
# accel-config load-config -c dsa0.conf
# accel-config enable-device dsa0
enabled 1 device(s) out of 1
# accel-config enable-wq dsa0/wq0.0
enabled 1 wq(s) out of 1
# accel-config list --idle | jq '.[0]'
{
"dev": "dsa0",
"token_limit": 0,
"max_groups": 4,
"max_work_queues": 8,
"max_engines": 4,
"work_queue_size": 128,
"numa_node": 0,
"op_cap": [
"0x1003f03ff",
"0",
"0",
"0"
],
"gen_cap": "0x40915f010f",
"version": "0x100",
"state": "enabled",
"max_tokens": 96,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"configurable": 1,
"pasid_enabled": 1,
"cdev_major": 234,
"clients": 0,
"groups": [
{
"dev": "group0.0",
"tokens_reserved": 0,
"use_token_limit": 0,
"tokens_allowed": 8,
"traffic_class_a": 0,
"traffic_class_b": 1,
"grouped_workqueues": [
{
"dev": "wq0.0",
"mode": "dedicated",
"size": 16,
"group_id": 0,
"priority": 10,
"block_on_fault": 1,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"cdev_minor": 0,
"type": "user",
"name": "dsa0.0",
"threshold": 0,
"ats_disable": 0,
"state": "enabled",
"clients": 0
}
],
"grouped_engines": [
{
"dev": "engine0.0",
"group_id": 0
}
]
},
{
"dev": "group0.1",
"tokens_reserved": 0,
"use_token_limit": 0,
"tokens_allowed": 8,
"traffic_class_a": 0,
"traffic_class_b": 1,
"grouped_workqueues": [
{
"dev": "wq0.1",
"mode": "dedicated",
"size": 16,
"group_id": 1,
"priority": 10,
"block_on_fault": 1,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "user",
"name": "dsa0.1",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
}
],
"grouped_engines": [
{
"dev": "engine0.1",
"group_id": 1
}
]
},
{
"dev": "group0.2",
"tokens_reserved": 0,
"use_token_limit": 0,
"tokens_allowed": 8,
"traffic_class_a": 0,
"traffic_class_b": 1,
"grouped_workqueues": [
{
"dev": "wq0.2",
"mode": "dedicated",
"size": 16,
"group_id": 2,
"priority": 10,
"block_on_fault": 1,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "user",
"name": "dsa0.2",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
}
],
"grouped_engines": [
{
"dev": "engine0.2",
"group_id": 2
}
]
},
{
"dev": "group0.3",
"tokens_reserved": 0,
"use_token_limit": 0,
"tokens_allowed": 8,
"traffic_class_a": 0,
"traffic_class_b": 1,
"grouped_workqueues": [
{
"dev": "wq0.3",
"mode": "dedicated",
"size": 16,
"group_id": 3,
"priority": 10,
"block_on_fault": 1,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "user",
"name": "dsa0.3",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
}
],
"grouped_engines": [
{
"dev": "engine0.3",
"group_id": 3
}
]
}
],
"ungrouped workqueues": [
{
"dev": "wq0.4",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 0,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
},
{
"dev": "wq0.5",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 0,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
},
{
"dev": "wq0.6",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 0,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
},
{
"dev": "wq0.7",
"mode": "shared",
"size": 0,
"priority": 0,
"block_on_fault": 0,
"max_batch_size": 1024,
"max_transfer_size": 2147483648,
"type": "none",
"name": "",
"threshold": 0,
"ats_disable": 0,
"state": "disabled",
"clients": 0
}
]
}
#
# accel-config list --idle | jq '.[].dev' | grep dsa
"dsa0"
"dsa1"
"dsa2"
"dsa3"
"dsa4"
"dsa5"
"dsa6"
"dsa7"
I find it strange that the engines are all pre-assigned to a group each.
We find the need to reboot a bit strange. Would rmmod/modprobe idxd be enough, as suggested by @ramesh-thomas earlier?
Unfortunately, there are multiple users, so rebooting is problematic. Resetting by unloading and reloading the idxd module was already tried in #11 (comment). Are there any other ways to reset the DSA HW? While investigating this, an earlier observation was that "config-wq", "config-engine", "enable-device", "enable-wq" and "disable..." work only a limited number of times after a reboot, and this was reproducible on multiple physical setups. Since the identical configuration can be successfully loaded and enabled through "load-config", is the problem in the order in which accel-config sets the sysfs entries for "config-wq" / "config-engine" / "enable-device"?
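One way to probe the ordering question would be to dump the wq's sysfs attributes after each path and diff the two dumps. The sketch below is only a generic directory dumper; the function name `dump_attrs` is hypothetical, and the sysfs location in the usage note (`/sys/bus/dsa/devices/wq0.0`) is an assumption about where the idxd driver exposes the wq on this system.

```shell
#!/bin/sh
# Hypothetical helper: print "attribute=value" for every readable
# regular file directly inside a directory.
# Usage: dump_attrs DIR
dump_attrs() {
    dir=$1
    for f in "$dir"/*; do
        # Skip anything that is not a readable regular file (e.g. subdirectories).
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f" 2>/dev/null)"
    done
}
```

Running `dump_attrs /sys/bus/dsa/devices/wq0.0 > cli.txt` after the "config-wq" sequence and again into `conf.txt` after "load-config", then `diff -u cli.txt conf.txt`, would show which attributes end up different between the two paths.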
You can unload the module. But I really want a clean slate to see whether this is a real problem or whether something else caused it. Also, the 5.11 kernel is pretty old, considering 5.15 is about to be released. 5.11 probably has a lot of bugs that are fixed in later kernels. Unless you are reproducing a bug on the latest upstream kernel, there isn't much we can do. BTW, what silicon stepping are you using?