Pool update and pool creation does not honor subFailureDomain #14295

Closed
ideepika opened this issue Jun 3, 2024 · 5 comments

Comments

Contributor

ideepika commented Jun 3, 2024

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
Parameters such as subFailureDomain and crushRoot are not honored when the crush rule is updated. This is observed both on new pool creation and when the crush rule is updated by editing the YAML. Support for changing the crush rule was introduced in PR #12263, but if subFailureDomain is specified it is completely ignored and a new rule gets created instead.

For example, with this change to the pool spec:

+ failureDomain: zone
 replicated:
- size: 3
+ size: 4
+ replicasPerFailureDomain: 2
+ subFailureDomain: host
+ crushRoot: default
+ deviceClass: nvme

Two crush rules get created. The operator log shows:

2024-06-03 11:51:25.237613 I | cephclient: creating a new crush rule for changed deviceClass ("default~hdd"-->"hdd") on crush rule "test-crush-bug-az-4"
2024-06-03 11:51:25.237647 I | cephclient: updating pool "test-crush-bug-az-4" failure domain from "zone" to "zone" with new crush rule "test-crush-bug-az-ab-4_zone_hdd"

The two rules that get created:

   {
        "rule_id": 4,
        "rule_name": "test-crush-bug-az-ab-4_zone_hdd",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -2,
                "item_name": "default~hdd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "zone"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 157,
        "rule_name": "test-crush-bug-az-ab-4",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -2,
                "item_name": "default~hdd"
            },
            {
                "op": "choose_firstn",
                "num": 0,
                "type": "zone"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 2,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

Of these, the second one is correct. A crush rule update should also honor subFailureDomain.
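To make the difference between the two rules concrete, here is a minimal Python sketch that checks whether a dumped rule matches a pool spec. It is not Rook's implementation: the function names `expected_steps` and `honors_spec` are hypothetical, and the "expected" two-step shape is simply taken from the second rule in the dump above, which is the one reported as correct.

```python
def expected_steps(failure_domain, sub_failure_domain=None,
                   replicas_per_failure_domain=0):
    """Return the choose steps a rule should contain (take/emit omitted).

    Assumption: with a subFailureDomain the rule has two steps, as in
    rule 157 from the dump; without one it has a single chooseleaf step,
    as in rule 4.
    """
    if sub_failure_domain:
        return [
            ("choose_firstn", 0, failure_domain),
            ("chooseleaf_firstn", replicas_per_failure_domain,
             sub_failure_domain),
        ]
    return [("chooseleaf_firstn", 0, failure_domain)]


def honors_spec(rule, failure_domain, sub_failure_domain=None,
                replicas_per_failure_domain=0):
    """Compare a rule's choose steps against what the pool spec asks for."""
    actual = [
        (step["op"], step["num"], step["type"])
        for step in rule["steps"]
        if step["op"] not in ("take", "emit")
    ]
    return actual == expected_steps(
        failure_domain, sub_failure_domain, replicas_per_failure_domain
    )


# The two rules from the dump above, reduced to the fields that matter here.
rule_single_step = {
    "rule_name": "test-crush-bug-az-ab-4_zone_hdd",
    "steps": [
        {"op": "take", "item": -2, "item_name": "default~hdd"},
        {"op": "chooseleaf_firstn", "num": 0, "type": "zone"},
        {"op": "emit"},
    ],
}
rule_two_step = {
    "rule_name": "test-crush-bug-az-ab-4",
    "steps": [
        {"op": "take", "item": -2, "item_name": "default~hdd"},
        {"op": "choose_firstn", "num": 0, "type": "zone"},
        {"op": "chooseleaf_firstn", "num": 2, "type": "host"},
        {"op": "emit"},
    ],
}

# Spec from the example: failureDomain zone, subFailureDomain host,
# replicasPerFailureDomain 2.
print(honors_spec(rule_single_step, "zone", "host", 2))  # False
print(honors_spec(rule_two_step, "zone", "host", 2))     # True
```

Only the rule with both the `choose_firstn` step on zone and the `chooseleaf_firstn` step on host passes the check, which is the behavior expected after a crush rule update.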

@ideepika ideepika added the bug label Jun 3, 2024
Contributor Author

ideepika commented Jun 3, 2024

@subhamkrai

Member

travisn commented Jun 3, 2024

SubfailureDomain is a field currently only supported for stretch clusters. Are you wanting to use it and support updates for some other scenario, or is this for stretch? I just wouldn't expect the subfailure domain to change in stretch clusters.

Contributor Author

ideepika commented Jun 4, 2024

> SubfailureDomain is a field currently only supported for stretch clusters. Are you wanting to use it and support updates for some other scenario, or is this for stretch? I just wouldn't expect the subfailure domain to change in stretch clusters.

Yes, for advanced cases. The rule was successfully created, but I suppose that on OSD restart, or when the operator rescans the configuration, it replaces the subFailureDomain crush rule with one that uses only failureDomain. Maybe related to #14300.


github-actions bot commented Aug 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.


This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions bot closed this as not planned Aug 16, 2024