Placement Group Management #560

Closed
hunter opened this issue Apr 9, 2017 · 25 comments · Fixed by #3769

Labels: ceph, feature

hunter (Contributor) commented Apr 9, 2017

Following from some of the discussion in #554 (and #558), it may be useful to add PG configuration to the Pool TPR. That would give more control over the global defaults, which are not a great fit for some of the default pools (the PG calc has some recommendations: http://ceph.com/pgcalc/).

One of the benefits of the dynamic nature of Rook is that some of the PG configuration could be automated. Since the PGs should change as more OSDs are added, we can get some of that info from the Rook cluster and adjust as needed (assuming the PGs don't scale down).

The calculator uses the formula (( Target PGs per OSD ) × ( OSD # ) × ( %Data )) / ( Size ), so perhaps the estimated %Data could be specified in the Pool TPR to let Rook manage the PG allocation across all the pools.
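
For illustration (the numbers here are invented, not taken from the calculator): with a target of 100 PGs per OSD, 15 OSDs, a pool expected to hold 40% of the data, and a replica size of 3, the formula gives (100 × 15 × 0.40) / 3 = 200, which the calculator would then round to a power of two, i.e. 256.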

This also raises the question, when FS or RGW services are launched, should they pre-create the Pool TPRs to allow control over the smaller/less utilised pools.

jbw976 (Member) commented Apr 27, 2017

This is interesting @hunter, it seems like it has two separate parts:

  1. When Rook creates a pool, it creates it with an intelligent number of PGs
  2. As the number of OSDs increases in the cluster, the rook-operator could increase the number of PGs in existing pools

http://ceph.com/pgcalc/ can get a bit complicated, as it depends on a fair amount of cluster "forecasting" knowledge from the user, but I do like your thinking about scoping that down to a slim set of information we'd need to collect from the user via the TPRs.

bassam (Member) commented Apr 27, 2017

I think Ceph needs to support shrinking PGs. Without that, it would be hard to make things dynamic.

DanKerns (Member) commented:

I don't think we should be exposing PGs to the end user. I think we need to figure out how to do the "right thing." One of the tenets of Rook is that we manage the SDS system so clients of storage don't need to be experts.

hunter (Contributor, Author) commented Apr 28, 2017

The difficulty in hiding the PGs is that we need some way of knowing how much data will be stored inside a pool. Exposing a % may be an option, although it may confuse things since an end user won't know about the system pools (.rgw.*, etc.). Perhaps a simplified bucket system would work (e.g. small, medium, large), which Rook could then use to calculate PG counts based on the existing pools.

Alternatively for smaller clusters, it is possible to use standard sizes (although not as optimal) - http://docs.ceph.com/docs/master/rados/operations/placement-groups/
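
As a rough illustration of the bucket idea above, a hint on the Pool TPR could look something like this (a hypothetical sketch only; expectedUsage is not an existing field in any Rook spec):

```
# Hypothetical sketch of a Pool TPR fragment with a data-size hint.
# "expectedUsage" is an assumed field name used only to illustrate the
# bucket idea; Rook would map it to a %Data estimate for the PG calc.
spec:
  replicated:
    size: 3
  expectedUsage: medium   # one of: small | medium | large
```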

jpds commented Oct 23, 2017

Ceph appears to be recommending the following on my Luminous cluster:

too few PGs per OSD (25 < min 30)

Perhaps it'd be worth using the data from ceph osd df tree and the above to calculate the next power of two to use for the recommended PG count in a cluster?
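
For instance (illustrative numbers only): if the cluster has 12 OSDs with replica size 3 and we aim for roughly 100 PGs per OSD, the total would be about (100 × 12) / 3 = 400 PGs across the pools, and the next power of two would be 512.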

jpds commented Oct 26, 2017

Interesting and very relevant:

@travisn travisn added this to the 0.7 milestone Nov 10, 2017
@travisn travisn added this to To Do in v0.7 via automation Nov 10, 2017
@jbw976 jbw976 removed this from To Do in v0.7 Feb 12, 2018
@jbw976 jbw976 modified the milestones: 0.7.5, 0.9 Feb 28, 2018
@travisn travisn added this to To do in v0.9 via automation Aug 2, 2018
@galexrt galexrt added the ceph label Aug 6, 2018
stale bot commented Nov 4, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 4, 2018
stale bot commented Nov 11, 2018

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@stale stale bot closed this as completed Nov 11, 2018
v0.9 automation moved this from To do to Done Nov 11, 2018
whereisaaron (Contributor) commented Jan 7, 2019

It looks like Ceph intends to add the option to autoscale placement groups, so Rook may not need to attend to this. Rook could calculate a good starting value, or just wait for the autoscaler to increase it from the default of 100.

http://docs.ceph.com/docs/master/rados/operations/health-checks/?highlight=backfillfull%20ratio#pool-too-few-pgs

ceph osd pool set <pool-name> pg_autoscale_mode on

I don't see this in 13.2.2, so it's maybe in a later version.
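
If Rook were to expose this per pool, a CRD property could look roughly like the following (a hypothetical sketch; the field name and placement are assumptions, not an existing Rook API):

```
# Hypothetical sketch: a per-pool autoscale toggle on the pool CRD.
# "pgAutoscaleMode" is an assumed field name; Rook would translate it to
#   ceph osd pool set <pool-name> pg_autoscale_mode on
spec:
  replicated:
    size: 3
  pgAutoscaleMode: "on"   # quoted so YAML does not read it as a boolean
```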

travisn (Member) commented Jan 16, 2019

Need to confirm that the mgr module for placement mgmt solves this

@travisn travisn reopened this Jan 16, 2019
v0.9 automation moved this from Done to To do Jan 16, 2019
@stale stale bot removed the wontfix label Jan 16, 2019
@travisn travisn modified the milestones: 0.9, 1.0 Jan 16, 2019
@travisn travisn added this to To do in v1.0 via automation Jan 16, 2019
sebastian-philipp (Member) commented:

I think all we need is a CRD property (cluster or pool?) to enable/disable pg autoscaling.

Do we need a generic way to enable or disable mgr modules? e.g. dashboard, prometheus, diskprediction, crash, influx, pg_autoscaler, etc.
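
Something along these lines in the cluster CRD could cover both cases (a sketch only; the structure mirrors the commit referenced further down in this issue, and the final schema may differ):

```
# Sketch of a generic mgr-module toggle in the cluster CRD.
# Module names are taken from the list above.
mgr:
  modules:
    - name: pg_autoscaler
    - name: prometheus
    - name: dashboard
```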

dimm0 (Contributor) commented Apr 10, 2019

I had to enable the OSD balancer in mgr manually... That's a super useful thing to have.

dimm0 (Contributor) commented Apr 10, 2019

Although this stuff is heavily dependent on the Ceph features currently in use...
After upgrading the Ceph version, I had to update straw to straw2 (whatever that means) and set reweight-compat... And I'm still behind on what the newest features are. Not sure how you can deal with those updates in Rook.

liewegas (Member) commented:

👍 on adding a CRD property to enable mgr modules, as long as specifying some set of modules to enable does not make rook disable any modules not explicitly called out (since we tend to add new modules with each release).

As for converting straw to straw2, I think that's something that rook could do (as an opinionated operator), but it's also something that ceph-mgr could do for the benefit of all users. I'll add a ticket for ceph to do this on its own...

liewegas (Member) commented:

Ha, Ceph already has a CLI command to do this: ceph osd crush set-all-straw-buckets-to-straw2. Rook could just run this at the end of an upgrade. If existing buckets were straw, there may be a small amount of data movement, but it is generally pretty minimal.

dimm0 (Contributor) commented Apr 11, 2019

Yup, but as I said, it's heavily dependent on the tunables version, which depends on the client kernel versions... You can see the current ones, but you don't know what users will want to connect.

stale bot commented Jul 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 10, 2019
stale bot commented Jul 17, 2019

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@stale stale bot closed this as completed Jul 17, 2019
v1.1 automation moved this from To do to Done Jul 17, 2019
mykaul (Contributor) commented Jul 29, 2019

> Ha, Ceph already has a CLI command to do this: ceph osd crush set-all-straw-buckets-to-straw2. Rook could just run this at the end of an upgrade. If existing buckets were straw, there may be a small amount of data movement, but it is generally pretty minimal.

@travisn - I assume the above is still relevant? Should we re-open this item? (It's not clear to me why it's 'done' under 1.1.)

travisn (Member) commented Jul 29, 2019

Agreed, reopening... the bot just closed it and marked it as done

@travisn travisn reopened this Jul 29, 2019
v1.1 automation moved this from Done to In progress Jul 29, 2019
@stale stale bot removed the wontfix label Jul 29, 2019
sebastian-philipp (Member) commented:

@LenzGr this might create conflicts between what the dashboard thinks and what Rook thinks are the enabled modules.

leseb added a commit to leseb/rook that referenced this issue Sep 4, 2019
We can now enable any manager module via the CRD; to do this, simply do the following:

```
mgr:
  modules:
    - name: pg_autoscaler
```

Closes: rook#560
Signed-off-by: Sébastien Han <seb@redhat.com>
@travisn travisn moved this from In progress to Review in progress in v1.1 Sep 6, 2019
v1.1 automation moved this from Review in progress to Done Sep 6, 2019