Placement Group Management #560

Closed
hunter opened this issue Apr 9, 2017 · 25 comments · Fixed by #3769

Labels: ceph, feature

hunter (Contributor) commented Apr 9, 2017

Following from some of the discussion in #554 (and #558), it may be useful to add PG configuration to the Pool TPR. That would give more control over the global defaults, which are not a great fit for some of the default pools (the PG calc has some recommendations: http://ceph.com/pgcalc/).

One of the benefits of the dynamic nature of Rook is that some of the PG configuration could be automated. Since the PGs should change as more OSDs are added, we can get some of that info from the Rook cluster and adjust as needed (assuming the PGs don't scale down).

The calculator uses the formula (( Target PGs per OSD ) × ( OSD # ) × ( %Data )) / ( Size ), so perhaps the estimated %Data could be specified in the Pool TPR to let Rook manage the PG allocation across all the pools.
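
For illustration (the numbers here are invented, not taken from the calculator): with a target of 100 PGs per OSD, 15 OSDs, a pool expected to hold 40% of the data, and a replica size of 3, the formula gives (100 × 15 × 0.40) / 3 = 200, which the calculator would then round to a power of two, i.e. 256.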

This also raises the question, when FS or RGW services are launched, should they pre-create the Pool TPRs to allow control over the smaller/less utilised pools.

jbw976 (Member) commented Apr 27, 2017

This is interesting @hunter, it seems like it has two separate parts:

  1. When Rook creates a pool, it creates it with an intelligent number of PGs
  2. As the number of OSDs increases in the cluster, the rook-operator could increase the number of PGs in existing pools

http://ceph.com/pgcalc/ can get a bit complicated, as it depends on a fair amount of cluster "forecasting" knowledge from the user, but I do like your thinking about scoping that down to a slim set of information we'd need to collect from the user via the TPRs.

bassam (Member) commented Apr 27, 2017

I think Ceph needs to support shrinking PGs. Without that, it would be hard to make things dynamic.

DanKerns (Member) commented:

I don't think we should be exposing PGs to the end user. I think we need to figure out how to do the "right thing." One of the tenets of Rook is that we manage the SDS system so clients of storage don't need to be experts.

hunter (Contributor, Author) commented Apr 28, 2017

The difficulty in hiding the PGs is that we need some way of knowing how much data will be stored inside a pool. Exposing a % may be an option, although it may confuse things since an end user won't know about the system pools (.rgw.*, etc.). Perhaps a simplified bucket system would work (e.g. small, medium, large), which Rook could then use to calculate PG counts based on the existing pools.

Alternatively for smaller clusters, it is possible to use standard sizes (although not as optimal) - http://docs.ceph.com/docs/master/rados/operations/placement-groups/
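
As a rough illustration of the bucket idea above, a hint on the Pool TPR could look something like this (a hypothetical sketch only; expectedUsage is not an existing field in any Rook spec):

```
# Hypothetical sketch of a Pool TPR fragment with a data-size hint.
# "expectedUsage" is an assumed field name used only to illustrate the
# bucket idea; Rook would map it to a %Data estimate for the PG calc.
spec:
  replicated:
    size: 3
  expectedUsage: medium   # one of: small | medium | large
```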

jpds commented Oct 23, 2017

Ceph appears to be recommending the following on my Luminous cluster:

too few PGs per OSD (25 < min 30)

Perhaps it'd be worth using the data from ceph osd df tree and the above to calculate the next power of two to use for the recommended PG count in a cluster?
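
For instance (illustrative numbers only): if the cluster has 12 OSDs with replica size 3 and we aim for roughly 100 PGs per OSD, the total would be about (100 × 12) / 3 = 400 PGs across the pools, and the next power of two would be 512.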

jpds commented Oct 26, 2017

Interesting and very relevant:

@travisn travisn added this to the 0.7 milestone Nov 10, 2017
@travisn travisn added this to To Do in v0.7 via automation Nov 10, 2017
@jbw976 jbw976 removed this from To Do in v0.7 Feb 12, 2018
@jbw976 jbw976 modified the milestones: 0.7.5, 0.9 Feb 28, 2018
@travisn travisn added this to To do in v0.9 via automation Aug 2, 2018
@galexrt galexrt added the ceph label Aug 6, 2018
stale bot commented Nov 4, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 4, 2018
stale bot commented Nov 11, 2018

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@stale stale bot closed this as completed Nov 11, 2018
v0.9 automation moved this from To do to Done Nov 11, 2018
whereisaaron (Contributor) commented Jan 7, 2019

It looks like Ceph intends to add the option to autoscale placement groups, so Rook may not need to attend to this. Rook could calculate a good starting value, or just wait for the autoscaler to increase it from the default of 100.

http://docs.ceph.com/docs/master/rados/operations/health-checks/?highlight=backfillfull%20ratio#pool-too-few-pgs

ceph osd pool set <pool-name> pg_autoscale_mode on

I don't see this in 13.2.2, so it's maybe in a later version.
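
If Rook were to expose this per pool, a CRD property could look roughly like the following (a hypothetical sketch; the field name and placement are assumptions, not an existing Rook API):

```
# Hypothetical sketch: a per-pool autoscale toggle on the pool CRD.
# "pgAutoscaleMode" is an assumed field name; Rook would translate it to
#   ceph osd pool set <pool-name> pg_autoscale_mode on
spec:
  replicated:
    size: 3
  pgAutoscaleMode: "on"   # quoted so YAML does not read it as a boolean
```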

travisn (Member) commented Jan 16, 2019

Need to confirm that the mgr module for placement mgmt solves this

@travisn travisn reopened this Jan 16, 2019
v0.9 automation moved this from Done to To do Jan 16, 2019
@stale stale bot removed the wontfix label Jan 16, 2019
@travisn travisn modified the milestones: 0.9, 1.0 Jan 16, 2019
@travisn travisn added this to To do in v1.0 via automation Jan 16, 2019
sebastian-philipp (Member) commented:

I think all we need is a CRD property (cluster or pool?) to enable/disable pg autoscaling.

Do we need a generic way to enable or disable mgr modules? e.g. dashboard, prometheus, diskprediction, crash, influx, pg_autoscaler, etc.
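
Something along these lines in the cluster CRD could cover both cases (a sketch only; the structure mirrors the commit referenced further down in this issue, and the final schema may differ):

```
# Sketch of a generic mgr-module toggle in the cluster CRD.
# Module names are taken from the list above.
mgr:
  modules:
    - name: pg_autoscaler
    - name: prometheus
    - name: dashboard
```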

dimm0 (Contributor) commented Apr 10, 2019

I had to enable the OSD balancer in mgr manually... That's a super useful thing to have.

dimm0 (Contributor) commented Apr 10, 2019

Although this stuff is heavily dependent on the Ceph features currently in use...
After upgrading the Ceph version, I had to update straw to straw2 (whatever that means) and set reweight-compat... And I'm still behind on what the newest features are. Not sure how you can deal with those updates in Rook.

liewegas (Member) commented:

👍 on adding a CRD property to enable mgr modules, as long as specifying some set of modules to enable does not make rook disable any modules not explicitly called out (since we tend to add new modules with each release).

As for converting straw to straw2, I think that's something that rook could do (as an opinionated operator), but it's also something that ceph-mgr could do for the benefit of all users. I'll add a ticket for ceph to do this on its own...

liewegas (Member) commented:

Ha, Ceph already has a CLI command to do this: ceph osd crush set-all-straw-buckets-to-straw2. Rook could just run this at the end of an upgrade. If existing buckets were straw, there may be a small amount of data movement, but it is generally pretty minimal.

dimm0 (Contributor) commented Apr 11, 2019

Yup, but as I said, it's heavily dependent on the tunables version, which depends on the client kernel versions... You can see the current ones, but you don't know what users will want to connect.

stale bot commented Jul 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 10, 2019
stale bot commented Jul 17, 2019

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@stale stale bot closed this as completed Jul 17, 2019
v1.1 automation moved this from To do to Done Jul 17, 2019
mykaul (Contributor) commented Jul 29, 2019

> Ha, Ceph already has a CLI command to do this: ceph osd crush set-all-straw-buckets-to-straw2. Rook could just run this at the end of an upgrade. If existing buckets were straw, there may be a small amount of data movement, but it is generally pretty minimal.

@travisn - I assume the above is still relevant? Should we re-open this item? (It's not clear to me why it's 'done' under 1.1.)

travisn (Member) commented Jul 29, 2019

Agreed, reopening... the bot just closed it and marked it as done

@travisn travisn reopened this Jul 29, 2019
v1.1 automation moved this from Done to In progress Jul 29, 2019
@stale stale bot removed the wontfix label Jul 29, 2019
sebastian-philipp (Member) commented:

@LenzGr this might create conflicts between what the dashboard thinks and what Rook thinks are the enabled modules.

leseb added a commit to leseb/rook that referenced this issue Sep 4, 2019
We can now enable any manager module via the CRD; to do this, simply do the following:

```
mgr:
  modules:
    - name: pg_autoscaler
```

Closes: rook#560
Signed-off-by: Sébastien Han <seb@redhat.com>
@travisn travisn moved this from In progress to Review in progress in v1.1 Sep 6, 2019
v1.1 automation moved this from Review in progress to Done Sep 6, 2019