mds: create EC pool as secondary pool #8452
Conversation
Force-pushed from d49c470 to 767f80a.
@subhamkrai what's the status of this?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.
The problem is the solution isn't working (or I'm missing something); I will look into it this week.
Documentation/ceph-filesystem-crd.md
Outdated
```console
mkdir /mnt/cephfs/myssddir
setfattr -n ceph.dir.layout.pool -v <ec-pool-name> /mnt/cephfs/myssddir # example ec-pool-name: myfs-ec
```
Why are we making users go through all these manual hoops to get a working file store? IMO it would be better to treat the first pool given as the default pool and configure this automatically if the user supplies multiple pools. If the user supplies a default pool that is EC, we can rely on Ceph to fail for us.

Also, this user #8210 (comment) reported that they followed the steps here with no success. When Rook loads the pool list, does it keep the same order from k8s? If not, that could be a source of failure.
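For reference, a minimal sketch of the failure we could rely on, with hypothetical pool names (Ceph refuses an erasure-coded pool as the default data pool of a filesystem unless forced):

```console
# Create a metadata pool, then an EC data pool
ceph osd pool create myfs-metadata 32
ceph osd pool create myfs-ec-data 32 32 erasure
# Using the EC pool as the default data pool is rejected by the mons;
# Ceph asks for --force to override, and overriding is discouraged
ceph fs new myfs myfs-metadata myfs-ec-data
```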
@batrick can you review this PR?
otherwise lgtm
Force-pushed from 13a643f to f8f5095.
Steps I followed to create the EC filesystem, roughly sketched below:
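A rough sketch of those steps, assuming the example manifests in this repo and the default `rook-ceph` namespace (names are illustrative):

```console
# Create the EC filesystem: a replicated default data pool plus an EC secondary pool
kubectl create -f cluster/examples/kubernetes/ceph/filesystem-ec.yaml
# Create the CephFS storage class that targets the EC-backed filesystem
kubectl create -f cluster/examples/kubernetes/ceph/csi/cephfs/storageclass-ec.yaml
# Provision a PVC against that storage class, mount it in a test pod,
# then inspect the cluster from the toolbox
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
```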
Test result:
```console
sh-4.4$ ceph status
cluster:
id: 408d62f7-130d-4522-b531-021c79fc14e5
health: HEALTH_WARN
clock skew detected on mon.b
services:
mon: 3 daemons, quorum a,b,c (age 9m)
mgr: a(active, since 10m)
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 9m), 3 in (since 9m)
data:
volumes: 1/1 healthy
pools: 6 pools, 161 pgs
objects: 36 objects, 3.2 MiB
usage: 35 MiB used, 59 GiB / 59 GiB avail
pgs: 161 active+clean
io:
client: 1.2 KiB/s rd, 2 op/s rd, 0 op/s wr
sh-4.4$ ceph status
cluster:
id: 408d62f7-130d-4522-b531-021c79fc14e5
health: HEALTH_WARN
clock skew detected on mon.b
services:
mon: 3 daemons, quorum a,b,c (age 11m)
mgr: a(active, since 11m)
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 11m), 3 in (since 11m)
data:
volumes: 1/1 healthy
pools: 6 pools, 161 pgs
objects: 36 objects, 3.2 MiB
usage: 36 MiB used, 59 GiB / 59 GiB avail
pgs: 161 active+clean
io:
client: 1.5 KiB/s rd, 1 op/s rd, 0 op/s wr
sh-4.4$ ceph status
cluster:
id: 408d62f7-130d-4522-b531-021c79fc14e5
health: HEALTH_WARN
clock skew detected on mon.b
services:
mon: 3 daemons, quorum a,b,c (age 12m)
mgr: a(active, since 12m)
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 12m), 3 in (since 12m)
data:
volumes: 1/1 healthy
pools: 6 pools, 161 pgs
objects: 36 objects, 3.8 MiB
usage: 38 MiB used, 59 GiB / 59 GiB avail
pgs: 161 active+clean
io:
client: 853 B/s rd, 1 op/s rd, 0 op/s wr
sh-4.4$ ceph osd erasure-code-profile get default
k=2
m=2
plugin=jerasure
technique=reed_sol_van
```

Application pod output:

```console
# pwd
/var/lib/www/html
# ls
lost+found test test1
```

@travisn let me know if I missed anything to verify it's working.
Force-pushed from 18e87b5 to 58e3468.
From the overall ceph status I can't tell if the data is in the EC pool. What about a command such as `ceph df detail`?
On the toolbox pod:

```console
sh-4.4$ ceph df detail
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 117 GiB 117 GiB 93 MiB 93 MiB 0.08
TOTAL 117 GiB 117 GiB 93 MiB 93 MiB 0.08
--- POOLS ---
POOL ID PGS STORED (DATA) (OMAP) OBJECTS USED (DATA) (OMAP) %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
device_health_metrics 1 64 0 B 0 B 0 B 0 0 B 0 B 0 B 0 37 GiB N/A N/A N/A 0 B 0 B
myfs-ec-metadata 2 256 76 KiB 73 KiB 3.2 KiB 25 322 KiB 312 KiB 9.7 KiB 0 37 GiB N/A N/A N/A 0 B 0 B
myfs-ec-data0 3 32 158 B 158 B 0 B 2 12 KiB 12 KiB 0 B 0 37 GiB N/A N/A N/A 0 B 0 B
myfs-ec-data1 4 32 0 B 0 B 0 B 0 0 B 0 B 0 B 0 74 GiB N/A N/A N/A 0 B 0 B
sh-4.4$
```

On the application pod:
```console
ls -alh
total 4.5K
drwxrwxrwx 2 root root 1 Nov 9 09:09 .
drwxr-xr-x 3 root root 4.0K Nov 9 09:02 ..
-rw-r--r-- 1 root root 327 Nov 9 09:07 file1
```

I manually created the file named `file1`.
Force-pushed from 58e3468 to a5543d6.
@travisn should this also be updated in the default example?
I updated that, but I would like to do it in a follow-up PR since this PR has been open for some time and I would like to get it merged. I'll create an issue to track it.
LGTM!
Great to hear the testing is working now!
@@ -14,7 +14,7 @@ parameters:

# Ceph pool into which the volume shall be created
# Required for provisionVolume: "true"
How about adding a comment here also about the required replicated data pool?
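For example, the parameters block could carry that note (a sketch only; `myfs-ec-data1` assumes the pool names from the test output above, where `data1` is the EC pool):

```yaml
parameters:
  # clusterID is the namespace where the Rook cluster is running
  clusterID: rook-ceph
  # CephFS filesystem name into which the volume shall be created
  fsName: myfs-ec
  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  # Note: volumes may target the EC data pool, but the filesystem itself
  # still requires a replicated pool as its default (first) data pool
  pool: myfs-ec-data1
```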
@@ -17,6 +17,8 @@ spec:
    size: 3
  # The list of data pool specs
  dataPools:
  - replicated:
And let's add the comment here; better to document in multiple places so it's more discoverable. For example:
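Something along these lines in the example spec (a sketch matching the `filesystem-ec.yaml` structure; chunk counts are illustrative):

```yaml
dataPools:
  # The first pool in the list is the default data pool and must be replicated;
  # an erasure-coded pool is not supported as the default
  - replicated:
      size: 3
  # The EC pool is added as a secondary data pool that holds the file data
  - erasureCoded:
      dataChunks: 2
      codingChunks: 1
```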
@@ -94,8 +96,7 @@ spec:
  activeStandby: true
```

(These definitions can also be found in the [`filesystem-ec.yaml`](https://github.com/rook/rook/blob/{{ branchName }}/cluster/examples/kubernetes/ceph/filesystem-ec.yaml) file.
Also see an example in the [`storageclass-ec.yaml`](https://github.com/rook/rook/blob/{{ branchName }}/cluster/examples/kubernetes/ceph/csi/cephfs/storageclass-ec.yaml) for how to configure the volume.)

**IMPORTANT**: For erasure coded pools, we have to create a replicated pool as the default data pool and an erasure-coded pool as a secondary pool.
Could we get a confirmation from the cephfs team about the `setfattr` that was suggested before? Is it not needed?
AFAIU, if you want a CephFS dir tree's data to be stored in a specific data pool (an EC pool in this scenario), then you'd need to set the `ceph.dir.layout.pool` attribute as done here: https://docs.ceph.com/en/latest/cephfs/file-layouts/#adding-a-data-pool-to-the-file-system
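A minimal sketch of that flow from the linked docs (pool and directory names are examples):

```console
# Add the EC pool as an additional data pool of the filesystem
ceph fs add_data_pool myfs myfs-ec
# Pin a directory's layout to the EC pool; new files created under it land there
mkdir /mnt/cephfs/ecdir
setfattr -n ceph.dir.layout.pool -v myfs-ec /mnt/cephfs/ecdir
```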
This is what we wanted to confirm: while testing I didn't run any command for file layouts, and we can see here that the data is in the erasure-coded pool.
@subhamkrai, later in the doc 'Documentation/ceph-filesystem-crd.md' I see:
In the block above, you may want to add a sentence explicitly saying that the erasure-coded pool should not be used as the default primary data pool. FYI, I've requested a clarification here: #8210 (comment)
Thanks for the review, I'll update the block.
This pull request has merge conflicts that must be resolved before it can be merged. @subhamkrai please rebase it. https://rook.io/docs/rook/latest/development-flow.html#updating-your-fork
@Omar007 with the current change, in my testing the EC pool is working with CephFS. Can you confirm whether it is working for you? Thanks
If we're talking about the original problem and being able to set it up as documented with only EC pools? No, not at all. This is still just a work-around relying on a replica pool. The core problem is not addressed so that is still not going to work. You should be able to use just EC pools. Either a single pool of the EC type or even any amount; the dir layouts setup should also be possible with all EC pools (e.g. a 6+2 and a 2+1 or even more variations), without the use of a (replica) dummy pool. Neither are currently possible and this MR changes nothing to fix that, nor does it introduce anything new that wasn't already working. As such, at this point there is no reason for me to put any more time into this subject until either the supposed-to-be-working setup is:
With the information I've currently seen, I'm inclined to say the only valid 'final' solution right now would be the first or second bullet.
I was able to get filesystem-ec working with the patches mentioned in https://github.com/rook/rook/pull/8452/files. Please see #9244 (comment) for details.
This is going to interfere a bit with this as well: #9296
I recommend we go ahead and get this merged. The instructions do enable a working EC filesystem, as has been confirmed from multiple sources. Defining the extra replica pool is currently a requirement from cephfs, so it will need separate follow-up on the possibility of removing that requirement. For now, let's at least merge this to fix the docs and example.
@subhamkrai Please do resolve the merge conflicts and review my final comments from last week.
Would you mind adding some validation on CephFS create, so that the first element of dataPools is not erasureCoded? This would be helpful for runtime debugging without inspecting the docs.
Could we not? Or at least only log it, because blocking it completely would prevent manually intervening to enable a should-be-working use case. At least now I can still implement it as documented by manually intervening, but if the operator puts a restriction here, even that is no longer possible :/
Logging a warning sounds fine so you can keep the manual workaround.
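In the meantime, one way to self-check a cluster (hypothetical names: a CephFilesystem `myfs-ec` in the `rook-ceph` namespace, with the CR shape shown earlier):

```console
# Print the first entry of dataPools; it should be a replicated spec,
# not an erasureCoded one
kubectl -n rook-ceph get cephfilesystem myfs-ec -o jsonpath='{.spec.dataPools[0]}'
```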
Force-pushed from 1dde701 to 4d4e76f.
When creating an EC filesystem, create the replicated pool as the primary pool and the EC pool as the secondary pool; creating the EC pool as primary is not encouraged and will lead to failure. Also, change the pool name in the storageclass-ec file. Closes: rook#8210 Signed-off-by: subhamkrai <srai@redhat.com>
Force-pushed from 4d4e76f to 5bb29f1.
rebased!
Seems fine, with the note that we will have to do a little work in #9296 to consider how to verify the same thing. And maybe longer term we can find a more elegant solution.
With the latest discussion in #9296, I don't see that the EC filesystem would be impacted, other than that you could choose to name the pools. The ordering of the pools in the CR would still be used.
mds: create EC pool as secondary pool (backport #8452)
When creating an EC filesystem, create the replicated pool as the primary pool and the EC pool as the secondary pool; creating the EC pool as primary is not encouraged and will lead to failure.

Closes: #8210
Signed-off-by: subhamkrai <srai@redhat.com>
Description of your changes:
Which issue is resolved by this Pull Request:
Resolves #8210
Checklist:
Code generation (`make codegen`) has been run to update object specifications, if necessary.