
default install rookctl returns mount: wrong fs type, bad option, bad superblock #1220

Closed
page-fault-in-nonpaged-area opened this issue Nov 9, 2017 · 10 comments

Comments

@page-fault-in-nonpaged-area

Hello all,

I followed the instructions here and here on a brand new K8s 1.8 cluster, and when testing the cluster, rookctl gave me the following while attempting to test the mount as per the Shared Filesystem guide:

root@rook-tools:/# rookctl filesystem mount --name myfs --path /tmp/registry
2017-11-09 22:04:40.933608 I | mount 10.96.36.154:6790,10.105.4.115:6790,10.110.163.232:6790:/: mount: wrong fs type, bad option, bad superblock on 10.96.36.154:6790,10.105.4.115:6790,10.110.163.232:6790:/,
2017-11-09 22:04:40.933789 I | mount 10.96.36.154:6790,10.105.4.115:6790,10.110.163.232:6790:/:        missing codepage or helper program, or other error
2017-11-09 22:04:40.933848 I | mount 10.96.36.154:6790,10.105.4.115:6790,10.110.163.232:6790:/: 
2017-11-09 22:04:40.933894 I | mount 10.96.36.154:6790,10.105.4.115:6790,10.110.163.232:6790:/:        In some cases useful info is found in syslog - try
2017-11-09 22:04:40.933950 I | mount 10.96.36.154:6790,10.105.4.115:6790,10.110.163.232:6790:/:        dmesg | tail or so.
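
As the error text suggests, the actual reason for a failed mount normally shows up in the kernel log, and since the mount is attempted from inside the rook-tools container, that log lives on the node hosting the pod rather than in the pod itself. A minimal check, with <node> standing in for whichever host is running rook-tools:

ssh <node> 'dmesg | tail -n 20'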

Upon inspecting the ceph status:

root@rook-tools:/# ceph status                                                                                                                                                                                                    
  cluster:
    id:     40a50b5c-ba77-44e4-9697-77c0a79dd0c6
    health: HEALTH_WARN
            Reduced data availability: 100 pgs inactive, 100 pgs incomplete
            Degraded data redundancy: 42/63 objects degraded (66.667%), 200 pgs unclean, 100 pgs degraded, 100 pgs undersized
            too many PGs per OSD (400 > max 300)
 
  services:
    mon: 3 daemons, quorum rook-ceph-mon0,rook-ceph-mon2,rook-ceph-mon1
    mgr: rook-ceph-mgr0(active), standbys: rook-ceph-mgr1
    mds: myfs-1/1/1 up  {0=mj4qft=up:active}, 1 up:standby-replay
    osd: 1 osds: 1 up, 1 in
 
  data:
    pools:   2 pools, 200 pgs
    objects: 21 objects, 2246 bytes
    usage:   2050 MB used, 16783 MB / 18833 MB avail
    pgs:     50.000% pgs not active
             42/63 objects degraded (66.667%)
             100 active+undersized+degraded
             100 creating+incomplete
 
  io:
    client:   1277 B/s rd, 2 op/s rd, 0 op/s wr

All pods are running fine. The myfs volume mounts fine inside pods, but writing anything to it hangs.

root@rook-tools:/# ceph status                                                                                                       
  cluster:
    id:     40a50b5c-ba77-44e4-9697-77c0a79dd0c6
    health: HEALTH_WARN
            Reduced data availability: 100 pgs inactive, 100 pgs incomplete
            Degraded data redundancy: 42/63 objects degraded (66.667%), 200 pgs unclean, 100 pgs degraded, 100 pgs undersized
            2 slow requests are blocked > 32 sec
            too many PGs per OSD (400 > max 300)
 
  services:
    mon: 3 daemons, quorum rook-ceph-mon0,rook-ceph-mon2,rook-ceph-mon1
    mgr: rook-ceph-mgr0(active), standbys: rook-ceph-mgr1
    mds: myfs-1/1/1 up  {0=m6km28=up:active}, 1 up:standby-replay
    osd: 1 osds: 1 up, 1 in
 
  data:
    pools:   2 pools, 200 pgs
    objects: 21 objects, 4254 bytes
    usage:   2051 MB used, 16782 MB / 18833 MB avail
    pgs:     50.000% pgs not active
             42/63 objects degraded (66.667%)
             100 active+undersized+degraded
             100 creating+incomplete
 
  io:
    client:   851 B/s rd, 1 op/s rd, 0 op/s wr
 

After it hangs, the status shows I have slow requests? This is what rookctl gives me:

root@rook-tools:/# rookctl status
OVERALL STATUS: WARNING

SUMMARY:
SEVERITY   NAME              MESSAGE
WARNING    PG_AVAILABILITY   Reduced data availability: 100 pgs inactive, 100 pgs incomplete
WARNING    PG_DEGRADED       Degraded data redundancy: 42/63 objects degraded (66.667%), 200 pgs unclean, 100 pgs degraded, 100 pgs undersized
WARNING    REQUEST_SLOW      2 slow requests are blocked > 32 sec
WARNING    TOO_MANY_PGS      too many PGs per OSD (400 > max 300)

USAGE:
TOTAL       USED       DATA       AVAILABLE
18.39 GiB   2.00 GiB   5.68 KiB   16.39 GiB

MONITORS:
NAME             ADDRESS                 IN QUORUM   STATUS
rook-ceph-mon0   10.96.36.154:6790/0     true        OK
rook-ceph-mon2   10.105.4.115:6790/0     true        OK
rook-ceph-mon1   10.110.163.232:6790/0   true        OK

MGRs:
NAME             STATUS
rook-ceph-mgr0   Active
rook-ceph-mgr1   Standby

OSDs:
TOTAL     UP        IN        FULL      NEAR FULL
1         1         1         false     false

PLACEMENT GROUPS (200 total):
STATE                        COUNT
active+undersized+degraded   100
creating+incomplete          100

Any ideas?

@travisn
Member

travisn commented Nov 9, 2017

The hangs are likely happening because the pools in the file system were created with too much redundancy. You only have one OSD in the cluster, which means you can only have one replica, and you can't use erasure coding.

That means that the file system spec would need to look like this:

  metadataPool:
    replicated:
      size: 1
  dataPools:
    - replicated:
        size: 1
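
A quick way to confirm the pools really ended up with a single replica, run from the rook-tools pod (a suggested check, not part of the original reply):

ceph osd pool ls detail

Each pool should show replicated size 1 and min_size 1; an erasure-coded profile or a larger size on a one-OSD cluster is consistent with the undersized and incomplete PGs in the status above.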

@page-fault-in-nonpaged-area
Author

Hmm... doesn't seem to be working.

If I leave the filesystem spec as is, without the mds section, rookctl complains that there are no API endpoints even though the API endpoints are up and running just fine.

If I leave the filesystem spec like this:

demonfuse@Williams-MacBook-Pro ~/K/r/r/c/e/kubernetes> cat rook-filesystem.yaml 
apiVersion: rook.io/v1alpha1
kind: Filesystem
metadata:
  name: myfs
  namespace: rook
spec:
  # The metadata pool spec
  metadataPool:
    replicated:
      size: 1
  # The list of data pool specs
  dataPools:
    - replicated:
        size: 1
  # The metadata service (mds) configuration
  metadataServer:
    activeCount: 1
    activeStandby: true

rookctl gives me exactly the same error

root@rook-tools:/# rookctl filesystem mount --name myfs --path /tmp/registry
2017-11-09 23:36:51.793055 I | mount 10.99.86.50:6790,10.100.78.40:6790,10.102.153.23:6790:/: mount: wrong fs type, bad option, bad superblock on 10.99.86.50:6790,10.100.78.40:6790,10.102.153.23:6790:/,
2017-11-09 23:36:51.795005 I | mount 10.99.86.50:6790,10.100.78.40:6790,10.102.153.23:6790:/:        missing codepage or helper program, or other error
2017-11-09 23:36:51.795152 I | mount 10.99.86.50:6790,10.100.78.40:6790,10.102.153.23:6790:/: 
2017-11-09 23:36:51.795243 I | mount 10.99.86.50:6790,10.100.78.40:6790,10.102.153.23:6790:/:        In some cases useful info is found in syslog - try
2017-11-09 23:36:51.795329 I | mount 10.99.86.50:6790,10.100.78.40:6790,10.102.153.23:6790:/:        dmesg | tail or so.
command mount 10.99.86.50:6790,10.100.78.40:6790,10.102.153.23:6790:/ failed: Failed to complete mount 10.99.86.50:6790,10.100.78.40:6790,10.102.153.23:6790:/: exit status 32

@kokhang
Member

kokhang commented Nov 9, 2017

@Ascendance what kernel version is the node using?

@page-fault-in-nonpaged-area
Author

ubuntu@slave-1:/var/lib$ uname -a
Linux slave-1 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

4.4.0
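
Given the mount error above, one thing worth confirming on the node is that the kernel's CephFS client is actually available; "wrong fs type, bad option, bad superblock" is mount's generic message and can also mean the ceph filesystem type cannot be used on that host. A minimal check on the node itself (not inside the container):

sudo modprobe ceph
lsmod | grep ceph
dmesg | tail -n 20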

@page-fault-in-nonpaged-area
Author

I kept tearing down and creating new Rook clusters; here's how it went so far:

demonfuse@Williams-MacBook-Pro ~/K/r/r/c/e/kubernetes> kubectl exec -it rook-tools -n rook -- bash 
root@rook-tools:/# rookctl status
OVERALL STATUS: OK

USAGE:
TOTAL       USED       DATA       AVAILABLE
18.39 GiB   2.00 GiB   2.19 KiB   16.39 GiB

MONITORS:
NAME             ADDRESS                 IN QUORUM   STATUS
rook-ceph-mon2   10.96.220.234:6790/0    true        OK
rook-ceph-mon0   10.99.180.182:6790/0    true        OK
rook-ceph-mon1   10.104.151.240:6790/0   true        OK

MGRs:
NAME             STATUS
rook-ceph-mgr0   Active
rook-ceph-mgr1   Standby

OSDs:
TOTAL     UP        IN        FULL      NEAR FULL
1         1         1         false     false

PLACEMENT GROUPS (200 total):
STATE          COUNT
active+clean   200
root@rook-tools:/# mkdir /tmp/registry
root@rook-tools:/# rookctl filesystem mount --name myfs --path /tmp/registry
2017-11-10 00:32:18.978671 I | mount 10.96.220.234:6790,10.99.180.182:6790,10.104.151.240:6790:/: mount: wrong fs type, bad option, bad superblock on 10.96.220.234:6790,10.99.180.182:6790,10.104.151.240:6790:/,
2017-11-10 00:32:18.979499 I | mount 10.96.220.234:6790,10.99.180.182:6790,10.104.151.240:6790:/:        missing codepage or helper program, or other error
2017-11-10 00:32:18.979613 I | mount 10.96.220.234:6790,10.99.180.182:6790,10.104.151.240:6790:/: 
2017-11-10 00:32:18.979671 I | mount 10.96.220.234:6790,10.99.180.182:6790,10.104.151.240:6790:/:        In some cases useful info is found in syslog - try
2017-11-10 00:32:18.979703 I | mount 10.96.220.234:6790,10.99.180.182:6790,10.104.151.240:6790:/:        dmesg | tail or so.
command mount 10.96.220.234:6790,10.99.180.182:6790,10.104.151.240:6790:/ failed: Failed to complete mount 10.96.220.234:6790,10.99.180.182:6790,10.104.151.240:6790:/: exit status 32
root@rook-tools:/# 

with this as the filesystem spec

apiVersion: rook.io/v1alpha1
kind: Filesystem
metadata:
  name: myfs
  namespace: rook
spec:
  metadataPool:
    replicated:
      size: 1
  dataPools:
    - replicated:
        size: 1
  metadataServer:
    activeCount: 1
    activeStandby: true

Any ideas what went wrong?

@page-fault-in-nonpaged-area
Author

:D

I figured this might be an issue related to #1044. I bashed into my other nodes with the volume mounted and it works fine!
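
For anyone checking the same thing, this is roughly what that verification looks like; the namespace, pod name, and mount path are placeholders for whatever pod on another node has the myfs volume mounted:

kubectl -n <namespace> exec -it <pod-on-other-node> -- df -h <mount-path>
kubectl -n <namespace> exec -it <pod-on-other-node> -- sh -c 'echo probe > <mount-path>/probe && cat <mount-path>/probe'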

@travisn Quick question: the official guide assumes that the K8s cluster has more than one worker node, correct? My cluster currently has only 1 master and 1 worker. If I scale up, will the filesystem specs in the guide work just fine?

@travisn
Member

travisn commented Nov 10, 2017

Yes, the documentation and samples do actually require 3 nodes to work. We should likely change the samples to work on a single node by default.
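
For reference, once the cluster does have three or more OSDs, the same spec can be scaled back toward the redundancy the samples assume; the sizes below are illustrative rather than the guide's exact values:

  metadataPool:
    replicated:
      size: 3
  dataPools:
    - replicated:
        size: 3
  metadataServer:
    activeCount: 1
    activeStandby: true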

@kokhang
Member

kokhang commented Nov 10, 2017

@Ascendance I think you might be running into #1200.

Are you using rookctl or a Flexvolume mount in the pod?

@kokhang
Member

kokhang commented Nov 10, 2017

If you use the Flexvolume to mount, it should work. Are you running 0.6.0?
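
For comparison, a rough sketch of a Flexvolume-based mount in a pod spec; the driver string and the option keys below are assumptions for this Rook release rather than verified values, so check the shared filesystem docs for 0.6 for the exact form:

apiVersion: v1
kind: Pod
metadata:
  name: myfs-test
spec:
  containers:
    - name: test
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: myfs-vol
          mountPath: /tmp/registry
  volumes:
    - name: myfs-vol
      flexVolume:
        driver: rook.io/rook        # assumed driver name for this release
        fsType: ceph
        options:
          fsName: myfs              # assumed option key
          clusterName: rook         # assumed option key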

@page-fault-in-nonpaged-area
Author

I'm using rookctl :) it's working now.
