bluefs _allocate unable to allocate 0x90000 on bdev 1 #9885

Closed
osaffer opened this issue Mar 10, 2022 · 10 comments

osaffer commented Mar 10, 2022

ceph-version: 16.2.6-0
rook-version: v1.7.4

Hi,

One morning all of my OSD pods had crashed, except one per node.
When I check the OSD logs I can see:
debug -7> 2022-03-10T09:59:05.158+0000 7ff21a9db080 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1646906345165107, "job": 1, "event": "recovery_started", "log_files": [17870]}
debug -6> 2022-03-10T09:59:05.158+0000 7ff21a9db080 4 rocksdb: [db_impl/db_impl_open.cc:760] Recovering log #17870 mode 2
debug -5> 2022-03-10T09:59:05.430+0000 7ff21a9db080 3 rocksdb: [le/block_based/filter_policy.cc:584] Using legacy Bloom filter with high (20) bits/key. Dramatic filter space and/or accuracy improvement is available with format_version>=5.
debug -4> 2022-03-10T09:59:05.434+0000 7ff21a9db080 1 bluefs _allocate unable to allocate 0x90000 on bdev 1, allocator name block, allocator type hybrid, capacity 0x4ffc00000, block size 0x10000, free 0x0, fragmentation 0, allocated 0x0
debug -3> 2022-03-10T09:59:05.434+0000 7ff21a9db080 -1 bluefs _allocate allocation failed, needed 0x80cbb
debug -2> 2022-03-10T09:59:05.434+0000 7ff21a9db080 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x80cbb
debug -1> 2022-03-10T09:59:05.442+0000 7ff21a9db080 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7ff21a9db080 time 2022-03-10T09:59:05.440116+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/os/bluestore/BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")

After some research, I noticed that some people have hit more or less the same issue.
They mention a workaround:
[osd]
bluestore_allocator = bitmap

Can you tell me where I can set this parameter?

I have also added some new disks on each node.

Thank you very much
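
For readers hitting the same trace: the key part of the log is "free 0x0", i.e. BlueFS could not find any free space to allocate on the main device. A minimal sketch for checking what BlueFS sees on a stopped OSD, assuming the data path is /var/lib/ceph/osd/ceph-0 (adjust the id and, in Rook, run it inside the OSD pod):

# Print the device sizes as understood by BlueFS (run only while the OSD is stopped).
ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-0

# Check the size of the backing logical volume on the host.
lvs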

osaffer added the bug label Mar 10, 2022

osaffer commented Mar 10, 2022

OK, I found how to change the allocator type, but it did not solve the problem:
ceph config set osd.0 bluestore_allocator bitmap
So I put hybrid back.


travisn commented Mar 10, 2022

Yes, that command in the toolbox should work. Another way to set it is in the ceph.conf overrides.
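
For reference, a sketch of both options, assuming the Rook defaults (the rook-ceph namespace, a toolbox deployment named rook-ceph-tools, and the rook-config-override ConfigMap); adjust the names if your install differs:

# Option 1: run the command from the toolbox.
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph config set osd bluestore_allocator bitmap

# Option 2: add it to the ceph.conf overrides, then restart the OSD pods so they pick it up.
kubectl -n rook-ceph edit configmap rook-config-override
# and under data.config add:
#   [osd]
#   bluestore_allocator = bitmap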


osaffer commented Mar 11, 2022

Hi,

It has been fixed.

  1. Extend the underlying disk
  2. Resize the physical volume: pvresize /dev/sdX
  3. Run pvdisplay -m to find the logical volume associated with that disk
  4. lvextend -L +10G /dev/mapper/ceph--eda319c5--cce0--4a33--90d1--9ecf950676f5-osd--data--08884bac--6bac--4ae5--8a28--cdc43de1b85e

When done, put a sleep inside the pod:
Edit the OSD deployment and change the ceph-osd command to

command:

  • sh
  • -c
  • sleep 10000

Then:

  5. Open a session inside the OSD pod
  6. ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-X
  7. Check the result with parted
  8. Edit the OSD deployment again and put back ceph-osd

Then the OSD should be green again (a consolidated sketch of these commands is below).
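
Putting the steps above together, a consolidated sketch; osd id 0, the rook-ceph namespace, the Rook default deployment names (rook-ceph-osd-<id>, rook-ceph-operator) and /dev/sdX are placeholders to adapt:

# 0. The Rook operator may revert manual edits to the OSD deployment,
#    so scaling it down first is a common precaution.
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0

# 1. Grow the LVM stack after extending the disk.
pvresize /dev/sdX
pvdisplay -m                                   # find the ceph LV sitting on this PV
lvextend -L +10G /dev/mapper/<ceph-osd-lv>

# 2. Edit the OSD deployment and replace the ceph-osd command with: sh -c "sleep 10000"
kubectl -n rook-ceph edit deployment rook-ceph-osd-0

# 3. From inside the (now idle) OSD pod, let BlueFS claim the new space.
kubectl -n rook-ceph exec -it deploy/rook-ceph-osd-0 -- ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0

# 4. Edit the deployment again to restore the ceph-osd command, then scale the operator back up.
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1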

BlaineEXE (Member) commented

Closing this since it seems resolved.


harrykas commented Sep 21, 2022

@osaffer I've encountered the same issue and your plan works, thank you very much!
Just want to ask: what is the exact reason for this, and how can it be prevented from happening again?


osaffer commented Sep 21, 2022

Some OSDs were full, if I remember correctly... I would say: monitor your OSDs :D
I have enabled Prometheus and I monitor with Grafana.
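
For reference, a quick way to keep an eye on OSD fullness from the toolbox (same rook-ceph-tools assumption as above); Prometheus/Grafana dashboards typically chart the same numbers:

# Per-OSD utilization (%USE column) laid out over the CRUSH tree.
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd df tree

# Cluster-wide usage, and nearfull/full warnings if any.
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail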

harrykas commented

@osaffer hehe, you're right, that Ceph instance was full. It is for development and another team supports it, so I have no monitoring for it. Yet :)
Thank you!


osaffer commented Sep 22, 2022

@osaffer hehe, you're right, that ceph instance was full. It is for development and another team supports it so I have no monitoring for it. Yet :) Thank you!

You are welcome... my environment is also a development one, so I had not configured any monitoring.
But thanks to these problems, we gain good knowledge :D


thenamehasbeentake commented Oct 13, 2023

We encountered the same problem (see https://tracker.ceph.com/issues/53466). Adding bluefs_shared_alloc_size = 4096 to our ceph.conf for the OSDs lets the OSDs be restored temporarily.
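
In a Rook cluster this temporary workaround can go through the same ceph.conf override mechanism as above (a sketch assuming the default names; the affected OSDs must be restarted to pick it up):

kubectl -n rook-ceph edit configmap rook-config-override
# under data.config add:
#   [osd]
#   bluefs_shared_alloc_size = 4096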


microyahoo commented Nov 3, 2023

The issue is fixed by ceph/ceph#48854; related trackers: https://tracker.ceph.com/issues/53899 and https://tracker.ceph.com/issues/53466
