BUG: FSAL's state_free function doesn't actually free the state - was Question: Is the memory occupied by ceph_alloc_state normal? #1028

Open
Haroldll opened this issue Nov 9, 2023 · 9 comments

Comments

@Haroldll

Haroldll commented Nov 9, 2023

I am using Ganesha 5.5.2 with Ceph as the backend.
Entries_HWMark is 500000 and Chunks_HWMark is 0.
When I use vdbench to continuously write files to the NFS export, Ganesha's memory usage keeps rising while the number of cached inodes stays at 500000.
To find out which allocations were growing, I profiled the process with massif and found that the growth comes from ceph_alloc_state.
See:

> --------------------------------------------------------------------------------
>   n       time(ms)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
> --------------------------------------------------------------------------------
> 684     34,438,318    3,416,733,008    3,269,519,223   147,213,785            0
> 685     34,466,111    3,416,774,864    3,269,541,081   147,233,783            0
> 95.69% (3,269,541,081B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
> ->34.00% (1,161,613,056B) 0x1F409659: ceph_alloc_state (handle.c:1045)
> | ->34.00% (1,161,613,056B) 0x5580641: mdcache_alloc_state (mdcache_export.c:888)
> |   ->34.00% (1,161,613,056B) 0x5536867: open4_ex (nfs4_op_open.c:966)
> |     ->34.00% (1,161,613,056B) 0x5537D91: nfs4_op_open (nfs4_op_open.c:1404)
> |       ->34.00% (1,161,613,056B) 0x551FE20: process_one_op (nfs4_Compound.c:918)
> |         ->34.00% (1,161,613,056B) 0x55210F2: nfs4_Compound (nfs4_Compound.c:1382)
> |           ->34.00% (1,161,613,056B) 0x54638E3: nfs_rpc_process_request (nfs_worker_thread.c:1517)
> |             ->34.00% (1,161,613,056B) 0x5463E76: nfs_rpc_valid_NFS (nfs_worker_thread.c:1732)
> |               ->34.00% (1,161,613,056B) 0x5A2D6BA: ??? (in /usr/lib64/libntirpc.so.5.0)
> |                 ->34.00% (1,161,613,056B) 0x5A28BD8: ??? (in /usr/lib64/libntirpc.so.5.0)
> |                   ->34.00% (1,161,613,056B) 0x5A2D5BF: ??? (in /usr/lib64/libntirpc.so.5.0)
> |                     ->34.00% (1,161,613,056B) 0x5A28B58: ??? (in /usr/lib64/libntirpc.so.5.0)
> |                       ->31.47% (1,075,145,856B) 0x5A296EC: ??? (in /usr/lib64/libntirpc.so.5.0)
> |                       | ->31.47% (1,075,145,856B) 0x5A34BB3: ??? (in /usr/lib64/libntirpc.so.5.0)
> |                       |   ->31.47% (1,075,145,856B) 0x8E082FE: start_thread (in /usr/lib64/libpthread-2.28.so)
> |                       |     ->31.47% (1,075,145,856B) 0x9A39DD2: clone (in /usr/lib64/libc-2.28.so)
> |                       |
> |                       ->02.53% (86,467,200B) 0x5A34BB3: ??? (in /usr/lib64/libntirpc.so.5.0)
> |                         ->02.53% (86,467,200B) 0x8E082FE: start_thread (in /usr/lib64/libpthread-2.28.so)
> |                           ->02.53% (86,467,200B) 0x9A39DD2: clone (in /usr/lib64/libc-2.28.so)

This keeps growing...

->16.98% (580,000,000B) 0x55958A3: alloc_cache_entry (mdcache_lru.c:1719)
| ->16.98% (580,000,000B) 0x5595983: mdcache_lru_get (mdcache_lru.c:1766)
|   ->16.98% (580,000,000B) 0x558357E: mdcache_alloc_handle (mdcache_helpers.c:178)
|     ->16.98% (580,000,000B) 0x5585667: mdcache_new_entry (mdcache_helpers.c:728)

The cache entry allocation remains unchanged.

I see that ceph_alloc_state allocates memory, but ceph_free_state does not release it. Is this by design? Is this memory usage expected?

@dang
Contributor

dang commented Nov 9, 2023

5.6 and 5.7 have fixes for memory usage in libcephfs. Please try 5.7 and see if that fixes your problem.

@ffilz
Member

ffilz commented Nov 9, 2023

This looks like something different from what we've addressed. I'll have a look at it.

ffilz added the Analyzing label Nov 9, 2023
ffilz self-assigned this Nov 9, 2023
@Haroldll
Author

I tried version 5.7 and still have this problem.

In addition, the ceph_alloc_state/ceph_free_state functions seem to be called only for NFSv4. The problem does not occur with NFSv3.

	state = init_state(gsh_calloc(1, sizeof(struct ceph_state_fd)),
			   ceph_free_state, state_type, related_state);

I don’t see where the memory allocated here is released.

@ffilz
Member

ffilz commented Nov 10, 2023

Yea, I think it actually affects all FSALs, or at least all FSALs probably have the same bug. Will try to address this today.

ffilz changed the title from "Question: Is the memory occupied by ceph_alloc_state normal?" to "BUG: FSAL's state_free function doesn't actually free the state - was Question: Is the memory occupied by ceph_alloc_state normal?" Nov 10, 2023
ffilz added the bug label and removed the Analyzing label Nov 10, 2023
@ffilz
Member

ffilz commented Nov 10, 2023

This does impact all FSALs that implement state_free...
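
The leak pattern discussed in this thread can be modeled in a few lines of C. This is only a simplified sketch under stated assumptions, not Ganesha's code and not the actual patch: `demo_state`, `demo_alloc_state`, the two `demo_free_state_*` callbacks, `demo_open_close_cycles`, and the `live_states` counter are all hypothetical names invented for illustration. It shows a state object that carries its own free callback, and what happens when that callback cleans up the contents but never releases the allocation itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a per-open state object. */
struct demo_state {
	void (*state_free)(struct demo_state *state); /* release callback */
	int fd_open;                                  /* embedded resource */
};

/* Number of state objects allocated and not yet freed. */
static long live_states;

/* Leaky variant: tears down the contents but never frees the struct. */
static void demo_free_state_leaky(struct demo_state *state)
{
	assert(state != NULL);
	state->fd_open = 0;
}

/* Fixed variant: tears down the contents and releases the allocation. */
static void demo_free_state_fixed(struct demo_state *state)
{
	assert(state != NULL);
	state->fd_open = 0;
	free(state);
	live_states--;
}

/* Allocate a state and record which callback will release it. */
static struct demo_state *
demo_alloc_state(void (*free_fn)(struct demo_state *))
{
	struct demo_state *state = calloc(1, sizeof(*state));

	if (state == NULL)
		abort();
	state->state_free = free_fn;
	state->fd_open = 1;
	live_states++;
	return state;
}

/* Simulate n open/close cycles; return how many states remain allocated. */
static long demo_open_close_cycles(int n,
				   void (*free_fn)(struct demo_state *))
{
	live_states = 0;
	for (int i = 0; i < n; i++) {
		struct demo_state *state = demo_alloc_state(free_fn);

		state->state_free(state);
	}
	return live_states;
}
```

With the leaky callback, every simulated open leaves one allocation behind, so the count grows linearly with the number of opens, which matches the steadily rising heap usage in the massif output. With the fixed callback the count returns to zero.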

@Haroldll
Author

I tried the new patch and the bug seems to be fixed; Ganesha's memory usage is stable.

@ffilz
Member

ffilz commented Nov 13, 2023

> I tried the new patch and the bug seems to be fixed; Ganesha's memory usage is stable.

Thanks for the verification.

@rhuitl

rhuitl commented Dec 9, 2023

I think I'm facing the same problem! I see that the fix is scheduled for version 6. Do you already have an idea when it's going to be released? It has a nice list of fixes already, so soon would be highly appreciated :-) Or another bugfix release maybe?

@ffilz
Member

ffilz commented Dec 11, 2023

This needs a backport to V5-stable


4 participants