
File descriptors and internal connection leak #15022

Closed
lihaohua66 opened this issue Jun 1, 2022 · 14 comments

lihaohua66 commented Jun 1, 2022


After upgrading to RELEASE.2022-05-19T18-20-59Z, we encountered the following error in both the UI and clients after running the cluster for a few days (the cluster has a very low workload):

 A timeout exceeded while waiting to proceed with the request, please reduce your request rate.

We could observe the file descriptor count steadily increasing over the last few days:
(screenshot: file descriptor count graph)

We could also see thousands of file descriptor entries like the following:

minio   20109 minio  178u     sock                0,9      0t0   513460923 protocol: TCPv6
minio   20109 minio  179u     sock                0,9      0t0   513255278 protocol: TCPv6
minio   20109 minio  180u     sock                0,9      0t0   513603050 protocol: TCPv6
minio   20109 minio  181u     sock                0,9      0t0   513340630 protocol: TCPv6
minio   20109 minio  182u     sock                0,9      0t0   513542780 protocol: TCPv6
minio   20109 minio  183u     sock                0,9      0t0   513382558 protocol: TCPv6
minio   20109 minio  184u     sock                0,9      0t0   513726529 protocol: TCPv6
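
A rough way to track the growth over time (a sketch; 20109 is the minio PID from the listing above):

# count the open file descriptors of the minio process once a minute
watch -n 60 'ls /proc/20109/fd | wc -l'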

Expected Behavior

The cluster should recycle file descriptors/connections and should not stop working.

Current Behavior

The cluster could not handle client requests; performance is degraded.

Possible Solution

Steps to Reproduce (for bugs)

  1. Run MinIO bare metal with RELEASE.2022-05-19T18-20-59Z for a few days.
  2. Check the file descriptors on each server.

Context

Regression

Your Environment

  • Version used (minio --version): RELEASE.2022-05-19T18-20-59Z
  • Server setup and configuration: 6 servers with Minio Baremetal
  • Operating System and version (uname -a): Linux 5.4.0-54-generic #60~18.04.1-Ubuntu SMP Fri Nov 6 17:25:16 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
@harshavardhana (Member)

@lihaohua66 this doesn't make sense - you should provide us some profiling data for the server.

There should be no leak at all unless you are making some calls.

Please run mc support profile and attach it here.
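
For example (a sketch; the alias name myminio is an assumption, and exact flags vary between mc releases):

# capture server-side profiles into profile.zip
mc support profile myminio

# on older mc releases the equivalent is:
mc admin profile start --type cpu,mem,goroutines myminio
sleep 30
mc admin profile stop myminio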

@lihaohua66 (Author)

@harshavardhana
The profile is attached: profile.zip
Currently, lsof shows about 300 entries like minio 20109 minio 484u sock 0,9 0t0 513646721 protocol: TCPv6.

lsof -p 20109 | grep TCPv6 | wc -l
335

By the way, do you know what this kind of open file means?

@lihaohua66 (Author)

We restarted the whole cluster today; you can see the file descriptor count keeps increasing.
(screenshot: file descriptor count after restart)

@harshavardhana (Member)

@harshavardhana The profile is attached profile.zip Currently, it has about 300 entries like minio 20109 minio 484u sock 0,9 0t0 513646721 protocol: TCPv6.

lsof -p 20109 | grep TCPv6 | wc -l
335

By the way, do you know what this kind of open file means?

I have had the same working cluster for the last 6 days and nothing is going on - are you running replication on this setup, or is it a single setup?

Are you using the operator? @lihaohua66

@harshavardhana (Member)

There is only one node in the profile - is this a single-disk setup? @lihaohua66

@harshavardhana (Member)

It looks like you have client API calls hitting the server:

goroutine profile: total 1191
331 @ 0x439b96 0x4057c5 0x40537d 0x1e157b0 0x1e1a827 0x1e19a7b 0x46b8c1
#       0x1e157af       github.com/minio/minio/cmd.mergeEntryChannels+0x68f                     github.com/minio/minio/cmd/metacache-entries.go:689
#       0x1e1a826       github.com/minio/minio/cmd.(*erasureServerPools).listMerged+0x4e6       github.com/minio/minio/cmd/metacache-server-pool.go:303
#       0x1e19a7a       github.com/minio/minio/cmd.(*erasureServerPools).listPath.func3+0xda    github.com/minio/minio/cmd/metacache-server-pool.go:236

331 @ 0x439b96 0x44accc 0x44aca6 0x4675a5 0x477691 0x1e194e9 0x1d75bd6 0x1d7465c 0x1c4c73a 0x1dd29f3 0x1dd29ee 0x6ce22f 0x794887 0x1dd0759 0x1dd0734 0x1dd2eb2 0x1dd2e99 0x6ce22f 0x1dcb0d8 0x6ce22f 0x1dcbc42 0x6ce22f 0x1dcac6a 0x6ce22f 0x1dc9d0e 0x6ce22f 0x1dc90af 0x6ce22f 0x1cbce8f 0x6ce22f 0x1dc93c3 0x6ce22f
#       0x4675a4        sync.runtime_Semacquire+0x24                                            runtime/sema.go:56
#       0x477690        sync.(*WaitGroup).Wait+0x70                                             sync/waitgroup.go:130
#       0x1e194e8       github.com/minio/minio/cmd.(*erasureServerPools).listPath+0x1428        github.com/minio/minio/cmd/metacache-server-pool.go:242
#       0x1d75bd5       github.com/minio/minio/cmd.(*erasureServerPools).ListObjects+0x795      github.com/minio/minio/cmd/erasure-server-pool.go:1213
#       0x1d7465b       github.com/minio/minio/cmd.(*erasureServerPools).ListObjectsV2+0xdb     github.com/minio/minio/cmd/erasure-server-pool.go:1093
#       0x1c4c739       github.com/minio/minio/cmd.objectAPIHandlers.ListObjectsV2Handler+0x679 github.com/minio/minio/cmd/bucket-listobjects-handlers.go:241
#       0x1dd29f2       net/http.HandlerFunc.ServeHTTP+0x52                                     net/http/server.go:2047
#       0x1dd29ed       github.com/minio/minio/cmd.httpTraceAll.func1+0x4d                      github.com/minio/minio/cmd/handler-utils.go:364
#       0x6ce22e        net/http.HandlerFunc.ServeHTTP+0x2e                                     net/http/server.go:2047
#       0x794886        github.com/klauspost/compress/gzhttp.NewWrapper.func1.1+0x346           github.com/klauspost/compress@v1.15.1/gzhttp/compress.go:390
#       0x1dd0758       net/http.HandlerFunc.ServeHTTP+0x418                                    net/http/server.go:2047
#       0x1dd0733       github.com/minio/minio/cmd.maxClients.func1+0x3f3                       github.com/minio/minio/cmd/handler-api.go:273
#       0x1dd2eb1       net/http.HandlerFunc.ServeHTTP+0x111                                    net/http/server.go:2047
#       0x1dd2e98       github.com/minio/minio/cmd.collectAPIStats.func1+0xf8                   github.com/minio/minio/cmd/handler-utils.go:391
#       0x6ce22e        net/http.HandlerFunc.ServeHTTP+0x2e                                     net/http/server.go:2047
#       0x1dcb0d7       github.com/minio/minio/cmd.setBucketForwardingHandler.func1+0x217       github.com/minio/minio/cmd/generic-handlers.go:402
#       0x6ce22e        net/http.HandlerFunc.ServeHTTP+0x2e                                     net/http/server.go:2047
#       0x1dcbc41       github.com/minio/minio/cmd.addCustomHeaders.func1+0x3c1                 github.com/minio/minio/cmd/generic-handlers.go:477
#       0x6ce22e        net/http.HandlerFunc.ServeHTTP+0x2e                                     net/http/server.go:2047
#       0x1dcac69       github.com/minio/minio/cmd.setRequestValidityHandler.func1+0xb89        github.com/minio/minio/cmd/generic-handlers.go:390
#       0x6ce22e        net/http.HandlerFunc.ServeHTTP+0x2e                                     net/http/server.go:2047
#       0x1dc9d0d       github.com/minio/minio/cmd.setHTTPStatsHandler.func1+0x10d              github.com/minio/minio/cmd/generic-handlers.go:281
#       0x6ce22e        net/http.HandlerFunc.ServeHTTP+0x2e                                     net/http/server.go:2047
#       0x1dc90ae       github.com/minio/minio/cmd.setRequestLimitHandler.func1+0x38e           github.com/minio/minio/cmd/generic-handlers.go:112
#       0x6ce22e        net/http.HandlerFunc.ServeHTTP+0x2e                                     net/http/server.go:2047
#       0x1cbce8e       github.com/minio/minio/cmd.setCrossDomainPolicy.func1+0xee              github.com/minio/minio/cmd/crossdomain-xml-handler.go:43
#       0x6ce22e        net/http.HandlerFunc.ServeHTTP+0x2e                                     net/http/server.go:2047
#       0x1dc93c2       github.com/minio/minio/cmd.setBrowserRedirectHandler.func1+0x182        github.com/minio/minio/cmd/generic-handlers.go:141
#       0x6ce22e        net/http.HandlerFunc.ServeHTTP+0x2e                                     net/http/server.go:2047
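
The stuck listing calls can be pulled straight out of the text goroutine dump (a sketch; the exact file name inside profile.zip is an assumption):

# the line preceding each matching frame carries the goroutine count (331 here)
unzip -o profile.zip
grep -B 1 'mergeEntryChannels' *goroutine*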

@lihaohua66 (Author)

We have 6 nodes in the cluster, and each node has 24 disks. Let me send you the full profile.

@harshavardhana (Member)

We have 6 nodes in the cluster, and each node has 24 disks. Let me send you the full profile.

You need to send everything, not just one node.

@lihaohua66 (Author)

Yes, there could be some clients accessing this MinIO, but the workload should be very low, and it should not have this many open file descriptors.

@harshavardhana (Member)

Yes, there could be some clients accessing this MinIO, but the workload should be very low, and it should not have this many open file descriptors.

It does, because there are 300+ listing requests @lihaohua66
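
You can watch them arriving live with a trace (a sketch; the alias name myminio is an assumption):

# stream HTTP call traces from the cluster and keep only the listing calls
mc admin trace myminio | grep -i listobjects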

@harshavardhana (Member)

Okay, from the looks of it you are facing this:

Author: Klaus Post <klauspost@gmail.com>
Date:   Mon May 23 06:28:46 2022 -0700

    Fix WalkDir fallback hot loop (#14961)
    
    Fix fallback hot loop
    
    fd was never refreshed, leading to an infinite hot loop if a disk failed and the fallback disk fails as well.
    
    Fix & simplify retry loop.
    
    Fixes #14960

You need to upgrade your setup @lihaohua66

@harshavardhana (Member)

  • Version used (minio --version): RELEASE.2022-05-19T18-20-59Z

Use the latest release.
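
For example, an in-place upgrade of all servers can be driven from mc (a sketch; the alias name myminio is an assumption):

# update every server in the deployment to the latest release, then verify
mc admin update myminio
mc admin info myminio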

@lihaohua66 (Author)

@harshavardhana thank you for your help. Let me try upgrading.

@lihaohua66 (Author)

lihaohua66 commented Jun 3, 2022

@harshavardhana we have upgraded to 20220602021104.0.0, but we are still observing increasing file descriptors. Could you help take another look? We restarted the whole cluster several times, so you can see a few cliffs here.
(screenshot: file descriptor count after upgrade)

With version 20211209061941.0.0, we don't see this issue.
(screenshot: file descriptor count on the older version)

github-actions bot locked as resolved and limited conversation to collaborators on Jun 4, 2023.