heap samples are not what I expect #116

Closed
dumbFeng opened this issue Aug 4, 2022 · 4 comments
Comments

@dumbFeng
Contributor

dumbFeng commented Aug 4, 2022

What version of Go are you using (go version)?

$  1.17.8

Does this issue reproduce with the latest release?

Haven't tried.

What operating system and processor architecture are you using (go env)?

go env Output
$ MacOS and amd64

What did you do?

I wrote a piece of code:

import (
    "strconv"
    "time"

    "github.com/gin-gonic/gin"
)

var (
    defaultSize = 1073741824 // 1 GiB
    a           []byte
)

func MakeNGbSlice(ctx *gin.Context) {
    sizeStr, ok := ctx.GetQuery("size")
    if ok {
        size, _ := strconv.Atoi(sizeStr)
        if size > 0 {
            defaultSize = size * defaultSize
        }
    }

    a = make([]byte, 0, defaultSize)
    for i := 0; i < defaultSize; i++ {
        a = append(a, byte('a'))
    }
    time.Sleep(time.Second * 10)
    a = nil // drop the reference so the next GC can reclaim the slice
}

Then I curled the API to trigger the code.
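For reference, a minimal sketch of how such a handler could be wired up and profiled; the main function, route, and ports are assumptions for illustration, not taken from the original report:

package main

import (
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on the default mux

    "github.com/gin-gonic/gin"
)

func main() {
    // Serve the pprof endpoints on a side port (port chosen arbitrarily here);
    // the heap profile can then be fetched with:
    //   go tool pprof http://localhost:6060/debug/pprof/heap
    go func() {
        _ = http.ListenAndServe("localhost:6060", nil)
    }()

    r := gin.Default()
    r.GET("/make", MakeNGbSlice) // e.g. curl "localhost:8080/make?size=1"
    _ = r.Run(":8080")
}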

What did you expect to see?

  1. The RSS my app used went up to 1GB.
  2. After dumping the heap samples, the heap profile data would be accurate and help me find out why RSS went up.

What did you see instead?

  1. The RSS my app used went up to 1GB.
  2. The heap profile seemed weird:
Type: inuse_space
Time: Aug 3, 2022 at 2:32pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 12052.56kB, 64.38% of 18721.44kB total
Showing top 10 nodes out of 153
      flat  flat%   sum%        cum   cum%
 2562.81kB 13.69% 13.69%  3075.02kB 16.43%  runtime.allocm
 2048.81kB 10.94% 24.63%  2048.81kB 10.94%  runtime.malg
 2048.19kB 10.94% 35.57%  2048.19kB 10.94%  github.com/Shopify/sarama.(*TopicMetadata).decode
 1097.69kB  5.86% 41.44%  1097.69kB  5.86%  github.com/Shopify/sarama.(*client).updateMetadata
 1089.33kB  5.82% 47.26%  1089.33kB  5.82%  google.golang.org/grpc/internal/transport.newBufWriter
// ...
  3. After about 2 minutes, I profiled again and the result was what I expected:
Type: inuse_space
Time: Aug 3, 2022 at 2:33pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1024.50MB, 98.25% of 1042.78MB total
Dropped 154 nodes (cum <= 5.21MB)
Showing top 10 nodes out of 18
      flat  flat%   sum%        cum   cum%
    1024MB 98.20% 98.20%     1024MB 98.20%  .../examples.Make1GbSlice
    0.50MB 0.048% 98.25%     6.63MB  0.64%  runtime.main
         0     0% 98.25%  1024.51MB 98.25%  github.com/gin-gonic/gin.(*Context).Next (inline)
         0     0% 98.25%  1024.51MB 98.25%  github.com/gin-gonic/gin.(*Engine).ServeHTTP
         0     0% 98.25%  1024.51MB 98.25%  github.com/gin-gonic/gin.(*Engine).handleHTTPRequest

So it seems that the heap samples pprof dumps right away are wrong.

For more discussion, please see golang issue #54233.

@doujiang24
Member

As golang/go#54223 (comment) says, heap profiles are only updated when a GC runs.
So this is Go heap profile behavior, not a holmes issue.
For more details, please see:
https://mosn.io/blog/posts/mosn-holmes-design/#%E6%A0%B9%E6%8D%AEgc%E5%91%A8%E6%9C%9F%E9%87%87%E9%9B%86
https://uncledou.site/2022/go-pprof-heap/

Hope this helps.
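Since the heap profile only reflects the most recently completed GC cycle, one way to get fresh samples is to run a GC right before dumping. A minimal sketch follows (the helper name and file handling are assumptions, not holmes' code); note also that the net/http/pprof heap endpoint accepts a gc=1 query parameter that runs a GC before taking the sample:

import (
    "os"
    "runtime"
    "runtime/pprof"
)

// dumpHeapProfile forces a GC so the heap profile reflects the current live
// heap, then writes it to path in the format go tool pprof expects.
func dumpHeapProfile(path string) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()

    runtime.GC() // flush up-to-date allocation data into the profile
    return pprof.Lookup("heap").WriteTo(f, 0)
}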

@dumbFeng
Contributor Author

dumbFeng commented Aug 4, 2022

@doujiang24 Nice! Thank you.

@dumbFeng
Contributor Author

dumbFeng commented Aug 4, 2022

But I think these are two different situations. For the case I reported, I think holmes should dump after the next GC runs, while the heap peak is still present. In the following sequence, the heap profile is stale and can't help us find out what caused the problem (see the sketch after this list):

  1. gc runs
  2. heap peak happens, e.g. making a big slice
  3. holmes dumps
  4. next gc runs
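A rough sketch of that ordering (illustrative only; the helper name, the NumGC check, and the polling interval are my own, not holmes' implementation): wait until at least one GC cycle has completed since the spike was detected, then dump, so the profile includes the allocations from the peak while they are still live.

import (
    "runtime"
    "time"
)

// waitForNextGC blocks until at least one GC cycle has completed since
// startNumGC (captured via runtime.ReadMemStats when the spike is detected).
// A heap profile dumped after this returns includes the peak allocations,
// as long as they are still live when that GC finishes.
func waitForNextGC(startNumGC uint32) {
    for {
        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
        if ms.NumGC > startNumGC {
            return
        }
        time.Sleep(100 * time.Millisecond)
    }
}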

@cch123
Collaborator

cch123 commented Aug 7, 2022

You should look at both the inuse and alloc space profiles together to help locate the root cause.

A big heap object will cause an RSS spike. If it is not used any more afterwards, it will not appear in the inuse_objects pprof list, but you can find it in the alloc_objects list.

All objects allocated after the app started are counted in alloc_objects/alloc_space.
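As a sketch (the helper name and file path are assumptions): the same data can also be written via the "allocs" profile, which go tool pprof opens with alloc_space as the default sample index, while "heap" defaults to inuse_space; either way, -sample_index lets you switch views.

import (
    "os"
    "runtime/pprof"
)

// writeAllocsProfile writes the allocation profile. It contains the same four
// sample types as the heap profile (inuse/alloc objects and space), but
// go tool pprof shows alloc_space by default, which counts every allocation
// made since the program started, including objects that were already freed.
func writeAllocsProfile(path string) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()
    return pprof.Lookup("allocs").WriteTo(f, 0)
}

// To switch views on an existing profile file:
//   go tool pprof -sample_index=alloc_space heap.out
//   go tool pprof -sample_index=inuse_space heap.out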
