Possible Memory Leak in WatchBackend #595

Closed
murphymj25 opened this issue Jan 24, 2019 · 8 comments
Comments

murphymj25 (Contributor) commented Jan 24, 2019

Over the last few weeks we've noticed that memory usage for our instances continues to grow. I ran pprof and found that about a third of the used memory at the time was attributed to main.watchBackend. In our environment, we use manual overrides to control when services are introduced for load balancing, as well as to disable load balancing for a service while still allowing it to be healthy. Because of that we have a fairly large number of overrides (currently ~3700), and we are pretty much constantly adding and removing manual overrides.

pschultz (Member) commented:

That's an allocation profile, isn't it? The inuse_space profile would be relevant if you're investigating memory consumption.
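
For reference, a minimal sketch of how such a profile can be captured, assuming the process exposes the standard net/http/pprof handlers on some listener (the address below is a placeholder, not something fabio necessarily configures):

```go
// Minimal sketch, not fabio's actual wiring: register the standard pprof
// handlers so heap profiles can be pulled from the running process.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// go tool pprof -sample_index=inuse_space http://localhost:6060/debug/pprof/heap
	//   -> memory currently held (what matters for a leak)
	// go tool pprof -sample_index=alloc_space http://localhost:6060/debug/pprof/heap
	//   -> cumulative allocations since process start (an allocation profile)
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```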

murphymj25 (Contributor, Author) commented Jan 25, 2019

@pschultz I need to update my post; I think my pprof was pointing at the wrong culprit within watchBackend. I'll provide more detail shortly.

murphymj25 (Contributor, Author) commented Jan 25, 2019

Looking at the updated pprof, we see the following:

```
ROUTINE ======================== main.watchBackend in /Users/spjaw2/code/go/src/github.com/fabiolb/fabio/main.go
    1.38GB     1.45GB (flat, cum) 79.31% of Total
         .          .    395:        case mancfg = <-man:
         .          .    396:        }
         .          .    397:
         .          .    398:        // manual config overrides service config
         .          .    399:        // order matters
    1.38GB     1.38GB    400:        next := svccfg + "\n" + mancfg
         .          .    401:        if next == last {
         .          .    402:            continue
         .          .    403:        }
         .          .    404:
         .   165.05kB    405:        aliases, err := route.ParseAliases(next)
         .          .    406:        if err != nil {
         .          .    407:            log.Printf("[WARN]: %s", err)
         .          .    408:        }
         .          .    409:        registry.Default.Register(aliases)
         .          .    410:
         .    68.43MB    411:        t, err := route.NewTable(next)
         .          .    412:        if err != nil {
         .          .    413:            log.Printf("[WARN] %s", err)
         .          .    414:            continue
         .          .    415:        }
         .          .    416:        route.SetTable(t)
         .     2.35MB    417:        logRoutes(t, last, next, cfg.Log.RoutesFormat)
         .          .    418:        last = next
         .          .    419:
         .          .    420:        once.Do(func() { close(first) })
         .          .    421:    }
         .          .    422:}
```
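
One way to read the hot line: `next := svccfg + "\n" + mancfg` allocates a new backing array large enough for both configs on every update, and in Go any substring taken from that string shares its backing array, so a retained substring keeps the whole allocation in use. A toy illustration of that general behavior (the route text and sizes below are invented, and this is not a claim about exactly what fabio retains):

```go
// Toy illustration of Go string retention, not fabio code: a substring
// shares the backing array of the string it was sliced from, so holding
// on to even a small slice keeps the whole concatenated config reachable.
package main

import (
	"fmt"
	"runtime"
	"strings"
)

var retained []string // stands in for anything that outlives one loop iteration

func main() {
	for i := 0; i < 100; i++ {
		svccfg := strings.Repeat("route add svc-a /a http://10.0.0.1:8080/\n", 10_000)
		mancfg := strings.Repeat("route del svc-b /b\n", 1_000)
		next := svccfg + "\n" + mancfg         // one large allocation per update
		retained = append(retained, next[:40]) // tiny view, whole array stays live
	}
	runtime.GC() // the intermediate strings are collected; the retained slices pin their arrays
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	// Only ~4 KB of text is referenced, but tens of MiB stay on the heap.
	fmt.Printf("heap in use: %d MiB\n", ms.HeapInuse>>20)
}
```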

murphymj25 (Contributor, Author) commented:

@pschultz the above pprof is an inuse_space profile.

murphymj25 (Contributor, Author) commented:

I'm still digging into this issue. I've been unable to recreate the behavior in a lab environment. I'm gathering some more pprofs from devices to try to narrow down what's happening. It's possible that our solution to address #611 will resolve this behavior as well.

murphymj25 (Contributor, Author) commented:

We made some good progress on this yesterday. From what we can tell, the memory leak has something to do with "next" being a string. We have switched it to a bytes.Buffer and changed the Parse function in parse_new.go to use a bufio.Scanner. With these changes we are no longer seeing the rapid increase in memory utilization. We're going to do some additional cleanup and also look at whether strings.Builder works as well. We should be able to share a PR and some solid performance comparisons soon.
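
For anyone curious before the PR lands, here is a rough sketch of the direction described above. This is a simplified illustration, not the actual change; parseLines below merely stands in for the Parse function in parse_new.go:

```go
// Rough sketch of the approach described above, not the code from the PR:
// assemble the combined config in a bytes.Buffer instead of concatenating
// strings, and let the parser walk it line by line with a bufio.Scanner.
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// parseLines stands in for the Parse function in parse_new.go: it scans the
// config line by line instead of splitting one large string into pieces.
func parseLines(r io.Reader) ([]string, error) {
	var routes []string
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and comments
		}
		routes = append(routes, line) // the real parser builds route entries here
	}
	return routes, scanner.Err()
}

func main() {
	svccfg := "route add svc-a /a http://10.0.0.1:8080/"
	mancfg := "route del svc-b /b"

	// Manual config still overrides service config; only the container changes.
	var next bytes.Buffer
	next.WriteString(svccfg)
	next.WriteByte('\n')
	next.WriteString(mancfg)

	routes, err := parseLines(&next)
	if err != nil {
		fmt.Printf("[WARN] %s\n", err)
	}
	fmt.Println(routes)
}
```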

ketzacoatl commented:

@murphymj25 that sounds great, thanks for debugging and resolving that. Looking forward to the PR!

murphymj25 (Contributor, Author) commented:

I have created pull request #629 and added some detail on memory performance comparing the pull request to the current 1.5.11 release.
