Why does monocular-api have so many restarts? #450

Open
cabrinoob opened this Issue May 16, 2018 · 3 comments

Comments

@cabrinoob

cabrinoob commented May 16, 2018

Hi,
I'm using Monocular and everything works fine, but I realized that the API pods are restarting very often:

[screenshot: pod list showing the restart counts of the monocular API pods]

As you can see in the screenshot above, one of the pods restarted 61 times in 20 hours.

@prydonius


Member

prydonius commented May 16, 2018

This happens pretty commonly with large chart repositories (e.g. the stable repository): it takes a while for Monocular to index them, and Kubernetes ends up killing the pod before the probe succeeds. You can try increasing the chart's livenessProbe delay to prevent this from happening: https://github.com/kubernetes-helm/monocular/blob/master/deployment/monocular/values.yaml#L25
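
For reference, the knob that matters is the probe's initialDelaySeconds on the API container. A rough sketch of a more forgiving probe in the rendered Deployment spec (the /healthz path and port 8080 are placeholders here, not necessarily what the chart actually renders):

            "livenessProbe": {
              "httpGet": {
                "path": "/healthz",
                "port": 8080
              },
              "initialDelaySeconds": 300,
              "timeoutSeconds": 10
            },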

@fkpwolf


fkpwolf commented May 31, 2018

Why not respond to the livenessProbe in time while, at the same time, scraping the charts from the network?

@fkpwolf


fkpwolf commented Jul 4, 2018

To make the container respond to the livenessProbe in time, I changed the foreground refresh into a goroutine in main.go:

// Run the repository refresh in the background so the API can answer the livenessProbe
go chartsImplementation.Refresh()

So when the pod starts for the first time, it brings up the livenessProbe REST service in time and is no longer killed by Kubernetes.
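
The general pattern, as a minimal standalone sketch (hypothetical names and endpoint, not Monocular's actual code): answer the livenessProbe immediately and let the slow initial refresh run in a goroutine.

package main

import (
    "log"
    "net/http"
    "time"
)

// refreshRepos stands in for chartsImplementation.Refresh(): a slow,
// network-heavy pass that downloads and indexes every configured chart repo.
func refreshRepos() {
    log.Println("starting repository refresh")
    time.Sleep(2 * time.Minute) // placeholder for the real download/index work
    log.Println("repository refresh finished")
}

func main() {
    // Run the refresh in the background instead of blocking startup on it.
    go refreshRepos()

    // The liveness endpoint responds right away, so Kubernetes does not kill
    // the pod while the first refresh is still in progress.
    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}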

I also found a lot of OOM kills in the kernel log:

[147913.492743] Memory cgroup stats for /kubepods.slice/kubepods-podbee65b57_7e71_11e8_9ced_d24398b14524.slice/docker-5d3a4c6e75105b3ef0bdfd215ebec12e9a2d32341f369e75125954cb3eeed903.scope: cache:0KB rss:233428KB rss_huge:2048KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:233428KB inactive_file:0KB active_file:0KB unevictable:0KB
[147913.544835] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[147913.546739] [24389]     0 24389      253        1       4        0          -998 pause
[147913.548605] [24518]     0 24518    66805    60921     134        0          -998 monocular
[147913.550548] Memory cgroup out of memory: Kill process 26074 (monocular) score 54 or sacrifice child
[147913.552593] Killed process 24518 (monocular) total-vm:267220kB, anon-rss:232620kB, file-rss:11064kB, shmem-rss:0kB
[147919.170462] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)

To fix this, I increased the resources in the pod spec:

            "resources": {
              "limits": {
                "cpu": "100m",
                "memory": "928Mi"
              },
              "requests": {
                "cpu": "100m",
                "memory": "428Mi"
              }
            },

I still don't know why it exhausts so much memory. Maybe we can decrease the parallel download semaphore? @prydonius Currently it is 15.
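
For illustration, this is roughly how a buffered-channel semaphore caps parallel downloads (hypothetical names, not Monocular's actual code). Lowering the capacity from 15 bounds how many chart downloads are in flight, and held in memory, at once, at the cost of a slower index:

package main

import (
    "fmt"
    "sync"
    "time"
)

const maxParallelDownloads = 5 // was 15; a smaller value lowers peak memory

// downloadChart is a stand-in for fetching and parsing one chart's metadata.
func downloadChart(name string) {
    time.Sleep(200 * time.Millisecond) // placeholder for the real HTTP fetch
    fmt.Println("downloaded", name)
}

func main() {
    charts := []string{"wordpress", "mysql", "redis", "nginx", "grafana", "jenkins"}

    sem := make(chan struct{}, maxParallelDownloads)
    var wg sync.WaitGroup

    for _, c := range charts {
        wg.Add(1)
        sem <- struct{}{} // acquire a slot; blocks once the limit is reached
        go func(name string) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot when the download finishes
            downloadChart(name)
        }(c)
    }
    wg.Wait()
}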
