Skip to content
This repository has been archived by the owner on May 6, 2022. It is now read-only.

Integration tests appear to hold tons of file descriptors open #1649

Closed
pmorie opened this issue Jan 4, 2018 · 18 comments · Fixed by #1651
Closed

Integration tests appear to hold tons of file descriptors open #1649

pmorie opened this issue Jan 4, 2018 · 18 comments · Fixed by #1651
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@pmorie
Copy link
Contributor

pmorie commented Jan 4, 2018

...eventually causing the process to crash on my personal machine.

Output:

$ lsof -p <pid of integration binary>
COMMAND     PID   USER   FD      TYPE             DEVICE SIZE/OFF     NODE NAME                                                                                                                                    
integrati 16615 pmorie  cwd       DIR              253,3     4096  2231290 /home/pmorie/go/src/github.com/kubernetes-incubator/service-catalog                                                                     
integrati 16615 pmorie  rtd       DIR              253,2     4096        2 /                                                                                                                                       
integrati 16615 pmorie  txt       REG              253,3 99104080  2233522 /home/pmorie/go/src/github.com/kubernetes-incubator/service-catalog/integration.test                                                    
integrati 16615 pmorie  mem-W     REG               0,42 16805888 58163457 /tmp/service_catalog_integration_test775825161/member/snap/db                                                                           
integrati 16615 pmorie  mem       REG              253,2    57008  1980864 /usr/lib64/libnss_files-2.26.so                                                                                                         
integrati 16615 pmorie  mem       REG              253,2  2254296  1980848 /usr/lib64/libc-2.26.so                                                                                                                 
integrati 16615 pmorie  mem       REG              253,2   153136  1980866 /usr/lib64/libpthread-2.26.so                                                                                                           
integrati 16615 pmorie  mem       REG              253,2   190528  1980842 /usr/lib64/ld-2.26.so                                                                                                                   
integrati 16615 pmorie    0u      CHR              136,6      0t0        9 /dev/pts/6                                                                                                                              
integrati 16615 pmorie    1u      CHR              136,6      0t0        9 /dev/pts/6                                                                                                                              
integrati 16615 pmorie    2u      CHR              136,6      0t0        9 /dev/pts/6                                                                                                                              
integrati 16615 pmorie    3u     unix 0xffff94918ee3b400      0t0 58163445 type=DGRAM                                                                                                                              
integrati 16615 pmorie    4u  a_inode               0,13        0    11173 [eventpoll]                                                                                                                             
integrati 16615 pmorie    5u     IPv4           58163451      0t0      TCP localhost.localdomain:etcd-server (LISTEN)                                                                                              
integrati 16615 pmorie    6u     IPv4           58163452      0t0      TCP localhost.localdomain:etcd-client (LISTEN)                                                                                              
integrati 16615 pmorie    7uW     REG               0,42 16805888 58163457 /tmp/service_catalog_integration_test775825161/member/snap/db                                                                           
integrati 16615 pmorie    8wW     REG               0,42 64000000 58163460 /tmp/service_catalog_integration_test775825161/member/wal/0000000000000000-0000000000000000.wal                                         
integrati 16615 pmorie    9r      DIR               0,42       80 58163458 /tmp/service_catalog_integration_test775825161/member/wal                                                                               
integrati 16615 pmorie   10wW     REG               0,42 64000000 58161917 /tmp/service_catalog_integration_test775825161/member/wal/0.tmp                                                                         
integrati 16615 pmorie   11u      REG               0,42   119327 58161919 /tmp/integration.test.localhost.pmorie.log.INFO.20180104-163445.16615                                                                   
integrati 16615 pmorie   12u     IPv4           58163471      0t0      TCP localhost.localdomain:58856->localhost.localdomain:etcd-client (ESTABLISHED)                                                            
integrati 16615 pmorie   13u     IPv4           58163472      0t0      TCP localhost.localdomain:etcd-client->localhost.localdomain:58856 (ESTABLISHED)                                                            
integrati 16615 pmorie   14u      REG               0,42    27240 58163480 /tmp/integration.test.localhost.pmorie.log.WARNING.20180104-163448.16615                                                                
integrati 16615 pmorie   15u     IPv4           58161935      0t0      TCP localhost.localdomain:58898->localhost.localdomain:etcd-client (ESTABLISHED)
integrati 16615 pmorie   16u     IPv4           58161003      0t0      TCP localhost.localdomain:etcd-client->localhost.localdomain:58898 (ESTABLISHED)
integrati 16615 pmorie   17u     IPv4           58163482      0t0      TCP localhost.localdomain:58900->localhost.localdomain:etcd-client (ESTABLISHED)
integrati 16615 pmorie   18u     IPv4           58161005      0t0      TCP localhost.localdomain:58902->localhost.localdomain:etcd-client (ESTABLISHED)
integrati 16615 pmorie   19u     IPv4           58161936      0t0      TCP localhost.localdomain:etcd-client->localhost.localdomain:58902 (ESTABLISHED)
integrati 16615 pmorie   20u     IPv4           58161937      0t0      TCP localhost.localdomain:etcd-client->localhost.localdomain:58900 (ESTABLISHED)
integrati 16615 pmorie   21u     IPv4           58162762      0t0      TCP localhost.localdomain:58906->localhost.localdomain:etcd-client (ESTABLISHED)
integrati 16615 pmorie   22u     IPv4           58162763      0t0      TCP localhost.localdomain:etcd-client->localhost.localdomain:58906 (ESTABLISHED)
integrati 16615 pmorie   23u     IPv4           58163483      0t0      TCP localhost.localdomain:58908->localhost.localdomain:etcd-client (ESTABLISHED)
integrati 16615 pmorie   24u     IPv4           58163484      0t0      TCP localhost.localdomain:etcd-client->localhost.localdomain:58908 (ESTABLISHED)
integrati 16615 pmorie   25u     IPv4           58162785      0t0      TCP localhost.localdomain:59002->localhost.localdomain:etcd-client (ESTABLISHED)
integrati 16615 pmorie   26u     IPv4           58162765      0t0      TCP localhost.localdomain:34710->localhost.localdomain:23580 (ESTABLISHED)
integrati 16615 pmorie   27u     IPv6           58161946      0t0      TCP localhost.localdomain:23580->localhost.localdomain:34710 (ESTABLISHED)
integrati 16615 pmorie   28u     IPv4           58161969      0t0      TCP localhost.localdomain:etcd-client->localhost.localdomain:59002 (ESTABLISHED)
integrati 16615 pmorie   29u     IPv4           58161013      0t0      TCP localhost.localdomain:59004->localhost.localdomain:etcd-client (ESTABLISHED)
integrati 16615 pmorie   30u     IPv4           58162786      0t0      TCP localhost.localdomain:etcd-client->localhost.localdomain:59004 (ESTABLISHED)
integrati 16615 pmorie   31u     IPv4           58161971      0t0      TCP localhost.localdomain:59006->localhost.localdomain:etcd-client (ESTABLISHED)
integrati 16615 pmorie   32u     IPv4           58161014      0t0      TCP localhost.localdomain:etcd-client->localhost.localdomain:59006 (ESTABLISHED)
integrati 16615 pmorie   33u     IPv4           58162788      0t0      TCP localhost.localdomain:59008->localhost.localdomain:etcd-client (ESTABLISHED)
integrati 16615 pmorie   34u     IPv4           58161972      0t0      TCP localhost.localdomain:etcd-client->localhost.localdomain:59008 (ESTABLISHED)
integrati 16615 pmorie   35u     IPv4           58163517      0t0      TCP localhost.localdomain:59010->localhost.localdomain:etcd-client (ESTABLISHED)
integrati 16615 pmorie   36u     IPv4           58162789      0t0      TCP localhost.localdomain:etcd-client->localhost.localdomain:59010 (ESTABLISHED)
integrati 16615 pmorie   37u     IPv4           58163563      0t0      TCP localhost.localdomain:59136->localhost.localdomain:etcd-client (ESTABLISHED)
@MHBauer
Copy link
Contributor

MHBauer commented Jan 4, 2018

I see this too.

@pmorie
Copy link
Contributor Author

pmorie commented Jan 4, 2018

It looks like the culprit is etcd client connections. Seems like etcd is not being shut down after each test case.

@MHBauer
Copy link
Contributor

MHBauer commented Jan 4, 2018

That's neat. Your lsof knows that port 2379 is etcd-client.

@n3wscott
Copy link
Contributor

n3wscott commented Jan 4, 2018

when I run make test-integration, it runs and passes.

when I run ./test/integration.sh, I can't because it is written assuming linux. I am on osx.

./test/integration.sh: line 22: realpath: command not found

@MHBauer
Copy link
Contributor

MHBauer commented Jan 4, 2018

We do not need that anymore with the builtin etcd.

@n3wscott
Copy link
Contributor

n3wscott commented Jan 5, 2018

The storage info object gets sent to genericapiserver.go, InstallAPIGroup. We need to look into if there is a bug where this class never calls the DestroyFunc for storage.

@n3wscott
Copy link
Contributor

n3wscott commented Jan 5, 2018

Also, I was mistaken, we use the same etcd server for all the integration tests. So what we have no is the same as we had before, the main difference is that you don't need to have etcd installed locally to run the unit tests.

Note, the embedded etcd will cluster with a local etcd if given a chance. But that is not the issue we are seeing.

See these:

kubernetes/kubernetes#50690
kubernetes/kubernetes#53617

@MHBauer
Copy link
Contributor

MHBauer commented Jan 5, 2018

I want a server that cleanly shuts down. We should document what we find.

goroutines definietly going up per test case, youch.

$ rg 'goroutine profile' test/itstacktracewithallgoroutines
8:goroutine profile: total 29
150:goroutine profile: total 151
662:goroutine profile: total 168
1060:goroutine profile: total 212
1498:goroutine profile: total 299
1926:goroutine profile: total 308
2289:goroutine profile: total 377

@MHBauer
Copy link
Contributor

MHBauer commented Jan 5, 2018

Digging about more, I also see k8s.io/apiserver/pkg/storage/etcd/testing. Maybe we can reuse that in the future or it'll have a workaround.

@pmorie pmorie reopened this Jan 5, 2018
@pmorie
Copy link
Contributor Author

pmorie commented Jan 5, 2018

This is still an issue and wasn't meant to be closed

@MHBauer
Copy link
Contributor

MHBauer commented Jan 5, 2018

lol, the logic behind the scenes there. the word "fix" and then a link to the issue.

@MHBauer
Copy link
Contributor

MHBauer commented Jan 9, 2018

If kubernetes/apiserver#30 gets implemented we'd have to update/backport to it to get the functionality.

I think with the existing 1.9 release we might be able to do this etcd-watcher cleanup by implementing the destroy on the storage and using a preshutdownhook that calls destroy. It might be easier instead to directly rely on the stopch and some how pass it to destroy, or have the destroy call wait on stopch to receive.

@MHBauer
Copy link
Contributor

MHBauer commented Jan 19, 2018

more data for runs and incoming pr https://gist.github.com/MHBauer/8c7c94c43f4a53b20ee447a926641166

duglin pushed a commit that referenced this issue Jan 27, 2018
partial mitigation for #1649
still many outstanding goroutines by test end
@MHBauer
Copy link
Contributor

MHBauer commented Jan 31, 2018

still threads hanging about, even if I pause for 5minutes after all tests have run. maybe a longer timeout is needed, but this is what I see:

goroutine profile: total 249
106 @ 0x457c3d 0x469ae3 0x875b80 0x48a461
#       0x875b7f        net/http.(*persistConn).readLoop+0xbff  /usr/local/go/src/net/http/transport.go:1654

106 @ 0x457c3d 0x469ae3 0x8774fe 0x48a461
#       0x8774fd        net/http.(*persistConn).writeLoop+0x1bd /usr/local/go/src/net/http/transport.go:1759

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2019
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 22, 2019
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants