[Tensorpipe Agent] Implementing getMetrics with currently available metrics #37980

Closed
wants to merge 4 commits into from

Conversation

osalpekar
Member

@osalpekar osalpekar commented May 6, 2020

Stack from ghstack:

This implements `TensorPipeAgent::getMetrics` with the metrics that are currently available. Other metrics, such as Client/Server Active Calls, will be added once timeouts are implemented.

Differential Revision: D21439184
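
For readers unfamiliar with the pattern, here is a minimal sketch of what a `getMetrics`-style method could look like: it snapshots whatever counters exist today into a string-to-string map, so further entries can be added later without changing the signature. This is not the code from this PR. The `ToyAgent` class, the `beginWork`/`endWork` helpers, and the metric names `thread_pool_size` and `num_idle_threads` are all hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an RPC agent; this is NOT the real TensorPipeAgent.
class ToyAgent {
 public:
  using Metrics = std::unordered_map<std::string, std::string>;

  explicit ToyAgent(int32_t threadPoolSize)
      : threadPoolSize_(threadPoolSize), numIdleThreads_(threadPoolSize) {
    assert(threadPoolSize > 0);
  }

  // Mark one worker thread as busy, then as idle again.
  void beginWork() { --numIdleThreads_; }
  void endWork() { ++numIdleThreads_; }

  // Snapshot the counters available today as string key/value pairs.
  // The metric names are assumptions made for this sketch.
  Metrics getMetrics() const {
    Metrics metrics;
    metrics["thread_pool_size"] = std::to_string(threadPoolSize_);
    metrics["num_idle_threads"] = std::to_string(numIdleThreads_.load());
    return metrics;
  }

 private:
  const int32_t threadPoolSize_;
  std::atomic<int32_t> numIdleThreads_;
};
```

With this sketch, constructing `ToyAgent agent(4)`, calling `agent.beginWork()`, and then calling `agent.getMetrics()` yields a map with one entry per counter. Metrics that depend on features not yet implemented (here, timeouts) would simply be absent until those features land.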

osalpekar added a commit that referenced this pull request May 6, 2020
…etrics

This implements `TensorPipeAgent::getMetrics` with the metrics that are currently available. Other metrics, such as Client/Server Active Calls, will be added once timeouts are implemented.

Differential Revision: [D21439184](https://our.internmc.facebook.com/intern/diff/D21439184/)

ghstack-source-id: 103624005
Pull Request resolved: #37980
@dr-ci

dr-ci bot commented May 7, 2020

💊 CI failures summary and remediations

As of commit 47b3d61 (more details on the Dr. CI page):


  • 5/5 failures introduced in this PR

🕵️ 5 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_backward_compatibility_check_test (1/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

May 07 18:47:55 caused by: Connection refused (os error 111)
May 07 18:47:55 +++ eval 'extract_trap_cmd ' 
May 07 18:47:55 ++++ extract_trap_cmd 
May 07 18:47:55 ++++ printf '%s\n' '' 
May 07 18:47:55 +++ printf '%s\n' cleanup 
May 07 18:47:55 ++ trap -- ' 
May 07 18:47:55 cleanup' EXIT 
May 07 18:47:55 ++ which sccache 
May 07 18:47:55 ++ sccache --stop-server 
May 07 18:47:55 Stopping sccache server... 
May 07 18:47:55 error: couldn't connect to server 
May 07 18:47:55 caused by: Connection refused (os error 111) 
May 07 18:47:55 ++ true 
May 07 18:47:55 ++ rm /var/lib/jenkins/sccache_error.log 
May 07 18:47:55 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
May 07 18:47:55 ++ SCCACHE_IDLE_TIMEOUT=1200 
May 07 18:47:55 ++ RUST_LOG=sccache::server=error 
May 07 18:47:55 ++ sccache --start-server 
May 07 18:47:55 Starting sccache server... 
May 07 18:47:55 ++ sccache --zero-stats 
May 07 18:47:55 Compile requests                 0 
May 07 18:47:55 Compile requests executed        0 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_profiling_test (2/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

May 07 18:48:03 caused by: Connection refused (os error 111)
May 07 18:48:03 +++ eval 'extract_trap_cmd ' 
May 07 18:48:03 ++++ extract_trap_cmd 
May 07 18:48:03 ++++ printf '%s\n' '' 
May 07 18:48:03 +++ printf '%s\n' cleanup 
May 07 18:48:03 ++ trap -- ' 
May 07 18:48:03 cleanup' EXIT 
May 07 18:48:03 ++ which sccache 
May 07 18:48:03 ++ sccache --stop-server 
May 07 18:48:03 Stopping sccache server... 
May 07 18:48:03 error: couldn't connect to server 
May 07 18:48:03 caused by: Connection refused (os error 111) 
May 07 18:48:03 ++ true 
May 07 18:48:03 ++ rm /var/lib/jenkins/sccache_error.log 
May 07 18:48:03 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
May 07 18:48:03 ++ SCCACHE_IDLE_TIMEOUT=1200 
May 07 18:48:03 ++ RUST_LOG=sccache::server=error 
May 07 18:48:03 ++ sccache --start-server 
May 07 18:48:03 Starting sccache server... 
May 07 18:48:03 ++ sccache --zero-stats 
May 07 18:48:03 Compile requests                 0 
May 07 18:48:03 Compile requests executed        0 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_legacy_test (3/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

May 07 18:48:14 caused by: Connection refused (os error 111)
May 07 18:48:14 ++++ trap -p EXIT 
May 07 18:48:14 +++ eval 'extract_trap_cmd ' 
May 07 18:48:14 ++++ extract_trap_cmd 
May 07 18:48:14 ++++ printf '%s\n' '' 
May 07 18:48:14 +++ printf '%s\n' cleanup 
May 07 18:48:14 ++ trap -- ' 
May 07 18:48:14 cleanup' EXIT 
May 07 18:48:14 ++ which sccache 
May 07 18:48:14 ++ sccache --stop-server 
May 07 18:48:14 error: couldn't connect to server 
May 07 18:48:14 caused by: Connection refused (os error 111) 
May 07 18:48:14 Stopping sccache server... 
May 07 18:48:14 ++ true 
May 07 18:48:14 ++ rm /var/lib/jenkins/sccache_error.log 
May 07 18:48:14 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
May 07 18:48:14 ++ SCCACHE_IDLE_TIMEOUT=1200 
May 07 18:48:14 ++ RUST_LOG=sccache::server=error 
May 07 18:48:14 ++ sccache --start-server 
May 07 18:48:14 Starting sccache server... 
May 07 18:48:14 ++ sccache --zero-stats 
May 07 18:48:14 Compile requests                 0 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (4/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

May 07 18:48:35 caused by: Connection refused (os error 111)
May 07 18:48:35 +++ eval 'extract_trap_cmd ' 
May 07 18:48:35 ++++ extract_trap_cmd 
May 07 18:48:35 ++++ printf '%s\n' '' 
May 07 18:48:35 +++ printf '%s\n' cleanup 
May 07 18:48:35 ++ trap -- ' 
May 07 18:48:35 cleanup' EXIT 
May 07 18:48:35 ++ which sccache 
May 07 18:48:35 ++ sccache --stop-server 
May 07 18:48:35 Stopping sccache server... 
May 07 18:48:35 error: couldn't connect to server 
May 07 18:48:35 caused by: Connection refused (os error 111) 
May 07 18:48:35 ++ true 
May 07 18:48:35 ++ rm /var/lib/jenkins/sccache_error.log 
May 07 18:48:35 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
May 07 18:48:35 ++ SCCACHE_IDLE_TIMEOUT=1200 
May 07 18:48:35 ++ RUST_LOG=sccache::server=error 
May 07 18:48:35 ++ sccache --start-server 
May 07 18:48:35 Starting sccache server... 
May 07 18:48:35 ++ sccache --zero-stats 
May 07 18:48:35 Compile requests                 0 
May 07 18:48:35 Compile requests executed        0 

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_test (5/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

May 07 19:17:25 caused by: Connection refused (os error 111)
May 07 19:17:25 +++ eval 'extract_trap_cmd ' 
May 07 19:17:25 ++++ extract_trap_cmd 
May 07 19:17:25 ++++ printf '%s\n' '' 
May 07 19:17:25 +++ printf '%s\n' cleanup 
May 07 19:17:25 ++ trap -- ' 
May 07 19:17:25 cleanup' EXIT 
May 07 19:17:25 ++ which sccache 
May 07 19:17:25 ++ sccache --stop-server 
May 07 19:17:25 Stopping sccache server... 
May 07 19:17:25 error: couldn't connect to server 
May 07 19:17:25 caused by: Connection refused (os error 111) 
May 07 19:17:25 ++ true 
May 07 19:17:25 ++ rm /var/lib/jenkins/sccache_error.log 
May 07 19:17:25 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
May 07 19:17:25 ++ SCCACHE_IDLE_TIMEOUT=1200 
May 07 19:17:25 ++ RUST_LOG=sccache::server=error 
May 07 19:17:25 ++ sccache --start-server 
May 07 19:17:25 Starting sccache server... 
May 07 19:17:25 ++ sccache --zero-stats 
May 07 19:17:25 Compile requests                 0 
May 07 19:17:25 Compile requests executed        0 

This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

This comment has been revised 10 times.

@osalpekar osalpekar requested a review from jiayisuse May 7, 2020 01:06
osalpekar added a commit that referenced this pull request May 7, 2020
…etrics

Pull Request resolved: #37980

This implements `TensorPipeAgent::getMetrics` with the metrics that are currently available. Other metrics, such as Client/Server Active Calls, will be added once timeouts are implemented.
ghstack-source-id: 103624005

Differential Revision: [D21439184](https://our.internmc.facebook.com/intern/diff/D21439184/)
@facebook-github-bot
Contributor

This pull request has been merged in f0f5873.

@facebook-github-bot facebook-github-bot deleted the gh/osalpekar/22/head branch May 11, 2020 14:20
4 participants