Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove logs from warm up period before calculating stats and unbreak bechmark-ab.py #1413

Merged
merged 3 commits into from
Feb 18, 2022

Conversation

mreso
Copy link
Collaborator

@mreso mreso commented Feb 5, 2022

Description

Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

When running a benchmark with benchmark-ab.py script we run a warmup phase before benchmarking a model in the main stage. Part of the statistics collected for the report are extracted from the TorchServe log file. During this extraction we include the log lines from the warmup phase which dilutes the statistics (i.e. PredictionTime, HandlerTime, QueueTime, WorkerThreadTime) in the report. Due to this we systematically underestimate TorchServe's performance.

Additionally, this RP unbreaks benchmarkss/benchmark-ab.py which got broken due to a missing positional parameter (report_location) in a recent commit. A new parameter tmp_dir is introduced which allows final report location and tmp directory be in separate locations.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the tests [UT/IT] that you ran to verify your changes and relevent result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

To test run:
$ cd serve/benchmark
$ python benchmark-ab.py --config config.json --requests 1000
content config.json:
{
"url":"https://torch-deploy-benchmarks.s3.amazonaws.com/BERTSeqClassification.mar",
"requests": 10000,
"concurrency": 100,
"input": "sample.txt",
"batch_delay": 100,
"batch_size": 8,
"content_type":"application/text",
"workers": 4
}
content sample.txt:
Apples are especially bad for your health [SEP] Eating apples is a health risk

  • Test A
    run before fix

  • Test B
    run after fix

  • UT/IT execution results
    During generation of the benchmark report we create several txt files in /tmp/benchmark and filter out the times we're interested in (

    def extract_metrics():
    )
    before fix:
    $ wc -l /tmp/benchmark/*.txt
    138 /tmp/benchmark/handler_time.txt
    138 /tmp/benchmark/predict.txt
    46 /tmp/benchmark/result.txt
    1100 /tmp/benchmark/waiting_time.txt
    146 /tmp/benchmark/worker_thread.txt

(1000 request + 100 warmup requests) / 8 samples per batch = 138 batches

after fix:
$ wc -l /tmp/benchmark/*.txt
125 /tmp/benchmark/handler_time.txt
125 /tmp/benchmark/predict.txt
46 /tmp/benchmark/result.txt
1000 /tmp/benchmark/waiting_time.txt
125 /tmp/benchmark/worker_thread.txt

1000 request / 8 samples per batch = 125 batches

Checklist:

  • Have you added tests that prove your fix is effective or that this feature works?
  • New and existing unit tests pass locally with these changes?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: d7de214
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: d7de214
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: d7de214
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@HamidShojanazeri
Copy link
Collaborator

Thanks @mreso, can you please add the logs for the fix as well.

@mreso
Copy link
Collaborator Author

mreso commented Feb 7, 2022

Sure, added logs and test plan @HamidShojanazeri

benchmarks/benchmark-ab.py Outdated Show resolved Hide resolved
@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: a1f2359
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: a1f2359
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: a1f2359
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mreso
Copy link
Collaborator Author

mreso commented Feb 14, 2022

@HamidShojanazeri do we need any additional logs or any other changes?

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 3bc395b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 3bc395b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 3bc395b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@msaroufim msaroufim self-requested a review February 18, 2022 17:48
Copy link
Member

@msaroufim msaroufim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor nit comment around using None for warmup_lines, otherwise worth mentioning in the description that this PR actually shows that we've been underestimating torchserve performance with our benchmarks.

benchmarks/benchmark-ab.py Outdated Show resolved Hide resolved
@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: bde615f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: bde615f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: bde615f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mreso
Copy link
Collaborator Author

mreso commented Feb 18, 2022

@nskool could you take another look? I remove the global but had to make some more changes due to a commit in master that broke the script. The breakage was due to a missing positional parameter (report_location). I introduced another tmp_dir parameter so you final report location and tmp directory can be in separate locations now.

@mreso mreso changed the title [WIP] Remove logs from warm up period before calculating stats Remove logs from warm up period before calculating stats and unbreak bechmark-ab.py Feb 18, 2022
@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: f16a8a2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: f16a8a2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: f16a8a2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mreso mreso merged commit 390573b into pytorch:master Feb 18, 2022
@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: f16a8a2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: f16a8a2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants