This repository was archived by the owner on Aug 7, 2025. It is now read-only.
Add neuron benchmarking to automation and other enhancements #1099
Merged
14 commits
5eb4250
Add neuron benchmarking to automation and other enhancements
c96320b
Add working neuron handler changes
58f2b06
Fix bert_compile script
71f087d
Remove report.md extra file
f866a2d
Fix batch-size issues in neuron benchmarking
cdc6b99
Fix vgg16 instance type
f6b444d
Refactor imports
cc99803
Remove instances.yaml, address commends and unskip functions
a7ea429
Address Geeta's comments
188eef1
Refactor according to comments
e0d44f9
Cleanup
4385731
Remove comments
ddd9d27
Correct bert input
4108825
Merge branch 'master' into neuron_automation
msaroufim
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,109 @@ | ||
| # syntax = docker/dockerfile:experimental | ||
| # | ||
| # Following comments have been shamelessly copied from https://github.com/pytorch/pytorch/blob/master/Dockerfile | ||
| # | ||
| # NOTE: To build this you will need a docker version > 18.06 with | ||
| # experimental enabled and DOCKER_BUILDKIT=1 | ||
| # | ||
| # If you do not use buildkit you are not going to have a good time | ||
| # | ||
| # For reference: | ||
| # https://docs.docker.com/develop/develop-images/build_enhancements/ | ||
|
|
||
| ARG BASE_IMAGE=ubuntu:18.04 | ||
| ARG BUILD_TYPE=dev | ||
| FROM ${BASE_IMAGE} AS compile-image | ||
|
|
||
| ARG BASE_IMAGE | ||
| ARG BRANCH_NAME=master | ||
| ARG MACHINE_TYPE=cpu | ||
| ARG CUDA_VERSION | ||
|
|
||
| ENV PYTHONUNBUFFERED TRUE | ||
|
|
||
| RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \ | ||
| apt-get update && \ | ||
| DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \ | ||
| fakeroot \ | ||
| ca-certificates \ | ||
| dpkg-dev \ | ||
| sudo \ | ||
| g++ \ | ||
| git \ | ||
| python3-dev \ | ||
| build-essential \ | ||
| openjdk-11-jdk \ | ||
| curl \ | ||
| wget \ | ||
| vim \ | ||
| && rm -rf /var/lib/apt/lists/* \ | ||
| && cd /tmp \ | ||
| && curl -O https://bootstrap.pypa.io/get-pip.py \ | ||
| && python3 get-pip.py | ||
|
|
||
| RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1 \ | ||
| && update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1 | ||
|
|
||
| RUN pip install -U pip setuptools | ||
|
|
||
| RUN echo "deb https://apt.repos.neuron.amazonaws.com bionic main" > /etc/apt/sources.list.d/neuron.list | ||
| RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add - | ||
|
|
||
| RUN apt-get update \ | ||
| && apt-get install -y \ | ||
| aws-neuron-runtime \ | ||
| aws-neuron-tools \ | ||
| && rm -rf /var/lib/apt/lists/* \ | ||
| && rm -rf /tmp/tmp* \ | ||
| && apt-get clean | ||
|
|
||
| # Build Dev Image | ||
| FROM compile-image AS dev-image | ||
| ARG MACHINE_TYPE=cpu | ||
| ARG CUDA_VERSION | ||
| RUN if [ "$MACHINE_TYPE" = "gpu" ]; then export USE_CUDA=1; fi \ | ||
| && git clone https://github.com/pytorch/serve.git \ | ||
| && cd serve \ | ||
| && git checkout --track ${BRANCH_NAME} \ | ||
| && if [ -z "$CUDA_VERSION" ]; then python ts_scripts/install_dependencies.py --environment=dev; else python ts_scripts/install_dependencies.py --environment=dev --cuda $CUDA_VERSION; fi \ | ||
| && python ts_scripts/install_from_src.py \ | ||
| && useradd -m model-server \ | ||
| && mkdir -p /home/model-server/tmp \ | ||
| && cp docker/dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh \ | ||
| && chmod +x /usr/local/bin/dockerd-entrypoint.sh \ | ||
| && chown -R model-server /home/model-server \ | ||
| && cp docker/config.properties /home/model-server/config.properties \ | ||
| && mkdir /home/model-server/model-store && chown -R model-server /home/model-server/model-store \ | ||
| && pip install torch-neuron 'neuron-cc[tensorflow]' --extra-index-url=https://pip.repos.neuron.amazonaws.com | ||
|
|
||
| EXPOSE 8080 8081 8082 7070 7071 | ||
| USER model-server | ||
| WORKDIR /home/model-server | ||
| ENV TEMP=/home/model-server/tmp | ||
| ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"] | ||
| CMD ["serve"] | ||
|
|
||
| # Build CodeBuild Image | ||
| FROM compile-image AS codebuild-image | ||
| ENV JAVA_VERSION=11 \ | ||
| JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64" \ | ||
| JDK_HOME="/usr/lib/jvm/java-11-openjdk-amd64" \ | ||
| JRE_HOME="/usr/lib/jvm/java-11-openjdk-amd64" \ | ||
| ANT_VERSION=1.10.3 \ | ||
| MAVEN_HOME="/opt/maven" \ | ||
| MAVEN_VERSION=3.5.4 \ | ||
| MAVEN_CONFIG="/root/.m2" \ | ||
| MAVEN_DOWNLOAD_SHA1="22cac91b3557586bb1eba326f2f7727543ff15e3" | ||
|
|
||
| # Install Maven | ||
| RUN set -ex \ | ||
| && mkdir -p $MAVEN_HOME \ | ||
| && curl -LSso /var/tmp/apache-maven-$MAVEN_VERSION-bin.tar.gz https://apache.org/dist/maven/maven-3/$MAVEN_VERSION/binaries/apache-maven-$MAVEN_VERSION-bin.tar.gz \ | ||
| && echo "$MAVEN_DOWNLOAD_SHA1 /var/tmp/apache-maven-$MAVEN_VERSION-bin.tar.gz" | sha1sum -c - \ | ||
| && tar xzvf /var/tmp/apache-maven-$MAVEN_VERSION-bin.tar.gz -C $MAVEN_HOME --strip-components=1 \ | ||
| && update-alternatives --install /usr/bin/mvn mvn /opt/maven/bin/mvn 10000 \ | ||
| && mkdir -p $MAVEN_CONFIG | ||
|
|
||
| FROM ${BUILD_TYPE}-image AS final-image | ||
| ARG BUILD_TYPE | ||
| RUN echo "${BUILD_TYPE} image creation completed" |
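
The Dockerfile above picks its final stage from the `BUILD_TYPE` build argument (`dev` or `codebuild`, matching the `dev-image` and `codebuild-image` stages) and expects BuildKit to be enabled. As a minimal sketch, a caller might assemble the matching `docker build` invocation as shown here. The Dockerfile path and image tag are hypothetical placeholders, and this is not necessarily how the automation in this PR invokes Docker.

```python
import os
import shlex

# Stage names in the Dockerfile follow the pattern "<build type>-image".
VALID_BUILD_TYPES = ("dev", "codebuild")


def docker_build_command(dockerfile, tag, build_type="dev", branch="master", context="."):
    """Return (argv, env) for a BuildKit-enabled `docker build`.

    `dockerfile`, `tag` and `context` are caller-supplied placeholders (assumptions).
    """
    if build_type not in VALID_BUILD_TYPES:
        raise ValueError(f"unknown BUILD_TYPE: {build_type!r}")
    argv = [
        "docker", "build",
        "-f", dockerfile,
        "-t", tag,
        "--build-arg", f"BUILD_TYPE={build_type}",
        "--build-arg", f"BRANCH_NAME={branch}",
        context,
    ]
    # The Dockerfile comments require DOCKER_BUILDKIT=1 for the cache mounts.
    env = dict(os.environ, DOCKER_BUILDKIT="1")
    return argv, env


if __name__ == "__main__":
    # Hypothetical file name and tag, for illustration only.
    argv, _ = docker_build_command("Dockerfile.neuron.dev", "torchserve-neuron:dev")
    print(shlex.join(argv))
```

The returned `argv` and `env` can be passed to `subprocess.run(argv, env=env, check=True)`; rejecting unknown build types early avoids a late failure when `FROM ${BUILD_TYPE}-image` cannot resolve a stage.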
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,8 +21,45 @@ If you'd like to use your own repo, edit the __init__.py under `serve/test/bench | |
| * Ensure you have [docker](https://docs.docker.com/get-docker/) client set-up on your system - osx/ec2 | ||
| * Adjust the following global variables to your preference in the file `serve/test/benchmark/tests/utils/__init__.py` <br> | ||
| -- IAM_INSTANCE_PROFILE :this role is attached to all ec2 instances created as part of the benchmarking process. Create this as described [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#create-iam-role). Default role name is 'EC2Admin'.<br> | ||
| Use the following commands to create a new role if you don't have one you can use. | ||
| 1. Create the trust policy file `ec2-admin-trust-policy.json` and add the following content: | ||
| ``` | ||
| { | ||
| "Version": "2012-10-17", | ||
| "Statement": [ | ||
| { | ||
| "Effect": "Allow", | ||
| "Principal": { | ||
| "Service": [ | ||
| "ec2.amazonaws.com" | ||
| ] | ||
| }, | ||
| "Action": "sts:AssumeRole" | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
| 2. Create the EC2 role as follows: | ||
| ``` | ||
| aws iam create-role --role-name EC2Admin --assume-role-policy-document file://ec2-admin-trust-policy.json | ||
| ``` | ||
| 3. Add the permissions to the role as follows: | ||
| ``` | ||
| aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/IAMFullAccess --role-name EC2Admin | ||
| aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess --role-name EC2Admin | ||
| aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --role-name EC2Admin | ||
| aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess --role-name EC2Admin | ||
| ``` | ||
| -- S3_BUCKET_BENCHMARK_ARTIFACTS :all temporary benchmarking artifacts including server logs will be stored in this bucket: <br> | ||
| Use the following command to create a new S3 bucket if you don't have one you can use. | ||
| ``` | ||
| aws s3api create-bucket --bucket <torchserve-benchmark> --region us-west-2 --create-bucket-configuration LocationConstraint=us-west-2 | ||
| ``` | ||
| -- DEFAULT_DOCKER_DEV_ECR_REPO :docker image used for benchmarking will be pushed to this repo <br> | ||
| Use the following command to create a new ECR repo if you don't have one you can use. | ||
| ``` | ||
| aws ecr create-repository --repository-name torchserve-benchmark --region us-west-2 | ||
| ``` | ||
| * If you're running this setup on an EC2 instance, ensure that the instance's security group allows inbound SSH on port 22. Refer to the [docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-rules.html). | ||
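
The IAM role steps above can also be scripted. The following is a minimal sketch that renders the same trust policy and lists the same `aws iam` commands as strings. It does not call AWS, and the function names are hypothetical helpers rather than part of the benchmarking code.

```python
import json
import shlex

# Managed policies attached to the role in the README steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/IAMFullAccess",
    "arn:aws:iam::aws:policy/AmazonEC2FullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess",
]


def render_trust_policy():
    """Return the EC2 trust policy from the README as a JSON string."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ["ec2.amazonaws.com"]},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=2)


def role_setup_commands(role_name="EC2Admin", policy_file="ec2-admin-trust-policy.json"):
    """Return the CLI commands (as strings) that create the role and attach policies."""
    commands = [
        "aws iam create-role --role-name {} --assume-role-policy-document file://{}".format(
            shlex.quote(role_name), shlex.quote(policy_file)
        )
    ]
    for arn in MANAGED_POLICY_ARNS:
        commands.append(
            "aws iam attach-role-policy --policy-arn {} --role-name {}".format(
                arn, shlex.quote(role_name)
            )
        )
    return commands


if __name__ == "__main__":
    print(render_trust_policy())
    for command in role_setup_commands():
        print(command)
```

Printing the commands instead of executing them keeps the sketch free of side effects; the output can be reviewed before anything is run against an AWS account.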
|
|
||
| *The following steps assume that the current working directory is serve/.* | ||
|
|
@@ -32,6 +69,8 @@ If you'd like to use your own repo, edit the __init__.py under `serve/test/bench | |
| sudo apt-get install python3-venv | ||
| python3 -m venv bvenv | ||
| source bvenv/bin/activate | ||
| # Ensure you have the latest pip | ||
| pip3 install -U pip | ||
| ``` | ||
| 2. Install requirements for the benchmarking | ||
| ``` | ||
|
|
@@ -57,7 +96,7 @@ python report.py | |
| ``` | ||
| The final benchmark report will be available in markdown format as `report.md` in the `serve/` folder. | ||
|
|
||
| **Example report for vgg16 model** | ||
| **Example report for vgg11 model** | ||
|
What is the reason for switching to vgg11 here?
The results quoted below are vgg11 results, this 'vgg16' was just a typo that was corrected.
||
|
|
||
|
|
||
| ### Benchmark report | ||
|
|
@@ -103,3 +142,37 @@ The final benchmark report will be available in markdown format as `report.md` i | |
| | AB | vgg11 | 100 | 1000 | 0 | 3.47 | 28765 | 29849 | 30488 | 28781.227 | 0.0 | 1576.24 | 1758.28 | 1758.28 | 2249.52 | 2249.34 | 25210.43 | 46.77 | | ||
|
|
||
|
|
||
| ## Features of the automation: | ||
| 1. To save time by *not* creating new instances for every benchmark run for local testing, use the '--do-not-terminate' flag. This will automatically create a file called 'instances.yaml' and write instance-related data into the file so that it may be re-used next time. | ||
| ``` | ||
| python test/benchmark/run_benchmark.py --do-not-terminate | ||
| ``` | ||
|
|
||
| 2. To re-use an instance already recorded in `instances.yaml`, use the '--use-instances' flag: | ||
| ``` | ||
| python test/benchmark/run_benchmark.py --use-instances <full_path_to>/instances.yaml --do-not-terminate | ||
| ``` | ||
| `Note: Use the --do-not-terminate flag to keep re-using the instances; otherwise, they will be terminated`. | ||
|
|
||
| 3. To run a test containing a specific string, use the `--run-only` flag. Note that the argument is 'string matched' i.e. if the test-name contains the supplied argument as a substring, the test will run. | ||
| ``` | ||
| # To run mnist test | ||
| python test/benchmark/run_benchmark.py --run-only mnist | ||
|
|
||
| # To run fastrcnn test | ||
| python test/benchmark/run_benchmark.py --run-only fastrcnn | ||
|
|
||
| # To run bert_neuron and bert | ||
| python test/benchmark/run_benchmark.py --run-only bert | ||
|
|
||
| # To run vgg11 test | ||
| python test/benchmark/run_benchmark.py --run-only vgg11 | ||
|
|
||
| # To run vgg16 test | ||
| python test/benchmark/run_benchmark.py --run-only vgg16 | ||
| ``` | ||
|
|
||
| 4. You can benchmark a specific branch of the torchserve GitHub repo by specifying the flag `--use-torchserve-branch`, e.g., | ||
| ``` | ||
| python test/benchmark/run_benchmark.py --use-torchserve-branch issue_1115 | ||
| ``` | ||
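
The substring matching described for `--run-only` can be illustrated with a short sketch. The function and the test names below are hypothetical, taken from the README examples rather than from the actual test suite, and the real selection logic in `run_benchmark.py` may differ.

```python
def select_tests(test_names, run_only=None):
    """Return the tests whose name contains `run_only` as a substring.

    With no filter, every test is selected.
    """
    if not run_only:
        return list(test_names)
    return [name for name in test_names if run_only in name]


# Hypothetical test names, mirroring the README examples.
TEST_NAMES = ["mnist", "fastrcnn", "bert", "bert_neuron", "vgg11", "vgg16"]

if __name__ == "__main__":
    print(select_tests(TEST_NAMES, "bert"))   # ['bert', 'bert_neuron']
    print(select_tests(TEST_NAMES, "vgg11"))  # ['vgg11']
```

Because the match is on substrings, a short argument such as `vgg1` would select both `vgg11` and `vgg16` in this sketch, so a more specific string narrows the run.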
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,4 +11,5 @@ gitpython | |
| docker | ||
| pandas | ||
| matplotlib | ||
| pyyaml | ||
| pyyaml | ||
| cryptography==3.4.7 | ||
Empty file.
@nskool as discussed, it might be good to add report.py as part of the automation as well.
Will make this part of a separate CR, with an all-inclusive report.
This is now fixed in the current PR