This repository was archived by the owner on Aug 7, 2025. It is now read-only.

Conversation

@harshbafna
Contributor

Description

Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the tests [UT/IT] that you ran to verify your changes and the relevant result summary. Provide instructions so they can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B

  • UT/IT execution results

  • Logs

Checklist:

  • Have you added tests that prove your fix is effective or that this feature works?
  • Do new and existing unit tests pass locally with these changes?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: b426673
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: b426673
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Collaborator

@maaquib maaquib left a comment


Please add the ts_scripts/README file back from the earlier commit. Other comments are inline. Works for GPU cu92 and cu101.

CU9.2, DLAMI 22, Ubuntu 16.04, ami-0f009bf0d2e29c5a9:

After running install_dependencies.py, python is downgraded to 3.5. After symlinking it back to 3.6, the scripts succeed on 16.04.

```bash
# UBUNTU 16.04: python3 points to 3.5, which doesn't seem to have a torch==1.6.0+cu92 release. Had to upgrade to python 3.6
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.6 python3.6-dev
curl https://bootstrap.pypa.io/get-pip.py | sudo python3.6
sudo ln -sf /usr/bin/python3.6 /usr/bin/python
python -m pip install --upgrade pip
export PATH=$PATH:/home/ubuntu/.local/bin

git clone https://github.com/pytorch/serve
cd serve
git checkout build_fixes
pip install -U -r requirements_cu92.txt -f https://download.pytorch.org/whl/torch_stable.html
python ./ts_scripts/install_dependencies.py
sudo ln -sf /usr/bin/python3.6 /usr/bin/python
python ./torchserve_sanity.py
python ./test/regression_tests.py
```

docker/README.md (outdated)

```diff
- For creating GPU based image with older CUDA versions (ex. CUDA 10.1), make sure that the `--build-arg CUDA_VERSION=<version>` is specified. The version is in the format "cuda92", "cuda101":
+ For creating GPU based image with older CUDA versions (ex. CUDA 10.1), make sure that the `--build-arg CUDA_VERSION=<version>` is specified. The version is in the format "cu92", "cu101", "cu102":
```
Collaborator


Can we also clarify that we need to specify the appropriate `BASE_IMAGE=` for the CUDA version?

Contributor Author


Done.

Contributor

@chauhang chauhang Dec 6, 2020


@harshbafna @maaquib Is it not possible to derive the base image name from just the CUDA version? It would simplify things for people building their own custom TorchServe image if they could just set a single CUDA_VERSION variable.

Contributor Author


@chauhang: Enhanced the build_image.sh script and updated the documentation. Could you please review?
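The idea discussed above, deriving the base image from a single CUDA_VERSION variable, can be sketched as a simple lookup. This is a hypothetical illustration, not the actual logic of build_image.sh; the image tags in the table are assumptions chosen for the example.

```python
from typing import Optional

# Hypothetical mapping from a CUDA version flag to a Docker base image.
# The tag values here are illustrative assumptions, not the table used
# by the real build_image.sh script.
CUDA_TO_BASE_IMAGE = {
    "cu92":  "nvidia/cuda:9.2-cudnn7-runtime-ubuntu18.04",
    "cu101": "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04",
    "cu102": "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04",
}

def base_image_for(cuda_version: Optional[str]) -> str:
    """Return a base image for the given CUDA version, or a CPU default."""
    if cuda_version is None:
        return "ubuntu:18.04"  # CPU-only build
    if cuda_version not in CUDA_TO_BASE_IMAGE:
        raise ValueError(f"unsupported CUDA version: {cuda_version}")
    return CUDA_TO_BASE_IMAGE[cuda_version]
```

With something like this, a user sets only CUDA_VERSION and the script picks the matching base image, failing fast on unsupported versions.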

@maaquib maaquib added bug Something isn't working and removed bug Something isn't working labels Dec 5, 2020
@maaquib maaquib added this to the v0.3.0 milestone Dec 5, 2020
@harshbafna
Contributor Author

> Please add the ts_scripts/README file back from the earlier commit. Other comments are inline. Works for GPU cu92 and cu101.
>
> CU9.2, DLAMI 22, Ubuntu 16.04, ami-0f009bf0d2e29c5a9:
>
> After running install_dependencies.py, python is downgraded to 3.5. After symlinking it back to 3.6, the scripts succeed on 16.04.
>
> ```bash
> # UBUNTU 16.04: python3 points to 3.5, which doesn't seem to have a torch==1.6.0+cu92 release. Had to upgrade to python 3.6
> sudo add-apt-repository ppa:deadsnakes/ppa
> sudo apt-get update
> sudo apt-get install python3.6 python3.6-dev
> curl https://bootstrap.pypa.io/get-pip.py | sudo python3.6
> sudo ln -sf /usr/bin/python3.6 /usr/bin/python
> python -m pip install --upgrade pip
> export PATH=$PATH:/home/ubuntu/.local/bin
>
> git clone https://github.com/pytorch/serve
> cd serve
> git checkout build_fixes
> pip install -U -r requirements_cu92.txt -f https://download.pytorch.org/whl/torch_stable.html
> python ./ts_scripts/install_dependencies.py
> sudo ln -sf /usr/bin/python3.6 /usr/bin/python
> python ./torchserve_sanity.py
> python ./test/regression_tests.py
> ```

@maaquib: Is the README really required? I feel it adds more confusion than value. We already reference the relevant scripts in the required sections of the documentation.

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 29bc917
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 2c7f341
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 29bc917
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 2c7f341
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: f49cc2a
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: f49cc2a
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jeremiahschung
Collaborator

jeremiahschung commented Dec 9, 2020

Retested this with a fresh DLAMI v38 on g4dn.xlarge on this branch.

Running into two issues:

I'm continuing to test the different configurations and will report issues as I encounter them in the interest of getting them reported quickly.

The newman test failure message is also observable in a fresh Ubuntu 18.04 CPU instance.
pr836_ubuntu1804_cpu_regression_test.txt


```bash
./build_image.sh
./build_image.sh -bt dev -t torchserve-dev:1.0
```

Contributor

Please replace 1.0 with the current latest version or another tag like "dev-latest".

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 4a02121
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 4a02121
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@maaquib
Collaborator

maaquib commented Dec 9, 2020

@jeremiahschung

  • Looking at the following logs, it seems like you should be passing the `--cuda=cu102` flag. There seems to be a typo. Can you re-run this with the correct args?

```
ubuntu@ip-172-31-23-252:~/serve$ python ts_scripts/install_dependencies.py cuda=cu102
usage: install_dependencies.py [-h] [--cuda {cu92,cu101,latest}]
                               [--environment {prod,dev}]
install_dependencies.py: error: unrecognized arguments: cuda=cu102
ubuntu@ip-172-31-23-252:~/serve$ python ts_scripts/install_dependencies.py
Wed Dec  9 00:09:05 2020
```

  • For regression tests, it seems newman was not installed. Can you verify that the install dependency script ran successfully with the appropriate arguments before running this?

```
sh: 1: newman: not found
```
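The `unrecognized arguments` error in the log above is standard argparse behavior: `cuda=cu102` without the leading `--` is treated as a stray positional argument. A minimal sketch, reconstructed only from the usage string shown in the log (the real install_dependencies.py does much more than parse arguments), shows the accepted interface:

```python
import argparse

# Minimal CLI sketch reconstructed from the usage string in the log above;
# the choices and option names come from that output, nothing else is implied
# about the real script.
parser = argparse.ArgumentParser(prog="install_dependencies.py")
parser.add_argument("--cuda", choices=["cu92", "cu101", "latest"],
                    help="CUDA version for the torch wheels")
parser.add_argument("--environment", choices=["prod", "dev"], default="prod",
                    help="install prod or dev dependencies")

# "--cuda=cu101 --environment=dev" parses cleanly; a bare "cuda=cu102"
# would be rejected as an unrecognized positional argument.
args = parser.parse_args(["--cuda=cu101", "--environment=dev"])
```

This is why the fix is simply to re-run with the `--` prefix on the flag.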

maaquib
maaquib previously requested changes Dec 9, 2020
@maaquib maaquib requested a review from lxning December 9, 2020 01:01
@jeremiahschung
Collaborator

> @jeremiahschung
>
> • Looking at the following logs, it seems like you should be passing the `--cuda=cu102` flag. There seems to be a typo. Can you re-run this with the correct args?
>
> ```
> ubuntu@ip-172-31-23-252:~/serve$ python ts_scripts/install_dependencies.py cuda=cu102
> usage: install_dependencies.py [-h] [--cuda {cu92,cu101,latest}]
>                                [--environment {prod,dev}]
> install_dependencies.py: error: unrecognized arguments: cuda=cu102
> ubuntu@ip-172-31-23-252:~/serve$ python ts_scripts/install_dependencies.py
> Wed Dec  9 00:09:05 2020
> ```
>
> • For regression tests, it seems newman was not installed. Can you verify that the install dependency script ran successfully with the appropriate arguments before running this?
>
> ```
> sh: 1: newman: not found
> ```

Thanks for the pointers @maaquib.

I reran the tests using the command `python ts_scripts/install_dependencies.py --environment=dev --cuda=latest` (and without the cuda arg on a CPU instance), and that resolved the `newman: not found` message. However, the regression test still ends with `## Newman API Tests Failed !` for both CPU and GPU. Can you please take a look?

pr836_ubuntu1804_cpu_regression_test_2.txt
pr836_dlami_v38_cu102_regression_test_2.txt

@maaquib
Collaborator

maaquib commented Dec 9, 2020

  • Dependency: SUCCESS
  • Sanity Test: SUCCESS
  • Regression: 10 Assertion Failures

```bash
# ami-02e86b825fe559330, DLAMI 38, CUDA 10.2, Ubuntu 18.04
# ami-0f3fd8037b0b9df52, DLAMI 37, CUDA 10.2, Ubuntu 18.04

## Symlink appropriate CUDA version
sudo rm /usr/local/cuda && sudo ln -s /usr/local/cuda-10.2 /usr/local/cuda
ll /usr/local/cuda

## Verify EC2 updates are not running
ps ax | grep apt

## Run scripts
git clone https://github.com/pytorch/serve/
cd serve/
git checkout build_fixes
python ts_scripts/install_dependencies.py --environment dev
python test/regression_tests.py
```

  • Dependency: SUCCESS
  • Sanity Test: SUCCESS
  • Regression: 10 Assertion Failures

```bash
# ami-02e86b825fe559330, DLAMI 38, CUDA 10.1, Ubuntu 18.04
# ami-0f3fd8037b0b9df52, DLAMI 37, CUDA 10.2, Ubuntu 18.04

## Symlink appropriate CUDA version
sudo rm /usr/local/cuda && sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda
ll /usr/local/cuda

## Verify EC2 updates are not running
ps ax | grep apt

## Run scripts
git clone https://github.com/pytorch/serve/
cd serve/
git checkout build_fixes
python ts_scripts/install_dependencies.py --cuda cu101 --environment dev
python test/regression_tests.py
```

@maaquib
Collaborator

maaquib commented Dec 9, 2020

Reverted failing newman tests in #845

@maaquib maaquib dismissed their stale review December 9, 2020 03:46

Explanation provided

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: cc924de
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: cc924de
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@maaquib
Collaborator

maaquib commented Dec 9, 2020

Verified working after #845

```bash
# ami-02e86b825fe559330, DLAMI 38, CUDA 10.2, Ubuntu 18.04, g4dn.xl
# ami-02e86b825fe559330, DLAMI 38, CUDA 10.1, Ubuntu 18.04, g4dn.xl

## Symlink appropriate CUDA version
sudo rm /usr/local/cuda && sudo ln -s /usr/local/cuda-10.2 /usr/local/cuda
ll /usr/local/cuda

## Verify EC2 updates are not running
ps ax | grep apt

## Run scripts
git clone https://github.com/pytorch/serve/
cd serve/
git checkout build_fixes
python ts_scripts/install_dependencies.py --environment dev
python torchserve_sanity.py
python test/regression_tests.py
```

```bash
# ami-02e86b825fe559330, DLAMI 38, CPU, Ubuntu 18.04, c5.4xl

## Verify EC2 updates are not running
ps ax | grep apt

## Run scripts
git clone https://github.com/pytorch/serve/
cd serve/
git checkout build_fixes
python ts_scripts/install_dependencies.py --environment dev
python torchserve_sanity.py
python test/regression_tests.py
```

@maaquib maaquib dismissed chauhang’s stale review December 9, 2020 05:36

Review addressed


@harshbafna harshbafna deleted the build_fixes branch December 11, 2020 06:35
@harshbafna harshbafna changed the title [WIP] Build fixes Build fixes Dec 11, 2020