Update ubuntu docker version #1970

Merged Nov 17, 2022 (6 commits)

@LuigiCerone (Contributor) commented Nov 12, 2022

Description

Update default from Ubuntu 18.04 to Ubuntu 20.04 (LTS).

Fixes #1889

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Feature/Issue validation/testing

I need advice on how to test the different updates.

Checklist

  • Should 20.04 be used here instead of latest tag?
  • Should 20.04 be used here instead of latest tag?
  • Should 20.04 be used here instead of latest tag?
  • Should 20.04 be used here instead of latest tag?
  • Should 20.04 be used here instead of latest tag?
  • This guide refers to the 18.04 version; should it be updated?
  • This example uses the 18.04 version
  • In the K8S folder there are some references to the 18.04 version

Edit after comments:

  • Update the script to make 22.04 optional if people pass it in as an arg, since 20.04 is now the default
  • Build the images and test them by running an example inference; attach the logs to the PR

@maaquib maaquib linked an issue Nov 15, 2022 that may be closed by this pull request

codecov bot commented Nov 15, 2022

Codecov Report

Merging #1970 (bf38f8b) into master (2edd063) will not change coverage.
The diff coverage is n/a.

❗ Current head bf38f8b differs from pull request most recent head 0fe4882. Consider uploading reports for the commit 0fe4882 to get more accurate results

@@           Coverage Diff           @@
##           master    #1970   +/-   ##
=======================================
  Coverage   53.31%   53.31%           
=======================================
  Files          70       70           
  Lines        3157     3157           
  Branches       56       56           
=======================================
  Hits         1683     1683           
  Misses       1474     1474           


@msaroufim (Member) commented

So ubuntu-latest in a GitHub runner means 20.04, but I'd rather we not rely on an unpinned dependency and instead be explicit that we're using 20.04.

https://github.com/actions/runner-images

So to answer your question, @LuigiCerone: yes to all of the above.
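
To find the places that still rely on the floating label, something like the following could help. It is only a sketch: the function name is made up for illustration, and the default path assumes workflows live under .github/workflows.

```shell
# Hypothetical helper: list workflow lines that still use the floating
# "ubuntu-latest" runner label instead of a pinned one such as ubuntu-20.04.
# $1: directory holding workflow YAML files (assumed default: .github/workflows)
find_unpinned_runners() {
    # -r: recurse, -n: print line numbers, -E: extended regex.
    # Exit status is grep's: 0 if any match was found, 1 if none.
    grep -rnE 'runs-on:.*ubuntu-latest' "${1:-.github/workflows}"
}
```

Each reported line is a candidate for replacing ubuntu-latest with ubuntu-20.04.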

Regarding testing: anything that's a GitHub Action will be covered by CI, so that's easy. For the doc-related changes, I think no test is needed either.

But for the existing images you've updated, we need one more change: go here and make 22.04 an option that people can pass in as an arg, since 20.04 is now the default: https://github.com/pytorch/serve/blob/master/docker/build_image.sh#L97

And finally, for testing, can you build the Docker images, run a simple inference from our docker/README.md, and attach those logs here? That's the tricky but important part of this PR. If you're up for it, let me know, because we'd like to merge this change before Dec 2 for a patch release. If not, I can either create a new PR or, if you're OK with it, push code to your branch directly.
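
For the optional base-image argument, a minimal sketch of the idea is below. The function name and the -b/--base-image flag are assumptions for illustration; they are not taken from the actual docker/build_image.sh.

```shell
# Hypothetical sketch: default to a pinned base image, let callers override it.
# The body runs in a subshell (parentheses), so variables do not leak out.
resolve_base_image() (
    base_image="ubuntu:20.04"   # pinned default
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -b|--base-image)    # assumed flag name
                if [ -z "${2:-}" ]; then
                    echo "missing value for $1" >&2
                    exit 2
                fi
                base_image="$2"
                shift 2
                ;;
            *)
                shift           # ignore arguments this sketch does not know
                ;;
        esac
    done
    printf '%s\n' "$base_image"
)
```

With no arguments this prints ubuntu:20.04; with `--base-image ubuntu:22.04` it prints the override.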

@LuigiCerone (Contributor, Author) commented Nov 15, 2022

Hello @msaroufim, thanks for the useful information! I'll work on this (including the last point) and update the PR in the next few days :)

@LuigiCerone (Contributor, Author) commented

Hello @msaroufim, these are the logs from building the pytorch/torchserve:latest-cpu image locally (by running docker/build_image.sh without arguments). I ran the test with the resnet-152 model as explained here.

➜  docker git:(update/ubuntu_docker) docker image ls | grep torch
pytorch/torchserve                                                                  latest-cpu       7d7680068cc1   23 hours ago    2.04GB

➜  docker git:(update/ubuntu_docker) docker run --rm -it -p 8080:8080 -p 8081:8081 -p 8082:8082 -p 7070:7070 -p 7071:7071 pytorch/torchserve
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-11-16T20:01:00,503 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-11-16T20:01:00,789 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.6.0
TS Home: /home/venv/lib/python3.8/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 1992 M
Python executable: /home/venv/bin/python
Config file: /home/model-server/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/model-server/model-store
Model config: N/A
2022-11-16T20:01:00,822 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2022-11-16T20:01:00,903 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2022-11-16T20:01:01,023 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2022-11-16T20:01:01,024 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2022-11-16T20:01:01,027 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2022-11-16T20:01:01,028 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2022-11-16T20:01:01,033 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
2022-11-16T20:01:01,851 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628861
2022-11-16T20:01:01,861 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:15.432594299316406|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628861
2022-11-16T20:01:01,862 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:51.57096862792969|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628861
2022-11-16T20:01:01,863 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:77.0|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628861
2022-11-16T20:01:01,865 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:6933.99609375|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628861
2022-11-16T20:01:01,866 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:448.640625|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628861
2022-11-16T20:01:01,866 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:12.9|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628861
2022-11-16T20:02:01,744 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628921
2022-11-16T20:02:01,746 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:15.16732406616211|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628921
2022-11-16T20:02:01,749 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:51.836238861083984|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628921
2022-11-16T20:02:01,752 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:77.4|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628921
2022-11-16T20:02:01,755 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:6852.40234375|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628921
2022-11-16T20:02:01,758 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:530.015625|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628921
2022-11-16T20:02:01,761 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:13.9|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628921
2022-11-16T20:02:07,071 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 2.0 for model resnet-152-batch_v2
2022-11-16T20:02:07,073 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 2.0 for model resnet-152-batch_v2
2022-11-16T20:02:07,074 [INFO ] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - Model resnet-152-batch_v2 loaded.
2022-11-16T20:02:07,076 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - updateModel: resnet-152-batch_v2, count: 1
2022-11-16T20:02:07,093 [DEBUG] W-9000-resnet-152-batch_v2_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000]
2022-11-16T20:02:10,114 [INFO ] W-9000-resnet-152-batch_v2_2.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2022-11-16T20:02:10,115 [INFO ] W-9000-resnet-152-batch_v2_2.0-stdout MODEL_LOG - [PID]51
2022-11-16T20:02:10,116 [INFO ] W-9000-resnet-152-batch_v2_2.0-stdout MODEL_LOG - Torch worker started.
2022-11-16T20:02:10,118 [INFO ] W-9000-resnet-152-batch_v2_2.0-stdout MODEL_LOG - Python runtime: 3.8.0
2022-11-16T20:02:10,118 [DEBUG] W-9000-resnet-152-batch_v2_2.0 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet-152-batch_v2_2.0 State change null -> WORKER_STARTED
2022-11-16T20:02:10,128 [INFO ] W-9000-resnet-152-batch_v2_2.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2022-11-16T20:02:10,144 [INFO ] W-9000-resnet-152-batch_v2_2.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2022-11-16T20:02:10,156 [INFO ] W-9000-resnet-152-batch_v2_2.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1668628930156
2022-11-16T20:02:10,215 [INFO ] W-9000-resnet-152-batch_v2_2.0-stdout MODEL_LOG - model_name: resnet-152-batch_v2, batchSize: 1
2022-11-16T20:02:12,099 [INFO ] W-9000-resnet-152-batch_v2_2.0-stdout MODEL_LOG - generated new fontManager
2022-11-16T20:02:16,158 [INFO ] W-9000-resnet-152-batch_v2_2.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 5942
2022-11-16T20:02:16,160 [DEBUG] W-9000-resnet-152-batch_v2_2.0 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet-152-batch_v2_2.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2022-11-16T20:02:16,161 [INFO ] W-9000-resnet-152-batch_v2_2.0 TS_METRICS - W-9000-resnet-152-batch_v2_2.0.ms:9072|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628936
2022-11-16T20:02:16,163 [INFO ] W-9000-resnet-152-batch_v2_2.0 TS_METRICS - WorkerThreadTime.ms:65|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628936
2022-11-16T20:02:16,168 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /172.17.0.1:64904 "POST /models?url=https://torchserve.pytorch.org/mar_files/resnet-152-batch_v2.mar&batch_size=1&max_batch_delay=50&initial_workers=1 HTTP/1.1" 200 54092
2022-11-16T20:02:16,169 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628882
2022-11-16T20:03:01,647 [INFO ] pool-3-thread-2 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628981
2022-11-16T20:03:01,658 [INFO ] pool-3-thread-2 TS_METRICS - DiskAvailable.Gigabytes:14.998088836669922|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628981
2022-11-16T20:03:01,662 [INFO ] pool-3-thread-2 TS_METRICS - DiskUsage.Gigabytes:52.00547409057617|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628981
2022-11-16T20:03:01,662 [INFO ] pool-3-thread-2 TS_METRICS - DiskUtilization.Percent:77.6|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628981
2022-11-16T20:03:01,663 [INFO ] pool-3-thread-2 TS_METRICS - MemoryAvailable.Megabytes:6401.51171875|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628981
2022-11-16T20:03:01,663 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUsed.Megabytes:980.953125|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628981
2022-11-16T20:03:01,663 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUtilization.Percent:19.6|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628981
2022-11-16T20:04:01,591 [INFO ] pool-3-thread-2 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668629041
2022-11-16T20:04:01,592 [INFO ] pool-3-thread-2 TS_METRICS - DiskAvailable.Gigabytes:14.998088836669922|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668629041
2022-11-16T20:04:01,594 [INFO ] pool-3-thread-2 TS_METRICS - DiskUsage.Gigabytes:52.00547409057617|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668629041
2022-11-16T20:04:01,595 [INFO ] pool-3-thread-2 TS_METRICS - DiskUtilization.Percent:77.6|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668629041
2022-11-16T20:04:01,596 [INFO ] pool-3-thread-2 TS_METRICS - MemoryAvailable.Megabytes:6411.35546875|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668629041
2022-11-16T20:04:01,597 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUsed.Megabytes:971.11328125|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668629041
2022-11-16T20:04:01,597 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUtilization.Percent:19.5|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668629041
2022-11-16T20:04:16,610 [INFO ] W-9000-resnet-152-batch_v2_2.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1668629056610
2022-11-16T20:04:16,616 [INFO ] W-9000-resnet-152-batch_v2_2.0-stdout MODEL_LOG - Backend received inference at: 1668629056
2022-11-16T20:04:16,924 [INFO ] W-9000-resnet-152-batch_v2_2.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:307.83|#ModelName:resnet-152-batch_v2,Level:Model|#hostname:4d2ed7855de0,requestID:fea7e8cf-0c4e-4a21-9003-60ec6fe551f1,timestamp:1668629056
2022-11-16T20:04:16,926 [INFO ] W-9000-resnet-152-batch_v2_2.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:307.93|#ModelName:resnet-152-batch_v2,Level:Model|#hostname:4d2ed7855de0,requestID:fea7e8cf-0c4e-4a21-9003-60ec6fe551f1,timestamp:1668629056
2022-11-16T20:04:16,926 [INFO ] W-9000-resnet-152-batch_v2_2.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 311
2022-11-16T20:04:16,928 [INFO ] W-9000-resnet-152-batch_v2_2.0 ACCESS_LOG - /172.17.0.1:56654 "PUT /predictions/resnet-152-batch_v2 HTTP/1.1" 200 336
2022-11-16T20:04:16,931 [INFO ] W-9000-resnet-152-batch_v2_2.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668628882
2022-11-16T20:04:16,932 [DEBUG] W-9000-resnet-152-batch_v2_2.0 org.pytorch.serve.job.Job - Waiting time ns: 541400, Backend time ns: 322506900
2022-11-16T20:04:16,934 [INFO ] W-9000-resnet-152-batch_v2_2.0 TS_METRICS - QueueTime.ms:0|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668629056
2022-11-16T20:04:16,935 [INFO ] W-9000-resnet-152-batch_v2_2.0 TS_METRICS - WorkerThreadTime.ms:14|#Level:Host|#hostname:4d2ed7855de0,timestamp:1668629056
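
The two requests visible in the access log above (POST /models?... and PUT /predictions/resnet-152-batch_v2) could be reproduced roughly as follows. This is a sketch: the localhost addresses assume the port mappings from the docker run command above, the function names are made up, and the input image path is whatever local JPEG you pass in.

```shell
# Assumed endpoints: the ports published by the `docker run` command above.
MANAGEMENT_API="${MANAGEMENT_API:-http://localhost:8081}"
INFERENCE_API="${INFERENCE_API:-http://localhost:8080}"
MODEL_NAME="resnet-152-batch_v2"
MAR_URL="https://torchserve.pytorch.org/mar_files/${MODEL_NAME}.mar"

# Build the model registration URL with the query parameters seen in the log.
register_url() {
    printf '%s/models?url=%s&batch_size=1&max_batch_delay=50&initial_workers=1\n' \
        "$MANAGEMENT_API" "$MAR_URL"
}

# Build the prediction URL for the registered model.
predict_url() {
    printf '%s/predictions/%s\n' "$INFERENCE_API" "$MODEL_NAME"
}

# Register the model, then send one image for inference.
# $1: path to a local test image (hypothetical; any JPEG the model accepts)
smoke_test() {
    curl --fail -X POST "$(register_url)" &&
    curl --fail -X PUT "$(predict_url)" -T "$1"
}
```

Running `smoke_test path/to/image.jpg` against a started container should produce access-log entries similar to the ones above.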

@msaroufim (Member) left a comment

Looks great! Thank you!

@msaroufim msaroufim self-requested a review November 17, 2022 00:00
@maaquib (Collaborator) left a comment

Registering a model on Ubuntu 20.04 + TS 0.6.1 fails with:

$ docker run --rm -it -v/home/ubuntu/Downloads/model_store/:/home/model-server/model-store -p8080:8080 -p8081:8081 pytorch/torchserve:latest-cpu  serve

...
2022-11-16T23:57:50,418 [WARN ] W-9000-resnet-18_1.0-stderr MODEL_LOG - Traceback (most recent call last):
2022-11-16T23:57:50,419 [WARN ] W-9000-resnet-18_1.0-stderr MODEL_LOG -   File "/home/venv/lib/python3.8/site-packages/ts/model_service_worker.py", line 16, in <module>
2022-11-16T23:57:50,419 [WARN ] W-9000-resnet-18_1.0-stderr MODEL_LOG -     from ts.metrics.metric_cache_yaml_impl import MetricsCacheYamlImpl
2022-11-16T23:57:50,419 [WARN ] W-9000-resnet-18_1.0-stderr MODEL_LOG -   File "/home/venv/lib/python3.8/site-packages/ts/metrics/metric_cache_yaml_impl.py", line 5, in <module>
2022-11-16T23:57:50,420 [WARN ] W-9000-resnet-18_1.0-stderr MODEL_LOG -     import yaml
2022-11-16T23:57:50,420 [WARN ] W-9000-resnet-18_1.0-stderr MODEL_LOG - ModuleNotFoundError: No module named 'yaml'
...

@LuigiCerone Can you add PyYAML as a dependency? This was missed in #1954.

Update: Building with the dev image pulls in the dependencies. It seems we no longer use the Dockerfile for official releases. Approving.
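
To spot this class of failure quickly in a saved worker log, a small filter like the one below may be enough. It is a sketch with a made-up function name; it only matches the exact ModuleNotFoundError wording shown in the traceback above.

```shell
# Hypothetical helper: read a log on stdin and print each missing Python
# module reported as "ModuleNotFoundError: No module named '<name>'".
missing_modules() {
    sed -n "s/.*ModuleNotFoundError: No module named '\([^']*\)'.*/\1/p" | sort -u
}
```

For example, `docker logs <container> 2>&1 | missing_modules` would list each missing module once.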

@agunapal agunapal merged commit 95c5052 into pytorch:master Nov 17, 2022
Successfully merging this pull request may close these issues.

security issue Default to Ubuntu 20.04 in our releases