feat: add PyTorch/XLA support #2182
Conversation
Very cool! Left some minor questions on the PR directly.
One question I had was whether this is the right way to use torch/xla nowadays, or whether users are recommended to pass in an XLA backend to torch.compile().
Since most of our CI machines run on AWS, it's unlikely we'll get a TPU available to fully test this, but I'm assuming this should work just fine on GPU as well, in which case a quick test would also be super helpful.
ts/torch_handler/base_handler.py
Outdated
@@ -278,6 +303,9 @@ def inference(self, data, *args, **kwargs):
        with torch.no_grad():
            marshalled_data = data.to(self.device)
            results = self.model(marshalled_data, *args, **kwargs)
            if torch_xla_enabled:
                xm.mark_step()
Not super familiar with XLA internals, but what does this line do?
Removed the xm.mark_step(), because it is essential for training but optional for inference. In short, the values/computation are materialized either on an xm.mark_step() call or when the result is retrieved. In our case it's the latter.
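For context, here is a minimal sketch of PyTorch/XLA's lazy execution model (assuming torch_xla is installed; this is illustrative, not code from this PR):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()   # resolves to the XLA device (TPU, GPU, or CPU)
x = torch.randn(2, 2).to(device)
y = x @ x                  # recorded lazily; no XLA computation runs yet
xm.mark_step()             # explicitly materializes the pending graph (training loops rely on this)
print(y.cpu())             # moving the result off-device would also force execution
```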
ts/torch_handler/base_handler.py
Outdated
@@ -59,6 +59,24 @@ def check_pt2_enabled():
        )


def check_torch_xla_enabled() -> bool:
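For illustration, a hedged sketch of what such a check could look like, mirroring check_pt2_enabled(); this is an assumption, not necessarily the PR's exact implementation:

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)


def check_torch_xla_enabled() -> bool:
    """Return True if torch_xla is importable in this environment."""
    if importlib.util.find_spec("torch_xla") is None:
        logger.info("torch_xla not found; using the default device selection")
        return False
    return True
```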
@lxning another good candidate for your new config change; it might be possible that a user has XLA installed but doesn't necessarily want to compile the model with XLA.
@msaroufim yes, the model YAML config can make this much easier. I'll send the PR early next week to unblock this PR.
> @lxning another good candidate for your new config change; it might be possible that a user has XLA installed but doesn't necessarily want to compile the model with XLA.

IIUC, the above-mentioned scenario applies to GPU. Though, I have torch.cuda.is_available() and properties.get("gpu_id") is not None as the prioritized condition. For accelerator types that require torch_xla, users do have the option to compile with torchxla_trace_once, which is an experimental backend for Dynamo.
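As an illustration of that experimental Dynamo path, here is a hedged sketch; the backend name and its availability depend on the installed torch/torch_xla versions, and this is not code from the PR:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(4, 2).to(device)

# "torchxla_trace_once" is the experimental Dynamo/XLA backend mentioned above
compiled = torch.compile(model, backend="torchxla_trace_once")
out = compiled(torch.randn(8, 4).to(device))
```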
torch.compile() is a good point. I'm guessing we'll need both this version to support PyTorch <2.0, and another change to support PyTorch 2.0 models.
So we do actually already support torch.compile (#1960), and you can pass in a custom backend via a compile.json. I don't think supporting both workflows is a huge deal, but I'm curious which one you would prefer people use, assuming they have 2.0 installed.
As discussed, we decided to prioritize pytorch/xla 2.0 and above.
Added
Codecov Report
@@            Coverage Diff             @@
##           master    #2182      +/-   ##
==========================================
+ Coverage   71.31%   71.41%   +0.10%
==========================================
  Files          73       73
  Lines        3336     3348      +12
  Branches       57       57
==========================================
+ Hits         2379     2391      +12
  Misses        954      954
  Partials        3        3

... and 1 file with indirect coverage changes
The PR looks good, but I was hoping we could have the test you're running checked in and only run it if a TPU is found.
Added test, PTAL
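A hypothetical sketch of how such a TPU-gated test could be structured (function and test names are assumptions, not the PR's actual test):

```python
import importlib.util
import pytest


def tpu_available() -> bool:
    """Best-effort check: torch_xla is importable and the XLA device is a TPU."""
    if importlib.util.find_spec("torch_xla") is None:
        return False
    try:
        import torch_xla.core.xla_model as xm
        return xm.xla_device_hw(xm.xla_device()) == "TPU"
    except Exception:
        return False


@pytest.mark.skipif(not tpu_available(), reason="requires a TPU/XLA device")
def test_base_handler_inference_on_xla():
    ...  # exercise the handler's inference path on the XLA device
```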
LGTM, thank you. As an FYI, we're killing the compile.json in the next release, but I'll make the change and test out the Kokoro CI directly.
Thanks for the heads up!
@lxning, following up on the review request.
Description
This PR adds PyTorch/XLA support to the TorchServe backend base handler.
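For orientation, a hedged sketch of the kind of device selection this enables in the base handler; the function name and fallback order are assumptions based on the discussion above, not the PR's exact code:

```python
import torch


def select_device(gpu_id=None):
    """Pick a device: CUDA first (the prioritized condition noted above), then XLA, then CPU."""
    if torch.cuda.is_available() and gpu_id is not None:
        return torch.device(f"cuda:{gpu_id}")
    try:
        import torch_xla.core.xla_model as xm
        return xm.xla_device()  # TPU or other XLA-backed accelerator
    except ImportError:
        return torch.device("cpu")
```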
Type of change