[Torchscript] Adds GPU-enabled input types for Vector and Timeseries #2197
Conversation
`ludwig/data/dataset_synthesizer.py` (Outdated)
```diff
@@ -247,7 +247,9 @@ def generate_text(feature):


 def generate_timeseries(feature):
     series = []
-    for _ in range(feature.get("max_len", 10)):
+    max_len = feature.get("max_len", 10)
```
nit: Let's make `10` a default function parameter.
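A minimal sketch of what the reviewer's nit might look like: `max_len` becomes a default function parameter instead of a literal inside the loop. The names mirror the diff above, but this is illustrative, not Ludwig's actual implementation.

```python
import random

def generate_timeseries(feature, max_len=10):
    # `max_len` is a default parameter now; the feature dict can still
    # override it. (Hypothetical sketch, not Ludwig's actual code.)
    series_len = feature.get("max_len", max_len)
    series = [str(round(random.uniform(0, 1), 4)) for _ in range(series_len)]
    return " ".join(series)
```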
```python
if torch.jit.isinstance(v, List[torch.Tensor]):
    return self.forward_list_of_tensors(v)
elif torch.jit.isinstance(v, List[str]):
    return self.forward_list_of_strs(v)
else:
    raise ValueError(f"Unsupported input: {v}")
```
nit (personal preference):

```diff
-if torch.jit.isinstance(v, List[torch.Tensor]):
-    return self.forward_list_of_tensors(v)
-elif torch.jit.isinstance(v, List[str]):
-    return self.forward_list_of_strs(v)
-else:
-    raise ValueError(f"Unsupported input: {v}")
+if torch.jit.isinstance(v, List[torch.Tensor]):
+    return self.forward_list_of_tensors(v)
+if torch.jit.isinstance(v, List[str]):
+    return self.forward_list_of_strs(v)
+raise ValueError(f"Unsupported input: {v}")
```
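A plain-Python illustration of the early-return style being suggested: because every branch returns or raises, the `elif`/`else` nesting adds nothing. Here `isinstance` stands in for `torch.jit.isinstance`, and the handler bodies are placeholders rather than the PR's real methods.

```python
def dispatch(v):
    # Early-return dispatch: no elif/else needed since each branch exits.
    if isinstance(v, list) and v and isinstance(v[0], float):
        return "tensor path"   # would call self.forward_list_of_tensors(v)
    if isinstance(v, list) and v and isinstance(v[0], str):
        return "string path"   # would call self.forward_list_of_strs(v)
    raise ValueError(f"Unsupported input: {v}")
```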
```python
if v.isnan().any():
    if self.computed_fill_value == "":
        v = torch.nan_to_num(v, nan=self.padding_value)
    else:
        raise ValueError(f"Fill value must be empty string. Got {self.computed_fill_value}.")
return v
```
```diff
-if v.isnan().any():
-    if self.computed_fill_value == "":
-        v = torch.nan_to_num(v, nan=self.padding_value)
-    else:
-        raise ValueError(f"Fill value must be empty string. Got {self.computed_fill_value}.")
-return v
+if not v.isnan().any():
+    # No nans to replace.
+    return v
+if self.computed_fill_value != "":
+    # Nans present, but fill value is non-empty. (Question: why does the fill value have to be an empty string?)
+    raise ValueError(f"Fill value must be empty string. Got {self.computed_fill_value}.")
+return torch.nan_to_num(v, nan=self.padding_value)
```
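A pure-Python sketch of the same early-return structure, with a list of floats standing in for the torch tensor `v`. The function and parameter names are illustrative, not Ludwig's actual implementation.

```python
import math

def fill_nans(values, computed_fill_value="", padding_value=0.0):
    # Early-return version of the nan-handling logic discussed above.
    if not any(math.isnan(x) for x in values):
        # No nans to replace.
        return values
    if computed_fill_value != "":
        raise ValueError(f"Fill value must be empty string. Got {computed_fill_value}.")
    # Replace each nan with the padding value (torch.nan_to_num analogue).
    return [padding_value if math.isnan(x) else x for x in values]
```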
Great question! I've actually updated the function to support all `computed_fill_value` values. Thanks for the inspiration!
```python
feature_name = feature_name_expected[: feature_name_expected.rfind("_")]  # remove proc suffix
if feature_name not in preproc_inputs.keys():
    continue
```
This seems potentially brittle: if the feature naming changes, then this test won't actually check anything. Are output features the only features that wouldn't have an entry in `preproc_inputs`? Perhaps we should do a hard `continue` only for the output feature.

Alternatively, if we could turn `feature_name_expected[: feature_name_expected.rfind("_")]  # remove proc suffix` into a tested function that guarantees it stays in sync with the preproc module's feature naming, that would feel a bit more robust.
Good point. There should actually be no mismatching values between the two dictionaries, so I've changed the conditional to an assert. Thanks!
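A small sketch of the resolution: the suffix-stripping expression moves into a helper, and an `assert` replaces the `continue` so a naming mismatch fails loudly instead of silently skipping the check. The helper name and toy data are hypothetical, not Ludwig's actual test code.

```python
def strip_proc_suffix(feature_name_expected):
    # Hypothetical helper wrapping the expression from the test, so the
    # suffix-stripping logic lives in one place that can be unit-tested.
    return feature_name_expected[: feature_name_expected.rfind("_")]

# Toy stand-in for the preprocessed-inputs dict from the test.
preproc_inputs = {"temperature": [1.0], "description": ["a"]}
for expected in ("temperature_proc", "description_proc"):
    name = strip_proc_suffix(expected)
    # Assert instead of `continue`: a mismatch now fails the test.
    assert name in preproc_inputs, f"{name} missing from preproc_inputs"
```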
```diff
-        skip_save_progress=True,
-        skip_save_log=True,
-        skip_save_processed_input=True,
+    ludwig_model, script_module = initialize_ludwig_model_and_scripted_module(
```
Nice simplification.
LGTM
This PR enables torchscript users to pass in `Union[torch.Tensor, List[torch.Tensor]]` objects for Vector input features and `List[torch.Tensor]` objects for Timeseries input features in order to better utilize GPU resources.

Prior to this change, users could only pass Vector and Timeseries features as `List[str]` objects, which required stripping and parsing each sample into `torch.Tensor` objects on CPU, which can be slow. With this change, users now have the option to pass in `torch.Tensor` or `List[torch.Tensor]` objects, which can be operated on directly on GPU.
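A rough sketch of the two input paths the description contrasts, with plain lists standing in for `torch.Tensor` objects. The function name and shapes are illustrative only, not Ludwig's actual preprocessor API.

```python
from typing import List, Union

def preprocess_vector(v: Union[List[str], List[List[float]]]) -> List[List[float]]:
    # Old path: string samples must be split and parsed on CPU (slow).
    if v and isinstance(v[0], str):
        return [[float(tok) for tok in s.split()] for s in v]
    # New path: numeric input passes through and could stay on GPU.
    return v
```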