Some cleanups to the model spec documentation. (#27)

Summary: Clarify Android-only support for certain types. Clean up descriptions of certain elements in the spec. Fix a few typos. The model spec documentation is confusing in a few cases, and some types are not supported on iOS (see #26).

## Changelog

[DOCS] Improve model spec documentation

Pull Request resolved: #27

Test Plan: There are no code changes in this PR.

**Static Docs Preview: pytorch-live**

|[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D33418396/V3/pytorch-live/)|

|**Modified Pages**|
|[docs/tutorials/model-spec](https://our.intern.facebook.com/intern/staticdocs/eph/D33418396/V3/pytorch-live/docs/tutorials/model-spec/)|
|[docs/api/model-spec](https://our.intern.facebook.com/intern/staticdocs/eph/D33418396/V3/pytorch-live/docs/api/model-spec/)|

Reviewed By: clarksandholtz

Differential Revision: D33418396

Pulled By: raedle

fbshipit-source-id: 1d154f4f58b1a634f5d3b732b05647a73e2a0208
facebookresearch · Jan 19, 2022 · 620b4d5
1 parent 80e7bef
commit 620b4d5

Showing 2 changed files with 86 additions and 94 deletions.
diff --git a/website/docs/api/model-spec.md b/website/docs/api/model-spec.md
@@ -7,9 +7,7 @@ sidebar_position: 5
 
 <div className="tutorial-page">
 
-Model specification specifies the structure of model input and output, allowing the use of prebuilt transformations.
-
-It is stored as [extra_file of the model](https://pytorch.org/docs/stable/generated/torch.jit.load.html#torch.jit.load) 'model/live.spec.json'.
+A PyTorch Live model consists of two components: (1) a model file saved in the PyTorch "lite" interpreter format; and (2) a JSON file with details on the model input and output types. The JSON file is stored within the model file itself as an [extra_file of the model](https://pytorch.org/docs/stable/generated/torch.jit.load.html#torch.jit.load) with the name `model/live.spec.json`.
 
 Example of model with specification preparation:
 
@@ -20,22 +18,25 @@ import torch
 import torchvision
 from torch.utils.mobile_optimizer import optimize_for_mobile
 
+# Get the original PyTorch model and convert it to mobile-optimized
+# TorchScript.
 model = torchvision.models.mobilenet_v3_small(pretrained=True)
 model.eval()
 script_model = torch.jit.script(model)
 script_model_opt = optimize_for_mobile(script_model)
+
+# Read the live.spec.json file and embed it into the model file.
 spec = Path("live.spec.json").read_text()
 extra_files = {}
 extra_files["model/live.spec.json"] = spec
 script_model_opt._save_for_lite_interpreter("model_with_spec.ptl", _extra_files=extra_files)
 ```
 
-'model/live.spec.json' is a valid JSON file.
-Which contains `pack` and `unpack` objects and may contain other root objects that will be used by both pack (input preprocessing) and unpack (model output post processing) functionality.
+The `model/live.spec.json` file is a valid JSON file that contains two objects: `pack` and `unpack`. It may also contain other root objects that are used by both the pack (input preprocessing) and unpack (model output post-processing) functionality.
 
 The JavaScript side calls the model to forward specifying a plain javascript object that contains `$key` members of predefined types (Image, double, integer, string).
 
-'model/live.spec.json' contains `"$key"` stubs that will be replaced with the values from the specified javascript object.
+`model/live.spec.json` contains `"$key"` stubs that will be replaced with the values from the specified JavaScript object.
 
 Example:
 ```json title=model/live.spec.json
@@ -120,36 +121,36 @@ const {
 });
 ```
 
+## Pack - Input preprocessing
 
-## Pack, Input preprocessing
+The input processing required for the model is specified by the `pack` object. Every object in `pack` has a `type` field; other fields are specific to that `type`.
 
-Specified by `pack` object. Which represents the structure of the model input (which is torchscript (python) object that may contain PyTorch Tensors and plain python types (scalars, arrays, lists). Every object in `pack` has a `type` field, other fields are specific to that `type`.
+### Types supported for `"pack"`
 
-### Types
-- `tuple`
+- `tuple` *(currently supported on Android only)*
    - `items`: array of the tuple items
-- `scalar_bool`
-   - `value`: true or false
-- `scalar_long`
+- `scalar_bool` *(currently supported on Android only)*
+   - `value`: `true` or `false`
+- `scalar_long` *(currently supported on Android only)*
    - `value`: long value
-- `scalar_double`
+- `scalar_double` *(currently supported on Android only)*
    - `value`: double value
-- `tensor`
-   - `dtype`: data type of the tensor "float" or "long"
+- `tensor` *(currently supported on Android only)*
+   - `dtype`: data type of the tensor (`"float"` or `"long"`)
    - `items`: array of tensor data of specified dtype
 - `tensor_from_image`
-   - `image`: js image object
-   - `transforms`: array of chained transformations on the input image, the type `ImageTransform`:
+   - `image`: JavaScript image object
+   - `transforms`: array of chained transformations on the input image of type `ImageTransform` (see below)
 - `tensor_from_string`
    - `tokenizer`:
        - `bert`:
-           Prepares tensor dtype=long of token ids using bert vocabulary from  `.vocabulary_bert` of spec json.
+           Prepares tensor dtype=long of token ids using a BERT vocabulary. The vocabulary used to encode inputs must be stored in the top-level key `vocabulary_bert` in the spec JSON object. It should be a string with BERT tokens separated with `\n`.
        - `gpt2`:
-           Prepares tensor dtype=long of token ids using bert vocabulary from  `.vocabulary_gpt2` of spec json.
+           Prepares tensor dtype=long of token ids using a GPT2 vocabulary. The vocabulary used to encode inputs must be stored in the top-level key `vocabulary_gpt2` in the spec JSON object. It should be a JSON object mapping from vocabulary terms to the corresponding tokenId.
 
+### Type `ImageTransform`
 
-### Type `ImageTransform`:
-- type: "image_to_image" or "image_to_tensor"
+- type: `"image_to_image"` or `"image_to_tensor"`
 - name: the name of transformation
 - additional parameters specific to the particular type and name
 
@@ -158,7 +159,7 @@ Specified by `pack` object. Which represents the structure of the model input (w
        Crops from the center part of the image with specified width and height.
        parameters:
        - `width`: width of the result cropped image
-       - `height`: width of the result cropped image
+       - `height`: height of the result cropped image
    - `name`: `scale`
        Scales input image to specified width and height.
        parameters:
@@ -169,46 +170,41 @@ Specified by `pack` object. Which represents the structure of the model input (w
    - name: `rgb_norm`
        The output is NCHW tensor from input image, normalized by specified mean and std.
        parameters:
-       - `mean`: array of 3 float numbers with values of mean for normalization
-       - `std`: array of 3 float numbers with values of std for normalization
+       - `mean`: array of 3 float numbers with values of mean for normalization (one value per channel)
+       - `std`: array of 3 float numbers with values of std for normalization (one value per channel)
 
-## Unpack, Output post processing.
+## Unpack - Output post-processing
 
-The result of post processing is a plain javascript object (will call it output_jsmap further).
+The result of model post-processing is a plain JavaScript object, referred to below as `output_jsmap`.
 
-`unpack` object is the recursive structure of objects of predefined `type`s.
+The `unpack` object is a recursive structure of objects of predefined `type`s.
 
-unpack type:
-   - `tuple`
-       - `items`: tuple items to unpack
-   - `list`
-       - `items`: list items to unpack
-   - `dict_string_key`
-       - `items`: tuple items to unpack
+### Types supported for `"unpack"`
+
+   - `tuple` *(currently supported on Android only)*
+       - `items`: An array of `unpack` objects, one per tuple item to unpack.
+   - `list` *(currently supported on Android only)*
+       - `items`: An array of `unpack` objects, one per list item to unpack.
+   - `dict_string_key` *(currently supported on Android only)*
+       - `items`: An array of objects of the form `{"dict_key": <string value>}` where each `dict_key` is a string key into a dictionary returned by the model. The unpacked values will be those entries in the dictionary specified by each `dict_key`.
    - `tensor`
        - `key`: key of the array of specified data type that contains tensor items in NCHW format.
        - `dtype`: data of the tensor "float" or "long"
-   - `scalar_long`:
+   - `scalar_long`: *(currently supported on Android only)*
        - `key`: key of the long value in output_jsmap
-   - `scalar_float`:
+   - `scalar_float`: *(currently supported on Android only)*
        - `key`: key of the double value in output_jsmap
-   - `scalar_bool`:
+   - `scalar_bool`: *(currently supported on Android only)*
        - `key`: key of the bool value in output_jsmap
    - `string`:
        - `key`: key of the string in output_jsmap
-   - `tensor_to_string`:
+   - `tensor_to_string`: *(currently supported on Android only)*
        - `key`: key of the result string in output_jsmap
        - `decoder`:
            `gpt2`:
-               Expects tensor of long data type containing tokenIds. Decodes tokenIds using vocabulary in `.vocabulary_gpt2` in the spec.
+               Expects tensor of long data type containing tokenIds. The vocabulary used to decode results must be stored in the top-level key `vocabulary_gpt2` in the spec JSON object. It should be a JSON object mapping from vocabulary terms to the corresponding tokenId.
    - `bert_decode_qa_answer`:
-       - `key`: key of the result string in output_jsmap
-
-
-
-`.vocabulary_gpt2` expected json object containing `\"key\"=id`
-
-`.vocabulary_bert` expected string containing bert tokens separated with `\n`
+       - `key`: key of the result string in output_jsmap. The vocabulary used to decode results must be stored in the top-level key `vocabulary_bert` in the spec JSON object. It should contain a string with BERT tokens separated with `\n`.
 
 ## Examples
 

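The `"$key"` stubs described in the diff above can be illustrated with a short Python sketch. This is not the PyTorch Live implementation: the helper `fill_stubs`, the stub names `$image`, `$cropWidth`, and `$cropHeight`, and the use of stubs for the crop dimensions are assumptions made for illustration. It only shows the general idea of walking a `pack` object and replacing each stub with the matching value from the caller's object.

```python
import json


def fill_stubs(node, values):
    """Recursively replace "$key" string stubs with values[key].

    Hypothetical helper for illustration only; the real substitution
    happens inside the PyTorch Live native runtime.
    """
    if isinstance(node, dict):
        return {k: fill_stubs(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [fill_stubs(v, values) for v in node]
    if isinstance(node, str) and node.startswith("$"):
        key = node[1:]
        if key not in values:
            raise KeyError(f"no value supplied for stub ${key}")
        return values[key]
    return node


# Hypothetical spec fragment using the field names from the documentation.
spec = json.loads("""
{
  "pack": {
    "type": "tensor_from_image",
    "image": "$image",
    "transforms": [
      {"type": "image_to_image", "name": "center_crop",
       "width": "$cropWidth", "height": "$cropHeight"},
      {"type": "image_to_tensor", "name": "rgb_norm",
       "mean": [0.485, 0.456, 0.406], "std": [0.229, 0.224, 0.225]}
    ]
  }
}
""")

# Stand-in for the plain JavaScript object passed from the app.
values = {"image": "<image handle>", "cropWidth": 224, "cropHeight": 224}
packed = fill_stubs(spec["pack"], values)
print(json.dumps(packed["transforms"][0]))
```

Literal values such as the `mean` and `std` arrays pass through unchanged; only strings that start with `$` are looked up.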
diff --git a/website/docs/tutorials/model-spec.mdx b/website/docs/tutorials/model-spec.mdx
@@ -9,9 +9,7 @@ import SurveyLinkButton from '@site/src/components/SurveyLinkButton';
 
 <div className="tutorial-page">
 
-Model specification specifies the structure of model input and output, allowing the use of prebuilt transformations.
-
-It is stored as [extra_file of the model](https://pytorch.org/docs/stable/generated/torch.jit.load.html#torch.jit.load) 'model/live.spec.json'.
+A PyTorch Live model consists of two components: (1) a model file saved in the PyTorch "lite" interpreter format; and (2) a JSON file with details on the model input and output types. The JSON file is stored within the model file itself as an [extra_file of the model](https://pytorch.org/docs/stable/generated/torch.jit.load.html#torch.jit.load) with the name `model/live.spec.json`.
 
 Example of model with specification preparation:
 
@@ -22,22 +20,25 @@ import torch
 import torchvision
 from torch.utils.mobile_optimizer import optimize_for_mobile
 
+# Get the original PyTorch model and convert it to mobile-optimized
+# TorchScript.
 model = torchvision.models.mobilenet_v3_small(pretrained=True)
 model.eval()
 script_model = torch.jit.script(model)
 script_model_opt = optimize_for_mobile(script_model)
+
+# Read the live.spec.json file and embed it into the model file.
 spec = Path("live.spec.json").read_text()
 extra_files = {}
 extra_files["model/live.spec.json"] = spec
 script_model_opt._save_for_lite_interpreter("model_with_spec.ptl", _extra_files=extra_files)
 ```
 
-'model/live.spec.json' is a valid JSON file.
-Which contains `pack` and `unpack` objects and may contain other root objects that will be used by both pack (input preprocessing) and unpack (model output post processing) functionality.
+The `model/live.spec.json` file is a valid JSON file that contains two objects: `pack` and `unpack`. It may also contain other root objects that are used by both the pack (input preprocessing) and unpack (model output post-processing) functionality.
 
 The JavaScript side calls the model to forward specifying a plain javascript object that contains `$key` members of predefined types (Image, double, integer, string).
 
-'model/live.spec.json' contains `"$key"` stubs that will be replaced with the values from the specified javascript object.
+`model/live.spec.json` contains `"$key"` stubs that will be replaced with the values from the specified JavaScript object.
 
 Example:
 ```json title=model/live.spec.json
@@ -122,36 +123,36 @@ const {
 });
 ```
 
+## Pack - Input preprocessing
 
-## Pack, Input preprocessing
+The input processing required for the model is specified by the `pack` object. Every object in `pack` has a `type` field; other fields are specific to that `type`.
 
-Specified by `pack` object. Which represents the structure of the model input (which is torchscript (python) object that may contain PyTorch Tensors and plain python types (scalars, arrays, lists). Every object in `pack` has a `type` field, other fields are specific to that `type`.
+### Types supported for `"pack"`
 
-### Types
-- `tuple`
+- `tuple` *(currently supported on Android only)*
    - `items`: array of the tuple items
-- `scalar_bool`
-   - `value`: true or false
-- `scalar_long`
+- `scalar_bool` *(currently supported on Android only)*
+   - `value`: `true` or `false`
+- `scalar_long` *(currently supported on Android only)*
    - `value`: long value
-- `scalar_double`
+- `scalar_double` *(currently supported on Android only)*
    - `value`: double value
-- `tensor`
-   - `dtype`: data type of the tensor "float" or "long"
+- `tensor` *(currently supported on Android only)*
+   - `dtype`: data type of the tensor (`"float"` or `"long"`)
    - `items`: array of tensor data of specified dtype
 - `tensor_from_image`
-   - `image`: js image object
-   - `transforms`: array of chained transformations on the input image, the type `ImageTransform`:
+   - `image`: JavaScript image object
+   - `transforms`: array of chained transformations on the input image of type `ImageTransform` (see below)
 - `tensor_from_string`
    - `tokenizer`:
        - `bert`:
-           Prepares tensor dtype=long of token ids using bert vocabulary from  `.vocabulary_bert` of spec json.
+           Prepares tensor dtype=long of token ids using a BERT vocabulary. The vocabulary used to encode inputs must be stored in the top-level key `vocabulary_bert` in the spec JSON object. It should be a string with BERT tokens separated with `\n`.
        - `gpt2`:
-           Prepares tensor dtype=long of token ids using bert vocabulary from  `.vocabulary_gpt2` of spec json.
+           Prepares tensor dtype=long of token ids using a GPT2 vocabulary. The vocabulary used to encode inputs must be stored in the top-level key `vocabulary_gpt2` in the spec JSON object. It should be a JSON object mapping from vocabulary terms to the corresponding tokenId.
 
+### Type `ImageTransform`
 
-### Type `ImageTransform`:
-- type: "image_to_image" or "image_to_tensor"
+- type: `"image_to_image"` or `"image_to_tensor"`
 - name: the name of transformation
 - additional parameters specific to the particular type and name
 
@@ -160,7 +161,7 @@ Specified by `pack` object. Which represents the structure of the model input (w
        Crops from the center part of the image with specified width and height.
        parameters:
        - `width`: width of the result cropped image
-       - `height`: width of the result cropped image
+       - `height`: height of the result cropped image
    - `name`: `scale`
        Scales input image to specified width and height.
        parameters:
@@ -171,46 +172,41 @@ Specified by `pack` object. Which represents the structure of the model input (w
    - name: `rgb_norm`
        The output is NCHW tensor from input image, normalized by specified mean and std.
        parameters:
-       - `mean`: array of 3 float numbers with values of mean for normalization
-       - `std`: array of 3 float numbers with values of std for normalization
+       - `mean`: array of 3 float numbers with values of mean for normalization (one value per channel)
+       - `std`: array of 3 float numbers with values of std for normalization (one value per channel)
 
-## Unpack, Output post processing.
+## Unpack - Output post-processing
 
-The result of post processing is a plain javascript object (will call it output_jsmap further).
+The result of model post-processing is a plain JavaScript object, referred to below as `output_jsmap`.
 
-`unpack` object is the recursive structure of objects of predefined `type`s.
+The `unpack` object is a recursive structure of objects of predefined `type`s.
 
-unpack type:
-   - `tuple`
-       - `items`: tuple items to unpack
-   - `list`
-       - `items`: list items to unpack
-   - `dict_string_key`
-       - `items`: tuple items to unpack
+### Types supported for `"unpack"`
+
+   - `tuple` *(currently supported on Android only)*
+       - `items`: An array of `unpack` objects, one per tuple item to unpack.
+   - `list` *(currently supported on Android only)*
+       - `items`: An array of `unpack` objects, one per list item to unpack.
+   - `dict_string_key` *(currently supported on Android only)*
+       - `items`: An array of objects of the form `{"dict_key": <string value>}` where each `dict_key` is a string key into a dictionary returned by the model. The unpacked values will be those entries in the dictionary specified by each `dict_key`.
    - `tensor`
        - `key`: key of the array of specified data type that contains tensor items in NCHW format.
        - `dtype`: data of the tensor "float" or "long"
-   - `scalar_long`:
+   - `scalar_long`: *(currently supported on Android only)*
        - `key`: key of the long value in output_jsmap
-   - `scalar_float`:
+   - `scalar_float`: *(currently supported on Android only)*
        - `key`: key of the double value in output_jsmap
-   - `scalar_bool`:
+   - `scalar_bool`: *(currently supported on Android only)*
        - `key`: key of the bool value in output_jsmap
    - `string`:
        - `key`: key of the string in output_jsmap
-   - `tensor_to_string`:
+   - `tensor_to_string`: *(currently supported on Android only)*
        - `key`: key of the result string in output_jsmap
        - `decoder`:
            `gpt2`:
-               Expects tensor of long data type containing tokenIds. Decodes tokenIds using vocabulary in `.vocabulary_gpt2` in the spec.
+               Expects tensor of long data type containing tokenIds. The vocabulary used to decode results must be stored in the top-level key `vocabulary_gpt2` in the spec JSON object. It should be a JSON object mapping from vocabulary terms to the corresponding tokenId.
    - `bert_decode_qa_answer`:
-       - `key`: key of the result string in output_jsmap
-
-
-
-`.vocabulary_gpt2` expected json object containing `\"key\"=id`
-
-`.vocabulary_bert` expected string containing bert tokens separated with `\n`
+       - `key`: key of the result string in output_jsmap. The vocabulary used to decode results must be stored in the top-level key `vocabulary_bert` in the spec JSON object. It should contain a string with BERT tokens separated with `\n`.
 
 ## Examples
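To make the `rgb_norm` description in the diff above concrete ("one value per channel"), here is a minimal pure-Python sketch. It is an assumption-laden illustration, not the library's code: the function name `rgb_norm` is reused only as a label, the input is assumed to be an H x W x 3 nested list of 0..255 integers, and scaling to [0, 1] before normalizing is assumed by analogy with common PyTorch preprocessing.

```python
def rgb_norm(pixels, mean, std):
    """Convert H x W x 3 pixels (0..255) into a 1 x 3 x H x W nested list.

    Each channel c is normalized as (value / 255 - mean[c]) / std[c].
    The [0, 1] scaling step is an assumption, not confirmed by the docs.
    """
    if len(mean) != 3 or len(std) != 3:
        raise ValueError("mean and std need one value per RGB channel")
    height = len(pixels)
    width = len(pixels[0])
    return [[
        [
            [(pixels[y][x][c] / 255.0 - mean[c]) / std[c] for x in range(width)]
            for y in range(height)
        ]
        for c in range(3)
    ]]


# A 1 x 2 image: one pure red pixel and one pure green pixel.
image = [[[255, 0, 0], [0, 255, 0]]]
tensor = rgb_norm(image, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
print(tensor[0][0])  # red channel plane
```

The output is channel-major (NCHW), so `tensor[0][c]` is the full H x W plane for channel `c`, which is why `mean` and `std` each carry exactly three entries.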