
Commit

fix user_guide and tutorial docs
yoosful authored May 20, 2024
1 parent c21866b commit 653e524
Showing 6 changed files with 25 additions and 25 deletions.
14 changes: 7 additions & 7 deletions docsrc/tutorials/notebooks.rst
@@ -23,13 +23,13 @@ and running it to test the speedup obtained.
* `Torch-TensorRT Getting Started - CitriNet <https://github.com/pytorch/TensorRT/blob/master/notebooks/CitriNet-example.ipynb>`_


Compiling EfficentNet with Torch-TensorRT
Compiling EfficientNet with Torch-TensorRT
********************************************

EfficentNet is a feedforward CNN designed to achieve better performance and accuracy than alternative architectures
EfficientNet is a feedforward CNN designed to achieve better performance and accuracy than alternative architectures
by using a "scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient".

This notebook demonstrates the steps for optimizing a pretrained EfficentNet model with Torch-TensorRT,
This notebook demonstrates the steps for optimizing a pretrained EfficientNet model with Torch-TensorRT,
and running it to test the speedup obtained.

* `Torch-TensorRT Getting Started - EfficientNet-B0 <https://github.com/pytorch/TensorRT/blob/master/notebooks/EfficientNet-example.ipynb>`_
@@ -43,7 +43,7 @@ This way, the model learns an inner representation of the English language that
features useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train
a standard classifier using the features produced by the BERT model as inputs." (https://huggingface.co/bert-base-uncased)

This notebook demonstrates the steps for optimizing a pretrained EfficentNet model with Torch-TensorRT,
This notebook demonstrates the steps for optimizing a pretrained EfficientNet model with Torch-TensorRT,
and running it to test the speedup obtained.

* `Masked Language Modeling (MLM) with Hugging Face BERT Transformer <https://github.com/pytorch/TensorRT/blob/master/notebooks/Hugging-Face-BERT.ipynb>`_
@@ -73,7 +73,7 @@ Using Dynamic Shapes with Torch-TensorRT

Making use of Dynamic Shaped Tensors in Torch TensorRT is quite simple. Let's say you are
using the ``torch_tensorrt.compile(...)`` function to compile a torchscript module. One
of the args in this function in this function is ``input``: which defines an input to a
of the args in this function is ``input``: which defines an input to a
module in terms of expected shape, data type and tensor format: ``torch_tensorrt.Input.``

For the purposes of this walkthrough we just need three kwargs: `min_shape`, `opt_shape`` and `max_shape`.
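
A compressed sketch of what this looks like in practice (the module, shapes and dtype below are placeholders, not taken from the notebook):

.. code-block:: python

    import torch
    import torch_tensorrt

    # Placeholder TorchScript module; the notebook uses its own network.
    model = torch.jit.script(torch.nn.Conv2d(3, 16, 3).eval().cuda())

    dynamic_input = torch_tensorrt.Input(
        min_shape=(1, 3, 224, 224),   # smallest shape the engine must accept
        opt_shape=(8, 3, 224, 224),   # shape TensorRT optimizes for
        max_shape=(16, 3, 224, 224),  # largest shape the engine must accept
        dtype=torch.float32,
    )
    trt_module = torch_tensorrt.compile(model, ir="ts", inputs=[dynamic_input])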
@@ -96,8 +96,8 @@ In this example, we are going to use a simple ResNet model to demonstrate the us
Using the FX Frontend with Torch-TensorRT
********************************************

The purpose of this example is to demostrate the overall flow of lowering a PyTorch model to TensorRT
conveniently with using FX.
The purpose of this example is to demonstrate the overall flow of lowering a PyTorch model to TensorRT
conveniently using FX.

* `Using the FX Frontend with Torch-TensorRT <https://github.com/pytorch/TensorRT/blob/master/notebooks/getting_started_with_fx_path_lower_to_trt.ipynb>`_
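
In rough outline, the FX path is driven through the same top-level API (the model and input shape below are placeholders; ``ir="fx"`` selects the FX frontend):

.. code-block:: python

    import torch
    import torch_tensorrt

    # Placeholder model; the notebook linked above lowers its own network.
    model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval().cuda()

    trt_model = torch_tensorrt.compile(
        model,
        ir="fx",  # use the FX frontend instead of the TorchScript one
        inputs=[torch.randn(1, 3, 224, 224).cuda()],
    )
    out = trt_model(torch.randn(1, 3, 224, 224).cuda())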

10 changes: 5 additions & 5 deletions docsrc/tutorials/serving_torch_tensorrt_with_triton.rst
@@ -4,11 +4,11 @@ Serving a Torch-TensorRT model with Triton
==========================================

Optimization and deployment go hand in hand in a discussion about Machine
Learning infrastructure. Once network level optimzation are done
Learning infrastructure. Once network level optimization are done
to get the maximum performance, the next step would be to deploy it.

However, serving this optimized model comes with it's own set of considerations
and challenges like: building an infrastructure to support concorrent model
However, serving this optimized model comes with its own set of considerations
and challenges like: building an infrastructure to support concurrent model
executions, supporting clients over HTTP or gRPC and more.

The `Triton Inference Server <https://github.com/triton-inference-server/server>`__
@@ -67,7 +67,7 @@ highly recommend to checking our `Github
Repository <https://github.com/triton-inference-server>`__.

To use Triton, we need to make a model repository. A model repository, as the
name suggested, is a repository of the models the Inference server hosts. While
name suggests, is a repository of the models the Inference server hosts. While
Triton can serve models from multiple repositories, in this example, we will
discuss the simplest possible form of the model repository.
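
For a single TorchScript model the repository can be as small as the following layout (the model name ``resnet50`` is an illustrative choice; Triton expects the serialized TorchScript file to be named ``model.pt`` inside a numbered version directory):

::

    model_repository/
    └── resnet50/
        ├── config.pbtxt
        └── 1/
            └── model.pt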

@@ -204,7 +204,7 @@ Lastly, we send an inference request to the Triton Inference Server.
inference_output = results.as_numpy('output__0')
print(inference_output[:5])

The output of the same should look like below:
The output should look like below:

::

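
For context, a request that yields the ``output__0`` tensor used above is typically assembled with Triton's HTTP client along these lines (the model name, input name and shape are assumptions; only ``output__0`` comes from the snippet above):

.. code-block:: python

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
    infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)

    results = client.infer(
        model_name="resnet50",
        inputs=[infer_input],
        outputs=[httpclient.InferRequestedOutput("output__0")],
    )
    inference_output = results.as_numpy("output__0")
    print(inference_output[:5])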
4 changes: 2 additions & 2 deletions docsrc/user_guide/dynamic_shapes.rst
@@ -6,9 +6,9 @@ Dynamic shapes with Torch-TensorRT
By default, you can run a pytorch model with varied input shapes and the output shapes are determined eagerly.
However, Torch-TensorRT is an AOT compiler which requires some prior information about the input shapes to compile and optimize the model.
In the case of dynamic input shapes, we must provide the (min_shape, opt_shape, max_shape) arguments so that the model can be optimized for
these range of input shapes. An example usage of static and dynamic shapes is as follows.
this range of input shapes. An example usage of static and dynamic shapes is as follows.

NOTE: The following code uses Dynamo Frontend. Incase of Torchscript Frontend, please swap out ``ir=dynamo`` with ``ir=ts`` and the behavior is exactly the same.
NOTE: The following code uses Dynamo Frontend. In case of Torchscript Frontend, please swap out ``ir=dynamo`` with ``ir=ts`` and the behavior is exactly the same.

.. code-block:: python
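
    # Illustrative sketch only: the example body is collapsed in this diff view,
    # and the model and shapes below are assumptions, not the file's actual code.
    import torch
    import torch_tensorrt

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval().cuda()

    # Static shape: a fixed-size example input is enough.
    trt_static = torch_tensorrt.compile(
        model, ir="dynamo", inputs=[torch.randn(1, 3, 224, 224).cuda()]
    )

    # Dynamic shape: provide the (min_shape, opt_shape, max_shape) range instead.
    dynamic_input = torch_tensorrt.Input(
        min_shape=(1, 3, 224, 224),
        opt_shape=(8, 3, 224, 224),
        max_shape=(16, 3, 224, 224),
        dtype=torch.float32,
    )
    trt_dynamic = torch_tensorrt.compile(model, ir="dynamo", inputs=[dynamic_input])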
10 changes: 5 additions & 5 deletions docsrc/user_guide/ptq.rst
@@ -13,14 +13,14 @@ Users writing TensorRT applications are required to setup a calibrator class whi
the TensorRT calibrator. With Torch-TensorRT we look to leverage existing infrastructure in PyTorch to make implementing
calibrators easier.

LibTorch provides a ``DataLoader`` and ``Dataset`` API which steamlines preprocessing and batching input data.
LibTorch provides a ``DataLoader`` and ``Dataset`` API which streamlines preprocessing and batching input data.
These APIs are exposed via both C++ and Python interface which makes it easier for the end user.
For C++ interface, we use ``torch::Dataset`` and ``torch::data::make_data_loader`` objects to construct and perform pre-processing on datasets.
The equivalent functionality in python interface uses ``torch.utils.data.Dataset`` and ``torch.utils.data.DataLoader``.
This section of the PyTorch documentation has more information https://pytorch.org/tutorials/advanced/cpp_frontend.html#loading-data and https://pytorch.org/tutorials/recipes/recipes/loading_data_recipe.html.
Torch-TensorRT uses Dataloaders as the base of a generic calibrator implementation. So you will be able to reuse or quickly
implement a ``torch::Dataset`` for your target domain, place it in a DataLoader and create a INT8 Calibrator
which you can provide to Torch-TensorRT to run INT8 Calibration during compliation of your module.
implement a ``torch::Dataset`` for your target domain, place it in a DataLoader and create an INT8 Calibrator
which you can provide to Torch-TensorRT to run INT8 Calibration during compilation of your module.
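
On the Python side, the same flow takes only a few lines; the sketch below is an approximation (the dataset, batch size, cache path and model are assumptions, and the keyword names follow the PTQ API as described here rather than a verbatim reference):

.. code-block:: python

    import torch
    import torchvision
    import torch_tensorrt

    # Any torch.utils.data.Dataset can back the calibrator; CIFAR10 is illustrative.
    calib_dataset = torchvision.datasets.CIFAR10(
        root="./data", train=False, download=True,
        transform=torchvision.transforms.ToTensor(),
    )
    calib_dataloader = torch.utils.data.DataLoader(calib_dataset, batch_size=32)

    calibrator = torch_tensorrt.ptq.DataLoaderCalibrator(
        calib_dataloader,
        cache_file="./calibration.cache",
        use_cache=False,
        algo_type=torch_tensorrt.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
        device=torch.device("cuda:0"),
    )

    scripted_model = torch.jit.script(torchvision.models.resnet18().eval().cuda())
    trt_mod = torch_tensorrt.compile(
        scripted_model,
        ir="ts",
        inputs=[torch_tensorrt.Input((32, 3, 32, 32))],
        enabled_precisions={torch.int8},
        calibrator=calibrator,
    )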

.. _writing_ptq_cpp:

@@ -108,7 +108,7 @@ Next we create a calibrator from the ``calibration_dataloader`` using the calibr

Here we also define a location to write a calibration cache file to which we can use to reuse the calibration data without needing the dataset and whether or not
we should use the cache file if it exists. There also exists a ``torch_tensorrt::ptq::make_int8_cache_calibrator`` factory which creates a calibrator that uses the cache
only for cases where you may do engine building on a machine that has limited storage (i.e. no space for a full dataset) or to have a simpiler deployment application.
only for cases where you may do engine building on a machine that has limited storage (i.e. no space for a full dataset) or to have a simpler deployment application.

The calibrator factories create a calibrator that inherits from a ``nvinfer1::IInt8Calibrator`` virtual class (``nvinfer1::IInt8EntropyCalibrator2`` by default) which
defines the calibration algorithm used when calibrating. You can explicitly make the selection of calibration algorithm like this:
@@ -118,7 +118,7 @@
// MinMax Calibrator is geared more towards NLP tasks
auto calibrator = torch_tensorrt::ptq::make_int8_calibrator<nvinfer1::IInt8MinMaxCalibrator>(std::move(calibration_dataloader), calibration_cache_file, true);

Then all thats required to setup the module for INT8 calibration is to set the following compile settings in the `torch_tensorrt::CompileSpec` struct and compiling the module:
Then all that's required to setup the module for INT8 calibration is to set the following compile settings in the `torch_tensorrt::CompileSpec` struct and compiling the module:

.. code-block:: c++

8 changes: 4 additions & 4 deletions docsrc/user_guide/runtime.rst
@@ -4,7 +4,7 @@ Deploying Torch-TensorRT Programs
====================================

After compiling and saving Torch-TensorRT programs there is no longer a strict dependency on the full
Torch-TensorRT library. All that is required to run a compiled program is the runtime. There are therfore a couple
Torch-TensorRT library. All that is required to run a compiled program is the runtime. There are therefore a couple
options to deploy your programs other than shipping the full Torch-TensorRT compiler with your applications.

Torch-TensorRT package / libtorchtrt.so
@@ -24,7 +24,7 @@ programs just as you would otherwise via PyTorch API.

.. note:: If you are using the standard distribution of PyTorch in Python on x86, likely you will need the pre-cxx11-abi variant of ``libtorchtrt_runtime.so``, check :ref:`Installation` documentation for more details.

.. note:: If you are linking ``libtorchtrt_runtime.so``, likely using the following flags will help ``-Wl,--no-as-needed -ltorchtrt -Wl,--as-needed`` as theres no direct symbol dependency to anything in the Torch-TensorRT runtime for most Torch-TensorRT runtime applications
.. note:: If you are linking ``libtorchtrt_runtime.so``, likely using the following flags will help ``-Wl,--no-as-needed -ltorchtrt -Wl,--as-needed`` as there's no direct symbol dependency to anything in the Torch-TensorRT runtime for most Torch-TensorRT runtime applications

An example of how to use ``libtorchtrt_runtime.so`` can be found here: https://github.com/pytorch/TensorRT/tree/master/examples/torchtrt_runtime_example
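
As a rough illustration of this runtime-only path (the library path and file names are placeholders, not taken from the example):

.. code-block:: python

    import torch

    # Load just the Torch-TensorRT runtime; the full compiler package is never imported.
    torch.ops.load_library("libtorchtrt_runtime.so")  # adjust the path for your install

    # A Torch-TensorRT program that was compiled and saved earlier.
    trt_module = torch.jit.load("trt_compiled_module.ts")
    print(trt_module(torch.randn(1, 3, 224, 224).cuda()))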

@@ -33,7 +33,7 @@ Plugin Library

In the case you use Torch-TensorRT as a converter to a TensorRT engine and your engine uses plugins provided by Torch-TensorRT, Torch-TensorRT
ships the library ``libtorchtrt_plugins.so`` which contains the implementation of the TensorRT plugins used by Torch-TensorRT during
compilation. This library can be ``DL_OPEN`` or ``LD_PRELOAD`` similar to other TensorRT plugin libraries.
compilation. This library can be ``DL_OPEN`` or ``LD_PRELOAD`` similarly to other TensorRT plugin libraries.

Multi Device Safe Mode
---------------
@@ -60,7 +60,7 @@ doubles as a context manager.
TensorRT requires that each engine be associated with the CUDA context in the active thread from which it is invoked.
Therefore, if the device were to change in the active thread, which may be the case when invoking
engines on multiple GPUs from the same Python process, safe mode will cause Torch-TensorRT to display
an alert and switch GPUs accordingly. If safe mode were not enabled, there could be a mismatch in the engine
an alert and switch GPUs accordingly. If safe mode is not enabled, there could be a mismatch in the engine
device and CUDA context device, which could lead the program to crash.
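
For reference, the convenience function mentioned above is used roughly as follows (treat the exact call as an assumption based on this description; ``trt_module`` is a placeholder for an already-compiled program):

.. code-block:: python

    import torch
    import torch_tensorrt

    # trt_module: a previously compiled Torch-TensorRT module (placeholder).

    # Enable the check globally ...
    torch_tensorrt.runtime.set_multi_device_safe_mode(True)

    # ... or only for a block of code, since the setting doubles as a context manager.
    with torch_tensorrt.runtime.set_multi_device_safe_mode(True):
        out = trt_module(torch.randn(1, 3, 224, 224).to("cuda:1"))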

One technique for managing multiple TRT engines on different GPUs while not sacrificing performance for
4 changes: 2 additions & 2 deletions docsrc/user_guide/using_dla.rst
@@ -7,7 +7,7 @@ DLA

NOTE: DLA supports fp16 and int8 precision only.

Using DLA with torchtrtc
Using DLA with `torchtrtc`

.. code-block:: shell
@@ -41,7 +41,7 @@ Using DLA in a python application
compile_spec = {
"inputs": [torch_tensorrt.Input(self.input.shape)],
"device": torch_tensorrt.Device("dla:0", allow_gpu_fallback=True),
"enalbed_precisions": {torch.half},
"enabled_precisions": {torch.half},
}
trt_mod = torch_tensorrt.compile(self.scripted_model, compile_spec)
