```python
import torch
from torchvision.transforms import v2

transforms = v2.Compose([
    v2.ToImage(),  # Convert to tensor, only needed if you had a PIL image
    v2.ToDtype(torch.uint8, scale=True),  # optional, most inputs are already uint8 at this point
    # ...
    v2.RandomResizedCrop(size=(224, 224), antialias=True),  # Or Resize(antialias=True)
    # ...
    v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```

```python
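import torch
from torchvision.transforms import CenterCrop, Normalize

# Use nn.Sequential instead of Compose so the pipeline can be scripted
transforms = torch.nn.Sequential(
    CenterCrop(10),
    Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
)
scripted_transforms = torch.jit.script(transforms)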
```

The table below lists the resizing and cropping transforms, each with an example snippet:

| Function                                                    | Description                                                                                                                                                 | Example                                                                                                                                                                                                                                                                                                                                                                                                                   |
|-------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Resizing**                                                                                                                                                                                                                                                                                                                                                                                                          |
| `v2.Resize(size[, interpolation, max_size, ...])` | Resize the input to the given size. | `from PIL import Image`<br>`from torchvision.transforms import v2`<br>`image = Image.open('example.jpg')  # load an image`<br>`resize = v2.Resize((256, 256))`<br>`image_resized = resize(image)` |
| `v2.ScaleJitter(target_size[, scale_range, ...])` | Perform Large Scale Jitter on the input according to "Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation". | `scale_jitter = v2.ScaleJitter(target_size=(512, 512))  # target_size is a pair of ints`<br>`image_scaled_jittered = scale_jitter(image)` |
| `v2.RandomShortestSize(min_size[, max_size, ...])` | Randomly resize the input. | `random_shortest_size = v2.RandomShortestSize(min_size=256, max_size=512)`<br>`image_randomly_resized = random_shortest_size(image)` |
| `v2.RandomResize(min_size, max_size[, ...])` | Randomly resize the input. | `random_resize = v2.RandomResize(min_size=256, max_size=512)`<br>`image_randomly_resized = random_resize(image)` |
| **Functionals**                                                                                                                                                                                                                                                                                                                                                                                                      |
| `v2.functional.resize(inpt, size[, ...])` | See Resize for details. | `resized_image = v2.functional.resize(image, (256, 256))` |
| **Cropping**                                                                                                                                                                                                                                                                                                                                                                                                         |
| `v2.RandomCrop(size[, padding, ...])` | Crop the input at a random location. | `random_crop = v2.RandomCrop(224)`<br>`image_random_cropped = random_crop(image)` |
| `v2.RandomResizedCrop(size[, scale, ratio, ...])` | Crop a random portion of the input and resize it to a given size. | `random_resized_crop = v2.RandomResizedCrop(224)`<br>`image_random_resized_cropped = random_resized_crop(image)` |
| `v2.RandomIoUCrop([min_scale, max_scale, ...])` | Random IoU crop transformation from "SSD: Single Shot MultiBox Detector". Requires bounding boxes in the input. | `random_iou_crop = v2.RandomIoUCrop(min_scale=0.5, max_scale=0.9)`<br>`# boxes is assumed to be an existing tv_tensors.BoundingBoxes for image`<br>`image_cropped, boxes_cropped = random_iou_crop(image, boxes)` |
| `v2.CenterCrop(size)` | Crop the input at the center. | `center_crop = v2.CenterCrop(224)`<br>`image_center_cropped = center_crop(image)` |
| `v2.FiveCrop(size)` | Crop the image or video into four corners and the central crop. | `five_crop = v2.FiveCrop(224)`<br>`images_five_cropped = five_crop(image)  # tuple of five crops` |
| `v2.TenCrop(size[, vertical_flip])` | Crop the image or video into four corners and the central crop plus the flipped version of these (horizontal flipping is used by default). | `ten_crop = v2.TenCrop(224)`<br>`images_ten_cropped = ten_crop(image)  # tuple of ten crops` |
| **Functionals**                                                                                                                                                                                                                                                                                                                                                                                                      |
| `v2.functional.crop(inpt, top, left, height, ...)` | See RandomCrop for details. | `cropped_image = v2.functional.crop(image, top=10, left=20, height=200, width=300)` |
| `v2.functional.resized_crop(inpt, top, left, ...)` | See RandomResizedCrop for details. | `cropped_resized_image = v2.functional.resized_crop(image, top=10, left=20, height=200, width=300, size=(256, 256))` |
| `v2.functional.ten_crop(inpt, size[, ...])` | See TenCrop for details. | `ten_cropped_images = v2.functional.ten_crop(image, size=224)` |
| `v2.functional.center_crop(inpt, output_size)` | See CenterCrop for details. | `center_cropped_image = v2.functional.center_crop(image, output_size=224)` |
| `v2.functional.five_crop(inpt, size)` | See FiveCrop for details. | `five_cropped_images = v2.functional.five_crop(image, size=224)` |

The next table covers the remaining geometric transforms, each with an example snippet:

| Function                                                  | Description                                                                                                                                                                   | Example                                                                                                                                                                                                                                                                                                                                                   |
|-----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Other geometric transforms** | | |
| `v2.RandomHorizontalFlip([p])` | Horizontally flip the input with a given probability. | `horizontal_flip = v2.RandomHorizontalFlip(p=0.5)`<br>`image_horizontal_flipped = horizontal_flip(image)` |
| `v2.RandomVerticalFlip([p])` | Vertically flip the input with a given probability. | `vertical_flip = v2.RandomVerticalFlip(p=0.5)`<br>`image_vertical_flipped = vertical_flip(image)` |
| `v2.Pad(padding[, fill, padding_mode])` | Pad the input on all sides with the given "pad" value. | `pad = v2.Pad(padding=10, fill=0, padding_mode='constant')`<br>`image_padded = pad(image)` |
| `v2.RandomZoomOut([fill, side_range, p])` | "Zoom out" transformation from "SSD: Single Shot MultiBox Detector". | `random_zoom_out = v2.RandomZoomOut()`<br>`image_zoomed_out = random_zoom_out(image)` |
| `v2.RandomRotation(degrees[, interpolation, ...])` | Rotate the input by a random angle. | `random_rotation = v2.RandomRotation(degrees=45)  # angle sampled from [-45, 45]`<br>`image_rotated = random_rotation(image)` |
| `v2.RandomAffine(degrees[, translate, scale, ...])` | Apply a random affine transformation to the input, keeping the center invariant. | `random_affine = v2.RandomAffine(degrees=30, translate=(0.1, 0.1), scale=(0.8, 1.2))`<br>`image_affined = random_affine(image)` |
| `v2.RandomPerspective([distortion_scale, p, ...])` | Perform a random perspective transformation of the input with a given probability. | `random_perspective = v2.RandomPerspective(distortion_scale=0.5, p=0.5)`<br>`image_perspectived = random_perspective(image)` |
| `v2.ElasticTransform([alpha, sigma, ...])` | Transform the input with elastic transformations. | `elastic_transform = v2.ElasticTransform(alpha=1.5, sigma=0.07)`<br>`image_elastic_transformed = elastic_transform(image)` |

The functional counterparts of the flip, pad, rotation, affine, perspective, and elastic transforms:

| Function                                                                      | Description                                                                                                                                                                                   | Example                                                                                                                                                                                                                                                                                                                                                             |
|-------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Functionals**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| `v2.functional.horizontal_flip(inpt)` | See RandomHorizontalFlip for details. | `horizontal_flipped_image = v2.functional.horizontal_flip(image)` |
| `v2.functional.vertical_flip(inpt)` | See RandomVerticalFlip for details. | `vertical_flipped_image = v2.functional.vertical_flip(image)` |
| `v2.functional.pad(inpt, padding[, fill, ...])` | See Pad for details. | `padded_image = v2.functional.pad(image, padding=10, fill=0)` |
| `v2.functional.rotate(inpt, angle[, ...])` | See RandomRotation for details. | `rotated_image = v2.functional.rotate(image, angle=45)` |
| `v2.functional.affine(inpt, angle, translate, ...)` | See RandomAffine for details. | `# angle, translate (in pixels), scale, and shear are all required`<br>`affine_transformed_image = v2.functional.affine(image, angle=30, translate=[10, 10], scale=1.0, shear=[0.0, 0.0])` |
| `v2.functional.perspective(inpt, startpoints, ...)` | See RandomPerspective for details. | `# corners as [x, y]: top-left, top-right, bottom-right, bottom-left (224x224 image assumed; endpoints are illustrative)`<br>`perspective_transformed_image = v2.functional.perspective(image, startpoints=[[0, 0], [223, 0], [223, 223], [0, 223]], endpoints=[[10, 10], [213, 0], [223, 223], [0, 213]])` |
| `v2.functional.elastic(inpt, displacement[, ...])` | See ElasticTransform for details. | `# displacement is a tensor of shape (1, H, W, 2); a 224x224 image is assumed and zeros mean no displacement`<br>`displacement = torch.zeros(1, 224, 224, 2)`<br>`elastic_transformed_image = v2.functional.elastic(image, displacement)` |

The color transforms, each with an example snippet:

| Function                                                                      | Description                                                                                                                                                                                   | Example                                                                                                                                                                                                                                                                                                                                                             |
|-------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Color**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `v2.ColorJitter([brightness, contrast, ...])` | Randomly change the brightness, contrast, saturation, and hue of an image or video. | `color_jitter = v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)`<br>`jittered_image = color_jitter(image)` |
| `v2.RandomChannelPermutation()` | Randomly permute the channels of an image or video. | `random_permutation = v2.RandomChannelPermutation()`<br>`permuted_image = random_permutation(image)` |
| `v2.RandomPhotometricDistort([brightness, ...])` | Randomly distort the image or video as used in "SSD: Single Shot MultiBox Detector". | `# each factor is a (min, max) range`<br>`photometric_distort = v2.RandomPhotometricDistort(brightness=(0.875, 1.125), contrast=(0.5, 1.5), saturation=(0.5, 1.5), hue=(-0.05, 0.05))`<br>`distorted_image = photometric_distort(image)` |
| `v2.Grayscale([num_output_channels])` | Convert images or videos to grayscale. | ```python grayscale = v2.Grayscale(num_output_channels=1) # Define a Grayscale transform grayscale_image = grayscale(image) # Convert the input image to grayscale ``` |
| `v2.RandomGrayscale([p])` | Randomly convert images or videos to grayscale with a given probability. | ```python random_grayscale = v2.RandomGrayscale(p=0.5) # Define a RandomGrayscale transform grayscaled_image = random_grayscale(image) # Convert the input image to grayscale with a probability of 0.5 ``` |
| `v2.GaussianBlur(kernel_size[, sigma])` | Blurs the image with randomly chosen Gaussian blur. | ```python gaussian_blur = v2.GaussianBlur(kernel_size=5, sigma=2.0) # Define a GaussianBlur transform with a fixed sigma blurred_image = gaussian_blur(image) # Apply the GaussianBlur transform to the image ``` |
| `v2.RandomInvert([p])` | Inverts the colors of the given image or video with a given probability. | ```python random_invert = v2.RandomInvert(p=0.5) # Define a RandomInvert transform inverted_image = random_invert(image) # Apply the RandomInvert transform to the image ``` |
| `v2.RandomPosterize(bits[, p])` | Posterize the image or video with a given probability by reducing the number of bits for each color channel. | ```python random_posterize = v2.RandomPosterize(bits=4, p=0.5) # Define a RandomPosterize transform posterized_image = random_posterize(image) # Apply the RandomPosterize transform to the image ``` |
| `v2.RandomSolarize(threshold[, p])` | Solarize the image or video with a given probability by inverting all pixel values above a threshold. | ```python random_solarize = v2.RandomSolarize(threshold=128, p=0.5) # Define a RandomSolarize transform (threshold of 128 assumes a uint8 image) solarized_image = random_solarize(image) # Apply the RandomSolarize transform to the image ``` |
| `v2.RandomAdjustSharpness(sharpness_factor[, p])` | Adjust the sharpness of the image or video with a given probability. | ```python random_adjust_sharpness = v2.RandomAdjustSharpness(sharpness_factor=2.0, p=0.5) # Define a RandomAdjustSharpness transform (a factor above 1 sharpens, below 1 blurs) sharpened_image = random_adjust_sharpness(image) # Apply the RandomAdjustSharpness transform to the image ``` |
| `v2.RandomAutocontrast([p])` | Autocontrast the pixels of the given image or video with a given probability. | ```python random_autocontrast = v2.RandomAutocontrast(p=0.5) # Define a RandomAutocontrast transform autocontrasted_image = random_autocontrast(image) # Apply the RandomAutocontrast transform to the image ``` |
| `v2.RandomEqualize([p])` | Equalize the histogram of the given image or video with a given probability. | ```python random_equalize = v2.RandomEqualize(p=0.5) # Define a RandomEqualize transform equalized_image = random_equalize(image) # Apply the RandomEqualize transform to the image ``` |

Here are the functional transformations along with example snippets:

| Function                                                                    | Description                                                                                                                                                                           | Example                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|-----------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Functional Transforms**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `v2.functional.permute_channels(inpt, permutation)` | Permute the channels of the input according to the given permutation. | ```python import torch from torchvision.transforms.v2 import functional as F input_tensor = torch.rand(3, 224, 224) permuted_tensor = F.permute_channels(input_tensor, [2, 1, 0]) # Permute the channels of the input tensor according to the permutation [2, 1, 0] ``` |
| `v2.functional.rgb_to_grayscale(inpt[, ...])`                              | Convert RGB image to grayscale version of image.                                                                                                                                       | ```python grayscale_tensor = F.rgb_to_grayscale(input_tensor) # Convert the input RGB image tensor to grayscale ```                                                                                                                                                                                                                                                                                                                           |
| `v2.functional.to_grayscale(inpt[, ...])`                                  | Convert PIL image of any mode (RGB, HSV, LAB, etc) to grayscale version of image.                                                                                                     | ```python grayscale_pil_image = F.to_grayscale(input_pil_image) # Convert the input PIL image to grayscale ```                                                                                                                                                                                                                                                                                                                              |
| `v2.functional.gaussian_blur(inpt, kernel_size)`                           | Performs Gaussian blurring on the image by the given kernel.                                                                                                                           | ```python blurred_tensor = F.gaussian_blur(input_tensor, kernel_size=5) # Apply Gaussian blur with kernel size 5 to the input tensor ```                                                                                                                                                                                                                                                                                                      |
| `v2.functional.invert(inpt)`                                               | Inverts the colors of the given image or video.                                                                                                                                        | ```python inverted_tensor = F.invert(input_tensor) # Invert the colors of the input tensor ```                                                                                                                                                                                                                                                                                                                                                   |
| `v2.functional.posterize(inpt, bits)`                                      | Posterize the image or video with a given number of bits.                                                                                                                              | ```python posterized_tensor = F.posterize(input_tensor, bits=4) # Posterize the input tensor with 4 bits ```                                                                                                                                                                                                                                                                                                                                      |
| `v2.functional.solarize(inpt, threshold)`                                  | Solarize an RGB/grayscale image by inverting all pixel values above a threshold.                                                                                                       | ```python solarized_tensor = F.solarize(input_tensor, threshold=0.5) # Solarize the input tensor with a threshold of 0.5 ```                                                                                                                                                                                                                                                                                                                    |
| `v2.functional.adjust_sharpness(inpt, ...)`                                | Adjust the sharpness of the image or video.                                                                                                                                           | ```python sharpened_tensor = F.adjust_sharpness(input_tensor, sharpness_factor=0.5) # Adjust the sharpness of the input tensor with a factor of 0.5 ```                                                                                                                                                                                                                                                                                     |
| `v2.functional.autocontrast(inpt)`                                          | Autocontrast the pixels of the given image or video.                                                                                                                                   | ```python autocontrasted_tensor = F.autocontrast(input_tensor) # Autocontrast the input tensor ```                                                                                                                                                                                                                                                                                                                                             |
| `v2.functional.adjust_contrast(inpt, ...)`                                 | Adjust contrast of an image.                                                                                                                                                          | ```python contrast_adjusted_tensor = F.adjust_contrast(input_tensor, contrast_factor=0.5) # Adjust the contrast of the input tensor with a factor of 0.5 ```                                                                                                                                                                                                                                                                                  |
| `v2.functional.equalize(inpt)`                                             | Equalize the histogram of the given image or video.                                                                                                                                   | ```python equalized_tensor = F.equalize(input_tensor) # Equalize the histogram of the input tensor ```                                                                                                                                                                                                                                                                                                                                        |
| `v2.functional.adjust_brightness(inpt, ...)`                               | Adjust brightness of an image.                                                                                                                                                       | ```python brightness_adjusted_tensor = F.adjust_brightness(input_tensor, brightness_factor=0.5) # Adjust the brightness of the input tensor with a factor of 0.5 ```                                                                                                                                                                                                                                                                        |
| `v2.functional.adjust_saturation(inpt, ...)`                                | Adjust color saturation of an image.                                                                                                                                                  | ```python saturation_adjusted_tensor = F.adjust_saturation(input_tensor, saturation_factor=0.5) # Adjust the saturation of the input tensor with a factor of 0.5 ```                                                                                                                                                                                                                                                                         |
| `v2.functional.adjust_hue(inpt, hue_factor)`                               | Adjust hue of an image.                                                                                                                                                               | ```python hue_adjusted_tensor = F.adjust_hue(input_tensor, hue_factor=0.5) # Adjust the hue of the input tensor with a factor of 0.5 ```                                                                                                                                                                                                                                                                                                        |
| `v2.functional.adjust_gamma(inpt, gamma[, gain])`                          | Adjust gamma of an image.                                                                                                                                                             | ```python gamma_adjusted_tensor = F.adjust_gamma(input_tensor, gamma=0.5) # Adjust the gamma of the input tensor with a value of 0.5 ```                                                                                                                                                                                                                                                                                                        |

Here are the remaining transformations along with example snippets:

| Function                                                                                              | Description                                                                                                                                                                         | Example                                                                                                                                                                                                                                                                                                                                                                                                                            |
|-------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Composition**                                                                                                                                                                                                                                                                                                                                                                                                                              |
| `v2.Compose(transforms)` | Composes several transforms together. | ```python from torchvision.transforms import v2 v2_transforms = v2.Compose([ v2.Resize(256), v2.RandomCrop(224), v2.RandomHorizontalFlip(), ]) # Compose several transforms together ``` |
| `v2.RandomApply(transforms[, p])` | Apply randomly a list of transformations with a given probability. | ```python randomly_applied_transforms = v2.RandomApply([v2.RandomRotation(10), v2.RandomResizedCrop(224)], p=0.5) # Apply the whole list of transformations with a probability of 0.5 ``` |
| `v2.RandomChoice(transforms[, p])` | Apply single transformation randomly picked from a list. | ```python randomly_chosen_transform = v2.RandomChoice([v2.RandomHorizontalFlip(), v2.RandomVerticalFlip()]) # Randomly pick and apply one transformation from the list ``` |
| `v2.RandomOrder(transforms)` | Apply a list of transformations in a random order. | ```python randomly_ordered_transforms = v2.RandomOrder([v2.RandomRotation(10), v2.RandomResizedCrop(224)]) # Apply a list of transformations in a random order ``` |
| **Miscellaneous**                                                                                                                                                                                                                                                                                                                                                                                                                            |
| `v2.LinearTransformation(...)`                                                                        | Transform a tensor image or video with a square transformation matrix and a mean_vector computed offline.                                                                         | ```python linear_transformed_tensor = v2.LinearTransformation(transformation_matrix, mean_vector)(input_tensor) # Transform the input tensor using a linear transformation ```                                                                                                                                                                                                                                                    |
| `v2.Normalize(mean, std[, inplace])`                                                                  | Normalize a tensor image or video with mean and standard deviation.                                                                                                                 | ```python normalized_tensor = v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])(input_tensor) # Normalize the input tensor with the given mean and standard deviation ```                                                                                                                                                                                                                                         |
| `v2.RandomErasing([p, scale, ratio, value, ...])`                                                      | Randomly select a rectangle region in the input image or video and erase its pixels.                                                                                                | ```python randomly_erased_tensor = v2.RandomErasing(p=0.5)(input_tensor) # Randomly erase pixels from the input tensor with a probability of 0.5 ```                                                                                                                                                                                                                                                                              |
| `v2.Lambda(lambd, *types)` | Apply a user-defined function as a transform. | ```python from torchvision.transforms.v2 import functional as F lambda_transformed_tensor = v2.Lambda(lambda x: F.adjust_brightness(x, 0.5), torch.Tensor)(input_tensor) # Apply a user-defined brightness adjustment; the types are passed positionally ``` |
| **Functionals**                                                                                                                                                                                                                                                                                                                                                                                                                              |
| `v2.functional.normalize(inpt, mean, std[, ...])`                                                      | Normalize the input tensor with mean and standard deviation.                                                                                                                       | ```python normalized_tensor = v2.functional.normalize(input_tensor, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) # Normalize the input tensor with the given mean and standard deviation ```                                                                                                                                                                                                                               |
| `v2.functional.erase(inpt, i, j, h, w, v[, ...])`                                                      | Erase a rectangle region in the input image or video.                                                                                                                               | ```python erased_tensor = v2.functional.erase(input_tensor, i=0, j=0, h=100, w=100, v=0) # Erase a rectangle region in the input tensor ```                                                                                                                                                                                                                                                                                     |
| `v2.functional.clamp_bounding_boxes(inpt[, ...])` | Clamp bounding boxes to their corresponding image dimensions. | ```python clamped_boxes = v2.functional.clamp_bounding_boxes(boxes) # Clamp boxes, a tv_tensors.BoundingBoxes, to its canvas size; a plain tensor also needs format and canvas_size ``` |
| `v2.functional.uniform_temporal_subsample(...)` | Uniformly subsample num_samples indices from the temporal dimension of the video. | ```python subsampled_video = v2.functional.uniform_temporal_subsample(video_tensor, num_samples=16) # Uniformly subsample 16 frames from a video tensor of shape (..., T, C, H, W) ``` |
| **Conversion**                                                                                                                                                                                                                                                                                                                                                                                                                                |
| `v2.ToImage()` | Convert a tensor, ndarray, or PIL Image to Image; this does not scale values. | ```python image = v2.ToImage()(input_tensor) # Convert the input tensor to an Image object without scaling values ``` |
| `v2.PILToTensor()`                                                                                   | Convert a PIL Image to a tensor of the same type - this does not scale values.                                                                                                     | ```python tensor = v2.PILToTensor()(pil_image) # Convert the input PIL Image to a tensor of the same type without scaling values ```                                                                                                                                                                                                                                                                                              |
| `v2.ToPILImage([mode])`                                                                              | Convert a tensor or an ndarray to PIL Image.                                                                                                                                       | ```python pil_image = v2.ToPILImage()(tensor) # Convert the input tensor to a PIL Image ```                                                                                                                                                                                                                                                                                                                                           |
| `v2.ToDtype(dtype[, scale])`                                                                         | Convert the input to a specific dtype, optionally scaling the values for images or videos.                                                                                       | ```python dtype_tensor = v2.ToDtype(torch.float32, scale=True)(input_tensor) # Convert the input tensor to float32 and scale the values ```                                                                                                                                                                                                                                                                                         |
| `v2.ConvertBoundingBoxFormat(format)` | Convert bounding box coordinates to the given format. | ```python converted_boxes = v2.ConvertBoundingBoxFormat("XYXY")(boxes) # Convert boxes, a tv_tensors.BoundingBoxes, to the XYXY format; plain tensors pass through unchanged ``` |
| **Auto-Augmentation**                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `v2.AutoAugment([policy, interpolation, fill])` | AutoAugment data augmentation method based on "AutoAugment: Learning Augmentation Strategies from Data". | ```python from torchvision.transforms import AutoAugmentPolicy autoaugmented_image = v2.AutoAugment(policy=AutoAugmentPolicy.IMAGENET)(input_image) # Apply AutoAugment to the input image using the ImageNet policy ``` |
| `v2.RandAugment([num_ops, magnitude, ...])`                                                          | RandAugment data augmentation method based on "RandAugment: Practical automated data augmentation with a reduced search space".                                                      | ```python randaugmented_image = v2.RandAugment(num_ops=2, magnitude=10)(input_image) # Apply RandAugment to the input image with 2 operations and magnitude 10 ```                                                                                                                                                                                                                                                                     |
| `v2.TrivialAugmentWide([num_magnitude_bins, ...])` | Dataset-independent data-augmentation with TrivialAugment Wide, as described in "TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation". | ```python augmented_image = v2.TrivialAugmentWide(num_magnitude_bins=20)(input_image) # Apply TrivialAugment Wide to the input image with 20 magnitude bins ``` |
| `v2.AugMix([severity, mixture_width, ...])`                                                           | AugMix data augmentation method based on "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty".                                                          | ```python augmixed_image = v2.AugMix(severity=3, mixture_width=3)(input_image) # Apply AugMix to the input image with severity 3 and mixture width 3 ```                                                                                                                                                                                                                                                                              |


| Function                                     | Description                                                                                           | Example                                                                                                                                   |
|----------------------------------------------|-------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
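| v2.CutMix(\*[, alpha, labels_getter])       | Apply CutMix to the provided batch of images and labels. `num_classes` is needed to one-hot encode integer labels. | ```python v2.CutMix(num_classes=10, alpha=0.5)```                                                                                       |
| v2.MixUp(\*[, alpha, labels_getter])        | Apply MixUp to the provided batch of images and labels. `num_classes` is needed to one-hot encode integer labels.  | ```python v2.MixUp(num_classes=10, alpha=0.2)```                                                                                        |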
| **Developer tools**                         |                                                                                                       |                                                                                                                                           |
| v2.functional.register_kernel(functional, ...) | Decorate a kernel to register it for a functional and a (custom) tv_tensor type.                    | ```python v2.functional.register_kernel(functional, ...)```                                                                              |
| **V1 API Reference**                        |                                                                                                       |                                                                                                                                           |
| **Geometry**                                |                                                                                                       |                                                                                                                                           |
| Resize(size\[, interpolation, max_size, ...]) | Resize the input image to the given size.                                                            | ```python Resize((256, 256))```                                                                                                          |
| RandomCrop(size\[, padding, pad_if_needed, ...]) | Crop the given image at a random location.                                                          | ```python RandomCrop((224, 224))```                                                                                                      |
| RandomResizedCrop(size\[, scale, ratio, ...]) | Crop a random portion of image and resize it to a given size.                                         | ```python RandomResizedCrop((224, 224))```                                                                                               |
| CenterCrop(size)                            | Crops the given image at the center.                                                                 | ```python CenterCrop((224, 224))```                                                                                                       |
| FiveCrop(size)                              | Crop the given image into four corners and the central crop.                                          | ```python FiveCrop((224, 224))```                                                                                                         |
| TenCrop(size\[, vertical_flip])             | Crop the given image into four corners and the central crop plus the flipped version of these.        | ```python TenCrop((224, 224))```                                                                                                          |
| Pad(padding\[, fill, padding_mode])         | Pad the given image on all sides with the given "pad" value.                                          | ```python Pad(10)```                                                                                                                      |
| RandomRotation(degrees\[, interpolation, ...]) | Rotate the image by angle.                                                                         | ```python RandomRotation(45)```                                                                                                           |
| RandomAffine(degrees\[, translate, scale, ...]) | Random affine transformation of the image keeping center invariant.                                   | ```python RandomAffine(30, translate=(0.1, 0.1), scale=(0.8, 1.2))```                                                                     |
| RandomPerspective(\[distortion_scale, p, ...]) | Performs a random perspective transformation of the given image with a given probability.            | ```python RandomPerspective(distortion_scale=0.5, p=0.5)```                                                                               |
| ElasticTransform(\[alpha, sigma, ...])       | Transform a tensor image with elastic transformations.                                                | ```python ElasticTransform(alpha=120, sigma=10)```                                                                                       |
| RandomHorizontalFlip(\[p])                  | Horizontally flip the given image randomly with a given probability.                                  | ```python RandomHorizontalFlip(p=0.5)```                                                                                                 |
| RandomVerticalFlip(\[p])                    | Vertically flip the given image randomly with a given probability.                                    | ```python RandomVerticalFlip(p=0.5)```                                                                                                   |



| Function                                         | Description                                                                                                             | Example                                                                                              |
|--------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| ColorJitter(\[brightness, contrast, ...])        | Randomly change the brightness, contrast, saturation, and hue of an image.                                             | ```python ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.1)```                      |
| Grayscale(\[num_output_channels])                | Convert image to grayscale.                                                                                            | ```python Grayscale()```                                                                             |
| RandomGrayscale(\[p])                            | Randomly convert image to grayscale with a probability of p (default 0.1).                                              | ```python RandomGrayscale(p=0.3)```                                                                  |
| GaussianBlur(kernel_size\[, sigma])              | Blurs the image with a randomly chosen Gaussian blur.                                                                  | ```python GaussianBlur(kernel_size=(3, 3), sigma=(0.1, 2.0))```                                       |
| RandomInvert(\[p])                               | Inverts the colors of the given image randomly with a given probability.                                                | ```python RandomInvert(p=0.2)```                                                                     |
| RandomPosterize(bits\[, p])                      | Posterize the image randomly with a given probability by reducing the number of bits for each color channel.           | ```python RandomPosterize(bits=4, p=0.5)```                                                          |
| RandomSolarize(threshold\[, p])                  | Solarize the image randomly with a given probability by inverting all pixel values above a threshold.                 | ```python RandomSolarize(threshold=128, p=0.3)```                                                    |
| RandomAdjustSharpness(sharpness_factor\[, p])    | Adjust the sharpness of the image randomly with a given probability.                                                    | ```python RandomAdjustSharpness(sharpness_factor=2.0, p=0.5)```                                       |
| RandomAutocontrast(\[p])                         | Autocontrast the pixels of the given image randomly with a given probability.                                           | ```python RandomAutocontrast(p=0.4)```                                                                |
| RandomEqualize(\[p])                             | Equalize the histogram of the given image randomly with a given probability.                                            | ```python RandomEqualize(p=0.3)```                                                                   |




| Function                                         | Description                                                                                                             | Example                                                                                              |
|--------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| Compose(transforms)                             | Composes several transforms together.                                                                                  | ```python Compose([RandomHorizontalFlip(), RandomRotation(10)])```                                    |
| RandomApply(transforms\[, p])                   | Apply randomly a list of transformations with a given probability.                                                     | ```python RandomApply([ColorJitter(), RandomRotation(20)], p=0.5)```                                  |
| RandomChoice(transforms\[, p])                  | Apply a single transformation randomly picked from a list.                                                              | ```python RandomChoice([RandomGrayscale(), RandomPosterize(bits=4)])```                               |
| RandomOrder(transforms)                         | Apply a list of transformations in a random order.                                                                     | ```python RandomOrder([RandomHorizontalFlip(), RandomRotation(10)])```                                  |
| LinearTransformation(transformation_matrix, ...) | Transform a tensor image with a square transformation matrix and a mean_vector computed offline.                     | ```python LinearTransformation(transformation_matrix, mean_vector)```                                 |
| Normalize(mean, std\[, inplace])                | Normalize a tensor image with mean and standard deviation.                                                             | ```python Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])```                         |
| RandomErasing(\[p, scale, ratio, value, inplace]) | Randomly selects a rectangle region in a torch.Tensor image and erases its pixels.                                    | ```python RandomErasing(p=0.5, scale=(0.02, 0.4), ratio=(0.3, 3.3), value='random')```                |
| Lambda(lambd)                                   | Apply a user-defined lambda as a transform.                                                                            | ```python Lambda(lambda x: x + 10)```                                                                |





| Function                                     | Description                                                                                                                               | Example                                                                                                                |
|----------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
| ToPILImage(\[mode])                         | Convert a tensor or an ndarray to a PIL Image.                                                                                            | ```python ToPILImage()```                                                                                            |
| ToTensor()                                  | Convert a PIL Image or ndarray to a tensor and scale the values accordingly.                                                              | ```python ToTensor()```                                                                                               |
| PILToTensor()                               | Convert a PIL Image to a tensor of the same type - this does not scale values.                                                            | ```python PILToTensor()```                                                                                            |
| ConvertImageDtype(dtype)                   | Convert a tensor image to the given dtype and scale the values accordingly.                                                                | ```python ConvertImageDtype(torch.float32)```                                                                        |
| AutoAugmentPolicy(value)                   | AutoAugment policies learned on different datasets.                                                                                       | ```python AutoAugmentPolicy('imagenet')```                                                                           |
| AutoAugment(\[policy, interpolation, fill]) | AutoAugment data augmentation method based on "AutoAugment: Learning Augmentation Strategies from Data".                                  | ```python AutoAugment(policy=AutoAugmentPolicy.CIFAR10)```                                                           |
| RandAugment(\[num_ops, magnitude, ...])    | RandAugment data augmentation method based on "RandAugment: Practical automated data augmentation with a reduced search space".            | ```python RandAugment(num_ops=2, magnitude=5)```                                                                     |
| TrivialAugmentWide(\[num_magnitude_bins, ...]) | Dataset-independent data-augmentation with TrivialAugment Wide, as described in "TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation". | ```python TrivialAugmentWide(num_magnitude_bins=30)```                                                              |
| AugMix(\[severity, mixture_width, ...])    | AugMix data augmentation method based on "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty".                 | ```python AugMix(severity=3, mixture_width=3, chain_depth=-1, alpha=1.0)```                                              |




| Function                                     | Description                                                                                                                               | Example                                                                                              |
|----------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| adjust_brightness(img, brightness_factor)   | Adjust the brightness of an image.                                                                                                        | ```python adjust_brightness(image, 0.5)```                                                            |
| adjust_contrast(img, contrast_factor)       | Adjust the contrast of an image.                                                                                                          | ```python adjust_contrast(image, 2.0)```                                                               |
| adjust_gamma(img, gamma[, gain])            | Perform gamma correction on an image.                                                                                                     | ```python adjust_gamma(image, gamma=1.5)```                                                           |
| adjust_hue(img, hue_factor)                 | Adjust the hue of an image.                                                                                                               | ```python adjust_hue(image, hue_factor=0.2)```                                                        |
| adjust_saturation(img, saturation_factor)   | Adjust the color saturation of an image.                                                                                                  | ```python adjust_saturation(image, saturation_factor=1.5)```                                           |
| adjust_sharpness(img, sharpness_factor)     | Adjust the sharpness of an image.                                                                                                         | ```python adjust_sharpness(image, sharpness_factor=2.0)```                                             |
| affine(img, angle, translate, scale, shear) | Apply an affine transformation on the image keeping the image center invariant.                                                            | ```python affine(image, angle=30, translate=(10, 10), scale=0.5, shear=0.5)```                          |
| autocontrast(img)                           | Maximize the contrast of an image by remapping its pixels per channel.                                                                    | ```python autocontrast(image)```                                                                       |
| center_crop(img, output_size)               | Crop the given image at the center.                                                                                                       | ```python center_crop(image, output_size=(100, 100))```                                                |
| convert_image_dtype(image[, dtype])         | Convert a tensor image to the given dtype and scale the values accordingly. This function does not support PIL Image.                      | ```python convert_image_dtype(image, dtype=torch.float32)```                                           |
| crop(img, top, left, height, width)         | Crop the given image at the specified location and output size.                                                                            | ```python crop(image, top=10, left=20, height=100, width=100)```                                        |
| equalize(img)                               | Equalize the histogram of an image.                                                                                                       | ```python equalize(image)```                                                                           |
| erase(img, i, j, h, w, v[, inplace])        | Erase the input Tensor Image with the given value (`v` is documented as a tensor).                                                        | ```python erase(image, i=10, j=10, h=50, w=50, v=torch.tensor(0), inplace=True)```                     |
| five_crop(img, size)                        | Crop the given image into four corners and the central crop.                                                                               | ```python five_crop(image, size=(100, 100))```                                                          |
| gaussian_blur(img, kernel_size[, sigma])    | Performs Gaussian blurring on the image by the given kernel.                                                                              | ```python gaussian_blur(image, kernel_size=(5, 5), sigma=(0.1, 2.0))```                                 |
| get_dimensions(img)                        | Returns the dimensions of an image as [channels, height, width].                                                                           | ```python get_dimensions(image)```                                                                     |
| get_image_num_channels(img)                | Returns the number of channels of an image.                                                                                               | ```python get_image_num_channels(image)```                                                             |
| get_image_size(img)                        | Returns the size of an image as [width, height].                                                                                          | ```python get_image_size(image)```                                                                     |
| hflip(img)                                 | Horizontally flip the given image.                                                                                                        | ```python hflip(image)```                                                                              |
| invert(img)                                | Invert the colors of an RGB/grayscale image.                                                                                              | ```python invert(image)```                                                                             |
| normalize(tensor, mean, std[, inplace])    | Normalize a float tensor image with mean and standard deviation.                                                                           | ```python normalize(tensor_image, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])```                         |
| pad(img, padding[, fill, padding_mode])    | Pad the given image on all sides with the given "pad" value.                                                                              | ```python pad(image, padding=10, fill=0, padding_mode='constant')```                                    |
| perspective(img, startpoints, endpoints[, ...]) | Perform a perspective transform of the given image.                                                                                      | ```python perspective(image, startpoints, endpoints)```                                                 |
| pil_to_tensor(pic)                         | Convert a PIL Image to a tensor of the same type.                                                                                         | ```python pil_to_tensor(pil_image)```                                                                  |
| posterize(img, bits)                        | Posterize an image by reducing the number of bits for each color channel.                                                                  | ```python posterize(image, bits=4)```                                                                  |
| resize(img, size[, interpolation, max_size, ...]) | Resize the input image to the given size. `max_size` is only valid when `size` is an int (the length of the smaller edge).                | ```python resize(image, size=100, interpolation=InterpolationMode.BICUBIC, max_size=200)```              |
| resized_crop(img, top, left, height, width, size) | Crop the given image and resize it to the desired size.                                                                                  | ```python resized_crop(image, top=10, left=10, height=100, width=100, size=(50, 50))```                 |
| rgb_to_grayscale(img[, num_output_channels]) | Convert an RGB image to the grayscale version of the image.                                                                               | ```python rgb_to_grayscale(image)```                                                                   |
| rotate(img, angle[, interpolation, expand, ...]) | Rotate the image by the given angle.                                                                                                    | ```python rotate(image, angle=30, interpolation=InterpolationMode.BILINEAR, expand=True)```              |
| solarize(img, threshold)                    | Solarize an RGB/grayscale image by inverting all pixel values above a threshold.                                                           | ```python solarize(image, threshold=128)```                                                             |
| ten_crop(img, size[, vertical_flip])       | Generate ten cropped images from the given image.                                                                                         | ```python ten_crop(image, size=(100, 100), vertical_flip=True)```                                       |
| to_grayscale(img[, num_output_channels])   | Convert a PIL image of any mode (RGB, HSV, LAB, etc.) to the grayscale version of the image.                                              | ```python to_grayscale(image)```                                                                        |
| to_pil_image(pic[, mode])                  | Convert a tensor or an ndarray to a PIL Image.                                                                                            | ```python to_pil_image(tensor_image)```                                                                |
| to_tensor(pic)                             | Convert a PIL Image or numpy.ndarray to a tensor.                                                                                         | ```python to_tensor(pil_image)```                                                                      |
| vflip(img)                                 | Vertically flip the given image.                                                                                                          | ```python vflip(image)```                                                                               |



-----
## TVTENSOR



| Class               | Description                                                                                                                                                                                                                                                   | Example Snippet                                                                                                                                                                      |
|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Image               | TVTensor subclass for representing images.                                                                                                                                                                                                                   | ```python image = Image(data, dtype=torch.float32, device='cuda')```                                                                                                                |
| Video               | TVTensor subclass for video data.                                                                                                                                                                                                                            | ```python video = Video(data, dtype=torch.float32, device='cuda')```                                                                                                                |
| BoundingBoxes       | TVTensor subclass for bounding box data, with a coordinate format (`XYXY`, `XYWH`, or `CXCYWH`) and a canvas size given as (height, width).                                                                                                                   | ```python boxes = BoundingBoxes(data, format=BoundingBoxFormat.XYWH, canvas_size=(800, 600))```                                                                                     |
| Mask                | TVTensor subclass for segmentation and detection masks.                                                                                                                                                                                                      | ```python mask = Mask(data, dtype=torch.float32, device='cuda')```                                                                                                                  |
| TVTensor            | Base class for all TVTensors, providing common functionality and attributes.                                                                                                                                                                                 | N/A                                                                                                                                                                                  |
| set_return_type     | Function (also usable as a context manager) that sets whether torch operations on TVTensors return plain tensors or TVTensors.                                                                                                                               | N/A                                                                                                                                                                                  |
| wrap                | Function to convert a torch.Tensor into the same TVTensor subclass as another tensor, specified by the 'like' parameter.                                                                                                                                     | ```python wrapped_tensor = wrap(tensor, like=image)```                                                                                                                              |

These examples demonstrate the creation of TVTensors for various types of data, such as images, videos, bounding boxes, and masks, along with utilizing the `wrap` function to convert a regular torch.Tensor into a TVTensor.
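
To make the "coordinate format" and "canvas size" options of `BoundingBoxes` more concrete, here is a minimal pure-Python sketch of what converting a box from XYWH to XYXY and clamping it to a canvas involves. The helper names `xywh_to_xyxy` and `clamp_to_canvas` are hypothetical and are not part of torchvision; the only library convention assumed is that `canvas_size` is given as `(height, width)`.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def clamp_to_canvas(box, canvas_size):
    """Clip an XYXY box so that it lies inside a (height, width) canvas."""
    height, width = canvas_size
    x1, y1, x2, y2 = box
    return (
        min(max(x1, 0), width),
        min(max(y1, 0), height),
        min(max(x2, 0), width),
        min(max(y2, 0), height),
    )


# Illustrative values only
converted = xywh_to_xyxy((10, 20, 30, 40))
clamped = clamp_to_canvas((-5, 10, 900, 500), canvas_size=(600, 800))
```

In practice the library carries the format and canvas size as metadata on the tensor, so transforms can do this kind of bookkeeping for you.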

```python
from torchvision.models import resnet50, ResNet50_Weights

# Old weights with accuracy 76.130%
resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)

# New weights with accuracy 80.858%
resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# Best available weights (currently alias for IMAGENET1K_V2)
# Note that these weights may change across versions
resnet50(weights=ResNet50_Weights.DEFAULT)

# Strings are also supported
resnet50(weights="IMAGENET1K_V2")

# No weights - random initialization
resnet50(weights=None)
```
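
The idea that `DEFAULT` is an alias for the best available weights, and that strings are accepted in place of enum members, can be illustrated with a plain Python `Enum`. This is only a sketch of the pattern: `DemoWeights`, `resolve`, and the checkpoint file names are hypothetical, and torchvision's actual weights enums carry more information (URLs, transforms, metadata).

```python
from enum import Enum


class DemoWeights(Enum):
    # (hypothetical checkpoint name, top-1 accuracy quoted above)
    IMAGENET1K_V1 = ("demo_v1.pth", 76.130)
    IMAGENET1K_V2 = ("demo_v2.pth", 80.858)
    # Same value as IMAGENET1K_V2, so Enum treats DEFAULT as an alias of it
    DEFAULT = ("demo_v2.pth", 80.858)


def resolve(weights):
    """Accept None, a member name as a string, or an enum member."""
    if weights is None:
        return None  # no weights: random initialization
    if isinstance(weights, str):
        return DemoWeights[weights]  # look up by member name
    return weights


best = resolve("DEFAULT")
member_names = [member.name for member in DemoWeights]  # aliases are skipped
```

Because `DEFAULT` is an alias rather than a separate member, code that pins `IMAGENET1K_V2` keeps working even if a later release points `DEFAULT` at newer weights.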

```python
from torchvision.models import resnet50, ResNet50_Weights

# Using pretrained weights:
resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
resnet50(weights="IMAGENET1K_V1")
resnet50(pretrained=True)  # deprecated
resnet50(True)  # deprecated

# Using no weights:
resnet50(weights=None)
resnet50()
resnet50(pretrained=False)  # deprecated
resnet50(False)  # deprecated
```

```python
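from torchvision.models import ResNet50_Weights

# Initialize the weight-specific inference transforms
weights = ResNet50_Weights.DEFAULT
preprocess = weights.transforms()

# Apply them to the input image (`img` is an image loaded beforehand)
img_transformed = preprocess(img)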
```

```python
from torchvision.models import resnet50, ResNet50_Weights

# Initialize model
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)

# Set model to eval mode
model.eval()
```

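```python
import torchvision
from torchvision.models import get_model, get_model_weights, get_weight, list_models
from torchvision.models.quantization import MobileNet_V3_Large_QuantizedWeights

# List available models
all_models = list_models()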
classification_models = list_models(module=torchvision.models)

# Initialize models
m1 = get_model("mobilenet_v3_large", weights=None)
m2 = get_model("quantized_mobilenet_v3_large", weights="DEFAULT")

# Fetch weights
weights = get_weight("MobileNet_V3_Large_QuantizedWeights.DEFAULT")
assert weights == MobileNet_V3_Large_QuantizedWeights.DEFAULT

weights_enum = get_model_weights("quantized_mobilenet_v3_large")
assert weights_enum == MobileNet_V3_Large_QuantizedWeights

weights_enum2 = get_model_weights(torchvision.models.quantization.mobilenet_v3_large)
assert weights_enum == weights_enum2
```



| Function                                | Description                                                                                                                                    | Example Snippet                                                                                                 |
|-----------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| get_model(name, **config)              | Takes a model name and configuration and returns an instantiated model.                                                                        | ```python model = get_model("resnet50", weights="DEFAULT")```                                                  |
| get_model_weights(name)                | Returns the weights enum class associated with the given model.                                                                                 | ```python weights_enum = get_model_weights("resnet50")```                                                      |
| get_weight(name)                       | Gets the weights enum value by its full name.                                                                                                  | ```python weight = get_weight("ResNet18_Weights.IMAGENET1K_V1")```                                             |
| list_models([module, include, exclude]) | Returns a list with the names of registered models.                                                                                             | ```python model_list = list_models(include="resnet*")```                                                       |

These functions provide utilities for managing models, including instantiating a model with specified configurations, obtaining weights enums, retrieving weights by name, and listing registered models.
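
The `include` and `exclude` arguments of `list_models` take shell-style wildcard patterns. The sketch below shows that kind of filtering on a small hypothetical registry using Python's standard `fnmatch` module; `REGISTERED` and `filter_models` are illustrative names, and the real function may differ in details such as accepting several patterns at once.

```python
from fnmatch import fnmatchcase

# Hypothetical registry; the real one is populated by torchvision itself
REGISTERED = [
    "alexnet",
    "mobilenet_v3_large",
    "quantized_resnet50",
    "resnet18",
    "resnet50",
    "resnext50_32x4d",
]


def filter_models(names, include=None, exclude=None):
    """Keep names matching `include`, then drop names matching `exclude`."""
    selected = [n for n in names if include is None or fnmatchcase(n, include)]
    if exclude is not None:
        selected = [n for n in selected if not fnmatchcase(n, exclude)]
    return sorted(selected)


resnets = filter_models(REGISTERED, include="resnet*")
without_50 = filter_models(REGISTERED, include="*res*", exclude="*50")
```

Note that a pattern such as `"resnet*"` matches only names that start with `resnet`, so `resnext50_32x4d` and `quantized_resnet50` are not selected by it.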

```python
import torch

# Option 1: passing weights param as string
model = torch.hub.load("pytorch/vision", "resnet50", weights="IMAGENET1K_V2")

# Option 2: passing weights param as enum
weights = torch.hub.load("pytorch/vision", "get_weight", name="ResNet50_Weights.IMAGENET1K_V2")
model = torch.hub.load("pytorch/vision", "resnet50", weights=weights)
```

### Image Classification

1. AlexNet
2. ConvNeXt
3. DenseNet
4. EfficientNet
5. EfficientNetV2
6. GoogLeNet
7. Inception V3
8. MaxVit
9. MNASNet
10. MobileNet V2
11. MobileNet V3
12. RegNet
13. ResNet
14. ResNeXt
15. ShuffleNet V2
16. SqueezeNet
17. SwinTransformer
18. VGG
19. VisionTransformer
20. Wide ResNet

You can use these models for various classification tasks, either with pre-trained weights or trained from scratch.

```python
from torchvision.io import read_image
from torchvision.models import resnet50, ResNet50_Weights

img = read_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")

# Step 1: Initialize model with the best available weights
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Use the model and print the predicted category
prediction = model(batch).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}%")

```
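
Step 4 above turns raw model outputs into a probability per class with softmax, then picks the most likely class. A minimal pure-Python sketch of that post-processing, using made-up logits and a hypothetical three-entry `categories` list in place of `weights.meta["categories"]`:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


categories = ["tabby", "golden retriever", "sports car"]  # hypothetical labels
logits = [1.0, 3.0, 0.5]  # made-up model outputs, one per category

probs = softmax(logits)
class_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
score = probs[class_id]
label = f"{categories[class_id]}: {100 * score:.1f}%"
print(label)
```

The probabilities always sum to 1, so the printed percentage is the model's confidence relative to all other categories, not an absolute measure of correctness.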


| Weight                                          | Acc@1    | Acc@5    | Params   | GFLOPS  | Recipe |
|-------------------------------------------------|----------|----------|----------|---------|--------|
| AlexNet_Weights.IMAGENET1K_V1                  | 56.522   | 79.066   | 61.1M    | 0.71    | link   |
| ConvNeXt_Base_Weights.IMAGENET1K_V1            | 84.062   | 96.87    | 88.6M    | 15.36   | link   |
| ConvNeXt_Large_Weights.IMAGENET1K_V1           | 84.414   | 96.976   | 197.8M   | 34.36   | link   |
| ConvNeXt_Small_Weights.IMAGENET1K_V1           | 83.616   | 96.65    | 50.2M    | 8.68    | link   |
| ConvNeXt_Tiny_Weights.IMAGENET1K_V1            | 82.52    | 96.146   | 28.6M    | 4.46    | link   |
| DenseNet121_Weights.IMAGENET1K_V1              | 74.434   | 91.972   | 8.0M     | 2.83    | link   |
| DenseNet161_Weights.IMAGENET1K_V1              | 77.138   | 93.56    | 28.7M    | 7.73    | link   |
| DenseNet169_Weights.IMAGENET1K_V1              | 75.6     | 92.806   | 14.1M    | 3.36    | link   |
| DenseNet201_Weights.IMAGENET1K_V1              | 76.896   | 93.37    | 20.0M    | 4.29    | link   |
| EfficientNet_B0_Weights.IMAGENET1K_V1          | 77.692   | 93.532   | 5.3M     | 0.39    | link   |
| EfficientNet_B1_Weights.IMAGENET1K_V1          | 78.642   | 94.186   | 7.8M     | 0.69    | link   |
| EfficientNet_B1_Weights.IMAGENET1K_V2          | 79.838   | 94.934   | 7.8M     | 0.69    | link   |
| EfficientNet_B2_Weights.IMAGENET1K_V1          | 80.608   | 95.31    | 9.1M     | 1.09    | link   |
| EfficientNet_B3_Weights.IMAGENET1K_V1          | 82.008   | 96.054   | 12.2M    | 1.83    | link   |
| EfficientNet_B4_Weights.IMAGENET1K_V1          | 83.384   | 96.594   | 19.3M    | 4.39    | link   |
| EfficientNet_B5_Weights.IMAGENET1K_V1          | 83.444   | 96.628   | 30.4M    | 10.27   | link   |
| EfficientNet_B6_Weights.IMAGENET1K_V1          | 84.008   | 96.916   | 43.0M    | 19.07   | link   |
| EfficientNet_B7_Weights.IMAGENET1K_V1          | 84.122   | 96.908   | 66.3M    | 37.75   | link   |
| EfficientNet_V2_L_Weights.IMAGENET1K_V1       | 85.808   | 97.788   | 118.5M   | 56.08   | link   |
| EfficientNet_V2_M_Weights.IMAGENET1K_V1       | 85.112   | 97.156   | 54.1M    | 24.58   | link   |
| EfficientNet_V2_S_Weights.IMAGENET1K_V1       | 84.228   | 96.878   | 21.5M    | 8.37    | link   |
| GoogLeNet_Weights.IMAGENET1K_V1               | 69.778   | 89.53    | 6.6M     | 1.5     | link   |
| Inception_V3_Weights.IMAGENET1K_V1            | 77.294   | 93.45    | 27.2M    | 5.71    | link   |
| MNASNet0_5_Weights.IMAGENET1K_V1              | 67.734   | 87.49    | 2.2M     | 0.1     | link   |
| MNASNet0_75_Weights.IMAGENET1K_V1             | 71.18    | 90.496   | 3.2M     | 0.21    | link   |
| MNASNet1_0_Weights.IMAGENET1K_V1              | 73.456   | 91.51    | 4.4M     | 0.31    | link   |
| MNASNet1_3_Weights.IMAGENET1K_V1              | 76.506   | 93.522   | 6.3M     | 0.53    | link   |
| MaxVit_T_Weights.IMAGENET1K_V1                | 83.7     | 96.722   | 30.9M    | 5.56    | link   |
| MobileNet_V2_Weights.IMAGENET1K_V1            | 71.878   | 90.286   | 3.5M     | 0.3     | link   |
| MobileNet_V2_Weights.IMAGENET1K_V2            | 72.154   | 90.822   | 3.5M     | 0.3     | link   |
| MobileNet_V3_Large_Weights.IMAGENET1K_V1     | 74.042   | 91.34    | 5.5M     | 0.22    | link   |
| MobileNet_V3_Large_Weights.IMAGENET1K_V2     | 75.274   | 92.566   | 5.5M     | 0.22    | link   |
| MobileNet_V3_Small_Weights.IMAGENET1K_V1     | 67.668   | 87.556   | 2.9M     | 0.07    | link   |
| MobileNet_V3_Small_Weights.IMAGENET1K_V2     | 68.602   | 88.244   | 2.9M     | 0.07    | link   |
| NFNet_F0_Weights.IMAGENET1K_V1               | 83.308   | 96.46    | 88.8M    | 15.88   | link   |
| NFNet_F1_Weights.IMAGENET1K_V1               | 84.14    | 96.748   | 94.2M    | 16.83   | link   |
| NFNet_F2_Weights.IMAGENET1K_V1               | 84.92    | 97.266   | 119.4M   | 21.38   | link   |
| NFNet_F3_Weights.IMAGENET1K_V1               | 85.436   | 97.49    | 166.5M   | 29.9    | link   |
| NFNet_F4_Weights.IMAGENET1K_V1               | 85.65    | 97.592   | 218.6M   | 39.29   | link   |
| NFNet_F5_Weights.IMAGENET1K_V1               | 85.808   | 97.65    | 313.9M   | 56.32   | link   |
| NFNet_F6_Weights.IMAGENET1K_V1               | 86.132   | 97.736   | 406.3M   | 73.14   | link   |
| NFNet_F7_Weights.IMAGENET1K_V1               | 86.314   | 97.814   | 528.6M   | 95.17   | link   |
| NFNet_L0_Weights.IMAGENET1K_V1               | 85.986   | 97.698   | 143.9M   | 40.56   | link   |
| NFNet_L1_Weights.IMAGENET1K_V1               | 86.678   | 97.94    | 170.4M   | 48.12   | link   |
| NFNet_L2_Weights.IMAGENET1K_V1               | 86.858   | 97.984   | 221.8M   | 62.44   | link   |
| NFNet_L3_Weights.IMAGENET1K_V1               | 86.994   | 98.006   | 295.0M   | 83.2    | link   |
| NFNet_L4_Weights.IMAGENET1K_V1               | 87.034   | 97.982   | 403.2M   | 113.2   | link   |
| NFNet_L5_Weights.IMAGENET1K_V1               | 87.138   | 97.99    | 588.1M   | 166.4   | link   |
| NFNet_L6_Weights.IMAGENET1K_V1               | 87.13    | 97.99    | 820.2M   | 244.7   | link   |
| NFNet_L7_Weights.IMAGENET1K_V1               | 87.094   | 97.982   | 1.1B     | 324.8   | link   |
| NFNet_L7_Weights.IMAGENET1K_V2               | 87.166   | 97.994   | 1.1B     | 324.8   | link   |
| NFNet_L7_Weights.IMAGENET1K_V3               | 87.256   | 98.002   | 1.1B     | 324.8   | link   |
| RegNetX_200MF_Weights.IMAGENET1K_V1          | 75.684   | 92.992   | 19.3M    | 4.11    | link   |
| RegNetX_400MF_Weights.IMAGENET1K_V1          | 78.696   | 94.442   | 25.6M    | 8.01    | link   |
| RegNetX_800MF_Weights.IMAGENET1K_V1          | 80.908   | 95.644   | 39.0M    | 15.88   | link   |
| RegNetX_1_6GF_Weights.IMAGENET1K_V1          | 82.95    | 96.56    | 63.6M    | 32.22   | link   |
| RegNetX_3_2GF_Weights.IMAGENET1K_V1          | 84.128   | 97.036   | 96.2M    | 49.14   | link   |
| RegNetX_8GF_Weights.IMAGENET1K_V1            | 85.356   | 97.53    | 169.6M   | 85.99   | link   |
| RegNetY_200MF_Weights.IMAGENET1K_V1          | 75.278   | 92.886   | 20.1M    | 4.18    | link   |
| RegNetY_400MF_Weights.IMAGENET1K_V1          | 79.6     | 94.752   | 29.6M    | 9.21    | link   |
| RegNetY_800MF_Weights.IMAGENET1K_V1          | 82.232   | 96.288   | 48.2M    | 18.27   | link   |
| RegNetY_1_6GF_Weights.IMAGENET1K_V1          | 84.316   | 97.134   | 83.9M    | 33.29   | link   |
| RegNetY_3_2GF_Weights.IMAGENET1K_V1          | 85.134   | 97.432   | 143.8M   | 58.08   | link   |
| RegNetY_8GF_Weights.IMAGENET1K_V1            | 86.364   | 97.882   | 251.9M   | 101.98  | link   |
| Res2Net_50_26w_4s_Weights.IMAGENET1K_V1      | 79.746   | 94.66    | 25.0M    | 4.37    | link   |
| Res2Net_50_26w_6s_Weights.IMAGENET1K_V1      | 80.668   | 95.198   | 25.5M    | 4.94    | link   |
| Res2Net_50_48w_2s_Weights.IMAGENET1K_V1      | 80.038   | 94.918   | 31.2M    | 7.17    | link   |
| Res2Net_50_14w_8s_Weights.IMAGENET1K_V1      | 79.566   | 94.598   | 24.4M    | 4.24    | link   |
| Res2Net_50_26w_8s_Weights.IMAGENET1K_V1      | 80.702   | 95.222   | 25.0M    | 4.93    | link   |
| Res2Net_101_26w_4s_Weights.IMAGENET1K_V1     | 80.528   | 95.144   | 45.6M    | 8.48    | link   |
| Res2Net_101_26w_6s_Weights.IMAGENET1K_V1     | 81.464   | 95.656   | 47.5M    | 9.75    | link   |
| Res2Net_101_26w_8s_Weights.IMAGENET1K_V1     | 81.812   | 95.778   | 45.6M    | 8.47    | link   |
| Res2Net_152_26w_4s_Weights.IMAGENET1K_V1     | 81.284   | 95.638   | 60.2M    | 11.73   | link   |
| Res2Net_152_26w_6s_Weights.IMAGENET1K_V1     | 81.882   | 95.798   | 62.1M    | 12.86   | link   |
| Res2Net_152_26w_8s_Weights.IMAGENET1K_V1     | 82.264   | 95.942   | 60.2M    | 11.73   | link   |
| ResNeSt_50_1s4x24d_Weights.IMAGENET1K_V1    | 80.64    | 95.332   | 27.5M    | 4.25    | link   |
| ResNeSt_50_2s1x64d_Weights.IMAGENET1K_V1    | 81.7     | 95.866   | 27.5M    | 4.26    | link   |
| ResNeSt_50_3s1x64d_Weights.IMAGENET1K_V1    | 81.946   | 95.972   | 27.5M    | 4.26    | link   |
| ResNeSt_50_3s2x40d_Weights.IMAGENET1K_V1    | 81.744   | 95.868   | 27.5M    | 4.25    | link   |
| ResNeSt_101_1s4x24d_Weights.IMAGENET1K_V1   | 82.576   | 96.218   | 47.0M    | 8.03    | link   |
| ResNeSt_200_1s2x40d_Weights.IMAGENET1K_V1   | 83.212   | 96.386   | 68.8M    | 11.34   | link   |
| ResNeSt_269_1s2x40d_Weights.IMAGENET1K_V1   | 83.596   | 96.514   | 110.7M   | 22.41   | link   |
| ResNet18_Weights.IMAGENET1K_V1                | 69.758   | 89.086   | 11.7M    | 1.82    | link   |
| ResNet34_Weights.IMAGENET1K_V1                | 73.498   | 91.464   | 21.8M    | 3.67    | link   |
| ResNet50_Weights.IMAGENET1K_V1                | 76.13    | 92.862   | 25.6M    | 4.12    | link   |
| ResNet50_Weights.IMAGENET1K_V2                | 80.858   | 95.434   | 25.6M    | 4.12    | link   |
| ResNet101_Weights.IMAGENET1K_V1               | 77.386   | 93.714   | 44.6M    | 7.85    | link   |
| ResNet152_Weights.IMAGENET1K_V1               | 78.312   | 94.234   | 60.2M    | 11.58   | link   |
| ResNet200_Weights.IMAGENET1K_V1               | 78.56    | 94.35    | 64.8M    | 15.47   | link   |
| ResNetV2_50_Weights.IMAGENET1K_V1            | 76.104   | 92.982   | 25.6M    | 4.12    | link   |
| ResNetV2_101_Weights.IMAGENET1K_V1           | 77.218   | 93.724   | 44.6M    | 7.85    | link   |
| ResNetV2_152_Weights.IMAGENET1K_V1           | 78.166   | 94.266   | 60.2M    | 11.58   | link   |
| ResNetV2_200_Weights.IMAGENET1K_V1           | 78.484   | 94.424   | 64.8M    | 15.47   | link   |
| ResNeXt50_32x4d_Weights.IMAGENET1K_V1        | 77.194   | 93.616   | 25.0M    | 4.26    | link   |
| ResNeXt50_32x4d_Weights.IMAGENET1K_V2        | 77.618   | 93.96    | 25.0M    | 4.26    | link   |
| ResNeXt50d_32x4d_Weights.IMAGENET1K_V1       | 77.124   | 93.614   | 25.0M    | 4.26    | link   |
| ResNeXt101_32x4d_Weights.IMAGENET1K_V1       | 78.956   | 94.668   | 44.2M    | 8.03    | link   |
| ResNeXt101_32x8d_Weights.IMAGENET1K_V1       | 79.306   | 94.908   | 88.8M    | 16.27   | link   |
| ResNeXt101_64x4d_Weights.IMAGENET1K_V1       | 78.78    | 94.638   | 44.2M    | 8.03    | link   |
| ResNeXt101_32x16d_Weights.IMAGENET1K_V1      | 79.186   | 94.858   | 177.7M   | 32.54   | link   |
| ResNeXt101_32x32d_Weights.IMAGENET1K_V1      | 79.412   | 94.968   | 355.4M   | 64.52   | link   |
| ResNeXt101_32x48d_Weights.IMAGENET1K_V1      | 79.64    | 95.108   | 533.1M   | 96.51   | link   |
| RinceNet_24_Weights.IMAGENET1K_V1             | 79.258   | 94.754   | 32.4M    | 5.24    | link   |
| RinceNet_36_Weights.IMAGENET1K_V1             | 80.306   | 95.278   | 46.5M    | 7.35    | link   |
| RinceNet_48_Weights.IMAGENET1K_V1             | 81.176   | 95.69    | 64.2M    | 10.52   | link   |
| RinceNet_64_Weights.IMAGENET1K_V1             | 81.892   | 96.072   | 96.1M    | 15.83   | link   |
| RinceNet_72_Weights.IMAGENET1K_V1             | 82.046   | 96.176   | 107.5M   | 17.77   | link   |
| RinceNet_80_Weights.IMAGENET1K_V1             | 82.274   | 96.282   | 121.2M   | 20.26   | link   |
| RinceNet_112_Weights.IMAGENET1K_V1            | 82.796   | 96.534   | 178.3M   | 30.06   | link   |
| RinceNet_168_Weights.IMAGENET1K_V1            | 83.1     | 96.654   | 265.5M   | 44.87   | link   |
| RinceNet_224_Weights.IMAGENET1K_V1            | 83.352   | 96.768   | 392.7M   | 66.5    | link   |
| SENet154_Weights.IMAGENET1K_V1                | 81.092   | 95.546   | 113.7M   | 66.8    | link   |
| SE_ResNet50_Weights.IMAGENET1K_V1             | 77.422   | 93.734   | 28.1M    | 4.14    | link   |
| SE_ResNet101_Weights.IMAGENET1K_V1            | 78.56    | 94.416   | 49.3M    | 8.14    | link   |
| SE_ResNet152_Weights.IMAGENET1K_V1            | 79.38    | 94.894   | 68.3M    | 11.14   | link   |
| SE_ResNeXt50_32x4d_Weights.IMAGENET1K_V1     | 78.134   | 94.21    | 28.0M    | 4.17    | link   |
| SE_ResNeXt101_32x4d_Weights.IMAGENET1K_V1    | 79.67    | 95.152   | 49.1M    | 8.12    | link   |
| SE_ResNeXt152_32x4d_Weights.IMAGENET1K_V1    | 80.296   | 95.552   | 68.1M    | 11.12   | link   |
| SENet_MixNet_S_Weights.IMAGENET1K_V1         | 80.964   | 95.754   | 3.5M     | 0.3     | link   |
| SENet_MixNet_M_Weights.IMAGENET1K_V1         | 81.35    | 95.95    | 5.0M     | 0.42    | link   |
| SENet_MixNet_L_Weights.IMAGENET1K_V1         | 82.076   | 96.276   | 7.3M     | 0.67    | link   |
| ShuffleNetV2_x0_25_Weights.IMAGENET1K_V1     | 59.35    | 81.462   | 0.47M    | 0.04    | link   |
| ShuffleNetV2_x0_33_Weights.IMAGENET1K_V1     | 62.008   | 83.464   | 0.67M    | 0.05    | link   |
| ShuffleNetV2_x0_5_Weights.IMAGENET1K_V1      | 69.1     | 89.234   | 1.36M    | 0.12    | link   |
| ShuffleNetV2_x1_Weights.IMAGENET1K_V1        | 70.622   | 89.968   | 2.27M    | 0.2     | link   |
| ShuffleNetV2_x1_5_Weights.IMAGENET1K_V1      | 73.284   | 91.422   | 4.42M    | 0.38    | link   |
| ShuffleNetV2_x2_Weights.IMAGENET1K_V1        | 74.448   | 91.976   | 7.4M     | 0.63    | link   |
| SqueezeNet1_0_Weights.IMAGENET1K_V1          | 57.72    | 79.158   | 1.25M    | 0.36    | link   |
| SqueezeNet1_1_Weights.IMAGENET1K_V1          | 59.066   | 80.37    | 1.24M    | 0.36    | link   |
| SwinTransformer_Tiny_Weights.IMAGENET1K_V1   | 83.212   | 96.524   | 28.0M    | 4.5     | link   |
| SwinTransformer_Small_Weights.IMAGENET1K_V1  | 84.29    | 97.006   | 50.3M    | 8.1     | link   |
| SwinTransformer_Base_Weights.IMAGENET1K_V1   | 85.64    | 97.54    | 86.9M    | 14.0    | link   |
| SwinTransformer_Large_Weights.IMAGENET1K_V1  | 86.314   | 97.814   | 197.0M   | 31.6    | link   |
| SwinTransformer_BasePatch4G_Weights.IMAGENET1K_V1  | 86.872   | 97.868   | 87.4M    | 14.1    | link   |
| SwinTransformer_LargePatch4G_Weights.IMAGENET1K_V1 | 87.06    | 97.902   | 197.6M   | 31.6    | link   |




| Architecture            | Details                                              |
|-------------------------|------------------------------------------------------|
| Quantized GoogLeNet     | Provides support for INT8 quantized models.          |
| Quantized InceptionV3   | Provides support for INT8 quantized models.          |
| Quantized MobileNet V2  | Provides support for INT8 quantized models.          |
| Quantized MobileNet V3  | Provides support for INT8 quantized models.          |
| Quantized ResNet        | Provides support for INT8 quantized models.          |
| Quantized ResNeXt       | Provides support for INT8 quantized models.          |
| Quantized ShuffleNet V2 | Provides support for INT8 quantized models.          |



```python
from torchvision.io import read_image
from torchvision.models.quantization import resnet50, ResNet50_QuantizedWeights

img = read_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")

# Step 1: Initialize model with the best available weights
weights = ResNet50_QuantizedWeights.DEFAULT
model = resnet50(weights=weights, quantize=True)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Use the model and print the predicted category
prediction = model(batch).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}%")
```

## Semantic Segmentation

The following semantic segmentation models are available, with or without pre-trained weights:

- DeepLabV3
- FCN
- LRASPP

```python
from torchvision.io.image import read_image
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights
from torchvision.transforms.functional import to_pil_image

img = read_image("gallery/assets/dog1.jpg")

# Step 1: Initialize model with the best available weights
weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Use the model and visualize the prediction
prediction = model(batch)["out"]
normalized_masks = prediction.softmax(dim=1)
class_to_idx = {cls: idx for (idx, cls) in enumerate(weights.meta["categories"])}
mask = normalized_masks[0, class_to_idx["dog"]]
to_pil_image(mask).show()
```
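
A segmentation model produces one score map per class, and a label map is obtained by taking, for every pixel, the class with the highest score. The sketch below does this in pure Python on a tiny made-up 2x3 "image"; `categories`, `scores`, and `label_map` are hypothetical stand-ins for `weights.meta["categories"]` and the model output.

```python
categories = ["__background__", "cat", "dog"]  # hypothetical subset of classes
class_to_idx = {cls: idx for idx, cls in enumerate(categories)}

# scores[c][y][x] is the score of class c at pixel (y, x)
scores = [
    [[2.0, 0.1, 0.3], [1.5, 0.2, 0.1]],  # background
    [[0.5, 0.4, 0.2], [0.3, 0.1, 0.4]],  # cat
    [[0.1, 3.0, 2.5], [0.2, 2.2, 1.9]],  # dog
]


def label_map(scores):
    """Return, for each pixel, the index of the highest-scoring class."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


labels = label_map(scores)
# Boolean mask of the pixels assigned to the "dog" class
dog_mask = [[lbl == class_to_idx["dog"] for lbl in row] for row in labels]
```

Softmax does not change which class has the highest score at a pixel, so the argmax can be taken on the raw scores; softmax is only needed when the per-class probabilities themselves are of interest, as in the visualization above.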




| Weight                                                       | Acc@1  | Acc@5  | Params  | GIPS | Recipe |
|--------------------------------------------------------------|--------|--------|---------|------|--------|
| GoogLeNet_QuantizedWeights.IMAGENET1K_FBGEMM_V1             | 69.826 | 89.404 | 6.6M    | 1.5  | link   |
| Inception_V3_QuantizedWeights.IMAGENET1K_FBGEMM_V1          | 77.176 | 93.354 | 27.2M   | 5.71 | link   |
| MobileNet_V2_QuantizedWeights.IMAGENET1K_QNNPACK_V1         | 71.658 | 90.15  | 3.5M    | 0.3  | link   |
| MobileNet_V3_Large_QuantizedWeights.IMAGENET1K_QNNPACK_V1   | 73.004 | 90.858 | 5.5M    | 0.22 | link   |
| ResNeXt101_32X8D_QuantizedWeights.IMAGENET1K_FBGEMM_V1     | 78.986 | 94.48  | 88.8M   | 16.41| link   |
| ResNeXt101_32X8D_QuantizedWeights.IMAGENET1K_FBGEMM_V2     | 82.574 | 96.132 | 88.8M   | 16.41| link   |
| ResNeXt101_64X4D_QuantizedWeights.IMAGENET1K_FBGEMM_V1     | 82.898 | 96.326 | 83.5M   | 15.46| link   |
| ResNet18_QuantizedWeights.IMAGENET1K_FBGEMM_V1              | 69.494 | 88.882 | 11.7M   | 1.81 | link   |
| ResNet50_QuantizedWeights.IMAGENET1K_FBGEMM_V1              | 75.92  | 92.814 | 25.6M   | 4.09 | link   |
| ResNet50_QuantizedWeights.IMAGENET1K_FBGEMM_V2              | 80.282 | 94.976 | 25.6M   | 4.09 | link   |
| ShuffleNet_V2_X0_5_QuantizedWeights.IMAGENET1K_FBGEMM_V1    | 57.972 | 79.78  | 1.4M    | 0.04 | link   |
| ShuffleNet_V2_X1_0_QuantizedWeights.IMAGENET1K_FBGEMM_V1    | 68.36  | 87.582 | 2.3M    | 0.14 | link   |
| ShuffleNet_V2_X1_5_QuantizedWeights.IMAGENET1K_FBGEMM_V1    | 72.052 | 90.7   | 3.5M    | 0.3  | link   |
| ShuffleNet_V2_X2_0_QuantizedWeights.IMAGENET1K_FBGEMM_V1    | 75.354 | 92.488 | 7.4M    | 0.58 | link   |



## Object Detection, Instance Segmentation and Person Keypoint Detection

The pre-trained models for detection, instance segmentation, and keypoint detection are initialized with the classification models in torchvision. Each model expects its input as a list of `Tensor[C, H, W]` images. Check the constructor of each model for more information.

## Object Detection

The following object detection models are available, with or without pre-trained weights:

- Faster R-CNN
- FCOS
- RetinaNet
- SSD
- SSDlite


```python
from torchvision.io.image import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2, FasterRCNN_ResNet50_FPN_V2_Weights
from torchvision.utils import draw_bounding_boxes
from torchvision.transforms.functional import to_pil_image

img = read_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")

# Step 1: Initialize model with the best available weights
weights = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn_v2(weights=weights, box_score_thresh=0.9)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = [preprocess(img)]

# Step 4: Use the model and visualize the prediction
prediction = model(batch)[0]
labels = [weights.meta["categories"][i] for i in prediction["labels"]]
box = draw_bounding_boxes(img, boxes=prediction["boxes"],
                          labels=labels,
                          colors="red",
                          width=4, font_size=30)
im = to_pil_image(box.detach())
im.show()

```
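
The `box_score_thresh=0.9` argument above keeps only confident detections. The effect can be sketched in pure Python on a prediction dictionary with the same keys the model returns (`boxes`, `labels`, `scores`). The values are made up and `filter_by_score` is a hypothetical helper, not a torchvision function; the strict `>` comparison is an assumption about how the threshold is applied.

```python
def filter_by_score(prediction, threshold):
    """Keep only detections whose score is above `threshold`."""
    keep = [i for i, s in enumerate(prediction["scores"]) if s > threshold]
    return {key: [values[i] for i in keep] for key, values in prediction.items()}


# Made-up detections: boxes are (x1, y1, x2, y2)
prediction = {
    "boxes": [
        [10.0, 20.0, 110.0, 220.0],
        [5.0, 5.0, 50.0, 60.0],
        [200.0, 40.0, 300.0, 180.0],
    ],
    "labels": [1, 18, 1],
    "scores": [0.98, 0.42, 0.91],
}

confident = filter_by_score(prediction, threshold=0.9)
```

Filtering every key with the same index list keeps boxes, labels, and scores aligned, which is what `draw_bounding_boxes` relies on when pairing each box with its label.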

## Video Classification

The video module is in Beta stage, and backward compatibility is not guaranteed.

The following video classification models are available, with or without pre-trained weights:

- Video MViT
- Video ResNet
- Video S3D
- Video SwinTransformer

```python
from torchvision.io.video import read_video
from torchvision.models.video import r3d_18, R3D_18_Weights

vid, _, _ = read_video("test/assets/videos/v_SoccerJuggling_g23_c01.avi", output_format="TCHW")
vid = vid[:32]  # optionally shorten duration

# Step 1: Initialize model with the best available weights
weights = R3D_18_Weights.DEFAULT
model = r3d_18(weights=weights)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(vid).unsqueeze(0)

# Step 4: Use the model and print the predicted category
prediction = model(batch).squeeze(0).softmax(0)
label = prediction.argmax().item()
score = prediction[label].item()
category_name = weights.meta["categories"][label]
print(f"{category_name}: {100 * score:.1f}%")
```



| Weight                                      | Acc@1  | Acc@5  | Params | GFLOPS | Recipe |
|---------------------------------------------|--------|--------|--------|--------|--------|
| MC3_18_Weights.KINETICS400_V1               | 63.96  | 84.13  | 11.7M  | 43.34  | link   |
| MViT_V1_B_Weights.KINETICS400_V1            | 78.477 | 93.582 | 36.6M  | 70.6   | link   |
| MViT_V2_S_Weights.KINETICS400_V1            | 80.757 | 94.665 | 34.5M  | 64.22  | link   |
| R2Plus1D_18_Weights.KINETICS400_V1          | 67.463 | 86.175 | 31.5M  | 40.52  | link   |
| R3D_18_Weights.KINETICS400_V1               | 63.2   | 83.479 | 33.4M  | 40.7   | link   |
| S3D_Weights.KINETICS400_V1                  | 68.368 | 88.05  | 8.3M   | 17.98  | link   |
| Swin3D_B_Weights.KINETICS400_V1             | 79.427 | 94.386 | 88.0M  | 140.67 | link   |
| Swin3D_B_Weights.KINETICS400_IMAGENET22K_V1 | 81.643 | 95.574 | 88.0M  | 140.67 | link   |
| Swin3D_S_Weights.KINETICS400_V1             | 79.521 | 94.158 | 49.8M  | 82.84  | link   |
| Swin3D_T_Weights.KINETICS400_V1             | 77.715 | 93.519 | 28.2M  | 43.88  | link   |

## Optical Flow



## Instance Segmentation

# DATASETS

Image Classification:



1. Caltech101: Caltech 101 Dataset.
2. Caltech256: Caltech 256 Dataset.
3. CelebA: Large-scale CelebFaces Attributes (CelebA) Dataset.
4. CIFAR10: CIFAR-10 Dataset.
5. CIFAR100: CIFAR-100 Dataset.
6. Country211: Country211 Data Set from OpenAI.
7. DTD: Describable Textures Dataset (DTD).
8. EMNIST: EMNIST Dataset.
9. EuroSAT: RGB version of the EuroSAT Dataset.
10. FakeData: A fake dataset that returns randomly generated images.
11. FashionMNIST: Fashion-MNIST Dataset.
12. FER2013: FER2013 Dataset.
13. FGVC Aircraft: FGVC Aircraft Dataset.
14. Flickr8k: Flickr8k Entities Dataset.
15. Flickr30k: Flickr30k Entities Dataset.
16. Flowers102: Oxford 102 Flower Dataset.
17. Food101: Food-101 Data Set.
18. GTSRB: German Traffic Sign Recognition Benchmark (GTSRB) Dataset.
19. iNaturalist: iNaturalist Dataset.
20. ImageNet: ImageNet 2012 Classification Dataset.
21. Imagenette: Imagenette image classification dataset.
22. KMNIST: Kuzushiji-MNIST Dataset.
23. LFW: Labeled Faces in the Wild (LFW) Dataset.
24. LSUN: Large-scale Scene Understanding (LSUN) Dataset.
25. MNIST: MNIST Dataset.
26. Omniglot: Omniglot Dataset.
27. Oxford-IIIT Pet: Oxford-IIIT Pet Dataset.
28. Places365: Places365 Classification Dataset.
29. PCAM: PatchCamelyon (PCAM) Dataset.
30. QMNIST: QMNIST Dataset.
31. Rendered SST2: Rendered SST2 Dataset.
32. SEMEION: SEMEION Dataset.
33. SBU: SBU Captioned Photo Dataset.
34. Stanford Cars: Stanford Cars Dataset.
35. STL10: STL-10 Dataset.
36. SUN397: SUN397 Data Set.
37. SVHN: Street View House Numbers (SVHN) Dataset.
38. USPS: USPS Dataset.





1. Image Detection or Segmentation:
    - CocoDetection: MS Coco Detection Dataset.
    - CelebA: Large-scale CelebFaces Attributes (CelebA) Dataset.
    - Cityscapes: Cityscapes Dataset.
    - Kitti: KITTI Dataset.
    - OxfordIIITPet: Oxford-IIIT Pet Dataset.
    - SBDataset: Semantic Boundaries Dataset.
    - VOCSegmentation: Pascal VOC Segmentation Dataset.
    - VOCDetection: Pascal VOC Detection Dataset.
    - WIDERFace: WIDERFace Dataset.
2. Optical Flow:
    - FlyingChairs: FlyingChairs Dataset for optical flow.
    - FlyingThings3D: FlyingThings3D dataset for optical flow.
    - HD1K: HD1K dataset for optical flow.
    - KittiFlow: KITTI dataset for optical flow (2015).
    - Sintel: Sintel Dataset for optical flow.
3. Stereo Matching:
    - CarlaStereo: Carla simulator data linked in the CREStereo github repo.
    - Kitti2012Stereo: KITTI dataset from the 2012 stereo evaluation benchmark.
    - Kitti2015Stereo: KITTI dataset from the 2015 stereo evaluation benchmark.
    - CREStereo: Synthetic dataset used in training the CREStereo architecture.
    - FallingThingsStereo: FallingThings dataset.
    - SceneFlowStereo: Dataset interface for Scene Flow datasets.
    - SintelStereo: Sintel Stereo Dataset.
    - InStereo2k: InStereo2k dataset.
    - ETH3DStereo: ETH3D Low-Res Two-View dataset.
    - Middlebury2014Stereo: Publicly available scenes from the Middlebury dataset 2014 version.
4. Image Pairs:
    - LFWPairs: LFW Dataset.
    - PhotoTour: Multi-view Stereo Correspondence Dataset.
5. Image Captioning:
    - CocoCaptions: MS Coco Captions Dataset.
6. Video Classification:
    - HMDB51: HMDB51 dataset.
    - Kinetics: Generic Kinetics dataset.
    - UCF101: UCF101 dataset.
7. Video Prediction:
    - MovingMNIST: MovingMNIST Dataset.
8. Base Classes for Custom Datasets:
    - DatasetFolder: A generic data loader.
    - ImageFolder: A generic data loader where the images are arranged by default as `root/<class_name>/<image file>`, with one sub-directory per class.
    - VisionDataset: Base class for making datasets compatible with torchvision.
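
To illustrate the folder layout that `ImageFolder`-style loaders rely on (one sub-directory per class under a root directory), here is a standard-library sketch that scans such a tree and builds the class-to-index mapping and the list of `(path, class_index)` samples. The functions `find_classes` and `make_samples`, the file names, and the extension list are illustrative assumptions, not torchvision's implementation.

```python
import os
import tempfile


def find_classes(root):
    """Map each sub-directory of `root` to a class index, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return classes, {name: idx for idx, name in enumerate(classes)}


def make_samples(root, class_to_idx, extensions=(".jpg", ".png")):
    """Collect (file path, class index) pairs for files with image extensions."""
    samples = []
    for name, idx in class_to_idx.items():
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(extensions):
                samples.append((os.path.join(class_dir, fname), idx))
    return samples


# Build a throwaway tree: root/dog/{a.png,b.png}, root/cat/{c.jpg,notes.txt}
with tempfile.TemporaryDirectory() as root:
    layout = {"dog": ["a.png", "b.png"], "cat": ["c.jpg", "notes.txt"]}
    for cls, files in layout.items():
        os.makedirs(os.path.join(root, cls))
        for fname in files:
            open(os.path.join(root, cls, fname), "w").close()

    classes, class_to_idx = find_classes(root)
    samples = [
        (os.path.relpath(path, root), idx)
        for path, idx in make_samples(root, class_to_idx)
    ]
```

A custom dataset built on this idea only needs `__len__` to return the number of samples and `__getitem__` to load the file at a given index and return it together with its class index.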