diff --git a/0.11./_downloads/07fcc19ba03226cd3d83d4e40ec44385/auto_examples_python.zip b/0.11./_downloads/07fcc19ba03226cd3d83d4e40ec44385/auto_examples_python.zip
deleted file mode 100644
index 386aeea972b..00000000000
Binary files a/0.11./_downloads/07fcc19ba03226cd3d83d4e40ec44385/auto_examples_python.zip and /dev/null differ
diff --git a/0.11./_downloads/0a0ea3da81f0782f42d1ded74c1acb75/plot_video_api.ipynb b/0.11./_downloads/0a0ea3da81f0782f42d1ded74c1acb75/plot_video_api.ipynb
deleted file mode 100644
index afc4b7ffe50..00000000000
--- a/0.11./_downloads/0a0ea3da81f0782f42d1ded74c1acb75/plot_video_api.ipynb
+++ /dev/null
@@ -1,309 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "%matplotlib inline"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n# Video API\n\nThis example illustrates some of the APIs that torchvision offers for\nvideos, together with the examples on how to build datasets and more.\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 1. Introduction: building a new video object and examining the properties\nFirst we select a video to test the object out. For the sake of argument\nwe're using one from kinetics400 dataset.\nTo create it, we need to define the path and the stream we want to use.\n\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Chosen video statistics:\n\n- WUzgd7C1pWA.mp4\n - source:\n - kinetics-400\n - video:\n - H-264\n - MPEG-4 AVC (part 10) (avc1)\n - fps: 29.97\n - audio:\n - MPEG AAC audio (mp4a)\n - sample rate: 48K Hz\n\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "import torch\nimport torchvision\nfrom torchvision.datasets.utils import download_url\n\n# Download the sample video\ndownload_url(\n \"https://github.com/pytorch/vision/blob/main/test/assets/videos/WUzgd7C1pWA.mp4?raw=true\",\n \".\",\n \"WUzgd7C1pWA.mp4\"\n)\nvideo_path = \"./WUzgd7C1pWA.mp4\""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Streams are defined in a similar fashion as torch devices. We encode them as strings in a form\nof ``stream_type:stream_id`` where ``stream_type`` is a string and ``stream_id`` a long int.\nThe constructor accepts passing a ``stream_type`` only, in which case the stream is auto-discovered.\nFirstly, let's get the metadata for our particular video:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "stream = \"video\"\nvideo = torchvision.io.VideoReader(video_path, stream)\nvideo.get_metadata()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Here we can see that video has two streams - a video and an audio stream.\nCurrently available stream types include ['video', 'audio'].\nEach descriptor consists of two parts: stream type (e.g. 'video') and a unique stream id\n(which are determined by video encoding).\nIn this way, if the video container contains multiple streams of the same type,\nusers can access the one they want.\nIf only stream type is passed, the decoder auto-detects first stream of that type and returns it.\n\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Let's read all the frames from the video stream. By default, the return value of\n``next(video_reader)`` is a dict containing the following fields.\n\nThe return fields are:\n\n- ``data``: containing a torch.tensor\n- ``pts``: containing a float timestamp of this particular frame\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "metadata = video.get_metadata()\nvideo.set_current_stream(\"audio\")\n\nframes = [] # we are going to save the frames here.\nptss = [] # pts is a presentation timestamp in seconds (float) of each frame\nfor frame in video:\n frames.append(frame['data'])\n ptss.append(frame['pts'])\n\nprint(\"PTS for first five frames \", ptss[:5])\nprint(\"Total number of frames: \", len(frames))\napprox_nf = metadata['audio']['duration'][0] * metadata['audio']['framerate'][0]\nprint(\"Approx total number of datapoints we can expect: \", approx_nf)\nprint(\"Read data size: \", frames[0].size(0) * len(frames))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "But what if we only want to read certain time segment of the video?\nThat can be done easily using the combination of our ``seek`` function, and the fact that each call\nto next returns the presentation timestamp of the returned frame in seconds.\n\nGiven that our implementation relies on python iterators,\nwe can leverage itertools to simplify the process and make it more pythonic.\n\nFor example, if we wanted to read ten frames from second second:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "import itertools\nvideo.set_current_stream(\"video\")\n\nframes = [] # we are going to save the frames here.\n\n# We seek into a second second of the video and use islice to get 10 frames since\nfor frame, pts in itertools.islice(video.seek(2), 10):\n frames.append(frame)\n\nprint(\"Total number of frames: \", len(frames))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Or if we wanted to read from 2nd to 5th second,\nWe seek into a second second of the video,\nthen we utilize the itertools takewhile to get the\ncorrect number of frames:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "video.set_current_stream(\"video\")\nframes = [] # we are going to save the frames here.\nvideo = video.seek(2)\n\nfor frame in itertools.takewhile(lambda x: x['pts'] <= 5, video):\n frames.append(frame['data'])\n\nprint(\"Total number of frames: \", len(frames))\napprox_nf = (5 - 2) * video.get_metadata()['video']['fps'][0]\nprint(\"We can expect approx: \", approx_nf)\nprint(\"Tensor size: \", frames[0].size())"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 2. Building a sample read_video function\nWe can utilize the methods above to build the read video function that follows\nthe same API to the existing ``read_video`` function.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "def example_read_video(video_object, start=0, end=None, read_video=True, read_audio=True):\n if end is None:\n end = float(\"inf\")\n if end < start:\n raise ValueError(\n \"end time should be larger than start time, got \"\n \"start time={} and end time={}\".format(start, end)\n )\n\n video_frames = torch.empty(0)\n video_pts = []\n if read_video:\n video_object.set_current_stream(\"video\")\n frames = []\n for frame in itertools.takewhile(lambda x: x['pts'] <= end, video_object.seek(start)):\n frames.append(frame['data'])\n video_pts.append(frame['pts'])\n if len(frames) > 0:\n video_frames = torch.stack(frames, 0)\n\n audio_frames = torch.empty(0)\n audio_pts = []\n if read_audio:\n video_object.set_current_stream(\"audio\")\n frames = []\n for frame in itertools.takewhile(lambda x: x['pts'] <= end, video_object.seek(start)):\n frames.append(frame['data'])\n video_pts.append(frame['pts'])\n if len(frames) > 0:\n audio_frames = torch.cat(frames, 0)\n\n return video_frames, audio_frames, (video_pts, audio_pts), video_object.get_metadata()\n\n\n# Total number of frames should be 327 for video and 523264 datapoints for audio\nvf, af, info, meta = example_read_video(video)\nprint(vf.size(), af.size())"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 3. Building an example randomly sampled dataset (can be applied to training dataest of kinetics400)\nCool, so now we can use the same principle to make the sample dataset.\nWe suggest trying out iterable dataset for this purpose.\nHere, we are going to build an example dataset that reads randomly selected 10 frames of video.\n\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Make sample dataset\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "import os\nos.makedirs(\"./dataset\", exist_ok=True)\nos.makedirs(\"./dataset/1\", exist_ok=True)\nos.makedirs(\"./dataset/2\", exist_ok=True)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Download the videos\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.datasets.utils import download_url\ndownload_url(\n \"https://github.com/pytorch/vision/blob/main/test/assets/videos/WUzgd7C1pWA.mp4?raw=true\",\n \"./dataset/1\", \"WUzgd7C1pWA.mp4\"\n)\ndownload_url(\n \"https://github.com/pytorch/vision/blob/main/test/assets/videos/RATRACE_wave_f_nm_np1_fr_goo_37.avi?raw=true\",\n \"./dataset/1\",\n \"RATRACE_wave_f_nm_np1_fr_goo_37.avi\"\n)\ndownload_url(\n \"https://github.com/pytorch/vision/blob/main/test/assets/videos/SOX5yA1l24A.mp4?raw=true\",\n \"./dataset/2\",\n \"SOX5yA1l24A.mp4\"\n)\ndownload_url(\n \"https://github.com/pytorch/vision/blob/main/test/assets/videos/v_SoccerJuggling_g23_c01.avi?raw=true\",\n \"./dataset/2\",\n \"v_SoccerJuggling_g23_c01.avi\"\n)\ndownload_url(\n \"https://github.com/pytorch/vision/blob/main/test/assets/videos/v_SoccerJuggling_g24_c01.avi?raw=true\",\n \"./dataset/2\",\n \"v_SoccerJuggling_g24_c01.avi\"\n)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Housekeeping and utilities\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "import os\nimport random\n\nfrom torchvision.datasets.folder import make_dataset\nfrom torchvision import transforms as t\n\n\ndef _find_classes(dir):\n classes = [d.name for d in os.scandir(dir) if d.is_dir()]\n classes.sort()\n class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}\n return classes, class_to_idx\n\n\ndef get_samples(root, extensions=(\".mp4\", \".avi\")):\n _, class_to_idx = _find_classes(root)\n return make_dataset(root, class_to_idx, extensions=extensions)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We are going to define the dataset and some basic arguments.\nWe assume the structure of the FolderDataset, and add the following parameters:\n\n- ``clip_len``: length of a clip in frames\n- ``frame_transform``: transform for every frame individually\n- ``video_transform``: transform on a video sequence\n\n
Note
We actually add epoch size as using :func:`~torch.utils.data.IterableDataset`\n class allows us to naturally oversample clips or images from each video if needed.
\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "class RandomDataset(torch.utils.data.IterableDataset):\n def __init__(self, root, epoch_size=None, frame_transform=None, video_transform=None, clip_len=16):\n super(RandomDataset).__init__()\n\n self.samples = get_samples(root)\n\n # Allow for temporal jittering\n if epoch_size is None:\n epoch_size = len(self.samples)\n self.epoch_size = epoch_size\n\n self.clip_len = clip_len\n self.frame_transform = frame_transform\n self.video_transform = video_transform\n\n def __iter__(self):\n for i in range(self.epoch_size):\n # Get random sample\n path, target = random.choice(self.samples)\n # Get video object\n vid = torchvision.io.VideoReader(path, \"video\")\n metadata = vid.get_metadata()\n video_frames = [] # video frame buffer\n\n # Seek and return frames\n max_seek = metadata[\"video\"]['duration'][0] - (self.clip_len / metadata[\"video\"]['fps'][0])\n start = random.uniform(0., max_seek)\n for frame in itertools.islice(vid.seek(start), self.clip_len):\n video_frames.append(self.frame_transform(frame['data']))\n current_pts = frame['pts']\n # Stack it into a tensor\n video = torch.stack(video_frames, 0)\n if self.video_transform:\n video = self.video_transform(video)\n output = {\n 'path': path,\n 'video': video,\n 'target': target,\n 'start': start,\n 'end': current_pts}\n yield output"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Given a path of videos in a folder structure, i.e:\n\n- dataset\n - class 1\n - file 0\n - file 1\n - ...\n - class 2\n - file 0\n - file 1\n - ...\n - ...\n\nWe can generate a dataloader and test the dataset.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "transforms = [t.Resize((112, 112))]\nframe_transform = t.Compose(transforms)\n\ndataset = RandomDataset(\"./dataset\", epoch_size=None, frame_transform=frame_transform)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torch.utils.data import DataLoader\nloader = DataLoader(dataset, batch_size=12)\ndata = {\"video\": [], 'start': [], 'end': [], 'tensorsize': []}\nfor batch in loader:\n for i in range(len(batch['path'])):\n data['video'].append(batch['path'][i])\n data['start'].append(batch['start'][i].item())\n data['end'].append(batch['end'][i].item())\n data['tensorsize'].append(batch['video'][i].size())\nprint(data)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 4. Data Visualization\nExample of visualized video\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "import matplotlib.pylab as plt\n\nplt.figure(figsize=(12, 12))\nfor i in range(16):\n plt.subplot(4, 4, i + 1)\n plt.imshow(batch[\"video\"][0, i, ...].permute(1, 2, 0))\n plt.axis(\"off\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Cleanup the video and dataset:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "import os\nimport shutil\nos.remove(\"./WUzgd7C1pWA.mp4\")\nshutil.rmtree(\"./dataset\")"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.11"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
\ No newline at end of file
diff --git a/0.11./_downloads/1031091ece7f376de0a2c941f5c11f30/plot_visualization_utils.py b/0.11./_downloads/1031091ece7f376de0a2c941f5c11f30/plot_visualization_utils.py
deleted file mode 100644
index daa22fe8fa6..00000000000
--- a/0.11./_downloads/1031091ece7f376de0a2c941f5c11f30/plot_visualization_utils.py
+++ /dev/null
@@ -1,368 +0,0 @@
-"""
-=======================
-Visualization utilities
-=======================
-
-This example illustrates some of the utilities that torchvision offers for
-visualizing images, bounding boxes, and segmentation masks.
-"""
-
-# sphinx_gallery_thumbnail_path = "../../gallery/assets/visualization_utils_thumbnail.png"
-
-import torch
-import numpy as np
-import matplotlib.pyplot as plt
-
-import torchvision.transforms.functional as F
-
-
-plt.rcParams["savefig.bbox"] = 'tight'
-
-
-def show(imgs):
- if not isinstance(imgs, list):
- imgs = [imgs]
- fix, axs = plt.subplots(ncols=len(imgs), squeeze=False)
- for i, img in enumerate(imgs):
- img = img.detach()
- img = F.to_pil_image(img)
- axs[0, i].imshow(np.asarray(img))
- axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
-
-
-####################################
-# Visualizing a grid of images
-# ----------------------------
-# The :func:`~torchvision.utils.make_grid` function can be used to create a
-# tensor that represents multiple images in a grid. This util requires images
-# of dtype ``uint8`` as input.
-
-from torchvision.utils import make_grid
-from torchvision.io import read_image
-from pathlib import Path
-
-dog1_int = read_image(str(Path('assets') / 'dog1.jpg'))
-dog2_int = read_image(str(Path('assets') / 'dog2.jpg'))
-
-grid = make_grid([dog1_int, dog2_int, dog1_int, dog2_int])
-show(grid)
-
-####################################
-# Visualizing bounding boxes
-# --------------------------
-# We can use :func:`~torchvision.utils.draw_bounding_boxes` to draw boxes on an
-# image. We can set the colors, labels, width as well as font and font size.
-# The boxes are in ``(xmin, ymin, xmax, ymax)`` format.
-
-from torchvision.utils import draw_bounding_boxes
-
-
-boxes = torch.tensor([[50, 50, 100, 200], [210, 150, 350, 430]], dtype=torch.float)
-colors = ["blue", "yellow"]
-result = draw_bounding_boxes(dog1_int, boxes, colors=colors, width=5)
-show(result)
-
-
-#####################################
-# Naturally, we can also plot bounding boxes produced by torchvision detection
-# models. Here is a demo with a Faster R-CNN model loaded from
-# :func:`~torchvision.models.detection.fasterrcnn_resnet50_fpn`.
-# You can also try using a RetinaNet with
-# :func:`~torchvision.models.detection.retinanet_resnet50_fpn`, an SSDlite with
-# :func:`~torchvision.models.detection.ssdlite320_mobilenet_v3_large` or an SSD with
-# :func:`~torchvision.models.detection.ssd300_vgg16`. For more details
-# on the output of such models, you may refer to :ref:`instance_seg_output`.
-
-from torchvision.models.detection import fasterrcnn_resnet50_fpn
-from torchvision.transforms.functional import convert_image_dtype
-
-
-batch_int = torch.stack([dog1_int, dog2_int])
-batch = convert_image_dtype(batch_int, dtype=torch.float)
-
-model = fasterrcnn_resnet50_fpn(pretrained=True, progress=False)
-model = model.eval()
-
-outputs = model(batch)
-print(outputs)
-
-#####################################
-# Let's plot the boxes detected by our model. We will only plot the boxes with a
-# score greater than a given threshold.
-
-score_threshold = .8
-dogs_with_boxes = [
- draw_bounding_boxes(dog_int, boxes=output['boxes'][output['scores'] > score_threshold], width=4)
- for dog_int, output in zip(batch_int, outputs)
-]
-show(dogs_with_boxes)
-
-#####################################
-# Visualizing segmentation masks
-# ------------------------------
-# The :func:`~torchvision.utils.draw_segmentation_masks` function can be used to
-# draw segmentation masks on images. Semantic segmentation and instance
-# segmentation models have different outputs, so we will treat each
-# independently.
-#
-# .. _semantic_seg_output:
-#
-# Semantic segmentation models
-# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-#
-# We will see how to use it with torchvision's FCN Resnet-50, loaded with
-# :func:`~torchvision.models.segmentation.fcn_resnet50`. You can also try using
-# DeepLabv3 (:func:`~torchvision.models.segmentation.deeplabv3_resnet50`) or
-# lraspp mobilenet models
-# (:func:`~torchvision.models.segmentation.lraspp_mobilenet_v3_large`).
-#
-# Let's start by looking at the output of the model. Remember that in general,
-# images must be normalized before they're passed to a semantic segmentation
-# model.
-
-from torchvision.models.segmentation import fcn_resnet50
-
-
-model = fcn_resnet50(pretrained=True, progress=False)
-model = model.eval()
-
-normalized_batch = F.normalize(batch, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
-output = model(normalized_batch)['out']
-print(output.shape, output.min().item(), output.max().item())
-
-#####################################
-# As we can see above, the output of the segmentation model is a tensor of shape
-# ``(batch_size, num_classes, H, W)``. Each value is a non-normalized score, and
-# we can normalize them into ``[0, 1]`` by using a softmax. After the softmax,
-# we can interpret each value as a probability indicating how likely a given
-# pixel is to belong to a given class.
-#
-# Let's plot the masks that have been detected for the dog class and for the
-# boat class:
-
-sem_classes = [
- '__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
- 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
- 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'
-]
-sem_class_to_idx = {cls: idx for (idx, cls) in enumerate(sem_classes)}
-
-normalized_masks = torch.nn.functional.softmax(output, dim=1)
-
-dog_and_boat_masks = [
- normalized_masks[img_idx, sem_class_to_idx[cls]]
- for img_idx in range(batch.shape[0])
- for cls in ('dog', 'boat')
-]
-
-show(dog_and_boat_masks)
-
-#####################################
-# As expected, the model is confident about the dog class, but not so much for
-# the boat class.
-#
-# The :func:`~torchvision.utils.draw_segmentation_masks` function can be used to
-# plot those masks on top of the original image. This function expects the
-# masks to be boolean masks, but our masks above contain probabilities in ``[0,
-# 1]``. To get boolean masks, we can do the following:
-
-class_dim = 1
-boolean_dog_masks = (normalized_masks.argmax(class_dim) == sem_class_to_idx['dog'])
-print(f"shape = {boolean_dog_masks.shape}, dtype = {boolean_dog_masks.dtype}")
-show([m.float() for m in boolean_dog_masks])
-
-
-#####################################
-# The line above where we define ``boolean_dog_masks`` is a bit cryptic, but you
-# can read it as the following query: "For which pixels is 'dog' the most likely
-# class?"
-#
-# .. note::
-# While we're using the ``normalized_masks`` here, we would have
-# gotten the same result by using the non-normalized scores of the model
-# directly (as the softmax operation preserves the order).
-#
-# Now that we have boolean masks, we can use them with
-# :func:`~torchvision.utils.draw_segmentation_masks` to plot them on top of the
-# original images:
-
-from torchvision.utils import draw_segmentation_masks
-
-dogs_with_masks = [
- draw_segmentation_masks(img, masks=mask, alpha=0.7)
- for img, mask in zip(batch_int, boolean_dog_masks)
-]
-show(dogs_with_masks)
-
-#####################################
-# We can plot more than one mask per image! Remember that the model returned as
-# many masks as there are classes. Let's ask the same query as above, but this
-# time for *all* classes, not just the dog class: "For each pixel and each class
-# C, is class C the most likely class?"
-#
-# This one is a bit more involved, so we'll first show how to do it with a
-# single image, and then we'll generalize to the batch
-
-num_classes = normalized_masks.shape[1]
-dog1_masks = normalized_masks[0]
-class_dim = 0
-dog1_all_classes_masks = dog1_masks.argmax(class_dim) == torch.arange(num_classes)[:, None, None]
-
-print(f"dog1_masks shape = {dog1_masks.shape}, dtype = {dog1_masks.dtype}")
-print(f"dog1_all_classes_masks = {dog1_all_classes_masks.shape}, dtype = {dog1_all_classes_masks.dtype}")
-
-dog_with_all_masks = draw_segmentation_masks(dog1_int, masks=dog1_all_classes_masks, alpha=.6)
-show(dog_with_all_masks)
-
-#####################################
-# We can see in the image above that only 2 masks were drawn: the mask for the
-# background and the mask for the dog. This is because the model thinks that
-# only these 2 classes are the most likely ones across all the pixels. If the
-# model had detected another class as the most likely among other pixels, we
-# would have seen its mask above.
-#
-# Removing the background mask is as simple as passing
-# ``masks=dog1_all_classes_masks[1:]``, because the background class is the
-# class with index 0.
-#
-# Let's now do the same but for an entire batch of images. The code is similar
-# but involves a bit more juggling with the dimensions.
-
-class_dim = 1
-all_classes_masks = normalized_masks.argmax(class_dim) == torch.arange(num_classes)[:, None, None, None]
-print(f"shape = {all_classes_masks.shape}, dtype = {all_classes_masks.dtype}")
-# The first dimension is the classes now, so we need to swap it
-all_classes_masks = all_classes_masks.swapaxes(0, 1)
-
-dogs_with_masks = [
- draw_segmentation_masks(img, masks=mask, alpha=.6)
- for img, mask in zip(batch_int, all_classes_masks)
-]
-show(dogs_with_masks)
-
-
-#####################################
-# .. _instance_seg_output:
-#
-# Instance segmentation models
-# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-#
-# Instance segmentation models have a significantly different output from the
-# semantic segmentation models. We will see here how to plot the masks for such
-# models. Let's start by analyzing the output of a Mask-RCNN model. Note that
-# these models don't require the images to be normalized, so we don't need to
-# use the normalized batch.
-#
-# .. note::
-#
-# We will here describe the output of a Mask-RCNN model. The models in
-# :ref:`object_det_inst_seg_pers_keypoint_det` all have a similar output
-# format, but some of them may have extra info like keypoints for
-# :func:`~torchvision.models.detection.keypointrcnn_resnet50_fpn`, and some
-# of them may not have masks, like
-# :func:`~torchvision.models.detection.fasterrcnn_resnet50_fpn`.
-
-from torchvision.models.detection import maskrcnn_resnet50_fpn
-model = maskrcnn_resnet50_fpn(pretrained=True, progress=False)
-model = model.eval()
-
-output = model(batch)
-print(output)
-
-#####################################
-# Let's break this down. For each image in the batch, the model outputs some
-# detections (or instances). The number of detections varies for each input
-# image. Each instance is described by its bounding box, its label, its score
-# and its mask.
-#
-# The way the output is organized is as follows: the output is a list of length
-# ``batch_size``. Each entry in the list corresponds to an input image, and it
-# is a dict with keys 'boxes', 'labels', 'scores', and 'masks'. Each value
-# associated to those keys has ``num_instances`` elements in it. In our case
-# above there are 3 instances detected in the first image, and 2 instances in
-# the second one.
-#
-# The boxes can be plotted with :func:`~torchvision.utils.draw_bounding_boxes`
-# as above, but here we're more interested in the masks. These masks are quite
-# different from the masks that we saw above for the semantic segmentation
-# models.
-
-dog1_output = output[0]
-dog1_masks = dog1_output['masks']
-print(f"shape = {dog1_masks.shape}, dtype = {dog1_masks.dtype}, "
- f"min = {dog1_masks.min()}, max = {dog1_masks.max()}")
-
-#####################################
-# Here the masks correspond to probabilities indicating, for each pixel, how
-# likely it is to belong to the predicted label of that instance. Those
-# predicted labels correspond to the 'labels' element in the same output dict.
-# Let's see which labels were predicted for the instances of the first image.
-
-inst_classes = [
- '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
- 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
- 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
- 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
- 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
- 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
- 'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
- 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
- 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
- 'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
- 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
- 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
-]
-
-inst_class_to_idx = {cls: idx for (idx, cls) in enumerate(inst_classes)}
-
-print("For the first dog, the following instances were detected:")
-print([inst_classes[label] for label in dog1_output['labels']])
-
-#####################################
-# Interestingly, the model detects two persons in the image. Let's go ahead and
-# plot those masks. Since :func:`~torchvision.utils.draw_segmentation_masks`
-# expects boolean masks, we need to convert those probabilities into boolean
-# values. Remember that the semantics of those masks is "How likely is this pixel
-# to belong to the predicted class?". As a result, a natural way of converting
-# those masks into boolean values is to threshold them at a probability of 0.5
-# (one could also choose a different threshold).
-
-proba_threshold = 0.5
-dog1_bool_masks = dog1_output['masks'] > proba_threshold
-print(f"shape = {dog1_bool_masks.shape}, dtype = {dog1_bool_masks.dtype}")
-
-# There's an extra dimension (1) to the masks. We need to remove it
-dog1_bool_masks = dog1_bool_masks.squeeze(1)
-
-show(draw_segmentation_masks(dog1_int, dog1_bool_masks, alpha=0.9))
-
-#####################################
-# The model seems to have properly detected the dog, but it also confused trees
-# with people. Looking more closely at the scores will help us plot more
-# relevant masks:
-
-print(dog1_output['scores'])
-
-#####################################
-# Clearly the model is more confident about the dog detection than it is about
-# the people detections. That's good news. When plotting the masks, we can ask
-# for only those that have a good score. Let's use a score threshold of .75
-# here, and also plot the masks of the second dog.
-
-score_threshold = .75
-
-boolean_masks = [
- out['masks'][out['scores'] > score_threshold] > proba_threshold
- for out in output
-]
-
-dogs_with_masks = [
- draw_segmentation_masks(img, mask.squeeze(1))
- for img, mask in zip(batch_int, boolean_masks)
-]
-show(dogs_with_masks)
-
-#####################################
-# The two 'people' masks in the first image were not selected because they have
-# a lower score than the score threshold. Similarly in the second image, the
-# instance with class 15 (which corresponds to 'bench') was not selected.
diff --git a/0.11./_downloads/19a6d5f6ec4c29d7cbcc4a07a4b5339c/plot_video_api.py b/0.11./_downloads/19a6d5f6ec4c29d7cbcc4a07a4b5339c/plot_video_api.py
deleted file mode 100644
index fe296d67be0..00000000000
--- a/0.11./_downloads/19a6d5f6ec4c29d7cbcc4a07a4b5339c/plot_video_api.py
+++ /dev/null
@@ -1,341 +0,0 @@
-"""
-=======================
-Video API
-=======================
-
-This example illustrates some of the APIs that torchvision offers for
-videos, together with examples of how to build datasets and more.
-"""
-
-####################################
-# 1. Introduction: building a new video object and examining the properties
-# -------------------------------------------------------------------------
-# First we select a video to test the object out. For the sake of argument
-# we're using one from the kinetics400 dataset.
-# To create it, we need to define the path and the stream we want to use.
-
-######################################
-# Chosen video statistics:
-#
-# - WUzgd7C1pWA.mp4
-# - source:
-# - kinetics-400
-# - video:
-# - H-264
-# - MPEG-4 AVC (part 10) (avc1)
-# - fps: 29.97
-# - audio:
-# - MPEG AAC audio (mp4a)
-# - sample rate: 48K Hz
-#
-
-import torch
-import torchvision
-from torchvision.datasets.utils import download_url
-
-# Download the sample video
-download_url(
- "https://github.com/pytorch/vision/blob/main/test/assets/videos/WUzgd7C1pWA.mp4?raw=true",
- ".",
- "WUzgd7C1pWA.mp4"
-)
-video_path = "./WUzgd7C1pWA.mp4"
-
-######################################
-# Streams are defined in a similar fashion to torch devices. We encode them as strings in the form
-# of ``stream_type:stream_id`` where ``stream_type`` is a string and ``stream_id`` a long int.
-# The constructor accepts passing a ``stream_type`` only, in which case the stream is auto-discovered.
-# Firstly, let's get the metadata for our particular video:
-
-stream = "video"
-video = torchvision.io.VideoReader(video_path, stream)
-video.get_metadata()
-
-######################################
-# Here we can see that the video has two streams - a video and an audio stream.
-# Currently available stream types include ['video', 'audio'].
-# Each descriptor consists of two parts: the stream type (e.g. 'video') and a unique stream id
-# (which is determined by the video encoding).
-# In this way, if the video container contains multiple streams of the same type,
-# users can access the one they want.
-# If only the stream type is passed, the decoder auto-detects the first stream of that type and returns it.
-
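-######################################
-# As an illustrative aside (an editorial sketch, not part of the original
-# tutorial), a stream can also be selected by its explicit id. Assuming the
-# video stream of this particular file has id 0, the following is equivalent
-# to the constructor call above:
-
-video_explicit = torchvision.io.VideoReader(video_path, "video:0")
-video_explicit.get_metadata()
-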
-######################################
-# Let's read all the frames from the audio stream. By default, the return value of
-# ``next(video_reader)`` is a dict containing the following fields.
-#
-# The return fields are:
-#
-# - ``data``: containing a torch.tensor
-# - ``pts``: containing a float timestamp of this particular frame
-
-metadata = video.get_metadata()
-video.set_current_stream("audio")
-
-frames = [] # we are going to save the frames here.
-ptss = [] # pts is a presentation timestamp in seconds (float) of each frame
-for frame in video:
- frames.append(frame['data'])
- ptss.append(frame['pts'])
-
-print("PTS for first five frames ", ptss[:5])
-print("Total number of frames: ", len(frames))
-approx_nf = metadata['audio']['duration'][0] * metadata['audio']['framerate'][0]
-print("Approx total number of datapoints we can expect: ", approx_nf)
-print("Read data size: ", frames[0].size(0) * len(frames))
-
-######################################
-# But what if we only want to read a certain time segment of the video?
-# That can be done easily using the combination of our ``seek`` function and the fact that each call
-# to ``next`` returns the presentation timestamp of the returned frame in seconds.
-#
-# Given that our implementation relies on Python iterators,
-# we can leverage itertools to simplify the process and make it more Pythonic.
-#
-# For example, if we wanted to read ten frames starting from the 2nd second:
-
-
-import itertools
-video.set_current_stream("video")
-
-frames = [] # we are going to save the frames here.
-
-# We seek into the 2nd second of the video and use islice to get 10 frames from there
-for frame in itertools.islice(video.seek(2), 10):
-    frames.append(frame['data'])
-
-print("Total number of frames: ", len(frames))
-
-######################################
-# Or if we wanted to read from the 2nd to the 5th second,
-# we seek into the 2nd second of the video,
-# then we utilize ``itertools.takewhile`` to get the
-# correct number of frames:
-
-video.set_current_stream("video")
-frames = [] # we are going to save the frames here.
-video = video.seek(2)
-
-for frame in itertools.takewhile(lambda x: x['pts'] <= 5, video):
- frames.append(frame['data'])
-
-print("Total number of frames: ", len(frames))
-approx_nf = (5 - 2) * video.get_metadata()['video']['fps'][0]
-print("We can expect approx: ", approx_nf)
-print("Tensor size: ", frames[0].size())
-
-####################################
-# 2. Building a sample read_video function
-# ----------------------------------------------------------------------------------------
-# We can utilize the methods above to build a read_video function that follows
-# the same API as the existing ``read_video`` function.
-
-
-def example_read_video(video_object, start=0, end=None, read_video=True, read_audio=True):
- if end is None:
- end = float("inf")
- if end < start:
- raise ValueError(
- "end time should be larger than start time, got "
- "start time={} and end time={}".format(start, end)
- )
-
- video_frames = torch.empty(0)
- video_pts = []
- if read_video:
- video_object.set_current_stream("video")
- frames = []
- for frame in itertools.takewhile(lambda x: x['pts'] <= end, video_object.seek(start)):
- frames.append(frame['data'])
- video_pts.append(frame['pts'])
- if len(frames) > 0:
- video_frames = torch.stack(frames, 0)
-
- audio_frames = torch.empty(0)
- audio_pts = []
- if read_audio:
- video_object.set_current_stream("audio")
- frames = []
- for frame in itertools.takewhile(lambda x: x['pts'] <= end, video_object.seek(start)):
- frames.append(frame['data'])
-            audio_pts.append(frame['pts'])
- if len(frames) > 0:
- audio_frames = torch.cat(frames, 0)
-
- return video_frames, audio_frames, (video_pts, audio_pts), video_object.get_metadata()
-
-
-# Total number of frames should be 327 for video and 523264 datapoints for audio
-vf, af, info, meta = example_read_video(video)
-print(vf.size(), af.size())
-
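-####################################
-# As a usage sketch (an editorial addition, not part of the original tutorial),
-# the same helper can be restricted to a time window; here we decode only the
-# video frames between the 2nd and the 5th second:
-
-vf_segment, _, _, _ = example_read_video(video, start=2, end=5, read_audio=False)
-print(vf_segment.size())
-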
-####################################
-# 3. Building an example randomly sampled dataset (can be applied to training dataset of kinetics400)
-# -------------------------------------------------------------------------------------------------------
-# Cool, so now we can use the same principle to make the sample dataset.
-# We suggest trying out an iterable dataset for this purpose.
-# Here, we are going to build an example dataset that reads 10 randomly selected frames of video.
-
-####################################
-# Make sample dataset
-import os
-os.makedirs("./dataset", exist_ok=True)
-os.makedirs("./dataset/1", exist_ok=True)
-os.makedirs("./dataset/2", exist_ok=True)
-
-####################################
-# Download the videos
-from torchvision.datasets.utils import download_url
-download_url(
- "https://github.com/pytorch/vision/blob/main/test/assets/videos/WUzgd7C1pWA.mp4?raw=true",
- "./dataset/1", "WUzgd7C1pWA.mp4"
-)
-download_url(
- "https://github.com/pytorch/vision/blob/main/test/assets/videos/RATRACE_wave_f_nm_np1_fr_goo_37.avi?raw=true",
- "./dataset/1",
- "RATRACE_wave_f_nm_np1_fr_goo_37.avi"
-)
-download_url(
- "https://github.com/pytorch/vision/blob/main/test/assets/videos/SOX5yA1l24A.mp4?raw=true",
- "./dataset/2",
- "SOX5yA1l24A.mp4"
-)
-download_url(
- "https://github.com/pytorch/vision/blob/main/test/assets/videos/v_SoccerJuggling_g23_c01.avi?raw=true",
- "./dataset/2",
- "v_SoccerJuggling_g23_c01.avi"
-)
-download_url(
- "https://github.com/pytorch/vision/blob/main/test/assets/videos/v_SoccerJuggling_g24_c01.avi?raw=true",
- "./dataset/2",
- "v_SoccerJuggling_g24_c01.avi"
-)
-
-####################################
-# Housekeeping and utilities
-import os
-import random
-
-from torchvision.datasets.folder import make_dataset
-from torchvision import transforms as t
-
-
-def _find_classes(dir):
- classes = [d.name for d in os.scandir(dir) if d.is_dir()]
- classes.sort()
- class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
- return classes, class_to_idx
-
-
-def get_samples(root, extensions=(".mp4", ".avi")):
- _, class_to_idx = _find_classes(root)
- return make_dataset(root, class_to_idx, extensions=extensions)
-
-####################################
-# We are going to define the dataset and some basic arguments.
-# We assume the structure of the FolderDataset, and add the following parameters:
-#
-# - ``clip_len``: length of a clip in frames
-# - ``frame_transform``: transform for every frame individually
-# - ``video_transform``: transform on a video sequence
-#
-# .. note::
-# We actually add epoch size as using :func:`~torch.utils.data.IterableDataset`
-# class allows us to naturally oversample clips or images from each video if needed.
-
-
-class RandomDataset(torch.utils.data.IterableDataset):
- def __init__(self, root, epoch_size=None, frame_transform=None, video_transform=None, clip_len=16):
-        super().__init__()
-
- self.samples = get_samples(root)
-
- # Allow for temporal jittering
- if epoch_size is None:
- epoch_size = len(self.samples)
- self.epoch_size = epoch_size
-
- self.clip_len = clip_len
- self.frame_transform = frame_transform
- self.video_transform = video_transform
-
- def __iter__(self):
- for i in range(self.epoch_size):
- # Get random sample
- path, target = random.choice(self.samples)
- # Get video object
- vid = torchvision.io.VideoReader(path, "video")
- metadata = vid.get_metadata()
- video_frames = [] # video frame buffer
-
- # Seek and return frames
- max_seek = metadata["video"]['duration'][0] - (self.clip_len / metadata["video"]['fps'][0])
- start = random.uniform(0., max_seek)
- for frame in itertools.islice(vid.seek(start), self.clip_len):
- video_frames.append(self.frame_transform(frame['data']))
- current_pts = frame['pts']
- # Stack it into a tensor
- video = torch.stack(video_frames, 0)
- if self.video_transform:
- video = self.video_transform(video)
- output = {
- 'path': path,
- 'video': video,
- 'target': target,
- 'start': start,
- 'end': current_pts}
- yield output
-
-####################################
-# Given a path of videos in a folder structure, i.e:
-#
-# - dataset
-# - class 1
-# - file 0
-# - file 1
-# - ...
-# - class 2
-# - file 0
-# - file 1
-# - ...
-# - ...
-#
-# We can generate a dataloader and test the dataset.
-
-
-transforms = [t.Resize((112, 112))]
-frame_transform = t.Compose(transforms)
-
-dataset = RandomDataset("./dataset", epoch_size=None, frame_transform=frame_transform)
-
-####################################
-from torch.utils.data import DataLoader
-loader = DataLoader(dataset, batch_size=12)
-data = {"video": [], 'start': [], 'end': [], 'tensorsize': []}
-for batch in loader:
- for i in range(len(batch['path'])):
- data['video'].append(batch['path'][i])
- data['start'].append(batch['start'][i].item())
- data['end'].append(batch['end'][i].item())
- data['tensorsize'].append(batch['video'][i].size())
-print(data)
-
-####################################
-# 4. Data Visualization
-# ----------------------------------
-# Example of visualized video
-
-import matplotlib.pylab as plt
-
-plt.figure(figsize=(12, 12))
-for i in range(16):
- plt.subplot(4, 4, i + 1)
- plt.imshow(batch["video"][0, i, ...].permute(1, 2, 0))
- plt.axis("off")
-
-####################################
-# Clean up the video and dataset:
-import os
-import shutil
-os.remove("./WUzgd7C1pWA.mp4")
-shutil.rmtree("./dataset")
diff --git a/0.11./_downloads/2fc879ef12ea97750926a04c0a48c66b/plot_transforms.py b/0.11./_downloads/2fc879ef12ea97750926a04c0a48c66b/plot_transforms.py
deleted file mode 100644
index ab0cb892b16..00000000000
--- a/0.11./_downloads/2fc879ef12ea97750926a04c0a48c66b/plot_transforms.py
+++ /dev/null
@@ -1,300 +0,0 @@
-"""
-==========================
-Illustration of transforms
-==========================
-
-This example illustrates the various transforms available in :ref:`the
-torchvision.transforms module <transforms>`.
-"""
-
-# sphinx_gallery_thumbnail_path = "../../gallery/assets/transforms_thumbnail.png"
-
-from PIL import Image
-from pathlib import Path
-import matplotlib.pyplot as plt
-import numpy as np
-
-import torch
-import torchvision.transforms as T
-
-
-plt.rcParams["savefig.bbox"] = 'tight'
-orig_img = Image.open(Path('assets') / 'astronaut.jpg')
-# if you change the seed, make sure that the randomly-applied transforms
-# properly show that the image can be both transformed and *not* transformed!
-torch.manual_seed(0)
-
-
-def plot(imgs, with_orig=True, row_title=None, **imshow_kwargs):
- if not isinstance(imgs[0], list):
- # Make a 2d grid even if there's just 1 row
- imgs = [imgs]
-
- num_rows = len(imgs)
- num_cols = len(imgs[0]) + with_orig
- fig, axs = plt.subplots(nrows=num_rows, ncols=num_cols, squeeze=False)
- for row_idx, row in enumerate(imgs):
- row = [orig_img] + row if with_orig else row
- for col_idx, img in enumerate(row):
- ax = axs[row_idx, col_idx]
- ax.imshow(np.asarray(img), **imshow_kwargs)
- ax.set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
-
- if with_orig:
- axs[0, 0].set(title='Original image')
- axs[0, 0].title.set_size(8)
- if row_title is not None:
- for row_idx in range(num_rows):
- axs[row_idx, 0].set(ylabel=row_title[row_idx])
-
- plt.tight_layout()
-
-
-####################################
-# Pad
-# ---
-# The :class:`~torchvision.transforms.Pad` transform
-# (see also :func:`~torchvision.transforms.functional.pad`)
-# fills image borders with some pixel values.
-padded_imgs = [T.Pad(padding=padding)(orig_img) for padding in (3, 10, 30, 50)]
-plot(padded_imgs)
-
-####################################
-# Resize
-# ------
-# The :class:`~torchvision.transforms.Resize` transform
-# (see also :func:`~torchvision.transforms.functional.resize`)
-# resizes an image.
-resized_imgs = [T.Resize(size=size)(orig_img) for size in (30, 50, 100, orig_img.size)]
-plot(resized_imgs)
-
-####################################
-# CenterCrop
-# ----------
-# The :class:`~torchvision.transforms.CenterCrop` transform
-# (see also :func:`~torchvision.transforms.functional.center_crop`)
-# crops the given image at the center.
-center_crops = [T.CenterCrop(size=size)(orig_img) for size in (30, 50, 100, orig_img.size)]
-plot(center_crops)
-
-####################################
-# FiveCrop
-# --------
-# The :class:`~torchvision.transforms.FiveCrop` transform
-# (see also :func:`~torchvision.transforms.functional.five_crop`)
-# crops the given image into four corners and the central crop.
-(top_left, top_right, bottom_left, bottom_right, center) = T.FiveCrop(size=(100, 100))(orig_img)
-plot([top_left, top_right, bottom_left, bottom_right, center])
-
-####################################
-# Grayscale
-# ---------
-# The :class:`~torchvision.transforms.Grayscale` transform
-# (see also :func:`~torchvision.transforms.functional.to_grayscale`)
-# converts an image to grayscale
-gray_img = T.Grayscale()(orig_img)
-plot([gray_img], cmap='gray')
-
-####################################
-# Random transforms
-# -----------------
-# The following transforms are random, which means that the same transformer
-# instance will produce a different result each time it transforms a given image.
-#
-# ColorJitter
-# ~~~~~~~~~~~
-# The :class:`~torchvision.transforms.ColorJitter` transform
-# randomly changes the brightness, saturation, and other properties of an image.
-jitter = T.ColorJitter(brightness=.5, hue=.3)
-jitted_imgs = [jitter(orig_img) for _ in range(4)]
-plot(jitted_imgs)
-
-####################################
-# GaussianBlur
-# ~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.GaussianBlur` transform
-# (see also :func:`~torchvision.transforms.functional.gaussian_blur`)
-# performs a Gaussian blur transform on an image.
-blurrer = T.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 5))
-blurred_imgs = [blurrer(orig_img) for _ in range(4)]
-plot(blurred_imgs)
-
-####################################
-# RandomPerspective
-# ~~~~~~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomPerspective` transform
-# (see also :func:`~torchvision.transforms.functional.perspective`)
-# performs a random perspective transform on an image.
-perspective_transformer = T.RandomPerspective(distortion_scale=0.6, p=1.0)
-perspective_imgs = [perspective_transformer(orig_img) for _ in range(4)]
-plot(perspective_imgs)
-
-####################################
-# RandomRotation
-# ~~~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomRotation` transform
-# (see also :func:`~torchvision.transforms.functional.rotate`)
-# rotates an image by a random angle.
-rotater = T.RandomRotation(degrees=(0, 180))
-rotated_imgs = [rotater(orig_img) for _ in range(4)]
-plot(rotated_imgs)
-
-####################################
-# RandomAffine
-# ~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomAffine` transform
-# (see also :func:`~torchvision.transforms.functional.affine`)
-# performs a random affine transform on an image.
-affine_transformer = T.RandomAffine(degrees=(30, 70), translate=(0.1, 0.3), scale=(0.5, 0.75))
-affine_imgs = [affine_transformer(orig_img) for _ in range(4)]
-plot(affine_imgs)
-
-####################################
-# RandomCrop
-# ~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomCrop` transform
-# (see also :func:`~torchvision.transforms.functional.crop`)
-# crops an image at a random location.
-cropper = T.RandomCrop(size=(128, 128))
-crops = [cropper(orig_img) for _ in range(4)]
-plot(crops)
-
-####################################
-# RandomResizedCrop
-# ~~~~~~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomResizedCrop` transform
-# (see also :func:`~torchvision.transforms.functional.resized_crop`)
-# crops an image at a random location, and then resizes the crop to a given
-# size.
-resize_cropper = T.RandomResizedCrop(size=(32, 32))
-resized_crops = [resize_cropper(orig_img) for _ in range(4)]
-plot(resized_crops)
-
-####################################
-# RandomInvert
-# ~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomInvert` transform
-# (see also :func:`~torchvision.transforms.functional.invert`)
-# randomly inverts the colors of the given image.
-inverter = T.RandomInvert()
-inverted_imgs = [inverter(orig_img) for _ in range(4)]
-plot(inverted_imgs)
-
-####################################
-# RandomPosterize
-# ~~~~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomPosterize` transform
-# (see also :func:`~torchvision.transforms.functional.posterize`)
-# randomly posterizes the image by reducing the number of bits
-# of each color channel.
-posterizer = T.RandomPosterize(bits=2)
-posterized_imgs = [posterizer(orig_img) for _ in range(4)]
-plot(posterized_imgs)
-
-####################################
-# RandomSolarize
-# ~~~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomSolarize` transform
-# (see also :func:`~torchvision.transforms.functional.solarize`)
-# randomly solarizes the image by inverting all pixel values above
-# the threshold.
-solarizer = T.RandomSolarize(threshold=192.0)
-solarized_imgs = [solarizer(orig_img) for _ in range(4)]
-plot(solarized_imgs)
-
-####################################
-# RandomAdjustSharpness
-# ~~~~~~~~~~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomAdjustSharpness` transform
-# (see also :func:`~torchvision.transforms.functional.adjust_sharpness`)
-# randomly adjusts the sharpness of the given image.
-sharpness_adjuster = T.RandomAdjustSharpness(sharpness_factor=2)
-sharpened_imgs = [sharpness_adjuster(orig_img) for _ in range(4)]
-plot(sharpened_imgs)
-
-####################################
-# RandomAutocontrast
-# ~~~~~~~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomAutocontrast` transform
-# (see also :func:`~torchvision.transforms.functional.autocontrast`)
-# randomly applies autocontrast to the given image.
-autocontraster = T.RandomAutocontrast()
-autocontrasted_imgs = [autocontraster(orig_img) for _ in range(4)]
-plot(autocontrasted_imgs)
-
-####################################
-# RandomEqualize
-# ~~~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomEqualize` transform
-# (see also :func:`~torchvision.transforms.functional.equalize`)
-# randomly equalizes the histogram of the given image.
-equalizer = T.RandomEqualize()
-equalized_imgs = [equalizer(orig_img) for _ in range(4)]
-plot(equalized_imgs)
-
-####################################
-# AutoAugment
-# ~~~~~~~~~~~
-# The :class:`~torchvision.transforms.AutoAugment` transform
-# automatically augments data based on a given auto-augmentation policy.
-# See :class:`~torchvision.transforms.AutoAugmentPolicy` for the available policies.
-policies = [T.AutoAugmentPolicy.CIFAR10, T.AutoAugmentPolicy.IMAGENET, T.AutoAugmentPolicy.SVHN]
-augmenters = [T.AutoAugment(policy) for policy in policies]
-imgs = [
- [augmenter(orig_img) for _ in range(4)]
- for augmenter in augmenters
-]
-row_title = [str(policy).split('.')[-1] for policy in policies]
-plot(imgs, row_title=row_title)
-
-####################################
-# RandAugment
-# ~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandAugment` transform automatically augments the data.
-augmenter = T.RandAugment()
-imgs = [augmenter(orig_img) for _ in range(4)]
-plot(imgs)
-
-####################################
-# TrivialAugmentWide
-# ~~~~~~~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.TrivialAugmentWide` transform automatically augments the data.
-augmenter = T.TrivialAugmentWide()
-imgs = [augmenter(orig_img) for _ in range(4)]
-plot(imgs)
-
-####################################
-# Randomly-applied transforms
-# ---------------------------
-#
-# Some transforms are randomly-applied given a probability ``p``. That is, the
-# transformed image may actually be the same as the original one, even when
-# called with the same transformer instance!
-#
-# RandomHorizontalFlip
-# ~~~~~~~~~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomHorizontalFlip` transform
-# (see also :func:`~torchvision.transforms.functional.hflip`)
-# performs a horizontal flip of an image with a given probability.
-hflipper = T.RandomHorizontalFlip(p=0.5)
-transformed_imgs = [hflipper(orig_img) for _ in range(4)]
-plot(transformed_imgs)
-
-####################################
-# RandomVerticalFlip
-# ~~~~~~~~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomVerticalFlip` transform
-# (see also :func:`~torchvision.transforms.functional.vflip`)
-# performs a vertical flip of an image with a given probability.
-vflipper = T.RandomVerticalFlip(p=0.5)
-transformed_imgs = [vflipper(orig_img) for _ in range(4)]
-plot(transformed_imgs)
-
-####################################
-# RandomApply
-# ~~~~~~~~~~~
-# The :class:`~torchvision.transforms.RandomApply` transform
-# randomly applies a list of transforms, with a given probability.
-applier = T.RandomApply(transforms=[T.RandomCrop(size=(64, 64))], p=0.5)
-transformed_imgs = [applier(orig_img) for _ in range(4)]
-plot(transformed_imgs)
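-
-####################################
-# As a closing editorial sketch (not part of the original gallery example), the
-# transforms illustrated above compose naturally with
-# :class:`~torchvision.transforms.Compose` into a single augmentation pipeline:
-composed = T.Compose([
-    T.RandomHorizontalFlip(p=0.5),
-    T.ColorJitter(brightness=.5, hue=.3),
-    T.RandomResizedCrop(size=(224, 224)),
-])
-composed_imgs = [composed(orig_img) for _ in range(4)]
-plot(composed_imgs)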
diff --git a/0.11./_downloads/44cefcbc2110528a73124d64db3315fc/plot_visualization_utils.ipynb b/0.11./_downloads/44cefcbc2110528a73124d64db3315fc/plot_visualization_utils.ipynb
deleted file mode 100644
index 8d2a72c66aa..00000000000
--- a/0.11./_downloads/44cefcbc2110528a73124d64db3315fc/plot_visualization_utils.ipynb
+++ /dev/null
@@ -1,349 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "%matplotlib inline"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n# Visualization utilities\n\nThis example illustrates some of the utilities that torchvision offers for\nvisualizing images, bounding boxes, and segmentation masks.\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "# sphinx_gallery_thumbnail_path = \"../../gallery/assets/visualization_utils_thumbnail.png\"\n\nimport torch\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nimport torchvision.transforms.functional as F\n\n\nplt.rcParams[\"savefig.bbox\"] = 'tight'\n\n\ndef show(imgs):\n if not isinstance(imgs, list):\n imgs = [imgs]\n fix, axs = plt.subplots(ncols=len(imgs), squeeze=False)\n for i, img in enumerate(imgs):\n img = img.detach()\n img = F.to_pil_image(img)\n axs[0, i].imshow(np.asarray(img))\n axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Visualizing a grid of images\nThe :func:`~torchvision.utils.make_grid` function can be used to create a\ntensor that represents multiple images in a grid. This util requires a single\nimage of dtype ``uint8`` as input.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.utils import make_grid\nfrom torchvision.io import read_image\nfrom pathlib import Path\n\ndog1_int = read_image(str(Path('assets') / 'dog1.jpg'))\ndog2_int = read_image(str(Path('assets') / 'dog2.jpg'))\n\ngrid = make_grid([dog1_int, dog2_int, dog1_int, dog2_int])\nshow(grid)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Visualizing bounding boxes\nWe can use :func:`~torchvision.utils.draw_bounding_boxes` to draw boxes on an\nimage. We can set the colors, labels, width as well as font and font size.\nThe boxes are in ``(xmin, ymin, xmax, ymax)`` format.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.utils import draw_bounding_boxes\n\n\nboxes = torch.tensor([[50, 50, 100, 200], [210, 150, 350, 430]], dtype=torch.float)\ncolors = [\"blue\", \"yellow\"]\nresult = draw_bounding_boxes(dog1_int, boxes, colors=colors, width=5)\nshow(result)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Naturally, we can also plot bounding boxes produced by torchvision detection\nmodels. Here is demo with a Faster R-CNN model loaded from\n:func:`~torchvision.models.detection.fasterrcnn_resnet50_fpn`\nmodel. You can also try using a RetinaNet with\n:func:`~torchvision.models.detection.retinanet_resnet50_fpn`, an SSDlite with\n:func:`~torchvision.models.detection.ssdlite320_mobilenet_v3_large` or an SSD with\n:func:`~torchvision.models.detection.ssd300_vgg16`. For more details\non the output of such models, you may refer to `instance_seg_output`.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.models.detection import fasterrcnn_resnet50_fpn\nfrom torchvision.transforms.functional import convert_image_dtype\n\n\nbatch_int = torch.stack([dog1_int, dog2_int])\nbatch = convert_image_dtype(batch_int, dtype=torch.float)\n\nmodel = fasterrcnn_resnet50_fpn(pretrained=True, progress=False)\nmodel = model.eval()\n\noutputs = model(batch)\nprint(outputs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Let's plot the boxes detected by our model. We will only plot the boxes with a\nscore greater than a given threshold.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "score_threshold = .8\ndogs_with_boxes = [\n draw_bounding_boxes(dog_int, boxes=output['boxes'][output['scores'] > score_threshold], width=4)\n for dog_int, output in zip(batch_int, outputs)\n]\nshow(dogs_with_boxes)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Visualizing segmentation masks\nThe :func:`~torchvision.utils.draw_segmentation_masks` function can be used to\ndraw segmentation masks on images. Semantic segmentation and instance\nsegmentation models have different outputs, so we will treat each\nindependently.\n\n\n### Semantic segmentation models\n\nWe will see how to use it with torchvision's FCN Resnet-50, loaded with\n:func:`~torchvision.models.segmentation.fcn_resnet50`. You can also try using\nDeepLabv3 (:func:`~torchvision.models.segmentation.deeplabv3_resnet50`) or\nlraspp mobilenet models\n(:func:`~torchvision.models.segmentation.lraspp_mobilenet_v3_large`).\n\nLet's start by looking at the ouput of the model. Remember that in general,\nimages must be normalized before they're passed to a semantic segmentation\nmodel.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.models.segmentation import fcn_resnet50\n\n\nmodel = fcn_resnet50(pretrained=True, progress=False)\nmodel = model.eval()\n\nnormalized_batch = F.normalize(batch, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))\noutput = model(normalized_batch)['out']\nprint(output.shape, output.min().item(), output.max().item())"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As we can see above, the output of the segmentation model is a tensor of shape\n``(batch_size, num_classes, H, W)``. Each value is a non-normalized score, and\nwe can normalize them into ``[0, 1]`` by using a softmax. After the softmax,\nwe can interpret each value as a probability indicating how likely a given\npixel is to belong to a given class.\n\nLet's plot the masks that have been detected for the dog class and for the\nboat class:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "sem_classes = [\n '__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',\n 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',\n 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'\n]\nsem_class_to_idx = {cls: idx for (idx, cls) in enumerate(sem_classes)}\n\nnormalized_masks = torch.nn.functional.softmax(output, dim=1)\n\ndog_and_boat_masks = [\n normalized_masks[img_idx, sem_class_to_idx[cls]]\n for img_idx in range(batch.shape[0])\n for cls in ('dog', 'boat')\n]\n\nshow(dog_and_boat_masks)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As expected, the model is confident about the dog class, but not so much for\nthe boat class.\n\nThe :func:`~torchvision.utils.draw_segmentation_masks` function can be used to\nplots those masks on top of the original image. This function expects the\nmasks to be boolean masks, but our masks above contain probabilities in ``[0,\n1]``. To get boolean masks, we can do the following:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "class_dim = 1\nboolean_dog_masks = (normalized_masks.argmax(class_dim) == sem_class_to_idx['dog'])\nprint(f\"shape = {boolean_dog_masks.shape}, dtype = {boolean_dog_masks.dtype}\")\nshow([m.float() for m in boolean_dog_masks])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The line above where we define ``boolean_dog_masks`` is a bit cryptic, but you\ncan read it as the following query: \"For which pixels is 'dog' the most likely\nclass?\"\n\n
Note
While we're using the ``normalized_masks`` here, we would have\n gotten the same result by using the non-normalized scores of the model\n directly (as the softmax operation preserves the order).
\n\nNow that we have boolean masks, we can use them with\n:func:`~torchvision.utils.draw_segmentation_masks` to plot them on top of the\noriginal images:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.utils import draw_segmentation_masks\n\ndogs_with_masks = [\n draw_segmentation_masks(img, masks=mask, alpha=0.7)\n for img, mask in zip(batch_int, boolean_dog_masks)\n]\nshow(dogs_with_masks)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We can plot more than one mask per image! Remember that the model returned as\nmany masks as there are classes. Let's ask the same query as above, but this\ntime for *all* classes, not just the dog class: \"For each pixel and each class\nC, is class C the most most likely class?\"\n\nThis one is a bit more involved, so we'll first show how to do it with a\nsingle image, and then we'll generalize to the batch\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "num_classes = normalized_masks.shape[1]\ndog1_masks = normalized_masks[0]\nclass_dim = 0\ndog1_all_classes_masks = dog1_masks.argmax(class_dim) == torch.arange(num_classes)[:, None, None]\n\nprint(f\"dog1_masks shape = {dog1_masks.shape}, dtype = {dog1_masks.dtype}\")\nprint(f\"dog1_all_classes_masks = {dog1_all_classes_masks.shape}, dtype = {dog1_all_classes_masks.dtype}\")\n\ndog_with_all_masks = draw_segmentation_masks(dog1_int, masks=dog1_all_classes_masks, alpha=.6)\nshow(dog_with_all_masks)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We can see in the image above that only 2 masks were drawn: the mask for the\nbackground and the mask for the dog. This is because the model thinks that\nonly these 2 classes are the most likely ones across all the pixels. If the\nmodel had detected another class as the most likely among other pixels, we\nwould have seen its mask above.\n\nRemoving the background mask is as simple as passing\n``masks=dog1_all_classes_masks[1:]``, because the background class is the\nclass with index 0.\n\nLet's now do the same but for an entire batch of images. The code is similar\nbut involves a bit more juggling with the dimensions.\n\n"
- ]
- },
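- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Before generalizing to the batch, here is a minimal sketch (not part of the\noriginal example) of dropping the background mask as described above:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "# Minimal sketch: skip index 0 (the background class) before drawing\nshow(draw_segmentation_masks(dog1_int, masks=dog1_all_classes_masks[1:], alpha=.6))"
- ]
- },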
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "class_dim = 1\nall_classes_masks = normalized_masks.argmax(class_dim) == torch.arange(num_classes)[:, None, None, None]\nprint(f\"shape = {all_classes_masks.shape}, dtype = {all_classes_masks.dtype}\")\n# The first dimension is the classes now, so we need to swap it\nall_classes_masks = all_classes_masks.swapaxes(0, 1)\n\ndogs_with_masks = [\n draw_segmentation_masks(img, masks=mask, alpha=.6)\n for img, mask in zip(batch_int, all_classes_masks)\n]\nshow(dogs_with_masks)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n### Instance segmentation models\n\nInstance segmentation models have a significantly different output from the\nsemantic segmentation models. We will see here how to plot the masks for such\nmodels. Let's start by analyzing the output of a Mask-RCNN model. Note that\nthese models don't require the images to be normalized, so we don't need to\nuse the normalized batch.\n\n
Note
We will here describe the output of a Mask-RCNN model. The models in\n `object_det_inst_seg_pers_keypoint_det` all have a similar output\n format, but some of them may have extra info like keypoints for\n :func:`~torchvision.models.detection.keypointrcnn_resnet50_fpn`, and some\n of them may not have masks, like\n :func:`~torchvision.models.detection.fasterrcnn_resnet50_fpn`.
\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.models.detection import maskrcnn_resnet50_fpn\nmodel = maskrcnn_resnet50_fpn(pretrained=True, progress=False)\nmodel = model.eval()\n\noutput = model(batch)\nprint(output)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Let's break this down. For each image in the batch, the model outputs some\ndetections (or instances). The number of detections varies for each input\nimage. Each instance is described by its bounding box, its label, its score\nand its mask.\n\nThe way the output is organized is as follows: the output is a list of length\n``batch_size``. Each entry in the list corresponds to an input image, and it\nis a dict with keys 'boxes', 'labels', 'scores', and 'masks'. Each value\nassociated to those keys has ``num_instances`` elements in it. In our case\nabove there are 3 instances detected in the first image, and 2 instances in\nthe second one.\n\nThe boxes can be plotted with :func:`~torchvision.utils.draw_bounding_boxes`\nas above, but here we're more interested in the masks. These masks are quite\ndifferent from the masks that we saw above for the semantic segmentation\nmodels.\n\n"
- ]
- },
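- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As a small sketch (not part of the original example), we can inspect that\nstructure directly:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "# Each per-image dict holds 'boxes', 'labels', 'scores' and 'masks' entries\nfor img_idx, img_output in enumerate(output):\n    print(f\"Image {img_idx}: {img_output['boxes'].shape[0]} instances, keys = {list(img_output.keys())}\")"
- ]
- },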
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "dog1_output = output[0]\ndog1_masks = dog1_output['masks']\nprint(f\"shape = {dog1_masks.shape}, dtype = {dog1_masks.dtype}, \"\n f\"min = {dog1_masks.min()}, max = {dog1_masks.max()}\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Here the masks corresponds to probabilities indicating, for each pixel, how\nlikely it is to belong to the predicted label of that instance. Those\npredicted labels correspond to the 'labels' element in the same output dict.\nLet's see which labels were predicted for the instances of the first image.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "inst_classes = [\n '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',\n 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',\n 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',\n 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',\n 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',\n 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',\n 'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',\n 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',\n 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',\n 'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',\n 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',\n 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'\n]\n\ninst_class_to_idx = {cls: idx for (idx, cls) in enumerate(inst_classes)}\n\nprint(\"For the first dog, the following instances were detected:\")\nprint([inst_classes[label] for label in dog1_output['labels']])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Interestingly, the model detects two persons in the image. Let's go ahead and\nplot those masks. Since :func:`~torchvision.utils.draw_segmentation_masks`\nexpects boolean masks, we need to convert those probabilities into boolean\nvalues. Remember that the semantic of those masks is \"How likely is this pixel\nto belong to the predicted class?\". As a result, a natural way of converting\nthose masks into boolean values is to threshold them with the 0.5 probability\n(one could also choose a different threshold).\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "proba_threshold = 0.5\ndog1_bool_masks = dog1_output['masks'] > proba_threshold\nprint(f\"shape = {dog1_bool_masks.shape}, dtype = {dog1_bool_masks.dtype}\")\n\n# There's an extra dimension (1) to the masks. We need to remove it\ndog1_bool_masks = dog1_bool_masks.squeeze(1)\n\nshow(draw_segmentation_masks(dog1_int, dog1_bool_masks, alpha=0.9))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The model seems to have properly detected the dog, but it also confused trees\nwith people. Looking more closely at the scores will help us plotting more\nrelevant masks:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "print(dog1_output['scores'])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Clearly the model is more confident about the dog detection than it is about\nthe people detections. That's good news. When plotting the masks, we can ask\nfor only those that have a good score. Let's use a score threshold of .75\nhere, and also plot the masks of the second dog.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "score_threshold = .75\n\nboolean_masks = [\n out['masks'][out['scores'] > score_threshold] > proba_threshold\n for out in output\n]\n\ndogs_with_masks = [\n draw_segmentation_masks(img, mask.squeeze(1))\n for img, mask in zip(batch_int, boolean_masks)\n]\nshow(dogs_with_masks)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The two 'people' masks in the first image where not selected because they have\na lower score than the score threshold. Similarly in the second image, the\ninstance with class 15 (which corresponds to 'bench') was not selected.\n\n"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.11"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
\ No newline at end of file
diff --git a/0.11./_downloads/64bb9b01863bd76675f57b9a9a8e6229/plot_repurposing_annotations.ipynb b/0.11./_downloads/64bb9b01863bd76675f57b9a9a8e6229/plot_repurposing_annotations.ipynb
deleted file mode 100644
index e25b354b777..00000000000
--- a/0.11./_downloads/64bb9b01863bd76675f57b9a9a8e6229/plot_repurposing_annotations.ipynb
+++ /dev/null
@@ -1,216 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "%matplotlib inline"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n# Repurposing masks into bounding boxes\n\nThe following example illustrates the operations available\nthe `torchvision.ops ` module for repurposing\nsegmentation masks into object localization annotations for different tasks\n(e.g. transforming masks used by instance and panoptic segmentation\nmethods into bounding boxes used by object detection methods).\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "# sphinx_gallery_thumbnail_path = \"../../gallery/assets/repurposing_annotations_thumbnail.png\"\n\nimport os\nimport numpy as np\nimport torch\nimport matplotlib.pyplot as plt\n\nimport torchvision.transforms.functional as F\n\n\nASSETS_DIRECTORY = \"assets\"\n\nplt.rcParams[\"savefig.bbox\"] = \"tight\"\n\n\ndef show(imgs):\n if not isinstance(imgs, list):\n imgs = [imgs]\n fix, axs = plt.subplots(ncols=len(imgs), squeeze=False)\n for i, img in enumerate(imgs):\n img = img.detach()\n img = F.to_pil_image(img)\n axs[0, i].imshow(np.asarray(img))\n axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Masks\nIn tasks like instance and panoptic segmentation, masks are commonly defined, and are defined by this package,\nas a multi-dimensional array (e.g. a NumPy array or a PyTorch tensor) with the following shape:\n\n (num_objects, height, width)\n\nWhere num_objects is the number of annotated objects in the image. Each (height, width) object corresponds to exactly\none object. For example, if your input image has the dimensions 224 x 224 and has four annotated objects the shape\nof your masks annotation has the following shape:\n\n (4, 224, 224).\n\nA nice property of masks is that they can be easily repurposed to be used in methods to solve a variety of object\nlocalization tasks.\n\n"
- ]
- },
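- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As a quick illustrative sketch (not part of the original tutorial), a masks\nannotation for a 224 x 224 image with four annotated objects would simply be a\nboolean tensor of shape ``(4, 224, 224)``:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "# Hypothetical example: four all-False masks for a 224 x 224 image\nexample_masks = torch.zeros((4, 224, 224), dtype=torch.bool)\nprint(example_masks.shape)"
- ]
- },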
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Converting Masks to Bounding Boxes\nFor example, the :func:`~torchvision.ops.masks_to_boxes` operation can be used to\ntransform masks into bounding boxes that can be\nused as input to detection models such as FasterRCNN and RetinaNet.\nWe will take images and masks from the `PenFudan Dataset `_.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.io import read_image\n\nimg_path = os.path.join(ASSETS_DIRECTORY, \"FudanPed00054.png\")\nmask_path = os.path.join(ASSETS_DIRECTORY, \"FudanPed00054_mask.png\")\nimg = read_image(img_path)\nmask = read_image(mask_path)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Here the masks are represented as a PNG Image, with floating point values.\nEach pixel is encoded as different colors, with 0 being background.\nNotice that the spatial dimensions of image and mask match.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "print(mask.size())\nprint(img.size())\nprint(mask)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "# We get the unique colors, as these would be the object ids.\nobj_ids = torch.unique(mask)\n\n# first id is the background, so remove it.\nobj_ids = obj_ids[1:]\n\n# split the color-encoded mask into a set of boolean masks.\n# Note that this snippet would work as well if the masks were float values instead of ints.\nmasks = mask == obj_ids[:, None, None]"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Now the masks are a boolean tensor.\nThe first dimension in this case 3 and denotes the number of instances: there are 3 people in the image.\nThe other two dimensions are height and width, which are equal to the dimensions of the image.\nFor each instance, the boolean tensors represent if the particular pixel\nbelongs to the segmentation mask of the image.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "print(masks.size())\nprint(masks)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Let us visualize an image and plot its corresponding segmentation masks.\nWe will use the :func:`~torchvision.utils.draw_segmentation_masks` to draw the segmentation masks.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.utils import draw_segmentation_masks\n\ndrawn_masks = []\nfor mask in masks:\n drawn_masks.append(draw_segmentation_masks(img, mask, alpha=0.8, colors=\"blue\"))\n\nshow(drawn_masks)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "To convert the boolean masks into bounding boxes.\nWe will use the :func:`~torchvision.ops.masks_to_boxes` from the torchvision.ops module\nIt returns the boxes in ``(xmin, ymin, xmax, ymax)`` format.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.ops import masks_to_boxes\n\nboxes = masks_to_boxes(masks)\nprint(boxes.size())\nprint(boxes)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As the shape denotes, there are 3 boxes and in ``(xmin, ymin, xmax, ymax)`` format.\nThese can be visualized very easily with :func:`~torchvision.utils.draw_bounding_boxes` utility\nprovided in `torchvision.utils `.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.utils import draw_bounding_boxes\n\ndrawn_boxes = draw_bounding_boxes(img, boxes, colors=\"red\")\nshow(drawn_boxes)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "These boxes can now directly be used by detection models in torchvision.\nHere is demo with a Faster R-CNN model loaded from\n:func:`~torchvision.models.detection.fasterrcnn_resnet50_fpn`\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.models.detection import fasterrcnn_resnet50_fpn\n\nmodel = fasterrcnn_resnet50_fpn(pretrained=True, progress=False)\nprint(img.size())\n\nimg = F.convert_image_dtype(img, torch.float)\ntarget = {}\ntarget[\"boxes\"] = boxes\ntarget[\"labels\"] = labels = torch.ones((masks.size(0),), dtype=torch.int64)\ndetection_outputs = model(img.unsqueeze(0), [target])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Converting Segmentation Dataset to Detection Dataset\n\nWith this utility it becomes very simple to convert a segmentation dataset to a detection dataset.\nWith this we can now use a segmentation dataset to train a detection model.\nOne can similarly convert panoptic dataset to detection dataset.\nHere is an example where we re-purpose the dataset from the\n`PenFudan Detection Tutorial `_.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "class SegmentationToDetectionDataset(torch.utils.data.Dataset):\n def __init__(self, root, transforms):\n self.root = root\n self.transforms = transforms\n # load all image files, sorting them to\n # ensure that they are aligned\n self.imgs = list(sorted(os.listdir(os.path.join(root, \"PNGImages\"))))\n self.masks = list(sorted(os.listdir(os.path.join(root, \"PedMasks\"))))\n\n def __getitem__(self, idx):\n # load images and masks\n img_path = os.path.join(self.root, \"PNGImages\", self.imgs[idx])\n mask_path = os.path.join(self.root, \"PedMasks\", self.masks[idx])\n\n img = read_image(img_path)\n mask = read_image(mask_path)\n\n img = F.convert_image_dtype(img, dtype=torch.float)\n mask = F.convert_image_dtype(mask, dtype=torch.float)\n\n # We get the unique colors, as these would be the object ids.\n obj_ids = torch.unique(mask)\n\n # first id is the background, so remove it.\n obj_ids = obj_ids[1:]\n\n # split the color-encoded mask into a set of boolean masks.\n masks = mask == obj_ids[:, None, None]\n\n boxes = masks_to_boxes(masks)\n\n # there is only one class\n labels = torch.ones((masks.shape[0],), dtype=torch.int64)\n\n target = {}\n target[\"boxes\"] = boxes\n target[\"labels\"] = labels\n\n if self.transforms is not None:\n img, target = self.transforms(img, target)\n\n return img, target"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.11"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
\ No newline at end of file
diff --git a/0.11./_downloads/6a01ac4f9248f75d7d59fd4b8e9147f5/plot_scripted_tensor_transforms.py b/0.11./_downloads/6a01ac4f9248f75d7d59fd4b8e9147f5/plot_scripted_tensor_transforms.py
deleted file mode 100644
index 6f3cc22073e..00000000000
--- a/0.11./_downloads/6a01ac4f9248f75d7d59fd4b8e9147f5/plot_scripted_tensor_transforms.py
+++ /dev/null
@@ -1,145 +0,0 @@
-"""
-=========================
-Tensor transforms and JIT
-=========================
-
-This example illustrates various features that are now supported by the
-:ref:`image transformations ` on Tensor images. In particular, we
-show how image transforms can be performed on GPU, and how one can also script
-them using JIT compilation.
-
-Prior to v0.8.0, transforms in torchvision were PIL-centric, which imposed
-several limitations. Since v0.8.0, the transform implementations are compatible
-with both Tensor and PIL images, and we can achieve the following new features:
-
-- transform multi-band torch tensor images (with more than 3-4 channels)
-- torchscript transforms together with your model for deployment
-- support for GPU acceleration
-- batched transformation such as for videos
-- read and decode data directly as torch tensor with torchscript support (for PNG and JPEG image formats)
-
-.. note::
- These features are only possible with **Tensor** images.
-"""
-
-from pathlib import Path
-
-import matplotlib.pyplot as plt
-import numpy as np
-
-import torch
-import torchvision.transforms as T
-from torchvision.io import read_image
-
-
-plt.rcParams["savefig.bbox"] = 'tight'
-torch.manual_seed(1)
-
-
-def show(imgs):
- fix, axs = plt.subplots(ncols=len(imgs), squeeze=False)
- for i, img in enumerate(imgs):
- img = T.ToPILImage()(img.to('cpu'))
- axs[0, i].imshow(np.asarray(img))
- axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
-
-
-####################################
-# The :func:`~torchvision.io.read_image` function allows us to read an image and
-# load it directly as a tensor.
-
-dog1 = read_image(str(Path('assets') / 'dog1.jpg'))
-dog2 = read_image(str(Path('assets') / 'dog2.jpg'))
-show([dog1, dog2])
-
-####################################
-# Transforming images on GPU
-# --------------------------
-# Most transforms natively support tensors in addition to PIL images (to visualize
-# the effect of the transforms, you may refer to
-# :ref:`sphx_glr_auto_examples_plot_transforms.py`).
-# Using tensor images, we can run the transforms on GPUs if cuda is available!
-
-import torch.nn as nn
-
-transforms = torch.nn.Sequential(
- T.RandomCrop(224),
- T.RandomHorizontalFlip(p=0.3),
-)
-
-device = 'cuda' if torch.cuda.is_available() else 'cpu'
-dog1 = dog1.to(device)
-dog2 = dog2.to(device)
-
-transformed_dog1 = transforms(dog1)
-transformed_dog2 = transforms(dog2)
-show([transformed_dog1, transformed_dog2])
-
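-####################################
-# Since the transforms operate on tensors, the same pipeline also accepts a
-# batched input, which is one of the features listed above. This is a minimal
-# sketch (not part of the original example); it assumes ``dog1`` and ``dog2``
-# have the same spatial size, which holds here since they are stacked again
-# further below.
-
-batched_dogs = torch.stack([dog1, dog2])  # shape (2, 3, H, W)
-transformed_batch = transforms(batched_dogs)
-print(transformed_batch.shape)
-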
-####################################
-# Scriptable transforms for easier deployment via torchscript
-# -----------------------------------------------------------
-# We now show how to combine image transformations and a model forward pass,
-# while using ``torch.jit.script`` to obtain a single scripted module.
-#
-# Let's define a ``Predictor`` module that transforms the input tensor and then
-# applies an ImageNet model on it.
-
-from torchvision.models import resnet18
-
-
-class Predictor(nn.Module):
-
- def __init__(self):
- super().__init__()
- self.resnet18 = resnet18(pretrained=True, progress=False).eval()
- self.transforms = nn.Sequential(
- T.Resize([256, ]), # We use single int value inside a list due to torchscript type restrictions
- T.CenterCrop(224),
- T.ConvertImageDtype(torch.float),
- T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
- )
-
- def forward(self, x: torch.Tensor) -> torch.Tensor:
- with torch.no_grad():
- x = self.transforms(x)
- y_pred = self.resnet18(x)
- return y_pred.argmax(dim=1)
-
-
-####################################
-# Now, let's define scripted and non-scripted instances of ``Predictor`` and
-# apply them to multiple tensor images of the same size.
-
-predictor = Predictor().to(device)
-scripted_predictor = torch.jit.script(predictor).to(device)
-
-batch = torch.stack([dog1, dog2]).to(device)
-
-res = predictor(batch)
-res_scripted = scripted_predictor(batch)
-
-####################################
-# We can verify that the predictions of the scripted and non-scripted models are
-# the same:
-
-import json
-
-with open(Path('assets') / 'imagenet_class_index.json', 'r') as labels_file:
- labels = json.load(labels_file)
-
-for i, (pred, pred_scripted) in enumerate(zip(res, res_scripted)):
- assert pred == pred_scripted
- print(f"Prediction for Dog {i + 1}: {labels[str(pred.item())]}")
-
-####################################
-# Since the model is scripted, it can easily be saved to disk and re-used.
-
-import tempfile
-
-with tempfile.NamedTemporaryFile() as f:
- scripted_predictor.save(f.name)
-
- dumped_scripted_predictor = torch.jit.load(f.name)
- res_scripted_dumped = dumped_scripted_predictor(batch)
-assert (res_scripted_dumped == res_scripted).all()
diff --git a/0.11./_downloads/6f1e7a639e0699d6164445b55e6c116d/auto_examples_jupyter.zip b/0.11./_downloads/6f1e7a639e0699d6164445b55e6c116d/auto_examples_jupyter.zip
deleted file mode 100644
index edefd9291ca..00000000000
Binary files a/0.11./_downloads/6f1e7a639e0699d6164445b55e6c116d/auto_examples_jupyter.zip and /dev/null differ
diff --git a/0.11./_downloads/82da64ed59815304068ce683aaf81dd9/plot_transforms.ipynb b/0.11./_downloads/82da64ed59815304068ce683aaf81dd9/plot_transforms.ipynb
deleted file mode 100644
index 578f469aff9..00000000000
--- a/0.11./_downloads/82da64ed59815304068ce683aaf81dd9/plot_transforms.ipynb
+++ /dev/null
@@ -1,486 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "%matplotlib inline"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n# Illustration of transforms\n\nThis example illustrates the various transforms available in `the\ntorchvision.transforms module `.\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "# sphinx_gallery_thumbnail_path = \"../../gallery/assets/transforms_thumbnail.png\"\n\nfrom PIL import Image\nfrom pathlib import Path\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nimport torch\nimport torchvision.transforms as T\n\n\nplt.rcParams[\"savefig.bbox\"] = 'tight'\norig_img = Image.open(Path('assets') / 'astronaut.jpg')\n# if you change the seed, make sure that the randomly-applied transforms\n# properly show that the image can be both transformed and *not* transformed!\ntorch.manual_seed(0)\n\n\ndef plot(imgs, with_orig=True, row_title=None, **imshow_kwargs):\n if not isinstance(imgs[0], list):\n # Make a 2d grid even if there's just 1 row\n imgs = [imgs]\n\n num_rows = len(imgs)\n num_cols = len(imgs[0]) + with_orig\n fig, axs = plt.subplots(nrows=num_rows, ncols=num_cols, squeeze=False)\n for row_idx, row in enumerate(imgs):\n row = [orig_img] + row if with_orig else row\n for col_idx, img in enumerate(row):\n ax = axs[row_idx, col_idx]\n ax.imshow(np.asarray(img), **imshow_kwargs)\n ax.set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])\n\n if with_orig:\n axs[0, 0].set(title='Original image')\n axs[0, 0].title.set_size(8)\n if row_title is not None:\n for row_idx in range(num_rows):\n axs[row_idx, 0].set(ylabel=row_title[row_idx])\n\n plt.tight_layout()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Pad\nThe :class:`~torchvision.transforms.Pad` transform\n(see also :func:`~torchvision.transforms.functional.pad`)\nfills image borders with some pixel values.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "padded_imgs = [T.Pad(padding=padding)(orig_img) for padding in (3, 10, 30, 50)]\nplot(padded_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Resize\nThe :class:`~torchvision.transforms.Resize` transform\n(see also :func:`~torchvision.transforms.functional.resize`)\nresizes an image.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "resized_imgs = [T.Resize(size=size)(orig_img) for size in (30, 50, 100, orig_img.size)]\nplot(resized_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## CenterCrop\nThe :class:`~torchvision.transforms.CenterCrop` transform\n(see also :func:`~torchvision.transforms.functional.center_crop`)\ncrops the given image at the center.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "center_crops = [T.CenterCrop(size=size)(orig_img) for size in (30, 50, 100, orig_img.size)]\nplot(center_crops)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## FiveCrop\nThe :class:`~torchvision.transforms.FiveCrop` transform\n(see also :func:`~torchvision.transforms.functional.five_crop`)\ncrops the given image into four corners and the central crop.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "(top_left, top_right, bottom_left, bottom_right, center) = T.FiveCrop(size=(100, 100))(orig_img)\nplot([top_left, top_right, bottom_left, bottom_right, center])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Grayscale\nThe :class:`~torchvision.transforms.Grayscale` transform\n(see also :func:`~torchvision.transforms.functional.to_grayscale`)\nconverts an image to grayscale\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "gray_img = T.Grayscale()(orig_img)\nplot([gray_img], cmap='gray')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Random transforms\nThe following transforms are random, which means that the same transfomer\ninstance will produce different result each time it transforms a given image.\n\n### ColorJitter\nThe :class:`~torchvision.transforms.ColorJitter` transform\nrandomly changes the brightness, saturation, and other properties of an image.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "jitter = T.ColorJitter(brightness=.5, hue=.3)\njitted_imgs = [jitter(orig_img) for _ in range(4)]\nplot(jitted_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### GaussianBlur\nThe :class:`~torchvision.transforms.GaussianBlur` transform\n(see also :func:`~torchvision.transforms.functional.gaussian_blur`)\nperforms gaussian blur transform on an image.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "blurrer = T.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 5))\nblurred_imgs = [blurrer(orig_img) for _ in range(4)]\nplot(blurred_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomPerspective\nThe :class:`~torchvision.transforms.RandomPerspective` transform\n(see also :func:`~torchvision.transforms.functional.perspective`)\nperforms random perspective transform on an image.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "perspective_transformer = T.RandomPerspective(distortion_scale=0.6, p=1.0)\nperspective_imgs = [perspective_transformer(orig_img) for _ in range(4)]\nplot(perspective_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomRotation\nThe :class:`~torchvision.transforms.RandomRotation` transform\n(see also :func:`~torchvision.transforms.functional.rotate`)\nrotates an image with random angle.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "rotater = T.RandomRotation(degrees=(0, 180))\nrotated_imgs = [rotater(orig_img) for _ in range(4)]\nplot(rotated_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomAffine\nThe :class:`~torchvision.transforms.RandomAffine` transform\n(see also :func:`~torchvision.transforms.functional.affine`)\nperforms random affine transform on an image.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "affine_transfomer = T.RandomAffine(degrees=(30, 70), translate=(0.1, 0.3), scale=(0.5, 0.75))\naffine_imgs = [affine_transfomer(orig_img) for _ in range(4)]\nplot(affine_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomCrop\nThe :class:`~torchvision.transforms.RandomCrop` transform\n(see also :func:`~torchvision.transforms.functional.crop`)\ncrops an image at a random location.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "cropper = T.RandomCrop(size=(128, 128))\ncrops = [cropper(orig_img) for _ in range(4)]\nplot(crops)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomResizedCrop\nThe :class:`~torchvision.transforms.RandomResizedCrop` transform\n(see also :func:`~torchvision.transforms.functional.resized_crop`)\ncrops an image at a random location, and then resizes the crop to a given\nsize.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "resize_cropper = T.RandomResizedCrop(size=(32, 32))\nresized_crops = [resize_cropper(orig_img) for _ in range(4)]\nplot(resized_crops)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomInvert\nThe :class:`~torchvision.transforms.RandomInvert` transform\n(see also :func:`~torchvision.transforms.functional.invert`)\nrandomly inverts the colors of the given image.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "inverter = T.RandomInvert()\ninvertered_imgs = [inverter(orig_img) for _ in range(4)]\nplot(invertered_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomPosterize\nThe :class:`~torchvision.transforms.RandomPosterize` transform\n(see also :func:`~torchvision.transforms.functional.posterize`)\nrandomly posterizes the image by reducing the number of bits\nof each color channel.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "posterizer = T.RandomPosterize(bits=2)\nposterized_imgs = [posterizer(orig_img) for _ in range(4)]\nplot(posterized_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomSolarize\nThe :class:`~torchvision.transforms.RandomSolarize` transform\n(see also :func:`~torchvision.transforms.functional.solarize`)\nrandomly solarizes the image by inverting all pixel values above\nthe threshold.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "solarizer = T.RandomSolarize(threshold=192.0)\nsolarized_imgs = [solarizer(orig_img) for _ in range(4)]\nplot(solarized_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomAdjustSharpness\nThe :class:`~torchvision.transforms.RandomAdjustSharpness` transform\n(see also :func:`~torchvision.transforms.functional.adjust_sharpness`)\nrandomly adjusts the sharpness of the given image.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "sharpness_adjuster = T.RandomAdjustSharpness(sharpness_factor=2)\nsharpened_imgs = [sharpness_adjuster(orig_img) for _ in range(4)]\nplot(sharpened_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomAutocontrast\nThe :class:`~torchvision.transforms.RandomAutocontrast` transform\n(see also :func:`~torchvision.transforms.functional.autocontrast`)\nrandomly applies autocontrast to the given image.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "autocontraster = T.RandomAutocontrast()\nautocontrasted_imgs = [autocontraster(orig_img) for _ in range(4)]\nplot(autocontrasted_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomEqualize\nThe :class:`~torchvision.transforms.RandomEqualize` transform\n(see also :func:`~torchvision.transforms.functional.equalize`)\nrandomly equalizes the histogram of the given image.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "equalizer = T.RandomEqualize()\nequalized_imgs = [equalizer(orig_img) for _ in range(4)]\nplot(equalized_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### AutoAugment\nThe :class:`~torchvision.transforms.AutoAugment` transform\nautomatically augments data based on a given auto-augmentation policy.\nSee :class:`~torchvision.transforms.AutoAugmentPolicy` for the available policies.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "policies = [T.AutoAugmentPolicy.CIFAR10, T.AutoAugmentPolicy.IMAGENET, T.AutoAugmentPolicy.SVHN]\naugmenters = [T.AutoAugment(policy) for policy in policies]\nimgs = [\n [augmenter(orig_img) for _ in range(4)]\n for augmenter in augmenters\n]\nrow_title = [str(policy).split('.')[-1] for policy in policies]\nplot(imgs, row_title=row_title)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandAugment\nThe :class:`~torchvision.transforms.RandAugment` transform automatically augments the data.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "augmenter = T.RandAugment()\nimgs = [augmenter(orig_img) for _ in range(4)]\nplot(imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### TrivialAugmentWide\nThe :class:`~torchvision.transforms.TrivialAugmentWide` transform automatically augments the data.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "augmenter = T.TrivialAugmentWide()\nimgs = [augmenter(orig_img) for _ in range(4)]\nplot(imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Randomly-applied transforms\n\nSome transforms are randomly-applied given a probability ``p``. That is, the\ntransformed image may actually be the same as the original one, even when\ncalled with the same transformer instance!\n\n### RandomHorizontalFlip\nThe :class:`~torchvision.transforms.RandomHorizontalFlip` transform\n(see also :func:`~torchvision.transforms.functional.hflip`)\nperforms horizontal flip of an image, with a given probability.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "hflipper = T.RandomHorizontalFlip(p=0.5)\ntransformed_imgs = [hflipper(orig_img) for _ in range(4)]\nplot(transformed_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomVerticalFlip\nThe :class:`~torchvision.transforms.RandomVerticalFlip` transform\n(see also :func:`~torchvision.transforms.functional.vflip`)\nperforms vertical flip of an image, with a given probability.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "vflipper = T.RandomVerticalFlip(p=0.5)\ntransformed_imgs = [vflipper(orig_img) for _ in range(4)]\nplot(transformed_imgs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### RandomApply\nThe :class:`~torchvision.transforms.RandomApply` transform\nrandomly applies a list of transforms, with a given probability.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "applier = T.RandomApply(transforms=[T.RandomCrop(size=(64, 64))], p=0.5)\ntransformed_imgs = [applier(orig_img) for _ in range(4)]\nplot(transformed_imgs)"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.11"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
\ No newline at end of file
diff --git a/0.11./_downloads/835d7cd9c1c44b0656fb931b7f5af002/plot_scripted_tensor_transforms.ipynb b/0.11./_downloads/835d7cd9c1c44b0656fb931b7f5af002/plot_scripted_tensor_transforms.ipynb
deleted file mode 100644
index 2062c325361..00000000000
--- a/0.11./_downloads/835d7cd9c1c44b0656fb931b7f5af002/plot_scripted_tensor_transforms.ipynb
+++ /dev/null
@@ -1,162 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "%matplotlib inline"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n# Tensor transforms and JIT\n\nThis example illustrates various features that are now supported by the\n`image transformations ` on Tensor images. In particular, we\nshow how image transforms can be performed on GPU, and how one can also script\nthem using JIT compilation.\n\nPrior to v0.8.0, transforms in torchvision have traditionally been PIL-centric\nand presented multiple limitations due to that. Now, since v0.8.0, transforms\nimplementations are Tensor and PIL compatible and we can achieve the following\nnew features:\n\n- transform multi-band torch tensor images (with more than 3-4 channels)\n- torchscript transforms together with your model for deployment\n- support for GPU acceleration\n- batched transformation such as for videos\n- read and decode data directly as torch tensor with torchscript support (for PNG and JPEG image formats)\n\n
Note
These features are only possible with **Tensor** images.
\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from pathlib import Path\n\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nimport torch\nimport torchvision.transforms as T\nfrom torchvision.io import read_image\n\n\nplt.rcParams[\"savefig.bbox\"] = 'tight'\ntorch.manual_seed(1)\n\n\ndef show(imgs):\n fix, axs = plt.subplots(ncols=len(imgs), squeeze=False)\n for i, img in enumerate(imgs):\n img = T.ToPILImage()(img.to('cpu'))\n axs[0, i].imshow(np.asarray(img))\n axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The :func:`~torchvision.io.read_image` function allows to read an image and\ndirectly load it as a tensor\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "dog1 = read_image(str(Path('assets') / 'dog1.jpg'))\ndog2 = read_image(str(Path('assets') / 'dog2.jpg'))\nshow([dog1, dog2])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Transforming images on GPU\nMost transforms natively support tensors on top of PIL images (to visualize\nthe effect of the transforms, you may refer to see\n`sphx_glr_auto_examples_plot_transforms.py`).\nUsing tensor images, we can run the transforms on GPUs if cuda is available!\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "import torch.nn as nn\n\ntransforms = torch.nn.Sequential(\n T.RandomCrop(224),\n T.RandomHorizontalFlip(p=0.3),\n)\n\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\ndog1 = dog1.to(device)\ndog2 = dog2.to(device)\n\ntransformed_dog1 = transforms(dog1)\ntransformed_dog2 = transforms(dog2)\nshow([transformed_dog1, transformed_dog2])"
- ]
- },
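- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Since the transforms operate on tensors, the same pipeline also accepts a\nbatched input, which is one of the features listed above. This is a minimal\nsketch (not part of the original example); it assumes ``dog1`` and ``dog2``\nhave the same spatial size, which holds here since they are stacked again\nfurther below.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "batched_dogs = torch.stack([dog1, dog2])  # shape (2, 3, H, W)\ntransformed_batch = transforms(batched_dogs)\nprint(transformed_batch.shape)"
- ]
- },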
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Scriptable transforms for easier deployment via torchscript\nWe now show how to combine image transformations and a model forward pass,\nwhile using ``torch.jit.script`` to obtain a single scripted module.\n\nLet's define a ``Predictor`` module that transforms the input tensor and then\napplies an ImageNet model on it.\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "from torchvision.models import resnet18\n\n\nclass Predictor(nn.Module):\n\n def __init__(self):\n super().__init__()\n self.resnet18 = resnet18(pretrained=True, progress=False).eval()\n self.transforms = nn.Sequential(\n T.Resize([256, ]), # We use single int value inside a list due to torchscript type restrictions\n T.CenterCrop(224),\n T.ConvertImageDtype(torch.float),\n T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])\n )\n\n def forward(self, x: torch.Tensor) -> torch.Tensor:\n with torch.no_grad():\n x = self.transforms(x)\n y_pred = self.resnet18(x)\n return y_pred.argmax(dim=1)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Now, let's define scripted and non-scripted instances of ``Predictor`` and\napply it on multiple tensor images of the same size\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "predictor = Predictor().to(device)\nscripted_predictor = torch.jit.script(predictor).to(device)\n\nbatch = torch.stack([dog1, dog2]).to(device)\n\nres = predictor(batch)\nres_scripted = scripted_predictor(batch)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We can verify that the prediction of the scripted and non-scripted models are\nthe same:\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "import json\n\nwith open(Path('assets') / 'imagenet_class_index.json', 'r') as labels_file:\n labels = json.load(labels_file)\n\nfor i, (pred, pred_scripted) in enumerate(zip(res, res_scripted)):\n assert pred == pred_scripted\n print(f\"Prediction for Dog {i + 1}: {labels[str(pred.item())]}\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Since the model is scripted, it can be easily dumped on disk and re-used\n\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": false
- },
- "outputs": [],
- "source": [
- "import tempfile\n\nwith tempfile.NamedTemporaryFile() as f:\n scripted_predictor.save(f.name)\n\n dumped_scripted_predictor = torch.jit.load(f.name)\n res_scripted_dumped = dumped_scripted_predictor(batch)\nassert (res_scripted_dumped == res_scripted).all()"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.11"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
\ No newline at end of file
diff --git a/0.11./_downloads/b8b00ddf3e9bca37ad16e1aced1e3ea4/plot_repurposing_annotations.py b/0.11./_downloads/b8b00ddf3e9bca37ad16e1aced1e3ea4/plot_repurposing_annotations.py
deleted file mode 100644
index fb4835496c3..00000000000
--- a/0.11./_downloads/b8b00ddf3e9bca37ad16e1aced1e3ea4/plot_repurposing_annotations.py
+++ /dev/null
@@ -1,205 +0,0 @@
-"""
-=====================================
-Repurposing masks into bounding boxes
-=====================================
-
-The following example illustrates the operations available in
-the :ref:`torchvision.ops ` module for repurposing
-segmentation masks into object localization annotations for different tasks
-(e.g. transforming masks used by instance and panoptic segmentation
-methods into bounding boxes used by object detection methods).
-"""
-
-# sphinx_gallery_thumbnail_path = "../../gallery/assets/repurposing_annotations_thumbnail.png"
-
-import os
-import numpy as np
-import torch
-import matplotlib.pyplot as plt
-
-import torchvision.transforms.functional as F
-
-
-ASSETS_DIRECTORY = "assets"
-
-plt.rcParams["savefig.bbox"] = "tight"
-
-
-def show(imgs):
- if not isinstance(imgs, list):
- imgs = [imgs]
- fix, axs = plt.subplots(ncols=len(imgs), squeeze=False)
- for i, img in enumerate(imgs):
- img = img.detach()
- img = F.to_pil_image(img)
- axs[0, i].imshow(np.asarray(img))
- axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
-
-
-####################################
-# Masks
-# -----
-# In tasks like instance and panoptic segmentation, masks are commonly defined, and are defined by this package,
-# as a multi-dimensional array (e.g. a NumPy array or a PyTorch tensor) with the following shape:
-#
-# (num_objects, height, width)
-#
-# Where num_objects is the number of annotated objects in the image. Each (height, width) entry corresponds to exactly
-# one object. For example, if your input image has the dimensions 224 x 224 and has four annotated objects, your masks
-# annotation has the following shape:
-#
-# (4, 224, 224).
-#
-# A nice property of masks is that they can easily be repurposed for a variety of object
-# localization tasks.
-
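-####################################
-# As a quick illustrative sketch (not part of the original tutorial), a masks
-# annotation for a 224 x 224 image with four annotated objects would simply be
-# a boolean tensor of shape ``(4, 224, 224)``:
-
-# Hypothetical example: four all-False masks for a 224 x 224 image
-example_masks = torch.zeros((4, 224, 224), dtype=torch.bool)
-print(example_masks.shape)
-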
-####################################
-# Converting Masks to Bounding Boxes
-# -----------------------------------------------
-# For example, the :func:`~torchvision.ops.masks_to_boxes` operation can be used to
-# transform masks into bounding boxes that can be
-# used as input to detection models such as FasterRCNN and RetinaNet.
-# We will take images and masks from the `PennFudan Dataset `_.
-
-
-from torchvision.io import read_image
-
-img_path = os.path.join(ASSETS_DIRECTORY, "FudanPed00054.png")
-mask_path = os.path.join(ASSETS_DIRECTORY, "FudanPed00054_mask.png")
-img = read_image(img_path)
-mask = read_image(mask_path)
-
-
-#########################
-# Here the masks are represented as a PNG image with integer values.
-# Each pixel is encoded as a different color, with 0 being the background.
-# Notice that the spatial dimensions of the image and the mask match.
-
-print(mask.size())
-print(img.size())
-print(mask)
-
-############################
-
-# We get the unique colors, as these would be the object ids.
-obj_ids = torch.unique(mask)
-
-# first id is the background, so remove it.
-obj_ids = obj_ids[1:]
-
-# split the color-encoded mask into a set of boolean masks.
-# Note that this snippet would work as well if the masks were float values instead of ints.
-masks = mask == obj_ids[:, None, None]
-
-########################
-# Now the masks are a boolean tensor.
-# The first dimension is 3 in this case and denotes the number of instances: there are 3 people in the image.
-# The other two dimensions are height and width, which are equal to the dimensions of the image.
-# For each instance, the boolean tensor indicates whether each pixel
-# belongs to that instance's segmentation mask.
-
-print(masks.size())
-print(masks)
-
-####################################
-# Let us visualize an image and plot its corresponding segmentation masks.
-# We will use :func:`~torchvision.utils.draw_segmentation_masks` to draw the segmentation masks.
-
-from torchvision.utils import draw_segmentation_masks
-
-drawn_masks = []
-for mask in masks:
- drawn_masks.append(draw_segmentation_masks(img, mask, alpha=0.8, colors="blue"))
-
-show(drawn_masks)
-
-####################################
-# To convert the boolean masks into bounding boxes,
-# we will use :func:`~torchvision.ops.masks_to_boxes` from the torchvision.ops module.
-# It returns the boxes in ``(xmin, ymin, xmax, ymax)`` format.
-
-from torchvision.ops import masks_to_boxes
-
-boxes = masks_to_boxes(masks)
-print(boxes.size())
-print(boxes)
-
-####################################
-# As the shape denotes, there are 3 boxes, in ``(xmin, ymin, xmax, ymax)`` format.
-# These can be visualized very easily with the :func:`~torchvision.utils.draw_bounding_boxes` utility
-# provided in :ref:`torchvision.utils `.
-
-from torchvision.utils import draw_bounding_boxes
-
-drawn_boxes = draw_bounding_boxes(img, boxes, colors="red")
-show(drawn_boxes)
-
-###################################
-# These boxes can now be used directly by detection models in torchvision.
-# Here is a demo with a Faster R-CNN model loaded from
-# :func:`~torchvision.models.detection.fasterrcnn_resnet50_fpn`.
-
-from torchvision.models.detection import fasterrcnn_resnet50_fpn
-
-model = fasterrcnn_resnet50_fpn(pretrained=True, progress=False)
-print(img.size())
-
-img = F.convert_image_dtype(img, torch.float)
-target = {}
-target["boxes"] = boxes
-target["labels"] = labels = torch.ones((masks.size(0),), dtype=torch.int64)
-detection_outputs = model(img.unsqueeze(0), [target])
-
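-####################################
-# In the call above the model is in training mode, so it expects targets and
-# returns a dictionary of losses rather than predictions. As a minimal sketch,
-# switching the model to eval mode yields box predictions instead:
-
-model = model.eval()
-with torch.no_grad():
-    predictions = model([img])
-print(predictions[0]["boxes"].shape)
-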
-
-####################################
-# Converting Segmentation Dataset to Detection Dataset
-# ----------------------------------------------------
-#
-# With this utility it becomes very simple to convert a segmentation dataset into a detection dataset,
-# so we can now use a segmentation dataset to train a detection model.
-# One can similarly convert a panoptic dataset into a detection dataset.
-# Here is an example where we re-purpose the dataset from the
-# `PenFudan Detection Tutorial `_.
-
-class SegmentationToDetectionDataset(torch.utils.data.Dataset):
- def __init__(self, root, transforms):
- self.root = root
- self.transforms = transforms
- # load all image files, sorting them to
- # ensure that they are aligned
- self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
- self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))
-
- def __getitem__(self, idx):
- # load images and masks
- img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
- mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
-
- img = read_image(img_path)
- mask = read_image(mask_path)
-
- img = F.convert_image_dtype(img, dtype=torch.float)
- mask = F.convert_image_dtype(mask, dtype=torch.float)
-
- # We get the unique colors, as these would be the object ids.
- obj_ids = torch.unique(mask)
-
- # first id is the background, so remove it.
- obj_ids = obj_ids[1:]
-
- # split the color-encoded mask into a set of boolean masks.
- masks = mask == obj_ids[:, None, None]
-
- boxes = masks_to_boxes(masks)
-
- # there is only one class
- labels = torch.ones((masks.shape[0],), dtype=torch.int64)
-
- target = {}
- target["boxes"] = boxes
- target["labels"] = labels
-
- if self.transforms is not None:
- img, target = self.transforms(img, target)
-
- return img, target
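-
-####################################
-# A minimal usage sketch of the dataset defined above. The root path
-# ``PennFudanPed`` and the absence of transforms are assumptions made for
-# illustration; the archive must already be downloaded and extracted there.
-
-dataset = SegmentationToDetectionDataset("PennFudanPed", transforms=None)
-sample_img, sample_target = dataset[0]
-print(sample_target["boxes"].shape, sample_target["labels"])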
diff --git a/0.11./_images/sphx_glr_plot_repurposing_annotations_001.png b/0.11./_images/sphx_glr_plot_repurposing_annotations_001.png
deleted file mode 100644
index 12a725677d8..00000000000
Binary files a/0.11./_images/sphx_glr_plot_repurposing_annotations_001.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_repurposing_annotations_002.png b/0.11./_images/sphx_glr_plot_repurposing_annotations_002.png
deleted file mode 100644
index 6be6913f533..00000000000
Binary files a/0.11./_images/sphx_glr_plot_repurposing_annotations_002.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_repurposing_annotations_thumb.png b/0.11./_images/sphx_glr_plot_repurposing_annotations_thumb.png
deleted file mode 100644
index fbed6047b39..00000000000
Binary files a/0.11./_images/sphx_glr_plot_repurposing_annotations_thumb.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_scripted_tensor_transforms_001.png b/0.11./_images/sphx_glr_plot_scripted_tensor_transforms_001.png
deleted file mode 100644
index 0ffa6b771e3..00000000000
Binary files a/0.11./_images/sphx_glr_plot_scripted_tensor_transforms_001.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_scripted_tensor_transforms_002.png b/0.11./_images/sphx_glr_plot_scripted_tensor_transforms_002.png
deleted file mode 100644
index 0a25d59f73c..00000000000
Binary files a/0.11./_images/sphx_glr_plot_scripted_tensor_transforms_002.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_scripted_tensor_transforms_thumb.png b/0.11./_images/sphx_glr_plot_scripted_tensor_transforms_thumb.png
deleted file mode 100644
index d1a78f5e5d2..00000000000
Binary files a/0.11./_images/sphx_glr_plot_scripted_tensor_transforms_thumb.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_001.png b/0.11./_images/sphx_glr_plot_transforms_001.png
deleted file mode 100644
index a3c12318136..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_001.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_002.png b/0.11./_images/sphx_glr_plot_transforms_002.png
deleted file mode 100644
index af16bec8706..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_002.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_003.png b/0.11./_images/sphx_glr_plot_transforms_003.png
deleted file mode 100644
index 36cf9b27d46..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_003.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_004.png b/0.11./_images/sphx_glr_plot_transforms_004.png
deleted file mode 100644
index 993a060d324..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_004.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_005.png b/0.11./_images/sphx_glr_plot_transforms_005.png
deleted file mode 100644
index d7a47aa622e..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_005.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_006.png b/0.11./_images/sphx_glr_plot_transforms_006.png
deleted file mode 100644
index 4f3ed913298..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_006.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_007.png b/0.11./_images/sphx_glr_plot_transforms_007.png
deleted file mode 100644
index 78c8e80806c..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_007.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_008.png b/0.11./_images/sphx_glr_plot_transforms_008.png
deleted file mode 100644
index 1c8306e17a5..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_008.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_009.png b/0.11./_images/sphx_glr_plot_transforms_009.png
deleted file mode 100644
index c359be4a8cc..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_009.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_010.png b/0.11./_images/sphx_glr_plot_transforms_010.png
deleted file mode 100644
index dce7325bbde..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_010.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_011.png b/0.11./_images/sphx_glr_plot_transforms_011.png
deleted file mode 100644
index 88f6f352d1c..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_011.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_012.png b/0.11./_images/sphx_glr_plot_transforms_012.png
deleted file mode 100644
index c3e919fd57a..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_012.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_013.png b/0.11./_images/sphx_glr_plot_transforms_013.png
deleted file mode 100644
index 889fdd4163c..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_013.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_014.png b/0.11./_images/sphx_glr_plot_transforms_014.png
deleted file mode 100644
index 0e32e949ce6..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_014.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_015.png b/0.11./_images/sphx_glr_plot_transforms_015.png
deleted file mode 100644
index a9662c81b61..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_015.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_016.png b/0.11./_images/sphx_glr_plot_transforms_016.png
deleted file mode 100644
index 0fba0cb687f..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_016.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_017.png b/0.11./_images/sphx_glr_plot_transforms_017.png
deleted file mode 100644
index 0fba0cb687f..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_017.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_018.png b/0.11./_images/sphx_glr_plot_transforms_018.png
deleted file mode 100644
index cf3fb3cbd79..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_018.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_019.png b/0.11./_images/sphx_glr_plot_transforms_019.png
deleted file mode 100644
index f4f713729b7..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_019.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_020.png b/0.11./_images/sphx_glr_plot_transforms_020.png
deleted file mode 100644
index 472d6544f45..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_020.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_021.png b/0.11./_images/sphx_glr_plot_transforms_021.png
deleted file mode 100644
index 123c1bc748c..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_021.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_022.png b/0.11./_images/sphx_glr_plot_transforms_022.png
deleted file mode 100644
index 84655812c0b..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_022.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_023.png b/0.11./_images/sphx_glr_plot_transforms_023.png
deleted file mode 100644
index 6497576a3cc..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_023.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_024.png b/0.11./_images/sphx_glr_plot_transforms_024.png
deleted file mode 100644
index a572f221a1d..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_024.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_transforms_thumb.png b/0.11./_images/sphx_glr_plot_transforms_thumb.png
deleted file mode 100644
index d6d933b2a69..00000000000
Binary files a/0.11./_images/sphx_glr_plot_transforms_thumb.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_video_api_001.png b/0.11./_images/sphx_glr_plot_video_api_001.png
deleted file mode 100644
index 0305457b9fc..00000000000
Binary files a/0.11./_images/sphx_glr_plot_video_api_001.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_video_api_thumb.png b/0.11./_images/sphx_glr_plot_video_api_thumb.png
deleted file mode 100644
index c4555201856..00000000000
Binary files a/0.11./_images/sphx_glr_plot_video_api_thumb.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_visualization_utils_001.png b/0.11./_images/sphx_glr_plot_visualization_utils_001.png
deleted file mode 100644
index f52173325f9..00000000000
Binary files a/0.11./_images/sphx_glr_plot_visualization_utils_001.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_visualization_utils_002.png b/0.11./_images/sphx_glr_plot_visualization_utils_002.png
deleted file mode 100644
index 4fe56400208..00000000000
Binary files a/0.11./_images/sphx_glr_plot_visualization_utils_002.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_visualization_utils_003.png b/0.11./_images/sphx_glr_plot_visualization_utils_003.png
deleted file mode 100644
index df5482a7615..00000000000
Binary files a/0.11./_images/sphx_glr_plot_visualization_utils_003.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_visualization_utils_004.png b/0.11./_images/sphx_glr_plot_visualization_utils_004.png
deleted file mode 100644
index c3ffb3325b1..00000000000
Binary files a/0.11./_images/sphx_glr_plot_visualization_utils_004.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_visualization_utils_005.png b/0.11./_images/sphx_glr_plot_visualization_utils_005.png
deleted file mode 100644
index 1dbdab571dc..00000000000
Binary files a/0.11./_images/sphx_glr_plot_visualization_utils_005.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_visualization_utils_006.png b/0.11./_images/sphx_glr_plot_visualization_utils_006.png
deleted file mode 100644
index 7f65851a71b..00000000000
Binary files a/0.11./_images/sphx_glr_plot_visualization_utils_006.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_visualization_utils_007.png b/0.11./_images/sphx_glr_plot_visualization_utils_007.png
deleted file mode 100644
index 0098ec17765..00000000000
Binary files a/0.11./_images/sphx_glr_plot_visualization_utils_007.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_visualization_utils_008.png b/0.11./_images/sphx_glr_plot_visualization_utils_008.png
deleted file mode 100644
index df2272d0b67..00000000000
Binary files a/0.11./_images/sphx_glr_plot_visualization_utils_008.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_visualization_utils_009.png b/0.11./_images/sphx_glr_plot_visualization_utils_009.png
deleted file mode 100644
index e7cca5bab1f..00000000000
Binary files a/0.11./_images/sphx_glr_plot_visualization_utils_009.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_visualization_utils_010.png b/0.11./_images/sphx_glr_plot_visualization_utils_010.png
deleted file mode 100644
index 58725b5ffda..00000000000
Binary files a/0.11./_images/sphx_glr_plot_visualization_utils_010.png and /dev/null differ
diff --git a/0.11./_images/sphx_glr_plot_visualization_utils_thumb.png b/0.11./_images/sphx_glr_plot_visualization_utils_thumb.png
deleted file mode 100644
index 359de279600..00000000000
Binary files a/0.11./_images/sphx_glr_plot_visualization_utils_thumb.png and /dev/null differ
diff --git a/0.11./_modules/index.html b/0.11./_modules/index.html
deleted file mode 100644
index 82db8007117..00000000000
--- a/0.11./_modules/index.html
+++ /dev/null
@@ -1,692 +0,0 @@
-
- Overview: module code — Torchvision main documentation
-
-import warnings
-import os
-
-from .extension import _HAS_OPS
-
-from torchvision import models
-from torchvision import datasets
-from torchvision import ops
-from torchvision import transforms
-from torchvision import utils
-from torchvision import io
-
-import torch
-
-try:
-    from .version import __version__  # noqa: F401
-except ImportError:
-    pass
-
-# Check if torchvision is being imported within the root folder
-if (not _HAS_OPS and os.path.dirname(os.path.realpath(__file__)) ==
-        os.path.join(os.path.realpath(os.getcwd()), 'torchvision')):
-    message = ('You are importing torchvision within its own root folder ({}). '
-               'This is not expected to work and may give errors. Please exit the '
-               'torchvision project source and relaunch your python interpreter.')
-    warnings.warn(message.format(os.getcwd()))
-
-_image_backend = 'PIL'
-
-_video_backend = "pyav"
-
-
-def set_image_backend(backend):
-    """
-    Specifies the package used to load images.
-
-    Args:
-        backend (string): Name of the image backend. one of {'PIL', 'accimage'}.
-            The :mod:`accimage` package uses the Intel IPP library. It is
-            generally faster than PIL, but does not support as many operations.
-    """
-    global _image_backend
-    if backend not in ['PIL', 'accimage']:
-        raise ValueError("Invalid backend '{}'. Options are 'PIL' and 'accimage'"
-                         .format(backend))
-    _image_backend = backend
-
-
-def get_image_backend():
-    """
-    Gets the name of the package used to load images
-    """
-    return _image_backend
-
-
-def set_video_backend(backend):
-    """
-    Specifies the package used to decode videos.
-
-    Args:
-        backend (string): Name of the video backend. one of {'pyav', 'video_reader'}.
-            The :mod:`pyav` package uses the 3rd party PyAv library. It is a Pythonic
-            binding for the FFmpeg libraries.
-            The :mod:`video_reader` package includes a native C++ implementation on
-            top of FFMPEG libraries, and a python API of TorchScript custom operator.
-            It generally decodes faster than :mod:`pyav`, but is perhaps less robust.
-
-    .. note::
-        Building with FFMPEG is disabled by default in the latest `main`. If you want to use the 'video_reader'
-        backend, please compile torchvision from source.
-    """
-    global _video_backend
-    if backend not in ["pyav", "video_reader"]:
-        raise ValueError(
-            "Invalid video backend '%s'. Options are 'pyav' and 'video_reader'" % backend
-        )
-    if backend == "video_reader" and not io._HAS_VIDEO_OPT:
-        message = (
-            "video_reader video backend is not available."
-            " Please compile torchvision from source and try again"
-        )
-        warnings.warn(message)
-    else:
-        _video_backend = backend
-
-
-def get_video_backend():
-    """
-    Returns the currently active video backend used to decode videos.
-
-    Returns:
-        str: Name of the video backend. one of {'pyav', 'video_reader'}.
-    """
-
-    return _video_backend
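-
-# A minimal usage sketch of the backend helpers defined above (illustrative only;
-# the 'accimage' and 'video_reader' backends need extra installation steps, so we
-# only exercise the defaults here):
-import torchvision
-
-torchvision.set_image_backend("PIL")
-print(torchvision.get_image_backend())   # 'PIL'
-print(torchvision.get_video_backend())   # 'pyav' unless changed elsewhere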