Merged
2 changes: 2 additions & 0 deletions docs/source/index.rst
@@ -9,7 +9,9 @@ architectures, and common image transformations for computer vision.
:caption: Package Reference

datasets
io
models
ops
transforms
utils

16 changes: 16 additions & 0 deletions docs/source/io.rst
@@ -0,0 +1,16 @@
torchvision.io
==============

.. currentmodule:: torchvision.io

The :mod:`torchvision.io` package provides functions for performing IO
operations. They are currently specific to reading and writing video.

Video
-----

.. autofunction:: read_video

.. autofunction:: read_video_timestamps

.. autofunction:: write_video
17 changes: 17 additions & 0 deletions docs/source/ops.rst
@@ -0,0 +1,17 @@
torchvision.ops
===============

.. currentmodule:: torchvision.ops

:mod:`torchvision.ops` implements operators that are specific to computer vision.

.. note::
These operators currently do not support TorchScript.


.. autofunction:: nms
.. autofunction:: roi_align
.. autofunction:: roi_pool

.. autoclass:: RoIAlign
.. autoclass:: RoIPool
61 changes: 38 additions & 23 deletions torchvision/io/video.py
@@ -28,11 +28,14 @@ def write_video(filename, video_array, fps, video_codec='libx264', options=None)
"""
Writes a 4D tensor in [T, H, W, C] format to a video file

Arguments:
filename (str): path where the video will be saved
video_array (Tensor[T, H, W, C]): tensor containing the individual frames,
as a uint8 tensor in [T, H, W, C] format
fps (Number): frames per second
Parameters
----------
filename : str
path where the video will be saved
video_array : Tensor[T, H, W, C]
tensor containing the individual frames, as a uint8 tensor in [T, H, W, C] format
fps : Number
frames per second
"""
_check_av_available()
video_array = torch.as_tensor(video_array, dtype=torch.uint8).numpy()
@@ -135,18 +138,25 @@ def read_video(filename, start_pts=0, end_pts=None):
Reads a video from a file, returning both the video frames and
the audio frames

Arguments:
filename (str): path to the video file
start_pts (int, optional): the start presentation time of the video
end_pts (int, optional): the end presentation time

Returns:
vframes (Tensor[T, H, W, C]): the `T` video frames
aframes (Tensor[K, L]): the audio frames, where `K` is the number of channels
and `L` is the number of points
info (Dict): metadata for the video and audio. Can contain the fields
- video_fps (float)
- audio_fps (int)
Parameters
----------
filename : str
path to the video file
start_pts : int, optional
the start presentation time of the video
end_pts : int, optional
the end presentation time

Returns
-------
vframes : Tensor[T, H, W, C]
the `T` video frames
aframes : Tensor[K, L]
the audio frames, where `K` is the number of channels and `L` is the
number of points
info : Dict
metadata for the video and audio. Can contain the fields video_fps (float)
and audio_fps (int)
"""
_check_av_available()

@@ -201,13 +211,18 @@ def read_video_timestamps(filename):

Note that the function decodes the whole video frame-by-frame.

Arguments:
filename (str): path to the video file
Parameters
----------
filename : str
path to the video file

Returns
-------
pts : List[int]
presentation timestamps for each one of the frames in the video.
video_fps : int
the frame rate for the video

Returns:
pts (List[int]): presentation timestamps for each one of the frames
in the video.
video_fps (int): the frame rate for the video
"""
_check_av_available()
container = av.open(filename, metadata_errors='ignore')
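
The docstrings above suggest a natural pairing: the `pts` list returned by `read_video_timestamps` can be used to choose the `start_pts` and `end_pts` arguments of `read_video`. The following is a minimal pure-Python sketch of that idea, not code from this PR. The helper name `pts_range_for_clip` and the sample timestamps are hypothetical, and it assumes `end_pts` is inclusive, which the docstring does not state.

```python
def pts_range_for_clip(pts, start_index, num_frames):
    """Return (start_pts, end_pts) spanning `num_frames` frames.

    `pts` is a list of presentation timestamps, one per frame, in the
    shape that read_video_timestamps is documented to return.
    Hypothetical helper for illustration only.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    clip = pts[start_index:start_index + num_frames]
    if len(clip) < num_frames:
        raise ValueError("not enough frames for the requested clip")
    # First and last timestamp of the clip (assumes end_pts is inclusive).
    return clip[0], clip[-1]


# Made-up timestamps for a six-frame video.
pts = [0, 512, 1024, 1536, 2048, 2560]
start_pts, end_pts = pts_range_for_clip(pts, start_index=2, num_frames=3)

# With torchvision and PyAV installed, the range could then be passed on
# (the file name is a placeholder):
# vframes, aframes, info = read_video("video.mp4", start_pts=start_pts, end_pts=end_pts)
```

Because `read_video_timestamps` decodes the whole video frame by frame, it may be worth caching the returned `pts` list when several clips are read from the same file.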
60 changes: 36 additions & 24 deletions torchvision/ops/boxes.py
@@ -11,17 +11,23 @@ def nms(boxes, scores, iou_threshold):
IoU greater than iou_threshold with another (higher scoring)
box.

Arguments:
boxes (Tensor[N, 4]): boxes to perform NMS on. They
are expected to be in (x1, y1, x2, y2) format
scores (Tensor[N]): scores for each one of the boxes
iou_threshold (float): discards all overlapping
boxes with IoU < iou_threshold

Returns:
keep (Tensor): int64 tensor with the indices
of the elements that have been kept
by NMS, sorted in decreasing order of scores
Parameters
----------
boxes : Tensor[N, 4]
boxes to perform NMS on. They
are expected to be in (x1, y1, x2, y2) format
scores : Tensor[N]
scores for each one of the boxes
iou_threshold : float
discards all overlapping
boxes with IoU > iou_threshold

Returns
-------
keep : Tensor
int64 tensor with the indices
of the elements that have been kept
by NMS, sorted in decreasing order of scores
"""
_C = _lazy_import()
return _C.nms(boxes, scores, iou_threshold)
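
For readers unfamiliar with the operator, the behaviour described in the `nms` docstring can be approximated in plain Python. This is a reference sketch of greedy non-maximum suppression on lists of tuples, not the compiled `_C.nms` kernel the function actually calls. The names `iou` and `nms_reference` are hypothetical, and tie-breaking and numerical details may differ from the real operator.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_reference(boxes, scores, iou_threshold):
    """Greedy NMS: return kept indices in decreasing order of score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap a kept, higher-scoring
        # box with IoU greater than the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
keep = nms_reference(boxes, scores, iou_threshold=0.5)
```

In this example the first two boxes overlap with an IoU of 81/119 (about 0.68), so the lower-scoring one is suppressed at a threshold of 0.5, while the third box does not overlap either and is kept.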
@@ -34,19 +40,25 @@ def batched_nms(boxes, scores, idxs, iou_threshold):
Each index value corresponds to a category, and NMS
will not be applied between elements of different categories.

Arguments:
boxes (Tensor[N, 4]): boxes where NMS will be performed. They
are expected to be in (x1, y1, x2, y2) format
scores (Tensor[N]): scores for each one of the boxes
idxs (Tensor[N]): indices of the categories for each
one of the boxes.
iou_threshold (float): discards all overlapping boxes
with IoU < iou_threshold

Returns:
keep (Tensor): int64 tensor with the indices of
the elements that have been kept by NMS, sorted
in decreasing order of scores
Parameters
----------
boxes : Tensor[N, 4]
boxes where NMS will be performed. They
are expected to be in (x1, y1, x2, y2) format
scores : Tensor[N]
scores for each one of the boxes
idxs : Tensor[N]
indices of the categories for each one of the boxes.
iou_threshold : float
discards all overlapping boxes
with IoU > iou_threshold

Returns
-------
keep : Tensor
int64 tensor with the indices of
the elements that have been kept by NMS, sorted
in decreasing order of scores
"""
if boxes.numel() == 0:
return torch.empty((0,), dtype=torch.int64, device=boxes.device)
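
The rest of `batched_nms` is not visible in this hunk. One common way to get the per-category behaviour the docstring describes is to shift each category's boxes by a category-dependent offset so that boxes from different categories can never overlap, then run a single NMS pass. The pure-Python sketch below illustrates that trick under stated assumptions: the names `_iou`, `_nms`, and `batched_nms_reference` are hypothetical, coordinates are assumed to be non-negative, and nothing here is claimed to match the code in this PR.

```python
def _iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def _nms(boxes, scores, iou_threshold):
    """Greedy NMS returning kept indices in decreasing order of score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def batched_nms_reference(boxes, scores, idxs, iou_threshold):
    """Apply NMS independently per category given by `idxs`."""
    if not boxes:
        return []
    # Shift every box by idx * (max coordinate + 1) so that boxes from
    # different categories end up in disjoint regions of the plane.
    max_coord = max(max(b) for b in boxes)
    shifted = [
        tuple(c + idx * (max_coord + 1) for c in box)
        for box, idx in zip(boxes, idxs)
    ]
    return _nms(shifted, scores, iou_threshold)


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7]
keep_two_categories = batched_nms_reference(boxes, scores, [0, 0, 1], 0.5)
keep_one_category = batched_nms_reference(boxes, scores, [0, 0, 0], 0.5)
```

With two categories the third box survives even though it is identical to the first, because it belongs to a different category. With a single category only the highest-scoring box remains.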