Stacking tensors without same size #56
First of all, @aofrancani, sorry for not answering your question. I have not looked into the problem; if I find some time for it or encounter the same problem, I will let you know. Secondly, as the title of this issue seems appropriate for my request, I will file my request as a comment here. I want to work with video data containing clips of different lengths, i.e. a different number of frames per clip. As of now, in `default_collate(batch)` in `.../torch/utils/data/_utils/collate.py`, the elements of the batch are combined into a single tensor using `torch.stack(batch, 0, out=out)`. Are there any plans to introduce nested tensors (https://github.com/pytorch/nestedtensor) in the near future, so that one can work with video clips of varying length? Thanks in advance, and thank you for the library; it is a pleasure to work with.
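Until nested tensors land, a common workaround is a custom `collate_fn` that pads every clip's time dimension to the longest clip in the batch and records a mask of valid frames. The sketch below is only an illustration of that idea, not part of PyTorchVideo; `pad_video_collate` and the `"valid_frames"` key are hypothetical names, and it assumes each sample is a dict with a `"video"` tensor of shape (C, T, H, W) plus scalar metadata (it also assumes a torch version recent enough to expose `torch.utils.data.default_collate` publicly, i.e. >= 1.11):

```python
import torch
from torch.utils.data import default_collate


def pad_video_collate(batch):
    """Zero-pad each clip's time axis to the longest clip in the batch,
    then stack. Hypothetical helper, not part of PyTorchVideo.

    Each sample is assumed to be a dict with a "video" tensor of shape
    (C, T, H, W); remaining keys are collated with the default logic.
    """
    max_t = max(sample["video"].shape[1] for sample in batch)
    videos, masks = [], []
    for sample in batch:
        v = sample["video"]
        pad_t = max_t - v.shape[1]
        # The pad tuple covers the last three dims (W, H, T); only the
        # time axis gets trailing zero-padding here.
        videos.append(torch.nn.functional.pad(v, (0, 0, 0, 0, 0, pad_t)))
        # True where a frame is real, False where it is padding.
        masks.append(torch.arange(max_t) < v.shape[1])
    out = {
        k: default_collate([s[k] for s in batch])
        for k in batch[0]
        if k != "video"
    }
    out["video"] = torch.stack(videos)         # (B, C, max_t, H, W)
    out["valid_frames"] = torch.stack(masks)   # (B, max_t)
    return out
```

Pass it to the loader as `DataLoader(dataset, collate_fn=pad_video_collate, ...)`; the model then has to be written to ignore padded frames via the mask.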
Hi everyone! I solved it...

```python
val_transform = Compose(
    [
        ApplyTransformToKey(
            key="video",
            transform=Compose(
                [
                    UniformTemporalSubsample(8),
                    Normalize((0.45, 0.45, 0.45), (0.225, 0.225, 0.225)),
                    ShortSideScale(size=256),
                    CenterCrop(244),
                ]
            ),
        ),
    ]
)
```

After changing it, I had another issue:
Hence, my final code is the following:

```python
import os

import pytorch_lightning
import pytorchvideo.data
import torch
from pytorchvideo.transforms import (
    ApplyTransformToKey,
    Normalize,
    RandomShortSideScale,
    ShortSideScale,
    UniformTemporalSubsample,
)
from torchvision.transforms import (
    CenterCrop,
    Compose,
    RandomCrop,
    RandomHorizontalFlip,
)


class KineticsDataModule(pytorch_lightning.LightningDataModule):
    """
    This LightningDataModule implementation constructs a PyTorchVideo Kinetics dataset for both
    the train and val partitions. It defines each partition's augmentation and
    preprocessing transforms and configures the PyTorch DataLoaders.
    """

    # Dataset configuration
    _DATA_PATH = "/content/drive/MyDrive/Datasets/Kinetics400/"
    _CLIP_DURATION = 2  # Duration of sampled clip for each video
    _BATCH_SIZE = 4
    _NUM_WORKERS = 2  # Number of parallel processes fetching data

    def train_dataloader(self):
        """
        Create the Kinetics train partition from the list of video labels
        in {self._DATA_PATH}/train.csv. Add transform that subsamples and
        normalizes the video before applying the scale, crop and flip augmentations.
        """
        train_transform = Compose(
            [
                ApplyTransformToKey(
                    key="video",
                    transform=Compose(
                        [
                            UniformTemporalSubsample(8),
                            Normalize((0.45, 0.45, 0.45), (0.225, 0.225, 0.225)),
                            RandomShortSideScale(min_size=256, max_size=320),
                            RandomCrop(244),
                            RandomHorizontalFlip(p=0.5),
                        ]
                    ),
                ),
            ]
        )
        train_dataset = pytorchvideo.data.Kinetics(
            data_path=os.path.join(self._DATA_PATH, "train.csv"),
            clip_sampler=pytorchvideo.data.make_clip_sampler("random", self._CLIP_DURATION),
            decode_audio=False,
            transform=train_transform,
        )
        return torch.utils.data.DataLoader(
            train_dataset,
            batch_size=self._BATCH_SIZE,
            num_workers=self._NUM_WORKERS,
        )

    def val_dataloader(self):
        """
        Create the Kinetics val partition from the list of video labels
        in {self._DATA_PATH}/val.csv. Add transform that subsamples and
        normalizes the video before applying the scale.
        """
        val_transform = Compose(
            [
                ApplyTransformToKey(
                    key="video",
                    transform=Compose(
                        [
                            UniformTemporalSubsample(8),
                            Normalize((0.45, 0.45, 0.45), (0.225, 0.225, 0.225)),
                            ShortSideScale(size=256),
                            CenterCrop(244),
                        ]
                    ),
                ),
            ]
        )
        val_dataset = pytorchvideo.data.Kinetics(
            data_path=os.path.join(self._DATA_PATH, "val.csv"),
            clip_sampler=pytorchvideo.data.make_clip_sampler("uniform", self._CLIP_DURATION),
            decode_audio=False,
            transform=val_transform,
        )
        return torch.utils.data.DataLoader(
            val_dataset,
            batch_size=self._BATCH_SIZE,
            num_workers=self._NUM_WORKERS,
        )
```

Once I've solved it, I will close this issue! Thanks for this amazing library!
Hi, I'm following the tutorial "Training a PyTorchVideo classification model" and I believe I can't load the data correctly.
I'm using Google Colab, and my Kinetics400 dataset is in my Google Drive. I've preprocessed Kinetics so that all the videos are rescaled to height=256 pixels.
My DataLoader is implemented in the same way as described in the tutorial:
I built a default ResNet just like in the tutorial. Following the tutorial up to the training step, I'm running a cell in Google Colab containing only `train()` to run the function `def train()`.
Even though I'm randomly cropping to 224x224 in the transforms, I'm getting the following error:
I was expecting something like [3, 8, 244, 244] due to `RandomCrop(244)` in the DataLoader. What am I missing? Thanks in advance for your help!
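For context on the issue title: the default collate batches samples with `torch.stack`, which requires every tensor in the batch to have exactly the same shape, so clips whose crop did not produce identical sizes cannot be batched. A minimal sketch of the failure mode (the sizes here are illustrative, not taken from the error above):

```python
import torch

# Two clips with mismatched spatial sizes, e.g. because a crop
# transform was not applied uniformly to every sample.
a = torch.zeros(3, 8, 244, 244)  # (C, T, H, W)
b = torch.zeros(3, 8, 256, 256)

try:
    torch.stack([a, b], 0)  # what default_collate does internally
except RuntimeError as err:
    print(f"stacking failed: {err}")

# Once every clip has the same shape, stacking succeeds:
batch = torch.stack([a, torch.zeros(3, 8, 244, 244)], 0)
print(batch.shape)  # torch.Size([2, 3, 8, 244, 244])
```

So the first thing to check when this error appears is whether the crop transform is actually reaching the `"video"` tensor of every sample.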