
Load the entire dataset into memory #8059

@ScanFun

Description


🚀 The feature

I plan to add a feature to the built-in datasets in torchvision.datasets for loading an entire dataset into memory. Perhaps we could add a parameter such as "to_memory" to the datasets' constructors?
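For illustration, usage might look like the following ("to_memory" is only the proposed name, not an existing torchvision argument):

from torchvision import datasets, transforms

# Hypothetical: "to_memory" is the proposed parameter; it does not exist in
# torchvision today.
dataset = datasets.ImageFolder(
    root="path/to/train",
    transform=transforms.ToTensor(),
    to_memory=True,  # eagerly load every image into RAM at construction
)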

I have implemented a prototype of this feature by modifying DatasetFolder (the base class that ImageFolder builds on):

# ATTENTION: This code is UNFINISHED! The two methods below are modified
# copies of DatasetFolder.__init__ and DatasetFolder.__getitem__ from
# torchvision/datasets/folder.py.
from typing import Any, Callable, Optional, Tuple

def __init__(
    self,
    root: str,
    loader: Callable[[str], Any],
    extensions: Optional[Tuple[str, ...]] = None,
    transform: Optional[Callable] = None,
    target_transform: Optional[Callable] = None,
    is_valid_file: Optional[Callable[[str], bool]] = None,
) -> None:
    super().__init__(root, transform=transform, target_transform=target_transform)
    classes, class_to_idx = self.find_classes(self.root)
    samples = self.make_dataset(self.root, class_to_idx, extensions, is_valid_file)

    self.loader = loader
    self.extensions = extensions

    self.classes = classes
    self.class_to_idx = class_to_idx
    self.samples = samples
    self.targets = [s[1] for s in samples]

    # Eagerly load (and transform) every sample so that __getitem__ becomes a
    # plain list lookup instead of a disk read. NOTE: applying the transform
    # here freezes random augmentations for the entire run; only deterministic
    # transforms behave identically to lazy loading.
    self.images = []
    for path, _ in samples:
        sample = self.loader(path)
        if self.transform is not None:
            sample = self.transform(sample)
        self.images.append(sample)

def __getitem__(self, index: int) -> Tuple[Any, Any]:
    """
    Args:
        index (int): Index

    Returns:
        tuple: (sample, target) where target is class_index of the target class.
    """
    # The sample was already loaded in __init__; only the target is fetched.
    _, target = self.samples[index]
    sample = self.images[index]
    if self.target_transform is not None:
        target = self.target_transform(target)
    return sample, target
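As a rough sanity check, the prototype can be exercised like the stock class. Here InMemoryDatasetFolder is a hypothetical name for a class carrying the two methods above, and the path and extensions are placeholders:

from torchvision import transforms
from torchvision.datasets.folder import default_loader

# Hypothetical wrapper class holding the modified __init__/__getitem__ above.
dataset = InMemoryDatasetFolder(
    "path/to/train",
    loader=default_loader,
    extensions=(".jpg", ".png"),
    transform=transforms.ToTensor(),
)

# After construction, indexing is a pure in-memory lookup with no disk I/O.
sample, target = dataset[0]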

Motivation, pitch

I found that training was unusually slow when I used a larger dataset for one of my projects (the dataset was built with ImageFolder).
Looking at a trace from the PyTorch profiler, I found that aten::copy_ took up most of the time.
I took this to mean I had an I/O bottleneck, so I wanted to load the entire dataset into memory, but torchvision does not seem to support that.
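For reference, here is a minimal sketch of how such a trace can be collected with torch.profiler; the dataset path, batch size, and worker count are placeholders:

from torch.profiler import profile, ProfilerActivity
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

dataset = datasets.ImageFolder("path/to/train", transform=transforms.ToTensor())
loader = DataLoader(dataset, batch_size=32, num_workers=4)

# Profile a handful of batches of pure data loading.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    for i, batch in enumerate(loader):
        if i >= 10:
            break

# If operators such as aten::copy_ dominate this table, data movement rather
# than compute is the likely bottleneck.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))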
