
A parallel iterator for large machine learning datasets that don't fit into memory, inspired by PyTorch's `DataLoader` class.


lorenzoh/DataLoaders.jl


DataLoaders.jl


A Julia package implementing performant data loading for deep learning on out-of-memory datasets. It works like PyTorch's DataLoader.

What does it do?

  • Uses multi-threading to load data in parallel while keeping the primary thread free for the training loop
  • Handles batching and collating
  • Is simple to extend for custom datasets
  • Integrates well with other packages in the ecosystem
  • Allows in-place loading to reduce memory allocations
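The batching, collating, and in-place loading ideas above can be illustrated with a minimal sketch in plain Julia. The functions `collate` and `collate!` are hypothetical helpers written for this example, not DataLoaders.jl's API. They assume each observation is a feature vector and stack observations as columns, which matches the `(128, 16)` batch shape in the usage example. The in-place variant writes into a preallocated buffer, so the batch memory is reused instead of reallocated for every batch.

```julia
# Hypothetical helpers illustrating batching/collation; not part of DataLoaders.jl.

# Stack equally sized observations (vectors) into a matrix, one observation
# per column, so the last dimension is the batch dimension.
function collate(observations::Vector{<:AbstractVector{T}}) where {T}
    batch = Matrix{T}(undef, length(first(observations)), length(observations))
    return collate!(batch, observations)
end

# In-place variant: copy observations into a preallocated batch buffer,
# so repeated batches reuse the same memory.
function collate!(batch::AbstractMatrix, observations::AbstractVector{<:AbstractVector})
    size(batch, 2) == length(observations) ||
        throw(DimensionMismatch("buffer holds $(size(batch, 2)) observations, got $(length(observations))"))
    for (j, obs) in enumerate(observations)
        batch[:, j] .= obs   # write observation j into column j
    end
    return batch
end

data = rand(128, 10_000)               # 10,000 observations with 128 features
xs = collate([data[:, i] for i in 1:16])
@assert size(xs) == (128, 16)

buffer = Matrix{Float64}(undef, 128, 16)   # allocated once, reused per batch
for start in 1:16:64
    collate!(buffer, [data[:, i] for i in start:start+15])
    @assert buffer == data[:, start:start+15]
end
```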

When should you use it?

  • You have a dataset that does not fit into memory
  • You want to reduce the time your training loop is waiting for the next batch of data
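A dataset that does not fit into memory is usually wrapped in a container that loads each observation on demand. The `FileDataset` type below is a hypothetical sketch: it assumes one raw `Float64` vector is stored per file and reads a file only when that observation is indexed. It uses Base's `length` and `getindex` purely for illustration; the interface DataLoaders.jl actually expects from custom datasets is described in its documentation and may differ.

```julia
# Hypothetical file-backed container: each observation lives in its own file
# and is read only when requested, so the full dataset never sits in memory.
# Uses Base's length/getindex for illustration, not DataLoaders.jl's interface.
struct FileDataset
    paths::Vector{String}
    nfeatures::Int
end

Base.length(ds::FileDataset) = length(ds.paths)

function Base.getindex(ds::FileDataset, i::Integer)
    x = Vector{Float64}(undef, ds.nfeatures)
    open(ds.paths[i]) do io
        read!(io, x)   # read raw Float64 values from disk into x
    end
    return x
end

# Write a small demo dataset to a temporary directory.
dir = mktempdir()
nfeatures = 128
paths = String[]
for i in 1:20
    path = joinpath(dir, "obs_$i.bin")
    write(path, rand(nfeatures))   # raw binary dump of the vector
    push!(paths, path)
end

ds = FileDataset(paths, nfeatures)
@assert length(ds) == 20
@assert length(ds[1]) == 128
```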

How do you use it?

Install like any other Julia package using the package manager (see setup):

]add DataLoaders

After installation, import it, create a DataLoader from a dataset and batch size, and iterate over it:

using DataLoaders
# 10,000 observations, each with 128 input features and one target feature
data = (rand(128, 10000), rand(1, 10000))
# batch size 16; 10,000 is divisible by 16, so every batch is full
dataloader = DataLoader(data, 16)

for (xs, ys) in dataloader
    # observations are stacked along the last dimension
    @assert size(xs) == (128, 16)
    @assert size(ys) == (1, 16)
end
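While the loop above runs, batches are meant to be prepared in the background so the training loop does not wait for data. The following is a rough single-producer sketch of that idea using only Base: `Channel(...; spawn = true)` starts a task that fills a bounded channel with batches, and the consumer simply iterates over the channel. `prefetch_batches` and its `buffersize` keyword are hypothetical names, and this is not DataLoaders.jl's implementation, which the README describes as loading data in parallel with multiple threads.

```julia
# Hypothetical prefetching sketch; not DataLoaders.jl's implementation.
# A producer task fills a bounded Channel with batches while the consumer
# (the training loop) takes them out. The channel closes automatically
# when the producer function returns.
function prefetch_batches(data::Matrix{T}, batchsize::Integer; buffersize::Integer = 4) where {T}
    n = size(data, 2)
    return Channel{Matrix{T}}(buffersize; spawn = true) do ch
        for start in 1:batchsize:n
            stop = min(start + batchsize - 1, n)   # last batch may be smaller
            put!(ch, data[:, start:stop])          # blocks while the buffer is full
        end
    end
end

xs_all = rand(128, 10_000)
# 10,000 / 16 = 625 full batches
nfull = count(xs -> size(xs) == (128, 16), prefetch_batches(xs_all, 16))
@assert nfull == 625
```

Start Julia with more than one thread (for example `julia --threads 4`) so the producer can run alongside the consumer; with a single thread the tasks still interleave but do not run in parallel. The bounded buffer caps how many prepared batches sit in memory at once.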

Next, you may want to read
