I think the main bottleneck in our training speed is the training data loader, probably because we get cache misses when gathering data at random indices. To speed this up, we will likely need to store the pre-shuffled data on disk.
I tried writing some code to do this, but I was running out of memory on the last step:
```python
from itertools import product
from random import shuffle

import numpy as np
import xarray as xr

ds = xr.open_dataset("./data/processed/training.nc")

# construct the indices
x = range(len(ds.x))
y = range(len(ds.y))
z = range(len(ds.z))
time = range(len(ds.time))

indices = list(product(x, y, time))
shuffle(indices)
transposed = list(zip(*indices))

# construct xarray indexers following
# http://xarray.pydata.org/en/stable/indexing.html#more-advanced-indexing
dims = ['x', 'y', 'time']
indexers = {
    dim: xr.DataArray(
        np.array(index),
        dims="sample",
        coords={'sample': np.arange(len(indices))})
    for dim, index in zip(dims, transposed)
}

# This step runs out of memory
shuffled_ds = ds.isel(**indexers)
```
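A possible stopgap before a full pipeline: apply the shuffled indices one slice at a time, writing each slice into a preallocated on-disk array, so only a small batch of samples is ever resident in memory. This is just a sketch with a synthetic numpy array standing in for the dataset; the shapes, the `chunk` size, and the `idx` layout are mine, not from the real code.

```python
import tempfile
from pathlib import Path

import numpy as np

out_path = Path(tempfile.mkdtemp()) / "shuffled.npy"

# Synthetic (time, z, y, x) array standing in for the real dataset.
nt, nz, ny, nx = 4, 5, 6, 7
data = np.arange(nt * nz * ny * nx, dtype=np.float32).reshape(nt, nz, ny, nx)

# Shuffled (x, y, time) index triples, analogous to the snippet above.
rng = np.random.default_rng(0)
idx = np.stack(
    np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nt), indexing="ij"),
    axis=-1,
).reshape(-1, 3)
rng.shuffle(idx)

# Gather one slice of samples at a time into a preallocated on-disk array,
# so only `chunk` z-columns are in memory at once.
n = idx.shape[0]
out = np.lib.format.open_memmap(
    out_path, mode="w+", dtype=data.dtype, shape=(n, nz))
chunk = 50
for start in range(0, n, chunk):
    xs, ys, ts = idx[start:start + chunk].T
    # Advanced indexing with a slice in the middle puts the broadcast
    # sample axis first, giving a (k, nz) block of z-columns.
    out[start:start + chunk] = data[ts, :, ys, xs]
out.flush()
```

This still pays the random-access cost on the read side, but caps peak memory at one chunk of samples.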
To speed this up, we will probably need to do a couple of steps, with on-disk caching for each step:
1. Transpose the data: `(time, z, y, x)` --> `(time, y, x, z)`
2. Reshape: `(time, y, x, z)` --> `(batch, time_and_next, z)`
3. Shuffle along the batch dimension.
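Roughly how I picture those steps, sketched with numpy on a synthetic array and with each step cached to disk. The shapes are made up, and I'm guessing that "time_and_next" means pairs of consecutive time steps; if it means something else, step 2 would change.

```python
import tempfile
from pathlib import Path

import numpy as np

cache = Path(tempfile.mkdtemp())

# Synthetic stand-in for the (time, z, y, x) training array.
nt, nz, ny, nx = 4, 5, 6, 7
rng = np.random.default_rng(0)
data = rng.standard_normal((nt, nz, ny, nx)).astype(np.float32)

# Step 1: transpose (time, z, y, x) -> (time, y, x, z), cache to disk.
np.save(cache / "step1.npy", data.transpose(0, 2, 3, 1))

# Step 2: reshape to (batch, time_and_next, z), cache again.  Reading
# "time_and_next" as pairs of consecutive time steps, each sample is the
# z-column at (t, y, x) stacked with the one at (t + 1, y, x).
step1 = np.load(cache / "step1.npy", mmap_mode="r")
pairs = np.stack([step1[:-1], step1[1:]], axis=-2)  # (nt-1, ny, nx, 2, nz)
np.save(cache / "step2.npy", pairs.reshape(-1, 2, nz))

# Step 3: shuffle along the batch dimension.  After the reshape each
# sample is a contiguous block of rows, so the gather reads sequentially
# instead of scattering across a 4-D array.
step2 = np.load(cache / "step2.npy", mmap_mode="r")
perm = rng.permutation(step2.shape[0])
np.save(cache / "step3.npy", np.asarray(step2)[perm])

shuffled = np.load(cache / "step3.npy")
```

The loader can then read `step3.npy` sequentially, which is the point of doing the shuffle once, up front, on disk.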
cc @sarenehan