Wall time much greater than CPU time #516

@wesleybowman

I have a very large data set. It consists of multiple files, and each file is internally compressed. I have it all downloaded locally onto my external hard drive. The data is so large that this is the only place I can currently hold it, and I think this contributes to the problem, but I am not sure it explains the entire effect. It is a USB 3 external drive, which helps, though it is not ideal either way.

When I load my data with
xray.open_mfdataset(filename, chunks={'time':365, 'lat':180, 'lon':360})

and do a simple selection along one of the dimensions, it takes:

%time data.tasmax[:, 360, 720].values
CPU times: user 31.4 s, sys: 4.07 s, total: 35.4 s
Wall time: 8min 38s
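
A wall time far above the CPU time usually means the process spent most of its time waiting rather than computing, for example on disk reads. A minimal sketch of how the two clocks diverge, using only the standard library; `time.sleep` is a stand-in for a blocking read and is an assumption of this illustration, not the actual xray call:

```python
import time

def timed(func):
    """Return (wall_seconds, cpu_seconds) spent running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

def fake_io_wait():
    # Hypothetical stand-in for a blocking disk read:
    # the process sleeps and uses almost no CPU while it waits.
    time.sleep(0.5)

wall, cpu = timed(fake_io_wait)
print(f"wall: {wall:.2f} s, cpu: {cpu:.2f} s")
```

The same pattern around the real `.values` call would show whether the gap comes from waiting on the drive rather than from decompression or computation.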

So I decided that maybe the internal compression was really my limiting factor, and I saved my entire data set using

data.to_netcdf('nc_test.nc')

which resulted in a 268 GB file and took between 12 and 16 hours to run. Now I can load that new dataset using xray.open_dataset instead of open_mfdataset.

So I tried a few different loads to see if chunking helped the issue.

data = xray.open_mfdataset('/mnt/usb/CANESM2/tasmax*rcp45*Can*', chunks={'time':365, 'lat':180, 'lon':360})

new = xray.open_dataset('/mnt/usb/CANESM2/nasa_CANESM2_prediction.nc', chunks={'time':365, 'lat':180, 'lon':360})

new2 = xray.open_dataset('/mnt/usb/CANESM2/nasa_CANESM2_prediction.nc', chunks={'time':365, 'lat':360, 'lon':720})

new3 = xray.open_dataset('/mnt/usb/CANESM2/nasa_CANESM2_prediction.nc', chunks={'time':365, 'lat':720, 'lon':1440})

new4 = xray.open_dataset('/mnt/usb/CANESM2/nasa_CANESM2_prediction.nc', chunks={'time':34675, 'lat':720, 'lon':1440})

new5 = xray.open_mfdataset('/mnt/usb/CANESM2/nasa_CANESM2_prediction.nc', chunks={'time':365, 'lat':180, 'lon':360})

and the resulting times (note that I change the index in each one, to avoid file caching):

%time data.tasmax[:, 360, 720].values
CPU times: user 31.4 s, sys: 4.07 s, total: 35.4 s
Wall time: 8min 38s

%time new.tasmax[:, 360, 720].values
CPU times: user 1.53 s, sys: 2.9 s, total: 4.43 s
Wall time: 5min 35s

%time new2.tasmax[:, 362, 721].values
CPU times: user 817 ms, sys: 2.89 s, total: 3.71 s
Wall time: 4min 7s

%time new3.tasmax[:, 361, 720].values
CPU times: user 987 ms, sys: 3.34 s, total: 4.33 s
Wall time: 4min 17s

%time new4.tasmax[:, 360, 720].values
CPU times: user 713 ms, sys: 2.68 s, total: 3.4 s
Wall time: 6min 5s

%time new5.tasmax[:, 361, 720].values
CPU times: user 1.25 s, sys: 2.79 s, total: 4.04 s
Wall time: 5min 2s
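
To put the chunk sizes above in perspective, the arithmetic for a `[:, i, j]` selection can be sketched as follows. This is a rough estimate under stated assumptions: the array shape (34675, 720, 1440) is inferred from the largest chunk spec, the 8-byte item size is a guess (the post does not state the dtype), and the on-disk layout is assumed to be contiguous and row-major. `point_series_cost` is a hypothetical helper, not part of xray or dask.

```python
import math

# Assumed shape (time, lat, lon), taken from the largest chunk spec above.
SHAPE = (34675, 720, 1440)
ITEMSIZE = 8  # bytes per value; assumes float64, which the post does not state

def point_series_cost(chunks, shape=SHAPE, itemsize=ITEMSIZE):
    """Number of chunks touched by [:, i, j] and the size in bytes of one full chunk."""
    n_time_chunks = math.ceil(shape[0] / chunks[0])
    chunk_shape = tuple(min(c, s) for c, s in zip(chunks, shape))
    chunk_bytes = math.prod(chunk_shape) * itemsize
    return n_time_chunks, chunk_bytes

# Byte distance between consecutive time steps of one grid point
# in a contiguous row-major (time, lat, lon) layout.
stride_bytes = SHAPE[1] * SHAPE[2] * ITEMSIZE

for chunks in [(365, 180, 360), (365, 360, 720), (365, 720, 1440), (34675, 720, 1440)]:
    n, size = point_series_cost(chunks)
    print(chunks, "->", n, "chunks touched,", round(size / 1e6, 1), "MB per full chunk")
print("stride between time steps:", stride_bytes / 1e6, "MB")
```

Under these assumptions the full array would be about 288 billion bytes (roughly 268 GiB), which is consistent with the file size reported above. The selected series itself would be only 34675 values, but they would sit about 8.3 MB apart on disk, so the read would be dominated by seeks rather than by data volume.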

The wall time is always much greater than the CPU time (even with user and sys combined). Any insight? I can provide more info on request.
