# Simple example notebook that causes the largest OOI.pangeo.oi server to run out of memory when trying to lazily load and concatenate AMPS netcdfs



In [1]:
import xarray as xr
import fsspec
import gcsfs
from tqdm import tqdm
xr.set_options(display_style="html");

### List the netCDF files

In [2]:
fs = gcsfs.GCSFileSystem(project='ldeo-glaciology', mode='ab', cache_timeout = 0)
NCs = fs.ls('gs://ldeo-glaciology/AMPS/wrf_d03_20161222_week-cf')
len(NCs)

55

### Loop through all the netcdf files in the AMPS directory and concatenate each one.
This cell causes the kernel to restart when you try to loop over all the netcdfs (len(NCs) = 55) we currently have in GCS.



In [None]:
## load the first file to initialize the xarray dataset
url = 'gs://' + NCs[0]
with fsspec.open(url, mode='rb') as openfile:
    AMPS = xr.open_dataset(openfile, chunks={})  # chunks={} requests dask-backed (lazy) arrays without setting explicit chunk sizes

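## load the other files, each time concatenating them onto an xarray dataset (AMPS) that grows in the time dimension each iteration
## note: range(1, len(NCs)-1) stops before the last file, so 53 of the remaining 54 files are read
for i in tqdm(range(1, len(NCs)-1)):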
    url = 'gs://' + NCs[i]
    with fsspec.open(url, mode='rb') as openfile:
        temp = xr.open_dataset(openfile, chunks={})  # chunks={} requests dask-backed (lazy) arrays without explicit chunk sizes
    AMPS = xr.concat([AMPS, temp], 'time')

 81%|████████  | 43/53 [03:00<03:56, 23.69s/it]

I would like to know why the cell above fills the memory when my intention was only to load things lazily. Note that when I load a smaller number of files (e.g., 45) instead of the full 55, I get an xarray dataset that seems to be made up of dask arrays, as intended, so I don't understand why it takes up so much memory. Is it something to do with the chunking? Increasing the chunk size with AMPS.chunk doesn't seem to change the memory usage (monitored with top).
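
One way to check the "seems to be made up of dask arrays" observation without touching GCS is a small synthetic test. The sketch below is only an illustration, not a diagnosis of the memory problem: `make_fake_file`, the variable name `T2`, and the dimension names `y` and `x` are hypothetical stand-ins and are not taken from the AMPS files. It collects the datasets in a list, concatenates them once along `time`, and then checks that the result is still dask-backed and how large it would be if it were fully loaded.

```python
import numpy as np
import xarray as xr
import dask.array as da


def make_fake_file(t0, nt=4, ny=8, nx=8):
    """Hypothetical stand-in for one AMPS file: a small dask-backed Dataset."""
    data = da.zeros((nt, ny, nx), chunks=(nt, ny, nx), dtype="float32")
    return xr.Dataset(
        {"T2": (("time", "y", "x"), data)},
        coords={"time": np.arange(t0, t0 + nt)},
    )


# Collect the per-file datasets in a list, then concatenate once,
# rather than growing a dataset inside the loop.
parts = [make_fake_file(t0) for t0 in range(0, 20, 4)]
combined = xr.concat(parts, dim="time")

# True if the variable is still lazy (backed by a dask array, not numpy).
is_lazy = isinstance(combined["T2"].data, da.Array)

# Chunk layout along each dimension: one chunk per original "file" in time.
time_chunks = combined["T2"].chunks[0]

# Size in bytes the variable would occupy if it were fully loaded.
size_if_loaded = combined["T2"].nbytes

print(is_lazy, time_chunks, size_if_loaded)
```

Running the same three checks on the real `AMPS` dataset (on one of its data variables) would show whether the arrays are still lazy after the concatenation and how their nominal size compares with the memory reported by top.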