# Write ApRES xarrays to zarrs 
This notebook

- loads the individual zarrs created from each .dat file (using to_individual_zarr.ipynb),
- computes the stacked profiles and adds them to the dataset,
- rechunks to a reasonable chunk size in the time dimension, and
- writes the whole thing to a zarr store in the ldeo-glaciology bucket.
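
The stacking and rechunking steps listed above can be illustrated on a small in-memory array. This is a minimal sketch using NumPy only, not the notebook's actual pipeline: the array shape, the axis order (time, range bin, chirp, attenuator setting), and the helper name `stack_chirps` are assumptions made for illustration. The notebook itself does the equivalent with `ds.profile.mean(dim='chirp_num')` on a dask-backed xarray dataset.

```python
import numpy as np

def stack_chirps(profile, chirp_axis=2):
    """Average complex profiles over the chirp axis (hypothetical helper)."""
    return profile.mean(axis=chirp_axis)

# Small synthetic stand-in for the `profile` variable; the real arrays are far larger.
rng = np.random.default_rng(0)
shape = (6, 16, 20, 2)  # assumed order: (time, range bin, chirp, attenuator setting)
profile = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

profile_stacked = stack_chirps(profile)
print(profile_stacked.shape)  # (6, 16, 2)

# Splitting the time axis into fixed-length blocks mirrors `.chunk({'time': 20})`,
# here with a block length of 4 so the last block is shorter.
time_chunk = 4
blocks = [profile_stacked[i:i + time_chunk] for i in range(0, shape[0], time_chunk)]
print([b.shape[0] for b in blocks])  # [4, 2]
```

Averaging removes the chirp axis entirely, which is why the stacked variable is much smaller than the unstacked one.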

Import packages

In [1]:
import numpy as np
from dask.distributed import performance_report
import xarray as xr
import fsspec
import json

with open('../../secrets/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)

In [2]:
from dask.distributed import Client

client = Client("tcp://127.0.0.1:34303")
client

Client
Connection method: Direct
Dashboard: /user/glugeorge/proxy/8787/status

Scheduler
Comm: tcp://127.0.0.1:34303, Workers: 8
Dashboard: /user/glugeorge/proxy/8787/status, Total threads: 32
Started: Just now, Total memory: 117.75 GiB

Workers: 8, each with 4 threads and 14.72 GiB of memory


In [4]:
def zarrs_to_onezarr(site):
    # Lazily open all per-.dat zarrs for this site as a single dask-backed dataset.
    ds = xr.open_mfdataset(f'gs://ldeo-glaciology/apres/greenland/2022/{site}/individual_zarrs_prechunked_3/dat_*',
                               chunks = {},
                               engine = 'zarr',
                               consolidated = False,
                               parallel = True)
    # Replace attenuator and AFGain with their values at index 500 of the first dimension.
    ds['attenuator'] = ds.attenuator[500]
    ds['AFGain'] = ds.AFGain[500]

    # Drop the chunk encoding carried over from the source zarrs before rechunking.
    for var in ds:
        ds[var].encoding.pop('chunks', None)

    # Stack by averaging over chirp_num, then rechunk along time.
    profile_stacked = ds.profile.mean(dim='chirp_num')
    ds_stacked = ds.assign({'profile_stacked':profile_stacked})
    ds_stacked_rechunked = ds_stacked.chunk({'time':20})

    #encoding = {i: {"dtype": "float64"} for i in ds_stacked_rechunked.data_vars}
    # Store time as seconds since 1970-01-01.
    encoding = {
        'time': {
            'units': 'seconds since 1970-01-01'
        }
    }

    filename = f'gs://ldeo-glaciology/apres/greenland/2022/single_zarrs_noencode/{site}'
    mapper = fsspec.get_mapper(filename, mode='w', token=token)
    # Write to the bucket and save a dask performance report as a local HTML file.
    with performance_report(f'ds_stacked_rechunked_{site}.html'):
        ds_stacked_rechunked.to_zarr(mapper, consolidated=True, safe_chunks=False, encoding=encoding)
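
The `encoding` entry for `time` asks xarray to store timestamps as numbers counted in seconds from 1970-01-01. The sketch below shows the arithmetic behind that convention using only the standard library. The helper names `encode_seconds` and `decode_seconds` and the sample timestamp are hypothetical, and this is not xarray's internal implementation.

```python
from datetime import datetime, timedelta, timezone

# Reference instant named in the 'units' string.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def encode_seconds(t):
    """Convert a timezone-aware datetime to seconds since the epoch."""
    return (t - EPOCH).total_seconds()

def decode_seconds(s):
    """Convert seconds since the epoch back to a datetime."""
    return EPOCH + timedelta(seconds=s)

# Arbitrary sample timestamp, chosen only for illustration.
t = datetime(2022, 5, 26, 12, 0, 0, tzinfo=timezone.utc)
s = encode_seconds(t)

# Decoding the stored number recovers the original timestamp.
print(decode_seconds(s) == t)  # True
```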

In [5]:
zarrs_to_onezarr("A101")

In [9]:
zarrs_to_onezarr("A103")

In [10]:
zarrs_to_onezarr("A104")

In [2]:
def reload(site):
    filename = f'gs://ldeo-glaciology/apres/greenland/2022/single_zarrs_noencode/{site}'
    ds = xr.open_dataset(filename,
        engine='zarr', 
        consolidated=True, 
        chunks={}) 
    return ds

In [6]:
A101 = reload("A101")
A103 = reload("A103_fixed")
A104 = reload("A104")


In [None]:
print('it all finished!')

In [18]:
ds = xr.open_mfdataset(f'gs://ldeo-glaciology/apres/greenland/2022/A103/individual_zarrs_prechunked_3/dat_*',
                               chunks = {}, 
                               engine = 'zarr', 
                               consolidated = False, 
                               parallel = True)
ds = ds.isel(time=range(0,500))
#ds['attenuator'] = ds.attenuator[1]
#ds['AFGain'] = ds.AFGain[1]

for var in ds:
    del ds[var].encoding['chunks']

profile_stacked = ds.profile.mean(dim='chirp_num')
ds_stacked = ds.assign({'profile_stacked':profile_stacked})
ds_stacked_rechunked = ds_stacked.chunk({'time':20})

In [20]:
ds_stacked_rechunked

Bytes,Chunk bytes,Shape,Chunk shape,Dask graph,Data type
7.81 kiB,320 B,"(500, 2)","(20, 2)",25 chunks in 388 graph layers,int64 numpy.ndarray
7.81 kiB,320 B,"(500, 2)","(20, 2)",25 chunks in 388 graph layers,float64 numpy.ndarray
3.91 kiB,160 B,"(500,)","(20,)",25 chunks in 265 graph layers,int64 numpy.ndarray
162.11 kiB,6.48 kiB,"(500,)","(20,)",25 chunks in 265 graph layers,
3.91 kiB,160 B,"(500,)","(20,)",25 chunks in 265 graph layers,float64 numpy.ndarray
5.96 GiB,244.15 MiB,"(500, 40001, 20, 2)","(20, 40001, 20, 2)",25 chunks in 265 graph layers,float64 numpy.ndarray
3.91 kiB,160 B,"(500,)","(20,)",25 chunks in 265 graph layers,float64 numpy.ndarray
3.91 kiB,160 B,"(500,)","(20,)",25 chunks in 265 graph layers,float64 numpy.ndarray
1.98 GiB,81.27 MiB,"(500, 6658, 20, 2)","(20, 6658, 20, 2)",25 chunks in 265 graph layers,complex128 numpy.ndarray
3.91 kiB,160 B,"(500,)","(20,)",25 chunks in 265 graph layers,float64 numpy.ndarray
3.91 kiB,160 B,"(500,)","(20,)",25 chunks in 265 graph layers,float64 numpy.ndarray
101.59 MiB,4.06 MiB,"(500, 6658, 2)","(20, 6658, 2)",25 chunks in 267 graph layers,complex128 numpy.ndarray
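
The byte counts reported above follow directly from the shapes and data types. As a rough check, the sketch below recomputes the chunk and array sizes for the two largest arrays. The helper `nbytes` and the `ITEMSIZE` table are illustrative names, and the figures are uncompressed in-memory sizes, not the sizes of the objects written to the zarr store.

```python
from math import ceil, prod

ITEMSIZE = {"float64": 8, "complex128": 16}  # bytes per element

def nbytes(shape, dtype):
    """Uncompressed size in bytes of an array with this shape and dtype."""
    return prod(shape) * ITEMSIZE[dtype]

MiB = 2**20
GiB = 2**30

# Largest float64 array: 500 time steps, chunked 20 at a time.
chunk = nbytes((20, 40001, 20, 2), "float64")
total = nbytes((500, 40001, 20, 2), "float64")
print(f"{chunk / MiB:.2f} MiB per chunk, {total / GiB:.2f} GiB in total")
# 244.15 MiB per chunk, 5.96 GiB in total

# Largest complex128 array, same time chunking.
profile_chunk = nbytes((20, 6658, 20, 2), "complex128")
print(f"{profile_chunk / MiB:.2f} MiB per chunk")  # 81.27 MiB per chunk

# Number of chunks along time for 500 steps in blocks of 20.
print(ceil(500 / 20))  # 25
```

With 8 workers at 14.72 GiB each, a few hundred MiB per chunk leaves room for several chunks per worker in memory at once.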
