# Adding the MEASURES data to the Google bucket
In the ice flow chapter we loaded some widely used Antarctic surface velocity data from the cloud. This notebook demonstrates how to download those data, convert them to a format that makes cloud computing efficient, and upload them to a Google bucket.

To actually download and upload these data you will need your own NSIDC credentials (for the download) and a Google bucket token (for the upload).

## Download
To download the data from NSIDC to your local machine, run the following command. You will need a free NASA Earthdata Login account; more details can be found [here](https://urs.earthdata.nasa.gov/profile). Replace USERNAME and PASSWORD in the command below with your Earthdata Login username and password.

In [8]:
!wget --http-user=USERNAME --http-password=PASSWORD https://n5eil01u.ecs.nsidc.org/MEASURES/NSIDC-0484.002/1996.01.01/antarctica_ice_velocity_450m_v2.nc

--2022-12-12 10:46:24--  https://n5eil01u.ecs.nsidc.org/MEASURES/NSIDC-0484.002/1996.01.01/antarctica_ice_velocity_450m_v2.nc
Resolving n5eil01u.ecs.nsidc.org (n5eil01u.ecs.nsidc.org)... 128.138.97.102
Connecting to n5eil01u.ecs.nsidc.org (n5eil01u.ecs.nsidc.org)|128.138.97.102|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://urs.earthdata.nasa.gov/oauth/authorize?app_type=401&client_id=_JLuwMHxb2xX6NwYTb4dRA&response_type=code&redirect_uri=https%3A%2F%2Fn5eil01u.ecs.nsidc.org%2FOPS%2Fredirect&state=aHR0cHM6Ly9uNWVpbDAxdS5lY3MubnNpZGMub3JnL01FQVNVUkVTL05TSURDLTA0ODQuMDAyLzE5OTYuMDEuMDEvYW50YXJjdGljYV9pY2VfdmVsb2NpdHlfNDUwbV92Mi5uYw [following]
--2022-12-12 10:46:24--  https://urs.earthdata.nasa.gov/oauth/authorize?app_type=401&client_id=_JLuwMHxb2xX6NwYTb4dRA&response_type=code&redirect_uri=https%3A%2F%2Fn5eil01u.ecs.nsidc.org%2FOPS%2Fredirect&state=aHR0cHM6Ly9uNWVpbDAxdS5lY3MubnNpZGMub3JnL01FQVNVUkVTL05TSURDLTA0ODQuMDAyLzE5OTYuMDEuMDEvYW50YXJjdGlj
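
If you would rather not type your credentials into a notebook that may be saved or shared, one option is to read them from environment variables and build the same `wget` command in Python. The sketch below is only an illustration: the variable names `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` and the helper `build_wget_command` are hypothetical, and the command is built but not run.

```python
import os


def build_wget_command(url, user_var="EARTHDATA_USERNAME", pass_var="EARTHDATA_PASSWORD"):
    """Return the wget argument list, reading credentials from the environment.

    The environment variable names are hypothetical; use whatever names you prefer.
    """
    user = os.environ.get(user_var)
    password = os.environ.get(pass_var)
    if not user or not password:
        raise RuntimeError(f"set {user_var} and {pass_var} before downloading")
    # same flags as the wget command used above
    return ["wget", f"--http-user={user}", f"--http-password={password}", url]


URL = ("https://n5eil01u.ecs.nsidc.org/MEASURES/NSIDC-0484.002/"
       "1996.01.01/antarctica_ice_velocity_450m_v2.nc")
```

Calling `subprocess.run(build_wget_command(URL), check=True)` would then start the download with the credentials taken from your environment.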

## Load

Load the data lazily (so that it isn't all loaded into memory at once) using xarray.

In [1]:
import xarray as xr
ds = xr.open_dataset('antarctica_ice_velocity_450m_v2.nc')

Inspect the size of the dataset and take a look at the coordinates, variables and dimensions. 

In [2]:
print(f"the dataset is {ds.nbytes/1e9} GB")

the dataset is 6.814832221 GB


In [3]:
ds
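
As a rough cross-check on the size reported above, the total can be estimated by hand from the array shape and data types. The sketch below assumes the layout suggested by the array summaries printed later in this notebook: a 12445 x 12445 grid holding two float64 arrays, six float32 arrays and one int32 array. These counts are an inference from that output, not something stated by the data provider.

```python
n = 12445  # grid points along x and along y

# (number of 2-D arrays, bytes per value) for each data type;
# the counts are inferred from the array summaries, so treat them as an assumption
arrays = {"float64": (2, 8), "float32": (6, 4), "int32": (1, 4)}

total_bytes = sum(count * itemsize * n * n for count, itemsize in arrays.values())

# report in both decimal gigabytes (GB) and binary gibibytes (GiB)
print(f"2-D arrays: {total_bytes / 1e9:.3f} GB = {total_bytes / 2**30:.3f} GiB")
```

Under these assumptions the 2-D arrays account for almost all of the size reported by `ds.nbytes`; the small remainder would be the one-dimensional coordinates and any scalar variables.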

## Rechunk
Zarr stores are a way of storing multi-dimensional data that is optimized for fast access from distributed cloud computing. Zarr stores use a concept called chunks. Chunks are the smallest units of data that can be downloaded one at a time. It is best to make them smaller than the total size of the dataset, because then you can avoid downloading ~7 GB every time, but making them too small introduces overheads that slow things down. The chunk size that the dataset has by default after loading from a netCDF file (as we did above) may not be ideal, so one needs to inspect the chunk size and 'rechunk' if necessary.

For this dataset, it turns out that splitting each variable into four chunks (two along each dimension) gives about the right chunk size. The following cell does this.

In [4]:
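import numpy as np
nx = ds.x.shape[0]
ny = ds.y.shape[0]
# split each dimension in two; np.ceil returns a float, so cast to int for use as a chunk size
ds_rechunked = ds.chunk({'y': int(np.ceil(ny / 2)), 'x': int(np.ceil(nx / 2))})
ds_rechunked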

Summary of the arrays in `ds_rechunked` (each has 2 graph layers and is backed by `numpy.ndarray`):

| Type | Number of arrays | Shape | Chunk shape | Array bytes | Chunk bytes | Chunks per array |
|---|---|---|---|---|---|---|
| float64 | 2 | (12445, 12445) | (6223, 6223) | 1.15 GiB | 295.45 MiB | 4 |
| float32 | 6 | (12445, 12445) | (6223, 6223) | 590.81 MiB | 147.73 MiB | 4 |
| int32 | 1 | (12445, 12445) | (6223, 6223) | 590.81 MiB | 147.73 MiB | 4 |
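
The chunk sizes reported above can be reproduced with a little arithmetic, which is a handy way to try out other chunking schemes before rechunking a large dataset. The helper `chunk_summary` below is a hypothetical name used only for this sketch; it assumes each dimension is split evenly, with the chunk length rounded up as in the cell above.

```python
import math


def chunk_summary(shape, splits, itemsize):
    """Chunk shape, full-chunk size in MiB, and chunks per array for an even split."""
    chunk_shape = tuple(math.ceil(n / s) for n, s in zip(shape, splits))
    chunk_mib = math.prod(chunk_shape) * itemsize / 2**20
    n_chunks = math.prod(math.ceil(n / c) for n, c in zip(shape, chunk_shape))
    return chunk_shape, chunk_mib, n_chunks


shape = (12445, 12445)
for name, itemsize in [("float64", 8), ("float32", 4)]:
    chunk_shape, chunk_mib, n_chunks = chunk_summary(shape, (2, 2), itemsize)
    print(f"{name}: chunks of {chunk_shape}, {chunk_mib:.2f} MiB each, {n_chunks} per array")
```

Changing `splits` to, say, `(4, 4)` shows how quickly the chunk size falls as the number of chunks grows.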


## Write to bucket
We will write both the default (small-chunked) dataset and the rechunked dataset to the Google bucket, for use elsewhere in the book.

To write to the Google bucket we need an authentication token, which is private. To do this yourself you will need your own Google bucket and a token specific to that bucket.

In [5]:
import zarr
import json
import gcsfs
import xarray as xr 

The cell below uses the token to create two 'file-like objects', `mapper` and `mapper_rechunked`, one for each zarr store. These can then be passed to the xarray method `to_zarr` to write each dataset to its store.

In [6]:
with open('/Users/jkingslake/Documents/science/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)
gcs = gcsfs.GCSFileSystem(token=token)
mapper = gcs.get_mapper('gs://ldeo-glaciology/measures/measures') 
mapper_rechunked = gcs.get_mapper('gs://ldeo-glaciology/measures/measures_rechunked') 

In [7]:
ds.to_zarr(mapper)
ds_rechunked.to_zarr(mapper_rechunked)

<xarray.backends.zarr.ZarrStore at 0x145e98270>

## Reload
To check that the data were uploaded correctly, reload both datasets using the syntax that will be used on the main pages that make use of these data.

In [8]:
import fsspec
mapper_reload = fsspec.get_mapper('gs://ldeo-glaciology/measures/measures')
ds_reloaded = xr.open_zarr(mapper_reload) 
ds_reloaded

Summary of the arrays in `ds_reloaded` (each has 2 graph layers and is backed by `numpy.ndarray`):

| Type | Number of arrays | Shape | Chunk shape | Array bytes | Chunk bytes | Chunks per array |
|---|---|---|---|---|---|---|
| float64 | 2 | (12445, 12445) | (389, 778) | 1.15 GiB | 2.31 MiB | 512 |
| int32 | 1 | (12445, 12445) | (778, 778) | 590.81 MiB | 2.31 MiB | 256 |
| float32 | 6 | (12445, 12445) | (778, 778) | 590.81 MiB | 2.31 MiB | 256 |


In [9]:
mapper_reload = fsspec.get_mapper('gs://ldeo-glaciology/measures/measures_rechunked')
ds_rechunked_reloaded = xr.open_zarr(mapper_reload) 
ds_rechunked_reloaded

Summary of the arrays in `ds_rechunked_reloaded` (each has 2 graph layers and is backed by `numpy.ndarray`):

| Type | Number of arrays | Shape | Chunk shape | Array bytes | Chunk bytes | Chunks per array |
|---|---|---|---|---|---|---|
| float64 | 2 | (12445, 12445) | (6223, 6223) | 1.15 GiB | 295.45 MiB | 4 |
| int32 | 1 | (12445, 12445) | (6223, 6223) | 590.81 MiB | 147.73 MiB | 4 |
| float32 | 6 | (12445, 12445) | (6223, 6223) | 590.81 MiB | 147.73 MiB | 4 |
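
To get a feel for the trade-off between the two stores, it can help to estimate how much data would have to be fetched to read a small region from each. The sketch below is a back-of-the-envelope calculation, not a benchmark: the 1000 x 1000 window is a hypothetical region of interest, `chunks_touched` is a helper written just for this example, and the sizes are uncompressed in-memory sizes for a float32 variable, so actual transfers from the bucket may well be smaller.

```python
def chunks_touched(window, chunk_shape):
    """Number of chunks overlapped by a window given as ((row0, row1), (col0, col1)), end-exclusive."""
    count = 1
    for (start, stop), size in zip(window, chunk_shape):
        # index of the last chunk touched minus index of the first, plus one
        count *= (stop - 1) // size - start // size + 1
    return count


window = ((5000, 6000), (5000, 6000))  # hypothetical 1000 x 1000 region of interest
itemsize = 4  # bytes per float32 value

for label, chunk_shape in [("default", (778, 778)), ("rechunked", (6223, 6223))]:
    n = chunks_touched(window, chunk_shape)
    mib = n * chunk_shape[0] * chunk_shape[1] * itemsize / 2**20
    print(f"{label}: {n} chunk(s), about {mib:.1f} MiB uncompressed")
```

Under these assumptions the small default chunks need several requests but far less data for a small region, whereas the large chunks need a single request that brings a quarter of the array with it. Which store is preferable depends on whether you usually work with small regions or with the whole continent.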
