# COG vs Zarr

COG vs zarr is a big point of contention in the community right now. Let's look at raster data and how it is stored in the Cloud-Optimized GeoTIFF (COG) format, and how that compares to what zarr does with the same data.

Before we get into this we need to get some imports out of the way.

In [1]:
import json
import shutil
import struct
import urllib.request
import zlib

from hashlib import sha256
from pathlib import Path

import numpy as np
import rioxarray
import xarray

from crc32c import crc32c
from numcodecs.zarr3 import Zlib, Delta
from numcodecs.zstd import Zstd
from rio_cogeo.cogeo import cog_translate
from rio_cogeo.profiles import cog_profiles
from tifffile import TiffFile

from IPython.display import JSON

We will also want to compare some byte strings. An easy way to identify a sequence of bytes is to use a SHA-256 hash, so let's make a little function that generates the hash in hex format and prints it along with the size of the byte string.

In [2]:
def describe_bytes(b: bytes) -> None:
    # print the size in MiB and the SHA-256 hex digest of a byte string
    print(f"size: {len(b) / 2**20:.3f} MiB | shasum: {sha256(b).hexdigest()}")

To organize our data we'll use a specific directory:

In [3]:
OUTDIR = Path('test_data')
OUTDIR.mkdir(exist_ok=True)

## Downloading a COG

We're going to use a Sentinel 2 L2A scene for this exploration. To keep things simple we'll just use one band, and we can link directly to a COG. Specifically, we'll use the red band of the scene `S2B_T10TFR_20231223T190950_L2A`. Let's download that file now.

<div class="alert alert-block alert-info">
    
NOTE: This is a simplified workflow. In practice you might get this href by searching for the scene of interest using the [earth-search STAC API](https://earth-search.aws.element84.com/v1).
    
</div>

In [4]:
COG_HREF = 'https://e84-earth-search-sentinel-data.s3.us-west-2.amazonaws.com/sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif'
COG_FILE = OUTDIR / 'red.tif'

# check if we are rerunning this cell to not download the COG if we already have it
if not COG_FILE.exists():
    with urllib.request.urlopen(urllib.request.Request(COG_HREF)) as response:
        COG_FILE.write_bytes(response.read())

## Reading the COG metadata

Now that we have the COG, we can open it with `tifffile` and read the tags out to see what kind of metadata the file has. Note that the TIFF file has multiple "pages" in it. This is because the COG includes overviews, each of which is stored internally as what the TIFF specification terms an individual image. `tifffile` uses the term "pages" instead of images; the first page here is the full resolution data, and each successive page is the next overview up from the last.

We'll store the tags in a dictionary so we can use the values later as needed.

In [5]:
cog_tags = {}
with TiffFile(COG_FILE) as tif:
    for tag in tif.pages[0].tags:
        cog_tags[tag.name] = tag.value

JSON(cog_tags, root="Default COG")

<IPython.core.display.JSON object>

## Extracting a tile

The tags contain a bunch of great stuff we could discuss at length, but the first thing worth pointing out is that the tiles in this COG are stored at various offsets within the file. The `TileOffsets` tag contains a sequence of those offsets. Similarly, `TileByteCounts` contains the length in bytes of each of those tiles. Thus, we can read tile (0,0) by grabbing the first value in each of those arrays and using them to seek to and read the tile's bytes. We can then hash those bytes to get a unique identifier for them that we can print out.

In [6]:
with (COG_FILE).open('rb') as tif:
    tif.seek(cog_tags['TileOffsets'][0])
    cog_tile_bytes = tif.read(cog_tags['TileByteCounts'][0])

describe_bytes(cog_tile_bytes)

size: 1.381 MiB | shasum: 2c02e7e60074d6767ccb4c44de2da249d331fd82e107431e41cfe4069bae0d62


We can see the size of that tile on disk by looking at the number of bytes.
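
Tile (0,0) is the first entry because TIFF lists tiles in row-major order: left to right, then top to bottom. A minimal sketch of that mapping, assuming a single-band image like ours; `tile_index` is a hypothetical helper, not part of `tifffile`, and the 10980-pixel width and 1024-pixel tiles are this scene's dimensions:

```python
import math


def tile_index(row: int, col: int, image_width: int, tile_width: int) -> int:
    """Position of tile (row, col) in TileOffsets/TileByteCounts (row-major order)."""
    tiles_across = math.ceil(image_width / tile_width)
    return row * tiles_across + col


# This scene: 10980 pixels wide with 1024-pixel tiles gives 11 tiles per row
print(tile_index(0, 0, 10980, 1024))  # 0
print(tile_index(1, 0, 10980, 1024))  # 11
```

The same index could be used with `cog_tags['TileOffsets']` and `cog_tags['TileByteCounts']` to read any other tile.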

## Creating a zarr from the COG

The library `rioxarray` contains a number of conveniences for working with geospatial raster data with `xarray`, and `xarray` can write zarrs. So if we use `rioxarray` to open our COG as an `xarray.Dataset`, we can transform our COG into a zarr. First things first though, let's start by simply opening the file as an `xarray.Dataset`.

In [7]:
ds = rioxarray.open_rasterio(COG_FILE, band_as_variable=True).rename_vars(band_1="red")
ds

Notice that we have the same dimensions as we saw in the COG metadata! That's a good sign. Now let's try writing out our data as zarr v3 using the default settings. We can open it back up to see it is in fact a zarr, too.

In [8]:
ZARR_DEFAULTS = OUTDIR / 'defaults.zarr'

# just in case we've run this before
if ZARR_DEFAULTS.exists():
    shutil.rmtree(ZARR_DEFAULTS)

ds.to_zarr(
    ZARR_DEFAULTS,
    zarr_format=3,
)
xarray.open_dataset(ZARR_DEFAULTS, engine='zarr')



### Reading the zarr metadata

The metadata for zarr is a bit of a contentious topic due to consolidated metadata vs not--we're going to sidestep an opinion in that discussion for the moment and just say that our relevant metadata in this v3 store is stored in the file `red/zarr.json`. Let's open that up and dump its contents to compare with our TIFF tag metadata.

In [9]:
JSON(filename=(ZARR_DEFAULTS / 'red' / 'zarr.json'), root="Default Zarr (zstd)", expanded=True)

<IPython.core.display.JSON object>

Wow, for the most part that looks kinda similar. It is missing spatial reference information directly, which is an important note. But probably the first thing that looks like a problem is the chunk size.

Zarr stores data very similarly to COG in that it splits the data up into smaller pieces. Zarr terms these "chunks", as opposed to the COG nomenclature of "tiles", but they are functionally equivalent. Here we see that the default chunk size used is 687 x 1373, not 1024 x 1024. To ensure we can effectively compare the data in a zarr chunk to one of our COG tiles we should make these match in size.
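
To make the mismatch concrete, here is a small sketch of how many chunks each chunk shape implies for our 10980 x 10980 array. `chunk_grid_shape` is a hypothetical helper written for this illustration, not a zarr or xarray function:

```python
import math


def chunk_grid_shape(array_shape, chunk_shape):
    """Number of chunks needed along each dimension (the last one may be partial)."""
    return tuple(math.ceil(n / c) for n, c in zip(array_shape, chunk_shape))


print(chunk_grid_shape((10980, 10980), (687, 1373)))   # (16, 8)
print(chunk_grid_shape((10980, 10980), (1024, 1024)))  # (11, 11)
```

Only the second grid lines up one-to-one with the COG's tiles.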

## Writing zarr with a specific chunk size

Turns out we can use the `encoding` argument of `to_zarr` to specify the chunk size we want to use. We have to do that on a per-data-variable-basis, so we have to nest our encoding settings within a dictionary keyed off the name of our data variable, which we previously specified was `red`.

In [10]:
ZARR_TILED = OUTDIR / 'tiled.zarr'

# just in case we've run this before
if ZARR_TILED.exists():
    shutil.rmtree(ZARR_TILED)

ds.to_zarr(
    ZARR_TILED,
    zarr_format=3,
    encoding={
        'red': {
            "chunks": (1024, 1024),
        },
    },
)



<xarray.backends.zarr.ZarrStore at 0x78d98c997c70>

The metadata is in the same file within this new zarr store, so we can read it out the same as before to see if that looks any better.

In [11]:
JSON(filename=(ZARR_TILED / 'red' / 'zarr.json'), root="Tiled Zarr", expanded=True)

<IPython.core.display.JSON object>

Indeed it does! Now that we have the same tile/chunk sizing between our COG and our zarr, how do the bytes compare between the two?

### Reading a zarr chunk

Unlike COG, where everything is stored within a single file, by default zarr uses a separate file for metadata, as we have seen, as well as a separate file per chunk. Navigate through the zarr directory tree, and notice that we have a directory tree like `red/c/[0..10]/[0..10]`, as we have 11 chunks in each spatial dimension x and y. This breaks down such that the data for our tile (0,0) is located at the path `red/c/0/0` within the store. We can open that up, read its bytes, and find its length and hash to compare to our COG tile (0,0).

In [12]:
zarr_tile_bytes = (ZARR_TILED / 'red' / 'c' / '0' / '0').read_bytes()

describe_bytes(zarr_tile_bytes)
describe_bytes(cog_tile_bytes)

size: 1.387 MiB | shasum: 6bad8a3594bbdf9300c7f823a5969ece06d6f596d9139908c2f01de51e564af8
size: 1.381 MiB | shasum: 2c02e7e60074d6767ccb4c44de2da249d331fd82e107431e41cfe4069bae0d62


Hmm, those are close lengths, but the bytes are not the same. Should they be?

## Compression codecs

Turns out, maybe! But to understand better why that could maybe be true we need to look more at the COG and zarr metadata to see what other differences we can spot beyond the original tile/chunk size difference.

Notice in the TIFF tags we see `Compression` has the value 8. Similarly, we see in the zarr metadata under the `codecs` that `zstd` is specified. Both of these bits of metadata indicate how the tiles/chunks are compressed. In the case of the COG, interpreting the value of the `Compression` tag requires an external lookup table. [Wikipedia has a great lookup table](https://en.wikipedia.org/wiki/TIFF#TIFF_Compression_Tag) we can use to see what this value of 8 indicates. Turns out it means the tiles are each individually compressed using the `DEFLATE` algorithm.

On the zarr side, the metadata is more verbose and is self documenting: `zstd` is a different compression algorithm. So we see that using the zarr defaults we did not end up using the same compression that our COG uses. If we correct that mismatch will our tile bytes be the same?
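
As a rough illustration of that lookup, here is a hand-written and deliberately incomplete subset of `Compression` tag values. The dictionary and the `compression_name` helper are assumptions made for this sketch; they are not taken from `tifffile` or any other library:

```python
# Hand-written subset of TIFF Compression tag values; not exhaustive
TIFF_COMPRESSION = {
    1: "none",
    5: "LZW",
    7: "JPEG",
    8: "Deflate (Adobe-style)",
    32773: "PackBits",
    50000: "Zstandard (as registered by libtiff)",
}


def compression_name(tag_value: int) -> str:
    """Translate a Compression tag value into a human-readable name."""
    return TIFF_COMPRESSION.get(tag_value, f"unknown ({tag_value})")


print(compression_name(8))  # Deflate (Adobe-style)
print(compression_name(1))  # none
```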

### Predictors

Before we waste too much time trying out `DEFLATE` on our zarr just to find another difference, let me shortcut this process by pointing out one more difference. Our COG metadata indicates a `Predictor` of 2. What is this?

Predictors are filters that can be run on the data prior to compression to increase the compressibility of the data. The TIFF specification has support for a few different predictors, but the main three values are as follows:

* 1: no predictor used prior to compression
* 2: delta predictor; this calculates the horizontal difference between cells in each tile row, only useful for integer data
* 3: [floating point byte reordering](http://chriscox.org/TIFFTN3d1.pdf), only useful for floating point data
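
To get a feel for predictor 2, here is a minimal pure-Python sketch of horizontal differencing on one row of 16-bit values. The helper names and sample values are made up for illustration, and real implementations work on whole tiles rather than Python lists:

```python
MOD = 2**16  # uint16 values wrap around modulo 65536


def delta_encode(row):
    """Keep the first value, then store each value minus its left neighbour."""
    return [row[0]] + [(b - a) % MOD for a, b in zip(row, row[1:])]


def delta_decode(encoded):
    """Reverse the predictor with a running (cumulative) sum."""
    out, total = [], 0
    for value in encoded:
        total = (total + value) % MOD
        out.append(total)
    return out


row = [1000, 1003, 1001, 1001, 1004]
encoded = delta_encode(row)
print(encoded)  # [1000, 3, 65534, 0, 3]
assert delta_decode(encoded) == row
```

Neighbouring pixels tend to be similar, so the encoded row is dominated by values near zero (or near 65535 for small negative steps), which typically compresses better than the raw values.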

Zarr also supports predictors, but does so via the more flexible paradigm of "filters". Filters can be used for any data transformation that needs to happen prior to compressing data and be reversed after decompressing it. Predictors are just one type of filter; another example is scaling, offsetting, and casting values to transform floats and/or signed values into smaller, less complicated unsigned integers. In fact, our COG has scaled/offset values, but the TIFF specification provides no standard way to apply those transformations.
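
As a sketch of what such a scale/offset transformation looks like, here is the arithmetic using the `scale_factor` and `add_offset` values that appear in this dataset's attributes. The `decode`/`encode` helpers and the sample raw value are illustrative assumptions, not code from xarray or zarr:

```python
SCALE_FACTOR = 0.0001  # from the dataset attributes
ADD_OFFSET = -0.1      # from the dataset attributes


def decode(raw: int) -> float:
    """Stored unsigned integer -> physical value."""
    return raw * SCALE_FACTOR + ADD_OFFSET


def encode(value: float) -> int:
    """Physical value -> stored unsigned integer."""
    return round((value - ADD_OFFSET) / SCALE_FACTOR)


print(round(decode(1246), 4))  # 0.0246
print(encode(0.0246))          # 1246
```

This is why values read back through `xarray` show up as small floats even though the bytes on disk are `uint16`.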

So putting all this together, we need to change our zarr configuration to use `DEFLATE` compression and a predictor to see if we can match our COG data. Let's try it out!

In [13]:
ZARR_DEFLATE = OUTDIR / 'deflate.zarr'

# just in case we've run this before
if ZARR_DEFLATE.exists():
    shutil.rmtree(ZARR_DEFLATE)

ds.to_zarr(
    ZARR_DEFLATE,
    zarr_format=3,
    encoding={
        'red': {
            'filters': [Delta(dtype='uint16', astype='uint16')],
            # We just happen to know the compression level on Earth Search is max
            'compressors': [Zlib(level=9)],
            'chunks': (1024, 1024),
        },
    },
    safe_chunks=False,
)

  super().__init__(**codec_config)


<xarray.backends.zarr.ZarrStore at 0x78d97ff6edd0>

Cool! Let's see how our new chunk bytes compare to our COG.

In [14]:
zarr_deflate_tile_bytes = (ZARR_DEFLATE / 'red' / 'c' / '0' / '0').read_bytes()

describe_bytes(zarr_deflate_tile_bytes)
describe_bytes(cog_tile_bytes)

size: 1.381 MiB | shasum: 90426e3d6b12faa3813fad720b7537ab329873d97893c019a8b6f6cc78ceae14
size: 1.381 MiB | shasum: 2c02e7e60074d6767ccb4c44de2da249d331fd82e107431e41cfe4069bae0d62


Hmm, well, the bytes still don't match. Again, should they???

It turns out they should. Except we're still running into a difference, one that is not captured in the metadata of either file. The COG files served by Earth Search are written using a GDAL binary built against [libdeflate](https://github.com/ebiggers/libdeflate), a more modern and optimized `DEFLATE` implementation than the zlib library used by the Python standard library, which is what the zarr `Zlib` codec class wraps here. As different implementations are responsible for writing the compressed data in each case, we're inevitably going to have differences in our output.
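
We can't easily reproduce libdeflate's output here, but a rough stand-in shows the same effect: even one library can emit different valid `DEFLATE` streams for identical input. This sketch uses only the standard library `zlib` at two compression levels on made-up sample data, so it illustrates the principle rather than the actual zlib/libdeflate difference:

```python
import zlib

data = bytes(range(256)) * 4096  # 1 MiB of sample data, not real imagery

fast = zlib.compress(data, 1)
best = zlib.compress(data, 9)

# The compressed byte strings differ...
print(fast == best)  # False
# ...yet both decompress to exactly the same input
print(zlib.decompress(fast) == zlib.decompress(best) == data)  # True
```

The compressed bytes are an implementation detail; the decompressed data is what the format actually guarantees.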

The incessant differences between the two software stacks are disappointing when I am trying to show you, dear reader, that COG and zarr data bytes should be identical, assuming equal compression, tile size, etc. But this is still a good lesson: even when the software stacks seem like they are doing the same thing, it's always possible for variabilities at a lower layer to lead to unanticipated differences in performance or outputs. Note that we also tried going the other way, recompressing the full COG with zstd, and we ran into the same small differences in file bytes.

Remember this lesson when hearing others talk about the performance of COG vs zarr. Drawing accurate and universal conclusions from such a comparison is fraught due to the many subtle differences between software stacks. Really, any difference in performance between file formats with the same data encoding can only be considered a test of the software stacks and their differences. Any performance benefit in one direction or the other could be resolved in the next software release.

### Are we sure the bytes really should be the same?

Yes, and we can try to prove it. If we extract the COG bytes into a numpy array and do the same with the zarr bytes, we should be able to compare the two for equality and see they are the same.

In [15]:
# extract the COG bytes
cog_tile_bytes_extracted = zlib.decompress(cog_tile_bytes, 0)

# make an array
cog_tile_array = np.frombuffer(cog_tile_bytes_extracted, dtype=np.uint16).reshape(cog_tags['TileLength'], cog_tags['TileWidth'])

# also need to reverse predictor
cog_tile_array = np.cumsum(cog_tile_array, axis=1, dtype=np.uint16)

# extract our original zstd zarr bytes and make an array
zstd = Zstd()
zarr_bytes_extracted = zstd.decode(zarr_tile_bytes)
zarr_chunk_array = np.frombuffer(zarr_bytes_extracted, dtype=np.uint16).reshape(cog_tags['TileLength'], cog_tags['TileWidth'])

# compare
print(f'It is {bool((cog_tile_array == zarr_chunk_array).all())} that the arrays are equal')

It is True that the arrays are equal


Maybe you still don't believe me that these bytes can be stored the same in each format. To help make the point a different way, we can compress the COG tile bytes using the same zstd codec that zarr uses and compare the compressed bytes.

In [16]:
cog_bytes_zstd = zstd.encode(cog_tile_array.tobytes())

describe_bytes(cog_bytes_zstd)
describe_bytes(zarr_tile_bytes)

size: 1.387 MiB | shasum: 6bad8a3594bbdf9300c7f823a5969ece06d6f596d9139908c2f01de51e564af8
size: 1.387 MiB | shasum: 6bad8a3594bbdf9300c7f823a5969ece06d6f596d9139908c2f01de51e564af8


## Okay, for reals, let's make the COG and zarr match

The problems we've seen so far trying to demonstrate that the data is the same between COG and zarr have come down to differences in compression. So what happens if we remove compression? Let's try rewriting our COG without compression, and writing _another_ zarr from the original COG without compression. If the hypothesis is right and the data is the same, then this test should prove it.

### Rewriting our COG

We can leverage the GDAL/rasterio wrapper `rio-cogeo` to easily rewrite our COG file. Let's do that, then use `tifffile` again to grab its metadata.

In [17]:
COG_RAW = OUTDIR / 'red_raw.tif'

profile = cog_profiles.get('raw')
profile.update({
    'blockxsize': 1024,
    'blockysize': 1024,
})

cog_translate(COG_FILE, COG_RAW, profile)

raw_tags = {}
with TiffFile(COG_RAW) as tif:
    for tag in tif.pages[0].tags:
        tag_name, tag_value = tag.name, tag.value
        raw_tags[tag_name] = tag_value
JSON(raw_tags, root="Raw COG")

Reading input: test_data/red.tif

Adding overviews...
Updating dataset tags...
Writing output to: test_data/red_raw.tif


<IPython.core.display.JSON object>

We now see the `Compression` tag has a value of 1, which indicates no compression. We also don't see the `Predictor` anymore; as `Predictor` is only relevant with compression, it is omitted from the tags now.

### Writing an uncompressed zarr

We'll continue to use our dataset `ds`, opened from the original COG file, to write out a zarr store. To have the data be uncompressed we can specify `compressors` like we did before, except this time we'll simply leave the list empty.

In [18]:
ZARR_RAW = OUTDIR / 'raw.zarr'

# just in case we've run this before
if ZARR_RAW.exists():
    shutil.rmtree(ZARR_RAW)

ds.to_zarr(
    ZARR_RAW,
    zarr_format=3,
    encoding={
        'red': {
            "chunks": (1024, 1024),
            "compressors": [],
        },
    },
)



<xarray.backends.zarr.ZarrStore at 0x78d97ff6c670>

Once more, let's check the metadata to confirm we don't have any compression in use: 

In [19]:
JSON(filename=(ZARR_RAW / 'red' / 'zarr.json'), root="Raw Zarr", expanded=True)

<IPython.core.display.JSON object>

Indeed, we see under `codecs` only one listed, `"bytes"`. No compression!

### Comparing our data bytes

We already know how to grab a tile from a COG, and a chunk from a zarr. Let's get the first COG tile and zarr chunk from our uncompressed versions of each and compare them.

In [20]:
with (COG_RAW).open('rb') as tif:
    tif.seek(raw_tags['TileOffsets'][0])
    cog_raw_tile_bytes = tif.read(raw_tags['TileByteCounts'][0])

zarr_raw_tile_bytes = (ZARR_RAW / 'red' / 'c' / '0' / '0').read_bytes()

describe_bytes(zarr_raw_tile_bytes)
describe_bytes(cog_raw_tile_bytes)

size: 2.000 MiB | shasum: 793e95ed73d0ac5170a5960a6e25e49045353c3d2cd59dd8781a820cc9b23cd0
size: 2.000 MiB | shasum: 793e95ed73d0ac5170a5960a6e25e49045353c3d2cd59dd8781a820cc9b23cd0


Look at that! The bytes match. This proves that the bytes representing the data stored in a zarr chunk are _exactly the same_ as the bytes in the portion of a COG file storing the same data, when controlling for variables like compression and other filters.

## What does this mean?

Wait, if this is the same data...

Zarr v3 introduced a concept called "shards". The idea with shards is to take some number of an array's chunks and write them into a single file on disk. Why? Shards have some distinct advantages at scale, such as reducing the number of files that need to be managed within the store and allowing more efficient reads of consecutive chunks.

But if shards are just chunks put into the same file, what is the difference between a COG and a shard, aside from the COG storing its metadata in the file as well? Turns out, maybe nothing.

### We can build a COG that is a zarr that is a COG!

We're going to need to do some hacking. First, let's build a metadata config for a sharded zarr array for our image data, where all the chunks are stored in a single shard.

In [21]:
DEFLATE_SHARDED_ZARR_CONF = {
    "shape": [
        10980,
        10980
    ],
    "data_type": "uint16",
    "chunk_grid": {
        "name": "regular",
        "configuration": {
            "chunk_shape": [
                11264,
                11264
            ]
        }
    },
    "chunk_key_encoding": {
        "name": "default",
        "configuration": {
            "separator": "/"
        }
    },
    "fill_value": 0,
    "codecs": [
        {
            "name": "sharding_indexed",
            "configuration": {
                "chunk_shape": [
                    1024,
                    1024
                ],
                "codecs": [
                    {
                        "name": "numcodecs.delta",
                        "configuration": {
                            "dtype": "uint16",
                            "astype": "uint16"
                        }
                    },
                    {
                        "name": "bytes",
                        "configuration": {
                            "endian": "little"
                        }
                    },
                    {
                        "name": "numcodecs.zlib",
                        "configuration": {
                            "level": 9
                        }
                    }
                ],
                "index_codecs": [
                    {
                        "name": "bytes",
                        "configuration": {
                            "endian": "little"
                        }
                    },
                    {
                        "name": "crc32c"
                    }
                ],
                "index_location": "end"
            }
        }
    ],
    "attributes": {
        "OVR_RESAMPLING_ALG": "AVERAGE",
        "AREA_OR_POINT": "Area",
        "STATISTICS_MAXIMUM": 17408,
        "STATISTICS_MEAN": 1505.1947339533,
        "STATISTICS_MINIMUM": 294,
        "STATISTICS_STDDEV": 659.24503616433,
        "STATISTICS_VALID_PERCENT": 99.999,
        "scale_factor": 0.0001,
        "add_offset": -0.1,
        "_FillValue": 0
    },
    "dimension_names": [
        "y",
        "x"
    ],
    "zarr_format": 3,
    "node_type": "array",
    "storage_transformers": []
}

RAW_SHARDED_ZARR_CONF = {
    "shape": [
        10980,
        10980
    ],
    "data_type": "uint16",
    "chunk_grid": {
        "name": "regular",
        "configuration": {
            "chunk_shape": [
                11264,
                11264
            ]
        }
    },
    "chunk_key_encoding": {
        "name": "default",
        "configuration": {
            "separator": "/"
        }
    },
    "fill_value": 0,
    "codecs": [
        {
            "name": "sharding_indexed",
            "configuration": {
                "chunk_shape": [
                    1024,
                    1024
                ],
                "codecs": [
                    {
                        "name": "bytes",
                        "configuration": {
                            "endian": "little"
                        }
                    },
                ],
                "index_codecs": [
                    {
                        "name": "bytes",
                        "configuration": {
                            "endian": "little"
                        }
                    },
                    {
                        "name": "crc32c"
                    }
                ],
                "index_location": "end"
            }
        }
    ],
    "attributes": {
        "OVR_RESAMPLING_ALG": "AVERAGE",
        "AREA_OR_POINT": "Area",
        "STATISTICS_MAXIMUM": 17408,
        "STATISTICS_MEAN": 1505.1947339533,
        "STATISTICS_MINIMUM": 294,
        "STATISTICS_STDDEV": 659.24503616433,
        "STATISTICS_VALID_PERCENT": 99.999,
        "scale_factor": 0.0001,
        "add_offset": -0.1,
        "_FillValue": 0
    },
    "dimension_names": [
        "y",
        "x"
    ],
    "zarr_format": 3,
    "node_type": "array",
    "storage_transformers": []
}

Now, we're going to do some Frankenstein operations, starting by copying our uncompressed (raw) zarr to a new directory tree, so creatively called `zog`. Then we are going to remove all our `red` data chunks, and instead copy in our COG file as the shard at the path of `red/c/0/0`. With all this in place we'll overwrite our `red` metadata, both in its `zarr.json` and in the consolidated metadata in the top-level `zarr.json`.

After all that we're still not done. See, zarr expects to find a shard index within the shard. Looking at the metadata we constructed above, we see the `index_location` is at the end of the shard file. So we need to construct that index and write it to the end of the file.

The format of that index is rather interesting. It is itself an array, with the dimensions of our shard's chunk grid plus an extra dimension on the end. That is, our shard holds an (11, 11) grid of chunks, so our index must have shape (11, 11, 2). This is because we need two values per chunk in the shard: the byte offset of the start of the chunk and its length in bytes. That is just what we already have in our COG tags, only paired up and stored in a multidimensional array.

We can cast that array to bytes so we can append it to the end of our COG, with each offset and length stored as a little-endian 64-bit unsigned integer. Except we have to do one more thing. Notice in the `index_codecs` in the metadata above we have a codec listed as `crc32c`. This is actually optional, and we could have omitted it here, but it is best practice to include it, so for completeness we will. This is a checksum of the index, and helps us verify its integrity. We can calculate it on the bytes from our array; the output is a 32-bit unsigned int, so we can use `struct.pack` to take the Python int value and encode it as a 32-bit unsigned int in bytes, and append that to our index bytes.
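
The index layout described above can be sketched on a toy shard using only the standard library. Everything here is illustrative: the helper names are hypothetical, the 8-byte "header" merely stands in for the TIFF header that sits in front of the tiles, and `crc32c_reference` is a slow bitwise stand-in for the `crc32c` package used in this notebook:

```python
import struct


def crc32c_reference(data: bytes) -> int:
    """Slow bitwise CRC-32C (Castagnoli); a stand-in for the `crc32c` package."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def build_index(offsets, lengths) -> bytes:
    """Pack (offset, length) pairs as little-endian uint64, then append the checksum."""
    body = b"".join(struct.pack("<QQ", o, n) for o, n in zip(offsets, lengths))
    return body + struct.pack("<I", crc32c_reference(body))


def read_index(shard: bytes, n_chunks: int):
    """Recover the (offset, length) pairs from the end of a shard."""
    index = shard[-(n_chunks * 16 + 4):]  # 16 bytes per chunk + 4-byte checksum
    body, (checksum,) = index[:-4], struct.unpack("<I", index[-4:])
    assert crc32c_reference(body) == checksum, "index checksum mismatch"
    return [struct.unpack_from("<QQ", body, i * 16) for i in range(n_chunks)]


# Toy shard: a fake 8-byte header followed by two "chunks", then the index
chunks = [b"first chunk", b"second"]
header = b"HEADER.."
offsets = [len(header), len(header) + len(chunks[0])]
lengths = [len(c) for c in chunks]
shard = header + b"".join(chunks) + build_index(offsets, lengths)

for offset, length in read_index(shard, n_chunks=2):
    print(shard[offset:offset + length])
```

Because the index only points at byte ranges, a reader following it never has to look at the bytes in front of the chunks, which is what lets a whole COG file serve as a shard. For our 11 x 11 grid the index would occupy 121 * 16 + 4 = 1940 bytes.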

Now our index bytes are ready, so we'll write them to the end of our COG.

In [22]:
ZOG = OUTDIR / 'zog'
COG_SHARD = (ZOG / 'red' / 'c' / '0' / '0')

# just in case we've run this before
if ZOG.exists():
    shutil.rmtree(ZOG)

shutil.copytree(ZARR_RAW, ZOG)
shutil.rmtree(COG_SHARD.parent.parent)
COG_SHARD.parent.parent.mkdir()
COG_SHARD.parent.mkdir()
shutil.copy(COG_RAW, COG_SHARD)
(ZOG / 'red' / 'zarr.json').write_text(json.dumps(RAW_SHARDED_ZARR_CONF, indent=4))
consolidated = json.loads((ZOG / 'zarr.json').read_text())
consolidated['consolidated_metadata']['metadata']['red'] = RAW_SHARDED_ZARR_CONF
(ZOG / 'zarr.json').write_text(json.dumps(consolidated, indent=4))

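# the shard index stores (offset, length) pairs as little-endian uint64 values
index_bytes = np.array(list(zip(raw_tags['TileOffsets'], raw_tags['TileByteCounts'])), dtype='<u8').reshape((11, 11, 2)).tobytes()
# the CRC32C checksum is appended as a little-endian 32-bit unsigned int
index_bytes += struct.pack('<I', crc32c(index_bytes))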

with COG_SHARD.open('ab') as fh:
    fh.write(index_bytes)

Phew, that was a lot. Let's try this out and see what we get if we read a slice of values from the file. We'll compare this to the original zarr we created back at the beginning to verify the values look correct. To ensure we have our index correct beyond the first tile, let's take a slice on a tile corner to have to load four chunks.

In [23]:
ds_zog = xarray.open_dataset(ZOG, engine='zarr')
ds_zog['red'][1020:1030,1020:1030].values

array([[-0.0029,  0.0003, -0.0015, -0.0017, -0.0003, -0.0053, -0.007 ,
        -0.0085, -0.0038, -0.0017],
       [ 0.0026,  0.0019, -0.0013, -0.0041, -0.0043, -0.0066, -0.0073,
        -0.0081, -0.0037, -0.0016],
       [ 0.0036, -0.0009, -0.0037, -0.006 , -0.007 , -0.0043, -0.0018,
        -0.0039, -0.0051, -0.0004],
       [-0.0042, -0.0041, -0.0055, -0.0053, -0.0088, -0.0071, -0.0021,
        -0.0034, -0.003 ,  0.0005],
       [ 0.0025,  0.0026, -0.0013,  0.0004, -0.0077, -0.0045, -0.0018,
        -0.0039, -0.0023,  0.0009],
       [-0.0056, -0.0044, -0.0044, -0.0044, -0.0028, -0.0013, -0.0023,
        -0.0007,  0.0027,  0.0005],
       [-0.013 , -0.0137, -0.0072, -0.0032, -0.0018, -0.0022, -0.003 ,
         0.0001,  0.0031, -0.0028],
       [-0.0108, -0.0101, -0.0144, -0.009 , -0.0042, -0.0016, -0.0058,
        -0.0079, -0.0035, -0.0069],
       [-0.0078, -0.0051, -0.0088, -0.0113, -0.0075, -0.003 , -0.0046,
        -0.0028, -0.0068, -0.0051],
       [-0.0083, -0.0041, -0.0047, -0

In [24]:
ds_zog = xarray.open_dataset(ZOG, engine='zarr')
ds_zog['red'][1,:1024].values

array([0.0246, 0.035 , 0.0271, ..., 0.0027, 0.0046, 0.0041], shape=(1024,))

In [25]:
ds_zarr = xarray.open_dataset(ZARR_DEFAULTS, engine='zarr')
ds_zarr['red'][1,:1024].values

array([0.0246, 0.035 , 0.0271, ..., 0.0027, 0.0046, 0.0041], shape=(1024,))

Wow! It worked. There you have it, you can use a COG as a shard in a zarr array. Here we didn't use any compression, but we could have compressed the COG as long as we had used a zarr-compatible codec.