<a href="https://colab.research.google.com/github/mridul-sahu/tensorstore-tutorial/blob/main/TensorStore_Tutorial_From_Zero_to_Hero_%F0%9F%9A%80.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **TensorStore Tutorial: From Zero to Hero** 🚀

Welcome! This notebook is your complete guide to TensorStore. We'll start from the absolute basics and build up to advanced concepts, explaining *why* we need each feature along the way.

---

## **Module 1: The Problem with Big Data & Your First TensorStore**

### **The "Why": The Limits of In-Memory Computing** 🤔

Imagine you're a neuroscientist with a 5-terabyte dataset of high-resolution brain scans. Your laptop probably has 16 or 32 GB of RAM. What happens when you try to load your data?

```python
# import numpy as np
# huge_array = np.load('my_5_terabyte_dataset.npy') # <-- This will crash!
```

The program will fail with a `MemoryError` (or be killed outright by the operating system), because NumPy tries to allocate the entire array in RAM. This is the fundamental problem that tools like TensorStore are designed to solve.

### **Key Pain Points of "Big Data":**

1. **Data Exceeds RAM**: You cannot load the entire dataset into memory at once.

2. **You Only Need a Piece**: You don't want to analyze all 5TB at once. You might only need to inspect a single patient's scan, or even just a small region of that scan.

3. **Data Lives Elsewhere**: The data is almost never on your local machine. It's usually stored in the cloud (like Google Cloud Storage or Amazon S3).

**TensorStore's Solution**: TensorStore acts as a "universal remote" for your array data. It provides a familiar NumPy-like interface (`array[x, y, z]`) but intelligently reads only the specific bytes you request from the underlying storage, whether it's a local file or a cloud object.
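To make "reads only the specific bytes you request" concrete, here is a minimal standard-library sketch. It is not how TensorStore is implemented; it assumes a hypothetical raw, uncompressed file laid out row-major, and the helper name `read_row_segment` is invented for this illustration.

```python
import os
import tempfile

# Hypothetical raw file: 100 "images" of 256x256 uint8 pixels, stored row-major.
SHAPE = (100, 256, 256)

def read_row_segment(path, image, row, col_start, col_stop):
    """Read pixels [col_start, col_stop) of one row without loading the file."""
    offset = (image * SHAPE[1] + row) * SHAPE[2] + col_start
    with open(path, "rb") as f:
        f.seek(offset)                        # jump straight to the first byte
        return f.read(col_stop - col_start)   # read only what was requested

# Build a demo file in which every pixel of image i holds the value i.
with tempfile.NamedTemporaryFile(delete=False) as f:
    for i in range(SHAPE[0]):
        f.write(bytes([i]) * (SHAPE[1] * SHAPE[2]))
    demo_path = f.name

segment = read_row_segment(demo_path, image=49, row=100, col_start=100, col_stop=164)
file_size = os.path.getsize(demo_path)
print(f"Read {len(segment)} of {file_size} bytes; first value = {segment[0]}")
os.remove(demo_path)
```

Only 64 bytes are read out of a file of several megabytes. TensorStore applies the same idea to chunked, compressed, and remote storage.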

In [None]:
!pip install tensorstore



### **The "Spec": Your Data's Recipe Card** 📝

Every TensorStore begins with a **`spec`** (specification). This is a simple Python dictionary that acts as a recipe card, telling TensorStore everything it needs to know about your data's structure and location.

* `driver`: The storage **format**, like `zarr` (a popular, modern format for chunked arrays).
* `kvstore`: Short for "key-value store", this dictionary specifies the **location** where the `zarr` chunks will be saved. We'll start with a `file` driver to save to our local disk.
* `metadata`: A dictionary containing the crucial properties of the array:
    * `dtype`: The data type, specified in Zarr's precise format (e.g., `'|u1'` for `uint8`).
    * `shape`: The dimensions of the full array.
    * `chunks`: The dimensions of the small blocks the array is broken into.
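To get a feel for what `shape` and `chunks` imply together, the short calculation below works out the chunk grid for an example array of 100 images of 256x256 `uint8` pixels, split into 16x64x64 chunks. It is plain arithmetic and does not call TensorStore.

```python
import math

shape = [100, 256, 256]   # full array dimensions
chunks = [16, 64, 64]     # dimensions of one chunk
itemsize = 1              # bytes per element for uint8 ('|u1')

# A partial chunk at the edge still counts as a chunk, hence the ceiling.
chunks_per_dim = [math.ceil(s / c) for s, c in zip(shape, chunks)]
total_chunks = math.prod(chunks_per_dim)
chunk_bytes = math.prod(chunks) * itemsize

print("Chunks along each dimension:", chunks_per_dim)   # [7, 4, 4]
print("Total chunks:", total_chunks)                    # 112
print("Uncompressed bytes per chunk:", chunk_bytes)     # 65536
```

Smaller chunks mean finer-grained reads but more files or objects to manage; larger chunks mean the opposite.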

In [None]:
import tensorstore as ts
import numpy as np
import os

# Define the recipe for our local data store.
# This doesn't create any files yet, it's just a plan.
spec = {
    'driver': 'zarr', # The format is Zarr
    'kvstore': {
        'driver': 'file', # The location is the local filesystem
        'path': '/tmp/my_zarr_dataset'
    },
    'metadata': {
        'dtype': '|u1',
        'shape': [100, 256, 256],  # 100 "images", each 256x256 pixels
        'chunks': [16, 64, 64] # For Zarr, the key is 'chunks'
    }
}

print("Our TensorStore Spec:")
print(spec)

Our TensorStore Spec:
{'driver': 'zarr', 'kvstore': {'driver': 'file', 'path': '/tmp/my_zarr_dataset'}, 'metadata': {'dtype': '|u1', 'shape': [100, 256, 256], 'chunks': [16, 64, 64]}}


### **"Hello, World!": Creating Your First Store**

Now that we have the recipe, let's bring our data store to life. We use `ts.open()` and pass it our `spec`. We also include two important flags:

* `create=True`: This tells TensorStore to create the dataset if it doesn't already exist.
* `delete_existing=True`: This will clear any old data at that path, ensuring we start fresh.

In [None]:
# Cell 6: Opening the Store

# This operation will create the directory /tmp/my_zarr_dataset
# and write some metadata files inside it.
store = ts.open(spec, create=True, delete_existing=True).result()

print("Successfully created a TensorStore object!")
print("\n--- Store Details ---")
print("Data Type (dtype):", store.dtype)
print("Dimensions (ndim):", store.ndim)
print("Shape:", store.shape)
# The 'domain' includes the shape and coordinate system information
print("Domain:", store.domain)
print("Schema", store.schema)

Successfully created a TensorStore object!

--- Store Details ---
Data Type (dtype): dtype("uint8")
Dimensions (ndim): 3
Shape: (100, 256, 256)
Domain: { [0, 100*), [0, 256*), [0, 256*) }
Schema Schema({
  'chunk_layout': {
    'grid_origin': [0, 0, 0],
    'inner_order': [0, 1, 2],
    'read_chunk': {'shape': [16, 64, 64]},
    'write_chunk': {'shape': [16, 64, 64]},
  },
  'codec': {
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': -1,
    },
    'driver': 'zarr',
    'filters': None,
  },
  'domain': {
    'exclusive_max': [[100], [256], [256]],
    'inclusive_min': [0, 0, 0],
  },
  'dtype': 'uint8',
  'rank': 3,
})


## **Module 2: Core Mechanics - Reading, Writing, and Asynchronous Magic**

### **The "Why": The I/O Bottleneck & The Power of Chunking** 🐢💨

Think of your 5TB dataset on disk as a giant library. If you wanted one sentence from one book, you wouldn't copy the entire library to your desk first. You'd go to the right shelf, pull the right book, and read just that one sentence.

This is the core idea behind **chunking**. TensorStore breaks your giant array into small, independent blocks on disk, as defined by the `chunks` entry in our spec's metadata (and reported as `chunk_layout` in the store's schema). When you ask for a small piece of the array, TensorStore calculates exactly which chunks it needs and reads only them. This is why it's so efficient.
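The "which chunks do I need?" calculation can be sketched in a few lines of plain Python. This is an illustration of the arithmetic, not TensorStore's internal code; the helper name `touched_chunks` is hypothetical, and the dotted names follow the default Zarr v2 chunk-key convention.

```python
from itertools import product

def touched_chunks(region, chunk_shape):
    """Return the grid index of every chunk that overlaps `region`.

    `region` is a list of (start, stop) pairs, one per dimension (stop exclusive).
    """
    per_dim = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(region, chunk_shape)
    ]
    return list(product(*per_dim))

# A 64x64 patch in image 49, rows and columns 100 to 163, with 16x64x64 chunks.
region = [(49, 50), (100, 164), (100, 164)]
hits = touched_chunks(region, [16, 64, 64])
print([".".join(map(str, idx)) for idx in hits])
# → ['3.1.1', '3.1.2', '3.2.1', '3.2.2']
```

Because the patch straddles chunk boundaries in both spatial dimensions, it touches four chunks, and only those four need to be read or written.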

Furthermore, reading from a disk or a network is thousands of times slower than reading from RAM. This delay is the **I/O bottleneck**. To prevent your entire program from freezing while waiting, TensorStore performs all I/O operations **asynchronously**.

### **Writing Data to the Store** ✍️

Writing data is as intuitive as assigning values to a NumPy array. You use standard slicing to specify the region you want to write to and then call the `.write()` method. Because of chunking, you don't have to worry about the rest of the array; TensorStore will locate and modify only the affected chunks.

In [None]:
# Cell 9: Writing a Data Patch

# First, let's reopen our existing store. We don't need 'create=True' anymore.
# We can just pass the spec directly.
spec['create'] = False
spec['open'] = True
store = ts.open(spec).result()

# Create a small 64x64 numpy array (a "patch") filled with the value 255 (white).
patch_data = np.full(shape=[64, 64], fill_value=255, dtype='uint8')

# Write this patch into the 50th "image" (index 49) at a specific location.
# Note: The write operation is queued in the background.
write_future = store[49, 100:164, 100:164].write(patch_data)

# To ensure the write is complete before we move on, we call .result().
# This "waits" for the future to resolve.
write_future.result()

print("✅ Patch written successfully!")

# Let's see what was created on disk. Each file is one chunk, named by its grid index.
# (Zarr's '.zarray' metadata file is hidden from a plain `ls`; use `ls -a` to see it.)
print("\n--- Files on Disk ---")
!ls /tmp/my_zarr_dataset

✅ Patch written successfully!

--- Files on Disk ---
3.1.1  3.1.2  3.2.1  3.2.2


### **Reading Data from the Store** 🧐

Reading works the same way. You specify the region you want with slicing and call the `.read()` method. This will return a `Future` object that, once its result is ready, will contain a standard NumPy array with your requested data.

In [None]:
# Cell 11: Reading and Verifying

# Read the same 64x64 region back from the store.
read_future = store[49, 100:164, 100:164].read()

# Get the result, which will be a NumPy array.
read_data = read_future.result()

print("Shape of read data:", read_data.shape)
print("Is the data we read back the same as our original patch?", np.array_equal(patch_data, read_data))

# What happens if we read from a location we never wrote to?
# TensorStore returns a block of zeros (the default fill value).
empty_region = store[0, 0:5, 0:5].read().result()

print("\nAn unwritten region contains:")
print(empty_region)

Shape of read data: (64, 64)
Is the data we read back the same as our original patch? True

An unwritten region contains:
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]


### **Asynchronous Magic (`Future`)** ⚡

We've been using `.result()` to wait for a single operation. But what if you have hundreds of writes to perform? Waiting for each one sequentially would be slow and defeat the purpose of asynchronous I/O.

When you call `.write()` or `.read()`, TensorStore gives you a **`Future`** object *immediately*. This object is a promise that the work will be done. Your Python script can continue running and doing other things. You can gather many of these `Future` objects and then wait for all of them to complete at once. This allows TensorStore to schedule all your I/O operations in the most efficient way possible in the background.
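The queue-first, wait-later pattern is not specific to TensorStore. The sketch below models it with the standard library's `concurrent.futures`, whose `Future` is a different class from TensorStore's but plays the same role; `slow_write` is a hypothetical stand-in for a storage write.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_write(value):
    """Stand-in for a storage write that takes a while to complete."""
    time.sleep(0.05)
    return value

with ThreadPoolExecutor(max_workers=3) as pool:
    start = time.perf_counter()
    # submit() returns a Future immediately; the work runs in the background.
    futures = [pool.submit(slow_write, v) for v in (1, 2, 3)]
    # Only now do we block, once, until every operation has finished.
    results = [f.result() for f in futures]
    elapsed = time.perf_counter() - start

print("Results:", results)
print(f"Elapsed: {elapsed:.2f}s (three sequential calls would take about 0.15s)")
```

Because the three simulated writes overlap, the total time is close to that of a single write rather than the sum of all three.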

In [None]:
# Cell 13: Working with Futures

import time

# Let's write three different patches to three different locations.
patch_1 = np.full([10, 10], 1, dtype='uint8')
patch_2 = np.full([10, 10], 2, dtype='uint8')
patch_3 = np.full([10, 10], 3, dtype='uint8')

start_time = time.time()

# We start the write operations but DON'T wait for them individually.
# This queues them up.
future1 = store[0, 0:10, 0:10].write(patch_1)
future2 = store[1, 0:10, 0:10].write(patch_2)
future3 = store[2, 0:10, 0:10].write(patch_3)

print("All three write operations have been queued.")

# Now, we wait for each one to finish by calling .result() on each.
# This ensures all are complete before we proceed.
future1.result()
future2.result()
future3.result()

end_time = time.time()

print(f"\nAll three writes completed in {end_time - start_time:.4f} seconds.")

# Let's verify one of them.
print("Data from the second write:", store[1, 5, 5].read().result())

All three write operations have been queued.

All three writes completed in 0.0293 seconds.
Data from the second write: 2


## **Module 3: Moving to the Cloud** ☁️

### **The "Why": Science and ML Live in the Cloud**

So far, we've only worked with data on our local disk. But modern large-scale datasets for science and machine learning almost always live in cloud storage (like Google Cloud Storage, Amazon S3, or Azure Blob Storage).

A key challenge is accessing this data efficiently without rewriting your analysis code every time you switch storage providers. You need a tool that can speak the language of cloud storage natively. This is where TensorStore truly shines.

### **The `kvstore` Abstraction: Separating "What" from "Where"**

TensorStore uses a brilliant abstraction to achieve this. It separates the **data format** from the **storage location**.

1.  **The `driver`:** This defines the *format* of your array on disk. Popular formats for chunked arrays include `zarr` and `n5`. This is the "what".
2.  **The `kvstore`:** This stands for "key-value store". It's a field inside the `spec` that tells the `driver` *where* to store its chunks. This `kvstore` has its own driver, like `file` for local disk or `gcs` for Google Cloud Storage. This is the "where".

By nesting the `kvstore` spec inside the main spec, you can mix and match formats and storage locations with incredible flexibility.
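One way to picture this mix-and-match design is a small helper that builds the same Zarr spec around different `kvstore` entries. This is a sketch using plain dictionaries; the helper name, bucket name, and paths are placeholders invented for illustration, and the `dtype` uses Zarr's own notation (`'<f4'` for little-endian `float32`).

```python
def make_zarr_spec(kvstore):
    """Build a Zarr spec whose storage location is given by `kvstore`."""
    return {
        'driver': 'zarr',
        'kvstore': kvstore,
        'metadata': {'shape': [1000, 1000], 'dtype': '<f4'},
    }

# Same format ("what"), three different locations ("where").
example_local = make_zarr_spec({'driver': 'file', 'path': '/tmp/example_zarr'})
example_memory = make_zarr_spec({'driver': 'memory'})
example_cloud = make_zarr_spec(
    {'driver': 'gcs', 'bucket': 'example-bucket', 'path': 'example/array/'}
)

for name, s in [('local', example_local), ('memory', example_memory), ('cloud', example_cloud)]:
    print(name, '->', s['kvstore']['driver'])
```

The `memory` key-value store keeps everything in RAM, which is handy for tests; nothing outside the `kvstore` entry differs between the three specs.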

In [None]:
# Cell 16: Revisiting the `spec` for Cloud Storage

# Let's see a spec for a Zarr-formatted array on the local filesystem.
# The 'zarr' driver uses a 'kvstore' to know where to save its files.
local_zarr_spec = {
    'driver': 'zarr',
    'kvstore': {
        'driver': 'file',
        'path': '/tmp/my_local_zarr_dataset'
    },
    'metadata': {
        'shape': [1000, 1000],
        'dtype': '<f4',  # float32 in Zarr's dtype notation
    }
}

print("--- Spec for a Local Zarr array ---")
print(local_zarr_spec)


# Now, to move this to Google Cloud Storage, we only change the 'kvstore'.
# The top-level 'driver' remains 'zarr'.
#
# This spec can only be OPENED if you have a GCS bucket and have
# authenticated your environment; here we just build and print it.
gcs_zarr_spec = {
    'driver': 'zarr',
    'kvstore': {
        'driver': 'gcs',
        'bucket': 'your-gcs-bucket-name', # <-- You would change this
        'path': 'path/to/your/data' # <-- And this path within the bucket
    },
    'metadata': {
        'shape': [1000, 1000],
        'dtype': '<f4',
    }
}

print("\n\n--- Spec for a Zarr array on Google Cloud Storage ---")
print(gcs_zarr_spec)

--- Spec for a Local Zarr array ---
{'driver': 'zarr', 'kvstore': {'driver': 'file', 'path': '/tmp/my_local_zarr_dataset'}, 'metadata': {'shape': [1000, 1000], 'dtype': '<f4'}}


--- Spec for a Zarr array on Google Cloud Storage ---
{'driver': 'zarr', 'kvstore': {'driver': 'gcs', 'bucket': 'your-gcs-bucket-name', 'path': 'path/to/your/data'}, 'metadata': {'shape': [1000, 1000], 'dtype': '<f4'}}


### **The Power of Portability**

Notice what happened in the cell above. The only part that changed was the `kvstore` block.

This is the most powerful takeaway: **your Python code for reading and writing data does not change at all.**

```python
# This code works IDENTICALLY for both local and cloud stores:
#
# store = ts.open(local_zarr_spec, create=True).result()
# store[0:100, 0:100].write(some_numpy_array).result()
#
# store = ts.open(gcs_zarr_spec, create=True).result()
# store[0:100, 0:100].write(some_numpy_array).result()
```

## **Module 4: Advanced Superpowers - Views and Transactions**

### **The "Why": The Cost of Duplication and The Danger of Concurrency**

As your data workflows become more complex, two new challenges emerge:

1.  **Transformations:** What if you need to work with a transformed version of your data? For example, a down-sampled version for visualization, or a transposed view for a specific algorithm. Creating a full copy for each transformation is incredibly expensive in terms of storage and computation.
2.  **Concurrency:** What if multiple programs—or even multiple threads in the same program—try to write to the same dataset at the same time? This can lead to a "race condition," where the final state of the data is corrupted and incorrect because the operations interleave in an unpredictable way.

TensorStore provides elegant solutions for both of these problems: **Virtual Views** and **Transactions**.

### **Virtual Views 👓: Zero-Cost Transformations**

A **virtual view** is one of TensorStore's most powerful features. It's like putting a special "lens" on your data that transforms it on the fly, without ever creating a second copy in storage. The transformations are applied as you read or write data through the view.

This is extremely efficient. You can create views that select, slice, reverse, or reorder dimensions, and the only cost is a tiny amount of memory for the view object itself.
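The "lens" idea can be modeled in a few lines of plain Python. The toy class below is not TensorStore's mechanism (TensorStore uses index transforms); it is a hypothetical illustration of a view that remaps indices on every access instead of copying data.

```python
class FlippedColumns:
    """Toy view that presents a 2-D list of lists with its columns reversed."""

    def __init__(self, base):
        self.base = base  # shared reference to the original data, not a copy

    def __getitem__(self, index):
        row, col = index
        return self.base[row][-1 - col]   # remap the column on read

    def __setitem__(self, index, value):
        row, col = index
        self.base[row][-1 - col] = value  # remap the column on write

grid = [[1, 2, 3],
        [4, 5, 6]]
view = FlippedColumns(grid)

first = view[0, 0]   # reads grid[0][2]
view[1, 0] = 60      # writes through to grid[1][2]
print(first)         # → 3
print(grid)          # → [[1, 2, 3], [4, 5, 60]]
```

The view object stores nothing but a reference and a rule for translating indices, which is why creating one costs almost nothing regardless of how large the underlying array is.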

In [None]:
# Let's reopen the store from the previous modules
spec = {
    'driver': 'zarr',
    'kvstore': {'driver': 'file', 'path': '/tmp/my_zarr_dataset'},
    'open': True
}
store = ts.open(spec).result()

# Create our 2x3 pixel letter 'L' patch
l_patch = np.array([[255, 0, 0],
                    [255, 255, 255]], dtype='uint8')


# Create a view: indexing a store returns a new TensorStore object that refers
# to the same underlying data, so no pixels are copied.
# The slice [255:252:-1] selects columns 255, 254, 253 (that is, [253:256] reversed).
flipped_view = store[0, 0:2, 255:252:-1]

# 1. WRITE through the view.
flipped_view.write(l_patch).result()
print("✅ Wrote an 'L' patch directly to the flipped location in the store.")


# 2. READ through the same view to verify.
read_data = flipped_view.read().result()
print("\nData read back from the transformed location:")
print(read_data)

# 3. VERIFY that what we read is what we wrote.
print("\nIs the data we read the same as the patch we wrote?", np.array_equal(l_patch, read_data))

✅ Wrote an 'L' patch directly to the flipped location in the store.

Data read back from the transformed location:
[[255   0   0]
 [255 255 255]]

Is the data we read the same as the patch we wrote? True


### **Transactions ⚛️: Ensuring Data Integrity**

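A **transaction** groups a sequence of operations so that they are staged together and applied only on commit: until then, the changes are visible only through the transaction, and aborting it discards them all. Note that `ts.Transaction()` is isolated but not atomic by default. Pass `atomic=True` to request an all-or-nothing commit, which TensorStore can only honor when the underlying storage can apply all of the affected keys atomically; otherwise the operation fails with an error.

This matters for preventing data corruption. In a "read-modify-write" cycle, another writer could change the data between the time you read it and the time you write your changes back. TensorStore handles concurrent writers with an optimistic concurrency model: rather than taking locks up front, it checks when writing whether the stored data changed underneath it, which is highly efficient when conflicts are rare.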
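Optimistic concurrency is easier to grasp with a toy model. The sketch below is a pure-Python illustration of the general technique (a version check plus retry), not TensorStore's implementation; `VersionedCell`, `increment`, and `competing_write` are hypothetical names.

```python
class VersionedCell:
    """Toy storage cell that tracks a version number alongside its value."""

    def __init__(self, value=0):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write_if_unchanged(self, new_value, expected_version):
        """Apply the write only if nobody else wrote since `expected_version`."""
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True

def increment(cell, interfere=None):
    """Read-modify-write with retry; `interfere` simulates a competing writer."""
    attempts = 0
    while True:
        attempts += 1
        value, version = cell.read()
        if interfere is not None and attempts == 1:
            interfere(cell)  # another writer sneaks in right after our read
        if cell.write_if_unchanged(value + 1, version):
            return attempts

def competing_write(c):
    c.write_if_unchanged(10, c.version)

cell = VersionedCell()
attempts = increment(cell, interfere=competing_write)
print(cell.value, attempts)  # → 11 2
```

The first attempt is rejected because the version changed after the read, so the increment is retried against the fresh value and no update is lost. No lock is ever held, which is the appeal of the optimistic approach.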

In [None]:
import tensorstore as ts

# Let's reopen the store
spec = {
    'driver': 'zarr',
    'kvstore': {'driver': 'file', 'path': '/tmp/my_zarr_dataset'},
    'open': True
}
store = ts.open(spec).result()

# Let's imagine we want to "increment" a value in our store.
pixel_coord = (10, 20, 30)

# 1. Create an instance of the Transaction class.
txn = ts.Transaction()

# 2. Open the store within the context of the transaction.
#    All subsequent operations on `store_in_txn` are part of this transaction.
store_in_txn = store.with_transaction(txn)

# 3. Perform the read-modify-write sequence.
#    These operations are staged within the transaction but not yet committed.
current_value_future = store_in_txn[pixel_coord].read()
current_value = current_value_future.result()
new_value = current_value + 1
write_future = store_in_txn[pixel_coord].write(new_value)

# Wait for the write to be staged.
write_future.result()

# 4. Explicitly commit the transaction.
#    This applies the staged change (the increment) to the underlying store.
txn.commit_async().result()

print(f"Transaction committed. Incremented pixel at {pixel_coord}.")

# Let's verify the new value outside the transaction
final_value = store[pixel_coord].read().result()
print(f"Verified new value: {final_value}")

# If you run this cell again, the value will correctly increment to 2.

Transaction committed. Incremented pixel at (10, 20, 30).
Verified new value: 1


## **Module 5: The Grand Finale - How and Why Orbax Uses TensorStore**

### **The "Why": The ML Checkpointing Challenge** 💾

In modern machine learning, models can be enormous, with billions of parameters. During training, you need to periodically save the state of your model (the "weights" or "parameters") to a file. This is called **checkpointing**.

Checkpointing is critical for two reasons:
1.  **Fault Tolerance:** If your training job crashes (which is common in large-scale environments), you can resume from the last saved checkpoint instead of starting over.
2.  **Inference:** Once the model is trained, you use the final checkpoint to run it on new data.

The challenge is that a model isn't one big array; it's a complex structure (often called a `PyTree` in JAX) containing thousands of individual arrays. Saving each one as a separate file is incredibly slow and inefficient, especially on cloud file systems. This is the exact problem Orbax is designed to solve, using TensorStore as its engine.

### **Orbax: A Checkpointing Library Powered by TensorStore**

**Orbax** is a library designed specifically for robust and performant checkpointing in JAX. Instead of reinventing the wheel for writing array data to storage, Orbax delegates this critical task to TensorStore.

Think of it this way:
* **You** tell Orbax to save your JAX model.
* **Orbax** figures out the structure of your model and organizes the checkpointing process.
* **TensorStore** does the heavy lifting of efficiently writing each individual array from your model to the physical storage (local disk or cloud).

In [None]:
# First, we need to install Orbax and JAX.
!pip install orbax-checkpoint jax jaxlib

In [None]:
import jax
import jax.numpy as jnp
from orbax import checkpoint as ocp
import os

# 1. Create a sample JAX PyTree (like a simple ML model's parameters).
# This is a dictionary of named arrays.
my_model_params = {
    'layer1': {
        'weights': jnp.ones((128, 64)),
        'bias': jnp.zeros((64,))
    },
    'layer2': {
        'weights': jnp.ones((64, 32)),
        'bias': jnp.zeros((32,))
    }
}

# 2. Set up an Orbax Checkpointer. This object manages saving and restoring.
# We're telling it to save to a directory named '/tmp/my_orbax_checkpoint/'.
ckpt_dir = '/tmp/my_orbax_checkpoint/'
checkpointer = ocp.StandardCheckpointer()

# 3. Save the PyTree. Orbax uses TensorStore under the hood to write the data.
checkpointer.save(os.path.join(ckpt_dir, '1'), my_model_params)

checkpointer.wait_until_finished()

print(f"✅ Checkpoint saved to {ckpt_dir}")
!ls -R {ckpt_dir}

# 4. Restore the PyTree. Orbax uses TensorStore to read the data back.
restored_params = checkpointer.restore(os.path.join(ckpt_dir, '1'))
print("\nRestored successfully! Bias of layer 2:", restored_params['layer2']['bias'])

✅ Checkpoint saved to /tmp/my_orbax_checkpoint/
/tmp/my_orbax_checkpoint/:
1

/tmp/my_orbax_checkpoint/1:
array_metadatas       d		      _METADATA        _sharding
_CHECKPOINT_METADATA  manifest.ocdbt  ocdbt.process_0

/tmp/my_orbax_checkpoint/1/array_metadatas:
process_0

/tmp/my_orbax_checkpoint/1/d:
66f0d80ee063dbc7b43ac58d5c68cfb5

/tmp/my_orbax_checkpoint/1/ocdbt.process_0:
d  manifest.ocdbt

/tmp/my_orbax_checkpoint/1/ocdbt.process_0/d:
1a46bfaecca1c3a8982f784f315d0a27  77b9243ba07118f7b62c7b543e74baa3
1ba2075dc8422d37c6f1d17b20ff38bc  ab9b36aec28474627571e7cbc4592f06





Restored successfully! Bias of layer 2: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0.]


### **Under the Hood: What Is Orbax Really Doing?**

When you call `checkpointer.save(...)`, Orbax doesn't just dump the files. It performs a sophisticated dance with TensorStore:

1.  **Iteration:** Orbax traverses your PyTree (`my_model_params`).
2.  **Spec Generation:** For each array (like `'weights'` and `'bias'`), Orbax creates a TensorStore `spec`. This spec tells TensorStore where to save the array within the checkpoint directory and what its properties are.
3.  **Asynchronous Writes:** Orbax then uses TensorStore's asynchronous API to issue write commands for all arrays, allowing them to be written in parallel and with maximum efficiency.
4.  **Specialized Driver:** Orbax typically stores checkpoints through a TensorStore key-value store driver called `ocdbt` (Optionally-Cooperative Distributed B+Tree), as the `manifest.ocdbt` files in the listing above show. This driver aggregates many small array chunks into fewer, larger files, which dramatically improves performance on storage systems, cloud object stores in particular, that penalize many small I/O operations.

Essentially, Orbax acts as the smart manager, while TensorStore is the high-performance worker handling the data.
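The traversal and spec-generation steps can be pictured with a short pure-Python sketch. This is an illustration under assumptions, not Orbax's actual code: the helper names are hypothetical, the tree leaves are placeholder strings rather than real arrays, and the dotted naming is simply the convention visible in the checkpoint keys.

```python
def flatten_pytree(tree, prefix=''):
    """Yield (dotted_name, leaf) pairs for a nested dictionary."""
    for key, value in tree.items():
        name = f'{prefix}.{key}' if prefix else key
        if isinstance(value, dict):
            yield from flatten_pytree(value, name)
        else:
            yield name, value

def array_spec_for(ckpt_dir, name):
    """Build a zarr-over-ocdbt spec pointing at one named array."""
    return {
        'driver': 'zarr',
        'kvstore': {
            'driver': 'ocdbt',
            'base': {'driver': 'file', 'path': ckpt_dir},
        },
        'path': name,
    }

# Placeholder leaves stand in for the real arrays.
example_tree = {
    'layer1': {'weights': 'W1', 'bias': 'b1'},
    'layer2': {'weights': 'W2', 'bias': 'b2'},
}
example_specs = {
    name: array_spec_for('/tmp/example_checkpoint', name)
    for name, _ in flatten_pytree(example_tree)
}
print(sorted(example_specs))
# → ['layer1.bias', 'layer1.weights', 'layer2.bias', 'layer2.weights']
```

Each nested array ends up with its own spec, all sharing one `ocdbt` store, so the per-array writes can be issued independently.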

### **Advanced Topic: Reading an Orbax Checkpoint with Pure TensorStore**

You've seen how Orbax provides a convenient `restore()` method. But what if you don't have Orbax, or you want to inspect a single array from a checkpoint for debugging?

Since Orbax uses TensorStore as its engine, you can bypass Orbax entirely and read the data directly with TensorStore. This requires you to manually build the `spec` that Orbax would normally create for you.

The process has two steps:
1.  Treat the checkpoint directory as an `ocdbt` **Key-Value Store** to list all the arrays it contains.
2.  Use the name of a specific array to build a full `zarr`-over-`ocdbt` spec to read its data.

In [None]:
import tensorstore as ts
import os

# --- Step 1: List all files and find the unique array names ---

# Path to the specific checkpoint step directory created by Orbax
ckpt_step_dir = '/tmp/my_orbax_checkpoint/1'

print("Orbax checkpoint:")
!ls {ckpt_step_dir}

# Create a spec for the OCDBT Key-Value Store
kvstore_spec = {
    'driver': 'ocdbt',
    'base': { 'driver': 'file', 'path': ckpt_step_dir }
}

# Open it as a KvStore and list all file keys
kvstore = ts.KvStore.open(kvstore_spec).result()
all_file_keys = [key.decode() for key in kvstore.list().result()]

print("\nAll Keys found inside the Orbax checkpoint:")
print(all_file_keys)

# We need to find the root of each Zarr array by looking
# for the '.zarray' metadata files and stripping that suffix.
# Orbax also replaces '/' in PyTree names with '.'
array_names = sorted(list(set([
    key.removesuffix('/.zarray') for key in all_file_keys if key.endswith('.zarray')
])))

print("\nClean array names found inside the Orbax checkpoint:")
print(array_names)


# --- Step 2: Read one specific array using its correct name ---

# Let's target the weights of the first layer, using the name found above.
target_array_name = 'layer1.weights'  # Nested keys are joined with '.', not '/'

# Construct the full spec to read the Zarr array from within the OCDBT store
array_spec = {
    'driver': 'zarr',
    'kvstore': {
        'driver': 'ocdbt',
        'base': { 'driver': 'file', 'path': ckpt_step_dir },
    },
    'path': target_array_name,  # Specifies which array to open
    'open': True
}

# Open the specific array using the spec
single_array_store = ts.open(array_spec).result()

# Read the data into a NumPy array
data = single_array_store.read().result()

print(f"\n--- Reading '{target_array_name}' directly with TensorStore ---")
print("Shape:", data.shape)
print("Data type:", data.dtype)
print("A small slice of the data:\n", data[0:3, 0:3])

Orbax checkpoint:
array_metadatas       d		      _METADATA        _sharding
_CHECKPOINT_METADATA  manifest.ocdbt  ocdbt.process_0

All Keys found inside the Orbax checkpoint:
['layer1.bias/.zarray', 'layer1.bias/0', 'layer1.weights/.zarray', 'layer1.weights/0.0', 'layer2.bias/.zarray', 'layer2.bias/0', 'layer2.weights/.zarray', 'layer2.weights/0.0']

Clean array names found inside the Orbax checkpoint:
['layer1.bias', 'layer1.weights', 'layer2.bias', 'layer2.weights']

--- Reading 'layer1.weights' directly with TensorStore ---
Shape: (128, 64)
Data type: float32
A small slice of the data:
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


### **Bonus: Converting from OCDBT to a Standard Format**

You've seen that Orbax uses the highly-optimized `ocdbt` driver for writing checkpoints. While this is fast, `ocdbt` is a specialized format. What if you want to share your checkpoint with a colleague who uses a different tool that only understands standard Zarr, or you want to browse the files manually?

You can easily convert the checkpoint to a standard, non-ocdbt format using TensorStore itself! The process involves reading from the `ocdbt` source and writing to a new destination with a simple `file` driver for each array.

In [None]:
import tensorstore as ts
import os
import shutil
import numpy as np

# --- Define Source and Destination ---
source_ckpt_dir = '/tmp/my_orbax_checkpoint/1'
dest_ckpt_dir = '/tmp/my_converted_zarr_checkpoint'

# Clean up previous runs
if os.path.exists(dest_ckpt_dir):
    shutil.rmtree(dest_ckpt_dir)
os.makedirs(dest_ckpt_dir)


# --- Get the list of array names ---
kvstore_spec = {
    'driver': 'ocdbt',
    'base': { 'driver': 'file', 'path': source_ckpt_dir }
}
kvstore = ts.KvStore.open(kvstore_spec).result()
all_file_keys = [key.decode() for key in kvstore.list().result()]
array_names = sorted(list(set([
    key.removesuffix('/.zarray') for key in all_file_keys if key.endswith('.zarray')
])))

print(f"Found {len(array_names)} arrays to convert.")


# --- Loop through each array and convert it ---
for array_name in array_names:
    print(f"Converting '{array_name}'...")

    # 1. Define and open the SOURCE store
    source_spec = {
        'driver': 'zarr',
        'kvstore': {
            'driver': 'ocdbt',
            'base': { 'driver': 'file', 'path': source_ckpt_dir },
        },
        'path': array_name
    }
    source_store = ts.open(source_spec).result()

    # 2. Define the DESTINATION spec
    dest_spec = {
        'driver': 'zarr',
        'kvstore': {
            'driver': 'file',
            'path': os.path.join(dest_ckpt_dir, array_name)
        }
    }

    # 3. Open the destination store, passing the entire schema from the source.
    #    This preserves chunk size, compression, dtype, domain, etc.
    dest_store = ts.open(
        dest_spec,
        create=True,
        delete_existing=True,
        schema=source_store.schema  # <-- Copies all metadata
    ).result()

    # 4. Perform the copy
    dest_store.write(source_store).result()

print("\n✅ Conversion complete with all metadata preserved!")
print(f"Standard Zarr checkpoint saved at: {dest_ckpt_dir}")

# Let's inspect the new directory structure.
!ls -R {dest_ckpt_dir}

Found 4 arrays to convert.
Converting 'layer1.bias'...
Converting 'layer1.weights'...
Converting 'layer2.bias'...
Converting 'layer2.weights'...

✅ Conversion complete with all metadata preserved!
Standard Zarr checkpoint saved at: /tmp/my_converted_zarr_checkpoint
/tmp/my_converted_zarr_checkpoint:
layer1.bias  layer1.weights  layer2.bias  layer2.weights

/tmp/my_converted_zarr_checkpoint/layer1.bias:
0

/tmp/my_converted_zarr_checkpoint/layer1.weights:
0.0

/tmp/my_converted_zarr_checkpoint/layer2.bias:
0

/tmp/my_converted_zarr_checkpoint/layer2.weights:
0.0


### **What if You Want to Customize?**

While copying the whole schema is great for an exact replica, you can also use this step to change parameters. For example, you could convert the data and change its chunking at the same time. Because the source schema records its chunk shape as a hard constraint, a conflicting override passed alongside `schema=source_store.schema` would be rejected; instead, pass only the parts you want to keep together with the new setting:

```python
# Example of changing the chunk layout during conversion (for a rank-2 array).
# We carry over the dtype and domain from the source, but not its chunking.
# dest_store = ts.open(
#     dest_spec,
#     create=True,
#     dtype=source_store.dtype,
#     domain=source_store.domain,
#     chunk_layout=ts.ChunkLayout(chunk_shape=[32, 32]),  # <-- New chunking
# ).result()
```
```

### **Conclusion: The Journey** ✅

Congratulations! You've completed the journey from zero to TensorStore hero.

You started by understanding a fundamental problem—data being too large for memory. You then learned how TensorStore solves this with its core concepts:

* The declarative **`spec`** to define data.
* **Chunking** and **asynchronous I/O** for performance.
* The **`kvstore`** abstraction for cloud portability.
* Powerful **virtual views** and **transactions** for advanced workflows.

Finally, you saw how all these concepts come together to power a critical library in the modern ML ecosystem, **Orbax**, turning a complex checkpointing problem into a simple and efficient operation. You now have the foundation to use TensorStore for your own large-scale data challenges.