In [1]:
%load_ext autoreload
%autoreload 2

The central data structure provided by the library is the `BlobPath` type.  
It abstracts away the internals of how a file is stored and works in a cloud-agnostic manner.  
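
To make the idea of a storage-agnostic path concrete, here is a minimal sketch of such an interface with a toy in-memory backend. `MiniBlobPath`, `InMemoryBlobPath` and `_CommitOnClose` are hypothetical names used only for illustration. This is not the library's `BlobPath` class, just an assumption about the general shape: callers only see `open`, `exists` and `delete`, never where the bytes actually live.

```python
import io
from abc import ABC, abstractmethod


class MiniBlobPath(ABC):
    """Hypothetical, simplified stand-in for the idea behind `BlobPath`."""

    @abstractmethod
    def open(self, mode: str = "r"):
        """Return a file-like object, mimicking the built-in `open`."""

    @abstractmethod
    def exists(self) -> bool:
        """Report whether the blob is present in the underlying store."""

    @abstractmethod
    def delete(self) -> None:
        """Remove the blob from the underlying store."""


class _CommitOnClose(io.BytesIO):
    """Write buffer that saves its contents into the store when closed."""

    def __init__(self, store: dict, key: str) -> None:
        super().__init__()
        self._store, self._key = store, key

    def close(self) -> None:
        if not self.closed:
            self._store[self._key] = self.getvalue()
        super().close()


class InMemoryBlobPath(MiniBlobPath):
    """Toy backend: blobs live in a dict shared by all instances of the class."""

    _store: dict[str, bytes] = {}

    def __init__(self, key: str) -> None:
        self.key = key

    def open(self, mode: str = "r"):
        binary = "b" in mode
        if "r" in mode:
            if self.key not in self._store:
                raise FileNotFoundError(self.key)
            data = self._store[self.key]
            return io.BytesIO(data) if binary else io.StringIO(data.decode("utf-8"))
        if "w" in mode:
            buf = _CommitOnClose(self._store, self.key)
            # closing the text wrapper flushes and closes `buf`, which commits
            return buf if binary else io.TextIOWrapper(buf, encoding="utf-8")
        raise ValueError(f"unsupported mode: {mode!r}")

    def exists(self) -> bool:
        return self.key in self._store

    def delete(self) -> None:
        self._store.pop(self.key, None)  # no-op if the blob is missing


demo = InMemoryBlobPath("hello_world.txt")
print(demo.exists())  # False
with demo.open("w") as f:
    f.write("hello world")
print(demo.exists())  # True
with demo.open("r") as f:
    print(f.read())  # hello world
```

Swapping the dict for S3, GCS or a local directory would change only the backend class, not the calling code.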

Let's initialise an `S3BlobPath`, which handles storage on AWS S3. Before that, you need to set the `IMPLICIT_BLOB_PATH_TMPDIR` environment variable. It is the location of the temporary directory used by the library, and most operations require it.

In [2]:
import os

os.environ["IMPLICIT_BLOB_PATH_TMPDIR"] = "/tmp"

In [3]:
! pip install blob-path==0.1.1

Collecting blob-path==0.1.1
  Downloading blob_path-0.1.1-py3-none-any.whl.metadata (2.2 kB)
Collecting pydantic>=2.10.1 (from blob-path==0.1.1)
  Downloading pydantic-2.10.3-py3-none-any.whl.metadata (172 kB)
Collecting annotated-types>=0.6.0 (from pydantic>=2.10.1->blob-path==0.1.1)
  Downloading annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.27.1 (from pydantic>=2.10.1->blob-path==0.1.1)
  Downloading pydantic_core-2.27.1-cp312-cp312-macosx_11_0_arm64.whl.metadata (6.6 kB)
Downloading blob_path-0.1.1-py3-none-any.whl (18 kB)
Downloading pydantic-2.10.3-py3-none-any.whl (456 kB)
Downloading pydantic_core-2.27.1-cp312-cp312-macosx_11_0_arm64.whl (1.8 MB)
Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)
Installing collected packages: pydantic-core, annotated-types, pydan

In [5]:
! pip install 'blob-path[aws]'

Collecting boto3>=1.35.68 (from blob-path[aws])
  Downloading boto3-1.35.76-py3-none-any.whl.metadata (6.7 kB)
Collecting botocore<1.36.0,>=1.35.76 (from boto3>=1.35.68->blob-path[aws])
  Downloading botocore-1.35.76-py3-none-any.whl.metadata (5.7 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3>=1.35.68->blob-path[aws])
  Downloading jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)
Collecting s3transfer<0.11.0,>=0.10.0 (from boto3>=1.35.68->blob-path[aws])
  Downloading s3transfer-0.10.4-py3-none-any.whl.metadata (1.7 kB)
Collecting urllib3!=2.2.0,<3,>=1.25.4 (from botocore<1.36.0,>=1.35.76->boto3>=1.35.68->blob-path[aws])
  Using cached urllib3-2.2.3-py3-none-any.whl.metadata (6.5 kB)
Downloading boto3-1.35.76-py3-none-any.whl (139 kB)
Downloading botocore-1.35.76-py3-none-any.whl (13.2 MB)
Downloading jmespath

In [6]:
from pathlib import PurePath

from blob_path.backends.s3 import S3BlobPath

bucket_name = "narang-public-s3"
# S3 object keys are represented as `PurePath`s
object_key = PurePath("hello_world.txt")
region = "us-east-1"
blob_path = S3BlobPath(bucket_name, region, object_key)

A blob path is simply a path representation, like `pathlib.Path`; the file it points to may or may not exist.  
You can check for existence using `exists`.
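
The same behaviour is easy to see with plain `pathlib.Path`, which the text uses as the analogy. This stdlib-only sketch (it does not touch `blob-path` at all) shows that building a path object never touches storage; only `exists` actually looks.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Constructing the path object does not touch the filesystem
    demo_path = Path(tmp) / "hello_world.txt"
    existed_before = demo_path.exists()  # nothing has been written yet
    demo_path.write_text("hello world")
    existed_after = demo_path.exists()

print(existed_before, existed_after)  # False True
```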

In [7]:
blob_path.exists()

RecursionError: maximum recursion depth exceeded

The main method that `BlobPath` provides is `open`; it mimics the built-in `open` function to some extent.  
This method is the central abstraction: many operations are implemented generically on top of it.
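
As an illustration of "generic on top of `open`", here is a sketch of a copy routine that relies on nothing but `open(mode)` on both sides. `copy_via_open` is a hypothetical helper, not part of the library. It is demonstrated with `pathlib.Path`; that it would also work with a `BlobPath` assumes the backend's `open` supports the binary modes `"rb"` and `"wb"`.

```python
import shutil
import tempfile
from pathlib import Path


def copy_via_open(src, dest, chunk_size: int = 1024 * 1024) -> None:
    """Copy bytes from `src` to `dest` using only their `open` methods.

    Works for anything exposing `open(mode)`, e.g. `pathlib.Path`; for blob
    paths this assumes binary modes are supported.
    """
    with src.open("rb") as fr, dest.open("wb") as fw:
        # stream in chunks so large blobs never have to fit in memory
        shutil.copyfileobj(fr, fw, chunk_size)


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.txt"
    b = Path(tmp) / "b.txt"
    a.write_text("hello world")
    copy_via_open(a, b)
    copied = b.read_text()

print(copied)  # hello world
```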

Let's write something to the object in our bucket.

In [5]:
with blob_path.open("w") as f:
    f.write("hello world")

# the file now exists in S3, you can check it out
blob_path.exists()

True

S3 and other cloud storage blob paths can be fully serialised and deserialised.  
You can pass these path objects around across processes (and servers) and still locate the file easily.

In [6]:
# a single blob path can be serialised using the method `serialise`
blob_path.serialise()

{'kind': 'blob-store-aws',
 'payload': {'bucket': 'narang-public-s3',
  'region': 'us-east-1',
  'object_key': 'hello_world.txt'}}
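
The serialised form above is a plain dict of strings, so any JSON-based transport (a task queue, an HTTP body, a database column) can carry it. This stdlib sketch only round-trips that exact dict through `json`; nothing library-specific is involved.

```python
import json

# The dict printed by `serialise()` above
serialised = {
    "kind": "blob-store-aws",
    "payload": {
        "bucket": "narang-public-s3",
        "region": "us-east-1",
        "object_key": "hello_world.txt",
    },
}

wire = json.dumps(serialised)  # what one process would send
received = json.loads(wire)    # what the other process gets back
print(received == serialised)  # True
```

The receiving process can hand `received` straight to `deserialise`.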

In [None]:
# let's deserialise it
# `deserialise` is a standalone function: pass it the serialised form of any kind of blob path and it rebuilds the right type

from blob_path.deserialise import deserialise

deserialised_s3_blob = deserialise(
    {
        "kind": "blob-store-aws",
        "payload": {
            "bucket": "narang-public-s3",
            "region": "us-east-1",
            "object_key": "hello_world.txt",
        },
    }
)

deserialised_s3_blob

kind=blob-store-aws bucket=narang-public-s3 region=us-east-1 object_key=hello_world.txt
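
One plausible way for a single function to rebuild any kind of blob path is to dispatch on the `kind` tag. The sketch below assumes a registry that maps each `kind` to a constructor. `register`, `ToyS3Path` and `deserialise_sketch` are hypothetical names, and this is not necessarily how the library's `deserialise` is implemented.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Callable

# Hypothetical registry: "kind" tag -> function that rebuilds a path from its payload
_DESERIALISERS: dict[str, Callable[[dict], object]] = {}


def register(kind: str):
    """Decorator that records how to rebuild a path of the given kind."""
    def decorator(factory):
        _DESERIALISERS[kind] = factory
        return factory
    return decorator


@dataclass(frozen=True)
class ToyS3Path:
    bucket: str
    region: str
    object_key: PurePath


@register("blob-store-aws")
def _s3_from_payload(payload: dict) -> ToyS3Path:
    return ToyS3Path(payload["bucket"], payload["region"], PurePath(payload["object_key"]))


def deserialise_sketch(data: dict) -> object:
    """Pick the constructor by `kind`, then build the path from `payload`."""
    try:
        factory = _DESERIALISERS[data["kind"]]
    except KeyError:
        raise ValueError(f"unknown blob path kind: {data.get('kind')!r}") from None
    return factory(data["payload"])


rebuilt = deserialise_sketch(
    {
        "kind": "blob-store-aws",
        "payload": {
            "bucket": "narang-public-s3",
            "region": "us-east-1",
            "object_key": "hello_world.txt",
        },
    }
)
print(rebuilt.object_key)  # hello_world.txt
```

Adding a backend then only means registering one more `kind`; callers of the single entry point stay unchanged.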

Let's try another path backend, `LocalRelativeBlobPath`. This path models a relative path on the local filesystem, always rooted at a single root directory.  
Suppose you store all of your application's files under a single directory, "/tmp/my-apps-files".  
In this case, instead of using `pathlib.Path`, you could use `LocalRelativeBlobPath`, which makes it easy to switch between cloud storage and local storage for your files.  
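
The "relative path under one root" model can be sketched with `pathlib` alone. `resolve_under_root` is a hypothetical helper, not the library's implementation, and rejecting absolute paths and `..` components is a design choice assumed here so that a relative path cannot escape its root.

```python
from pathlib import Path, PurePath


def resolve_under_root(root: Path, relpath: PurePath) -> Path:
    """Join a relative path onto a root directory, refusing to leave the root.

    Hypothetical helper, not the library's implementation.
    """
    if relpath.is_absolute() or ".." in relpath.parts:
        raise ValueError(f"{relpath} is not a safe relative path")
    return root / relpath


root = Path("/tmp/my-apps-files")
storage_file = resolve_under_root(root, PurePath("local") / "storage.txt")
print(storage_file.as_posix())  # /tmp/my-apps-files/local/storage.txt
```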

In [None]:
from blob_path.backends.local_relative import LocalRelativeBlobPath

# PurePath is a simple path representation; it does not care whether the path actually exists in your FS
# It's useful for logically representing path-like data; for example, you could represent S3 object keys as `PurePath`s
from pathlib import PurePath

relpath = PurePath("local") / "storage.txt"
local_blob = LocalRelativeBlobPath(relpath)

In [9]:
local_blob.exists()

Exception: tried fetching implicit variable from environment but the var os.environ['IMPLICIT_BLOB_STORE_LOCAL_RELATIVE_BASE_DIR'] does not exist

Uh oh, we got an error, and a really early one at that ;_;  
It says that we have not defined `IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR` in our environment.  

This environment variable stores the root directory of your relative paths

In [None]:
from pathlib import Path

os.environ["IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR"] = str(
    Path.home() / "tmp" / "local_fs_root"
)

# it runs without an error now and reports whether the file exists
local_blob.exists()

True

So why does `LocalRelativeBlobPath` take the root directory from an environment variable? Couldn't we pass it in `__init__`?  
That is debatable, but if we did, the path would be pretty much the same as any absolute path. The root directory is deliberately not part of the path representation: even the serialised form of `LocalRelativeBlobPath` leaves it out.  

# Implicit variables
These variables, which modify the behavior of `BlobPath`, are called implicit variables. By default, they are read from the environment.  
Fetching the root directory from the environment has multiple benefits:
- You can mount the same directory in multiple containers at **different** mount points and still pass the serialised representation around correctly (assuming each container sets the implicit variables correctly)
- The same holds for servers that share a directory over NFS
- It also works well for presigned URLs: you can simply start an nginx server and pass that server's base URL to the path as an implicit variable
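
The mount-point benefit in the first bullet can be sketched as follows: only the relative part travels, and each process resolves it against its own root. `locate` is a hypothetical helper, and the two dicts merely simulate the environments of two containers (so running this does not change the real `os.environ`). The variable name is the one set earlier in this notebook.

```python
import os
from collections.abc import Mapping
from pathlib import Path, PurePath

# Variable name as used earlier in this notebook
ROOT_VAR = "IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR"

# What travels between containers: only the relative part, never the root
serialised_relpath = (PurePath("local") / "storage.txt").as_posix()


def locate(relpath: str, env: Mapping[str, str] = os.environ) -> Path:
    """Resolve a serialised relative path against the root set in `env`."""
    return Path(env[ROOT_VAR]) / relpath


# Two containers mount the same volume at different points
container_a_env = {ROOT_VAR: "/mnt/shared"}
container_b_env = {ROOT_VAR: "/data"}

in_container_a = locate(serialised_relpath, container_a_env)
in_container_b = locate(serialised_relpath, container_b_env)

print(in_container_a.as_posix())  # /mnt/shared/local/storage.txt
print(in_container_b.as_posix())  # /data/local/storage.txt
```

Had the root been baked into the serialised form, container B would receive a path under `/mnt/shared` that does not exist on its side.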

Implicit variables will change the behavior and location of your blobs implicitly (hah! perfect naming). Every implicit variable follows the naming convention: `IMPLICIT_BLOB_PATH_<BACKEND>_...`  
Currently, only `LocalRelativeBlobPath` has backend-specific implicit variables.  
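
A small sketch of the naming convention plus a lookup that fails loudly when the variable is missing, similar in spirit to the exception seen earlier. `implicit_var_name` and `get_implicit_var` are hypothetical helpers, not library functions.

```python
import os
from collections.abc import Mapping


def implicit_var_name(backend: str, setting: str) -> str:
    """Build a name following IMPLICIT_BLOB_PATH_<BACKEND>_<SETTING>."""
    return f"IMPLICIT_BLOB_PATH_{backend}_{setting}".upper()


def get_implicit_var(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Fetch an implicit variable, raising a descriptive error if it is unset."""
    try:
        return env[name]
    except KeyError:
        raise LookupError(
            f"implicit variable {name!r} is not set in the environment"
        ) from None


name = implicit_var_name("local_relative", "base_dir")
print(name)  # IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR
```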

Let's do a simple copy operation between an S3 path and a local path

In [11]:
import shutil

# the long way
with deserialised_s3_blob.open("r") as fr:
    with local_blob.open("w") as fw:
        shutil.copyfileobj(fr, fw)

with local_blob.open("r") as f:
    print(f.read())

hello world


Let's use a shortcut now.  
Whenever possible, prefer the library's shortcuts for your operations.  
Currently, they only add convenience, but special cases can later be optimised behind them (for example, a copy between two S3 blobs could be done as a remote copy with boto3, without moving the data through your local machine).
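
The kind of optimisation described above can be sketched as a dispatch: take a specialised route when both ends support it, otherwise stream through `open`. `ToyS3Object`, `server_side_copy`, `streaming_copy` and `copy_sketch` are hypothetical names. The server-side branch is only a placeholder; a real one might call the boto3 S3 client's `copy_object`.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ToyS3Object:
    """Hypothetical stand-in for an S3 blob path; only what the dispatch needs."""

    bucket: str
    object_key: str


def server_side_copy(src: ToyS3Object, dest: ToyS3Object) -> str:
    # Placeholder: a real version might call the boto3 S3 client's `copy_object`,
    # so the bytes never leave AWS. Here it only reports which route was taken.
    return "server-side"


def streaming_copy(src, dest) -> str:
    # Generic fallback: stream the bytes through this machine using `open`.
    with src.open("rb") as fr, dest.open("wb") as fw:
        shutil.copyfileobj(fr, fw)
    return "streamed"


def copy_sketch(src, dest) -> str:
    """Take a specialised fast path when one exists, else the generic copy."""
    if isinstance(src, ToyS3Object) and isinstance(dest, ToyS3Object):
        return server_side_copy(src, dest)
    return streaming_copy(src, dest)


remote_route = copy_sketch(
    ToyS3Object("bucket-a", "hello_world.txt"),
    ToyS3Object("bucket-b", "hello_world.txt"),
)

with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp) / "a.txt", Path(tmp) / "b.txt"
    a.write_text("hello world")
    local_route = copy_sketch(a, b)
    local_copy = b.read_text()

print(remote_route, local_route, local_copy)  # server-side streamed hello world
```

Because callers only ever use the shortcut, such a fast path could be added later without changing their code.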

In [None]:
# delete the local copy first, so the example starts from scratch
local_blob.delete()

# copy using the `cp` method on the source blob path
deserialised_s3_blob.cp(local_blob)
with local_blob.open("r") as f:
    print(f.read())


# using a shortcut from the library
# this shortcut is even more convenient: either `src` or `dest` can also be a `pathlib.Path`
# which makes it easy to work with regular paths on your FS
from blob_path.shortcuts import copy_blob

local_blob.delete()
copy_blob(deserialised_s3_blob, local_blob)
with local_blob.open("r") as f:
    print(f.read())

hello world
hello world
