# Dynamic Tensors

In this notebook, we will see how to handle data whose shape and size vary from sample to sample.

In [1]:
from hub.schema import Primitive, Audio, ClassLabel
from hub import transform, schema

import librosa
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

from glob import glob
from time import time

## What if our dataset contains data with varying sizes?

In [10]:
fnames = glob("./Data/audio/*")

for fname in fnames:
    print("n_samples:", librosa.load(fname, sr=None)[0].shape[0])

n_samples: 32000
n_samples: 115542
n_samples: 176400
n_samples: 176400
n_samples: 176400
n_samples: 176400
n_samples: 176400
n_samples: 176400
n_samples: 192000
n_samples: 176400
n_samples: 176400
n_samples: 176400
n_samples: 64589
n_samples: 23373


## (A) Defining a "Dynamic" Schema
A schema is a Python `dict` that contains metadata about our dataset.

In this example, we tell Hub that our files vary in duration by passing `shape=(None,)`. We also give Hub an upper bound with `max_shape=(192000,)`: no file is longer than 192,000 samples, the largest count in the listing above.

In [7]:
my_schema = {
    "wav": Audio(shape=(None,), max_shape=(192000,), file_format="wav")
}
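The value passed to `max_shape` does not have to be read off by eye. The sketch below derives it from a collection of arrays. It uses only NumPy, the helper name `infer_max_shape` is hypothetical (it is not part of Hub), and the zero-filled arrays are synthetic stand-ins for decoded audio clips.

```python
import numpy as np

def infer_max_shape(arrays):
    """Return the per-axis maximum extent across arrays of equal rank."""
    # Stack the shapes into a (n_arrays, n_dims) table, then take column maxima.
    shapes = np.array([a.shape for a in arrays])
    return tuple(int(n) for n in shapes.max(axis=0))

# Synthetic stand-ins for three clips of different lengths (not real audio).
clips = [
    np.zeros(32000, dtype=np.float32),
    np.zeros(192000, dtype=np.float32),
    np.zeros(23373, dtype=np.float32),
]

max_shape = infer_max_shape(clips)
print("max_shape:", max_shape)
```

With real data, `clips` would be the arrays returned by `librosa.load`, and the resulting tuple is a candidate for the `max_shape` argument.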

## (B) Defining Transforms
Transforms for dynamic tensors look the same as transforms for static tensors.

In [8]:
@transform(schema=my_schema)
def load_transform(sample):
    
    audio = librosa.load(sample, sr=None)[0]
    
    return {
        "wav": audio
    }
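Before running a transform over many files, it can help to check that each sample respects the declared bounds. The following is a minimal sketch of such a check, assuming the convention described earlier: `None` in `shape` marks a variable axis and `max_shape` gives its upper bound. The helper `fits_dynamic_shape` is hypothetical and is not a Hub function.

```python
import numpy as np

def fits_dynamic_shape(array, shape, max_shape):
    """Check an array against a shape with None for variable axes."""
    if array.ndim != len(shape):
        return False
    for actual, fixed, upper in zip(array.shape, shape, max_shape):
        # A fixed axis must match exactly; a None axis may vary.
        if fixed is not None and actual != fixed:
            return False
        # Every axis must stay within its declared upper bound.
        if actual > upper:
            return False
    return True

# Synthetic examples: one clip within the bound, one that exceeds it.
short_clip = np.zeros(32000, dtype=np.float32)
long_clip = np.zeros(200000, dtype=np.float32)

print(fits_dynamic_shape(short_clip, (None,), (192000,)))
print(fits_dynamic_shape(long_clip, (None,), (192000,)))
```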

In [11]:
ds = load_transform(fnames) # returns a transform object
type(ds)

hub.compute.transform.Transform

## (C) Finally, Execution!
Hub executes lazily, so nothing happens until we invoke `store`. Invoking `store` applies `load_transform` to every file in `fnames` and pushes the result to the dataset identified by the tag.

In [12]:
start = time()

tag = "mynameisvinn/vibrations"
ds2 = ds.store(tag)  # runs load_transform over fnames and stores the result under tag

end = time()
print("Elapsed time:", end - start)

  warn('ignoring keyword argument %r' % k)
Computing the transormation: 100%|██████████| 14.0/14.0 [00:00<00:00, 20.9 items/s]

Elapsed time: 3.316145896911621
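The deferred-execution pattern itself can be illustrated with a toy class. This is a sketch of the general idea only, not Hub's implementation: `LazyTransform` and its `calls` counter are hypothetical names, and `store` here simply returns a list instead of writing anywhere.

```python
class LazyTransform:
    """Hold a function and its inputs; do no work until store() is called."""

    def __init__(self, fn, inputs):
        self.fn = fn
        self.inputs = list(inputs)
        self.calls = 0  # counts how many times fn has actually run

    def store(self):
        # The function is applied only here, once per input.
        results = []
        for item in self.inputs:
            self.calls += 1
            results.append(self.fn(item))
        return results

pending = LazyTransform(len, ["a", "bcd", "ef"])
calls_before_store = pending.calls  # nothing has run yet
stored = pending.store()            # the work happens now

print("calls before store:", calls_before_store)
print("calls after store:", pending.calls)
print("stored:", stored)
```

Constructing the object is cheap, which matches the notebook's behavior: the call to `load_transform(fnames)` returns immediately, and the elapsed time is spent inside `store`.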



