# Getting the embeddings

> This notebook computes the embeddings (the latent-space representation) of a multivariate time series using a trained encoder (e.g., an autoencoder).

In [12]:
from tchub.all import *
from tsai.data.preparation import SlidingWindow
from fastcore.all import *
import wandb
wandb_api = wandb.Api()
from yaml import load, FullLoader

## Config parameters

Put here every parameter that this notebook might need.

In [13]:
config = AttrDict(
    use_wandb = False, # Whether to use wandb for experiment tracking
    wandb_group = 'embeddings', # Name of the wandb group this run belongs to
    wandb_entity = 'pacmel',
    wandb_project = 'tchub',
    enc_artifact = 'pacmel/tchub/mvp:run-tchub-2juemwx8', # Name:version of the encoder artifact
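    input_ar = None, # Dataset artifact to embed; None falls back to the encoder's validation artifact
    cpu = False # Passed as `cpu` to get_enc_embs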
)

## Run

In [14]:
run = wandb.init(entity=config.wandb_entity,
                    project=config.wandb_project if config.use_wandb else 'work-nbs', 
                    group=config.wandb_group,
                    job_type='embeddings', 
                    mode='online' if config.use_wandb else 'disabled',
                    anonymous = 'never' if config.use_wandb else 'must',
                    config=config,
                    #id = 'embeddingsProvider',
                    resume='allow')

In [15]:
# Workaround to fetch artifacts when wandb tracking is disabled
artifacts_gettr = run.use_artifact if config.use_wandb else wandb_api.artifact

Restore the encoder model and its associated configuration

In [16]:
enc_artifact = artifacts_gettr(config.enc_artifact, type='learner')

In [17]:
# TODO: this only works on the second call; the cause of the first failure is unknown
try:
    enc_learner = enc_artifact.to_obj()
except Exception:
    enc_learner = enc_artifact.to_obj()
enc_learner

<fastai.learner.Learner at 0x7f57ff058220>
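The try/except above simply calls `to_obj()` a second time when the first call fails. A minimal sketch of the same idea as a reusable helper is shown here; `retry` and `flaky_loader` are hypothetical names introduced for illustration, and the sketch does not explain why the first call fails.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def retry(fn: Callable[[], T], attempts: int = 2) -> T:
    """Call `fn` up to `attempts` times and return the first successful result."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as err:
            # Remember the error so the last one can be re-raised if all attempts fail
            last_error = err
    raise last_error

# Hypothetical stand-in for `enc_artifact.to_obj`: fails once, then succeeds
calls = {"n": 0}

def flaky_loader() -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("first call fails")
    return "learner"

result = retry(flaky_loader)
print(result, calls["n"])
```

In this notebook, the equivalent call would presumably be `enc_learner = retry(enc_artifact.to_obj)`.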

Restore the dataset artifacts used to train and validate the encoder. Even if we do not compute the embeddings over these datasets, we need the metadata of the encoder training set to check that it matches the dataset we want to embed.

In [18]:
enc_run = enc_artifact.logged_by()
enc_artifact_train = artifacts_gettr(enc_run.config['train_artifact'], type='dataset')
enc_artifact_valid = artifacts_gettr(enc_run.config['valid_artifact'], type='dataset')
enc_artifact_train.name, enc_artifact_valid.name

('taxi:v0', 'taxi:v0')

Now we specify the dataset artifact that we want to get the embeddings from. If no 
artifact is defined, the artifact used to validate the encoder is used instead.

In [19]:
input_ar_name = ifnone(config.input_ar, 
                       f'{enc_artifact_valid.entity}/{enc_artifact_valid.project}/{enc_artifact_valid.name}')
wandb.config.update({'input_ar': input_ar_name}, allow_val_change=True)
input_ar = artifacts_gettr(input_ar_name)
input_ar.name

'taxi:v0'
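The fallback in the cell above relies on fastcore's `ifnone(a, b)`, which returns `b` when `a` is `None` and `a` otherwise. A minimal sketch of that logic in plain Python follows; `resolve_artifact_name` is a hypothetical helper, the entity and project values are assumed to match the ones in `config`, and `'pacmel/tchub/other:v1'` is a made-up artifact name.

```python
from typing import Optional

def resolve_artifact_name(requested: Optional[str], entity: str,
                          project: str, name: str) -> str:
    """Return `requested` if given, else the fully qualified 'entity/project/name'."""
    if requested is not None:
        return requested
    return f"{entity}/{project}/{name}"

# No input artifact configured: fall back to the validation artifact
default_name = resolve_artifact_name(None, "pacmel", "tchub", "taxi:v0")
# An explicit (hypothetical) input artifact takes precedence
explicit_name = resolve_artifact_name("pacmel/tchub/other:v1", "pacmel", "tchub", "taxi:v0")

print(default_name)
print(explicit_name)
```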

Now we need to check whether the artifact that we want to embed matches the artifact used to train the encoder. Matching means having the same number of variables, the same window size and stride, and the same frequency. The function below checks the variables and the frequency, and also that the input artifact is not normalized and has no missing values.

In [20]:
#export
def check_compatibility(input_ar:TSArtifact, enc_ar:TSArtifact):
    "Check that the artifact used to train the encoder model and the artifact that is \
    going to be passed for inference are compatible"
    try:
        # Check that both artifacts have the same variables
        chk_vars = input_ar.metadata['TS']['vars'] == enc_ar.metadata['TS']['vars']
        # Check that both artifacts have the same frequency
        chk_freq = input_ar.metadata['TS']['freq'] == enc_ar.metadata['TS']['freq']
        # Check that the input artifact is not normalized (non-normalized data lacks the 'normalization' key)
        chk_norm = input_ar.metadata['TS'].get('normalization') is None
        # Check that the input artifact has no missing values (the flag is compared as the string "False")
        chk_miss = input_ar.metadata['TS']['has_missing_values'] == "False"
        # Combine all the checks
        if chk_vars and chk_freq and chk_norm and chk_miss:
            print("Artifacts are compatible.")
        else:
            raise Exception("Artifacts are not compatible.")
    except Exception as e:
        print("Artifacts are not compatible.")
        raise e
    return None
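
`check_compatibility` reports a single pass/fail result. As an illustration only, the same four checks can be sketched over plain metadata dictionaries so that the failing check is named; `failed_checks` and the metadata values are hypothetical, and only the keys mirror those read by the function above.

```python
from typing import Any, Dict, List

def failed_checks(input_meta: Dict[str, Any], enc_meta: Dict[str, Any]) -> List[str]:
    """Return the names of the checks that fail (an empty list means compatible)."""
    in_ts, enc_ts = input_meta["TS"], enc_meta["TS"]
    checks = {
        "vars": in_ts["vars"] == enc_ts["vars"],
        "freq": in_ts["freq"] == enc_ts["freq"],
        # Non-normalized data is assumed to lack the 'normalization' key
        "not_normalized": in_ts.get("normalization") is None,
        # The flag is assumed to be stored as a string, as in check_compatibility
        "no_missing_values": in_ts["has_missing_values"] == "False",
    }
    return [name for name, ok in checks.items() if not ok]

# Hypothetical metadata for the encoder training set and two candidate inputs
enc_meta = {"TS": {"vars": ["value"], "freq": "30min", "has_missing_values": "False"}}
same_meta = {"TS": {"vars": ["value"], "freq": "30min", "has_missing_values": "False"}}
hourly_meta = {"TS": {"vars": ["value"], "freq": "1h", "has_missing_values": "False"}}

compatible = failed_checks(same_meta, enc_meta)
incompatible = failed_checks(hourly_meta, enc_meta)
print(compatible)
print(incompatible)
```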


In [21]:
df = input_ar.to_df()
df

timestamp,value
2014-10-01 00:00:00,12751
2014-10-01 00:30:00,8767
2014-10-01 01:00:00,7005
2014-10-01 01:30:00,5257
2014-10-01 02:00:00,4189
...,...
2014-12-14 21:30:00,16344
2014-12-14 22:00:00,15913
2014-12-14 22:30:00,14327
2014-12-14 23:00:00,12060


In [22]:
df.shape

(3600, 1)

In [23]:
enc_input, _ = SlidingWindow(window_len=enc_run.config['w'], 
                             stride=enc_run.config['stride'], 
                             get_y=[])(df)
enc_input.shape

(3553, 1, 48)
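The output shape reads as (number of windows, number of variables, window length). It is consistent with a window length of 48 and a stride of 1, since 3600 - 48 + 1 = 3553; those two values are inferred from the shape, not stated in the notebook. The following pure-Python sketch illustrates the counting rule on one variable. It is not tsai's implementation, and `n_windows` and `sliding_windows` are hypothetical helpers.

```python
from typing import List, Sequence

def n_windows(n_samples: int, window_len: int, stride: int = 1) -> int:
    """Number of full windows of `window_len` samples taken every `stride` steps."""
    if n_samples < window_len:
        return 0
    return (n_samples - window_len) // stride + 1

def sliding_windows(values: Sequence[float], window_len: int,
                    stride: int = 1) -> List[List[float]]:
    """Split a univariate series into overlapping windows (full windows only)."""
    last_start = len(values) - window_len
    return [list(values[i:i + window_len]) for i in range(0, last_start + 1, stride)]

# 3600 samples with the inferred window length (48) and stride (1)
count = n_windows(3600, 48, 1)
print(count)

# Toy series to show the window contents
toy = sliding_windows([1, 2, 3, 4, 5], window_len=3, stride=1)
print(toy)
```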

In [24]:
%%time
embs = get_enc_embs(enc_input, enc_learner, cpu=config.cpu, to_numpy=True)



CPU times: user 4.8 s, sys: 1.42 s, total: 6.22 s
Wall time: 5.81 s


In [25]:
if config.use_wandb: 
    run.log_artifact(ReferenceArtifact(embs, 'embeddings', metadata=dict(run.config)), 
                     aliases=f'run-{run.project}-{run.id}')

In [26]:
run.finish()