## Description
I implemented an autoencoder for 3D point cloud data (number of points x 3):
* using TensorFlow
* applying a (modified) self-attention layer from the Transformer  

I created this notebook for my own study, but I hope it helps someone.  

Note: This code requires a GPU. Pretrained weights are loaded by default.

## References
[1] https://arxiv.org/abs/2012.09688  
  The architecture is described in detail, and the core code is available on GitHub. It was very helpful to me. 
  
[2] https://arxiv.org/abs/2012.09164  
  SOTA model for 3D point cloud classification as of Jan 2021. It adopts "vector attention".  
  
[3] https://arxiv.org/abs/1712.07262  
  Point cloud autoencoder

# Results Visualization  
Below is an animation in which one airplane shape morphs into another, over and over. The intermediate frames are generated by interpolation, and two interpolation methods are compared:
* left: interpolate the point coordinates directly
* right: interpolate the encoded features and decode them into a point cloud

In [None]:
from IPython.display import Image
nbdatapath = "../input/transformerbased-autoencoder-for-3d-point-cloud"
Image(filename=nbdatapath+"/Transformer_AE_v003.gif",
      format='png')

Because the features extracted by the trained autoencoder carry the important information about the shape, interpolating them gives a natural shape-shifting animation (right), even when the points are fed in random order.
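
The interpolation itself is a plain linear blend. Below is a minimal NumPy sketch of it; the function name `interpolate` and the random arrays standing in for two encoded shapes are assumptions made for illustration, not part of the notebook. The same function can blend raw coordinates (left) or encoded features (right); only the features are later passed through the decoder.

```python
import numpy as np

def interpolate(a, b, num_steps):
    """Return num_steps arrays moving linearly from a (exclusive) to b (inclusive)."""
    weights = np.arange(1, num_steps + 1) / float(num_steps)
    return np.stack([(1.0 - w) * a + w * b for w in weights], axis=0)

# Hypothetical stand-ins for two encoded shapes (1024-dim features).
rng = np.random.default_rng(0)
feat_a = rng.normal(size=1024)
feat_b = rng.normal(size=1024)

feat_frames = interpolate(feat_a, feat_b, num_steps=40)
print(feat_frames.shape)
```

The last frame equals `feat_b`, so consecutive segments of an animation can be chained without a jump.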

## Preparation
* https://github.com/AnTao97/PointCloudDatasets  

I used the "ShapeNetPart" dataset. It consists of 16 categories: 

In [None]:
Image(nbdatapath+"/shapenetpart_shapes.jpg")

In [None]:
cat_list = ['airplane','bag','cap','car','chair',
            'earphone','guitar','knife','lamp',
            'laptop','motorbike','mug','pistol',
            'rocket','skateboard','table']

The "farthest point sampling" algorithm is used in this code. To run it on the GPU, the custom op must first be compiled with the shell script below.

In [None]:
### "tf_ops" folder was downloaded from https://github.com/dgriffiths3/pointnet2-tensorflow2
### The following command must be run with the GPU on.
!sh ../input/pointnet2-tf-ops/tf_ops/compile_ops.sh
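
For reference, the idea behind farthest point sampling can be sketched on the CPU with NumPy. This is only an illustration under assumptions: the starting index (`start=0`) and the tie-breaking are my choices here and may differ from the compiled GPU op, and the random cloud is a stand-in for real data.

```python
import numpy as np

def farthest_point_sample(points, npoint, start=0):
    """Greedy farthest point sampling; returns indices of shape (npoint,)."""
    n = points.shape[0]
    selected = np.zeros(npoint, dtype=np.int64)
    # Squared distance from every point to its nearest selected point so far.
    min_dist = np.full(n, np.inf)
    current = start
    for i in range(npoint):
        selected[i] = current
        diff = points - points[current]
        dist = np.einsum('ij,ij->i', diff, diff)
        min_dist = np.minimum(min_dist, dist)
        # The next sample is the point farthest from everything selected so far.
        current = int(np.argmax(min_dist))
    return selected

# Hypothetical stand-in for one point cloud.
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(2048, 3))
idx = farthest_point_sample(cloud, npoint=512)
print(idx.shape, len(np.unique(idx)))
```

Compared with random sampling, this spreads the selected points evenly over the shape, which is why it is used before grouping neighbors.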

In [None]:
import os
# random.seed(0)
feat_dims = 1024
batch_size = 32
ver_label = 'Transformer_AE_v003'
checkpoint_dir = 'CPs_'+ver_label

train_epochs = 0 # if ==0, skip training
load_weight = True # if True, start from epoch 200

if not os.path.exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)

In [None]:
import numpy as np
import tensorflow as tf
import gc, sys, glob
from tqdm import tqdm

from tensorflow.keras import Input
from tensorflow.keras import models as M
from tensorflow.keras import layers as L
from tensorflow.keras import backend as keras
from tensorflow.keras.utils import plot_model

import matplotlib.pyplot as plt
import matplotlib.cm as cm
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib.animation import FuncAnimation
import seaborn as sns
sns.set(style='darkgrid')

import h5py
from sklearn.manifold import TSNE

In [None]:
sampling_module=tf.load_op_library('./tf_sampling_so.so')
from tensorflow.python.framework import ops
ops.NoGradient('FarthestPointSample')

In [None]:
def load_h5(h5_filename):
    # Read the point clouds and their labels, then close the file.
    with h5py.File(h5_filename, 'r') as f:
        data = f['data'][:]
        label = f['label'][:]
    return (data, label)

def read_file(x_str):
    
    h5_fns = sorted(glob.glob('../input/shapenetpart-hdf5-2048/'
                              +x_str+'*.h5'))
    list_temp1, list_temp2 = [],[]
    for h5_filename in tqdm(h5_fns):
        h5 = load_h5(h5_filename)
        list_temp1.append(h5[0])
        list_temp2.append(h5[1])
    X = np.concatenate(list_temp1, axis=0)
    X_label = np.concatenate(list_temp2, axis=0).squeeze()
    
    return X, X_label

X, X_label = read_file('train')
Xv, Xv_label = read_file('val')
Xt, Xt_label = read_file('test')
    
print(f'train:{X.shape}, val:{Xv.shape}, test:{Xt.shape}')
num_points = X.shape[1]

Shuffle the order of the points within each cloud in advance, so that the results show the model does not rely on point order.

In [None]:
np.random.seed(0)
for items in [X, Xv, Xt]:
    for values in tqdm(items):
        np.random.shuffle(values)

In [None]:
cat_label = [cat_list[X_label[i]]
             for i in range(X_label.shape[0])]
cat_label_v = [cat_list[Xv_label[i]]
               for i in range(Xv_label.shape[0])]
cat_label_t = [cat_list[Xt_label[i]]
               for i in range(Xt_label.shape[0])]

In [None]:
def PCPlot3d(ax, pts):

    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_zlabel("Z")

    X,Y,Z = pts[:,0], -pts[:,2], pts[:,1]
    ax.set_xlim(-1., 1.)
    ax.set_ylim(-1., 1.)
    ax.set_zlim(-1., 1.)

    ax.plot(X,Y,Z, marker="o", markersize=1, linestyle='None')

%matplotlib inline
fig = plt.figure(figsize=(5,5))
fig.tight_layout()
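# add_subplot registers the 3D axes with the figure; recent matplotlib
# versions no longer do this automatically for Axes3D(fig).
ax = fig.add_subplot(111, projection='3d')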
PCPlot3d(ax, X[333])

# Encoder Construction
I tried to follow the architecture of [1] closely, but I avoided BatchNorm layers because they degraded performance.  
Please note that the encoder's structure is not completely consistent with the original Transformer.

Since point cloud data carries no information in the order of its points,
* It is suitable to apply an attention layer, which treats its input as an unordered set (reordering the points only reorders the outputs).
* The positional embedding used in the original Transformer can be discarded.
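
The order-independence mentioned above can be checked numerically. The sketch below uses a generic dot-product self-attention in NumPy with a standard softmax over the keys; it is an assumption-laden illustration, not the exact layer of this notebook (which uses a different normalization), and all weights are random.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Plain dot-product self-attention over a set of points, x: (N, C)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T, axis=-1)   # (N, N)
    return attn @ v                    # (N, C)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))

perm = rng.permutation(16)
out = self_attention(x, w_q, w_k, w_v)
out_perm = self_attention(x[perm], w_q, w_k, w_v)

# Per-point outputs are reordered together with the input ...
print(np.allclose(out[perm], out_perm))
# ... and a symmetric pooling such as max removes the order entirely.
print(np.allclose(out.max(axis=0), out_perm.max(axis=0)))
```

Because no positional embedding is added, nothing in the computation refers to a point's index, and the max-pooled feature is the same for any ordering.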

In [None]:
def pairwise_distance(xyz1, xyz2):
    n = xyz1.shape[1]
    c = xyz1.shape[2]
    m = xyz2.shape[1]
    xyz1 = tf.tile(tf.reshape(xyz1, (-1,1,n,c)), [1,m,1,1])
    xyz2 = tf.tile(tf.reshape(xyz2, (-1,m,1,c)), [1,1,n,1])
    dist = tf.reduce_sum((xyz1-xyz2)**2, -1)
    return dist

def knn_point(k, xyz1, xyz2):
    dist = -pairwise_distance(xyz1, xyz2)
    val, idx = tf.math.top_k(dist, k)
    return -val, idx

In [None]:
def LayerLinBnRelu(tensor, C, seq_name,
                   use_bias=True, activation=None, LeakyAlpha=0.0):
    x_in = Input(shape=tensor.shape[1:], name=seq_name+'_input')
    x = L.Dense(C, use_bias=use_bias, activation=activation,
                name=seq_name+'_lin')(x_in)
#     x = L.BatchNormalization(name=seq_name+'_bn')(x)
    if LeakyAlpha==0.0:
        x_out = L.ReLU(name=seq_name+'_ReLU')(x)
    else:  
        x_out = L.LeakyReLU(alpha=LeakyAlpha,
                            name=seq_name+'_ReLU')(x)
    model = M.Model(inputs=x_in, outputs=x_out, name=seq_name)
    return model(tensor)

In [None]:
def sample_and_group(args, nsample):
    xyz, pts, fps_idx = args

    new_xyz = tf.gather_nd(xyz, tf.expand_dims(fps_idx,-1), batch_dims=1)
    new_pts = tf.gather_nd(pts, tf.expand_dims(fps_idx,-1), batch_dims=1)
    _, idx = knn_point(nsample, xyz, new_xyz)

#     grouped_xyz = tf.gather_nd(xyz, tf.expand_dims(idx,-1), batch_dims=1)
#     grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2),
#                            (1,1,nsample,1))
    grouped_pts = tf.gather_nd(pts, tf.expand_dims(idx,-1), batch_dims=1)
    grouped_pts -= tf.tile(tf.expand_dims(new_pts, 2),
                           (1,1,nsample,1))

    new_pts = tf.concat([grouped_pts,
                         tf.tile(tf.expand_dims(new_pts, 2),
                                 (1,1,nsample,1))],
                        axis=-1)
    
    return new_xyz, new_pts

In [None]:
def LayerSelfAttention(tensor, seq_name):
    x_in = Input(shape=tensor.shape[1:], name=seq_name+'_input')

    C = x_in.shape[2]
    W_q = L.Dense(C//4, use_bias=False, activation=None,
                  name=seq_name+'_Q')
    W_k = L.Dense(C//4, use_bias=False, activation=None,
                  name=seq_name+'_K')
    # W_v has bias in the original code,
    # but set no bias here as well as W_q and W_k.
    W_v = L.Dense(C, use_bias=False, activation=None,
                  name=seq_name+'_V')

    x_q = W_q(x_in)
    x_k = W_k(x_in)
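    # Copy W_q's initial values into W_k (W_k is built by the call above).
    # This only matches the starting point: the two kernels remain
    # separate variables and are updated independently during training.
    W_k.set_weights(W_q.get_weights())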
    x_k = L.Lambda(lambda t: tf.transpose(t, perm=(0,2,1)),
                   name=seq_name+'_KT')(x_k)
    x_v = W_v(x_in)

    energy = L.Lambda(lambda ts: tf.matmul(ts[0],ts[1]),
                      name=seq_name+'_matmul1')([x_q, x_k])
    attention = L.Softmax(axis=1, name=seq_name+'_softmax')(energy)
    attention = L.Lambda(lambda t:
                         t / (1e-9 + tf.reduce_sum(t, axis=2,
                                                   keepdims=True)),
                         name=seq_name+'_l1norm')(attention)

    x_r = L.Lambda(lambda ts: tf.matmul(ts[0],ts[1]),
                   name=seq_name+'_matmul2')([attention, x_v])
    x_r = L.Lambda(lambda ts: tf.subtract(ts[0],ts[1]),
                   name=seq_name+'_subtract')([x_in, x_r])
    x_r = LayerLinBnRelu(x_r, C, seq_name+'_LBR', use_bias=True)
    x_out = L.Lambda(lambda ts: tf.add(ts[0],ts[1]),
                     name=seq_name+'_add')([x_in, x_r])

    model = M.Model(inputs=x_in, outputs=x_out, name=seq_name)
    return model(tensor)

In [None]:
## ENCODER ##

xyz = Input(shape=(num_points, 3), name='input_points')
x = LayerLinBnRelu(xyz, 64, 'E-IN_LBR1', use_bias=False)
x = LayerLinBnRelu(x, 64, 'E-IN_LBR2', use_bias=False)

# sample and group(SG) Module #1
fps_idx = L.Lambda(sampling_module.farthest_point_sample,
                   arguments={'npoint':512},
                   name='E-FPS_IDX1')(xyz)
new_xyz, new_feature = L.Lambda(sample_and_group,
                                arguments={'nsample':32},
                                name='E-SG1')([xyz, x, fps_idx])
x = LayerLinBnRelu(new_feature, 128, 'E-SG1_LBR1', use_bias=False)
x = LayerLinBnRelu(x, 128, 'E-SG1_LBR2', use_bias=False)
x = L.Lambda(lambda t: tf.reduce_max(t, axis=2),
             name='E-SG1_MaxPool')(x)

# sample and group(SG) Module #2
fps_idx = L.Lambda(sampling_module.farthest_point_sample,
                   arguments={'npoint':256},
                   name='E-FPS_IDX2')(new_xyz)
new_xyz, new_feature = L.Lambda(sample_and_group,
                                arguments={'nsample':32},
                                name='E-SG2')([new_xyz, x, fps_idx])
x = LayerLinBnRelu(new_feature, 256, 'E-SG2_LBR1', use_bias=False)
x = LayerLinBnRelu(x, 256, 'E-SG2_LBR2', use_bias=False)
x = L.Lambda(lambda t: tf.reduce_max(t, axis=2),
             name='E-SG2_MaxPool')(x)

# Self Attention
x1 = LayerSelfAttention(x, 'E-SA1')
x2 = LayerSelfAttention(x1, 'E-SA2')
x3 = LayerSelfAttention(x2, 'E-SA3')
x4 = LayerSelfAttention(x3, 'E-SA4')
x0 = L.Lambda(lambda ts: tf.concat(ts, axis=2),
              name='E-SA_Concat')([x1,x2,x3,x4])

# In the original code, input embedding is also concatenated.
# (but this wasn't illustrated in Figure 2 in the paper.)
x = L.Lambda(lambda ts: tf.concat(ts, axis=2),
             name='E-OUT_Concat')([x0,x])

x = LayerLinBnRelu(x, feat_dims, 'E-OUT_LBR',
                   use_bias=False, LeakyAlpha=0.2)
output_feats = L.Lambda(lambda t: tf.reduce_max(t, axis=1, keepdims=True),
                        name='E-OUT_MaxPool')(x)

PCT_Encoder = M.Model(inputs=xyz, outputs=output_feats)
PCT_Enc_list = [layer.name for layer in PCT_Encoder.layers]

# Encoder Visualization
plot_model(1): Visualize the architecture of the whole encoder.

In [None]:
plot_model(PCT_Encoder, show_shapes=True, dpi=96,
           to_file='model_enc1_whole.png')

plot_model(2): Visualize the inside of the "LinBnRelu(LBR)" layer. (Strictly speaking it is "LinRelu", since BatchNorm is not used.)

In [None]:
# plotting inside of the layer: "LinBnRelu(LBR)" layer
i = PCT_Enc_list.index('E-IN_LBR1')
plot_model(PCT_Encoder.layers[i], show_shapes=True, dpi=96,
           to_file='model_enc2_LBR.png')

plot_model(3): Visualize the inside of the Self Attention layer.

In [None]:
# plotting inside of the layer: Self Attention layer
i = PCT_Enc_list.index('E-SA1')
plot_model(PCT_Encoder.layers[i], show_shapes=True, dpi=96,
           to_file='model_enc3_SelfAttention.png')

# Decoder Construction
The decoder of a conventional autoencoder is typically built from cascaded fully connected layers. Here I instead tried a source-target attention layer, following the original Transformer.  
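
As a rough picture of source-target attention: the queries come from the decoder side (target), while the keys and values come from the encoder side (source). The NumPy sketch below is a generic version with a standard softmax over the source axis; the function name, the small dimensions, and the random weights are assumptions for illustration, and the normalization differs from the layer defined in this notebook.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def source_target_attention(target, source, w_q, w_k, w_v):
    """target: (M, Ct) supplies queries; source: (N, Cs) supplies keys and values."""
    q = target @ w_q                   # (M, d)
    k = source @ w_k                   # (N, d)
    v = source @ w_v                   # (N, Ct)
    attn = softmax(q @ k.T, axis=-1)   # (M, N): each query weights the source rows
    return attn @ v, attn              # output has one row per query

rng = np.random.default_rng(0)
target = rng.normal(size=(6, 4))       # decoder-side tensor
source = rng.normal(size=(5, 8))       # encoder-side tensor
w_q = rng.normal(size=(4, 2))
w_k = rng.normal(size=(8, 2))
w_v = rng.normal(size=(8, 4))

out, attn = source_target_attention(target, source, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

The output keeps the target's length and channel size, so several such layers can be stacked on the decoder side while always reading from the same encoder features.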

In paper [3], fixed grid points are fed to the decoder, so I thought that feeding some fixed values to the decoder could also work for a Transformer autoencoder. I chose a stacked identity matrix (tf.eye), more or less by intuition.
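
One way to read this choice, sketched in NumPy under assumptions (the batch size, the small width of 64, and the random kernel `w` are made up for illustration): adding an identity matrix to a `(batch, 1, 1)` tensor of zeros broadcasts it to one identity matrix per sample, and a bias-free dense layer applied to one-hot rows simply returns the rows of its kernel.

```python
import numpy as np

batch = 4
n_queries = 256

# A (batch, 1, 1) tensor of zeros plays the role of the eye seed input.
seed = np.zeros((batch, 1, 1), dtype=np.float32)
# Broadcasting adds the same identity matrix to every batch element.
eye_batch = seed + np.eye(n_queries, dtype=np.float32)
print(eye_batch.shape)

# A bias-free linear map applied to one-hot rows returns the kernel itself,
# so each of the 256 positions gets its own learned vector.
rng = np.random.default_rng(0)
w = rng.normal(size=(n_queries, 64)).astype(np.float32)  # hypothetical kernel
embedded = eye_batch @ w
print(np.allclose(embedded[0], w))
```

Seen this way, the fixed input acts like a table of 256 learned query vectors that is shared by all samples.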

In [None]:
def LayerSrcTrgtAttention(args, seq_name):
    E_tensor, D_tensor = args
    xE_in = Input(shape=E_tensor.shape[1:], name=seq_name+'_input-E')
    C = xE_in.shape[2]
    
    xD_in = Input(shape=D_tensor.shape[1:], name=seq_name+'_input-D')
    out_dim = xD_in.shape[2]

    W_q = L.Dense(C//4, use_bias=False, activation=None,
                  name=seq_name+'_Q')
    W_k = L.Dense(C//4, use_bias=False, activation=None,
                  name=seq_name+'_K')
    # W_v has bias in the original code,
    # but set no bias here as well as W_q and W_k.
    W_v = L.Dense(out_dim, use_bias=False, activation=None,
                  name=seq_name+'_V')

    x_q = W_q(xD_in)
    x_k = W_k(xE_in)
#     W_k.set_weights(W_q.get_weights())
    x_k = L.Lambda(lambda t: tf.transpose(t, perm=(0,2,1)),
                   name=seq_name+'_KT')(x_k)
    x_v = W_v(xE_in)

    energy = L.Lambda(lambda ts: tf.matmul(ts[0],ts[1]),
                      name=seq_name+'_matmul1')([x_q, x_k])
    attention = L.Softmax(axis=1, name=seq_name+'_softmax')(energy)
    attention = L.Lambda(lambda t:
                         t / (1e-9 + tf.reduce_sum(t, axis=2,
                                                   keepdims=True)),
                         name=seq_name+'_l1norm')(attention)

    x_r = L.Lambda(lambda ts: tf.matmul(ts[0],ts[1]),
                   name=seq_name+'_matmul2')([attention, x_v])
    x_r = L.Lambda(lambda ts: tf.subtract(ts[0],ts[1]),
                   name=seq_name+'_subtract')([xD_in, x_r])
    x_r = LayerLinBnRelu(x_r, out_dim, seq_name+'_LBR',
                         use_bias=True)
    x_out = L.Lambda(lambda ts: tf.add(ts[0],ts[1]),
                     name=seq_name+'_add')([xD_in, x_r])

    model = M.Model(inputs=[xE_in,xD_in],
                    outputs=x_out, name=seq_name)
    return model([E_tensor,D_tensor])

In [None]:
def copy_and_mapping(tensor, nmul, seq_name):
    x_in = Input(shape=tensor.shape[1:], name=seq_name+'_input')
    x = L.Lambda(lambda t: tf.expand_dims(t, 2),
                 name=seq_name+'_expand')(x_in)
    C = x.shape[-1]//nmul
    x1 = L.Conv2DTranspose(C,(1,nmul),(1,nmul),
                           use_bias=True, activation=None,
                           name=seq_name+'_convT')(x)
    x2 = L.Dense(C, use_bias=True, activation=None,
                 name=seq_name+'_lin')(x)
    x2 = L.Lambda(lambda t: tf.tile(t, [1,1,nmul,1]),
                  name=seq_name+'_tile')(x2)
    x = L.Lambda(lambda ts: tf.add(ts[0],ts[1]),
                 name=seq_name+'_add')([x1, x2])
    npoint = x.shape[1]*x.shape[2]
    x_out = L.Lambda(lambda t: tf.reshape(t, [-1,npoint,t.shape[3]]),
                     name=seq_name+'_reshape')(x)
    model = M.Model(inputs=x_in, outputs=x_out, name=seq_name)
    return model(tensor)

In [None]:
### DECODER ###
input_feats = Input(shape=(1, feat_dims), name='input_feats')
m_feats = L.Lambda(lambda x: tf.tile(x, [1,256,1]),
                   name = 'D-IN_replicate')(input_feats)

input_eye_seed = Input(shape=(1,1), name='input_eye_seed')
# Make batch_size*eye tensor using broadcast
input_eye = input_eye_seed + tf.eye(256,256)
x = L.Dense(feat_dims//4, use_bias=False, activation=None,
            name='D-IN')(input_eye)

# Source Target Attention
x1 = LayerSrcTrgtAttention([m_feats,x], 'D-STA1')
x2 = LayerSrcTrgtAttention([m_feats,x1], 'D-STA2')
x3 = LayerSrcTrgtAttention([m_feats,x2], 'D-STA3')
x4 = LayerSrcTrgtAttention([m_feats,x3], 'D-STA4')

x0 = L.Lambda(lambda ts: tf.concat(ts, axis=2),
              name='D-STA_Concat')([x1,x2,x3,x4])
x = L.Lambda(lambda ts: tf.concat(ts, axis=2),
             name='D-OUT_Concat')([x0,x])

x = copy_and_mapping(x, 8, 'D-OUT_CopyAndMapping')
x = LayerLinBnRelu(x, 64, 'D-OUT_LBR1', use_bias=False)
x = LayerLinBnRelu(x, 64, 'D-OUT_LBR2', use_bias=False,
                   LeakyAlpha=0.2)

output_points = L.Dense(3, activation=None, name='D-OUT_lin')(x)

PCT_Decoder = M.Model(inputs=[input_feats,input_eye_seed],
                      outputs=output_points)
PCT_Dec_list = [layer.name for layer in PCT_Decoder.layers]

# Decoder Visualization
plot_model(4): Visualize the architecture of the whole decoder.

In [None]:
plot_model(PCT_Decoder, show_shapes=True, dpi=96,
           to_file='model_dec1_whole.png')

plot_model(5): Visualize the inside of the Source-Target Attention layer.

In [None]:
# plotting inside of the layer: Source Target Attention layer
i = PCT_Dec_list.index('D-STA1')
plot_model(PCT_Decoder.layers[i], show_shapes=True, dpi=96,
           to_file='model_dec2_SourceTargetAttention.png')

## Model summary

In [None]:
PCT_Encoder.summary()

In [None]:
PCT_Decoder.summary()

# Training and Prediction

In [None]:
AE = M.Model(inputs=[PCT_Encoder.input, input_eye_seed],
             outputs=PCT_Decoder([PCT_Encoder.output, input_eye_seed]))

When evaluating the similarity between the original and the reconstructed point cloud, the chamfer distance is often adopted as the loss function, because it is not affected by point order.
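
For a single pair of clouds, the quantity can be sketched in a few lines of NumPy. This version uses (non-squared) Euclidean distances and sums the two directional means; the function name and the random test clouds are assumptions for illustration.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric chamfer distance between point sets a: (N, 3) and b: (M, 3)."""
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean nearest-neighbour distance in both directions.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

rng = np.random.default_rng(0)
cloud_a = rng.uniform(-1.0, 1.0, size=(128, 3))
cloud_b = rng.uniform(-1.0, 1.0, size=(96, 3))
perm = rng.permutation(128)

print(chamfer_distance(cloud_a, cloud_b))
# Reordering the points of one cloud does not change the distance.
print(chamfer_distance(cloud_a, cloud_a[perm]))
```

Because each point is matched to its nearest neighbour in the other set, the two clouds may also have different numbers of points.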

In [None]:
# https://stackoverflow.com/questions/47060685/chamfer-distance-between-two-point-clouds-in-tensorflow/54767428
# modified as follows:
# 1) handle the case where array1 and array2 have different sizes
# 2) dtype=tf.float64 --> dtype=tf.float32
def distance_matrix(array1, array2):
    _, num_features = array1.shape
    expanded_array1 = tf.tile(array1, (array2.shape[0], 1))
    expanded_array2 = tf.reshape(
            tf.tile(tf.expand_dims(array2, 1), 
                    (1, array1.shape[0], 1)),
            (-1, num_features))
    distances = tf.norm(expanded_array1-expanded_array2, axis=1)
    distances = tf.reshape(distances, (array2.shape[0], array1.shape[0]))
    return distances

def av_dist(array1, array2):
    distances = distance_matrix(array1, array2)
    distances1 = tf.reduce_min(distances, axis=0)
    distances1 = tf.reduce_mean(distances1)
    distances2 = tf.reduce_min(distances, axis=1)
    distances2 = tf.reduce_mean(distances2)
    return distances1, distances2

def av_dist_sum(arrays):
    array1, array2 = arrays
    av_dist1, av_dist2 = av_dist(array1, array2)
    return av_dist1+av_dist2

def chamfer_distance_tf(array1, array2):
    # batch_size, num_point, num_features = array1.shape

    dist = tf.reduce_mean(
               tf.map_fn(av_dist_sum, elems=(array1, array2), dtype=tf.float32)
           )
    return dist

In [None]:
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=2000000//batch_size,
    decay_rate=0.1)
opt = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
AE.compile(optimizer=opt,
           loss=chamfer_distance_tf)

In [None]:
if load_weight:
    load_dir = nbdatapath+'/pretrained'
    latest = tf.train.latest_checkpoint(load_dir)
    AE.load_weights(latest)
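    # Extract the epoch number from a name like 'cp-0200.ckpt'.
    # (str.lstrip/rstrip strip character sets, not prefixes/suffixes.)
    ckpt_name = os.path.basename(latest)
    initial_epoch = int(ckpt_name[len('cp-'):-len('.ckpt')])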
else:
    initial_epoch = 0

In [None]:
checkpoint_path = checkpoint_dir + '/cp-{epoch:04d}.ckpt'
def eye_seed(X):
    return tf.zeros([X.shape[0],1,1])

if train_epochs>0:
    cp = tf.keras.callbacks.ModelCheckpoint
    cp_callback = cp(checkpoint_path, save_weights_only=True,
                     verbose=1, period=10)
    AE.fit([X, eye_seed(X)], X, 
           validation_data = ([Xv, eye_seed(Xv)], Xv),
           batch_size=batch_size,
           initial_epoch=initial_epoch,
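           # Keras treats `epochs` as the index of the last epoch, so it
           # must be larger than initial_epoch for training to run.
           epochs=train_epochs,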
           verbose=1, callbacks=[cp_callback])
    latest = tf.train.latest_checkpoint(checkpoint_dir)

print('evaluation <train data>:')
AE.evaluate([X, eye_seed(X)], X,
            batch_size=batch_size, verbose=1)
print('evaluation <val data>:')
AE.evaluate([Xv, eye_seed(Xv)], Xv,
            batch_size=batch_size, verbose=1)

In [None]:
def AE_predict(X):
    X_feat = PCT_Encoder.predict(X, batch_size=batch_size,
                                 verbose=1)
    X_pred = PCT_Decoder.predict([X_feat, eye_seed(X)],
                                 batch_size=batch_size,
                                 verbose=1)
    X_feat = X_feat.squeeze()
    return X_feat, X_pred

print('prediction <train data>:')
X_feat, X_pred = AE_predict(X)
print('prediction <val data>:')
Xv_feat, Xv_pred = AE_predict(Xv)
print('prediction <test data>:')
Xt_feat, Xt_pred = AE_predict(Xt)

## Postprocess

In [None]:
def compare_plot(Xi, Xi_pred):
    fig = plt.figure(figsize=(12,5))
    fig.tight_layout()
    ax = fig.add_subplot(121, projection='3d')
    PCPlot3d(ax, Xi)
    ax = fig.add_subplot(122, projection='3d')
    PCPlot3d(ax, Xi_pred)
    plt.subplots_adjust(left=0.05, right=0.95, bottom=0, top=1)
    plt.show()

%matplotlib inline
i=15
compare_plot(X[i], X_pred[i])
plt.show()

# Feature Visualization with t-SNE
Because of the memory limit in the Kaggle environment, t-SNE is run on a sample of the data.
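
The cell below simply takes the first rows of each split. If the stored order should not be relied on, a random subsample is an alternative; the helper `random_subsample` and the random arrays here are hypothetical and only sketch the idea.

```python
import numpy as np

def random_subsample(features, labels, n, seed=0):
    """Pick n rows at random (without replacement), keeping labels aligned."""
    rng = np.random.default_rng(seed)
    n = min(n, features.shape[0])
    idx = rng.choice(features.shape[0], size=n, replace=False)
    return features[idx], labels[idx]

# Hypothetical stand-ins for encoded features and integer category labels.
feats = np.random.default_rng(1).normal(size=(2000, 64)).astype(np.float32)
labels = np.arange(2000) % 16
sub_feats, sub_labels = random_subsample(feats, labels, n=600)
print(sub_feats.shape, sub_labels.shape)
```

Fixing the seed keeps the selection reproducible between runs.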

In [None]:
%%time
Xall_feat = np.concatenate([X_feat[:4000],
                            Xv_feat[:600],
                            Xt_feat[:900]])
Xall_embd = TSNE(n_components=2, random_state=0,
                 verbose=1).fit_transform(Xall_feat)
X_embd, Xv_embd, Xt_embd = np.split(Xall_embd,
                                    [4000,4000+600])
del Xall_feat, Xall_embd; gc.collect()

In [None]:
%matplotlib inline

fig = plt.figure(figsize=(6,18))
fig.tight_layout()

axs = [fig.add_subplot(311+i) for i in range(3)]

for ax, title, item1, item2, marker in zip(
    axs,
    ['train', 'val', 'test'],
    [X_embd, Xv_embd, Xt_embd],
    [cat_label[:4000], cat_label_v[:600],
     cat_label_t[:900]],
    ['.', 'x', 'x']):
    sns.scatterplot(x=item1[:,0], y=item1[:,1], hue=item2,
                    hue_order = cat_list,
                    marker=marker, palette='tab20_r', ax=ax)
    ax.set_title(title,fontsize=20)
    ax.legend(bbox_to_anchor=(1, 1), loc='upper left')
plt.savefig(ver_label+'_tsne.png', bbox_inches='tight')

This scatterplot looks like a Dragon Quest world map, and I like it.  
An autoencoder is trained without supervision and the category labels were not used for training, yet the arrangement of the categories agrees well across the train, val, and test maps.

## Animation
Finally, here is the code to create the animation at the beginning of this notebook.

In [None]:
id_list = [3411, 1929, 8910, 8727, 3556,
           7956, 3793, 11334, 3411]
X_feat_inter = X_feat[id_list[0]]
X_feat_inter = np.expand_dims(X_feat_inter, 0)

inter_num=40
for i in range(1,len(id_list)):
    list_temp=[]
    for j in range(inter_num):
        a = (j+1)/float(inter_num)
        list_temp.append(X_feat[id_list[i-1]]*(1-a)+
                         X_feat[id_list[i]]*a)
    list_temp += [X_feat[id_list[i]]]*20
    X_feat_inter = np.concatenate([X_feat_inter,
                                   np.stack(list_temp, axis=0)])
X_feat_inter = np.expand_dims(X_feat_inter, axis=1)    
print(X_feat_inter.shape)

In [None]:
frames = X_feat_inter.shape[0]
X_pred_inter = PCT_Decoder.predict([X_feat_inter,
                                    eye_seed(X_feat_inter)],
                                   batch_size=batch_size,
                                   verbose=1)
print(X_pred_inter.shape)

In [None]:
X_inter_direct = np.expand_dims(X[id_list[0]],0)

for i in range(1,len(id_list)):
    list_temp=[]
    for j in range(inter_num):
        a = (j+1)/float(inter_num)
        list_temp.append(X[id_list[i-1]]*(1-a)+
                         X[id_list[i]]*a)
    list_temp += [X[id_list[i]]]*20
    X_inter_direct = np.concatenate([X_inter_direct,
                                   np.stack(list_temp, axis=0)])
print(X_inter_direct.shape)

In [None]:
def make_gif(X_frames, gif_name):
    frames = X_frames[0].shape[0]
    assert frames <= 1000, f'frame_size={frames} is too large.'
    fig = plt.figure(figsize=(12,5))
    fig.tight_layout()
    axs = [fig.add_subplot(121+i, projection='3d')
           for i in range(2)]

    def update(f):
        for i, ax in enumerate(axs):
            ax.cla()
            ax.view_init(elev=20, azim=30+float(f)/frames*360)
            PCPlot3d(ax, X_frames[i][f])
            
    plt.subplots_adjust(left=0.05, right=0.95, bottom=0, top=1)            
    anim = FuncAnimation(fig, update, frames=frames,
                         interval=100)
    anim.save(gif_name, writer="pillow")

In [None]:
%%time
%matplotlib inline
make_gif([X_inter_direct, X_pred_inter], ver_label+".gif")