<div style="display: flex; background-color: #3F579F;">
    <h1 style="margin: auto; font-weight: bold; padding: 30px 30px 0px 30px; color:#fff;" align="center">Automatically classify consumer goods - P6</h1>
</div>
<div style="display: flex; background-color: #3F579F; margin: auto; padding: 5px 30px 0px 30px;" >
    <h3 style="width: 100%; text-align: center; float: left; font-size: 24px; color:#fff;" align="center">| Notebook - 3D visualization |</h3>
</div>
<div style="display: flex; background-color: #3F579F; margin: auto; padding: 10px 30px 30px 30px;">
    <h4 style="width: 100%; text-align: center; float: left; font-size: 24px; color:#fff;" align="center">Data Scientist course - OpenClassrooms</h4>
</div>

<div class="alert alert-block alert-info">
    <p>This notebook only plots, in 3D through TensorBoard, the t-SNE reductions to 3 components.</p>
    <p>We plot only the two datasets listed below:</p>
    <ul style="list-style-type: square;">
        <li><b>tfidf_lemma_price_pca_tsne_3c</b> (NO features from images) - text (lemmatization + TF-IDF) and price</li>
        <li><b>sift_price_bow_stemmed_pca_tsne_3c</b> (WITH features from images) - text (stemming + BoW), price and images</li>
    </ul> 
</div>

<div style="background-color: #506AB9;" >
    <h2 style="margin: auto; padding: 20px; color:#fff; ">1. Libraries and functions</h2>
</div>

<div style="background-color: #6D83C5;" >
    <h3 style="margin: auto; padding: 20px; color:#fff; ">1.1. Libraries and functions</h3>
</div>

In [1]:
## General
import os
import pandas as pd
import numpy as np

## TensorFlow
import tensorflow as tf
from tensorboard.plugins import projector

## Own specific functions 
from functions import *

%load_ext tensorboard

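# Path to save the embedding and checkpoints generated
LOG_DIR = "./logs/projections/"
# Create the folder up front so the to_csv() calls below do not fail
os.makedirs(LOG_DIR, exist_ok=True)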

<div style="background-color: #506AB9;" >
    <h2 style="margin: auto; padding: 20px; color:#fff; ">2. Importing files and Initial analysis</h2>
</div>

<div style="background-color: #6D83C5;" >
    <h3 style="margin: auto; padding: 20px; color:#fff; ">2.1. Importing and preparing files</h3>
</div>

<div class="alert alert-block alert-info">
    We load two datasets to plot them in 3D
</div>

In [2]:
# os.path.join keeps the paths portable across operating systems
df_text = pd.read_csv(os.path.join("datasets", "tfidf_lemma_price_pca_tsne_3c.csv"), index_col=[0])
df_sift = pd.read_csv(os.path.join("datasets", "sift_price_tfidf_stemmed_pca_tsne_3c.csv"), index_col=[0])

<div style="background-color: #506AB9;" >
    <h2 style="margin: auto; padding: 20px; color:#fff; ">3. Tensorboard projection</h2>
</div>

<div style="background-color: #6D83C5;" >
    <h3 style="margin: auto; padding: 20px; color:#fff; ">3.1. Features from text (Lemmatization + TF-IDF) and price</h3>
</div>

<div class="alert alert-block alert-info">
    <p>In this case, we plot the features from text and price only; that is, we do not use the descriptors and keypoints from the images.</p>
</div>

In [3]:
df_text.head()

Unnamed: 0,tsne1,tsne2,tsne3,class_encode,class,cluster
0,0.667675,-2.220376,-18.22473,4,Home Furnishing,5
1,6.882042,2.125256,-10.104879,0,Baby Care,1
2,7.703305,1.982806,-10.603261,0,Baby Care,1
3,5.802301,-3.787119,-10.801121,4,Home Furnishing,5
4,6.965024,-4.912904,-10.644814,4,Home Furnishing,5


<div class="alert alert-block alert-info">
    <p> Creating a file with only the features</p>
</div>

In [4]:
features = df_text[["tsne1", "tsne2", "tsne3"]].copy()
features.to_csv(LOG_DIR + "features.txt", sep='\t', index=False, header=False)

<div class="alert alert-block alert-info">
    <p>Creating a file with only the clusters (labels) as metadata</p>
</div>

In [5]:
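# One cluster label per row, in the same order as the feature rows
metadata = df_text[["cluster"]].copy()
metadata.to_csv(LOG_DIR + "metadata.tsv", sep='\t', index=False, header=False)
# Keep the file path under its own name instead of overwriting the DataFrame
metadata_path = os.path.join(LOG_DIR, 'metadata.tsv')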
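<div class="alert alert-block alert-info">
    <p>The metadata file above has a single column, which is why it is written without a header. The TensorBoard projector expects a header row as soon as the file has two or more columns. The sketch below shows that rule with the standard library only; <code>write_metadata</code>, <code>sample_rows</code> and the temporary folder are illustrative names, not part of this notebook, and the sample rows are copied from the <code>df_text.head()</code> output.</p>
</div>

```python
import csv
import os
import tempfile


def write_metadata(path, rows, columns):
    """Write projector metadata as TSV.

    A header row is written only when there are two or more columns,
    which is the layout the projector expects.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        if len(columns) > 1:
            writer.writerow(columns)
        writer.writerows(rows)


# Illustrative rows (cluster, class) taken from the head() output above
sample_rows = [(5, "Home Furnishing"), (1, "Baby Care"), (1, "Baby Care")]

# Write to a temporary folder so the real LOG_DIR is left untouched
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "metadata.tsv")
write_metadata(demo_path, sample_rows, ["cluster", "class"])
```

<div class="alert alert-block alert-info">
    <p>With pandas, the equivalent would presumably be to select both columns and call <code>to_csv</code> with <code>header=True</code>. Having both columns lets the projector color the points by cluster or by class.</p>
</div>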

<div class="alert alert-block alert-info">
    <p>Defining the vectors and weights</p>
</div>

In [6]:
features_vector = np.loadtxt(LOG_DIR + "features.txt")
features_vector

array([[  0.6676755 ,  -2.2203763 , -18.22473   ],
       [  6.8820424 ,   2.1252558 , -10.104879  ],
       [  7.703305  ,   1.9828062 , -10.603261  ],
       ...,
       [  0.36503768,   5.1988535 ,   6.311583  ],
       [ -0.6093114 ,   7.3941503 ,   7.6580634 ],
       [  0.09924615,   5.6943855 ,   6.15933   ]])

In [7]:
weights = tf.Variable(features_vector)
weights

<tf.Variable 'Variable:0' shape=(1050, 3) dtype=float64, numpy=
array([[  0.6676755 ,  -2.2203763 , -18.22473   ],
       [  6.8820424 ,   2.1252558 , -10.104879  ],
       [  7.703305  ,   1.9828062 , -10.603261  ],
       ...,
       [  0.36503768,   5.1988535 ,   6.311583  ],
       [ -0.6093114 ,   7.3941503 ,   7.6580634 ],
       [  0.09924615,   5.6943855 ,   6.15933   ]])>
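<div class="alert alert-block alert-info">
    <p>The projector pairs each metadata line with the embedding row at the same position, so both files need the same number of rows. A minimal check, assuming the file layout used in this notebook, could look like the sketch below; <code>count_rows</code> and <code>check_alignment</code> are hypothetical helpers, and the demo files are written to a temporary folder.</p>
</div>

```python
import os
import tempfile


def count_rows(path):
    """Count the non-empty lines of a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def check_alignment(features_path, metadata_path, has_header=False):
    """Return the row count if both files match, raise ValueError otherwise."""
    n_features = count_rows(features_path)
    n_labels = count_rows(metadata_path) - (1 if has_header else 0)
    if n_features != n_labels:
        raise ValueError(
            f"{n_features} feature rows but {n_labels} metadata rows"
        )
    return n_features


# Small demo files standing in for features.txt and metadata.tsv
align_dir = tempfile.mkdtemp()
features_demo = os.path.join(align_dir, "features.txt")
metadata_demo = os.path.join(align_dir, "metadata.tsv")
with open(features_demo, "w", encoding="utf-8") as handle:
    handle.write("0.66\t-2.22\t-18.22\n6.88\t2.12\t-10.10\n7.70\t1.98\t-10.60\n")
with open(metadata_demo, "w", encoding="utf-8") as handle:
    handle.write("5\n1\n1\n")

n_rows = check_alignment(features_demo, metadata_demo)
```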

<div class="alert alert-block alert-info">
    <p>Setting up the checkpoints</p>
</div>

In [8]:
checkpoint = tf.train.Checkpoint(embedding=weights)
checkpoint.save(os.path.join(LOG_DIR, "embedding.ckpt"))

'./logs/projections/embedding.ckpt-1'

<div class="alert alert-block alert-info">
    <p>Setting up config</p>
</div>

In [9]:
# Set up config.
config = projector.ProjectorConfig()
embedding = config.embeddings.add()

<div class="alert alert-block alert-info">
    <p>Defining embeddings</p>
</div>

In [10]:
embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
embedding.metadata_path = "metadata.tsv"
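<div class="alert alert-block alert-info">
    <p>The tensor name is not arbitrary: its prefix matches the keyword used in <code>tf.train.Checkpoint(embedding=weights)</code>, and the checkpoint stores the variable under that name followed by <code>/.ATTRIBUTES/VARIABLE_VALUE</code>. The small helper below only illustrates this naming rule; <code>checkpoint_tensor_name</code> is a hypothetical function, not a TensorFlow API.</p>
</div>

```python
# Suffix under which an object-based checkpoint stores a variable's value
CHECKPOINT_SUFFIX = ".ATTRIBUTES/VARIABLE_VALUE"


def checkpoint_tensor_name(attribute_name):
    """Build the tensor name for a variable saved as Checkpoint(<attribute_name>=...)."""
    return f"{attribute_name}/{CHECKPOINT_SUFFIX}"


# The keyword "embedding" comes from tf.train.Checkpoint(embedding=weights)
tensor_name = checkpoint_tensor_name("embedding")
```

<div class="alert alert-block alert-info">
    <p>If the checkpoint keyword were renamed, the value assigned to <code>embedding.tensor_name</code> would need to change accordingly.</p>
</div>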

<div class="alert alert-block alert-info">
    <p>Initializing the projector based on the setup defined</p>
</div>

In [11]:
projector.visualize_embeddings(LOG_DIR, config)

<div class="alert alert-block alert-info">
    <p>Now run TensorBoard against the log data we just saved.</p>
</div>

In [15]:
%tensorboard --logdir {LOG_DIR}

Reusing TensorBoard on port 6006 (pid 21972), started 11:22:58 ago. (Use '!kill 21972' to kill it.)

<div class="alert alert-block alert-info">
    <p>Below, a GIF with the visualization result.</p>
</div>

![3D visualization](P6_08_images/tfidf_lemma_price.gif)

<div class="alert alert-block alert-success">
    <p><b>Observations / Conclusions</b></p>
    <p>The clusters are clearly visible in the plot. We can also see the inertia (spread) of each cluster.</p>
</div>

<div style="background-color: #6D83C5;" >
    <h3 style="margin: auto; padding: 20px; color:#fff; ">3.2. Features from images (SIFT), text (Stemming + BoW) and price</h3>
</div>

<div class="alert alert-block alert-info">
    <p>In this case, we plot the features from images, text and price; that is, we use the descriptors and keypoints from the images.</p>
</div>

In [16]:
df_sift.head()

Unnamed: 0,tsne1,tsne2,tsne3,class_encode,class,cluster
0,14.249583,-46.480534,4.669492,4,Home Furnishing,5
1,43.736507,-3.998729,-2.715959,0,Baby Care,4
2,38.040276,2.531593,-12.732166,0,Baby Care,4
3,21.627436,26.856392,55.65478,4,Home Furnishing,1
4,40.226837,-1.034995,30.369673,4,Home Furnishing,4


<div class="alert alert-block alert-info">
    <p> Creating a file with only the features</p>
</div>

In [17]:
features = df_sift[["tsne1", "tsne2", "tsne3"]].copy()
features.to_csv(LOG_DIR + "features.txt", sep='\t', index=False, header=False)

<div class="alert alert-block alert-info">
    <p>Creating a file with only the clusters (labels) as metadata</p>
</div>

In [18]:
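# One cluster label per row, in the same order as the feature rows
metadata = df_sift[["cluster"]].copy()
metadata.to_csv(LOG_DIR + "metadata.tsv", sep='\t', index=False, header=False)
# Keep the file path under its own name instead of overwriting the DataFrame
metadata_path = os.path.join(LOG_DIR, 'metadata.tsv')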
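<div class="alert alert-block alert-info">
    <p>The cells in this section write <code>features.txt</code> and <code>metadata.tsv</code> to the same paths as section 3.1, so the second dataset replaces the first on disk. One way to keep both, sketched below under that assumption, is to give each dataset its own subfolder. <code>dataset_log_dir</code> and the folder names are hypothetical (the names are borrowed from the GIF file names), and the demo uses a temporary base folder rather than the real <code>LOG_DIR</code>.</p>
</div>

```python
import os
import tempfile


def dataset_log_dir(base_dir, dataset_name):
    """Create (if needed) and return a log subfolder dedicated to one dataset."""
    path = os.path.join(base_dir, dataset_name)
    os.makedirs(path, exist_ok=True)
    return path


# Temporary base folder standing in for LOG_DIR
base_dir = tempfile.mkdtemp()
text_dir = dataset_log_dir(base_dir, "tfidf_lemma_price")
sift_dir = dataset_log_dir(base_dir, "sift_price_tfidf_stemmed")
```

<div class="alert alert-block alert-info">
    <p>The returned path would then be used wherever <code>LOG_DIR</code> appears in the cells of the corresponding section, and <code>--logdir</code> would point at the subfolder of interest.</p>
</div>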

<div class="alert alert-block alert-info">
    <p>Defining the vectors and weights</p>
</div>

In [19]:
features_vector = np.loadtxt(LOG_DIR + "features.txt")
features_vector

array([[ 14.249583 , -46.480534 ,   4.6694922],
       [ 43.736507 ,  -3.9987288,  -2.7159588],
       [ 38.040276 ,   2.5315928, -12.732166 ],
       ...,
       [  6.2428493, -27.447577 ,  20.61281  ],
       [  3.7224538,  34.849216 , -23.339249 ],
       [ -8.112251 , -24.873575 ,  37.54836  ]])

In [20]:
weights = tf.Variable(features_vector)
weights

<tf.Variable 'Variable:0' shape=(1050, 3) dtype=float64, numpy=
array([[ 14.249583 , -46.480534 ,   4.6694922],
       [ 43.736507 ,  -3.9987288,  -2.7159588],
       [ 38.040276 ,   2.5315928, -12.732166 ],
       ...,
       [  6.2428493, -27.447577 ,  20.61281  ],
       [  3.7224538,  34.849216 , -23.339249 ],
       [ -8.112251 , -24.873575 ,  37.54836  ]])>

<div class="alert alert-block alert-info">
    <p>Setting up the checkpoints</p>
</div>

In [21]:
checkpoint = tf.train.Checkpoint(embedding=weights)
checkpoint.save(os.path.join(LOG_DIR, "embedding.ckpt"))

'./logs/projections/embedding.ckpt-1'

<div class="alert alert-block alert-info">
    <p>Setting up config</p>
</div>

In [22]:
# Set up config.
config = projector.ProjectorConfig()
embedding = config.embeddings.add()

<div class="alert alert-block alert-info">
    <p>Defining embeddings</p>
</div>

In [23]:
embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
embedding.metadata_path = "metadata.tsv"

<div class="alert alert-block alert-info">
    <p>Initializing the projector based on the setup defined</p>
</div>

In [24]:
projector.visualize_embeddings(LOG_DIR, config)

<div class="alert alert-block alert-info">
    <p>Now run TensorBoard against the log data we just saved.</p>
</div>

In [25]:
%tensorboard --logdir {LOG_DIR}

Reusing TensorBoard on port 6006 (pid 21972), started 11:27:57 ago. (Use '!kill 21972' to kill it.)

<div class="alert alert-block alert-info">
    <p>Below, a GIF with the visualization result.</p>
</div>

![3D visualization](P6_08_images/sift_price_tfidf_stemmed.gif)

<div class="alert alert-block alert-success">
    <p><b>Observations / Conclusions</b></p>
    <p>The clusters are not clearly separated in the plot; the points are dispersed.</p>
</div>