<div style="display: flex; background-color: #3F579F;">
    <h1 style="margin: auto; font-weight: bold; padding: 30px 30px 0px 30px; color:#fff;" align="center">Automatically classify consumer goods - P6</h1>
</div>
<div style="display: flex; background-color: #3F579F; margin: auto; padding: 5px 30px 0px 30px;" >
    <h3 style="width: 100%; text-align: center; float: left; font-size: 24px; color:#fff;" align="center">| Notebook - 3D visualization |</h3>
</div>
<div style="display: flex; background-color: #3F579F; margin: auto; padding: 10px 30px 30px 30px;">
    <h4 style="width: 100%; text-align: center; float: left; font-size: 24px; color:#fff;" align="center">Data Scientist course - OpenClassrooms</h4>
</div>

<div style="background-color: #506AB9;" >
    <h2 style="margin: auto; padding: 20px; color:#fff; ">1. Libraries and functions</h2>
</div>

<div style="background-color: #6D83C5;" >
    <h3 style="margin: auto; padding: 20px; color:#fff; ">1.1. Libraries and functions</h3>
</div>

In [1]:
## General
import os
import pandas as pd
import numpy as np

## TensorFlow
import tensorflow as tf
from tensorboard.plugins import projector

## Own specific functions 
from functions import *

%load_ext tensorboard

# Path to save the embedding and checkpoints generated
LOG_DIR = "./logs/projections/"
os.makedirs(LOG_DIR, exist_ok=True)  # to_csv fails if the directory does not exist

<div style="background-color: #506AB9;" >
    <h2 style="margin: auto; padding: 20px; color:#fff; ">2. Importing files and Initial analysis</h2>
</div>

<div style="background-color: #6D83C5;" >
    <h3 style="margin: auto; padding: 20px; color:#fff; ">2.1. Importing and preparing files</h3>
</div>

<div class="alert alert-block alert-info">
    We are going to load two datasets and plot them in 3D.
</div>

In [2]:
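# os.path.join keeps the paths portable (backslashes only work on Windows)
df_text = pd.read_csv(os.path.join("datasets", "tfidf_lemma_price_pca_tsne_3c.csv"), index_col=[0])
df_sift = pd.read_csv(os.path.join("datasets", "sift_price_bow_stemmed_pca_tsne_3c.csv"), index_col=[0])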

<div style="background-color: #506AB9;" >
    <h2 style="margin: auto; padding: 20px; color:#fff; ">3. Tensorboard projection</h2>
</div>

<div style="background-color: #6D83C5;" >
    <h3 style="margin: auto; padding: 20px; color:#fff; ">3.1. Features from text (Lemmatization + TF-IDF) and price</h3>
</div>

<div class="alert alert-block alert-info">
    <p>In this case, we plot the features derived from the text and the price only; the descriptors and keypoints from the images are not used.</p>
</div>

In [24]:
df_text.head()

Unnamed: 0,tsne1,tsne2,tsne3,class_encode,class,cluster
0,0.598403,-5.262136,-18.087257,4,Home Furnishing,0
1,9.759707,-2.279524,-7.2742,0,Baby Care,0
2,10.362164,-3.132142,-7.469657,0,Baby Care,0
3,8.143699,-0.299828,-11.685761,4,Home Furnishing,0
4,9.323577,0.523113,-12.82989,4,Home Furnishing,0


<div class="alert alert-block alert-info">
    <p> Creating a file with only the features</p>
</div>

In [25]:
features = df_text[["tsne1", "tsne2", "tsne3"]].copy()
features.to_csv(LOG_DIR + "features.txt", sep='\t', index=False, header=False)

<div class="alert alert-block alert-info">
    <p>Creating a file with only the clusters (labels) as metadata</p>
</div>

In [26]:
metadata = df_text[["cluster"]].copy()
# Keep the path in its own variable instead of overwriting the DataFrame
metadata_path = os.path.join(LOG_DIR, "metadata.tsv")
metadata.to_csv(metadata_path, sep='\t', index=False, header=False)
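
<div class="alert alert-block alert-info">
    <p>The projector pairs each line of the metadata file with one row of the embedding, so both files should have the same number of rows. The check below is a minimal sketch of that idea; <code>count_lines</code> and <code>check_alignment</code> are hypothetical helpers, and the three points are illustrative values, not the full dataset.</p>
</div>

```python
import os
import tempfile

import numpy as np


def count_lines(path):
    """Return the number of non-empty lines in a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def check_alignment(features_path, metadata_path):
    """Return True when the metadata has exactly one label per embedding row."""
    n_points = np.loadtxt(features_path, ndmin=2).shape[0]
    return n_points == count_lines(metadata_path)


# Illustrative data only: three points and three cluster labels.
points = np.array([[0.6, -5.3, -18.1], [9.8, -2.3, -7.3], [10.4, -3.1, -7.5]])

with tempfile.TemporaryDirectory() as tmp_dir:
    features_path = os.path.join(tmp_dir, "features.txt")
    metadata_path = os.path.join(tmp_dir, "metadata.tsv")
    np.savetxt(features_path, points, delimiter="\t")
    with open(metadata_path, "w", encoding="utf-8") as handle:
        handle.write("0\n0\n0\n")
    aligned = check_alignment(features_path, metadata_path)

print(aligned)
```

In the notebook, the same check could be called on the two files written to <code>LOG_DIR</code> before the checkpoint is saved.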

<div class="alert alert-block alert-info">
    <p>Defining the vectors and weights</p>
</div>

In [27]:
features_vector = np.loadtxt(LOG_DIR + "features.txt")
features_vector

array([[  0.5984031,  -5.262136 , -18.087257 ],
       [  9.759707 ,  -2.2795243,  -7.2742004],
       [ 10.362164 ,  -3.132142 ,  -7.4696574],
       ...,
       [ -0.7556431,   2.2882307,   8.618675 ],
       [ -1.5104922,   4.792037 ,   9.938412 ],
       [ -1.1745269,   2.4400692,   9.097878 ]])

In [28]:
weights = tf.Variable(features_vector)
weights

<tf.Variable 'Variable:0' shape=(1050, 3) dtype=float64, numpy=
array([[  0.5984031,  -5.262136 , -18.087257 ],
       [  9.759707 ,  -2.2795243,  -7.2742004],
       [ 10.362164 ,  -3.132142 ,  -7.4696574],
       ...,
       [ -0.7556431,   2.2882307,   8.618675 ],
       [ -1.5104922,   4.792037 ,   9.938412 ],
       [ -1.1745269,   2.4400692,   9.097878 ]])>

<div class="alert alert-block alert-info">
    <p>Setting up the checkpoints</p>
</div>

In [29]:
checkpoint = tf.train.Checkpoint(embedding=weights)
checkpoint.save(os.path.join(LOG_DIR, "embedding.ckpt"))

'./logs/projections/embedding.ckpt-1'

<div class="alert alert-block alert-info">
    <p>Setting up config</p>
</div>

In [30]:
# Set up config.
config = projector.ProjectorConfig()
embedding = config.embeddings.add()

<div class="alert alert-block alert-info">
    <p>Defining embeddings</p>
</div>

In [31]:
embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
embedding.metadata_path = "metadata.tsv"
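
<div class="alert alert-block alert-info">
    <p>The metadata file used here has a single column, so it is written without a header. If the product category should also be available in the projector, the file needs several columns and then a header row is required. The sketch below shows that variant on three rows copied from the <code>df_text.head()</code> output; <code>df_example</code> is an illustrative frame, not the full dataset.</p>
</div>

```python
import io

import pandas as pd

# Illustrative rows copied from the df_text.head() output above.
df_example = pd.DataFrame(
    {
        "cluster": [0, 0, 0],
        "class": ["Home Furnishing", "Baby Care", "Baby Care"],
    }
)

# With more than one column, keep the header so each field has a name.
buffer = io.StringIO()
df_example[["cluster", "class"]].to_csv(buffer, sep="\t", index=False, header=True)
metadata_text = buffer.getvalue()

print(metadata_text)
```

Writing to a file instead of a buffer only requires passing a path such as <code>LOG_DIR + "metadata.tsv"</code> to <code>to_csv</code>.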

<div class="alert alert-block alert-info">
    <p>Initializing the projector based on the setup defined</p>
</div>

In [32]:
projector.visualize_embeddings(LOG_DIR, config)
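
<div class="alert alert-block alert-info">
    <p>After this call the log directory should hold the projector config next to the metadata and the checkpoint. The sketch below lists what is missing; <code>missing_projector_files</code> is a hypothetical helper, and the file names are assumptions based on the cells above and on TensorBoard's usual config name (<code>projector_config.pbtxt</code>).</p>
</div>

```python
import os
import tempfile

# File names assumed from the cells above plus TensorBoard's usual config name.
EXPECTED_FILES = (
    "projector_config.pbtxt",  # written by projector.visualize_embeddings
    "metadata.tsv",            # labels referenced by embedding.metadata_path
    "embedding.ckpt-1.index",  # index file of the checkpoint saved earlier
)


def missing_projector_files(log_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from log_dir."""
    present = set(os.listdir(log_dir)) if os.path.isdir(log_dir) else set()
    return [name for name in expected if name not in present]


# Demonstration on a throwaway directory that only contains the metadata file.
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, "metadata.tsv"), "w").close()
    missing = missing_projector_files(tmp_dir)

print(missing)
```

Calling <code>missing_projector_files(LOG_DIR)</code> in the notebook and getting an empty list would suggest TensorBoard has everything it needs to show the projector tab.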

<div class="alert alert-block alert-info">
    <p>Now run TensorBoard on the log data we just saved.</p>
</div>

In [33]:
%tensorboard --logdir {LOG_DIR}

Reusing TensorBoard on port 6006 (pid 10136), started 0:38:25 ago. (Use '!kill 10136' to kill it.)

<div class="alert alert-block alert-info">
    <p>Below, a GIF with the visualization result.</p>
</div>

![3D visualization](images/text_analysis/3D-text-and-price.gif)

<div class="alert alert-block alert-success">
    <p><b>Observations / Conclusions</b></p>
    <p>The clusters are clearly separated in the plot. We can also see how compact each cluster is (its inertia).</p>
</div>
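
<div class="alert alert-block alert-info">
    <p>The notebook does not say how the inertia was measured. One common definition is the sum of squared distances from each point to the centroid of its cluster; the sketch below assumes that definition. <code>cluster_inertia</code> is a hypothetical helper and the coordinates are toy values. Since t-SNE distorts distances, a value computed on these coordinates is only indicative.</p>
</div>

```python
import numpy as np


def cluster_inertia(points, labels):
    """Return {label: sum of squared distances to that cluster's centroid}."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    inertia = {}
    for label in np.unique(labels):
        members = points[labels == label]
        centroid = members.mean(axis=0)
        inertia[int(label)] = float(((members - centroid) ** 2).sum())
    return inertia


# Toy coordinates, not taken from the datasets.
toy_points = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
toy_labels = [0, 0, 1]
result = cluster_inertia(toy_points, toy_labels)

print(result)
```

On the notebook's data the call would be <code>cluster_inertia(df_text[["tsne1", "tsne2", "tsne3"]], df_text["cluster"])</code>; a lower value means a more compact cluster.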

<div style="background-color: #6D83C5;" >
    <h3 style="margin: auto; padding: 20px; color:#fff; ">3.2. Features from images (SIFT), text (Stemming + BoW) and price</h3>
</div>

<div class="alert alert-block alert-info">
    <p>In this case, we plot the features derived from the images, the text and the price, so the descriptors and keypoints from the images are used.</p>
</div>

In [13]:
df_sift.head()

Unnamed: 0,tsne1,tsne2,tsne3,class_encode,class,cluster
0,18.040327,-27.914232,21.225372,4,Home Furnishing,3
1,25.596672,-14.931697,-10.460277,0,Baby Care,1
2,25.068176,-9.576262,-17.14871,0,Baby Care,6
3,-1.152819,44.02373,9.688732,4,Home Furnishing,2
4,17.697859,26.595018,24.333076,4,Home Furnishing,2


<div class="alert alert-block alert-info">
    <p> Creating a file with only the features</p>
</div>

In [14]:
features = df_sift[["tsne1", "tsne2", "tsne3"]].copy()
features.to_csv(LOG_DIR + "features.txt", sep='\t', index=False, header=False)

<div class="alert alert-block alert-info">
    <p>Creating a file with only the clusters (labels) as metadata</p>
</div>

In [15]:
metadata = df_sift[["cluster"]].copy()
# Keep the path in its own variable instead of overwriting the DataFrame
metadata_path = os.path.join(LOG_DIR, "metadata.tsv")
metadata.to_csv(metadata_path, sep='\t', index=False, header=False)
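
<div class="alert alert-block alert-info">
    <p>This section writes <code>features.txt</code> and <code>metadata.tsv</code> to the same <code>LOG_DIR</code> as section 3.1, so the earlier files are replaced. One possible way to keep both projections is to give each dataset its own subdirectory. The sketch below assumes that layout; <code>write_projector_inputs</code> and the <code>"sift"</code> subdirectory are hypothetical names, and the rows are copied from the <code>df_sift.head()</code> output. The checkpoint and the projector config would then have to be written to the same subdirectory.</p>
</div>

```python
import os
import tempfile

import pandas as pd


def write_projector_inputs(df, log_dir,
                           feature_cols=("tsne1", "tsne2", "tsne3"),
                           label_col="cluster"):
    """Write features.txt and metadata.tsv into log_dir and return both paths."""
    os.makedirs(log_dir, exist_ok=True)
    features_path = os.path.join(log_dir, "features.txt")
    metadata_path = os.path.join(log_dir, "metadata.tsv")
    df[list(feature_cols)].to_csv(features_path, sep="\t", index=False, header=False)
    df[[label_col]].to_csv(metadata_path, sep="\t", index=False, header=False)
    return features_path, metadata_path


# Illustrative rows copied from the df_sift.head() output above.
df_example = pd.DataFrame(
    {
        "tsne1": [18.040327, 25.596672],
        "tsne2": [-27.914232, -14.931697],
        "tsne3": [21.225372, -10.460277],
        "cluster": [3, 1],
    }
)

with tempfile.TemporaryDirectory() as base_dir:
    # "sift" is a hypothetical subdirectory name for this dataset.
    features_path, metadata_path = write_projector_inputs(
        df_example, os.path.join(base_dir, "sift")
    )
    with open(metadata_path, encoding="utf-8") as handle:
        labels = handle.read().split()

print(labels)
```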

<div class="alert alert-block alert-info">
    <p>Defining the vectors and weights</p>
</div>

In [16]:
features_vector = np.loadtxt(LOG_DIR + "features.txt")
features_vector

array([[ 18.040327 , -27.914232 ,  21.225372 ],
       [ 25.596672 , -14.931697 , -10.460277 ],
       [ 25.068176 ,  -9.576262 , -17.14871  ],
       ...,
       [ 14.725035 ,   1.4500479,  20.14525  ],
       [-10.671568 , -25.556484 , -10.067908 ],
       [  4.5293903, -15.778702 ,  26.165033 ]])

In [17]:
weights = tf.Variable(features_vector)
weights

<tf.Variable 'Variable:0' shape=(1050, 3) dtype=float64, numpy=
array([[ 18.040327 , -27.914232 ,  21.225372 ],
       [ 25.596672 , -14.931697 , -10.460277 ],
       [ 25.068176 ,  -9.576262 , -17.14871  ],
       ...,
       [ 14.725035 ,   1.4500479,  20.14525  ],
       [-10.671568 , -25.556484 , -10.067908 ],
       [  4.5293903, -15.778702 ,  26.165033 ]])>

<div class="alert alert-block alert-info">
    <p>Setting up the checkpoints</p>
</div>

In [18]:
checkpoint = tf.train.Checkpoint(embedding=weights)
checkpoint.save(os.path.join(LOG_DIR, "embedding.ckpt"))

'./logs/projections/embedding.ckpt-1'

<div class="alert alert-block alert-info">
    <p>Setting up config</p>
</div>

In [19]:
# Set up config.
config = projector.ProjectorConfig()
embedding = config.embeddings.add()

<div class="alert alert-block alert-info">
    <p>Defining embeddings</p>
</div>

In [20]:
embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
embedding.metadata_path = "metadata.tsv"

<div class="alert alert-block alert-info">
    <p>Initializing the projector based on the setup defined</p>
</div>

In [21]:
projector.visualize_embeddings(LOG_DIR, config)

<div class="alert alert-block alert-info">
    <p>Now run TensorBoard on the log data we just saved.</p>
</div>

In [22]:
%tensorboard --logdir {LOG_DIR}

Reusing TensorBoard on port 6006 (pid 10136), started 0:21:39 ago. (Use '!kill 10136' to kill it.)

<div class="alert alert-block alert-info">
    <p>Below, a GIF with the visualization result.</p>
</div>

![3D visualization](images/text_analysis/3D-sift-text-price.gif)

<div class="alert alert-block alert-success">
    <p><b>Observations / Conclusions</b></p>
    <p>The clusters are not clearly separated in the plot; the points are dispersed.</p>
</div>

