In [None]:
# Author: Zhengxiang (Jack) Wang 
# Date: 2021-07-27
# GitHub: https://github.com/jaaack-wang 
# About: embedding visualization using tensorboard

# Overview

In a previous notebook, we learned how to [visualize embeddings using VisualDL](https://colab.research.google.com/drive/1B9pcYR9fVvmB1pPWiIqb0u_WmxlY--T8?usp=sharing), PaddlePaddle's deep learning visualization toolkit. However, [VisualDL](https://github.com/PaddlePaddle/VisualDL) turned out to be unreliable (in Canada at least), especially when visualizing a large embedding matrix (whose size is the number of embeddings times the embedding dimension). <br><br>

In this notebook, we will learn an alternative way to visualize embeddings using [tensorboard](https://www.tensorflow.org/tensorboard), TensorFlow's visualization toolkit. Luckily, this can be done entirely in a browser without installing tensorboard (although it comes pre-installed in Google Colab).


<br>


<table align="right">
  <td>
    <a target="_blank" href="https://colab.research.google.com/drive/1HZdDA_TzdJhGo_uIUSa84rP6yQjHvMT3?usp=sharing"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" /> Run in Google Colab </a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/jaaack-wang"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" /> Author's GitHub </a>
  </td>
  <td>
    <a href="https://docs.google.com/uc?export=download&id=1HZdDA_TzdJhGo_uIUSa84rP6yQjHvMT3"><img src="https://www.tensorflow.org/images/download_logo_32px.png" /> Download this notebook </a>
  </td>
</table> 


<br>


- [1. General use of `tensorboard` for embedding visualization](#1)
- [2. Another example: Visualizing word embeddings](#2)
- [3. References](#3)

<a name="1"></a>
# 1. General use of `tensorboard` for embedding visualization

The general use of `tensorboard` is covered in the [official documentation](https://www.tensorflow.org/tensorboard/get_started). Note that if you run the official Jupyter notebook guide, [Visualizing Data using the Embedding Projector in TensorBoard](https://www.tensorflow.org/tensorboard/tensorboard_projector_plugin), you may get a "No dashboards are active for the current data set" warning, as shown below:

<img src='https://drive.google.com/uc?export=view&id=1KDSDLO11K7GIWJVVb81O_Knvxt5Yl9a7' width="500" height="300">

<br>

To get around this, you can save the embeddings (the `weights` variable in that guide) by running the following code (remember that `metadata.tsv` is already saved):

```python
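import os

# `log_dir` and `weights` are defined in the official guide linked above
with open(os.path.join(log_dir, 'embeddings.tsv'), "w") as f:
  for weight in weights.numpy():
    # one embedding per line, values separated by tabs
    f.write('\t'.join([str(i) for i in weight.tolist()]))
    f.write('\n')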
```
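
The snippet above depends on the `log_dir` and `weights` variables from the official guide. As a minimal, self-contained sketch of the same idea, the code below writes both files from plain Python lists. The function name `save_projector_files` and the toy vectors and labels are assumptions made for illustration; they are not part of the official guide.

```python
import os
import tempfile


def save_projector_files(vectors, labels, out_dir):
    """Write embeddings.tsv and metadata.tsv (hypothetical helper)."""
    if len(vectors) != len(labels):
        raise ValueError("expected exactly one label per embedding")
    os.makedirs(out_dir, exist_ok=True)
    emb_path = os.path.join(out_dir, "embeddings.tsv")
    meta_path = os.path.join(out_dir, "metadata.tsv")
    # one embedding per line, values separated by tabs
    with open(emb_path, "w", encoding="utf-8") as f:
        for vec in vectors:
            f.write("\t".join(str(x) for x in vec) + "\n")
    # one label per line, in the same order as the embeddings
    with open(meta_path, "w", encoding="utf-8") as f:
        for label in labels:
            f.write(f"{label}\n")
    return emb_path, meta_path


# toy data standing in for trained weights and their vocabulary
demo_dir = tempfile.mkdtemp()
emb_path, meta_path = save_projector_files(
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], ["cat", "dog"], demo_dir
)
```

With real data, you would pass something like `weights.numpy().tolist()` and the matching vocabulary list instead of the toy values.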

<br>

Then, go to tensorboard's [embedding projector website](http://projector.tensorflow.org) and load the `embeddings.tsv` and `metadata.tsv` files by clicking "Load" and then "Choose File":


<img src='https://drive.google.com/uc?export=view&id=1fWEaDN_pdiDBYlnH7FRk_YJHwRUSZf2M' width="600" height="400">

<br>

Finally, you will see the following visualized embeddings:

<img src='https://drive.google.com/uc?export=view&id=1nbrJ7XsvUl7tl5bgivwOzCKSg4_GyEq1' width="800" height="600">

<br>

**You can also download this [tensorboard_visualization_example_files.zip](https://drive.google.com/file/d/1XCwW_lH0ncwOVDrSVorl_75tLFYv4lvR/view?usp=sharing) file and upload the two files inside to tensorboard's [embedding projector website](http://projector.tensorflow.org) to replicate the visualization above.**

<a name="2"></a>
# 2. Another example: Visualizing word embeddings

As in [embedding visualization using paddlepaddle tool.ipynb](https://colab.research.google.com/drive/1B9pcYR9fVvmB1pPWiIqb0u_WmxlY--T8?usp=sharing), we will visualize the first 10,000 words of `glove.6B.50d.txt` from the [GloVe website](https://nlp.stanford.edu/projects/glove/) for illustration. For your convenience, you can instead download [glove.6B.50d_reduced.txt](https://drive.google.com/file/d/1wU0LLC3KcZleSsT-eRrYQ0puC1_ykXjJ/view?usp=sharing) to replicate this notebook. If you only want `glove.6B.50d.txt`, you can get it [here](https://drive.google.com/file/d/1o1fUeoAt260P90FeP_L5eICiQowHIcvY/view?usp=sharing) without downloading the large zip file from the GloVe website, which contains other files as well. 
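
If you start from the full `glove.6B.50d.txt`, a reduced file can be produced by keeping only its first lines. The sketch below is one possible way to do that, not necessarily how the reduced file linked above was made; the function name `keep_first_lines` and the toy file are assumptions used for illustration.

```python
import itertools
import os
import tempfile


def keep_first_lines(src_path, dst_path, n_lines=10_000):
    """Copy the first n_lines of src_path to dst_path; return lines written."""
    written = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after n_lines without reading the rest of the file
        for line in itertools.islice(src, n_lines):
            dst.write(line)
            written += 1
    return written


# tiny stand-in for the real GloVe file (word followed by its values)
demo_dir = tempfile.mkdtemp()
src = os.path.join(demo_dir, "toy_glove.txt")
with open(src, "w", encoding="utf-8") as f:
    f.write("the 0.1 0.2\nof 0.3 0.4\nand 0.5 0.6\n")
dst = os.path.join(demo_dir, "toy_glove_reduced.txt")
n_written = keep_first_lines(src, dst, n_lines=2)
```

For the real file, you would call `keep_first_lines('glove.6B.50d.txt', 'glove.6B.50d_reduced.txt')` with the default `n_lines`.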

<br>

If you run the notebook in Google Colab, you can just click `"Add shortcut to Drive"` and run the following code without needing to download the [glove.6B.50d_reduced.txt](https://drive.google.com/file/d/1wU0LLC3KcZleSsT-eRrYQ0puC1_ykXjJ/view?usp=sharing) file. 

<br>

**More on how to handle external files in Colab can be found [here](https://colab.research.google.com/notebooks/io.ipynb).**


In [None]:
# authorizing the access to files on your Drive
from google.colab import drive
drive.mount('/drive')
file_path = '/drive/My Drive/glove.6B.50d_reduced.txt'

Mounted at /drive


In [None]:
# convert the file content into a metadata.tsv and an embeddings.tsv
# suitable for visualization in tensorboard projector

# open the glove file and the two .tsv files; the `with` block closes
# (and flushes) all three files once the loop is done
with open(file_path, 'r') as glove, \
     open('glove_words.tsv', 'w') as writer1, \
     open('glove_embds.tsv', 'w') as writer2:
  for line in glove:
    # the first token is the word; the rest are the embedding values
    word, embds = line.split(maxsplit=1)
    writer1.write(word + '\n')
    writer2.write('\t'.join(embds.split()) + '\n')

**Notes:**
- Please be careful with how your metadata and embeddings are saved. 
- In the metadata file, make sure that every word/label is on its own line. 
- In the embeddings file, make sure that every embedding is on its own line and that its floating-point numbers are separated by tabs (with no tab before the first number). 
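
The notes above can be turned into a quick sanity check before uploading. The following is a minimal sketch under the assumption that both files follow the layout described in the notes; the function name `check_projector_files` and the toy files are hypothetical.

```python
import os
import tempfile


def check_projector_files(words_path, embds_path):
    """Return (number of embeddings, dimension) or raise ValueError."""
    with open(words_path, encoding="utf-8") as f:
        words = [line.rstrip("\n") for line in f]
    with open(embds_path, encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("\t") for line in f]
    # every word/label needs exactly one embedding line
    if len(words) != len(rows):
        raise ValueError(f"{len(words)} labels but {len(rows)} embeddings")
    # every embedding must have the same number of values
    dims = {len(row) for row in rows}
    if len(dims) > 1:
        raise ValueError(f"inconsistent dimensions: {sorted(dims)}")
    # every value must parse as a number (float raises ValueError otherwise)
    for row in rows:
        for value in row:
            float(value)
    return len(rows), (dims.pop() if dims else 0)


# toy files standing in for glove_words.tsv and glove_embds.tsv
demo_dir = tempfile.mkdtemp()
words_file = os.path.join(demo_dir, "words.tsv")
embds_file = os.path.join(demo_dir, "embds.tsv")
with open(words_file, "w", encoding="utf-8") as f:
    f.write("the\nof\n")
with open(embds_file, "w", encoding="utf-8") as f:
    f.write("0.1\t0.2\t0.3\n0.4\t0.5\t0.6\n")
shape = check_projector_files(words_file, embds_file)
```

On the files created in this notebook, the call would be `check_projector_files('glove_words.tsv', 'glove_embds.tsv')`.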

In [None]:
# download the files to your local machine
# alternatively, you can click the file icon
# on the left side bar to download them 

# note: the following code sometimes fails; if so, use the file icon instead
from google.colab import files

files.download('glove_words.tsv')
files.download('glove_embds.tsv')


**Final Output:**

If you see the following, congratulations!

<img src='https://drive.google.com/uc?export=view&id=1mjl-0ACM2pd4z_L-IbCW846jfURV7l9F' width="800" height="600">

<br>

**You can also download this [glove_tensorboard_visualization.zip](https://drive.google.com/file/d/1BUBBQHpYkYOOdSE9n6ZjS2aWbsILsTuX/view?usp=sharing) file and upload the two files inside to tensorboard's [embedding projector website](http://projector.tensorflow.org) to replicate the visualization above.**

<a name='3'></a>
# 3. References
- [TensorBoard Guide](https://www.tensorflow.org/tensorboard/get_started)