# Facets Dive Visualization

This module defines a [`FacetsDive`](/facets.html#FacetsDive) object that is used to create a [Facets Dive](https://pair-code.github.io/facets/) [visualization](https://pair-code.github.io/facets/quickdraw.html) directly from an `ImageDataBunch`. 

**🚨Currently Facets works only in _Jupyter Notebook_. It does not support Jupyter Lab. See [Issue 113](https://github.com/PAIR-code/facets/issues/113) in the Facets repo.**

**In Google Colab, the generated images cannot be displayed in Dive. This is currently a limitation of Google Colab.**

In [None]:
# # Uncomment and run these if you're running this in Google Colab.
# !pip install numpy torchvision_nightly
# !pip3 install torch_nightly -f https://download.pytorch.org/whl/nightly/cu90/torch_nightly.html
# !pip3 install git+https://github.com/pgollakota/fastai.git@8e9ec7c
# !pip3 install ipywidgets
# !git clone https://github.com/PAIR-code/facets
# !jupyter nbextension install facets/facets-dist/

## Auto reload causes PIL errors in Google Colab. 
## Comment these two lines out if you're running this notebook on Google Colab.
%reload_ext autoreload
%autoreload 2

In [None]:
import os, re, shutil
import numpy as np
import pandas as pd
from fastai.gen_doc.nbdoc import *
from fastai.widgets import FacetsDive
from fastai.vision import *
from fastai import *

## FacetsDive

In [None]:
show_doc(FacetsDive, doc_string=False)

The [`FacetsDive`](/facets.html#FacetsDive) object has a single method, `show()`, which renders the Facets Dive plugin in the notebook. `show()` also generates the thumbnails and the sprite image that Facets Dive loads; these are written to a `facets_dive` folder in the current directory.
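
Facets Dive reads all thumbnails from one sprite (atlas) image. As a rough illustration of what building such a sprite involves, the sketch below computes the sprite size and each thumbnail's pixel offset, assuming a near-square grid filled row by row. `sprite_layout` is a hypothetical helper, not part of `FacetsDive`, and the widget may lay out its sprite differently.

```python
import math


def sprite_layout(n_images, thumb_w, thumb_h):
    """Hypothetical helper: pack n thumbnails into a near-square sprite grid."""
    if n_images == 0:
        return (0, 0), []
    cols = math.ceil(math.sqrt(n_images))   # columns in the grid
    rows = math.ceil(n_images / cols)       # rows needed to hold every thumbnail
    # Top-left (x, y) pixel offset of each thumbnail, filled row-major
    offsets = [((i % cols) * thumb_w, (i // cols) * thumb_h) for i in range(n_images)]
    return (cols * thumb_w, rows * thumb_h), offsets


size, offsets = sprite_layout(10, 32, 32)
print(size)         # (128, 96)
print(offsets[9])   # (32, 64)
```

A near-square grid keeps both sprite dimensions small, which matters because browsers limit the maximum size of a single image.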

## Demo - Birds

Here is a small demo that shows the capabilities of Facets Dive using the [`Caltech-UCSD Birds-200-2011`](https://course.fast.ai/datasets) dataset (ref. [Wah et al. 2011](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html)). The code that generates the visualization is a single line; the rest is data preparation that puts the dataset into a format that loads easily.

In [None]:
birds_path = untar_data('https://s3.amazonaws.com/fast-ai-imageclas/CUB_200_2011')
train_dir = birds_path/'fastai/train/'

# Rearrange the images into fastai/train, fastai/valid and fastai/test folders,
# following the dataset's official train/test split.
np.random.seed(1729)  # make the per-class validation sample reproducible

def get_test_valid_fnames(path, split_valid=False):
    images_df = pd.read_csv(path / 'images.txt', sep=' ', names=['id', 'path'], index_col=0)
    train_test_split_df = pd.read_csv(path / 'train_test_split.txt',
                                      sep=' ', names=['id', 'is_train'], index_col=0)
    df = images_df.join(train_test_split_df)
    df.loc[:, 'class_name'] = df.loc[:, 'path'].apply(lambda x: x.split('/')[0])
    train_fnames = df.loc[df.is_train == 1].path.tolist()
    test_fnames = df.loc[df.is_train == 0].path.tolist()

    valid_fnames = []
    if split_valid:
        # Sample ~20% of each class's official training images for validation
        valid_fnames = df.loc[df.is_train == 1].groupby(
            'class_name').apply(lambda x: x.sample(frac=0.2)).path.tolist()
        # Remove the validation images from the training list (test images are already excluded)
        train_fnames = sorted(set(train_fnames) - set(valid_fnames))

    return (train_fnames, valid_fnames, test_fnames)

def arrange_files(path, valid_fnames, test_fnames):
    # Start from a fresh copy of every image in fastai/train
    if os.path.exists(path / 'fastai'):
        shutil.rmtree(path / 'fastai')

    shutil.copytree(path / 'images', path / 'fastai/train')
    os.makedirs(path / 'fastai/test')
    os.makedirs(path / 'fastai/valid')

    # Move each test/valid image out of fastai/train into its class folder
    for name, files in {'test': test_fnames, 'valid': valid_fnames}.items():
        for fname in files:
            folder_name = fname.split('/')[0]
            os.makedirs(path / f'fastai/{name}/{folder_name}', exist_ok=True)
            shutil.move(str(path / f'fastai/train/{fname}'), str(path / f'fastai/{name}/{fname}'))

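# With split_valid=False, the ImageDataBunch below draws its own random validation set from train_fpaths
train_fnames, valid_fnames, test_fnames = get_test_valid_fnames(birds_path, split_valid=False)
arrange_files(birds_path, valid_fnames, test_fnames)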
train_fpaths = [train_dir/x for x in train_fnames]
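
The split performed by `get_test_valid_fnames` can be illustrated without pandas or the real dataset. The sketch below is a hypothetical, standard-library-only `split_cub` helper. It assumes the two metadata formats that the pandas code parses (`images.txt` with `<id> <path>` lines and `train_test_split.txt` with `<id> <is_train>` lines) and draws roughly 20% of each class's official training images for validation, which is what `split_valid=True` is meant to do. Because it uses Python's `random` module instead of `DataFrame.sample`, it will not select the same images as the notebook code.

```python
import random
from collections import defaultdict


def split_cub(images_txt, split_txt, valid_frac=0.2, seed=1729):
    """Hypothetical helper: return (train, valid, test) relative image paths."""
    # images.txt has one "<id> <class_dir>/<file>.jpg" entry per line
    paths = {}
    for line in images_txt.strip().splitlines():
        img_id, rel_path = line.split(maxsplit=1)
        paths[img_id] = rel_path

    # train_test_split.txt has one "<id> <is_train>" entry per line (is_train is 0 or 1)
    train_ids = set()
    for line in split_txt.strip().splitlines():
        img_id, flag = line.split()
        if flag == '1':
            train_ids.add(img_id)

    test = sorted(p for i, p in paths.items() if i not in train_ids)

    # Group the official training images by class folder (the part before '/')
    by_class = defaultdict(list)
    for img_id in sorted(train_ids):
        by_class[paths[img_id].split('/')[0]].append(paths[img_id])

    # Draw roughly valid_frac of each class for validation, with a fixed seed
    rng = random.Random(seed)
    valid = []
    for cls in sorted(by_class):
        files = by_class[cls]
        valid.extend(rng.sample(files, round(valid_frac * len(files))))

    # Training set = official training images minus the validation sample
    valid_set = set(valid)
    train = sorted(paths[i] for i in train_ids if paths[i] not in valid_set)
    return train, sorted(valid), test


# Toy metadata: two classes of 10 images, the first 5 of each marked as training images
images_txt = "\n".join(f"{i} {'001.A' if i <= 10 else '002.B'}/img_{i}.jpg" for i in range(1, 21))
split_txt = "\n".join(f"{i} {1 if (i - 1) % 10 < 5 else 0}" for i in range(1, 21))
train, valid, test = split_cub(images_txt, split_txt)
print(len(train), len(valid), len(test))  # 8 2 10
```

Sampling per class (stratification) keeps every class represented in the validation set, and subtracting the validation sample from the training list is what keeps the two sets disjoint.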

In [None]:
pat = re.compile(r'(\d+\.\w+)\/\w+\.jpg$')
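
To check what this pattern captures, you can run it on a single path. The sketch assumes, as the cell above intends, that the label is the first capture group of a `search` over the full file path; `label_from_path` is a hypothetical helper and the example path only illustrates the CUB layout `<class_dir>/<image>.jpg`.

```python
import re

pat = re.compile(r'(\d+\.\w+)\/\w+\.jpg$')


def label_from_path(path):
    """Hypothetical helper: return the class folder captured by `pat`, or None if there is no match."""
    match = pat.search(str(path))
    return match.group(1) if match else None


example = 'CUB_200_2011/fastai/train/001.Black_footed_Albatross/Black_Footed_Albatross_0001_796111.jpg'
print(label_from_path(example))  # 001.Black_footed_Albatross
```

Anchoring the pattern with `$` and requiring `.jpg` means only the last two path components are considered, so the digits in `CUB_200_2011` earlier in the path are never mistaken for a class name.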

In [None]:
data = ImageDataBunch.from_name_re(train_dir, pat=pat, fnames=train_fpaths, ds_tfms=get_transforms(), size=224, bs=64)
data.normalize(imagenet_stats)

In [None]:
learn = create_cnn(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)

In [None]:
preds = learn.get_preds(with_loss=True)

### Explore all the data at once

In [None]:
viz = FacetsDive(data, preds=preds)

In [None]:
viz.show()

The GIF below shows what the visualization looks like and how you can explore the images and the dataset.

![Facets Dive Exploration 01](imgs/facets_dive_01.gif)

### Filter and explore

Facets Dive can't display more than 100 facets on one axis. This dataset has 200 bird classes, so anything beyond 100 facets is grouped into `other`, which can be inconvenient. To examine a subset more closely, for example only the images with a loss > 0.5 or those matching any other condition, you can pass a `filter_fn` that filters the data before the Facets Dive visualization is generated. An image is kept in the final set only if `filter_fn` returns a truthy value for it.

In the following example, we limit our exploration to the classes '001.Black_footed_Albatross' through '010.Red_winged_Blackbird' (class indices 0 to 9) and to images with a loss > 0.25.

In [None]:
def filter_fn(**kwargs):
    return kwargs['class_idx'] < 10 and kwargs['loss'] > 0.25
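
The filtering step can be sketched in plain Python, assuming (as the `kwargs['class_idx']` and `kwargs['loss']` lookups suggest) that `FacetsDive` calls `filter_fn` once per image with that image's attributes as keyword arguments. The records below are hypothetical; the real keyword arguments may include more fields.

```python
def filter_fn(**kwargs):
    return kwargs['class_idx'] < 10 and kwargs['loss'] > 0.25


# Hypothetical per-image records standing in for what FacetsDive passes to filter_fn
records = [
    {'class_idx': 3, 'loss': 0.90},    # kept: early class with a high loss
    {'class_idx': 3, 'loss': 0.10},    # dropped: loss too low
    {'class_idx': 150, 'loss': 2.00},  # dropped: class index too high
]

# Keep an image only when filter_fn returns a truthy value for it
kept = [r for r in records if filter_fn(**r)]
print(kept)  # [{'class_idx': 3, 'loss': 0.9}]
```

Accepting `**kwargs` lets the filter ignore any attributes it does not use, so it keeps working if more fields are passed in.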

In [None]:
small_viz = FacetsDive(data, preds=preds, filter_fn=filter_fn)

In [None]:
small_viz.show()

Here's another GIF that shows how you can explore this smaller subset.

![Dive exploration 2](imgs/facets_dive_02.gif)

You'll notice that Facets also automatically generates other attributes for each image, such as `width`, `height`, and `is_vertical`, which you can also use to slice and dice the data in the visualization.

## Troubleshooting


- If you cannot see your visualization, make sure you're using Jupyter Notebook and not Jupyter Lab.
- If you still cannot see your visualization, make sure that the directory where the sprite image and metadata are stored (`facets_tmp` in the current directory by default) is accessible to the Jupyter server. This means the directory must be located inside the directory from which the Jupyter server was started. For example, if you ran the `jupyter notebook` command from `/home/johndoe/course/`, the `facets_tmp` directory can be located at `/home/johndoe/course/**/facets_tmp`, but not at `/home/facets_tmp`, `/tmp/`, etc.
- If you still cannot see it, make sure that the directory where the sprite image and metadata are stored is not a hidden folder (for example, one whose name starts with `.`).
- Still can't see it? Try the forums :)
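
The directory checks in the troubleshooting list above can be scripted. This is a minimal sketch with a hypothetical helper, `facets_dir_problems` (not part of fastai or Facets), assuming the server root is the directory you ran `jupyter notebook` from and that a hidden folder is any folder below that root whose name starts with `.`.

```python
from pathlib import Path


def facets_dir_problems(facets_dir, server_root):
    """Hypothetical helper: list reasons the Jupyter server may not serve facets_dir."""
    facets = Path(facets_dir).expanduser().resolve()
    root = Path(server_root).expanduser().resolve()
    problems = []
    try:
        rel = facets.relative_to(root)  # raises ValueError if facets is not under root
    except ValueError:
        problems.append(f"{facets} is outside the server root {root}")
        return problems
    # Any folder below the server root whose name starts with '.' counts as hidden
    if any(part.startswith('.') for part in rel.parts):
        problems.append(f"{facets} is inside a hidden folder")
    return problems


print(facets_dir_problems('/home/johndoe/course/lesson1/facets_tmp', '/home/johndoe/course'))  # []
print(facets_dir_problems('/tmp/facets_tmp', '/home/johndoe/course'))
```

An empty list means neither problem was detected; it does not rule out other causes, such as running Jupyter Lab instead of Jupyter Notebook.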