# Setup

- The example below shows one way to use Google's Facets tool to explore your data for analysis.
- It covers both Facets views, Dive and Overview, which offer two different ways of inspecting your dataset.
- To use your own dataset, change the value of the variable 'dataset_dir' to the path of your file.
- Before changing the path, make sure your data is in CSV format and sits in the "dataset" folder, then set the path to 'dataset/yourcsvfilename.csv'.
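
The path requirements in the last two bullets can be checked before loading anything. The following is a minimal sketch using only the standard library; `check_dataset_path` is a hypothetical helper (not part of Facets or pandas), and it only inspects the path string without touching the disk:

```python
from pathlib import PurePosixPath

def check_dataset_path(path):
    """Return a list of problems with a dataset path (empty if it looks fine)."""
    p = PurePosixPath(path)
    problems = []
    # The notes above expect the file to sit directly in the "dataset" folder.
    if p.parent.name != "dataset":
        problems.append("file is not in the 'dataset' folder")
    # Only CSV files are expected; compare the extension case-insensitively.
    if p.suffix.lower() != ".csv":
        problems.append("file is not a .csv file")
    return problems

print(check_dataset_path("dataset/MCD_survey_data.csv"))
print(check_dataset_path("downloads/survey.xlsx"))
```

An empty list means the path matches both requirements; otherwise each entry names one thing to fix before calling `pd.read_csv`.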

In [42]:
import pandas as pd
import base64
from sklearn.model_selection import train_test_split
from IPython.display import display, HTML, clear_output

from facets_overview.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator

In [67]:
clear_output()

In [44]:
dataset_dir = 'dataset/MCD_survey_data.csv'
dataset = pd.read_csv(dataset_dir)

# Select the features you'd like to include in your train & test datasets.
# If you wish to keep all data, you do not need this step.
features = ["first_name", "last_name", "age", "country", "favorite_color", "frequency_of_visits", "favorite_menu_item", "employment_status", "Gender"]
dataset = dataset[features]
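
Selecting columns with a list raises a `KeyError` in pandas if any name is not a column, and column names are case-sensitive (so "Gender" and "gender" are different). A small sketch of a check to run before the selection; `missing_features` is a hypothetical helper and the column list is made-up sample data standing in for `list(dataset.columns)`:

```python
def missing_features(columns, features):
    """Return the requested features that are absent from columns, in order."""
    available = set(columns)
    return [name for name in features if name not in available]

# Hypothetical column list standing in for list(dataset.columns).
columns = ["first_name", "last_name", "age", "country", "gender"]
features = ["first_name", "age", "Gender"]

# Any name printed here would make dataset[features] fail.
print(missing_features(columns, features))
```

If the result is non-empty, correct the spelling or casing of those names in `features` before selecting.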

In [45]:
train_dataset, test_dataset = train_test_split(dataset, test_size=0.33, random_state=42)
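
With `test_size=0.33`, roughly a third of the rows go to the test set, and `random_state=42` makes the shuffle reproducible between runs. The sketch below estimates the resulting row counts, assuming scikit-learn rounds the test count up and assigns the remaining rows to training; `expected_split_sizes` is a hypothetical helper, so compare its result with `len(train_dataset)` and `len(test_dataset)` rather than relying on it:

```python
import math

def expected_split_sizes(n_rows, test_size=0.33):
    """Estimate (train_rows, test_rows) for a fractional test_size."""
    # Assumption: the test count is rounded up and the rest goes to training.
    n_test = math.ceil(test_size * n_rows)
    return n_rows - n_test, n_test

print(expected_split_sizes(10))
print(expected_split_sizes(50))
```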

In [81]:
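# Helper functions that build the HTML for the Dive and Overview visualizations.

def dive(train_dataset, test_dataset):
    # Dive renders the training data only; test_dataset is unused here.
    clear_output()
    # Serialise the rows as a JSON array with one object per record.
    jsonstr = train_dataset.to_json(orient='records')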
    HTML_TEMPLATE = """
            <script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
            <link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
            <facets-dive id="elem" height="600"></facets-dive>
            <script>
              var data = {jsonstr};
              document.querySelector("#elem").data = data;
            </script>"""
    html = HTML_TEMPLATE.format(jsonstr=jsonstr)
    return html

def overview(train_dataset, test_dataset):
    clear_output()
    # Compute feature statistics for both splits, then base64-encode the
    # serialised proto so it can be passed to the element's protoInput.
    gfsg = GenericFeatureStatisticsGenerator()
    proto = gfsg.ProtoFromDataFrames([{'name': 'train', 'table': train_dataset},
                                      {'name': 'test', 'table': test_dataset}])
    protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")

    HTML_TEMPLATE2 = """
            <script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
            <link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html" >
            <facets-overview id="elem"></facets-overview>
            <script>
              document.querySelector("#elem").protoInput = "{protostr}";
            </script>"""
    html = HTML_TEMPLATE2.format(protostr=protostr)
    return html
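
Both helpers fill their templates with `str.format`, which works because the templates contain no braces other than the placeholder. If you extend the embedded JavaScript with a block of its own, the literal braces need to be doubled. A minimal sketch using only the standard library; the two sample rows and the extra `if` statement are illustrative assumptions, not part of the original templates:

```python
import json

# Hypothetical two-row sample standing in for train_dataset.to_json(orient='records').
rows = [{"age": 31, "country": "UK"}, {"age": 45, "country": "FR"}]
jsonstr = json.dumps(rows)

# Literal braces in the JavaScript are doubled so str.format leaves them alone;
# only {jsonstr} is treated as a placeholder.
TEMPLATE = """<script>
  var data = {jsonstr};
  if (data.length > 0) {{ console.log(data.length); }}
</script>"""

html = TEMPLATE.format(jsonstr=jsonstr)
print(html)
```

Leaving the braces single in such a template would make `format` raise an error instead of returning the HTML.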

# How to use

## To use the Google Facets tool, follow the steps below:
    
    - Insert or use a new cell below. To insert a new cell, click into a cell, press ESC, then press 'B' to create a cell below.
    - If you need to remove a cell, click the cell, press ESC, then press 'D' twice.
    
    - Once a new cell is inserted, use the following code to view a 'dive' Facet on your dataset:
        """
        html = dive(train_dataset,test_dataset)
        display(HTML(html))
        """
    
    - Use the following code to view an 'overview' of your dataset:
        """
        html = overview(train_dataset,test_dataset)
        display(HTML(html))
        """
    
    # IMPORTANT NOTICE!
    Once you have finished using either of the two options, delete the cell you were using by pressing ESC, then 'D' twice. Jupyter Studio does not allow two cells to run visuals at the same time, so ONLY USE ONE AT A TIME.
    If a view causes issues even when used on its own, save your notebook and refresh the page; this should fix any viewing issues.
    ENSURE YOU USE LIGHT MODE WHEN USING THIS TOOL.
    To change the theme, navigate to Settings -> Theme -> JupyterLab Light.

In [84]:
html = dive(train_dataset,test_dataset)
display(HTML(html))