In this kernel, we will use the Weights & Biases (W&B) Tables feature to perform basic EDA interactively. This kernel has three goals:
* Educate fellow Kagglers to use W&B Tables.
* Perform Basic EDA.
* Show how this can be useful for model performance visualization. (Working on it)

W&B Tables let you log, query, and analyze data interactively. This can help you understand your dataset, visualize model predictions, and share insights in a central dashboard.


Why should you use W&B Tables?

* It is suited for quick EDA.

* It helps you understand the data better with a few lines of code. Here's a [quick colab notebook](http://wandb.me/tables-quickstart).

* It lets you see the "actual" data in its entirety. With matplotlib-based visualization you have to plot everything in batches, which is not very scalable.

* You can filter, sort, and group data, which can help answer some fundamental questions.

* It is well suited to visualizing model predictions and comparing models at the example level. You can check out [this Kaggle kernel](https://www.kaggle.com/ayuraj/better-data-understanding-with-w-b-tables) to learn more about model prediction visualization.

Read more about Tables [here](https://wandb.ai/wandb/posts/reports/Announcing-W-B-Tables-Iterate-on-Your-Data--Vmlldzo4NTMxNDU).

# Imports and Setup

In [None]:
!pip install -q --upgrade wandb

In [None]:
import os
import numpy as np
import pandas as pd
from tqdm import tqdm
from pathlib import Path
import matplotlib.pyplot as plt

The code cell below imports W&B and logs in using Kaggle secrets. If no secret is found, it falls back to anonymous logging.

In [None]:
# Import wandb
import wandb

try:
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    secret_value_0 = user_secrets.get_secret("wandb_api")
    wandb.login(key=secret_value_0)

    anony = None
# Fall back to anonymous logging if kaggle_secrets or the secret is unavailable
except Exception:
    anony = "must"
    print('If you want to use your W&B account, go to Add-ons -> Secrets and add your W&B access token. Use the Label name as "wandb_api". \nGet your W&B access token from here: https://wandb.ai/authorize')

In [None]:
CONFIG = {'competition': 'petfinder', 
          '_wandb_kernel': 'ayut'}

# Dataset

In [None]:
ROOT_PATH = Path('../input/petfinder-pawpularity-score')
TRAIN_IMGS_PATH = ROOT_PATH / 'train/'

In [None]:
# Read csv file
train_df = pd.read_csv(f'{ROOT_PATH}/train.csv')
train_df.head()

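The code cell below builds the list of column names for the table: the columns of `train.csv` plus an extra `image` column inserted right after `Id`. The entire dataset of this competition is logged with these columns in the next section.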

In [None]:
columns = train_df.columns.tolist()
columns.insert(1, 'image')
print(columns)

# Visualize Dataset Interactively using W&B Tables

It takes only five extra lines of code to get the power of W&B Tables.

1. First, initialize a W&B run using the `wandb.init` API. This step is common to any W&B logging.
2. Create a `wandb.Table` object. Think of this as an empty pandas DataFrame.
3. Iterate through each row of the `train.csv` file and call `add_data` on the `wandb.Table` object. Think of this as appending new rows to your DataFrame.
4. Log the table using the `wandb.log` API. You will use this API to log almost anything to W&B.
5. In a Jupyter-like interactive session, call `wandb.finish` to close the initialized W&B run.
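
One detail of steps 2 and 3 is worth spelling out: the values passed to `add_data` have to line up, in order, with the `columns` list given to `wandb.Table`. The sketch below checks that alignment in plain Python without contacting W&B. The column names other than `Id` and `Pawpularity`, the sample values, and the helper `build_row` are made up for illustration, and the image path string only stands in for the `wandb.Image` object.

```python
from pathlib import Path

# Hypothetical subset of the train.csv columns; only `Id` and `Pawpularity`
# come from the notebook, the other names are placeholders.
csv_columns = ['Id', 'feature_a', 'feature_b', 'Pawpularity']

# Same trick as in the notebook: put an `image` column right after `Id`.
table_columns = list(csv_columns)
table_columns.insert(1, 'image')


def build_row(values, img_dir):
    """Return one table row: Id, image path, then the remaining CSV values."""
    img_id = values[0]
    img_path = str(Path(img_dir) / f'{img_id}.jpg')  # stand-in for wandb.Image(...)
    return (img_id, img_path, *values[1:])


# A made-up CSV row in the same order as `csv_columns`.
sample = ['abc123', 0, 1, 42]
row = build_row(sample, 'train')

# Every value needs a matching column, in the same order.
assert len(row) == len(table_columns)
print(dict(zip(table_columns, row)))
```

The real cell follows the same order: `img_id`, then the image, then `row.values[1:]`.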

In [None]:
# Initialize a W&B run to log images
run = wandb.init(project='petfinder-viz', config=CONFIG, anonymous=anony) # W&B Code 1

data_at = wandb.Table(columns=columns) # W&B Code 2

for i in tqdm(range(len(train_df))):
    row = train_df.loc[i]
    img_id = row.Id

    data_at.add_data(img_id,                                            
                     wandb.Image(f'{TRAIN_IMGS_PATH}/{img_id}.jpg'),
                     *tuple(row.values[1:])) # W&B Code 3

wandb.log({'Raw Petfinder data': data_at}) # W&B Code 4
wandb.finish() # W&B Code 5

In [None]:
# This is just to display the W&B run page in this interactive session.
from IPython import display

# we create an IFrame and set the width and height
iF = display.IFrame(run.url, width=1080, height=720)
iF

# Quick EDA using W&B Tables

### [Check out the W&B Tables $\rightarrow$](https://wandb.ai/ayush-thakur/petfinder-viz/runs/1k1frhav)

![img](https://i.imgur.com/cV9ycET.gif)

### Number of images

There are a total of 9912 images. You can see this in the annotated image below. Also note all the column names. 

![img](https://i.imgur.com/m28sGYX.png)
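
The image count can also be cross-checked outside the dashboard by counting the `.jpg` files on disk. This is a minimal sketch, assuming the images sit directly in one folder as `TRAIN_IMGS_PATH` suggests. `count_images` is a hypothetical helper name, and the demo runs on a throwaway temporary folder so that it does not depend on the competition data.

```python
import tempfile
from pathlib import Path


def count_images(folder, pattern='*.jpg'):
    """Count files in `folder` (non-recursive) whose name matches `pattern`."""
    return sum(1 for p in Path(folder).glob(pattern) if p.is_file())


# Demo on a temporary folder with three fake images and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['a.jpg', 'b.jpg', 'c.jpg', 'notes.txt']:
        (Path(tmp) / name).touch()
    n_images = count_images(tmp)

print(n_images)
```

In the notebook, the call would be `count_images(TRAIN_IMGS_PATH)`, and the result should equal `len(train_df)` if every row in `train.csv` has a matching image.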

### Groupby "Pawpularity"

You can group by any column in W&B Tables.
* Click on the three-dot icon in the header of the column of your choice.
* Click on Group by.

![img](https://i.imgur.com/kVyKYbX.png)

* There are a total of 100 unique "pawpularity" values. 
* You can visualize examples belonging to each pawpularity value.
* You can see the distribution for other columns.  

![img](https://i.imgur.com/bPe43Xp.png)

Note: you cannot group by more than one column at a time.
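
The same grouping can be reproduced in pandas as a sanity check on what the table shows. This is only a sketch on a tiny made-up frame: `feature_a` is a placeholder column name, and in the notebook you would run the same calls on `train_df`.

```python
import pandas as pd

# Tiny made-up stand-in for train_df; `feature_a` is a placeholder column.
df = pd.DataFrame({
    'Id': ['a', 'b', 'c', 'd', 'e'],
    'feature_a': [1, 0, 1, 1, 0],
    'Pawpularity': [30, 30, 45, 45, 45],
})

# Number of distinct target values in the frame.
n_unique = df['Pawpularity'].nunique()

# Per-group summary of another column, similar in spirit to the
# per-group distributions shown in the grouped table.
feature_mean = df.groupby('Pawpularity')['feature_a'].mean()

print(n_unique)
print(feature_mean)
```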

### Count

You can easily add a count column and sort it in ascending or descending order.

![img](https://i.imgur.com/5sptLco.gif)
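
For comparison, a rough pandas equivalent of the count-and-sort view is `value_counts`. The values below are made up and only stand in for `train_df['Pawpularity']`.

```python
import pandas as pd

# Made-up target values standing in for train_df['Pawpularity'].
scores = pd.Series([30, 30, 45, 45, 45, 72], name='Pawpularity')

# Number of examples per distinct value, most frequent first.
counts_desc = scores.value_counts()

# Same counts, least frequent first.
counts_asc = scores.value_counts(ascending=True)

print(counts_desc)
print(counts_asc)
```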