#### The other day I was listening to the [Linear Digressions podcast](http://lineardigressions.com/) and they were talking about a tool to visualize numerical features called Chernoff faces ([episode](http://lineardigressions.com/episodes/2016/2/29/chernoff-faces-and-minard-maps)). I wanted to give it a try on this dataset just for fun.

### What are Chernoff faces?
#### Chernoff faces, invented by Herman Chernoff in 1973, display multivariate data in the shape of a human face. The individual parts, such as eyes, ears, mouth and nose, represent the values of the variables through their shape, size, placement and orientation. The idea behind using faces is that humans recognize faces easily and notice small changes in them without difficulty. Chernoff faces handle each variable differently: because the features of the face vary in perceived importance, the mapping of variables to features should be chosen carefully (e.g. eye size and eyebrow slant have been found to carry significant weight) ([paper](http://www.research.ibm.com/people/c/cjmorris/publications/Chernoff_990402.pdf)). Below is an example for the famous iris dataset ([source](https://archive.ics.uci.edu/ml/datasets/iris)) where the features (sepal length, sepal width, petal length, petal width) are presented as Chernoff faces and the colour marks the species (i.e. the target: setosa, versicolor, virginica) ([source](https://github.com/antononcube/MathematicaForPrediction/blob/master/MarkdownDocuments/Making-Chernoff-faces-for-data-visualization.md)):

![Chernoff faces](https://camo.githubusercontent.com/ced6cd0923a923aade542c33f9ccc86614049e30ab7c0c1085ee7823752ce858/687474703a2f2f692e696d6775722e636f6d2f7550425a4a75666c2e676966)

#### Each species shows characteristic facial features, which makes it very easy to identify them visually.
#### Let's now take a look at what the Chernoff faces look like for the numerical features of this competition, and whether the resulting faces help to discriminate between the target classes in some way.

In [None]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.patches  # cface() below uses matplotlib.patches.Ellipse and Arc
import matplotlib.pyplot as plt
import os

PATH = "/kaggle/input/tabular-playground-series-mar-2021/"

In [None]:
# Function to draw Chernoff faces (source: https://gist.github.com/dribnet/e26f52f423f0656c1bb8fc6f4e741cc2#file-mpl_cfaces-py)

def cface(ax, x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18):
    # x1 = height of upper face
    # x2 = overlap of lower face
    # x3 = half of vertical size of face
    # x4 = width of upper face
    # x5 = width of lower face
    # x6 = length of nose
    # x7 = vertical position of mouth
    # x8 = curvature of mouth
    # x9 = width of mouth
    # x10 = vertical position of eyes
    # x11 = separation of eyes
    # x12 = slant of eyes
    # x13 = eccentricity of eyes
    # x14 = size of eyes
    # x15 = position of pupils
    # x16 = vertical position of eyebrows
    # x17 = slant of eyebrows
    # x18 = size of eyebrows
    
    # transform some values so that input between 0,1 yields variety of output
    x3 = 1.9*(x3-.5)
    x4 = (x4+.25)
    x5 = (x5+.2)
    x6 = .3*(x6+.01)
    x8 = 5*(x8+.001)
    x11 /= 5
    x12 = 2*(x12-.5)
    x13 += .05
    x14 += .1
    x15 = .5*(x15-.5)
    x16 = .25*x16
    x17 = .5*(x17-.5)
    x18 = .5*(x18+.1)

    # top of face, in box with l=-x4, r=x4, t=x1, b=x3
    e = matplotlib.patches.Ellipse( (0,(x1+x3)/2), 2*x4, (x1-x3), fc='white', edgecolor='black', linewidth=2)
    # e.set_clip_box(ax.bbox)
    # e.set_facecolor([0,0,0])
    ax.add_artist(e)

    # bottom of face, in box with l=-x5, r=x5, b=-x1, t=x2+x3
    e = matplotlib.patches.Ellipse( (0,(-x1+x2+x3)/2), 2*x5, (x1+x2+x3), fc='white', edgecolor='black', linewidth=2)
    ax.add_artist(e)

    # cover overlaps: redraw both ellipses without an outline so that the
    # edges where the upper and lower face overlap are hidden
    e = matplotlib.patches.Ellipse( (0,(x1+x3)/2), 2*x4, (x1-x3), fc='white', ec='none')
    ax.add_artist(e)
    e = matplotlib.patches.Ellipse( (0,(-x1+x2+x3)/2), 2*x5, (x1+x2+x3), fc='white', ec='none')
    ax.add_artist(e)
    
    # draw nose
    ax.plot([0,0], [-x6/2, x6/2], 'k')
    
    # draw mouth
    p = matplotlib.patches.Arc( (0,-x7+.5/x8), 1/x8, 1/x8, theta1=270-180/np.pi*np.arctan(x8*x9), theta2=270+180/np.pi*np.arctan(x8*x9))
    ax.add_artist(p)
    
    # draw eyes
    p = matplotlib.patches.Ellipse( (-x11-x14/2,x10), x14, x13*x14, angle=-180/np.pi*x12, facecolor='white', edgecolor='black')
    ax.add_artist(p)
    
    p = matplotlib.patches.Ellipse( (x11+x14/2,x10), x14, x13*x14, angle=180/np.pi*x12, facecolor='white', edgecolor='black')
    ax.add_artist(p)

    # draw pupils
    p = matplotlib.patches.Ellipse( (-x11-x14/2-x15*x14/2, x10), .05, .05, facecolor='black')
    ax.add_artist(p)
    p = matplotlib.patches.Ellipse( (x11+x14/2-x15*x14/2, x10), .05, .05, facecolor='black')
    ax.add_artist(p)
    
    # draw eyebrows
    ax.plot([-x11-x14/2-x14*x18/2,-x11-x14/2+x14*x18/2],[x10+x13*x14*(x16+x17),x10+x13*x14*(x16-x17)],'k')
    ax.plot([x11+x14/2+x14*x18/2,x11+x14/2-x14*x18/2],[x10+x13*x14*(x16+x17),x10+x13*x14*(x16-x17)],'k')

In [None]:
# Read the training data
df_train = pd.read_csv(os.path.join(PATH, "train.csv"))
num_feats = [c for c in df_train.columns if c.startswith("cont")]
df_train_num = df_train[num_feats].values

#### The function used to generate the faces takes 18 facial-feature arguments (x1 to x18). In the plotting call I fix the first one (x1, the height of the upper face) at 0.9, so the remaining 17 have to be supplied per row. Here we have only 11 numerical features and, since I didn't want to change the function code, I used a constant value (0.5) for the rest. I chose what I thought are the most distinctive facial features to represent the numerical features of the dataset.

In [None]:
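# One row per sample, one column per argument x2..x18 of cface (column i -> x(i+2)),
# all initialised to a neutral 0.5
cf = np.ones((df_train.shape[0], 17)) * 0.5
# Fill the columns of the chosen facial features with the 11 numerical features:
# x2, x3, x4, x5 (face shape), x7 (mouth position), x10..x14 (eyes), x17 (eyebrow slant)
cf[:,[0,1,2,3,5,8,9,10,11,12,15]] = df_train_num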

# Split the matrix according to the target value
cf_ones = cf[df_train["target"]==1, :]
cf_zeros = cf[df_train["target"]==0, :]
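
#### The column indices above are easier to read with names attached. The following is only a sketch on synthetic data: `SLOT_NAMES` and `build_face_matrix` are hypothetical helpers (not part of the gist), and the min-max scaling step is my own assumption, added because `cface` expects inputs roughly between 0 and 1. The notebook itself passes the raw values.

```python
import numpy as np

# Names for the 17 cface arguments x2..x18, in call order (x1 is passed separately).
# These labels are illustrative; they paraphrase the comments inside cface.
SLOT_NAMES = [
    "lower_face_overlap", "face_half_height", "upper_face_width",
    "lower_face_width", "nose_length", "mouth_position", "mouth_curvature",
    "mouth_width", "eye_position", "eye_separation", "eye_slant",
    "eye_eccentricity", "eye_size", "pupil_position", "eyebrow_position",
    "eyebrow_slant", "eyebrow_size",
]

def build_face_matrix(features, slots, fill=0.5):
    """Min-max scale each feature column to [0, 1] and place it in the given slots.

    Slots that receive no feature keep the constant value `fill`.
    """
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    span = features.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns become 0 instead of dividing by zero
    scaled = (features - lo) / span
    out = np.full((features.shape[0], len(SLOT_NAMES)), fill)
    out[:, slots] = scaled
    return out

rng = np.random.default_rng(0)
toy = rng.normal(size=(6, 11))  # synthetic stand-in for the 11 "cont" columns
slots = [0, 1, 2, 3, 5, 8, 9, 10, 11, 12, 15]  # same indices as in the cell above
face_matrix = build_face_matrix(toy, slots)
used = [SLOT_NAMES[s] for s in slots]  # which facial features carry data
```

#### Printing `used` lists the facial features that actually vary from face to face; every other feature stays at 0.5 and looks the same on all faces.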

#### Let's plot some faces at random for each of the classes (background colour salmon: 0, background colour turquoise: 1):

In [None]:
fig, ax = plt.subplots(5,10, figsize=(20,10))

for i in range(5):
    for j in range(10):
        if j < 5:
            cface(ax[i,j], .9, *cf_zeros[np.random.randint(cf_zeros.shape[0]),:])
            ax[i,j].set_facecolor('xkcd:salmon')
        else:
            cface(ax[i,j], .9, *cf_ones[np.random.randint(cf_ones.shape[0]),:])
            ax[i,j].set_facecolor('xkcd:turquoise')
        ax[i,j].axis([-1.2, 1.2, -1.2, 1.2])
        ax[i,j].set_xticks([])
        ax[i,j].set_yticks([])
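
#### The cell above draws a fresh random sample on every run, and the same row can appear twice. If you want the same faces each time, one option is to sample with a seeded generator and without replacement. This is a minimal sketch on a toy array; `sample_rows` is a hypothetical helper and the seed value is arbitrary.

```python
import numpy as np

def sample_rows(matrix, n, seed=0):
    """Return n distinct rows of `matrix`, chosen reproducibly for a given seed."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(matrix.shape[0], size=n, replace=False)
    return matrix[idx]

# Toy stand-in for cf_zeros: 100 rows, 17 facial-feature columns
toy_zeros = np.arange(100 * 17, dtype=float).reshape(100, 17)
picked = sample_rows(toy_zeros, 25)  # 25 faces = 5 rows x 5 columns of the grid
```

#### Each row of `picked` could then be unpacked into `cface` in place of the `np.random.randint` lookup.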

#### I leave it to you to find the characteristic facial features (if any) of each class!
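
#### If you would like a hint about where to look, one rough approach is to compare the average face of each class and rank the facial features by how much they differ. This is only a sketch on synthetic data, not a result for this competition: `rank_slots_by_class_gap` is a hypothetical helper, and the toy data is constructed so that one column differs on purpose.

```python
import numpy as np

def rank_slots_by_class_gap(cf, target):
    """Rank columns by |mean(class 1) - mean(class 0)|, largest gap first."""
    cf = np.asarray(cf, dtype=float)
    target = np.asarray(target)
    gap = np.abs(cf[target == 1].mean(axis=0) - cf[target == 0].mean(axis=0))
    order = np.argsort(gap)[::-1]
    return order, gap

rng = np.random.default_rng(0)
toy_target = np.array([0] * 50 + [1] * 50)
toy_cf = rng.uniform(size=(100, 17))   # stand-in for the cf matrix
toy_cf[toy_target == 1, 9] += 2.0      # column 9 (eye separation) made to differ
order, gap = rank_slots_by_class_gap(toy_cf, toy_target)
```

#### On the toy data the first entry of `order` is column 9, the one that was shifted. On the real `cf` matrix and `df_train["target"]`, the gaps may well be much smaller.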