In [21]:
import numpy as np
from scipy.stats import multivariate_normal

## Prepare the data

Use the Skin_NonSkin.txt dataset from the [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/datasets/Skin+Segmentation) \[1\].

> The skin dataset is collected by randomly sampling B,G,R values from face images of various age groups (young, middle, and old), race groups (white, black, and asian), and genders obtained from FERET database and PAL database. Total learning sample size is 245057; out of which 50859 is the skin samples and 194198 is non-skin samples. Color FERET Image Database, PAL Face Database from Productive Aging Laboratory, The University of Texas at Dallas.

In [22]:
data = np.loadtxt('Skin_NonSkin.txt')

The fourth column is the class label. data1 holds the skin samples (label 1) and data2 holds the non-skin samples (label 2).

In [23]:
data1 = data[data[:,3]==1,:]
data2 = data[data[:,3]==2,:]

Then split the whole dataset into a training dataset and a testing dataset. I shuffle data1 and data2 and take the last ntest samples from each to form the test dataset; the remaining samples form the training dataset. d1train and d2train are the training sets, d1test and d2test are the testing sets. np.random.seed is used to make the shuffle, and therefore the result, reproducible.

In [24]:
np.random.seed(1)
np.random.shuffle(data1)
np.random.shuffle(data2)
ntest = 1000
d1train = data1[:-ntest,:]
d2train = data2[:-ntest,:]
d1test = data1[-ntest:,:]
d2test = data2[-ntest:,:]
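The split above can also be wrapped in a small helper. The sketch below is only an illustration and not part of the original notebook: the name `split_tail` and the synthetic array are assumptions, and it uses `np.random.default_rng` so that the input array is not shuffled in place.

```python
import numpy as np

def split_tail(samples, ntest, seed=1):
    """Shuffle rows and hold out the last `ntest` rows as a test set.

    Hypothetical helper, not from the original notebook.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(samples)  # shuffled copy along the first axis
    return shuffled[:-ntest], shuffled[-ntest:]

# Synthetic stand-in for data1: 10 rows of (B, G, R, label)
demo = np.arange(40).reshape(10, 4)
train, held_out = split_tail(demo, ntest=3)
print(train.shape, held_out.shape)
```

Every row ends up in exactly one of the two parts, so the training and testing sets never overlap.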

## Create *generative* model

Two things you can play with here \[2\]:

* Color space
* Model

Here I am using the RGB color space (the dataset stores the channels in B, G, R order) and a three-dimensional Gaussian model. u1 and s1 are the mean and covariance matrix of the skin model, u2 and s2 are the mean and covariance matrix of the non-skin model.

In [25]:
u1 = np.average(d1train[:,0:3],axis=0)
u2 = np.average(d2train[:,0:3],axis=0)
s1 = np.cov(d1train[:,0:3].T)
s2 = np.cov(d2train[:,0:3].T)
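As a minimal sketch of the same fitting step, the helper below estimates the mean and covariance of one class. The name `fit_gaussian` and the synthetic samples are assumptions made for illustration. Note that `np.cov` normalizes by N-1 by default, so it returns the unbiased estimate rather than the maximum-likelihood one; with this many samples the difference is negligible.

```python
import numpy as np

def fit_gaussian(x):
    """Return the sample mean and covariance of the rows of `x` (hypothetical helper)."""
    mean = x.mean(axis=0)
    # rowvar=False treats each column as a variable, equivalent to np.cov(x.T)
    cov = np.cov(x, rowvar=False)
    return mean, cov

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3))  # synthetic data, not the skin dataset
mean, cov = fit_gaussian(x)
print(mean.shape, cov.shape)
```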

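Calculate the prior probability over the states as $$Pr\left(w\right)=Bern_w\left[\lambda\right].$$ pw1 is $Pr\left(w=\text{skin}\right)=\lambda$, estimated as the fraction of training samples that are skin. pw2 is $Pr\left(w=\text{non-skin}\right)=1-\lambda$.

In [26]:
# Fraction of all training samples that are skin (not the skin/non-skin ratio)
ntrain = d1train.shape[0] + d2train.shape[0]
pw1 = d1train.shape[0]*1./ntrain
pw2 = 1-pw1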
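For a Bernoulli prior, $\lambda$ is the fraction of training samples that belong to the skin class. As a worked example, the sketch below plugs in the class counts quoted in the dataset description, assuming the loaded file matches those counts and that ntest = 1000 samples are held out from each class. The variable names are illustrative.

```python
# Class counts from the dataset description, minus ntest held out per class
n_skin = 50859 - 1000
n_nonskin = 194198 - 1000

lam = n_skin / (n_skin + n_nonskin)  # Pr(w = skin)
print(round(lam, 4), round(1 - lam, 4))  # prints 0.2051 0.7949
```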

## Calculate the inference

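Compute the class-conditional likelihood of each test pixel under the skin model and under the non-skin model. In llij, i is the true class of the test samples and j is the model, so ll12 is the likelihood of the skin test samples under the non-skin model.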

In [27]:
ll11 = multivariate_normal.pdf(d1test[:,0:3],u1,s1)
ll12 = multivariate_normal.pdf(d1test[:,0:3],u2,s2)
ll21 = multivariate_normal.pdf(d2test[:,0:3],u1,s1)
ll22 = multivariate_normal.pdf(d2test[:,0:3],u2,s2)
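Densities of pixels far from a class mean can become very small, so it is sometimes convenient to work with log-likelihoods instead. The sketch below shows that `multivariate_normal.logpdf` agrees with the logarithm of `pdf`; the mean, covariance, and pixel values are made up for illustration and are not fitted to the skin dataset.

```python
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([120.0, 140.0, 200.0])   # illustrative values only
cov = np.diag([400.0, 300.0, 500.0])     # illustrative values only
pixels = np.array([[118.0, 150.0, 190.0],
                   [10.0, 20.0, 30.0]])

pdf = multivariate_normal.pdf(pixels, mean, cov)
logpdf = multivariate_normal.logpdf(pixels, mean, cov)
print(np.allclose(np.log(pdf), logpdf))
```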

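Calculate the posterior probability according to Bayes' rule, shown as equation 6.14. In the code, d1bayes is the posterior probability of skin for the skin test samples and d2bayes is the posterior probability of non-skin for the non-skin test samples, so both are the probability assigned to the true class.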

$$
Pr\left(w=1|x\right)=\frac{Pr\left(x|w=1\right)Pr\left(w=1\right)}{\sum_{k=0}^1Pr\left(x|w=k\right)Pr\left(w=k\right)}
$$

In [28]:
d1bayes = ll11*pw1/(ll11*pw1+ll12*pw2)
d2bayes = ll22*pw2/(ll21*pw1+ll22*pw2)
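To make the arithmetic concrete, here is a small sketch of Bayes' rule for two classes. The function name `posterior` and all numeric values are assumptions chosen for illustration; the check at the end confirms that the two class posteriors sum to one for every sample.

```python
import numpy as np

def posterior(lik_a, lik_b, prior_a):
    """Posterior of class a given likelihoods under classes a and b (Bayes' rule)."""
    num = lik_a * prior_a
    return num / (num + lik_b * (1.0 - prior_a))

lik_skin = np.array([2e-6, 1e-9])      # made-up likelihood values
lik_nonskin = np.array([1e-6, 4e-7])   # made-up likelihood values
prior_skin = 0.2

p_skin = posterior(lik_skin, lik_nonskin, prior_skin)
p_nonskin = posterior(lik_nonskin, lik_skin, 1.0 - prior_skin)
print(p_skin + p_nonskin)
```

For the first sample the skin posterior is 4e-7 / (4e-7 + 8e-7) = 1/3: the likelihood favors skin by a factor of two, but the prior favors non-skin by a factor of four.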

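Use a threshold to classify a pixel as skin or non-skin. A test sample counts as an error when the posterior probability of its true class falls below the threshold.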

In [29]:
threshold = 0.5
d1err = d1test[d1bayes<threshold,:]
d2err = d2test[d2bayes<threshold,:]
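The threshold trades one kind of error against the other. The sketch below, which treats skin as the positive class, computes both error rates from the skin posterior $Pr\left(w=\text{skin}|x\right)$ at several thresholds. The helper name `error_rates` and the posterior values are made up for illustration.

```python
import numpy as np

def error_rates(p_skin_on_skin, p_skin_on_nonskin, threshold=0.5):
    """Return (false-negative rate, false-positive rate), treating skin as positive."""
    fn = np.mean(p_skin_on_skin < threshold)       # skin pixels rejected
    fp = np.mean(p_skin_on_nonskin >= threshold)   # non-skin pixels accepted
    return fn, fp

p_on_skin = np.array([0.9, 0.8, 0.4, 0.95])      # made-up Pr(skin | x) for skin pixels
p_on_nonskin = np.array([0.1, 0.6, 0.05, 0.2])   # made-up Pr(skin | x) for non-skin pixels

for t in (0.3, 0.5, 0.7):
    print(t, error_rates(p_on_skin, p_on_nonskin, t))
```

Raising the threshold rejects more pixels, which lowers the false-positive rate and raises the false-negative rate.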

In [30]:
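# With skin as the positive class, d1err holds skin pixels classified as non-skin
# (false negatives) and d2err holds non-skin pixels classified as skin (false positives)
print('False negative {}, {}%'.format(d1err.shape[0],d1err.shape[0]*100./ntest))
print('False positive {}, {}%'.format(d2err.shape[0],d2err.shape[0]*100./ntest))

False negative 56, 5.6%
False positive 2, 0.2%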


The error rates also depend on the random seed passed to np.random.seed, because it determines how np.random.shuffle splits the data into training and testing sets.
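One way to see this dependence is to repeat the split with several seeds and compare the resulting error rates. The sketch below does this on synthetic two-class Gaussian data rather than the skin dataset; the function name `skin_miss_rate`, the class means, and the sample sizes are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def skin_miss_rate(skin, nonskin, ntest, seed):
    """Fraction of held-out 'skin' rows with Pr(skin | x) < 0.5 for one split."""
    rng = np.random.default_rng(seed)
    s, n = rng.permutation(skin), rng.permutation(nonskin)
    s_tr, s_te, n_tr = s[:-ntest], s[-ntest:], n[:-ntest]
    prior = len(s_tr) / (len(s_tr) + len(n_tr))
    l_s = multivariate_normal.pdf(s_te, s_tr.mean(axis=0), np.cov(s_tr, rowvar=False))
    l_n = multivariate_normal.pdf(s_te, n_tr.mean(axis=0), np.cov(n_tr, rowvar=False))
    post = l_s * prior / (l_s * prior + l_n * (1 - prior))
    return np.mean(post < 0.5)

# Synthetic two-class data standing in for the skin dataset
gen = np.random.default_rng(0)
skin = gen.normal(loc=[3.0, 3.0, 3.0], scale=1.0, size=(400, 3))
nonskin = gen.normal(loc=[0.0, 0.0, 0.0], scale=1.5, size=(1600, 3))

rates = [skin_miss_rate(skin, nonskin, ntest=100, seed=s) for s in range(5)]
print(rates)
```

The spread of the values in `rates` gives a rough idea of how much a single-seed result can vary.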

# References

1. Bhatt, R., & Dhall, A. Skin Segmentation Dataset. UCI Machine Learning Repository.
2. Vezhnevets, V., Sazonov, V., & Andreeva, A. (2003, September). A survey on pixel-based skin color detection techniques. In Proc. Graphicon (Vol. 3, pp. 85-92).