#**Gower's Similarity Coefficient**

In the previous examples of distance and similarity metrics, the inputs were either continuous or categorical data, but not both. Real-world databases usually contain both, so a cluster analysis regularly has to accommodate mixed data. Gower's Similarity Coefficient is designed for mixed data types, namely databases that contain continuous, ordinal, and categorical variables at the same time. Gower's General Similarity Coefficient $S_{ij}$ compares two cases $i$ and $j$ and is defined as follows:

$$S_{ij}=\frac{\sum_k w_{ijk}S_{ijk}}{\sum_k w_{ijk}}$$

where $S_{ijk}$ denotes the contribution of the $k$-th variable and $w_{ijk}$ is usually 1 or 0, depending on whether the comparison is valid for the $k$-th variable.
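
As a minimal sketch of the formula above in plain Python, the coefficient is a weighted average of the per-variable contributions. The function name `gower_similarity` and the example values are illustrative assumptions, not part of any library.

```python
def gower_similarity(contributions, weights):
    """Weighted average of per-variable contributions S_ijk with weights w_ijk."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no valid comparisons between the two cases")
    return sum(w * s for w, s in zip(weights, contributions)) / total_weight


# Hypothetical contributions for three variables; the third comparison is
# treated as invalid (for example, a missing value), so its weight is 0.
s_ijk = [1.0, 0.5, 0.0]
w_ijk = [1, 1, 0]
similarity = gower_similarity(s_ijk, w_ijk)
print(similarity)  # 0.75
```

Because the third weight is 0, that variable drops out of both the numerator and the denominator, and the result is the mean of the two valid contributions.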

#Ordinal and continuous variables
Gower similarity defines the value of $S_{ijk}$ for ordinal and continuous variables as follows:

$$S_{ijk}=1-\frac{\lvert x_{ik}-x_{jk} \rvert}{r_k}$$

where $r_k$ is the range of values for the *k-th* variable.
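
A small sketch of this contribution in plain Python follows. The helper name `numeric_contribution`, the sample values, and the handling of a zero range are assumptions made for illustration.

```python
def numeric_contribution(x_i, x_j, value_range):
    """S_ijk = 1 - |x_ik - x_jk| / r_k for an ordinal or continuous variable."""
    if value_range == 0:
        # Assumption: a constant variable cannot separate cases,
        # so the two cases are treated as identical on it.
        return 1.0
    return 1.0 - abs(x_i - x_j) / value_range


ages = [21, 21, 19, 30]          # hypothetical sample of the k-th variable
r_k = max(ages) - min(ages)      # range of the variable: 30 - 19 = 11
s = numeric_contribution(21, 30, r_k)
print(round(s, 4))  # 0.1818
```

Two cases at opposite ends of the range get a contribution of 0, and two cases with the same value get 1.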

#Nominal Variables

The value of $S_{ijk}$ for nominal variables is 1 if $x_{ik} = x_{jk}$ and 0 if $x_{ik} \neq x_{jk}$. In other words, $S_{ijk} = 1$ when cases *i* and *j* have the same "state" for attribute $k$ and 0 when they have different "states". The weight $w_{ijk} = 1$ if both cases have observed states for attribute $k$.
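
The nominal rule can be sketched in a few lines of plain Python. The helper name `nominal_contribution` and the use of `None` to mark a missing state are illustrative assumptions.

```python
def nominal_contribution(x_i, x_j):
    """Return (S_ijk, w_ijk) for a nominal variable; None marks a missing state."""
    if x_i is None or x_j is None:
        # The comparison is not valid, so it carries no weight.
        return 0.0, 0
    return (1.0 if x_i == x_j else 0.0), 1


print(nominal_contribution("SINGLE", "SINGLE"))   # (1.0, 1)
print(nominal_contribution("MARRIED", "SINGLE"))  # (0.0, 1)
print(nominal_contribution("WIDOW", None))        # (0.0, 0)
```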

The following code snippet imports the [gower package](https://www.thinkdatascience.com/post/2019-12-16-introducing-python-package-gower/) and uses it to calculate distances. You can also specify weights to determine which variables are included in your distance measure. Again, play with the parameters and see how you get on. Don't forget to leave your thoughts on the comments board.





We will have to install the gower library.

In [1]:
!pip install gower

Collecting gower
  Downloading gower-0.1.2-py3-none-any.whl.metadata (3.7 kB)
Downloading gower-0.1.2-py3-none-any.whl (5.2 kB)
Installing collected packages: gower
Successfully installed gower-0.1.2


This example has mixed variable types.

In [2]:
import numpy as np
import pandas as pd
import gower

# Toy dataset mixing numeric and categorical columns; the last row is entirely missing.
Xd = pd.DataFrame({
    'age': [21, 21, 19, 30, 21, 21, 19, 30, None],
    'gender': ['M', 'M', 'N', 'M', 'F', 'F', 'F', 'F', None],
    'civil_status': ['MARRIED', 'SINGLE', 'SINGLE', 'SINGLE', 'MARRIED', 'SINGLE', 'WIDOW', 'DIVORCED', None],
    'salary': [3000.0, 1200.0, 32000.0, 1800.0, 2900.0, 1100.0, 10000.0, 1500.0, None],
    'has_children': [1, 0, 1, 1, 1, 0, 0, 1, None],
    'available_credit': [2200, 100, 22000, 1100, 2000, 100, 6000, 2200, None],
})

# Compare the row at index 2 (X) with the row at index 1 (Y), each as a 1 x 6 array.
Yd = Xd.iloc[1:2, :]
X = np.asarray(Xd.iloc[2:3, :])
Y = np.asarray(Yd)
print(X)
print(Y)

[[19.0 'N' 'SINGLE' 32000.0 1.0 22000.0]]
[[21.0 'M' 'SINGLE' 1200.0 0.0 100.0]]


In [None]:

# cat_features flags the categorical columns: gender, civil_status and has_children.
gower.gower_dist.gower_matrix(X, Y, cat_features=[False, True, True, False, True, False])


array([[0.8333334]], dtype=float32)
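
The value 0.8333 can be checked by hand. The sketch below is a plain-Python reconstruction, not the package's own code. It assumes that the range $r_k$ of each numeric column is taken over only the two rows passed in, and that the function returns the distance $1 - S_{ij}$ rather than the similarity. Both assumptions are consistent with the output above.

```python
x = [19.0, 'N', 'SINGLE', 32000.0, 1.0, 22000.0]
y = [21.0, 'M', 'SINGLE', 1200.0, 0.0, 100.0]
is_categorical = [False, True, True, False, True, False]

contributions = []
for x_k, y_k, cat in zip(x, y, is_categorical):
    if cat:
        # Nominal rule: 1 if the states match, 0 otherwise.
        s = 1.0 if x_k == y_k else 0.0
    else:
        # Assumption: the range is computed over just these two rows.
        r_k = abs(x_k - y_k)
        s = 1.0 if r_k == 0 else 1.0 - abs(x_k - y_k) / r_k
    contributions.append(s)

similarity = sum(contributions) / len(contributions)  # all weights are 1
distance = 1.0 - similarity
print(contributions)       # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(round(distance, 4))  # 0.8333
```

Only `civil_status` matches, so five of the six variables each contribute a full unit of distance, giving 5/6. With only two rows in the comparison, any numeric difference spans the whole range, which is why `age` counts as maximally different even though the values are 19 and 21.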