# Relationship Example

This notebook shows a simple tool for checking which relationship property holds between two lists.

_______

## Relationship property

The `relationship` property is defined between two lists and takes one of four values:
- **derived** : 1 - n relationship between the two lists (each value of one list is associated with a single value of the other)
- **coupled** : 1 - 1 relationship between the two lists (the association is unique in both directions)
- **crossed** : n - n relationship between the two lists, where every value of the first list is associated with every value of the second list
- **linked** : other cases
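
The four cases can be told apart by counting distinct values. As a minimal sketch (the helper name `distinct_counts` and the toy lists are illustrative assumptions, not part of the notebook), compare the number of distinct pairs with the number of distinct values in each list:

```python
def distinct_counts(list1, list2):
    """Return (distinct pairs, distinct values in list1, distinct values in list2)."""
    return len(set(zip(list1, list2))), len(set(list1)), len(set(list2))

# coupled: the association is unique in both directions,
# so pairs == len1 == len2
coupled = distinct_counts(['a', 'b', 'a'], ['x', 'y', 'x'])

# derived: the second list is determined by the first,
# so pairs == len1 and pairs > len2
derived = distinct_counts(['a', 'b', 'c'], ['x', 'x', 'y'])

# crossed: every combination of values occurs,
# so pairs == len1 * len2
crossed = distinct_counts(['a', 'a', 'b', 'b'], ['x', 'y', 'x', 'y'])

# linked: none of the equalities above holds
linked = distinct_counts(['a', 'a', 'b', 'c'], ['x', 'y', 'x', 'x'])

print(coupled, derived, crossed, linked)
```

The number of distinct pairs is never smaller than either list's distinct count and never larger than their product, which is why these comparisons are enough to separate the cases.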

## Check tool 

The property can be computed with standard Python functions only, by comparing the number of distinct pairs with the number of distinct values in each list.

In [1]:
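def check_relationship(field1, field2):
    """Return a sentence describing the relationship between two lists."""
    # number of distinct (field1, field2) pairs and of distinct values in each list
    dist = len(set(zip(field1, field2)))
    len1 = len(set(field1))
    len2 = len(set(field2))

    # each value of field 1 is associated with a single value of field 2
    if dist == len1 and dist > len2:
        return "field 2 is derived from field 1"
    # each value of field 2 is associated with a single value of field 1
    if dist == len2 and dist > len1:
        return "field 1 is derived from field 2"
    # the association is unique in both directions
    if dist == len1 and dist == len2:
        return "field 2 and field 1 are coupled"
    # every combination of values occurs
    if dist == len1 * len2:
        return "field 2 and field 1 are crossed"
    return "field 1 and field 2 are linked"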

## Fast check tool

This version uses NumPy functions. pandas Series are first converted to their category codes.

In [4]:
import numpy as np
import pandas as pd

def fast_check_relationship(field1, field2):
    """Return a sentence describing the relationship, using NumPy for the counts."""
    # pandas Series are replaced by their integer category codes
    field1 = field1.astype('category').cat.codes if isinstance(field1, pd.Series) else field1
    field2 = field2.astype('category').cat.codes if isinstance(field2, pd.Series) else field2

    # two-column array with one (field1, field2) pair per row
    f1 = np.array(field1)
    f2 = np.array(field2)
    f1f2 = np.column_stack((f1, f2))

    # number of distinct pairs (unique rows) and of distinct values in each field
    dist = len(np.unique(f1f2, axis=0))
    len1 = len(np.unique(f1))
    len2 = len(np.unique(f2))

    if dist == len1 and dist > len2:
        return "field 2 is derived from field 1"
    if dist == len2 and dist > len1:
        return "field 1 is derived from field 2"
    if dist == len1 and dist == len2:
        return "field 2 and field 1 are coupled"
    if dist == len1 * len2:
        return "field 2 and field 1 are crossed"
    return "field 1 and field 2 are linked"
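
The same counts can also be taken with pandas alone. The sketch below is an illustrative alternative, not the notebook's method: the name `pandas_check_relationship` is hypothetical, and it assumes both inputs have the same length. It relies on `DataFrame.drop_duplicates` for the distinct pairs and `Series.nunique` for the distinct values.

```python
import pandas as pd

def pandas_check_relationship(field1, field2):
    """Hypothetical variant: same decision rules, counts taken with pandas."""
    df = pd.DataFrame({'f1': list(field1), 'f2': list(field2)})
    dist = len(df.drop_duplicates())        # distinct (f1, f2) rows
    len1 = df['f1'].nunique(dropna=False)   # distinct values, NaN counted as a value
    len2 = df['f2'].nunique(dropna=False)

    if dist == len1 and dist > len2:
        return "field 2 is derived from field 1"
    if dist == len2 and dist > len1:
        return "field 1 is derived from field 2"
    if dist == len1 and dist == len2:
        return "field 2 and field 1 are coupled"
    if dist == len1 * len2:
        return "field 2 and field 1 are crossed"
    return "field 1 and field 2 are linked"

names = ['john', 'paul', 'leah', 'paul', 'paul', 'john']
nicknames = ['jock', 'paulo', 'lili', 'paulo', 'paulo', 'jock']
print(pandas_check_relationship(names, nicknames))
```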

## Example


|Quarter|Month|Name|Nickname|Year|Semester|
|:---:|:---:|:---:|:---:|:---:|:---:|
|T1 |jan|john|jock |2020|s1|
|T2 |apr|paul|paulo|2020|s2|
|T2 |jun|leah|lili |2021|s1|
|T1 |feb|paul|paulo|2021|s2|
|T2 |may|paul|paulo|2022|s1|
|T1 |jan|john|jock |2022|s2|

    
In this example, each column is treated as a list, and each list has a specific relationship with each of the others.

In [2]:
example = [ [  'T1',    'T2',   'T2',    'T1',    'T2',   'T1'],
            [ 'jan',   'apr',  'jun',   'feb',   'may',  'jan'],
            ['john',  'paul', 'leah',  'paul',  'paul', 'john'],
            ['jock', 'paulo', 'lili', 'paulo', 'paulo', 'jock'],
            [  2020,    2020,   2021,    2021,    2022,   2022],
            [  's1',    's2',   's1',    's2',    's1',   's2']]

## Test with simple data

Applying the `check_relationship` function to the example above gives the following results:

In [3]:
print(check_relationship(example[0], example[1]))  #field 1 (quarter) is derived from field 2 (month)
print(check_relationship(example[2], example[3]))  #field 2 (nickname) and field 1 (name) are coupled
print(check_relationship(example[4], example[5]))  #field 2 (semester) and field 1 (year) are crossed
print(check_relationship(example[1], example[4]))  #field 1 (month) and field 2 (year) are linked

field 1 is derived from field 2
field 2 and field 1 are coupled
field 2 and field 1 are crossed
field 1 and field 2 are linked


## Test with pandas data

In [15]:
example_dic = {'field' + str(ind) : field for ind, field in enumerate(example) }
example_df = pd.DataFrame(example_dic)
example_df

| |field0|field1|field2|field3|field4|field5|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|0|T1|jan|john|jock|2020|s1|
|1|T2|apr|paul|paulo|2020|s2|
|2|T2|jun|leah|lili|2021|s1|
|3|T1|feb|paul|paulo|2021|s2|
|4|T2|may|paul|paulo|2022|s1|
|5|T1|jan|john|jock|2022|s2|


In [16]:
print(fast_check_relationship(example_df['field0'], example_df['field1']))  #field 1 (quarter) is derived from field 2 (month)
print(fast_check_relationship(example_df['field2'], example_df['field3']))  #field 2 (nickname) and field 1 (name) are coupled
print(fast_check_relationship(example_df['field4'], example_df['field5']))  #field 2 (semester) and field 1 (year) are crossed
print(fast_check_relationship(example_df['field1'], example_df['field4']))  #field 1 (month) and field 2 (year) are linked

field 1 is derived from field 2
field 2 and field 1 are coupled
field 2 and field 1 are crossed
field 1 and field 2 are linked
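
The same check can be run over every pair of columns in one pass. The following is a minimal sketch under stated assumptions: the helper `relationship` and the column names (taken from the example table's headers) are illustrative, and the helper returns only the relationship keyword rather than the sentences used above.

```python
from itertools import combinations

import pandas as pd

def relationship(field1, field2):
    """Return 'derived', 'coupled', 'crossed' or 'linked' (hypothetical helper)."""
    dist = len(set(zip(field1, field2)))
    len1 = len(set(field1))
    len2 = len(set(field2))
    if dist == len1 and dist == len2:
        return 'coupled'
    if dist == len1 or dist == len2:   # only one of the two equalities holds here
        return 'derived'
    if dist == len1 * len2:
        return 'crossed'
    return 'linked'

example_df = pd.DataFrame({
    'quarter':  ['T1', 'T2', 'T2', 'T1', 'T2', 'T1'],
    'month':    ['jan', 'apr', 'jun', 'feb', 'may', 'jan'],
    'name':     ['john', 'paul', 'leah', 'paul', 'paul', 'john'],
    'nickname': ['jock', 'paulo', 'lili', 'paulo', 'paulo', 'jock'],
    'year':     [2020, 2020, 2021, 2021, 2022, 2022],
    'semester': ['s1', 's2', 's1', 's2', 's1', 's2'],
})

# relationship keyword for each unordered pair of columns
pairs = {(a, b): relationship(example_df[a], example_df[b])
         for a, b in combinations(example_df.columns, 2)}

for (a, b), rel in pairs.items():
    print(f'{a:>8} - {b:<8} : {rel}')
```

Returning a keyword drops the direction of a derived relationship. Keeping the original sentences, or returning the counts alongside the keyword, would preserve it.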
