# Relationship Example

This notebook shows a simple tool for checking the relationship property between two lists.

_______

## Relationship property

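The `relationship property` is defined between two lists whose values are paired by position. It takes one of four values:
- **derived**: 1-n relationship between the two lists. Each value of one list is associated with a single value of the other list, and some value of the other list is associated with several values.
- **coupled**: 1-1 relationship between the two lists. Each value of either list is associated with a single value of the other list.
- **crossed**: n-n relationship between the two lists, where every value of the first list is associated with every value of the second list.
- **linked**: all other cases.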
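The four definitions can also be checked directly. For each value, collect the set of values it is paired with in the other list. The sketch below assumes hashable values paired by position. The helper name `relationship_from_definitions` and its short return labels are hypothetical and are not part of the tool described here.

```python
from collections import defaultdict


def relationship_from_definitions(field1, field2):
    """Hypothetical helper: classify two lists by testing the definitions directly."""
    pairs = set(zip(field1, field2))  # distinct (field1, field2) pairs
    partners1 = defaultdict(set)  # value of field1 -> associated values of field2
    partners2 = defaultdict(set)  # value of field2 -> associated values of field1
    for v1, v2 in pairs:
        partners1[v1].add(v2)
        partners2[v2].add(v1)

    # True when every value has exactly one partner in the other list
    one1 = all(len(s) == 1 for s in partners1.values())
    one2 = all(len(s) == 1 for s in partners2.values())

    if one1 and not one2:
        return "derived (field 2 from field 1)"
    if one2 and not one1:
        return "derived (field 1 from field 2)"
    if one1 and one2:
        return "coupled"
    if len(pairs) == len(partners1) * len(partners2):
        return "crossed"  # every combination of values is present
    return "linked"


print(relationship_from_definitions(["T1", "T2", "T2", "T1"], ["jan", "apr", "jun", "feb"]))
```

A value that appears in exactly one distinct pair has exactly one partner. These conditions are therefore equivalent to the count comparisons made by the check tool, and they are tested in the same order.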

## Check tool 

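The `check_relationship` function below calculates the property from three counts: the number of distinct `(field1, field2)` pairs and the number of distinct values in each list.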

Note: counting the distinct pairs with `len(set(zip(field1, field2)))` is faster than the following alternatives:
- `len(pd.Series(zip(series1, series2)).astype('category').cat.categories)`
- `len(np.unique(np.column_stack((numpy1, numpy2)), axis=0))`
- `len(np.unique(np.fromiter(zip(series1, series2), dtype='object')))`
- `len(df1[[name_field1, name_field2]].apply(tuple, axis=1).astype('category').cat.categories)`
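A `timeit` run is one rough way to compare two of these approaches on your own data. The sketch below makes some assumptions:

- It uses randomly generated integer codes as hypothetical data.
- It times only the `set(zip(...))` and `np.unique(..., axis=0)` variants.
- It first checks that both variants count the same pairs.

Timings depend on data size, dtypes and hardware, so no figures are given here.

```python
import timeit

import numpy as np

# Hypothetical benchmark data: two columns of 100_000 random integer codes.
rng = np.random.default_rng(0)
codes1 = rng.integers(0, 50, size=100_000)
codes2 = rng.integers(0, 20, size=100_000)
list1, list2 = codes1.tolist(), codes2.tolist()


def count_with_set():
    # Distinct (value1, value2) pairs via a Python set of tuples.
    return len(set(zip(list1, list2)))


def count_with_unique():
    # Distinct rows of a two-column array via np.unique(axis=0).
    return len(np.unique(np.column_stack((codes1, codes2)), axis=0))


assert count_with_set() == count_with_unique()  # both count the same pairs

for func in (count_with_set, count_with_unique):
    seconds = timeit.timeit(func, number=10)
    print(f"{func.__name__}: {seconds:.3f} s for 10 runs")
```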

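```python
import numpy as np
import pandas as pd

def check_relationship(field1, field2):
    """Return the relationship property between two lists (or pandas Series)."""
    # pandas Series are replaced by their integer category codes (missing values become -1)
    field1 = list(field1.astype('category').cat.codes) if isinstance(field1, pd.Series) else field1
    field2 = list(field2.astype('category').cat.codes) if isinstance(field2, pd.Series) else field2

    dist = len(set(zip(field1, field2)))  # number of distinct (field1, field2) pairs
    len1 = len(set(field1))               # number of distinct values in field1
    len2 = len(set(field2))               # number of distinct values in field2

    # each field1 value has one field2 value, and some field2 value has several field1 values
    if dist == len1 and dist > len2:
        return "field 2 is derived from field 1"
    # symmetric case
    if dist == len2 and dist > len1:
        return "field 1 is derived from field 2"
    # each value on either side has a single partner
    if dist == len1 and dist == len2:
        return "field 2 and field 1 are coupled"
    # every combination of values is present
    if dist == len1 * len2:
        return "field 2 and field 1 are crossed"
    return "field 1 and field 2 are linked"
```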
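For pandas input, the function first replaces each Series by its category codes. The sketch below uses hypothetical data with one missing value to show what that conversion produces:

- The categories are the sorted distinct non-missing values.
- Each value becomes the integer position of its category.
- Missing values get the code `-1`, so they all count as a single distinct value.

```python
import numpy as np
import pandas as pd

quarters = pd.Series(["T1", "T2", "T2", np.nan, "T1"])  # hypothetical data

categorical = quarters.astype("category")
categories = list(categorical.cat.categories)  # sorted distinct non-missing values
codes = categorical.cat.codes.tolist()         # integer position of each value's category

print(categories)      # ['T1', 'T2']
print(codes)           # [0, 1, 1, -1, 0]
print(len(set(codes)))  # 3: the missing value counts as one distinct value
```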

## Example


|Quarter|Month|Name|Nickname|Year|Semester|
|:---:|:---:|:---:|:---:|:---:|:---:|
|T1|jan|john|jock|2020|s1|
|T2|apr|paul|paulo|2020|s2|
|T2|jun|leah|lili|2021|s1|
|T1|feb|paul|paulo|2021|s2|
|T2|may|paul|paulo|2022|s1|
|T1|jan|john|jock|2022|s2|

    
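In this example, each of the four relationship values appears between one pair of lists:

- quarter and month (derived)
- name and nickname (coupled)
- year and semester (crossed)
- month and year (linked)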

```python
example = { 'quarter':  [  'T1',    'T2',   'T2',    'T1',    'T2',   'T1'],
            'month':    [ 'jan',   'apr',  'jun',   'feb',   'may',  'jan'],
            'name':     ['john',  'paul', 'leah',  'paul',  'paul', 'john'],
            'nickname': ['jock', 'paulo', 'lili', 'paulo', 'paulo', 'jock'],
            'year':     [  2020,    2020,   2021,    2021,    2022,   2022],
            'semester': [  's1',    's2',   's1',    's2',    's1',   's2'] }
```

## Test with simple data

Applying the `check_relationship` function to four pairs of lists from the example gives the following results:

```python
print(check_relationship(example['quarter'], example['month']   ))  # field 1 (quarter) is derived from field 2 (month)
print(check_relationship(example['name'],    example['nickname']))  # field 2 (nickname) and field 1 (name) are coupled
print(check_relationship(example['year'],    example['semester']))  # field 2 (semester) and field 1 (year) are crossed
print(check_relationship(example['month'],   example['year']    ))  # field 1 (month) and field 2 (year) are linked
```

```text
field 1 is derived from field 2
field 2 and field 1 are coupled
field 2 and field 1 are crossed
field 1 and field 2 are linked
```
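These results follow from the three counts compared in `check_relationship`. The sketch below recomputes those counts for the same four pairs. The helper `counts` is hypothetical, and the `example` dictionary is repeated so the block runs on its own.

```python
example = {
    "quarter": ["T1", "T2", "T2", "T1", "T2", "T1"],
    "month": ["jan", "apr", "jun", "feb", "may", "jan"],
    "name": ["john", "paul", "leah", "paul", "paul", "john"],
    "nickname": ["jock", "paulo", "lili", "paulo", "paulo", "jock"],
    "year": [2020, 2020, 2021, 2021, 2022, 2022],
    "semester": ["s1", "s2", "s1", "s2", "s1", "s2"],
}


def counts(field1, field2):
    """Hypothetical helper returning the three quantities compared by check_relationship."""
    dist = len(set(zip(field1, field2)))  # distinct (field1, field2) pairs
    return dist, len(set(field1)), len(set(field2))


for name1, name2 in [("quarter", "month"), ("name", "nickname"),
                     ("year", "semester"), ("month", "year")]:
    dist, len1, len2 = counts(example[name1], example[name2])
    print(f"{name1}/{name2}: dist={dist}, len1={len1}, len2={len2}, len1*len2={len1 * len2}")
```

The counts explain each result:

- **quarter/month**: dist=5, len1=2, len2=5. Since dist == len2 > len1, field 1 is derived from field 2.
- **name/nickname**: all three counts are 3, so the lists are coupled.
- **year/semester**: dist=6 == 3 × 2, so the lists are crossed.
- **month/year**: dist=6 matches none of the conditions, so the lists are linked.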


## Test with pandas data

```python
example_df = pd.DataFrame(example)
example_df
```

| |quarter|month|name|nickname|year|semester|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|0|T1|jan|john|jock|2020|s1|
|1|T2|apr|paul|paulo|2020|s2|
|2|T2|jun|leah|lili|2021|s1|
|3|T1|feb|paul|paulo|2021|s2|
|4|T2|may|paul|paulo|2022|s1|
|5|T1|jan|john|jock|2022|s2|


```python
print(check_relationship(example_df['quarter'], example_df['month']))  # field 1 (quarter) is derived from field 2 (month)
print(check_relationship(example_df['name'], example_df['nickname']))  # field 2 (nickname) and field 1 (name) are coupled
print(check_relationship(example_df['year'], example_df['semester']))  # field 2 (semester) and field 1 (year) are crossed
print(check_relationship(example_df['month'], example_df['year']))  # field 1 (month) and field 2 (year) are linked
```

```text
field 1 is derived from field 2
field 2 and field 1 are coupled
field 2 and field 1 are crossed
field 1 and field 2 are linked
```
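When both lists are columns of the same DataFrame, the counts can also be taken with pandas methods instead of category codes. The sketch below is a hypothetical variant, `check_relationship_df`, and is not part of the original tool. It works as follows:

- `drop_duplicates` gives the distinct pairs.
- `nunique(dropna=False)` gives the distinct values, so missing values again count as one value.
- It returns the same messages as `check_relationship`.

```python
import pandas as pd


def check_relationship_df(df, col1, col2):
    """Hypothetical pandas-only variant of check_relationship for two DataFrame columns."""
    # Distinct (col1, col2) rows; drop_duplicates treats missing values as equal.
    dist = len(df[[col1, col2]].drop_duplicates())
    # Distinct values per column; dropna=False counts missing values as one value.
    len1 = df[col1].nunique(dropna=False)
    len2 = df[col2].nunique(dropna=False)

    if dist == len1 and dist > len2:
        return "field 2 is derived from field 1"
    if dist == len2 and dist > len1:
        return "field 1 is derived from field 2"
    if dist == len1 and dist == len2:
        return "field 2 and field 1 are coupled"
    if dist == len1 * len2:
        return "field 2 and field 1 are crossed"
    return "field 1 and field 2 are linked"


df = pd.DataFrame({
    "quarter": ["T1", "T2", "T2", "T1", "T2", "T1"],
    "month": ["jan", "apr", "jun", "feb", "may", "jan"],
    "year": [2020, 2020, 2021, 2021, 2022, 2022],
    "semester": ["s1", "s2", "s1", "s2", "s1", "s2"],
})
print(check_relationship_df(df, "quarter", "month"))
print(check_relationship_df(df, "year", "semester"))
```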
