# Subject: example of specifying the "relationship" property
## Goal
- show, on a real example, how to specify the links between fields
- identify what a tool for analyzing these links could contribute

## Presentation of the example
The example uses the IRVE file of electric vehicle (EV) charging stations (data used: https://static.data.gouv.fr/resources/fichier-consolide-des-bornes-de-recharge-pour-vehicules-electriques/20220629-080611/consolidation-etalab-schema-irve-v-2.0.2-20220628.csv).

The IRVE file lists charging stations and contains in particular:
- for a station: an Id, a name, an address and coordinates
- for each station several charging points identified by an Id_pdc 
- an operator for each station 

Only a few rows and columns have been extracted for the example (table below for 4 stations):

| nom_operateur | id_station_itinerance | nom_station | adresse_station | coordonneesXY | id_pdc_itinerance |
|:----|:----|:----|:----|:----|:----|
| SEVDEC | FRSEVP1SCH01 | SCH01 | 151 Rue d'Uelzen 76230 Bois-Guillaume | [1.106329, 49.474202] | FRSEVE1SCH0101 |
| SEVDEC | FRSEVP1SCH03 | SCH03 | 151 Rue d'Uelzen 76230 Bois-Guillaume | [1.106329, 49.474202] | FRSEVE1SCH0301 |
| SEVDEC | FRSEVP1SCH02 | SCH02 | 151 Rue d'Uelzen 76230 Bois-Guillaume | [1.106329, 49.474202] | FRSEVE1SCH0201 |
| Sodetrel | FRS35PSD35711 | RENNES - PLACE HONORE COMMEREUC | 13 Place Honoré Commeurec 35000 Rennes | [-1.679739, 48.108482] | FRS35ESD357111 |
| Sodetrel | FRS35PSD35712 | RENNES - PLACE HONORE COMMEREUC | 13 Place Honoré Commeurec 35000 Rennes | [-1.679739, 48.108482] | FRS35ESD357112 |
| Virta | FRE10E30333 | Camping Arinella | Route de la mer, Brushetto - 20240 Ghisonaccia | [9.445075, 41.995246] | FRE10E30333 |
| Virta | FRE10E20923 | Camping Arinella | Route de la mer, Brushetto - 20240 Ghisonaccia | [9.445073, 41.995246] | FRE10E20923 |
| Virta | FRE10P20922 | Camping Arinella | Route de la mer, Brushetto - 20240 Ghisonaccia | [9.445072, 41.995246] | FRE10P20922 |
| Virta | FRE10P20921 | Camping Arinella | Route de la mer, Brushetto - 20240 Ghisonaccia | [9.445071, 41.995246] | FRE10P20921 |
| DEBELEC | FRSGAP1M2026 | M2026 | 2682 Boulevard François Xavier Fafeur 11000 Carcassonne | [2.298185, 43.212574] | FRSGAE1M202603 |
| DEBELEC | FRSGAP1M2026 | M2026 | 2682 Boulevard François Xavier Fafeur 11000 Carcassonne | [2.298185, 43.212574] | FRSGAE1M202602 |
| DEBELEC | FRSGAP1M2026 | M2026 | 2682 Boulevard François Xavier Fafeur 11000 Carcassonne | [2.298185, 43.212574] | FRSGAE1M202601 |

The extract contains a few errors:
- the id and name of the station operated by SEVDEC differ for each charging point,
- the id of the station operated by Sodetrel also differs for each charging point,
- the coordinates and ids of the Virta station also vary from one charging point to the next.

## Improvement of the specification
The errors found could be avoided by defining the dependency rules between columns according to the data model associated with the table. 

There are three entities: 
- the operator who can operate several stations (a single field: nom_operateur)
- the stations, which contain several charging points (four fields: id_station_itinerance, nom_station, adresse_station, coordonneesXY),
- the charging points (a single field: id_pdc_itinerance)

This data model results in the following specifications: 
- the nom_operateur field is derived from the id_station_itinerance field (1-n relationship)
- the id_station_itinerance field is derived from the id_pdc_itinerance field (1-n relationship)
- the nom_station, adresse_station and coordonneesXY fields are coupled to the id_station_itinerance field (1-1 relationship)
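The two kinds of link named above can be illustrated with plain Python lists. This is a minimal sketch, not the tool used later in this notebook: the function names (`determines`, `is_derived`, `is_coupled`) and the toy values are assumptions introduced for illustration. A field is treated as derived from its parent when each parent value fixes a single child value, and as coupled when this holds in both directions.

```python
def determines(source, target):
    """Return True if each distinct source value is paired with a single target value."""
    seen = {}
    for s, t in zip(source, target):
        # setdefault stores t the first time s is met, then returns the stored value
        if seen.setdefault(s, t) != t:
            return False
    return True


def is_derived(child, parent):
    """'derived' link (1-n): the parent value fixes the child value."""
    return determines(parent, child)


def is_coupled(a, b):
    """'coupled' link (1-1): each field fixes the other."""
    return determines(a, b) and determines(b, a)


# Toy columns (hypothetical values): three charging points, two stations, one operator
id_pdc = ["P1", "P2", "P3"]
id_station = ["S1", "S1", "S2"]
nom_station = ["North", "North", "South"]
nom_operateur = ["OpA", "OpA", "OpA"]

print(is_derived(id_station, id_pdc))         # station derived from charging point
print(is_derived(nom_operateur, id_station))  # operator derived from station
print(is_coupled(nom_station, id_station))    # station name coupled to station id
print(is_coupled(nom_operateur, id_station))  # one operator, two stations: not coupled
```

In this toy data the operator is derived from the station but not coupled to it, because a single operator runs two stations.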

These specifications translate into "relationship" properties for each of the fields:

```
"name": "nom_operateur",
"relationship": {
    "parent": "id_station_itinerance",
    "link": "derived"
},
"name": "id_station_itinerance",
"relationship": {
    "parent": "id_pdc_itinerance",
    "link": "derived"
},
"name": "nom_station",
"relationship": {
    "parent": "id_station_itinerance",
    "link": "coupled"
},
"name": "adresse_station",
"relationship": {
    "parent": "id_station_itinerance",
    "link": "coupled"
},
"name": "coordonneesXY",
"relationship": {
    "parent": "id_station_itinerance",
    "link": "coupled"
}
```
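As an illustration of how such properties could drive an automatic check, the sketch below stores the first two entries of the block above as Python dictionaries and evaluates them against a few rows of the example table. The helper names (`determines`, `check_relationships`) and the `SPEC` and `ROWS` variables are hypothetical; the handling of a "coupled" link assumes it means a dependency in both directions.

```python
def determines(rows, source, target):
    """True if each value of the source field is paired with a single target value."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[source], row[target]) != row[target]:
            return False
    return True


def check_relationships(rows, spec):
    """Return {field name: True/False} for every relationship in the spec."""
    results = {}
    for field in spec:
        name = field["name"]
        parent = field["relationship"]["parent"]
        link = field["relationship"]["link"]
        ok = determines(rows, parent, name)  # the parent fixes the field
        if link == "coupled":                # assumed: coupled also needs the reverse
            ok = ok and determines(rows, name, parent)
        results[name] = ok
    return results


SPEC = [
    {"name": "nom_operateur",
     "relationship": {"parent": "id_station_itinerance", "link": "derived"}},
    {"name": "id_station_itinerance",
     "relationship": {"parent": "id_pdc_itinerance", "link": "derived"}},
]

# Four rows of the example table, restricted to the three fields used by SPEC
ROWS = [
    {"nom_operateur": "SEVDEC", "id_station_itinerance": "FRSEVP1SCH01",
     "id_pdc_itinerance": "FRSEVE1SCH0101"},
    {"nom_operateur": "SEVDEC", "id_station_itinerance": "FRSEVP1SCH03",
     "id_pdc_itinerance": "FRSEVE1SCH0301"},
    {"nom_operateur": "DEBELEC", "id_station_itinerance": "FRSGAP1M2026",
     "id_pdc_itinerance": "FRSGAE1M202603"},
    {"nom_operateur": "DEBELEC", "id_station_itinerance": "FRSGAP1M2026",
     "id_pdc_itinerance": "FRSGAE1M202602"},
]

print(check_relationships(ROWS, SPEC))
```

Both "derived" links hold on these rows, since a derived link only requires that a station never appears with two different operators and that a charging point never appears with two different stations.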


------
## Specification check tool example

- a csv file is populated with the above table
- an 'Ilist' object is initialized with this file


In [1]:
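from pprint import pprint
import os

# Move to the directory that contains the ilist module
# (path specific to the author's environment)
print(os.getcwd())
os.chdir('../../../Environnemental-Sensing/python/ES')
from ilist import Ilist

# Location of the example CSV file (also specific to the author's environment)
chemin = 'C:/Users/a179227/OneDrive - Alliance/perso Wx/ES standard/python ESstandard/validation/irve/'
file = chemin + 'IRVE_example.csv'

# Load the table into an Ilist, then print the row count and the field list
irve = Ilist.from_csv(file, header=True, optcsv=None)
print('row number : ', len(irve))
print('fields list : ')
pprint(irve.indexinfos(keys=['num', 'name'], base=True), width=120)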

C:\Users\a179227\OneDrive - Alliance\perso Wx\ES standard\python ESstandard\validation\irve
row number :  12
fields list : 
[{'name': 'nom_operateur', 'num': 0},
 {'name': 'id_station_itinerance', 'num': 1},
 {'name': 'nom_station', 'num': 2},
 {'name': 'adresse_station', 'num': 3},
 {'name': 'coordonneesXY', 'num': 4},
 {'name': 'id_pdc_itinerance', 'num': 5}]


## Initial check
In the chosen example each operator runs a single station, so the relationship between operator and station must be 'coupled' rather than 'derived'.

Only one relationship holds in the raw data: the one between id_station and id_pdc.


In [2]:
operateur, id_station, nom_station, adresse, coord, id_pdc = irve.lindex
print('operateur is coupled with id_station : ', id_station.iscoupled(operateur))
print('id_station is derived from id_pdc : ', id_station.isderived(id_pdc))
print('nom_station is coupled with id_station : ', nom_station.iscoupled(id_station))
print('adresse_station is coupled with id_station : ', adresse.iscoupled(id_station))
print('coordonneesXY is coupled with id_station : ', coord.iscoupled(id_station))


operateur is coupled with id_station :  False
id_station is derived from id_pdc :  True
nom_station is coupled with id_station :  False
adresse_station is coupled with id_station :  False
coordonneesXY is coupled with id_station :  False


----
## Application of an imposed structure
Records that are inconsistent with a defined data structure can also be identified.

In this example, the columns can be grouped into two entities, which amounts to treating the columns as attributes of each entity: the stations (columns 0 to 4) and the charging points (column 5).

To identify the inconsistent data, we impose the couplings (see the cell below).


In [3]:
id_station.coupling(operateur, derived=False)
id_pdc.coupling(id_station)
id_station.coupling(nom_station, derived=False)
id_station.coupling(adresse, derived=False)
id_station.coupling(coord, derived=False)
pprint(irve.indexinfos(keys=['num', 'name', 'parent', 'typecoupl']), width=120)

[{'name': 'nom_operateur', 'num': 0, 'parent': 5, 'typecoupl': 'derived'},
 {'name': 'id_station_itinerance', 'num': 1, 'parent': 0, 'typecoupl': 'coupled'},
 {'name': 'nom_station', 'num': 2, 'parent': 0, 'typecoupl': 'coupled'},
 {'name': 'adresse_station', 'num': 3, 'parent': 0, 'typecoupl': 'coupled'},
 {'name': 'coordonneesXY', 'num': 4, 'parent': 0, 'typecoupl': 'coupled'},
 {'name': 'id_pdc_itinerance', 'num': 5, 'parent': 5, 'typecoupl': 'crossed'}]


## Checking against the imposed structure
Imposing the structure generates additional data, which the 'getduplicates' function checks.

A new column is added, with the value True when a record respects the structure and False otherwise. In this example, only the last three records, which correspond to the operator DEBELEC, are correct.

Note: for more detail, one column could be added for each of the defined couplings.

In [4]:
duplic = irve.getduplicates(irve.lname, '$filter')
print(irve.lidx[6].val)

[False, False, False, False, False, False, False, False, False, True, True, True]
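The same flags can be reproduced without the library. The sketch below is not how `getduplicates` is implemented; it is a standalone illustration that assumes a record respects the structure when, for every coupling with the station id, its values are unambiguous in both directions. The function names `unambiguous` and `row_flags` are hypothetical, and the address column is left out for brevity.

```python
from collections import defaultdict


def unambiguous(source, target):
    """For each row, True if its source value is paired with a single target value."""
    groups = defaultdict(set)
    for s, t in zip(source, target):
        groups[s].add(t)
    return [len(groups[s]) == 1 for s in source]


def row_flags(reference, others):
    """True for rows consistent with a 1-1 coupling between reference and each other column."""
    flags = [True] * len(reference)
    for other in others:
        forward = unambiguous(reference, other)
        backward = unambiguous(other, reference)
        flags = [f and a and b for f, a, b in zip(flags, forward, backward)]
    return flags


# Columns copied from the example table (adresse_station omitted)
nom_operateur = ["SEVDEC"] * 3 + ["Sodetrel"] * 2 + ["Virta"] * 4 + ["DEBELEC"] * 3
id_station = [
    "FRSEVP1SCH01", "FRSEVP1SCH03", "FRSEVP1SCH02",
    "FRS35PSD35711", "FRS35PSD35712",
    "FRE10E30333", "FRE10E20923", "FRE10P20922", "FRE10P20921",
    "FRSGAP1M2026", "FRSGAP1M2026", "FRSGAP1M2026",
]
nom_station = (["SCH01", "SCH03", "SCH02"]
               + ["RENNES - PLACE HONORE COMMEREUC"] * 2
               + ["Camping Arinella"] * 4
               + ["M2026"] * 3)
coord = (["[1.106329, 49.474202]"] * 3
         + ["[-1.679739, 48.108482]"] * 2
         + ["[9.445075, 41.995246]", "[9.445073, 41.995246]",
            "[9.445072, 41.995246]", "[9.445071, 41.995246]"]
         + ["[2.298185, 43.212574]"] * 3)

print(row_flags(id_station, [nom_operateur, nom_station, coord]))
```

Under this assumption only the three DEBELEC records are flagged True, in line with the output above.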


----
## Data correction
The corrections to be made to comply with the specification could be as follows:
- field id_station_itinerance: FRSEVP1SCH (first 3 rows), FRS35PSD35711 (next 2 rows), FRE10E2092 (next 4 rows)
- field nom_station: SCH (first 3 rows)
- field coordonneesXY: [9.445071, 41.995246] (rows 6 to 8)

The corrected table would therefore be:

| nom_operateur | id_station_itinerance | nom_station | adresse_station | coordonneesXY | id_pdc_itinerance |
|:----|:----|:----|:----|:----|:----|
| SEVDEC | FRSEVP1SCH | SCH | 151 Rue d'Uelzen 76230 Bois-Guillaume | [1.106329, 49.474202] | FRSEVE1SCH0101 |
| SEVDEC | FRSEVP1SCH | SCH | 151 Rue d'Uelzen 76230 Bois-Guillaume | [1.106329, 49.474202] | FRSEVE1SCH0301 |
| SEVDEC | FRSEVP1SCH | SCH | 151 Rue d'Uelzen 76230 Bois-Guillaume | [1.106329, 49.474202] | FRSEVE1SCH0201 |
| Sodetrel | FRS35PSD35711 | RENNES - PLACE HONORE COMMEREUC | 13 Place Honoré Commeurec 35000 Rennes | [-1.679739, 48.108482] | FRS35ESD357111 |
| Sodetrel | FRS35PSD35711 | RENNES - PLACE HONORE COMMEREUC | 13 Place Honoré Commeurec 35000 Rennes | [-1.679739, 48.108482] | FRS35ESD357112 |
| Virta | FRE10E2092 | Camping Arinella | Route de la mer, Brushetto - 20240 Ghisonaccia | [9.445071, 41.995246] | FRE10E30333 |
| Virta | FRE10E2092 | Camping Arinella | Route de la mer, Brushetto - 20240 Ghisonaccia | [9.445071, 41.995246] | FRE10E20923 |
| Virta | FRE10E2092 | Camping Arinella | Route de la mer, Brushetto - 20240 Ghisonaccia | [9.445071, 41.995246] | FRE10P20922 |
| Virta | FRE10E2092 | Camping Arinella | Route de la mer, Brushetto - 20240 Ghisonaccia | [9.445071, 41.995246] | FRE10P20921 |
| DEBELEC | FRSGAP1M2026 | M2026 | 2682 Boulevard François Xavier Fafeur 11000 Carcassonne | [2.298185, 43.212574] | FRSGAE1M202603 |
| DEBELEC | FRSGAP1M2026 | M2026 | 2682 Boulevard François Xavier Fafeur 11000 Carcassonne | [2.298185, 43.212574] | FRSGAE1M202602 |
| DEBELEC | FRSGAP1M2026 | M2026 | 2682 Boulevard François Xavier Fafeur 11000 Carcassonne | [2.298185, 43.212574] | FRSGAE1M202601 |

In [5]:
id_station.setvalue(0, 'FRSEVP1SCH')
id_station.setvalue(1, 'FRSEVP1SCH')
id_station.setvalue(2, 'FRSEVP1SCH')
id_station.setvalue(3, 'FRS35PSD35711')
id_station.setvalue(4, 'FRS35PSD35711')
id_station.setvalue(5, 'FRE10E2092')
id_station.setvalue(6, 'FRE10E2092')
id_station.setvalue(7, 'FRE10E2092')
id_station.setvalue(8, 'FRE10E2092')
nom_station.setvalue(0, 'SCH')
nom_station.setvalue(1, 'SCH')
nom_station.setvalue(2, 'SCH')
coord.setvalue(5, '[9.445071, 41.995246]')
coord.setvalue(6, '[9.445071, 41.995246]')
coord.setvalue(7, '[9.445071, 41.995246]')
irve.reindex()

Ilist[12, 7]
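The repeated calls in the cell above can also be expressed as a list of (rows, value) pairs applied in a loop. The sketch below shows the idea on a plain Python list rather than on the `Ilist` object; `apply_corrections` is a hypothetical helper, and the row ranges are those of the id_station_itinerance corrections.

```python
def apply_corrections(values, corrections):
    """Return a copy of values where each listed row is replaced by the new value."""
    fixed = list(values)  # work on a copy so the original column is kept
    for rows, new_value in corrections:
        for i in rows:
            fixed[i] = new_value
    return fixed


# Column id_station_itinerance of the example table, before correction
id_station = [
    "FRSEVP1SCH01", "FRSEVP1SCH03", "FRSEVP1SCH02",
    "FRS35PSD35711", "FRS35PSD35712",
    "FRE10E30333", "FRE10E20923", "FRE10P20922", "FRE10P20921",
    "FRSGAP1M2026", "FRSGAP1M2026", "FRSGAP1M2026",
]

# (row indices, corrected value), using 0-based indices as in the cell above
corrections = [
    (range(0, 3), "FRSEVP1SCH"),
    (range(3, 5), "FRS35PSD35711"),
    (range(5, 9), "FRE10E2092"),
]

fixed = apply_corrections(id_station, corrections)
print(fixed)
```

Keeping the corrections as data makes them easier to review than a sequence of individual assignments.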

## New check
Rerunning the check on the corrected data shows that the specification is now respected:


In [6]:
print('operateur is coupled with id_station : ', id_station.iscoupled(operateur))
print('id_station is derived from id_pdc : ', id_station.isderived(id_pdc))
print('nom_station is coupled with id_station : ', nom_station.iscoupled(id_station))
print('adresse_station is coupled with id_station : ', adresse.iscoupled(id_station))
print('coordonneesXY is coupled with id_station : ', coord.iscoupled(id_station))

operateur is coupled with id_station :  True
id_station is derived from id_pdc :  True
nom_station is coupled with id_station :  True
adresse_station is coupled with id_station :  True
coordonneesXY is coupled with id_station :  True
