# Data model to Data schema
------
This notebook presents an example that explains the link between a Data model and a Table schema.

The dataset, in which each record is a country, is made up of four Fields:

- country: name of the country
- region: name of the region the country belongs to
- code: alpha-2 country code
- population: population of the region (in millions)

## Data model
Two entities are defined:
- country: the first attribute is the name of the country (primary key of the entity), the second is its alpha-2 country code, whose value is unique for each country.
- region: the first attribute is the name of the region (primary key of the entity), the second is its population.

The data model is as follows:

```python
from base64 import b64encode
from IPython.display import Image, display
from json_ntv import Ntv, MermaidConnec
```

```python
# the data model is built as a JSON structure
country = {
    'country and region:$erDiagram': {
        'entity': {
            'COUNTRY': [
                ['string', 'country', 'PK'],
                ['string', 'code', 'unique']
            ],
            'REGION': [
                ['string', 'region', 'PK'],
                ['number', 'population']
            ]
        },
        'relationship': [
            ['REGION', 'exactly one', 'identifying', 'one or more', 'COUNTRY', 'brings_together']
        ]
    }
}

# it is converted into a Mermaid diagram, then rendered through the mermaid.ink service
diag = MermaidConnec.diagram(country)
display(Image(url="https://mermaid.ink/img/" + b64encode(diag.encode("ascii")).decode("ascii")))
```
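The diagram text above is produced by `MermaidConnec.diagram`. The hypothetical converter below only illustrates how the JSON structure can map onto Mermaid's `erDiagram` syntax. The names `to_mermaid`, `LEFT`, `RIGHT`, `LINE` and `KEYS`, and the `'unique'` to `UK` mapping, are assumptions. Its output is not guaranteed to match what json_ntv generates.

```python
# Hypothetical converter: NOT the json_ntv implementation, only an illustration
# of how the JSON data model can map onto Mermaid erDiagram syntax.

# Crow's-foot markers are written differently on the left and right of the line.
LEFT = {'zero or one': '|o', 'exactly one': '||', 'zero or more': '}o', 'one or more': '}|'}
RIGHT = {'zero or one': 'o|', 'exactly one': '||', 'zero or more': 'o{', 'one or more': '|{'}
# Identifying relationships use a solid line, non-identifying ones a dashed line.
LINE = {'identifying': '--', 'non-identifying': '..'}
# Assumed mapping of the model's key labels to Mermaid attribute keys.
KEYS = {'PK': 'PK', 'unique': 'UK'}


def to_mermaid(model: dict) -> str:
    """Return erDiagram source text for a {'title:$erDiagram': {...}} model."""
    (body,) = model.values()  # the model holds a single diagram
    lines = ['erDiagram']
    for left, card_l, ident, card_r, right, label in body.get('relationship', []):
        lines.append(f'    {left} {LEFT[card_l]}{LINE[ident]}{RIGHT[card_r]} {right} : {label}')
    for entity, attributes in body['entity'].items():
        lines.append(f'    {entity} {{')
        for typ, name, *key in attributes:
            # Mermaid separates several keys on one attribute with commas
            suffix = (' ' + ', '.join(KEYS[k] for k in key)) if key else ''
            lines.append(f'        {typ} {name}{suffix}')
        lines.append('    }')
    return '\n'.join(lines)


model = {
    'country and region:$erDiagram': {
        'entity': {
            'COUNTRY': [['string', 'country', 'PK'], ['string', 'code', 'unique']],
            'REGION': [['string', 'region', 'PK'], ['number', 'population']],
        },
        'relationship': [
            ['REGION', 'exactly one', 'identifying', 'one or more', 'COUNTRY', 'brings_together'],
        ],
    }
}
print(to_mermaid(model))
```

With this model, the relationship line reads `REGION ||--|{ COUNTRY : brings_together`. Each marker sits next to the entity whose count it describes: every country has exactly one region, and every region has one or more countries.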

## Rules to translate a Data model into a Table schema

Main rules:

* Each attribute in the data model is converted into a Field in the Table schema.
* The type defined in the data model is converted into a `type` / `format` in the Table schema.
* Each Field has a "derived" relationship (or a "coupled" one if the attribute is unique) with the Field associated with the PK attribute of the same entity.
* A relationship between two entities is converted into a relationship between the Fields associated with their PK attributes.
* Cardinalities containing a 0 are translated with the same rules as cardinalities with a 1 (the 0 only indicates that the Field is optional).
* The cardinalities of the data model relationships are translated as follows:
  * 1 - 1: "coupled"
  * 1 - n: "derived"
  * n - n: "linked"
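The main rules can be sketched as a small function that walks a JSON data model shaped like the one defined above. The names `model_to_schema`, `is_many`, `ONE` and `MANY` are hypothetical. The sketch makes three assumptions:
- each entity has exactly one PK attribute;
- cardinalities use the four labels shown;
- the first entry of "fields" is the Field from which the second one is derived.

```python
# Hypothetical sketch of the main translation rules (not part of json_ntv).
ONE = {'zero or one', 'exactly one'}     # a 0 only marks the Field as optional
MANY = {'zero or more', 'one or more'}


def is_many(card: str) -> bool:
    """True for an 'n' cardinality, False for a '1' (or '0 or 1') cardinality."""
    if card not in ONE | MANY:
        raise ValueError(f'unknown cardinality: {card!r}')
    return card in MANY


def model_to_schema(model: dict) -> dict:
    """Translate a {'title:$erDiagram': {...}} data model into a Table schema dict."""
    (body,) = model.values()
    fields, relationships, pk = [], [], {}
    for entity, attributes in body['entity'].items():
        # each attribute becomes a Field, the model type becomes the Field type
        for typ, name, *key in attributes:
            fields.append({'name': name, 'type': typ})
            if 'PK' in key:
                pk[entity] = name
        # every other attribute depends on the PK Field of its entity
        for _typ, name, *key in attributes:
            if 'PK' not in key:
                link = 'coupled' if 'unique' in key else 'derived'
                relationships.append({'fields': [pk[entity], name],
                                      'description': 'attributes', 'link': link})
    # a relationship between entities links their PK Fields
    for left, card_l, _ident, card_r, right, label in body.get('relationship', []):
        many_l, many_r = is_many(card_l), is_many(card_r)
        if many_l and many_r:                     # n - n
            link, pair = 'linked', [pk[left], pk[right]]
        elif many_l:                              # the 'n' side determines the '1' side
            link, pair = 'derived', [pk[left], pk[right]]
        elif many_r:
            link, pair = 'derived', [pk[right], pk[left]]
        else:                                     # 1 - 1
            link, pair = 'coupled', [pk[left], pk[right]]
        relationships.append({'fields': pair, 'description': label, 'link': link})
    return {'fields': fields, 'relationships': relationships}


model = {
    'country and region:$erDiagram': {
        'entity': {
            'COUNTRY': [['string', 'country', 'PK'], ['string', 'code', 'unique']],
            'REGION': [['string', 'region', 'PK'], ['number', 'population']],
        },
        'relationship': [
            ['REGION', 'exactly one', 'identifying', 'one or more', 'COUNTRY', 'brings_together'],
        ],
    }
}
for rel in model_to_schema(model)['relationships']:
    print(rel['fields'], rel['description'], rel['link'])
```

With the model above, the sketch returns three relationships:
- a "coupled" country / code relationship;
- a "derived" region / population relationship;
- a "derived" country / region relationship.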

Additional rules:

* If the Table schema has a "primaryKey" property, the "derived" relationships with the "primaryKey" Fields are implicit (the values of a "primaryKey" Field are unique), so they can be removed.
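This additional rule can be sketched as a filter over the relationships. `drop_implicit` is a hypothetical helper. It assumes the first entry of "fields" is the Field from which the second one is derived. Composite primary keys are left untouched in this sketch, since one part of a composite key does not determine the other Fields on its own.

```python
def drop_implicit(schema: dict) -> list:
    """Return the relationships that remain explicit once a primaryKey is declared."""
    pk = schema.get('primaryKey')
    if isinstance(pk, list):
        # only a single-Field key determines every other Field by itself
        pk = pk[0] if len(pk) == 1 else None
    return [rel for rel in schema.get('relationships', [])
            if not (pk is not None and rel['link'] == 'derived' and rel['fields'][0] == pk)]


schema = {
    'primaryKey': 'country',
    'relationships': [
        {'fields': ['country', 'code'], 'description': 'attributes', 'link': 'coupled'},
        {'fields': ['region', 'population'], 'description': 'attributes', 'link': 'derived'},
        {'fields': ['country', 'region'], 'description': 'brings_together', 'link': 'derived'},
    ],
}
for rel in drop_implicit(schema):
    print(rel['fields'], rel['link'])
# ['country', 'code'] coupled
# ['region', 'population'] derived
```

The "coupled" relationship is kept. The primaryKey guarantees that "code" depends on "country", but it does not guarantee that each code is used only once.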


## Table schema

The deduced Table schema is as follows:

```json
"schema": {
  "fields": [
    {"name": "country",    "type": "string"},
    {"name": "region",     "type": "string"},
    {"name": "code",       "type": "string"},
    {"name": "population", "type": "number"}
  ],
  "relationships": [
    { "fields" : [ "region", "population"], "description" : "attributes",      "link" : "derived" },
    { "fields" : [ "country", "region"],    "description" : "brings_together", "link" : "derived" }
  ]
}
```

The information that the country code is unique for each country strengthens the relationship between "country" and "code" (it was "derived" and is now "coupled"). This relationship is therefore added to the schema, and "country" is declared as the "primaryKey".
To stay consistent with the Data model, the relationship between the entities can also be kept, although this constraint is always satisfied since "country" is the "primaryKey".

```json
"schema": {
  "fields": [
    {"name": "country",    "type": "string"},
    {"name": "region",     "type": "string"},
    {"name": "code",       "type": "string"},
    {"name": "population", "type": "number"}
  ],
  "primaryKey": "country",
  "relationships": [
    { "fields" : [ "country", "code"],      "description" : "attributes",      "link" : "coupled" },
    { "fields" : [ "region", "population"], "description" : "attributes",      "link" : "derived" },
    { "fields" : [ "country", "region"],    "description" : "brings_together", "link" : "derived" }
  ]
}
```
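The relationships of this schema can also be checked directly with pandas, here on the example data before corrections. This is a simplified sketch, not the ntv_pandas analysis. `determines` and `check_relationships` are hypothetical helpers, and the sketch uses these simplified definitions:
- "derived": each value of the first Field maps to a single value of the second.
- "coupled": this holds in both directions.
- "linked": not checked.

```python
import pandas as pd


def determines(df: pd.DataFrame, a: str, b: str) -> bool:
    """True if each value of column a maps to a single value of column b."""
    return bool((df.groupby(a)[b].nunique() <= 1).all())


def check_relationships(df: pd.DataFrame, relationships: list) -> dict:
    """Return {'a - b': bool} telling whether each declared relationship holds."""
    results = {}
    for rel in relationships:
        a, b = rel['fields']
        if rel['link'] == 'coupled':        # one-to-one: each Field determines the other
            ok = determines(df, a, b) and determines(df, b, a)
        elif rel['link'] == 'derived':      # b is derived from a
            ok = determines(df, a, b)
        else:                                # 'linked' is not constrained in this sketch
            ok = True
        results[f'{a} - {b}'] = ok
    return results


relationships = [
    {'fields': ['country', 'code'], 'link': 'coupled'},
    {'fields': ['region', 'population'], 'link': 'derived'},
    {'fields': ['country', 'region'], 'link': 'derived'},
]
data = pd.DataFrame({'country': ['France', 'Spain', 'Estonia', 'Nigeria'],
                     'region': ['European Union'] * 3 + ['Africa'],
                     'code': ['FR', 'ES', 'ES', 'NI'],
                     'population': [449, 48, 449, 1460]})
print(check_relationships(data, relationships))
# {'country - code': False, 'region - population': False, 'country - region': True}
```

Rows with missing values are ignored by `groupby` and `nunique` here. A stricter check would need to handle them explicitly.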

## Example: before check

```python
import pandas as pd
import ntv_pandas as npd
```

| country | region         | code  | population |
|---------|----------------|-------|------------|
| France  | European Union | FR    | 449        |
| Spain   | European Union | ES    | 48         |
| Estonia | European Union | ES    | 449        |
| Nigeria | Africa         | NI    | 1460       |

```python
example1 = {'country':    ['France', 'Spain', 'Estonia', 'Nigeria'],
            'region':     ['European Union', 'European Union', 'European Union', 'Africa'],
            'code':       ['FR', 'ES', 'ES', 'NI'],
            'population': [449, 48, 449, 1460]}
ex1 = pd.DataFrame(example1)
```

```python
ana1 = ex1.npd.analysis()
print("country - code : ", ana1.get_relation('country', 'code').typecoupl)
print("region - population : ", ana1.get_relation('region', 'population').typecoupl,
      ana1.get_relation('region', 'population').parent_child)
print("country - region : ", ana1.get_relation('country', 'region').typecoupl,
      ana1.get_relation('country', 'region').parent_child)
```

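```
country - code :  derived
region - population :  derived False
country - region :  derived True
```

Only the country - region relationship matches the schema:
- country - code is "derived" instead of "coupled", because the code ES is used for both Spain and Estonia.
- region - population is "derived" with parent_child False. The European Union has two population values (449 and 48), so "population" cannot be derived from "region".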
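To find which records break the expected relationships, plain pandas is enough. This sketch is independent of ntv_pandas and rebuilds the same example data.

```python
import pandas as pd

ex1 = pd.DataFrame({'country': ['France', 'Spain', 'Estonia', 'Nigeria'],
                    'region': ['European Union', 'European Union', 'European Union', 'Africa'],
                    'code': ['FR', 'ES', 'ES', 'NI'],
                    'population': [449, 48, 449, 1460]})

# a code used by several countries prevents country - code from being "coupled"
shared_code = ex1[ex1.duplicated('code', keep=False)]
# a region with several population values prevents population from being derived from region
several_pop = ex1[ex1.groupby('region')['population'].transform('nunique') > 1]

print(shared_code[['country', 'code']])
print(several_pop[['country', 'region', 'population']])
```

The first selection returns the Spain and Estonia rows. The second returns the three European Union rows, which carry two different population values.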


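## Example: after corrections

The two inconsistencies are corrected:
- Estonia gets its own code (EE).
- The population on the Spain row is set to 449, like the other European Union rows.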

| country | region         | code  | population |
|---------|----------------|-------|------------|
| France  | European Union | FR    | 449        |
| Spain   | European Union | ES    | 449        |
| Estonia | European Union | EE    | 449        |
| Nigeria | Africa         | NI    | 1460       |

```python
example1 = {'country':    ['France', 'Spain', 'Estonia', 'Nigeria'],
            'region':     ['European Union', 'European Union', 'European Union', 'Africa'],
            'code':       ['FR', 'ES', 'EE', 'NI'],
            'population': [449, 449, 449, 1460]}
ex1 = pd.DataFrame(example1)
```

```python
ana1 = ex1.npd.analysis()
print("country - code : ", ana1.get_relation('country', 'code').typecoupl)
print("region - population : ", ana1.get_relation('region', 'population').typecoupl,
      ana1.get_relation('region', 'population').parent_child)
print("country - region : ", ana1.get_relation('country', 'region').typecoupl,
      ana1.get_relation('country', 'region').parent_child)
```

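```
country - code :  coupled
region - population :  coupled True
country - region :  derived True
```

All the relationships expected by the schema are now satisfied:
- country - code is "coupled".
- country - region is "derived".
- region - population is even "coupled", which is stronger than the required "derived": each region now has a single population, and the two populations are different.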
