# Cookiecutter ID translation demo

The main entry point is the `id_translation.translate()`-method, which should be enough for most use cases. When working manually, the `id_translation.map()` and `id_translation.map_scores()`-methods may be of interest as well.

In [1]:
from big_corporation_inc import id_translation

In [2]:
singleton = id_translation.get_singleton()
singleton

Translator(online=True: fetcher=MultiFetcher(max_workers=2, fetchers=[
    SqlFetcher(Engine(postgresql+pg8000://postgres:***@localhost:5002/sakila), whitelist=['language', 'address', 'city', 'country']),
    SqlFetcher(Engine(postgresql+pg8000://postgres:***@localhost:5002/sakila), blacklist={'language', 'address', 'city', 'country'}),
]))

# Available data

In [3]:
for source, placeholders in singleton.placeholders.items():
    print(f"Placeholders for {source=}:")
    print(f"    {placeholders[:6]}")

Placeholders for source='language':
    ['language_id', 'name', 'last_update']
Placeholders for source='address':
    ['address_id', 'address', 'address2', 'district', 'city_id', 'postal_code']
Placeholders for source='city':
    ['city_id', 'city', 'country_id', 'last_update']
Placeholders for source='country':
    ['country_id', 'country', 'last_update']
Placeholders for source='rental':
    ['rental_id', 'rental_date', 'inventory_id', 'customer_id', 'return_date', 'staff_id']
Placeholders for source='inventory':
    ['inventory_id', 'film_id', 'store_id', 'last_update']
Placeholders for source='staff':
    ['staff_id', 'first_name', 'last_name', 'address_id', 'email', 'store_id']
Placeholders for source='film':
    ['film_id', 'title', 'description', 'release_year', 'language_id', 'original_language_id']
Placeholders for source='actor':
    ['actor_id', 'first_name', 'last_name', 'last_update']
Placeholders for source='store':
    ['store_id', 'manager_staff_id', 'address_id', 'last

## Integrations
The `Translator` supports built-in collections, as well as types such as `pandas.DataFrame`.

In [4]:
import pandas as pd

one = [[1] * len(singleton.sources)]
first = pd.DataFrame(one, columns=map("{}_id".format, singleton.sources))
first

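|   | language_id | address_id | city_id | country_id | rental_id | inventory_id | staff_id | film_id | actor_id | store_id | payment_id | category_id | customer_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |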


The included config doesn't add `name`-column mappings for all tables. To avoid a crash, let's use a temporary format in which the name is optional.

In [5]:
id_translation.translate(first, fmt="{id}[:{name}]")

|   | language_id | address_id | city_id | country_id | rental_id | inventory_id | staff_id | film_id | actor_id | store_id | payment_id | category_id | customer_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1:English | 1:47 MySakila Drive | 1:A Corua (La Corua) | 1:Afghanistan | 1 | 1 | 1:Mike | 1 | 1:PENELOPE | 1 | 1 | 1:Action | 1:MARY |
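
The effect of the optional `[:{name}]` block can be illustrated with a small standalone sketch. This is not the library's implementation: `render_optional` is a hypothetical helper, and it assumes that square brackets mark an optional block which is dropped whenever a placeholder inside it has no value.

```python
import re
from string import Formatter


def render_optional(fmt: str, values: dict) -> str:
    """Render ``fmt``, dropping [bracketed] blocks with missing placeholders.

    Hypothetical helper for illustration only; not part of id_translation.
    """
    # re.split with a capture group alternates: required, optional, required, ...
    parts = re.split(r"\[(.*?)\]", fmt)
    rendered = []
    for index, part in enumerate(parts):
        is_optional = index % 2 == 1
        needed = [name for _, name, _, _ in Formatter().parse(part) if name]
        if is_optional and any(values.get(name) is None for name in needed):
            continue  # Skip the whole optional block.
        # Required parts raise KeyError if a placeholder is missing.
        rendered.append(part.format(**values))
    return "".join(rendered)


with_name = render_optional("{id}[:{name}]", {"id": 1, "name": "English"})
without_name = render_optional("{id}[:{name}]", {"id": 1})
print(with_name)     # 1:English
print(without_name)  # 1
```

With a strict format such as `{id}:{name}`, the same helper raises a `KeyError` when `name` is missing, which mirrors why the optional format is used for tables without a name mapping.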


Let's focus on tables that support our preferred **`{id}:{name}`** translation format.

In [6]:
columns = [
    "actor_id",
    "address_id",
    "category_id",
    "city_id",
    "country_id",
    "customer_id",
]
first = first[columns]
first

|   | actor_id | address_id | category_id | city_id | country_id | customer_id |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 |


In [7]:
id_translation.translate(first)

|   | actor_id | address_id | category_id | city_id | country_id | customer_id |
|---|---|---|---|---|---|---|
| 0 | 1:PENELOPE | 1:47 MySakila Drive | 1:Action | 1:A Corua (La Corua) | 1:Afghanistan | 1:MARY |


# Singleton namespace
The top-level `big_corporation_inc.id_translation`-namespace exposes only the most important functions. More convenience functions for the singleton are available in the `singleton` submodule.

## Mapping
Mapping is done automatically when calling `translate()`, but can also be done manually when needed.

In [8]:
id_translation.singleton.map(first)

{'actor_id': 'actor',
 'address_id': 'address',
 'category_id': 'category',
 'city_id': 'city',
 'country_id': 'country',
 'customer_id': 'customer'}

The `translate()`-method will accept a name-to-source mapping as the `names` argument.

```python
my_source = "actor"
names = {"actor_id": my_source, "customer_id": my_source}
```
Passing this mapping will map only the `actor_id` and `customer_id` columns, using the same `source='actor'` for both.

In [9]:
id_translation.translate(
    first,
    names={"actor_id": "actor", "customer_id": "actor"},
)

|   | actor_id | address_id | category_id | city_id | country_id | customer_id |
|---|---|---|---|---|---|---|
| 0 | 1:PENELOPE | 1 | 1 | 1 | 1 | 1:PENELOPE |
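
To make the effect of an explicit `names` mapping concrete, here is a toy sketch in plain Python. It is an illustration under stated assumptions, not how the library works internally: `translate_columns` and the in-memory `lookup` table are hypothetical stand-ins for the translator and its fetchers.

```python
def translate_columns(table, names, lookup, fmt="{id}:{name}"):
    """Translate only the columns listed in ``names``; leave the rest as-is.

    Hypothetical helper for illustration only; not part of id_translation.
    """
    result = {}
    for column, ids in table.items():
        source = names.get(column)
        if source is None:
            result[column] = list(ids)  # Column not in `names`: untouched.
        else:
            result[column] = [
                fmt.format(id=i, name=lookup[source][i]) for i in ids
            ]
    return result


lookup = {"actor": {1: "PENELOPE"}}  # Toy stand-in for fetched data.
table = {"actor_id": [1], "address_id": [1], "customer_id": [1]}
names = {"actor_id": "actor", "customer_id": "actor"}

translated = translate_columns(table, names, lookup)
print(translated)
```

Because both columns point at the `actor` source, `customer_id` is translated with actor names, just as in the output above.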


Finally, the actual scores used to make the mappings may be obtained by using the `map_scores()`-method. Higher is better.

For filters and overrides, positive and negative infinity are used.

In [10]:
id_translation.singleton.map_scores(first).round(3)

| values (rows) / candidates (columns) | language | address | city | country | rental | inventory | staff | film | actor | store | payment | category | customer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| actor_id | -0.002 | 0.059 | 0.12 | 0.065 | 0.099 | -0.003 | 0.0 | -0.004 | 0.993 | -0.008 | -0.009 | 0.115 | 0.109 |
| address_id | 0.056 | 0.988 | -0.005 | -0.006 | -0.001 | 0.053 | 0.033 | -0.004 | -0.007 | 0.025 | 0.074 | -0.01 | 0.047 |
| category_id | 0.123 | -0.012 | 0.045 | 0.137 | 0.055 | 0.184 | 0.0 | -0.004 | -0.007 | 0.092 | -0.009 | 0.99 | 0.114 |
| city_id | 0.064 | -0.012 | 0.995 | 0.137 | -0.001 | 0.04 | 0.0 | 0.246 | 0.118 | -0.008 | -0.009 | 0.128 | 0.06 |
| country_id | 0.035 | -0.012 | 0.058 | 0.994 | 0.166 | 0.103 | 0.0 | -0.004 | 0.06 | 0.059 | -0.009 | 0.133 | -0.011 |
| customer_id | -0.002 | 0.044 | -0.005 | -0.006 | 0.027 | -0.003 | 0.0 | -0.004 | 0.043 | -0.008 | -0.009 | 0.115 | 0.989 |
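
As a rough sketch of how such a score matrix can be turned into a mapping, the snippet below picks the highest-scoring candidate for each value. It uses a few scores copied from the table above. Several things are assumptions made for illustration: the `pick_sources` helper and its `min_score` threshold are hypothetical, and the snippet assumes that an override appears as positive infinity and a filter as negative infinity. The library's actual selection logic may differ.

```python
import math


def pick_sources(scores, min_score=0.9):
    """Map each value to its best candidate, if the score is high enough.

    Hypothetical helper; `min_score` is an arbitrary illustrative threshold.
    """
    mapping = {}
    for value, row in scores.items():
        best = max(row, key=row.get)  # Higher is better.
        if row[best] >= min_score:
            mapping[value] = best
    return mapping


# A subset of the scores shown above.
scores = {
    "actor_id": {"actor": 0.993, "city": 0.12, "customer": 0.109},
    "customer_id": {"actor": 0.043, "category": 0.115, "customer": 0.989},
}
automatic = pick_sources(scores)

# Assumed convention: +inf forces a match (override), -inf forbids it (filter).
adjusted = {value: dict(row) for value, row in scores.items()}
adjusted["customer_id"]["actor"] = math.inf
adjusted["actor_id"]["actor"] = -math.inf
manual = pick_sources(adjusted)

print(automatic)  # {'actor_id': 'actor', 'customer_id': 'customer'}
print(manual)     # {'customer_id': 'actor'}
```

In the adjusted case `actor_id` is left unmapped, since its best remaining candidate scores well below the threshold.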
