# Primer: TOML implementation
This notebook reconstructs the `Translator` showcased in the [Translation primer](../../../translation-primer.rst) using a [TOML configuration](config.toml).

In [1]:
import sys
import rics
import id_translation

# Print relevant versions
print(f"{rics.__version__=}")
print(f"{id_translation.__version__=}")
print(f"{sys.version=}")

rics.__version__='3.2.0'
id_translation.__version__='0.5.1.dev1'
sys.version='3.11.6 (main, Oct 23 2023, 22:48:54) [GCC 11.4.0]'


In [2]:
rics.configure_stuff(format="[%(name)s:%(levelname)s] %(message)s")

👻 Configured some stuff just the way I like it!


## Translatable data

In [3]:
import pandas as pd

bite_report = pd.read_csv("biting-victims-2019-05-11.csv")
bite_report

Unnamed: 0,human_id,bitten_by
0,1904,1
1,1991,0
2,1991,2
3,1999,0


## Mapping
### Define heuristic function

This heuristic maps `id` to `animal_id` when `context="animals"`.

The heuristic is also applied to `humans.csv`, whose `id` column is already correctly named, but this is not a problem since the best match will be used.

In [4]:
def smurf_column_heuristic(value, candidates, context):
    """Heuristic for matching columns that use the "smurf" convention."""
    return (
        # Drop a trailing "s" from the context, so "animals" + "id" -> "animal_id".
        f"{context[:-1]}_{value}" if context[-1] == "s" else f"{context}_{value}",
        candidates,  # unchanged
    )
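
Calling the heuristic directly shows the rewrite it performs. This is a small sketch: the function is repeated so the snippet runs on its own, and the candidate column names are hypothetical stand-ins for the columns of the source files.

```python
def smurf_column_heuristic(value, candidates, context):
    """Heuristic for matching columns that use the "smurf" convention."""
    return (
        f"{context[:-1]}_{value}" if context[-1] == "s" else f"{context}_{value}",
        candidates,  # unchanged
    )


# Hypothetical candidate columns; the heuristic only rewrites the value.
candidates = ["animal_id", "name", "species"]

new_value, new_candidates = smurf_column_heuristic("id", candidates, "animals")
print(new_value)  # animal_id
print(new_candidates is candidates)  # True

# A context without a trailing "s" is used as-is.
print(smurf_column_heuristic("id", candidates, "fish")[0])  # fish_id
```

The rewritten value `animal_id` now equals one of the candidates exactly, which is what a score function based on equality needs in order to produce a match.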

## Moment of truth

In [5]:
from id_translation import Translator

translated_bite_report = Translator.from_config("config.toml").translate(bite_report)
translated_bite_report

[id_translation.Translator.translate:INFO] Finished translation of 2 names in 'DataFrame'-type data in 7ms, using name-to-source mapping: {'human_id': 'humans', 'bitten_by': 'animals'}.


Unnamed: 0,human_id,bitten_by
0,Mr. Fred (id=1904),Morris (id=1) the dog
1,Mr. Richard (id=1991),Tarzan (id=0) the cat
2,Mr. Richard (id=1991),Simba (id=2) the lion
3,Dr. Sofia (id=1999),Tarzan (id=0) the cat
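
The translated strings follow the `fmt` value from the config, `"[{title}. ]{name} (id={id})[ the {species}]"`. The sketch below is plain Python and not the library's implementation. It assumes that a bracketed block is dropped whenever one of its placeholders is missing from the record. The `render` helper and the two records are hypothetical.

```python
import re

FMT = "[{title}. ]{name} (id={id})[ the {species}]"


def render(fmt, record):
    """Render fmt, skipping [bracketed] blocks with missing placeholders."""
    # Splitting on a capturing group puts optional blocks at odd indices.
    parts = re.split(r"\[([^\[\]]*)\]", fmt)
    rendered = []
    for index, part in enumerate(parts):
        is_optional = index % 2 == 1
        keys = re.findall(r"\{(\w+)\}", part)
        if is_optional and not all(key in record for key in keys):
            continue  # Assumed behavior: omit the whole optional block.
        rendered.append(part.format(**record))
    return "".join(rendered)


# Hypothetical records: humans have a title, animals have a species.
print(render(FMT, {"title": "Mr", "name": "Fred", "id": 1904}))
# Mr. Fred (id=1904)
print(render(FMT, {"name": "Morris", "id": 1, "species": "dog"}))
# Morris (id=1) the dog
```

Under this assumption a single format string can serve both sources, since each record only fills the optional blocks it has data for.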


In [6]:
expected = pd.read_csv("biting-victims-2019-05-11-translated.csv")
pd.testing.assert_frame_equal(translated_bite_report, expected)

## Print the config
Click [here](config.toml) to download.

In [7]:
!pygmentize config.toml

################################################################################
# For help, see https://id-translation.readthedocs.io                          #
################################################################################
[translator]
fmt = "[{title}. ]{name} (id={id})[ the {species}]"

# ------------------------------------------------------------------------------
# Name-to-source mapping configuration. Binds names to source, eg 'cute_animals'
# -> 'my_database.animals'. Overrides take precedence over scoring logic.
[translator.mapping]
score_function = "equality"
[34m[