# Fetching data using `SqlFetcher`
This notebook translates IDs using a SQL database. It assumes that the ***Prepare for `SqlFetcher` demo*** step from the [PickleFetcher](../pickle-translation/PickleFetcher.ipynb) demo notebook has been completed.

In [1]:
import sys

import rics

# Print relevant versions
print(f"{rics.__version__=}")
print(f"{sys.version=}")
!git log --pretty=oneline --abbrev-commit -1

rics.__version__='0.17.0.dev1'
sys.version='3.10.6 (main, Aug 10 2022, 11:40:04) [GCC 11.3.0]'
[33mcbd8da2[m[33m ([m[1;36mHEAD -> [m[1;32mmain[m[33m, [m[1;31morigin/main[m[33m, [m[1;31morigin/HEAD[m[33m)[m Rerun some notebooks


In [2]:
from rics.utility.logs import basic_config, logging

basic_config(level=logging.INFO, rics_level=logging.DEBUG)

## Create translator from config
The configuration is read from [config.toml](config.toml).

In [3]:
from rics.translation import Translator

translator = Translator.from_config("config.toml")
translator

2022-10-08T15:45:39.660 [rics.translation.fetching.SqlFetcher:DEBUG] Engine(postgresql+pg8000://postgres:***@localhost:5432/imdb): Metadata created in 0.109644 sec.
2022-10-08T15:45:39.662 [rics.mapping.Mapper:DEBUG] Begin computing match scores for values=('id',) in context='title_basics' to candidates=('titleType', 'index', 'tconst', 'isAdult', 'startYear', 'primaryTitle', 'endYear', 'genres', 'runtimeMinutes', 'originalTitle', 'int_id_tconst') using HeuristicScore([force_lower_case()] -> AbstractFetcher.default_score_function).
2022-10-08T15:45:39.670 [rics.mapping.Mapper:DEBUG] Computed 1x11 match scores in 0.00208911 sec:
candidates  titleType  index  tconst  isAdult  startYear  primaryTitle  endYear  genres  runtimeMinutes  originalTitle  int_id_tconst
values                                                                                                                               
id               -inf   -inf     inf     -inf       -inf          -inf     -inf    -inf          

Translator(online=True: fetcher=SqlFetcher(Engine(postgresql+pg8000://postgres:***@localhost:5432/imdb), tables=['title_basics', 'name_basics']))

## Make some data to translate

In [4]:
import pandas as pd

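# Reuse the translator's database engine. These are private attributes,
# so this shortcut is only suitable for a demo.
engine = translator._fetcher._engine


def first_title(seed=None, n=1000):
    """Sample n people and keep the first entry of their knownForTitles."""
    df = pd.read_sql("SELECT * FROM name_basics;", engine).sample(n, random_state=seed)
    # knownForTitles is a comma-separated string of title IDs.
    df["firstTitle"] = df.knownForTitles.str.split(",").str[0]
    return df[["nconst", "firstTitle"]]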

In [5]:
translator.store().cache

2022-10-08T15:45:39.771 [rics.mapping.Mapper:DEBUG] Begin computing match scores for values=('original_name', 'to', 'name', 'from') in context='title_basics' to candidates=('titleType', 'index', 'tconst', 'isAdult', 'startYear', 'primaryTitle', 'endYear', 'genres', 'runtimeMinutes', 'originalTitle', 'int_id_tconst') using HeuristicScore([force_lower_case()] -> AbstractFetcher.default_score_function).
2022-10-08T15:45:39.785 [rics.mapping.Mapper:DEBUG] Computed 4x11 match scores in 0.00633102 sec:
candidates     titleType  index  tconst  isAdult  startYear  primaryTitle  endYear  genres  runtimeMinutes  originalTitle  int_id_tconst
values                                                                                                                                  
original_name       -inf   -inf    -inf     -inf       -inf          -inf     -inf    -inf            -inf            inf           -inf
to                  -inf   -inf    -inf     -inf       -inf          -inf      inf    -

TranslationMap('name_basics': 168310 IDs, 'title_basics': 45674 IDs)

## Get the name and the "first" appearance for actors
"First" here means the first title listed for them in the IMDb data; I have no idea how the entries in `knownForTitles` are ordered.
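To make the "first title" step concrete without a database, here is a minimal sketch on a made-up frame. The rows and IDs are invented for illustration; only the column names `nconst`, `knownForTitles` and `firstTitle` come from this notebook.

```python
import pandas as pd

# Toy stand-in for the name_basics table; these rows are made up.
names = pd.DataFrame(
    {
        "nconst": ["nm0000001", "nm0000002", "nm0000003"],
        "knownForTitles": ["tt0000010,tt0000020", "tt0000030", None],
    }
)

# Split the comma-separated string and keep the first element.
# Rows without any known titles stay missing.
names["firstTitle"] = names["knownForTitles"].str.split(",").str[0]
result = names[["nconst", "firstTitle"]]
print(result)
```

A person with no `knownForTitles` value ends up with a missing `firstTitle`, so such rows may need to be dropped or handled before translating.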

In [6]:
df = first_title(seed=5)
df.head()

Unnamed: 0,nconst,firstTitle
33993,nm0260875,tt0255068
22215,nm0167306,tt0252264
76585,nm0604711,tt0052933
47602,nm0369739,tt0125301
164264,nm6981261,tt3952746


## Translate

In [7]:
translator.translate(df).head(5)

2022-10-08T15:45:43.539 [rics.mapping.Mapper:DEBUG] Begin computing match scores for values=('nconst', 'firstTitle') to candidates=('title_basics', 'name_basics') using HeuristicScore([like_database_table()] -> modified_hamming).
2022-10-08T15:45:43.545 [rics.mapping.Mapper:DEBUG] Computed 2x2 match scores in 0.00440562 sec:
candidates  title_basics  name_basics
values                               
nconst              -inf          inf
firstTitle           inf         -inf
2022-10-08T15:45:43.548 [rics.mapping.Mapper.accept:DEBUG] Accepted: 'nconst' -> 'name_basics'; score=inf (short-circuit or override).
2022-10-08T15:45:43.550 [rics.mapping.Mapper.accept.details:DEBUG] This match supersedes 1 other matches:
    'nconst' -> 'title_basics'; score=-inf (superseded by short-circuit or override).
2022-10-08T15:45:43.551 [rics.mapping.Mapper.accept:DEBUG] Accepted: 'firstTitle' -> 'title_basics'; score=inf (short-circuit or override).
2022-10-08T15:45:43.552 [rics.mapping.Mapper.accept.de

Unnamed: 0,nconst,firstTitle
33993,nm0260875:Margarito Esparza *1936†2016,tt0255068 not translated; default name=Title u...
22215,nm0167306:Rick Cluchey *1933†2015,tt0252264 not translated; default name=Title u...
76585,nm0604711:Henry Morgan *1915†1994,tt0052933 not translated; default name=Title u...
47602,nm0369739:Svatopluk Havelka *1925†2009,tt0125301 not translated; default name=Title u...
164264,nm6981261:Tyler Sanders *2004†2022,tt3952746:Just Add Magic (original: Just Add M...
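The output above mixes translated values (`id:name`) with IDs that were not found and received a default name instead. The following is a rough sketch of that idea in plain Python, not the `rics` implementation: `format_id`, the lookup dict and the `<unknown>` placeholder are all hypothetical, and the real formats are defined in `config.toml`, which is not shown here.

```python
# Hypothetical lookup table; in the notebook the names come from the database.
TITLES = {"tt3952746": "Just Add Magic"}


def format_id(identifier, names, placeholder="<unknown>"):
    """Return 'id:name' for known IDs, or a default text for unknown ones."""
    if identifier in names:
        return f"{identifier}:{names[identifier]}"
    # Unknown IDs are kept, with a placeholder standing in for the name.
    return f"{identifier} not translated; default name={placeholder}"


translated = [format_id(i, TITLES) for i in ["tt3952746", "tt0255068"]]
print(translated)
```

Keeping the original ID in the output for unknown values makes it possible to trace untranslated rows back to the source data.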


In [8]:
translator.translate(df, inplace=True)  # returns None
df.head(5)

2022-10-08T15:45:43.906 [rics.mapping.Mapper:DEBUG] Begin computing match scores for values=('nconst', 'firstTitle') to candidates=('title_basics', 'name_basics') using HeuristicScore([like_database_table()] -> modified_hamming).
2022-10-08T15:45:43.911 [rics.mapping.Mapper:DEBUG] Computed 2x2 match scores in 0.00335408 sec:
candidates  title_basics  name_basics
values                               
nconst              -inf          inf
firstTitle           inf         -inf
2022-10-08T15:45:43.914 [rics.mapping.Mapper.accept:DEBUG] Accepted: 'nconst' -> 'name_basics'; score=inf (short-circuit or override).
2022-10-08T15:45:43.916 [rics.mapping.Mapper.accept.details:DEBUG] This match supersedes 1 other matches:
    'nconst' -> 'title_basics'; score=-inf (superseded by short-circuit or override).
2022-10-08T15:45:43.917 [rics.mapping.Mapper.accept:DEBUG] Accepted: 'firstTitle' -> 'title_basics'; score=inf (short-circuit or override).
2022-10-08T15:45:43.919 [rics.mapping.Mapper.accept.de

Unnamed: 0,nconst,firstTitle
33993,nm0260875:Margarito Esparza *1936†2016,tt0255068 not translated; default name=Title u...
22215,nm0167306:Rick Cluchey *1933†2015,tt0252264 not translated; default name=Title u...
76585,nm0604711:Henry Morgan *1915†1994,tt0052933 not translated; default name=Title u...
47602,nm0369739:Svatopluk Havelka *1925†2009,tt0125301 not translated; default name=Title u...
164264,nm6981261:Tyler Sanders *2004†2022,tt3952746:Just Add Magic (original: Just Add M...
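The two calls above differ only in whether a translated copy is returned or the original frame is modified. The sketch below illustrates that convention with a hypothetical `toy_translate` helper and made-up data; it is not how `Translator.translate` is implemented.

```python
import pandas as pd


def toy_translate(df, names, inplace=False):
    """Hypothetical stand-in for a translate call; not the rics implementation."""
    target = df if inplace else df.copy()
    for column, lookup in names.items():
        # Replace each ID with 'id:name', using '?' when the ID is unknown.
        target[column] = target[column].map(lambda i: f"{i}:{lookup.get(i, '?')}")
    return None if inplace else target


names = {"nconst": {"nm1": "Ann"}}
original = pd.DataFrame({"nconst": ["nm1"]})

copy = toy_translate(original, names)  # returns a new frame
before = original["nconst"].tolist()  # the original is still untranslated

returned = toy_translate(original, names, inplace=True)  # returns None
after = original["nconst"].tolist()  # the original is now translated
```

Returning `None` from the in-place variant follows the usual Python convention for mutating operations and makes accidental reassignment (`df = translate(df, inplace=True)`) easy to spot.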
