# Nationality / Nacionalidades

Analyze the nationalities of individuals in the dataset.

## Setup

Create a TimelinkNotebook object. 

Note:
* First time run takes a little time as the required Docker images are downloaded.
* Timelink will default to using sqlite as the database, see [Receipts notebook for more control](1-receipts.ipynb)


In [1]:
from timelink.notebooks import TimelinkNotebook

tlnb = TimelinkNotebook()
tlnb.print_info(show_token=True)



Timelink version: 1.1.25
Project name: dehergne
Project home: /Users/jrc/mhk-home/sources/dehergne
Database type: sqlite
Database name: dehergne
Kleio image: timelinkserver/kleio-server
Kleio server token: kNUFYoTqbJpr4eBBJ99Jw5E1ulv5UY9d
Kleio server URL: http://127.0.0.1:8088
Kleio server home: /Users/jrc/mhk-home/sources/dehergne
Kleio server container: elastic_allen
Kleio version requested: latest
Kleio server version: 12.8.593 (2025-03-16 21:55:53)
SQLite directory: /Users/jrc/mhk-home/sources/dehergne/database/sqlite
Database version: 6ccf1ef385a6
Call print_info(show_password=True) to show the Postgres password
TimelinkNotebook(project_name=dehergne, project_home=/Users/jrc/mhk-home/sources/dehergne, db_type=sqlite, db_name=dehergne, kleio_image=timelinkserver/kleio-server, kleio_version=latest, postgres_image=postgres, postgres_version=latest)


### Extensions for this notebook 
(to later migrate to timelink-py)

In [2]:

from datetime import datetime
import pandas as pd
from timelink.kleio.utilities import format_timelink_date, convert_timelink_date


def calc_age_at(date_birth, today):
    """Compute the number of years between two dates"""
    # return None if either argument is None
    if date_birth is None or today is None:
        return None
    # Ensure the dates are datetime objects
    if not isinstance(date_birth, datetime):
        date_birth = convert_timelink_date(date_birth)
    if not isinstance(today, datetime):
        today = convert_timelink_date(today)

    if date_birth is None or today is None:
        return None

    # Compute the difference in years
    difference_in_years = (today - date_birth).days / 365.25
    return int(difference_in_years)


print("testing")
print(format_timelink_date('00000000'))
print(format_timelink_date(None))
print(format_timelink_date('1582'))
print(format_timelink_date('158203'))
print(format_timelink_date('1582-03-02'))
print(format_timelink_date('15820302'))
print(calc_age_at('1980-01-01', '2020-01-01'))
print(calc_age_at('1980-01-01', 0))

testing


1582
1582-03
1582-03-02
1582-03-02
40
None


### Database status

Count the number of rows in each table in the database.


In [3]:
tlnb.table_row_count_df()

Unnamed: 0,table,count
0,acts,29
1,alembic_version,1
2,aregisters,1
3,attributes,26742
4,blinks,200
5,class_attributes,70
6,classes,14
7,entities,32950
8,geoentities,359
9,goods,0


### Check the status of the files

Check the import status of the translated files

* I: Imported
* E: Imported with error
* W: Imported with warnings no errors
* N: Not imported
* U: Translation updated need to reimport

In [4]:
kleio_files = tlnb.get_kleio_files()
# kleio_files.info()
kleio_files[["name","import_status","status","errors","warnings","imported","import_errors","import_warnings"]]

Unnamed: 0,name,import_status,status,errors,warnings,imported,import_errors,import_warnings
0,dehergne-0-abrev.cli,I,V,0,0,2025-05-10 03:50:53.702537,0,0
1,dehergne-a.cli,I,V,0,0,2025-05-10 03:51:01.352077,0,0
2,dehergne-b.cli,I,V,0,0,2025-05-10 03:51:10.480262,0,0
3,dehergne-c.cli,I,V,0,0,2025-05-10 03:51:23.126101,0,0
4,dehergne-d.cli,I,V,0,0,2025-05-10 03:51:28.685167,0,0
5,dehergne-e.cli,I,V,0,0,2025-05-10 03:51:30.219256,0,0
6,dehergne-f.cli,I,V,0,0,2025-05-10 03:51:38.361830,0,0
7,dehergne-g.cli,I,V,0,0,2025-05-10 03:51:46.453827,0,0
8,dehergne-h.cli,I,V,0,0,2025-05-10 03:51:49.364195,0,0
9,dehergne-i.cli,I,V,0,0,2025-05-10 03:51:51.033646,0,0


In [5]:
%pip install pyuca openpyxl

.bash_profile RUN!

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [6]:
import pyuca  # to sort accented characters properly

import pandas as pd
from timelink.pandas import entities_with_attribute
# show 500 rows
pd.set_option('display.max_rows', 550)

nacionais = entities_with_attribute(
    entity_type='person',
    show_elements=['id','name','groupname'],
    the_type='nacionalidade',
    more_attributes=['nascimento'],
    db=tlnb.db,
)
# filter groupname = 'n' (avoid "referido", "pai", "mãe")
nacionais = nacionais[nacionais.groupname=='n']
nacionais.info()

<class 'pandas.core.frame.DataFrame'>
Index: 943 entries, deh-abraham-le-royer to joao-cardoso
Data columns (total 18 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   id_1                      943 non-null    object
 1   name                      943 non-null    object
 2   groupname                 943 non-null    object
 3   nacionalidade.attr_id     943 non-null    object
 4   nacionalidade.type        943 non-null    object
 5   nacionalidade             943 non-null    object
 6   nacionalidade.date        943 non-null    object
 7   nacionalidade.line        943 non-null    int64 
 8   nacionalidade.level       943 non-null    int64 
 9   nacionalidade.obs         943 non-null    object
 10  nacionalidade.extra_info  943 non-null    object
 11  nacionalidade.original    1 non-null      object
 12  nacionalidade.comment     34 non-null     object
 13  nascimento                855 non-null    object
 14  nas

### Export all with attribute "nacionalidade"

In [None]:
nacionais.to_excel("../inferences/paises_pessoas_n.xlsx", sheet_name='Sheet1', index=False)

### Group by nacionalidade

In [7]:
paises_totais=nacionais.groupby('nacionalidade')['id_1'].nunique().reset_index().sort_values('id_1',ascending=False)
paises_totais

Unnamed: 0,nacionalidade,id_1
19,Portugal,346
10,França,168
4,China,146
14,Itália,113
0,Alemanha,35
7,Espanha,34
3,Bélgica,22
24,Áustria,14
9,Flandres,11
2,Boémia,10


In [None]:
paises_totais.to_excel("../inferences/paises_totais_n.xlsx", sheet_name='Sheet1', index=False)

### Countries and evolution of departures

In [8]:
import pandas as pd
from timelink.pandas import entities_with_attribute
# show 500 rows
pd.set_option('display.max_rows', 550)

embarques = entities_with_attribute(
    entity_type='person',
    show_elements=['id','name','groupname'],
    the_type='nacionalidade',
    more_attributes=['wicky'],
    db=tlnb.db,
)
# filter groupname = 'n' (avoid "referido", "pai", "mãe")
embarques = embarques[embarques.groupname=='n']
embarques['wicky.date'] = embarques['wicky.date'].fillna('0000')
embarques.info()
cols =['name','nacionalidade','wicky.date']
# replace NaN in wicky.date with 0000

embarques[cols].sort_values('wicky.date')

<class 'pandas.core.frame.DataFrame'>
Index: 973 entries, deh-abraham-le-royer to joao-cardoso
Data columns (total 19 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   id_1                      973 non-null    object
 1   name                      973 non-null    object
 2   groupname                 973 non-null    object
 3   nacionalidade.attr_id     973 non-null    object
 4   nacionalidade.type        973 non-null    object
 5   nacionalidade             973 non-null    object
 6   nacionalidade.date        973 non-null    object
 7   nacionalidade.line        973 non-null    int64 
 8   nacionalidade.level       973 non-null    int64 
 9   nacionalidade.obs         973 non-null    object
 10  nacionalidade.extra_info  973 non-null    object
 11  nacionalidade.original    1 non-null      object
 12  nacionalidade.comment     35 non-null     object
 13  wicky                     515 non-null    object
 14  wic

Unnamed: 0_level_0,name,nacionalidade,wicky.date
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
deh-abraham-le-royer,Abraham Le Royer,França,0000
deh-joao-da-fonseca-i,João da Fonseca,Portugal,0000
deh-joao-da-silva,João da Silva,Portugal,0000
deh-joao-de-borgia-kouo,João de Borgia Kouo,Portugal,0000
deh-joao-de-sa,João de Sá,Portugal,0000
...,...,...,...
deh-francisco-pinto-ii,Francisco Pinto,Portugal,>1755:<1758
deh-faustino-soares,Faustino Soares,Portugal,>1755:<1758
deh-sebastiao-correa,Sebastião Correa,Portugal,>1755:<1758
deh-aleixo-rodrigues,Aleixo Rodrigues,Portugal,>1755:<1758


#### Group by country decade

In [10]:
import pandas as pd
from timelink.kleio.utilities import convert_timelink_date

# extract the year from wicky.date
embarques['data_embarque'] = embarques['wicky.date'].apply(convert_timelink_date)
embarques['data_embarque'] = embarques['data_embarque'].fillna('0000-01-01')
embarques['ano_embarque'] = pd.to_datetime(embarques['data_embarque'], format='%Y-%m-%d', errors='coerce').dt.year
# count ano_entrada per periods of 10 years starting from 1540
embarques['periodo_embarque'] = embarques['ano_embarque'] // 10 * 10
# show columns jesuita-entrada.date, ano_entrada, periodo
embarques[['wicky.date','ano_embarque','periodo_embarque']].sort_values('wicky.date').tail(10)
# count number of embarques per period
embarques.groupby(['nacionalidade','periodo_embarque',])['id_1'].count().reset_index().sort_values(['nacionalidade','periodo_embarque',])


Unnamed: 0,nacionalidade,periodo_embarque,id_1
0,Alemanha,1680.0,2
1,Alemanha,1690.0,6
2,Alemanha,1700.0,3
3,Alemanha,1710.0,4
4,Alemanha,1730.0,5
5,Alemanha,1740.0,1
6,Alemanha,1750.0,1
7,Alsácia,1700.0,1
8,Boémia,1700.0,1
9,Boémia,1710.0,1
