# Dehergne repertoire analysis

> First time use: follow instructions in the [README.md](README.md) file in this directory.

## Check version of timelink

If the cell bellow updates timelink a Restart of the notebook is needed (click Restart in the bar on top of this window)

In [2]:
%pip install --upgrade timelink

.bash_profile RUN!
Note: you may need to restart the kernel to use updated packages.


## Extensions for this notebook 
(to later migrate to timelink-py)

In [3]:

from datetime import datetime
import pandas as pd

def convert_timelink_date(tl_date: str, format='%Y%m%d') -> datetime:
    """Convert a Timelink date in the format YYYYMMDD to a Python datetime"""
    # return None if tl_date is None
    if tl_date is None:
        return None
    # if tl_date is not a string, return None
    if not isinstance(tl_date, str):
        return None
    # remove dashes
    tl_date_clean = tl_date.replace('-','')
    # pad tl_date with zeros up to length 8
    tl_date_clean = tl_date_clean.ljust(8,'0')
    year, month, day = tl_date_clean[:4], tl_date_clean[4:6], tl_date_clean[6:]
    # handle the case where month is zero by setting the month to 07 and day to 02 (middle day of year)
    if year == '0000':
        return None
    elif month == '00':
        month = '07'
        day = '02'
    elif day == '00':
        day = '15'

    new_date = year + month + day
    try:
        result = datetime.strptime(new_date, format)
    except ValueError as BadDate:
        print(tl_date, tl_date_clean)
        print(BadDate)
        result=None
    return result

def calc_age_at(date_birth, today):
    """Compute the number of years between two dates"""
    # return None if either argument is None
    if date_birth is None or today is None:
        return None
    # Ensure the dates are datetime objects
    if not isinstance(date_birth, datetime):
        date_birth = convert_timelink_date(date_birth)
    if not isinstance(today, datetime):
        today = convert_timelink_date(today)

    if date_birth is None or today is None:
        return None

    # Compute the difference in years
    difference_in_years = (today - date_birth).days / 365.25
    return int(difference_in_years)

def format_timelink_date(tl_datet)->str:
    """ Format a timelink date with format YYYYMMDD and variantes to nice string"""
    if tl_datet is None:
        return ''
    py_date = convert_timelink_date(tl_datet)
    if py_date is None:
        return ''
    # return date in format YYYY-MM-DD
    return py_date.strftime('%Y-%m-%d')

print("testing")
print(format_timelink_date('00000000'))
print(format_timelink_date('1582'))
print(format_timelink_date('1582-03-02'))
print(format_timelink_date('15820302'))
print(calc_age_at('1980-01-01', '2020-01-01'))
print(calc_age_at('1980-01-01', 0))

testing

1582-07-02
1582-03-02
1582-03-02
40
None


## Initial setup

Create a TimelinkNotebook object. This will be the interface for most of your interactions with Timelink.

The first time you run this notebook, you will be prompted to install ìpykernel. This is required to run Timelink in a notebook.

Note:
* First time run takes a little time as the required Docker images are downloaded.
* Timelink will default to using sqlite as the database, see [Receipts notebook for more control](1-receipts.ipynb)


In [4]:
from timelink.notebooks import TimelinkNotebook

tlnb = TimelinkNotebook(db_type='sqlite')
tlnb.print_info(show_token=True)

Timelink version: 1.1.10
Project name: dehergne-locations
Project home: /Users/jrc/mhk-home/sources/dehergne-locations
Database type: sqlite
Database name: dehergne_locations
Kleio image: timelinkserver/kleio-server
Kleio server token: HK94oHNA5Wnl40D73IHqJOvB3M8B20qx
Kleio server URL: http://127.0.0.1:8088
Kleio server home: /Users/jrc/mhk-home/sources/dehergne-locations
Kleio server container: tender_ritchie
Kleio version requested: latest
Kleio server version: 12.4.567 (2024-02-07 13:02:03)
SQLite directory: /Users/jrc/mhk-home/sources/dehergne-locations/database/sqlite
Call print_info(show_password=True) to show the Postgres password
TimelinkNotebook(project_name=dehergne-locations, project_home=/Users/jrc/mhk-home/sources/dehergne-locations, db_type=sqlite, db_name=dehergne_locations, kleio_image=timelinkserver/kleio-server, kleio_version=latest, postgres_image=postgres, postgres_version=latest)


### Database status

Count the number of rows in each table in the database.


In [7]:
tlnb.table_row_count_df()

Unnamed: 0,table,count
0,acts,29
1,aregisters,1
2,attributes,25533
3,class_attributes,70
4,classes,14
5,entities,31691
6,geoentities,359
7,goods,0
8,kleiofiles,60
9,objects,0


## Check the status of the files

Check the import status of the translated files

* I: Imported
* E: Imported with error
* W: Imported with warnings no errors
* N: Not imported
* U: Translation updated need to reimport

In [8]:
kleio_files = tlnb.get_kleio_files()
# kleio_files.info()
kleio_files[["name","import_status","status","errors","warnings","imported","import_errors","import_warnings"]]

Unnamed: 0,name,import_status,status,errors,warnings,imported,import_errors,import_warnings
0,dehergne-0-abrev.cli,I,V,0,0,2024-05-16 06:30:28.086132,0,0
1,dehergne-a.cli,U,V,0,0,0,0,0
2,dehergne-b.cli,U,V,0,0,0,0,0
3,dehergne-c.cli,U,V,0,0,0,0,0
4,dehergne-d.cli,I,V,0,0,2024-05-16 06:32:07.477823,0,0
5,dehergne-e.cli,I,V,0,0,2024-05-16 06:32:10.009990,0,0
6,dehergne-f.cli,U,V,0,0,0,0,0
7,dehergne-g.cli,U,V,0,0,0,0,0
8,dehergne-h.cli,U,V,0,0,0,0,0
9,dehergne-i.cli,I,V,0,0,2024-04-24 07:13:34.285669,0,0


List files with import errors

In [9]:
eids = kleio_files[kleio_files["import_status"]=='E'].index
print(eids)
rpt = tlnb.get_import_rpt(kleio_files,rows=list(eids))
print(rpt)

Index([], dtype='int64')




## Update the database from Kleio sources.

Updates source translations and imports into database sources with no errors.

Only changed files since last import, or new files, will be processed.

When importing to an empty database some errors can be generated
if some files contain "xsame_as/xmesmo_que" links to other files
not yet imported.

This is solved by reimporting the files in error after the first import, so that the cross references are solved. Ensure that
that `update_from_sources` is called with ``with_import_errors=True``.

It may be necessary a third import run
to solve all cross references.

In [10]:
import logging
logging.basicConfig(level=logging.INFO)

tlnb.update_from_sources(with_import_errors=True)

## Nacionality / Nacionalidades

In [52]:
import pandas as pd
from timelink.pandas import attribute_values
# set max rows to 500
pd.set_option('display.max_rows', 500)

df_totals = attribute_values('nacionalidade',db=tlnb.db)
df_totals

Unnamed: 0_level_0,count,date_min,date_max
value,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Portugal,431,0,0
França,184,0,0
China,177,0,0
Itália,125,0,19730000
Espanha,43,0,0
Alemanha,36,0,0
Bélgica,23,0,0
Áustria,15,0,0
Flandres,12,0,0
Boémia,11,0,0


### List of all with attribute "nacionalidade"

In [53]:
# filter groupname = 'n' (avoid "referido", "pai", "mãe")
nacionais = nacionais[nacionais.groupname=='n']
nacionais.info()

<class 'pandas.core.frame.DataFrame'>
Index: 942 entries, deh-barnabe-de-oliveira to deh-alessandro-valignano
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   id_1                 942 non-null    object
 1   name                 942 non-null    object
 2   groupname            942 non-null    object
 3   nacionalidade.type   942 non-null    object
 4   nacionalidade        942 non-null    object
 5   nacionalidade.date   942 non-null    object
 6   nacionalidade.line   942 non-null    int64 
 7   nacionalidade.level  942 non-null    int64 
 8   nacionalidade.obs    942 non-null    object
 9   nascimento           854 non-null    object
 10  nascimento.date      854 non-null    object
 11  nascimento.obs       854 non-null    object
dtypes: int64(2), object(10)
memory usage: 95.7+ KB


### Export all with attribute "nacionalidade"

In [54]:
nacionais.to_excel("paises_pessoas_n.xlsx", sheet_name='Sheet1', index=False)

### Group by nacionalidade

In [57]:
paises_totais=nacionais.groupby('nacionalidade')['id_1'].nunique().reset_index().sort_values('id_1',ascending=False)
paises_totais

Unnamed: 0,nacionalidade,id_1
19,Portugal,346
10,França,168
4,China,146
14,Itália,113
0,Alemanha,35
7,Espanha,34
3,Bélgica,22
24,Áustria,14
9,Flandres,11
2,Boémia,10


In [58]:
paises_totais.to_excel("paises_totais_n.xlsx", sheet_name='Sheet1', index=False)

### Países e locais de entrada

In [59]:

import pyuca  # to sort accented characters properly

import pandas as pd
from timelink.pandas import entities_with_attribute
# show 500 rows
pd.set_option('display.max_rows', 550)

nacionalidade_entrada = entities_with_attribute(
    entity_type='person',
    show_elements=['id','name','groupname'],
    the_type='nacionalidade',
    more_attributes=['nascimento', 'jesuita-entrada','jesuita-entrada-x'],
    db=tlnb.db,
)
# filter groupname = 'n' (avoid "referido", "pai", "mãe")
nacionalidade_entrada = nacionalidade_entrada[nacionalidade_entrada.groupname=='n']
nacionalidade_entrada['jesuita-entrada'] = nacionalidade_entrada['jesuita-entrada'].fillna('?')
nacionalidade_entrada.info()

<class 'pandas.core.frame.DataFrame'>
Index: 951 entries, aloys-moriz to joao-cardoso
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   id_1                  951 non-null    object
 1   name                  951 non-null    object
 2   groupname             951 non-null    object
 3   nacionalidade.type    951 non-null    object
 4   nacionalidade         951 non-null    object
 5   nacionalidade.date    951 non-null    object
 6   nacionalidade.line    951 non-null    int64 
 7   nacionalidade.level   951 non-null    int64 
 8   nacionalidade.obs     951 non-null    object
 9   nascimento            863 non-null    object
 10  nascimento.date       863 non-null    object
 11  nascimento.obs        863 non-null    object
 12  jesuita-entrada       951 non-null    object
 13  jesuita-entrada.date  849 non-null    object
 14  jesuita-entrada.obs   849 non-null    object
 15  jesuita-entrada-x     0 no

In [22]:
nacionalidade_entrada.groupby(['nacionalidade','jesuita-entrada'])['id_1'].nunique()
df = nacionalidade_entrada.groupby(['nacionalidade','jesuita-entrada'])['id_1'].nunique().reset_index()
df_subtotal = df.groupby('nacionalidade')['id_1'].sum().reset_index()
df_subtotal.columns = ['nacionalidade', 'subtotal']
df = pd.merge(df, df_subtotal, on='nacionalidade')
df['perc_nacionalidade'] = df['id_1']/df['subtotal']
df.sort_values(['subtotal','nacionalidade','id_1'], ascending=[False,True,False])


Unnamed: 0,nacionalidade,jesuita-entrada,id_1,subtotal,perc_nacionalidade
101,Portugal,?,205,349,0.587393
102,Portugal,Coimbra,55,349,0.157593
105,Portugal,Lisboa,31,349,0.088825
112,Portugal,Évora,22,349,0.063037
103,Portugal,Goa,20,349,0.057307
106,Portugal,"Lisboa, Arroios",6,349,0.017192
107,Portugal,Macau,3,349,0.008596
104,Portugal,Japão,2,349,0.005731
108,Portugal,Ormuz,1,349,0.002865
109,Portugal,Paris,1,349,0.002865


In [60]:
df = nacionalidade_entrada.groupby(['nacionalidade','jesuita-entrada'])['id_1'].nunique().reset_index()
df_subtotal = df.groupby('nacionalidade')['id_1'].sum().reset_index()
df_subtotal.columns = ['nacionalidade', 'subtotal']
df = pd.merge(df, df_subtotal, on='nacionalidade')
df['perc_nacionalidade'] = df['id_1']/df['subtotal']
df

Unnamed: 0,nacionalidade,jesuita-entrada,id_1,subtotal,perc_nacionalidade
0,Alemanha,?,15,35,0.428571
1,Alemanha,Brno,2,35,0.057143
2,Alemanha,Landsberg,7,35,0.2
3,Alemanha,Mainz,1,35,0.028571
4,Alemanha,Polotsk,1,35,0.028571
5,Alemanha,Portugal,1,35,0.028571
6,Alemanha,Roma,3,35,0.085714
7,Alemanha,Trier,3,35,0.085714
8,Alemanha,Viena,2,35,0.057143
9,Alsácia,Landsberg,1,2,0.5


## Jesuit Entry (jesuita-entrada)

The attribute `jesuita-entrada` records the jesuit college/noviciate
where the future missionary enrolled in the Society of Jesus.

Note that a few entries have more than one value for `jesuita-entrada`
because the person dropped out after the first enrollement and later enrolled again.

### jesuita-entrada (totals)

In [61]:
import pandas as pd
from timelink.pandas import attribute_values
# set max rows to 500
pd.set_option('display.max_rows', 500)

df_totals = attribute_values('jesuita-entrada',db=tlnb.db)
df_totals.info()
df_totals['data_inicial'] = df_totals['date_min'].apply(format_timelink_date)
df_totals['data_final'] = df_totals['date_max'].apply(format_timelink_date)
# Extract the first 4 characters of data_inicial to get the year, store in column Ano_inicial
df_totals['Ano_inicial'] = df_totals['data_inicial'].str[:4]
# Extract the first 4 characters of data_final to get the year, store in column Ano_final
df_totals['Ano_final'] = df_totals['data_final'].str[:4]
df_totals

<class 'pandas.core.frame.DataFrame'>
Index: 67 entries, ? to Índia
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   count     67 non-null     int64 
 1   date_min  67 non-null     object
 2   date_max  67 non-null     object
dtypes: int64(1), object(2)
memory usage: 2.1+ KB


Unnamed: 0_level_0,count,date_min,date_max,data_inicial,data_final,Ano_inicial,Ano_final
value,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
?,489,15440000,17880828,1544-07-02,1788-08-28,1544.0,1788.0
Coimbra,57,15420000,17420427,1542-07-02,1742-04-27,1542.0,1742.0
Lisboa,45,15460000,17530612,1546-07-02,1753-06-12,1546.0,1753.0
Paris,44,15340815,17590310,1534-08-15,1759-03-10,1534.0,1759.0
Roma,41,15400927,17560709,1540-09-27,1756-07-09,1540.0,1756.0
Goa,33,15480000,17360000,1548-07-02,1736-07-02,1548.0,1736.0
Évora,26,15660714,17460405,1566-07-14,1746-04-05,1566.0,1746.0
Nancy,15,16270929,17510827,1627-09-29,1751-08-27,1627.0,1751.0
Macau,14,16280000,17490201,1628-07-02,1749-02-01,1628.0,1749.0
Avignon,13,16150926,17370927,1615-09-26,1737-09-27,1615.0,1737.0


#### Export to Excel

In [62]:
df_totals.to_excel("jesuita-entrada_totals.xlsx", sheet_name='Sheet_name_1')

### All know places of entry (todos os lugares de entrada conhecidos)

In [63]:
import pyuca  # to sort accented characters properly

import pandas as pd
from timelink.pandas import entities_with_attribute
# show 500 rows
pd.set_option('display.max_rows', 550)

jesuita_entrada = entities_with_attribute(
    entity_type='person',
    show_elements=['name','groupname'],
    the_type='jesuita-entrada',
    more_attributes=['nacionalidade','nascimento'],
    db=tlnb.db,
)
jesuita_entrada.info()

coll = pyuca.Collator()

# we use the Collator class to sort the names and countries properly
# since Python does not sort accented characters properly
# we create two new columns with the sort keys

# first we fill the NaN values with empty strings
jesuita_entrada['nacionalidade'] = jesuita_entrada['nacionalidade'].fillna('')
jesuita_entrada['name'] = jesuita_entrada['name'].fillna('?')

# now we add two columns with sort keys
jesuita_entrada['nacionalidade_sort'] = jesuita_entrada['nacionalidade'].apply(coll.sort_key)
jesuita_entrada['name_sort'] = jesuita_entrada['name'].apply(coll.sort_key)

# save those with jesuita-entrada not equal to '?'
entrada_known = jesuita_entrada['jesuita-entrada'] != '?'
only_n = jesuita_entrada.fillna('').groupname == 'n'
cols=['jesuita-entrada','jesuita-entrada.date','name','nacionalidade','nascimento','jesuita-entrada.obs']
jesuita_entrada[(only_n) & (entrada_known)].sort_values(by=['jesuita-entrada','jesuita-entrada.date','name_sort','nascimento'])[cols].head(70)



<class 'pandas.core.frame.DataFrame'>
Index: 968 entries, aloys-moriz to joao-cardoso
Data columns (total 14 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   name                   968 non-null    object
 1   groupname              968 non-null    object
 2   jesuita-entrada.type   968 non-null    object
 3   jesuita-entrada        968 non-null    object
 4   jesuita-entrada.date   968 non-null    object
 5   jesuita-entrada.line   968 non-null    int64 
 6   jesuita-entrada.level  968 non-null    int64 
 7   jesuita-entrada.obs    968 non-null    object
 8   nacionalidade          924 non-null    object
 9   nacionalidade.date     924 non-null    object
 10  nacionalidade.obs      924 non-null    object
 11  nascimento             951 non-null    object
 12  nascimento.date        951 non-null    object
 13  nascimento.obs         951 non-null    object
dtypes: int64(2), object(12)
memory usage: 113.4+ KB


Unnamed: 0_level_0,jesuita-entrada,jesuita-entrada.date,name,nacionalidade,nascimento,jesuita-entrada.obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
deh-pedro-gomez,Alcalá,15550925,Pedro Gómez,Espanha,"Antequera, diocese de Málaga, Andaluzia","extra_info: {""date"": {""comment"": ""j\u00e1 era ..."
deh-hernando-de-alcaraz,Alcalá,15580413,Hernando de Alcaraz,Espanha,"Navalafuente, diocese de Toledo","extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-alonso-sanchez,Alcalá,15650702,Alonso Sanchéz,Espanha,"Mondéjar, Guadalajara","extra_info: {""date"": {""comment"": ""? ou 1563?""}..."
deh-melchior-mora,Alcalá,15700000,Melchior Mora,Espanha,"Caravaca de la Cruz, diocese de Cartagena","extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-pedro-ramon,Alcalá,15710000,Pedro Ramón,Espanha,Saragoça,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-nicolas-gallardo,Alcalá,15710624,Nicolás Gallardo,Espanha,Valladolid,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-giovanni-antonio-rubino,Arona,15960921,Giovanni Antonio Rubino,Itália,"Strambino, diocese de Ivrea, Piemonte","extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-giacomo-rho,Arona,16140824,Giacomo Rho,Itália,Pavia,"extra_info: {""date"": {""comment"": ""ou 16140507""..."
deh-etienne-faber,Avignon,16150926,Etienne Faver,França,"Mourières, perto d'Avignon","extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-aime-chezaud,Avignon,16270922,Aimé Chézaud,França,Lyon,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."


#### Export to Excel

In [64]:
jesuita_entrada[cols].to_excel("jesuitas-entrada-full.xlsx", sheet_name='Sheet_name_1')

### Place of entry "unkown" with country and place of birth

In [32]:
import pyuca  # to sort accented characters properly

import pandas as pd
from timelink.pandas import entities_with_attribute
# show 500 rows
pd.set_option('display.max_rows', 550)

jesuita_entrada = entities_with_attribute(
    entity_type='person',
    show_elements=['name','groupname'],
    the_type='jesuita-entrada',
    the_value='?',
    more_attributes=['nacionalidade','nascimento'],
    db=tlnb.db,
)
jesuita_entrada.info()

coll = pyuca.Collator()

# we use the Collator class to sort the names and countries properly
# since Python does not sort accented characters properly
# we create two new columns with the sort keys

# first we fill the NaN values with empty strings
jesuita_entrada['nacionalidade'] = jesuita_entrada['nacionalidade'].fillna('?')
jesuita_entrada['name'] = jesuita_entrada['name'].fillna('?')
jesuita_entrada['nascimento'] = jesuita_entrada['nascimento'].fillna('?')

# now we add two columns with sort keys
jesuita_entrada['nacionalidade_sort'] = jesuita_entrada['nacionalidade'].apply(coll.sort_key)
jesuita_entrada['name_sort'] = jesuita_entrada['name'].apply(coll.sort_key)

cols=['name','nacionalidade','nascimento','jesuita-entrada.date','jesuita-entrada','jesuita-entrada.obs']
jesuita_entrada[jesuita_entrada.groupname=='n'].sort_values(by=['nacionalidade_sort','name_sort','nascimento'])[cols].head(50)



<class 'pandas.core.frame.DataFrame'>
Index: 493 entries, deh-adam-algenler to joao-cardoso
Data columns (total 14 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   name                   493 non-null    object
 1   groupname              493 non-null    object
 2   jesuita-entrada.type   493 non-null    object
 3   jesuita-entrada        493 non-null    object
 4   jesuita-entrada.date   493 non-null    object
 5   jesuita-entrada.line   493 non-null    int64 
 6   jesuita-entrada.level  493 non-null    int64 
 7   jesuita-entrada.obs    493 non-null    object
 8   nacionalidade          463 non-null    object
 9   nacionalidade.date     463 non-null    object
 10  nacionalidade.obs      463 non-null    object
 11  nascimento             485 non-null    object
 12  nascimento.date        485 non-null    object
 13  nascimento.obs         485 non-null    object
dtypes: int64(2), object(12)
memory usage: 57.8+ KB


Unnamed: 0_level_0,name,nacionalidade,nascimento,jesuita-entrada.date,jesuita-entrada,jesuita-entrada.obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
deh-bartolome-de-roboredo,Bartolomé de Roboredo,?,?,16460000,?,"extra_info: {""date"": {""comment"": ""volta a ser ..."
deh-herman-scolt,Herman Scolt,?,?,17281228,?,
deh-jacques-barthe,Jacques Barthe,?,?,16780000,?,
deh-adam-weidenfied,Adam Weidenfied,Alemanha,Colónia,16630412,?,
deh-anton-gogeisl,Anton Gogeisl,Alemanha,"Siegenburg, Baviera, diocese de Regensburg",17200913,?,
deh-bernhard-diestel,Bernhard Diestel,Alemanha,"Wippach, Carniole",16391022,?,
deh-christian-henricus,Christian Henricus,Alemanha,?,16420000,?,
deh-christof-brack,Christof Brack,Alemanha,"Diocese de Constance, Rottenburg am Neckar",16720716,?,
deh-franz-moser,Franz Moser,Alemanha,"Mauerkirchen, dicese de Passau",17321003,?,
deh-franz-xaver-mittermayr,Franz Xaver Mittermayr,Alemanha,Neuötting,17020928,?,


#### Export to Excel

In [65]:
jesuita_entrada[cols].to_excel("jesuitas-entrada-desc.xlsx", sheet_name='Sheet_name_1')

#### Export to text file (Portugal)

In [66]:
# Get id of interest
ids = jesuita_entrada[jesuita_entrada.nacionalidade=='Portugal'].index.unique()

tlnb.db.export_as_kleio(ids,'jesuitas-entrada-desc-portugal.txt')

##  Missionaries by specific place of entry in the Jesuit order /  Obter os missionários por lugar de entrada na ordem


### Example/Exemplo: Coimbra

Showing palce of birth, place of stay ("estadia"), place of death.

In [67]:
place_of_entry = "Coimbra"

In [68]:
import pandas as pd
from timelink.pandas import entities_with_attribute
# show 500 rows
pd.set_option('display.max_rows', 1550)

jesuita_entrada_coimbra_estadia = entities_with_attribute(
    entity_type='person',
    show_elements=['name','groupname'],
    the_type='jesuita-entrada',
    the_value=place_of_entry,
    more_attributes=['nascimento','embarque'],
    db=tlnb.db,
)
jesuita_entrada_coimbra_estadia.info()

# user only group "n"
jesuita_entrada_coimbra_estadia = jesuita_entrada_coimbra_estadia[jesuita_entrada_coimbra_estadia.groupname=='n']

# add age at 'jesuita-entrada.date'
jesuita_entrada_coimbra_estadia['age_at_entrada'] = jesuita_entrada_coimbra_estadia.apply(lambda x: calc_age_at(x['nascimento.date'], x['jesuita-entrada.date']), axis=1)
# add age at 'embarque.date'
jesuita_entrada_coimbra_estadia['age_at_embarque'] = jesuita_entrada_coimbra_estadia.apply(lambda x: calc_age_at(x['nascimento.date'], x['embarque.date']), axis=1)

jesuita_entrada_coimbra_estadia['tempo_em_coimbra'] = jesuita_entrada_coimbra_estadia['age_at_embarque'] - jesuita_entrada_coimbra_estadia['age_at_entrada']

cols=['name','nascimento.date','jesuita-entrada.date','jesuita-entrada','age_at_entrada','tempo_em_coimbra','embarque','embarque.date','age_at_embarque']
jesuita_entrada_coimbra_estadia[cols].sort_values(by=['jesuita-entrada.date','embarque.date'])


<class 'pandas.core.frame.DataFrame'>
Index: 62 entries, deh-adriano-pestana to deh-tome-pereira
Data columns (total 14 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   name                   62 non-null     object
 1   groupname              62 non-null     object
 2   jesuita-entrada.type   62 non-null     object
 3   jesuita-entrada        62 non-null     object
 4   jesuita-entrada.date   62 non-null     object
 5   jesuita-entrada.line   62 non-null     int64 
 6   jesuita-entrada.level  62 non-null     int64 
 7   jesuita-entrada.obs    62 non-null     object
 8   nascimento             62 non-null     object
 9   nascimento.date        62 non-null     object
 10  nascimento.obs         62 non-null     object
 11  embarque               59 non-null     object
 12  embarque.date          59 non-null     object
 13  embarque.obs           59 non-null     object
dtypes: int64(2), object(12)
memory usage: 7.3+ KB


Unnamed: 0_level_0,name,nascimento.date,jesuita-entrada.date,jesuita-entrada,age_at_entrada,tempo_em_coimbra,embarque,embarque.date,age_at_embarque
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
deh-pedro-de-alcacova,Pedro de Alcáçova,15230000,15420000,Coimbra,19,,,,
deh-belchior-nunes-barreto,Belchior Nunes Barreto,15190000,15430311,Coimbra,23,8.0,Esfera,15510310.0,31.0
deh-belchior-miguel-carneiro-leitao,Belchior Miguel Carneiro Leitão,15190000,15430425,Coimbra,23,12.0,S. Filipe,15550401.0,35.0
deh-francisco-perez,Francisco Pérez,15140000,15440125,Coimbra,29,2.0,Flor de la Mar,15460408.0,31.0
deh-tiburcio-de-quadros,Tibúrcio de Quadros,15280000,15440418,Coimbra,15,11.0,Sta. Marta,15550401.0,26.0
deh-goncalo-alvares,Gonçalo Álvares,15270000,15490101,Coimbra,21,19.0,Chagas,15680407.0,40.0
deh-cristovao-da-costa,Cristóvão da Costa,15290000,15500103,Coimbra,20,1.0,?,15510310.0,21.0
deh-joao-rodrigues-de-sainan,João Rodrigues de Sainan,15310000,15550000,Coimbra,24,0.0,Flor de la Mar,15560330.0,24.0
deh-pedro-martins,Pedro Martins,15420000,15560525,Coimbra,13,29.0,Santiago,15850413.0,42.0
deh-pedro-martins,Pedro Martins,15420000,15560525,Coimbra,13,29.0,Bom Jesus,15850413.0,42.0


In [69]:
conimbricenses = jesuita_entrada_coimbra_estadia.index.unique()

#### Export to Excel

In [70]:
jesuita_entrada_coimbra_estadia[cols].to_excel("jesuita-entrada-coimbra.xlsx", sheet_name='jesuita_entrada_coimbra', index=True)

#### Export records to text file

In [71]:
tlnb.db.export_as_kleio(jesuita_entrada_coimbra_estadia.index.unique(), "jesuita-entrada-coimbra.txt")

### Entry in Coimbra, places of stay, no wikidata on location / Locais não idenficados dos missionários de Coimbra

#### Show "estadia" not yet identified in wikidata

Place names with no wikidata link.

In [45]:
import pandas as pd
from timelink.pandas import entities_with_attribute
# show 500 rows
pd.set_option('display.max_rows', 1550)

jesuita_entrada_coimbra_estadia = entities_with_attribute(
    entity_type='person',
    show_elements=['name','groupname'],
    the_type='jesuita-entrada',
    the_value=place_of_entry,
    more_attributes=['nascimento','embarque','estadia'],
    db=tlnb.db,
)
jesuita_entrada_coimbra_estadia.info()

# user only group "n"
jesuita_entrada_coimbra_estadia = jesuita_entrada_coimbra_estadia[jesuita_entrada_coimbra_estadia.groupname=='n']

# add age at 'jesuita-entrada.date'
jesuita_entrada_coimbra_estadia['age_at_entrada'] = jesuita_entrada_coimbra_estadia.apply(lambda x: calc_age_at(x['nascimento.date'], x['jesuita-entrada.date']), axis=1)
# add age at 'embarque.date'
jesuita_entrada_coimbra_estadia['age_at_embarque'] = jesuita_entrada_coimbra_estadia.apply(lambda x: calc_age_at(x['nascimento.date'], x['embarque.date']), axis=1)

jesuita_entrada_coimbra_estadia['tempo_em_coimbra'] = jesuita_entrada_coimbra_estadia['age_at_embarque'] - jesuita_entrada_coimbra_estadia['age_at_entrada']

cols=['name','nascimento.date','jesuita-entrada.date','jesuita-entrada','age_at_entrada','tempo_em_coimbra','embarque','embarque.date','age_at_embarque']
jesuita_entrada_coimbra_estadia[cols].sort_values(by=['jesuita-entrada.date','embarque.date']).head()


<class 'pandas.core.frame.DataFrame'>
Index: 290 entries, deh-adriano-pestana to deh-tome-pereira
Data columns (total 17 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   name                   290 non-null    object
 1   groupname              290 non-null    object
 2   jesuita-entrada.type   290 non-null    object
 3   jesuita-entrada        290 non-null    object
 4   jesuita-entrada.date   290 non-null    object
 5   jesuita-entrada.line   290 non-null    int64 
 6   jesuita-entrada.level  290 non-null    int64 
 7   jesuita-entrada.obs    290 non-null    object
 8   nascimento             290 non-null    object
 9   nascimento.date        290 non-null    object
 10  nascimento.obs         290 non-null    object
 11  embarque               283 non-null    object
 12  embarque.date          283 non-null    object
 13  embarque.obs           283 non-null    object
 14  estadia                284 non-null    object
 1

Unnamed: 0_level_0,name,nascimento.date,jesuita-entrada.date,jesuita-entrada,age_at_entrada,tempo_em_coimbra,embarque,embarque.date,age_at_embarque
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
deh-pedro-de-alcacova,Pedro de Alcáçova,15230000,15420000,Coimbra,19,,,,
deh-pedro-de-alcacova,Pedro de Alcáçova,15230000,15420000,Coimbra,19,,,,
deh-pedro-de-alcacova,Pedro de Alcáçova,15230000,15420000,Coimbra,19,,,,
deh-pedro-de-alcacova,Pedro de Alcáçova,15230000,15420000,Coimbra,19,,,,
deh-pedro-de-alcacova,Pedro de Alcáçova,15230000,15420000,Coimbra,19,,,,


In [48]:
cols = cols + ['estadia.date','estadia.obs']
jesuita_entrada_coimbra_estadia[ ~(jesuita_entrada_coimbra_estadia['estadia.obs'].str.contains('wikidata', na=False))][cols].sort_values(by=['name','jesuita-entrada.date','estadia.date'])

Unnamed: 0_level_0,name,nascimento.date,jesuita-entrada.date,jesuita-entrada,age_at_entrada,tempo_em_coimbra,embarque,embarque.date,age_at_embarque,estadia.date,estadia.obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
deh-afonso-aires,Afonso Aires,16300000,16490000,Coimbra,19,7.0,Bom Jesus da Vidigueira,16570404.0,26.0,,
deh-andre-ferrao,André Ferrão,16250000,16400000,Coimbra,15,16.0,Bom Jesus da Vidigueira,16570404.0,31.0,16590000.0,"extra_info: {""value"": {""comment"": ""Huai'an, Ja..."
deh-andre-palmeiro,André Palmeiro,15690000,15840114,Coimbra,14,33.0,Nossa Senhora da Guia,16170421.0,47.0,16200000.0,"extra_info: {""value"": {""comment"": ""costa este ..."
deh-andre-palmeiro,André Palmeiro,15690000,15840114,Coimbra,14,33.0,Nossa Senhora da Guia,16170421.0,47.0,16210300.0,"extra_info: {""value"": {""comment"": ""costa este ..."
deh-andre-palmeiro,André Palmeiro,15690000,15840114,Coimbra,14,33.0,Nossa Senhora da Guia,16170421.0,47.0,16270000.0,
deh-antonio-gomes,António Gomes,17060000,17250528,Coimbra,18,2.0,?,17270000.0,20.0,17460000.0,
deh-antonio-lopes-junior,"António Lopes, júnior",16690500,16841225,Coimbra,15,11.0,[Para a China],16950000.0,26.0,,
deh-antonio-de-gouvea,António de Gouvea,15920000,16080502,Coimbra,15,15.0,[Para a Índia],16230324.0,30.0,0.0,
deh-antonio-de-gouvea,António de Gouvea,15920000,16080502,Coimbra,15,15.0,[Para a Índia],16230324.0,30.0,0.0,
deh-antonio-de-gouvea,António de Gouvea,15920000,16080502,Coimbra,15,15.0,[Para a Índia],16230324.0,30.0,16480000.0,"extra_info: {""value"": {""comment"": ""Fuzhou, Fuj..."


#### Export to excel


In [None]:
jesuita_entrada_coimbra_estadia[ ~(jesuita_entrada_coimbra_estadia['estadia.obs'].str.contains('wikidata', na=False))][cols].to_excel("jesuita-entrada-coimbra_estadia_no_wikidata.xlsx", sheet_name='no_estadia@', index=True)

### Joint biography of entry in Coimbra

In [51]:
from timelink.pandas import display_group_attributes, group_attributes

ids = jesuita_entrada_coimbra_estadia.index

no_show=['jesuita-entrada.level','jesuita-entrada.line','dehergne','estadia@wikidata','jesuita-estatuto',
          'nome','nacionalidade','nascimento@wikidata','nome-chines','titulo','jesuita-entrada@wikidata',
          'jesuita-votos-local@wikidata',
          'wicky','wicky-viagem']

show=['jesuita-entrada','embarque','partida','chegada','estadia','cargo','jesuita-cargo','jesuita-tarefa','morte']
display_group_attributes(ids,
                    entity_type='person',
                    header_elements=['name'],
                    sort_header='jesuita-entrada.date',
                    header_attributes=['jesuita-entrada'],
                    include_attributes=show,
                    exclude_attributes=no_show,
                    sort_attributes=['the_date','the_type','the_value'],
                    db=tlnb.db,
                    category='id',
                    cmap_name='Pastel2'
                    )

Unnamed: 0,name,jesuita-entrada.type,jesuita-entrada,jesuita-entrada.date,jesuita-entrada.line,jesuita-entrada.level,jesuita-entrada.obs,id
0,Pedro de Alcáçova,jesuita-entrada,Coimbra,15420000,294,4,"saiu a primeira vez extra_info: {""value"": {""comment"": ""@wikidata:Q45412""}}",deh-pedro-de-alcacova
1,Belchior Nunes Barreto,jesuita-entrada,Coimbra,15430311,372,4,"extra_info: {""value"": {""comment"": ""@wikidata:Q45412""}}",deh-belchior-nunes-barreto
2,Belchior Miguel Carneiro Leitão,jesuita-entrada,Coimbra,15430425,487,4,"extra_info: {""value"": {""comment"": ""@wikidata:Q45412""}}",deh-belchior-miguel-carneiro-leitao
3,Francisco Pérez,jesuita-entrada,Coimbra,15440125,829,4,"extra_info: {""date"": {""comment"": ""j\u00e1 \u00e9 padre""}, ""value"": {""comment"": ""@wikidata:Q45412""}}",deh-francisco-perez
4,Tibúrcio de Quadros,jesuita-entrada,Coimbra,15440418,14,4,"extra_info: {""value"": {""comment"": ""@wikidata:Q45412""}}",deh-tiburcio-de-quadros
5,Gonçalo Álvares,jesuita-entrada,Coimbra,15490101,689,4,"extra_info: {""value"": {""comment"": ""@wikidata:Q45412""}}",deh-goncalo-alvares
6,Cristóvão da Costa,jesuita-entrada,Coimbra,15500103,2105,4,"extra_info: {""value"": {""comment"": ""@wikidata:Q45412""}}",deh-cristovao-da-costa
7,João Rodrigues de Sainan,jesuita-entrada,Coimbra,15550000,1076,4,"extra_info: {""value"": {""comment"": ""@wikidata:Q45412""}}",deh-joao-rodrigues-de-sainan
8,Pedro Martins,jesuita-entrada,Coimbra,15560525,747,4,"extra_info: {""value"": {""comment"": ""@wikidata:Q45412""}}",deh-pedro-martins
9,Lourenço Mexia,jesuita-entrada,Coimbra,15600314,1445,4,"extra_info: {""date"": {""comment"": ""ou 15600325 ou Coimbra, 25-03-1560 MMHM""}, ""value"": {""comment"": ""Domingues & O Neil, IV: 2645. MMHM:p.226"", ""original"": ""?""}}",deh-lourenco-mexia


Unnamed: 0,name,the_type,the_value,the_date,attr_obs,id
0,Belchior Miguel Carneiro Leitão,cargo,Patriarca da Etiópia,0,,deh-belchior-miguel-carneiro-leitao
1,Pedro Martins,cargo,Pregador do rei D. Sebastião,0,,deh-pedro-martins
2,António da Silva,cargo,Vigário de vara,0,il rejette la patente de vic. apostolique et reprend la charge de vicaire de vara,deh-antonio-da-silva
3,Estanislau Machado,cargo,Vigário de vara,0,"Dehergne diz «Vicaire de vara: au sens technique actuel: vicaire forain. En réalité, il s'agit ici d'un représentant envoyé par un évêque portugais sur le territoire d'un vicaire apostolique., en somme un délégué ayant reçu des pouvoirs étendus.»",deh-estanislau-machado
4,António de Andrade,estadia,Agra,0,"extra_info: {""value"": {""comment"": ""@wikidata:Q42941""}}",deh-antonio-de-andrade
5,Francisco Pacheco,estadia,"Arima, Japão",0,"prisioneiro «in odium fidei» e torturado extra_info: {""value"": {""comment"": ""@wikidata:Q4790673""}}",deh-francisco-pacheco
6,João da Rocha,estadia,"Changchow, Fukien",0,,deh-joao-da-rocha
7,Gonçalo Álvares,estadia,Cochim,0,"feito prisioneiro na região de Cochim extra_info: {""value"": {""comment"": ""@wikidata:Q1800""}}",deh-goncalo-alvares
8,António Gomes,estadia,Diu,0,"extra_info: {""value"": {""comment"": ""@wikidata:Q631849""}}",deh-antonio-gomes
9,Manuel de Sá,estadia,Europa,0,"extra_info: {""value"": {""comment"": ""@wikidata:Q46""}}",deh-manuel-de-sa


## Places of stay / "estadias"

All the places of stay (long)

In [71]:
import pandas as pd
from timelink.pandas import entities_with_attribute
# show 500 rows
pd.set_option('display.max_rows', 1550)

places_of_stay = entities_with_attribute(
    entity_type='person',
    show_elements=['name','groupname','the_line','the_order'],
    the_type=['jesuita-entrada','partida','chegada','estadia','estadia-x',
              'jesuita-votos-local','jesuita-ordenacao-padre','morte'],
    column_name='place',
    more_attributes=[],
    db=tlnb.db,
)

places_of_stay.info()

# this sequence replaces missing dates with the value of the previous date
# followed by ">". This meant to use the information that although the date
# is unkown it has happened after a certain date
# we need the id in a column to group by it (it is currently in the index)
# TODO: maybe an option in entities_with_attribute and compute intervals using
#       post quem and ante quem dates. See https://github.com/time-link/timelink-kleio/issues/1
places_of_stay['id_col'] = places_of_stay.index
# create a copy of the date column to replace the '0' values with NaN
places_of_stay['place.date_inferred'] = places_of_stay['place.date'].replace('0', pd.NA)
# order by id and line to have the missing date values filled with the previous date
places_of_stay = places_of_stay.sort_values(by=['id_col', 'place.line'])
# use ffill to fill the missing values with the previous value
places_of_stay['place.date_inferred'] = places_of_stay.groupby(['id_col'])['place.date_inferred'].fillna(method='ffill')
# create a column to flag the inferred dates
places_of_stay['place.date_is_inferred'] = places_of_stay['place.date_inferred'] != places_of_stay['place.date']
# reset index
places_of_stay.reset_index(inplace=True)
# if data_is_inferred set date_inferred to date_inferrred+">"
places_of_stay.loc[places_of_stay['place.date_is_inferred'], 'place.date_inferred'] = places_of_stay['place.date_inferred'] + '>'
# set missing value to ''
places_of_stay['place.date_inferred'].fillna('', inplace=True)
# restore index
places_of_stay.index = places_of_stay['id_col']

# show results
show_only=150
cols=['groupname','name','place.type','place','place.date_inferred','place.line','place.obs']
places_of_stay[places_of_stay.groupname=='n'][cols].sort_values(by=['name','place.date_inferred']).head(show_only)



<class 'pandas.core.frame.DataFrame'>
Index: 7172 entries, deh-camillo-di-costanzo to deh-giovanni-antonio-grassi
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   name         7172 non-null   object
 1   groupname    7172 non-null   object
 2   the_line     7172 non-null   int64 
 3   the_order    7172 non-null   int64 
 4   place.type   7172 non-null   object
 5   place        7172 non-null   object
 6   place.date   7172 non-null   object
 7   place.line   7172 non-null   int64 
 8   place.level  7172 non-null   int64 
 9   place.obs    7172 non-null   object
dtypes: int64(4), object(6)
memory usage: 616.3+ KB


Unnamed: 0_level_0,groupname,name,place.type,place,place.date_inferred,place.line,place.obs
id_col,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
deh-abraham-le-royer,n,Abraham Le Royer,jesuita-entrada,Paris,16650831,894,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-abraham-le-royer,n,Abraham Le Royer,jesuita-ordenacao-padre,?,16770000,908,
deh-abraham-le-royer,n,Abraham Le Royer,jesuita-votos-local,Caen,16810202,910,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-abraham-le-royer,n,Abraham Le Royer,chegada,Sião,16870000,897,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-abraham-le-royer,n,Abraham Le Royer,estadia,Brest,16870301,895,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-abraham-le-royer,n,Abraham Le Royer,estadia,Pondichéry,16910000,898,"extra_info: {""date"": {""comment"": ""expulso do S..."
deh-abraham-le-royer,n,Abraham Le Royer,estadia,"Chennai, Índia",16920000,899,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-abraham-le-royer,n,Abraham Le Royer,estadia,Tonquim,16920000,900,
deh-abraham-le-royer,n,Abraham Le Royer,estadia,Cantão,16931200,901,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-abraham-le-royer,n,Abraham Le Royer,estadia,Cantão,17010000,902,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."


### Save to Excel

In [72]:
places_of_stay[ ~(places_of_stay['place.obs'].str.contains('wikidata', na=False))][cols].sort_values(by=['place','place.date_inferred']).to_excel("places_of_stay_no_wikidata.xlsx", sheet_name='no_wikidata', index=True)

### Frequency of places / Frequência dos lugares

In [3]:
places_of_stay.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7167 entries, aloys-moriz to simao-rodrigues-ref1
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      7167 non-null   object
 1   name                    7167 non-null   object
 2   groupname               7167 non-null   object
 3   the_line                7167 non-null   int64 
 4   the_order               7167 non-null   int64 
 5   place.type              7167 non-null   object
 6   place                   7167 non-null   object
 7   place.date              7167 non-null   object
 8   place.line              7167 non-null   int64 
 9   place.level             7167 non-null   int64 
 10  place.obs               7167 non-null   object
 11  id_col                  7167 non-null   object
 12  place.date_inferred     7167 non-null   object
 13  place.date_is_inferred  7167 non-null   bool  
dtypes: bool(1), int64(4), object(9)
mem

In [4]:
# create a column with the index values which are the id numbers

col = 'place' # subotal by this column

# Use pandas groupby and specify unique value count for id
df_totals = places_of_stay.groupby(col).agg({'id':'nunique',
                                                'place.date_inferred':'min',
                                                'place.date':'max'})

df_totals.sort_values('id',ascending= False).head(30)

Unnamed: 0_level_0,id,place.date_inferred,place.date
place,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
?,740,000000,18050000
Macau,450,,1773
Pequim,249,,18131213
Goa,206,,17860123
Cantão,197,15550000,17850000
China,180,,18050114
Lisboa,129,,17810300
Roma,112,,18491212
Paris,64,15270000,17780000
Nanquim,64,,17800000


### Place of stay "Coimbra"

In [None]:
place='Coimbra'

In [8]:
places_of_stay[places_of_stay.place==place][cols].sort_values(by=['place.date_inferred']).head(show_only)


NameError: name 'places_of_stay' is not defined

## Vows / Votos

In [167]:
import pyuca  # to sort accented characters properly

import pandas as pd
from timelink.pandas import entities_with_attribute
# show 500 rows
pd.set_option('display.max_rows', 550)

jesuita_entrada = entities_with_attribute(
    entity_type='person',
    show_elements=['name','groupname'],
    the_type='jesuita-votos',
    more_attributes=['jesuita-votos-local','nascimento', 'jesuita-entrada'],
    db=tlnb.db,
)
jesuita_entrada.info()

coll = pyuca.Collator()

# we use the Collator class to sort the names and countries properly
# since Python does not sort accented characters properly
# we create two new columns with the sort keys

# first we fill the NaN values with empty strings
jesuita_entrada['jesuita-votos-local'] = jesuita_entrada['jesuita-votos-local'].fillna('?')
jesuita_entrada['jesuita-votos-local'] = jesuita_entrada['jesuita-votos-local'].fillna('?')

# now we add two columns with sort keys
jesuita_entrada['jesuita-votos-loc-sort'] = jesuita_entrada['jesuita-votos-local'].apply(coll.sort_key)
jesuita_entrada['name_sort'] = jesuita_entrada['name'].apply(coll.sort_key)

<class 'pandas.core.frame.DataFrame'>
Index: 629 entries, aloys-moriz to joao-cardoso
Data columns (total 17 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   name                      629 non-null    object
 1   groupname                 629 non-null    object
 2   jesuita-votos.type        629 non-null    object
 3   jesuita-votos             629 non-null    object
 4   jesuita-votos.date        629 non-null    object
 5   jesuita-votos.line        629 non-null    int64 
 6   jesuita-votos.level       629 non-null    int64 
 7   jesuita-votos.obs         629 non-null    object
 8   jesuita-votos-local       529 non-null    object
 9   jesuita-votos-local.date  529 non-null    object
 10  jesuita-votos-local.obs   529 non-null    object
 11  nascimento                604 non-null    object
 12  nascimento.date           604 non-null    object
 13  nascimento.obs            604 non-null    object
 14  jesuita-entr

In [None]:
# filter dataframe with rows with index in "conimbricensis"


In [170]:

df = jesuita_entrada[jesuita_entrada.index.isin(conimbricenses)].copy()
# calculate age at date "jesuita-votos.date" considering "nascimento.date" as date of birth
# and using the function convert_timelink_date to convert the Timelink date to a Python datetime
df['nascimento.date_2'] = df['nascimento.date'].fillna('0').apply(convert_timelink_date)
df['jesuita-votos.date_2'] = df['jesuita-votos.date'].fillna('0').apply(convert_timelink_date)
# calculate diference in years between the two dates
df['age_at_votos'] = df.apply(lambda row: calc_age_at(row['nascimento.date_2'], row['jesuita-votos.date_2']), axis=1)
df['age_at_votos'] = df['age_at_votos'].fillna(0).astype(int)
df[['nascimento.date','jesuita-votos.date','jesuita-votos','age_at_votos']].head(50)


Unnamed: 0_level_0,nascimento.date,jesuita-votos.date,jesuita-votos,age_at_votos
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
deh-andre-carneiro,16500000,16821208,4V,32
deh-andre-ferrao,16250000,16590116,4V,33
deh-andre-palmeiro,15690000,16030727,4V,34
deh-antao-dantas,16741102,17110815,4V,36
deh-antonio-da-silva,16540113,16870815,4V,33
deh-antonio-de-andrade,15800000,16121014,4V,32
deh-antonio-de-gouvea,15920000,16460714,4V,54
deh-antonio-de-gouvea,15920000,16460714,4V,54
deh-antonio-de-magalhaes,16770000,17110202,4V,33
deh-antonio-de-melo,16810803,17191203,4V,38


## Other



###  Remover colunas sem valores

---



###  Remove empty columns

In [53]:
jesuita_entrada_coimbra_estadia.dropna(how='all', axis=1, inplace=True)
jesuita_entrada_coimbra_estadia.info()

<class 'pandas.core.frame.DataFrame'>
Index: 181 entries, deh-afonso-aires to deh-tome-pereira
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   name                  181 non-null    object
 1   groupname             181 non-null    object
 2   jesuita-entrada       181 non-null    object
 3   jesuita-entrada.date  181 non-null    object
 4   jesuita-entrada.obs   181 non-null    object
 5   nascimento            181 non-null    object
 6   nascimento.date       181 non-null    object
 7   nascimento.obs        181 non-null    object
 8   estadia               176 non-null    object
 9   estadia.date          176 non-null    object
 10  estadia.obs           176 non-null    object
 11  morte                 172 non-null    object
 12  morte.date            172 non-null    object
 13  morte.obs             172 non-null    object
dtypes: object(14)
memory usage: 21.2+ KB


In [54]:
jesuita_entrada_coimbra_estadia.head(5)

Unnamed: 0_level_0,name,groupname,jesuita-entrada,jesuita-entrada.date,jesuita-entrada.obs,nascimento,nascimento.date,nascimento.obs,estadia,estadia.date,estadia.obs,morte,morte.date,morte.obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
deh-afonso-aires,Afonso Aires,n,Coimbra,16490000,"extra_info: {""value"": {""comment"": ""@wikidata:Q...",Santarém,16300000,"extra_info: {""value"": {""comment"": ""@wikidata:Q...",,,,"[No mar, depois do Cabo da Boa Esperança]",16570000,
deh-andre-carneiro,André Carneiro,n,Coimbra,16640325,"extra_info: {""date"": {""comment"": ""ou 16630325""...",Porto,16500000,"extra_info: {""date"": {""comment"": ""ou 16480923""...","Chinchinim, Goa",17050000.0,"extra_info: {""value"": {""comment"": ""@wikidata:Q...",Goa,17090612,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-antao-dantas,Antão Dantas,n,Coimbra,16930310,"extra_info: {""date"": {""comment"": ""ou 16931016 ...",Porto,16741102,"extra_info: {""value"": {""comment"": ""@wikidata:Q...",Macau,17050000.0,"extra_info: {""value"": {""comment"": ""@wikidata:Q...",Goa,17210522,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-antao-dantas,Antão Dantas,n,Coimbra,16930310,"extra_info: {""date"": {""comment"": ""ou 16931016 ...",Porto,16741102,"extra_info: {""value"": {""comment"": ""@wikidata:Q...",Manila,17070000.0,"extra_info: {""value"": {""comment"": ""@wikidata:Q...",Goa,17210522,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-antao-dantas,Antão Dantas,n,Coimbra,16930310,"extra_info: {""date"": {""comment"": ""ou 16931016 ...",Porto,16741102,"extra_info: {""value"": {""comment"": ""@wikidata:Q...",Macau,17120000.0,"extra_info: {""value"": {""comment"": ""@wikidata:Q...",Goa,17210522,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."



## Contagens

---

## Counting



###  Contagem de atributos a partir de uma tabela em memória

---

###  Count attributes from an existing dataframe



In [20]:
# create a column with the index values which are the id numbers
jesuita_entrada_coimbra_estadia['id'] = jesuita_entrada_coimbra_estadia.index.values

col = 'estadia' # subotal by this column

# Use pandas groupby and specify unique value count for id
df_totals = jesuita_entrada_coimbra_estadia.groupby(col).agg({'id':'nunique',
                                                  'estadia.date':'min',
                                                  'estadia.date':'max'})

df_totals.sort_values('id',ascending= False).head(30)

Unnamed: 0_level_0,id,estadia.date
estadia,Unnamed: 1_level_1,Unnamed: 2_level_1
Macau,19,17480000
Goa,12,17270000
Cantão,8,17190000
Pequim,6,17440000
Índia,5,17220000
Nanquim,4,17140000
Malaca,4,15720000
Sião,4,17450000
China,3,17340000
Roma,3,16950800



### Contagens na base de dados

Quando o atributo tem muitos valores e não é necessário
ter todas as pessoas em memória: contagem feita na base de dados

---

### Counting directly in the database
When there are many values and it is not
necessary to have all the people in memory:
count directly in the database.




In [21]:
from timelink.pandas import attribute_values

df_totals = attribute_values('jesuita-entrada',db=tlnb.db)


In [22]:
df_totals.head(10)


Unnamed: 0_level_0,count,date_min,date_max
value,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
?,512,0,17880828
Coimbra,43,15420000,17420427
Paris,42,15340815,17590310
Lisboa,39,15460000,17521123
Roma,32,15400927,17560709
Goa,30,15480000,17360000
Évora,22,15660714,17460405
Nancy,15,16270929,17510827
Macau,14,16280000,17490201
Avignon,13,16150926,17370927


#### Filtrar por datas

---

#### Filter by dates



In [29]:
df_totals = attribute_values('grau',dates_between=('1535','1600'),db=tlnb.db)

In [30]:
df_totals.head(10)

Unnamed: 0_level_0,count,date_min,date_max
value,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Bacharel,9,1554-07-19,1593-06-09
Formatura,6,1574-07-24,1593-05-12
Bacharel em Artes,3,1574-03-14,1588-03-12
Licenciado,3,1574-06-03,1597-05-06
Doutor,2,1560-12-22,1589-10-27
Licenciado em Artes,2,1574-05-15,1589-01-13


## Visualizar registos / View records





### Visualizar uma pessoa / View a person


#### Atributos de uma pessoa numa tabela, uma linha por attributo

---

#### Person attributes in a dataframe, one line per attribute

In [58]:
import pandas as pd
from timelink.pandas import group_attributes as person_attributes

pd.set_option('display.max_rows',1000)

id = 'deh-abraham-le-royer'
pdf = person_attributes([id],db=tlnb.db)  # note id in a list
pdf[['the_date','the_type','the_value','attr_obs']].sort_values(['the_date','the_type'])

Unnamed: 0_level_0,the_date,the_type,the_value,attr_obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
deh-abraham-le-royer,0,dehergne,465,"Le Royer, Abraham (franç.) P. 465 N. 9 févr. 1..."
deh-abraham-le-royer,0,estadia,Kwangtung,"extra_info: {""value"": {""comment"": ""?""}}"
deh-abraham-le-royer,0,jesuita-estatuto,Padre,
deh-abraham-le-royer,0,nacionalidade,França,
deh-abraham-le-royer,16460209,nascimento,"Domfront, diocese de Mans","extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-abraham-le-royer,16460209,nascimento@wikidata,https://www.wikidata.org/wiki/Q659992,"extra_info: {""value"": {""original"": ""Domfront, ..."
deh-abraham-le-royer,16650831,jesuita-entrada,Paris,"extra_info: {""value"": {""comment"": ""@wikidata:Q..."
deh-abraham-le-royer,16650831,jesuita-entrada@wikidata,https://www.wikidata.org/wiki/Q90,"extra_info: {""value"": {""original"": ""Paris""}}"
deh-abraham-le-royer,16770000,jesuita-ordenacao-padre,?,
deh-abraham-le-royer,16810202,jesuita-votos,4V,


In [16]:
p = tlnb.db.get_person(id)
print(p.to_kleio())

n$Adriano Pestana/m/id=deh-adriano-pestana
  ls$nacionalidade/Portugal/0
  ls$jesuita-estatuto/Padre/0
  ls$nascimento@/"https://www.wikidata.org/wiki/Q209036"/16170000
  ls$nascimento/Ilha de S. Miguel, Açores/16170000
  ls$jesuita-entrada/?/16350000
  ls$embarque/?/16430330
  ls$wicky/912/16430330
  ls$wicky-viagem/77/16430330
  ls$estadia@/"https://www.wikidata.org/wiki/Q14773"/16480000
  ls$estadia/Macau (colégio)/16480000
  ls$estadia@/"https://www.wikidata.org/wiki/Q14773"/16540000
  ls$estadia/Macau (colégio)/16540000
  ls$dehergne/631/0/obs="""
      Pestana, Adriano (port.) P. 631 N. Ile S. Miguel, 1617 -E. 1635 (Lus. 44 II, 580). Emb. 30 mars 1643 (W 912)., en 1648 au collège de Macao, mais « pertinet ad vice-provinciam Sinensem » (JS 25, 148), de même en 1654 (JS 25, 161v). M. (n'est plus au catal. de 1660, JS 134, 344)., omis par HS 48.
  """


#### Atributos de uma pessoa numa tabela, attributos em colunas

---

#### Person attributes in a dataframe, attributes in columns

In [39]:
# Get list of people with with a certain value in a specific attribute

id = '316297'  # João Pedro Ribeiro
# id = '234295'  # Alexandre Metelo de Sousa
df = entities_with_attribute(
                    entity_type='person',
                    the_type='uc.entrada',  # we need a base attribute
                    more_info=['name'],
                    more_cols=['instituta','faculdade'],
                    db=tlnb.db,
                    sql_echo=False)
view_cols = ['name','faculdade','faculdade.date','faculdade.obs']
df.info()
df[view_cols].sort_values('faculdade.date')

<class 'pandas.core.frame.DataFrame'>
Index: 503 entries, 140337 to 269364
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   name             503 non-null    object
 1   uc.entrada       503 non-null    object
 2   uc.entrada.date  503 non-null    object
 3   uc.entrada.obs   0 non-null      object
 4   instituta        205 non-null    object
 5   instituta.date   205 non-null    object
 6   instituta.obs    205 non-null    object
 7   faculdade        447 non-null    object
 8   faculdade.date   447 non-null    object
 9   faculdade.obs    29 non-null     object
dtypes: object(10)
memory usage: 43.2+ KB


Unnamed: 0_level_0,name,faculdade,faculdade.date,faculdade.obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
140345,Jorge Abenajado,Medicina,1537-12-17,
140669,Diogo de Abreu,Leis,1538-05-07,
140347,Adrião de Aboim,Leis,1539-12-07,
140708,Filipe de Abreu,Cânones,1548-06-23,
140541,António de Abreu,Cânones,1551-10-00,
140358,João de Aboim,Cânones,1551-10-01,
140670,Diogo de Abreu,Leis,1552-07-00,
140695,Duarte de Abreu,Cânones,1566-01-20,
140349,António de Aboim,Cânones,1566-12-20,
140742,Francisco de Abreu,Medicina,1573-10-03,


#### Examinar potenciais duplicados

---

#### Examine potentital duplicates

In [15]:
from timelink.pandas import display_group_attributes
pd.set_option('display.max_rows',250)

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome-apelido','nome-primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','nome-geografico','instituta.ano']

dup_ids = ['234295','234710',]  # Alexandre Metelo de

display_group_attributes(dup_ids,
                             header_cols=['uc-entrada','naturalidade','faculdade','nome-pai'],
                             exclude_attributes=no_show,
                             sort_attributes=['date','type','value'],
                             cmap_name='Pastel1')

Unnamed: 0,id,uc-entrada,naturalidade,faculdade,nome-pai
0,234295,1704-11-07,Marialva,Cânones,
1,234710,1705-10-24,Marialva,Matemática,Manuel Cardoso Metelo


Unnamed: 0,date,id,type,value,attr_obs
0,1704-11-07,234295,faculdade,Cânones,Cânones
1,1704-11-07,234295,instituta,1704-11-07,07.11.1704 1704-11-07
2,1704-11-07,234295,naturalidade,Marialva,
3,1704-11-07,234295,nome,Alexandre Metelo de Sousa,
4,1704-11-07,234295,uc-entrada,1704-11-07,
5,1704-11-07,234295,uc-entrada.ano,1704,
6,1705-10-24,234710,faculdade,Matemática,Matemática
7,1705-10-24,234295,matricula-faculdade,Cânones,24.10.1705
8,1705-10-24,234710,matricula-faculdade,Matemática,24.10.1705
9,1705-10-24,234710,naturalidade,Marialva,
