# Dehergne repertoire analysis

> First-time use: follow the instructions in the [README.md](README.md) file in this directory.

In [1]:
import timelink

timelink.version

'1.1.2'

## Initial setup

Create a TimelinkNotebook object. This will be the interface for most of your interactions with Timelink.

The first time you run this notebook, you will be prompted to install `ipykernel`. This is required to run Timelink in a notebook.

Note:
* The first run takes a while because the required Docker images are downloaded.
* Timelink defaults to SQLite as the database; the cell below requests PostgreSQL with `db_type='postgres'`. See the [Receipts notebook](1-receipts.ipynb) for more control.


In [3]:
from timelink.notebooks import TimelinkNotebook

tlnb = TimelinkNotebook(db_type='postgres')



TimelinkNotebook created a database to store data and a Kleio server to
translate transcriptions in Kleio notation into data that can be imported
into a database.

To get information about the database and the Kleio server, call `tlnb.print_info()`.

In [4]:
tlnb.print_info()

Project name: dehergne-repertoire
Project home: /Users/jrc/mhk-home/sources/dehergne-repertoire
Database type: postgres
Database name: dehergne_repertoire
Kleio image: timelinkserver/kleio-server
Kleio server token: RnTJRkMNM3OlaPU2GKla7vWEpwHLVi4D
Kleio server URL: http://127.0.0.1:8088
Kleio server home: /Users/jrc/mhk-home/sources/dehergne-repertoire
Kleio server container: laughing_mclaren
Kleio version requested: latest
Kleio server version: 12.4.567 (2024-02-07 13:02:03)
Postgres image: postgres
Postgres version: latest
Postgres user: postgres
Postgres password: nexzFlJtQR
TimelinkNotebook(project_name=dehergne-repertoire, project_home=/Users/jrc/mhk-home/sources/dehergne-repertoire, db_type=postgres, db_name=dehergne_repertoire, kleio_image=timelinkserver/kleio-server, kleio_version=latest, postgres_image=postgres, postgres_version=latest)


### Database status

Count the number of rows in each table in the database.


In [5]:
tlnb.table_row_count_df()

Unnamed: 0,table,count
0,entities,13
1,syspar,0
2,syslog,0
3,kleiofiles,0
4,attributes,0
5,relations,0
6,acts,0
7,sources,0
8,persons,0
9,goods,0


## Check the status of the files

Check the import status of the translated files. The `import_status` codes are:

* I: Imported
* E: Imported with errors
* W: Imported with warnings, no errors
* N: Not imported
* U: Translation updated, needs reimport
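
As a minimal sketch, the legend above can be kept as a lookup table and used to pick out the files that still need work. The helper `files_needing_attention` and the choice of which codes count as "needing attention" are assumptions made for illustration; they are not part of Timelink. The sample rows are taken from the file status table in this notebook.

```python
# Import status codes as listed in the legend above.
IMPORT_STATUS = {
    "I": "Imported",
    "E": "Imported with errors",
    "W": "Imported with warnings, no errors",
    "N": "Not imported",
    "U": "Translation updated, needs reimport",
}

# Assumption for this sketch: these codes call for some action.
NEEDS_ATTENTION = {"E", "N", "U"}


def files_needing_attention(files):
    """Return (name, description) pairs for files whose status needs action.

    `files` is an iterable of (name, import_status) pairs.
    """
    return [
        (name, IMPORT_STATUS[status])
        for name, status in files
        if status in NEEDS_ATTENTION
    ]


# Sample rows taken from the file status table in this notebook.
sample = [
    ("dehergne-e.cli", "I"),
    ("dehergne-f.cli", "E"),
    ("dehergne-g.cli", "E"),
    ("dehergne-h.cli", "I"),
]

for name, description in files_needing_attention(sample):
    print(f"{name}: {description}")
```

With a real DataFrame, the same selection could be made by filtering on the `import_status` column.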

In [24]:
kleio_files = tlnb.get_kleio_files()
kleio_files.info()
kleio_files[["name","status","errors","warnings","import_status","imported","import_errors","import_warnings"]]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 15 columns):
 #   Column              Non-Null Count  Dtype              
---  ------              --------------  -----              
 0   path                30 non-null     object             
 1   name                30 non-null     object             
 2   modified            30 non-null     datetime64[ns, UTC]
 3   status              30 non-null     object             
 4   translated          30 non-null     datetime64[ns, UTC]
 5   errors              30 non-null     int64              
 7   import_status       30 non-null     object             
 8   import_errors       30 non-null     Int64              
 10  import_error_rpt    30 non-null     object             
 12  imported            30 non-null     datetime64[ns]     
 13  rpt_url             30 non-null     object             
 14  xml_url             30 non-null     object             
dtypes: Int64(2), datetime64[ns, UTC](2), d

Unnamed: 0,name,status,errors,warnings,import_status,imported,import_errors,import_warnings
0,dehergne-a.cli,V,0,0,I,2024-03-15 07:07:54.187988,0,0
1,dehergne-b.cli,V,0,0,I,2024-03-15 07:08:39.518102,0,0
2,dehergne-c.cli,V,0,0,I,2024-03-15 07:09:46.150519,0,0
3,dehergne-d.cli,V,0,0,I,2024-03-15 07:10:11.728839,0,0
4,dehergne-e.cli,V,0,0,I,2024-03-15 07:10:15.366330,0,0
5,dehergne-f.cli,V,0,0,E,2024-03-15 07:10:55.811548,2,2
6,dehergne-g.cli,V,0,0,E,2024-03-15 07:11:34.882548,3,3
7,dehergne-h.cli,V,0,0,I,2024-03-15 07:11:46.119281,0,0
8,dehergne-i.cli,V,0,0,I,2024-03-15 07:11:50.676954,0,0
9,dehergne-j.cli,V,0,0,I,2024-03-15 07:11:56.348648,0,0


In [26]:
rpt = tlnb.get_import_rpt(kleio_files,rows=6)
print(rpt)

ERROR: dehergne-g.cli 25 storing rel$class=relation/date=0/groupname=rel/id=deh-giandomenico-gabiani-rel1-120/destination=deh-lascaris-ref1/level=4/line=25/undef=Lascaris/order=12/type=parentesco/value=parente
 (psycopg2.errors.ForeignKeyViolation) insert or update on table "relations" violates foreign key constraint "relations_destination_fkey"
DETAIL:  Key (destination)=(deh-lascaris-ref1) is not present in table "entities".

[SQL: INSERT INTO relations (id, origin, destination, the_type, the_value, the_date, obs) VALUES (%(id)s, %(origin)s, %(destination)s, %(the_type)s, %(the_value)s, %(the_date)s, %(obs)s)]
[parameters: {'id': 'deh-giandomenico-gabiani-rel1-120', 'origin': 'deh-giandomenico-gabiani', 'destination': 'deh-lascaris-ref1', 'the_type': 'parentesco', 'the_value': 'parente', 'the_date': '0', 'obs': None}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

ERROR: dehergne-g.cli 190 storing rel$class=relation/date=16730315/date=16730315/groupname=rel/id=deh-dida
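
The first error in the report is a foreign key violation: the relation points to the destination `deh-lascaris-ref1`, which is not present in the `entities` table. The sketch below reproduces the same kind of failure with the standard-library `sqlite3` module. The two-table schema is a simplified assumption, not the actual Timelink schema, and SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Simplified, assumed schema: the real tables have more columns.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE relations (
        id TEXT PRIMARY KEY,
        origin TEXT REFERENCES entities(id),
        destination TEXT REFERENCES entities(id),
        the_type TEXT,
        the_value TEXT
    )
    """
)

# Only the origin exists; the destination was never stored as an entity.
conn.execute("INSERT INTO entities (id) VALUES ('deh-giandomenico-gabiani')")

try:
    conn.execute(
        "INSERT INTO relations VALUES (?, ?, ?, ?, ?)",
        (
            "deh-giandomenico-gabiani-rel1-120",
            "deh-giandomenico-gabiani",
            "deh-lascaris-ref1",
            "parentesco",
            "parente",
        ),
    )
    outcome = "stored"
except sqlite3.IntegrityError as exc:
    outcome = f"rejected: {exc}"

print(outcome)
```

In the source file, the likely remedy is to make sure the entity referenced by the relation exists (or is imported first), then reimport.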


# Update the database from Kleio sources

Update the source translations and import into the database the sources that translated without errors.

Only new files, or files changed since the last import, are processed.

In [23]:
import logging
logging.basicConfig(level=logging.INFO)

tlnb.update_from_sources(with_errors=True)

## Jesuit Entry

In [7]:
import pandas as pd
from timelink.pandas import attribute_values
# set max rows to 500
pd.set_option('display.max_rows', 500)

df_totals = attribute_values('jesuita-entrada',db=tlnb.db)
df_totals

value,count,date_min,date_max
?,512,0,17880828
Coimbra,43,15420000,17420427
Paris,42,15340815,17590310
Lisboa,39,15460000,17521123
Roma,32,15400927,17560709
Goa,30,15480000,17360000
Évora,22,15660714,17460405
Nancy,15,16270929,17510827
Macau,14,16280000,17490201
Avignon,13,16150926,17370927
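
The `date_min` and `date_max` values in this output appear to be 8-digit dates in `YYYYMMDD` form, with zeros apparently standing for unknown parts (`15420000` for a year with no month or day, `0` for no date at all). Under that assumption, a small helper can split such a value into its parts. The name `split_kleio_date` is hypothetical and is not a Timelink function.

```python
def split_kleio_date(value):
    """Split an 8-digit date such as 17420427 into (year, month, day).

    Assumption: zeros stand for unknown parts, so 15420000 yields
    (1542, None, None) and 0 yields (None, None, None).
    """
    digits = str(value).zfill(8)
    year, month, day = int(digits[:4]), int(digits[4:6]), int(digits[6:8])
    # `or None` turns a zero part into None (unknown).
    return (year or None, month or None, day or None)


for raw in (17420427, 15420000, 0):
    print(raw, split_kleio_date(raw))
```

Keeping the unknown parts as `None` avoids inventing a month or day when converting to calendar dates.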


### Entry unknown, with country and place of birth

In [29]:

import pandas as pd
from timelink.pandas import entities_with_attribute
# show up to 550 rows
pd.set_option('display.max_rows', 550)

jesuita_entrada = entities_with_attribute(
    entity_type='person',
    show_elements=['name'],
    the_type='jesuita-entrada',
    the_value='?',
    more_attributes=['nacionalidade','nascimento'],
    db=tlnb.db,
)
jesuita_entrada.info()

cols=['name','nacionalidade','nascimento','jesuita-entrada.date','jesuita-entrada',]
jesuita_entrada[jesuita_entrada['nacionalidade']=='Portugal'][cols].sort_values(by=['nacionalidade','nascimento'])



<class 'pandas.core.frame.DataFrame'>
Index: 517 entries, deh-adam-algenler to joao-cardoso
Data columns (total 10 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   name                  517 non-null    object
 1   jesuita-entrada       517 non-null    object
 2   jesuita-entrada.date  517 non-null    object
 3   jesuita-entrada.obs   5 non-null      object
 4   nacionalidade         486 non-null    object
 5   nacionalidade.date    486 non-null    object
 6   nacionalidade.obs     1 non-null      object
 7   nascimento            509 non-null    object
 8   nascimento.date       509 non-null    object
 9   nascimento.obs        3 non-null      object
dtypes: object(10)
memory usage: 44.4+ KB


id,name,nacionalidade,nascimento,jesuita-entrada.date,jesuita-entrada
deh-francisco-simoes,Francisco Simões,Portugal,?,16661028,?
deh-goncalo-de-oliveira,Gonçalo de Oliveira,Portugal,?,16370000,?
deh-jose-galvao,José Galvão,Portugal,?,17431016,?
deh-luis-pereira,Luís Pereira,Portugal,?,17251125,?
deh-manuel-goncalves,Manuel Gonçalves,Portugal,?,17390810,?
deh-manuel-ribeiro-junior,"Manuel Ribeiro, junior",Portugal,?,17000301,?
deh-estevao-coelho,Estêvão Coelho,Portugal,"Abrantes, diocese da Guarda",16040000,?
deh-manuel-lopes-ref3,Manuel Lopes,Portugal,"Abrantes, diocese da Guarda",16030000,?
deh-bartolomeu-de-roboredo,Bartolomeu de Roboredo,Portugal,"Alpalhão, diocese de Portalegre",16250000,?
deh-domingos-pereira-ref1,Domingos Pereira,Portugal,"Alvito, diocese de Évora",16030000,?



# Getting data

### Search for people, show in Kleio notation

In [None]:
from timelink.api.models import Person

show_only=10

with tlnb.db.session() as session:
    persons = session.query(Person).filter(Person.name.like('%Ricci')).all()

    for person in persons[:show_only]:
        print(person.to_kleio())
        print()

In [None]:
from sqlalchemy import select
from timelink.api.models import Person, Attribute

show_only=10

with tlnb.db.session() as session:
    stmt = select(Person).join(Person.attributes.and_(Attribute.the_type == 'naturalidade',
                                                      Attribute.the_value.like('Lisboa')))
    print(stmt)
    persons = session.execute(stmt).scalars().all()

    for person in persons[:show_only]:
        p = tlnb.db.get_person(person.id)
        print(p.to_kleio())




### Get a DataFrame from attributes

#### Example: faculty, entry date, exit date and degree of people born in Coimbra

In [16]:
from timelink.pandas import entities_with_attribute


# Get the list of people with a given value for a specific attribute
df = entities_with_attribute(
                    entity_type='person',
                    the_type='naturalidade',
                    the_value='Coimbra',
                    more_info=['name'],
                    more_cols=['faculdade','uc.entrada','uc.saida'],
                    db=tlnb.db,
                    sql_echo=False)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, 140349 to 140367
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   name               14 non-null     object
 1   naturalidade       14 non-null     object
 2   naturalidade.date  14 non-null     object
 3   naturalidade.obs   0 non-null      object
 4   faculdade          14 non-null     object
 5   faculdade.date     14 non-null     object
 6   faculdade.obs      1 non-null      object
 7   uc.entrada         14 non-null     object
 8   uc.entrada.date    14 non-null     object
 9   uc.entrada.obs     0 non-null      object
 10  uc.saida           14 non-null     object
 11  uc.saida.date      14 non-null     object
 12  uc.saida.obs       0 non-null      object
dtypes: object(13)
memory usage: 2.1+ KB


In [17]:
df.head(5)

id,name,naturalidade,naturalidade.date,naturalidade.obs,faculdade,faculdade.date,faculdade.obs,uc.entrada,uc.entrada.date,uc.entrada.obs,uc.saida,uc.saida.date,uc.saida.obs
140349,António de Aboim,Coimbra,1566-12-20,,Cânones,1566-12-20,,1566-12-20,1566-12-20,,1574-07-24,1574-07-24,
140671,Diogo de Abreu,Coimbra,1604-10-11,,Cânones,1604-10-11,,1604-10-11,1604-10-11,,1604-10-11,1604-10-11,
140643,Bernardo de Abreu,Coimbra,1612-10-15,,Cânones,1612-10-15,,1612-10-15,1612-10-15,,1616-10-19,1616-10-19,
140629,Bartolomeu de Abreu,Coimbra,1627-10-19,,Cânones,1627-10-19,,1627-10-19,1627-10-19,,1638-10-11,1638-10-11,
140662,Cristóvão de Abreu,Coimbra,1660-01-08,,Cânones,1660-01-08,,1660-01-08,1660-01-08,,1662-10-22,1662-10-22,




### Remove empty columns

In [18]:
df.dropna(how='all', axis=1, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, 140349 to 140367
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   name               14 non-null     object
 1   naturalidade       14 non-null     object
 2   naturalidade.date  14 non-null     object
 3   faculdade          14 non-null     object
 4   faculdade.date     14 non-null     object
 5   faculdade.obs      1 non-null      object
 6   uc.entrada         14 non-null     object
 7   uc.entrada.date    14 non-null     object
 8   uc.saida           14 non-null     object
 9   uc.saida.date      14 non-null     object
dtypes: object(10)
memory usage: 1.7+ KB


In [19]:
df.head(5)

id,name,naturalidade,naturalidade.date,faculdade,faculdade.date,faculdade.obs,uc.entrada,uc.entrada.date,uc.saida,uc.saida.date
140349,António de Aboim,Coimbra,1566-12-20,Cânones,1566-12-20,,1566-12-20,1566-12-20,1574-07-24,1574-07-24
140671,Diogo de Abreu,Coimbra,1604-10-11,Cânones,1604-10-11,,1604-10-11,1604-10-11,1604-10-11,1604-10-11
140643,Bernardo de Abreu,Coimbra,1612-10-15,Cânones,1612-10-15,,1612-10-15,1612-10-15,1616-10-19,1616-10-19
140629,Bartolomeu de Abreu,Coimbra,1627-10-19,Cânones,1627-10-19,,1627-10-19,1627-10-19,1638-10-11,1638-10-11
140662,Cristóvão de Abreu,Coimbra,1660-01-08,Cânones,1660-01-08,,1660-01-08,1660-01-08,1662-10-22,1662-10-22



## Counting

### Count attributes from an existing dataframe



In [20]:
# create a column with the index values, which are the id numbers
df['id'] = df.index.values

col = 'faculdade'  # subtotal by this column

# Use pandas groupby and count the unique ids in each group
df_totals = df.groupby(col).agg({'id': 'nunique',
                                 'uc.entrada': 'min',
                                 'uc.saida': 'max'})

df_totals.sort_values('id', ascending=False).head(30)

faculdade,id,uc.entrada,uc.saida
Cânones,7,0000-00-00,1767-07-27
Medicina,5,1670-10-01,1823-10-20
Direito,1,1748-10-19,1748-10-19
Matemática,1,1868-10-02,1872-10-03



### Counting directly in the database

When an attribute has many values and there is no need to hold all the people in memory, do the count directly in the database.
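
To illustrate what counting in the database means, the sketch below runs a `GROUP BY` query with the standard-library `sqlite3` module, so only one summary row per value is returned. The table layout (`entity`, `the_type`, `the_value`, `the_date`) is an assumption loosely based on the column names visible in the import report, the entity ids are invented sample data, and this is not necessarily how `attribute_values` is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified layout of an attributes table.
conn.execute(
    "CREATE TABLE attributes "
    "(entity TEXT, the_type TEXT, the_value TEXT, the_date TEXT)"
)
# Invented sample rows, for illustration only.
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?, ?)",
    [
        ("p1", "jesuita-entrada", "Coimbra", "15420000"),
        ("p2", "jesuita-entrada", "Coimbra", "16770701"),
        ("p3", "jesuita-entrada", "Goa", "15480502"),
        ("p4", "nacionalidade", "Portugal", "15420000"),
    ],
)

# The database does the grouping; only one row per value comes back.
rows = conn.execute(
    """
    SELECT the_value,
           COUNT(DISTINCT entity) AS n,
           MIN(the_date) AS date_min,
           MAX(the_date) AS date_max
    FROM attributes
    WHERE the_type = ?
    GROUP BY the_value
    ORDER BY n DESC, the_value
    """,
    ("jesuita-entrada",),
).fetchall()

for row in rows:
    print(row)
```

The result has the same shape as the `count`, `date_min`, `date_max` summaries shown in this notebook.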




In [25]:
from timelink.pandas import attribute_values

df_totals = attribute_values('jesuita-entrada',db=tlnb.db)


In [26]:
df_totals.head(10)


value,count,date_min,date_max
?,31,15550000,17581126
Coimbra,6,15420000,16770701
Goa,3,15480502,16660000
Lisboa,3,16680000,17460223
Avignon,2,17350731,17370927
Paris,2,16700926,16771029
Saragoça,2,16740613,16771115
Évora,2,15710304,17230000
Alcalá de Henares,1,15580413,15580413
Bolonha,1,17250728,17250728


#### Filter by dates



In [29]:
df_totals = attribute_values('grau',dates_between=('1535','1600'),db=tlnb.db)

In [30]:
df_totals.head(10)

value,count,date_min,date_max
Bacharel,9,1554-07-19,1593-06-09
Formatura,6,1574-07-24,1593-05-12
Bacharel em Artes,3,1574-03-14,1588-03-12
Licenciado,3,1574-06-03,1597-05-06
Doutor,2,1560-12-22,1589-10-27
Licenciado em Artes,2,1574-05-15,1589-01-13


## View records

### View a person

#### Person attributes in a dataframe, one row per attribute

In [40]:
import pandas as pd
from timelink.pandas import group_attributes as person_attributes

pd.set_option('display.max_rows',1000)

id = '140687'
pdf = person_attributes([id],db=tlnb.db)  # note id in a list
pdf[['date','type','value','attr_obs']].sort_values(['date','type'])

id,date,type,value,attr_obs
140687,1821-10-31,faculdade,"""#29""",
140687,1821-10-31,faculdade.ano,"""#29.1821""",
140687,1821-10-31,matricula,Filosofia,
140687,1821-10-31,matricula.ano,Filosofia.1821,
140687,1821-10-31,naturalidade,Vilar de Veiga,
140687,1821-10-31,naturalidade.ano,Vilar de Veiga.1821,
140687,1821-10-31,nome,Domingos José da Cunha Abreu,
140687,1821-10-31,nome-geografico,Vilar de Veiga,
140687,1821-10-31,nome-geografico.ano,Vilar de Veiga.1821,
140687,1821-10-31,nome.apelido,José da Cunha Abreu,


In [41]:
p = tlnb.db.get_person(id)
print(p.to_kleio())

n$Domingos José da Cunha Abreu/m/id=140687/obs="""
      """

                  Id: 140687
                  Código de referência: PT/AUC/ELU/UC-AUC/B/001-001/A/000404

                  Nome        : Domingos José da Cunha Abreu
                  Data inicial: 1821-10-31
                  Data final  : 1821-11-05
                  Filiação:

                  Naturalidade: Vilar de Veiga
                  Faculdade: #29;Matemática

                  Matrícula(s): Filosofia: 1821/10/31 (obrigado)
                  Matemática: 1821/11/05 (obrigado)
                  Instituta:
                  Bacharel:
                  Formatura:
                  Licenciado:
                  Doutor:

                  Outras informações:
              """
  """
  rel$function-in-act/n/auc-alumni-A-140337-140771/20200211
  atr$código-de-referência/""PT/AUC/ELU/UC-AUC/B/001-001/A/000404""/2021-04-23
  atr$data-do-registo/2021-04-23/2021-04-23
  atr$url/""https://pesquisa.auc.uc.pt/details?id=140687""

#### Person attributes in a dataframe, attributes in columns

In [39]:
# Get the list of people that have a given base attribute, with more attributes as columns

id = '316297'  # João Pedro Ribeiro
# id = '234295'  # Alexandre Metelo de Sousa
df = entities_with_attribute(
                    entity_type='person',
                    the_type='uc.entrada',  # we need a base attribute
                    more_info=['name'],
                    more_cols=['instituta','faculdade'],
                    db=tlnb.db,
                    sql_echo=False)
view_cols = ['name','faculdade','faculdade.date','faculdade.obs']
df.info()
df[view_cols].sort_values('faculdade.date')

<class 'pandas.core.frame.DataFrame'>
Index: 503 entries, 140337 to 269364
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   name             503 non-null    object
 1   uc.entrada       503 non-null    object
 2   uc.entrada.date  503 non-null    object
 3   uc.entrada.obs   0 non-null      object
 4   instituta        205 non-null    object
 5   instituta.date   205 non-null    object
 6   instituta.obs    205 non-null    object
 7   faculdade        447 non-null    object
 8   faculdade.date   447 non-null    object
 9   faculdade.obs    29 non-null     object
dtypes: object(10)
memory usage: 43.2+ KB


id,name,faculdade,faculdade.date,faculdade.obs
140345,Jorge Abenajado,Medicina,1537-12-17,
140669,Diogo de Abreu,Leis,1538-05-07,
140347,Adrião de Aboim,Leis,1539-12-07,
140708,Filipe de Abreu,Cânones,1548-06-23,
140541,António de Abreu,Cânones,1551-10-00,
140358,João de Aboim,Cânones,1551-10-01,
140670,Diogo de Abreu,Leis,1552-07-00,
140695,Duarte de Abreu,Cânones,1566-01-20,
140349,António de Aboim,Cânones,1566-12-20,
140742,Francisco de Abreu,Medicina,1573-10-03,


#### Examine potential duplicates

In [15]:
from timelink.pandas import display_group_attributes
pd.set_option('display.max_rows',250)

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome-apelido','nome-primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','nome-geografico','instituta.ano']

dup_ids = ['234295','234710',]  # Alexandre Metelo de

display_group_attributes(dup_ids,
                         header_cols=['uc-entrada','naturalidade','faculdade','nome-pai'],
                         exclude_attributes=no_show,
                         sort_attributes=['date','type','value'],
                         cmap_name='Pastel1')

Unnamed: 0,id,uc-entrada,naturalidade,faculdade,nome-pai
0,234295,1704-11-07,Marialva,Cânones,
1,234710,1705-10-24,Marialva,Matemática,Manuel Cardoso Metelo


Unnamed: 0,date,id,type,value,attr_obs
0,1704-11-07,234295,faculdade,Cânones,Cânones
1,1704-11-07,234295,instituta,1704-11-07,07.11.1704 1704-11-07
2,1704-11-07,234295,naturalidade,Marialva,
3,1704-11-07,234295,nome,Alexandre Metelo de Sousa,
4,1704-11-07,234295,uc-entrada,1704-11-07,
5,1704-11-07,234295,uc-entrada.ano,1704,
6,1705-10-24,234710,faculdade,Matemática,Matemática
7,1705-10-24,234295,matricula-faculdade,Cânones,24.10.1705
8,1705-10-24,234710,matricula-faculdade,Matemática,24.10.1705
9,1705-10-24,234710,naturalidade,Marialva,
