# Using Timelink in notebooks: tutorial

> First-time use: follow the instructions in the [README.md](README.md) file in this directory.

## Initial setup

Create a TimelinkNotebook object. This will be the interface for most of your interactions with Timelink.

The first time you run this notebook, you will be prompted to install `ipykernel`. This is required to run Timelink in a notebook.

Note:
* The first run takes a little while, because the required Docker images are downloaded.
* Timelink defaults to SQLite as the database; see the [Receipts notebook](1-receipts.ipynb) for more control.


In [2]:
from timelink.notebooks import TimelinkNotebook

tlnb = TimelinkNotebook()


TimelinkNotebook created a database to store data and a Kleio server to 
translate transcriptions in Kleio notation into data that can be imported
into a database.

To get information about the database and the Kleio server, run ```tlnb.print_info()```

In [3]:
tlnb.print_info()

Project name: dehergne-locations
Project home: /Users/jrc/mhk-home/sources/dehergne-locations
Database type: sqlite
Database name: dehergne_locations
Kleio image: timelinkserver/kleio-server
Kleio server token: gTMTcxSPsD96MhYi9Eu3cxVXmaBYht6k
Kleio server URL: http://127.0.0.1:8089
Kleio server home: /Users/jrc/mhk-home/sources/dehergne-locations
Kleio server container: pensive_moore
Kleio version requested: latest
Kleio server version: 12.4.567 (2024-02-07 13:02:03)
SQLite directory: /Users/jrc/mhk-home/sources/dehergne-locations/database/sqlite
TimelinkNotebook(project_name=dehergne-locations, project_home=/Users/jrc/mhk-home/sources/dehergne-locations, db_type=sqlite, db_name=dehergne_locations, kleio_image=timelinkserver/kleio-server, kleio_version=latest, postgres_image=postgres, postgres_version=latest)


### Database status

Count the number of rows in each table in the database.


In [4]:
tlnb.table_row_count_df()

Unnamed: 0,table,count
0,acts,29
1,aregisters,1
2,attributes,25482
3,class_attributes,70
4,classes,14
5,entities,31624
6,geoentities,359
7,goods,0
8,kleiofiles,31
9,objects,0
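
As a rough illustration of what such a count involves, the sketch below uses Python's standard `sqlite3` module on an in-memory database. The table names and rows are invented for the example, and this is not necessarily how `table_row_count_df()` works internally; it only shows the general technique of listing tables from `sqlite_master` and counting the rows in each.

```python
import sqlite3

def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in a SQLite database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    # Names come from sqlite_master itself, so double-quoting them is enough here.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }

# Hypothetical two-table database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE goods (id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO entities VALUES (?)", [("e1",), ("e2",), ("e3",)])

counts = table_row_counts(conn)
print(counts)
```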


### Show the available Kleio files

In [5]:
kleio_files = tlnb.get_kleio_files()
kleio_files

Unnamed: 0,path,name,modified,status,translated,errors,warnings,import_status,import_errors,import_warnings,import_error_rpt,import_warning_rpt,imported,rpt_url,xml_url
0,sources/dehergne-a.cli,dehergne-a.cli,2023-11-29 02:16:33.838462+00:00,V,2023-10-13 04:46:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-07 03:22:59.712358,/rest/reports/sources/dehergne-a.rpt,/rest/exports/sources/dehergne-a.xml
1,sources/dehergne-b.cli,dehergne-b.cli,2023-11-29 02:16:33.847301+00:00,V,2023-10-13 04:47:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-07 03:23:11.907971,/rest/reports/sources/dehergne-b.rpt,/rest/exports/sources/dehergne-b.xml
2,sources/dehergne-c.cli,dehergne-c.cli,2023-11-29 02:16:33.858180+00:00,V,2023-10-13 04:47:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-07 03:23:29.418688,/rest/reports/sources/dehergne-c.rpt,/rest/exports/sources/dehergne-c.xml
3,sources/dehergne-d.cli,dehergne-d.cli,2023-11-29 02:16:33.873745+00:00,V,2023-10-13 04:47:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-07 03:23:36.819380,/rest/reports/sources/dehergne-d.rpt,/rest/exports/sources/dehergne-d.xml
4,sources/dehergne-e.cli,dehergne-e.cli,2023-11-29 02:16:33.879442+00:00,V,2023-10-13 04:47:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-07 03:23:38.526859,/rest/reports/sources/dehergne-e.rpt,/rest/exports/sources/dehergne-e.xml
5,sources/dehergne-f.cli,dehergne-f.cli,2023-11-29 02:16:33.882008+00:00,V,2023-10-13 04:47:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-07 03:23:49.669086,/rest/reports/sources/dehergne-f.rpt,/rest/exports/sources/dehergne-f.xml
6,sources/dehergne-g.cli,dehergne-g.cli,2023-11-29 02:16:33.896159+00:00,V,2023-10-13 04:47:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-07 03:24:00.428726,/rest/reports/sources/dehergne-g.rpt,/rest/exports/sources/dehergne-g.xml
7,sources/dehergne-h.cli,dehergne-h.cli,2023-11-29 02:16:33.911342+00:00,V,2023-10-13 04:47:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-07 03:24:04.137667,/rest/reports/sources/dehergne-h.rpt,/rest/exports/sources/dehergne-h.xml
8,sources/dehergne-i.cli,dehergne-i.cli,2023-11-29 02:16:33.916014+00:00,V,2023-07-16 20:43:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-07 03:24:06.156498,/rest/reports/sources/dehergne-i.rpt,/rest/exports/sources/dehergne-i.xml
9,sources/dehergne-j.cli,dehergne-j.cli,2023-11-29 02:16:33.919046+00:00,V,2023-07-16 20:43:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-07 03:24:08.424135,/rest/reports/sources/dehergne-j.rpt,/rest/exports/sources/dehergne-j.xml



# Update the database from Kleio sources

This step translates the sources that need it and imports into the database those that translated without errors.

Only changed files since last import, or new files, will be processed.
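
The selection rule can be pictured with a small sketch. It assumes a file needs processing when it has never been imported or when its modification time is later than its last import. The function name `needs_processing` and the sample records are hypothetical, and the actual logic inside `update_from_sources()` may differ.

```python
from datetime import datetime

def needs_processing(modified, imported):
    """True if a file is new (never imported) or changed after its last import."""
    return imported is None or modified > imported

# Hypothetical records: (name, modified, last import or None).
files = [
    ("a.cli", datetime(2023, 11, 29), datetime(2024, 2, 7)),  # unchanged since import
    ("b.cli", datetime(2024, 2, 10), datetime(2024, 2, 7)),   # edited after import
    ("c.cli", datetime(2024, 2, 11), None),                   # never imported
]

to_process = [name for name, modified, imported in files
              if needs_processing(modified, imported)]
print(to_process)
```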

In [6]:
import logging
logging.basicConfig(level=logging.INFO)

tlnb.update_from_sources()

## Check the status of the files

Check the import status of the translated files

```
I: Imported
E: Imported with errors
W: Imported with warnings, no errors
N: Not imported
U: Translation updated, needs reimport
``` 
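
When scripting around these codes, a small lookup table can make reports easier to read. The sketch below is only an illustration: the dictionary mirrors the list above, while the helper `needs_attention` and its choice of which codes to flag are assumptions, not part of Timelink.

```python
# Import status codes, following the list above.
IMPORT_STATUS = {
    "I": "Imported",
    "E": "Imported with errors",
    "W": "Imported with warnings, no errors",
    "N": "Not imported",
    "U": "Translation updated, needs reimport",
}

def needs_attention(status):
    """Assumed policy: anything other than a clean import deserves a look."""
    return status != "I"

# Hypothetical statuses keyed by file name.
statuses = {"a.cli": "I", "b.cli": "U", "c.cli": "E"}
flagged = {name: IMPORT_STATUS[code]
           for name, code in statuses.items() if needs_attention(code)}
print(flagged)
```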

In [7]:
imported_files_df = tlnb.get_import_status()
imported_files_df[["import_status","import_errors","import_warnings","name","errors","warnings","path"]].sort_values("name")


Unnamed: 0,import_status,import_errors,import_warnings,name,errors,warnings,path
0,I,0,0,dehergne-a.cli,0,0,sources/dehergne-a.cli
1,I,0,0,dehergne-b.cli,0,0,sources/dehergne-b.cli
2,I,0,0,dehergne-c.cli,0,0,sources/dehergne-c.cli
3,I,0,0,dehergne-d.cli,0,0,sources/dehergne-d.cli
4,I,0,0,dehergne-e.cli,0,0,sources/dehergne-e.cli
5,I,0,0,dehergne-f.cli,0,0,sources/dehergne-f.cli
6,I,0,0,dehergne-g.cli,0,0,sources/dehergne-g.cli
7,I,0,0,dehergne-h.cli,0,0,sources/dehergne-h.cli
8,I,0,0,dehergne-i.cli,0,0,sources/dehergne-i.cli
9,I,0,0,dehergne-j.cli,0,0,sources/dehergne-j.cli


### Check the translation report

In [8]:
rpt = tlnb.get_translation_report(imported_files_df, 13)
print(rpt)


KleioTranslator - server version 12.3 - build 565 2024-01-26 03:59:15
8-2-2024 2-37

Processing data file dehergne-locations-1701.cli
-------------------------------------------
Generic Act translation module with geoentities (XML).
     Joaquim Ramos de Carvalho (joaquim@uc.pt) 
** New document: kleio
kleio translation started
Structure: gacto2.str
Prefix: 
Autorel: 
Translation count: 8
Obs: 
** Processing source fonte$dehergne-locations-1701
93: geodesc$deh-chre-1701
*** End of File


Structure file: /kleio-home/structures/sources.str
Structure processing report: /kleio-home/structures/sources.srpt
Structure in JSON: /kleio-home/structures/sources.str.json

Kleio file: /kleio-home/sources/dehergne-locations-1701.cli
Original file: /kleio-home/sources/dehergne-locations-1701.org
Previous version: /kleio-home/sources/dehergne-locations-1701.old
Temp file with ids: /kleio-home/sources/dehergne-locations-1701.ids
** - /kleio-home/sources/dehergne-locations-1701.cli-renamed to- /kleio-ho

### Get the import report for a file

If there are errors in the import phase, check the import report.

In [9]:
rpt = tlnb.get_import_rpt(imported_files_df, rows=[13])
print(rpt)

No errors


## Todo

Make this available as a data frame from a single function call:
* TimelinkNotebook.translate([files_df,paths=List[str], rows=List[int],status="T"])
* TimelinkNotebook.import([files_df,paths=List[str], rows=List[int],status="V"])



# Getting data

### Search for people, show in Kleio notation

In [10]:
from timelink.api.models import Person

show_only=10

with tlnb.db.session() as session:
    ricci = session.query(Person).filter(Person.name.like('M%Ricci')).all()

    for mr in ricci[:show_only]:
        print(mr.to_kleio())
        print()

ricci_ids = [mr.id for mr in ricci]
print(ricci_ids)

referido$Matteo Ricci/m/id=deh-joao-barradas-ref1
  rel$function-in-act/referido/dehergne-notices-b/0
  ls$estadia/rio Kan/15940000

referido$Matteo Ricci/m/id=deh-lazzaro-cattaneo-ref1
  rel$function-in-act/referido/dehergne-notices-c/0
  ls$estadia/Shiuchow (Chao-tcheou)/15940000
  ls$jesuita-tarefa/Toma o hábito de letrado/0
  ls$tarefa/Romaniza os cinco tons do chinês/0

referido$Matteo Ricci/m/id=deh-giovanni-cola-niccolo-ref1
  rel$function-in-act/referido/dehergne-notices-c/0
  ls$chegada@/"https://www.wikidata.org/wiki/Q14773"/15820807
  ls$chegada/Macau/15820807

referido$Matteo Ricci/m/id=deh-sabatino-de-ursis-ref1
  rel$function-in-act/referido/dehergne-notices-d/0
  rel$institucional/Envia a Pequim/Sabatino De Ursis/deh-sabatino-de-ursis/0

referido$Matteo Ricci/m/id=deh-jean-fernandes-tchong-ref1
  rel$function-in-act/referido/dehergne-notices-f/0
  rel$sociabilidade/Envia em auxílio de Bento de Góis/Jean Fernandes Tchong/deh-jean-fernandes-tchong/16060000

referido$Matteo

In [11]:
from sqlalchemy import select

from timelink.api.models import Person, Attribute
import pandas as pd

show_only = 100

with tlnb.db.session() as session:
    # Persons joined to their 'estadia' (stay) attributes whose value is 'Pequim'
    stmt = select(Person, Attribute).join(
        Person.attributes.and_(Attribute.the_type == 'estadia',
                               Attribute.the_value.like('Pequim')))
    print(stmt)
    # Execute while the session is still open
    result = session.execute(stmt).all()

    # Read the values needed for the dataframe while the session is open
    data = []
    for row in result[:show_only]:
        data.append([row.Person.id, row.Person.name, row.Attribute.the_date,
                     row.Attribute.the_type, row.Attribute.the_value])

df = pd.DataFrame(data, columns=['Person ID', 'Person Name', 'Attribute Date', 'Attribute Type', 'Attribute Value'])
df



SELECT persons.id, entities.id AS id_1, entities.class, entities.inside, entities.the_order, entities.the_level, entities.the_line, entities.groupname, entities.updated, entities.indexed, persons.name, persons.sex, persons.obs, attributes_1.id AS id_2, entities_1.id AS id_3, entities_1.class AS class_1, entities_1.inside AS inside_1, entities_1.the_order AS the_order_1, entities_1.the_level AS the_level_1, entities_1.the_line AS the_line_1, entities_1.groupname AS groupname_1, entities_1.updated AS updated_1, entities_1.indexed AS indexed_1, attributes_1.entity, attributes_1.the_type, attributes_1.the_value, attributes_1.the_date, attributes_1.obs AS obs_1 
FROM entities JOIN persons ON entities.id = persons.id JOIN (entities AS entities_1 JOIN attributes AS attributes_1 ON attributes_1.id = entities_1.id) ON entities.id = attributes_1.entity AND attributes_1.the_type = :the_type_1 AND attributes_1.the_value LIKE :the_value_1




Unnamed: 0,Person ID,Person Name,Attribute Date,Attribute Type,Attribute Value
0,deh-joseph-de-aguiar-souen,Joseph de Aguiar Souen,17510000,estadia,Pequim
1,deh-giulio-aleni,Giulio Aleni,16130000,estadia,Pequim
2,deh-jose-de-almeida-i,José de Almeida,16450000,estadia,Pequim
3,deh-jose-bernardo-de-almeida,José Bernardo de Almeida,17590000,estadia,Pequim
4,deh-jose-estevao-de-almeida,José Estêvão de Almeida,0,estadia,Pequim
...,...,...,...,...,...
95,deh-claudio-filippo-grimaldi,Claudio Filippo Grimaldi,16940828,estadia,Pequim
96,deh-claudio-filippo-grimaldi-ref3,Charles Thomas Maillard de Tournon,17060520,estadia,Pequim
97,deh-johann-grueber,Johann Grueber,16590802,estadia,Pequim
98,deh-august-von-hallerstein,August von Hallerstein,17390301,estadia,Pequim




### Get a dataframe from attributes

#### Example: missionaries who stayed in Beijing, where they entered the order, and their nationality

In [12]:
from timelink.pandas import entities_with_attribute


# Get the list of people with a certain value in a specific attribute
df = entities_with_attribute(
                    entity_type='person',
                    the_type='estadia',
                    the_value='Pequim',
                    more_info=['name'],
                    more_cols=['jesuita-entrada','nacionalidade'],
                    db=tlnb.db,
                    sql_echo=False)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 247 entries, deh-adrien-grelon to deh-vicent-caun
Data columns (total 10 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   name                  247 non-null    object
 1   estadia               247 non-null    object
 2   estadia.date          247 non-null    object
 3   estadia.obs           15 non-null     object
 4   jesuita-entrada       206 non-null    object
 5   jesuita-entrada.date  206 non-null    object
 6   jesuita-entrada.obs   3 non-null      object
 7   nacionalidade         222 non-null    object
 8   nacionalidade.date    222 non-null    object
 9   nacionalidade.obs     1 non-null      object
dtypes: object(10)
memory usage: 21.2+ KB


In [13]:
df.head(55).sort_values('jesuita-entrada')

Unnamed: 0_level_0,name,estadia,estadia.date,estadia.obs,jesuita-entrada,jesuita-entrada.date,jesuita-entrada.obs,nacionalidade,nacionalidade.date,nacionalidade.obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
deh-adrien-grelon,Adrien Grelon,Pequim,16650613,,?,16430601.0,,França,0.0,
deh-bernhard-diestel,Bernhard Diestel,Pequim,0,,?,16391022.0,,Alemanha,0.0,
deh-bernard-rodes,Bernard Rodes,Pequim,0,,?,16740707.0,,França,0.0,
deh-antonio-simoes-i,António Simões,Pequim,17480000,,?,16780000.0,,Portugal,0.0,
deh-antonio-leitao,António Leitão,Pequim,16030000,,?,16030000.0,,China,0.0,
deh-antonio-de-barros,António de Barros,Pequim,17010700,,?,16810612.0,,Portugal,0.0,
deh-antonio-de-barros,António de Barros,Pequim,16971000,,?,16810612.0,,Portugal,0.0,
deh-anton-gogeisl,Anton Gogeisl,Pequim,17390301,,?,17200913.0,,Alemanha,0.0,
deh-caetano-lopes,Caetano Lopes,Pequim,17270000,,?,17070217.0,,Portugal,0.0,
deh-felix-da-rocha,Félix da Rocha,Pequim,17560321,,?,17280501.0,,Portugal,0.0,




### Remove empty columns

In [14]:
df.dropna(how='all', axis=1, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 247 entries, deh-adrien-grelon to deh-vicent-caun
Data columns (total 10 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   name                  247 non-null    object
 1   estadia               247 non-null    object
 2   estadia.date          247 non-null    object
 3   estadia.obs           15 non-null     object
 4   jesuita-entrada       206 non-null    object
 5   jesuita-entrada.date  206 non-null    object
 6   jesuita-entrada.obs   3 non-null      object
 7   nacionalidade         222 non-null    object
 8   nacionalidade.date    222 non-null    object
 9   nacionalidade.obs     1 non-null      object
dtypes: object(10)
memory usage: 21.2+ KB


In [15]:
df.head(5)

Unnamed: 0_level_0,name,estadia,estadia.date,estadia.obs,jesuita-entrada,jesuita-entrada.date,jesuita-entrada.obs,nacionalidade,nacionalidade.date,nacionalidade.obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
deh-adrien-grelon,Adrien Grelon,Pequim,16650613,,?,16430601,,França,0,
deh-adrien-grelon,Adrien Grelon,Pequim,16940900,,?,16430601,,França,0,
deh-alexandre-de-la-charme,Alexandre de La Charme,Pequim,17290300,,?,17120907,,França,0,
deh-aloys-kao,Aloys Kao,Pequim,17510107,,Paris,17590310,,China,0,
deh-aloys-kao,Aloys Kao,Pequim,17660509,,Paris,17590310,,China,0,



## Counting

### Count attributes from an existing dataframe



In [16]:
# create a column with the index values, which are the person ids
df['id'] = df.index.values

col = 'jesuita-entrada'  # subtotal by this column

# Group by the column, count unique ids per group,
# and keep one 'estadia' value per group for reference
df_totals = df.groupby(col).agg({'id': 'nunique',
                                 'estadia': 'max'})

df_totals.sort_values('id', ascending=False).head(30)

Unnamed: 0_level_0,id,estadia
jesuita-entrada,Unnamed: 1_level_1,Unnamed: 2_level_1
?,58,Pequim
Paris,16,Pequim
Lisboa,7,Pequim
Nancy,7,Pequim
Coimbra,6,Pequim
Avignon,6,Pequim
Roma,5,Pequim
"Viena, Áustria",4,Pequim
Génova,4,Pequim
Nápoles,4,Pequim
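
The reason for `nunique` rather than a plain row count is that the dataframe has one row per stay, so a person with several stays in Beijing appears more than once. A minimal plain-Python sketch of the same idea, using made-up rows, collects the ids of each group into a set:

```python
from collections import defaultdict

# Made-up (id, place of entry) rows; 'p1' appears twice,
# as a person with two recorded stays would.
rows = [
    ("p1", "Paris"),
    ("p1", "Paris"),
    ("p2", "Paris"),
    ("p3", "Lisboa"),
]

ids_by_place = defaultdict(set)
for person_id, place in rows:
    ids_by_place[place].add(person_id)  # a set keeps each id only once

unique_counts = {place: len(ids) for place, ids in ids_by_place.items()}
print(unique_counts)
```

Counting rows instead would report three entries for Paris, although only two distinct people are involved.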



### Counting directly in the database

When an attribute has many values and there is no need to hold every person in memory, do the count directly in the database.




In [17]:
from timelink.pandas import attribute_values

df_totals = attribute_values('estadia', dates_between=("1580-00-00","1610-00-00"),db=tlnb.db)
df_totals.head(20)

Unnamed: 0_level_0,count,date_min,date_max
value,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Macau,38,15800000,16080000
Goa,15,15820426,16010501
Japão,15,15820220,16090000
Nanquim,14,15950521,16090000
Cantão,11,15800400,16060200
Nagasaki,8,15830725,16060000
Pequim,7,15980907,16090000
Roma,6,15820000,16080000
Shiuchow (Chao-tcheou),6,15891000,16080000
Chao-tcheou,4,15910700,16050100
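
The idea of counting in the database rather than in memory can be sketched with plain SQL. The example below uses an in-memory SQLite table with a simplified, assumed layout (`entity`, `the_type`, `the_value`, `the_date`) and invented rows. It is not the query that `attribute_values` issues, only an illustration of letting the database do the grouping and aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attributes (entity TEXT, the_type TEXT, the_value TEXT, the_date TEXT)"
)
# Invented rows, for illustration only.
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?, ?)",
    [
        ("p1", "estadia", "Macau", "15820807"),
        ("p2", "estadia", "Macau", "15930000"),
        ("p2", "estadia", "Goa", "15820426"),
        ("p3", "estadia", "Macau", "17510000"),  # outside the date range below
        ("p1", "nacionalidade", "Itália", "0"),  # different attribute type
    ],
)

# The database groups and aggregates; only the small summary is fetched.
summary = conn.execute(
    """
    SELECT the_value, COUNT(DISTINCT entity), MIN(the_date), MAX(the_date)
    FROM attributes
    WHERE the_type = ? AND the_date BETWEEN ? AND ?
    GROUP BY the_value
    ORDER BY COUNT(DISTINCT entity) DESC, the_value
    """,
    ("estadia", "15800000", "16100000"),
).fetchall()
print(summary)
```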


#### Filter by dates



In [18]:
df_totals = attribute_values('jesuita-entrada',dates_between=('1580','1600'),db=tlnb.db)

In [19]:
df_totals.head(10)

Unnamed: 0_level_0,count,date_min,date_max
value,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
?,25,15830000,15980000
Coimbra,7,15840000,15961216
Roma,5,15800317,15941106
Douai,2,15941109,15990512
Nápoles,2,15910908,15971106
"Shiuchow, Chao-tcheou fou",2,15891100,15891100
Arona,1,15960921,15960921
Goa,1,15870000,15870000
Japão,1,15801200,15801200
"Messina, Sicília",1,15820000,15820000


## View records

### View a person

#### Person attributes in a dataframe, one row per attribute

In [20]:
import pandas as pd
from timelink.pandas import group_attributes as person_attributes

pd.set_option('display.max_rows',1000)

id = 'deh-matteo-ricci'
pdf = person_attributes(ricci_ids,db=tlnb.db)  # expects a list of ids
pdf[['date','type','value','attr_obs']].sort_values(['date','type'])

Unnamed: 0_level_0,date,type,value,attr_obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
deh-matteo-ricci,0,dehergne,684,"Ricci, Matteo (ital.) P. 684 Li Ma-teou, Si-T'..."
deh-matteo-ricci,0,dehergne,684,"P. 25 juill. 1580, à Cochin, où il célèbre sa ..."
deh-matteo-ricci,0,dehergne,684,"D'Elu III, 239-243. Biographie: en chinois: BN..."
deh-matteo-ricci,0,dehergne,684,"Tacchi-Venturi, Opère storiche del P. M. R., M..."
deh-matteo-ricci,0,dehergne,684,id.. L'apport scientifique du P. M. R à la Chi...
deh-matteo-ricci,0,dehergne,684,"Bettray, Johannes, S.V.D., Die akkomodationmet..."
deh-matteo-ricci,0,dehergne,684,"Sur l'art chinois: Sepp Schûller, P. M. Ricci ..."
deh-matteo-ricci,0,dehergne,684,"— Délia Entrata delta Comp. nella Cina, avec a..."
deh-matteo-ricci,0,dehergne,684,"pécimen d'écriture, nom chinois et romanisatio..."
deh-sebastien-fernandes-tchong-ref2,0,estadia,"Nanchang (Nantch'ang, Kiangsi)",


In [21]:
p = tlnb.db.get_person(id)
print(p.to_kleio())

n$Matteo Ricci/m/id=deh-matteo-ricci
  rel$function-in-act/n/dehergne-notices-r/0
  ls$nacionalidade/Itália/0
  ls$jesuita-estatuto/Padre/0
  ls$nome-chines/Li Ma-teou/0
  ls$nome-chines/Si-T'ai/0
  ls$nascimento@/"https://www.wikidata.org/wiki/Q13135"/15521006
  ls$nascimento/Macerata/15521006
  ls$jesuita-entrada@/"https://www.wikidata.org/wiki/Q220"/15710816
  ls$jesuita-entrada/Roma (Noviciado de Sant'Andrea al Quirinale)/15710816
  ls$embarque/S. Luís/15780324
  ls$wicky/200/15780324
  ls$wicky-viagem/25/15780324
  ls$chegada@/"https://www.wikidata.org/wiki/Q1171"/15780913
  ls$chegada/Goa/15780913
  ls$estadia@/"https://www.wikidata.org/wiki/Q1800"/15800726
  ls$estadia/Cochim/15800726
  ls$tarefa/Celebra a primeira missa/15800726
  ls$chegada@/"https://www.wikidata.org/wiki/Q14773"/15820807
  ls$chegada/Macau/15820807
  ls$estadia@/"https://www.wikidata.org/wiki/Q16572"/15830800
  ls$estadia/Cantão/15830800
  ls$estadia/Shiuhing (Tchao-k'ing fou)/15830910
  ls$jesuita-tarefa/Fun

#### Person attributes in a dataframe, attributes in columns

In [22]:
# Get the list of people with a certain value in a specific attribute

id = 'deh-matteo-ricci'  # Matteo Ricci
# id = 'deh-antonio-de-magalhaes-ref1'  # Alexandre Metelo de Sousa
df = entities_with_attribute(
                    entity_type='person',
                    the_type='estadia',  # we need a base attribute
                    the_value='Pequim',
                    more_info=['name'],
                    more_cols=['jesuita-entrada','jesuita-estatuto','morte'],
                    dates_in=('1580','1788'),
                    db=tlnb.db,
                    sql_echo=False)
view_cols = ['name','estadia','estadia.date','morte.date','jesuita-entrada','jesuita-entrada.date']
df.info()
df[view_cols].sort_values('estadia.date')

<class 'pandas.core.frame.DataFrame'>
Index: 212 entries, deh-adrien-grelon to deh-vicent-caun
Data columns (total 13 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   name                   212 non-null    object
 1   estadia                212 non-null    object
 2   estadia.date           212 non-null    object
 3   estadia.obs            14 non-null     object
 4   jesuita-entrada        174 non-null    object
 5   jesuita-entrada.date   174 non-null    object
 6   jesuita-entrada.obs    3 non-null      object
 7   jesuita-estatuto       189 non-null    object
 8   jesuita-estatuto.date  189 non-null    object
 9   jesuita-estatuto.obs   0 non-null      object
 10  morte                  167 non-null    object
 11  morte.date             167 non-null    object
 12  morte.obs              6 non-null      object
dtypes: object(13)
memory usage: 23.2+ KB


Unnamed: 0_level_0,name,estadia,estadia.date,morte.date,jesuita-entrada,jesuita-entrada.date
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
deh-sebastien-fernandes-tchong,Sébastien Fernandes Tchong,Pequim,15980907,16211124.0,"Shiuchow, Chao-tcheou fou",15891100.0
deh-sebastien-fernandes-tchong-ref2,Matteo Ricci,Pequim,15980907,,,
deh-lazzaro-cattaneo,Lazzaro Cattaneo,Pequim,15980907,16400119.0,Roma (Noviciado de Sant'Andrea al Quirinale),15810227.0
deh-lazzaro-cattaneo,Lazzaro Cattaneo,Pequim,15981105,16400119.0,Roma (Noviciado de Sant'Andrea al Quirinale),15810227.0
deh-sebastien-fernandes-tchong-ref2,Matteo Ricci,Pequim,15981105,,,
deh-sebastien-fernandes-tchong,Sébastien Fernandes Tchong,Pequim,15981105,16211124.0,"Shiuchow, Chao-tcheou fou",15891100.0
deh-jacques-niva-ni,Jacques Niva Ni,Pequim,16020700,16381026.0,Nan-tch'ang,16080000.0
deh-sebastien-fernandes-tchong,Sébastien Fernandes Tchong,Pequim,16020920,16211124.0,"Shiuchow, Chao-tcheou fou",15891100.0
deh-antonio-leitao,António Leitão,Pequim,16030000,16111010.0,?,16030000.0
deh-sebastien-fernandes-tchong,Sébastien Fernandes Tchong,Pequim,16040000,16211124.0,"Shiuchow, Chao-tcheou fou",15891100.0


### Whereabouts of those who were in Beijing

1. Extract the ids of those who were in Beijing

In [24]:
beijing_ids = df.index.unique()
print(beijing_ids)

Index(['deh-adrien-grelon', 'deh-alexandre-de-la-charme', 'deh-aloys-kao',
       'deh-andre-palmeiro', 'deh-andre-pereira',
       'deh-andrea-giovanni-lubelli', 'deh-antoine-de-beauvollier',
       'deh-antoine-gaubil', 'deh-antoine-thomas', 'deh-antoine-thomas-ref4',
       ...
       'deh-sebastiao-de-almeida', 'deh-sebastien-fernandes-tchong',
       'deh-sebastien-fernandes-tchong-ref2', 'deh-simao-da-cunha',
       'deh-simao-rodrigues', 'deh-thomas-jean-baptiste-lieou',
       'deh-tome-pereira', 'deh-tome-pereira-ref1', 'deh-tome-pereira-ref3',
       'deh-vicent-caun'],
      dtype='object', name='id', length=158)


2. Get their other places of stay

Note that we filter by ids, not by attribute value.

In [29]:
df = entities_with_attribute(
                    entity_type='person',
                    the_type='estadia',  # we need a base attribute
                    more_info=['name'],
                    more_cols=['morte'],
                    dates_in=('1580','1788'),
                    filter_by=list(beijing_ids),
                    db=tlnb.db,
                    sql_echo=False)
view_cols = ['estadia.date','name','estadia','morte.date']
df.info()
df[view_cols].sort_values('estadia.date')

<class 'pandas.core.frame.DataFrame'>
Index: 707 entries, deh-adrien-grelon to deh-vicent-caun
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   name          707 non-null    object
 1   estadia       707 non-null    object
 2   estadia.date  707 non-null    object
 3   estadia.obs   26 non-null     object
 4   morte         645 non-null    object
 5   morte.date    645 non-null    object
 6   morte.obs     20 non-null     object
dtypes: object(7)
memory usage: 44.2+ KB


Unnamed: 0_level_0,estadia.date,name,estadia,morte.date
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
deh-matteo-ricci,15800726,Matteo Ricci,Cochim,16100511.0
deh-matteo-ricci,15830800,Matteo Ricci,Cantão,16100511.0
deh-matteo-ricci,15830910,Matteo Ricci,Shiuhing (Tchao-k'ing fou),16100511.0
deh-matteo-ricci,15891000,Matteo Ricci,Shiuchow (Chao-tcheou),16100511.0
deh-lazzaro-cattaneo,15930000,Lazzaro Cattaneo,Macau,16400119.0
deh-joao-rodrigues-tcuzu,15940000,João Rodrigues Tçuzu,Macau,16330801.0
deh-lazzaro-cattaneo,15940000,Lazzaro Cattaneo,Shiuchow (Chao-tcheou),16400119.0
deh-matteo-ricci,15950521,Matteo Ricci,Nanquim,16100511.0
deh-matteo-ricci,15950615,Matteo Ricci,Nanquim,16100511.0
deh-joao-rodrigues-tcuzu,15960000,João Rodrigues Tçuzu,Japão,16330801.0


#### Examine potential duplicates

In [26]:
import pandas as pd
from sqlalchemy import select

from timelink.pandas.group_attributes import display_group_attributes
from timelink.api.models import Person


pd.set_option('display.max_rows',250)

no_show=['dehergne']

with tlnb.db.session() as session:
    stmt = select(Person.id).filter(Person.name.like('M%Ricci'))
    ricci = session.execute(stmt).scalars().all()

dup_ids = list(ricci)

display_group_attributes(dup_ids,
                         header_cols=['estadia'],
                         exclude_attributes=no_show,
                         sort_attributes=['date','type','value'],
                         cmap_name='Pastel1')

TypeError: entities_with_attribute() got an unexpected keyword argument 'person_info'

In [27]:
dup_ids

['deh-joao-barradas-ref1',
 'deh-lazzaro-cattaneo-ref1',
 'deh-giovanni-cola-niccolo-ref1',
 'deh-sabatino-de-ursis-ref1',
 'deh-jean-fernandes-tchong-ref1',
 'deh-sebastien-fernandes-tchong-ref2',
 'deh-bento-de-gois-ref2',
 'deh-manuel-pereira-yeou-ref1',
 'deh-matteo-ricci',
 'deh-joao-rodrigues-tcuzu-ref2',
 'deh-michele-ruggiere-ref4']