# Using Timelink in notebooks: tutorial

> First time use: follow instructions in the [README.md](README.md) file in this directory.

## Initial setup

Create a TimelinkNotebook object. This will be the interface for most of your interactions with Timelink.

The first time you run this notebook, you will be prompted to install `ipykernel`. This is required to run Timelink in a notebook.

Note:
* The first run takes a while because the required Docker images are downloaded.
* Timelink defaults to SQLite as the database; see the [Receipts notebook](1-receipts.ipynb) for more control.


In [2]:
import timelink

timelink.version

'1.1.2'

In [4]:
from timelink.notebooks import TimelinkNotebook

tlnb = TimelinkNotebook()

TimelinkNotebook created a database to store data and a Kleio server to 
translate transcriptions in Kleio notation into data that can be imported
into a database.

To get information about the database and the Kleio server, run `tlnb.print_info()`.

In [5]:
tlnb.print_info()

Project name: dehergne
Project home: /Users/jrc/mhk-home/sources/dehergne
Database type: sqlite
Database name: dehergne
Kleio image: timelinkserver/kleio-server
Kleio server token: AF5HBiJo9lXY0KJZMnWu4yxMkvKXGHRD
Kleio server URL: http://127.0.0.1:8088
Kleio server home: /Users/jrc/mhk-home/sources/dehergne
Kleio server container: beautiful_goldstine
Kleio version requested: latest
Kleio server version: 12.4.567 (2024-02-07 13:02:03)
SQLite directory: /Users/jrc/mhk-home/sources/dehergne/database/sqlite
TimelinkNotebook(project_name=dehergne, project_home=/Users/jrc/mhk-home/sources/dehergne, db_type=sqlite, db_name=dehergne, kleio_image=timelinkserver/kleio-server, kleio_version=latest, postgres_image=postgres, postgres_version=latest)


### Database status

Count the number of rows in each table in the database.


In [6]:
tlnb.table_row_count_df()

Unnamed: 0,table,count
0,acts,0
1,attributes,0
2,class_attributes,63
3,classes,13
4,entities,13
5,geoentities,0
6,goods,0
7,kleiofiles,0
8,objects,0
9,persons,0


### Show the available Kleio files

In [7]:
kleio_files = tlnb.get_kleio_files()
kleio_files

Unnamed: 0,path,name,modified,status,translated,errors,warnings,import_status,import_errors,import_warnings,import_error_rpt,import_warning_rpt,imported,rpt_url,xml_url
0,sources/dehergne-a.cli,dehergne-a.cli,2024-02-07 12:33:42.038084+00:00,V,2023-10-13 04:46:00+00:00,0,0,N,0,0,0,0,0,/rest/reports/sources/dehergne-a.rpt,/rest/exports/sources/dehergne-a.xml
1,sources/dehergne-b.cli,dehergne-b.cli,2024-02-07 12:33:42.043349+00:00,V,2023-10-13 04:47:00+00:00,0,0,N,0,0,0,0,0,/rest/reports/sources/dehergne-b.rpt,/rest/exports/sources/dehergne-b.xml
2,sources/dehergne-c.cli,dehergne-c.cli,2024-02-07 12:33:42.048912+00:00,V,2023-10-13 04:47:00+00:00,0,0,N,0,0,0,0,0,/rest/reports/sources/dehergne-c.rpt,/rest/exports/sources/dehergne-c.xml
3,sources/dehergne-d.cli,dehergne-d.cli,2024-02-07 12:33:42.064520+00:00,V,2023-10-13 04:47:00+00:00,0,0,N,0,0,0,0,0,/rest/reports/sources/dehergne-d.rpt,/rest/exports/sources/dehergne-d.xml
4,sources/dehergne-e.cli,dehergne-e.cli,2024-02-07 12:33:42.067451+00:00,V,2023-10-13 04:47:00+00:00,0,0,N,0,0,0,0,0,/rest/reports/sources/dehergne-e.rpt,/rest/exports/sources/dehergne-e.xml
5,sources/dehergne-f.cli,dehergne-f.cli,2024-02-07 12:33:42.068823+00:00,V,2023-10-13 04:47:00+00:00,0,0,N,0,0,0,0,0,/rest/reports/sources/dehergne-f.rpt,/rest/exports/sources/dehergne-f.xml
6,sources/dehergne-g.cli,dehergne-g.cli,2024-02-27 11:32:30.400622+00:00,T,2023-10-13 04:47:00+00:00,0,0,N,0,0,0,0,0,/rest/reports/sources/dehergne-g.rpt,/rest/exports/sources/dehergne-g.xml
7,sources/dehergne-h.cli,dehergne-h.cli,2024-02-07 12:33:42.084048+00:00,V,2023-10-13 04:47:00+00:00,0,0,N,0,0,0,0,0,/rest/reports/sources/dehergne-h.rpt,/rest/exports/sources/dehergne-h.xml
8,sources/dehergne-i.cli,dehergne-i.cli,2024-02-07 12:33:42.088723+00:00,V,2023-07-16 20:43:00+00:00,0,0,N,0,0,0,0,0,/rest/reports/sources/dehergne-i.rpt,/rest/exports/sources/dehergne-i.xml
9,sources/dehergne-j.cli,dehergne-j.cli,2024-02-07 12:33:42.091131+00:00,V,2023-07-16 20:43:00+00:00,0,0,N,0,0,0,0,0,/rest/reports/sources/dehergne-j.rpt,/rest/exports/sources/dehergne-j.xml


## Optionally clean translations

In [20]:
tlnb.kleio_server.translation_clean("", recurse="yes")

['sources/dehergne-z.cli',
 'sources/dehergne-y.cli',
 'sources/dehergne-x.cli',
 'sources/dehergne-w.cli',
 'sources/dehergne-v.cli',
 'sources/dehergne-u.cli',
 'sources/dehergne-t.cli',
 'sources/dehergne-s.cli',
 'sources/dehergne-r.cli',
 'sources/dehergne-q.cli',
 'sources/dehergne-p.cli',
 'sources/dehergne-other.cli',
 'sources/dehergne-o.cli',
 'sources/dehergne-n.cli',
 'sources/dehergne-m.cli',
 'sources/dehergne-locations-1701.cli',
 'sources/dehergne-locations-1644.cli',
 'sources/dehergne-l.cli',
 'sources/dehergne-k.cli',
 'sources/dehergne-j.cli',
 'sources/dehergne-i.cli',
 'sources/dehergne-h.cli',
 'sources/dehergne-g.cli',
 'sources/dehergne-f.cli',
 'sources/dehergne-e.cli',
 'sources/dehergne-d.cli',
 'sources/dehergne-c.cli',
 'sources/dehergne-b.cli',
 'sources/dehergne-a.cli',
 'identifications/mhk_identification_toliveira.cli']


## Update the database from Kleio sources

This updates the translations of the sources and imports into the database the sources that translated without errors.

Only new files, or files changed since the last import, are processed.

In [24]:
import logging
logging.basicConfig(level=logging.DEBUG)

tlnb.update_from_sources()

## Check the status of the files

Check the import status of the translated files. The status codes are:

* I: imported
* E: imported with errors
* W: imported with warnings, no errors
* N: not imported
* U: translation updated, needs reimport
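
The codes above can be turned into a small lookup for flagging files that are not cleanly imported. This is a minimal sketch in plain Python: the sample rows and the helper `needs_attention` are hypothetical, and only the `name` and `import_status` column names are taken from the status table in this section.

```python
# Import status codes as listed in this tutorial.
IMPORT_STATUS = {
    "I": "imported",
    "E": "imported with errors",
    "W": "imported with warnings, no errors",
    "N": "not imported",
    "U": "translation updated, needs reimport",
}

# Hypothetical sample rows using the 'name' and 'import_status' columns.
files = [
    {"name": "a.cli", "import_status": "I"},
    {"name": "b.cli", "import_status": "E"},
    {"name": "c.cli", "import_status": "U"},
    {"name": "d.cli", "import_status": "N"},
]


def needs_attention(rows):
    """Return (name, description) for every file whose status is not 'I'."""
    return [
        (row["name"], IMPORT_STATUS[row["import_status"]])
        for row in rows
        if row["import_status"] != "I"
    ]


pending = needs_attention(files)
for name, description in pending:
    print(f"{name}: {description}")
```

With a real dataframe the same filter could be written as a boolean mask on the `import_status` column.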

In [25]:
imported_files_df = tlnb.get_import_status()
imported_files_df[["import_status","import_errors","import_warnings","name","errors","warnings","path"]].sort_values("name")


Unnamed: 0,import_status,import_errors,import_warnings,name,errors,warnings,path
0,I,0,0,dehergne-a.cli,0,0,sources/dehergne-a.cli
1,I,0,0,dehergne-b.cli,0,0,sources/dehergne-b.cli
2,I,0,0,dehergne-c.cli,0,0,sources/dehergne-c.cli
3,I,0,0,dehergne-d.cli,0,0,sources/dehergne-d.cli
4,I,0,0,dehergne-e.cli,0,0,sources/dehergne-e.cli
5,I,0,0,dehergne-f.cli,0,0,sources/dehergne-f.cli
6,I,0,0,dehergne-g.cli,0,0,sources/dehergne-g.cli
7,I,0,0,dehergne-h.cli,0,0,sources/dehergne-h.cli
8,I,0,0,dehergne-i.cli,0,0,sources/dehergne-i.cli
9,I,0,0,dehergne-j.cli,0,0,sources/dehergne-j.cli


### Check the translation report

In [26]:
rpt = tlnb.get_translation_report(imported_files_df, 12)
print(rpt)


KleioTranslator - server version 12.4 - build 567 2024-02-07 13:02:03
27-2-2024 13-49

Processing data file dehergne-locations-1644.cli
-------------------------------------------
Generic Act translation module with geoentities (XML).
     Joaquim Ramos de Carvalho (joaquim@uc.pt) 
** New document: kleio
kleio translation started
Structure: gacto2.str
Prefix: 
Autorel: 
Translation count: 44
Obs: 
** Processing source fonte$dehergne-locations-1644
96: geodesc$deh-chre-1644
*** End of File


Structure file: /kleio-home/structures/sources.str
Structure processing report: /kleio-home/structures/sources.srpt
Structure in JSON: /kleio-home/structures/sources.str.json

Kleio file: /kleio-home/sources/dehergne-locations-1644.cli
Original file: /kleio-home/sources/dehergne-locations-1644.org
Previous version: /kleio-home/sources/dehergne-locations-1644.old
Temp file with ids: /kleio-home/sources/dehergne-locations-1644.ids
** - /kleio-home/sources/dehergne-locations-1644.cli-renamed to- /kleio

### Get the import report for a file

If there are errors in the import phase, check the import report.

In [16]:
rpt = tlnb.get_import_rpt(imported_files_df, rows=[6])
print(rpt)

No errors


## Todo

Offer translation and import as single functions that take the files data frame:
* TimelinkNotebook.translate([files_df,paths=List[str], rows=List[int],status="T"])
* TimelinkNotebook.import([files_df,paths=List[str], rows=List[int],status="V"])



# Getting data

### Search for people, show in Kleio notation

In [28]:
from timelink.api.models import Person

show_only=10

with tlnb.db.session() as session:
    persons = session.query(Person).filter(Person.name.like('%Ricci')).all()

    for person in persons[:show_only]:
        print(person.to_kleio())
        print()

referido$Matteo Ricci/m/id=deh-joao-barradas-ref1
  rel$function-in-act/referido/dehergne-notices-b/0
  ls$estadia/rio Kan/15940000

referido$Matteo Ricci/m/id=deh-lazzaro-cattaneo-ref1
  rel$function-in-act/referido/dehergne-notices-c/0
  ls$estadia/Shiuchow (Chao-tcheou)/15940000
  ls$jesuita-tarefa/Toma o hábito de letrado/0
  ls$tarefa/Romaniza os cinco tons do chinês/0

referido$Matteo Ricci/m/id=deh-giovanni-cola-niccolo-ref1
  rel$function-in-act/referido/dehergne-notices-c/0
  ls$chegada@wikidata/"https://www.wikidata.org/wiki/Q14773"/15820807
  ls$chegada/Macau/15820807

referido$Matteo Ricci/m/id=deh-sabatino-de-ursis-ref1
  rel$function-in-act/referido/dehergne-notices-d/0
  rel$institucional/Envia a Pequim/Sabatino De Ursis/deh-sabatino-de-ursis/0

referido$Matteo Ricci/m/id=deh-jean-fernandes-tchong-ref1
  rel$function-in-act/referido/dehergne-notices-f/0
  rel$sociabilidade/Envia em auxílio de Bento de Góis/Jean Fernandes Tchong/deh-jean-fernandes-tchong/16060000

referid
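
The `like('%Ricci')` filter in the cell above uses SQL `LIKE` wildcards: `%` matches any run of characters, so `'%Ricci'` selects names that end in "Ricci", while `'%Ricci%'` would select names that contain it anywhere. The sketch below shows the difference with the standard-library `sqlite3` module; the `persons` table and its rows are made up for illustration and are not the Timelink schema.

```python
import sqlite3

# In-memory database with a made-up persons table (not the Timelink schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?)",
    [
        ("p1", "Matteo Ricci"),
        ("p2", "Ricciardo Rossi"),
        ("p3", "Paolo Riccio"),
    ],
)


def names_like(pattern):
    """Return the names matching a LIKE pattern, ordered by id."""
    rows = conn.execute(
        "SELECT name FROM persons WHERE name LIKE ? ORDER BY id", (pattern,)
    )
    return [name for (name,) in rows]


ends_with = names_like("%Ricci")    # names ending in 'Ricci'
contains = names_like("%Ricci%")    # names containing 'Ricci' anywhere

print(ends_with)
print(contains)
```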

### Search for people with specific attribute

In [30]:
from sqlalchemy import select
from timelink.api.models import Person, Attribute

show_only = 10

with tlnb.db.session() as session:
    # Join each person to the attributes that match both the type and the value
    stmt = select(Person).join(Person.attributes.and_(Attribute.the_type == 'jesuita-entrada',
                                                      Attribute.the_value.like('Coimbra')))
    print(stmt)  # show the generated SQL
    persons = session.execute(stmt).scalars().all()

    for person in persons[:show_only]:
        # Fetch the person by id and print it in Kleio notation
        p = tlnb.db.get_person(person.id)
        print(p.to_kleio())


SELECT persons.id, entities.id AS id_1, entities.class, entities.inside, entities.the_order, entities.the_level, entities.the_line, entities.groupname, entities.updated, entities.indexed, persons.name, persons.sex, persons.obs 
FROM entities JOIN persons ON entities.id = persons.id JOIN (entities AS entities_1 JOIN attributes AS attributes_1 ON attributes_1.id = entities_1.id) ON entities.id = attributes_1.entity AND attributes_1.the_type = :the_type_1 AND attributes_1.the_value LIKE :the_value_1
referido$António de Abreu/m/id=deh-antonio-de-abreu-ref1
  rel$function-in-act/referido/dehergne-notices-a/0
  ls$nacionalidade/Portugal/0
  ls$jesuita-cargo/Provincial de Portugal/16270000
  ls$jesuita-cargo/Provincial de Portugal/16290000
  ls$nascimento@wikidata/"https://www.wikidata.org/wiki/Q597"/15610000
  ls$nascimento/Lisboa/15610000
  ls$jesuita-entrada@wikidata/"https://www.wikidata.org/wiki/Q45412"/15760000
  ls$jesuita-entrada/Coimbra/15760000
  ls$dehergne/1-ref1/0/obs=(Distinct d





### Get a dataframe from attributes


In [37]:
from timelink.pandas import entities_with_attribute


# Get the list of people with a certain value in a specific attribute
df = entities_with_attribute(
                    entity_type='person',
                    the_type='jesuita-entrada',
                    the_value='Coimbra',
                    show_elements=['name'],
                    more_attributes=['nacionalidade','nascimento','estadia','morte'],
                    db=tlnb.db,
                    sql_echo=False)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 181 entries, deh-afonso-aires to deh-tome-pereira
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   name                  181 non-null    object
 1   jesuita-entrada       181 non-null    object
 2   jesuita-entrada.date  181 non-null    object
 3   jesuita-entrada.obs   5 non-null      object
 4   nacionalidade         181 non-null    object
 5   nacionalidade.date    181 non-null    object
 6   nacionalidade.obs     0 non-null      object
 7   nascimento            181 non-null    object
 8   nascimento.date       181 non-null    object
 9   nascimento.obs        0 non-null      object
 10  estadia               176 non-null    object
 11  estadia.date          176 non-null    object
 12  estadia.obs           17 non-null     object
 13  morte                 172 non-null    object
 14  morte.date            172 non-null    object
 15  morte.obs        

In [38]:
df.head(5)

Unnamed: 0_level_0,name,jesuita-entrada,jesuita-entrada.date,jesuita-entrada.obs,nacionalidade,nacionalidade.date,nacionalidade.obs,nascimento,nascimento.date,nascimento.obs,estadia,estadia.date,estadia.obs,morte,morte.date,morte.obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
deh-afonso-aires,Afonso Aires,Coimbra,16490000,,Portugal,0,,Santarém,16300000,,,,,"[No mar, depois do Cabo da Boa Esperança]",16570000,
deh-andre-carneiro,André Carneiro,Coimbra,16640325,,Portugal,0,,Porto,16500000,,"Chinchinim, Goa",17050000.0,,Goa,17090612,
deh-antao-dantas,Antão Dantas,Coimbra,16930310,,Portugal,0,,Porto,16741102,,Macau,17050000.0,,Goa,17210522,
deh-antao-dantas,Antão Dantas,Coimbra,16930310,,Portugal,0,,Porto,16741102,,Manila,17070000.0,,Goa,17210522,
deh-antao-dantas,Antão Dantas,Coimbra,16930310,,Portugal,0,,Porto,16741102,,Macau,17120000.0,,Goa,17210522,
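
In the output above Antão Dantas appears three times, once for each of his `estadia` values. One way to picture this is that turning attributes into columns yields one row per combination of values. The sketch below is plain Python with sample rows modeled on that output and a hypothetical helper `to_wide`; it only illustrates the row-multiplication effect and makes no claim about how `entities_with_attribute` is implemented.

```python
from itertools import product

# Long format: one (person id, attribute type, value) tuple per attribute.
long_rows = [
    ("deh-antao-dantas", "nascimento", "Porto"),
    ("deh-antao-dantas", "estadia", "Macau"),
    ("deh-antao-dantas", "estadia", "Manila"),
    ("deh-afonso-aires", "nascimento", "Santarém"),
]


def to_wide(rows, columns):
    """Pivot long rows into dicts with one key per requested column.

    A person with several values for a column gets one output row per
    combination of values; a missing attribute becomes None.
    """
    by_person = {}
    for pid, the_type, the_value in rows:
        by_person.setdefault(pid, {}).setdefault(the_type, []).append(the_value)

    wide = []
    for pid, attrs in by_person.items():
        value_lists = [attrs.get(col, [None]) for col in columns]
        for combo in product(*value_lists):
            wide.append({"id": pid, **dict(zip(columns, combo))})
    return wide


wide_rows = to_wide(long_rows, ["nascimento", "estadia"])
for row in wide_rows:
    print(row)
```

When counting people in such a table, keep in mind that the number of rows can exceed the number of distinct ids.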




### Remove empty columns

In [34]:
df.dropna(how='all', axis=1, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 43 entries, deh-pedro-de-alcacova to deh-inacio-pires
Data columns (total 11 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   name                  43 non-null     object
 1   jesuita-entrada       43 non-null     object
 2   jesuita-entrada.date  43 non-null     object
 3   jesuita-entrada.obs   1 non-null      object
 4   nacionalidade         43 non-null     object
 5   nacionalidade.date    43 non-null     object
 6   nascimento            43 non-null     object
 7   nascimento.date       43 non-null     object
 8   morte                 36 non-null     object
 9   morte.date            36 non-null     object
 10  morte.obs             2 non-null      object
dtypes: object(11)
memory usage: 5.1+ KB


In [36]:
df.head(15)

Unnamed: 0_level_0,name,jesuita-entrada,jesuita-entrada.date,jesuita-entrada.obs,nacionalidade,nacionalidade.date,nascimento,nascimento.date,morte,morte.date,morte.obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
deh-pedro-de-alcacova,Pedro de Alcáçova,Coimbra,15420000,saiu a primeira vez,Portugal,0,?,15230000,?,15790000.0,
deh-belchior-nunes-barreto,Belchior Nunes Barreto,Coimbra,15430311,,Portugal,0,Porto,15190000,Goa,15710810.0,
deh-belchior-miguel-carneiro-leitao,Belchior Miguel Carneiro Leitão,Coimbra,15430425,,Portugal,0,Coimbra,15190000,Macau,15830819.0,
deh-francisco-perez,Francisco Pérez,Coimbra,15440125,,Espanha,0,"Villanueva de Barcarrota, diocese de Badajoz",15140000,Negapatam,15830222.0,
deh-tiburcio-de-quadros,Tibúrcio de Quadros,Coimbra,15440418,,Portugal,0,Santarém,15280000,Goa,15721121.0,
deh-goncalo-alvares,Gonçalo Álvares,Coimbra,15490101,,Portugal,0,"Vila Viçosa, diocese de Évora",15270000,[A caminho do Japão],15730700.0,morre afogado com quatro companheiros no camin...
deh-cristovao-da-costa,Cristóvão da Costa,Coimbra,15500103,,Portugal,0,?,15290000,Macau,15820000.0,
deh-joao-rodrigues-de-sainan,João Rodrigues de Sainan,Coimbra,15550000,,Portugal,0,?,15310000,,,
deh-pedro-martins,Pedro Martins,Coimbra,15560525,,Portugal,0,Coimbra,15420000,"[No mar, a caminho de Malaca]",15980213.0,
deh-nicolau-pimenta,Nicolau Pimenta,Coimbra,15620502,,Portugal,0,Santarém,15461206,Goa (Colégio Novo),16130306.0,
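
The `.date` columns in this output hold values such as `15430311`, `15420000` and `0`. Judging from the data, these read as YYYYMMDD with zeros standing for unknown parts, and values like `15710810.0` are presumably the same numbers turned into floats by pandas because the column has missing entries. Under that assumption, a hypothetical helper `split_date` (not part of Timelink) can break a value into year, month and day:

```python
def split_date(value):
    """Split a YYYYMMDD value into (year, month, day).

    Parts equal to zero are returned as None (unknown). Returns None
    for a missing value. Accepts ints, floats and numeric strings,
    including forms like '15710810.0'.
    """
    if value is None or value == "":
        return None
    number = int(float(value))
    year, rest = divmod(number, 10000)
    month, day = divmod(rest, 100)
    return tuple(part or None for part in (year, month, day))


print(split_date("15430311"))    # full date
print(split_date("15420000"))    # year only
print(split_date("15730700.0"))  # year and month, float-like string
print(split_date("0"))           # nothing known
```

This helper does not handle ISO-style dates such as `1728-10-01`, which appear in other outputs of this tutorial.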



## Counting

### Count attributes from an existing dataframe



In [52]:
# create a column with the index values, which are the person ids
df['id'] = df.index.values

col = 'estadia'  # subtotal by this column

# Group by the column; 'count' counts the rows in each group,
# and min/max give the date range of the attribute
df_totals = df.groupby(col).agg({'id':'count','estadia.date':['min', 'max']
                                 })

df_totals.sort_values(('id','count'),ascending= False).head(30)

Unnamed: 0_level_0,id,estadia.date,estadia.date
Unnamed: 0_level_1,count,min,max
estadia,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
Macau,27,15630829,17480000
Goa,14,0,17270000
Cantão,11,15550000,17190000
Pequim,8,16110503,17440000
Nanquim,7,0,17140000
Japão,5,15520000,16251217
Hainan,5,17110000,17250000
Malaca,5,0,15720000
Índia,5,0,17220000
Sião,4,0,17450000
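
Note that the `'count'` aggregation counts rows, so a person with several stays in the same place is counted once per row. If the goal is the number of distinct people per place, the ids have to be deduplicated first; in pandas the `'nunique'` aggregation does this. The plain-Python sketch below shows the difference, using sample rows modeled on the earlier output rather than the full dataset.

```python
from collections import defaultdict

# Sample (person id, estadia) rows; one person may appear several times.
rows = [
    ("deh-antao-dantas", "Macau"),
    ("deh-antao-dantas", "Manila"),
    ("deh-antao-dantas", "Macau"),
    ("deh-andre-carneiro", "Chinchinim, Goa"),
]

row_counts = defaultdict(int)   # rows per place
people = defaultdict(set)       # distinct person ids per place

for person_id, place in rows:
    row_counts[place] += 1
    people[place].add(person_id)

distinct_counts = {place: len(ids) for place, ids in people.items()}

for place in row_counts:
    print(place, row_counts[place], distinct_counts[place])
```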



### Counting directly in the database

When the attribute has many values and there is no need to hold every person in memory, do the count directly in the database.




In [None]:
from timelink.pandas import attribute_values

df_totals = attribute_values('jesuita-entrada',db=tlnb.db)


In [None]:
df_totals.head(10)


Unnamed: 0_level_0,count,date_min,date_max
value,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
?,31,15550000,17581126
Coimbra,6,15420000,16770701
Goa,3,15480502,16660000
Lisboa,3,16680000,17460223
Avignon,2,17350731,17370927
Paris,2,16700926,16771029
Saragoça,2,16740613,16771115
Évora,2,15710304,17230000
Alcalá de Henares,1,15580413,15580413
Bolonha,1,17250728,17250728
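
A count like the one above maps naturally onto a SQL `GROUP BY`. The sketch below uses the standard-library `sqlite3` module with a simplified, made-up `attributes` table; the column names `entity`, `the_type`, `the_value` and `the_date` follow names that appear elsewhere in this tutorial, but the table layout and the query are assumptions for illustration, not the implementation of `attribute_values`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for an attributes table (assumed layout).
conn.execute(
    "CREATE TABLE attributes "
    "(entity TEXT, the_type TEXT, the_value TEXT, the_date TEXT)"
)
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?, ?)",
    [
        ("p1", "jesuita-entrada", "Coimbra", "15420000"),
        ("p2", "jesuita-entrada", "Coimbra", "15430311"),
        ("p3", "jesuita-entrada", "Goa", "15480502"),
        ("p3", "nacionalidade", "Portugal", "0"),
    ],
)

# Count rows per value and get the date range, all inside the database.
query = """
    SELECT the_value, COUNT(*) AS n, MIN(the_date) AS date_min, MAX(the_date) AS date_max
    FROM attributes
    WHERE the_type = ?
    GROUP BY the_value
    ORDER BY n DESC, the_value
"""
totals = conn.execute(query, ("jesuita-entrada",)).fetchall()

for value, n, date_min, date_max in totals:
    print(value, n, date_min, date_max)
```

Only the aggregated rows travel back to Python, which is the point of counting in the database when the attribute has many values.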


#### Filter by dates



In [None]:
df_totals = attribute_values('grau',dates_between=('1535','1600'),db=tlnb.db)

In [None]:
df_totals.head(10)

Unnamed: 0_level_0,count,date_min,date_max
value,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Bacharel,9,1554-07-19,1593-06-09
Formatura,6,1574-07-24,1593-05-12
Bacharel em Artes,3,1574-03-14,1588-03-12
Licenciado,3,1574-06-03,1597-05-06
Doutor,2,1560-12-22,1589-10-27
Licenciado em Artes,2,1574-05-15,1589-01-13


## View records





### Search people by name, results in dataframe

In [None]:
from timelink.pandas import pname_to_df
people = pname_to_df(name='%Abegão%',
                     db=tlnb.db,)
people.head()

Unnamed: 0,id,name,sex,obs
0,140338,Estevão de Matos Abegão,m,"""""""\n\n Id: 140338\n Cód..."
1,140339,Jerónimo Rodrigues Abegão,m,"""""""\n\n Id: 140339\n Cód..."
2,140339-pai,Jerónimo Rodrigues Abegão,m,


### View a person

#### Person attributes in a dataframe, one line per attribute

In [None]:
import pandas as pd
from timelink.pandas import group_attributes as person_attributes

pd.set_option('display.max_rows',1000)

id = '140338'
pdf = person_attributes([id],db=tlnb.db)  # note id in a list
pdf[['the_date','the_type','the_value','attr_obs']].sort_values(['the_date','the_type'])

Unnamed: 0_level_0,the_date,the_type,the_value,attr_obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
140338,1728-10-01,faculdade,Cânones,
140338,1728-10-01,faculdade.ano,Cânones.1728,
140338,1728-10-01,instituta,1728-10-01,01.10.1728 1728-10-01
140338,1728-10-01,instituta.ano,1728,01.10.1728 1728-10-01
140338,1728-10-01,naturalidade,Elvas,
140338,1728-10-01,naturalidade.ano,Elvas.1728,
140338,1728-10-01,nome,Estevão de Matos Abegão,
140338,1728-10-01,nome-geografico,Elvas,
140338,1728-10-01,nome-geografico.ano,Elvas.1728,
140338,1728-10-01,nome.apelido,Matos Abegão,


In [None]:
p = tlnb.db.get_person(id)
print(p.to_kleio())

n$Estevão de Matos Abegão/m/id=140338/obs="""
      """

                  Id: 140338
                  Código de referência: PT/AUC/ELU/UC-AUC/B/001-001/A/000002

                  Nome        : Estevão de Matos Abegão
                  Data inicial: 1728-10-01
                  Data final  : 1733-07-03
                  Filiação: Lourenço de Matos Pereira
                  Naturalidade: Elvas
                  Faculdade: Cânones

                  Matrícula(s): 01.10.1729
                  01.10.1730
                  01.10.1732

                  Instituta: 01.10.1728

                  Bacharel: 03.07.1733
              """
  """
  rel$function-in-act/n/auc-alumni-A-140337-140771/20200211
  atr$código-de-referência/""PT/AUC/ELU/UC-AUC/B/001-001/A/000002""/2021-05-17
  atr$data-do-registo/2021-05-17/2021-05-17
  atr$url/""https://pesquisa.auc.uc.pt/details?id=140338""/2021-05-17
  ls$uc.entrada/1728-10-01/1728-10-01
  ls$uc.saida/1733-07-03/1733-07-03
  ls$uc.entrada.ano/1728/1728-1

#### Person attributes in a dataframe, attributes in columns

Generate a table of people with several attributes as columns.

In [None]:
# Get the list of people with a certain value in a specific attribute
from timelink.pandas import entities_with_attribute

id = 140338
# id = '316297'  # João Pedro Ribeiro
# id = '234295'  # Alexandre Metelo de Sousa
df = entities_with_attribute(
                    entity_type='person',
                    the_type='faculdade',  # we need a base attribute
                    the_value='Teologia',
                    show_elements=['name'],
                    more_attributes=['naturalidade','uc.entrada','uc.saida'],
                    db=tlnb.db,
                    sql_echo=False)
view_cols = ['name','naturalidade','uc.entrada','faculdade','uc.saida','faculdade.obs']
df.info()
df[view_cols].sort_values(['name','uc.entrada'])

<class 'pandas.core.frame.DataFrame'>
Index: 29 entries, 140719 to 269145
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   name               29 non-null     object
 1   faculdade          29 non-null     object
 2   faculdade.date     29 non-null     object
 3   faculdade.obs      3 non-null      object
 4   naturalidade       22 non-null     object
 5   naturalidade.date  22 non-null     object
 6   naturalidade.obs   0 non-null      object
 7   uc.entrada         29 non-null     object
 8   uc.entrada.date    29 non-null     object
 9   uc.entrada.obs     0 non-null      object
 10  uc.saida           29 non-null     object
 11  uc.saida.date      29 non-null     object
 12  uc.saida.obs       0 non-null      object
dtypes: object(13)
memory usage: 4.2+ KB


Unnamed: 0_level_0,name,naturalidade,uc.entrada,faculdade,uc.saida,faculdade.obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
140511,Abílio Ferreira Gomes de Abreu,"Cuenca, Coimbra",1854-10-02,Teologia,1856-07-09,"""Teologia#29"""
140516,Agostinho Machado de Abreu,Gesteira,1721-10-01,Teologia,1726-10-01,
140375,Alexandre de Brito e Sousa Abranches,Santa Ovaia,1863-10-12,Teologia,1867-06-12,
140534,Antão de Abreu,,1638-07-20,Teologia,1639-11-20,
140350,António Camacho Guerreiro de Brito e Aboim,Mértola,1770-10-01,Teologia,1770-10-01,
140545,António Dinis de Abreu,Midões,1695-12-09,Teologia,1699-10-01,
140593,António Lopes Coelho de Abreu,Azóia,1861-10-11,Teologia,1864-06-20,
140552,António de Abreu,Sertã,1603-11-03,Teologia,1605-10-26,
140553,António de Abreu,Setúbal,1626-11-19,Teologia,1626-11-19,
140395,Augusto Cândido de Abranches,Moçambique,1852-10-12,Teologia,1861-05-27,


#### Examine potential duplicates

In [9]:
import pandas as pd
from timelink.pandas.group_attributes import display_group_attributes, group_attributes
pd.set_option('display.max_rows',250)

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome.apelido','nome.primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','matricula.ano','nome-geografico','instituta.ano',
         'uc.entrada.ano','uc.saida.ano']

dup_ids = ['140490','140503',]

display_group_attributes(dup_ids,
                            entity_type='person',
                            header_elements=['name'],
                            #header_attributes=['nome.nota','uc.entrada'],
                             exclude_attributes=no_show,
                             sort_attributes=['the_type','the_value','the_date',],
                             db=tlnb.db)

Unnamed: 0,name,the_type,the_type.date,the_type.obs,id
0,Manuel de Abrantes,1705-10-01,1705-10-01,,140490
1,Manuel de Abrantes,1706-10-01,1706-10-01,,140503


Unnamed: 0,name,the_type,the_value,the_date,attr_obs,id
0,Manuel de Abrantes,faculdade,Teologia,1705-10-01,,140490
1,Manuel de Abrantes,faculdade,Teologia,1706-10-01,,140503
2,Manuel de Abrantes,matricula,Teologia,1705-10-01,01.10.1705,140490
3,Manuel de Abrantes,matricula,Teologia,1706-10-01,01.10.1706,140503
4,Manuel de Abrantes,matricula,Teologia,1707-10-01,01.10.1707,140503
5,Manuel de Abrantes,matricula,Teologia,1708-10-01,01.10.1708,140503
6,Manuel de Abrantes,naturalidade,Vila de Matos,1706-10-01,,140503
7,Manuel de Abrantes,nome,Manuel de Abrantes,1705-10-01,,140490
8,Manuel de Abrantes,nome,Manuel de Abrantes,1706-10-01,,140503
9,Manuel de Abrantes,nome.nota,padre,1705-10-01,,140490
