
**[EN]** English only



# Cross references

The digital version of the FA carries over a cross-reference mechanism from the card catalog. As was usual at the time, extra cards were inserted into the catalog to guide users searching for name variants of the main entry. These cards have a “base” name, followed the word “vide” and a “name expression”. 

This notebook analyses the cross-reference information in the FA.




To run this file: follow instructions in the README.md file in this directory.



#### This is a very long file. In VS CODE use the outline view in the left pane to quickly jump to sections

##  Setup

In [1]:
from timelink.api.database import TimelinkDatabase
from ucalumni.config import default_db_url

print(f"Creating TimelinkDatabase instance from {default_db_url}")
db = TimelinkDatabase(db_url=default_db_url)

Creating TimelinkDatabase instance from sqlite:///../database/sqlite3/fauc2.db?check_same_thread=False


Prepare a dataframe to collect the results of cross reference analysis


In [2]:
import pandas as pd

columns = ['data','sequential','random']
vars = ['vide','vide_plus',
        'see','see_matched','see_matched_ok','nodate_novide',
        'aka','nodate','nodate_novide','aka_matched','aka_matched_ok',
        'records_matched','records_matched_ok','records_error',
        'matched_pairs','matched_pairs_ok','records_see_aka','records_aka_see','records_aka_aka', 'records_see_see',
        'records_transitive','records_asymmetric']

match_info = pd.DataFrame(index=vars,columns=columns)
match_records = dict([(k,dict.fromkeys(columns)) for k in vars])
match_info.sort_index(inplace=True)


## Get records which contain a "see" note (vide)

Note that records with more than one faculty and/or more than one geographic name 
generate more than one line. So the number of lines in the data frame is greater
than the number of records.

**To obtain the real number of records in a data frame it is necessary to count the number of unique record identifiers (six-digit numbers) in the data frame index.**

> nvide = len(vide.index.unique())

In [3]:
from timelink.pandas import entities_with_attribute


# Get list of people with attribute nome-vide and add nome-geografico, nome-pai, entry date and faculdade
vide = entities_with_attribute(
                    the_type='nome-vide',
                    entity_type='person',
                    show_elements=['name'],
                    more_attributes=['nome-geografico','faculdade','nome-pai','uc-entrada','uc-saida'],
                    sql_echo=False,
                    db=db)
# drop columns that are not usefull
vide.drop(['nome-vide.date','nome-vide.obs','nome-geografico.date','nome-geografico.obs','nome-pai.date','nome-pai.obs','uc-entrada.date','uc-entrada.obs'],axis=1, inplace=True)
nvide = len(vide.index.unique())
print("Number of records with 'vide' cross reference:'",nvide)
match_info.loc['vide','data'] = nvide
match_records['vide']['data'] = vide.index.unique()
print()
print(vide.info())



Number of records with 'vide' cross reference:' 8626

<class 'pandas.core.frame.DataFrame'>
Index: 9286 entries, 140351 to 285250
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   name             9286 non-null   object
 1   nome-vide.type   9286 non-null   object
 2   nome-vide        9286 non-null   object
 3   nome-vide.line   9286 non-null   int64 
 4   nome-vide.level  9286 non-null   int64 
 5   nome-geografico  8786 non-null   object
 6   faculdade        4794 non-null   object
 7   faculdade.date   4794 non-null   object
 8   faculdade.obs    4794 non-null   object
 9   nome-pai         3485 non-null   object
 10  uc-entrada       9286 non-null   object
 11  uc-saida         9286 non-null   object
 12  uc-saida.date    9286 non-null   object
 13  uc-saida.obs     9286 non-null   object
dtypes: int64(2), object(12)
memory usage: 1.3+ MB
None


In [4]:

print()
print("Check a few:")
vide.head(5)


Check a few:


Unnamed: 0_level_0,name,nome-vide.type,nome-vide,nome-vide.line,nome-vide.level,nome-geografico,faculdade,faculdade.date,faculdade.obs,nome-pai,uc-entrada,uc-saida,uc-saida.date,uc-saida.obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
140351,António Pestana Guerreiro Camacho Aboim,nome-vide,António José Pestana da Silva,524,2,Casalinho,Cânones,0000-00-00,Faculdade inferida,,0000-00-00,0000-00-00,0000-00-00,
140357,Gonçalo de Aboim,nome-vide,Brito,731,2,Santarém,,,,,0000-00-00,0000-00-00,0000-00-00,
140377,André de Sequeira Abranches,nome-vide,Sequeira,1380,2,Várzea de Meruge,,,,,0000-00-00,0000-00-00,0000-00-00,
268681,José Alexandre de Abrantes,nome-vide,Correia,2158,2,Cabanas,,,,,0000-00-00,0000-00-00,0000-00-00,
140406,Domingos de Abranches,nome-vide,Tavares,2355,2,Lourosa,Cânones,0000-00-00,Cânones,Feliciano de Abranches,0000-00-00,0000-00-00,0000-00-00,


#### Problems in processing 'vide' notes with multiple "vide"

There are a few cases in the form  "vide _...name..._ e vide _...name..._"

1. https://pesquisa.auc.uc.pt/details?id=141274
2. https://pesquisa.auc.uc.pt/details?id=147377
3. https://pesquisa.auc.uc.pt/details?id=147659
4. https://pesquisa.auc.uc.pt/details?id=150350
5. https://pesquisa.auc.uc.pt/details?id=150562
6. https://pesquisa.auc.uc.pt/details?id=152472
7. https://pesquisa.auc.uc.pt/details?id=189389
8. https://pesquisa.auc.uc.pt/details?id=190076
9. https://pesquisa.auc.uc.pt/details?id=191599
10. https://pesquisa.auc.uc.pt/details?id=192039
11. https://pesquisa.auc.uc.pt/details?id=196728
12. https://pesquisa.auc.uc.pt/details?id=197167
13. https://pesquisa.auc.uc.pt/details?id=207991
14. https://pesquisa.auc.uc.pt/details?id=209208
15. https://pesquisa.auc.uc.pt/details?id=216619
16. https://pesquisa.auc.uc.pt/details?id=244099
17. https://pesquisa.auc.uc.pt/details?id=248624
18. https://pesquisa.auc.uc.pt/details?id=266150

19. https://pesquisa.auc.uc.pt/details?id=130281 
      * Nuno da Câmara	is tricky, because it combines note and vide, and the vide part has two names Nuno da Câmara (D.), 
        vide Nuno Casimiro da Câmara e Nuno José da Câmara it links with  130516 and 130517
        Handling these requires changing the grammar rules, scheduled for next version.

### Determine the type of cross reference

__Forward cross references (“see”)__
* Almost empty records with a name with “vide”
* A few with more than one (…vide... e vide…)
* No dates (empty “UnitDateInitial” field)
* Other than the name:
    * 93% place of birth
    * 27% father’s name
    * 23% faculty 
  
__Back cross references (“also knows as/aka”)__
* Normal records with “vide” in the name.
* Dates (valid “UnitDateInitial” field)
* Contain all types of information:
    * 97% place of birth
    * 53% father’s name
    * 99% faculty
    * degrees, enrolment, and so on.
* Can be matched with “see” records.
* These records are the non preferred form of the name and should link to a preferred form.



#### "See"  or forward cross-references: "vide" and no dates

These records are the non preferred form of the name and should link to a preferred form.

In [5]:

zdate_filter = vide['uc-entrada'] == '0000-00-00'
vide.loc[zdate_filter,'rec_type'] = 'see'

see_vide = vide[zdate_filter]
nsee_vide = len(see_vide.index.unique())
match_info.loc['see','data'] = nsee_vide
match_records['see']['data'] = list(see_vide.index.unique())
print("Number of vide records with zero dates (forward cross references):",nsee_vide)

nsee_vide_geo = len(see_vide[see_vide['nome-geografico'].notnull()].index.unique())
match_info.loc['see_geo','data'] = nsee_vide_geo
print(f"    of which {nsee_vide_geo} with place of birth {nsee_vide_geo/nsee_vide:.2%}")

nsee_vide_pai = len(see_vide[see_vide['nome-pai'].notnull()].index.unique())
match_info.loc['see_pai','data'] = nsee_vide_pai
print(f"    of which {nsee_vide_pai} with father's name  {nsee_vide_pai/nsee_vide:.2%}")

nsee_vide_fac = len(see_vide[see_vide['faculdade'].notnull()].index.unique())
match_info.loc['see_fac','data'] = nsee_vide_fac
print(f"    of which {nsee_vide_fac} with faculty        {nsee_vide_fac/nsee_vide:.2%}")

print()
base_vide_cols=['nome-vide','nome-geografico','faculdade','nome-pai']


Number of vide records with zero dates (forward cross references): 5580
    of which 5170 with place of birth 92.65%
    of which 1515 with father's name  27.15%
    of which 1322 with faculty        23.69%



In [6]:
# Show some
see_vide.head()

Unnamed: 0_level_0,name,nome-vide.type,nome-vide,nome-vide.line,nome-vide.level,nome-geografico,faculdade,faculdade.date,faculdade.obs,nome-pai,uc-entrada,uc-saida,uc-saida.date,uc-saida.obs,rec_type
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
140351,António Pestana Guerreiro Camacho Aboim,nome-vide,António José Pestana da Silva,524,2,Casalinho,Cânones,0000-00-00,Faculdade inferida,,0000-00-00,0000-00-00,0000-00-00,,see
140357,Gonçalo de Aboim,nome-vide,Brito,731,2,Santarém,,,,,0000-00-00,0000-00-00,0000-00-00,,see
140377,André de Sequeira Abranches,nome-vide,Sequeira,1380,2,Várzea de Meruge,,,,,0000-00-00,0000-00-00,0000-00-00,,see
268681,José Alexandre de Abrantes,nome-vide,Correia,2158,2,Cabanas,,,,,0000-00-00,0000-00-00,0000-00-00,,see
140406,Domingos de Abranches,nome-vide,Tavares,2355,2,Lourosa,Cânones,0000-00-00,Cânones,Feliciano de Abranches,0000-00-00,0000-00-00,0000-00-00,,see


#### "Aka"  or back references: records with "vide" other types of information

These are the records that should be linked back to zero date vide records.

There are too few of them!

In [7]:
# count vide record with a proper (non-zero) date
aka_filter = vide['uc-entrada'] != '0000-00-00'
vide.loc[aka_filter,'rec_type'] = 'aka'
aka_vide = vide[aka_filter]

naka_vide = len(set(aka_vide.index.values))
match_info.loc['aka','data'] = naka_vide
print("Number of records with vide and proper date (aka):",naka_vide)
match_records['aka']['data'] = list(aka_vide.index.unique())

naka_vide_geo = len(aka_vide[aka_vide['nome-geografico'].notnull()].index.unique())
match_info.loc['aka_geo','data'] = naka_vide_geo
print(f"    of which {naka_vide_geo} with place of birth {naka_vide_geo/naka_vide:.2%}")

naka_vide_pai = len(aka_vide[aka_vide['nome-pai'].notnull()].index.unique())
match_info.loc['aka_pai','data'] = naka_vide_pai
print(f"    of which {naka_vide_pai} with father's name  {naka_vide_pai/naka_vide:.2%}")

naka_vide_fac = len(aka_vide[aka_vide['faculdade'].notnull()].index.unique())
match_info.loc['aka_fac','data'] = naka_vide_fac
print(f"    of which {naka_vide_fac} with faculty        {naka_vide_fac/naka_vide:.2%}")

print("Number of records with vide and zero date (see):",nsee_vide)
# we subtract
print("Number of zero date records in excess of dated vide records         :", nsee_vide-naka_vide)
match_info.sort_index(inplace=True)

Number of records with vide and proper date (aka): 3046
    of which 2958 with place of birth 97.11%
    of which 1617 with father's name  53.09%
    of which 3020 with faculty        99.15%
Number of records with vide and zero date (see): 5580
Number of zero date records in excess of dated vide records         : 2534


In [8]:
# Show some
aka_vide.head()

Unnamed: 0_level_0,name,nome-vide.type,nome-vide,nome-vide.line,nome-vide.level,nome-geografico,faculdade,faculdade.date,faculdade.obs,nome-pai,uc-entrada,uc-saida,uc-saida.date,uc-saida.obs,rec_type
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
211389,Aires Gomes de Sá,nome-vide,Gomes,231,2,Coimbra,Cânones,1537-06-07,Faculdade corrigida,Mendo Gomes,1537-06-07,1559-06-29,1559-06-29,,aka
211389,Aires Gomes de Sá,nome-vide,Gomes,231,2,Coimbra,Leis,1537-06-07,Faculdade corrigida,Mendo Gomes,1537-06-07,1559-06-29,1559-06-29,,aka
205306,António Mendes,nome-vide,Neto,1068,2,Santarém,Cânones,1540-04-11,Cânones,,1540-04-11,1549-07-10,1549-07-10,,aka
212454,Simão de Sá,nome-vide,Pereira,13336,2,Coimbra,Cânones,1540-10-01,Cânones,Rui de Sá,1540-10-01,1558-00-00,1558-00-00,,aka
206063,Jorge de Vabo,nome-vide,Ribeiro,11,2,Porto,Cânones,1540-10-07,Cânones,Jacome Rodrigo,1540-10-07,1552-07-03,1552-07-03,,aka


### Look at other records with no dates, even if they have no "vide" expression

To test if all zero date records are part of the cross reference scheme.
Maybe the "vide" expression was missed during input in the database.

Frist collect all records th zero date.

In [18]:
from timelink.pandas import entities_with_attribute
from timelinknb import Session

with Session() as session:
    session.begin()

# Get list of people with no start-date and add nome-geografico, nome-pai, nome-vide and faculdade
zero_date = entities_with_attribute(
                    the_type='uc-entrada',
                    entity_type='person',
                    the_value='0000-00-00',
                    show_elements=['name'],
                    more_attributes=['uc-saida',
                                     'nome-vide',
                                     'nome-geografico',
                                     'nome-pai',
                                     'faculdade',
                                     'data-registo',
                                     ],
                    sql_echo=False,
                    db=db)
zero_date.drop(['nome-vide.date','nome-vide.obs','nome-geografico.date','nome-geografico.obs','nome-pai.date','nome-pai.obs','uc-entrada.date','uc-entrada.obs'],axis=1, inplace=True)
nzero_date = len(set(zero_date.index.unique()))
print()
print("Total number of rows with zero date:", len(zero_date))
print("Total number of records with zero date:", nzero_date)
match_info.loc['nodate','data'] = nzero_date

base_vide_cols=['nome-vide','nome-geografico','faculdade','nome-pai']
zero_date[base_vide_cols].count(axis=0)



Total number of rows with zero date: 6679
Total number of records with zero date: 6284


nome-vide          5860
nome-geografico    6151
faculdade          2135
nome-pai           1841
dtype: int64

#### List of records with no date and no "vide": are they part of the cross references?

These are zero date records with no vide information, 
which means that there are no name transformations 
to be used in searching of matching records.
But since they have no dates they might be part of 
the cross-reference set.

In late April 2022 there were around 200 records.


In [19]:

# From the zero date set filter those with no "vide"
zd_no_vide = zero_date[zero_date['nome-vide'].isnull()]
nzd_no_vide = len(set(zd_no_vide.index.values))
print()
print("Number of records with zero date and no 'vide':",nzd_no_vide)




Number of records with zero date and no 'vide': 704


In [22]:
zd_no_vide.sample(10)

Unnamed: 0_level_0,name,uc-entrada.type,uc-entrada,uc-entrada.line,uc-entrada.level,uc-saida,uc-saida.date,uc-saida.obs,nome-vide,nome-geografico,nome-pai,faculdade,faculdade.date,faculdade.obs,data-registo
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
290132,João José de Oliveira Frazão Castelo Branco,uc-entrada,0000-00-00,14202,2,0000-00-00,0000-00-00,,,Guarda,João de Oliveira Frazão Castelo Branco,Leis,0000-00-00,Leis,
178630,Francisco Gomes,uc-entrada,0000-00-00,2279,2,1555-02-25,1555-02-25,,,Évora,,Teologia,0000-00-00,Teologia,
160092,Fernão Fontes,uc-entrada,0000-00-00,7686,2,1557-07-22,1557-07-22,,,Crato,,Cânones,0000-00-00,Faculdade inferida,
128371,João Cabral,uc-entrada,0000-00-00,623,2,0000-00-00,0000-00-00,,,Torres Vedras,António Teixeira,,,,
175791,João Galvão,uc-entrada,0000-00-00,6129,2,1604-10-11,1604-10-11,,,Torres Novas,Francisco Botelho,Teologia,0000-00-00,Teologia,
159662,Matias da Fonseca,uc-entrada,0000-00-00,2608,2,1561-07-31,1561-07-31,,,Coimbra,,Cânones,0000-00-00,Faculdade inferida,
175222,João de Melo,uc-entrada,0000-00-00,12664,2,1540-01-16,1540-01-16,,,Montemor,André Juzarte,Leis,0000-00-00,Leis,
131036,Rui Ferreira,uc-entrada,0000-00-00,9382,2,0000-00-00,0000-00-00,,,Vila Real,,Leis,0000-00-00,Faculdade corrigida,
251029,Álvaro Monteiro,uc-entrada,0000-00-00,5579,2,0000-00-00,0000-00-00,,,Porto,,,,,
179053,Pedro Gomes,uc-entrada,0000-00-00,12847,2,1560-05-14,1560-05-14,,,Évora,,Leis,0000-00-00,Faculdade inferida,


#### Check if the unit dates were left blank by mistake

If a record with no unit dates contains neverthless dated information, 
then it would be possible to register the unit dates from that information,
and the blank unit dates are an error.

First collect all the attributes available for those "zero date no vide" records.

In [23]:
from timelink.pandas import group_attributes

zdnv_group = group_attributes(set(zd_no_vide.index.values),
                              entity_type='person',
                              show_elements=['name'],
                                             db=db)

In [25]:
zdnv_group.info()

<class 'pandas.core.frame.DataFrame'>
Index: 14213 entries, 128371 to 357161
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   name       14213 non-null  object
 1   the_type   14213 non-null  object
 2   the_value  14213 non-null  object
 3   the_date   14213 non-null  object
 4   attr_obs   14213 non-null  object
dtypes: object(5)
memory usage: 666.2+ KB


Next search for attributes with valid dates in that set.

In [None]:
zdnv_with_dates = (zdnv_group['the_date']>'0000-00-00') & (zdnv_group['the_date'] < '1917-12-31')
false_zd = zdnv_group[zdnv_with_dates]
nfalse_zd = len(set(false_zd.index.values))
print("Number of records with dates in attributes but not unit dates:",nfalse_zd)
print("These are not cross-reference records, just records with unfilled unit dates")


Number of records with dates in attributes but not unit dates: 440
These are not cross-reference records, just records with unfilled unit dates
Sample:


Unnamed: 0_level_0,name,the_date,the_type,the_value,attr_obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
130963,Pedro Ferreira,1564-07-31,uc-saida,1564-07-31,
130963,Pedro Ferreira,1564-07-31,uc-saida.ano,1564,
130963,Pedro Ferreira,1560-07-16,grau,Bacharel em Cânones,Exame Bacharel e grau em bacharel em Cânones ...
130963,Pedro Ferreira,1560-07-16,grau.ano,Bacharel em Cânones.1560,Exame Bacharel e grau em bacharel em Cânones ...
131036,Rui Ferreira,1556-06-16,grau,Bacharel em Cânones,Exame para Bacharel em cânones e grau de Bacha...
131036,Rui Ferreira,1556-06-16,grau.ano,Bacharel em Cânones.1556,Exame para Bacharel em cânones e grau de Bacha...
131475,Diogo Fialho,1656-03-30,grau,Licenciado em Artes,Licenciado em Artes 30.03.1656: Atos e Graus L...
131475,Diogo Fialho,1656-03-30,grau.ano,Licenciado em Artes.1656,Licenciado em Artes 30.03.1656: Atos e Graus L...
131544,Manuel Fialho,1547-05-00,grau,Bacharel em Leis,"Bacharel em Leis, em Maio de 1547"
131544,Manuel Fialho,1547-05-00,grau.ano,Bacharel em Leis.1547,"Bacharel em Leis, em Maio de 1547"


In [27]:

print("Sample:")
false_zd.head(10)[['name','the_date','the_type','the_value','attr_obs']]

Sample:


Unnamed: 0_level_0,name,the_date,the_type,the_value,attr_obs
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
130963,Pedro Ferreira,1564-07-31,uc-saida,1564-07-31,
130963,Pedro Ferreira,1564-07-31,uc-saida.ano,1564,
130963,Pedro Ferreira,1560-07-16,grau,Bacharel em Cânones,Exame Bacharel e grau em bacharel em Cânones ...
130963,Pedro Ferreira,1560-07-16,grau.ano,Bacharel em Cânones.1560,Exame Bacharel e grau em bacharel em Cânones ...
131036,Rui Ferreira,1556-06-16,grau,Bacharel em Cânones,Exame para Bacharel em cânones e grau de Bacha...
131036,Rui Ferreira,1556-06-16,grau.ano,Bacharel em Cânones.1556,Exame para Bacharel em cânones e grau de Bacha...
131475,Diogo Fialho,1656-03-30,grau,Licenciado em Artes,Licenciado em Artes 30.03.1656: Atos e Graus L...
131475,Diogo Fialho,1656-03-30,grau.ano,Licenciado em Artes.1656,Licenciado em Artes 30.03.1656: Atos e Graus L...
131544,Manuel Fialho,1547-05-00,grau,Bacharel em Leis,"Bacharel em Leis, em Maio de 1547"
131544,Manuel Fialho,1547-05-00,grau.ano,Bacharel em Leis.1547,"Bacharel em Leis, em Maio de 1547"


We remove those records from the possible cross reference aditions.

In late April 2022 there were 60 of such records. They are not cross references.

Removing those from the zero dated, no "vide" records, around 140 remain.


In [28]:
zd_no_vide_clean = zd_no_vide.drop(false_zd.index)
zd_no_vide_clean['rec_type'] = 'see'
nzd_no_vide_clean = len(zd_no_vide_clean.index.unique())
print()
print("Number of records with zero date and no 'vide' (cleaned):",nzd_no_vide_clean)
match_info.loc['nodate_novide','data'] = nzd_no_vide_clean
match_records['nodate_novide','data'] = zd_no_vide_clean.index.unique()
print("Information contained in these records:")
base_vide_cols=['nome-vide','nome-geografico','faculdade','nome-pai','uc-entrada']
zd_no_vide_clean[base_vide_cols].count(axis=0)


Number of records with zero date and no 'vide' (cleaned): 264
Information contained in these records:


nome-vide            0
nome-geografico    218
faculdade          207
nome-pai            88
uc-entrada         284
dtype: int64

Lets see what they look like

In [29]:
zd_no_vide_clean[['name','nome-vide','nome-pai','nome-geografico','faculdade','faculdade.obs','uc-entrada','rec_type']].head().sort_values('name').fillna(" ")

Unnamed: 0_level_0,name,nome-vide,nome-pai,nome-geografico,faculdade,faculdade.obs,uc-entrada,rec_type
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
140674,Diogo Gomes de Abreu,,João Gomes,Nisa,,,0000-00-00,see
140459,"Manuel Corte-Real Abranches, vide Corte-Real",,,Serpa,Cânones,Cânones,0000-00-00,see
140367,Manuel de Vargas de Aboim,,António de Aboim,Coimbra,Cânones,Cânones,0000-00-00,see
141424,Martim Afonso,,Pedro Mexia,Campo Maior,Leis,Leis,0000-00-00,see
270344,Pedro de Almeida,,,,,,0000-00-00,see


#### Add zero date records with no 'vide' to records to be matched

We join the zero date no 'vide' records to the vide records,.

We assume that zero date records are also "see also" records which were not flagged as 'vide' due to input variations.

But we know this is not always the case, some of the zero date records are normal records where the unit dates were not recorded for some reason.

In [30]:
import pandas as pd

vide_plus = pd.concat([vide,zd_no_vide_clean])
nvide_plus = len(vide_plus.index.unique())
match_info.loc['vide_plus','data'] = nvide_plus
match_records['vide_plus']['data'] = vide_plus.index.unique()
print(f"Number of unique records involved in the cross references: {nvide_plus}")
vide_plus.info()

Number of unique records involved in the cross references: 8890
<class 'pandas.core.frame.DataFrame'>
Index: 9570 entries, 140351 to 218480
Data columns (total 19 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              9570 non-null   object 
 1   nome-vide.type    9286 non-null   object 
 2   nome-vide         9286 non-null   object 
 3   nome-vide.line    9286 non-null   float64
 4   nome-vide.level   9286 non-null   float64
 5   nome-geografico   9004 non-null   object 
 6   faculdade         5001 non-null   object 
 7   faculdade.date    5001 non-null   object 
 8   faculdade.obs     5001 non-null   object 
 9   nome-pai          3573 non-null   object 
 10  uc-entrada        9570 non-null   object 
 11  uc-saida          9570 non-null   object 
 12  uc-saida.date     9570 non-null   object 
 13  uc-saida.obs      9570 non-null   object 
 14  rec_type          9570 non-null   object 
 15  uc-entrada.type   284 n

#### Update information on "see" type referecences

Taking into account the new zero date, no vide records (around 140)

In [31]:
see_vide = vide_plus[vide_plus['uc-entrada'] == '0000-00-00']

nsee_vide = len(see_vide.index.unique())
match_info.loc['see','data'] = nsee_vide
match_records['see']['data']=list(see_vide.index.unique())
print("Number of vide records with zero dates (forward cross references) updated:",nsee_vide)

nsee_vide_geo = len(see_vide[see_vide['nome-geografico'].notnull()].index.unique())
match_info.loc['see_geo','data'] = nsee_vide_geo
print(f"    of which {nsee_vide_geo} with place of birth {nsee_vide_geo/nsee_vide:.2%}")

nsee_vide_pai = len(see_vide[see_vide['nome-pai'].notnull()].index.unique())
match_info.loc['see_pai','data'] = nsee_vide_pai
print(f"    of which {nsee_vide_pai} with father's name  {nsee_vide_pai/nsee_vide:.2%}")

nsee_vide_fac = len(see_vide[see_vide['faculdade'].notnull()].index.unique())
match_info.loc['see_fac','data'] = nsee_vide_fac
print(f"    of which {nsee_vide_fac} with faculty        {nsee_vide_fac/nsee_vide:.2%}")
print()

base_vide_cols=['nome-vide','nome-geografico','faculdade','nome-pai']


Number of vide records with zero dates (forward cross references) updated: 5844
    of which 5369 with place of birth 91.87%
    of which 1596 with father's name  27.31%
    of which 1514 with faculty        25.91%



#### Closer look at "see" references


##### Presence of place of birth in zero date records

Most of them have place of birth information.


In [32]:
see_vide_with_geo = see_vide[see_vide['nome-geografico'].notnull()]
nsee_vide_with_geo = len(set(see_vide_with_geo.index.values))
print("See references with geo info (unique records):",
       nsee_vide_with_geo,
       "out of",nsee_vide,
       f'= {nsee_vide_with_geo/nzero_date:.2%}')
print("Other information (note that some records have more than one geographic name)")
see_vide_with_geo[base_vide_cols].count(axis=0)


See references with geo info (unique records): 5369 out of 5844 = 85.44%
Other information (note that some records have more than one geographic name)


nome-vide          5450
nome-geografico    5668
faculdade          1494
nome-pai           1664
dtype: int64

##### See references with no birth place

Check which information is available when place of birth is missing.

The values are similar to normal "see" records.


In [33]:
see_vide_nogeo = see_vide[see_vide['nome-geografico'].isnull()]
nsee_vide_nogeo = len(set(see_vide_nogeo.index.values))
print("Zero date records without geo info:",
       nsee_vide_nogeo,
       "out of",nzero_date,
       f'= {nsee_vide_nogeo/nzero_date:.2%}')
print()
print("Other information:")
see_vide_nogeo[base_vide_cols].count(axis=0)

Zero date records without geo info: 475 out of 6284 = 7.56%

Other information:


nome-vide          410
nome-geografico      0
faculdade          107
nome-pai            23
dtype: int64

### Final ist of records involved in cross references



In [34]:
import pandas as pd

pd.set_option('display.max_rows',50)
vide_plus.info()
vide_plus.head(10)

<class 'pandas.core.frame.DataFrame'>
Index: 9570 entries, 140351 to 218480
Data columns (total 19 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              9570 non-null   object 
 1   nome-vide.type    9286 non-null   object 
 2   nome-vide         9286 non-null   object 
 3   nome-vide.line    9286 non-null   float64
 4   nome-vide.level   9286 non-null   float64
 5   nome-geografico   9004 non-null   object 
 6   faculdade         5001 non-null   object 
 7   faculdade.date    5001 non-null   object 
 8   faculdade.obs     5001 non-null   object 
 9   nome-pai          3573 non-null   object 
 10  uc-entrada        9570 non-null   object 
 11  uc-saida          9570 non-null   object 
 12  uc-saida.date     9570 non-null   object 
 13  uc-saida.obs      9570 non-null   object 
 14  rec_type          9570 non-null   object 
 15  uc-entrada.type   284 non-null    object 
 16  uc-entrada.line   284 non-null    float6

Unnamed: 0_level_0,name,nome-vide.type,nome-vide,nome-vide.line,nome-vide.level,nome-geografico,faculdade,faculdade.date,faculdade.obs,nome-pai,uc-entrada,uc-saida,uc-saida.date,uc-saida.obs,rec_type,uc-entrada.type,uc-entrada.line,uc-entrada.level,data-registo
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
140351,António Pestana Guerreiro Camacho Aboim,nome-vide,António José Pestana da Silva,524.0,2.0,Casalinho,Cânones,0000-00-00,Faculdade inferida,,0000-00-00,0000-00-00,0000-00-00,,see,,,,
140357,Gonçalo de Aboim,nome-vide,Brito,731.0,2.0,Santarém,,,,,0000-00-00,0000-00-00,0000-00-00,,see,,,,
140377,André de Sequeira Abranches,nome-vide,Sequeira,1380.0,2.0,Várzea de Meruge,,,,,0000-00-00,0000-00-00,0000-00-00,,see,,,,
268681,José Alexandre de Abrantes,nome-vide,Correia,2158.0,2.0,Cabanas,,,,,0000-00-00,0000-00-00,0000-00-00,,see,,,,
140406,Domingos de Abranches,nome-vide,Tavares,2355.0,2.0,Lourosa,Cânones,0000-00-00,Cânones,Feliciano de Abranches,0000-00-00,0000-00-00,0000-00-00,,see,,,,
268701,Albino Abranches Figueiredo Freire Abreu,nome-vide,Albino Abranches Freire de Figueiredo,2501.0,2.0,Pisão de Coja,,,,,0000-00-00,0000-00-00,0000-00-00,,see,,,,
268763,Bernardo José de Abreu,nome-vide,Melo,4027.0,2.0,Viseu,Cânones,0000-00-00,Cânones,Bernardo Pereira de Melo,0000-00-00,0000-00-00,0000-00-00,,see,,,,
268800,Francisco Rafael Bezerra Sanches de Lima Abreu,nome-vide,Francisco Rafael Bezerra Sanches da Horta Abre...,4629.0,2.0,Aguada de Cima,,,,,0000-00-00,0000-00-00,0000-00-00,,see,,,,
268820,Jerónimo do Couto Ribeiro de Abreu,nome-vide,Ribeiro,5074.0,2.0,,,,,,0000-00-00,0000-00-00,0000-00-00,,see,,,,
268857,João de Sá Pinto Abreu,nome-vide,João de Sá Pinto Soutomaior,5480.0,2.0,Viana,,,,Camilo António de Sá,0000-00-00,0000-00-00,0000-00-00,,see,,,,


## Analyse geographic names for variations

Birth place is key to maching, but there are many variations for the same place name
and many spelling varitants.

We make an index of variations in geographic names and 
find name matches within the set of names from similar spelled places.

For similarity we use the so called "gestalt pattern matching" from the Python library see: https://docs.python.org/3/library/difflib.html

The threshold ratio was determined empirically. It is not a problem that some false varitations are detected since 
a further check is done with matching the names.

The following code compares all the the geographic names 
and prints out those considered to be variants of the same 
place, with the similarity ratio. 

Note that this ratio is sensitive to length, 
and fails the threshold in short forms like "Algozo/Algoso" 
or when the two forms are of different lengths like "Poiares/Vila Nova de Poiares".

Still it detects many usefull variants.

In [35]:
import difflib

# we have many variations in geografic names
have_geoname_filter = vide_plus['nome-geografico'].notnull()
geonames_index = sorted(vide_plus[have_geoname_filter]['nome-geografico'].unique())
print("Number of different geo names:",len(geonames_index))
geo_similars = {}
diff_threshold = .85

for geo in [g for g in geonames_index if g is not None]:

    for similar in [s for s in geonames_index if s is not None and s > geo] :
        diff = difflib.SequenceMatcher(None, geo, similar).ratio()
        if  diff >= diff_threshold and geo is not None and similar is not None:
            pass
            print(f"{geo} / {similar} diff:{diff:.3}")
            geo_similars[geo] = geo_similars.get(geo,[]) + [similar]
            geo_similars[similar] = geo_similars.get(similar,[]) + [geo]


Number of different geo names: 1449
Aguas Santas / Águas Santas diff:0.917
Alcacer / Alcácer diff:0.857
Alcaide / Alcaíde diff:0.857
Alenquer / Alquer diff:0.857
Alhos Vedras / Alhos Vedros diff:0.917
Almalaguer / Almalaguez diff:0.9
Ameixilhoeira da Carregação / Mexilhoeira da Carregação diff:0.923
Anadaluzia / Andaluzia diff:0.947
Angra do Heroismo / Angra do Heroísmo diff:0.941
Arco / Arcos diff:0.889
Arcos Valdevez / Arcos de Valdevez diff:0.903
Arcos Valdevez / Arcos de Valedevez diff:0.875
Arcos de Valdevez / Arcos de Valedevez diff:0.971
Arcozelo / Arcozelos diff:0.941
Arrifana de Sousa / Arrifana do Sousa diff:0.941
Atei / Athei diff:0.889
Bairros / Barrosa diff:0.857
Barcelos / Barcos diff:0.857
Barcos / Buarcos diff:0.923
Barreira / Barreiria diff:0.941
Barreira / Barreiro diff:0.875
Barrosa / Barrosas diff:0.933
Bemviver / Benviver diff:0.875
Cabacos / Cabaços diff:0.857
Cabananas / Cabanas diff:0.875
Cabeceira de Basto / Cabeceira de Bastos diff:0.973
Cabeceira de Basto / C

## Matching "vide" references

Try to match "forward" and "backward" references by generating the target names from "vide"
expressions 

### Generation of target names from "vide" expressions

 
There are four types of "vide" expressions:

1. “Cut”: António Veloso Cabral, vide Veloso, result: António Veloso. The “vide” expression is a family name before the last; the target name is computed as the base name up to and including the “vide” expression; the resulting name is a shorter version of the base name, with the last family name(s) removed
2. “Add”: André de Campos, vide Cordeiro, result: André de Campos Cordeiro. The “vide” expression is not present in the base name; the target name is the base name with the “vide” expression added at the end; in some cases, the real target name will have and extra particle before the vide expression, like “de”, “e”, ... etc...). 
3.	“Replace”: Adriano Sisnando Brotero de Avelar Quintino, vide Adriano Sisnando Brotero Quintino de Avelar, result: Adriano Sisnando Brotero Quintino de Avelar. The “vide” expression is a full name, sharing the first name with the base name. This happens when the transformation of family names cannot be expressed by “cut” and “add”, so the author of the card wrote the full target name after “vide” for clarity.
4.	“Partial replace”: Francisco António Campos, vide de Novais Campos, result: Francisco António de Novais Campos. The “vide” expression replaces part of the base name; the “base name” and the “vide” expression overlap at the end; the matched part in the “base name” is replaced by the “vide” expression.



#### Collect first names from database, filter rare ones

We need the first names in the database for the next step.


In [37]:
# collect possible first names from current database
from timelink.pandas import attribute_values
from timelinknb import Session


# collect list of first names, ignore the less frequent ones
#
threshold = 5
pnomes = []
pnomes_table = attribute_values('nome-primeiro', db=db)
for id,linha in pnomes_table.iterrows():
    pnome = id
    count = linha['count']
    if count>threshold:
        pnomes.append(pnome)

print(f"Number of first names with more than {threshold} occurrences {len(pnomes)}")
print("Use this to copy to other places:")
print()
print("[")
for i in range(len(pnomes)):
    print(f"'{pnomes[i]}',", end='')
    if int((i+1)%5) == 0:
        print()
print("]")

Number of first names with more than 5 occurrences 277
Use this to copy to other places:

[
'Manuel','António','José','João','Francisco',
'Luís','Pedro','Joaquim','Domingos','Diogo',
'Bernardo','Miguel','Sebastião','Gaspar','Jerónimo',
'Simão','Bento','André','Inácio','Tomás',
'Paulo','Gonçalo','Alexandre','Lourenço','Agostinho',
'Jorge','Caetano','Vicente','Cristóvão','Álvaro',
'Fernando','Bartolomeu','Henrique','Baltasar','Carlos',
'Jacinto','Rodrigo','Mateus','Filipe','Estevão',
'Fernão','Afonso','Nicolau','Matias','Augusto',
'Belchior','Alberto','Tomé','Brás','Duarte',
'Gregório','Custódio','Felix','Nuno','Bernardino',
'Gabriel','Rui','Salvador','Eduardo','Alfredo',
'Dionísio','Martim','Amaro','Martinho','Marcos',
'Feliciano','Lopo','Silvestre','Roque','Adriano',
'Leonardo','Teotónio','Artur','Júlio','Lucas',
'Teodósio','Constantino','Veríssimo','Cipriano','Rafael',
'Guilherme','Vasco','Aires','Pantaleão','Damião',
'Valentim','Heitor','Ricardo','Isidoro','Ambrósio',
'Clemente','Pas


#### Apply vide expression transformation for records, get the target name

Echo replacements that involve changing the first name, because they are error prone.




In [38]:
from os.path import commonprefix
import re


vide_plus['loookup']=''
vide_plus['vide_type']=''

for id,linha in vide_plus.iterrows():
    nome =  linha['name']
    if not pd.isnull(linha['nome-vide']) :
        nome_vide = linha['nome-vide']
        nomes = nome.split(" ")
        nomes_vide = nome_vide.split(" ")
        # find a common suffix (invert names, use commonprefix, invert result)
        terminacao_comum = commonprefix([nome[::-1],nome_vide[::-1]])[::-1]
        # check it is a separate name and not just common letters at the end
        # a proper family name should share a starting space
        if len(terminacao_comum) > 0:
            if terminacao_comum[0] != ' ':
                terminacao_comum = ''    # not a separate name, abandom
            else:
                terminacao_comum = terminacao_comum.strip()
        # currently using common suffix lowers the matches why?
        # terminacao_comum = ''

        # Type CUT: vide is a inner part of the original name
        # e.g. André Vaz Cabaço, vide Vaz
        # but also Manuel de Almeida Cabral, vide de Almeida
        pos = nome.find(nome_vide)
        if pos > -1:
            lookup_name = nome[0:pos] + nome_vide
            vtype="cut"
        # Type REP: vide name looks like a full name
        # e.g. António de Abreu Bacelar de Azevedo, vide António Abreu Bacelar
        # relaxing the same first name rule, lots  of leaks
        #  This leaks a lot : elif len(nomes_vide)>1 and nomes_vide[0] in pnomes :
        elif nomes[0] == nomes_vide[0]:
            lookup_name = nome_vide
            vtype='rep'
        # Type REPAP: vide overlaps end of name
        # e.g. Joaquim Carvalho, vide Ramos de Carvalho
        # but vide must not contain first name
        # in that case probably a REP
        # otherwise generates leaks and lowers mumbers of matches
        elif terminacao_comum > '':
            if not nomes_vide[0] in pnomes :
                lookup_name = re.sub(f'{terminacao_comum}$',nome_vide,nome)
                vtype='repap'
            else:  # if common termination and first name better replace
                lookup_name = nome_vide
                vtype='rep'
        else:
            # TYPE ADD vide name is not part of original nor a full name
            # so it must be an aditional surname
            # e.g. Fernão Cabral, vide Albuquerque = Fernão Cabral Albuquerque
            lookup_name = nome+" "+nome_vide
            vtype='add'
    else:
        lookup_name = nome
        vtype='novid'


    # we try to recover cases where there was replacement of first name
    # they are missed by the REP amd REPAP rules above and end up
    # producing lookup which are the sobreposition of two names
    # this was added by examining bad "ADD" and "REPAP" results
    # if the result is a long name (>5 names), both name and vide start
    # with first names and vide also long (>4) then probable a replace
    # that changes the first name.
    nomes_lookup = lookup_name.split()
    if vtype != 'rep' \
         and nomes[0] in pnomes and nomes_vide[0] in pnomes\
         and nomes[0] != nomes_vide[0]\
         and len(nomes_vide) > 3\
         and len(nomes_lookup) > 5:
        old_lookup = lookup_name
        lookup_name = nome_vide
        vtype = 'rep+'
        print(id,nome,"vide", nome_vide,"--->",lookup_name,"\n  instead of", old_lookup)

    # print(f'{type} :{id:7}{nome:40}{nome_vide:40} = {lookup_name}')
    vide_plus.loc[id,'lookup'] = lookup_name
    vide_plus.loc[id,'vide_type'] = vtype


205068 Joaquim Rodrigues Couceiro vide José Rodrigues da Assunção ---> José Rodrigues da Assunção 
  instead of Joaquim Rodrigues Couceiro José Rodrigues da Assunção
243778 Francisco Ferraz vide Silvestre de Nossa Senhora ---> Silvestre de Nossa Senhora 
  instead of Francisco Ferraz Silvestre de Nossa Senhora
179121 João Xavier Mousinho da Silveira Gomide vide José Xavier Mousinho Gomide da Silveira ---> José Xavier Mousinho Gomide da Silveira 
  instead of João Xavier Mousinho da Silveira Gomide José Xavier Mousinho Gomide da Silveira
214868 Tomás de Lemos vide Tomé Dias de Lemos ---> Tomé Dias de Lemos 
  instead of Tomás de Lemos Tomé Dias de Lemos
233204 Francisco António Moreira vide António Moreira da Silva ---> António Moreira da Silva 
  instead of Francisco António Moreira António Moreira da Silva
288307 Domingos José Cardoso de Oliveira vide Pedro Paulo Januário Marcondes e Cardoso ---> Pedro Paulo Januário Marcondes e Cardoso 
  instead of Domingos José Cardoso de Oliveira 

#### Collect stats on type of vide transformation applied

In [39]:
vide_types = vide_plus.groupby('vide_type').count()[['name']]
vide_types['perc'] = vide_types['name']/ vide_types['name'].sum()
vide_types

Unnamed: 0_level_0,name,perc
vide_type,Unnamed: 1_level_1,Unnamed: 2_level_1
add,4057,0.423929
cut,4126,0.431139
novid,284,0.029676
rep,1057,0.110449
rep+,20,0.00209
repap,26,0.002717


#### Double check partial replace transformation

* Partial replace:
    * Francisco António Campos, vide de Novais Campos, result: Francisco António de Novais Campos. 
        * the “vide” expression replaces part of the base name; 
        * the “base name” and the “vide” expression overlap at the end; 
        * the matched part in the “base name” is replaced by the “vide” expression.

They are sensistive to misspelling in first names.

In [40]:
# Check if we got many cases of vide with overlap, and if they are handled right
repap = vide_plus[['name','nome-vide','vide_type', 'lookup','nome-geografico','faculdade','uc-entrada']].sort_values(['name','nome-vide','vide_type', 'lookup'])
repap[repap.vide_type == 'repap']

Unnamed: 0_level_0,name,nome-vide,vide_type,lookup,nome-geografico,faculdade,uc-entrada
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
151153,António Barreiros,Rodrigues Barreiros,repap,António Rodrigues Barreiros,Lisboa,Leis,1655-10-16
171604,Belchior de Matos,Rodrigues de Matos,repap,Belchior Rodrigues de Matos,Vila Viçosa,Cânones,1621-10-11
171604,Belchior de Matos,Rodrigues de Matos,repap,Belchior Rodrigues de Matos,Vila Viçosa,Leis,1621-10-11
192525,Diogo Chamorro,Homem Chamorro,repap,Diogo Homem Chamorro,Porto,,0000-00-00
130230,Francisco António Campos,de Novais Campos,repap,Francisco António de Novais Campos,Azeitão,,0000-00-00
135280,Francisco Tavares de Figueiredo,Farncisco Xavier Tavares de Figueiredo,repap,Francisco Farncisco Xavier Tavares de Figueiredo,Meãs,Cânones,1762-10-01
135280,Francisco Tavares de Figueiredo,Farncisco Xavier Tavares de Figueiredo,repap,Francisco Farncisco Xavier Tavares de Figueiredo,vila,Cânones,1762-10-01
209659,Gaspar da Cunha,Macedo da Cunha,repap,Gaspar Macedo da Cunha,Amarante,,0000-00-00
165045,Isidoro da Cunha de Eça,dos Santos de Eça,repap,Isidoro da Cunha dos Santos de Eça,Alvorninha,,0000-00-00
165046,Isidoro dos Santos de Eça,da Cunha de Eça,repap,Isidoro dos Santos da Cunha de Eça,Alvorninha,Cânones,1735-10-01


Current fails:

* 135280	Francisco Tavares de Figueiredo	__vide__ Farncisco Xavier Tavares de Figueiredo	repap	Francisco Farncisco Xavier Tavares de Figueiredo __first name misspelled__	
* 245474    Jerónimo de Magalhães Mexia	jerónimo __vide__ josé de Macêdo Magalhães Mexia	repap	Jerónimo de jerónimo josé de Macêdo Magalhães ...	__first name misspelled__
* 277264	José Luís Alves Feijó __vide__ Angelo do Santíssimo Sacramento Alves Feijó	repap	José Luís Angelo do Santíssimo Sacramento Alve...	__first name misspelled__
* 228003	José da Fonseca Marques da Silva __vide__ da Fonseca da Silva	repap	José da Fonseca Marques da Fonseca da Silva: __bad expression should be a replace__
* 204835    Luís de Figueiredo	__vide__ Figueiredo Lobo ou Lobo de Figueiredo	repap	Luís Figueiredo Lobo ou Lobo de Figueiredo __bad expression not understandable__


#### Remove particles from names

To increase the chance of matches we make a copy of names and target names derived from "vide"
without the particles that are used in Portuguese names (not very uniformely)


In [42]:

def remove_particles(name):
    particles = ("de","da","e","das","dos","do")
    return " ".join([n for n in name.split() if n not in particles])

vide_plus['name_sp'] = vide_plus['name'].apply(lambda name: remove_particles(name))
vide_plus['lookup_sp'] = vide_plus['lookup'].apply(lambda name: remove_particles(name))
vide_plus[vide_plus['name']!=vide_plus['name_sp']][['name','name_sp', 'lookup','lookup_sp']].head(10)

Unnamed: 0_level_0,name,name_sp,lookup,lookup_sp
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
140357,Gonçalo de Aboim,Gonçalo Aboim,Gonçalo de Aboim Brito,Gonçalo Aboim Brito
140377,André de Sequeira Abranches,André Sequeira Abranches,André de Sequeira,André Sequeira
268681,José Alexandre de Abrantes,José Alexandre Abrantes,José Alexandre de Abrantes Correia,José Alexandre Abrantes Correia
140406,Domingos de Abranches,Domingos Abranches,Domingos de Abranches Tavares,Domingos Abranches Tavares
268763,Bernardo José de Abreu,Bernardo José Abreu,Bernardo José de Abreu Melo,Bernardo José Abreu Melo
268800,Francisco Rafael Bezerra Sanches de Lima Abreu,Francisco Rafael Bezerra Sanches Lima Abreu,Francisco Rafael Bezerra Sanches da Horta Abre...,Francisco Rafael Bezerra Sanches Horta Abreu Lima
268820,Jerónimo do Couto Ribeiro de Abreu,Jerónimo Couto Ribeiro Abreu,Jerónimo do Couto Ribeiro,Jerónimo Couto Ribeiro
268857,João de Sá Pinto Abreu,João Sá Pinto Abreu,João de Sá Pinto Soutomaior,João Sá Pinto Soutomaior
268864,Joaquim Benedito de Barbosa Abreu,Joaquim Benedito Barbosa Abreu,Joaquim Benedito de Barbosa,Joaquim Benedito Barbosa
268873,José Alexandre de Abreu,José Alexandre Abreu,José Alexandre de Abreu Coutinho,José Alexandre Abreu Coutinho


#### Save name transformations for reference

Output table with the generation of target names from base names and vide expressions.

File available at /inferences/cross-references/vide_transform.csv

This table allows comparing between sucessive versions for fine tuning.

In [43]:
# save for change tracking
vide_plus[['name','name_sp','nome-vide','vide_type','lookup', 'lookup_sp']].sort_values('name_sp').to_csv('../inferences/cross-references/vide_transform.csv',sep=',')

### Match

#### Sort by geographic name, name and lookup

First attemp is to order the records so that matching cross references end up in consecutive rows.
We sort by place of birth and inside place of birth by name and target vide name.

This is a type of similarity filter, that puts many matches in sucessive rows.


In [44]:

# sort by naturalidade, 'lookup', 'name' (with the  name and lookup ordered alfabetically)
# this should put the vide pair in sucessive rows, but misses some cases due to ordering issues
cols = ['lookup_sp','name_sp']
vide_plus['sort_key'] = vide_plus[cols].apply(lambda row: '-'.join(sorted(row.values.astype(str))), axis=1)
vide_plus.sort_values(['nome-geografico','sort_key'], inplace=True)
vide_plus[['nome-geografico','sort_key', 'name_sp','lookup_sp','nome-vide','vide_type','uc-entrada','nome-pai','faculdade']].head(10)

Unnamed: 0_level_0,nome-geografico,sort_key,name_sp,lookup_sp,nome-vide,vide_type,uc-entrada,nome-pai,faculdade
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
198423,Constância,António Homem Magalhães-António Homem Magalhãe...,António Homem Magalhães Corte Real,António Homem Magalhães,Magalhães,cut,0000-00-00,,Leis
202622,Constância,Fernão Álvares-Fernão Álvares Temudo,Fernão Álvares Temudo,Fernão Álvares,Álvares,cut,0000-00-00,,
144388,Constância,Fernão Álvares-Fernão Álvares Temudo,Fernão Álvares,Fernão Álvares Temudo,Temudo,add,1573-11-13,Pantaleão Rosado,Cânones
129617,Constância,João Claúdio Ferreira Calado-João Claúdio Ferr...,João Claúdio Ferreira Calado,João Claúdio Ferreira Calado Oliveiro,Oliveiro,add,0000-00-00,,
171438,Constância,João Veiga-João Veiga Mendes Nogueira,João Veiga Mendes Nogueira,João Veiga,João da Veiga,cut,0000-00-00,,Leis
213495,Constância,João Veiga-João Veiga Mendes Nogueira,João Veiga,João Veiga Mendes Nogueira,João da Veiga Mendes Nogueira,rep,1757-10-01,,Leis
214577,Constância,Julião Velho-Julião Velho Almeida,Julião Velho,Julião Velho Almeida,Almeida,add,0000-00-00,,
143676,Constância,Julião Velho-Julião Velho Almeida,Julião Velho Almeida,Julião Velho,Velho,cut,1663-07-10,,Cânones
203159,Constância,Manuel Costa-Manuel Costa Oliveira,Manuel Costa,Manuel Costa Oliveira,Oliveira,add,0000-00-00,Manuel da Costa,
176277,Constância,Manuel Costa-Manuel Costa Oliveira,Manuel Costa Oliveira,Manuel Costa,Costa,cut,1672-01-24,Manuel da Costa,Cânones


In [37]:
# we set records with no nome geográfico temporarly to a string
# so that we can have them together for consideration
vide_plus.loc[vide_plus['nome-geografico'].isnull(),'nome-geografico'] = '***NA***'


#### Sequential matching

In late April 2022 around 3600 records found a match by this process.

But some matches are inconsistent (assymetric or ambiguous).


In [46]:
def compare_names(name1, name2):
    return remove_particles(name1) == remove_particles(name2)

previous_lookup = ''
previous_nome = ''
previous_id = ''
previous_date = ''
sequential_matches = []

for id,linha in vide_plus.iterrows():
    nome = linha['name_sp']
    lookup_name = linha['lookup_sp']
    uc_date = linha['uc-entrada']
    rec_type = linha['rec_type']

    if compare_names(previous_lookup,nome)\
         and compare_names(previous_nome,lookup_name)\
         and id != previous_id:
        # we store the direction of the match see-aka or aka-see
        from_type = rec_type
        to_type = previous_rec_type
        to_tuple = (id,previous_id,f'{from_type}-{to_type}')
        from_tuple = (previous_id,id,f'{to_type}-{from_type}')
        if to_tuple in sequential_matches:
            # print("Skipping duplicate match",to_tuple)
            pass
        else:
            sequential_matches.append((id,previous_id,f'{from_type}-{to_type}'))
        if from_tuple in sequential_matches:
            # print("Skipping duplicate match",to_tuple)
            pass
        else:
            sequential_matches.append((previous_id,id,f'{to_type}-{from_type}'))

    previous_id = id
    previous_nome = nome
    previous_lookup = lookup_name
    previous_date = uc_date
    previous_rec_type = rec_type


In [49]:
match_info.info()

<class 'pandas.core.frame.DataFrame'>
Index: 28 entries, aka to vide_plus
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   data        13 non-null     object
 1   sequential  8 non-null      object
 2   random      0 non-null      object
dtypes: object(3)
memory usage: 1.9+ KB


##### Analyse sequential match results

In [51]:
method = 'sequential'
match_records['matched_pairs'][method] = list(set(sequential_matches))
match_info.loc['matched_pairs', method] = len(match_records['matched_pairs'][method])

# pairs
pairs_see_aka = [(o,d,mtype) for (o,d,mtype) in sequential_matches if mtype == 'see-aka']
pairs_aka_see = [(o,d,mtype) for (o,d,mtype) in sequential_matches if mtype == 'aka-see']
pairs_aka_aka = [(o,d,mtype) for (o,d,mtype) in sequential_matches if mtype == 'aka-aka']
pairs_see_see = [(o,d,mtype) for (o,d,mtype) in sequential_matches if mtype == 'see-see']

# records
rec_matched = set([id for (id,d,t) in sequential_matches]+
                  [id for (o,id,t) in sequential_matches])
rec_see_aka = set([id for (id,d,t) in pairs_see_aka])
rec_aka_see = set([id for (o,id,t) in pairs_aka_see])
rec_see_see = set([id for (id,d,t) in pairs_see_see] +
                  [id for (o,id,t) in pairs_see_see])
rec_aka_aka = set([id for (id,d,t) in pairs_aka_aka] +
                  [id for (o,id,t) in pairs_aka_aka])

match_records['records_matched'][method] = list(rec_matched)
match_records['records_see_aka'][method] = list(rec_see_aka)
match_records['records_aka_see'][method] = list(rec_aka_see)
match_records['records_aka_aka'][method] = list(rec_aka_aka)
match_records['records_see_see'][method] = list(rec_see_see)

match_info.loc['records_matched', method] = len(rec_matched)
match_info.loc['records_see_aka', method] = len(rec_see_aka)
match_info.loc['records_aka_see', method] = len(rec_aka_see)
match_info.loc['records_aka_aka', method] = len(rec_aka_aka)
match_info.loc['records_see_see', method] = len(rec_see_see)

# new
match_info.loc['aka_matched', method] = len(rec_aka_see.union(rec_aka_aka))
match_records['aka_matched'][method] = list(rec_aka_see.union(rec_aka_aka))
match_info.loc['see_matched' ,method] = len(rec_see_aka.union(rec_see_see))
match_records['see_matched'][method] = list(rec_see_aka.union(rec_see_see))

##### Check for ambigous matches
Look for records matched with more than one or involved in transitive matching (A->B->C)

Note that sequential methid only matches symmetric links and so no asymmetric cases are generated in this method

In [53]:
import networkx as nx

method = 'sequential'

matched_pairs = match_records['matched_pairs'][method]
records = match_records['records_matched'][method]

matched_multiple = []
matched_single = []

origins = [o for (o,d,t) in matched_pairs]
destinations = [d for (o,d,t) in matched_pairs]
rec_in_matches = origins + destinations
for i in rec_in_matches:
    c = rec_in_matches.count(i)
    if c >2:
        matched_multiple.append(i)
    elif c == 1:
        matched_single.append(i)
matched_multiple = list(set(matched_multiple))
matched_single = list(set(matched_single))


# alternative method, perhaps more informative:
pairs_to_check = match_records['matched_pairs'][method]

asymmetric_pairs = []
for (o,d,t) in pairs_to_check:
    if t == 'see-aka':
        rt = 'aka-see'
    elif t == 'aka-see':
        rt = 'see-aka'
    else:
        rt = t
    if (d,o,rt) not in pairs_to_check:
        asymmetric_pairs.append((o,d,t))
        print("asymmetic match:",(o,d,t))

print("Records with multiple matches         :", len(matched_multiple))
print("Records with just one match           :", len(matched_single))

G = nx.Graph()
simple_pairs = [(o,d) for (o,d,t) in matched_pairs]
G.add_edges_from(simple_pairs)
transitive  = [c for c in nx.connected_components(G) if len(c) > 2]
# number of records in ambiguous matches
amb_records = [item for amb in transitive for item in amb]
namb_records = len(set(amb_records))
print("Records in ambiguous matches          :", namb_records)
for amb in transitive:
    print(amb)
print("Are multiple in ambiguous?            :",set(matched_multiple).issubset(set(amb_records)))

rec_errors = set(amb_records).union(matched_multiple).union(matched_single)
rec_ok = set(records).difference(rec_errors)

match_records['records_error'][method] = list(rec_errors)
match_info.at['records_error', method] = len(rec_errors)
match_records['records_matched_ok'][method] = list(rec_ok)
match_info.at['records_matched_ok', method] = len(rec_ok)
match_records['records_asymmetric'][method] = list(matched_single)
match_info.at['records_asymmetric', method] = len(matched_single)
match_records['records_transitive'][method] = list(amb_records)
match_info.at['records_transitive', method] = namb_records

# new
aka = match_records['aka']['data']
aka_ok = set(aka).intersection(rec_ok)
see = match_records['see']['data']
see_ok = set(see).intersection(rec_ok)

match_info.loc['aka_matched_ok', method] = len(aka_ok)
match_records['aka_matched_ok'][method] = list(aka_ok)
match_info.loc['see_matched_ok', method] = len(see_ok)
match_records['see_matched_ok'][method] = list(see_ok)

pairs_ok = set([(o,d,t) for (o,d,t) in match_records['matched_pairs'][method]
                                                        if o not in rec_errors and d not in rec_errors])
match_records['matched_pairs_ok'][method] = list(pairs_ok)
match_info.loc['matched_pairs_ok', method] = len(pairs_ok)

vide_plus.loc[matched_single,'match_error'] = False
vide_plus.loc[matched_single,'match_obs'] = "W01-Single match (asymmetric) "+method
vide_plus.loc[matched_multiple,'match_error'] = True
vide_plus.loc[matched_multiple,'match_obs'] = "E02-Multiple match "+method
vide_plus.loc[amb_records,'match_error'] = True
vide_plus.loc[amb_records,'match_obs'] = "E01-Ambiguity in match "+method

match_info.fillna('')

Records with multiple matches         : 6
Records with just one match           : 0
Records in ambiguous matches          : 18
{'216361', '251534', '251536'}
{'143426', '175730', '175731'}
{'168662', '211706', '211704'}
{'166756', '164823', '164824'}
{'232079', '210090', '232128'}
{'316381', '201515', '194771'}
Are multiple in ambiguous?            : True


Unnamed: 0,data,sequential,random
aka,3046.0,,
aka_fac,3020.0,,
aka_geo,2958.0,,
aka_matched,,1910.0,
aka_matched_ok,,1894.0,
aka_pai,1617.0,,
matched_pairs,,3644.0,
matched_pairs_ok,,3620.0,
nodate,6284.0,,
nodate_novide,264.0,,


In [54]:
vide_plus.loc['217701']

name                                          José de Santo António
nome-vide.type                                            nome-vide
nome-vide                                                 Lencastre
nome-vide.line                                               5027.0
nome-vide.level                                                 2.0
nome-geografico                                                 NaN
faculdade                                                       NaN
faculdade.date                                                  NaN
faculdade.obs                                                   NaN
nome-pai                                                        NaN
uc-entrada                                               0000-00-00
uc-saida                                                 0000-00-00
uc-saida.date                                            0000-00-00
uc-saida.obs                                                       
rec_type                                        

##### Show some of the ambiguous records

Pandas and timelink in colorful relation.

TODO: this should be saved to a file, one per ambiguous set, fo further analysis.

In [68]:
from timelink.pandas import display_group_attributes

pd.set_option('display.max_rows',250)

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome-apelido','nome-primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','nome-geografico','instituta.ano']

show_only = 8
for ambiguous_records in transitive[:show_only]:
    print("ambiguous_records",ambiguous_records)
    for id in ambiguous_records:
        print(f"https://pesquisa.auc.uc.pt/details?id={id}")

    display_group_attributes(ambiguous_records,
                             entity_type='person',
                             header_elements=['id', 'name'],
                             header_attributes=['nome-vide','naturalidade','uc-entrada','faculdade','nome-pai'],
                             exclude_attributes=no_show,
                             sort_attributes=['the_date','the_type','the_value'],
                             cmap_name='Pastel1',
                             db=db)

ambiguous_records {'216361', '251534', '251536'}
https://pesquisa.auc.uc.pt/details?id=216361
https://pesquisa.auc.uc.pt/details?id=251534
https://pesquisa.auc.uc.pt/details?id=251536


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,216361,Manuel Vidigal,nome-vide,Morais,0000-00-00,10904,2,,Évora,0000-00-00,,0000-00-00,0000-00-00,,Cânones,0000-00-00,Cânones,,,,216361
1,251534,Manuel Vidigal de Morais,nome-vide,Vidigal,0000-00-00,11765,2,,Évora,0000-00-00,,0000-00-00,0000-00-00,,,,,Crispim Luís,0000-00-00,,251534
2,251536,Manuel Vidigal de Morais,nome-vide,Vidigal,1667-10-21,11840,2,,Évora,1667-10-21,,1667-10-21,1667-10-21,,Cânones,1667-10-21,Cânones,,,,251536


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,216361,Manuel Vidigal,faculdade,Cânones,0000-00-00,Cânones,216361
1,216361,Manuel Vidigal,naturalidade,Évora,0000-00-00,,216361
2,251534,Manuel Vidigal de Morais,naturalidade,Évora,0000-00-00,,251534
3,216361,Manuel Vidigal,nome,Manuel Vidigal,0000-00-00,,216361
4,251534,Manuel Vidigal de Morais,nome,Manuel Vidigal,0000-00-00,"Manuel Vidigal de Morais, vide Vidigal",251534
5,216361,Manuel Vidigal,nome,Manuel Vidigal Morais,0000-00-00,"Manuel Vidigal, vide Morais",216361
6,251534,Manuel Vidigal de Morais,nome,Manuel Vidigal de Morais,0000-00-00,,251534
7,251534,Manuel Vidigal de Morais,nome-pai,Crispim Luís,0000-00-00,,251534
8,216361,Manuel Vidigal,nome-vide,Morais,0000-00-00,,216361
9,251534,Manuel Vidigal de Morais,nome-vide,Vidigal,0000-00-00,,251534


ambiguous_records {'143426', '175730', '175731'}
https://pesquisa.auc.uc.pt/details?id=143426
https://pesquisa.auc.uc.pt/details?id=175730
https://pesquisa.auc.uc.pt/details?id=175731


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,id
0,175730,João de Oliveira,nome-vide,Almeida,0000-00-00,5504,2,,Coimbra,0000-00-00,,0000-00-00,0000-00-00,,,,,,175730
1,143426,João de Oliveira de Almeida,nome-vide,Oliveira,1664-10-29,1024,2,,Coimbra,1664-10-29,,1664-10-29,1664-10-29,,Cânones,1664-10-29,Cânones,,143426
2,175731,João de Oliveira,nome-vide,Almeida,1753-10-01,5521,2,,Coimbra,1753-10-01,,1753-10-01,1753-10-01,,Cânones,1753-10-01,Cânones,,175731


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,175730,João de Oliveira,naturalidade,Coimbra,0000-00-00,,175730
1,175730,João de Oliveira,nome,João de Oliveira,0000-00-00,,175730
2,175730,João de Oliveira,nome,João de Oliveira Almeida,0000-00-00,"João de Oliveira, vide Almeida",175730
3,175730,João de Oliveira,nome-vide,Almeida,0000-00-00,,175730
4,175730,João de Oliveira,uc-entrada,0000-00-00,0000-00-00,,175730
5,175730,João de Oliveira,uc-saida,0000-00-00,0000-00-00,,175730
6,143426,João de Oliveira de Almeida,faculdade,Cânones,1664-10-29,Cânones,143426
7,143426,João de Oliveira de Almeida,matricula-faculdade,Cânones,1664-10-29,1664/10/29,143426
8,143426,João de Oliveira de Almeida,naturalidade,Coimbra,1664-10-29,,143426
9,143426,João de Oliveira de Almeida,nome,João de Oliveira,1664-10-29,"João de Oliveira de Almeida, vide Oliveira",143426


ambiguous_records {'168662', '211706', '211704'}
https://pesquisa.auc.uc.pt/details?id=168662
https://pesquisa.auc.uc.pt/details?id=211706
https://pesquisa.auc.uc.pt/details?id=211704


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,211706,António Vaz,nome-vide,Frois,1568-12-04,12503,2,,Coimbra,1568-12-04,,1568-12-04,1568-12-04,,Cânones,1568-12-04,Cânones,,,,211706
1,168662,António Vaz Frois,nome-vide,Vaz,1601-10-10,7302,2,,Coimbra,1601-10-10,,1601-10-10,1601-10-10,,Cânones,1601-10-10,Cânones,António Frenandes,1601-10-10,,168662
2,211704,António Vaz,nome-vide,Frois,1602-03-16,12480,2,,Coimbra,1602-03-16,,1602-03-16,1602-03-16,,Artes,1602-03-16,Faculdade inferida,,,,211704


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,211706,António Vaz,faculdade,Cânones,1568-12-04,Cânones,211706
1,211706,António Vaz,naturalidade,Coimbra,1568-12-04,,211706
2,211706,António Vaz,nome,António Vaz,1568-12-04,,211706
3,211706,António Vaz,nome,António Vaz Frois,1568-12-04,"António Vaz, vide Frois",211706
4,211706,António Vaz,nome-vide,Frois,1568-12-04,,211706
5,211706,António Vaz,uc-entrada,1568-12-04,1568-12-04,,211706
6,211706,António Vaz,uc-entrada.ano,1568,1568-12-04,,211706
7,211706,António Vaz,instituta,1568-12-04:1569-06-30,1568-12-04:1569-06-30,curso: Instituta e Cânones: 04.12.1568 até 30.06.1569,211706
8,211706,António Vaz,exame,Exame e Grau de Bacharel em Cânones,1571-07-23,,211706
9,211706,António Vaz,grau,Bacharel em Cânones,1571-07-23,Exame e Grau de Bacharel em Cânones: 23.07.1571,211706


ambiguous_records {'166756', '164823', '164824'}
https://pesquisa.auc.uc.pt/details?id=166756
https://pesquisa.auc.uc.pt/details?id=164823
https://pesquisa.auc.uc.pt/details?id=164824


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,164823,João Freire,nome-vide,de Andrade,0000-00-00,56,2,,Lamego,0000-00-00,,0000-00-00,0000-00-00,,,,,,,,164823
1,166756,João Freire de Andrade,nome-vide,Freire,1676-10-01,9902,2,,Lamego,1676-10-01,,1676-10-01,1676-10-01,,Cânones,1676-10-01,Cânones,,,,166756
2,164824,João Freire,nome-vide,de Andrade,1677-10-06,73,2,,Lamego,1677-10-06,,1677-10-06,1677-10-06,,Cânones,1677-10-06,Cânones,Francisco Freire,1677-10-06,,164824


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,164823,João Freire,naturalidade,Lamego,0000-00-00,,164823
1,164823,João Freire,nome,João Freire,0000-00-00,,164823
2,164823,João Freire,nome,João Freire de Andrade,0000-00-00,"João Freire, vide de Andrade",164823
3,164823,João Freire,nome-vide,de Andrade,0000-00-00,,164823
4,164823,João Freire,uc-entrada,0000-00-00,0000-00-00,,164823
5,164823,João Freire,uc-saida,0000-00-00,0000-00-00,,164823
6,166756,João Freire de Andrade,faculdade,Cânones,1676-10-01,Cânones,166756
7,166756,João Freire de Andrade,instituta,1676-10-01,1676-10-01,01.10.1676 1676-10-01,166756
8,166756,João Freire de Andrade,naturalidade,Lamego,1676-10-01,,166756
9,166756,João Freire de Andrade,nome,João Freire,1676-10-01,"João Freire de Andrade, vide Freire",166756


ambiguous_records {'232079', '210090', '232128'}
https://pesquisa.auc.uc.pt/details?id=232079
https://pesquisa.auc.uc.pt/details?id=210090
https://pesquisa.auc.uc.pt/details?id=232128


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,232128,Francisco Soares,nome-vide,Pacheco,0000-00-00,12863,2,,Lisboa,0000-00-00,,0000-00-00,0000-00-00,,Medicina,0000-00-00,Medicina,Manuel Soares,0000-00-00,,232128
1,210090,Francisco Soares Pacheco,nome-vide,Soares,1624-10-05,3794,2,,Lisboa,1624-10-05,,1624-10-05,1624-10-05,,Medicina,1624-10-05,Medicina,Manuel Soares,1624-10-05,,210090
2,232079,Francisco Soares,nome-vide,Pacheco,1625-05-17,12031,2,,Lisboa,1625-05-17,,1625-05-17,1625-05-17,,Artes,1625-05-17,Faculdade inferida,,,,232079


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,232128,Francisco Soares,faculdade,Medicina,0000-00-00,Medicina,232128
1,232128,Francisco Soares,naturalidade,Lisboa,0000-00-00,,232128
2,232128,Francisco Soares,nome,Francisco Soares,0000-00-00,,232128
3,232128,Francisco Soares,nome,Francisco Soares Pacheco,0000-00-00,"Francisco Soares, vide Pacheco",232128
4,232128,Francisco Soares,nome-pai,Manuel Soares,0000-00-00,,232128
5,232128,Francisco Soares,nome-vide,Pacheco,0000-00-00,,232128
6,232128,Francisco Soares,uc-entrada,0000-00-00,0000-00-00,,232128
7,232128,Francisco Soares,uc-saida,0000-00-00,0000-00-00,,232128
8,210090,Francisco Soares Pacheco,faculdade,Medicina,1624-10-05,Medicina,210090
9,210090,Francisco Soares Pacheco,matricula-faculdade,Medicina,1624-10-05,05.10.1624,210090


ambiguous_records {'316381', '201515', '194771'}
https://pesquisa.auc.uc.pt/details?id=316381
https://pesquisa.auc.uc.pt/details?id=201515
https://pesquisa.auc.uc.pt/details?id=194771


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,201515,Manuel José de Magalhães Teixeira,nome-vide,Manuel José,0000-00-00,10235,2,,Braga,0000-00-00,,0000-00-00,0000-00-00,,,,,,,,201515
1,194771,Manuel José,nome-vide,Magalhães Teixeira,1735-01-30,2724,2,,Braga,1735-01-30,,1735-01-30,1735-01-30,,Cânones,1735-01-30,Cânones,,,,194771
2,316381,Manuel José de Magalhães Teixeira,nome-vide,Manuel José,1741-06-27,10402,2,,Braga,1741-06-27,,1741-06-27,1741-06-27,,Cânones,1741-06-27,Cânones,Domingos Teixeira de Magalhães,1741-06-27,,316381


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,201515,Manuel José de Magalhães Teixeira,naturalidade,Braga,0000-00-00,,201515
1,201515,Manuel José de Magalhães Teixeira,nome,Manuel José,0000-00-00,"Manuel José de Magalhães Teixeira, vide Manuel José",201515
2,201515,Manuel José de Magalhães Teixeira,nome,Manuel José de Magalhães Teixeira,0000-00-00,,201515
3,201515,Manuel José de Magalhães Teixeira,nome-vide,Manuel José,0000-00-00,,201515
4,201515,Manuel José de Magalhães Teixeira,uc-entrada,0000-00-00,0000-00-00,,201515
5,201515,Manuel José de Magalhães Teixeira,uc-saida,0000-00-00,0000-00-00,,201515
6,194771,Manuel José,faculdade,Cânones,1735-01-30,Cânones,194771
7,194771,Manuel José,instituta,1735-01-30,1735-01-30,30.01.1735 1735-01-30,194771
8,194771,Manuel José,naturalidade,Braga,1735-01-30,,194771
9,194771,Manuel José,nome,Manuel José,1735-01-30,,194771


##### aka-aka matches in sequential mode

In sequential mode we do not filter by record type so some aka-aka and see-see matches occur.
matches 

In [69]:
from timelink.pandas import display_group_attributes

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome-apelido','nome-primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','nome-geografico','instituta.ano']

pairs = match_records['matched_pairs']['sequential']
show_pairs = [(o,d,t) for o,d,t in pairs if t == 'aka-aka' and o<d]
show_only = 10
print(f"aka-aka matches in sequential mode (show only {show_only}) of {len(show_pairs)}:")
for o,d,t in show_pairs[:show_only]:
    print(f"https://pesquisa.auc.uc.pt/details?id={o}")
    print(f"https://pesquisa.auc.uc.pt/details?id={d}")
    display_group_attributes([o,d], entity_type='person',
                             header_elements=['id', 'name'],
                             header_attributes=['nome-vide','naturalidade','uc-entrada','faculdade','nome-pai'],
                             exclude_attributes=no_show,
                             sort_attributes=['the_date','the_type','the_value'],
                             cmap_name='Pastel1',
                             db=db)

aka-aka matches in sequential mode (show only 10) of 107:
https://pesquisa.auc.uc.pt/details?id=245698
https://pesquisa.auc.uc.pt/details?id=252234


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,252234,Gonçalo Fernandes,nome-vide,Ferreira,1642-10-15,12403,2,,Campo Maior,1642-10-15,,1642-10-15,1642-10-15,,Medicina,1642-10-15,Medicina,Manuel Ferreira,1642-10-15,,252234
1,245698,Gonçalo Fernandes Ferreira,nome-vide,Fernandes,1644-03-10,13842,2,,Campo Maior,1644-03-10,,1644-03-10,1644-03-10,,Medicina,1644-03-10,Medicina,Manuel Ferreira,1644-03-10,,245698


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,252234,Gonçalo Fernandes,faculdade,Medicina,1642-10-15,Medicina,252234
1,252234,Gonçalo Fernandes,matricula-faculdade,Medicina,1642-10-15,15.10.1642,252234
2,252234,Gonçalo Fernandes,naturalidade,Campo Maior,1642-10-15,,252234
3,252234,Gonçalo Fernandes,nome,Gonçalo Fernandes,1642-10-15,,252234
4,252234,Gonçalo Fernandes,nome,Gonçalo Fernandes Ferreira,1642-10-15,"Gonçalo Fernandes, vide Ferreira",252234
5,252234,Gonçalo Fernandes,nome-pai,Manuel Ferreira,1642-10-15,,252234
6,252234,Gonçalo Fernandes,nome-vide,Ferreira,1642-10-15,,252234
7,252234,Gonçalo Fernandes,uc-entrada,1642-10-15,1642-10-15,,252234
8,252234,Gonçalo Fernandes,uc-entrada.ano,1642,1642-10-15,,252234
9,252234,Gonçalo Fernandes,matricula-faculdade,Medicina,1643-10-15,15.10.1643,252234


https://pesquisa.auc.uc.pt/details?id=213910
https://pesquisa.auc.uc.pt/details?id=253753


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,id
0,213910,Rui Lopes da Veiga,nome-vide,Lopes,1558-02-03,1588,2,,Coimbra,1558-02-03,,1558-02-03,1558-02-03,,Artes,1558-02-03,Faculdade inferida,,213910
1,253753,Rui Lopes,nome-vide,da Veiga,1568-12-23,2329,2,,Coimbra,1568-12-23,,1568-12-23,1568-12-23,,Leis,1568-12-23,Leis,,253753


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,213910,Rui Lopes da Veiga,faculdade,Artes,1558-02-03,Faculdade inferida,213910
1,213910,Rui Lopes da Veiga,grau,Bacharel em Artes,1558-02-03,ter o tempo que se requer para Bacharel em Artes: 03.02.1558,213910
2,213910,Rui Lopes da Veiga,naturalidade,Coimbra,1558-02-03,,213910
3,213910,Rui Lopes da Veiga,nome,Rui Lopes,1558-02-03,"Rui Lopes da Veiga, vide Lopes",213910
4,213910,Rui Lopes da Veiga,nome,Rui Lopes da Veiga,1558-02-03,,213910
5,213910,Rui Lopes da Veiga,nome-vide,Lopes,1558-02-03,,213910
6,213910,Rui Lopes da Veiga,uc-entrada,1558-02-03,1558-02-03,,213910
7,213910,Rui Lopes da Veiga,uc-entrada.ano,1558,1558-02-03,,213910
8,213910,Rui Lopes da Veiga,grau,Licenciado em Artes,1560-05-23,23.05.1560,213910
9,213910,Rui Lopes da Veiga,uc-saida,1560-05-23,1560-05-23,,213910


https://pesquisa.auc.uc.pt/details?id=150411
https://pesquisa.auc.uc.pt/details?id=252562


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,150411,Manuel Lopes Batista,nome-vide,Lopes,1702-03-30,3492,2,,Coimbra,1702-03-30,,1702-03-30,1702-03-30,,Medicina,1702-03-30,Medicina,Matias Lopes,1702-03-30,,150411
1,252562,Manuel Lopes,nome-vide,Batista,1702-10-01,11060,2,,Coimbra,1702-10-01,,1702-10-01,1702-10-01,,Medicina,1702-10-01,Medicina,Matias Lopes,1702-10-01,,252562


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,150411,Manuel Lopes Batista,faculdade,Medicina,1702-03-30,Medicina,150411
1,150411,Manuel Lopes Batista,grau,Bacharel em Artes,1702-03-30,1702/03/30,150411
2,150411,Manuel Lopes Batista,naturalidade,Coimbra,1702-03-30,,150411
3,150411,Manuel Lopes Batista,nome,Manuel Lopes,1702-03-30,"Manuel Lopes Batista, vide Lopes",150411
4,150411,Manuel Lopes Batista,nome,Manuel Lopes Batista,1702-03-30,,150411
5,150411,Manuel Lopes Batista,nome-pai,Matias Lopes,1702-03-30,,150411
6,150411,Manuel Lopes Batista,nome-vide,Lopes,1702-03-30,,150411
7,150411,Manuel Lopes Batista,uc-entrada,1702-03-30,1702-03-30,,150411
8,150411,Manuel Lopes Batista,uc-entrada.ano,1702,1702-03-30,,150411
9,150411,Manuel Lopes Batista,uc-saida,1702-03-30,1702-03-30,,150411


https://pesquisa.auc.uc.pt/details?id=149046
https://pesquisa.auc.uc.pt/details?id=216902


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,149046,Bernardo José de Azevedo,nome-vide,Vieira,1725-10-01,3497,2,,Paredes,1725-10-01,,1725-10-01,1725-10-01,,Cânones,1725-10-01,Cânones,José de Azevedo Vieira,1725-10-01,,149046
1,216902,Bernardo José de Azevedo Vieira,nome-vide,Azevedo,1726-10-01,1101,2,,Paredes,1726-10-01,,1726-10-01,1726-10-01,,Cânones,1726-10-01,Cânones,,,,216902


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,149046,Bernardo José de Azevedo,faculdade,Cânones,1725-10-01,Cânones,149046
1,149046,Bernardo José de Azevedo,instituta,1725-10-01,1725-10-01,01.10.1725 1725-10-01,149046
2,149046,Bernardo José de Azevedo,naturalidade,Paredes,1725-10-01,,149046
3,149046,Bernardo José de Azevedo,nome,Bernardo José de Azevedo,1725-10-01,,149046
4,149046,Bernardo José de Azevedo,nome,Bernardo José de Azevedo Vieira,1725-10-01,"Bernardo José de Azevedo, vide Vieira",149046
5,149046,Bernardo José de Azevedo,nome-nota,padre,1725-10-01,,149046
6,149046,Bernardo José de Azevedo,nome-pai,José de Azevedo Vieira,1725-10-01,,149046
7,149046,Bernardo José de Azevedo,nome-vide,Vieira,1725-10-01,,149046
8,149046,Bernardo José de Azevedo,padre,sim,1725-10-01,padre,149046
9,149046,Bernardo José de Azevedo,uc-entrada,1725-10-01,1725-10-01,,149046


https://pesquisa.auc.uc.pt/details?id=217492
https://pesquisa.auc.uc.pt/details?id=247266


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,247266,Jerónimo Lopes,nome-vide,Vieira,1622-04-26,4847,2,,Santarém,1622-04-26,,1622-04-26,1622-04-26,,Medicina,1622-04-26,Medicina,Francisco Lopes,1622-04-26,,247266
1,217492,Jerónimo Lopes Vieira,nome-vide,Lopes,1623-05-04,4693,2,,Santarém,1623-05-04,,1623-05-04,1623-05-04,,Artes,1623-05-04,Faculdade inferida,Francisco Lopes,1623-05-04,,217492


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,247266,Jerónimo Lopes,faculdade,Medicina,1622-04-26,Medicina,247266
1,247266,Jerónimo Lopes,matricula-faculdade,Medicina,1622-04-26,26.04.1622,247266
2,247266,Jerónimo Lopes,naturalidade,Santarém,1622-04-26,,247266
3,247266,Jerónimo Lopes,nome,Jerónimo Lopes,1622-04-26,,247266
4,247266,Jerónimo Lopes,nome,Jerónimo Lopes Vieira,1622-04-26,"Jerónimo Lopes, vide Vieira",247266
5,247266,Jerónimo Lopes,nome-pai,Francisco Lopes,1622-04-26,,247266
6,247266,Jerónimo Lopes,nome-vide,Vieira,1622-04-26,,247266
7,247266,Jerónimo Lopes,uc-entrada,1622-04-26,1622-04-26,,247266
8,247266,Jerónimo Lopes,uc-entrada.ano,1622,1622-04-26,,247266
9,217492,Jerónimo Lopes Vieira,faculdade,Artes,1623-05-04,Faculdade inferida,217492


https://pesquisa.auc.uc.pt/details?id=168704
https://pesquisa.auc.uc.pt/details?id=205306


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,id
0,205306,António Mendes,nome-vide,Neto,1540-04-11,1068,2,,Santarém,1540-04-11,,1540-04-11,1540-04-11,,Cânones,1540-04-11,Cânones,,205306
1,168704,António Mendes Neto,nome-vide,Mendes,1549-04-11,7628,2,,Santarém,1549-04-11,,1549-04-11,1549-04-11,,Cânones,1549-04-11,Faculdade corrigida,,168704


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,205306,António Mendes,faculdade,Cânones,1540-04-11,Cânones,205306
1,205306,António Mendes,naturalidade,Santarém,1540-04-11,,205306
2,205306,António Mendes,nome,António Mendes,1540-04-11,,205306
3,205306,António Mendes,nome,António Mendes Neto,1540-04-11,"António Mendes, vide Neto",205306
4,205306,António Mendes,nome-vide,Neto,1540-04-11,,205306
5,205306,António Mendes,uc-entrada,1540-04-11,1540-04-11,,205306
6,205306,António Mendes,uc-entrada.ano,1540,1540-04-11,,205306
7,168704,António Mendes Neto,faculdade,Cânones,1549-04-11,Faculdade corrigida,168704
8,168704,António Mendes Neto,faculdade,Leis,1549-04-11,Faculdade corrigida,168704
9,168704,António Mendes Neto,faculdade-original,Cânones,1549-04-11,,168704


https://pesquisa.auc.uc.pt/details?id=144662
https://pesquisa.auc.uc.pt/details?id=204182


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,144662,Pedro Álvares,nome-vide,Costa,1613-11-09,7169,2,,Chaves,1613-11-09,,1613-11-09,1613-11-09,,Cânones,1613-11-09,Cânones,Baltasar da Costa,1613-11-09,,144662
1,204182,Pedro Álvares da Costa,nome-vide,Álvares,1619-11-17,14009,2,,Chaves,1619-11-17,,1619-11-17,1619-11-17,,Cânones,1619-11-17,Cânones,Baltasar da Costa,1619-11-17,,204182


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,144662,Pedro Álvares,faculdade,Cânones,1613-11-09,Cânones,144662
1,144662,Pedro Álvares,instituta,1613-11-09,1613-11-09,1613/11/09 1613-11-09,144662
2,144662,Pedro Álvares,naturalidade,Chaves,1613-11-09,,144662
3,144662,Pedro Álvares,nome,Pedro Álvares,1613-11-09,,144662
4,144662,Pedro Álvares,nome,Pedro Álvares Costa,1613-11-09,"Pedro Álvares, vide Costa",144662
5,144662,Pedro Álvares,nome-pai,Baltasar da Costa,1613-11-09,,144662
6,144662,Pedro Álvares,nome-vide,Costa,1613-11-09,,144662
7,144662,Pedro Álvares,uc-entrada,1613-11-09,1613-11-09,,144662
8,144662,Pedro Álvares,uc-entrada.ano,1613,1613-11-09,,144662
9,144662,Pedro Álvares,matricula-faculdade,Cânones,1615-10-16,1615/10/16,144662


https://pesquisa.auc.uc.pt/details?id=133131
https://pesquisa.auc.uc.pt/details?id=180160


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,180160,Manuel de Gouveia,nome-vide,Quintela,1656-10-01,12267,2,,Lisboa,1656-10-01,,1656-10-01,1656-10-01,,Cânones,1656-10-01,Cânones,,,,180160
1,133131,Manuel de Gouveia Quintela,nome-vide,Gouveia,1656-10-07,1186,2,,Lisboa,1656-10-07,,1656-10-07,1656-10-07,,Cânones,1656-10-07,Cânones,João de Gouveia,1656-10-07,,133131


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,180160,Manuel de Gouveia,faculdade,Cânones,1656-10-01,Cânones,180160
1,180160,Manuel de Gouveia,naturalidade,Lisboa,1656-10-01,,180160
2,180160,Manuel de Gouveia,nome,Manuel de Gouveia,1656-10-01,,180160
3,180160,Manuel de Gouveia,nome,Manuel de Gouveia Quintela,1656-10-01,"Manuel de Gouveia, vide Quintela",180160
4,180160,Manuel de Gouveia,nome-vide,Quintela,1656-10-01,,180160
5,180160,Manuel de Gouveia,uc-entrada,1656-10-01,1656-10-01,,180160
6,180160,Manuel de Gouveia,uc-entrada.ano,1656,1656-10-01,,180160
7,133131,Manuel de Gouveia Quintela,faculdade,Cânones,1656-10-07,Cânones,133131
8,133131,Manuel de Gouveia Quintela,instituta,1656-10-07,1656-10-07,07.10.1656 1656-10-07,133131
9,180160,Manuel de Gouveia,instituta,1656-10-07,1656-10-07,07.10.1656 1656-10-07,180160


https://pesquisa.auc.uc.pt/details?id=142388
https://pesquisa.auc.uc.pt/details?id=171377


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,142388,André de Matos de Almada,nome-vide,Matos,1576-10-01,12164,2,,Lisboa,1576-10-01,,1576-10-01,1576-10-01,,Leis,1576-10-01,Leis,Jerónimo de Matos,1576-10-01,,142388
1,171377,André de Matos,nome-vide,Almada,1639-11-07,3928,2,,Lisboa,1639-11-07,,1639-11-07,1639-11-07,,Cânones,1639-11-07,Cânones,Fernão de Matos,1639-11-07,,171377


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,142388,André de Matos de Almada,faculdade,Leis,1576-10-01,Leis,142388
1,142388,André de Matos de Almada,naturalidade,Lisboa,1576-10-01,,142388
2,142388,André de Matos de Almada,nome,André de Matos,1576-10-01,"André de Matos de Almada, vide Matos",142388
3,142388,André de Matos de Almada,nome,André de Matos de Almada,1576-10-01,,142388
4,142388,André de Matos de Almada,nome-pai,Jerónimo de Matos,1576-10-01,,142388
5,142388,André de Matos de Almada,nome-vide,Matos,1576-10-01,,142388
6,142388,André de Matos de Almada,uc-entrada,1576-10-01,1576-10-01,,142388
7,142388,André de Matos de Almada,uc-entrada.ano,1576,1576-10-01,,142388
8,142388,André de Matos de Almada,instituta,1576-10-01:1577-04-30,1576-10-01:1577-04-30,curso: Instituta e Leis: 01.10.1576 até 30.04.1577,142388
9,142388,André de Matos de Almada,matricula-faculdade,Leis,1578-10-02,1578/10/02,142388


https://pesquisa.auc.uc.pt/details?id=149277
https://pesquisa.auc.uc.pt/details?id=205921


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,149277,Jerónimo de Azevedo,nome-vide,Coutinho,1630-02-07,10593,2,,Lisboa,1630-02-07,,1630-02-07,1630-02-07,,Cânones,1630-02-07,Cânones,Álvaro de Azevedo,1630-02-07,,149277
1,205921,Jerónimo de Azevedo Coutinho,nome-vide,Azevedo,1641-10-15,2558,2,,Lisboa,1641-10-15,,1641-10-15,1641-10-15,,Cânones,1641-10-15,Cânones,Álvaro de Azevedo,1641-10-15,,205921


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,149277,Jerónimo de Azevedo,faculdade,Cânones,1630-02-07,Cânones,149277
1,149277,Jerónimo de Azevedo,instituta,1630-02-07,1630-02-07,07.02.1630 1630-02-07,149277
2,149277,Jerónimo de Azevedo,naturalidade,Lisboa,1630-02-07,,149277
3,149277,Jerónimo de Azevedo,nome,Jerónimo de Azevedo,1630-02-07,,149277
4,149277,Jerónimo de Azevedo,nome,Jerónimo de Azevedo Coutinho,1630-02-07,"Jerónimo de Azevedo, vide Coutinho",149277
5,149277,Jerónimo de Azevedo,nome-pai,Álvaro de Azevedo,1630-02-07,,149277
6,149277,Jerónimo de Azevedo,nome-vide,Coutinho,1630-02-07,,149277
7,149277,Jerónimo de Azevedo,uc-entrada,1630-02-07,1630-02-07,,149277
8,149277,Jerónimo de Azevedo,uc-entrada.ano,1630,1630-02-07,,149277
9,149277,Jerónimo de Azevedo,instituta,1631-10-01,1631-10-01,01.10.1631 1631-10-01,149277


In [70]:
vide_plus.loc['217701']

name                                          José de Santo António
nome-vide.type                                            nome-vide
nome-vide                                                 Lencastre
nome-vide.line                                               5027.0
nome-vide.level                                                 2.0
nome-geografico                                                 NaN
faculdade                                                       NaN
faculdade.date                                                  NaN
faculdade.obs                                                   NaN
nome-pai                                                        NaN
uc-entrada                                               0000-00-00
uc-saida                                                 0000-00-00
uc-saida.date                                            0000-00-00
uc-saida.obs                                                       
rec_type                                        

#### Non sequential matching

In [71]:
import pandas as pd
import numpy as np

vide_plus['match_error']=False

previous_geo_name = ''

# list of record to need to be debugged
# use a break point in "pass" statement of the "if"
#  at the start of the loop
# problematic = ['169888','169890','214417']  # add to list what you what to debug
#
# 211703 See matches 168662 which in turn matches 211704 e 211706
problematic = ['168662','211703','211704','211706']


# Loop through the vide records
random_matches = []
for id,linha in vide_plus.iterrows():

    if id in problematic:
        pass  # do breakpoint here to debug problematic records

    nome = linha['name_sp']
    lookup_name = linha['lookup_sp']
    nome_geo = linha['nome-geografico']
    rec_type = linha['rec_type']

    # we now check for similar geo names
    # and load names from variants
    if nome_geo != previous_geo_name:
        simile = geo_similars.get(nome_geo,[])
        if len(simile) > 0 :
            # we have similar geo names
            local_records = vide_plus[vide_plus['nome-geografico'].isin([nome_geo] + simile)]
            pass
        else:   # if no variants just load names with this place of birth
            local_records = vide_plus[vide_plus['nome-geografico'] == nome_geo]

        previous_geo_name = nome_geo

    # search for records with name matching the vide expression of this record
    candidates = []

    found_lookup_name = local_records['name_sp']==lookup_name

    for cid,same_name in local_records[found_lookup_name].iterrows():
        if same_name['lookup_sp'] == nome and cid != id:
            candidates.append(cid)

    # if nothing found search for records with vide expression equal to this one
    if len(candidates) == 0:
        found_lookup_name = local_records['lookup_sp']==lookup_name

        for cid,same_name in local_records[found_lookup_name].iterrows():
            if same_name['lookup_sp'] == nome and cid != id:
                candidates.append(cid)

    if len(candidates) > 0:  # we found some candidates
        for cand in candidates:
            mrec_type = vide_plus.loc[[cand]].iloc[0]['rec_type']
            mtype = f'{rec_type}-{mrec_type}'
            match = (id,cand,mtype)
            if match not in random_matches:
                random_matches.append(match)




##### Analyse results (random)

In [73]:
method = 'random'
match_records['matched_pairs'][method] = list(set(random_matches))
match_info.loc['matched_pairs', method] = len(match_records['matched_pairs'][method])

# pairs
pairs_see_aka = [(o,d,mtype) for (o,d,mtype) in random_matches if mtype == 'see-aka']
pairs_aka_see = [(o,d,mtype) for (o,d,mtype) in random_matches if mtype == 'aka-see']
pairs_aka_aka = [(o,d,mtype) for (o,d,mtype) in random_matches if mtype == 'aka-aka']
pairs_see_see = [(o,d,mtype) for (o,d,mtype) in random_matches if mtype == 'see-see']


# records
rec_matched = set([id for (id,d,t) in random_matches]+
                  [id for (o,id,t) in random_matches])
rec_see_aka = set([id for (id,d,t) in pairs_see_aka])
rec_aka_see = set([id for (o,id,t) in pairs_aka_see])
rec_see_see = set([id for (id,d,t) in pairs_see_see] +
                  [id for (o,id,t) in pairs_see_see])
rec_aka_aka = set([id for (id,d,t) in pairs_aka_aka] +
                  [id for (o,id,t) in pairs_aka_aka])

match_records['records_matched'][method] = list(rec_matched)
match_records['records_see_aka'][method] = list(rec_see_aka)
match_records['records_aka_see'][method] = list(rec_aka_see)
match_records['records_aka_aka'][method] = list(rec_aka_aka)
match_records['records_see_see'][method] = list(rec_see_see)

match_info.at['records_matched', method] = len(rec_matched)
match_info.at['records_see_aka', method] = len(rec_see_aka)
match_info.at['records_aka_see', method] = len(rec_aka_see)
match_info.at['records_aka_aka', method] = len(rec_aka_aka)
match_info.at['records_see_see', method] = len(rec_see_see)

# new
match_info.loc['aka_matched', method] = len(rec_aka_see.union(rec_aka_aka))
match_records['aka_matched'][method] = list(rec_aka_see.union(rec_aka_aka))
match_info.loc['see_matched', method] = len(rec_see_aka.union(rec_see_see))
match_records['see_matched'][method] = list(rec_see_aka.union(rec_see_see))

matched_rand_df = pd.DataFrame(random_matches, columns=['from','to','type'])
matched_rand_df.groupby('type').count()

Unnamed: 0_level_0,from,to
type,Unnamed: 1_level_1,Unnamed: 2_level_1
aka-aka,214,214
aka-see,1730,1730
see-aka,1755,1755
see-see,40,40


#####  Check the matches for errors (random)

In [75]:
import networkx as nx

method = 'random'

matched_pairs = match_records['matched_pairs'][method]
records = match_records['records_matched'][method]

matched_multiple = []
matched_single = []

origins = [o for (o,d,t) in matched_pairs]
destinations = [d for (o,d,t) in matched_pairs]
rec_in_matches = origins + destinations
for i in rec_in_matches:
    c = rec_in_matches.count(i)
    if c >2:
        matched_multiple.append(i)
    elif c == 1:
        matched_single.append(i)
matched_multiple = list(set(matched_multiple))
matched_single = list(set(matched_single))

print("Records with more than two matches    :", len(matched_multiple))
print("Records asymmetric (one match only)   :", len(matched_single))

G = nx.Graph()
simple_pairs = [(o,d) for (o,d,t) in matched_pairs]
G.add_edges_from(simple_pairs)
transitive  = [c for c in nx.connected_components(G) if len(c) > 2]
# number of records in ambiguous matches
amb_records = [item for amb in transitive for item in amb]
namb_records = len(set(amb_records))
print("Records in ambiguous matches          :", namb_records)
for amb in transitive:
    print(amb)
print("Are multiple in ambiguous             :",set(matched_multiple).issubset(set(amb_records)))

rec_errors = set(amb_records).union(matched_multiple).union(matched_single)
rec_ok = set(records).difference(rec_errors)

match_records['records_error'][method] = list(rec_errors)
match_info.at['records_error', method] = len(rec_errors)
match_records['records_matched_ok'][method] = list(rec_ok)
match_info.at['records_matched_ok', method] = len(rec_ok)
match_records['records_asymmetric'][method] = list(matched_single)
match_info.at['records_asymmetric', method] = len(matched_single)
match_records['records_transitive'][method] = list(amb_records)
match_info.at['records_transitive', method] = namb_records

# new
aka = match_records['aka']['data']
aka_ok = set(aka).intersection(rec_ok)
see = match_records['see']['data']
see_ok = set(see).intersection(rec_ok)
match_info.loc['aka_matched_ok', method] = len(aka_ok)
match_records['aka_matched_ok'][method] = list(aka_ok)
match_info.loc['see_matched_ok', method] = len(see_ok)
match_records['see_matched_ok'][method] = list(see_ok)

pairs_ok = set([(o,d,t) for (o,d,t) in match_records['matched_pairs'][method]
                                                        if o not in rec_errors and d not in rec_errors])
match_records['matched_pairs_ok'][method] = list(pairs_ok)
match_info.loc['matched_pairs_ok', method] = len(pairs_ok)

vide_plus.loc[matched_single,'match_error'] = False # we dont consider a single match an error
vide_plus.loc[matched_single,'match_obs'] = "W01-Single match (asymmetric) "+method
vide_plus.loc[matched_multiple,'match_error'] = True
vide_plus.loc[matched_multiple,'match_obs'] = "E02-Multiple match "+method
vide_plus.loc[amb_records,'match_error'] = True
vide_plus.loc[amb_records,'match_obs'] = "E03-Ambiguity in match "+method
match_info.fillna('')

Records with more than two matches    : 39
Records asymmetric (one match only)   : 74
Records in ambiguous matches          : 119
{'206151', '161932', '255769', '203494', '203487'}
{'199474', '193320', '199475'}
{'216361', '251534', '251536'}
{'172681', '190285', '172677'}
{'206536', '233838', '206540'}
{'241045', '148119', '241043'}
{'242104', '238985', '242102'}
{'169757', '136283', '162809'}
{'139146', '139166', '153316'}
{'143426', '175730', '175731'}
{'183306', '235009', '183307'}
{'171377', '142388', '142386'}
{'152894', '152890', '243711'}
{'245531', '245530', '146206'}
{'233524', '245070', '233525'}
{'201733', '201728', '254410'}
{'168662', '211704', '211706', '211703'}
{'166756', '164823', '164824'}
{'266130', '266114', '146547'}
{'231025', '169177', '231024'}
{'238845', '233035', '238842'}
{'230250', '239516', '230251'}
{'169888', '214147', '169890'}
{'222615', '222621', '172529'}
{'233103', '221393', '233094'}
{'240879', '147410', '240882'}
{'160196', '160158', '152482'}
{'2

Unnamed: 0,data,sequential,random
aka,3046.0,,
aka_fac,3020.0,,
aka_geo,2958.0,,
aka_matched,,1910.0,1927.0
aka_matched_ok,,1894.0,1853.0
aka_pai,1617.0,,
matched_pairs,,3644.0,3739.0
matched_pairs_ok,,3620.0,3546.0
nodate,6284.0,,
nodate_novide,264.0,,


##### Show some of the ambiguous records


In [77]:
pd.set_option('display.max_rows',250)

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome-apelido','nome-primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','nome-geografico','instituta.ano']

show_only = 6
for ambiguous_records in transitive[:show_only]:
    print("ambiguous_records",ambiguous_records)
    for id in ambiguous_records:
        print(f"https://pesquisa.auc.uc.pt/details?id={id}")

    display_group_attributes(ambiguous_records,
                             entity_type='person',
                             header_elements=['id', 'name'],
                             header_attributes=['nome-vide','naturalidade','uc-entrada','faculdade','nome-pai'],
                             exclude_attributes=no_show,
                             sort_attributes=['the_date','the_type','the_value'],
                             cmap_name='Pastel1',
                             db=db)

ambiguous_records {'206151', '161932', '255769', '203494', '203487'}
https://pesquisa.auc.uc.pt/details?id=206151
https://pesquisa.auc.uc.pt/details?id=161932
https://pesquisa.auc.uc.pt/details?id=255769
https://pesquisa.auc.uc.pt/details?id=203494
https://pesquisa.auc.uc.pt/details?id=203487


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,206151,Francisco Rodrigues de Valadares,nome-vide,Rodrigues,0000-00-00,1331.0,2.0,,Vila Viçosa,0000-00-00,,0000-00-00,0000-00-00,,Cânones,0000-00-00,Cânones,Rodrigo Rodrigues,0000-00-00,,206151
1,203494,Francisco Rodrigues,nome-vide,Valadares,1605-10-20,10427.0,2.0,,Vila Viçosa,1605-10-20,,1605-10-20,1605-10-20,,Cânones,1605-10-20,Cânones,Rodrigo Rodrigues,1605-10-20,,203494
2,255769,Francisco Rodrigues,nome-vide,Valadares,1605-10-20,14054.0,2.0,,Vila Viçosa,1605-10-20,,1605-10-20,1605-10-20,,Cânones,1605-10-20,Cânones,Rodrigo Rodrigues,1605-10-20,,255769
3,161932,Francisco Rodrigues Ribeiro,nome-vide,Rodrigues,1612-10-22,7331.0,2.0,,Vila Viçosa,1612-10-22,,1612-10-22,1612-10-22,,Leis,1612-10-22,Leis,Manuel Ribeiro,1612-10-22,,161932
4,203487,Francisco Rodrigues,,,,,,,Vila Viçosa,0000-00-00,,0000-00-00,0000-00-00,,Leis,0000-00-00,leis,Manuel Ribeiro,0000-00-00,,203487


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,206151,Francisco Rodrigues de Valadares,faculdade,Cânones,0000-00-00,Cânones,206151
1,203487,Francisco Rodrigues,faculdade,Leis,0000-00-00,leis,203487
2,203487,Francisco Rodrigues,naturalidade,Vila Viçosa,0000-00-00,,203487
3,206151,Francisco Rodrigues de Valadares,naturalidade,Vila Viçosa,0000-00-00,,206151
4,203487,Francisco Rodrigues,nome,Francisco Rodrigues,0000-00-00,,203487
5,206151,Francisco Rodrigues de Valadares,nome,Francisco Rodrigues,0000-00-00,"Francisco Rodrigues de Valadares, vide Rodrigues",206151
6,206151,Francisco Rodrigues de Valadares,nome,Francisco Rodrigues de Valadares,0000-00-00,,206151
7,203487,Francisco Rodrigues,nome-pai,Manuel Ribeiro,0000-00-00,,203487
8,206151,Francisco Rodrigues de Valadares,nome-pai,Rodrigo Rodrigues,0000-00-00,,206151
9,206151,Francisco Rodrigues de Valadares,nome-vide,Rodrigues,0000-00-00,,206151


ambiguous_records {'199474', '193320', '199475'}
https://pesquisa.auc.uc.pt/details?id=199474
https://pesquisa.auc.uc.pt/details?id=193320
https://pesquisa.auc.uc.pt/details?id=199475


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,199475,António Coelho da Costa,nome-vide,Coelho,0000-00-00,2902.0,2.0,,Lisboa,0000-00-00,,0000-00-00,0000-00-00,,,,,Sebastião Coelho,0000-00-00,,199475
1,193320,António Coelho,nome-vide,da Costa,1611-10-04,11564.0,2.0,,Lisboa,1611-10-04,,1611-10-04,1611-10-04,,Cânones,1611-10-04,Cânones,Sebastião Coelho,1611-10-04,,193320
2,199474,António Coelho da Costa,,,,,,,Lisboa,0000-00-00,,0000-00-00,0000-00-00,,Cânones,0000-00-00,Cânones,Sebastião Coelho,0000-00-00,,199474


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,199474,António Coelho da Costa,faculdade,Cânones,0000-00-00,Cânones,199474
1,199474,António Coelho da Costa,naturalidade,Lisboa,0000-00-00,,199474
2,199475,António Coelho da Costa,naturalidade,Lisboa,0000-00-00,,199475
3,199475,António Coelho da Costa,nome,António Coelho,0000-00-00,"António Coelho da Costa, vide Coelho",199475
4,199474,António Coelho da Costa,nome,António Coelho da Costa,0000-00-00,,199474
5,199475,António Coelho da Costa,nome,António Coelho da Costa,0000-00-00,,199475
6,199474,António Coelho da Costa,nome-pai,Sebastião Coelho,0000-00-00,,199474
7,199475,António Coelho da Costa,nome-pai,Sebastião Coelho,0000-00-00,,199475
8,199475,António Coelho da Costa,nome-vide,Coelho,0000-00-00,,199475
9,199474,António Coelho da Costa,uc-entrada,0000-00-00,0000-00-00,,199474


ambiguous_records {'216361', '251534', '251536'}
https://pesquisa.auc.uc.pt/details?id=216361
https://pesquisa.auc.uc.pt/details?id=251534
https://pesquisa.auc.uc.pt/details?id=251536


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,216361,Manuel Vidigal,nome-vide,Morais,0000-00-00,10904,2,,Évora,0000-00-00,,0000-00-00,0000-00-00,,Cânones,0000-00-00,Cânones,,,,216361
1,251534,Manuel Vidigal de Morais,nome-vide,Vidigal,0000-00-00,11765,2,,Évora,0000-00-00,,0000-00-00,0000-00-00,,,,,Crispim Luís,0000-00-00,,251534
2,251536,Manuel Vidigal de Morais,nome-vide,Vidigal,1667-10-21,11840,2,,Évora,1667-10-21,,1667-10-21,1667-10-21,,Cânones,1667-10-21,Cânones,,,,251536


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,216361,Manuel Vidigal,faculdade,Cânones,0000-00-00,Cânones,216361
1,216361,Manuel Vidigal,naturalidade,Évora,0000-00-00,,216361
2,251534,Manuel Vidigal de Morais,naturalidade,Évora,0000-00-00,,251534
3,216361,Manuel Vidigal,nome,Manuel Vidigal,0000-00-00,,216361
4,251534,Manuel Vidigal de Morais,nome,Manuel Vidigal,0000-00-00,"Manuel Vidigal de Morais, vide Vidigal",251534
5,216361,Manuel Vidigal,nome,Manuel Vidigal Morais,0000-00-00,"Manuel Vidigal, vide Morais",216361
6,251534,Manuel Vidigal de Morais,nome,Manuel Vidigal de Morais,0000-00-00,,251534
7,251534,Manuel Vidigal de Morais,nome-pai,Crispim Luís,0000-00-00,,251534
8,216361,Manuel Vidigal,nome-vide,Morais,0000-00-00,,216361
9,251534,Manuel Vidigal de Morais,nome-vide,Vidigal,0000-00-00,,251534


ambiguous_records {'172681', '190285', '172677'}
https://pesquisa.auc.uc.pt/details?id=172681
https://pesquisa.auc.uc.pt/details?id=190285
https://pesquisa.auc.uc.pt/details?id=172677


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,190285,Amaro Nunes Homem,nome-vide,Nunes,0000-00-00,11264,2,,Ventosa,0000-00-00,,0000-00-00,0000-00-00,,,,,,,,190285
1,172681,Amaro Nunes,nome-vide,Homem,1614-10-09,14526,2,,Ventosa,1614-10-09,,1614-10-09,1614-10-09,,Medicina,1614-10-09,Medicina,Simão Pires,1614-10-09,,172681
2,172677,Amaro Nunes,nome-vide,Homem,1636-11-10,14472,2,,Ventosa,1636-11-10,,1636-11-10,1636-11-10,,Medicina,1636-11-10,Medicina,Simão Pires de Morais,1636-11-10,,172677


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,190285,Amaro Nunes Homem,naturalidade,Ventosa,0000-00-00,,190285
1,190285,Amaro Nunes Homem,nome,Amaro Nunes,0000-00-00,"Amaro Nunes Homem, vide Nunes",190285
2,190285,Amaro Nunes Homem,nome,Amaro Nunes Homem,0000-00-00,,190285
3,190285,Amaro Nunes Homem,nome-vide,Nunes,0000-00-00,,190285
4,190285,Amaro Nunes Homem,uc-entrada,0000-00-00,0000-00-00,,190285
5,190285,Amaro Nunes Homem,uc-saida,0000-00-00,0000-00-00,,190285
6,172681,Amaro Nunes,faculdade,Medicina,1614-10-09,Medicina,172681
7,172681,Amaro Nunes,matricula-faculdade,Medicina,1614-10-09,09.10.1614,172681
8,172681,Amaro Nunes,matricula-outra,Matrícula(s): 1619,1614-10-09,,172681
9,172681,Amaro Nunes,naturalidade,Ventosa,1614-10-09,,172681


ambiguous_records {'206536', '233838', '206540'}
https://pesquisa.auc.uc.pt/details?id=206536
https://pesquisa.auc.uc.pt/details?id=233838
https://pesquisa.auc.uc.pt/details?id=206540


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,id
0,233838,António do Vale Soutomaior,nome-vide,Vale,0000-00-00,6731,2,,Lisboa,0000-00-00,,0000-00-00,0000-00-00,,Cânones,0000-00-00,Cânones,,233838
1,206540,António do Vale,nome-vide,Soutomaior,1648-10-15,4135,2,,Lisboa,1648-10-15,,1648-10-15,1648-10-15,,Cânones,1648-10-15,Cânones,,206540
2,206536,António do Vale,nome-vide,Soutomaior,1667-10-01,4090,2,,Lisboa,1667-10-01,,1667-10-01,1667-10-01,,Cânones,1667-10-01,Cânones,,206536


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,233838,António do Vale Soutomaior,faculdade,Cânones,0000-00-00,Cânones,233838
1,233838,António do Vale Soutomaior,naturalidade,Lisboa,0000-00-00,,233838
2,233838,António do Vale Soutomaior,nome,António do Vale,0000-00-00,"António do Vale Soutomaior, vide Vale",233838
3,233838,António do Vale Soutomaior,nome,António do Vale Soutomaior,0000-00-00,,233838
4,233838,António do Vale Soutomaior,nome-nota,padre,0000-00-00,,233838
5,233838,António do Vale Soutomaior,nome-vide,Vale,0000-00-00,,233838
6,233838,António do Vale Soutomaior,padre,sim,0000-00-00,padre,233838
7,233838,António do Vale Soutomaior,uc-entrada,0000-00-00,0000-00-00,,233838
8,233838,António do Vale Soutomaior,uc-saida,0000-00-00,0000-00-00,,233838
9,206540,António do Vale,faculdade,Cânones,1648-10-15,Cânones,206540


ambiguous_records {'241045', '148119', '241043'}
https://pesquisa.auc.uc.pt/details?id=241045
https://pesquisa.auc.uc.pt/details?id=148119
https://pesquisa.auc.uc.pt/details?id=241043


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,241043,Manuel de Araújo e Sousa,nome-vide,Araújo,0000-00-00,9810,2,,Ilha da Madeira,0000-00-00,,0000-00-00,0000-00-00,,Cânones,0000-00-00,Cânones,,,,241043
1,241045,Manuel de Araújo e Sousa,nome-vide,Araújo,0000-00-00,9828,2,,Ilha da Madeira,0000-00-00,,0000-00-00,0000-00-00,,Cânones,0000-00-00,Cânones,Pedro Rodrigues,0000-00-00,,241045
2,148119,Manuel de Araújo,nome-vide,Sousa,1651-01-08,5834,2,,Ilha da Madeira,1651-01-08,,1651-01-08,1651-01-08,,Cânones,1651-01-08,Cânones,António Fernandes,1651-01-08,,148119


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,241043,Manuel de Araújo e Sousa,faculdade,Cânones,0000-00-00,Cânones,241043
1,241045,Manuel de Araújo e Sousa,faculdade,Cânones,0000-00-00,Cânones,241045
2,241043,Manuel de Araújo e Sousa,naturalidade,Ilha da Madeira,0000-00-00,,241043
3,241045,Manuel de Araújo e Sousa,naturalidade,Ilha da Madeira,0000-00-00,,241045
4,241043,Manuel de Araújo e Sousa,nome,Manuel de Araújo,0000-00-00,"Manuel de Araújo e Sousa, vide Araújo",241043
5,241045,Manuel de Araújo e Sousa,nome,Manuel de Araújo,0000-00-00,"Manuel de Araújo e Sousa, vide Araújo",241045
6,241043,Manuel de Araújo e Sousa,nome,Manuel de Araújo e Sousa,0000-00-00,,241043
7,241045,Manuel de Araújo e Sousa,nome,Manuel de Araújo e Sousa,0000-00-00,,241045
8,241045,Manuel de Araújo e Sousa,nome-pai,Pedro Rodrigues,0000-00-00,,241045
9,241043,Manuel de Araújo e Sousa,nome-vide,Araújo,0000-00-00,,241043


#### Show some of the aka-aka records (potential duplicates)

In [54]:
from timelinknb.pandas import display_group_attributes

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome-apelido','nome-primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','nome-geografico','instituta.ano']

pairs = match_records['matched_pairs']['random']
show_pairs = [(o,d,t) for o,d,t in pairs if t == 'aka-aka' and o<d]
show_only = 4
print(f"aka-aka matches in sequential mode (show only {show_only}) of {len(show_pairs)}:")
for o,d,t in show_pairs[:show_only]:
    display_group_attributes([o,d],
                             header_cols=['name','nome-vide','naturalidade','uc-entrada','faculdade','nome-pai'],
                             exclude_attributes=no_show,
                             sort_attributes=['date','type','value'],
                             cmap_name='Pastel1')

aka-aka matches in sequential mode (show only 4) of 109:


Unnamed: 0,id,name,nome-vide,naturalidade,uc-entrada,faculdade,nome-pai
0,200599,João Teixeira,Morais,Bragança,1615-10-26,Cânones,Jacome de Morais
1,251249,João Teixeira de Morais,Teixeira,Bragança,1616-02-22,Cânones,


Unnamed: 0,date,id,type,value,attr_obs
0,1615-10-26,200599,faculdade,Cânones,Cânones
1,1615-10-26,200599,grau,Bacharel em Cânones,
2,1615-10-26,200599,matricula-faculdade,Cânones,26.10.1615
3,1615-10-26,200599,naturalidade,Bragança,
4,1615-10-26,200599,nome,João Teixeira,
5,1615-10-26,200599,nome,João Teixeira Morais,"João Teixeira, vide Morais"
6,1615-10-26,200599,nome-pai,Jacome de Morais,
7,1615-10-26,200599,nome-vide,Morais,
8,1615-10-26,200599,uc-entrada,1615-10-26,
9,1615-10-26,200599,uc-entrada.ano,1615,


Unnamed: 0,id,name,nome-vide,naturalidade,uc-entrada,faculdade,nome-pai
0,207604,João Rodrigues,Pinheiro,Montemor-o-Velho,1558-02-03,Teologia,
1,244182,João Rodrigues Pinheiro,Rodrigues,Montemor-o-Velho,1566-02-27,Teologia,


Unnamed: 0,date,id,type,value,attr_obs
0,1558-02-03,207604,faculdade,Teologia,Faculdade inferida
1,1558-02-03,207604,naturalidade,Montemor-o-Velho,
2,1558-02-03,207604,nome,João Rodrigues,
3,1558-02-03,207604,nome,João Rodrigues Pinheiro,"João Rodrigues, vide Pinheiro"
4,1558-02-03,207604,nome-vide,Pinheiro,
5,1558-02-03,207604,uc-entrada,1558-02-03,
6,1558-02-03,207604,uc-entrada.ano,1558,
7,1559-02-26,207604,grau,Bacharel em Artes,Bacharel em artes 1559.02.26
8,1559-02-26,207604,matricula-faculdade,Artes,
9,1560-00-00:1561-05-00,207604,matricula-faculdade,Teologia,


Unnamed: 0,id,name,nome-vide,naturalidade,uc-entrada,faculdade,nome-pai
0,149046,Bernardo José de Azevedo,Vieira,Paredes,1725-10-01,Cânones,José de Azevedo Vieira
1,216902,Bernardo José de Azevedo Vieira,Azevedo,Paredes,1726-10-01,Cânones,


Unnamed: 0,date,id,type,value,attr_obs
0,1725-10-01,149046,faculdade,Cânones,Cânones
1,1725-10-01,149046,instituta,1725-10-01,01.10.1725 1725-10-01
2,1725-10-01,149046,naturalidade,Paredes,
3,1725-10-01,149046,nome,Bernardo José de Azevedo,
4,1725-10-01,149046,nome,Bernardo José de Azevedo Vieira,"Bernardo José de Azevedo, vide Vieira"
5,1725-10-01,149046,nome-nota,padre,
6,1725-10-01,149046,nome-pai,José de Azevedo Vieira,
7,1725-10-01,149046,nome-vide,Vieira,
8,1725-10-01,149046,padre,sim,padre
9,1725-10-01,149046,uc-entrada,1725-10-01,


Unnamed: 0,id,name,nome-vide,naturalidade,uc-entrada,faculdade,nome-pai
0,169177,Amaro Simões das Neves,Simões,Lousã,1669-10-01,Teologia,
1,231025,Amaro Simões,Neves,Lousã,1661-11-21,Cânones,
2,231025,Amaro Simões,Neves,Lousã,1661-11-21,Teologia,


Unnamed: 0,date,id,type,value,attr_obs
0,1661-11-21,231025,faculdade,Cânones,Faculdade corrigida
1,1661-11-21,231025,faculdade,Teologia,Faculdade corrigida
2,1661-11-21,231025,faculdade-original,Teologia,
3,1661-11-21,231025,instituta,1661-11-21,21.11.1661 1661-11-21
4,1661-11-21,231025,naturalidade,Lousã,
5,1661-11-21,231025,nome,Amaro Simões,
6,1661-11-21,231025,nome,Amaro Simões Neves,"Amaro Simões, vide Neves"
7,1661-11-21,231025,nome-vide,Neves,
8,1661-11-21,231025,uc-entrada,1661-11-21,
9,1661-11-21,231025,uc-entrada.ano,1661,


In [78]:
vide_plus.loc['217701']

name                                          José de Santo António
nome-vide.type                                            nome-vide
nome-vide                                                 Lencastre
nome-vide.line                                               5027.0
nome-vide.level                                                 2.0
nome-geografico                                                 NaN
faculdade                                                       NaN
faculdade.date                                                  NaN
faculdade.obs                                                   NaN
nome-pai                                                        NaN
uc-entrada                                               0000-00-00
uc-saida                                                 0000-00-00
uc-saida.date                                            0000-00-00
uc-saida.obs                                                       
rec_type                                        

In [79]:

# set back the missing nome_geografico to null
no_geo_filter = vide_plus['nome-geografico'] == '***NA***'
vide_plus.loc[no_geo_filter,'nome-geografico'] = np.nan
print(len(vide_plus[vide_plus['nome-geografico'] == '***NA***']))
vide_plus.info()

0
<class 'pandas.core.frame.DataFrame'>
Index: 9570 entries, 198423 to 230315
Data columns (total 27 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              9570 non-null   object 
 1   nome-vide.type    9286 non-null   object 
 2   nome-vide         9286 non-null   object 
 3   nome-vide.line    9286 non-null   float64
 4   nome-vide.level   9286 non-null   float64
 5   nome-geografico   9004 non-null   object 
 6   faculdade         5001 non-null   object 
 7   faculdade.date    5001 non-null   object 
 8   faculdade.obs     5001 non-null   object 
 9   nome-pai          3573 non-null   object 
 10  uc-entrada        9570 non-null   object 
 11  uc-saida          9570 non-null   object 
 12  uc-saida.date     9570 non-null   object 
 13  uc-saida.obs      9570 non-null   object 
 14  rec_type          9570 non-null   object 
 15  uc-entrada.type   284 non-null    object 
 16  uc-entrada.line   284 non-null    floa


Why:
* 230791	Abrantes	Manuel Fernandes da Silveira	None	Manuel Fernandes da Silveira	Fernandes da Silveira	0000-00-00	NaN
* 230756	Abrantes	Manuel Fernandes da Silveira	None	Manuel da Silveira	Manuel Fernandes da Silveira	1598-10-12	Leis

### Merge the result of the two methods, check for errors, again

Since we are mixing matches from two different methods it can happen that, together, they introduce new errrors,  specially transitive matches.

In [83]:
import networkx as nx
method = 'data'  # short for merged methods

matched_rand_pairs = match_records['matched_pairs']['random']
matched_seq_pairs = match_records['matched_pairs']['sequential']
matched_pairs = list(set(matched_rand_pairs + matched_seq_pairs))
print("Number of matched pairs (union of both methods)  :",len(matched_pairs))
match_records['matched_pairs'][method] = matched_pairs
match_info.loc['matched_pairs',method] = len(matched_pairs)

rec_errors_seq = match_records['records_error']['sequential']
rec_errors_rand = match_records['records_error']['random']


# And now filter, this is necessary because error detected are different in each method
matched_multiple = []
matched_single = []

origins = [o for (o,d,t) in matched_pairs]
destinations = [d for (o,d,t) in matched_pairs]
rec_in_matches = origins + destinations
for i in rec_in_matches:
    c = rec_in_matches.count(i)
    if i == '172681':
        pass
    if c >2:
        matched_multiple.append(i)
    elif c == 1:
        matched_single.append(i)


matched_multiple = list(set(matched_multiple))
matched_single = list(set(matched_single))
print("Number of matches random              :",len(matched_rand_pairs))
print("Number of matches sequential          :",len(matched_seq_pairs))
print("Number of matches both                :",len(matched_pairs))
print("Records with more than two matches    :", len(matched_multiple))
print("Records with just one match           :", len(matched_single))

# alternative method, perhaps more informative:
pairs_to_check = matched_pairs
print()
print("The following pairs have to reverse match:")
asymmetric_pairs = []
for (o,d,t) in matched_pairs:
    if t == 'see-aka':
        rt = 'aka-see'
    elif t == 'aka-see':
        rt = 'see-aka'
    else:
        rt = t
    if (d,o,rt) not in matched_pairs:
        asymmetric_pairs.append((o,d,t))
        print((o,d,t))

# now test for transitivity
G = nx.Graph()
simple_pairs = [(o,d) for (o,d,t) in matched_pairs]
G.add_edges_from(simple_pairs)
transitive  = [c for c in nx.connected_components(G) if len(c) > 2]

# number of records in ambiguous matchesamb_records = [item for amb in transitive for item in amb]
namb_records = len(set(amb_records))
print("Records in ambiguous matches          :", namb_records)

match_records['records_transitive'][method] = list(amb_records)
match_info.loc['records_transitive', method] = namb_records

for amb in transitive:
    print(amb)
print("Are multiple in ambiguous             :",set(matched_multiple).issubset(set(amb_records)))

vide_plus.loc[matched_single,'match_error'] = False
vide_plus.loc[matched_single,'match_obs'] = "W01-Single match (asymmetric) "+method

vide_plus.loc[matched_multiple,'match_error'] = True
vide_plus.loc[matched_multiple,'match_obs'] = "E03-Multiple match"+method

vide_plus.loc[amb_records,'match_error'] = True
vide_plus.loc[amb_records,'match_obs'] = "E04-Ambiguity in match "+method

# do a new list of records in error
rec_errors = set(amb_records).union(matched_multiple)

records = set(rec_in_matches)
rec_ok = records.difference(rec_errors)

print("Records involved in matches           :", len(records))
print("Records matched without errors        :", len(rec_ok))
print("Records matched with errors           :", len(rec_errors))

match_records['records_error'][method] = list(rec_errors)
match_info.loc['records_error', method] = len(rec_errors)
match_records['records_matched_ok'][method] = list(rec_ok)
match_info.loc['records_matched_ok', method] = len(rec_ok)
match_records['records_matched'][method] = list(records)
match_info.loc['records_matched', method] = len(records)

aka = match_records['aka']['data']
aka_ok = set(aka).intersection(rec_ok)
see = match_records['see']['data']
see_ok = set(see).intersection(rec_ok)
match_info.loc['aka_matched_ok', method] = len(aka_ok)
match_records['aka_matched_ok'][method] = list(aka_ok)
match_info.loc['see_matched_ok', method] = len(see_ok)
match_records['see_matched_ok'][method] = list(see_ok)

pairs_ok = set([(o,d,t) for (o,d,t) in match_records['matched_pairs'][method]
                                                        if o not in rec_errors and d not in rec_errors])
match_records['matched_pairs_ok'][method] = list(pairs_ok)
match_info.loc['matched_pairs_ok', method] = len(pairs_ok)


Number of matched pairs (union of both methods)  : 3821
Number of matches random              : 3739
Number of matches sequential          : 3644
Number of matches both                : 3821
Records with more than two matches    : 39
Records with just one match           : 74

The following pairs have to reverse match:
('181367', '214929', 'see-aka')
('225717', '173224', 'aka-see')
('148963', '182602', 'see-aka')
('256874', '141866', 'see-aka')
('164227', '248237', 'see-see')
('206505', '158689', 'see-aka')
('182659', '233550', 'aka-see')
('130534', '163482', 'see-aka')
('205772', '209662', 'see-aka')
('186417', '132440', 'see-aka')
('209320', '151354', 'see-aka')
('129384', '139423', 'see-aka')
('221053', '207054', 'see-aka')
('204089', '196842', 'see-aka')
('166987', '163021', 'aka-see')
('233397', '230482', 'see-aka')
('177796', '184419', 'aka-see')
('251029', '131947', 'see-aka')
('198768', '188508', 'aka-see')
('241346', '197630', 'see-aka')
('252478', '245318', 'see-aka')
('24102

Detect asymmetries in matches with no errors (asymmetries are not considered errors)

In [84]:
method = 'data'  # short for merged methods

pairs_ok = match_records['matched_pairs_ok'][method]
# pairs
pairs_see_aka = [(o,d,mtype) for (o,d,mtype) in pairs_ok if mtype == 'see-aka']
pairs_aka_see = [(o,d,mtype) for (o,d,mtype) in pairs_ok if mtype == 'aka-see']
pairs_aka_aka = [(o,d,mtype) for (o,d,mtype) in pairs_ok if mtype == 'aka-aka']
pairs_see_see = [(o,d,mtype) for (o,d,mtype) in pairs_ok if mtype == 'see-see']

aka_in_see_aka = [d for (o,d,mtype) in pairs_ok if mtype == 'see-aka']
aka_in_aka_see = [o for (o,d,mtype) in pairs_ok if mtype == 'aka-see']
asymmetry_aka = sorted(list(set(aka_in_see_aka) ^ set(aka_in_aka_see)))

print("Asymmetry for aka:", asymmetry_aka )
print("N Asymmetry for aka:", len(asymmetry_aka) )

see_in_see_aka = [o for (o,d,mtype) in pairs_ok if mtype == 'see-aka']
see_in_aka_see = [d for (o,d,mtype) in pairs_ok if mtype == 'aka-see']
asymmetry_see = sorted(list(set(see_in_see_aka) ^ set(see_in_aka_see)))

print("Asymmetry for see:", asymmetry_see )
print("N Asymmetry for see:", len(asymmetry_see))

# alternative method, perhaps more informative:
print()
pairs_to_check = pairs_ok
print("Asymmetric matches: for each match bellow the reverse one was not found")
asymmetric_pairs = []
for (o,d,t) in pairs_to_check:
    if t == 'see-aka':
        rt = 'aka-see'
    elif t == 'aka-see':
        rt = 'see-aka'
    else:
        rt = t
    if (d,o,rt) not in pairs_to_check:
        asymmetric_pairs.append((o,d,t))
        print((o,d,t))

asymmetric_records = set([o for (o,d,t) in asymmetric_pairs] + [d for (o,d,t) in asymmetric_pairs])
print()

match_info.loc['records_asymmetric',method] = len(asymmetric_records)
match_records['records_asymmetric'][method] = list(asymmetric_records)
# records
rec_matched = set([id for (id,d,t) in pairs_ok]+
                  [id for (o,id,t) in pairs_ok])
rec_see_aka = set([id for (id,d,t) in pairs_see_aka])
rec_aka_see = set([id for (o,id,t) in pairs_aka_see])
rec_see_see = set([id for (id,d,t) in pairs_see_see] +
                  [id for (o,id,t) in pairs_see_see])
rec_aka_aka = set([id for (id,d,t) in pairs_aka_aka] +
                  [id for (o,id,t) in pairs_aka_aka])

match_records['records_see_aka'][method] = list(rec_see_aka)
match_records['records_aka_see'][method] = list(rec_aka_see)
match_records['records_aka_aka'][method] = list(rec_aka_aka)
match_records['records_see_see'][method] = list(rec_see_see)

match_info.at['records_see_aka', method] = len(rec_see_aka)
match_info.at['records_aka_see', method] = len(rec_aka_see)
match_info.at['records_aka_aka', method] = len(rec_aka_aka)
match_info.at['records_see_see', method] = len(rec_see_see)

# new
match_info.loc['aka_matched', method] = len(rec_aka_see.union(rec_aka_aka))
match_records['aka_matched'][method] = list(rec_aka_see.union(rec_aka_aka))
match_info.loc['see_matched', method] = len(rec_see_aka.union(rec_see_see))
match_records['see_matched'][method] = list(rec_see_aka.union(rec_see_see))

matched_rand_df = pd.DataFrame(pairs_ok, columns=['from','to','type'])
matched_rand_df.groupby('type').count()

Asymmetry for aka: ['128907', '131947', '139423', '141866', '142554', '151170', '151354', '158689', '160216', '163482', '166987', '177796', '179399', '181415', '182602', '182659', '196842', '197630', '198768', '200713', '203369', '207054', '208712', '209662', '214929', '225717', '230482', '230756', '239847', '241012', '242752', '245318', '247449', '252345']
N Asymmetry for aka: 34
Asymmetry for see: ['129384', '130534', '134006', '140367', '148963', '149251', '163021', '173224', '181367', '184271', '184419', '185191', '188508', '194373', '195505', '204089', '205772', '206278', '206505', '209320', '212796', '221053', '223100', '226997', '230791', '233397', '233550', '234767', '235056', '241026', '241346', '251029', '252478', '256874']
N Asymmetry for see: 34

Asymmetric matches: for each match bellow the reverse one was not found
('181367', '214929', 'see-aka')
('225717', '173224', 'aka-see')
('148963', '182602', 'see-aka')
('256874', '141866', 'see-aka')
('164227', '248237', 'see-see')

Unnamed: 0_level_0,from,to
type,Unnamed: 1_level_1,Unnamed: 2_level_1
aka-aka,186,186
aka-see,1715,1715
see-aka,1737,1737
see-see,26,26


In [91]:
match_info.info()

<class 'pandas.core.frame.DataFrame'>
Index: 28 entries, aka to vide_plus
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   data        28 non-null     object
 1   sequential  15 non-null     object
 2   random      15 non-null     object
dtypes: object(3)
memory usage: 1.9+ KB


In [None]:
match

In [95]:
match_info.infer_objects(copy=False).fillna('')
# match_info.infer_objects(copy=False)

Unnamed: 0,data,sequential,random
aka,3046,,
aka_fac,3020,,
aka_geo,2958,,
aka_matched,1901,1910.0,1927.0
aka_matched_ok,1929,1894.0,1853.0
aka_pai,1617,,
matched_pairs,3821,3644.0,3739.0
matched_pairs_ok,3664,3620.0,3546.0
nodate,6284,,
nodate_novide,264,,


##### Check role of no date novide records in assymetries matches

Since these records have no "vide" expression they do not generate the symmetric name forookup.

In [96]:
see_with_no_vide = set(match_records['vide_plus']['data']) - set(match_records['vide']['data'])
print("Number of records with see and no vide: ",len(see_with_no_vide))
asymmetric_see_no_vide = list(set(asymmetric_records).intersection(see_with_no_vide))
print("See no vide part in asymmetric matches: ",len(asymmetric_see_no_vide),set(asymmetric_records).intersection(see_with_no_vide))

Number of records with see and no vide:  264
See no vide part in asymmetric matches:  21 {'164227', '184271', '185191', '204089', '256874', '148963', '223100', '140367', '194373', '130534', '129384', '251029', '134006', '209320', '195505', '136704', '149251', '181367', '252478', '234767', '233397'}


##### Check asymmetric pairs

In [99]:

match_list = asymmetric_pairs

pd.set_option('display.max_rows',250)

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome-apelido','nome-primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','nome-geografico','instituta.ano']

show_only = 6
for (o,d,t) in match_list[:show_only]:
    print(o,d,t)
    print(f"https://pesquisa.auc.uc.pt/details?id={o}")
    print(f"https://pesquisa.auc.uc.pt/details?id={d}")
    display_group_attributes([o,d],
                             entity_type='person',
                             header_elements=['id', 'name'],
                             header_attributes=['uc-entrada','nome-vide','naturalidade','faculdade','nome-pai'],
                             exclude_attributes=no_show,
                             sort_attributes=['the_date','the_type','the_value'],
                             cmap_name='Pastel1',
                             db=db
                             )

181367 214929 see-aka


Unnamed: 0,id_1,name,uc-entrada.type,uc-entrada,uc-entrada.date,uc-entrada.line,uc-entrada.level,uc-entrada.obs,nome-vide,nome-vide.date,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,181367,Gaspar de Macedo,uc-entrada,0000-00-00,0000-00-00,2063,2,,,,,"Cepões, Lamego",0000-00-00,Cepões (Lamego),Cânones,0000-00-00,Cânones,Gaspar Luís,0000-00-00,,181367
1,214929,Gaspar de Macedo Sampaio,uc-entrada,1660-11-23,1660-11-23,5077,2,,Macedo,1660-11-23,,"Cepões, Lamego",1660-11-23,Cepões (Lamego),Cânones,1660-11-23,Cânones,Gaspar Luís,1660-11-23,,214929


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,181367,Gaspar de Macedo,faculdade,Cânones,0000-00-00,Cânones,181367
1,181367,Gaspar de Macedo,naturalidade,"Cepões, Lamego",0000-00-00,Cepões (Lamego),181367
2,181367,Gaspar de Macedo,nome,Gaspar de Macedo,0000-00-00,,181367
3,181367,Gaspar de Macedo,nome-pai,Gaspar Luís,0000-00-00,,181367
4,181367,Gaspar de Macedo,uc-entrada,0000-00-00,0000-00-00,,181367
5,181367,Gaspar de Macedo,uc-saida,0000-00-00,0000-00-00,,181367
6,214929,Gaspar de Macedo Sampaio,faculdade,Cânones,1660-11-23,Cânones,214929
7,214929,Gaspar de Macedo Sampaio,instituta,1660-11-23,1660-11-23,23.11.1660 1660-11-23,214929
8,214929,Gaspar de Macedo Sampaio,naturalidade,"Cepões, Lamego",1660-11-23,Cepões (Lamego),214929
9,214929,Gaspar de Macedo Sampaio,nome,Gaspar de Macedo,1660-11-23,"Gaspar de Macedo Sampaio, vide Macedo",214929


225717 173224 aka-see


Unnamed: 0,id_1,name,uc-entrada.type,uc-entrada,uc-entrada.date,uc-entrada.line,uc-entrada.level,uc-entrada.obs,nome-vide,nome-vide.date,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,173224,João Nunes,uc-entrada,0000-00-00,0000-00-00,6105,2,,Rogado,0000-00-00,,Terena,0000-00-00,,Leis,0000-00-00,Leis,André Rogado,0000-00-00,,173224
1,225717,João Nunes Rogado,uc-entrada,1587-10-03,1587-10-03,9187,2,,Rogado,1587-10-03,,Terena,1587-10-03,,Leis,1587-10-03,Leis,André Rogado,1587-10-03,,225717


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,173224,João Nunes,faculdade,Leis,0000-00-00,Leis,173224
1,173224,João Nunes,naturalidade,Terena,0000-00-00,,173224
2,173224,João Nunes,nome,João Nunes,0000-00-00,,173224
3,173224,João Nunes,nome,João Nunes Rogado,0000-00-00,"João Nunes, vide Rogado",173224
4,173224,João Nunes,nome-pai,André Rogado,0000-00-00,,173224
5,173224,João Nunes,nome-vide,Rogado,0000-00-00,,173224
6,173224,João Nunes,uc-entrada,0000-00-00,0000-00-00,,173224
7,173224,João Nunes,uc-saida,0000-00-00,0000-00-00,,173224
8,225717,João Nunes Rogado,faculdade,Leis,1587-10-03,Leis,225717
9,225717,João Nunes Rogado,matricula-faculdade,Leis,1587-10-03,1587.10.03,225717


148963 182602 see-aka


Unnamed: 0,id_1,name,uc-entrada.type,uc-entrada,uc-entrada.date,uc-entrada.line,uc-entrada.level,uc-entrada.obs,nome-vide,nome-vide.date,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,id
0,148963,António José de Magalhães Feio e Azevedo,uc-entrada,0000-00-00,0000-00-00,740,2,,,,,Candoso,0000-00-00,,,,,,148963
1,182602,António José de Magalhães,uc-entrada,1761-10-01,1761-10-01,13905,2,,António José de Magalhães Feio e Azevedo,1761-10-01,,Candoso,1761-10-01,,Cursos jurídicos (Cânones ou Leis),1761-10-01,Faculdade inferida,,182602


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,148963,António José de Magalhães Feio e Azevedo,naturalidade,Candoso,0000-00-00,,148963
1,148963,António José de Magalhães Feio e Azevedo,nome,António José de Magalhães Feio e Azevedo,0000-00-00,,148963
2,148963,António José de Magalhães Feio e Azevedo,uc-entrada,0000-00-00,0000-00-00,,148963
3,148963,António José de Magalhães Feio e Azevedo,uc-saida,0000-00-00,0000-00-00,,148963
4,182602,António José de Magalhães,faculdade,Cursos jurídicos (Cânones ou Leis),1761-10-01,Faculdade inferida,182602
5,182602,António José de Magalhães,instituta,1761-10-01,1761-10-01,01.10.1761 1761-10-01,182602
6,182602,António José de Magalhães,naturalidade,Candoso,1761-10-01,,182602
7,182602,António José de Magalhães,nome,António José de Magalhães,1761-10-01,,182602
8,182602,António José de Magalhães,nome,António José de Magalhães Feio e Azevedo,1761-10-01,"António José de Magalhães, vide António José de Magalhães Feio e Azevedo",182602
9,182602,António José de Magalhães,nome-vide,António José de Magalhães Feio e Azevedo,1761-10-01,,182602


256874 141866 see-aka


Unnamed: 0,id_1,name,uc-entrada.type,uc-entrada,uc-entrada.date,uc-entrada.line,uc-entrada.level,uc-entrada.obs,nome-vide,nome-vide.date,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,256874,Manuel Aires de Almeida,uc-entrada,0000-00-00,0000-00-00,8609,2,,,,,Portalegre,0000-00-00,,,,,,,,256874
1,141866,Manuel Aires,uc-entrada,1612-10-24,1612-10-24,8595,2,,Almeida,1612-10-24,,Portalegre,1612-10-24,,Cânones,1612-10-24,Faculdade corrigida,Diogo Serrão,1612-10-24,,141866


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,256874,Manuel Aires de Almeida,naturalidade,Portalegre,0000-00-00,,256874
1,256874,Manuel Aires de Almeida,nome,Manuel Aires de Almeida,0000-00-00,,256874
2,256874,Manuel Aires de Almeida,uc-entrada,0000-00-00,0000-00-00,,256874
3,256874,Manuel Aires de Almeida,uc-saida,0000-00-00,0000-00-00,,256874
4,141866,Manuel Aires,faculdade,Cânones,1612-10-24,Faculdade corrigida,141866
5,141866,Manuel Aires,faculdade,Leis,1612-10-24,Faculdade corrigida,141866
6,141866,Manuel Aires,faculdade-original,Leis,1612-10-24,,141866
7,141866,Manuel Aires,instituta,1612-10-24,1612-10-24,1612/10/24 1612-10-24,141866
8,141866,Manuel Aires,naturalidade,Portalegre,1612-10-24,,141866
9,141866,Manuel Aires,nome,Manuel Aires,1612-10-24,,141866


164227 248237 see-see


Unnamed: 0,id_1,name,uc-entrada.type,uc-entrada,uc-entrada.date,uc-entrada.line,uc-entrada.level,uc-entrada.obs,nome-vide,nome-vide.date,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,id
0,164227,Aires Fernandes Freire,uc-entrada,0000-00-00,0000-00-00,8648,2,,,,,Lisboa,0000-00-00,,,,,,164227
1,248237,Aires Fernandes,uc-entrada,0000-00-00,0000-00-00,13908,2,,Freire,0000-00-00,,Lisboa,0000-00-00,,Cânones,0000-00-00,Faculdade inferida,,248237


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,248237,Aires Fernandes,faculdade,Cânones,0000-00-00,Faculdade inferida,248237
1,164227,Aires Fernandes Freire,naturalidade,Lisboa,0000-00-00,,164227
2,248237,Aires Fernandes,naturalidade,Lisboa,0000-00-00,,248237
3,248237,Aires Fernandes,nome,Aires Fernandes,0000-00-00,,248237
4,164227,Aires Fernandes Freire,nome,Aires Fernandes Freire,0000-00-00,,164227
5,248237,Aires Fernandes,nome,Aires Fernandes Freire,0000-00-00,"Aires Fernandes, vide Freire",248237
6,248237,Aires Fernandes,nome-vide,Freire,0000-00-00,,248237
7,164227,Aires Fernandes Freire,uc-entrada,0000-00-00,0000-00-00,,164227
8,248237,Aires Fernandes,uc-entrada,0000-00-00,0000-00-00,,248237
9,164227,Aires Fernandes Freire,uc-saida,0000-00-00,0000-00-00,,164227


206505 158689 see-aka


Unnamed: 0,id_1,name,uc-entrada.type,uc-entrada,uc-entrada.date,uc-entrada.line,uc-entrada.level,uc-entrada.obs,nome-vide,nome-vide.date,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,206505,Sebastião de Carvalho Coutinho,uc-entrada,0000-00-00,0000-00-00,11339,2,,Coutinho,0000-00-00,,Coimbra,0000-00-00,,,,,Manuel de Carvalho,0000-00-00,,206505
1,158689,Sebastião de Carvalho,uc-entrada,1658-10-03,1658-10-03,6015,2,,Coutinho,1658-10-03,,Coimbra,1658-10-03,,Cânones,1658-10-03,Cânones,Manuel de Carvalho,1658-10-03,,158689


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,206505,Sebastião de Carvalho Coutinho,naturalidade,Coimbra,0000-00-00,,206505
1,206505,Sebastião de Carvalho Coutinho,nome,Sebastião de Carvalho Coutinho,0000-00-00,"Sebastião de Carvalho Coutinho, vide Coutinho",206505
2,206505,Sebastião de Carvalho Coutinho,nome,Sebastião de Carvalho Coutinho,0000-00-00,,206505
3,206505,Sebastião de Carvalho Coutinho,nome-pai,Manuel de Carvalho,0000-00-00,,206505
4,206505,Sebastião de Carvalho Coutinho,nome-vide,Coutinho,0000-00-00,,206505
5,206505,Sebastião de Carvalho Coutinho,uc-entrada,0000-00-00,0000-00-00,,206505
6,206505,Sebastião de Carvalho Coutinho,uc-saida,0000-00-00,0000-00-00,,206505
7,158689,Sebastião de Carvalho,faculdade,Cânones,1658-10-03,Cânones,158689
8,158689,Sebastião de Carvalho,instituta,1658-10-03,1658-10-03,03.10.1658 1658-10-03,158689
9,158689,Sebastião de Carvalho,naturalidade,Coimbra,1658-10-03,,158689


Add match information to the records

In [100]:
import pandas as pd

pairs = match_records['matched_pairs_ok']['data']
def get_match(id,pairs):
    match_list = [(d,mtype) for (o,d,mtype) in pairs if o == id]
    if len(match_list) == 0:
        return (None,None)
    else:
        return match_list[0]

ids = vide_plus.index.values
matches = [get_match(id,pairs) for id in ids]
cols = pd.DataFrame(matches,columns=['match','match_type'], index=ids)
vide_plus = pd.concat([vide_plus,cols],axis=1)


## Match results

### Match general statistics

In [102]:
nvide_plus = match_info.loc['vide_plus','data']
match_info.infer_objects(copy=False).fillna("")
vars_perc_vide = ['aka','aka_fac','aka_geo','aka_pai',
                'nodate','nodate_novide',
                'records_matched','records_matched_ok',
                'records_see_aka','records_see_see',
                'records_aka_see','records_aka_aka',
                'records_transitive',
                'vide','vide_plus']

match_info.loc[vars_perc_vide,'perc_vide_plus'] = match_info.loc[vars_perc_vide,'data']/nvide_plus

nrec_matched = match_info.loc['records_matched_ok','data']
vars_perc_matches = ['records_matched_ok',
                     'records_see_aka','records_see_see',
                     'records_aka_see','records_aka_aka',
                     'records_transitive']
match_info.loc[vars_perc_matches,'perc_matched_ok'] = match_info.loc[vars_perc_matches,'data']/nrec_matched

nsee = match_info.loc['see','data']
vars_perc_see = ['see_matched','see_matched_ok','records_see_aka','records_see_see', 'see','see_fac','see_geo','see_pai']
match_info.loc[vars_perc_see,'perc_type'] = match_info.loc[vars_perc_see,'data']/nsee
match_info.loc[vars_perc_see,'type'] = 'see'


naka = match_info.loc['aka','data']
vars_perc_aka = ['aka','aka_matched','aka_matched_ok','aka_fac','aka_geo','aka_pai','records_aka_see','records_aka_aka']
match_info.loc[vars_perc_aka,'perc_type'] = match_info.loc[vars_perc_aka,'data']/naka
match_info.loc[vars_perc_aka,'type'] = 'aka'


nmatched_pairs = match_info.loc['matched_pairs','data']
vars_perc_matched = ['matched_pairs','matched_pairs_ok']
match_info.loc[vars_perc_matched,'perc_type'] = match_info.loc[vars_perc_matched,'data']/nmatched_pairs
match_info.loc[vars_perc_matched,'type'] = 'matched_pairs'

nrec_matched = match_info.loc['records_matched','data']
vars_perc_rec_matched = ['records_matched','records_matched_ok','records_transitive']
match_info.loc[vars_perc_rec_matched,'perc_type'] = match_info.loc[vars_perc_rec_matched,'data']/nrec_matched
match_info.loc[vars_perc_rec_matched,'type'] = 'records_mached'

In [103]:
match_info.infer_objects(copy=False).fillna(" ")

Unnamed: 0,data,sequential,random,perc_vide_plus,perc_matched_ok,perc_type,type
aka,3046,,,0.342632,,1.0,aka
aka_fac,3020,,,0.339708,,0.991464,aka
aka_geo,2958,,,0.332733,,0.97111,aka
aka_matched,1901,1910.0,1927.0,,,0.624097,aka
aka_matched_ok,1929,1894.0,1853.0,,,0.63329,aka
aka_pai,1617,,,0.18189,,0.53086,aka
matched_pairs,3821,3644.0,3739.0,,,1.0,matched_pairs
matched_pairs_ok,3664,3620.0,3546.0,,,0.958911,matched_pairs
nodate,6284,,,0.706862,,,
nodate_novide,264,,,0.029696,,,


### Generate matching file and dataframe

In [104]:

matching_view_cols = ['match','nome-geografico','uc-entrada','uc-saida','name','nome-vide','lookup','nome-pai','vide_type','faculdade','match_type','match_error','match_obs']

matched_error = vide_plus[vide_plus['match_error']==True]
matched_error_index = matched_error.index.unique()

matched_index = match_records['records_matched']['data']
matched_ok_index = list(set(matched_index)-set(matched_error_index))

matched = vide_plus.loc[matched_index].sort_values(['sort_key','nome-geografico','uc-entrada'])[matching_view_cols]
nmatched = len(matched_index)
print("Number of matched records:",nmatched)
matched.to_csv('../inferences/cross-references/vide_matched.csv',sep=',',)
matched.head(40)


Number of matched records: 3819


Unnamed: 0,match,nome-geografico,uc-entrada,uc-saida,name,nome-vide,lookup,nome-pai,vide_type,faculdade,match_type,match_error,match_obs
179898,151044.0,Pinheiro de Ázere,0000-00-00,0000-00-00,Adrião da Barca de Gouveia,Barca,Adrião da Barca,,cut,,see-aka,False,
151044,179898.0,Pinheiro de Ázere,1596-10-19,1620-07-11,Adrião da Barca,Gouveia,Adrião da Barca Gouveia,Baltasar Cardoso,add,Cânones,aka-see,False,
151589,131748.0,Viana,0000-00-00,0000-00-00,Afonso de Barros,Caminha,Afonso de Barros Caminha,,add,,see-aka,False,
131748,151589.0,Viana,1684-10-01,1687-10-01,Afonso de Barros Caminha,Barros,Afonso de Barros,,cut,Cânones,aka-see,False,
250325,151588.0,Estremoz,0000-00-00,0000-00-00,Afonso de Barros Preto,Barros,Afonso de Barros,Francisco Dias Zagalo,cut,,see-aka,False,
151588,250325.0,Estremoz,1563-11-16,1577-10-11,Afonso de Barros,Preto,Afonso de Barros Preto,,add,Leis,aka-see,False,
181618,186611.0,Viseu,0000-00-00,0000-00-00,Afonso Botelho Machado,Botelho,Afonso Botelho,,cut,Leis,see-aka,False,
186611,181618.0,Viseu,1609-10-01,1624-10-30,Afonso Botelho,Machado,Afonso Botelho Machado,António Machado,add,Cânones,aka-see,False,
186611,181618.0,Viseu,1609-10-01,1624-10-30,Afonso Botelho,Machado,Afonso Botelho Machado,António Machado,add,Leis,aka-see,False,
221796,164067.0,Elvas,0000-00-00,0000-00-00,Afonso Frausto Segurado,Frausto,Afonso Frausto,,cut,,see-see,False,


### Matches, excluding errors



In [105]:
matched_ok_index = match_records['records_matched_ok']['data']
matched.loc[matched_ok_index].sort_values(['name','nome-geografico','uc-entrada']).head(40).fillna('')

Unnamed: 0,match,nome-geografico,uc-entrada,uc-saida,name,nome-vide,lookup,nome-pai,vide_type,faculdade,match_type,match_error,match_obs
151044,179898,Pinheiro de Ázere,1596-10-19,1620-07-11,Adrião da Barca,Gouveia,Adrião da Barca Gouveia,Baltasar Cardoso,add,Cânones,aka-see,False,
179898,151044,Pinheiro de Ázere,0000-00-00,0000-00-00,Adrião da Barca de Gouveia,Barca,Adrião da Barca,,cut,,see-aka,False,
186611,181618,Viseu,1609-10-01,1624-10-30,Afonso Botelho,Machado,Afonso Botelho Machado,António Machado,add,Cânones,aka-see,False,
186611,181618,Viseu,1609-10-01,1624-10-30,Afonso Botelho,Machado,Afonso Botelho Machado,António Machado,add,Leis,aka-see,False,
181618,186611,Viseu,0000-00-00,0000-00-00,Afonso Botelho Machado,Botelho,Afonso Botelho,,cut,Leis,see-aka,False,
164067,221796,Elvas,0000-00-00,1627-02-12,Afonso Frausto,Segurado,Afonso Frausto Segurado,,add,Cânones,see-see,False,
221796,164067,Elvas,0000-00-00,0000-00-00,Afonso Frausto Segurado,Frausto,Afonso Frausto,,cut,,see-see,False,
177343,203124,Fronteira,1577-10-03,1585-11-23,Afonso Garcia,Tinoco,Afonso Garcia Tinoco,Pedro Garcia Tinoco,add,Leis,aka-see,False,
203124,177343,Fronteira,0000-00-00,0000-00-00,Afonso Garcia Tinoco,Garcia,Afonso Garcia,Pedro Garcia Tinoco,cut,Leis,see-aka,False,
187008,190253,Arruda,0000-00-00,0000-00-00,Afonso Henriques,Homem,Afonso Henriques Homem,,add,Cânones,see-aka,False,


### Show diferences in matching results

In [106]:
match_info.infer_objects(copy=False).fillna("")

Unnamed: 0,data,sequential,random,perc_vide_plus,perc_matched_ok,perc_type,type
aka,3046,,,0.342632,,1.0,aka
aka_fac,3020,,,0.339708,,0.991464,aka
aka_geo,2958,,,0.332733,,0.97111,aka
aka_matched,1901,1910.0,1927.0,,,0.624097,aka
aka_matched_ok,1929,1894.0,1853.0,,,0.63329,aka
aka_pai,1617,,,0.18189,,0.53086,aka
matched_pairs,3821,3644.0,3739.0,,,1.0,matched_pairs
matched_pairs_ok,3664,3620.0,3546.0,,,0.958911,matched_pairs
nodate,6284,,,0.706862,,,
nodate_novide,264,,,0.029696,,,


####  Only matched in random mode

The extra success of the random mode comes from a better tolerance to variations of geographic names.

This is because the random mode uses a similarity factor to find students of with the same birth place while the sequential method uses sorting on geographic name and names to get the matches adjacent.

Each methods manages to succeed in cases where the other fails, but random is more efficient.

In [109]:
matched_rand_index = match_records['records_matched_ok']['random']
matched_seq_index = match_records['records_matched_ok']['sequential']
matched_error_index = match_records['records_error']['data']

matched_rand_only = list(set(matched_rand_index)-set(matched_seq_index)-set(matched_error_index))
nmatched_rand_only = len(matched_rand_only)
print(f"Number of records matched only in random access mode (errors excluded): {nmatched_rand_only}")
print()
print("Sample:")
matched.loc[matched_rand_only].sort_values(['name','nome-geografico','uc-entrada',])[matching_view_cols].head(40)

Number of records matched only in random access mode (errors excluded): 74

Sample:


Unnamed: 0,match,nome-geografico,uc-entrada,uc-saida,name,nome-vide,lookup,nome-pai,vide_type,faculdade,match_type,match_error,match_obs
183928,222335,Nogoselo,1756-10-01,1756-10-01,Agostinho Manuel,Agostinho Manuel de Sequeira,Agostinho Manuel de Sequeira,,rep,Cursos jurídicos (Cânones ou Leis),aka-see,False,
222335,183928,Nagoselo,0000-00-00,0000-00-00,Agostinho Manuel de Sequeira,Agostinho Manuel,Agostinho Manuel,,cut,,see-aka,False,
148502,213090,Santiago do Cacém,1650-10-12,1658-03-30,André Ascenso,Salema,André Ascenso Salema,Manuel Raposo Pessanha,add,Cânones,aka-see,False,
148502,213090,Santiago do Cacém,1650-10-12,1658-03-30,André Ascenso,Salema,André Ascenso Salema,Manuel Raposo Pessanha,add,Leis,aka-see,False,
213090,148502,Santiago de Cacém,0000-00-00,0000-00-00,André Ascenso Salema,Ascenso,André Ascenso,,cut,Leis,see-aka,False,
178267,251037,Ilha Terceira,1567-10-01,1575-05-17,André Gomes,Monteiro,André Gomes Monteiro,António Vaz,add,Cânones,aka-see,False,
251037,178267,Ilha da Terceira,0000-00-00,0000-00-00,André Gomes Monteiro,Gomes,André Gomes,,cut,,see-aka,False,
222372,140377,Várzae de Meruge,1728-10-01,1731-10-01,André de Sequeira,Abranches,André de Sequeira Abranches,,add,Cânones,aka-see,False,
140377,222372,Várzea de Meruge,0000-00-00,0000-00-00,André de Sequeira Abranches,Sequeira,André de Sequeira,,cut,,see-aka,False,
238571,202253,Setã,1632-01-10,1659-11-08,António Lopes,Leitão,António Lopes Leitão,António André,add,Cânones,aka-see,False,


#### Only matched in sequential mode

A few cases sequential is more successful.

In [110]:
pd.set_option('display.max_rows',100)


matched_seq_only = list(set(matched_seq_index)-set(matched_rand_index)-set(matched_error_index))
nmatched_not_rand = len(matched_seq_only)
print(f"Number of records matched only in sequential mode (errors excluded): {nmatched_not_rand}")
print()
matched.loc[matched_seq_only].sort_values(['name','nome-geografico','uc-entrada',]).head(20)[matching_view_cols]


Number of records matched only in sequential mode (errors excluded): 82



Unnamed: 0,match,nome-geografico,uc-entrada,uc-saida,name,nome-vide,lookup,nome-pai,vide_type,faculdade,match_type,match_error,match_obs
147478,233523,,1700-11-23,1700-11-23,Alexandre de Araújo,Sodré,Alexandre de Araújo Sodré,António de Araújo de Macedo,add,Cursos jurídicos (Cânones ou Leis),aka-see,False,
233523,147478,,0000-00-00,0000-00-00,Alexandre de Araújo Sodré,Araújo,Alexandre de Araújo,,cut,,see-aka,False,
146869,132149,,0000-00-00,0000-00-00,António,Figueiredo,António Figueiredo,,add,,see-aka,False,
241634,250486,Algoso,0000-00-00,0000-00-00,António Pimentel,Morais,António Pimentel Morais,,add,Cânones,see-aka,False,
250486,241634,Algozo,1656-10-15,1665-03-24,António Pimentel Morais,Pimentel,António Pimentel,,cut,Leis,aka-see,False,
249475,163593,,0000-00-00,0000-00-00,António Plácido,Dourado,António Plácido Dourado,,add,,see-aka,False,
163593,249475,,1761-10-01,1761-10-01,António Plácido Dourado,Plácido,António Plácido,,cut,Teologia,aka-see,False,
147265,167316,,1738-10-01,1744-10-01,António da Anunciação,Evangelista,António da Anunciação Evangelista,,add,Teologia,aka-see,False,
167316,147265,,0000-00-00,0000-00-00,António da Anunciação Evangelista,Anunciação,António da Anunciação,,cut,,see-aka,False,
132149,146869,,1573-10-01,1582-04-30,António de Figueiredo,António,António,,cut,Teologia,aka-see,False,


#### Analyse Aka to Aka (see also) links.

These are true duplicates. Some of them could be prevented with a check on dates, but this serves to assess the extend of duplicate records in the data.

Analysis:
* 150364 265272: strange because the two records are exactly the same except for the name and vide **and the date on instituta** One of the vide should be a "see" record.
* 129553 191232: also strange: same father name, both records contain the same enrollment in "instituta" in 1601-10-14. the record 191232 
  has the faculdade "Medicina" and an enrollment date of 1613.01.12, while keeping the instituta date. 
* 207361 251998 This looks like a late addition to the "vide" scheme, a note on record 251998 states "Mudou o nome no ano de 1573 aos 03.06 - Atos e Graus 10, fl. 143, caderno 3º". To be in conformity 251988 should be a "see" records with no dates.
* 190606 248991 the record 248991 should be a "See" it retains a single enrolment date  in 1588.10.01 which also exists in the paired record. With no dates and the redundant enrollment removed 248991 this would have been a normal match
* 5 188413 193737: this is a true duplicate but the shorter record 188413 seems to contain redundant information except that the faculdade is 
  recorded as "Leis" while in 193737 is recorded as "Cânones". Note that except for the name of the faculdade 193737 always refers "Leis" in the various fields, including the degree.

In [111]:
vide_plus.columns

Index(['name', 'nome-vide.type', 'nome-vide', 'nome-vide.line',
       'nome-vide.level', 'nome-geografico', 'faculdade', 'faculdade.date',
       'faculdade.obs', 'nome-pai', 'uc-entrada', 'uc-saida', 'uc-saida.date',
       'uc-saida.obs', 'rec_type', 'uc-entrada.type', 'uc-entrada.line',
       'uc-entrada.level', 'data-registo', 'loookup', 'vide_type', 'lookup',
       'name_sp', 'lookup_sp', 'sort_key', 'match_error', 'match_obs', 'match',
       'match_type'],
      dtype='object')

In [120]:
matched.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4122 entries, 179898 to 224398
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   match            3953 non-null   object
 1   nome-geografico  4052 non-null   object
 2   uc-entrada       4122 non-null   object
 3   uc-saida         4122 non-null   object
 4   name             4122 non-null   object
 5   nome-vide        4097 non-null   object
 6   lookup           4122 non-null   object
 7   nome-pai         1663 non-null   object
 8   vide_type        4122 non-null   object
 9   faculdade        2712 non-null   object
 10  match_type       3953 non-null   object
 11  match_error      4122 non-null   bool  
 12  match_obs        206 non-null    object
dtypes: bool(1), object(12)
memory usage: 551.7+ KB


In [127]:
from timelink.pandas import display_group_attributes

date_threshold = 15  # difference in years for flagging false duplicate.
show_only = 20

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome-apelido','nome-primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','nome-geografico','instituta.ano']

pairs = match_records['matched_pairs_ok']['data']
show_pairs = [(o,d,t) for o,d,t in pairs if t == 'aka-aka' and o<d]

aka_aka_same_date = []
aka_aka_far_apart = []
aka_aka_possible_see = []
for o,d,t in show_pairs:
    if o == '141854':
        pass
    # get the dates of entry to filter those that cannot be the same
    date_o = matched.loc[o, 'uc-entrada']
    if type(date_o) != str:
        date_o = date_o.iloc[0]
    date_d = matched.loc[d, 'uc-entrada']
    if type(date_d) != str:
        date_d = date_d.iloc[0]
    date_s_o = matched.loc[o, 'uc-saida']
    if type(date_s_o) != str:
        date_s_o = date_s_o.iloc[0]
    date_s_d = matched.loc[d, 'uc-saida']
    if type(date_s_d) != str:
        date_s_d = date_s_d.iloc[0]

    if date_o == date_s_o:
        aka_aka_possible_see.append(o)

    if date_d == date_s_d:
        aka_aka_possible_see.append(d)

    if date_o == date_d and date_s_o == date_s_d:
        # print("aka-aka pair with same date:",date_o,(o,d,t))
        aka_aka_same_date.append((o,d,t))
    else:
        year_o = int(date_o[:4])
        year_d = int(date_d[:4])
        if max(year_o,year_d) - min(year_o,year_d) > date_threshold:
            # print(f"False aka-aka: records more than {date_threshold} years appart",(o,d,t),date_s_o,date_d)
            aka_aka_far_apart.append((o,d,t))

print(f"Number of aka-aka pairs with the same date:",len(aka_aka_same_date))
print(f"Number of aka-aka pairs more {date_threshold} years apart:",len(aka_aka_far_apart))
print(f"Number of possible false aka records (records with a single date, probably a see record)",len(aka_aka_possible_see))


print(f"aka-aka matches (show only {show_only}) of {len(show_pairs)}:")
i = 0
for o,d,t in show_pairs[:show_only]:
    i += 1
    print(i,o,d)
    if (o,d,t) in aka_aka_same_date:
        print("SAME DATES: Possible double registration of the same card")
    elif (o,d,t) in aka_aka_far_apart:
        print(f"FAR APART >{date_threshold} years: possible false match, records chronologically affar")
    if o in aka_aka_possible_see:
        print(f"{o} is a possible 'see' record")
    if d in aka_aka_possible_see:
        print(f"{d} is a possible 'see' record")

    display_group_attributes([o,d],
                             entity_type='person',
                             header_elements=['id', 'name'],
                             header_attributes=['nome-vide','naturalidade','uc-entrada','uc-saida','faculdade','nome-pai'],
                             exclude_attributes=no_show,
                             sort_attributes=['the_date','the_type','the_value'],
                             cmap_name='Pastel1',
                             db=db)


Number of aka-aka pairs with the same date: 5
Number of aka-aka pairs more 15 years apart: 9
Number of possible false aka records (records with a single date, probably a see record) 38
aka-aka matches (show only 20) of 93:
1 245698 252234


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,252234,Gonçalo Fernandes,nome-vide,Ferreira,1642-10-15,12403,2,,Campo Maior,1642-10-15,,1642-10-15,1642-10-15,,1644-10-09,1644-10-09,,Medicina,1642-10-15,Medicina,Manuel Ferreira,1642-10-15,,252234
1,245698,Gonçalo Fernandes Ferreira,nome-vide,Fernandes,1644-03-10,13842,2,,Campo Maior,1644-03-10,,1644-03-10,1644-03-10,,1645-05-04,1645-05-04,,Medicina,1644-03-10,Medicina,Manuel Ferreira,1644-03-10,,245698


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,252234,Gonçalo Fernandes,faculdade,Medicina,1642-10-15,Medicina,252234
1,252234,Gonçalo Fernandes,matricula-faculdade,Medicina,1642-10-15,15.10.1642,252234
2,252234,Gonçalo Fernandes,naturalidade,Campo Maior,1642-10-15,,252234
3,252234,Gonçalo Fernandes,nome,Gonçalo Fernandes,1642-10-15,,252234
4,252234,Gonçalo Fernandes,nome,Gonçalo Fernandes Ferreira,1642-10-15,"Gonçalo Fernandes, vide Ferreira",252234
5,252234,Gonçalo Fernandes,nome-pai,Manuel Ferreira,1642-10-15,,252234
6,252234,Gonçalo Fernandes,nome-vide,Ferreira,1642-10-15,,252234
7,252234,Gonçalo Fernandes,uc-entrada,1642-10-15,1642-10-15,,252234
8,252234,Gonçalo Fernandes,uc-entrada.ano,1642,1642-10-15,,252234
9,252234,Gonçalo Fernandes,matricula-faculdade,Medicina,1643-10-15,15.10.1643,252234


2 213910 253753


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,id
0,213910,Rui Lopes da Veiga,nome-vide,Lopes,1558-02-03,1588,2,,Coimbra,1558-02-03,,1558-02-03,1558-02-03,,1560-05-23,1560-05-23,,Artes,1558-02-03,Faculdade inferida,,213910
1,253753,Rui Lopes,nome-vide,da Veiga,1568-12-23,2329,2,,Coimbra,1568-12-23,,1568-12-23,1568-12-23,,1569-12-11,1569-12-11,,Leis,1568-12-23,Leis,,253753


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,213910,Rui Lopes da Veiga,faculdade,Artes,1558-02-03,Faculdade inferida,213910
1,213910,Rui Lopes da Veiga,grau,Bacharel em Artes,1558-02-03,ter o tempo que se requer para Bacharel em Artes: 03.02.1558,213910
2,213910,Rui Lopes da Veiga,naturalidade,Coimbra,1558-02-03,,213910
3,213910,Rui Lopes da Veiga,nome,Rui Lopes,1558-02-03,"Rui Lopes da Veiga, vide Lopes",213910
4,213910,Rui Lopes da Veiga,nome,Rui Lopes da Veiga,1558-02-03,,213910
5,213910,Rui Lopes da Veiga,nome-vide,Lopes,1558-02-03,,213910
6,213910,Rui Lopes da Veiga,uc-entrada,1558-02-03,1558-02-03,,213910
7,213910,Rui Lopes da Veiga,uc-entrada.ano,1558,1558-02-03,,213910
8,213910,Rui Lopes da Veiga,grau,Licenciado em Artes,1560-05-23,23.05.1560,213910
9,213910,Rui Lopes da Veiga,uc-saida,1560-05-23,1560-05-23,,213910


3 150411 252562
150411 is a possible 'see' record


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,150411,Manuel Lopes Batista,nome-vide,Lopes,1702-03-30,3492,2,,Coimbra,1702-03-30,,1702-03-30,1702-03-30,,1702-03-30,1702-03-30,,Medicina,1702-03-30,Medicina,Matias Lopes,1702-03-30,,150411
1,252562,Manuel Lopes,nome-vide,Batista,1702-10-01,11060,2,,Coimbra,1702-10-01,,1702-10-01,1702-10-01,,1707-11-14,1707-11-14,,Medicina,1702-10-01,Medicina,Matias Lopes,1702-10-01,,252562


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,150411,Manuel Lopes Batista,faculdade,Medicina,1702-03-30,Medicina,150411
1,150411,Manuel Lopes Batista,grau,Bacharel em Artes,1702-03-30,1702/03/30,150411
2,150411,Manuel Lopes Batista,naturalidade,Coimbra,1702-03-30,,150411
3,150411,Manuel Lopes Batista,nome,Manuel Lopes,1702-03-30,"Manuel Lopes Batista, vide Lopes",150411
4,150411,Manuel Lopes Batista,nome,Manuel Lopes Batista,1702-03-30,,150411
5,150411,Manuel Lopes Batista,nome-pai,Matias Lopes,1702-03-30,,150411
6,150411,Manuel Lopes Batista,nome-vide,Lopes,1702-03-30,,150411
7,150411,Manuel Lopes Batista,uc-entrada,1702-03-30,1702-03-30,,150411
8,150411,Manuel Lopes Batista,uc-entrada.ano,1702,1702-03-30,,150411
9,150411,Manuel Lopes Batista,uc-saida,1702-03-30,1702-03-30,,150411


4 149046 216902


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,149046,Bernardo José de Azevedo,nome-vide,Vieira,1725-10-01,3497,2,,Paredes,1725-10-01,,1725-10-01,1725-10-01,,1746-10-01,1746-10-01,,Cânones,1725-10-01,Cânones,José de Azevedo Vieira,1725-10-01,,149046
1,216902,Bernardo José de Azevedo Vieira,nome-vide,Azevedo,1726-10-01,1101,2,,Paredes,1726-10-01,,1726-10-01,1726-10-01,,1729-10-01,1729-10-01,,Cânones,1726-10-01,Cânones,,,,216902


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,149046,Bernardo José de Azevedo,faculdade,Cânones,1725-10-01,Cânones,149046
1,149046,Bernardo José de Azevedo,instituta,1725-10-01,1725-10-01,01.10.1725 1725-10-01,149046
2,149046,Bernardo José de Azevedo,naturalidade,Paredes,1725-10-01,,149046
3,149046,Bernardo José de Azevedo,nome,Bernardo José de Azevedo,1725-10-01,,149046
4,149046,Bernardo José de Azevedo,nome,Bernardo José de Azevedo Vieira,1725-10-01,"Bernardo José de Azevedo, vide Vieira",149046
5,149046,Bernardo José de Azevedo,nome-nota,padre,1725-10-01,,149046
6,149046,Bernardo José de Azevedo,nome-pai,José de Azevedo Vieira,1725-10-01,,149046
7,149046,Bernardo José de Azevedo,nome-vide,Vieira,1725-10-01,,149046
8,149046,Bernardo José de Azevedo,padre,sim,1725-10-01,padre,149046
9,149046,Bernardo José de Azevedo,uc-entrada,1725-10-01,1725-10-01,,149046


5 217492 247266


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,247266,Jerónimo Lopes,nome-vide,Vieira,1622-04-26,4847,2,,Santarém,1622-04-26,,1622-04-26,1622-04-26,,1623-07-13,1623-07-13,,Medicina,1622-04-26,Medicina,Francisco Lopes,1622-04-26,,247266
1,217492,Jerónimo Lopes Vieira,nome-vide,Lopes,1623-05-04,4693,2,,Santarém,1623-05-04,,1623-05-04,1623-05-04,,1623-05-05,1623-05-05,,Artes,1623-05-04,Faculdade inferida,Francisco Lopes,1623-05-04,,217492


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,247266,Jerónimo Lopes,faculdade,Medicina,1622-04-26,Medicina,247266
1,247266,Jerónimo Lopes,matricula-faculdade,Medicina,1622-04-26,26.04.1622,247266
2,247266,Jerónimo Lopes,naturalidade,Santarém,1622-04-26,,247266
3,247266,Jerónimo Lopes,nome,Jerónimo Lopes,1622-04-26,,247266
4,247266,Jerónimo Lopes,nome,Jerónimo Lopes Vieira,1622-04-26,"Jerónimo Lopes, vide Vieira",247266
5,247266,Jerónimo Lopes,nome-pai,Francisco Lopes,1622-04-26,,247266
6,247266,Jerónimo Lopes,nome-vide,Vieira,1622-04-26,,247266
7,247266,Jerónimo Lopes,uc-entrada,1622-04-26,1622-04-26,,247266
8,247266,Jerónimo Lopes,uc-entrada.ano,1622,1622-04-26,,247266
9,217492,Jerónimo Lopes Vieira,faculdade,Artes,1623-05-04,Faculdade inferida,217492


6 168704 205306


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,id
0,205306,António Mendes,nome-vide,Neto,1540-04-11,1068,2,,Santarém,1540-04-11,,1540-04-11,1540-04-11,,1549-07-10,1549-07-10,,Cânones,1540-04-11,Cânones,,205306
1,168704,António Mendes Neto,nome-vide,Mendes,1549-04-11,7628,2,,Santarém,1549-04-11,,1549-04-11,1549-04-11,,1553-05-31,1553-05-31,,Cânones,1549-04-11,Faculdade corrigida,,168704


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,205306,António Mendes,faculdade,Cânones,1540-04-11,Cânones,205306
1,205306,António Mendes,naturalidade,Santarém,1540-04-11,,205306
2,205306,António Mendes,nome,António Mendes,1540-04-11,,205306
3,205306,António Mendes,nome,António Mendes Neto,1540-04-11,"António Mendes, vide Neto",205306
4,205306,António Mendes,nome-vide,Neto,1540-04-11,,205306
5,205306,António Mendes,uc-entrada,1540-04-11,1540-04-11,,205306
6,205306,António Mendes,uc-entrada.ano,1540,1540-04-11,,205306
7,168704,António Mendes Neto,faculdade,Cânones,1549-04-11,Faculdade corrigida,168704
8,168704,António Mendes Neto,faculdade,Leis,1549-04-11,Faculdade corrigida,168704
9,168704,António Mendes Neto,faculdade-original,Cânones,1549-04-11,,168704


7 144662 204182


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,144662,Pedro Álvares,nome-vide,Costa,1613-11-09,7169,2,,Chaves,1613-11-09,,1613-11-09,1613-11-09,,1625-02-13,1625-02-13,,Cânones,1613-11-09,Cânones,Baltasar da Costa,1613-11-09,,144662
1,204182,Pedro Álvares da Costa,nome-vide,Álvares,1619-11-17,14009,2,,Chaves,1619-11-17,,1619-11-17,1619-11-17,,1625-02-13,1625-02-13,,Cânones,1619-11-17,Cânones,Baltasar da Costa,1619-11-17,,204182


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,144662,Pedro Álvares,faculdade,Cânones,1613-11-09,Cânones,144662
1,144662,Pedro Álvares,instituta,1613-11-09,1613-11-09,1613/11/09 1613-11-09,144662
2,144662,Pedro Álvares,naturalidade,Chaves,1613-11-09,,144662
3,144662,Pedro Álvares,nome,Pedro Álvares,1613-11-09,,144662
4,144662,Pedro Álvares,nome,Pedro Álvares Costa,1613-11-09,"Pedro Álvares, vide Costa",144662
5,144662,Pedro Álvares,nome-pai,Baltasar da Costa,1613-11-09,,144662
6,144662,Pedro Álvares,nome-vide,Costa,1613-11-09,,144662
7,144662,Pedro Álvares,uc-entrada,1613-11-09,1613-11-09,,144662
8,144662,Pedro Álvares,uc-entrada.ano,1613,1613-11-09,,144662
9,144662,Pedro Álvares,matricula-faculdade,Cânones,1615-10-16,1615/10/16,144662


8 133131 180160


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,180160,Manuel de Gouveia,nome-vide,Quintela,1656-10-01,12267,2,,Lisboa,1656-10-01,,1656-10-01,1656-10-01,,1657-10-01,1657-10-01,,Cânones,1656-10-01,Cânones,,,,180160
1,133131,Manuel de Gouveia Quintela,nome-vide,Gouveia,1656-10-07,1186,2,,Lisboa,1656-10-07,,1656-10-07,1656-10-07,,1664-02-22,1664-02-22,,Cânones,1656-10-07,Cânones,João de Gouveia,1656-10-07,,133131


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,180160,Manuel de Gouveia,faculdade,Cânones,1656-10-01,Cânones,180160
1,180160,Manuel de Gouveia,naturalidade,Lisboa,1656-10-01,,180160
2,180160,Manuel de Gouveia,nome,Manuel de Gouveia,1656-10-01,,180160
3,180160,Manuel de Gouveia,nome,Manuel de Gouveia Quintela,1656-10-01,"Manuel de Gouveia, vide Quintela",180160
4,180160,Manuel de Gouveia,nome-vide,Quintela,1656-10-01,,180160
5,180160,Manuel de Gouveia,uc-entrada,1656-10-01,1656-10-01,,180160
6,180160,Manuel de Gouveia,uc-entrada.ano,1656,1656-10-01,,180160
7,133131,Manuel de Gouveia Quintela,faculdade,Cânones,1656-10-07,Cânones,133131
8,133131,Manuel de Gouveia Quintela,instituta,1656-10-07,1656-10-07,07.10.1656 1656-10-07,133131
9,180160,Manuel de Gouveia,instituta,1656-10-07,1656-10-07,07.10.1656 1656-10-07,180160


9 149277 205921


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,149277,Jerónimo de Azevedo,nome-vide,Coutinho,1630-02-07,10593,2,,Lisboa,1630-02-07,,1630-02-07,1630-02-07,,1642-11-12,1642-11-12,,Cânones,1630-02-07,Cânones,Álvaro de Azevedo,1630-02-07,,149277
1,205921,Jerónimo de Azevedo Coutinho,nome-vide,Azevedo,1641-10-15,2558,2,,Lisboa,1641-10-15,,1641-10-15,1641-10-15,,1648-07-16,1648-07-16,,Cânones,1641-10-15,Cânones,Álvaro de Azevedo,1641-10-15,,205921


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,149277,Jerónimo de Azevedo,faculdade,Cânones,1630-02-07,Cânones,149277
1,149277,Jerónimo de Azevedo,instituta,1630-02-07,1630-02-07,07.02.1630 1630-02-07,149277
2,149277,Jerónimo de Azevedo,naturalidade,Lisboa,1630-02-07,,149277
3,149277,Jerónimo de Azevedo,nome,Jerónimo de Azevedo,1630-02-07,,149277
4,149277,Jerónimo de Azevedo,nome,Jerónimo de Azevedo Coutinho,1630-02-07,"Jerónimo de Azevedo, vide Coutinho",149277
5,149277,Jerónimo de Azevedo,nome-pai,Álvaro de Azevedo,1630-02-07,,149277
6,149277,Jerónimo de Azevedo,nome-vide,Coutinho,1630-02-07,,149277
7,149277,Jerónimo de Azevedo,uc-entrada,1630-02-07,1630-02-07,,149277
8,149277,Jerónimo de Azevedo,uc-entrada.ano,1630,1630-02-07,,149277
9,149277,Jerónimo de Azevedo,instituta,1631-10-01,1631-10-01,01.10.1631 1631-10-01,149277


10 151963 205524
151963 is a possible 'see' record


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,205524,Jorge Trigueiros,nome-vide,Barros,1626-11-18,11604,2,,Cano,1626-11-18,,1626-11-18,1626-11-18,,1633-06-25,1633-06-25,,Cânones,1626-11-18,Faculdade corrigida,Manuel Trigueiros,1626-11-18,,205524
1,151963,Jorge Trigueiros de Barros,nome-vide,Trigueiros,1632-10-11,8492,2,,Cano,1632-10-11,,1632-10-11,1632-10-11,,1632-10-11,1632-10-11,,Cânones,1632-10-11,Cânones,,,,151963


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,205524,Jorge Trigueiros,faculdade,Cânones,1626-11-18,Faculdade corrigida,205524
1,205524,Jorge Trigueiros,faculdade,Leis,1626-11-18,Faculdade corrigida,205524
2,205524,Jorge Trigueiros,faculdade-original,Cânones,1626-11-18,,205524
3,205524,Jorge Trigueiros,instituta,1626-11-18,1626-11-18,18.11.1626 1626-11-18,205524
4,205524,Jorge Trigueiros,naturalidade,Cano,1626-11-18,,205524
5,205524,Jorge Trigueiros,nome,Jorge Trigueiros,1626-11-18,,205524
6,205524,Jorge Trigueiros,nome,Jorge Trigueiros Barros,1626-11-18,"Jorge Trigueiros, vide Barros",205524
7,205524,Jorge Trigueiros,nome-pai,Manuel Trigueiros,1626-11-18,,205524
8,205524,Jorge Trigueiros,nome-vide,Barros,1626-11-18,,205524
9,205524,Jorge Trigueiros,uc-entrada,1626-11-18,1626-11-18,,205524


11 191348 237553


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,237553,Manuel Pereira,nome-vide,Castro,1597-10-13,6356,2,,Monção,1597-10-13,,1597-10-13,1597-10-13,,1604-11-02,1604-11-02,,Cânones,1597-10-13,Cânones,Manuel Pereira,1597-10-13,,237553
1,191348,Manuel Pereira de Castro,nome-vide,Pereira,1605-11-15,12478,2,,Monção,1605-11-15,,1605-11-15,1605-11-15,,1606-05-13,1606-05-13,,Cânones,1605-11-15,Cânones,Manuel Pereira,1605-11-15,,191348


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,237553,Manuel Pereira,faculdade,Cânones,1597-10-13,Cânones,237553
1,237553,Manuel Pereira,instituta,1597-10-13,1597-10-13,13.10.1597 1597-10-13,237553
2,237553,Manuel Pereira,naturalidade,Monção,1597-10-13,,237553
3,237553,Manuel Pereira,nome,Manuel Pereira,1597-10-13,,237553
4,237553,Manuel Pereira,nome,Manuel Pereira Castro,1597-10-13,"Manuel Pereira, vide Castro",237553
5,237553,Manuel Pereira,nome-pai,Manuel Pereira,1597-10-13,,237553
6,237553,Manuel Pereira,nome-vide,Castro,1597-10-13,,237553
7,237553,Manuel Pereira,uc-entrada,1597-10-13,1597-10-13,,237553
8,237553,Manuel Pereira,uc-entrada.ano,1597,1597-10-13,,237553
9,237553,Manuel Pereira,matricula-faculdade,Cânones,1599-02-08,08.02.1599,237553


12 143239 163231


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,id
0,143239,Jerónimo de Almeida,nome-vide,Ribeiro,1553-10-00,10428,2,,Ferreira,1553-10-00,,1553-10-00,1553-10-00,,1559-07-27,1559-07-27,,Leis,1553-10-00,Leis,,143239
1,163231,Jerónimo de Almeida Ribeiro,nome-vide,Almeida,1560-01-24,9738,2,,Ferreira,1560-01-24,,1560-01-24,1560-01-24,,1560-07-24,1560-07-24,,Cânones,1560-01-24,Faculdade inferida,,163231


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,143239,Jerónimo de Almeida,faculdade,Leis,1553-10-00,Leis,143239
1,143239,Jerónimo de Almeida,naturalidade,Ferreira,1553-10-00,,143239
2,143239,Jerónimo de Almeida,nome,Jerónimo de Almeida,1553-10-00,,143239
3,143239,Jerónimo de Almeida,nome,Jerónimo de Almeida Ribeiro,1553-10-00,"Jerónimo de Almeida, vide Ribeiro",143239
4,143239,Jerónimo de Almeida,nome-vide,Ribeiro,1553-10-00,,143239
5,143239,Jerónimo de Almeida,uc-entrada,1553-10-00,1553-10-00,,143239
6,143239,Jerónimo de Almeida,uc-entrada.ano,1553,1553-10-00,,143239
7,143239,Jerónimo de Almeida,instituta,1555-06-00,1555-06-00,"curso: 1 curso de Instituta, 1 de Código desde Outubro de 1553 a Junho de 1555",143239
8,143239,Jerónimo de Almeida,exame,Exame para Bacharel,1559-07-27,27.07.1559,143239
9,143239,Jerónimo de Almeida,grau,Bacharel em Leis,1559-07-27,1559/07/27,143239


13 197653 226783
197653 is a possible 'see' record


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,197653,Guilherme Tavares,nome-vide,Silva,1719-10-01,6370,2,,Lisboa,1719-10-01,,1719-10-01,1719-10-01,,1719-10-01,1719-10-01,,Cursos jurídicos (Cânones ou Leis),1719-10-01,Faculdade inferida,,,,197653
1,226783,Guilherme Tavares da Silva,nome-vide,Tavares,1720-10-01,4981,2,,Lisboa,1720-10-01,,1720-10-01,1720-10-01,,1726-06-28,1726-06-28,,Cânones,1720-10-01,Cânones,Manuel Tavares,1720-10-01,,226783


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,197653,Guilherme Tavares,faculdade,Cursos jurídicos (Cânones ou Leis),1719-10-01,Faculdade inferida,197653
1,197653,Guilherme Tavares,instituta,1719-10-01,1719-10-01,01.10.1719 1719-10-01,197653
2,197653,Guilherme Tavares,naturalidade,Lisboa,1719-10-01,,197653
3,197653,Guilherme Tavares,nome,Guilherme Tavares,1719-10-01,,197653
4,197653,Guilherme Tavares,nome,Guilherme Tavares Silva,1719-10-01,"Guilherme Tavares, vide Silva",197653
5,197653,Guilherme Tavares,nome-vide,Silva,1719-10-01,,197653
6,197653,Guilherme Tavares,uc-entrada,1719-10-01,1719-10-01,,197653
7,197653,Guilherme Tavares,uc-entrada.ano,1719,1719-10-01,,197653
8,197653,Guilherme Tavares,uc-saida,1719-10-01,1719-10-01,,197653
9,197653,Guilherme Tavares,uc-saida.ano,1719,1719-10-01,,197653


14 226759 252233


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,252233,Gonçalo Fernandes,nome-vide,Silva,1605-10-20,12373,2,,Braga,1605-10-20,,1605-10-20,1605-10-20,,1615-11-07,1615-11-07,,Cânones,1605-10-20,Cânones,Gonçalo Rodrigues,1605-10-20,,252233
1,226759,Gonçalo Fernandes da Silva,nome-vide,Fernandes,1609-12-15,4283,2,,Braga,1609-12-15,,1609-12-15,1609-12-15,,1616-04-22,1616-04-22,,Cânones,1609-12-15,Cânones,Gonçalo Rodrigues,1609-12-15,,226759


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,252233,Gonçalo Fernandes,faculdade,Cânones,1605-10-20,Cânones,252233
1,252233,Gonçalo Fernandes,instituta,1605-10-20,1605-10-20,20.10.1605 1605-10-20,252233
2,252233,Gonçalo Fernandes,naturalidade,Braga,1605-10-20,,252233
3,252233,Gonçalo Fernandes,nome,Gonçalo Fernandes,1605-10-20,,252233
4,252233,Gonçalo Fernandes,nome,Gonçalo Fernandes Silva,1605-10-20,"Gonçalo Fernandes, vide Silva",252233
5,252233,Gonçalo Fernandes,nome-pai,Gonçalo Rodrigues,1605-10-20,,252233
6,252233,Gonçalo Fernandes,nome-vide,Silva,1605-10-20,,252233
7,252233,Gonçalo Fernandes,uc-entrada,1605-10-20,1605-10-20,,252233
8,252233,Gonçalo Fernandes,uc-entrada.ano,1605,1605-10-20,,252233
9,226759,Gonçalo Fernandes da Silva,faculdade,Cânones,1609-12-15,Cânones,226759


15 151165 151167


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,151167,Luís Fernandes Barreiros,nome-vide,Luís Barreiros,1717-10-01,14191,2,,Fundão,1717-10-01,,1717-10-01,1717-10-01,,1724-01-31,1724-01-31,,Leis,1717-10-01,Leis,Luís Fernandes Barreiros,1717-10-01,,151167
1,151165,Luís Barreiros,nome-vide,Luís Fernandes Barreiros,1718-10-01,14135,2,,Fundão,1718-10-01,,1718-10-01,1718-10-01,,1723-10-01,1723-10-01,,Leis,1718-10-01,Leis,Luís Fernandes Barreiros,1718-10-01,,151165


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,151167,Luís Fernandes Barreiros,faculdade,Leis,1717-10-01,Leis,151167
1,151167,Luís Fernandes Barreiros,instituta,1717-10-01,1717-10-01,1717/10/01 1717-10-01,151167
2,151167,Luís Fernandes Barreiros,naturalidade,Fundão,1717-10-01,,151167
3,151167,Luís Fernandes Barreiros,nome,Luís Barreiros,1717-10-01,"Luís Fernandes Barreiros, vide Luís Barreiros",151167
4,151167,Luís Fernandes Barreiros,nome,Luís Fernandes Barreiros,1717-10-01,,151167
5,151167,Luís Fernandes Barreiros,nome-pai,Luís Fernandes Barreiros,1717-10-01,,151167
6,151167,Luís Fernandes Barreiros,nome-vide,Luís Barreiros,1717-10-01,,151167
7,151167,Luís Fernandes Barreiros,uc-entrada,1717-10-01,1717-10-01,,151167
8,151167,Luís Fernandes Barreiros,uc-entrada.ano,1717,1717-10-01,,151167
9,151165,Luís Barreiros,faculdade,Leis,1718-10-01,Leis,151165


16 147221 203285
203285 is a possible 'see' record


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,147221,Manuel Antunes,nome-vide,Costa,1662-10-23,8593,2,,Torres Novas,1662-10-23,,1662-10-23,1662-10-23,,1669-12-20,1669-12-20,,Cânones,1662-10-23,Cânones.,António Dias,1662-10-23,,147221
1,203285,Manuel Antunes da Costa,nome-vide,Antunes,1663-05-10,3865,2,,Torres Novas,1663-05-10,,1663-05-10,1663-05-10,,1663-05-10,1663-05-10,,Artes,1663-05-10,Faculdade inferida,,,,203285


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,147221,Manuel Antunes,faculdade,Cânones,1662-10-23,Cânones.,147221
1,147221,Manuel Antunes,matricula-faculdade,Cânones,1662-10-23,1662/10/23,147221
2,147221,Manuel Antunes,naturalidade,Torres Novas,1662-10-23,,147221
3,147221,Manuel Antunes,nome,Manuel Antunes,1662-10-23,,147221
4,147221,Manuel Antunes,nome,Manuel Antunes Costa,1662-10-23,"Manuel Antunes, vide Costa",147221
5,147221,Manuel Antunes,nome-pai,António Dias,1662-10-23,,147221
6,147221,Manuel Antunes,nome-vide,Costa,1662-10-23,,147221
7,147221,Manuel Antunes,uc-entrada,1662-10-23,1662-10-23,,147221
8,147221,Manuel Antunes,uc-entrada.ano,1662,1662-10-23,,147221
9,203285,Manuel Antunes da Costa,faculdade,Artes,1663-05-10,Faculdade inferida,203285


17 180510 251992


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,251992,Francisco Fernandes,nome-vide,Guedes,1607-10-20,8726,2,,Vila Real,1607-10-20,,1607-10-20,1607-10-20,,1612-10-19,1612-10-19,,Teologia,1607-10-20,Teologia,João Lourenço,1607-10-20,,251992
1,180510,Francisco Fernandes Guedes,nome-vide,Fernandes,1607-11-05,6398,2,,Vila Real,1607-11-05,,1607-11-05,1607-11-05,,1614-05-17,1614-05-17,,Teologia,1607-11-05,Teologia,João Lourenço,1607-11-05,,180510


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,251992,Francisco Fernandes,faculdade,Teologia,1607-10-20,Teologia,251992
1,251992,Francisco Fernandes,instituta,1607-10-20,1607-10-20,20.10.1607 1607-10-20,251992
2,251992,Francisco Fernandes,naturalidade,Vila Real,1607-10-20,,251992
3,251992,Francisco Fernandes,nome,Francisco Fernandes,1607-10-20,,251992
4,251992,Francisco Fernandes,nome,Francisco Fernandes Guedes,1607-10-20,"Francisco Fernandes, vide Guedes",251992
5,251992,Francisco Fernandes,nome-pai,João Lourenço,1607-10-20,,251992
6,251992,Francisco Fernandes,nome-vide,Guedes,1607-10-20,,251992
7,251992,Francisco Fernandes,uc-entrada,1607-10-20,1607-10-20,,251992
8,251992,Francisco Fernandes,uc-entrada.ano,1607,1607-10-20,,251992
9,180510,Francisco Fernandes Guedes,faculdade,Teologia,1607-11-05,Teologia,180510


18 167251 231579
167251 is a possible 'see' record


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,id
0,167251,Pedro Simões Esteves,nome-vide,Simões,1665-10-19,2040,2,,Campo Maior,1665-10-19,,1665-10-19,1665-10-19,,1665-10-19,1665-10-19,,Cursos jurídicos (Cânones ou Leis),1665-10-19,Faculdade inferida,,167251
1,231579,Pedro Simões,nome-vide,Pedro Simões Esteves,1665-10-19,2428,2,,Campo Maior,1665-10-19,,1665-10-19,1665-10-19,,1666-10-15,1666-10-15,,Cânones,1665-10-19,Cânones,,231579


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,231579,Pedro Simões,grau,Bacharel em Artes,1665-03-21,Bacharel em Artes 21.03.1665,231579
1,167251,Pedro Simões Esteves,faculdade,Cursos jurídicos (Cânones ou Leis),1665-10-19,Faculdade inferida,167251
2,231579,Pedro Simões,faculdade,Cânones,1665-10-19,Cânones,231579
3,167251,Pedro Simões Esteves,instituta,1665-10-19,1665-10-19,19.10.1665 1665-10-19,167251
4,231579,Pedro Simões,instituta,1665-10-19,1665-10-19,19.10.1665 1665-10-19,231579
5,167251,Pedro Simões Esteves,naturalidade,Campo Maior,1665-10-19,,167251
6,231579,Pedro Simões,naturalidade,Campo Maior,1665-10-19,,231579
7,167251,Pedro Simões Esteves,nome,Pedro Simões,1665-10-19,"Pedro Simões Esteves, vide Simões",167251
8,231579,Pedro Simões,nome,Pedro Simões,1665-10-19,,231579
9,167251,Pedro Simões Esteves,nome,Pedro Simões Esteves,1665-10-19,,167251


19 153709 187539


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,id
0,153709,Bento Carvalho,nome-vide,Brandão,1655-10-11,14371,2,,Lisboa,1655-10-11,,1655-10-11,1655-10-11,,1658-10-01,1658-10-01,,Cânones,1655-10-11,Cânones,,153709
1,187539,Bento Carvalho Brandão,nome-vide,Carvalho,1655-10-11,133,2,,Lisboa,1655-10-11,,1655-10-11,1655-10-11,,1664-03-31,1664-03-31,,Cânones,1655-10-11,Cânones,,187539


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,153709,Bento Carvalho,faculdade,Cânones,1655-10-11,Cânones,153709
1,187539,Bento Carvalho Brandão,faculdade,Cânones,1655-10-11,Cânones,187539
2,153709,Bento Carvalho,instituta,1655-10-11,1655-10-11,11.10.1655 1655-10-11,153709
3,187539,Bento Carvalho Brandão,instituta,1655-10-11,1655-10-11,11.10.1655 1655-10-11,187539
4,153709,Bento Carvalho,naturalidade,Lisboa,1655-10-11,,153709
5,187539,Bento Carvalho Brandão,naturalidade,Lisboa,1655-10-11,,187539
6,153709,Bento Carvalho,nome,Bento Carvalho,1655-10-11,,153709
7,187539,Bento Carvalho Brandão,nome,Bento Carvalho,1655-10-11,"Bento Carvalho Brandão, vide Carvalho",187539
8,153709,Bento Carvalho,nome,Bento Carvalho Brandão,1655-10-11,"Bento Carvalho, vide Brandão",153709
9,187539,Bento Carvalho Brandão,nome,Bento Carvalho Brandão,1655-10-11,,187539


20 153630 246901
FAR APART >15 years: possible false match, records chronologically affar


Unnamed: 0,id_1,name,nome-vide.type,nome-vide,nome-vide.date,nome-vide.line,nome-vide.level,nome-vide.obs,naturalidade,naturalidade.date,naturalidade.obs,uc-entrada,uc-entrada.date,uc-entrada.obs,uc-saida,uc-saida.date,uc-saida.obs,faculdade,faculdade.date,faculdade.obs,nome-pai,nome-pai.date,nome-pai.obs,id
0,246901,Gaspar Pinto,nome-vide,Fonseca,1591-01-04,5516,2,,Amarante,1591-01-04,,1591-01-04,1591-01-04,,1594-10-01,1594-10-01,,Cânones,1591-01-04,Cânones,Gonçalo Vaz,1591-01-04,,246901
1,153630,Gaspar Pinto da Fonseca,nome-vide,Pinto,1616-12-15,12270,2,,Amarante,1616-12-15,,1616-12-15,1616-12-15,,1623-07-04,1623-07-04,,Leis,1616-12-15,Leis,Pedro de Seixas,1616-12-15,,153630


Unnamed: 0,id_1,name,the_type,the_value,the_date,attr_obs,id
0,246901,Gaspar Pinto,faculdade,Cânones,1591-01-04,Cânones,246901
1,246901,Gaspar Pinto,instituta,1591-01-04,1591-01-04,04.01.1591 1591-01-04,246901
2,246901,Gaspar Pinto,naturalidade,Amarante,1591-01-04,,246901
3,246901,Gaspar Pinto,nome,Gaspar Pinto,1591-01-04,,246901
4,246901,Gaspar Pinto,nome,Gaspar Pinto Fonseca,1591-01-04,"Gaspar Pinto, vide Fonseca",246901
5,246901,Gaspar Pinto,nome-pai,Gonçalo Vaz,1591-01-04,,246901
6,246901,Gaspar Pinto,nome-vide,Fonseca,1591-01-04,,246901
7,246901,Gaspar Pinto,uc-entrada,1591-01-04,1591-01-04,,246901
8,246901,Gaspar Pinto,uc-entrada.ano,1591,1591-01-04,,246901
9,246901,Gaspar Pinto,matricula-faculdade,Cânones,1592-10-01,01.10.1592,246901


#### Types of transformations in matched records

In [73]:
vide_types_matches = matched.groupby('vide_type').count()[['name']]
vide_types_matches['perc'] = vide_types_matches['name']/ vide_types_matches['name'].sum()
vide_types_matches

Unnamed: 0_level_0,name,perc
vide_type,Unnamed: 1_level_1,Unnamed: 2_level_1
add,1897,0.460213
cut,1899,0.460699
novid,26,0.006308
rep,294,0.071325
repap,6,0.001456


In [74]:
match_info.fillna("")

Unnamed: 0,data,sequential,random,perc_vide_plus,perc_matched_ok,perc_type,type
aka,3062,,,0.349304,,1.0,aka
aka_fac,3035,,,0.346224,,0.991182,aka
aka_geo,2973,,,0.339151,,0.970934,aka
aka_matched,1910,1910.0,1970.0,,,0.623775,aka
aka_matched_ok,1940,1904.0,1897.0,,,0.633573,aka
aka_pai,1619,,,0.184691,,0.528739,aka
matched_pairs,3818,3638.0,3804.0,,,1.0,matched_pairs
matched_pairs_ok,3665,3622.0,3614.0,,,0.959927,matched_pairs
nodate,5763,,,0.657426,,,
nodate_novide,141,,,0.016085,,,


### Analysis of non matched records

In [128]:

pd.set_option('display.max_rows',250)
matched_index = match_records['records_matched']['data']
non_matched_index = set(vide_plus.index.unique())-set(matched_index)
vide_non_matched = vide_plus.loc[list(non_matched_index)].sort_values(['sort_key','nome-geografico'])[['nome-geografico','match','name','nome-vide','vide_type','lookup','faculdade','nome-pai','uc-entrada','match_error','match_obs']]
vide_non_matched.to_csv('../inferences/cross-references/vide_non_matched.csv',sep=',')

In [129]:
vide_types_non_matches = vide_non_matched.groupby('vide_type').count()[['name']]
vide_types_non_matches['perc'] = vide_types_non_matches['name']/ vide_types_non_matches['name'].sum()
vide_types_non_matches

Unnamed: 0_level_0,name,perc
vide_type,Unnamed: 1_level_1,Unnamed: 2_level_1
add,2160,0.396476
cut,2226,0.40859
novid,259,0.04754
rep,763,0.140051
rep+,20,0.003671
repap,20,0.003671


### Sample of non-matched records


In [130]:
vide_non_matched.head(31)

Unnamed: 0,nome-geografico,match,name,nome-vide,vide_type,lookup,faculdade,nome-pai,uc-entrada,match_error,match_obs
271719,Abreiro,,Abel de Mendonça Machado de Araújo,Abel de Mendonça,cut,Abel de Mendonça,,,0000-00-00,False,
271719,Mirandela,,Abel de Mendonça Machado de Araújo,Abel de Mendonça,cut,Abel de Mendonça,,,0000-00-00,False,
182548,Eiró,,Abel Xavier Teixeira de Magalhães,José Joaquim Xavier Teixeira de Magalhães,rep,José Joaquim Xavier Teixeira de Magalhães,Cursos jurídicos (Cânones ou Leis),,0000-00-00,False,
285686,Oliveira de Frades,,Abílio Ribeiro de Almeida Campos de Melo,Abílio Ribeiro de Almeida,cut,Abílio Ribeiro de Almeida,Cursos jurídicos (Cânones ou Leis),António de Almeida Silva Campos de Melo,0000-00-00,False,
285686,Pinheiro,,Abílio Ribeiro de Almeida Campos de Melo,Abílio Ribeiro de Almeida,cut,Abílio Ribeiro de Almeida,Cursos jurídicos (Cânones ou Leis),António de Almeida Silva Campos de Melo,0000-00-00,False,
286149,Amoreira da Gandra,,Adelino Pinto Tavares Ferrão de Mendonça,Ferrão,cut,Adelino Pinto Tavares Ferrão,,,0000-00-00,False,
226700,Marvão,,Adolfo Augusto Rôlo,Adolfo António Rôlo,rep,Adolfo António Rôlo,Medicina,,1871-06-06,False,
226683,Marvão,,Adolfo António Rôlo,Adolfo Augusto Zuzarte Rôlo,rep,Adolfo Augusto Zuzarte Rôlo,,,0000-00-00,False,
273326,Lisboa,,Adriano Ernesto de Castilho Barreto,Castilho,cut,Adriano Ernesto de Castilho,,,0000-00-00,False,
230176,Arcos,,Tomás Joaquim Lopes de Mariz e Silva,Adriano Joaquim Lopes Mariz e Silva Monteiro,rep+,Adriano Joaquim Lopes Mariz e Silva Monteiro,Cursos jurídicos (Cânones ou Leis),,0000-00-00,False,



Analysis:
1. 220890	Portalegre	"Pedro Rodrigues, vide; Abreu" links with 140806 __problem in vide expression__
2. 271719	Abreiro/Mirandela	Abel de Mendonça Machado de Araújo	Abel de Mendonça links with 286147 __no back vide expression__
3. 182548	Eiró	Abel Xavier Teixeira de Magalhães	José Joaquim Xavier Teixeira de Magalhães links with 182950  __no back vide expression__
4. 285686	Oliveira de Frades	Abílio Ribeiro de Almeida Campos de Melo	Abílio Ribeiro de Almeida links with 142075 __no back vide expression__
5. 286149	Amoreira da Gandra	Adelino Pinto Tavares Ferrão de Mendonça	Ferrão links with 248088 __no back vide expression__ and __typo in geo name__
6. 273326	Lisboa	Adriano Ernesto de Castilho Barreto	Castilho links with 189993 __no back vide expression__
7. 230176	Arcos	Tomás Joaquim Lopes de Mariz e Silva	Adriano Joaquim Lopes Mariz e Silva Monteiro links with 250994 __variation in the vide name (Maris/Mariz)__
8. 282429	NaN	Adriano Osório Pereira Gouveia	Adriano Osório Pereira Cerenato	rep	Adriano Osório Pereira Cerenato	links with 291196 __no back vide expression__
9. 296930	Almarge	Adriano Sisnando Brotero de Avelar Quintino	Adriano Sisnando Brotero Quintino de Avelar	rep	Adriano Sisnando Brotero Quintino de Avelar links with 133134 __no back vide expression__
10. 225520	Lisboa	Adrião Pereira	Gomes	add	Adrião Pereira Gomes	Cânones, links with 178240 __no back vide expression__
11. 147465	Trancoso	Afonso Tavares de Araújo	Afonso de Araújo Tavares	rep	Afonso de Araújo Tavares links with 197047 __no back vide expression__
12. 169888	Lisboa	Afonso Furtado	Mendonça	add	Afonso Furtado Mendonça link with 214147 (see) or 169890__ __ambiguity__ 
13. 251547	Baía	Afonso Luís	da Fonseca	add	Afonso Luís da Fonseca	links with 139362 __no back vide expression__
14. 225529	Monção	Afonso Pereira	Pimenta	add	Afonso Pereira Pimenta	 links with 241162 __no back vide expression__
15. 129050	Elvas	Afonso Rodrigues Caldas	Rodrigues	cut	Afonso Rodrigues __no link found__
16. 221241	Elvas	Afonso Sardinha	Afonso Vaz Sardinha	rep	Afonso Vaz Sardinha	Cânones see link missing	__no link found__
17. 235544	Elvas	Afonso Soares da Mota	Afonso Soares de Lemos	rep	Afonso Soares de Lemos	link 211794 	__no back vide expression__ 
18. 225535	Aldeia Nova do Cabo	Afonso de Sá Pereira	Sá	cut	Afonso de Sá links with 211378 __no back vide expression__ 
19. 199294	Vila Real	Afonso Teixeira	   Mendonça e Azevedo	add	Afonso Teixeira Mendonça e Azevedo	Cânones	 links with 148819/See  214149/see __ambiguity__
20. 316331	Quinta do Alqueidão	Agostinho António de Sousa Brito Resende	Soutomaior	add	Agostinho António de Sousa Brito Resende Souto...
	link to 224178 	__no back vide expression_ NO match on geoname Alqueidão, quinta do Alqueidão__
21. 234238 Lisboa	Agostinho Armando de Vasconcelos e Sousa	Agostinho Armando Vasconcelos	rep	Agostinho Armando Vasconcelos	
        Links to 148028 __fail lookup not matching linked record name: both lookup the same though__

### Aka Records non matched

There is an inbalance of "see" and "aka" numbers, so a high number of unmatched "see"  is expected.

Aka records should be more easily matched with corresponding see. That is the case in fact with around 55% of aka records matched

Let's see the reason why Aka records do not find a matching "see".



In [131]:
aka_see_not_matched_index = vide_non_matched[vide_non_matched['uc-entrada']!='0000-00-00'].index.unique()
print("Number of aka records not matched:", {len(aka_see_not_matched_index)})
print("Partial list, change head parameter for more:")

vide_non_matched.loc[aka_see_not_matched_index].head(20)

Number of aka records not matched: {1048}
Partial list, change head parameter for more:


Unnamed: 0,nome-geografico,match,name,nome-vide,vide_type,lookup,faculdade,nome-pai,uc-entrada,match_error,match_obs
226700,Marvão,,Adolfo Augusto Rôlo,Adolfo António Rôlo,rep,Adolfo António Rôlo,Medicina,,1871-06-06,False,
250994,Arcos,,Adriano Joaquim de Mariz e Silva Monteiro,Tomás Joaquim Lopes de Maris e Silva,rep+,Tomás Joaquim Lopes de Maris e Silva,Cursos jurídicos (Cânones ou Leis),,1794-10-14,False,
250994,Aveiro,,Adriano Joaquim de Mariz e Silva Monteiro,Tomás Joaquim Lopes de Maris e Silva,rep+,Tomás Joaquim Lopes de Maris e Silva,Cursos jurídicos (Cânones ou Leis),,1794-10-14,False,
180061,Salgueiro,,João António Osório Pereira Gouveia,Adriano Osório Pereira Guerra,rep+,Adriano Osório Pereira Guerra,Cursos jurídicos (Cânones ou Leis),,1800-10-31,False,
180742,Salgueiro,,Adriano Osório Pereira Guerra,João António Pereira Cerenato,rep+,João António Pereira Cerenato,Leis,,1799-10-07,False,
129050,Elvas,,Afonso Rodrigues Caldas,Rodrigues,cut,Afonso Rodrigues,Leis,,1657-11-02,False,
221241,Elvas,,Afonso Sardinha,Afonso Vaz Sardinha,rep,Afonso Vaz Sardinha,Cânones,Gonçalo Rodrigues,1706-10-01,False,
199294,Vila Real,,Afonso Teixeira,Mendonça e Azevedo,add,Afonso Teixeira Mendonça e Azevedo,Cânones,,1650-11-08,False,
199294,Vila Real,,Afonso Teixeira,Mendonça e Azevedo,add,Afonso Teixeira Mendonça e Azevedo,Leis,,1650-11-08,False,
187458,Santa Olaia,,Agostinho Brandão,Pinto,add,Agostinho Brandão Pinto,Cursos jurídicos (Cânones ou Leis),,1688-01-21,False,


##### Analysis

1. 226700 Marvão Adolfo Augusto Rôlo, vide Adolfo António Rôlo matches 226683 Adolfo António Rôlo, vide Adolfo Augusto Zuzarte Rôlo __back vide does not match__
2. 250994 Arcos	Tomás Joaquim Lopes de Mariz e Silva vide Adriano Joaquim Lopes Mariz e Silva Monteiro links with  230176 __variation in the vide name (Maris/Mariz)__
3. 180061	Salgueiro	_João António Osório Pereira Gouveia_, vide Adriano Osório Pereira Guerra	rep+	Adriano Osório Pereira Guerra	Direito (Cânones ou Leis) 1800-10-31	__variation in the vide name__
  * 180742	Salgueiro   Adriano Osório Pereira Guerra, vide _João António Pereira Cerenato_	Leis 1799-10-07
  * Other possible matches 291196, 191903 complex case
4. 129050	Elvas	Afonso Rodrigues Caldas	vide Rodrigues	cut	Afonso Rodrigues	Leis	NaN	1657-11-02	__see record not found manualy__
5. 221241	Elvas	Afonso Sardinha	vide Afonso Vaz Sardinha	rep	Afonso Vaz Sardinha	Cânones	Gonçalo Rodrigues	1706-10-01	__see record not found manualy__
6. 199294	Vila Real	Afonso Teixeira	vide Mendonça e Azevedo	add	Afonso Teixeira Mendonça e Azevedo	Cânones	1650-11-08
  * links with see record 214149 Afonso Teixeira de Mendonça, vide Teixeira  __vide in aka record does not match name in see record__
  * links also with 148819 Afonso Teixeira de Azevedo, vide Teixeira __vide in aka record does not match name in see record__
  * so the vide expression in 199924 should be __vide Mendonça e vide Azevedo__ to link with Afonso Teixeira de Mendonça and Afonso Teixeira de Azevedo
7. 187458	Santa Olaia	Agostinho Brandão	vide Pinto	add	Agostinho Brandão Pinto	Direito (Cânones ou Leis)	NaN	1688-01-21	__matching record is aka, not see__
  * links with 245344 which is not a see record nor a vide record. __187458 and 245344 are dupicates__ __matching record is aka, not see__
8.  152599	Lisboa	Agostinho José de Carvalho vide	Agostinho José de Figueiredo Carvalho e Oliveira	Leis	1791-10-27
   *  links with 174123 Agostinho José de Figueiredo Carvalho e Oliveira, but it is not a see record. __152599 and 174213 are duplicates__ __matching record is aka, not see__
9. 149805	Lisboa Aires Correia Baharém vide	Correia	cut	Aires Correia	Teologia	pai Manuel Correia de Menezes	1594-10-18 
  * links with see record 196492 slight variation in the vide expression __variation in the vide name (Baharém/Baharem)__


10. 192844	Ovfmatsen	Alberto Chremer	vide Cremert	add	Alberto Chremer Cremert	Cânones	
   *  links with 207263 Alberto Cremert no vide expression __matching record is aka, not see__ __duplicate__


### Matched records

Sucessive lines are matches. Sometime more than one line per record when there is more than one geographic name or faculty.

In [132]:
vide_plus.loc[matched_index].sort_values(['nome-geografico','sort_key','uc-entrada'])[['uc-entrada','nome-geografico','name','lookup','nome-pai','faculdade','faculdade.obs','match_obs']].head(30)

Unnamed: 0,uc-entrada,nome-geografico,name,lookup,nome-pai,faculdade,faculdade.obs,match_obs
202622,0000-00-00,Constância,Fernão de Álvares Temudo,Fernão de Álvares,,,,
144388,1573-11-13,Constância,Fernão de Álvares,Fernão de Álvares Temudo,Pantaleão Rosado,Cânones,Cânones,
171438,0000-00-00,Constância,João da Veiga Mendes Nogueira,João da Veiga,,Leis,Leis,
213495,1757-10-01,Constância,João da Veiga,João da Veiga Mendes Nogueira,,Leis,Leis,
214577,0000-00-00,Constância,Julião Velho,Julião Velho Almeida,,,,
143676,1663-07-10,Constância,Julião Velho de Almeida,Julião Velho,,Cânones,Cânones,
203159,0000-00-00,Constância,Manuel da Costa,Manuel da Costa Oliveira,Manuel da Costa,,,
176277,1672-01-24,Constância,Manuel da Costa de Oliveira,Manuel da Costa,Manuel da Costa,Cânones,Cânones,
243351,0000-00-00,Constância,Manuel Ribeiro Pinhão,Manuel Ribeiro,Pedro Ribeiro,Cânones,Cânones,
165844,1623-10-09,Constância,Manuel Ribeiro,Manuel Ribeiro Pinhão,Pedro Ribeiro,Cânones,. Cânones,


# Save current stats on cross reference processing

This allows later in "git" see how the situation evolves.

In [135]:
from datetime import datetime

# save status to file
current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
fname = '../inferences/cross-references/015-remissivas_info.txt'

with open(fname,'w+') as f:
    print(f"Cross references, current stats: {current_time}",file=f)
    print(file=f)

    vide_plus.info(buf=f)

    print(match_info.infer_objects(copy=False).fillna(""), file=f)






### Focus on specific records

Use this to check specific records.


Define a column and a pattern to search for. Pattern is a _regular expression_.
For more information on the patterns and alternative searches see https://pandas.pydata.org/docs/reference/api/pandas.Series.str.contains.html

Examples:

* column='nome', pattern='André': 'André' anywhere in column 'name' (will also get 166395 Manuel André Ribeiro)
* column='nome', pattern='André$': names ending in 'André' (e.g. 146664 Manuel André )
* column='nome', pattern='^André': names starting with 'André'
* column='nome', pattern='André|Joaquim': names containing either 'André' or 'Joaquim'
* column='naturalidade', pattern='Alcácer|Alcacer':  naturalidade contains either 'Alcácer' or 'Alcacer'

In [81]:
import pandas as pd
column = 'name'
pattern = '^Estevão'
pd.set_option('display.max_rows',1000)
#                                      na=False prevents errors column is missing
vide_selection = vide_plus[vide_plus[column].str.contains(pattern,na=False)]
vide_selection.sort_values([column]).head(10)

Unnamed: 0,name,sex,nome-vide,nome-geografico,faculdade,faculdade.date,faculdade.obs,nome-pai,uc-entrada,uc-saida,...,loookup,vide_type,lookup,name_sp,lookup_sp,sort_key,match_error,match_obs,match,match_type
200659,Estevão Afonso da Costa,m,Afonso,Bragança,Cânones,1664-10-15,Cânones,,1664-10-15,1671-07-03,...,,cut,Estevão Afonso,Estevão Afonso Costa,Estevão Afonso,Estevão Afonso-Estevão Afonso Costa,False,,,
276913,Estevão Anacleto Duarte,m,Estevão Anacleto,Vila Viçosa,Leis,0000-00-00,Leis,António Duarte,0000-00-00,0000-00-00,...,,cut,Estevão Anacleto,Estevão Anacleto Duarte,Estevão Anacleto,Estevão Anacleto-Estevão Anacleto Duarte,False,,,
236147,Estevão Barreto de Magalhães e Menezes,m,Estevão de Magalhães e Menezes,Braga,,,,,0000-00-00,0000-00-00,...,,rep,Estevão de Magalhães e Menezes,Estevão Barreto Magalhães Menezes,Estevão Magalhães Menezes,Estevão Barreto Magalhães Menezes-Estevão Maga...,False,,236150.0,see-aka
129044,Estevão Caetano,m,de Araújo Rangel,Porto,Cursos jurídicos (Cânones ou Leis),1724-10-01,Faculdade inferida,,1724-10-01,1725-10-01,...,,add,Estevão Caetano de Araújo Rangel,Estevão Caetano,Estevão Caetano Araújo Rangel,Estevão Caetano-Estevão Caetano Araújo Rangel,False,,,
134106,Estevão Caetano de Araújo Rangel,m,Caetanao,Porto,,,,,0000-00-00,0000-00-00,...,,add,Estevão Caetano de Araújo Rangel Caetanao,Estevão Caetano Araújo Rangel,Estevão Caetano Araújo Rangel Caetanao,Estevão Caetano Araújo Rangel-Estevão Caetano ...,False,,,
133510,Estevão Cardoso,m,da Silveira,Vila Viçosa,Leis,1615-10-02,Leis,,1615-10-02,1623-05-24,...,,add,Estevão Cardoso da Silveira,Estevão Cardoso,Estevão Cardoso Silveira,Estevão Cardoso-Estevão Cardoso Silveira,False,,230444.0,aka-see
230444,Estevão Cardoso da Silveira,m,Cardoso,Vila Viçosa,,,,,0000-00-00,0000-00-00,...,,cut,Estevão Cardoso,Estevão Cardoso Silveira,Estevão Cardoso,Estevão Cardoso-Estevão Cardoso Silveira,False,,133510.0,see-aka
152876,Estevão Dias,m,Pereira,Cascais,,,,,0000-00-00,0000-00-00,...,,add,Estevão Dias Pereira,Estevão Dias,Estevão Dias Pereira,Estevão Dias-Estevão Dias Pereira,False,,233458.0,see-aka
233458,Estevão Dias Pereira,m,Dias,Cascais,Cânones,1619-10-24,Cânones,Álvaro Pereira,1619-10-24,1623-10-03,...,,cut,Estevão Dias,Estevão Dias Pereira,Estevão Dias,Estevão Dias-Estevão Dias Pereira,False,,152876.0,aka-see
293823,Estevão Falcão Cota,m,Menezes,,,,,,0000-00-00,0000-00-00,...,,add,Estevão Falcão Cota Menezes,Estevão Falcão Cota,Estevão Falcão Cota Menezes,Estevão Falcão Cota-Estevão Falcão Cota Menezes,False,,,



## Sorted lists

In [82]:
vide_selection[['nome-geografico','match','match_type','match_obs','name','nome-vide','faculdade','uc-entrada','uc-saida']].sort_values(['nome-geografico','name','uc-entrada']).head(20)

Unnamed: 0,nome-geografico,match,match_type,match_obs,name,nome-vide,faculdade,uc-entrada,uc-saida
173368,Abrantes,,,,Estevão Lopes Galvão,Lopes,,0000-00-00,0000-00-00
149184,Arrifana de Sousa,,,,Estevão de Freitas e Azevedo,Freitas,,0000-00-00,0000-00-00
206520,Beco,,,,Estevão Mendes,Vasconcelos,Cânones,0000-00-00,0000-00-00
233530,Beja,,,,Estevão Lopes Pereira,Lopes,,0000-00-00,0000-00-00
236147,Braga,236150.0,see-aka,,Estevão Barreto de Magalhães e Menezes,Estevão de Magalhães e Menezes,,0000-00-00,0000-00-00
236150,Braga,236147.0,aka-see,,Estevão de Magalhães e Menezes,Estevão Barreto de Magalhães e Menezes,Cânones,1738-10-01,1740-10-01
200659,Bragança,,,,Estevão Afonso da Costa,Afonso,Cânones,1664-10-15,1671-07-03
309989,Brasil,,,,Estevão Mauricio de Velasco e Tavora,Estevão Mauricio de Velasco Molina,Cânones,0000-00-00,0000-00-00
286832,Brasil,,,,Estevão Maurício de Velasco Molina,Estevão Maurício de Velasco e Távora,Cânones,1761-11-05,1761-11-05
198947,Brasil,,,,Estevão Maurício de Velasco e Távora,Velasco,,0000-00-00,0000-00-00


### Examine individual records

In [137]:
from timelinknb import pperson,Session
pd.set_option('display.max_rows',250)

with db.session() as session:
    session.begin()
    pperson(219458)


UnboundExecutionError: Could not locate a bind configured on mapper Mapper[Person(persons)], SQL expression or this Session.

### Examine groups of records in a single chrological frame



In [None]:
from timelinknb.pandas import display_group_attributes
pd.set_option('display.max_rows',250)

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome-apelido','nome-primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','nome-geografico','instituta.ano']

dup_ids = ['215193','182145']

display_group_attributes(dup_ids,
                             header_cols=['uc-entrada','naturalidade','faculdade','nome-pai'],
                             exclude_attributes=no_show,
                             sort_attributes=['date','type','value'],
                             cmap_name='Pastel1')

Unnamed: 0,id,uc-entrada,naturalidade,faculdade,nome-pai
0,182145,1604-10-11,Abrantes,Cânones,Manuel Machado
1,182145,1604-10-11,Abrantes,Leis,Manuel Machado
2,215193,1604-10-11,Abrantes,Leis,Manuel Machado


Unnamed: 0,date,id,type,value,attr_obs
0,1604-10-11,182145,faculdade,Cânones,Faculdade corrigida
1,1604-10-11,182145,faculdade,Leis,Faculdade corrigida
2,1604-10-11,215193,faculdade,Leis,Leis
3,1604-10-11,182145,faculdade-original,Leis,
4,1604-10-11,182145,instituta,1604-10-11,11.10.1604 1604-10-11
5,1604-10-11,215193,instituta,1604-10-11,1604.10.11 1604-10-11
6,1604-10-11,182145,naturalidade,Abrantes,
7,1604-10-11,215193,naturalidade,Abrantes,
8,1604-10-11,182145,nome,Manuel Rodrigues,"Manuel Rodrigues Machado, vide Rodrigues"
9,1604-10-11,215193,nome,Manuel Rodrigues,
