**[PT]** Português

---

**[EN]** English

# Explorando o ficheiro de autoridades da Biblioteca Nacional de Portugal

---

# Exploring the authority records of the Portuguese National Library




## Referências

---

## References

* https://dados.gov.pt/pt/datasets/catalogo-bnp-registos-de-autoridade/ (download)
* https://purl.pt/11442/1/


Note: In VIAF, co-authors sometimes appear with a BNP ID. See https://viaf.org/viaf/61933031/viaf.xml. Where does that information come from?

Load the Wikidata information; see [995-linked_data_sandbox.ipynb](995-linked_data_sandbox.ipynb).

In [1]:
import pandas as pd

students = pd.read_csv('../inferences/wikidata/students_wikidata_matched.csv')

In [2]:
students.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 691 entries, 0 to 690
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Unnamed: 0    691 non-null    int64  
 1   wikidata      691 non-null    object 
 2   name          691 non-null    object 
 3   alias         464 non-null    object 
 4   bnp_id        458 non-null    float64
 5   naturalidade  654 non-null    object 
 6   placeID       654 non-null    object 
 7   longitude     652 non-null    float64
 8   latitude      652 non-null    float64
 9   birth_date    691 non-null    object 
 10  fauc_id       257 non-null    float64
dtypes: float64(4), int64(1), object(6)
memory usage: 59.5+ KB


In [3]:
# nunique() skips missing values; len(unique()) would count NaN as an extra author
print("Number of authors in the BN:", students.bnp_id.nunique())

Number of authors in the BN: 258
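The `bnp_id` column is stored as `float64` because some rows have no identifier, whereas the record identifiers in the authority files are text. A minimal sketch of normalizing such values to digit strings before comparing them follows. The helper name `normalize_bnp_id` and the sample values are illustrative assumptions, not part of the original notebook.

```python
import math
from typing import Optional


def normalize_bnp_id(value) -> Optional[str]:
    """Return a BNP identifier as a digit string, or None when it is missing.

    Hypothetical helper: floats such as 1724018.0 become "1724018",
    NaN and empty values become None.
    """
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None
        return str(int(value))
    text = str(value).strip()
    return text or None


# Made-up sample standing in for the bnp_id column
raw_ids = [1724018.0, float("nan"), 1724018.0, 35125.0]

# Distinct identifiers, ignoring rows without a BNP match
distinct_ids = {i for i in map(normalize_bnp_id, raw_ids) if i is not None}
print(len(distinct_ids), sorted(distinct_ids))
```

On the notebook's data the same helper could be applied with `students.bnp_id.map(normalize_bnp_id)`.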


## Ficheiros de autoridade disponíveis localmente

---

## Authority records available locally


Download the authority records from https://dados.gov.pt/pt/datasets/catalogo-bnp-registos-de-autoridade/ into `extras/bnp/catalogoautoridades.marcxchange`.

In [1]:
from pathlib import Path

path = '../extras/bnp/catalogoautoridades.marcxchange'
# Collect every XML file under the download directory (the order is not sorted)
authority_records = list(Path(path).rglob('*.xml'))
print([f.name for f in authority_records])


['authorities_1723900_to_1844400.xml', 'authorities_456204_to_913891.xml', 'authorities_1290322_to_1444155.xml', 'authorities_1_to_100936.xml', 'authorities_1444156_to_1586439.xml', 'authorities_1586454_to_1723898.xml', 'authorities_1152445_to_1290321.xml', 'authorities_100937_to_184478.xml', 'authorities_913896_to_1152444.xml', 'authorities_264875_to_456203.xml', 'authorities_184479_to_264874.xml']
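The listing above is in no particular order. Assuming that the two numbers in each file name are the first and last record identifiers the file contains (an assumption suggested by the names, not confirmed by the dataset description), a small sketch like the following could sort the files numerically and pick the one to open for a given identifier. The helpers `id_range` and `file_for_id` are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches names such as authorities_1_to_100936.xml
FILENAME_RE = re.compile(r"authorities_(\d+)_to_(\d+)\.xml$")


def id_range(filename: str) -> Tuple[int, int]:
    """Parse the (first, last) numbers embedded in a file name."""
    match = FILENAME_RE.search(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename}")
    return int(match.group(1)), int(match.group(2))


def file_for_id(filenames: Iterable[str], bnp_id: int) -> Optional[str]:
    """Return the file whose range covers bnp_id, or None if no range does."""
    for name in filenames:
        first, last = id_range(name)
        if first <= bnp_id <= last:
            return name
    return None


# A few of the names printed above
names = [
    'authorities_1723900_to_1844400.xml',
    'authorities_456204_to_913891.xml',
    'authorities_1_to_100936.xml',
]

ordered = sorted(names, key=id_range)  # numeric order, not alphabetical
print(ordered)
print(file_for_id(names, 1724018))
```

The ranges in the file names leave gaps (for example between 913891 and 913896), so a lookup can legitimately return `None`.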


In [4]:
!pip install lxml



In [2]:
from lxml import etree

auth_file = etree.parse(authority_records[0])

In [3]:
recs = auth_file.getroot()
recs.tag

'{info:lc/xmlns/marcxchange-v1}collection'
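The root tag is printed in Clark notation, `{namespace}localname`, which means every element lookup has to be namespace-aware. The sketch below illustrates this on a small hand-written record (its values are copied from a record printed further down in this notebook). It uses the standard library's `xml.etree.ElementTree` instead of `lxml` so that it runs without extra installs, and the `mx` prefix is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the MARCXchange namespace (not read from the dataset)
SAMPLE = """<collection xmlns="info:lc/xmlns/marcxchange-v1">
  <record>
    <controlfield tag="001">1724018</controlfield>
    <datafield tag="102" ind1=" " ind2=" ">
      <subfield code="a">PT</subfield>
    </datafield>
    <datafield tag="200" ind1=" " ind2="1">
      <subfield code="a">Sarmento,</subfield>
      <subfield code="b">Pedro Mariz de Sousa,</subfield>
      <subfield code="f">1745-1822</subfield>
    </datafield>
  </record>
</collection>"""

# Map a prefix of our choosing to the namespace URI
NS = {"mx": "info:lc/xmlns/marcxchange-v1"}

root = ET.fromstring(SAMPLE)

# Split the Clark-notation tag into its namespace and local name
namespace, local_name = root.tag[1:].split("}")
print(namespace, local_name)

for rec in root.findall("mx:record", NS):
    # Every step of the path needs the prefix, including subfield
    rec_id = rec.findtext("mx:controlfield[@tag='001']", namespaces=NS)
    country = rec.findtext("mx:datafield[@tag='102']/mx:subfield[@code='a']", namespaces=NS)
    name_parts = [sf.text for sf in rec.findall("mx:datafield[@tag='200']/mx:subfield", NS)]
    print(rec_id, country, " ".join(name_parts))
```

An unprefixed path such as `"record"` would find nothing here, because the elements live in the MARCXchange namespace.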

In [9]:
xsl_file = '../extras/bnp/visbd-fauc.xsl'
xsl = etree.parse(xsl_file)
# Compile the stylesheet once instead of once per matching record
transform = etree.XSLT(xsl)

marxchange_ns = "info:lc/xmlns/marcxchange-v1"
nsmap = {None: marxchange_ns}  # default namespace for the unprefixed paths below
# Fields to print for each selected record
tags = ['101','102','123','160','200','300','305','310','320','330','340','356']

for rec in recs[:120]:
    cf001 = rec.find("controlfield[@tag = '001']", namespaces=nsmap)
    if cf001 is None:
        continue  # no record identifier, nothing to look up
    bnp_id = cf001.text
    url = f"http://urn.bn.pt/bibliografia/unimarc/xml?id={bnp_id}"

    # Keep only Portuguese authors (102 $a == 'PT')
    country = rec.find("datafield[@tag = '102']/subfield[@code='a']", namespaces=nsmap)
    if country is None or country.text != 'PT':
        continue

    # 200 $f holds the dates, e.g. "1745-1822"; skip empty or too-short values
    dates = rec.find("datafield[@tag = '200']/subfield[@code='f']", namespaces=nsmap)
    if dates is None or not dates.text or len(dates.text) < 4:
        continue
    try:
        icentury = int(dates.text[:2])  # first two digits of the first year
    except ValueError:
        print(f"Could not understand date: |{dates.text}| on record id: {bnp_id}")
        continue
    if icentury >= 19:
        continue

    print("Record of Portuguese author pre-1900:")
    for cf in rec.findall("controlfield", namespaces=nsmap):
        print(cf.get('tag'), cf.text)
    print("...")

    for tag in tags:
        df = rec.find(f"datafield[@tag = '{tag}']", namespaces=nsmap)
        if df is not None:
            # Indicators may be absent, so default to a blank
            print(f"{df.get('tag'):3s} {df.get('ind1', ' '):1s}{df.get('ind2', ' '):1s}")
            for sf in df:
                print(f"   ${sf.get('code')} {sf.text}")

    # Fetch the XML returned for this identifier and render it with the XSLT
    records = etree.parse(url)
    print(str(transform(records)))
                    




Record of Portuguese author pre-1900:
001 1724018
...
102   
   $a PT
200  1
   $a Sarmento,
   $b Pedro Mariz de Sousa,
   $f 1745-1822

Preceitos de construcção de navios e da sua mastriação e nomenclatura portugueza... / Pedro de Mariz de Souza Sarmento. - Lisboa : na Offic. de Antonio Gomes, 1789. - 187 p. ; 15 cm. - Na p. de tít.: Com licenca da Real Meza da Commis. Geral sobre o Exame, e Censura dos Livros
http://id.bnportugal.gov.pt/bib/catbnp/637223

Memorias Das Gloriozas, e Immortaes Acções das Armas Anglo-Luzas, nas Comquistas das duas importantes Praças de Rodrigo, e Badajoz, Commandadas pello Marechal General, o Grande Artur Welleslei, Cavalheiro da muito honrada Ordem do Banho, Gram Cruz da Ordem da Torre, e Espada, e da de S[ão] Fernando das Hespanhas; Lord, Cond'Wellington; Barão do Douro, Visconde de Talavera, conde do Vimeiro, Marques de Torres Vedras, Duque de Cidade Rodrigo, e Grande das Hespanhas da primeira classe; dedicadas e so offerecidas, ao seu mais intimo e 